Parsing XML in Python using ElementTree example

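I'm having a hard time finding a good, basic example of how to parse XML in Python using ElementTree. From what I can find, this appears to be the easiest library to use for parsing XML. Here is a sample of the XML I'm working with: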

 <timeSeriesResponse>
   <queryInfo>
     <locationParam>01474500</locationParam>
     <variableParam>99988</variableParam>
     <timeParam>
       <beginDateTime>2009-09-24T15:15:55.271</beginDateTime>
       <endDateTime>2009-11-23T15:15:55.271</endDateTime>
     </timeParam>
   </queryInfo>
   <timeSeries name="NWIS Time Series Instantaneous Values">
     <values count="2876">
       <value dateTime="2009-09-24T15:30:00.000-04:00" qualifiers="P">550</value>
       <value dateTime="2009-09-24T16:00:00.000-04:00" qualifiers="P">419</value>
       <value dateTime="2009-09-24T16:30:00.000-04:00" qualifiers="P">370</value>
       .....
     </values>
   </timeSeries>
 </timeSeriesResponse>

I am able to do what I need in a hard-coded way, but I need my code to be more dynamic. Here is what worked:

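 import xml.etree.ElementTree as ET

 tree = ET.parse('sample.xml')
 doc = tree.getroot()
 timeseries = doc[1]     # hard-coded position of <timeSeries>
 values = timeseries[2]  # hard-coded position of <values>; only valid for one file layout
 for child in values:
     print(child.attrib['dateTime'], child.text)  # first line: 2009-09-24T15:30:00.000-04:00 550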

Here are a few things I've tried. None of them worked; each reported that it couldn't find timeSeries (or whatever else I tried):

 tree = ET.parse('sample.xml')
 tree.find('timeSeries')

 tree = ET.parse('sample.xml')
 doc = tree.getroot()
 doc.find('timeSeries')

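Basically, I want to load the XML file, search for the timeSeries tag, and iterate through the value tags, returning the dateTime and the value of the tag itself. That is everything I did in the example above, but without hard-coding the parts of the XML I'm interested in. Can anyone point me to some examples, or give me some suggestions on how to work through this?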
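One way to avoid hard-coded child positions is `Element.iter()`, which walks the whole subtree and yields every element with a given tag name. This is a swapped-in technique, not one of the answers on this page. The sketch below is minimal and assumes the sample document above (no namespaces), trimmed to three `value` elements:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document from the question (no namespaces).
SAMPLE = """<timeSeriesResponse>
  <queryInfo><locationParam>01474500</locationParam></queryInfo>
  <timeSeries name="NWIS Time Series Instantaneous Values">
    <values count="2876">
      <value dateTime="2009-09-24T15:30:00.000-04:00" qualifiers="P">550</value>
      <value dateTime="2009-09-24T16:00:00.000-04:00" qualifiers="P">419</value>
      <value dateTime="2009-09-24T16:30:00.000-04:00" qualifiers="P">370</value>
    </values>
  </timeSeries>
</timeSeriesResponse>"""

root = ET.fromstring(SAMPLE)

# iter("value") searches the whole tree by tag name, so no child index is hard-coded.
# It matches "value" exactly, not the enclosing "values" element.
readings = [(v.get("dateTime"), v.text) for v in root.iter("value")]

for date_time, text in readings:
    print(date_time, text)
```

If the file declares a default namespace (`xmlns="..."`), the tag names no longer match plain `"value"`. In that case `iter()` needs the `'{uri}value'` form.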

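Thanks for all the help. Both of the suggestions below worked on the sample file I provided, but they don't work on the full file. Here is the error I get from the real file when I use Ed Carrel's method: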

  (<type 'exceptions.AttributeError'>, AttributeError("'NoneType' object has no attribute 'attrib'",), <traceback object at 0x011EFB70>)
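This `AttributeError` means that `find()` returned `None` because nothing matched, and `.attrib` was then read from `None`. The minimal sketch below checks for the miss explicitly instead. The namespace URI is a placeholder, since the real one was removed from the question:

```python
import xml.etree.ElementTree as ET

# Placeholder default namespace, standing in for the URL removed from the question.
DOC = ('<timeSeriesResponse xmlns="http://example.com/hypothetical-ns">'
       '<timeSeries name="NWIS Time Series Instantaneous Values"/>'
       '</timeSeriesResponse>')

root = ET.fromstring(DOC)
match = root.find("timeSeries")  # returns None when no child matches

if match is None:
    # Reading match.attrib here would raise the AttributeError shown above.
    print("no match; root tag is", root.tag)
else:
    print(match.attrib)
```

Printing `root.tag` shows the namespace URI in braces in front of the tag name. That prefix is why the plain name `'timeSeries'` does not match.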

I figured there was something in the real file it didn't like, so I removed things one at a time until it worked. Here are the lines I changed:

 originally: <timeSeriesResponse xsi:schemaLocation="a URL I removed" xmlns="a URL I removed" xmlns:xsi="a URL I removed">
 changed to: <timeSeriesResponse>

 originally: <sourceInfo xsi:type="SiteInfoType">
 changed to: <sourceInfo>

 originally: <geogLocation xsi:type="LatLonPointType" srs="EPSG:4326">
 changed to: <geogLocation>

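Removing the attributes with 'xsi:...' fixed the problem. Is 'xsi:...' not valid XML? It will be hard for me to remove these programmatically. Any suggested workarounds?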
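The change that actually matters is removing the default `xmlns="..."` declaration. That declaration places every element in a namespace, so ElementTree reports tags as `'{uri}tag'`. If removing namespaces in code is really what you want, a common workaround is to rewrite the tag names after parsing. This is a swapped-in technique and a minimal sketch, assuming a placeholder namespace URI:

```python
import xml.etree.ElementTree as ET

# Placeholder URI standing in for the URL removed from the question.
DOC = """<timeSeriesResponse xmlns="http://example.com/hypothetical-ns"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <timeSeries name="NWIS Time Series Instantaneous Values">
    <sourceInfo xsi:type="SiteInfoType"/>
    <values count="2876">
      <value dateTime="2009-09-24T15:30:00.000-04:00" qualifiers="P">550</value>
    </values>
  </timeSeries>
</timeSeriesResponse>"""


def strip_namespaces(root):
    """Rename '{uri}tag' elements to plain 'tag', in place (attributes are untouched)."""
    for el in root.iter():
        # Skip non-string tags (e.g. comments, if a parser keeps them).
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
    return root


root = strip_namespaces(ET.fromstring(DOC))
value = root.find("timeSeries/values/value")
print(value.get("dateTime"), value.text)
```

Attribute names such as `xsi:type` keep their `'{uri}type'` form in `.attrib`, because the helper only renames elements.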

Here is the full XML file: http://www.sendspace.com/file/lofcpt

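When I originally asked this question, I wasn't aware of XML namespaces. Now that I know what's going on, I don't have to remove the "xsi" attributes and the other namespace declarations (xmlns, xmlns:xsi). I just include the namespace in my XPath searches. See this page for more information on namespaces in lxml.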
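You can include the namespace in the search with the `namespaces` argument of `find()`/`findall()` in the standard-library `xml.etree.ElementTree`. The lxml page mentioned above is not reproduced here. The sketch below is minimal: it uses a placeholder namespace URI in place of the removed URL, and the prefix `wml` is an arbitrary local name, not one defined by the file:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI standing in for the URL removed from the question.
NS_URI = "http://example.com/hypothetical-ns"

XML = f"""<timeSeriesResponse xmlns="{NS_URI}"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <timeSeries name="NWIS Time Series Instantaneous Values">
    <values count="2876">
      <value dateTime="2009-09-24T15:30:00.000-04:00" qualifiers="P">550</value>
      <value dateTime="2009-09-24T16:00:00.000-04:00" qualifiers="P">419</value>
    </values>
  </timeSeries>
</timeSeriesResponse>"""

root = ET.fromstring(XML)

# With a default namespace, an unprefixed name finds nothing.
print(root.find("timeSeries"))

# Map a prefix of our choosing to the URI and use that prefix in the path.
ns = {"wml": NS_URI}
readings = [(v.get("dateTime"), v.text)
            for v in root.findall("wml:timeSeries/wml:values/wml:value", ns)]

for date_time, text in readings:
    print(date_time, text)
```

The equivalent without a prefix mapping is the `'{uri}tag'` notation, e.g. `root.find(f"{{{NS_URI}}}timeSeries")`.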

So I now have ElementTree 1.2.6 on my machine, and I ran the following code against the XML snippet you posted:

 import xml.etree.ElementTree as ET  # standalone 1.2.6 package: import elementtree.ElementTree as ET
 tree = ET.parse("test.xml")
 doc = tree.getroot()
 thingy = doc.find('timeSeries')
 print(thingy.attrib)

And got the following back:

 {'name': 'NWIS Time Series Instantaneous Values'}

It appears to find the timeSeries element without needing numerical indices.

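What would be useful now is to know what you mean when you say "it doesn't work". Since it works for me given the same input, ElementTree is unlikely to be broken in some obvious way. Please update your question with any error messages, tracebacks, or anything else you can provide to help us help you.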

If I understand your question correctly:

 for elem in doc.findall('timeSeries/values/value'):
     print(elem.get('dateTime'), elem.text)

Or, if you prefer (and if there is only one occurrence of timeSeries/values):

 values = doc.find('timeSeries/values')
 for value in values:
     print(value.get('dateTime'), value.text)

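The findall() method returns a list of all matching elements, whereas find() returns only the first matching element. The first example loops over all the elements found. The second loops over the children of the values element, which gives the same result in this case.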
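The difference between the two loops can be checked directly. This is a minimal sketch, assuming a trimmed copy of the sample document with no namespaces. It shows that both approaches reach the same `value` element objects:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (no namespaces).
SAMPLE = """<timeSeriesResponse>
  <timeSeries name="NWIS Time Series Instantaneous Values">
    <values count="2876">
      <value dateTime="2009-09-24T15:30:00.000-04:00" qualifiers="P">550</value>
      <value dateTime="2009-09-24T16:00:00.000-04:00" qualifiers="P">419</value>
    </values>
  </timeSeries>
</timeSeriesResponse>"""

doc = ET.fromstring(SAMPLE)

# findall(): a list of every element matching the path.
all_values = doc.findall("timeSeries/values/value")

# find(): only the first match (or None); its children are then iterated directly.
values = doc.find("timeSeries/values")
children = list(values)

# Same number of elements, and they are the very same objects in the tree.
print(len(all_values), len(children), all_values[0] is children[0])
```

The second form relies on `<values>` containing nothing but `<value>` children. The path-based `findall()` keeps working if other child elements appear.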

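But I don't see where the problem of not finding timeSeries comes from. Maybe you just forgot the getroot() call? (Note that you don't really need it. You can also work from the ElementTree object itself, whose find()/findall() evaluate a path relative to the root element, e.g. tree.find('timeSeries/values') or tree.find('.//timeSeries/values') to search at any depth. Avoid a leading '/': current versions of xml.etree treat '/timeSeriesResponse/timeSeries/values' as './timeSeriesResponse/timeSeries/values', which does not match, and emit a FutureWarning.)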
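Searching from the `ElementTree` object without `getroot()` can be sketched as below. This is minimal and assumes a trimmed sample document; an in-memory `io.BytesIO` stands in for `test.xml`, since `ET.parse()` accepts a file object as well as a filename:

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (no namespaces), as bytes for BytesIO.
SAMPLE = b"""<timeSeriesResponse>
  <timeSeries name="NWIS Time Series Instantaneous Values">
    <values count="2876">
      <value dateTime="2009-09-24T15:30:00.000-04:00" qualifiers="P">550</value>
    </values>
  </timeSeries>
</timeSeriesResponse>"""

tree = ET.parse(io.BytesIO(SAMPLE))

# ElementTree.find() evaluates the path relative to the root element.
direct = tree.find("timeSeries/values")
# './/' searches every descendant of the root, at any depth.
anywhere = tree.find(".//timeSeries/values")

print(direct is anywhere, direct.get("count"))
```

Both paths return the same `<values>` element. The `'.//'` form also keeps working if `<timeSeries>` is nested deeper in another file.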
