How to use XMLReader in PHP?

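I have the following XML file. The file is rather large and I haven't been able to get SimpleXML to open and read it, so I'm trying XMLReader in PHP, so far without success.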

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <products>
        <last_updated>2009-11-30 13:52:40</last_updated>
        <product>
            <element_1>foo</element_1>
            <element_2>foo</element_2>
            <element_3>foo</element_3>
            <element_4>foo</element_4>
        </product>
        <product>
            <element_1>bar</element_1>
            <element_2>bar</element_2>
            <element_3>bar</element_3>
            <element_4>bar</element_4>
        </product>
    </products>

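Unfortunately I haven't found a good tutorial on this for PHP, and I would love to see how to get the content of each element so I can store it in a database.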

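It all depends on how big the unit of work is, but I guess you're trying to treat each <product/> node in succession.

For that, the simplest way is to use XMLReader to get to each node, then use SimpleXML to access its contents. This way you keep memory usage low, because you only treat one node at a time, and you still get SimpleXML's ease of use. For instance: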

    $z = new XMLReader;
    $z->open('data.xml');

    $doc = new DOMDocument;

    // move to the first <product /> node
    while ($z->read() && $z->name !== 'product');

    // now that we're at the right depth, hop to the next <product/> until the end of the tree
    while ($z->name === 'product')
    {
        // either one should work
        //$node = new SimpleXMLElement($z->readOuterXML());
        $node = simplexml_import_dom($doc->importNode($z->expand(), true));

        // now you can use $node without going insane about parsing
        var_dump($node->element_1);

        // go to next <product />
        $z->next('product');
    }

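A quick overview of the pros and cons of the different approaches:

XMLReader only

Pros: fast, uses little memory.
Cons: excessively hard to write and debug, and it requires a lot of userland code to do anything useful. Userland code is slow and prone to error. It also leaves you with more lines of code to maintain.

XMLReader + SimpleXML

Pros: doesn't use much memory (only what is needed to process one node), and SimpleXML is, as the name implies, really easy to use.
Cons: creating a SimpleXMLElement object for each node is not very fast. You really have to benchmark it to see whether it is a problem for you. Even a modest machine should be able to process a thousand nodes per second, though.

XMLReader + DOM

Pros: uses about as much memory as SimpleXML, and XMLReader::expand() is faster than creating a new SimpleXMLElement. I wish it were possible to use simplexml_import_dom(), but it doesn't seem to work in that case.
Cons: DOM is annoying to work with. It sits halfway between XMLReader and SimpleXML: not as complicated and awkward as XMLReader, but still a long way from SimpleXML.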

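My advice: write a prototype with SimpleXML and see if it works for you. If performance is paramount, try DOM. Stay as far away from XMLReader as possible. Remember that the more code you write, the higher the chance of introducing bugs or performance regressions.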
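The XMLReader + DOM variant has no code above, so here is a minimal sketch of what it could look like. It assumes the sample document from the question (inlined as a string so the snippet runs on its own; with a real file you would call `$reader->open('data.xml')` instead), and the `$values` array is only an illustrative place to collect results.

```php
<?php
// Sample document inlined so the sketch is self-contained.
$xml = <<<XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<products>
  <last_updated>2009-11-30 13:52:40</last_updated>
  <product><element_1>foo</element_1><element_2>foo</element_2></product>
  <product><element_1>bar</element_1><element_2>bar</element_2></product>
</products>
XML;

$reader = new XMLReader();
$reader->XML($xml);

$values = [];

// Advance to the first <product> element.
while ($reader->read() && $reader->name !== 'product');

while ($reader->name === 'product') {
    // expand() builds a DOM subtree for the current <product> only.
    $node = $reader->expand();

    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE && $child->nodeName === 'element_1') {
            $values[] = $child->textContent;
        }
    }

    // Skip the rest of this subtree and jump to the next <product> sibling.
    $reader->next('product');
}

$reader->close();

print_r($values);
```

Only one `<product>` subtree is held as DOM at a time, which is what keeps memory use close to the SimpleXML variant.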

For XML that stores its data in attributes...

data.xml:

    <building_data>
        <building address="some address" lat="28.902914" lng="-71.007235" />
        <building address="some address" lat="48.892342" lng="-75.0423423" />
        <building address="some address" lat="58.929753" lng="-79.1236987" />
    </building_data>

PHP code:

    $reader = new XMLReader();

    if (!$reader->open("data.xml")) {
        die("Failed to open 'data.xml'");
    }

    while ($reader->read()) {
        // only react to opening <building> tags
        if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'building') {
            $address   = $reader->getAttribute('address');
            $latitude  = $reader->getAttribute('lat');
            $longitude = $reader->getAttribute('lng');
        }
    } // this closing brace was missing in the original snippet

    $reader->close();
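If the attribute names are not known in advance, a variation on the snippet above is to walk all attributes of each element with `moveToNextAttribute()`. This is only a sketch: the XML is inlined so it runs on its own, and the `$buildings` array is an illustrative name, not part of the original answer.

```php
<?php
$xml = <<<XML
<building_data>
  <building address="some address" lat="28.902914" lng="-71.007235" />
  <building address="some address" lat="48.892342" lng="-75.0423423" />
</building_data>
XML;

$reader = new XMLReader();
$reader->XML($xml);

$buildings = [];

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'building') {
        $attributes = [];

        // Visit every attribute of the current element, whatever its name.
        while ($reader->moveToNextAttribute()) {
            $attributes[$reader->name] = $reader->value;
        }

        // Put the cursor back on the element that owns the attributes.
        $reader->moveToElement();

        $buildings[] = $attributes;
    }
}

$reader->close();

print_r($buildings);
```

Each entry of `$buildings` is then an associative array keyed by attribute name, for example `$buildings[0]['lat']`.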

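Most of my XML-parsing life is spent extracting nuggets of useful information out of truckloads of XML (Amazon MWS). My answer therefore assumes that you only want specific information and that you know where it is located.

I find the easiest way to use XMLReader is to know which tags I want information from and to key on them. If you know the structure of the XML and it has lots of unique tags, the first case below is the easy one. Cases 2 and 3 just show how it can be done for more complex tags. This is extremely fast; I have a discussion of speed over at "What is the fastest XML parser in PHP?"

The most important thing to remember when doing tag-based parsing like this is to use if ($myXML->nodeType == XMLReader::ELEMENT) {... This check makes sure we only deal with opening nodes, and not whitespace, closing nodes or anything else.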
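To see why the node type check matters, the following sketch counts how many nodes `read()` visits that are not opening tags. The tiny document and the variable names are made up for illustration.

```php
<?php
// A small made-up document with indentation between the tags.
$xml = "<order>\n  <Tag1>42</Tag1>\n  <Tag2/>\n</order>";

$reader = new XMLReader();
$reader->XML($xml);

$elementNames = [];
$otherNodes   = 0;

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT) {
        // Opening (or self-closing) tags only.
        $elementNames[] = $reader->name;
    } else {
        // Text, whitespace between tags, closing tags, and so on.
        $otherNodes++;
    }
}

$reader->close();

print_r($elementNames);
echo "Nodes skipped by the check: $otherNodes\n";
```

Without the check, a `switch` on `$reader->name` would also fire on closing tags such as `</Tag1>`, because they report the same name as the opening tag.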

    function parseMyXML($xml) { // pass in an XML string
        $myXML = new XMLReader();
        $myXML->xml($xml);
        $variable = $variable2 = $variable3 = $variable4 = null;

        while ($myXML->read()) { // start reading
            if ($myXML->nodeType == XMLReader::ELEMENT) { // only opening tags
                $tag = $myXML->name; // make $tag contain the name of the tag
                switch ($tag) {
                    case 'Tag1':
                        // this tag contains no child elements, only the content we need. And it's unique.
                        $variable = $myXML->readInnerXML(); // now $variable contains the contents of Tag1
                        break;

                    case 'Tag2':
                        // this tag contains child elements, of which we only want one.
                        while ($myXML->read()) { // so we tell it to keep reading
                            if ($myXML->nodeType == XMLReader::ELEMENT && $myXML->name === 'Amount') {
                                // when it finds the Amount tag, put its contents in $variable2
                                $variable2 = $myXML->readInnerXML();
                                break; // leaves the inner while loop
                            }
                        }
                        break;

                    case 'Tag3':
                        // Tag3 also has children, which are not unique, and we need two of them this time.
                        // The original broke out after the first match and so never read both;
                        // here we keep reading until </Tag3>.
                        while ($myXML->read()) {
                            if ($myXML->nodeType == XMLReader::ELEMENT && $myXML->name === 'Amount') {
                                $variable3 = $myXML->readInnerXML();
                            } elseif ($myXML->nodeType == XMLReader::ELEMENT && $myXML->name === 'Currency') {
                                $variable4 = $myXML->readInnerXML();
                            } elseif ($myXML->nodeType == XMLReader::END_ELEMENT && $myXML->name === 'Tag3') {
                                break; // leaves the inner while loop
                            }
                        }
                        break;
                }
            }
        }
        $myXML->close();

        // return the collected values so the caller can use them
        return [$variable, $variable2, $variable3, $variable4];
    }

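XMLReader is well documented on the PHP site. It is an XML pull parser, which means it is used to iterate through the nodes (or DOM nodes) of a given XML document. For example, you could go through the entire document like this: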

    <?php
    $reader = new XMLReader();

    if (!$reader->open("data.xml")) {
        die("Failed to open 'data.xml'");
    }

    while ($reader->read()) {
        $node = $reader->expand();
        // process $node...
    }

    $reader->close();
    ?>

It is then up to you to decide how to deal with the node returned by XMLReader::expand().

Simple example:

    public function productsAction()
    {
        $saveFileName = 'ceneo.xml';
        $filename = $this->path . $saveFileName;

        if (file_exists($filename)) {
            $reader = new XMLReader();
            $reader->open($filename);

            $countElements = 0;

            while ($reader->read()) {
                // remember the name of the most recently opened element
                if ($reader->nodeType == XMLReader::ELEMENT) {
                    $nodeName = $reader->name;
                }

                // text nodes carry the value of the element remembered above
                if ($reader->nodeType == XMLReader::TEXT && !empty($nodeName)) {
                    switch ($nodeName) {
                        case 'id':
                            var_dump($reader->value);
                            break;
                    }
                }

                // count one offer each time a closing </offer> tag is reached
                if ($reader->nodeType == XMLReader::END_ELEMENT && $reader->name == 'offer') {
                    $countElements++;
                }
            }

            $reader->close();

            echo '<pre>';
            var_dump($countElements);
            exit;
        }
    }

The accepted answer gave me a good start, but it brought in more classes and more processing than I wanted, so this is my interpretation:

    $xml_reader = new XMLReader;
    $xml_reader->open($feed_url);

    // move the pointer to the first product
    while ($xml_reader->read() && $xml_reader->name != 'product');

    // loop through the products
    while ($xml_reader->name == 'product')
    {
        // load the current xml element into simplexml and we're off and running!
        $xml = simplexml_load_string($xml_reader->readOuterXML());

        // now you can use your simpleXML object ($xml).
        echo $xml->element_1;

        // move the pointer to the next product
        $xml_reader->next('product');
    }

    // don't forget to close the file
    $xml_reader->close();
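The same loop can be wrapped in a generator, so that calling code simply iterates over SimpleXML objects. This is a sketch under assumptions: `iterateElements()` is a hypothetical helper name, not a built-in, and the document is inlined as a string so the example is self-contained.

```php
<?php
/**
 * Hypothetical helper: yields one SimpleXMLElement per sibling element
 * named $name, holding only one of them in memory at a time.
 */
function iterateElements(XMLReader $reader, string $name): Generator
{
    // Move the pointer to the first matching element.
    while ($reader->read() && $reader->name !== $name);

    while ($reader->name === $name) {
        yield simplexml_load_string($reader->readOuterXml());

        // Jump to the next sibling with the same name.
        $reader->next($name);
    }
}

$xml = '<products><last_updated>2009-11-30 13:52:40</last_updated>'
     . '<product><element_1>foo</element_1></product>'
     . '<product><element_1>bar</element_1></product></products>';

$reader = new XMLReader();
$reader->XML($xml);

$names = [];
foreach (iterateElements($reader, 'product') as $product) {
    $names[] = (string) $product->element_1;
}

$reader->close();

print_r($names);
```

With a file or feed, the only change would be to call `$reader->open(...)` instead of `$reader->XML(...)` before passing the reader in.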

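This topic went quiet a long time ago, but I've only just found it. Thank God.

My problem is that I have to read ONIX files (book data) and store them in our database. I used simplexml_load before, and although it used a lot of memory it was still fine for relatively small files (up to 300MB). Beyond that size it was a disaster for me.

After reading this thread, especially the explanation by Francis Lewis, I now use a combination of XMLReader and SimpleXML. The result is exceptional: memory usage is small, and inserting into the database is fast enough for me.

Here is my code: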

    <?php
    $dbhost = "localhost"; // MySQL host
    $dbuser = "";          // MySQL username
    $dbpw   = "";          // MySQL user password
    $db     = "";          // MySQL database name

    // The original used the mysql_* functions, which were removed in PHP 7;
    // mysqli is used here instead, with a single connection for the whole run.
    $conn = new mysqli($dbhost, $dbuser, $dbpw, $db);
    if ($conn->connect_error) {
        die("Failed to connect: " . $conn->connect_error);
    }

    // I need to truncate the old data first
    $conn->query("TRUNCATE ebiblio");

    // A prepared statement takes care of quoting, so the htmlentities() calls of the
    // original (which passed a MySQL collation name as the charset) are dropped.
    $stmt = $conn->prepare("INSERT INTO ebiblio VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)");

    $reader = new XMLReader();
    if (!$reader->open("ebiblio.xml")) {
        die("Failed to open 'ebiblio.xml'");
    }

    // Move to the first <product>. The original called next('product') inside a
    // read() loop, which skipped every other product.
    while ($reader->read() && $reader->name !== 'product');

    while ($reader->name === 'product') {
        $xml = simplexml_load_string($reader->readOuterXML());

        $productcode = (string) $xml->a001;
        $title       = (string) $xml->title->b203;
        $author      = (string) $xml->contributor->b037;
        $language    = (string) $xml->language->b252;
        $category    = (string) $xml->subject->b069;
        $description = (string) $xml->othertext->d104;
        $publisher   = (string) $xml->publisher->b081;
        $pricecover  = (string) $xml->supplydetail->price->j151;
        $salesright  = (string) $xml->salesrights->b090;

        $stmt->bind_param('sssssssss', $productcode, $title, $author, $language,
            $category, $description, $publisher, $pricecover, $salesright);
        $stmt->execute();

        // move to the next <product>
        $reader->next('product');
    }

    $reader->close();
    $stmt->close();
    $conn->close();
    ?>

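I'm afraid that using XmlReader::expand() may consume quite a lot of memory when the subtrees are not small, and I'm not sure it is a good alternative to XmlReader. But I agree that XmlReader is really weak and not well suited to processing complex nested XML trees. There are two things I really dislike: first, the current node does not expose its path in the XML tree as a property; second, you cannot run XPath-like processing while reading nodes. Of course, real XPath queries would be very time-consuming on large XML, but one could use "path hooks" instead: a PHP function or method that fires when the current element's path matches a (root) subtree. So a few years ago I developed my own classes on top of XmlReader. They are not perfect, and maybe I would write them better today, but they may still be useful to someone: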

https://bitbucket.org/sdvpartnership/questpc-framework/src/c481a8b051dbba0a6644ab8a77a71e58119e7441/includes/Xml/Reader/?at=master

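I build the XML path "node1/node2" myself and then use hooks with PCRE matching. These are not as powerful as XPath, but they are enough for my needs. I have processed quite complex, large XML with these classes.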
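The path-hook idea can be illustrated in a few lines. This is not the code from the linked repository, only a minimal sketch of the technique: `walkWithPathHook()` is a hypothetical function that keeps a stack of open element names, joins it into a "node1/node2" path, and calls a callback when the path matches a PCRE pattern. It assumes the callback only inspects the current node and does not advance the reader.

```php
<?php
/**
 * Hypothetical sketch: calls $hook($path, $reader) for every element whose
 * slash-separated path matches the PCRE $pattern.
 */
function walkWithPathHook(XMLReader $reader, string $pattern, callable $hook): void
{
    $stack = []; // names of the currently open elements, outermost first

    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            $stack[] = $reader->name;
            $path = implode('/', $stack);

            if (preg_match($pattern, $path) === 1) {
                $hook($path, $reader);
            }

            // Self-closing elements produce no END_ELEMENT, so pop right away.
            if ($reader->isEmptyElement) {
                array_pop($stack);
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT) {
            array_pop($stack);
        }
    }
}

$xml = '<products><product><element_1>foo</element_1><empty/></product>'
     . '<product><element_1>bar</element_1></product></products>';

$reader = new XMLReader();
$reader->XML($xml);

$found = [];
walkWithPathHook(
    $reader,
    '#^products/product/element_1$#',
    function (string $path, XMLReader $r) use (&$found) {
        // readString() returns the text content without moving the cursor.
        $found[] = $r->readString();
    }
);

$reader->close();

print_r($found);
```

Because only the stack of names is kept, memory use stays flat regardless of document size, at the cost of much weaker matching than real XPath.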
