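How to parse XML using the SAX parser

I'm following this tutorial.

It works great, but I would like it to return an array with all the strings instead of a single string with the last element.

Any ideas how to do this?

So you want to build an XML parser to parse an RSS feed like this one.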

    <rss version="0.92">
      <channel>
        <title>MyTitle</title>
        <link>http://myurl.com</link>
        <description>MyDescription</description>
        <lastBuildDate>SomeDate</lastBuildDate>
        <docs>http://someurl.com</docs>
        <language>SomeLanguage</language>
        <item>
          <title>TitleOne</title>
          <description><![CDATA[Some text.]]></description>
          <link>http://linktoarticle.com</link>
        </item>
        <item>
          <title>TitleTwo</title>
          <description><![CDATA[Some other text.]]></description>
          <link>http://linktoanotherarticle.com</link>
        </item>
      </channel>
    </rss>

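Now there are two SAX implementations you can work with: org.xml.sax or android.sax. I'm going to explain the pros and cons of both after posting a short example.

android.sax implementation

Let's start with the android.sax implementation.

You first have to define the XML structure using the RootElement and Element objects.

In any case I would work with POJOs (Plain Old Java Objects) to hold your data. Here are the POJOs needed.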

Channel.java

    import java.io.Serializable;

    public class Channel implements Serializable {

        private Items items;
        private String title;
        private String link;
        private String description;
        private String lastBuildDate;
        private String docs;
        private String language;

        public Channel() {
            setItems(null);
            setTitle(null);
            // set every field to null in the constructor
        }

        public void setItems(Items items) {
            this.items = items;
        }

        public Items getItems() {
            return items;
        }

        public void setTitle(String title) {
            this.title = title;
        }

        public String getTitle() {
            return title;
        }

        // rest of the class looks similar so just setters and getters
    }

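This class implements the Serializable interface so you can put it into a Bundle and do something with it.

Now we need a class to hold our items. In this case I'm just going to extend the ArrayList class.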

Items.java

    import java.util.ArrayList;

    public class Items extends ArrayList<Item> {

        public Items() {
            super();
        }
    }

That's it for our items container. We now need a class to hold the data of every single item.

Item.java

    import java.io.Serializable;

    public class Item implements Serializable {

        private String title;
        private String description;
        private String link;

        public Item() {
            setTitle(null);
            setDescription(null);
            setLink(null);
        }

        public void setTitle(String title) {
            this.title = title;
        }

        public String getTitle() {
            return title;
        }

        // same as above.
    }

Example:

    import java.io.IOException;
    import java.io.InputStream;

    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    import android.sax.Element;
    import android.sax.EndElementListener;
    import android.sax.EndTextElementListener;
    import android.sax.RootElement;
    import android.sax.StartElementListener;
    import android.util.Xml;

    public class Example extends DefaultHandler {

        private Channel channel;
        private Items items;
        private Item item;

        public Example() {
            items = new Items();
        }

        public Channel parse(InputStream is) {
            // Declare the expected XML structure.
            RootElement root = new RootElement("rss");
            Element chanElement = root.getChild("channel");
            Element chanTitle = chanElement.getChild("title");
            Element chanLink = chanElement.getChild("link");
            Element chanDescription = chanElement.getChild("description");
            Element chanLastBuildDate = chanElement.getChild("lastBuildDate");
            Element chanDocs = chanElement.getChild("docs");
            Element chanLanguage = chanElement.getChild("language");

            Element chanItem = chanElement.getChild("item");
            Element itemTitle = chanItem.getChild("title");
            Element itemDescription = chanItem.getChild("description");
            Element itemLink = chanItem.getChild("link");

            chanElement.setStartElementListener(new StartElementListener() {
                public void start(Attributes attributes) {
                    channel = new Channel();
                }
            });

            // On </channel>, attach the collected items to the channel;
            // without this, getItems() on the returned Channel stays null.
            chanElement.setEndElementListener(new EndElementListener() {
                public void end() {
                    channel.setItems(items);
                }
            });

            // Listen for the end of a text element and set the text as our
            // channel's title.
            chanTitle.setEndTextElementListener(new EndTextElementListener() {
                public void end(String body) {
                    channel.setTitle(body);
                }
            });

            // Same thing happens for the other elements of channel ex.

            // On every <item> tag occurrence we create a new Item object.
            chanItem.setStartElementListener(new StartElementListener() {
                public void start(Attributes attributes) {
                    item = new Item();
                }
            });

            // On every </item> tag occurrence we add the current Item object
            // to the Items container.
            chanItem.setEndElementListener(new EndElementListener() {
                public void end() {
                    items.add(item);
                }
            });

            itemTitle.setEndTextElementListener(new EndTextElementListener() {
                public void end(String body) {
                    item.setTitle(body);
                }
            });

            // and so on

            // here we actually parse the InputStream and return the resulting
            // Channel object.
            try {
                Xml.parse(is, Xml.Encoding.UTF_8, root.getContentHandler());
                return channel;
            } catch (SAXException e) {
                // handle the exception
            } catch (IOException e) {
                // handle the exception
            }

            return null;
        }
    }

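Now that was a very quick example, as you can see. The major advantage of the android.sax SAX implementation is that you can define the structure of the XML you have to parse and then just add an event listener to the appropriate elements. The disadvantage is that the code gets quite repetitive and bloated.

org.xml.sax implementation

The org.xml.sax SAX handler implementation is a bit different.

Here you don't specify or declare your XML structure, you just listen for events. The most widely used ones are the following events:

Document Start
Document End
Element Start
Element End
Characters between Element Start and Element End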
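To make the order of these events concrete, here is a minimal sketch that simply records each callback as it fires. The class name EventLogger and its log helper are made up for this illustration; only the DefaultHandler callbacks and the JAXP factory calls are standard API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records every SAX callback in the order it fires.
class EventLogger extends DefaultHandler {

    private final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Whitespace-only chunks are skipped to keep the log readable.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("characters:" + text);
        }
    }

    List<String> getEvents() {
        return events;
    }

    // Parses the given XML string and returns the recorded events.
    static List<String> log(String xml) throws Exception {
        EventLogger handler = new EventLogger();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getEvents();
    }

    public static void main(String[] args) throws Exception {
        log("<item><title>TitleOne</title></item>").forEach(System.out::println);
    }
}
```

Running it on a small fragment shows the start and end events nesting exactly like the tags do, with the character data arriving between them.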

An example handler implementation using the Channel object from above looks like this.

Example

    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    public class ExampleHandler extends DefaultHandler {

        private Channel channel;
        private Items items;
        private Item item;
        private boolean inItem = false;

        private StringBuilder content;

        public ExampleHandler() {
            items = new Items();
            content = new StringBuilder();
        }

        public void startElement(String uri, String localName, String qName,
                Attributes atts) throws SAXException {
            // Start a fresh buffer for the text of this element.
            content = new StringBuilder();
            if (localName.equalsIgnoreCase("channel")) {
                channel = new Channel();
            } else if (localName.equalsIgnoreCase("item")) {
                inItem = true;
                item = new Item();
            }
        }

        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            if (localName.equalsIgnoreCase("title")) {
                if (inItem) {
                    item.setTitle(content.toString());
                } else {
                    channel.setTitle(content.toString());
                }
            } else if (localName.equalsIgnoreCase("link")) {
                if (inItem) {
                    item.setLink(content.toString());
                } else {
                    channel.setLink(content.toString());
                }
            } else if (localName.equalsIgnoreCase("description")) {
                if (inItem) {
                    item.setDescription(content.toString());
                } else {
                    channel.setDescription(content.toString());
                }
            } else if (localName.equalsIgnoreCase("lastBuildDate")) {
                channel.setLastBuildDate(content.toString());
            } else if (localName.equalsIgnoreCase("docs")) {
                channel.setDocs(content.toString());
            } else if (localName.equalsIgnoreCase("language")) {
                channel.setLanguage(content.toString());
            } else if (localName.equalsIgnoreCase("item")) {
                inItem = false;
                items.add(item);
            } else if (localName.equalsIgnoreCase("channel")) {
                channel.setItems(items);
            }
        }

        public void characters(char[] ch, int start, int length)
                throws SAXException {
            // May be called several times for one element, so append.
            content.append(ch, start, length);
        }

        public void endDocument() throws SAXException {
            // you can do something here for example send
            // the Channel object somewhere or whatever.
        }
    }

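Now to be honest, I can't really tell you any real advantage of this handler implementation over the android.sax one. I can, however, tell you the disadvantage, which should be pretty obvious by now. Take a look at the else if statements in the startElement and endElement methods. Because the tags <title>, <link> and <description> appear both in the channel and in every item, we have to track where in the XML structure we currently are. That is, if we encounter an <item> start tag we set the inItem flag to true to ensure that we map the correct data to the correct object, and in the endElement method we set that flag to false when we encounter an </item> tag, to signal that we are done with that item tag.

In this example it is pretty easy to manage, but parsing a more complex structure with repeating tags at different levels becomes tricky. There you would either have to use Enums to hold your current state and a lot of switch/case statements to check where you are, or, as a more elegant solution, some kind of tag tracker built on a tag stack.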
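The tag-stack idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the class ItemTitleCollector and its parseTitles helper are invented for the example, and it uses qName so that it also works with a parser that is not namespace-aware. It collects every <title> whose parent is <item> into a list, which is also one way to get all the strings back rather than only the last one.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: keeps the currently open tags on a stack and
// collects the text of every <title> that sits directly inside an <item>.
class ItemTitleCollector extends DefaultHandler {

    private final Deque<String> openTags = new ArrayDeque<>();
    private final List<String> itemTitles = new ArrayList<>();
    private final StringBuilder content = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        openTags.push(qName);
        content.setLength(0); // start a fresh text buffer for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        content.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        openTags.pop();
        // After the pop, the top of the stack is the parent of the closed tag.
        if (qName.equals("title") && "item".equals(openTags.peek())) {
            itemTitles.add(content.toString().trim());
        }
        content.setLength(0);
    }

    List<String> getItemTitles() {
        return itemTitles;
    }

    // Parses the given XML string and returns all item titles in document order.
    static List<String> parseTitles(String xml) throws Exception {
        ItemTitleCollector handler = new ItemTitleCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getItemTitles();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss><channel><title>MyTitle</title>"
                + "<item><title>TitleOne</title></item>"
                + "<item><title>TitleTwo</title></item>"
                + "</channel></rss>";
        System.out.println(parseTitles(xml)); // [TitleOne, TitleTwo]
    }
}
```

The channel title is skipped because its parent on the stack is channel, not item, so no boolean flag per nesting level is needed.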

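In many tasks you need to use different kinds of XML files for different purposes. I will not try to cover everything, and will only describe, from my own experience, what I needed myself.

Java is perhaps my favorite programming language. That love is reinforced by the fact that you can solve almost any problem with it without having to reinvent the wheel.

So, I spent quite some time building a client-server pair that works with a database and lets the client remotely create entries in the server's database. Validating the input data and so on was part of it, but that is not the point here.

As the working principle I chose, without hesitation, to transfer the information as an XML file of the following kind: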

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <doc>
        <id>3</id>
        <fam>Ivanov</fam>
        <name>Ivan</name>
        <otc>I.</otc>
        <dateb>10-03-2005</dateb>
        <datep>10-03-2005</datep>
        <datev>10-03-2005</datev>
        <datebegin>09-06-2009</datebegin>
        <dateend>10-03-2005</dateend>
        <vdolid>1</vdolid>
        <specid>1</specid>
        <klavid>1</klavid>
        <stav>2.0</stav>
        <progid>1</progid>
    </doc>

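It should be easy enough to read; I will only add that this is information about the doctors of an institution: surname, first name, unique ID and so on. In short, a series of data fields. This file arrives safely on the server side, and then the parsing of the file begins.

Of the two parsing options (SAX vs DOM) I chose SAX, because it works faster and because it was the first one that fell into my hands :)

So. As you know, to use the parser we need to override the required methods of DefaultHandler. First, import the required packages.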

    import org.xml.sax.helpers.DefaultHandler;
    import org.xml.sax.*;

Now we can start writing our parser.

    public class SAXPars extends DefaultHandler {
        ...
    }

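Let's start with the startDocument() method. As the name suggests, it reacts to the start-of-document event. Here you can hook in various actions such as allocating memory or resetting values, but our example is simple, so we just mark the start of the work with an appropriate message: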

    @Override
    public void startDocument() throws SAXException {
        System.out.println("Start parse XML ...");
    }

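Next. As the parser walks through the document it meets the elements of its structure, and the startElement() method fires. Its full signature is: startElement(String namespaceURI, String localName, String qName, Attributes atts). Here namespaceURI is the namespace, localName is the local name of the element, qName is the local name combined with the namespace prefix (separated by a colon), and atts holds the attributes of this element. In our case everything is simple: it is enough to take qName and store it in a helper string, thisElement. That way we mark which element we are currently in.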

    @Override
    public void startElement(String namespaceURI, String localName, String qName, Attributes atts)
            throws SAXException {
        thisElement = qName;
    }
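Which of the name arguments is actually filled in depends on whether the parser is namespace-aware. The sketch below is a small experiment rather than part of the parser above: the class NameDemo, its describe method and the urn:example namespace are all invented for the illustration. It prints what startElement receives for a prefixed element in both modes. With namespace awareness switched off, localName is typically an empty string on the standard JDK parser, which is one reason to rely on qName there.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: shows the arguments startElement receives.
class NameDemo {

    // Parses the XML and returns a description of the first element's names.
    static String describe(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        StringBuilder out = new StringBuilder();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName, String qName,
                            Attributes atts) {
                        if (out.length() == 0) { // only describe the root element
                            out.append("uri='").append(uri)
                               .append("' localName='").append(localName)
                               .append("' qName='").append(qName).append("'");
                        }
                    }
                });
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<m:doc xmlns:m='urn:example'/>";
        System.out.println("aware:     " + describe(xml, true));
        System.out.println("not aware: " + describe(xml, false));
    }
}
```

In namespace-aware mode the call reports the namespace URI, the local name doc and the qualified name m:doc.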

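Next, having met an element, we get to its value. This is where the characters() method comes in. It has the following form: characters(char[] ch, int start, int length). Everything is fairly clear here: ch is the character buffer that contains the text of the element, and start and length are helper values that give the starting position in that buffer and the number of characters to read.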

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Text of the element we are currently inside (see thisElement).
        String value = new String(ch, start, length);
        switch (thisElement) {
            case "id":        doc.setId(Integer.valueOf(value)); break;
            case "fam":       doc.setFam(value); break;
            case "name":      doc.setName(value); break;
            case "otc":       doc.setOtc(value); break;
            case "dateb":     doc.setDateb(value); break;
            case "datep":     doc.setDatep(value); break;
            case "datev":     doc.setDatev(value); break;
            case "datebegin": doc.setDatebegin(value); break;
            case "dateend":   doc.setDateend(value); break;
            case "vdolid":    doc.setVdolid(Integer.valueOf(value)); break;
            case "specid":    doc.setSpecid(Integer.valueOf(value)); break;
            case "klavid":    doc.setKlavid(Integer.valueOf(value)); break;
            case "stav":      doc.setStav(Float.valueOf(value)); break;
            case "progid":    doc.setProgid(Integer.valueOf(value)); break;
            default:          break; // text outside a known element is ignored
        }
    }
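One caveat about characters(): the SAX contract allows a parser to deliver the text of a single element in several chunks, so assigning the value directly inside characters() can, in principle, store only the last chunk. A common way around this is to buffer the text and assign it in endElement(). The following is a minimal sketch of that pattern, not the parser from this post: the class BufferedFieldHandler and its map of fields are assumptions made for the illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: buffers element text and stores it when the element ends.
class BufferedFieldHandler extends DefaultHandler {

    private final Map<String, String> fields = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // new element, new buffer
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            fields.put(qName, value); // the complete text is known only here
        }
        text.setLength(0);
    }

    Map<String, String> getFields() {
        return fields;
    }

    // Parses the given XML string and returns element name -> text.
    static Map<String, String> parse(String xml) throws Exception {
        BufferedFieldHandler handler = new BufferedFieldHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getFields();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> fields = parse("<doc><id>3</id><fam>Ivanov</fam></doc>");
        int id = Integer.parseInt(fields.get("id")); // convert once the text is complete
        System.out.println(id + " " + fields.get("fam"));
    }
}
```

Converting to Integer or Float after the element has closed also means a number split across two chunks is never parsed half-finished.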

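Oh yes, I almost forgot. The object that the parsed data is put into is of type Doctors. This class is already defined and has all the necessary setters and getters.

Next, obviously, the element ends and the next one follows. The endElement() method handles the end of an element. It signals that the element is finished and that you can do something at this point. We will do just that: clear thisElement.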

    @Override
    public void endElement(String namespaceURI, String localName, String qName)
            throws SAXException {
        thisElement = "";
    }

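Having gone through the whole document this way, we reach the end of the file and endDocument() fires. Here we can free memory, print some diagnostics and so on. In our case we simply report that parsing has finished.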

    @Override
    public void endDocument() {
        System.out.println("Stop parse XML ...");
    }

So we now have a class for parsing XML in our format. Here is the full text:

    import org.xml.sax.helpers.DefaultHandler;
    import org.xml.sax.*;

    public class SAXPars extends DefaultHandler {

        Doctors doc = new Doctors();
        String thisElement = "";

        public Doctors getResult() {
            return doc;
        }

        @Override
        public void startDocument() throws SAXException {
            System.out.println("Start parse XML ...");
        }

        @Override
        public void startElement(String namespaceURI, String localName, String qName, Attributes atts)
                throws SAXException {
            thisElement = qName;
        }

        @Override
        public void endElement(String namespaceURI, String localName, String qName)
                throws SAXException {
            thisElement = "";
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            // Text of the element we are currently inside (see thisElement).
            String value = new String(ch, start, length);
            switch (thisElement) {
                case "id":        doc.setId(Integer.valueOf(value)); break;
                case "fam":       doc.setFam(value); break;
                case "name":      doc.setName(value); break;
                case "otc":       doc.setOtc(value); break;
                case "dateb":     doc.setDateb(value); break;
                case "datep":     doc.setDatep(value); break;
                case "datev":     doc.setDatev(value); break;
                case "datebegin": doc.setDatebegin(value); break;
                case "dateend":   doc.setDateend(value); break;
                case "vdolid":    doc.setVdolid(Integer.valueOf(value)); break;
                case "specid":    doc.setSpecid(Integer.valueOf(value)); break;
                case "klavid":    doc.setKlavid(Integer.valueOf(value)); break;
                case "stav":      doc.setStav(Float.valueOf(value)); break;
                case "progid":    doc.setProgid(Integer.valueOf(value)); break;
                default:          break; // text outside a known element is ignored
            }
        }

        @Override
        public void endDocument() {
            System.out.println("Stop parse XML ...");
        }
    }

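I hope this post helps to show the essence of the SAX parser in a simple way.

Please don't judge my first article too harshly :) I hope it was at least somewhat useful.

UPD: To run this parser, you can use the following code: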

    SAXParserFactory factory = SAXParserFactory.newInstance();
    SAXParser parser = factory.newSAXParser();
    SAXPars saxp = new SAXPars();
    parser.parse(new File("..."), saxp);
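Since the XML in this scenario arrives from a client rather than from disk, it may be handy to parse from an InputStream instead of a File. The sketch below shows that variant together with the checked exceptions the calls can throw. It is self-contained, so instead of SAXPars it uses a throwaway handler that only counts elements; the class RunParserDemo and its countElements method are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: runs a SAX parser over an in-memory stream.
class RunParserDemo {

    // Returns the number of elements in the XML read from the stream.
    static int countElements(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        int[] count = {0}; // array so the anonymous handler can modify it
        parser.parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes atts) {
                count[0]++;
            }
        });
        return count[0];
    }

    public static void main(String[] args) {
        byte[] xml = "<doc><id>3</id><fam>Ivanov</fam></doc>"
                .getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(xml)) {
            System.out.println("elements: " + countElements(in));
        } catch (SAXException e) {
            // thrown for malformed XML, for example an unclosed tag
            System.err.println("Bad XML: " + e.getMessage());
        } catch (ParserConfigurationException | IOException e) {
            System.err.println("Could not parse: " + e.getMessage());
        }
    }
}
```

The same parse(InputStream, DefaultHandler) call should accept a SAXPars instance in place of the counting handler.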
