我们可以用BeautifulSoup来使用xpath吗？

我正在使用BeautifulSoup刮一个url，我有以下代码

import urllib import urllib2 from BeautifulSoup import BeautifulSoup url = "http://www.example.com/servlet/av/ResultTemplate=AVResult.html" req = urllib2.Request(url) response = urllib2.urlopen(req) the_page = response.read() soup = BeautifulSoup(the_page) soup.findAll('td',attrs={'class':'empformbody'})

现在在上面的代码中，我们可以使用findAll来获取与它们相关的标签和信息，但是我想使用xpath，如果可能的话，可以使用xpath和BeautifulSoup，任何人都可以给我一个示例代码，以便它更有帮助。

不，BeautifulSoup本身不支持XPathexpression式。

另一个库lxml支持XPath 1.0。它有一个BeautifulSoup兼容模式，它会尝试和汤parsing方式破解HTML。但是，默认的lxml HTMLparsing器在分析破碎的HTML时也是一样的，而且我相信速度更快。

一旦将文档parsing到lxml树中，就可以使用.xpath()方法来search元素。

 import urllib2 from lxml import etree url = "http://www.example.com/servlet/av/ResultTemplate=AVResult.html" response = urllib2.urlopen(url) htmlparser = etree.HTMLParser() tree = etree.parse(response, htmlparser) tree.xpath(xpathselector)

您可能感兴趣的是CSSselect器支持 ; CSSSelector类将CSS语句转换为XPathexpression式，使您更容易searchtd.empformbody ：

 from lxml.cssselect import CSSSelector td_empformbody = CSSSelector('td.empformbody') for elem in td_empformbody(tree): # Do something with these table cells.

完整的圈子：BeautifulSoup本身也有相当不错的CSSselect器支持：

 for cell in soup.select('table#foobar td.empformbody'): # Do something with these table cells.

我可以确认美丽的汤内没有XPath的支持。

Martijn的代码不能正常工作（现在已经有4年多了）， etree.parse()行会打印到控制台，并且不会将值分配给treevariables。引用这个，我能够弄清楚这个使用请求和lxml的作品：

 from lxml import html import requests page = requests.get('http://econpy.pythonanywhere.com/ex/001.html') tree = html.fromstring(page.content) #This will create a list of buyers: buyers = tree.xpath('//div[@title="buyer-name"]/text()') #This will create a list of prices prices = tree.xpath('//span[@class="item-price"]/text()') print 'Buyers: ', buyers print 'Prices: ', prices

BeautifulSoup从当前的元素定向childern有一个名为findNext的函数，所以：

 father.findNext('div',{'class':'class_value'}).findNext('div',{'id':'id_value'}).findAll('a')

以上代码可以模仿下面的xpath：

 div[class=class_value]/div[id=id_value]

我搜遍了他们的文档，似乎没有xpath选项。另外，正如你在这里看到的关于SO的类似问题，OP要求从xpath到BeautifulSoup的翻译，所以我的结论是 – 不，没有可用的xpathparsing。

我们可以用BeautifulSoup来使用xpath吗？

使用BeautifulSoup来查找包含特定文本的HTML标签

UnicodeEncodeError：'charmap'编解码器不能编码字符

beautifulsoup findAll find_all

如何按类查找元素

bs4.FeatureNotFound：找不到具有您请求的function的树生成器：lxml。你需要安装一个parsing器库吗？

为什么使用BeautifulSoup和IDLE获得recursion错误？

我可以使用BeautifulSoup删除脚本标记吗？

如何使用美丽的汤find节点的孩子

用pip安装美丽的汤

我们可以用BeautifulSoup来使用xpath吗？

使用BeautifulSoup来查找包含特定文本的HTML标签

UnicodeEncodeError：'charmap'编解码器不能编码字符

beautifulsoup findAll find_all

如何按类查找元素

bs4.FeatureNotFound：找不到具有您请求的function的树生成器：lxml。 你需要安装一个parsing器库吗？

为什么使用BeautifulSoup和IDLE获得recursion错误？

我可以使用BeautifulSoup删除脚本标记吗？

如何使用美丽的汤find节点的孩子

用pip安装美丽的汤

bs4.FeatureNotFound：找不到具有您请求的function的树生成器：lxml。你需要安装一个parsing器库吗？