Parsing HTML with Python

User

Reply #1 · edited 2024-04-25 16:38:59

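Here you can read more about the various HTML parsers available in Python and how they perform. Although the article is a bit dated, it still gives a good overview.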
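
For comparison, the standard library's html.parser module can do the same job without installing anything, at the cost of more code. This is only a minimal sketch under stated assumptions: the class name ContainerTextParser is hypothetical, it collects the text of the first <div> whose class list contains container (it does not check that the div sits inside <body>), and it assumes the <div> tags inside that element are properly closed.

```python
from html.parser import HTMLParser


class ContainerTextParser(HTMLParser):
    """Hypothetical helper: collect the text inside the first <div class="container">."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # <div> nesting depth inside the target div (0 = outside)
        self.done = False   # True once the first target div has been closed
        self.parts = []     # text fragments found inside the target div

    def handle_starttag(self, tag, attrs):
        if self.done or tag != 'div':
            return
        if self.depth > 0:
            self.depth += 1  # a nested <div> inside the target
        else:
            # class may be missing or valueless, hence the "or ''"
            classes = (dict(attrs).get('class') or '').split()
            if 'container' in classes:
                self.depth = 1  # entered the target div

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # stop after the first match

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


html = '<html><body><div class="container">Hello <b>world</b></div><div>other</div></body></html>'
parser = ContainerTextParser()
parser.feed(html)
print(''.join(parser.parts))  # Hello world
```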

Even though it isn't built in, I'd still recommend BeautifulSoup, simply because it is so easy to use. For example:

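from urllib.request import urlopen  # Python 3 replacement for urllib2
from bs4 import BeautifulSoup       # BeautifulSoup 4 (pip install beautifulsoup4)

page = urlopen('http://www.google.com/')
soup = BeautifulSoup(page, 'html.parser')

# find() returns None when no matching <div> exists, so check before using .text
div = soup.body.find('div', attrs={'class': 'container'})
x = div.text if div is not None else None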
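
If you prefer CSS selectors, BeautifulSoup 4 also offers select_one(), which returns the first matching element or None. This sketch assumes the bs4 package (and its soupsieve dependency) is installed, and it parses an inline, made-up HTML string instead of fetching a page so that it runs offline:

```python
from bs4 import BeautifulSoup

# Made-up sample markup standing in for a downloaded page
html = """
<html><body>
  <div class="header">Title</div>
  <div class="container main">Body text</div>
</body></html>
"""
soup = BeautifulSoup(html, 'html.parser')

# Roughly equivalent to soup.body.find('div', attrs={'class': 'container'})
div = soup.select_one('body div.container')
text = div.get_text(strip=True) if div is not None else None
print(text)  # Body text

# A selector that matches nothing returns None instead of raising
print(soup.select_one('body div.missing'))  # None
```

In bs4, class is treated as a multi-valued attribute, so both find(..., attrs={'class': 'container'}) and the .container selector match the element above even though it also carries the class main.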

User

Reply #2 · edited 2024-04-25 16:38:59

I think what you're looking for is pyquery:

pyquery: a jquery-like library for python.

An example of what you want might be:

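from pyquery import PyQuery

# Placeholder: replace this with your own HTML code
html = '<html><body><div id="id" class="class">Hello</div></body></html>'
pq = PyQuery(html)
tag = pq('div#id')  # or: tag = pq('div.class')
print(tag.text())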

It uses the same selectors as the Inspect Element tool in Firefox or Chrome. For example:

The selector of the inspected element is 'div#mw-head.noprint', so in pyquery you just pass that selector:

pq('div#mw-head.noprint')

User

Reply #3 · edited 2024-04-25 16:38:59

So that I can ask it to get me the content/text in the div tag with class='container' contained within the body tag, or something similar.

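try:
    from bs4 import BeautifulSoup  # BeautifulSoup 4 (current package)
except ImportError:
    from BeautifulSoup import BeautifulSoup  # legacy BeautifulSoup 3, Python 2 only

# Placeholder: replace this with the HTML code you've written above
html = '<html><body><div class="container">Hello</div></body></html>'
parsed_html = BeautifulSoup(html)
print(parsed_html.body.find('div', attrs={'class': 'container'}).text)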

I don't think you need performance comparisons; just look at how BeautifulSoup works. Check out its official documentation.
