How to select a specific table on a web page with Python

import urllib
from bs4 import BeautifulSoup

url = 'http://stock.finance.sina.com.cn/hkstock/finance/00759.html'
html = urllib.urlopen(url).read()  # .read() reads the whole response into a string
soup = BeautifulSoup(html, "lxml")
table = soup.find("table", { "class" : "tab05" })
for row in table.findAll("tr"):
    print row.findAll("td")

2 answers

User

#1 · Edited 2024-04-18 20:44:19

Thanks for your reply. I may have misunderstood what you meant. I rewrote the code as follows:

tables = soup.findAll("table", { "class" : "tab05" })

print len(tables)

for row in tables[0].findAll("tr"):
    for col in row.findAll("td"):
        print col.getText()

The result of len(tables) is 1, so only the first table can be accessed. I also found that if I use

[code snippet missing from the source]

I cannot get the full contents of that table. The last number this code returns is "-45.7852", which is only halfway through the table.

User

#2 · Edited 2024-04-18 20:44:19

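On that page, all four tables have the class name "tab05".

So you only need to change the .find call on the soup variable to .findAll, and then you can access all four tables.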

import urllib
from bs4 import BeautifulSoup

url = 'http://stock.finance.sina.com.cn/hkstock/finance/00759.html'
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html, "lxml")

tables = soup.findAll("table", { "class" : "tab05" })
print len(tables) #4

for table in tables:
    for row in table.findAll("tr"):
        for col in row.findAll("td"):
            print col.getText()
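
To pick out one specific table instead of printing all of them, the list returned by findAll can simply be indexed. The following is a minimal sketch in Python 3, assuming bs4 is installed. The inline SAMPLE_HTML and the helper name table_to_rows are made up for illustration and stand in for the real page, and the built-in "html.parser" is used instead of "lxml" so that no extra parser is required. find_all is the current bs4 spelling of findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: two tables sharing one class.
SAMPLE_HTML = """
<table class="tab05"><tr><td>A1</td><td>A2</td></tr></table>
<table class="tab05"><tr><td>B1</td><td>B2</td></tr><tr><td>B3</td><td>B4</td></tr></table>
"""


def table_to_rows(table):
    """Return a table's cell text as a list of rows (hypothetical helper)."""
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # skip rows that contain no <td> cells
            rows.append(cells)
    return rows


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns only the first match; find_all() returns every match.
first_table = soup.find("table", {"class": "tab05"})
tables = soup.find_all("table", {"class": "tab05"})

# Index into the list to select one specific table (here, the second one).
second_rows = table_to_rows(tables[1])

print(len(tables))
print(table_to_rows(first_table))
print(second_rows)
```

Collecting each row into a list, rather than printing cell by cell, keeps the table structure available for later steps such as writing a CSV file.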

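As for the Simplified Chinese encoding, print col.getText() will show the correct characters in the terminal. If you want to write them to a file, you have to encode the strings as gb2312.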

[code snippet missing from the source]
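
Writing the cell text to a file with an explicit encoding could look roughly like this. It is an illustration, not the answer's original snippet: the file name, the sample strings, and the helper name write_cells are assumptions. It uses io.open, which accepts an encoding argument in both Python 2 and Python 3, so the text is encoded as gb2312 as it is written.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def write_cells(path, cells, encoding="gb2312"):
    """Write one cell per line, encoding the text on output (hypothetical helper)."""
    # io.open encodes unicode text with the given codec when writing.
    with io.open(path, "w", encoding=encoding) as handle:
        for cell in cells:
            handle.write(cell + u"\n")


# Sample values standing in for the results of col.getText().
cells = [u"营业额", u"-45.7852"]
path = os.path.join(tempfile.mkdtemp(), "table.txt")
write_cells(path, cells)

# Read the file back with the same encoding to confirm the round trip.
with io.open(path, "r", encoding="gb2312") as handle:
    round_trip = handle.read().splitlines()

print(round_trip == cells)
```

Note that gb2312 covers only a subset of Chinese characters, so text containing a character outside that set raises UnicodeEncodeError. gbk and gb18030 are supersets of gb2312 and may be safer choices if the target program accepts them.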

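As for the third question: because the data is rendered by a JavaScript function (数据表.js), I don't think it is possible to get all of it simply with urllib. It is better to look at other libraries and find one suited to this.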
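
One way to check whether a table is filled in by JavaScript is to inspect the HTML exactly as urllib receives it: if the table has no data cells in the raw response, the rows are added later in the browser. Below is a minimal sketch in Python 3, assuming bs4 is installed. STATIC_HTML, the id "report", the script name render_table.js, and the helper count_cells are invented for illustration and are not taken from the Sina page.

```python
from bs4 import BeautifulSoup

# Hypothetical raw response: the table exists, but a script fills it in later.
STATIC_HTML = """
<table class="tab05" id="report"><tbody></tbody></table>
<script src="render_table.js"></script>
"""


def count_cells(html, class_name):
    """Count <td> cells inside tables of the given class in raw HTML."""
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.find_all("table", {"class": class_name})
    return sum(len(table.find_all("td")) for table in tables)


cell_count = count_cells(STATIC_HTML, "tab05")
if cell_count == 0:
    # Nothing to scrape in the static HTML; the data arrives via JavaScript.
    print("table is empty in the raw HTML")
else:
    print("found %d cells" % cell_count)
```

When the count is zero, the usual options are to drive a real browser (for example with Selenium) or to request the data URL that the script itself loads, if one can be identified.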
