Using BeautifulSoup with accents and different characters

from BeautifulSoup import BeautifulSoup
import urllib2

response = urllib2.urlopen('http://www.databaseolympics.com/sport/sportevent.htm?sp=FEN&enum=130')
html = response.read()
soup = BeautifulSoup(html)
g = open('fencing_medalists.csv', 'w')
t = soup.findAll("table", {'class': 'pt8'})
for table in t:
    rows = table.findAll('tr')
    for tr in rows:
        cols = tr.findAll('td')
        for td in cols:
            theText = str(td.find(text=True))
            #theText=str(td.find(text=True)).encode("utf-8")
            if theText != "None":
                g.write(theText)
            else:
                g.write("")
            g.write(",")
        g.write("\n")

1 answer

User

#1 · Posted 2024-04-27 00:19:05

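If you are dealing with Unicode, always treat a response read from disk or the network as a bag of bytes, not as a string.

The text you read is probably UTF-8 encoded, so decode it first.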

import codecs
# ...
content = response.read()
html = codecs.decode(content, 'utf-8')
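
To make the bytes-versus-text distinction concrete, here is a minimal sketch that needs no network access. It is written for Python 3, where the split is explicit, and the byte string `raw` is only an assumed stand-in for what `response.read()` would return.

```python
import codecs

# Assumed stand-in for response.read(): the UTF-8 bytes of a name with one accent.
raw = u"G\u00e9za Imre".encode("utf-8")

# Decode the bag of bytes into text before doing anything else with it.
text = codecs.decode(raw, "utf-8")

# The two lengths differ because the accented letter takes two bytes in UTF-8.
print(len(raw), len(text))
```

If the page turned out to use a different encoding, the `'utf-8'` argument would have to change accordingly.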

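You also need to encode the Unicode text as UTF-8 before writing it to the output file. Open the output file with codecs.open and specify the encoding; it will handle the output encoding for you transparently.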

g = codecs.open('fencing_medalists.csv', 'w', encoding='utf-8')
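
The output side can be checked the same way. The following is a rough sketch of the codecs.open approach, assuming a throwaway path in a temporary directory and a made-up sample value; it writes text and then reads the file back in binary mode to show what landed on disk.

```python
import codecs
import os
import tempfile

# Hypothetical output path in a temp directory, so the sketch leaves no stray files.
path = os.path.join(tempfile.mkdtemp(), "fencing_medalists.csv")

# codecs.open returns a file object that accepts text and encodes it on write.
g = codecs.open(path, "w", encoding="utf-8")
g.write(u"G\u00e9za Imre")  # sample value, not scraped data
g.write(u",")
g.write(u"\n")
g.close()

# Reading the file back in binary mode shows the encoded bytes on disk.
with open(path, "rb") as f:
    data = f.read()
print(repr(data))
```

In Python 3 the built-in `open(path, 'w', encoding='utf-8')` should do the same job.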

And make the following change to the code that writes the strings:

    theText = td.find(text=True)
    if theText is not None:
        g.write(unicode(theText))

Edit: BeautifulSoup probably does automatic unicode decoding, so you may be able to skip the codecs.decode on the response.
