Python print function sid

Posted 2024-04-20 13:17:35

Male | A code monkey who enjoys programming and writing Python code.

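I use lxml to parse some HTML that contains Russian letters, which is why encodings are giving me a headache. I convert the HTML text to a tree with the code below, then try to extract some content from the page (header, article content) using CSS queries.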

from lxml import html
from bs4 import UnicodeDammit

doc = UnicodeDammit(html_text, is_html=True)
parser = html.HTMLParser(encoding=doc.original_encoding)
tree = html.fromstring(html_text, parser=parser)

...

def extract_title(tree):
    metas = tree.cssselect("meta[property^=og]")
    for meta in metas:
        # print(meta.attrib)
        # print(sys.stdout.encoding)
        # print("123")    # Uncomment this to fix error
        content = meta.attrib['content']
        print(content.encode('utf-8'))  # This fails with "[Decode error - output not utf-8]"

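When I try to print the Unicode characters to stdout, I get a "Decode error". But if I add some print statement before the failing print, everything works. I have never seen such strange behavior from Python's print function; I thought it had no side effects. Do you know why this happens? I run this code on Windows from Sublime.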
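For anyone trying to narrow this down, here is a minimal sketch, assuming Python 3 (the question does not say which version is used). It is not a confirmed fix for the Sublime error. It only makes the output bytes independent of whatever encoding stdout reports, so the stream's encoding can be checked and ruled in or out. The helper names `describe_stream` and `write_utf8_line` are hypothetical, made up for this illustration.

```python
import sys


def describe_stream(stream):
    """Return the encoding a text stream reports, or None if it has none."""
    return getattr(stream, "encoding", None)


def write_utf8_line(text, stream=None):
    """Write text plus a newline as UTF-8 bytes, bypassing the stream's own encoding.

    Hypothetical helper: assumes Python 3, where text streams such as
    sys.stdout usually expose a binary layer as `.buffer`.
    """
    stream = sys.stdout if stream is None else stream
    data = (text + "\n").encode("utf-8")
    buffer = getattr(stream, "buffer", None)
    if buffer is None:
        # No binary layer available (e.g. stdout was replaced): fall back to text.
        stream.write(text + "\n")
    else:
        stream.flush()       # keep ordering with earlier text-mode writes
        buffer.write(data)   # raw UTF-8 bytes, no console re-encoding
        buffer.flush()
    return data
```

In `extract_title`, calling `write_utf8_line(content)` in place of the failing print, and logging `describe_stream(sys.stdout)` once at startup, would show whether the reported stdout encoding differs from the UTF-8 that the Sublime build panel expects.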

Tags: code from import tree parser content doc html

0 answers

No answers yet.
