Python: replacing tex

words = ['Shop','Car','Home','Generic','Elements']
page = urllib.urlopen("html1/index.html").read()
soup = BeautifulSoup(page, 'html.parser')
texts = soup.findAll(text=True)
for i in texts :
    if i == words :
        i = '***'
    print i

Traceback (most recent call last):
  File "replacing.py", line 28, in <module>
    print i
  File "F:\Python\Python27\lib\encodings\cp852.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2019' in position 25: character maps to <undefined>

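2 Answers

User

#1 · Edited 2024-04-19 22:51:52

You have two main problems. The first is an encoding problem: you are trying to print a character that your console's encoding (cp852) cannot represent. For that, you can use the answers in: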

UnicodeEncodeError: 'charmap' codec can't encode - character maps to <undefined>, print function

Or, for a more in-depth explanation:

Python, Unicode, and the Windows console (now that I have looked at it more closely, it may be out of date, but it is still an interesting read).
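As a rough illustration of one workaround for the encoding problem, the sketch below replaces any character the console encoding cannot represent before printing. It is written for Python 3 rather than the Python 2.7 in the question, and the helper name safe_for_console is hypothetical, not something from the linked answers.

```python
import sys

def safe_for_console(text, encoding=None):
    """Return text with characters the target encoding lacks replaced by '?'."""
    # Fall back to the console's own encoding, or ASCII if it is unknown.
    encoding = encoding or sys.stdout.encoding or "ascii"
    # errors="replace" substitutes '?' instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors="replace").decode(encoding)

# u'\u2019' (right single quotation mark) is the character from the traceback.
print(safe_for_console(u"Today\u2019s offers", "cp852"))
```

This loses the original character, so it suits quick debugging output more than anything that has to preserve the text exactly.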

However, your code also has a logic problem.

if i == words:

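This line does not check whether i is found in words; it compares i against the whole list of words, which is not what you want. I suggest the following changes: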

words = {'Shop','Car','Home','Generic','Elements'}

for i in texts:
    if i in words:
        i = '***'

Converting words to a set allows average O(1) lookups, and if i in words checks whether i is found in words.
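One thing the loop above does not do is keep the result: assigning to i only rebinds the loop variable and leaves texts unchanged. A minimal sketch of collecting the masked values into a new list follows, using plain strings in place of the BeautifulSoup text nodes; the function name mask_words and the sample data are assumptions made for illustration.

```python
words = {'Shop', 'Car', 'Home', 'Generic', 'Elements'}

def mask_words(texts, words):
    """Return a new list where every item found in words becomes '***'."""
    # Membership test against the set, not equality against the whole collection.
    return ['***' if t in words else t for t in texts]

texts = ['Shop', 'Welcome', 'Car']  # stand-in for soup.findAll(text=True)
masked = mask_words(texts, words)
print(masked)
```

To change the parsed document itself rather than a separate list, BeautifulSoup text nodes provide a replace_with method that could be called inside the loop instead.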

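User

#2 · Edited 2024-04-19 22:51:52

It looks like one of the characters you are trying to print cannot be found in the codec Python uses to print messages. In other words, you have the data for a character, but the codec has no symbol to map it to, so it cannot be printed. Simply converting the HTML to Unicode should solve your problem.

A good question on how to do that: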

Convert HTML entities to Unicode and vice versa
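For reference, a small sketch of entity decoding with the standard library, assuming Python 3 (html.unescape is not available under that name in the Python 2.7 used in the question). The sample string is made up for illustration.

```python
import html

# &rsquo; is the named entity for u'\u2019', the character from the traceback.
decoded = html.unescape("Today&rsquo;s offers")

# ascii() shows the code point without depending on the console encoding.
print(ascii(decoded))
```

Note that decoding entities yields the real Unicode character; the console still has to be able to display it, so this does not by itself avoid the UnicodeEncodeError on a cp852 console.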
