Problem writing BeautifulSoup output to a .txt file

import urllib2
from bs4 import BeautifulSoup
import re
import codecs

# Ask user to enter URL
url = raw_input("Please enter a valid URL: ")

# Make sure file is clear for new content
open('ctp_output.txt', 'w').close()

# Open txt document for output
txt = open('ctp_output.txt', 'w')

# Parse HTML of article, aka making soup
soup = BeautifulSoup(urllib2.urlopen(url).read())

# retrieve all of the paragraph tags
with open('ctp_output.txt', 'w'):
    for tag in soup.find_all('p'):
        txt.write(tag.text.encode('utf-8') + '\n' + '\n')

# Close txt file with new content added
txt.close()

3 Answers

User

#1 · Edited 2024-05-14 23:35:28

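If, after applying Hugh Bothwell's changes, you get UnicodeEncodeError: 'ascii' codec can't encode characters in position 21-23: ordinal not in range(128), you also need to do the following.

Open the text file with codecs.open() or io.open() and an appropriate text encoding (i.e. encoding="..."), instead of opening a byte file with open():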

fp = codecs.open('ctp_output.txt', 'w', encoding="utf-8")

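Then just write the Unicode text to fp directly; the file object encodes it for you, so no explicit .encode() call is needed.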

You need to import codecs for this to work.
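A minimal sketch of this approach, assuming the paragraphs are already available as Unicode strings. The write_paragraphs helper, the sample text, and the temporary output path are hypothetical and only for illustration. io.open() is used here because it behaves the same on Python 2 and Python 3; codecs.open() accepts the same encoding argument.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def write_paragraphs(path, paragraphs):
    """Write each Unicode paragraph followed by a blank line (hypothetical helper)."""
    # The file object encodes to UTF-8 itself, so no .encode() call is needed.
    with io.open(path, 'w', encoding='utf-8') as fp:
        for para in paragraphs:
            fp.write(para + u'\n\n')


# Made-up sample text standing in for the tag.get_text() results.
sample = [u'caf\u00e9 au lait', u'na\u00efve r\u00e9sum\u00e9']
out_path = os.path.join(tempfile.mkdtemp(), 'ctp_output.txt')
write_paragraphs(out_path, sample)

# Read the file back with the same encoding to check the round trip.
with io.open(out_path, encoding='utf-8') as fp:
    content = fp.read()
```

Because the encoding is declared once when the file is opened, the writing code never has to deal with bytes.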

User

#2 · Edited 2024-05-14 23:35:28

A few issues:

for tag in tags:
    f.write(tag.get_text() + '\n' + '\n')

needs to be indented one more level (it should be a child of with open('ctp_output.txt', 'w') as f:).

The explicit close() call at the end is redundant; the with statement already ensures the file is closed.
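A small sketch of why the extra close is unnecessary, using a made-up temporary file. The names path, closed_inside, and closed_after are hypothetical.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

with open(path, 'w') as f:
    f.write('first paragraph\n\n')
    closed_inside = f.closed   # the file is still open inside the block

closed_after = f.closed        # the with statement closed it on exit
```

The file is also closed if the body of the with block raises an exception, which a trailing close() call would not guarantee.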

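I don't see anything missing from the output. Can you give an example of a sentence that disappeared?

Edit: this looks like a Python 3 problem; it works perfectly in Python 2.7.5.

Edit 2: fixed str.decode():

Your code can be simplified to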

from bs4 import BeautifulSoup

import sys
if sys.hexversion < 0x3000000:
    # Python 2.x
    from urllib2 import urlopen
    inp = raw_input
else:
    # Python 3.x
    from urllib.request import urlopen
    inp = input

def get_paras(url):
    page = urlopen(url).read().decode('utf-8')
    soup = BeautifulSoup(page)
    return [para.get_text() for para in soup('p')]

def write_lst(f, lst, fmt="{}\n\n".format):
    for item in lst:
        f.write(fmt(item))

def main():
    url   = inp("Please enter a fully qualified URL: ")
    fname = inp("Please enter the output file name: ")

    with open(fname, "w") as outf:
        write_lst(outf, get_paras(url))

if __name__=="__main__":
    main()

User

#3 · Edited 2024-05-14 23:35:28

I guess you should encode what you are writing as UTF-8:

to_write = tag.get_text() + "\n"
f.write(to_write.encode("utf-8"))
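A rough sketch of the same idea in a form that also runs on Python 3, where a file opened in text mode rejects bytes. It assumes the file is opened in binary mode ('wb'); the sample string and the temporary path are made up for illustration.

```python
import os
import tempfile

text = u'd\u00e9j\u00e0 vu'    # made-up text standing in for tag.get_text()
to_write = text + u'\n'
path = os.path.join(tempfile.mkdtemp(), 'encoded.txt')

# Binary mode: the UTF-8 bytes are produced explicitly before writing.
with open(path, 'wb') as f:
    f.write(to_write.encode('utf-8'))

# Read the raw bytes back to inspect what was stored.
with open(path, 'rb') as f:
    raw = f.read()
```

Encoding by hand works, but every write site then has to remember to do it; declaring the encoding when opening the file keeps that decision in one place.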

