File is not created in Python when the content contains Unicode

import os
from urllib import urlopen
from bs4 import BeautifulSoup

url = "http://www.mathrubhumi.com/sports/story.php?id=397111"
raw = urlopen(url).read()
soup = BeautifulSoup(raw, 'lxml')
texts = soup.findAll(text=True)
name = soup.title.text
name = name + '.txt'

def contains_unicode(text):
    try:
        str(text)
    except:
        return True
    return False

result = ''.join((text for text in texts if contains_unicode(text)))
# Output to a file
with open(os.path.join('/home/user1/textfiles', name), 'w') as out:
    out.write(result)

1 Answer

User

#1 · Posted on 2024-04-25 23:23:23

I tried the following and it worked: it created a file named Mathrub*.txt in the current directory, containing some text.

import codecs
import os
from urllib import urlopen  # Python 2 location of urlopen
from bs4 import BeautifulSoup

url = "http://www.mathrubhumi.com/sports/story.php?id=397111"
raw = urlopen(url).read()
soup = BeautifulSoup(raw, 'lxml')
texts = soup.findAll(text=True)
# Use the page title as the output file name
name = soup.title.string
name = name + '.txt'

def contains_unicode(text):
    # In Python 2, str() raises a UnicodeError for non-ASCII text
    try:
        str(text)
    except UnicodeError:
        return True
    return False

result = ''.join((text for text in texts if contains_unicode(text)))
# Output to a file, encoding the text explicitly as UTF-8
with codecs.open(name, 'w', encoding="utf-8") as out:
    out.write(result)

Before I added the codecs part, it complained loudly about trying to write characters it did not know how to interpret.
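As a rough illustration of the same idea, here is a minimal sketch that uses `io.open`, which accepts an `encoding` argument on both Python 2 and Python 3. The helper names `has_non_ascii` and `write_utf8`, the sample strings, and the temporary file path are hypothetical and only stand in for the scraped page text; they are not part of the original code. Note that on Python 3 `str(text)` never raises for a text string, so this sketch checks code points directly instead of relying on an exception.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def has_non_ascii(text):
    """Return True if any character lies outside the ASCII range."""
    return any(ord(ch) > 127 for ch in text)


def write_utf8(path, text):
    """Write text to path, encoding it explicitly as UTF-8."""
    with io.open(path, 'w', encoding='utf-8') as out:
        out.write(text)


# Hypothetical sample strings standing in for the scraped page text.
samples = [u'plain ascii', u'\u0d2e\u0d32\u0d2f\u0d3e\u0d33\u0d02']
result = u''.join(s for s in samples if has_non_ascii(s))

# Write to a throwaway temporary directory (an assumption for this demo).
demo_path = os.path.join(tempfile.mkdtemp(), u'demo.txt')
write_utf8(demo_path, result)

# Read the file back with the same encoding to confirm the round trip.
with io.open(demo_path, encoding='utf-8') as f:
    read_back = f.read()
```

Stating the encoding explicitly means the write no longer depends on the interpreter's default codec, which is presumably what produced the complaint described above.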
