Adding elements to a dictionary breaks the encoding

from bs4 import BeautifulSoup
import urllib
#csv is for the csv writer
import csv

#intended data structure is list of dictionaries
# holder = [{'headline': TheHeadline, 'url': TheURL, 'date1': Date1, 'date2': Date2, 'date3':Date3}, {'headline': TheHeadline, 'url': TheURL, 'date1': Date1, 'date2': Date2, 'date3':Date3}]

#initiates the list to hold the output
holder = []

txt_contents = "http://sousuo.gov.cn/s.htm?q=&n=80&p=&t=paper&advance=true&title=&content=&puborg=&pcodeJiguan=%E5%9B%BD%E5%8F%91&pcodeYear=2016&pcodeNum=&childtype=&subchildtype=&filetype=&timetype=timeqb&mintime=&maxtime=&sort=pubtime&nocorrect=&sortType=1"

#opens the output doc
output_txt = open("output.txt", "w")

def headliner(url):
    #opens the url for read access
    this_url = urllib.urlopen(url).read()
    #creates a new BS holder based on the URL
    soup = BeautifulSoup(this_url, 'lxml')

    #creates the headline section
    headline_text = ''
    #this bundles all of the headlines
    headline = soup.find_all('h3')
    #for each individual headline....
    for element in headline:
        headline_text += ''.join(element.findAll(text = True)).encode('utf-8').strip()
        #this is necessary to turn the findAll output into text
        print element
        text = element.text.encode('utf-8')
        #prints each headline
        print text
        print "*******"
        #creates the dictionary for just that headline
        temp_dict = {}
        #puts the headline in the dictionary
        temp_dict['headline'] = text

        #appends the temp_dict to the main list
        holder.append(temp_dict)

        output_txt.write(str(text))
        #output_txt.write(holder)

headliner(txt_contents)
print holder

output_txt.close()

1 answer

User

#1 · Posted 2024-04-23 17:24:32

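Nothing is wrong with the encoding. In Python 2, '漢字' is a byte string, and what you are seeing are just two different ways of displaying the same bytes: its repr shows each byte as an escape sequence, while print writes the bytes themselves to the terminal.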

>>> s = '漢字'
>>> s
'\xe6\xbc\xa2\xe5\xad\x97'
>>> print(s)
漢字
>>> s.__repr__()
"'\\xe6\\xbc\\xa2\\xe5\\xad\\x97'"
>>> s.__str__()
'\xe6\xbc\xa2\xe5\xad\x97'
>>> print(s.__repr__())
'\xe6\xbc\xa2\xe5\xad\x97'
>>> print(s.__str__())
漢字
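
The same distinction can be reproduced in Python 3, where byte strings are an explicit bytes type. The following is only a minimal sketch, not part of the original session; it assumes the text is UTF-8 encoded, and the variable names are illustrative.

```python
# Python 3 sketch; the session above is Python 2, where '漢字' is already a byte string.
text = '漢字'                 # a str (Unicode text)
raw = text.encode('utf-8')    # the same characters as UTF-8 bytes

# repr() of a bytes object shows each non-ASCII byte as an escape sequence.
raw_repr = repr(raw)
print(raw_repr)

# Decoding returns the original text, so no information was lost.
decoded = raw.decode('utf-8')
print(decoded)
```

The escaped form and the readable form describe the same six bytes; only the display differs.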

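One last thing to know: when you put an object in a container and print the container, the container's representation is built from the repr of each object it holds, not from its str: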

>>> ls = [s]
>>> print(ls)
['\xe6\xbc\xa2\xe5\xad\x97']

This may be clearer if we define a custom object of our own:

>>> class A(object):
...     def __str__(self):
...         return "str"
...     def __repr__(self):
...         return "repr"
...
>>> A()
repr
>>> print(A())
str
>>> ayes  = [A() for _ in range(5)]
>>> ayes
[repr, repr, repr, repr, repr]
>>> print(ayes[0])
str
>>>
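
Applied to the question's holder list, one way to get readable output is to print each value instead of the list itself, or to serialize the list with json.dumps and ensure_ascii=False. This is a rough Python 3 sketch under stated assumptions: the headlines are kept as text rather than UTF-8 bytes, the sample data is made up, and the helper name format_headlines is hypothetical.

```python
import json

# Hypothetical sample data in the shape the question describes.
holder = [{'headline': '漢字'}, {'headline': 'second headline'}]

def format_headlines(items):
    """Return one headline per line, using the text of each value."""
    return '\n'.join(item['headline'] for item in items)

# Printing the values directly shows the characters, not their escapes.
print(format_headlines(holder))

# json.dumps escapes non-ASCII characters unless ensure_ascii=False is passed.
escaped = json.dumps(holder)
readable = json.dumps(holder, ensure_ascii=False)
print(escaped)
print(readable)
```

Note that in Python 3 the repr of a str no longer escapes printable non-ASCII characters, so the json.dumps default is used here to show the escaped form.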
