How can I avoid duplicate strings when writing strings to a CSV file?

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests
import csv

url = 'https://ta.wikisource.org/w/index.php?title=அட்டவணை:அ. மருதகாசி-பாடல்கள்.pdf&action=history'
content = requests.get(url).content
soup = BeautifulSoup(content,'lxml')

#getting the uncleaned contributors
userBdi = soup.findAll('bdi')

#list 2 string
uncleanedContributors =''.join(str(userBdi)[1:-1]).replace('</','<').replace('<bdi>','').replace(',','\n').replace(' ','').replace('பக்கம்','அட்டவணை_பேச்சு').replace('Bot','').replace('BOT','')
print()
print('The output of uncleaned contributors')
print('--------------------------------------')
print(uncleanedContributors)

with open('uncleaned-contributors.csv','a') as csvwrite:
    csvwriter = csvwrite.write(uncleanedContributors+'\n')

content = open('uncleaned-contributors.csv','r').readlines()
content4set = set(content)
cleanedcontent = open('cleaned-contributors.csv','w')
print()
print('The output of cleaned contributors')
print('--------------------------------------')
for i, line in enumerate(content4set,0):
    cleanedcontent.write("{}.{}".format(str(i+1),line.replace('பக்கம்','அட்டவணை_பேச்சு')))
    line=line.strip()
    print(i, line)
cleanedcontent.close()

1 answer

User
#1 · Posted on 2024-06-11 17:21:53

Here is one way to solve the problem:
from bs4 import BeautifulSoup
import requests
import csv

url = 'https://ta.wikisource.org/w/index.php?title=அட்டவணை:அ. மருதகாசி-பாடல்கள்.pdf&action=history'
content = requests.get(url).content
soup = BeautifulSoup(content,'lxml')

#getting the uncleaned contributors
userBdi = soup.findAll('bdi')

#list 2 string
uncleanedContributors =''.join(str(userBdi)[1:-1]).replace('</','<').replace('<bdi>','').replace(',','\n').replace(' ','').replace('பக்கம்','அட்டவணை_பேச்சு').replace('Bot','').replace('BOT','')

cleanedcontent = open('cleaned-contributors.csv','w')
print()
print('The output of cleaned contributors')
print(' ')

def unique_list(l):
    # Keep only the first occurrence of each item, preserving order.
    ulist = []
    for x in l:
        if x not in ulist:
            ulist.append(x)
    return ulist

# Split on whitespace, drop repeated names, then rejoin with single spaces.
a = ' '.join(unique_list(uncleanedContributors.split()))

# Write each unique name as "<number>.<name>" on its own line.
for i, j in enumerate(a.split(' ')):
    cleanedcontent.write("{}.{}".format(str(i+1),j.replace('பக்கம்','அட்டவணை_பேச்சு')))
    cleanedcontent.write('\n')
    print(i+1, j)
cleanedcontent.close()
When executed:
[1]: The output of cleaned contributors
1 Balajijagadesh
2 Info-farmer
3 Tshrinivasan
The solution code above produces exactly the output requested in the question and writes directly to the CSV file without any duplicates.
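
As a side note, the same de-duplication step can be sketched without the network request, so it is easy to try on its own. This is only a minimal sketch under some assumptions: the helper names `dedupe_preserving_order` and `write_numbered_rows` are hypothetical, the sample list stands in for the scraped contributor names, and the rows are written with `csv.writer` as two columns (number, name) rather than the `1.name` format used above. `dict.fromkeys` keeps the first occurrence of each name in its original order, which a plain `set` does not guarantee.

```python
import csv
import io


def dedupe_preserving_order(names):
    """Return names with blanks and repeats removed, keeping first-seen order."""
    stripped = (name.strip() for name in names)
    # dict keys are unique and remember insertion order (Python 3.7+).
    return list(dict.fromkeys(name for name in stripped if name))


def write_numbered_rows(names, handle):
    """Write one (number, name) row per unique name to an open text handle."""
    writer = csv.writer(handle, lineterminator="\n")
    for index, name in enumerate(dedupe_preserving_order(names), start=1):
        writer.writerow([index, name])


# Hypothetical sample data standing in for the scraped contributor names.
sample = ["Balajijagadesh", "Info-farmer", "Balajijagadesh",
          "Tshrinivasan", "Info-farmer"]

# An in-memory buffer is used here so the sketch does not touch the disk.
buffer = io.StringIO()
write_numbered_rows(sample, buffer)
print(buffer.getvalue())
```

To write a real file instead, one option is to pass a handle such as `open('cleaned-contributors.csv', 'w', newline='', encoding='utf-8')` to `write_numbered_rows`. Opening in `'w'` mode matters here: the question's code appends to `uncleaned-contributors.csv` with `'a'`, so every run adds the same names again.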
