Problem scraping whoscall.in with BeautifulSoup 4

import re
import time

import requests
from bs4 import BeautifulSoup

# Runs inside a loop that supplies website, teleWho, idx and worksheet (not shown in the question)
if website == "1":
    reqInput = "http://whoscall.in/1/%s/" % (teleWho)
    print(reqInput)
    time.sleep(1)
    requestRec = requests.get(reqInput)
    soup = BeautifulSoup(requestRec.content, "lxml")
    noMatch = soup.find(text=re.compile(r"no reports yet on the phone number"))
    print(requestRec.content)  # only if needed
    # type(noMatch) is str
    if noMatch is None:
        worksheet.write(idx + 1, 2, "Got a hit")
        # Count the avatar images, one per report
        howMany = soup.find_all('img', {'src': '/default-avatar.gif'})
        howManyAreThere = len(howMany)
        worksheet.write(idx + 1, 1, howManyAreThere)
        print(howManyAreThere)
        scamNum = soup.find_all(text=("scam"), recursive=True)  # ,'scam','Scammer','scammer'
        scamCount = len(scamNum)
        print(scamNum)
        searchTerms = {scamCount: scamCount}
        sentiment = max(searchTerms, key=searchTerms.get)
        worksheet.write(idx + 1, 3, sentiment)
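Why the `scam` search in the code above comes back (nearly) empty: when `find_all` gets a plain string for its text filter, BeautifulSoup only returns text nodes whose entire content equals that string, while a compiled regex is applied with `.search()` and matches anywhere inside a text node. The sketch below shows the difference on a small, hypothetical piece of markup (the real whoscall.in page is not shown in the question). It uses `string=`, the newer name for the `text=` argument used in the question, and the built-in `"html.parser"` so that `lxml` is not required.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup, not the actual whoscall.in page structure
html = """
<div class="report">This number is a scam</div>
<div class="report">Scammer called twice</div>
<div class="report">scam</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A plain string only matches text nodes whose whole content is exactly "scam"
exact = soup.find_all(string="scam")

# A compiled regex is searched inside each text node, so partial matches count too
partial = soup.find_all(string=re.compile(r"scam", re.IGNORECASE))

print(len(exact))    # 1
print(len(partial))  # 3
```

Passing a regex would already fix the count; the answer below takes a different route and filters whole report `<div>`s instead of individual text nodes.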

1 answer

User

Answer 1 · posted 2024-04-25 03:38:05

Change this line:

scamNum = soup.find_all(text=("scam"),recursive=True)

to:

scamNum = [ div.text for div in soup.find_all('div', {'style':'font-size:14px; margin:10px; overflow:hidden'}) if 'scam' in div.text.lower() ]
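The replacement line can be tried offline on a small sample. This is a minimal sketch assuming, as the answer's selector implies, that each report sits in a `<div>` with exactly that inline `style`; the HTML itself is made up for illustration. Note that the attribute filter compares the whole `style` string, so any change in spacing on the site would stop it from matching.

```python
from bs4 import BeautifulSoup

# Style string copied from the answer; the surrounding HTML is hypothetical
STYLE = "font-size:14px; margin:10px; overflow:hidden"
html = f"""
<div style="{STYLE}">Total SCAM, do not answer</div>
<div style="{STYLE}">Delivery driver, harmless</div>
<div style="{STYLE}">Scam call about taxes</div>
<div style="color:red">scam text outside a report div</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Keep the full text of every report div that mentions "scam" in any letter case
scamNum = [div.text for div in soup.find_all("div", {"style": STYLE})
           if "scam" in div.text.lower()]
scamCount = len(scamNum)

print(scamCount)  # 2
```

Because the text is lowercased before the test, "SCAM", "Scam" and "scammer" are all caught by the single lowercase term, which covers the variants listed in the question's comment.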

To match any of several words, try this:

words = ['word1', 'word2']  # ... add further search terms here, in lowercase
scamNum = [ div.text for div in soup.find_all('div', {'style':'font-size:14px; margin:10px; overflow:hidden'}) if any(word in div.text.lower() for word in words) ]
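The question's `searchTerms` / `max(...)` lines suggest the goal was to record which term is mentioned most. Assuming that intent, the sketch below extends the multi-word filter with per-term counts. The words and the HTML are hypothetical, and the style string is the one from the answer.

```python
from bs4 import BeautifulSoup

STYLE = "font-size:14px; margin:10px; overflow:hidden"
# Hypothetical search terms; lowercase because report text is lowercased before matching
words = ["scam", "spam", "telemarketer"]

html = f"""
<div style="{STYLE}">Scammer! Hang up.</div>
<div style="{STYLE}">Just a telemarketer, spam calls all day</div>
<div style="{STYLE}">Another scam attempt</div>
<div style="{STYLE}">My dentist's office</div>
"""
soup = BeautifulSoup(html, "html.parser")
reports = [div.text.lower() for div in soup.find_all("div", {"style": STYLE})]

# Reports that mention at least one term (the answer's any(...) filter)
matching = [text for text in reports if any(word in text for word in words)]

# Number of reports mentioning each term, then the most frequent term
term_counts = {word: sum(word in text for text in reports) for word in words}
sentiment = max(term_counts, key=term_counts.get)

print(len(matching))  # 3
print(term_counts)    # {'scam': 2, 'spam': 1, 'telemarketer': 1}
print(sentiment)      # scam
```

Here `sentiment` holds a word rather than a number, so `worksheet.write(idx + 1, 3, sentiment)` would store the dominant term in that column; on a tie, `max` returns the first tied term in `words` order.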
