Cannot compare current_url with a list in Python

Posted on 2024-04-25 12:42:06


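I'm trying to skip current_url if it is already in the list, but I get an error message.

My goal is to scrape a web page, add that page's URL to a text file, and then, when I start scraping again, compare the page I'm about to scrape with the pages already in the list. If a page is already in the list, I want to skip it.

But then this problem came up: the script cannot compare current_url with the list in this line: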

if cur_url in visited_urls:

Full code:

from random import randint
from time import sleep

# `driver` is an already-created Selenium WebDriver instance

# Open the text file
visited_urls = 'C:/webdrivers/visited_p_urls_test.txt' # This specifies the location of the text file on the PC
cur_url = driver.current_url

# Go to main test website
driver.get('https://www.google.com')
sleep(randint(1,3))

with open(visited_urls, 'a') as filehandle: # This opens a text file in "append" mode
    filehandle.write('\n' + cur_url)

# Go to main test website
driver.get('https://adwords.google.com/home/')
sleep(randint(1,3))

with open(visited_urls, 'a') as filehandle: # This opens a text file in "append" mode
    filehandle.write('\n' + cur_url)

driver.get('https://adwords.google.com/home/tools/')
sleep(randint(1,3))

with open(visited_urls, 'a') as filehandle: # This opens a text file in "append" mode
    filehandle.write('\n' + cur_url)

if cur_url in visited_urls:
    print('I CANNOT comment because I already did before')
else:
    print('I can comment')

with open(visited_urls, 'r') as filehandle: # This opens a text file in "read" mode
    filecontent = filehandle.readlines()    # readlines() reads ALL lines in a text file
    for line in filecontent:
        print(line)

I get the following error message:

TypeError: 'str' object is not callable

1 answer
User
#1 · Posted on 2024-04-25 12:42:06

You are searching for the string inside the file path (C:/webdrivers/visited_p_urls_test.txt), but you need to search inside the file's contents:

with open('example.txt') as f:
    if 'blabla' in f.read():
        print("true")
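To see why the original check never looks at the file, compare a substring test against the path string with one against the file's contents. This is a minimal sketch that assumes a temporary file stands in for `C:/webdrivers/visited_p_urls_test.txt`:

```python
import os
import tempfile

url = "https://adwords.google.com/home/"

# Write one URL to a throwaway file (stands in for visited_p_urls_test.txt).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
    fh.write(url + "\n")
    path = fh.name

# `in` on the path string only searches the characters of the path itself.
in_path = url in path

# Reading the file first searches what was actually written to it.
with open(path) as fh:
    in_contents = url in fh.read()

os.remove(path)

print(in_path, in_contents)  # False True
```

`visited_urls` in the question is just such a path string, so `cur_url in visited_urls` is always answering the first question, never the second.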

In your case:

# open the file, read its content, and only then check whether cur_url already exists in it
with open(visited_urls) as filehandle:
    if cur_url in filehandle.read():
        print('I CANNOT comment because I already did before')
    else:
        print('I can comment')
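Reading the file and using `in` is still a substring search over the whole file, so `https://adwords.google.com/home/` counts as visited as soon as `https://adwords.google.com/home/tools/` has been written. Separately, in the question's code `cur_url` is read from `driver.current_url` once, before any `driver.get(...)`, so every write stores that same first URL; it has to be re-read after each navigation. The sketch below matches whole lines instead and assumes one URL per line. `load_visited` and `mark_visited` are hypothetical helper names, not part of Selenium.

```python
import os
import tempfile


def load_visited(path):
    """Return the URLs stored one per line in `path`; empty set if the file is missing."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as fh:
        # strip() removes the newline; blank lines (e.g. from a leading '\n') are skipped
        return {line.strip() for line in fh if line.strip()}


def mark_visited(path, url):
    """Append `url` to `path` on its own line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(url + "\n")


# Demo with a temporary file standing in for visited_p_urls_test.txt.
path = os.path.join(tempfile.mkdtemp(), "visited.txt")
mark_visited(path, "https://adwords.google.com/home/tools/")

visited = load_visited(path)
results = []
for url in ["https://adwords.google.com/home/", "https://adwords.google.com/home/tools/"]:
    already = url in visited  # exact set membership, not a substring search
    results.append(already)
    if already:
        print("I CANNOT comment because I already did before")
    else:
        print("I can comment")
```

In the Selenium script this would mean calling `mark_visited(visited_urls, driver.current_url)` right after each `driver.get(...)`, and loading the set once at startup. Because blank lines are skipped, the helper also reads files written with the question's `'\n' + cur_url` pattern.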
