Improving a loop so that it sends the correct data

Published 2024-04-28 04:51:48


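I am trying to scrape multiple PDF files for data processing. When I send num to the query, like this: https://www.google.com/search?q=filetype:PDF+%PDF-+aa&num=100&start=0 and then continue with &start=1 and so on, I always get the same value, 5.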

import string
ext = "pdf"
magic_header = "%PDF-"
ltrs = string.ascii_lowercase
build_query = [''.join([a,b]) for a in ltrs for b in ltrs]
max_results = 10
counter = 0
while counter < max_results:
    while True:
        if counter == 0:
            for query in build_query:
                print('https://www.google.com/search?q=filetype:{}+{}+{}&num=100&start={}'.format(ext, magic_header, query,counter))
            break
        print(counter)
        counter += 1 
    break

2 answers

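The problem is the combination of the while True loop, the if counter == 0 check, and the break statements: on the first pass the inner break exits before counter += 1 is reached, and the outer break then ends the outer loop, so every URL is printed with start=0. The version below increments counter on every pass of the loop.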

import string

ext = "pdf"
magic_header = "%PDF-"
ltrs = string.ascii_lowercase
build_query = ["".join([a, b]) for a in ltrs for b in ltrs]
max_results = 10
counter = 0
while counter < max_results:
    for query in build_query:
        print(
            "https://www.google.com/search?q=filetype:{}+{}+{}&num=100&start={}".format(
                ext, magic_header, query, counter
            )
        )
    counter += 1

Edit, following the discussion:

import string

ext = "pdf"
magic_header = "%PDF-"
ltrs = string.ascii_lowercase
build_query = ["".join([a, b]) for a in ltrs for b in ltrs]
max_results = 10
counter = 0
while True:
    for query in build_query:
        print(
            "https://www.google.com/search?q=filetype:{}+{}+{}&num=100&start={}".format(
                ext, magic_header, query, counter
            )
        )

    # Advance the offset and stop after max_results passes.
    counter += 1
    if counter >= max_results:
        break
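
Since counter only counts up from 0 to max_results, the same URLs can also be produced with a for loop over range, which removes the manual increment and the break entirely. A minimal sketch, assuming a helper named build_urls (hypothetical, not part of the original code) that returns the URLs instead of printing them:

```python
import string

EXT = "pdf"
MAGIC_HEADER = "%PDF-"
URL_TEMPLATE = "https://www.google.com/search?q=filetype:{}+{}+{}&num=100&start={}"


def build_urls(queries, max_results):
    """Return one URL per (start, query) pair, with start in 0..max_results-1."""
    urls = []
    for start in range(max_results):  # replaces the manual counter
        for query in queries:
            urls.append(URL_TEMPLATE.format(EXT, MAGIC_HEADER, query, start))
    return urls


ltrs = string.ascii_lowercase
build_query = ["".join([a, b]) for a in ltrs for b in ltrs]
urls = build_urls(build_query, 10)
print(len(urls))  # 676 queries x 10 start values
print(urls[0])
print(urls[-1])
```

Returning a list rather than printing makes it easy to check that start really changes between passes before any request is sent.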

Why is the loop so complicated?

Here is a simpler solution. I have restricted the query list to the first 10 items only.

import string
ext = "pdf"
magic_header = "%PDF-"
ltrs = string.ascii_lowercase
build_query = [''.join([a,b]) for a in ltrs for b in ltrs][:10]  # first 10 items: 'aa' .. 'aj'
max_results = 5
counter = 0

while counter <= max_results:
    for query in build_query:
        print('https://www.google.com/search?q=filetype:{}+{}+{}&num=100&start={}'.format(ext, magic_header, query, counter))
    counter += 1
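
The idea of limiting the query list can also be written without building all 676 combinations first. The sketch below is only an illustration under stated assumptions: iter_urls, max_queries and BASE_URL are hypothetical names, and the query string is passed through urllib.parse.urlencode, which the original code does not do. The encoding step is included because the literal % in %PDF- is otherwise not a valid percent-escape in a URL.

```python
import itertools
import string
from urllib.parse import urlencode

BASE_URL = "https://www.google.com/search"


def iter_urls(ext, magic_header, max_queries, max_results):
    """Yield search URLs for the first max_queries two-letter queries."""
    ltrs = string.ascii_lowercase
    # Lazily generate "aa", "ab", ... and keep only the first max_queries.
    pairs = ("".join(p) for p in itertools.product(ltrs, repeat=2))
    queries = list(itertools.islice(pairs, max_queries))
    for start in range(max_results):
        for query in queries:
            params = {
                "q": "filetype:{} {} {}".format(ext, magic_header, query),
                "num": 100,
                "start": start,
            }
            # urlencode escapes "%" as "%25" and turns spaces into "+".
            yield BASE_URL + "?" + urlencode(params)


urls = list(iter_urls("pdf", "%PDF-", max_queries=10, max_results=5))
print(len(urls))
print(urls[0])
```

Because iter_urls is a generator, the URLs are produced one at a time, so a caller can stop early without computing the rest.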
