Iterating over a list of URLs with bs4 in Python

1 vote
1 answer
1799 views
Asked 2025-04-17 22:34

I have a .txt file (called test_1.txt) formatted as follows:

https://maps.googleapis.com/maps/api/directions/xml?origin=Bethesda,MD&destination=Washington,DC&sensor=false&mode=walking
https://maps.googleapis.com/maps/api/directions/xml?origin=Miami,FL&destination=Mobile,AL&sensor=false&mode=walking
https://maps.googleapis.com/maps/api/directions/xml?origin=Chicago,IL&destination=Scranton,PA&sensor=false&mode=walking
https://maps.googleapis.com/maps/api/directions/xml?origin=Baltimore,MD&destination=Charlotte,NC&sensor=false&mode=walking

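If you visit any of the links above, you will see output in XML format. The code below gets me as far as the second directions request (Miami to Mobile), but the data it prints looks random and is not what I want. The code does work when I access only one URL at a time directly from the code: it then prints exactly the data I need. Is there a reason it only visits the second URL and prints the wrong information? Here is the Python code: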

import urllib2
from bs4 import BeautifulSoup

with open('test_1.txt', 'r') as f:
    f.readline()
    mapcalc = f.readline()
    response = urllib2.urlopen(mapcalc)
    soup = BeautifulSoup(response)

for leg in soup.select('route > leg'):
    duration = leg.duration.text.strip()
    distance = leg.distance.text.strip()
    start = leg.start_address.text.strip()
    end = leg.end_address.text.strip()
    print duration
    print distance
    print start
    print end

Edit:

This is the output of the Python code in the shell:

56
1 min
77
253 ft
Miami, FL, USA
Mobile, AL, USA

1 Answer

1

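Here is a link that may help you better understand how opening a file and reading lines behaves, which relates to Lev Levitsky's comment. In short, every call to f.readline() consumes one line: your code reads the first line and throws it away, then reads the second line into mapcalc, so only the second URL is ever requested.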
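To make the file-reading behaviour concrete, here is a minimal sketch written for Python 3. It uses io.StringIO as a stand-in for test_1.txt, and the example.com URLs and both function names are hypothetical placeholders, not part of the original question.

```python
import io

# Stand-in for test_1.txt: one (hypothetical) URL per line.
FAKE_FILE = "http://example.com/a\nhttp://example.com/b\nhttp://example.com/c\n"


def second_line_only(f):
    """Mimic the question's code: two readline() calls in a row."""
    f.readline()         # reads line 1 and discards it
    return f.readline()  # returns line 2, trailing newline included


def all_urls(f):
    """Iterate over the file object to visit every non-blank line."""
    return [line.strip() for line in f if line.strip()]


print(repr(second_line_only(io.StringIO(FAKE_FILE))))
print(all_urls(io.StringIO(FAKE_FILE)))
```

The first call returns only the second URL, still carrying its newline, while the second returns all three URLs already stripped. Stripping is worth doing before passing a line to an HTTP client, since a trailing newline becomes part of the requested URL.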

One approach is:

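import httplib2
from bs4 import BeautifulSoup

http = httplib2.Http()
with open('test_1.txt', 'r') as f:
    # Iterating over the file object yields each line (one URL) in turn
    for mapcalc in f:
        mapcalc = mapcalc.strip()  # drop the trailing newline
        if not mapcalc:
            continue  # skip blank lines
        status, response = http.request(mapcalc)
        soup = BeautifulSoup(response)
        for leg in soup.select('route > leg'):
            duration = leg.duration.text.strip()
            distance = leg.distance.text.strip()
            start = leg.start_address.text.strip()
            end = leg.end_address.text.strip()
            print duration
            print distance
            print start
            print end
# No f.close() is needed: the with block closes the file automatically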

I'm still fairly new to this, but I got the code above to run and it produced the following output:

4877
1 hour 21 mins
6582
4.1 mi
Bethesda, MD, USA
Washington, DC, USA
56
1 min
77
253 ft
Miami, FL, USA
Mobile, AL, USA
190
3 mins
269
0.2 mi
Chicago, IL, USA
Scranton, PA, USA
12
1 min
15
49 ft
Baltimore, MD, USA
Charlotte, NC, USA
