Iterating over a list of URLs with bs4 in Python

1 vote

1 answer

1799 views

Asked 2025-04-17 22:34

I have a .txt file (called test_1.txt) formatted like this:

https://maps.googleapis.com/maps/api/directions/xml?origin=Bethesda,MD&destination=Washington,DC&sensor=false&mode=walking
https://maps.googleapis.com/maps/api/directions/xml?origin=Miami,FL&destination=Mobile,AL&sensor=false&mode=walking
https://maps.googleapis.com/maps/api/directions/xml?origin=Chicago,IL&destination=Scranton,PA&sensor=false&mode=walking
https://maps.googleapis.com/maps/api/directions/xml?origin=Baltimore,MD&destination=Charlotte,NC&sensor=false&mode=walking

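If you visit any of the links above, you will see output in XML format. The code below gets me as far as the second directions request (Miami to Mobile), but the data it prints looks random and is not what I want. I can also get the code to work when I access only one URL at a time, reading this .txt file directly from the code, and it then prints exactly the data I need. Is there a reason it only accesses the second URL and prints the wrong information? Here is the Python code: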

import urllib2
from bs4 import BeautifulSoup

with open('test_1.txt', 'r') as f:
    f.readline()
    mapcalc = f.readline()
    response = urllib2.urlopen(mapcalc)
    soup = BeautifulSoup(response)

for leg in soup.select('route > leg'):
    duration = leg.duration.text.strip()
    distance = leg.distance.text.strip()
    start = leg.start_address.text.strip()
    end = leg.end_address.text.strip()
    print duration
    print distance
    print start
    print end

Edit:

This is the output of the Python code in the shell:

56
1 min
77
253 ft
Miami, FL, USA
Mobile, AL, USA

Tags: data extraction, data parsing, web scraping, XML, text file processing, debugging, URL iteration

1 Answer

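Here is a link that can help you better understand how opening a file and reading lines behaves, which is what Lev Levitsky's comment is about. In short, each call to f.readline() consumes one line of the file, so your code throws away the first URL, requests only the second, and never reads the remaining lines.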
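To make that concrete, here is a minimal sketch of how successive readline() calls consume a file. It uses io.StringIO holding two of the URLs from the question as a stand-in for test_1.txt (an assumption for illustration only, so nothing is read from disk or fetched over the network).

```python
import io

# Stand-in for test_1.txt (assumption: one URL per line, as in the question).
# The u'' prefix keeps this working on Python 2 as well as Python 3.
sample = io.StringIO(
    u"https://maps.googleapis.com/maps/api/directions/xml?origin=Bethesda,MD&destination=Washington,DC&sensor=false&mode=walking\n"
    u"https://maps.googleapis.com/maps/api/directions/xml?origin=Miami,FL&destination=Mobile,AL&sensor=false&mode=walking\n"
)

first = sample.readline()   # consumes line 1; the question's code discards this value
second = sample.readline()  # consumes line 2; the only URL the question's code requests
third = sample.readline()   # empty string: the end of the file has been reached

# Iterating over the file object instead visits every line exactly once.
sample.seek(0)
urls = [line.strip() for line in sample if line.strip()]
print(len(urls))
```

Note that readline() keeps the trailing newline, which is why the sketch calls strip() before using each line as a URL.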

One approach is:

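import httplib2
from bs4 import BeautifulSoup

http = httplib2.Http()
with open('test_1.txt', 'r') as f:
    # Iterating over the file yields one line (one URL) at a time.
    for line in f:
        mapcalc = line.strip()  # drop the trailing newline
        if not mapcalc:
            continue  # skip blank lines
        # http.request returns a (response headers, body) pair.
        status, response = http.request(mapcalc)
        soup = BeautifulSoup(response, 'html.parser')
        for leg in soup.select('route > leg'):
            duration = leg.duration.text.strip()
            distance = leg.distance.text.strip()
            start = leg.start_address.text.strip()
            end = leg.end_address.text.strip()
            print(duration)
            print(distance)
            print(start)
            print(end)
# No f.close() is needed: the with block closes the file.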

I'm still fairly new to this, but I got the code above to run and it produced the following output:

4877
1 hour 21 mins
6582
4.1 mi
Bethesda, MD, USA
Washington, DC, USA
56
1 min
77
253 ft
Miami, FL, USA
Mobile, AL, USA
190
3 mins
269
0.2 mi
Chicago, IL, USA
Scranton, PA, USA
12
1 min
15
49 ft
Baltimore, MD, USA
Charlotte, NC, USA
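The short values in that output (for example 253 ft for Miami to Mobile) look like the first step of each route rather than the whole leg. In bs4, leg.duration returns the first duration tag anywhere beneath leg, so if the step elements come before the leg totals, the first step's values are what get printed. Below is a minimal sketch of one way around that, assuming the leg totals are direct children of leg. The XML is a trimmed, hand-written sample with made-up totals (not a real API response), and leg_summary is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hand-written sample (assumption): one step followed by made-up leg totals.
SAMPLE_XML = """
<DirectionsResponse>
 <route>
  <leg>
   <step>
    <duration><value>56</value><text>1 min</text></duration>
    <distance><value>77</value><text>253 ft</text></distance>
   </step>
   <duration><value>600</value><text>10 mins</text></duration>
   <distance><value>800</value><text>0.5 mi</text></distance>
   <start_address>Origin (sample)</start_address>
   <end_address>Destination (sample)</end_address>
  </leg>
 </route>
</DirectionsResponse>
"""


def leg_summary(leg):
    """Collect leg-level values, looking only at direct children of <leg>."""
    duration = leg.find('duration', recursive=False)
    distance = leg.find('distance', recursive=False)
    return {
        # find('text') fetches the <text> child; the .text attribute would
        # instead return all text inside the tag.
        'duration': duration.find('text').get_text(strip=True),
        'distance': distance.find('text').get_text(strip=True),
        'start': leg.find('start_address', recursive=False).get_text(strip=True),
        'end': leg.find('end_address', recursive=False).get_text(strip=True),
    }


soup = BeautifulSoup(SAMPLE_XML, 'html.parser')
legs = soup.select('route > leg')
summaries = [leg_summary(leg) for leg in legs]

# For comparison: attribute access picks up the first step's duration.
first_match = legs[0].duration.find('text').get_text(strip=True)

print(summaries)
print(first_match)
```

In the loop above, leg_summary(leg) could be called in place of the four attribute lookups if the leg totals are what you need.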

Answered 2025-04-17 by Python大师
