How to scrape latitude and longitude with BeautifulSoup

Posted 2024-04-16 17:42:45


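I'm fairly new to beautifulsoup4, and I'm having trouble extracting the latitude and longitude values from the HTML response returned by the code below.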

import requests
from bs4 import BeautifulSoup

url = 'http://cinematreasures.org/theaters/united-states?page=1'
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
links = soup.find_all("tr")
print(links)

This code prints responses like the one below, many times over.


The full <tr> response:

<tr>
 <th id="theater_name"><a href="/theaters/united-states?sort=name&amp;order=desc">↑ Name</a> </th>
 <th id="theater_location"><a href="/theaters/united-states?sort=location&amp;order=asc">Location</a> </th>
 <th id="theater_status"><a href="/theaters/united-states?sort=open&amp;order=desc">Status</a> </th>
 <th id="theater_screens"><a href="/theaters/united-states?sort=screens&amp;order=asc">Screens</a> </th>
</tr>, <tr class="even location theater" data="{id: 0, point: {lng: -94.1751038, lat: 36.0848965}, category: 'open'}">
 <td class="name">
 <a class="map-link" href="/theaters/8775"> <img alt="112 Drive-In" height="48" src="http://photos.cinematreasures.org/production/photos/22137/1313612883/thumb.JPG?1313612883" width="48" /> </a>
<a class="map-link" href="/theaters/8775">112 Drive-In</a>
 <div class="info-box">
 <div class="photo" style="float: left;"> <a href="/theaters/8775"> <img alt="thumb" height="48" src="http://photos.cinematreasures.org/production/photos/22137/1313612883/thumb.JPG?1313612883" width="48" /> </a> </div>
 <p style="min-width: 200px !important;">
<strong><a href="/theaters/8775">112 Drive-In</a></strong>
 <br>
 3352 Highway 112 North <br>Fayetteville, AR 72702 <br>United States <br>479.442.4542 <br>
</br> </br> </br> </br> </br> </p>
</div>
</td>
 <td class="location">
 Fayetteville, AR, United States
</td>
 <td class="status">
 Open
</td>
 <td class="screens">
 1
</td>
</tr>

How can I get the lng and lat values out of this response?

Thanks in advance.


3 answers

If you only want a single result, index into the list:

print(links[0])

OK, so you are already grabbing all the <tr> elements correctly; now we just need to pull the data attribute out of them.

import requests
from bs4 import BeautifulSoup

url = 'http://cinematreasures.org/theaters/united-states?page=1'
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")
# Keep only theater rows, and only those that carry a data attribute
theaters = soup.find_all("tr", class_="theater")
data = [t.get('data') for t in theaters if t.get('data')]
print(data)
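To see what that filtering returns without hitting the live site, here is a sketch that runs the same find_all call on a trimmed copy of the markup from the question (the HTML string is abbreviated from the question's output, not the full page). It shows that class_="theater" matches a row whose class list merely contains "theater", and that the header row is skipped.

```python
from bs4 import BeautifulSoup

# Abbreviated copy of the question's markup: one header row, one theater row
html = """
<table>
<tr><th id="theater_name">Name</th></tr>
<tr class="even location theater"
    data="{id: 0, point: {lng: -94.1751038, lat: 36.0848965}, category: 'open'}">
  <td class="name">112 Drive-In</td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ matches any one of the space-separated classes, so "even location theater"
# qualifies; the header row has no class and no data attribute, so it is not selected
theaters = soup.find_all("tr", class_="theater")
data = [t.get("data") for t in theaters if t.get("data")]
print(data)
# ["{id: 0, point: {lng: -94.1751038, lat: 36.0848965}, category: 'open'}"]
```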

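Unfortunately, this gives you a list of strings rather than the dictionaries one might hope for. We can run a regular expression over the data strings to convert them into dicts (thanks to RootTwo).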

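One way to do that conversion, as a minimal sketch (not necessarily the regex RootTwo suggested): quote the bare keys with re.sub, switch single quotes to double quotes, and hand the result to json.loads. It assumes every data value has the shape shown in the question; data_to_dict is a hypothetical helper name.

```python
import json
import re


def data_to_dict(data):
    """Hypothetical helper: turn a JS-style literal such as
    "{id: 0, point: {lng: -94.17, lat: 36.08}, category: 'open'}" into a dict.
    Assumes keys are bare identifiers and string values are single-quoted
    with no embedded quotes."""
    # Wrap each bare key that follows "{" or "," in double quotes
    quoted = re.sub(r'([{,]\s*)(\w+)\s*:', r'\1"\2":', data)
    # JSON only accepts double-quoted strings
    return json.loads(quoted.replace("'", '"'))


sample = "{id: 0, point: {lng: -94.1751038, lat: 36.0848965}, category: 'open'}"
d = data_to_dict(sample)
print(d['point']['lng'], d['point']['lat'])  # -94.1751038 36.0848965
```

With the data list from the code above, [data_to_dict(s) for s in data] gives one dict per theater. The substitution would misfire on a string value containing an apostrophe or a ", word:" sequence, so treat it as a quick fix for this page rather than a general JavaScript-object parser.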

Here is how I would do it:

import requests
import demjson
from bs4 import BeautifulSoup

url = 'http://cinematreasures.org/theaters/united-states?page=1'
page = requests.get(url)
soup = BeautifulSoup(page.text, "html.parser")

to_plain_coord = lambda d: (d['point']['lng'], d['point']['lat'])
# Grabbing theater coords if `data` attribute exists
coords = [
    to_plain_coord(demjson.decode(t.attrs['data']))
    for t in soup.select('.theater')
    if 'data' in t.attrs]

print(coords)

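I don't use any string manipulation. Instead, I load the data attribute as JSON. Unfortunately, the value is not strictly valid JSON (the keys are unquoted and the strings use single quotes), so I parse it with the more lenient demjson library.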
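To see why a strict JSON parser is not enough here, a minimal sketch using the standard library's json module on the data value copied from the question's output:

```python
import json

# data attribute value copied from the question's output
sample = "{id: 0, point: {lng: -94.1751038, lat: 36.0848965}, category: 'open'}"

try:
    json.loads(sample)
except json.JSONDecodeError as err:
    # Strict JSON requires double-quoted keys, so parsing stops at the bare key "id"
    print("not valid JSON:", err.msg)
    # prints: not valid JSON: Expecting property name enclosed in double quotes
```

This is the gap demjson's non-strict decoding covers in the code above: it accepts these JavaScript-style literals directly.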

