Parsing specific content from a website (XML) with Python and saving it to MySQL
I want to send a REST request to the Flickr API. The response looks like this (XML format):
<rsp stat="ok">
<photos page="1" pages="974001" perpage="250" total="243500161">
<photo id="123" owner="1234" secret="123" server="1" farm="4"
title="DSC01316" ispublic="1" isfriend="0" isfamily="0" views="0" tags=""
latitude="47.825188" longitude="11.300722" accuracy="16" context="0"
place_id="XT" woeid="123" geo_is_family="0" geo_is_friend="0"
geo_is_contact="0" geo_is_public="1">
<description/>
</photo>
<photo id="123" owner="123" secret="123" server="1" farm="3"
title="DSC01351" ispublic="1" isfriend="0" isfamily="0" views="0" tags=""
latitude="47.825263" longitude="11.300891" accuracy="16" context="0"
place_id="XT" woeid="123" geo_is_family="0" geo_is_friend="0"
geo_is_contact="0" geo_is_public="1">
<description/>
</photo>
and so forth...
I'd like Python to extract values such as the photo ID, owner, and title from this response and save them to a MySQL database that I've already set up with phpMyAdmin.
To make it clearer: I have a table whose first row holds my column names and whose second row holds the data extracted from the example above.
Photo ID Owner Secret Server Farm Title ispublic isfriend isfamily ....
123 1234 123 1 4 DSC01316 1 0 0
I started extracting the information with the following approach, but it didn't work…
import xml.etree.ElementTree as ET
import requests

url = "https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=5...b&per_page=250&accuracy=1&has_geo=1&extras=geo,tags,views,description%22"
page = requests.get(url)
data = page.text
root = ET.fromstring(data)
for x in root.Element.get('photo'):
    test = x.get('Photo ID', 'Owner', 'Secret', 'Server', 'Farm', 'Title', 'ispublic', 'isfriend', 'isfamily')
    print(test)
# does not work. It says: AttributeError: 'Element' object has no attribute 'Element'
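For reference, the error comes from `root.Element`: an `Element` object has no `Element` attribute, and `get()` takes a single attribute name. A minimal corrected sketch of the same loop, run here against an inline sample of the response so it works without a network call, could look like this (the attribute names match the XML sample above):

```python
import xml.etree.ElementTree as ET

# Inline sample standing in for the Flickr response, so the sketch runs offline.
data = """<rsp stat="ok">
  <photos page="1">
    <photo id="123" owner="1234" secret="123" server="1" farm="4"
           title="DSC01316" ispublic="1" isfriend="0" isfamily="0"/>
  </photos>
</rsp>"""

root = ET.fromstring(data)
# iter('photo') finds <photo> elements at any depth; get() reads one attribute at a time.
for photo in root.iter('photo'):
    row = [photo.get(name) for name in
           ('id', 'owner', 'secret', 'server', 'farm',
            'title', 'ispublic', 'isfriend', 'isfamily')]
    print(row)
```

Each `row` then lines up with one row of the table described above.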
Any suggestions? I just want a hint, since I'd like to write the code myself! Please note that I'm still fairly new to Python; a bare link to the documentation won't help me much, because I know too little. I need some further explanation. Thanks!
1 Answer
BeautifulSoup4 makes it easier to parse XML or HTML documents. After installing the package with pip install beautifulsoup4, you can try the following code.
from bs4 import BeautifulSoup

xml = "..."
soup = BeautifulSoup(xml, 'html.parser')  # or 'xml', which requires lxml to be installed
for photo in soup.find_all('photo'):
    print(photo.attrs['title'])
Then you'll get:
DSC01316
DSC01351
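To get the extracted attributes into MySQL, one common approach is a parameterized INSERT. The sketch below only builds the SQL string and the parameter tuple; the table name `photos`, the column names, and the connection details are assumptions you would adapt to the schema you created in phpMyAdmin:

```python
# Columns assumed to exist in a `photos` table; match them to your actual schema.
COLUMNS = ('id', 'owner', 'secret', 'server', 'farm',
           'title', 'ispublic', 'isfriend', 'isfamily')

def build_insert(photo_attrs):
    """Return (sql, params) for one parsed <photo> attribute dict."""
    sql = "INSERT INTO photos ({}) VALUES ({})".format(
        ", ".join(COLUMNS), ", ".join(["%s"] * len(COLUMNS)))
    params = tuple(photo_attrs.get(c) for c in COLUMNS)
    return sql, params

sql, params = build_insert({'id': '123', 'owner': '1234', 'secret': '123',
                            'server': '1', 'farm': '4', 'title': 'DSC01316',
                            'ispublic': '1', 'isfriend': '0', 'isfamily': '0'})
print(sql)
print(params)

# With a real database you would then execute it through a driver, for example:
#   import pymysql
#   conn = pymysql.connect(host='localhost', user='...', password='...', db='...')
#   with conn.cursor() as cur:
#       cur.execute(sql, params)
#   conn.commit()
```

Using `%s` placeholders and passing the values separately lets the driver handle quoting, which avoids SQL injection from attribute values.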
For more information, see http://www.crummy.com/software/BeautifulSoup/bs4/doc/.