Parsing specific content from a website (XML) with Python and saving it to MySQL

0 votes
1 answer
677 views
Asked 2025-04-18 13:29

I want to send a REST request to the Flickr API. The response looks like this (XML):

<rsp stat="ok">
<photos page="1" pages="974001" perpage="250" total="243500161">

<photo id="123" owner="1234" secret="123" server="1" farm="4" 
title="DSC01316" ispublic="1" isfriend="0" isfamily="0" views="0" tags="" 
latitude="47.825188" longitude="11.300722" accuracy="16" context="0" 
place_id="XT" woeid="123" geo_is_family="0" geo_is_friend="0" 
geo_is_contact="0" geo_is_public="1">
<description/>
</photo>

<photo id="123" owner="123" secret="123" server="1" farm="3" 
title="DSC01351" ispublic="1" isfriend="0" isfamily="0" views="0" tags="" 
latitude="47.825263" longitude="11.300891" accuracy="16" context="0" 
place_id="XT" woeid="123" geo_is_family="0" geo_is_friend="0" 
geo_is_contact="0" geo_is_public="1">
<description/>
</photo>

and so forth...

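I would like Python to extract the values for the photo ID, owner, title, and so on from this response and save them to a MySQL database that I have already set up with phpMyAdmin.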

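To make this clearer: I have a table whose first row holds my column names and whose second row holds the data extracted from the example.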

Photo ID    Owner    Secret    Server    Farm    Title    ispublic    isfriend    isfamily    ....
123         1234     123       1         4       DSC01316 1           0           0      

I started with the following approach to extract the information, but it did not work:

import xml.etree.ElementTree as ET
import requests

url="https://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=5...b&per_page=250&accuracy=1&has_geo=1&extras=geo,tags,views,description%22"
page=requests.get(url)
data = page.text
root = ET.fromstring(data)
for x in root.Element.get('photo'):
    test = x.get('Photo ID', 'Owner', 'Secret' , 'Server' , 'Farm' , 'Title' , 'ispublic' , 'isfriend' , 'isfamily')
print (test)

#does not work. it says: AttributeError: 'Element' object has no attribute 'Element'

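Any suggestions? I only want a hint, since I would like to write the code myself. Please note that I am fairly new to Python, so a link to the documentation alone does not help me much because I know too little. I need a fuller explanation. Thanks!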

1 Answer

1

BeautifulSoup4 makes it easier to parse XML or HTML documents. After installing the package with pip install beautifulsoup4, you can try the code below.

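from bs4 import BeautifulSoup

xml = "..."  # the XML text returned by the API, e.g. page.text
# Name the parser explicitly; "html.parser" ships with Python
soup = BeautifulSoup(xml, "html.parser")

# Each <photo> element exposes its attributes as a dict
for photo in soup.find_all('photo'):
    print(photo.attrs['title'])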

This prints:

DSC01316
DSC01351
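
The approach from the question can also be made to work with the standard library alone. The AttributeError arises because root is already an Element and has no .Element attribute, and Element.get() accepts one attribute name at a time (plus an optional default), using the names as they appear in the XML ("id", "owner") rather than the table headings. Below is a minimal sketch of that idea, assuming a shortened copy of the sample response; SAMPLE, FIELDS and extract_photos are names made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened sample of the response shown in the question (assumption:
# the closing </photos> and </rsp> tags follow the last photo).
SAMPLE = """<rsp stat="ok">
<photos page="1" pages="974001" perpage="250" total="243500161">
<photo id="123" owner="1234" secret="123" server="1" farm="4"
 title="DSC01316" ispublic="1" isfriend="0" isfamily="0">
<description/>
</photo>
<photo id="123" owner="123" secret="123" server="1" farm="3"
 title="DSC01351" ispublic="1" isfriend="0" isfamily="0">
<description/>
</photo>
</photos>
</rsp>"""

# Attribute names as written in the XML, not the table headings.
FIELDS = ("id", "owner", "secret", "server", "farm",
          "title", "ispublic", "isfriend", "isfamily")


def extract_photos(xml_text, fields=FIELDS):
    """Return one tuple of attribute values per <photo> element."""
    root = ET.fromstring(xml_text)
    rows = []
    # iter() walks the whole tree, so the <photos> wrapper is skipped over.
    for photo in root.iter("photo"):
        # get() looks up a single attribute and returns None if it is missing.
        rows.append(tuple(photo.get(name) for name in fields))
    return rows


rows = extract_photos(SAMPLE)
for row in rows:
    print(row)
```

With the real API, page.text from the question's requests call would take the place of SAMPLE.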

For more information on BeautifulSoup, see http://www.crummy.com/software/BeautifulSoup/bs4/doc/
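
For the second half of the question, saving the values, one possible approach is a parameterized INSERT through a DB-API 2.0 connection. The sketch below is only an illustration under several assumptions: the table name photos and the column names in COLUMNS are hypothetical, the rows are hard-coded samples, and sqlite3 stands in for MySQL so the example runs without a server. The placeholder argument exists because MySQL drivers expect %s while sqlite3 expects ?.

```python
import sqlite3  # stand-in only, so the sketch runs without a MySQL server

# Hypothetical column names; adjust them to the real table.
COLUMNS = ("photo_id", "owner", "title")


def save_photos(conn, rows, table="photos", placeholder="%s"):
    """Insert rows through a DB-API 2.0 connection and commit."""
    marks = ", ".join([placeholder] * len(COLUMNS))
    # Table and column names come from constants here, never from user input;
    # the values themselves are passed separately so the driver escapes them.
    sql = "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(COLUMNS), marks
    )
    cur = conn.cursor()
    cur.executemany(sql, rows)
    conn.commit()


# Demonstration against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photos (photo_id TEXT, owner TEXT, title TEXT)")
sample_rows = [("123", "1234", "DSC01316"), ("123", "123", "DSC01351")]
save_photos(conn, sample_rows, placeholder="?")
saved = conn.execute(
    "SELECT photo_id, owner, title FROM photos ORDER BY title"
).fetchall()
print(saved)
```

With MySQL, a connection from a driver such as PyMySQL (for example pymysql.connect(host=..., user=..., password=..., database=...)) would be passed instead, keeping the default %s placeholder; which driver is installed is an assumption to check locally.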
