From an XML URL to a DataFrame

Posted 2024-05-13 23:36:41


I'm new to Python and am having trouble importing a simple XML file from the web and converting it into a DataFrame: https://www.ecb.europa.eu/stats/policy_and_exchange_rates/euro_reference_exchange_rates/html/cny.xml

I've tried several approaches, including BS4, but none of them worked.

from bs4 import BeautifulSoup
import requests
socket = requests.get('https://www.ecb.europa.eu/stats/policy_and_exchange_rates/euro_reference_exchange_rates/html/cny.xml')
soup = bs4.BeautifulSoup(socket.content, ['lxml', 'xml'])

all_obs = soup.find_all('Obs')

l = []
df = pd.DataFrame(columns=['TIME_PERIOD','OBS_VALUE'])
pos = 0
for obs in all_obs:
    l.append(obs.find('TIME_PERIOD').text)
    l.append(obs.find('OBS_VALUE').text)

    df.loc[pos] = l
    l = []
    pos += 1

print(df)

Can anyone help me? Thanks.


1 answer
User
#1 · Posted 2024-05-13 23:36:41

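Sure. In this file, TIME_PERIOD and OBS_VALUE are attributes of each Obs element, not child elements, so obs.find('TIME_PERIOD') returns None and calling .text on it fails. Read them with node.get(...) instead. Your snippet also never imports pandas as pd, and it calls bs4.BeautifulSoup although only the BeautifulSoup name was imported.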

from bs4 import BeautifulSoup
import requests
import pandas as pd

response = requests.get('https://www.ecb.europa.eu/stats/policy_and_exchange_rates/euro_reference_exchange_rates/html/cny.xml')
response.raise_for_status()  # stop early on an HTTP error

# The 'xml' parser requires lxml to be installed
bs = BeautifulSoup(response.content, 'xml')

obs = bs.find_all("Obs")
# Each node looks like this:
#<Obs OBS_CONF="F" OBS_STATUS="A" OBS_VALUE="10.7255" TIME_PERIOD="2005-04-01"/>

# DataFrame.append was removed in pandas 2.0, so collect the rows first
rows = [{'TIME_PERIOD': node.get("TIME_PERIOD"), 'OBS_VALUE': node.get("OBS_VALUE")} for node in obs]
df = pd.DataFrame(rows, columns=['TIME_PERIOD','OBS_VALUE'])

df.head()
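
If you want to check the attribute-reading logic without a network call or extra packages, here is a minimal sketch using only the standard library. The sample XML is an assumption: the first Obs line is the one shown in the comment above, the second is made up, and the wrapper elements and namespace URI are placeholders rather than the real ECB layout. The helper names local_name and parse_obs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only: the first <Obs> is copied from the comment above,
# the second is made up, and the namespace URI is a placeholder.
SAMPLE_XML = """<CompactData xmlns="http://example.com/ns">
  <DataSet>
    <Series>
      <Obs OBS_CONF="F" OBS_STATUS="A" OBS_VALUE="10.7255" TIME_PERIOD="2005-04-01"/>
      <Obs OBS_CONF="F" OBS_STATUS="A" OBS_VALUE="10.5000" TIME_PERIOD="2005-04-04"/>
    </Series>
  </DataSet>
</CompactData>"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit('}', 1)[-1]


def parse_obs(xml_text):
    """Return one dict per <Obs> element, reading values from its attributes."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter():
        if local_name(elem.tag) == 'Obs':
            rows.append({
                'TIME_PERIOD': elem.get('TIME_PERIOD'),
                # Convert the rate to a number instead of keeping a string
                'OBS_VALUE': float(elem.get('OBS_VALUE')),
            })
    return rows


rows = parse_obs(SAMPLE_XML)
print(rows)
```

Matching on the local tag name means the sketch works whether or not the document declares a default namespace. The resulting list of dicts can be passed straight to pd.DataFrame(rows) to get the same two columns as above.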
