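I am trying to extract a series of tables from an HTML document and add a new column with a constant value taken from the tag that serves as each table's heading. That is, each table would get a third column in which every row equals AGO, DPK, ATK or PMS, depending on the heading that precedes that series of tables. The idea is then to turn this new three-column table into a pandas data frame. I am new to both Python and HTML, so any help is much appreciated. Below is the code I have come up with so far.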
import pandas as pd
from bs4 import BeautifulSoup
from robobrowser import RoboBrowser

br = RoboBrowser()
br.open("https://oilpriceng.net/03-09-2019")

table = br.find_all('td', class_='vc_table_cell')

for element in table:
    data = element.find('span', class_='vc_table_content')
    prod_name = br.find_all('strong')
    ago = prod_name[0].text
    dpk = prod_name[1].text
    atk = prod_name[2].text
    pms = prod_name[3].text
    if br.find('strong').text == ago:
        data.append(ago.text)
    elif br.find('strong').text == dpk:
        data.append(dpk.text)
    elif br.find('strong').text == atk:
        data.append(atk.text)
    elif br.find('strong').text == pms:
        data.append(pms.text)
    print(data.text)

df = pd.DataFrame(data)
The result I'm hoping for is to go from this:
AGO
Enterprise  Price
Coy A       $0.5/L
Coy B       $0.6/L
Coy C       $0.7/L
to the new table below, as a pandas DataFrame:
Enterprise  Price   Product
Coy A       $0.5/L  AGO
Coy B       $0.6/L  AGO
Coy C       $0.7/L  AGO
and to repeat the same thing for the other tables, which hold the DPK, ATK and PMS information.
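I hope I have understood your question correctly. This script scrapes all the tables found on the page into a data frame and saves it to a CSV file: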
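The approach described here can be sketched roughly as follows. This is a minimal, dependency-free illustration rather than the script from the answer: it assumes each table is preceded by a `<strong>` heading holding the product name and that header cells use `<th>`, which may not match the real markup of the page. The names `ProductTableParser`, `scrape_tables` and `SAMPLE_HTML` are hypothetical, and the DPK row in the sample is made-up placeholder data.

```python
from html.parser import HTMLParser

# Simplified stand-in for the page (assumed structure, not the real markup).
SAMPLE_HTML = """
<p><strong>AGO</strong></p>
<table>
  <tr><th>Enterprise</th><th>Price</th></tr>
  <tr><td>Coy A</td><td>$0.5/L</td></tr>
  <tr><td>Coy B</td><td>$0.6/L</td></tr>
  <tr><td>Coy C</td><td>$0.7/L</td></tr>
</table>
<p><strong>DPK</strong></p>
<table>
  <tr><th>Enterprise</th><th>Price</th></tr>
  <tr><td>Coy X</td><td>$0.9/L</td></tr>
</table>
"""


class ProductTableParser(HTMLParser):
    """Collect table rows and tag each with the latest <strong> heading."""

    def __init__(self):
        super().__init__()
        self.columns = None      # header cells plus the added "Product" column
        self.rows = []           # data rows: [cell, ..., product]
        self.product = None      # last heading seen outside a table
        self._in_table = False
        self._in_cell = False
        self._in_strong = False
        self._is_header = False  # True while reading a row that has <th> cells
        self._row = []           # cells of the row being read
        self._text = []          # text fragments of the current cell or heading

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._in_table = True
        elif tag == "tr" and self._in_table:
            self._row, self._is_header = [], False
        elif tag in ("td", "th") and self._in_table:
            self._in_cell, self._text = True, []
            self._is_header = self._is_header or tag == "th"
        elif tag == "strong" and not self._in_table:
            self._in_strong, self._text = True, []

    def handle_data(self, data):
        if self._in_cell or self._in_strong:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "strong" and self._in_strong:
            self.product = "".join(self._text).strip()
            self._in_strong = False
        elif tag in ("td", "th") and self._in_cell:
            self._row.append("".join(self._text).strip())
            self._in_cell = False
        elif tag == "tr" and self._in_table and self._row:
            if self._is_header:
                self.columns = self._row + ["Product"]
            else:
                self.rows.append(self._row + [self.product])
            self._row = []
        elif tag == "table":
            self._in_table = False


def scrape_tables(html):
    """Return (columns, rows) for every table in the given HTML string."""
    parser = ProductTableParser()
    parser.feed(html)
    parser.close()
    return parser.columns, parser.rows


columns, rows = scrape_tables(SAMPLE_HTML)
for row in rows:
    print(row)
```

With pandas installed, `pd.DataFrame(rows, columns=columns)` turns the result into a data frame and `df.to_csv("data.csv", index=False)` writes it out (the file name is only an example). For the live page, the HTML would first have to be downloaded, and the parser adjusted to whatever tags and classes the page actually uses.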
Prints:
[Screenshot of the resulting CSV file opened in LibreOffice]