How do I extract HTML tables and add a new column with a constant value taken from the preceding <strong> tag?

import pandas as pd
from bs4 import BeautifulSoup
from robobrowser import RoboBrowser

br = RoboBrowser()
br.open("https://oilpriceng.net/03-09-2019")

table = br.find_all('td', class_='vc_table_cell')
for element in table:
    data = element.find('span', class_='vc_table_content')
    prod_name = br.find_all('strong')
    ago = prod_name[0].text
    dpk = prod_name[1].text
    atk = prod_name[2].text
    pms = prod_name[3].text
    if br.find('strong').text == ago:
        data.append(ago.text)
    elif br.find('strong').text == dpk:
        data.append(dpk.text)
    elif br.find('strong').text == atk:
        data.append(atk.text)
    elif br.find('strong').text == pms:
        data.append(pms.text)
    print(data.text)

df = pd.DataFrame(data)

The result I'm hoping for is to go from this:

AGO
Enterprise  Price
Coy A       $0.5/L
Coy B       $0.6/L
Coy C       $0.7/L

to the new table below, as a DataFrame in pandas:

Enterprise  Price   Product
Coy A       $0.5/L  AGO
Coy B       $0.6/L  AGO
Coy C       $0.7/L  AGO

and to repeat the same thing for the other tables, which hold the DPK, ATK and PMS information.

1 answer

User

#1 · Posted on 2024-04-23 09:48:05

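I hope I've understood your question correctly. This script scrapes every table found on the page into a single DataFrame and saves it to a CSV file: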

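import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://oilpriceng.net/03-09-2019/'

soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# `last` remembers the most recent product heading (AGO, DPK, ...)
data, last = {'Enterprise':[], 'Price':[], 'Product':[]}, ''
# Select headings and table rows together so they arrive in document order
for tag in soup.select('h1 strong, tr:has(td.vc_table_cell)'):
    if tag.name == 'strong':
        # A new heading: every row that follows belongs to this product
        last = tag.get_text(strip=True)
    else:
        # A table row: the two cells are the enterprise and its price
        a, b = tag.select('td')
        a, b = a.get_text(strip=True), b.get_text(strip=True)
        # Skip rows without an enterprise name and the header rows
        if a and b != 'DEPOT PRICE':
            data['Enterprise'].append(a)
            data['Price'].append(b)
            data['Product'].append(last)

df = pd.DataFrame(data)
print(df)
df.to_csv('data.csv')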

Prints:

            Enterprise         Price Product
0            AVIDOR PH        ₦190.0     AGO
1            SHORELINK                   AGO
2    BULK STRATEGIC PH        ₦190.0     AGO
3                  TSL                   AGO
4              MASTERS                   AGO
..                 ...           ...     ...
165             CHIPET        ₦132.0     PMS
166               BOND                   PMS
167           RAIN OIL                   PMS
168               MENJ        ₦133.0     PMS
169              NIPCO  ₦ 2,9000,000     LPG

[170 rows x 3 columns]

(Screenshot of the saved CSV file opened in LibreOffice.)
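If you want to try the same idea without hitting the site, here is a minimal sketch that runs on an inline HTML string. The markup in `SAMPLE_HTML`, the function name `tables_with_product`, and its `header_price` parameter are all made up for illustration; I am only assuming the real page has the same shape (a `<strong>` heading inside an `<h1>`, followed by rows of two `td.vc_table_cell` cells). It also skips rows that do not have exactly two cells instead of raising an error.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup that imitates the assumed page layout:
# a <strong> heading, then a table whose first row is a header row.
SAMPLE_HTML = """
<h1><strong>AGO</strong></h1>
<table>
  <tr><td class="vc_table_cell">Enterprise</td><td class="vc_table_cell">DEPOT PRICE</td></tr>
  <tr><td class="vc_table_cell">Coy A</td><td class="vc_table_cell">$0.5/L</td></tr>
  <tr><td class="vc_table_cell">Coy B</td><td class="vc_table_cell">$0.6/L</td></tr>
</table>
<h1><strong>DPK</strong></h1>
<table>
  <tr><td class="vc_table_cell">Enterprise</td><td class="vc_table_cell">DEPOT PRICE</td></tr>
  <tr><td class="vc_table_cell">Coy C</td><td class="vc_table_cell">$0.7/L</td></tr>
</table>
"""


def tables_with_product(html, header_price="DEPOT PRICE"):
    """Return one DataFrame with a Product column filled from the last heading seen."""
    soup = BeautifulSoup(html, "html.parser")
    rows, product = [], ""
    # Headings and rows are matched by one selector so they come back in document order.
    for tag in soup.select("h1 strong, tr:has(td.vc_table_cell)"):
        if tag.name == "strong":
            product = tag.get_text(strip=True)
            continue
        cells = [td.get_text(strip=True) for td in tag.select("td")]
        if len(cells) != 2:
            continue  # not an enterprise/price row
        enterprise, price = cells
        if enterprise and price != header_price:
            rows.append({"Enterprise": enterprise, "Price": price, "Product": product})
    return pd.DataFrame(rows, columns=["Enterprise", "Price", "Product"])


df = tables_with_product(SAMPLE_HTML)
print(df)
```

The point is that the Product value is not really a constant you add afterwards: it is whatever heading was seen last while walking the page from top to bottom, so each table picks up its own label.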
