I want to extract data from HTML, convert it to a pandas DataFrame, and finally store it as a CSV file

Posted 2024-04-26 14:08:31


import sys,csv,os
import pandas as pd
from bs4 import BeautifulSoup
import requests
from lxml import html



#url = r'https://agmarknet.gov.in/SearchCmmMkt.aspx?Tx_Commodity=137&Tx_State=0&Tx_District=0&Tx_Market=0&DateFrom=01-jan-2016&DateTo=19-nov-2019&Fr_Date=01-jan-2016&To_Date=19-nov-2019&Tx_Trend=2&Tx_CommodityHead=Ajwan&Tx_StateHead=--Select--&Tx_DistrictHead=--Select--&Tx_MarketHead=--Select--'


Export_Path = r"E:\Knoema_Work_Dataset"

Res = requests.get(url)
Soup = BeautifulSoup(Res.content,'lxml')
#print(Soup.prettify())


mylists = ['137','281','325','166','86','130']
for mylist in mylists:
    url = 'https://agmarknet.gov.in/SearchCmmMkt.aspx?Tx_Commodity='+mylist+'+&Tx_State=0&Tx_District=0&Tx_Market=0&DateFrom=01-jan-2016&DateTo=19-nov-2019&Fr_Date=01-jan-2016&To_Date=19-nov-2019&Tx_Trend=2&Tx_CommodityHead=Ajwan&Tx_StateHead=--Select--&Tx_DistrictHead=--Select--&Tx_MarketHead=--Select--'+ mylist
    soup = BeautifulSoup(Res.content,'lxml')
    table = soup.find('table', {'class':'tableagmark_new'})
    DataAll = pd.DataFrame(columns = ['State Name','District Name','Market Name','Variety','Group','Arrivals (Tonnes)','Min Price (Rs./Quintal)','Max Price (Rs./Quintal)','Modal Price (Rs./Quintal)','Reported Date'],dtype = object,index=range(0,1000))
    row_marker = 0
    for row in table.find_all('tr'):
        column_marker = 0
        columns = row.findAll('td')
        for column in columns:
            DataAll.iat[row_marker,column_marker] = column.get_text()
            column_marker += 1

    DataAll

Export_Path_F = os.path.join(Export_Path, 'aggr.csv')
DataAll.to_csv(Export_Path_F, encoding='utf-8-sig', index=False)

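I only get the last row of the table in the DataFrame 'DataAll'. I need the complete table in the DataFrame. I am iterating so that data scraped from multiple tables ends up in a single DataFrame. Please help me get all of the data into the DataFrame.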
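One likely cause, judging only from the code as posted: `row_marker` is never incremented, so every table row is written into row 0 and only the last one survives. `DataAll` is also recreated on each pass of the outer loop, and the page is requested once before the loop rather than once per commodity code. A minimal sketch of one way around this is to collect rows in a plain list and build the DataFrame once at the end. The helper names `rows_from_html` and `frame_from_pages` are hypothetical, the built-in `html.parser` is used in place of `lxml`, and the assumption that every data row has exactly ten `td` cells has not been checked against the live site.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Column names copied from the question's DataFrame definition.
COLUMNS = ['State Name', 'District Name', 'Market Name', 'Variety', 'Group',
           'Arrivals (Tonnes)', 'Min Price (Rs./Quintal)',
           'Max Price (Rs./Quintal)', 'Modal Price (Rs./Quintal)',
           'Reported Date']


def rows_from_html(page_html, table_class='tableagmark_new'):
    """Return one list of cell strings per data row of the matching table.

    Hypothetical helper; assumes data rows have len(COLUMNS) <td> cells.
    """
    soup = BeautifulSoup(page_html, 'html.parser')
    table = soup.find('table', {'class': table_class})
    if table is None:  # page without the expected table
        return []
    rows = []
    for tr in table.find_all('tr'):
        cells = [td.get_text(strip=True) for td in tr.find_all('td')]
        if len(cells) == len(COLUMNS):  # skip header rows and odd-sized rows
            rows.append(cells)
    return rows


def frame_from_pages(pages):
    """Collect rows from every page first, then build the DataFrame once."""
    all_rows = []
    for page_html in pages:
        all_rows.extend(rows_from_html(page_html))
    return pd.DataFrame(all_rows, columns=COLUMNS)
```

In the question's loop, this would mean calling `requests.get(url)` inside the loop for each commodity code, passing each response's `.content` to `rows_from_html`, and calling `to_csv` once on the combined frame after the loop finishes.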

Url=https://agmarknet.gov.in/SearchCmmMkt.aspx?Tx_Commodity=137&Tx_State=0&Tx_District=0&Tx_Market=0&DateFrom=01-jan-2016&DateTo=19-nov-2019&Fr_Date=01-jan-2016&To_Date=19-nov-2019&Tx_Trend=2&Tx_CommodityHead=Ajwan&Tx_StateHead=--Select--&Tx_DistrictHead=--Select--&Tx_MarketHead=--Select--

