Answer to the question "How to read data from a web link with dropdown fields using Python"

How to read data from a web link with dropdown fields using Python

1 answer

Anonymous, 1 day ago

Expertise: python, mysql, java

In Chrome/Firefox <code>DevTools</code>, the <code>"Network"</code> tab shows every request the browser sends to the server. When I click "Get Data", I see a request to a URL that carries the dropdown selections as query parameters, for example <a href="https://www.nseindia.com/products/dynaContent/common/productsSymbolMapping.jsp?instrumentType=FUTIDX&symbol=NIFTY&expiryDate=select&optionType=select&strikePrice=&dateRange=day&fromDate=&toDate=&segmentLink=9&symbolCount=" rel="nofollow noreferrer">https://www.nseindia.com/products/dynaContent/common/productsSymbolMapping.jsp?instrumentType=FUTIDX&symbol=NIFTY&expiryDate=select&optionType=select&strikePrice=&dateRange=day&fromDate=&toDate=&segmentLink=9&symbolCount=</a>

Normally I can pass the URL directly to <code>pd.read_html("https://...")</code> to get all the tables in the HTML, and then use <code>[0]</code> to take the first table as a DataFrame.

Because that gave me an error here, I used the <code>requests</code> module to fetch the HTML and then <code>pd.read_html()</code> to convert all the tables in that HTML string into DataFrames (the string is wrapped in <code>io.StringIO</code>, because pandas 2.1 and later deprecate passing literal HTML).

This gives a <code>DataFrame</code> with a multi-level column index and 3 unnamed columns, which I drop.

More information is in the code comments:

<pre><code>import io

import pandas as pd
import requests

# create a session to get and keep cookies
s = requests.Session()

# open the main page first so the session receives its cookies
url = 'https://www.nseindia.com/products/content/derivatives/equities/historical_fo.htm'
s.get(url)

# URL found in DevTools that returns the HTML with the tables
url = "https://www.nseindia.com/products/dynaContent/common/productsSymbolMapping.jsp?instrumentType=FUTIDX&symbol=NIFTY&expiryDate=select&optionType=select&strikePrice=&dateRange=day&fromDate=&toDate=&segmentLink=9&symbolCount="

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0',
    'X-Requested-With': 'XMLHttpRequest',
    'Referer': 'https://www.nseindia.com/products/content/derivatives/equities/historical_fo.htm'
}

# get HTML from url, reusing the session (and its cookies)
r = s.get(url, headers=headers)
print('status:', r.status_code)
#print(r.text)

# use pandas to parse the tables in the HTML into DataFrames
# (newer pandas expects a file-like object instead of a literal HTML string)
all_tables = pd.read_html(io.StringIO(r.text))
print('tables:', len(all_tables))

# get first DataFrame
df = all_tables[0]
#print(df.columns)

# drop the top level of the multi-level column index
df.columns = df.columns.droplevel()
#print(df.columns)

# drop the unnamed columns
df = df.drop(columns=['Unnamed: 14_level_1', 'Unnamed: 15_level_1', 'Unnamed: 16_level_1'])
print(df.columns)
</code></pre>
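
The dropdown selections are just query parameters, so the data URL can also be built from a dict instead of editing the long string by hand. This is a minimal sketch: <code>build_data_url</code> is a hypothetical helper, the parameter names and defaults are copied from the URL captured in DevTools above, and which other values the server accepts (other symbols, expiry dates, date formats) is an assumption that is not checked here.

```python
from urllib.parse import urlencode

# Endpoint seen in the DevTools "Network" tab
BASE_URL = "https://www.nseindia.com/products/dynaContent/common/productsSymbolMapping.jsp"


def build_data_url(instrument_type="FUTIDX", symbol="NIFTY", expiry_date="select",
                   option_type="select", strike_price="", date_range="day",
                   from_date="", to_date="", segment_link="9", symbol_count=""):
    """Hypothetical helper: turn dropdown selections into the data URL.

    Parameter names mirror the captured query string; the values the
    server accepts are assumptions, not documented by the site.
    """
    params = {
        "instrumentType": instrument_type,
        "symbol": symbol,
        "expiryDate": expiry_date,
        "optionType": option_type,
        "strikePrice": strike_price,
        "dateRange": date_range,
        "fromDate": from_date,
        "toDate": to_date,
        "segmentLink": segment_link,
        "symbolCount": symbol_count,
    }
    # urlencode keeps the dict order, writes empty values as "key=",
    # and percent-encodes characters that need it
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # with the defaults this reproduces the URL captured in DevTools
    print(build_data_url())
```

With <code>requests</code>, the same dict can instead be passed as <code>params=</code> to <code>s.get()</code>, which builds the query string for you.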

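Hard-coding the three <code>Unnamed: …_level_1</code> names breaks if the table layout changes. A more tolerant sketch, assuming the unwanted columns are exactly those whose names start with <code>Unnamed</code> (the placeholder pandas gives a header cell with no text); <code>clean_columns</code> is a hypothetical helper and the sample table is made up, not real NSE data.

```python
import pandas as pd


def clean_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: flatten a two-level header and drop 'Unnamed' columns."""
    out = df.copy()
    if isinstance(out.columns, pd.MultiIndex):
        # keep only the lower header row, like droplevel() in the code above
        out.columns = out.columns.droplevel(0)
    # boolean mask: True for every column whose name does not start with "Unnamed"
    mask = [not str(c).startswith("Unnamed") for c in out.columns]
    return out.loc[:, mask]


if __name__ == "__main__":
    # made-up sample shaped like the parsed table (not real NSE data)
    columns = pd.MultiIndex.from_tuples([
        ("Top", "Symbol"),
        ("Top", "Close"),
        ("Unnamed: 14_level_0", "Unnamed: 14_level_1"),
    ])
    sample = pd.DataFrame([["NIFTY", 100.0, None]], columns=columns)
    print(list(clean_columns(sample).columns))  # ['Symbol', 'Close']
```

A boolean mask is used instead of a list of names so that duplicate column labels are not selected twice.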