When I use concurrent.futures, I can't figure out how to write the results back to the same worksheet

Posted 2024-04-19 14:47:42


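I use the openpyxl library to read a set of ticker codes from a worksheet, use those codes to look up results on a website, and finally write the results back to the same worksheet, in the cells next to each code.

When I run the script without any multiprocessing, it works flawlessly.

However, once I use concurrent.futures, I can't figure out how to write the results back to the relevant cells of the worksheet.

My current attempt: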

import requests
from openpyxl import load_workbook
import concurrent.futures as futures

wb = load_workbook('Screener.xlsx')
ws = wb['Screener-1']

link = 'https://backend.otcmarkets.com/otcapi/company/profile/full/{}?'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36',
}
params = {
    'symbol': ''
}

def get_info(ticker):
    target_link = link.format(ticker)
    params['symbol'] = ticker
    r = requests.get(target_link,params,headers=headers)
    try:
        address = r.json()['address']
    except (AttributeError,KeyError,IndexError): 
        address = ""
    try:
        website = r.json()['website']
    except (AttributeError,KeyError,IndexError): 
        website = ""
    return address,website

if __name__ == '__main__':
    ticker_list = []
    for row in range(2, ws.max_row + 1):
        if ws.cell(row=row,column=1).value==None:break          
        ticker = ws.cell(row=row,column=1).value
        ticker_list.append(ticker)

    with futures.ThreadPoolExecutor(max_workers=6) as executor:
        future_to_url = {executor.submit(get_info, ticker): ticker for ticker in ticker_list}
        for future in futures.as_completed(future_to_url):
            address,website = future.result()[0],future.result()[1]
            print(address,website)

    #         ws.cell(row=row, column=2).value = '{}'.format(address)
    #         ws.cell(row=row, column=3).value = '{}'.format(website)
    # wb.save('Screener.xlsx')

A few tickers for your test:

tickers = ['URBT','TPRP','CRBO','PVSP','TSPG','VMHG','MRTI','VTMC','TORM','SORT']

How can I write the results back to the same worksheet while doing reverse search using concurrent.futures?

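In case you want to know exactly where I'm trying to write the data: the worksheet holds the tickers in column 1 from row 2 down, and the address and website should go into columns 2 and 3 of the same row.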


1 answer

Answer 1 · posted 2024-04-19 14:47:42

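Since you're already using openpyxl, I'd suggest using pandas, as you might find it a bit easier to work with workbooks that way. pandas uses openpyxl under the hood to read and write .xlsx files.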

Assuming you have a file Screener.xlsx that contains a Symbol column, you can fetch the missing data and update the workbook.

Here's how:

import concurrent.futures as futures

import pandas as pd
import requests

link = 'https://backend.otcmarkets.com/otcapi/company/profile/full/{}'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36',
}


def get_info(ticker) -> dict:
    # Fetch the company profile for one ticker; missing fields become "None".
    print(f"Fetching data for {ticker}...")
    r = requests.get(link.format(ticker), headers=headers)
    try:
        data = r.json()  # parse the body once instead of once per field
    except ValueError:  # the response body was not valid JSON
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {
        "ticker": ticker,
        "address": data.get("address1", "None"),
        "website": data.get("website", "None"),
    }


if __name__ == "__main__":
    df = pd.read_excel("Screener.xlsx")
    tickers = df["Symbol"].to_list()
    with futures.ThreadPoolExecutor(max_workers=6) as executor:
        future_to_url = {
            executor.submit(get_info, ticker): ticker for ticker in tickers
        }
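    # as_completed() yields futures in completion order, not submission order.
    tickers_scraped = [
        future.result() for future in futures.as_completed(future_to_url)
    ]
    # Restore the worksheet order so each result lines up with its row.
    sorted_tickers = sorted(
        tickers_scraped, key=lambda i: tickers.index(i["ticker"])
    )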
    df.loc[:, ["Address", "Website"]] = [
        [i["address"], i["website"]] for i in sorted_tickers
    ]
    df.to_excel("Screener.xlsx", index=False)

This fills in the Address and Website columns for each symbol and saves the result back to Screener.xlsx.
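
The sorting step is needed because as_completed() yields futures in the order they finish, not the order they were submitted. As an alternative, here is a minimal sketch that relies on Executor.map(), which returns results in the order of its input, so nothing has to be re-sorted. The names fake_lookup and lookup_in_order are hypothetical, and fake_lookup only stands in for the real HTTP request so that the example runs offline.

```python
import concurrent.futures as futures
import random
import time


def fake_lookup(ticker):
    """Hypothetical stand-in for get_info: waits a random moment, returns a record."""
    time.sleep(random.uniform(0, 0.05))
    return {
        "ticker": ticker,
        "address": f"{ticker} address",
        "website": f"{ticker.lower()}.example",
    }


def lookup_in_order(tickers, fetch, max_workers=6):
    """Run fetch(ticker) in worker threads and return the results in input order."""
    with futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in the order of `tickers`,
        # regardless of which thread finishes first.
        return list(executor.map(fetch, tickers))
```

With the real get_info in place of fake_lookup, the returned list lines up row for row with df["Symbol"], so it could be assigned to the new columns directly. One trade-off to keep in mind: map() re-raises a worker's exception when its result is reached during iteration, so a single failing ticker stops the collection unless the fetch function handles its own errors.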

Edit:

Here's a pure pandas approach that doesn't require sorting the scraped data first.

if __name__ == "__main__":
    df = pd.read_excel("Screener.xlsx")
    tickers = df["Symbol"].to_list()
    with futures.ThreadPoolExecutor(max_workers=6) as executor:
        future_to_url = {
            executor.submit(get_info, ticker): ticker for ticker in tickers
        }
    tickers_scraped = [
        future.result() for future in futures.as_completed(future_to_url)
    ]
    df_scraped = pd.DataFrame(tickers_scraped).set_index("ticker")
    df = df.set_index("Symbol")
    df[["Address", "Website"]] = df_scraped[["address", "website"]]
    df = df.reset_index()
    df.to_excel("Screener.xlsx", index=False)
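
If you would rather stay with openpyxl, as in the question, one option is to key each future by its row number instead of by its ticker, so that every result can be written to the right row as soon as it completes. The sketch below is written under a few assumptions: collect_rows and fill_worksheet are hypothetical helper names, ws is an openpyxl worksheet with tickers in column 1 starting at row 2, and fetch is any callable that returns an (address, website) tuple, such as get_info from the question.

```python
import concurrent.futures as futures


def collect_rows(ws, first_row=2, column=1):
    """Return [(row, ticker), ...], stopping at the first empty ticker cell."""
    rows = []
    for row in range(first_row, ws.max_row + 1):
        ticker = ws.cell(row=row, column=column).value
        if ticker is None:
            break
        rows.append((row, ticker))
    return rows


def fill_worksheet(ws, fetch, max_workers=6):
    """Run fetch(ticker) in worker threads and write each result next to its ticker."""
    with futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Key each future by its row so the result can be placed when it completes.
        future_to_row = {
            executor.submit(fetch, ticker): row
            for row, ticker in collect_rows(ws)
        }
        for future in futures.as_completed(future_to_row):
            row = future_to_row[future]
            address, website = future.result()
            # Only the main thread touches the worksheet; workers just fetch.
            ws.cell(row=row, column=2).value = address
            ws.cell(row=row, column=3).value = website


# Usage with the workbook from the question (not executed here):
#   wb = load_workbook('Screener.xlsx')
#   fill_worksheet(wb['Screener-1'], get_info)
#   wb.save('Screener.xlsx')
```

Keeping all worksheet writes in the main thread avoids having several threads modify the workbook at once; the worker threads only perform the network requests.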
