Looping over rows and using input data from an Excel/CSV file

Posted 2024-03-29 15:47:19


I want to scrape web data using the input values from an Excel file, one request per row, and save the output back to the same file.

from bs4 import BeautifulSoup
import requests 
from urllib import request
import os
import pandas as pd


ciks = pd.read_csv("ciks.csv")
ciks.head()

Output:

    CIK
0   1557822
1   1598429
2   1544670
3   1574448
4   1592290

Then:

for x in ciks:
    url="https://www.sec.gov/cgi-bin/browse-edgar?CIK=" + x +"&owner=exclude&action=getcompany"
    r = request.urlopen(url)
    bytecode = r.read()
    htmlstr = bytecode.decode()
    soup = BeautifulSoup(bytecode)
    t = soup.find('span',{'class':'companyName'})
    print(t.text)

I get an error at line 9, `print(t.text)`:

AttributeError: 'NoneType' object has no attribute 'text'

Here I want to use each row value from the CSV file as the input for the web request.
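The error comes from how the CSV is iterated: looping directly over a pandas DataFrame yields its column names, not its row values, so the URL is built with the string "CIK" and the fetched page has no companyName span. A minimal sketch, using a made-up two-row frame standing in for ciks.csv:

```python
import pandas as pd

# Hypothetical frame standing in for ciks.csv
df = pd.DataFrame({"CIK": [1557822, 1598429]})

# Iterating a DataFrame yields the column NAMES:
cols = [x for x in df]
print(cols)  # ['CIK']

# To get the row values, take the column itself:
vals = df["CIK"].tolist()
print(vals)  # [1557822, 1598429]
```

This is why the accepted fix below converts the column to a list before looping.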


1 answer

It is easier to convert the column values to a list and then use that list in the for loop. See the solution below:

from bs4 import BeautifulSoup
from urllib import request
import pandas as pd

df = pd.read_csv("ciks.csv")
mylist = df['CIK'].tolist()  # CIK is the column name

company = []
for item in mylist:
    print(item)
    url = "https://www.sec.gov/cgi-bin/browse-edgar?CIK=" + str(item) + "&owner=exclude&action=getcompany"
    r = request.urlopen(url)
    bytecode = r.read()
    soup = BeautifulSoup(bytecode, features="lxml")
    t = soup.find('span', {'class': 'companyName'})
    company.append(t.text)
    print(t.text)

# assign() returns a new DataFrame rather than modifying df,
# so attach the column directly before saving
df['company'] = company
print(df)

df.to_csv("ciks.csv", index=False)
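Note that even with the list fix, `t.text` can still raise the same AttributeError whenever a page contains no companyName span (for example, an invalid CIK), because `find()` returns None on a miss. A defensive sketch on a small inline HTML snippet standing in for a fetched EDGAR page:

```python
from bs4 import BeautifulSoup

# Inline HTML standing in for a fetched page; "ACME CORP" is made up
html = '<span class="companyName">ACME CORP</span>'
soup = BeautifulSoup(html, "html.parser")

t = soup.find("span", {"class": "companyName"})
# find() returns None when nothing matches, so guard before .text
name = t.text if t is not None else "N/A"

missing = soup.find("span", {"class": "noSuchClass"})
fallback = missing.text if missing is not None else "N/A"

print(name)      # ACME CORP
print(fallback)  # N/A
```

Appending a placeholder like "N/A" instead of crashing also keeps the company list the same length as the CIK list, so the column assignment at the end still works.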
