Trouble importing tickers from a CSV file, looping through them to scrape data from finviz.com, then exporting to a CSV file for further analysis

Posted 2024-05-14 08:46:03


I'm having a hard time putting all the pieces together. I want to (1) pull a list of stock tickers from a CSV file, (2) loop through those tickers on finviz.com to grab certain key data points, and (3) export all the data to another CSV file for further analysis. Here is my code so far:

import pandas as pd
from bs4 import BeautifulSoup as bs
import requests
import csv
import time
import datetime
from datetime import datetime as dt

Symbol = []

with open('shortlist.csv') as csvDataFile:
    csvReader = csv.reader(csvDataFile)
    for row in csvReader:
        Symbol.append(row[0])


def get_fundamental_data(df):
    for symbol in df.index:
        try:
            url = 'http://finviz.com/quote.ashx?t=' + symbol.lower()
            soup = bs(requests.get(url).content, features='html5lib')
            for m in df.columns:
                df.loc[symbol, m] = fundamental_metric(soup, m)
        except Exception as e:
            print(symbol, 'not found')
    return df


def fundamental_metric(soup, metric):
    return soup.find(text=metric).find_next(class_='snapshot-td2').text


metric = [
    # 'Inst Own',
    # 'Insider Own',
    'Price',
    'Shs Outstand',
    'Shs Float',
    'Short Float',
    'Short Ratio',
    'Book/sh',
    'Cash/sh',
    'Rel Volume',
    'Earnings',
    'Avg Volume',
    'Volume',
]
df = pd.DataFrame(index=symbol, columns=metric)
df = get_fundamental_data(df)

print(df)

df.to_csv('finviz_' + time.strftime('%Y-%m-%d') + '.csv')

Attached is my shortlist.csv to import: [screenshot of shortlist.csv]

The error I get is: [error screenshot]

I'm using Python 3 in PyCharm.

The result should look like this: [expected output screenshot]


Tags: csv, in, import, df, for, get, datetime, as
2 Answers

Your symbol is defined inside the function get_fundamental_data(); you can't use symbol outside that for loop or function. Build the DataFrame index from the Symbol list you read in from shortlist.csv instead, as sketched below.
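
A minimal sketch of that fix, reusing the asker's own Symbol and metric lists (an assumption: shortlist.csv holds one ticker per row, no header):

# Build the index from the Symbol list populated from shortlist.csv;
# the lowercase `symbol` exists only inside get_fundamental_data().
df = pd.DataFrame(index=Symbol, columns=metric)
df = get_fundamental_data(df)
print(df)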

Any one of these three methods will get you very close to where you want to be.

Method 1:

import csv
import requests
from bs4 import BeautifulSoup
from requests.exceptions import HTTPError

url_base = "https://finviz.com/quote.ashx?t="
tckr = ['SBUX','MSFT','AAPL']
url_list = [url_base + s for s in tckr]

with open('C:\\Users\\Excel\\Downloads\\SO.csv', 'a', newline='') as f:
    writer = csv.writer(f)

    for url in url_list:
        try:
            fpage = requests.get(url)
            fpage.raise_for_status()  # raise HTTPError for a missing ticker page
            fsoup = BeautifulSoup(fpage.content, 'html.parser')

            # write header row
            writer.writerow(map(lambda e : e.text, fsoup.find_all('td', {'class':'snapshot-td2-cp'})))

            # write body row
            writer.writerow(map(lambda e : e.text, fsoup.find_all('td', {'class':'snapshot-td2'})))            
        except HTTPError:
            print("{} - not found".format(url))
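
Since the goal is to pull the tickers from shortlist.csv rather than hard-code them, the tckr list above can be built from the file first (a sketch, assuming the ticker sits in the first column of each row):

with open('shortlist.csv', newline='') as f:
    tckr = [row[0] for row in csv.reader(f) if row]
url_list = [url_base + s for s in tckr]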

Method 2:

import requests
import pandas
from bs4 import BeautifulSoup

base_url = 'http://finviz.com/screener.ashx?v=152&s=ta_topgainers&o=price&c=0,1,2,3,4,5,6,7,25,63,64,65,66,67'
html = requests.get(base_url)
soup = BeautifulSoup(html.content, "html.parser")
main_div = soup.find('div', attrs = {'id':'screener-content'})

light_rows = main_div.find_all('tr', class_="table-light-row-cp")
dark_rows = main_div.find_all('tr', class_="table-dark-row-cp")

data = []
for rows_set in (light_rows, dark_rows):
    for row in rows_set:
        row_data = []
        for cell in row.find_all('td'):
            val = cell.a.get_text()
            row_data.append(val)
        data.append(row_data)

# sort rows by the leading "No." column to restore the original screener order
data.sort(key=lambda x: int(x[0]))

pandas.DataFrame(data).to_csv("AAA.csv", header=False)
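
A caveat not in the original answer: finviz sometimes rejects requests that lack a browser-like User-Agent header. If either method comes back empty or with a 403, try sending one:

headers = {'User-Agent': 'Mozilla/5.0'}  # any browser-like UA string
html = requests.get(base_url, headers=headers)
soup = BeautifulSoup(html.content, "html.parser")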

Method 3: the same code as Method 1, with the output path changed to C:/Users/Excel/Desktop/today.csv.
