Python pandas: finding the middle 50%

1 vote
2 answers
1270 views
Asked 2025-05-01 04:21

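I'm using Python and pandas to process tick-by-tick trade data for stocks. I want to condense the data into each day's total volume, high, low, and average price, plus the 25% and 75% levels of the traded volume. I'm not sure how to find where the 25% and 75% levels are.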

# References
from time import time
import urllib.request as web
import pandas as pd
import os

dateToday = "2014-10-31"

def pullData(exchange,stock,date):
    baseUrl='http://netfonds.no/quotes/tradedump.php?csv_format=csv'
    fullUrl=baseUrl+'&date='+date.replace("-","")+'&paper='+stock+'.'+exchange
    fileName=('netfonds/trades/'+stock+'.txt')
    try:
        if not os.path.isdir(os.path.dirname(fileName)):
            os.makedirs(os.path.dirname(fileName))
    except OSError:
        print("Directory Error")
    #print(fullUrl)
    webBuffer=web.urlopen(fullUrl)
    webData=pd.read_csv(webBuffer,usecols=['price','quantity'])
    low = webData['price'].min()
    high = webData['price'].max()
    print(low,high)


def getList(fileName):
    stockList = []
    # Each non-comment line is expected to look like "EXCHANGE.TICKER"
    with open(fileName+'.txt', 'r') as file:
        for eachLine in file.read().split('\n'):
            # Skip blank lines and comment lines
            if eachLine.strip() and '#' not in eachLine:
                stockList.append(eachLine.split('.'))
    return stockList

def fromList():
    print("Parsing stock tickers...")
    stockList = getList('stocks')
    print("Found "+str(len(stockList))+" stocks")

    for eachEntry in stockList:
        start_time = time()
        try:
            print("Attempting to pull data for "+eachEntry[1])
            pullData(eachEntry[0],eachEntry[1],dateToday)
            print("Pulled successfully in "+str(round(time()-start_time))+" seconds")
        except Exception:
            print("Unable to pull data... "+eachEntry[1])

first_time = time()
fromList()
print("Program Finished! Took "+str(round((time()-first_time)/60))+' minutes')

2 Answers

2

pandas Series and DataFrame objects have a method called describe, which is similar to summary in R:

In [3]: import numpy as np

In [4]: import pandas as pd

In [5]: s = pd.Series(np.random.rand(100))

In [6]: s.describe()
Out[6]: 
count    100.000000
mean       0.540376
std        0.296250
min        0.002514
25%        0.268722
50%        0.593436
75%        0.831067
max        0.991971
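
If only the quartiles are needed, `Series.quantile` returns them without the rest of the summary. A minimal sketch, using made-up prices rather than the asker's data (the `prices` series is purely illustrative):

```python
import pandas as pd

# Hypothetical prices, for illustration only
prices = pd.Series([10.0, 10.5, 11.0, 11.5, 12.0])

# 25th and 75th percentiles; the middle 50% of the values lies between them
q25, q75 = prices.quantile([0.25, 0.75])
print(q25, q75)
```

Note that this treats every row equally, so on trade data it gives quartiles over trades, not over traded volume.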
0

I found what I needed simply by using numpy.repeat().

inflated = pd.DataFrame(np.repeat(webData['price'].values, webData['quantity'].values))
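
To show how this leads to volume-weighted 25% and 75% levels, here is a small sketch of the same idea. The `trades` frame is hypothetical and stands in for `webData`; it assumes `quantity` holds non-negative integers, which `np.repeat` requires as repeat counts.

```python
import numpy as np
import pandas as pd

# Hypothetical trades standing in for webData (assumption, not real data)
trades = pd.DataFrame({"price": [10.0, 11.0, 12.0], "quantity": [1, 2, 1]})

# Repeat each price once per share traded: [10.0, 11.0, 11.0, 12.0]
inflated = pd.Series(np.repeat(trades["price"].values, trades["quantity"].values))

# Quartiles of the inflated series are weighted by traded volume
q25, q75 = inflated.quantile([0.25, 0.75])
print(q25, q75)
```

The inflated series has one element per share traded, so for heavily traded stocks it can get large; that is the trade-off for keeping the code this simple.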
