How to randomly sample from a CSV a million times

Posted 2024-04-16 20:37:33


I have a large CSV file that looks like this:

claim score
yes   1
yes   1
no    1
no    1
yes   1
...   1
...   1
...   1

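The scores are all the same number. I need to draw a random sample of a given size (say 1000) many times, and then compute the average percentage of "yes" claims across those samples.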

My code looks like this:

#imports
import random
import numpy

TotalYes = 0
csvFile = numpy.genfromtxt("/nas/home/twu/wind/output_1.csv",delimiter=",",dtype=None)
for j in range(1,10001):
    #csv format : claim (Yes/No), value
    #read in your csv file and store in array
    #initialize random number generator
    random.seed()

#create RandomSample array
RandSamples=[]
samplesize = 1000
#Fill RandomSample array with 10000 random samples from cvs array
for i in range(1,1001):
    #for row in csvFile:
    #get a random index within csvFile[]. random num range is 0 to csv array length
    randIndex=random.randint(0,len(csvFile))
    print randIndex
    RandSamples.append(csvFile[randIndex:randIndex+1,:])
#RandSamples1=numpy.asarray(RandSamples)
#get number of 'yes' from RandomSample array
RandYesSample=[]
for i in range(0,1001):

    # check to see if current record is Yes claim or no
    if RandSamples[i:i+1,:1] == "yes":
        #yes, copy value to yes array
        RandYesSample.append (RandSamples[i:i+1,:1])

#get percent of yes in RandomSample array
PercYes = float(len(RandYesSample)) / 1000
TotalYes = TotalYes + PercYes

TotalYes = float(TotalYes) / 10000

print TotalYes  

The error I get is:

if RandSamples[i:i+1,:1] == "yes":...TypeError: list indices must be
integers, not tuple

I can't get it to work. Can anyone help?


1 answer
User
Reply 1 · Posted 2024-04-16 20:37:33

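The problem is in how you slice the list. A list slice has the form [start:end:step], but you have put a comma inside the brackets. The comma turns the index into a tuple, which a list does not accept, so it should be removed: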

csvFile[randIndex:randIndex+1,:]

should be

csvFile[randIndex]
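To see why the comma matters, here is a minimal sketch using a made-up list of rows (the name `rows` and its contents are assumptions for illustration, not the asker's data). On Python 3 the message reads slightly differently from the Python 2 one quoted in the question, but the cause is the same. A 2-D NumPy array would accept the comma form; a plain list does not.

```python
# Hypothetical stand-in for a few sampled rows: [claim, score]
rows = [["yes", "1"], ["no", "1"], ["yes", "1"]]

# Writing rows[0:1, :1] passes the tuple (slice(0, 1), slice(None, 1))
# to the list, and a list only accepts an integer or a single slice.
try:
    rows[0:1, :1]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Plain integer indexing works: first the row, then the field inside it.
first_row = rows[0]
first_claim = rows[0][0]
print(first_row, first_claim)
```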

The same applies to the condition:

if RandSamples[i] == "yes":

and to:

RandYesSample.append (RandSamples[i])
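Beyond the indexing fix, a few other details in the posted code look likely to cause trouble. `random.randint(0, len(csvFile))` includes both endpoints, so it can return an index one past the end. `range(0,1001)` visits 1001 positions in a list of 1000 samples. The sampling code is also not indented under the `for j` loop, so it would run only once. And if each sampled element is a whole row (claim plus score), the comparison probably needs the claim field, for example `RandSamples[i][0]`. The following is a minimal sketch of the whole computation under stated assumptions, not a drop-in replacement: `mean_yes_fraction` is a hypothetical helper, `claims` is assumed to be a plain list of the strings from the first CSV column (loading the file is left out), and sampling is done with replacement, as in the original `randint` approach.

```python
import random


def mean_yes_fraction(claims, sample_size=1000, repeats=10000, seed=None):
    """Average fraction of "yes" over `repeats` random samples.

    Each sample holds `sample_size` items drawn with replacement.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(repeats):
        # choices() draws with replacement and never goes out of range
        sample = rng.choices(claims, k=sample_size)
        total += sample.count("yes") / sample_size
    return total / repeats


# Made-up data standing in for the first column of the CSV
claims = ["yes"] * 600 + ["no"] * 400

estimate = mean_yes_fraction(claims, sample_size=1000, repeats=200, seed=0)
print(estimate)
```

With real data, `claims` would be filled from the file first, and `repeats` can be raised toward the million runs mentioned in the title at the cost of run time.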
