Python遍历大型数据集并删除

2024-03-29 00:39:39 发布

您现在位置:Python中文网/ 问答频道 /正文

我使用的数据集包含1-12个月的10000个客户数据。我为每个客户生成12个月期间不同值的关联

当前我的输出关联文件比原始文件有更多的行。当我试图从原始数据集中删除已经评估过的行时,我意识到这是一个迭代错误

我期望的结果是一个由10000个条目组成的数据集,这些条目对应于每个客户的年度评估

我把我认为错误的地方加了黑体字

这是我目前的代码:

for x_customer in range(0,len(overalldata),12):

        for x in range(0,13,1):
                cust_months = overalldata[0:x,1]

                cust_balancenormal = overalldata[0:x,16]

                cust_demo_one = overalldata[0:x,2]
                cust_demo_two = overalldata[0:x,3]

                num_acct_A = overalldata[0:x,4]
                num_acct_B = overalldata[0:x,5]

                out_mark_channel_one = overalldata[0:x,25]
                out_service_channel_two = overalldata[0:x,26]
                out_mark_channel_three = overalldata[0:x,27]
                out_mark_channel_four = overalldata[0:x,28]


    #Correlation Calculations

                #Demographic to Balance Correlations
                demo_one_corr_balance = numpy.corrcoef(cust_balancenormal, cust_demo_one)[1,0]
                demo_two_corr_balance = numpy.corrcoef(cust_balancenormal, cust_demo_two)[1,0]


                #Demographic to Account Number Correlations
                demo_one_corr_acct_a = numpy.corrcoef(num_acct_A, cust_demo_one)[1,0]
                demo_one_corr_acct_b = numpy.corrcoef(num_acct_B, cust_demo_one)[1,0]
                demo_two_corr_acct_a = numpy.corrcoef(num_acct_A, cust_demo_two)[1,0]
                demo_two_corr_acct_b = numpy.corrcoef(num_acct_B, cust_demo_two)[1,0]

                #Marketing Response Channel One
                mark_one_corr_acct_a = numpy.corrcoef(num_acct_A, out_mark_channel_one)[1, 0]
                mark_one_corr_acct_b = numpy.corrcoef(num_acct_B, out_mark_channel_one)[1, 0]
                mark_one_corr_balance = numpy.corrcoef(cust_balancenormal, out_mark_channel_one)[1, 0]

                #Marketing Response Channel Two
                mark_two_corr_acct_a = numpy.corrcoef(num_acct_A, out_service_channel_two)[1, 0]
                mark_two_corr_acct_b = numpy.corrcoef(num_acct_B, out_service_channel_two)[1, 0]
                mark_two_corr_balance = numpy.corrcoef(cust_balancenormal, out_service_channel_two)[1, 0]

                #Marketing Response Channel Three
                mark_three_corr_acct_a = numpy.corrcoef(num_acct_A, out_mark_channel_three)[1, 0]
                mark_three_corr_acct_b = numpy.corrcoef(num_acct_B, out_mark_channel_three)[1, 0]
                mark_three_corr_balance = numpy.corrcoef(cust_balancenormal, out_mark_channel_three)[1, 0]

                #Marketing Response Channel Four
                mark_four_corr_acct_a = numpy.corrcoef(num_acct_A, out_mark_channel_four)[1, 0]
                mark_four_corr_acct_b = numpy.corrcoef(num_acct_B, out_mark_channel_four)[1, 0]
                mark_four_corr_balance = numpy.corrcoef(cust_balancenormal, out_mark_channel_four)[1, 0]


                #Result Correlations For Exporting to CSV of all Correlations
                result_correlation = [(demo_one_corr_balance),(demo_two_corr_balance),(demo_one_corr_acct_a),(demo_one_corr_acct_b),(demo_two_corr_acct_a),(demo_two_corr_acct_b),(mark_one_corr_acct_a),(mark_one_corr_acct_b),(mark_one_corr_balance),
                                      (mark_two_corr_acct_a),(mark_two_corr_acct_b),(mark_two_corr_balance),(mark_three_corr_acct_a),(mark_three_corr_acct_b),(mark_three_corr_balance),(mark_four_corr_acct_a),(mark_four_corr_acct_b),
                                      (mark_four_corr_balance)]
                result_correlation_nan_nuetralized = numpy.nan_to_num(result_correlation)
                c.writerow(result_correlation)

        **result_correlation_combined = emptylist.append([result_correlation])
        cust_delete_list = [0,x_customer,1]
        overalldata = numpy.delete(overalldata, (cust_delete_list), axis=0)**

Tags: numpydemochanneloutonenumthreemark
1条回答
网友
1楼 · 发布于 2024-03-29 00:39:39

这也许不能完全解决你的问题,但我认为这是相关的

在列表对象(空或其他)上运行.append时,该方法返回的值为None。因此,对于行result_correlation_combined = emptylist.append([result_correlation]),不管empty_list是空列表还是非空列表,result_correlation_combined的值都将是None

这里有一个简单的例子来说明我所说的-我只是编一些数字,因为没有提供数据

>>> empty_list = []
>>> result_correlation = []

>>> for j in range(10):
        result_correlation.append(j)

>>> result_correlation
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> result_correlation_combined = empty_list.append(result_correlation)
>>> print(result_correlation_combined)
None

因此,您可以运行result_correlation_combined.append(result_correlation)result_correlation_combined += result_correlation,甚至result_correlation_combined.extend(result_correlation)。。。它们都会产生相同的结果。看看这是否给了你想要的答案。如果没有,请回来

相关问题 更多 >