numpy correlation coefficient

-1 votes
2 answers
6383 views
Asked 2025-04-17 16:20

1) How can I compute the correlation for the following data set in Python?

T = [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
P = [ 3480. 7080. 10440. 13200. 16800. 20400. 23880. 27480. 30840. 38040. 41520. 44880. 48480. 52080. 55680. 59280. 62520. 66120. 67580. 69620. 69621.] 

2) The input is a CSV file:

2,M,17748,60,60,21768,1460.0,7,2011-04-02 00:00:00,0,B,5,2011-07-22 03:03:00,52.0,1,1992,2011,2011,22,2,7,0,3,4,21768,1992-07-05 00:00:00,26,21768,W,50f38a469cf9c253d600000c,21768
1,M,18002,3,3,1746,3480.0,2,2011-04-07 00:00:00,0,B,5,2011-07-25 01:03:00,123.0,1,1985,2011,2011,25,7,7,0,1,4,1746,1985-02-05 00:00:00,3,1746,D,50f38a469cf9c253d600000d,1746
1,M,18003,3,3,2239,3600.0,1,2011-04-06 00:00:00,0,B,29,2011-07-25 01:03:00,89.0,1,1972,2011,2011,25,6,7,0,1,4,2239,1972-01-29 00:00:00,3,2239,D,50f38a469cf9c253d600000e,2239
1,F,18004,3,3,1965,3360.0,1,2011-04-06 00:00:00,0,B,28,2011-07-25 01:03:00,76.0,1,1955,2011,2011,25,6,7,0,1,4,1965,1955-01-28 00:00:00,3,1965,D,50f38a469cf9c253d600000f,1965

I wrote:

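from collections import defaultdict
from numpy import array

# reader is assumed to be a csv.reader over the input file
counts_W, counts_D = defaultdict(int), defaultdict(int)  # transactions per row[5] key
Amt_Wtotal = Amt_Dtotal = 0.0                            # running totals
dataW, dataD = [], []                                    # cumulative amounts
for row in reader:
    if row[28] == 'W':  # withdrawal
        counts_W[row[5]] += 1
        Amt_Wtotal += float(row[6])
        dataW.append(Amt_Wtotal)
    else:               # deposit
        counts_D[row[5]] += 1
        Amt_Dtotal += float(row[6])
        dataD.append(Amt_Dtotal)
# Note: the names below look swapped: counts_* hold frequencies, data* hold amounts.
Withdraw_amount = array(list(counts_W.values()))
Withdraw_frequency = array(dataW)
Deposit_amount = array(list(counts_D.values()))
Deposit_frequency = array(dataD)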

The output of this code is:

Withdraw==== defaultdict(, {'21768': 1}) [1460.0] count== 1
Deposit===== defaultdict(, {'2239': 1, '1700': 1, '2458': 1, '2056': 1, '2376': 1, '1965': 1, '1974': 1, '2425': 1, '21768': 1, '2069': 1, '2404': 1, '2402': 1, '1763': 1, '1762': 1, '1910': 1, '1746': 1, '10036': 1, '1903': 1, '2445': 1, '1770': 1}) [3480.0, 7080.0, 10440.0, 13200.0, 16800.0, 20400.0, 23880.0, 27480.0, 30840.0, 38040.0, 41520.0, 44880.0, 48480.0, 52080.0, 55680.0, 59280.0, 62520.0, 66120.0, 67580.0, 69620.0] count== 20

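How can I add identical amounts to the dictionary and access it to find the correlation?

3) How can I find the frequency and amount for each month of the year?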

2 Answers

0


import numpy as np

T = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
P = np.array([3480, 7080, 10440, 13200, 16800, 20400, 23880, 27480, 30840, 38040,
    41520, 44880, 48480, 52080, 55680, 59280, 62520, 66120, 67580, 69620, 69621])
print(T.shape)
print(P.shape)
t_p = np.stack((T, P))  # each row is one variable for corrcoef
# T is constant (zero variance), so every entry involving T comes out as nan
print(np.corrcoef(t_p))
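One thing worth noting about this particular data: T has the same value in every position, so its variance is zero and the Pearson coefficient becomes 0/0, which np.corrcoef reports as nan. The pure-Python sketch below makes that explicit. The `pearson` helper and the position-index series are my own illustration (assumptions, not part of the code above); correlating P with its index is only there to show a case where the coefficient is defined.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns nan when either input has zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return math.nan  # 0/0: the coefficient is undefined
    return cov / math.sqrt(var_x * var_y)

T = [1] * 21
P = [3480, 7080, 10440, 13200, 16800, 20400, 23880, 27480, 30840, 38040,
     41520, 44880, 48480, 52080, 55680, 59280, 62520, 66120, 67580, 69620, 69621]

r_tp = pearson(T, P)                      # constant T: undefined
r_idx = pearson(list(range(len(P))), P)   # hypothetical comparison: index vs. P
print(r_tp, r_idx)
```

If the goal is a meaningful coefficient, the first series probably needs to vary, for example per-key or per-month counts rather than a column of ones.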
3

To compute the correlation between two data sets, I used the scipy.stats package. I suggest you take a look at it as well.

From the documentation:

pearsonr(x, y) #Pearson correlation coefficient and the p-value for testing
spearmanr(a[, b, axis]) #Spearman rank-order correlation coefficient and the p-value
pointbiserialr(x, y) #Point biserial correlation coefficient and the associated p-value.
kendalltau(x, y[, initial_lexsort]) #Calculates Kendall’s tau, a correlation measure for ordinal data.
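To make the difference between the first two entries concrete, here is a rough pure-Python sketch of what the Pearson and Spearman coefficients measure. The helper names (`pearson`, `average_ranks`, `spearman`) and the sample data are hypothetical, this is not how scipy implements them, and no p-value is computed; it only illustrates that Spearman is Pearson applied to ranks.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return math.nan  # undefined for a constant series
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(r_pearson, r_spearman)
```

For a monotonic but non-linear relationship like this one, the rank-based coefficient reaches 1 while the linear one stays slightly below it.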

There are also some frequency-related functions:

cumfreq(a[, numbins, defaultreallimits, weights])   #cumulative frequency histogram
relfreq(a[, numbins, defaultreallimits, weights])   #relative frequency histogram
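For the per-month part of the question, a plain grouping pass may be enough before reaching for histogram functions. The sketch below uses only the standard library and the four sample rows from the question. The amount and W/D columns follow the question's own code (row[6] and row[28]); treating column 8 as the transaction date is purely my assumption, and `monthly_summary` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# AMOUNT_COL and TYPE_COL follow the question's code (row[6], row[28]).
# DATE_COL = 8 is an assumption about which timestamp is the transaction date.
AMOUNT_COL, DATE_COL, TYPE_COL = 6, 8, 28

SAMPLE = """\
2,M,17748,60,60,21768,1460.0,7,2011-04-02 00:00:00,0,B,5,2011-07-22 03:03:00,52.0,1,1992,2011,2011,22,2,7,0,3,4,21768,1992-07-05 00:00:00,26,21768,W,50f38a469cf9c253d600000c,21768
1,M,18002,3,3,1746,3480.0,2,2011-04-07 00:00:00,0,B,5,2011-07-25 01:03:00,123.0,1,1985,2011,2011,25,7,7,0,1,4,1746,1985-02-05 00:00:00,3,1746,D,50f38a469cf9c253d600000d,1746
1,M,18003,3,3,2239,3600.0,1,2011-04-06 00:00:00,0,B,29,2011-07-25 01:03:00,89.0,1,1972,2011,2011,25,6,7,0,1,4,2239,1972-01-29 00:00:00,3,2239,D,50f38a469cf9c253d600000e,2239
1,F,18004,3,3,1965,3360.0,1,2011-04-06 00:00:00,0,B,28,2011-07-25 01:03:00,76.0,1,1955,2011,2011,25,6,7,0,1,4,1965,1955-01-28 00:00:00,3,1965,D,50f38a469cf9c253d600000f,1965
"""

def monthly_summary(rows):
    """Return {(year, month, type): [count, total_amount]}."""
    summary = defaultdict(lambda: [0, 0.0])
    for row in rows:
        when = datetime.strptime(row[DATE_COL], "%Y-%m-%d %H:%M:%S")
        entry = summary[(when.year, when.month, row[TYPE_COL])]
        entry[0] += 1                        # frequency for that month
        entry[1] += float(row[AMOUNT_COL])   # total amount for that month
    return dict(summary)

summary = monthly_summary(csv.reader(io.StringIO(SAMPLE)))
for key in sorted(summary):
    print(key, summary[key])
```

With a full year of rows, the per-month counts and totals form two series of equal length that can then be passed to a correlation function; with only one month present, as in this sample, there is nothing to correlate yet.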
