Multiplying DataFrames of different lengths

Posted 2024-03-28 15:23:04


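I have two DataFrames: both have 5 columns, but the first has 100 rows and the second has only one row. I need to multiply each row of the first DataFrame by that single row of the second, then add up the resulting values across the columns of each row and store the result in a new sixth column, "sum of multiplications". I have seen the np.dot operation, but I'm not sure whether it can be applied to DataFrames. I'm also looking for a pythonic/pandas operation or method: is it possible to replace somewhat heavy numpy code written from scratch? Thanks in advance for any suggestions.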


2 answers

I think you can convert the DataFrames to numpy arrays with values, multiply them, and finally take the sum:

import pandas as pd
import numpy as np

np.random.seed(1)
df1 = pd.DataFrame(np.random.randint(10, size=(1,5)))
df1.columns = list('ABCDE')
print(df1)
   A  B  C  D  E
0  5  8  9  5  0

np.random.seed(0)
df2 = pd.DataFrame(np.random.randint(10,size=(10,5)))
df2.columns = list('ABCDE')
print(df2)
   A  B  C  D  E
0  5  0  3  3  7
1  9  3  5  2  4
2  7  6  8  8  1
3  6  7  7  8  1
4  5  9  8  9  4
5  3  0  3  5  0
6  2  3  8  1  3
7  3  3  7  0  1
8  9  9  0  4  7
9  3  2  7  2  0
print(df2.values * df1.values)
[[25  0 27 15  0]
 [45 24 45 10  0]
 [35 48 72 40  0]
 [30 56 63 40  0]
 [25 72 72 45  0]
 [15  0 27 25  0]
 [10 24 72  5  0]
 [15 24 63  0  0]
 [45 72  0 20  0]
 [15 16 63 10  0]]

df = pd.DataFrame(df2.values * df1.values)
df['sum'] = df.sum(axis=1)
print(df)
    0   1   2   3  4  sum
0  25   0  27  15  0   67
1  45  24  45  10  0  124
2  35  48  72  40  0  195
3  30  56  63  40  0  189
4  25  72  72  45  0  214
5  15   0  27  25  0   67
6  10  24  72   5  0  111
7  15  24  63   0  0  102
8  45  72   0  20  0  137
9  15  16  63  10  0  104
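The same result can also be obtained without leaving pandas, which keeps the column labels A to E instead of 0 to 4. The following is only a minimal sketch: it hard-codes the single row and the first three rows of df2 shown above so that it runs on its own, and the variable names row and result are made up for illustration.

```python
import pandas as pd

# The one-row frame and the first three rows of df2 from the answer above, hard-coded
df1 = pd.DataFrame([[5, 8, 9, 5, 0]], columns=list('ABCDE'))
df2 = pd.DataFrame([[5, 0, 3, 3, 7],
                    [9, 3, 5, 2, 4],
                    [7, 6, 8, 8, 1]], columns=list('ABCDE'))

# Take the single row as a Series indexed by column label
row = df1.iloc[0]

# Multiply every row of df2 by that Series, aligned on column labels
result = df2.mul(row, axis=1)

# Add the row-wise sum of the products as a sixth column
result['sum'] = result.sum(axis=1)
print(result)
```

Because mul aligns on labels, the columns of the two frames do not have to be in the same order, only to have the same names.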

Timings:

In [1185]: %timeit df2.mul(df1.ix[0], axis=1)
The slowest run took 5.07 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 287 µs per loop

In [1186]: %timeit pd.DataFrame(df2.values * df1.values)
The slowest run took 6.31 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 98 µs per loop
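If only the sum column is needed, multiplying and then summing each row is a matrix-vector product, which is what the np.dot mentioned in the question computes. A rough sketch, again with the first three rows of df2 hard-coded and with illustrative variable names; no timing is claimed for it here.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[5, 8, 9, 5, 0]], columns=list('ABCDE'))
df2 = pd.DataFrame([[5, 0, 3, 3, 7],
                    [9, 3, 5, 2, 4],
                    [7, 6, 8, 8, 1]], columns=list('ABCDE'))

row = df1.iloc[0]

# pandas: DataFrame.dot matches df2's columns against the index of row
sums_pandas = df2.dot(row)

# numpy: the same product on the underlying arrays (positional, no label alignment)
sums_numpy = np.dot(df2.to_numpy(), row.to_numpy())

print(sums_pandas.tolist())
print(sums_numpy.tolist())
```

Both variants skip the intermediate frame of products, so they fit when the per-column products themselves are not needed.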

You may be looking for something like this:

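import pandas as pd
import numpy as np

df1 = pd.DataFrame({'A': [1.1, 2.7, 3.4],
                    'B': [-1., -2.5, -3.9]})

# Row-wise sum of the original (not yet multiplied) values
df1['sum of multiplications'] = df1.sum(axis=1)

# One-row frame of multipliers; the sum column is multiplied by 1.0, so it stays unchanged
df2 = pd.DataFrame({'A': [2.],
                    'B': [3.],
                    'sum of multiplications': [1.]})

print(df1)
print(df2)

# .ix was removed in pandas 1.0; .iloc[0] selects the first row by position
row = df2.iloc[0]
# Multiply each row of df1 by row, aligned on column labels
df5 = df1.mul(row, axis=1)
# Append a row holding the column totals
df5.loc['Total'] = df5.sum()
print(df5)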
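Note that in the code above the sum column is computed before the multiplication, so it holds the sum of the original values. If the sum of the products is what is wanted, as the question describes, one possible variant is to multiply first and sum afterwards. This is a sketch using the same sample data; the names weights and products are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1.1, 2.7, 3.4],
                    'B': [-1.0, -2.5, -3.9]})

# One-row frame of multipliers (hypothetical name)
weights = pd.DataFrame({'A': [2.0], 'B': [3.0]})

# Multiply first, aligned on column labels ...
products = df1.mul(weights.iloc[0], axis=1)

# ... then sum the products across each row into a new column
products['sum of multiplications'] = products.sum(axis=1)
print(products)
```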
