Python: pandas, parsing mathematical operations

Published 2024-05-23 22:40:25


Someone on Stack Overflow suggested that I use pandas to work with the values in my CSV files, and provided the following code:

# original code
import pandas

cmf = pandas.read_csv('CMF_MA68II.csv', names=['wavelength', 'x', 'y', 'z'])
d65 = pandas.read_csv('D65_MA68II_10nm.csv', names=['wavelength', 'a', 'b'])
data = pandas.read_csv('spectral_data.csv', names=['serialNumber', 'wavelength', 'measurement', 'name'])

lookup = pandas.merge(cmf, d65, on='wavelength')
merged = pandas.merge(data, lookup, on='wavelength')

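# Weight x, y, z by the illuminant value 'a' at each wavelength, then sum over wavelengths
totals = ((lookup[['x', 'y', 'z']].T * lookup['a']).T).sum()
# White point, normalised so that Y = 100
wps = 100 * totals / totals['y']

print(totals['y'])
print("D65_CMF_2006_10_deg white point = ")
print(wps)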
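The `((df.T * s).T)` idiom in the suggested code multiplies every row of the `x`, `y`, `z` columns by the illuminant value `a` at the same wavelength. A minimal sketch, using only the first three rows of the sample files from this question as inline data, showing that `DataFrame.mul(..., axis=0)` gives the same totals without the double transpose:

```python
import pandas as pd

# First three wavelengths of the sample CMF and D65 files from the question
cmf = pd.DataFrame({
    'wavelength': [400, 410, 420],
    'x': [1.879338e-02, 8.277331e-02, 2.077647e-01],
    'y': [2.589775e-03, 1.041303e-02, 2.576133e-02],
    'z': [8.508254e-02, 3.832822e-01, 9.933444e-01],
})
d65 = pd.DataFrame({
    'wavelength': [400, 410, 420],
    'a': [82.7549, 91.486, 93.4318],
    'b': [14.708, 17.6753, 20.995],
})
lookup = pd.merge(cmf, d65, on='wavelength')

# Original form: transpose, multiply column-wise by 'a', transpose back, sum
totals_t = ((lookup[['x', 'y', 'z']].T * lookup['a']).T).sum()
# Equivalent form: multiply each row by 'a' directly (aligned on the row index)
totals = lookup[['x', 'y', 'z']].mul(lookup['a'], axis=0).sum()

# White point normalised so that Y = 100
wps = 100 * totals / totals['y']
print(totals)
print(wps)
```

Both forms align `lookup['a']` with the rows of the table; `mul(..., axis=0)` just states that directly.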

I added this part at the end:

(code block not preserved in this copy of the page)

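But these lines perform the operation on all rows of my file regardless of the name associated with them, so the result is a single value combining all of the individual measurements.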

As you can see, the file 'spectral_data.csv' has the structure names=['serialNumber', 'wavelength', 'measurement', 'name'].

What I want to do is perform this operation:

merged['X'] = (merged.x * merged.a * merged.measurement).sum()/totals['y']

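but separately for each sample, i.e. only over the rows that share the same serialNumber and name, so that each sample gets its own value instead of one value for the whole file.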
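To see why the added lines give a single number, compare summing over the whole merged frame with summing over the rows of one sample at a time. A minimal sketch with a hand-built four-row `merged` (values taken from the sample files; `totals_y = 1.0` is a hypothetical placeholder for `totals['y']`):

```python
import pandas as pd

# Hand-built stand-in for `merged`: two wavelengths for each of two samples
merged = pd.DataFrame({
    'serialNumber': [0, 0, 1, 1],
    'wavelength': [400, 410, 400, 410],
    'measurement': [12.73, 12.41, 2.31, 2.33],
    'name': ['a', 'a', 'b', 'b'],
    'x': [1.879338e-02, 8.277331e-02, 1.879338e-02, 8.277331e-02],
    'a': [82.7549, 91.486, 82.7549, 91.486],
})
totals_y = 1.0  # hypothetical placeholder for totals['y']

# Summing over the whole frame mixes both samples into one number
X_all = (merged.x * merged.a * merged.measurement).sum() / totals_y

# Restricting the rows to one name at a time gives one value per sample
X_per_name = {}
for name in merged['name'].unique():
    rows = merged[merged['name'] == name]
    X_per_name[name] = (rows.x * rows.a * rows.measurement).sum() / totals_y

print(X_all)
print(X_per_name)
```

The whole-frame value is just the sum of the per-sample values, which is why it cannot tell the samples apart.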

Can anyone help me solve this?

Thanks

Sample files:

'CMF_MA68II.csv'

400,1.879338E-02,2.589775E-03,8.508254E-02
410,8.277331E-02,1.041303E-02,3.832822E-01
420,2.077647E-01,2.576133E-02,9.933444E-01
430,3.281798E-01,4.698226E-02,1.624940E+00
440,4.026189E-01,7.468288E-02,2.075946E+00
450,3.932139E-01,1.039030E-01,2.128264E+00
460,3.013112E-01,1.414586E-01,1.768440E+00
470,1.914176E-01,1.999859E-01,1.310576E+00
480,7.593120E-02,2.682271E-01,7.516389E-01
490,1.400745E-02,3.554018E-01,3.978114E-01
500,5.652072E-03,4.780482E-01,2.078158E-01
510,3.778185E-02,6.248296E-01,8.852389E-02
520,1.201511E-01,7.788199E-01,3.784916E-02
530,2.380254E-01,8.829552E-01,1.539505E-02
540,3.841856E-01,9.665325E-01,6.083223E-03
550,5.374170E-01,9.907500E-01,2.323578E-03
560,7.123849E-01,9.944304E-01,8.779264E-04
570,8.933408E-01,9.640545E-01,3.342429E-04
580,1.034327E+00,8.775360E-01,1.298230E-04
590,1.147304E+00,7.869950E-01,5.207245E-05
600,1.148163E+00,6.629035E-01,2.175998E-05
610,1.048485E+00,5.282296E-01,9.530130E-06
620,8.629581E-01,3.950755E-01,0.000000E+00
630,6.413984E-01,2.751807E-01,0.000000E+00
640,4.323126E-01,1.776882E-01,0.000000E+00
650,2.714900E-01,1.083996E-01,0.000000E+00
660,1.538163E-01,6.033976E-02,0.000000E+00
670,8.281010E-02,3.211852E-02,0.000000E+00
680,4.221473E-02,1.628841E-02,0.000000E+00
690,2.025590E-02,7.797457E-03,0.000000E+00
700,9.816228E-03,3.776140E-03,0.000000E+00

'D65_MA68II_10nm.csv'

400,82.7549,14.708
410,91.486,17.6753
420,93.4318,20.995
430,86.6823,24.6709
440,104.865,28.7027
450,117.008,33.0859
460,117.812,37.8121
470,114.861,42.8693
480,115.923,48.2423
490,108.811,53.9132
500,109.354,59.8611
510,107.802,66.0635
520,104.79,72.4959
530,107.689,79.1326
540,104.405,85.947
550,104.046,92.912
560,100,100
570,96.3342,107.184
580,95.788,114.436
590,88.6856,121.731
600,90.0062,129.043
610,89.5991,136.346
620,87.6987,143.618
630,83.2886,150.836
640,83.6992,157.979
650,80.0268,165.028
660,80.2146,171.963
670,82.2778,178.769
680,78.2842,185.429
690,69.7213,191.931
700,71.6091,198.261

'spectral_data.csv'

0,400,12.73,"a"
0,410,12.41,"a"
0,420,12.55,"a"
0,430,13.42,"a"
0,440,15.07,"a"
0,450,17.31,"a"
0,460,19.20,"a"
0,470,20.96,"a"
0,480,22.11,"a"
0,490,23.45,"a"
0,500,24.62,"a"
0,510,25.42,"a"
0,520,24.51,"a"
0,530,22.43,"a"
0,540,20.94,"a"
0,550,21.59,"a"
0,560,22.36,"a"
0,570,21.54,"a"
0,580,22.03,"a"
0,590,28.86,"a"
0,600,37.02,"a"
0,610,42.00,"a"
0,620,44.79,"a"
0,630,46.57,"a"
0,640,47.56,"a"
0,650,48.70,"a"
0,660,49.90,"a"
0,670,50.75,"a"
0,680,51.53,"a"
0,690,52.24,"a"
0,700,53.00,"a"
1,400,2.31,"b"
1,410,2.33,"b"
1,420,2.33,"b"
1,430,2.30,"b"
1,440,2.29,"b"
1,450,2.30,"b"
1,460,2.27,"b"
1,470,2.26,"b"
1,480,2.24,"b"
1,490,2.23,"b"
1,500,2.22,"b"
1,510,2.21,"b"
1,520,2.20,"b"
1,530,2.19,"b"
1,540,2.18,"b"
1,550,2.18,"b"
1,560,2.18,"b"
1,570,2.16,"b"
1,580,2.15,"b"
1,590,2.14,"b"
1,600,2.14,"b"
1,610,2.13,"b"
1,620,2.12,"b"
1,630,2.11,"b"
1,640,2.11,"b"
1,650,2.11,"b"
1,660,2.10,"b"
1,670,2.08,"b"
1,680,2.07,"b"
1,690,2.06,"b"
1,700,2.04,"b"

2 answers

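This puts the calculation into three new columns and then groups by name and serial number (in this example you could group by just one of them, but grouping by both keeps both columns in the final result):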

# First calculate the new columns
cols = ['x', 'y', 'z']
uppercols = ['X', 'Y', 'Z']
for uppercol, col in zip(uppercols, cols):
    merged[uppercol] = (merged[col] * merged.a * merged.measurement)/totals['y']

# Now group and sum
sums = merged.groupby(['serialNumber', 'name'])[uppercols].sum()
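A self-contained sketch of this answer on two wavelengths of the sample data, with the question's `totals` computed from the same small lookup table. The frames are built by hand here rather than read from the CSV files:

```python
import pandas as pd

# Two wavelengths of the CMF/D65 lookup and of both samples from the question
lookup = pd.DataFrame({
    'wavelength': [400, 410],
    'x': [1.879338e-02, 8.277331e-02],
    'y': [2.589775e-03, 1.041303e-02],
    'z': [8.508254e-02, 3.832822e-01],
    'a': [82.7549, 91.486],
})
data = pd.DataFrame({
    'serialNumber': [0, 0, 1, 1],
    'wavelength': [400, 410, 400, 410],
    'measurement': [12.73, 12.41, 2.31, 2.33],
    'name': ['a', 'a', 'b', 'b'],
})
merged = pd.merge(data, lookup, on='wavelength')
totals = lookup[['x', 'y', 'z']].mul(lookup['a'], axis=0).sum()

# Per-row contributions, normalised by the white point's Y
cols = ['x', 'y', 'z']
uppercols = ['X', 'Y', 'Z']
for uppercol, col in zip(uppercols, cols):
    merged[uppercol] = merged[col] * merged['a'] * merged['measurement'] / totals['y']

# One row per (serialNumber, name) pair
sums = merged.groupby(['serialNumber', 'name'])[uppercols].sum()
print(sums)
```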

To write this to a CSV file, just use:

(code block not preserved in this copy of the page)
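A sketch of writing a grouped result to CSV, assuming a `sums` frame shaped like the one in this answer (the values below are made-up placeholders, and the filename in the comment is hypothetical). `to_csv` writes the `serialNumber` and `name` index levels as leading columns, and `index_col` restores them on reading:

```python
import io
import pandas as pd

# Placeholder frame with the same shape as `sums` (MultiIndex: serialNumber, name)
sums = pd.DataFrame(
    {'X': [1.5, 0.2], 'Y': [1.0, 0.1], 'Z': [0.5, 0.05]},
    index=pd.MultiIndex.from_tuples([(0, 'a'), (1, 'b')],
                                    names=['serialNumber', 'name']),
)

# On disk this would be e.g. sums.to_csv('xyz_by_sample.csv') (hypothetical name);
# a StringIO buffer is used here so the round trip can be checked in memory.
buffer = io.StringIO()
sums.to_csv(buffer)
text = buffer.getvalue()
print(text)

# Reading it back: the two leading columns become the MultiIndex again
restored = pd.read_csv(io.StringIO(text), index_col=['serialNumber', 'name'])
```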

You can group and apply a user-defined function:

import pandas as pd

# For each (serialNumber, name) group, return X, Y, Z as one row
res = merged.groupby(['serialNumber', 'name']).apply(
    lambda g: pd.Series([(g[c] * g.a * g.measurement).sum() / totals['y'] for c in "xyz"],
                        index=['X', 'Y', 'Z']))
print(res)
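The `apply` version and the column-based version in the first answer compute the same weighted sums. A sketch on two wavelengths of the sample data that checks this. Selecting the needed columns before `apply` is an addition here: it keeps the grouping columns out of the function, which avoids the deprecation warning pandas 2.2+ emits otherwise.

```python
import pandas as pd

lookup = pd.DataFrame({
    'wavelength': [400, 410],
    'x': [1.879338e-02, 8.277331e-02],
    'y': [2.589775e-03, 1.041303e-02],
    'z': [8.508254e-02, 3.832822e-01],
    'a': [82.7549, 91.486],
})
data = pd.DataFrame({
    'serialNumber': [0, 0, 1, 1],
    'wavelength': [400, 410, 400, 410],
    'measurement': [12.73, 12.41, 2.31, 2.33],
    'name': ['a', 'a', 'b', 'b'],
})
merged = pd.merge(data, lookup, on='wavelength')
totals = lookup[['x', 'y', 'z']].mul(lookup['a'], axis=0).sum()

def xyz(g):
    # Weighted, normalised sums of x, y, z for one sample
    return pd.Series([(g[c] * g['a'] * g['measurement']).sum() / totals['y'] for c in 'xyz'],
                     index=['X', 'Y', 'Z'])

res = merged.groupby(['serialNumber', 'name'])[['x', 'y', 'z', 'a', 'measurement']].apply(xyz)

# Column-based equivalent: weight every row first, then sum per group
weighted = merged[['x', 'y', 'z']].mul(merged['a'] * merged['measurement'], axis=0) / totals['y']
weighted.columns = ['X', 'Y', 'Z']
vec = weighted.groupby([merged['serialNumber'], merged['name']]).sum()
print(res)
```

The `apply` form is flexible when the per-group formula gets more complicated; the column-based form stays vectorised and is usually faster on large files.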
