<p>You should consider using <code>pandas</code> for tasks like this. Google the docs yourself; for your specific case (a csv file without row headers), here is a basic example:</p>
<pre><code>import pandas as pd
</code></pre>
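<p>First, load the csv. How to do that depends on its format, so you may need to change the separator; I took the format from your sample data (columns separated by multiple spaces):</p>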
<pre><code>dataframe = pd.read_csv(filepath, sep=r'\s+')
</code></pre>
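<p>To try the loading step without a file on disk, here is a minimal sketch. The column names come from the code in this answer; the sample values and the <code>SAMPLE</code> name are made up for illustration:</p>

```python
import io

import pandas as pd

# Hypothetical sample data: the values are invented for illustration only.
SAMPLE = """\
string1 string2 rate distance
a       x       10   2
a       x       20   5
b       y       9    3
"""

# The raw string keeps the regex escape \s intact; \s+ matches any run of whitespace.
dataframe = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")
print(dataframe)
```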
<p>Then group the data by a set of columns:</p>
<pre><code>groupby = dataframe.groupby(['string1','string2'])
print(groupby.groups)
</code></pre>
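<p>This returns a <code>DataFrameGroupBy</code> object, which is essentially a wrapper around a list of pairs: (tuple of column values, dataframe of the rows matching those values).</p>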
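<p>One way to see those pairs is to iterate over the grouped object. This is a small sketch with invented sample values; <code>group_sizes</code> is just an illustrative name:</p>

```python
import pandas as pd

# Hypothetical sample data with the column names used in this answer.
dataframe = pd.DataFrame({
    "string1": ["a", "a", "b"],
    "string2": ["x", "x", "y"],
    "rate": [10, 20, 9],
    "distance": [2, 5, 3],
})

grouped = dataframe.groupby(["string1", "string2"])

# Iterating yields one (key tuple, sub-DataFrame) pair per group.
group_sizes = {}
for key, rows in grouped:
    group_sizes[key] = len(rows)
    print(key, "has", len(rows), "row(s)")
```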
<p>Then apply a custom function to each group's rows to produce the new data:</p>
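<pre><code>def add_average_velocity(input_rows):
    # Work on a copy so the group handed in by pandas is not modified in place.
    input_rows = input_rows.copy()
    # One mean per group, repeated on every row of that group.
    input_rows['avg_velocity'] = (input_rows['rate']/input_rows['distance']).mean()
    return input_rows

new_dataframe = dataframe.groupby(['string1','string2']).apply(add_average_velocity).reset_index()
print(new_dataframe)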
</code></pre>
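<p>If you only need the per-group mean repeated on each row, <code>transform</code> may be a simpler route than <code>apply</code>, and its result keeps the original row index. A minimal sketch, assuming the same column names and invented sample values:</p>

```python
import pandas as pd

# Hypothetical sample data with the column names used in this answer.
dataframe = pd.DataFrame({
    "string1": ["a", "a", "b"],
    "string2": ["x", "x", "y"],
    "rate": [10, 20, 9],
    "distance": [2, 5, 3],
})

# Per-row ratio, using the same formula as the answer (rate / distance).
ratio = dataframe["rate"] / dataframe["distance"]

# Group the ratio by the key columns and broadcast each group's mean
# back onto the rows of that group.
keys = [dataframe["string1"], dataframe["string2"]]
new_dataframe = dataframe.assign(avg_velocity=ratio.groupby(keys).transform("mean"))
print(new_dataframe)
```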
<p>Or, if you want to drop all the old data and keep only the new values:</p>
<pre><code>def add_average_velocity(input_rows):
    # Returning a pd.Series gives the result column a name. You can skip it if you are
    # okay with an unnamed column in the result; columns can always be renamed later.
    output_data = pd.Series({'velocity':(input_rows['rate']/input_rows['distance']).mean()})
    return output_data

new_dataframe = dataframe.groupby(['string1','string2']).apply(add_average_velocity).reset_index()
print(new_dataframe)
</code></pre>
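<p>The same one-row-per-group summary can probably be had without a custom function, by computing the ratio first and then taking a grouped mean. This is a sketch under the same assumptions (column names from this answer, invented values); <code>summary</code> is an illustrative name:</p>

```python
import pandas as pd

# Hypothetical sample data with the column names used in this answer.
dataframe = pd.DataFrame({
    "string1": ["a", "a", "b"],
    "string2": ["x", "x", "y"],
    "rate": [10, 20, 9],
    "distance": [2, 5, 3],
})

summary = (
    # Add the per-row ratio as a temporary column on a new frame.
    dataframe.assign(velocity=dataframe["rate"] / dataframe["distance"])
    # as_index=False keeps the key columns as ordinary columns in the result.
    .groupby(["string1", "string2"], as_index=False)["velocity"]
    .mean()
)
print(summary)
```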