Create distinct tuple values from CSV columns and compute the average of the third column

import csv


def read_csv(filepath, has_header=False):
    # The with-block closes the file automatically, so no explicit close() is needed
    with open(filepath, 'r', newline='') as file:
        reader = csv.reader(file)
        data = list(reader)
    header = None
    if has_header:
        header = data[0]
        data = data[1:]
    return data, header


if __name__ == '__main__':
    outfilepath = "data/outfile12.csv"
    outdata = []
    codes, header = read_csv("data/sample.csv", has_header=True)
    # create a dictionary keyed by the distinct (column 1, column 2) tuples
    codes_dict = {}
    for code in codes:
        codes_dict[(code[0], code[1])] = []
    for row in codes:
        # Write logic here
        pass
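The placeholder loop in the question can be filled in without pandas. Below is a minimal sketch using only the standard library. It assumes the file is comma-separated with a header row and a numeric third column; the inline SAMPLE data and the average_third_column helper are hypothetical, not taken from the question:

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data: the question's data/sample.csv is not shown, so this
# assumes comma-separated rows with a header and a numeric third column.
SAMPLE = """string1,string2,rate,distance
A.,C.,1,20
A.,B,2,30
A.,C.,2,20
"""


def average_third_column(rows):
    """Group rows by the (column 1, column 2) tuple and average column 3."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = (row[0], row[1])       # distinct tuple built from the first two columns
        sums[key] += float(row[2])   # third column, parsed as a number
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == '__main__':
    reader = csv.reader(io.StringIO(SAMPLE))
    header = next(reader)            # consume the header row
    averages = average_third_column(reader)
    for key, avg in sorted(averages.items()):
        print(key, avg)
```

With this sample it prints ('A.', 'B') 2.0 and ('A.', 'C.') 1.5. If your file is space-separated, adjust the csv.reader delimiter to match.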

2 answers

Anonymous user

Answer 1 · edited 2024-05-29 05:57:10

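You should consider using pandas for tasks like this. Look up the docs yourself for special cases (such as a CSV file without a header row); here I'll give a basic example: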

import pandas as pd

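First, load the CSV. How exactly depends on its format, so you may need to change the separator; I took the format from your sample data (columns separated by multiple spaces):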

dataframe = pd.read_csv(filepath, sep=r'\s+')
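A quick way to check the separator before pointing it at the real file is to parse a small in-memory sample. This is a sketch with hypothetical data; writing the pattern as the raw string r'\s+' (a regular expression for one or more whitespace characters) also avoids Python's invalid-escape-sequence warning that a plain '\s+' triggers:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample where columns are separated by varying runs of spaces
raw = StringIO(
    "string1   string2 rate  distance\n"
    "A.        C.      1     20\n"
    "A.        B       2     30\n"
)

# sep=r'\s+' treats any run of whitespace as a single separator
df = pd.read_csv(raw, sep=r'\s+')
print(df.columns.tolist())
print(df.shape)
```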

Then group the data by a set of columns:

groupby = dataframe.groupby(['string1','string2'])
print(groupby.groups)

This returns a "DataFrameGroupBy" object, which is essentially a wrapper around a list of pairs: (tuple of column values, DataFrame of the rows that match those values).
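To see those pairs directly, you can iterate over the grouped object. A small sketch on a hypothetical in-memory sample with the same column names; with two grouping columns each key is a tuple, and .groups maps each key to the index labels of its rows:

```python
from io import StringIO

import pandas as pd

# Hypothetical whitespace-separated sample mirroring the columns used above
df = pd.read_csv(StringIO(
    "string1 string2 rate distance\n"
    "A. C. 1 20\n"
    "A. B 2 30\n"
    "A. C. 2 20\n"
), sep=r'\s+')

grouped = df.groupby(['string1', 'string2'])

# Iterating yields (key tuple, sub-DataFrame) pairs, in sorted key order by default
for key, rows in grouped:
    print(key, len(rows))

# .groups maps each key tuple to the index labels of the matching rows
print(grouped.groups)
```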

Then apply a custom function to each group of rows to add a new column:

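def add_average_velocity(input_rows):
    # Mean of rate/distance over the whole group, written to every row of that group
    input_rows['avg_velocity'] = (input_rows['rate']/input_rows['distance']).mean()
    return input_rows

# group_keys=False keeps the original row index. Without it, pandas 2.0+ prepends the
# group keys to the index, and reset_index() then fails because string1/string2 are already columns.
new_dataframe = dataframe.groupby(['string1','string2'], group_keys=False).apply(add_average_velocity)
print(new_dataframe)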
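Because the behaviour of groupby(...).apply() with a function that returns the whole group has shifted between pandas versions, one alternative that avoids apply() entirely is transform(), which broadcasts a per-group result back onto every row. This is a swapped-in technique rather than the answer's own code, shown as a sketch on a small hypothetical sample:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample with the same columns as above
df = pd.read_csv(StringIO(
    "string1 string2 rate distance\n"
    "A. C. 1 20\n"
    "A. B 2 30\n"
    "A. C. 2 20\n"
), sep=r'\s+')

# Per-row ratio, then the mean of that ratio within each (string1, string2) group,
# aligned back to the original rows by transform()
ratio = df['rate'] / df['distance']
df['avg_velocity'] = ratio.groupby([df['string1'], df['string2']]).transform('mean')
print(df)
```

The original row order and index are kept, so no reset_index() is needed.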

Alternatively, if you want to drop all the old data and keep only the new values:

def add_average_velocity(input_rows):
    output_data = pd.Series({'velocity':(input_rows['rate']/input_rows['distance']).mean()})
    # You can skip building a pd.Series if you are fine with the value being unnamed in the resulting DataFrame; you can always rename columns later.
    return output_data

new_dataframe = dataframe.groupby(['string1','string2']).apply(add_average_velocity).reset_index()
print(new_dataframe)
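The same "keep only the new values" result can also be produced without a custom function, using named aggregation. A sketch on a hypothetical in-memory sample; the ratio and velocity column names are illustrative:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample with the same columns as above
df = pd.read_csv(StringIO(
    "string1 string2 rate distance\n"
    "A. C. 1 20\n"
    "A. B 2 30\n"
    "A. C. 2 20\n"
), sep=r'\s+')

summary = (
    df.assign(ratio=df['rate'] / df['distance'])       # per-row ratio column
      .groupby(['string1', 'string2'], as_index=False)  # keep group keys as columns
      .agg(velocity=('ratio', 'mean'))                  # one mean per group
)
print(summary)
```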

Anonymous user

Answer 2 · edited 2024-05-29 05:57:10

Here you go:

=^..^=

import pandas as pd
from io import StringIO

# create raw data
raw_data = StringIO("""
string1 string2 rate distance
A. C. 1 20
A. B 2. 30
A. C. 2. 20""")

# load data into data frame
df = pd.read_csv(raw_data, sep=' ')
# ratio of rate to distance for each row
df['divide'] = df['rate'] / df['distance']
# drop the columns that are no longer needed
df = df.drop(columns=['rate','distance'])
# group by the two key columns and average the ratio within each group
result = df.groupby(['string1', 'string2']).mean()
print(result)

Output:

                   divide
string1 string2          
A.      B        0.066667
        C.       0.075000
