在CSV文件中追加单元格后找到五个不同类别的最小值和最大值

1 投票
2 回答
836 浏览
提问于 2025-04-17 18:38

大家好,我有个问题。我刚开始学习最小值和最大值。

我在找每个类别的五列数据的最小值和最大值时遇到了困难。

这是我现在的情况:

我把一个包含26列数据的csv文件中的5列数据移动到了一个txt文件里。

比如说,csv文件里的数据是这样的:

state          car      motorcycle   van        airplane       bike 
Maine          35.5      8.1         5.7         21.0%         33.2%
Michigan       47.9      9.1         5.5         20.40%        25.2%                   
Washington     52.5      1.2         4.6          3.50%        24.7%                     
Denver         21.8      20.5        5.3          2.90%        30.9%    

我想要把最小值和最大值整理成这样:

                   min                       max
car             Denver:      21.8          Washington: 52.5   
motor           Washington:  1.2           Denver:     20.5 
van             Washington   4.6           Maine:       5.7 
airplane        Denver       2.90%         Maine       21.0% 
bike            Washington   24.7%         Maine      33.2%                    -

这是我现在的结果:

import csv
import string, re
import operator

output = []
data = []
csv_string = []
data_file = []

try:
    with open('data.csv', 'r') as csv_string:
         for line in csv_string:
             cells = line.split(",")
             output.append((cells[0], cells[1], cells[5], cells[7], cells[11], cells[13]))
                         for lines in output:


            #state = cells[0]

            zmin = cells[1]   #car = cells[1]
            ymin = cells[1]
            xmin = cells[5]   #motor = cells[5] 
            wmin = cells[5]
            vmin = cells[7]   #van = cells[7]
            zmax = cells[7]
            ymax = cells[11]  #airplane = cells[11]
            xmax = cells[11]
            wmax = cells[13]  #bike = cells[13] 
            vmax = cells[13]

        if cells[1] < xmin: 
            zmin = cells[1] 
        if cells[1] > xmax: 
            zmax = cells[1]


        if cells[5] < ymin: 
            ymin = cells[5]
        if cells[5] > ymax: 
            ymax = cells[5]

        if cells[7] < zmin:
            xmin = cells[7]
        if cells[7] > zmax: 
            xmax = cells[7] 

        if cells[11] < zmin:
            wmin = cells[11]
        if cells[11] > zmax: 
            wmax = cells[11]

        if cells[13] < zmin:
            vmin = cells[13]
        if cells[13] > zmax: 
            vmax = cells[13]

        outstring = ' '
    for item in output:
            for cell in item:
                outstring += "{0:<35}".format(cell) #Width/Distance of each row
            outstring += "\n"

    print(outstring)

    print('Min: ',zmin,ymin,xmin,wmin,vmin)

    print('Max: ',state,zmax,ymax,xmax,wmax,vmax) 


  finally:
          f.close()

try:    
    f_write = open('output.txt', 'w') #creates the file
    try:
        f_write.writelines(outstring)

  finally:
          f.close()

我不太确定我哪里做错了。我一直在阅读关于最小值和最大值的内容,但我不明白在处理csv文件并添加5列数据时,这些知识是怎么应用的。

如果有人能给我一些指导,非常感谢你的帮助。

程序输出的数字是错误的。

 print('Min: ',zmin,ymin,xmin,wmin,vmin)
      47.9,  8.1, 5.5, 20.40%, 25.2% 
 print('Max: ',state,zmax,ymax,xmax,wmax,vmax) 
      21.8, 9.1, 4.6, 20.40%, 30.9% 

2 个回答

4

使用pandas这个库,它专门用来处理数据,这样的任务就简单多了:

import pandas as pd

c = lambda x: float(x.strip('%'))
df = pd.read_csv(f,sep='\s+', converters = {'bike':c, 'airplane':c})

vehicles = df.columns[1:]  #['car', 'motorcycle', 'van', 'airplane', 'bike']

max_v = zip(df['state'][df[vehicles].idxmax().values], 
            df[vehicles].max().values.astype('|S4'))
min_v = zip(df['state'][df[vehicles].idxmin().values],
            df[vehicles].min().values.astype('|S4'))

max_i = [': '.join(tup) for tup in max_v]
min_i = [': '.join(tup) for tup in min_v]

print pd.DataFrame({'min':min_i, 'max':max_i}, index=vehicles)

输出:

                         max               min
car         Washington: 52.5      Denver: 21.8
motorcycle      Denver: 20.5   Washington: 1.2
van               Maine: 5.7   Washington: 4.6
airplane         Maine: 21.0       Denver: 2.9
bike             Maine: 33.2  Washington: 24.7
2

你可以通过使用Python自带的csv模块来完成很多需要的操作。下面是如何找到数据中某些字段(或者说列)的最小值和最大值的方法。示例中的data.csv文件只包含了我们关心的字段,但实际上它可以包含所有26列的数据,而代码只会处理FIELDS列表中列出的字段。

import csv

ID = 'state'
FIELDS = ['car', 'motorcycle', 'van', 'airplane', 'bike']
MIN_ID, MIN, MAX_ID, MAX = 0, 1, 2, 3  # indices of data in min_maxes records

with open('data.csv', 'rb') as csv_file:
    csv_dict_reader = csv.DictReader(csv_file, delimiter=',')

    # initialize min and max values from first row of csv file
    row = csv_dict_reader.next()
    min_maxes = {field: [row[ID], float(row[field])]*2 for field in FIELDS}

    # update min and max values with data from remaining rows of csv file
    for row in csv_dict_reader:
        for id, value, min_max_rec in (
                (row[ID], float(row[field]), min_maxes[field]) for field in FIELDS):
            if value < min_max_rec[MIN]:
                min_max_rec[MIN_ID] = id
                min_max_rec[MIN]    = value
            if value > min_max_rec[MAX]:
                min_max_rec[MAX_ID] = id
                min_max_rec[MAX]    = value

print '                   min                  max'
for field in FIELDS:
    min_max_rec = min_maxes[field]
    print '{:10}    {:12}{:4.1f}      {:12}{:4.1f}'.format(field,
              min_max_rec[MIN_ID]+':', min_max_rec[MIN],
              min_max_rec[MAX_ID]+':', min_max_rec[MAX])

输入(简化版的data.csv文件):

state,car,motorcycle,van,airplane,bike
Maine,35.5,8.1,5.7,21.0,33.2
Michigan,47.9,9.1,5.5,20.40,25.2
Washington,52.5,1.2,4.6,3.,24.7
Denver,21.8,20.5,5.3,2.90,30.9

输出:

                   min                  max
car           Denver:     21.8      Washington: 52.5
motorcycle    Washington:  1.2      Denver:     20.5
van           Washington:  4.6      Maine:       5.7
airplane      Denver:      2.9      Maine:      21.0
bike          Washington: 24.7      Maine:      33.2

撰写回答