Pandas CSV averaging and sorting

Posted 2024-04-26 21:45:13


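For my final Python computing task, I have been asked to write a database program in Python that lets me access three class databases, each containing the three scores of students who took an arithmetic quiz. The program must be able to sort the data in three ways: alphabetically by name; by average, where the three scores are added and divided by three to give a single value; and with each student's scores ordered from highest to lowest. So, suppose the following is one of the CSV files: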

name1       name2 score1 score2 score3
Atticus     Finch 9      8      10
Jem         Finch 5      7      6
Jean Louise Finch 3      2      4

If the end user wants it sorted alphabetically, it should look like this in the Python IDLE GUI:

Atticus     Finch 9      8      10
Jean Louise Finch 3      2      4
Jem         Finch 5      7      6

If the end user wants it sorted by average, it should look like this:

Atticus     Finch 9
Jem         Finch 6
Jean Louise Finch 3

If the end user wants the scores sorted from highest to lowest, it should look like this:

Atticus     Finch 10     9      8
Jem         Finch 7      6      5
Jean Louise Finch 4      3      2

This is my code at the moment:

print("Welcome to the Database sorter. The system works based on the following functions. Choose your class by inputting a letter, and choose the method of sorting the data by inputing a number afterwards. A is for Class A, B is for Class B and C is the Class C.1 is for soritng the data as an average, 2 is for sorting the data in alphabetical order and 3 is for sorting the data from highest to lowest.")

classanddatasorter =''
while classanddatasorter not in ["A1","A2","A3","B1","B2","B3","C1","C2","C3"]:
 classanddatasorter = input("You have the following nine options. Input A1 to sort the results of Class A as an average. Input A2 to sort the results of Class A in alphabetical order. Input A3 to sort the results of Class A from highest to lowest. Input B1 to sort the results of Class B as an average. Input B2 to sort the results of Class B in alphabetical order. Input B3 to sort the results of Class B from highest to lowest. Input C1 to sort the results of Class C as an average. Input C2 to sort the results of Class C in alphabetical order. Input C3 to sort the results of Class C from highest to lowest. ")
if classanddatasorter == "A1":
 df = pd.read_csv('classa.csv')
 df[["score1", "score2","score3"]].mean(axis=1)

elif classanddatasorter == "A2":
 df = pd.read_csv('classa.csv')
 saved_column = df.column_name
 name = df.name
 name.sort 

elif classanddatasorter == "A3":
 df = pd.read_csv('classa.csv')
 df.sort[('score1','score2','score3'], ascending=False) 

elif classanddatasorter == "B1":
 df = pd.read_csv('classb.csv')
 df[["score1", "score2","score3"]].mean(axis=1)  

elif classanddatasorter == "B2":
 df = pd.read_csv('classb.csv')
 saved_column = df.column_name
 name = df.name

elif classanddatasorter == "B3":
 df = pd.read_csv('classb.csv')
 df.sort[('score1','score2','score3'], ascending=False)

elif classanddatasorter == "C1":
 df = pd.read_csv('classc.csv')
 df[["score1", "score2","score3"]].mean(axis=1)

elif classanddatasorter == "C2":
 bamboo = pd.read_csv('classc.csv')
 saved_column = df.column_name
 name = df.name
 name.sort 

elif classanddatasorter == "C3":
 df = pd.read_csv('classc.csv')
 df.sort[('score1','score2','score3'], ascending=False)

So far, I get the following errors:

When trying to sort by average:

 Traceback (most recent call last):
  File "C:\Users\MVMCJK\Downloads\Python code\Seperate independent draft of Task 3 (not intergated with Task 1 and 2) draft 3.py", line 70, in <module>
df[["score1", "score2","score3"]].mean(axis=1)
  File "C:\Users\MVMCJK\Anaconda3\lib\site-packages\pandas\core\frame.py", line 1791, in __getitem__
return self._getitem_array(key)
  File "C:\Users\MVMCJK\Anaconda3\lib\site-packages\pandas\core\frame.py", line 1835, in _getitem_array
indexer = self.ix._convert_to_indexer(key, axis=1)
  File "C:\Users\MVMCJK\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1112, in _convert_to_indexer
raise KeyError('%s not in index' % objarr[mask])
KeyError: "['score1' 'score2' 'score3'] not in index"

When trying to sort alphabetically:

Traceback (most recent call last):
  File "C:\Users\MVMCJK\Downloads\Python code\Seperate independent draft of Task 3 (not intergated with Task 1 and 2) draft 3.py", line 74, in <module>
saved_column = df.column_name
  File "C:\Users\MVMCJK\Anaconda3\lib\site-packages\pandas\core\generic.py", line 2150, in __getattr__
(type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute 'column_name'

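The last section doesn't even remotely work: Python refuses to run it because of invalid syntax, so I have to remove it for the program to run at all, and when I then input A3 there is no response. I tried googling the KeyError and the AttributeError, but I couldn't find anything relevant enough to my problem to lead me to a solution. Does anyone know what is wrong with my program? Any help would be greatly appreciated.

Edit: updated code that still doesn't work: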

print("Welcome to the Database sorter. The system works based on the following functions. Choose your class by inputting a letter, and choose the method of sorting the data by inputing a number afterwards. A is for Class A, B is for Class B and C is the Class C.1 is for soritng the data as an average, 2 is for sorting the data in alphabetical order and 3 is for sorting the data from highest to lowest.")
classanddatasorter =''
while classanddatasorter not in ["A1","A2","A3","B1","B2","B3","C1","C2","C3"]:
 classanddatasorter = input("You have the following nine options. Input A1 to sort the results of Class A as an average. Input A2 to sort the results of Class A in alphabetical order. Input A3 to sort the results of Class A from highest to lowest. Input B1 to sort the results of Class B as an average. Input B2 to sort the results of Class B in alphabetical order. Input B3 to sort the results of Class B from highest to lowest. Input C1 to sort the results of Class C as an average. Input C2 to sort the results of Class C in alphabetical order. Input C3 to sort the results of Class C from highest to lowest. ")
if classanddatasorter == "A1":
 df = pd.read_csv('classa.csv')
 df['average'] = df[['score1', 'score2', 'score3']].mean(axis=1)

elif classanddatasorter == "A2":
 df = pd.read_csv('classa.csv', index_col='name1')
 saved_column = df.column_name
 name = df.name
 name.sort 

elif classanddatasorter == "A3":
 df = pd.read_csv('classa.csv')
 scores = df[['score1', 'score2', 'score3']].values
 scores.sort(axis=1)


elif classanddatasorter == "B1":
 df = pd.read_csv('classb.csv')
 df['average'] = df[["score1", "score2","score3"]].mean(axis=1)


elif classanddatasorter == "B2":
 df = pd.read_csv('classb.csv',index_col='name1')
 saved_column = df.column_name
 name = df.name

elif classanddatasorter == "B3":
 df = pd.read_csv('classb.csv')
 scores = df[['score1', 'score2', 'score3']].values
 scores.sort(axis=1)

elif classanddatasorter == "C1":
 df = pd.read_csv('classc.csv')
 df['average'] = df[["score1", "score2","score3"]].mean(axis=1)

elif classanddatasorter == "C2":
 df = pd.read_csv('classc.csv',index_col='name1')
 saved_column = df.column_name
 name = df.name
 df = name.sort 

elif classanddatasorter == "C3":
 df = pd.read_csv('classc.csv')
 scores = df[['score1', 'score2', 'score3']].values
 scores.sort(axis=1)

Edit 2: updated with some of bakkal's code examples.

print("Welcome to the Database sorter. The system works based on the following functions. Choose your class by inputting a letter, and choose the method of sorting the data by inputing a number afterwards. A is for Class A, B is for Class B and C is the Class C.1 is for soritng the data as an average, 2 is for sorting the data in alphabetical order and 3 is for sorting the data from highest to lowest.")
classanddatasorter =''
while classanddatasorter not in ["A1","A2","A3","B1","B2","B3","C1","C2","C3"]:
 classanddatasorter = input("You have the following nine options. Input A1 to sort the results of Class A as an average. Input A2 to sort the results of Class A in alphabetical order. Input A3 to sort the results of Class A from highest to lowest. Input B1 to sort the results of Class B as an average. Input B2 to sort the results of Class B in alphabetical order. Input B3 to sort the results of Class B from highest to lowest. Input C1 to sort the results of Class C as an average. Input C2 to sort the results of Class C in alphabetical order. Input C3 to sort the results of Class C from highest to lowest. ")

if classanddatasorter == "A1":
 df = pd.read_csv('classa.csv')
 print('Sorted by name1')
 df.sort('name1')
 print(df)
elif classanddatasorter == "A2":
 df = pd.read_csv('classa.csv')
 print('Sorted by average column')
 df['average'] = df[['score1', 'score2', 'score3']].mean(axis=1)
 print(df)
 print(df[['name1', 'name2', 'average']].sort('average'))
elif classanddatasorter == "A3":
 df = pd.read_csv('classa.csv')
 print('Sorted scores')
 scores = df[['score1', 'score2', 'score3']].values
 scores.sort(axis=1)

 for i in xrange(0, scores.shape[1]):
     column_name = 'rank{}'.format(i)
     df[column_name] = scores[:, i]

print(df[['name1', 'name2', 'rank2', 'rank1', 'rank0']])
elif classanddatasorter == "B1":
 df = pd.read_csv('classb.csv')
 print('Sorted by name1')
 df.sort('name1')
 print(df)
elif classanddatasorter == "B2":
 df = pd.read_csv('classb.csv')
 print('Sorted by average column')
 df['average'] = df[['score1', 'score2', 'score3']].mean(axis=1)
 print(df)
 print(df[['name1', 'name2', 'average']].sort('average'))
elif classanddatasorter == "B3":
 df = pd.read_csv('classb.csv')
 print('Sorted scores')
 scores = df[['score1', 'score2', 'score3']].values
 scores.sort(axis=1)

for i in xrange(0, scores.shape[1]):
    column_name = 'rank{}'.format(i)
    df[column_name] = scores[:, i]

print(df[['name1', 'name2', 'rank2', 'rank1', 'rank0']])
elif classanddatasorter == "C1":
 df = pd.read_csv('classc.csv')
 print('Sorted by name1')
 df.sort('name1')
 print(df)
elif classanddatasorter == "C2":
 df = pd.read_csv('classc.csv')
 print('Sorted by average column')
 df['average'] = df[['score1', 'score2', 'score3']].mean(axis=1)
 print(df)
 print(df[['name1', 'name2', 'average']].sort('average'))
elif classanddatasorter == "C3":
 df = pd.read_csv('classc.csv')
 print('Sorted scores')
 scores = df[['score1', 'score2', 'score3']].values
 scores.sort(axis=1)

 for i in xrange(0, scores.shape[1]):
     column_name = 'rank{}'.format(i)
     df[column_name] = scores[:, i]

print(df[['name1', 'name2', 'rank2', 'rank1', 'rank0']]) 

1 answer
User
Answer 1 · Posted 2024-04-26 21:45:13

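Parsing and exploring

Suppose we have a CSV file like this (note that the fields are separated by commas; for any other layout, such as the space-aligned columns shown in the question, you will need to pass read_csv options that describe the format)

scores.csv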

name1,name2,score1,score2,score3
Atticus,Finch,9,8,10
Jem,Finch,5,7,6
Jean Louise,Finch,3,2,4

We read the CSV file

import pandas as pd
df = pd.read_csv('scores.csv')

Now df is:

         name1  name2  score1  score2  score3
0      Atticus  Finch       9       8      10
1          Jem  Finch       5       7       6
2  Jean Louise  Finch       3       2       4

df.columns is:

Index([u'name1', u'name2', u'score1', u'score2', u'score3'], dtype='object')

As you can see, df has a columns attribute but no column_name attribute, hence your error

AttributeError: 'DataFrame' object has no attribute 'column_name'
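
The KeyError in the question likely has a similar cause: the column labels pandas parsed are not the ones the code asks for. Here is a minimal sketch for checking that, assuming (hypothetically, since the real file is not shown) that the file has a space after each comma. The inline text stands in for the file on disk.

```python
import io
import pandas as pd

# Hypothetical file contents with a space after every comma.
raw = (
    "name1, name2, score1, score2, score3\n"
    "Atticus, Finch, 9, 8, 10\n"
    "Jem, Finch, 5, 7, 6\n"
)

# Parsed with the defaults, the header labels keep their leading spaces,
# so df[['score1', ...]] would raise a KeyError.
naive = pd.read_csv(io.StringIO(raw))
print(list(naive.columns))

# skipinitialspace=True drops the spaces that follow each delimiter.
clean = pd.read_csv(io.StringIO(raw), skipinitialspace=True)
print(list(clean.columns))
```

Printing list(df.columns) right after read_csv is usually the quickest way to see which labels are really available.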

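Sorting

Now let's sort alphabetically (DataFrame.sort was removed in pandas 0.20; sort_values is its replacement)

df.sort_values('name1')

The result is: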

         name1  name2  score1  score2  score3
0      Atticus  Finch       9       8      10
2  Jean Louise  Finch       3       2       4
1          Jem  Finch       5       7       6
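
One detail that is easy to miss: sort_values returns a new DataFrame and leaves the original untouched. The sketch below rebuilds the sample data inline (so it runs without scores.csv) and shows both the returned copy and the assign-back form.

```python
import pandas as pd

# Same rows as scores.csv, built inline so the example is self-contained.
df = pd.DataFrame({
    "name1": ["Atticus", "Jem", "Jean Louise"],
    "name2": ["Finch", "Finch", "Finch"],
    "score1": [9, 5, 3],
    "score2": [8, 7, 2],
    "score3": [10, 6, 4],
})

# Returns a sorted copy; the original index labels travel with the rows.
by_name = df.sort_values("name1")
print(by_name)

# To keep the sorted order, assign the result back.
# reset_index(drop=True) renumbers the rows 0, 1, 2.
df_sorted = df.sort_values("name1").reset_index(drop=True)
print(df_sorted)
```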

You want the average, so let's add a column

df['average'] = df[['score1', 'score2', 'score3']].mean(axis=1)

df now has a new column that you can sort by!

         name1  name2  score1  score2  score3  average
0      Atticus  Finch       9       8      10        9
1          Jem  Finch       5       7       6        6
2  Jean Louise  Finch       3       2       4        3

If you only want to see the average, sorted from highest to lowest

df[['name1', 'name2', 'average']].sort_values('average', ascending=False)


         name1  name2  average
0      Atticus  Finch        9
1          Jem  Finch        6
2  Jean Louise  Finch        3
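
Because the sample rows already happen to be in descending order of average, the effect of the sort direction is easy to overlook. A small self-contained sketch, with the data rebuilt inline, compares both directions:

```python
import pandas as pd

df = pd.DataFrame({
    "name1": ["Atticus", "Jem", "Jean Louise"],
    "name2": ["Finch", "Finch", "Finch"],
    "score1": [9, 5, 3],
    "score2": [8, 7, 2],
    "score3": [10, 6, 4],
})

score_cols = ["score1", "score2", "score3"]

# axis=1 averages across the three score columns of each row.
df["average"] = df[score_cols].mean(axis=1)

summary = df[["name1", "name2", "average"]]
lowest_first = summary.sort_values("average")                    # default: ascending
highest_first = summary.sort_values("average", ascending=False)  # best average on top

print(highest_first)
```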

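Since the data is not tidy/normalized, the last sort you want (each student's scores from highest to lowest) is a bit trickier, but here is an attempt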

scores = df[['score1', 'score2', 'score3']].values

scores now looks like this

array([[ 9,  8, 10],
       [ 5,  7,  6],
       [ 3,  2,  4]])

We sort the scores array in place, within each row

scores.sort(axis=1)

array([[ 8,  9, 10],
       [ 5,  6,  7],
       [ 2,  3,  4]])

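These are the sorted scores you want, so let's put them into our df. We have to do this for each score column, so we use scores.shape[1], which is the number of columns in the 2D array

for i in range(0, scores.shape[1]):
    column_name = 'rank{}'.format(i)
    df[column_name] = scores[:, i]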

Now our df looks like this

         name1  name2  score1  score2  score3  rank0  rank1  rank2
0      Atticus  Finch       9       8      10      8      9     10
1          Jem  Finch       5       7       6      5      6      7
2  Jean Louise  Finch       3       2       4      2      3      4

To get the display you want

df[['name1', 'name2', 'rank2', 'rank1', 'rank0']]


         name1  name2  rank2  rank1  rank0
0      Atticus  Finch     10      9      8
1          Jem  Finch      7      6      5
2  Jean Louise  Finch      4      3      2
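
As a possible alternative to the loop, the row-wise sort can be done in one step with np.sort, which returns a sorted copy instead of sorting in place. This is only a sketch: the column names highest, middle and lowest are made up for illustration, and the data is rebuilt inline.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name1": ["Atticus", "Jem", "Jean Louise"],
    "name2": ["Finch", "Finch", "Finch"],
    "score1": [9, 5, 3],
    "score2": [8, 7, 2],
    "score3": [10, 6, 4],
})

score_cols = ["score1", "score2", "score3"]

# np.sort sorts each row in ascending order (axis=1) and returns a copy;
# [:, ::-1] then reverses every row so the highest score comes first.
descending = np.sort(df[score_cols].to_numpy(), axis=1)[:, ::-1]

# Hypothetical column names, chosen only for this example.
ranked = pd.DataFrame(descending, columns=["highest", "middle", "lowest"], index=df.index)

result = pd.concat([df[["name1", "name2"]], ranked], axis=1)
print(result)
```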

Tidy data

You can read more about tidy data in this PDF paper

Basically, many operations are easier if your data looks like this

name, test, score
bob, 1, 10
bob, 2, 9

instead of

name, score1, score2
bob, 10, 9
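
To illustrate, here is a rough sketch of converting the wide layout into the long one with DataFrame.melt, after which the average becomes an ordinary groupby. The second student, amy, is invented for the example. Note that melt keeps the original column names (score1, score2) in the test column rather than the numbers 1 and 2.

```python
import pandas as pd

# Wide layout: one column per test. "amy" is a made-up second row.
wide = pd.DataFrame({
    "name": ["bob", "amy"],
    "score1": [10, 7],
    "score2": [9, 8],
})

# Long ("tidy") layout: one row per (name, test) pair.
tidy = wide.melt(id_vars="name", var_name="test", value_name="score")
print(tidy)

# Average per student, best first.
averages = tidy.groupby("name")["score"].mean().sort_values(ascending=False)
print(averages)

# Each student's scores from highest to lowest.
ranked = tidy.sort_values(["name", "score"], ascending=[True, False])
print(ranked)
```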

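Python script

import pandas as pd
df = pd.read_csv('scores.csv')

print('Original Data')
print(df)

print('Sorted by name1')
# sort_values returns a new DataFrame; df itself is left unchanged
print(df.sort_values('name1'))

print('Sorted by average column')
df['average'] = df[['score1', 'score2', 'score3']].mean(axis=1)
print(df)
print(df[['name1', 'name2', 'average']].sort_values('average', ascending=False))

print('Sorted scores')
# Copy the scores so the in-place sort cannot touch df's own data
scores = df[['score1', 'score2', 'score3']].to_numpy(copy=True)
scores.sort(axis=1)  # ascending within each row

# range replaces Python 2's xrange
for i in range(0, scores.shape[1]):
    column_name = 'rank{}'.format(i)
    df[column_name] = scores[:, i]

# rank2 holds the highest score, rank0 the lowest
print(df[['name1', 'name2', 'rank2', 'rank1', 'rank0']])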
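
The nine near-identical branches in the question can also be collapsed into two lookups: class letter to file, and option number to sorting function. The following is only one possible sketch. The helper names (by_average, by_name, by_score, run) and the output column names are invented here; the file names and the meaning of options 1 to 3 follow the question's menu text. The demo at the end feeds an in-memory CSV instead of a real file.

```python
import io
import numpy as np
import pandas as pd

SCORE_COLS = ["score1", "score2", "score3"]
FILES = {"A": "classa.csv", "B": "classb.csv", "C": "classc.csv"}  # names from the question

def by_average(df):
    """Option 1: one average per student, highest first."""
    out = df[["name1", "name2"]].copy()
    out["average"] = df[SCORE_COLS].mean(axis=1)
    return out.sort_values("average", ascending=False)

def by_name(df):
    """Option 2: alphabetical by first name."""
    return df.sort_values("name1")

def by_score(df):
    """Option 3: each student's scores from highest to lowest."""
    desc = np.sort(df[SCORE_COLS].to_numpy(), axis=1)[:, ::-1]
    ranked = pd.DataFrame(desc, columns=["highest", "middle", "lowest"], index=df.index)
    return pd.concat([df[["name1", "name2"]], ranked], axis=1)

SORTERS = {"1": by_average, "2": by_name, "3": by_score}

def run(choice, files=FILES):
    """choice is a two-character code such as 'A1' or 'C3'."""
    class_letter, method = choice[0], choice[1]
    df = pd.read_csv(files[class_letter])
    return SORTERS[method](df)

# Demo with in-memory data standing in for classa.csv.
SAMPLE = (
    "name1,name2,score1,score2,score3\n"
    "Atticus,Finch,9,8,10\n"
    "Jem,Finch,5,7,6\n"
    "Jean Louise,Finch,3,2,4\n"
)
print(run("A3", files={"A": io.StringIO(SAMPLE)}))
```

With this shape, adding a fourth class or a fourth sort order means adding one dictionary entry rather than another copy of the branch.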

Instead of using print(), you can also save the resulting DataFrame to another .csv file, e.g. with .to_csv('score_sorted_avg.csv')
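
A short sketch of that, writing into a temporary directory so the example does not clutter the working folder (in a real script you would probably pass a plain file name). index=False keeps the row labels out of the file, which avoids an extra unnamed column when it is read back.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({
    "name1": ["Atticus", "Jem", "Jean Louise"],
    "name2": ["Finch", "Finch", "Finch"],
    "score1": [9, 5, 3],
    "score2": [8, 7, 2],
    "score3": [10, 6, 4],
})
df["average"] = df[["score1", "score2", "score3"]].mean(axis=1)

# Temporary location used only for this demonstration.
out_path = os.path.join(tempfile.mkdtemp(), "score_sorted_avg.csv")

# Write the rows sorted by average, without the index column.
df.sort_values("average", ascending=False).to_csv(out_path, index=False)

# Read the file back to confirm what was saved.
saved = pd.read_csv(out_path)
print(saved)
```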
