Pandas CSV: averaging and sorting

Traceback (most recent call last):
  File "C:\Users\MVMCJK\Downloads\Python code\Seperate independent draft of Task 3 (not intergated with Task 1 and 2) draft 3.py", line 70, in <module>
    df[["score1", "score2","score3"]].mean(axis=1)
  File "C:\Users\MVMCJK\Anaconda3\lib\site-packages\pandas\core\frame.py", line 1791, in __getitem__
    return self._getitem_array(key)
  File "C:\Users\MVMCJK\Anaconda3\lib\site-packages\pandas\core\frame.py", line 1835, in _getitem_array
    indexer = self.ix._convert_to_indexer(key, axis=1)
  File "C:\Users\MVMCJK\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1112, in _convert_to_indexer
    raise KeyError('%s not in index' % objarr[mask])
KeyError: "['score1' 'score2' 'score3'] not in index"

Traceback (most recent call last):
  File "C:\Users\MVMCJK\Downloads\Python code\Seperate independent draft of Task 3 (not intergated with Task 1 and 2) draft 3.py", line 74, in <module>
    saved_column = df.column_name
  File "C:\Users\MVMCJK\Anaconda3\lib\site-packages\pandas\core\generic.py", line 2150, in __getattr__
    (type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute 'column_name'

print("Welcome to the Database sorter. The system works based on the following functions. Choose your class by inputting a letter, and choose the method of sorting the data by inputing a number afterwards. A is for Class A, B is for Class B and C is the Class C.1 is for soritng the data as an average, 2 is for sorting the data in alphabetical order and 3 is for sorting the data from highest to lowest.")
classanddatasorter = ''
while classanddatasorter not in ["A1","A2","A3","B1","B2","B3","C1","C2","C3"]:
    classanddatasorter = input("You have the following nine options. Input A1 to sort the results of Class A as an average. Input A2 to sort the results of Class A in alphabetical order. Input A3 to sort the results of Class A from highest to lowest. Input B1 to sort the results of Class B as an average. Input B2 to sort the results of Class B in alphabetical order. Input B3 to sort the results of Class B from highest to lowest. Input C1 to sort the results of Class C as an average. Input C2 to sort the results of Class C in alphabetical order. Input C3 to sort the results of Class C from highest to lowest. \n")
if classanddatasorter == "A1":
    df = pd.read_csv('classa.csv')
    df['average'] = df[['score1', 'score2', 'score3']].mean(axis=1)
elif classanddatasorter == "A2":
    df = pd.read_csv('classa.csv', index_col='name1')
    saved_column = df.column_name
    name = df.name
    name.sort
elif classanddatasorter == "A3":
    df = pd.read_csv('classa.csv')
    scores = df[['score1', 'score2', 'score3']].values
    scores.sort(axis=1)
elif classanddatasorter == "B1":
    df = pd.read_csv('classb.csv')
    df['average'] = df[["score1", "score2","score3"]].mean(axis=1)
elif classanddatasorter == "B2":
    df = pd.read_csv('classb.csv',index_col='name1')
    saved_column = df.column_name
    name = df.name
elif classanddatasorter == "B3":
    df = pd.read_csv('classb.csv')
    scores = df[['score1', 'score2', 'score3']].values
    scores.sort(axis=1)
elif classanddatasorter == "C1":
    df = pd.read_csv('classc.csv')
    df['average'] = df[["score1", "score2","score3"]].mean(axis=1)
elif classanddatasorter == "C2":
    df = pd.read_csv('classc.csv',index_col='name1')
    saved_column = df.column_name
    name = df.name
    df = name.sort
elif classanddatasorter == "C3":
    df = pd.read_csv('classc.csv')
    scores = df[['score1', 'score2', 'score3']].values
    scores.sort(axis=1)

print("Welcome to the Database sorter. The system works based on the following functions. Choose your class by inputting a letter, and choose the method of sorting the data by inputing a number afterwards. A is for Class A, B is for Class B and C is the Class C.1 is for soritng the data as an average, 2 is for sorting the data in alphabetical order and 3 is for sorting the data from highest to lowest.")
classanddatasorter = ''
while classanddatasorter not in ["A1","A2","A3","B1","B2","B3","C1","C2","C3"]:
    classanddatasorter = input("You have the following nine options. Input A1 to sort the results of Class A as an average. Input A2 to sort the results of Class A in alphabetical order. Input A3 to sort the results of Class A from highest to lowest. Input B1 to sort the results of Class B as an average. Input B2 to sort the results of Class B in alphabetical order. Input B3 to sort the results of Class B from highest to lowest. Input C1 to sort the results of Class C as an average. Input C2 to sort the results of Class C in alphabetical order. Input C3 to sort the results of Class C from highest to lowest. \n")
if classanddatasorter == "A1":
    df = pd.read_csv('classa.csv')
    print('Sorted by name1')
    df.sort('name1')
    print(df)
elif classanddatasorter == "A2":
    df = pd.read_csv('classa.csv')
    print('Sorted by average column')
    df['average'] = df[['score1', 'score2', 'score3']].mean(axis=1)
    print(df)
    print(df[['name1', 'name2', 'average']].sort('average'))
elif classanddatasorter == "A3":
    df = pd.read_csv('classa.csv')
    print('Sorted scores')
    scores = df[['score1', 'score2', 'score3']].values
    scores.sort(axis=1)
    for i in xrange(0, scores.shape[1]):
        column_name = 'rank{}'.format(i)
        df[column_name] = scores[:, i]
    print(df[['name1', 'name2', 'rank2', 'rank1', 'rank0']])
elif classanddatasorter == "B1":
    df = pd.read_csv('classb.csv')
    print('Sorted by name1')
    df.sort('name1')
    print(df)
elif classanddatasorter == "B2":
    df = pd.read_csv('classb.csv')
    print('Sorted by average column')
    df['average'] = df[['score1', 'score2', 'score3']].mean(axis=1)
    print(df)
    print(df[['name1', 'name2', 'average']].sort('average'))
elif classanddatasorter == "B3":
    df = pd.read_csv('classb.csv')
    print('Sorted scores')
    scores = df[['score1', 'score2', 'score3']].values
    scores.sort(axis=1)
    for i in xrange(0, scores.shape[1]):
        column_name = 'rank{}'.format(i)
        df[column_name] = scores[:, i]
    print(df[['name1', 'name2', 'rank2', 'rank1', 'rank0']])
elif classanddatasorter == "C1":
    df = pd.read_csv('classc.csv')
    print('Sorted by name1')
    df.sort('name1')
    print(df)
elif classanddatasorter == "C2":
    df = pd.read_csv('classc.csv')
    print('Sorted by average column')
    df['average'] = df[['score1', 'score2', 'score3']].mean(axis=1)
    print(df)
    print(df[['name1', 'name2', 'average']].sort('average'))
elif classanddatasorter == "C3":
    df = pd.read_csv('classc.csv')
    print('Sorted scores')
    scores = df[['score1', 'score2', 'score3']].values
    scores.sort(axis=1)
    for i in xrange(0, scores.shape[1]):
        column_name = 'rank{}'.format(i)
        df[column_name] = scores[:, i]
    print(df[['name1', 'name2', 'rank2', 'rank1', 'rank0']])

1 answer

User

#1 · Posted on 2024-04-26 21:45:13

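Parsing and exploring

Suppose we have a CSV file like this (note that there are no spaces after the commas; if your file has them, you will need to use CSV options to specify the format)

scores.csv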

name1,name2,score1,score2,score3
Atticus,Finch,9,8,10
Jem,Finch,5,7,6
Jean Louise,Finch,3,2,4

We read the CSV file

import pandas as pd
df = pd.read_csv('scores.csv')

Now df is:

         name1  name2  score1  score2  score3
0      Atticus  Finch       9       8      10
1          Jem  Finch       5       7       6
2  Jean Louise  Finch       3       2       4

And df.columns is:

Index([u'name1', u'name2', u'score1', u'score2', u'score3'], dtype='object')

As you can see, df has a columns attribute but no column_name attribute, which is why you get this error:

AttributeError: 'DataFrame' object has no attribute 'column_name'
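The KeyError in the question has the same shape: the requested labels are not among df.columns. Here is a minimal sketch of how one might check the labels before selecting them. The in-memory string stands in for scores.csv, and the `names=` fallback is only an assumption about one possible cause (a file without a header row), not a diagnosis of the asker's files.

```python
import io
import pandas as pd

# In-memory stand-in for scores.csv (same rows as above).
csv_text = """name1,name2,score1,score2,score3
Atticus,Finch,9,8,10
Jem,Finch,5,7,6
Jean Louise,Finch,3,2,4
"""
df = pd.read_csv(io.StringIO(csv_text))

# Columns are accessed by their real labels, not by a placeholder name.
names = df['name1']                             # same data as df.name1
has_placeholder = 'column_name' in df.columns   # False for this file

# List any expected labels that are missing before selecting them.
expected = ['score1', 'score2', 'score3']
missing = [c for c in expected if c not in df.columns]
print('missing columns:', missing)

# Assumption: if a file had no header row, labels could be supplied explicitly.
headerless = "Atticus,Finch,9,8,10\nJem,Finch,5,7,6\n"
df2 = pd.read_csv(io.StringIO(headerless), header=None,
                  names=['name1', 'name2'] + expected)
```

If `missing` is not empty, printing `df.columns` usually shows what the labels really are (for example, stray spaces or a first data row used as the header).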

Sorting

Now let's sort alphabetically

df.sort_values('name1')  # DataFrame.sort() was removed in pandas 0.20

The result is:

         name1  name2  score1  score2  score3
0      Atticus  Finch       9       8      10
2  Jean Louise  Finch       3       2       4
1          Jem  Finch       5       7       6
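Note that sorting returns a new DataFrame and leaves df itself untouched, which is easy to miss. A small sketch of that behaviour, using an in-memory copy of the same data (the variable names are illustrative only):

```python
import io
import pandas as pd

csv_text = """name1,name2,score1,score2,score3
Atticus,Finch,9,8,10
Jem,Finch,5,7,6
Jean Louise,Finch,3,2,4
"""
df = pd.read_csv(io.StringIO(csv_text))

# sort_values returns a sorted copy; df keeps its original row order.
sorted_df = df.sort_values('name1')
print(sorted_df)

# To keep the sorted order, assign the result; reset_index renumbers the rows.
df_sorted_reset = df.sort_values('name1').reset_index(drop=True)
```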

You want the average, so let's add a column

df['average'] = df[['score1', 'score2', 'score3']].mean(axis=1)

df now has a new column that you can sort on! Note that the mean is a float:

         name1  name2  score1  score2  score3  average
0      Atticus  Finch       9       8      10      9.0
1          Jem  Finch       5       7       6      6.0
2  Jean Louise  Finch       3       2       4      3.0

If you only want to see the average column, sorted from highest to lowest

df[['name1', 'name2', 'average']].sort_values('average', ascending=False)


         name1  name2  average
0      Atticus  Finch      9.0
1          Jem  Finch      6.0
2  Jean Louise  Finch      3.0
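The sort direction is controlled by the `ascending` parameter of `sort_values`, which defaults to True (lowest first). A short sketch comparing both directions on the same data, loaded here from an in-memory string for convenience:

```python
import io
import pandas as pd

csv_text = """name1,name2,score1,score2,score3
Atticus,Finch,9,8,10
Jem,Finch,5,7,6
Jean Louise,Finch,3,2,4
"""
df = pd.read_csv(io.StringIO(csv_text))

# Row-wise mean of the three score columns, e.g. (9 + 8 + 10) / 3 = 9.0.
df['average'] = df[['score1', 'score2', 'score3']].mean(axis=1)

lowest_first = df.sort_values('average')                    # default: ascending
highest_first = df.sort_values('average', ascending=False)  # descending

print(highest_first[['name1', 'name2', 'average']])
```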

The last thing you want, sorting the scores, is a bit tricky because the data is not tidy/normalized, but here is an attempt

scores = df[['score1', 'score2', 'score3']].values

scores now looks like this

array([[ 9,  8, 10],
       [ 5,  7,  6],
       [ 3,  2,  4]])

We sort the scores array in place along each row

scores.sort(axis=1)

array([[ 8,  9, 10],
       [ 5,  6,  7],
       [ 2,  3,  4]])

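These are the sorted scores you want, so let's put them into our df. We have to do this for each score column, so we use scores.shape[1], which is the number of columns in the 2D array

for i in range(0, scores.shape[1]):  # range replaces Python 2's xrange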
    column_name = 'rank{}'.format(i)
    df[column_name] = scores[:, i]

Now our df looks like this

         name1  name2  score1  score2  score3  rank0  rank1  rank2
0      Atticus  Finch       9       8      10      8      9     10
1          Jem  Finch       5       7       6      5      6      7
2  Jean Louise  Finch       3       2       4      2      3      4

To get the presentation you want

df[['name1', 'name2', 'rank2', 'rank1', 'rank0']]


         name1  name2  rank2  rank1  rank0
0      Atticus  Finch     10      9      8
1          Jem  Finch      7      6      5
2  Jean Louise  Finch      4      3      2
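As an alternative to sorting in place and then listing the rank columns in reverse, the rows can be sorted and flipped in one step. This is only a sketch of a different route to the same display; the column names top1, top2, top3 are hypothetical and, unlike rank0 above, top1 holds the highest score.

```python
import io
import numpy as np
import pandas as pd

csv_text = """name1,name2,score1,score2,score3
Atticus,Finch,9,8,10
Jem,Finch,5,7,6
Jean Louise,Finch,3,2,4
"""
df = pd.read_csv(io.StringIO(csv_text))
score_cols = ['score1', 'score2', 'score3']

# np.sort returns a sorted copy (ascending within each row);
# [:, ::-1] reverses the column order, giving highest to lowest.
desc = np.sort(df[score_cols].to_numpy(), axis=1)[:, ::-1]

# Hypothetical column names: top1 is each student's highest score.
top_cols = ['top{}'.format(i + 1) for i in range(desc.shape[1])]
ranked = pd.concat(
    [df[['name1', 'name2']],
     pd.DataFrame(desc, columns=top_cols, index=df.index)],
    axis=1,
)
print(ranked)
```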

Tidy data

You can read more about tidy data in this PDF paper

Basically, many operations are easier if your data looks like this

name, test, score
bob, 1, 10
bob, 2, 9

rather than

name, score1, score2
bob, 10, 9
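To illustrate, here is a sketch of reshaping the wide scores table into the long layout with `DataFrame.melt`, after which the average and the per-student ordering become ordinary group and sort operations. The column names test and score follow the layout shown above; the variable names are illustrative only.

```python
import io
import pandas as pd

csv_text = """name1,name2,score1,score2,score3
Atticus,Finch,9,8,10
Jem,Finch,5,7,6
Jean Louise,Finch,3,2,4
"""
df = pd.read_csv(io.StringIO(csv_text))

# Wide -> long: one row per (student, test) pair.
tidy = df.melt(
    id_vars=['name1', 'name2'],
    value_vars=['score1', 'score2', 'score3'],
    var_name='test',
    value_name='score',
)

# Average per student, highest first.
avg = tidy.groupby(['name1', 'name2'])['score'].mean().sort_values(ascending=False)

# Each student's scores from highest to lowest.
ordered = tidy.sort_values(['name1', 'score'], ascending=[True, False])

print(avg)
print(ordered)
```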

Python script

import pandas as pd
df = pd.read_csv('scores.csv')

print('Original Data')
print(df)

print('Sorted by name1')
# sort_values returns a new DataFrame, so print the result directly
print(df.sort_values('name1'))

print('Sorted by average column')
df['average'] = df[['score1', 'score2', 'score3']].mean(axis=1)
print(df)
print(df[['name1', 'name2', 'average']].sort_values('average', ascending=False))

print('Sorted scores')
scores = df[['score1', 'score2', 'score3']].values
scores.sort(axis=1)

for i in range(0, scores.shape[1]):
    column_name = 'rank{}'.format(i)
    df[column_name] = scores[:, i]

print(df[['name1', 'name2', 'rank2', 'rank1', 'rank0']])

Instead of print(), you can also save the resulting DataFrame to another .csv file, for example with .to_csv('score_sorted_avg.csv')
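Putting the pieces together, the nine branches in the question could be collapsed into two lookups, one for the class file and one for the action, with the result saved through `to_csv`. This is a sketch under assumptions: the helper names (by_average, by_name, by_scores, run, fake_read) are hypothetical, the 1/2/3 meanings follow the prompt text in the question, and the demo reads an in-memory sample instead of classa.csv and writes to a temporary directory.

```python
import io
import os
import tempfile

import numpy as np
import pandas as pd

SCORE_COLS = ['score1', 'score2', 'score3']
FILES = {'A': 'classa.csv', 'B': 'classb.csv', 'C': 'classc.csv'}

def by_average(df):
    """Add an average column and sort by it, highest first."""
    out = df.copy()
    out['average'] = out[SCORE_COLS].mean(axis=1)
    return out.sort_values('average', ascending=False)

def by_name(df):
    """Sort alphabetically by name1."""
    return df.sort_values('name1')

def by_scores(df):
    """Show each student's scores from highest to lowest."""
    desc = np.sort(df[SCORE_COLS].to_numpy(), axis=1)[:, ::-1]
    out = df[['name1', 'name2']].copy()
    for i in range(desc.shape[1]):
        out['top{}'.format(i + 1)] = desc[:, i]
    return out

ACTIONS = {'1': by_average, '2': by_name, '3': by_scores}

def run(choice, read=pd.read_csv):
    """choice is a code such as 'A1': class letter followed by action digit."""
    df = read(FILES[choice[0]])
    return ACTIONS[choice[1]](df)

# Demo with an in-memory sample instead of the real class files.
SAMPLE = """name1,name2,score1,score2,score3
Atticus,Finch,9,8,10
Jem,Finch,5,7,6
Jean Louise,Finch,3,2,4
"""

def fake_read(path):
    return pd.read_csv(io.StringIO(SAMPLE))

result = run('A1', read=fake_read)

# index=False keeps the row labels out of the saved file.
out_path = os.path.join(tempfile.mkdtemp(), 'score_sorted_avg.csv')
result.to_csv(out_path, index=False)
round_trip = pd.read_csv(out_path)
```

With the real files in place, `run(classanddatasorter)` would replace the whole if/elif chain, since the default `read` is `pd.read_csv`.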
