如何使用NAN选择Pandas数据集中的所有行,然后将其转换为新列?

2024-04-18 05:32:30 发布

您现在位置:Python中文网/ 问答频道 /正文

[原始数据][1]我有一套全年阅读的书名数据。我只想更改原始数据,添加一个名为Date的新列


    Tittle         Page        Author    Rating
0   Monday, Mar. 1  NaN         NaN      NaN
1   Tittle 1        5.0         JHK      1.50
2   Tittle 2        13.0        ABB      0.03
3   Tittle 3        100.0       ACC      3.5
4   Tittle 4        9.0          NN      5.40
5   Tuesday, Jan. 2 NaN         NaN       NaN
6   Tittle 5        6.0         BBB      6.50
7   Tittle 7        14.0        CCC      10.00
8   Tittle 8        10.0        CNN      2.50
9   Wednesday, Dec.3 NaN        NaN      NaN
10  Tittle 10       5.0         CBS      1.00
11  Title 20        5.0         ABC      1.00
12  Title 21        25.0        JJJ      3.50
13  Title 22        1.0         NNN      7.50
14  Thursday, Mar.4 NaN         NaN      NaN
15  Title 25        100.0       VVV      9.00
16  Title 30        6.0         YYYY     9.00
17  Title 35        2.0         QQQ      9.00

我尝试过使用dropna(),但最终它只是去掉了整行

    dfs = pd.read_csv('Book2.csv')
    df = dfs.dropna()
    display(df)
        Tittle  Page    Author  Rating
1   Tittle 1    5.0     JHK     1.50
2   Tittle 2    13.0    ABB     0.03
4   Tittle 4    9.0     tvN     5.40
6   Tittle 5    6.0     BBB     6.50
7   Tittle 7    14.0    CCC     10.00
8   Tittle 8    10.0    CNN     2.50
10  Tittle 10   5.0     CBS     1.00
11  Title 20    5.0     ABC     1.00
12  Title 21    25.0    JJJ     3.50
13  Title 22    1.0     NNN     7.50
15  Title 25    100.0   VVV     9.00
16  Title 30    6.0     YYYY    9.00
17  Title 35    2.0     QQQ     9.00

我曾尝试使用pd.isna()创建一个新的数据帧,但结果并不是我想要的样子

    dfs = pd.read_csv('Book2.csv')
    df = dfs[dfs.isnull().any(axis=1)]
    display(df)

    Tittle  Page    Author  Rating
0   Monday, Mar. 1  NaN NaN NaN
5   Tuesday, Jan. 2 NaN NaN NaN
9   Wednesday, Dec. 3   NaN NaN NaN
14  Thursday, Mar. 4    NaN NaN NaN

最后,我只是在excel中手动编辑它,使它看起来像我想要的那样。 [我希望它看起来像这样。][2]

    Tittle  Page    Author  Rating  Date
0   Tittle 1    5   JHK 1.50    Monday, Mar. 1
1   Tittle 2    13  ABB 0.03    Monday, Mar. 1
2   Tittle 3    100 ACC 4.50    Monday, Mar. 1
3   Tittle 4    9   tvN 5.40    Monday, Mar. 1
4   Tittle 5    6   BBB 6.50    Tuesday, Jan. 2
5   Tittle 7    14  CCC 10.00   Tuesday, Jan. 2
6   Tittle 8    10  CNN 2.50    Tuesday, Jan. 2
7   Tittle 10   5   CBS 1.00    Wednesday, Dec. 3
8   Title 20    5   ABC 1.00    Wednesday, Dec. 3
9   Title 21    25  JJJ 3.50    Wednesday, Dec. 3
10  Title 22    1   NNN 7.50    Wednesday, Dec. 3
11  Title 25    100 VVV 9.00    Thursday, Mar. 4
12  Title 30    6   YYYY    9.00    Thursday, Mar. 4
13  Title 35    2   QQQ 9.00    Thursday, Mar. 4

```




So I could sort or groupby the data by date later on if i want to see if there is a trend on the students reading habits in the future but for now I just want to add another date column and use the date that is all ready there in the data set if possible. I have to comb through a total of 181 pages of this and I am hoping pandas could help cut down the hours I will have to spend editing this manually in excel and using the copy and paste.

if you have any other recommendation in how to efficiently wrangle this data-set where the title will not mix in the same column and the NAN will be taken cared off. It will be greatly appreciated. 


  [1]: https://i.stack.imgur.com/JBihm.png
  [2]: https://i.stack.imgur.com/nlJbE.png

Tags: andthetointitlenanmarjan
1条回答
网友
1楼 · 发布于 2024-04-18 05:32:30

用三个或更多的NA数字填充有条件提取的数据帧,并用fillna()。将生成的日期序列与NA为零的数据组合

df1 = df.loc[df.isnull().sum(axis=1) == 3].reindex(index=df.index)
df1.fillna(method='ffill', inplace=True)
df1 = pd.concat([df1[['Tittle']], df.loc[df.isnull().sum(axis=1) == 0]], axis=1, join='inner')
df1.columns =['Date', 'Tittle', 'Page', 'Author', 'Rating']

df1
Date    Tittle  Page    Author  Rating
1   Monday, Mar. 1  Tittle 1    5.0     JHK     1.50
2   Monday, Mar. 1  Tittle 2    13.0    ABB     0.03
3   Monday, Mar. 1  Tittle 3    100.0   ACC     3.50
4   Monday, Mar. 1  Tittle 4    9.0     NN  5.40
6   Tuesday, Jan. 2     Tittle 5    6.0     BBB     6.50
7   Tuesday, Jan. 2     Tittle 7    14.0    CCC     10.00
8   Tuesday, Jan. 2     Tittle 8    10.0    CNN     2.50
10  Wednesday, Dec. 3   Tittle 10   5.0     CBS     1.00
11  Wednesday, Dec. 3   Title 20    5.0     ABC     1.00
12  Wednesday, Dec. 3   Title 21    25.0    JJJ     3.50
13  Wednesday, Dec. 3   Title 22    1.0     NNN     7.50
15  Thursday, Mar. 4    Title 25    100.0   VVV     9.00
16  Thursday, Mar. 4    Title 30    6.0     YYYY    9.00
17  Thursday, Mar. 4    Title 35    2.0     QQQ     9.00

相关问题 更多 >