Adding a new column or row as a pd.Series

Posted 2024-04-26 00:43:38


I am trying to add a column and a row using pd.Series objects. Here is what I have so far:

import pandas as pd
df = pd.DataFrame([
    {"Title": "Titanic",    "ReleaseYear": 1997, "Director": "James Cameron"},
    {"Title": "Spider-Man", "ReleaseYear": 2002, "Director": "Sam Raimi"}
])

# Add a new row
new_movie_row = pd.Series(['Jurassic Park', 1993, 'Steven Spielberg'])
df.loc[2] = new_movie_row

# Add a new column
new_keyword_column = pd.Series(['Boat', 'Spider', 'Dinosaur'])
df['Keyword'] = new_keyword_column
df

This seems to add the column fine, but the new row comes out as all NaN.


What is the correct way to do this?


2 answers

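When you add a new row or column from a Series, pandas uses alignment: it tries to match the Series index values against the DataFrame's columns (for a row) or index (for a column), and you get NaN wherever a label has no match.

Your approach is fine; you only need to give the Series for the new row the same index values as the DataFrame's columns: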

# Add a new row
new_movie_row = pd.Series(['Jurassic Park', 1993, 'Steven Spielberg'], index=df.columns)
df.loc[2] = new_movie_row
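To make the alignment step visible, here is a minimal sketch that compares the two Series before any assignment. It uses `reindex` only as an illustration of label matching; that this mirrors what `df.loc[2] = ...` does internally is an assumption, and the variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame([
    {"Title": "Titanic",    "ReleaseYear": 1997, "Director": "James Cameron"},
    {"Title": "Spider-Man", "ReleaseYear": 2002, "Director": "Sam Raimi"},
])

values = ["Jurassic Park", 1993, "Steven Spielberg"]

# Without index=..., the Series is labelled 0, 1, 2.
unlabelled = pd.Series(values)
# None of those labels is a column name, so matching against the columns finds nothing.
aligned_unlabelled = unlabelled.reindex(df.columns)

# With index=df.columns, every value carries the name of the column it belongs to.
labelled = pd.Series(values, index=df.columns)
aligned_labelled = labelled.reindex(df.columns)

print(aligned_unlabelled)
print(aligned_labelled)
```

The first Series prints three NaN values labelled with the column names, which is the row the question ended up with; the second keeps all three values.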

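For the new column, a Series with the default index already matches a DataFrame that also has the default index, but for general data you need to set the index explicitly here too: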

# Add a new column
new_keyword_column = pd.Series(['Boat', 'Spider', 'Dinosaur'], index=df.index)
df['Keyword'] = new_keyword_column

print (df)
           Title  ReleaseYear          Director   Keyword
0        Titanic         1997     James Cameron      Boat
1     Spider-Man         2002         Sam Raimi    Spider
2  Jurassic Park         1993  Steven Spielberg  Dinosaur
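The "general data" case is easiest to see with a frame whose index is not 0, 1, 2. The sketch below assumes a hypothetical string index ("a", "b", "c") and hypothetical column names "Unaligned" and "Keyword"; it is an illustration, not part of the original question.

```python
import pandas as pd

# Hypothetical frame with a non-default (string) index.
df = pd.DataFrame(
    {
        "Title": ["Titanic", "Spider-Man", "Jurassic Park"],
        "ReleaseYear": [1997, 2002, 1993],
    },
    index=["a", "b", "c"],
)

keywords = ["Boat", "Spider", "Dinosaur"]

# Default index 0, 1, 2 shares no label with "a", "b", "c": the column is all NaN.
df["Unaligned"] = pd.Series(keywords)

# Reusing df.index labels each value with the row it belongs to.
df["Keyword"] = pd.Series(keywords, index=df.index)

print(df)
```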

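In general, though, if you need a new row or column you can simply use a list or 1-D array of the same length (or a scalar if every value should be the same):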

# Add a new row
df.loc[2] = ['Jurassic Park', 1993, 'Steven Spielberg']

# Add a new column
df['Keyword'] = ['Boat', 'Spider', 'Dinosaur']

# Add a new column with same values
df['same vals'] = 10
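As a small companion to the snippet above, the following sketch shows a 1-D NumPy array and a scalar being assigned as columns, and what happens when a list has the wrong length. The frame and column names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Title": ["Titanic", "Spider-Man", "Jurassic Park"],
    "ReleaseYear": [1997, 2002, 1993],
})

# A 1-D NumPy array is assigned by position, just like a list.
df["Keyword"] = np.array(["Boat", "Spider", "Dinosaur"])

# A scalar is broadcast to every row.
df["Rating"] = 10

# A list whose length differs from the number of rows is rejected, not padded.
try:
    df["TooShort"] = ["Boat", "Spider"]
    length_error = None
except ValueError as exc:
    length_error = exc

print(df)
print(type(length_error).__name__)
```

Lists and arrays carry no labels, so pandas has nothing to align and only checks the length.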


Why would you need a Series rather than just a list?

Alignment by a Series is only needed when some of the input data is missing:

# Add a new row
new_movie_row = pd.Series(['Jurassic Park', 1993], index=['Title','ReleaseYear'])
df.loc[2] = new_movie_row
print (df)
           Title  ReleaseYear       Director
0        Titanic         1997  James Cameron
1     Spider-Man         2002      Sam Raimi
2  Jurassic Park         1993            NaN

You can also specify the columns:

df.loc[2, ['Title','ReleaseYear']] = ['Jurassic Park', 1993]

Using only a list in that case raises an error:

df.loc[3] = ['Jurassic Park', 1993]
print (df)

>ValueError: cannot set a row with mismatched columns

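pandas tries to align on index and column labels; this is called data alignment. Here we can use .tolist() to drop the labels and assign by position: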

df.loc[2] = new_movie_row.tolist()
df
           Title  ReleaseYear          Director
0        Titanic         1997     James Cameron
1     Spider-Man         2002         Sam Raimi
2  Jurassic Park         1993  Steven Spielberg

The same applies when adding a column:

new_keyword_column = pd.Series(['Boat', 'Spider', 'Dinosaur'],index=[4,5,6])  # Notice the Index is 4, 5, 6.

df['new'] = new_keyword_column
df
           Title  ReleaseYear          Director  new
0        Titanic         1997     James Cameron  NaN
1     Spider-Man         2002         Sam Raimi  NaN
2  Jurassic Park         1993  Steven Spielberg  NaN

Because the indexes do not align, every value is NaN. To get around this, use .tolist():

df['new'] = new_keyword_column.tolist()
df
           Title  ReleaseYear          Director       new
0        Titanic         1997     James Cameron      Boat
1     Spider-Man         2002         Sam Raimi    Spider
2  Jurassic Park         1993  Steven Spielberg  Dinosaur
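The three behaviours can be put side by side in one self-contained sketch. The column names "aligned", "from_list" and "from_array" are hypothetical, and `to_numpy()` is shown as an alternative to `.tolist()` that was not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"Title": ["Titanic", "Spider-Man", "Jurassic Park"]})

# Index 4, 5, 6 shares no label with the frame's 0, 1, 2.
keywords = pd.Series(["Boat", "Spider", "Dinosaur"], index=[4, 5, 6])

df["aligned"] = keywords                # label-based: no match, all NaN
df["from_list"] = keywords.tolist()     # plain list: assigned by position
df["from_array"] = keywords.to_numpy()  # NumPy array: assigned by position

print(df)
```

Dropping the index this way is only safe when the values are already in the same order as the DataFrame's rows.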
