Combining multiple columns of data in Pandas

Posted 2024-04-20 13:41:36


I have a DataFrame like this:

Id  First_name1  first_name2  first_name3  last_name1  last_name2
1   Michel       michelle     Michele      Jeremi      Jeremy
2   Jack         jack         Jak          Jean        Jean
3   Dave         Dav          Dave         Daniel      Danielle

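As you can see, the names are not the same for a given Id. For each row, I want to check whether first_name1 == first_name2 == first_name3. If they are equal, create a new column called first_name; otherwise, put the different names into first_name1 and so on, like this: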

Id  First_name  First_name1  First_name2  Last_name1  Last_name2
1   Michel      Michelle     Michele      Jeremy      Jeremi
2   Jack        Jak          nan          Jean        nan
3   Dave        Dav          nan          Daniel      Danielle

2 answers

First, iterate over the rows of the DataFrame:

for index, row in yourdf.iterrows():

Then, for each row, compare the two values you are interested in:

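    # Inside the loop: `row` is only a copy, so write through yourdf.loc
    if row['First_name1'] == row['first_name2']:
        # Create the new column and set its value to the shared first name
        yourdf.loc[index, 'new_column'] = row['First_name1']
    else:
        # Set each column to the value you want
        yourdf.loc[index, 'first_name'] = row['First_name1']
        yourdf.loc[index, 'first_name2'] = row['First_name1']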
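
For the layout the question asks for, one possible sketch is to collect the distinct spellings in each row and pad the rest with NaN. Several things here are assumptions rather than anything stated in the question: the helper name `distinct_names` is made up, names are compared case-insensitively (so "Jack" and "jack" count as the same), the input has no missing values, and only the three first-name columns are handled.

```python
import numpy as np
import pandas as pd

# Sample data taken from the question (first-name columns only).
df = pd.DataFrame({
    "First_name1": ["Michel", "Jack", "Dave"],
    "first_name2": ["michelle", "jack", "Dav"],
    "first_name3": ["Michele", "Jak", "Dave"],
})

# Output column names follow the expected result shown in the question.
new_cols = ["First_name", "First_name1", "First_name2"]


def distinct_names(row):
    """Hypothetical helper: keep the first occurrence of each name in the row.

    Comparison is case-insensitive (an assumption); the original spelling
    of the first occurrence is kept. Unused slots are filled with NaN.
    """
    seen = set()
    unique = []
    for name in row:
        key = name.casefold()
        if key not in seen:
            seen.add(key)
            unique.append(name)
    unique += [np.nan] * (len(new_cols) - len(unique))
    return pd.Series(unique, index=new_cols)


first_cols = ["First_name1", "first_name2", "first_name3"]
result = df[first_cols].apply(distinct_names, axis=1)
print(result)
```

The same helper could be applied to the last-name columns with a different list of output names. Because it uses `apply` with `axis=1`, it runs row by row and is not fast on large frames.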

Your question isn't entirely clear to me, but from what I understand, you are trying to do something like this:

import pandas as pd
import numpy as np

header = ["First_name1", "First_name2", "First_name3", "Last_name1", "Last_name2"]
df = pd.DataFrame([["Michel", "Michelle", "Michele", "Jeremi", "Jeremy"],
                   ["Jack", "Jack", "Jak", "Jean", "Jean"],
                   ["Dave", "Dav", "Dave", "Daniel", "Danielle"]], columns=header)

print(df)

# Create empty df
finalDataFrame = pd.DataFrame(columns=header)

for index, row in df.iterrows():
    firstName = row.iloc[0]
    # Copy the row into a list so individual values can be replaced
    lrow = list(row)
    if firstName == row.iloc[1]:
        lrow[1] = np.nan
    if firstName == row.iloc[2]:
        lrow[2] = np.nan
    # Append the row to the final DataFrame
    finalDataFrame.loc[len(finalDataFrame)] = lrow

print(finalDataFrame)
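
The same blanking-out can probably be done without an explicit loop. The sketch below is an alternative, not the method above: it uses `Series.mask` to replace any value equal to `First_name1` with NaN. The variable name `result` is just illustrative, and the comparison is an exact, case-sensitive match as in the loop.

```python
import pandas as pd

header = ["First_name1", "First_name2", "First_name3", "Last_name1", "Last_name2"]
df = pd.DataFrame([["Michel", "Michelle", "Michele", "Jeremi", "Jeremy"],
                   ["Jack", "Jack", "Jak", "Jean", "Jean"],
                   ["Dave", "Dav", "Dave", "Daniel", "Danielle"]], columns=header)

first = df["First_name1"]
result = df.copy()
for col in ["First_name2", "First_name3"]:
    # mask() sets entries to NaN wherever the condition is True,
    # i.e. wherever this column repeats the value in First_name1.
    result[col] = df[col].mask(df[col] == first)

print(result)
```

This leaves the NaN values where the duplicates were; it does not shift the remaining names to the left.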

Hope this helps!
