How to pass multiple variables to a pandas DataFrame for use with .map to create a new column

2 Answers

User

Answer 1 · Edited 2024-04-23 10:12:48

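I usually use apply for this kind of thing; it's basically the DataFrame version of map (the axis parameter lets you decide whether the function is applied to rows or to columns):

df.apply(lambda row: row.a*row.b*row.c, axis=1)

or, if a, b and c are the only columns:

import numpy as np
df.apply(np.prod, axis=1)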

0     8
1    30
2    72
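A minimal sketch of the difference between Series.map and DataFrame.apply, and of what axis changes. The frame is an assumption: the values a=[1, 2, 3], b=[2, 3, 4], c=[4, 5, 6] are reconstructed so that they reproduce the 8/30/72 output above, and the column name product is illustrative.

```python
import pandas as pd

# Assumed example data: reconstructed to reproduce the 8/30/72 output above,
# not quoted from the question.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [4, 5, 6]})

# Series.map: the function sees one value of one column at a time.
doubled_a = df['a'].map(lambda value: value * 2)

# DataFrame.apply with axis=1: the function sees one whole row (a Series) per call.
row_products = df.apply(lambda row: row.a * row.b * row.c, axis=1)

# DataFrame.apply with axis=0 (the default): the function sees one whole column per call.
column_products = df.apply(lambda col: col.prod(), axis=0)

# Assigning the row-wise result creates the new column the question asks for.
df['product'] = row_products

print(doubled_a.tolist())
print(column_products.tolist())
print(df)
```

This is why map alone does not fit the question: it only ever sees a single column's values, whereas apply with axis=1 sees all columns of a row together.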

User

Answer 2 · Edited 2024-04-23 10:12:48

Is there a way of creating a new column in a pandas DataFrame using .map or something else, which takes three columns as input and returns a single column? For example, the input would be 1, 2, 3 and the output would be 1*2*3.

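You can use apply with axis=1 for this. However, the function you pass is not called with three separate arguments (one per column); it is called with a single argument per row, a Series containing that row's data. You can account for this inside the function: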

def combine(row):
    return row['a'] + row['b'] + row['c']

>>> df.apply(combine, axis=1)
0     7
1    10
2    13
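To see what apply actually hands to combine, you can pull out one row yourself with df.iloc[0]; it is the same kind of object, a Series indexed by column label. This sketch assumes example data a=[1, 2, 3], b=[2, 3, 4], c=[4, 5, 6] (which reproduces the 7/10/13 sums shown), and the new column name total is illustrative.

```python
import pandas as pd

# Assumed example data, chosen to reproduce the 7/10/13 sums shown above.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [4, 5, 6]})

def combine(row):
    # With axis=1, `row` is a Series whose index is the column labels.
    return row['a'] + row['b'] + row['c']

# df.iloc[0] is the same kind of object apply passes in: one row as a Series.
first_row = df.iloc[0]
print(type(first_row).__name__, first_row.index.tolist(), combine(first_row))

# Store the row-wise result as a new column ('total' is an illustrative name).
df['total'] = df.apply(combine, axis=1)
print(df)
```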

Or you can pass a lambda that unpacks the Series into separate arguments:

def combine(one,two,three):
    return one + two + three

>>> df.apply(lambda x: combine(*x), axis=1)
0     7
1    10
2    13
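One caveat with the unpacking version: *x passes every value in the row positionally, in column order, so it only works when the row has exactly as many values as combine has parameters, in the order combine expects. A sketch with a hypothetical extra column d shows the failure:

```python
import pandas as pd

def combine(one, two, three):
    return one + two + three

# Hypothetical frame with a fourth column 'd' that combine() does not expect.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [4, 5, 6], 'd': [0, 0, 0]})

# *x unpacks every value in the row, in column order, so four values are
# passed to a three-parameter function and Python raises TypeError.
try:
    df.apply(lambda x: combine(*x), axis=1)
    error = None
except TypeError as exc:
    error = exc

print(repr(error))
```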

If you only want to pass specific columns, select them by indexing the DataFrame with a list:

>>> df[['a', 'b', 'c']].apply(lambda x: combine(*x), axis=1)
0     7
1    10
2    13

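Note the double brackets. (This has nothing to do with apply as such; indexing with a list is the normal way to select multiple columns from a DataFrame.)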
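A quick way to see the difference between single and double brackets, using example data with a hypothetical extra column d that should be left out:

```python
import pandas as pd

# Example data with a hypothetical fourth column 'd' that we want to leave out.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [4, 5, 6], 'd': [0, 0, 0]})

single = df['a']              # one label -> a Series (one column)
subset = df[['a', 'b', 'c']]  # a list of labels -> a DataFrame with those columns

# The list also fixes the column order, which matters when rows are unpacked with *x.
reordered = df[['c', 'a']]

print(type(single).__name__, type(subset).__name__)
print(subset.columns.tolist(), reordered.columns.tolist())
```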

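Note, however, that in many cases you don't need apply at all, because you can use vectorized operations on the columns themselves. The combine function above can be called directly with the DataFrame columns as its arguments: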

>>> combine(df.a, df.b, df.c)
0     7
1    10
2    13

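When the combine operation is vectorizable, this is usually much more efficient, because the arithmetic runs once over whole columns inside pandas/NumPy instead of as one Python function call per row.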
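A sketch comparing the row-by-row and vectorized forms on assumed example data (a=[1, 2, 3], b=[2, 3, 4], c=[4, 5, 6]). For plain sums and products, the built-in DataFrame.sum and DataFrame.prod with axis=1 are another vectorized option; the column names total and product are illustrative.

```python
import pandas as pd

# Assumed example data (reproduces the 7/10/13 sums shown above).
df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [4, 5, 6]})

def combine(one, two, three):
    return one + two + three

# Row by row: combine() is called once per row with scalar values.
via_apply = df.apply(lambda x: combine(*x), axis=1)

# Vectorized: combine() is called once with whole columns; + adds them element-wise.
vectorized = combine(df['a'], df['b'], df['c'])

# For sums and products, pandas also has built-in row-wise reductions.
df['total'] = df[['a', 'b', 'c']].sum(axis=1)
df['product'] = df[['a', 'b', 'c']].prod(axis=1)

print(via_apply.tolist() == vectorized.tolist())
print(df)
```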

Likewise, is there also a way of having a function take in one argument, a date, and return three new pandas DataFrame columns: one for the year, one for the month and one for the day?

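As above, there are two basic ways to do this: the general but non-vectorized approach using apply, and the faster vectorized approach. Suppose you have a DataFrame like this: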

>>> df = pandas.DataFrame({'date': pandas.date_range('2015/05/01', '2015/05/03')})
>>> df
        date
0 2015-05-01
1 2015-05-02
2 2015-05-03

You can define a function that returns a Series for each value; apply then expands those Series into columns:

def dateComponents(date):
    return pandas.Series([date.year, date.month, date.day], index=["Year", "Month", "Day"])

>>> df.date.apply(dateComponents)
   Year  Month  Day
0  2015      5    1
1  2015      5    2
2  2015      5    3
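Since the question asks for new columns on the existing frame, one way to attach the result is DataFrame.join, which aligns on the index. This is a sketch assuming the same three-day frame; date_components is the dateComponents function above, renamed to snake_case.

```python
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2015/05/01', '2015/05/03')})

def date_components(date):
    # Each element of the 'date' column arrives as a pandas Timestamp.
    return pd.Series([date.year, date.month, date.day], index=['Year', 'Month', 'Day'])

# apply() turns the per-value Series into a DataFrame with Year/Month/Day columns;
# join() attaches those columns to the original frame by aligning on the index.
df = df.join(df['date'].apply(date_components))
print(df)
```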

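Applying a function per value like this is the general approach. For datetime columns, though, it is not the only option: pandas also provides vectorized access to the date components through the .dt accessor (df.date.dt.year, df.date.dt.month, df.date.dt.day). Vectorized operations are available in other cases too, for example with strings: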

>>> df = pandas.DataFrame({'a': ["Hello", "There", "Pal"]})
>>> df
        a
0  Hello
1  There
2    Pal

>>> pandas.DataFrame({'FirstChar': df.a.str[0], 'Length': df.a.str.len()})
   FirstChar  Length
0         H       5
1         T       5
2         P       3

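Here again the operation is vectorized by working on the values directly instead of applying a function element by element. In this case there are two vectorized operations (getting the first character and getting the string length), and the results are then wrapped in another DataFrame call to create a separate column for each.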
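The same wrap-in-a-DataFrame pattern also works for the date question, because pandas exposes datetime components through the vectorized .dt accessor (Series.dt.year, .dt.month, .dt.day). A minimal sketch, assuming the same three-day frame:

```python
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2015/05/01', '2015/05/03')})

# Each .dt attribute returns a whole column at once; no Python function runs per element.
parts = pd.DataFrame({
    'Year': df['date'].dt.year,
    'Month': df['date'].dt.month,
    'Day': df['date'].dt.day,
})

# Attach the new columns to the original frame (aligned on the index).
df = df.join(parts)
print(df)
```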
