Strange behavior of a function that fails to apply on a larger pandas.DataFrame

Posted 2024-03-29 12:51:35

Updated question

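As mentioned, I am providing a reproducible example.
There is a link to 1/6 of my dataframe (a pandas.DataFrame object serialized with Pickle) and a link to reproducible code, which contains a sample dataframe on which the function is applied correctly, whereas on the larger dataframe it is not.

Note that Dropbox will say the preview is unavailable, but the file itself is available. If it is not, please let me know.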

Old question (in the end, the problem did not come from pool.map())

Related to this problem, I used the following method on a sample of the dataframe to check whether it is correct:

m = dfsample.Result.eq('Win')
s = m.shift().cumsum()
dfsample['gap_in_days'] = dfsample.groupby(['name', s])['Gap done'].cumsum() #"Expected Gap" in the linked topic
dfsample['nb_of_games'] = dfsample.assign(nb_of_games = 1).groupby('name')['nb_of_games'].apply(lambda x:x.shift().cumsum()).fillna(0)
dfsample['gap_in_numbers'] = dfsample.assign(nb = 1).groupby(['name',s])['nb'].cumsum()

It produces what I expect:

+-----------+------------+---------------------+----------+-------------+-------------+----------------+
|    Player |   Result   |        Date         | Gap done | gap_in_days | nb_of_games | gap_in_numbers |
+-----------+------------+---------------------+----------+-------------+-------------+----------------+
| K2000     | Lose       | 2015-11-13 13:42:00 |      0.0 |         0.0 |           0 | -1 *           |
| K2000     | Lose       | 2016-03-23 16:40:00 |    131.0 |       131.0 |           1 | 1              |
| K2000     | Lose       | 2016-05-16 19:17:00 |     54.0 |       185.0 |           2 | 2              |
| K2000     | Win        | 2016-06-09 19:36:00 |     54.0 |       239.0 |           3 | 3              |
| K2000     | Win        | 2016-06-30 14:05:00 |     54.0 |        54.0 |           4 | 1              |
| K2000     | Lose       | 2016-07-29 16:20:00 |     29.0 |        29.0 |           5 | 2              |
| K2000     | Win        | 2016-10-08 17:48:00 |     29.0 |        58.0 |           6 | 3              |
| Kssis     | Lose       | 2007-02-25 15:05:00 |      0.0 |         0.0 |           0 | 1 *            |
| Kssis     | Lose       | 2007-04-25 6:07:00  |     59.0 |        59.0 |           1 | 1              |
| Kssis     | Not-ranked | 2007-06-01 16:54:00 |     37.0 |        96.0 |           2 | 2              |
| Kssis     | Lose       | 2007-09-09 14:33:00 |     99.0 |       195.0 |           3 | 3              |
| Kssis     | Lose       | 2008-04-06 16:27:00 |    210.0 |       405.0 |           4 | 4              |
+-----------+------------+---------------------+----------+-------------+-------------+----------------+

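To explain the data: Gap done is the number of days between two different games. gap_in_days is the number of days it took the player to win a game. nb_of_games is self-explanatory, I think. gap_in_numbers is the number of games played before the player wins.
Note on the values marked with *: I know these are odd results, but as I told Andy L., they are correct. When nb_of_games is 0, I simply replace the value with 0. I am showing them here because if you test the code you will obviously see them and ask about them.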
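To make the grouping logic above concrete, here is a minimal sketch on a few rows copied from the sample table. It assumes the player column is called Player (as in the table) and, unlike m.shift().cumsum(), it fills the first shifted value with 0 so that the key stays an integer. It therefore does not try to reproduce the starred first-row values, and the names win and streak are only illustrative.

```python
import pandas as pd

# Toy rows mirroring the sample table (Player, Result and Gap done only).
df = pd.DataFrame({
    "Player": ["K2000"] * 7 + ["Kssis"] * 5,
    "Result": ["Lose", "Lose", "Lose", "Win", "Win", "Lose", "Win",
               "Lose", "Lose", "Not-ranked", "Lose", "Lose"],
    "Gap done": [0.0, 131.0, 54.0, 54.0, 54.0, 29.0, 29.0,
                 0.0, 59.0, 37.0, 99.0, 210.0],
})

# 1 where the row is a win, 0 otherwise.
win = df["Result"].eq("Win").astype(int)

# Shift by one row so that a win closes its own streak and the next row
# opens a new one. Assumption: the first row gets 0 instead of NaN.
streak = win.shift().fillna(0).astype(int).cumsum().rename("streak")

# Cumulative sums restart at every new (Player, streak) pair.
grouped = df.assign(nb=1).groupby(["Player", streak])
df["gap_in_days"] = grouped["Gap done"].cumsum()
df["gap_in_numbers"] = grouped["nb"].cumsum()

print(df)
```

Because the key is grouped together with Player, a streak never spills from one player into the next, even though the streak counter itself keeps increasing across players.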

Now, when I apply the same thing inside a function with pool.map(function, iterable), it does not work, whereas applying the same function to the sample dataframe dfsample works perfectly.

The function is as follows:

def gap_nb(df):
    s = mask_result(df)
    df['gap_in_numbers'] = df.assign(nb = 1).groupby(['name',s])['nb'].cumsum()
    return df

The function mask_result is:

def mask_result(df):
    mask = df.Result.eq('P')
    s = mask.shift().cumsum()
    return s

I then use it with pool.map(function, iterable):

dfs = pool.map(gap_nb , dfs) #where dfs is a list of slices of a big dataframe

It just produces a gap_in_numbers column filled with 1s:

+----------------+
| gap_in_numbers |
+----------------+
|              0 |
|              1 |
|              1 |
|              1 |
|              1 |
|            ... |
|              1 |
+----------------+
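For reference, how the list of slices is built can matter for a per-player cumulative count. The sketch below is not a diagnosis of the result above. It only illustrates, on made-up data, that cutting the frame by row position can split one player's streak across two slices, while cutting by player cannot. It assumes a Player column and a 'Win' label, and it uses multiprocessing.dummy.Pool (a thread-backed pool with the same map() interface) so that it runs anywhere. The helper names are hypothetical.

```python
from multiprocessing.dummy import Pool  # thread-backed, same map() API as Pool

import pandas as pd


def gap_nb(df, win_label="Win"):
    """Count games inside each (Player, streak) group of one chunk."""
    df = df.copy()  # work on a copy, not on a slice of the parent frame
    win = df["Result"].eq(win_label).astype(int)
    streak = win.shift().fillna(0).astype(int).cumsum().rename("streak")
    df["gap_in_numbers"] = df.assign(nb=1).groupby(["Player", streak])["nb"].cumsum()
    return df


def split_by_player(df):
    """Hypothetical helper: one chunk per player, so no history is cut."""
    return [chunk for _, chunk in df.groupby("Player", sort=False)]


big = pd.DataFrame({
    "Player": ["A", "A", "A", "A", "B", "B", "B"],
    "Result": ["Lose", "Win", "Lose", "Lose", "Lose", "Lose", "Win"],
})

with Pool(2) as pool:
    # Chunks that follow player boundaries.
    by_player = pd.concat(pool.map(gap_nb, split_by_player(big))).sort_index()
    # Chunks cut by row position: player A is split between the two slices.
    by_position = pd.concat(pool.map(gap_nb, [big.iloc[:3], big.iloc[3:]])).sort_index()

whole = gap_nb(big)  # reference: no chunking at all
print(by_player["gap_in_numbers"].tolist())
print(by_position["gap_in_numbers"].tolist())
```

In this toy case the per-player chunks agree with the unchunked result, while the positional slices restart the count for player A's last row.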

I tried a few approaches, such as calling assign() in one function and then applying cumsum() in another, but it returns the same result.

Can anyone tell me why?
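One way to investigate such a result is to look at the groups themselves: a grouped cumulative count is 1 on every row exactly when every (player, streak) group holds a single row. The sketch below, with a hypothetical helper describe_streak_groups and made-up data, reports how often the label passed to eq() matches and how large the groups are. Whether any of this applies to the real dataframe is an open question; the column name Player and the labels are assumptions.

```python
import pandas as pd


def describe_streak_groups(df, win_label, key="Player"):
    """Summarise the groups produced by the shift/cumsum streak key."""
    match = df["Result"].eq(win_label)
    # First shifted value is filled with 0 (assumption) to keep integers.
    streak = match.astype(int).shift().fillna(0).astype(int).cumsum().rename("streak")
    sizes = df.groupby([key, streak]).size()
    return {
        "match_fraction": float(match.mean()),  # share of rows equal to win_label
        "n_groups": int(len(sizes)),
        "max_group_size": int(sizes.max()),
    }


toy = pd.DataFrame({
    "Player": ["A"] * 5,
    "Result": ["P", "P", "P", "P", "P"],
})

# Every row matches the label: each row starts its own streak.
every_row_matches = describe_streak_groups(toy, win_label="P")
# The label never occurs: all rows fall into one single streak.
label_never_matches = describe_streak_groups(toy, win_label="Win")

print(every_row_matches)
print(label_never_matches)
```

A max_group_size of 1 means the cumulative count can only ever be 1, so comparing these numbers between dfsample and the larger dataframe may show where the two differ.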


pandas version: 0.23.4, Python version: 3.7.4


Sample data to use (without the last column)

import io
import pandas as pd

# Rows from the table above, without the gap_in_numbers column.
# Date holds the full timestamp and 'Gap done' is a single column.
s = '''Player,Result,Date,Gap done,gap_in_days,nb_of_games
K2000,Lose,2015-11-13 13:42:00,0.0,0.0,0
K2000,Lose,2016-03-23 16:40:00,131.0,131.0,1
K2000,Lose,2016-05-16 19:17:00,54.0,185.0,2
K2000,Win,2016-06-09 19:36:00,54.0,239.0,3
K2000,Win,2016-06-30 14:05:00,54.0,54.0,4
K2000,Lose,2016-07-29 16:20:00,29.0,29.0,5
K2000,Win,2016-10-08 17:48:00,29.0,58.0,6
Kssis,Lose,2007-02-25 15:05:00,0.0,0.0,0
Kssis,Lose,2007-04-25 6:07:00,59.0,59.0,1
Kssis,Not-ranked,2007-06-01 16:54:00,37.0,96.0,2
Kssis,Lose,2007-09-09 14:33:00,99.0,195.0,3
Kssis,Lose,2008-04-06 16:27:00,210.0,405.0,4'''

df = pd.read_csv(io.StringIO(s), parse_dates=['Date'])
