Merging consecutive rows that share the same column value

2 answers

User

#1 · edited 2024-06-07 03:45:52

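Actually, I think @chrisb's groupby solution is better, but if non-consecutive duplicate values can occur, you need to create another groupby key variable to track them. Still, for smaller problems this is a quick-and-dirty approach.

I think in this case a basic iterator is easier than trying to use pandas functions. I can imagine a groupby-based approach, but the "consecutive" condition is hard to preserve if a value in the second column shows up again later.

This could probably be cleaned up, but here is a sample: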

from pandas import DataFrame

df = DataFrame({'a': ['The', 'Skoll', 'Foundation', ',', 
                      'based', 'in', 'Silicon', 'Valley'], 
                'b': ['DT', 'Org', 'Org', ',', 'VBN', 'IN', 
                      'Location', 'Location']})

# Initialize result lists with the first row of df
# (.iloc[0] is positional, so this works even if the index does not start at 0)
result1 = [df['a'].iloc[0]]
result2 = [df['b'].iloc[0]]

# Use zip() to iterate over the two columns of df simultaneously,
# making sure to skip the first row which is already added
for a, b in zip(df['a'].iloc[1:], df['b'].iloc[1:]):
    if b == result2[-1]:        # If b matches the last value in result2,
        result1[-1] += " " + a  # append a to the last value of result1
    else:  # Otherwise start a new row with these values
        result1.append(a)
        result2.append(b)

# Create a new dataframe using these result lists
df = DataFrame({'a': result1, 'b': result2})
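
For comparison, the same "merge only adjacent rows" idea can be sketched with the standard library's `itertools.groupby`, which starts a new group every time the key changes. This is an alternative technique, not the loop shown above, and the name `merged` is only illustrative.

```python
from itertools import groupby

import pandas as pd

df = pd.DataFrame({
    'a': ['The', 'Skoll', 'Foundation', ',', 'based', 'in', 'Silicon', 'Valley'],
    'b': ['DT', 'Org', 'Org', ',', 'VBN', 'IN', 'Location', 'Location'],
})

# itertools.groupby yields one group per run of equal keys,
# so only adjacent rows with the same 'b' value end up together.
rows = [
    (' '.join(a for a, _ in run), tag)
    for tag, run in groupby(zip(df['a'], df['b']), key=lambda pair: pair[1])
]

# 'merged' is a hypothetical name for the result
merged = pd.DataFrame(rows, columns=['a', 'b'])
print(merged)
```

Because grouping is by run rather than by value, a tag that reappears later in the column would start a new row instead of being folded into the earlier one.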

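User

#2 · edited 2024-06-07 03:45:52

@rfan's answer certainly works; as an alternative, here is an approach using pandas groupby.

.groupby() groups the data by column 'b'; sort=False is needed so the groups stay in their original order. .apply() applies a function to each group, in this case joining the group's 'a' strings with a space between them.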

In [67]: df.groupby('b', sort=False)['a'].apply(' '.join)
Out[67]: 

b
DT                       The
Org         Skoll Foundation
,                          ,
VBN                    based
IN                        in
Location      Silicon Valley
Name: a, dtype: object
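
A small sketch of where this simple version stops working, assuming a hypothetical extension of the sample in which 'Org' appears again at the end. Grouping on 'b' alone collects rows by value, not by position, so the two separate 'Org' runs end up in a single entry.

```python
import pandas as pd

# Hypothetical sample: the 'Org' tag occurs in two separate runs
df = pd.DataFrame({
    'a': ['The', 'Skoll', 'Foundation', ',', 'based', 'in',
          'Silicon', 'Valley', 'A', 'Foundation'],
    'b': ['DT', 'Org', 'Org', ',', 'VBN', 'IN',
          'Location', 'Location', 'Org', 'Org'],
})

merged = df.groupby('b', sort=False)['a'].apply(' '.join)

# Both 'Org' runs were joined into one string
print(merged['Org'])  # Skoll Foundation A Foundation
```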

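Edit:

To handle the more general case (repeated, non-consecutive values), one approach is to first add a sentinel column that tracks which run of consecutive values each row belongs to, like this: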

df['key'] = (df['b'] != df['b'].shift(1)).astype(int).cumsum()
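
To make the one-liner easier to follow, here is a step-by-step sketch on a standalone Series. The names `b`, `changed` and `key` are illustrative only, and the values assume a sample in which 'Org' occurs in two separate runs.

```python
import pandas as pd

b = pd.Series(['DT', 'Org', 'Org', ',', 'VBN', 'IN',
               'Location', 'Location', 'Org', 'Org'])

# True wherever a row differs from the row before it, i.e. at the start
# of each run (the first row is compared with a missing value, so it is True)
changed = b != b.shift(1)

# The running total of those flags gives every run its own number
key = changed.astype(int).cumsum()

print(key.tolist())  # [1, 2, 2, 3, 4, 5, 6, 6, 7, 7]
```

The second 'Org' run gets key 7 rather than 2, which is what keeps it apart from the first one when grouping.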

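Then add the key to the groupby, and it should work when values repeat. For example, with this dummy data containing repeats (the key column has to be computed on this DataFrame after it is created):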

df = DataFrame({'a': ['The', 'Skoll', 'Foundation', ',', 
                      'based', 'in', 'Silicon', 'Valley', 'A', 'Foundation'], 
                'b': ['DT', 'Org', 'Org', ',', 'VBN', 'IN', 
                      'Location', 'Location', 'Org', 'Org']})

Applying the groupby:

In [897]: df.groupby(['key', 'b'])['a'].apply(' '.join)
Out[897]: 
key  b       
1    DT                       The
2    Org         Skoll Foundation
3    ,                          ,
4    VBN                    based
5    IN                        in
6    Location      Silicon Valley
7    Org             A Foundation
Name: a, dtype: object
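
Putting the pieces together, a minimal end-to-end sketch that also turns the result back into a two-column DataFrame. The `reset_index` steps and the name `merged` are additions for illustration and are not part of the answer above; the key is passed to groupby as a Series here instead of being stored as a column.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['The', 'Skoll', 'Foundation', ',', 'based', 'in',
          'Silicon', 'Valley', 'A', 'Foundation'],
    'b': ['DT', 'Org', 'Org', ',', 'VBN', 'IN',
          'Location', 'Location', 'Org', 'Org'],
})

# Run id: increases by one each time 'b' changes from the previous row
key = (df['b'] != df['b'].shift(1)).astype(int).cumsum().rename('key')

merged = (
    df.groupby([key, 'b'])['a']
      .agg(' '.join)                       # join the words of each run
      .reset_index(level='key', drop=True)  # the run id is no longer needed
      .reset_index()[['a', 'b']]            # back to columns a, b
)
print(merged)
```

Since the key only ever increases, the default sorting in groupby leaves the rows in their original order.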
