How do I create new rows in a pandas DataFrame containing the words from a string in an existing row?


Asked 2025-04-17 19:03

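I have a DataFrame with a column, df.Strings, that holds text strings. I want to put each word of these strings on its own row while keeping the values of the other columns unchanged. For example, suppose I have 3 strings (plus an unrelated column called Time):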

    Strings Time
0   The dog  4Pm
1  lazy dog  2Pm
2   The fox  1Pm

The new rows should contain the words from these strings, with the other columns carried over unchanged:

Strings   --- Words ---Time  
"The dog" --- "The" --- 4Pm  
"The dog" --- "dog" --- 4Pm  
"lazy dog"--- "lazy"--- 2Pm  
"lazy dog"--- "dog" --- 2Pm  
"The fox" --- "The" --- 1Pm  
"The fox" --- "fox" --- 1Pm

I know how to split the strings into words:

   import re
   string_list = '\n'.join(df.Strings.map(str))
   # [A-Za-z]+ so that capitalized words such as "The" are matched whole
   word_list = re.findall('[A-Za-z]+', string_list)

But how do I get these words back into the DataFrame while preserving the index and the other variables? I am using Python 2.7 and pandas 0.10.1.

Edit:
I now understand how to expand rows with groupby, following this question:

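import re
from pandas import DataFrame

def f(group):
    # irow(0) is the pandas 0.10.x spelling; later versions use group.iloc[0]
    row = group.irow(0)
    return DataFrame({'words': re.findall('[A-Za-z]+', row['Strings'])})
df.groupby('class', group_keys=False).apply(f)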

However, I would still like to keep the contents of the other columns. Is that possible?


1 Answer

Here is my code. It does not use groupby(), so I think it will be faster.

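import pandas as pd
import numpy as np
import itertools

df = pd.DataFrame({
    "strings": ["the dog", "lazy dog", "The fox jump"],
    "value": ["a", "b", "c"]})

# Split each string into a list of words
w = df.strings.str.split()
# Number of words in each row
c = w.map(len)
# Repeat each row label once per word in that row
idx = np.repeat(c.index, c.values)
# Flatten the per-row word lists into one flat list
#words = np.concatenate(w.values)
words = list(itertools.chain.from_iterable(w.values))
s = pd.Series(words, index=idx)
s.name = "words"
# Joining on the index repeats each original row once per word;
# print(...) with a single argument works on Python 2.7 and Python 3
print(df.join(s))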

The result is:

        strings value words
0       the dog     a   the
0       the dog     a   dog
1      lazy dog     b  lazy
1      lazy dog     b   dog
2  The fox jump     c   The
2  The fox jump     c   fox
2  The fox jump     c  jump
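
For readers on a much newer pandas than the 0.10.1 mentioned in the question, the same row expansion can probably be written more compactly. The sketch below is not part of the original answer; it assumes Python 3 and pandas 0.25 or later, where `DataFrame.explode` is available, and it reuses the answer's example data.

```python
import pandas as pd

df = pd.DataFrame({
    "strings": ["the dog", "lazy dog", "The fox jump"],
    "value": ["a", "b", "c"],
})

# Split each string into a list of words, then emit one row per list element.
# explode() repeats the original index label and the other columns for each word.
result = df.assign(words=df["strings"].str.split()).explode("words")
print(result)
```

As with the join-based version, the original index labels are repeated, so rows that came from the same string can still be identified. If the words should be extracted with a regular expression rather than split on whitespace, `df["strings"].str.findall(r"[A-Za-z]+")` could stand in for `str.split()`.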

Answered 2025-04-17 by Python大师
