Tokenize text and create more rows for each row in a DataFrame

2 Answers

User

#1 · edited 2024-04-27 19:32:18

Use:

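# Pull out the 'text' column, drop the trailing '.', and split it into sentences
s = (df.pop('text')
      .str.strip('.')
      .str.split(r'\.\s+', expand=True)   # raw string: '\.' matches a literal period
      .stack()                            # one row per sentence
      .rename('text')
      .reset_index(level=1, drop=True))   # keep only the original row index

# Join the sentences back onto the remaining columns
df = df.join(s).reset_index(drop=True)
print(df)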
   file_id                         text
0        1      I am the first document
1        1         I am a nice document
2        2     I am the second document
3        2  I am an even nicer document

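Explanation:

First use pop to extract the text column, remove the trailing . with str.strip, and split with str.split, escaping the . because it is a special regex character. Reshape the result into a Series with stack, then use reset_index and rename so that the Series can be joined back to the original DataFrame with join.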
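For comparison, here is a minimal self-contained sketch of the same steps. The sample DataFrame is an assumption reconstructed from the printed output above, and Series.explode (pandas 0.25 or later) is swapped in for the expand=True plus stack combination, so this is a variant of the answer rather than the answer's exact method.

```python
import pandas as pd

# Sample data: an assumption reconstructed from the printed output above
df = pd.DataFrame({
    "file_id": [1, 2],
    "text": [
        "I am the first document. I am a nice document.",
        "I am the second document. I am an even nicer document.",
    ],
})

# pop removes the column from df and returns it as a Series
text = df.pop("text")

# Drop the trailing period, then split on "period followed by whitespace";
# without expand=True each row holds a list of sentences
parts = text.str.strip(".").str.split(r"\.\s+")

# explode yields one row per list element and repeats the original index
s = parts.explode().rename("text")

# Join on the index so every sentence picks up its file_id
result = df.join(s).reset_index(drop=True)
print(result)
```

Because explode works directly on the lists, rows with different numbers of sentences need no padding columns.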

User

#2 · edited 2024-04-27 19:32:18

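import pandas as pd

df = pd.DataFrame({'field_id': [1, 2],
                   'text': ["I am the first document. I am a nice document.",
                            "I am the second document. I am an even nicer document."]})

# Split each text on "." and keep the non-empty sentences, trimmed of surrounding whitespace
df['sents'] = df.text.apply(lambda txt: [x.strip() for x in txt.split(".") if x.strip()])
# Expand each list into columns, then stack the columns into one row per sentence
df = (df.set_index(['field_id'])
        .apply(lambda x: pd.Series(x['sents']), axis=1)
        .stack()
        .reset_index(level=1, drop=True))
df = df.reset_index()
df.columns = ['field_id', 'text']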
