Python: extracting keywords from a CSV row by row

id,some_text,new_keyword_field
1,What is the meaning of the word Himalaya?,"meaning,word,himalaya"
2,"Palindrome is a word, phrase, or sequence that reads the same backward as forward","palindrome,word,phrase,sequence,reads,backward,forward"

1 answer

User

#1 · Posted 2024-06-11 06:45:10

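Here is a clean way to add the new keyword column to your dataframe using pandas apply. apply works by first defining a function (get_keywords in our case) that can then be applied to each row or each column.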

import pandas as pd
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

# I define the stop_words here so I don't do it every time in the function below
stop_words = stopwords.words('english')
# I've added the index_col='id' here to set your 'id' column as the index. This assumes that the 'id' is unique.
df = pd.read_csv('test-data.csv', index_col='id')

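Here we define the function that df.apply will call on each row in the next cell. The function get_keywords takes a row as its argument and returns a comma-separated string of keywords, matching the desired output above ("meaning,word,himalaya"). Inside the function we lowercase the text, tokenize it, filter out punctuation with isalpha(), filter out stop words, and join the remaining keywords to form the output.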


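Now that we have defined the function to apply, we call df.apply(get_keywords, axis=1). This returns a pandas Series (similar to a list). Because we want this Series to become part of our dataframe, we add it as a new column with df['keywords'] = df.apply(get_keywords, axis=1).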

# applying the get_keywords function to our dataframe and saving the results
# as a new column in our dataframe called 'keywords'
# axis=1 means that we will apply get_keywords to each row and not each column
df['keywords'] = df.apply(get_keywords, axis=1)

Output: Dataframe after adding 'keywords' column
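To make the shape of that result concrete, here is a small self-contained sketch of what df.apply(..., axis=1) returns. The two rows are copied from the question; count_words is a hypothetical stand-in for get_keywords so that the sketch runs without any NLTK data, and the word_count column name is likewise only illustrative.

```python
import io

import pandas as pd

# The question's two sample rows, read from an in-memory CSV instead of a file.
csv_text = '''id,some_text
1,What is the meaning of the word Himalaya?
2,"Palindrome is a word, phrase, or sequence that reads the same backward as forward"
'''
df = pd.read_csv(io.StringIO(csv_text), index_col='id')


def count_words(row):
    # Hypothetical stand-in for get_keywords: with axis=1, each call
    # receives one row of the dataframe as a pandas Series.
    return len(row['some_text'].split())


# One value per row, returned as a Series that shares the dataframe's index.
result = df.apply(count_words, axis=1)

# Assigning the Series adds it as a new column, aligned on the 'id' index.
df['word_count'] = result
print(df)
```

The same pattern holds for get_keywords: the returned Series carries the dataframe's index, which is why assigning it to a new column lines each value up with its row.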
