How do I store synonyms as a column in a DataFrame?

Posted 2024-05-15 02:26:26


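I want to store the results of the code below in a DataFrame with two columns: one for the original word and one for its synonyms, with each synonym on its own row.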

import nltk
from nltk.corpus import wordnet
import pandas as pd

words = ['protest', 'riot', 'conflict']

def process_genre(genres):
    # Print the set of WordNet lemma names found for each word
    for genre in genres:
        result = []
        for syn in wordnet.synsets(genre):
            for l in syn.lemmas():
                result.append(l.name())
        print(set(result))

process_genre(words)

output:
-------
{'resist', 'objection', 'dissent', 'protestation', 'protest'}
{'bacchanalia', 'riot', 'saturnalia', 'belly_laugh', 'scream', 'wow', 'bacchanal', 'thigh-slapper', 'sidesplitter', 'drunken_revelry', 'carouse', 'rioting', 'roister', 'debauchery', 'orgy', 'public_violence', 'howler', 'debauch'}
{'fight', 'battle', 'difference', 'dispute', 'conflict', 'infringe', 'engagement', 'struggle', 'difference_of_opinion', 'contravene', 'run_afoul'}

I want to store the result in a DataFrame like this:

# Expected Result:

Col1           Col2
--------------------
protest       resist
protest       objection
protest       dissent
...           ...
riot          scream
riot          carouse
riot          saturnalia
...           ...
conflict      fight
conflict      battle
...           ...


2 Answers

Here is one possible solution:

from nltk.corpus import wordnet
import pandas as pd

def process_genres(genres):
    return (pd.DataFrame([(genre, l.name())
                          for genre in genres
                          for syn in wordnet.synsets(genre)
                          for l in syn.lemmas()], columns=['Col1', 'Col2'])
              .drop_duplicates())

Here is how to use it:

>>> genres = ['protest', 'riot', 'conflict']
>>> df = process_genres(genres)
>>> df
        Col1                   Col2
0    protest                protest
1    protest           protestation
...
11      riot                   riot
12      riot        public_violence
13      riot                rioting
...
34  conflict               conflict
35  conflict               struggle
36  conflict                 battle
...
53  conflict             contravene
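
Because `drop_duplicates()` keeps the original row labels, the index of the result can have gaps. The following is a minimal sketch of the same pair-building approach with a fresh index, plus an optional filter that drops rows where a word is listed as its own synonym. A small hard-coded dictionary stands in for the WordNet lookup so that it runs without the corpus; `SYNONYMS` and `build_pairs` are hypothetical names, and the sample lemmas are taken from the question's output.

```python
import pandas as pd

# Hypothetical stand-in for the WordNet lookup: each word maps to a list of
# lemma names, with repeats, as they might come back across several synsets.
SYNONYMS = {
    "protest": ["protest", "protestation", "protest", "objection", "dissent"],
    "riot": ["riot", "public_violence", "rioting", "riot"],
}

def build_pairs(words, lookup):
    """Return one (word, synonym) row per unique pair, with a fresh index."""
    rows = [(word, name) for word in words for name in lookup.get(word, [])]
    return (pd.DataFrame(rows, columns=["Col1", "Col2"])
              .drop_duplicates()
              .reset_index(drop=True))

df = build_pairs(["protest", "riot"], SYNONYMS)

# Optionally drop rows where the word is listed as its own synonym.
df_no_self = df[df["Col1"] != df["Col2"]].reset_index(drop=True)
print(df_no_self)
```

With real data, the dictionary lookup would be replaced by the nested `wordnet.synsets(...)` / `syn.lemmas()` loops from the answer above.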

The logic is the same as in your code, but it uses comprehensions to build a structure that the pandas DataFrame constructor accepts.

import nltk
from nltk.corpus import wordnet
import pandas as pd

# same logic as the question, but use comprehensions to process words and synonyms
# and build a list of dicts that the DataFrame constructor accepts
df = pd.DataFrame([
    {"col1":word, 
     # use set to take unique values
     "col2":{l.name() 
                  for syn in wordnet.synsets(word)
                  for l in syn.lemmas() 
            }
    }
    for word in ['protest','riot','conflict']
]).explode("col2") # expand the embedded set of synonyms into one row each

# filter out word as a synonym of itself
df.loc[df.col1!=df.col2].head(10)
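
The pandas documentation notes that row order is non-deterministic when `explode` is applied to sets. If a repeatable order matters, one option is to sort each set into a list before exploding. The sketch below assumes that is acceptable; it uses a hypothetical `SYNONYMS` dictionary in place of WordNet, with sample lemmas taken from the question's output.

```python
import pandas as pd

# Hypothetical stand-in for the WordNet lookup (sample lemmas from the question).
SYNONYMS = {
    "protest": {"resist", "objection", "dissent", "protestation", "protest"},
    "conflict": {"fight", "battle", "dispute", "conflict"},
}

df = pd.DataFrame([
    # sorted() turns each set into a list, so explode yields a repeatable order
    {"col1": word, "col2": sorted(SYNONYMS[word])}
    for word in ["protest", "conflict"]
]).explode("col2", ignore_index=True)

# Drop rows where a word appears as its own synonym.
result = df.loc[df["col1"] != df["col2"]].reset_index(drop=True)
print(result)
```

Passing `ignore_index=True` to `explode` renumbers the rows; without it, every synonym row keeps the label of the word it came from.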
