How do I store synonyms as a column in a DataFrame?

import nltk
from nltk.corpus import wordnet
import pandas as pd

List = ['protest','riot','conflict']
df=[]

def process_genre(str):
    for genre in str:
        result = []
        for syn in wordnet.synsets(genre):
            for l in syn.lemmas():
                result.append(l.name())
        print(set(result))

process_genre(List)

output:
-------
{'resist', 'objection', 'dissent', 'protestation', 'protest'}
{'bacchanalia', 'riot', 'saturnalia', 'belly_laugh', 'scream', 'wow', 'bacchanal', 'thigh-slapper', 'sidesplitter', 'drunken_revelry', 'carouse', 'rioting', 'roister', 'debauchery', 'orgy', 'public_violence', 'howler', 'debauch'}
{'fight', 'battle', 'difference', 'dispute', 'conflict', 'infringe', 'engagement', 'struggle', 'difference_of_opinion', 'contravene', 'run_afoul'}

# Expected Result:

Col1        Col2
--------------------
protest     resist
protest     objection
protest     dissent
...         ...
riot        scream
riot        carouse
riot        saturnalia
...         ...
conflict    Fight
conflict    battle
...         ...

2 Answers

User

Answer 1 · edited 2024-05-15 02:26:26

Here is one possible solution:

from nltk.corpus import wordnet
import pandas as pd

def process_genres(genres):
    return (pd.DataFrame([(genre, l.name())
                          for genre in genres
                          for syn in wordnet.synsets(genre)
                          for l in syn.lemmas()], columns=['Col1', 'Col2'])
              .drop_duplicates())

Here is how to use it:

>>> genres = ['protest', 'riot', 'conflict']
>>> df = process_genres(genres)
>>> df
        Col1                   Col2
0    protest                protest
1    protest           protestation
...
11      riot                   riot
12      riot        public_violence
13      riot                rioting
...
34  conflict               conflict
35  conflict               struggle
36  conflict                 battle
...
53  conflict             contravene
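Note that the output above still contains rows such as protest/protest, where a word is listed as its own synonym, while the expected result in the question does not show them. The following is a minimal sketch of one way to drop those rows. The helper name `synonym_pairs`, the `lookup` parameter, and the `toy` dictionary are all hypothetical and are used only so the example runs without the WordNet corpus; with NLTK, `lookup` could wrap the `wordnet.synsets(...)` loop from the answer.

```python
import pandas as pd

def synonym_pairs(words, lookup):
    """Return a two-column DataFrame of (word, synonym) rows.

    `lookup` is any callable that maps a word to an iterable of synonyms
    (a hypothetical stand-in for the WordNet lookup used in the answer).
    """
    rows = [(word, syn) for word in words for syn in lookup(word)]
    df = pd.DataFrame(rows, columns=["Col1", "Col2"])
    # Remove repeated pairs, then remove rows where a word matches itself.
    df = df.drop_duplicates()
    return df[df["Col1"] != df["Col2"]].reset_index(drop=True)

# Hypothetical toy data, not real WordNet output.
toy = {
    "protest": ["protest", "resist", "dissent", "resist"],
    "riot": ["riot", "carouse"],
}
df = synonym_pairs(["protest", "riot"], lambda w: toy.get(w, []))
print(df)
```

Passing the lookup as a callable keeps the DataFrame logic separate from the synonym source, so the same filtering can be checked on small hand-written data first.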

User

Answer 2 · edited 2024-05-15 02:26:26

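This uses the same logic as the code in the question, but builds a structure that the pandas DataFrame constructor accepts by using comprehensions.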

import nltk
from nltk.corpus import wordnet
import pandas as pd

# same logic as the question, but use comprehensions to process words and synonyms
# and build a list of dicts that the DataFrame constructor accepts
df = pd.DataFrame([
    {"col1":word, 
     # use set to take unique values
     "col2":{l.name() 
                  for syn in wordnet.synsets(word)
                  for l in syn.lemmas() 
            }
    }
    for word in ['protest','riot','conflict']
]).explode("col2") # expand the embedded set of synonyms into one row per synonym

# filter out word as a synonym of itself
df.loc[df.col1!=df.col2].head(10)
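Because `col2` holds a set, the row order produced by `explode` is not guaranteed to be the same from run to run. If a reproducible order matters, one option is to sort each set into a list before exploding. The sketch below shows this on a hypothetical `toy` dictionary rather than real WordNet output, so it runs without the corpus.

```python
import pandas as pd

# Hypothetical toy data standing in for the WordNet lookups.
toy = {
    "protest": {"resist", "dissent", "protest"},
    "riot": {"carouse", "riot"},
}

df = pd.DataFrame(
    # sorted() turns each set into a list with a stable order
    [{"col1": word, "col2": sorted(syns)} for word, syns in toy.items()]
).explode("col2")  # one row per (word, synonym) pair

# Drop rows where a word is its own synonym and renumber the index.
df = df.loc[df.col1 != df.col2].reset_index(drop=True)
print(df)
```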
