Labeling words with the sentence they came from

Posted 2024-04-24 02:33:44


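I have a DataFrame with three columns: 'word', 'pos tag', and 'label'. The words come from a text file. I now want to add another column, 'Sentence#', that gives the index of the sentence each word originally came from.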

Current state:
WORD   POS-Tag  Label
my     PRP$     IR
name   NN       IR
is     VBZ      IR
ron    VBN      PERSON
.      .
my     PRP$     IR
name   NN       IR
is     VBZ      IR
harry  VBN      Person
.      .        IR

Desired state:
Sentence#  WORD   Pos-Tag  Label
1          My     PRP      IR
1          name   NN       IR
1          is     VBZ      IR
1          ron    VBN      Person
1          .      .        IR
2          My     PRP      IR
2          name   NN       IR
2          is     VBZ      IR
2          harry  VBN      Person
2          .      .        IR

The code I am using so far:

#necessary libraries
import pandas as pd
import numpy as np
import nltk 
import string
# read the whole file; the with-block closes it afterwards
with open(r'C:\Users\xyz\newfile.txt', encoding='utf8') as document:
    content = document.read()

sentences = nltk.sent_tokenize(content)
sentences = [nltk.word_tokenize(sent) for sent in sentences]
sentences = [nltk.pos_tag(sent) for sent in sentences]


flat_list=[]

# flattening a nested list
for x in sentences:
    for y in x:
        flat_list.append(y)

df = pd.DataFrame(flat_list, columns=['word','pos_tag']) 

#importing data to create the 'Label' column
data=pd.read_excel(r'C:\Users\xyz\pname.xlsx')
pname=list(set(data['Product']))

# label a word 'drug' if it appears in the product-name list, otherwise 'IR'
df['Label'] = ['drug' if x in pname else 'IR' for x in df['word']]


1 answer

User

#1 · Posted 2024-04-24 02:33:44

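Just split the content into sentences beforehand, using split() with the appropriate punctuation. Store each sentence in a list, then loop with `for index, line in enumerate(lines):`, do what you normally do for each one, and add the index to the df.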
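
As a rough sketch of the enumerate idea: the question's code already has one list of (word, pos_tag) tuples per sentence before flattening, so the sentence number can be attached during the flattening step. Here `tagged_sentences` is a hypothetical hard-coded stand-in for what `nltk.pos_tag` returns for each sentence, and the name `sentence_no` is made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the per-sentence output of nltk.pos_tag:
# one list of (word, pos_tag) tuples per sentence.
tagged_sentences = [
    [("my", "PRP$"), ("name", "NN"), ("is", "VBZ"), ("ron", "VBN"), (".", ".")],
    [("my", "PRP$"), ("name", "NN"), ("is", "VBZ"), ("harry", "VBN"), (".", ".")],
]

# Flatten the nested list while remembering which sentence each token
# came from. enumerate(..., start=1) numbers the sentences 1, 2, ...
rows = [
    (sentence_no, word, tag)
    for sentence_no, sentence in enumerate(tagged_sentences, start=1)
    for word, tag in sentence
]

df = pd.DataFrame(rows, columns=["Sentence#", "word", "pos_tag"])
print(df)
```

With the question's own variables, the same comprehension should work over `sentences` in place of `tagged_sentences`, and the 'Label' column can be added afterwards exactly as before.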
