How to classify nouns in a text file using Python

Posted 2024-04-29 07:18:24


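From a business article, I want to extract the words that define the nature of the business it discusses. For example, if the article contains phrases such as "retail banking", "courier services", or "steel mill", we can tell what kind of business it is about.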

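`

import nltk
from nltk import pos_tag

# Read the article as UTF-8 text (Python 3: no separate .decode() step).
# The original casing is kept on purpose: lowercasing the text first
# makes the tagger miss most proper nouns (NNP/NNPS).
with open('bbb_2.txt', encoding='utf8') as f:
    t = f.read()

# Split the text into word and punctuation tokens.
tokens = nltk.wordpunct_tokenize(t)

# Tag each token with its part of speech.
posTagged = pos_tag(tokens)

# Keep only proper nouns, singular (NNP) and plural (NNPS).
nnp = [(wrd, tag) for (wrd, tag) in posTagged if tag in ('NNP', 'NNPS')]`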

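With this I can extract the noun entities. But how can I label them as business-related? To clarify further, here is an example. Suppose this is part of the article: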

`Microsoft Corporation is an American multinational technology company with headquarters in Redmond, Washington. It develops, manufactures, licenses, supports and sells computer software, consumer electronics, personal computers, and services. Its best known software products are the Microsoft Windows line of operating systems, the Microsoft Office suite, and the Internet Explorer and Edge web browsers.`

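The task now is to extract the types of products Microsoft develops. The answer is simple: computer software, consumer electronics, personal computers, and services. The question is, how do I get the computer to understand this?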
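One possible starting point, offered only as a rough sketch and not as a general solution, is a hand-written pattern: look for a run of "business" verbs (develops, sells, and so on) and take the comma-separated list that follows, up to the end of the sentence. The verb list `TRIGGER_VERBS`, the function name `extract_offerings`, and the `SAMPLE` string are all assumptions made for illustration; real articles will phrase things in many other ways, and a more robust approach would likely build on POS tags or noun-phrase chunking.

```python
import re

# Hypothetical trigger verbs (an assumption); extend this for your own domain.
TRIGGER_VERBS = ("develops", "manufactures", "licenses", "supports",
                 "sells", "produces", "provides", "offers")


def extract_offerings(text):
    """Return the phrases listed after a run of trigger verbs."""
    verbs = "|".join(TRIGGER_VERBS)
    # One or more trigger verbs joined by commas or "and",
    # followed by the object list, which runs up to the next period.
    pattern = re.compile(
        rf"\b(?:{verbs})\b(?:(?:\s*,\s*|\s+and\s+)(?:{verbs})\b)*\s+([^.]+)\.",
        re.IGNORECASE,
    )
    offerings = []
    for match in pattern.finditer(text):
        # Split the object list on commas (optionally followed by "and")
        # and on a standalone "and".
        parts = re.split(r"\s*,\s*(?:and\s+)?|\s+and\s+", match.group(1))
        offerings.extend(p.strip() for p in parts if p.strip())
    return offerings


# Sentence taken from the example passage in the question.
SAMPLE = ("It develops, manufactures, licenses, supports and sells "
          "computer software, consumer electronics, personal computers, "
          "and services.")

print(extract_offerings(SAMPLE))
# ['computer software', 'consumer electronics', 'personal computers', 'services']
```

This only works when the sentence follows the "verb(s) + list" shape, and a product name that itself contains "and" would be split in two. It is meant to show the idea of pattern-based extraction, not to replace proper parsing.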

