cleantext: Python package - PyPI

An open-source Python package for cleaning raw text data

Project description for cleantext

cleantext

cleantext is an open-source Python package for cleaning raw text data. The library's source code can be found here.

Features

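cleantext has two main methods:

clean: cleans the raw text and returns the cleaned text as a string
clean_words: cleans the raw text and returns a list of clean words

cleantext can apply all of the following cleaning operations, or any selected combination of them:

Remove extra whitespace
Convert the entire text to lowercase
Remove digits from the text
Remove punctuation from the text
Remove stop words, with a choice of stop-word language (stop words are the most common words in a language that carry little meaning on their own, such as is, am, the, this, are)
Stem the words (stemming reduces related forms of a word to a single base form; for example, run, runs, and running all become run)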
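To make the operations above concrete, here is a minimal sketch of each one using only the Python standard library. The helper names and the small `STOP_WORDS` set are hypothetical and chosen for illustration; they are not cleantext's own functions, and cleantext's actual implementation (which relies on NLTK) may behave differently. Stemming is not reimplemented here; a stemmer such as NLTK's `PorterStemmer` would handle that step.

```python
import re
import string
from typing import List

# Hypothetical helpers illustrating each cleaning operation with the standard
# library only; they are NOT cleantext's own functions.

# Assumed, deliberately tiny English stop-word subset (real lists are far longer).
STOP_WORDS = {"a", "am", "are", "is", "the", "this", "to"}


def remove_extra_spaces(text: str) -> str:
    # Collapse every run of whitespace into a single space and trim both ends.
    return " ".join(text.split())


def to_lowercase(text: str) -> str:
    # Convert the entire text to lowercase.
    return text.lower()


def remove_numbers(text: str) -> str:
    # Delete every decimal digit character.
    return re.sub(r"\d", "", text)


def remove_punctuation(text: str) -> str:
    # Delete the ASCII punctuation characters listed in string.punctuation.
    return text.translate(str.maketrans("", "", string.punctuation))


def remove_stopwords(words: List[str]) -> List[str]:
    # Drop words found in the assumed stop-word set (case-insensitive match).
    return [w for w in words if w.lower() not in STOP_WORDS]
```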

Installation

cleantext requires Python 3 and NLTK to run.

To install using pip, run:

pip install cleantext
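A quick way to confirm that the prerequisites are in place after installing is sketched below. It only checks that Python 3 is running and that modules named `nltk` and `cleantext` can be located on the import path; the function name `check_requirements` is hypothetical, and the check does not verify installed versions.

```python
import importlib.util
import sys


def check_requirements() -> dict:
    """Report whether the prerequisites named above appear to be available."""
    return {
        # cleantext is described as needing Python 3.
        "python3": sys.version_info.major >= 3,
        # find_spec returns None when a top-level module cannot be found.
        "nltk": importlib.util.find_spec("nltk") is not None,
        "cleantext": importlib.util.find_spec("cleantext") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```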

Usage

Import the library:

import cleantext

Choose a method:

To return the text as a string:

cleantext.clean("your_raw_text_here",all=True)

To return a list of the words in the text:

cleantext.clean_words("your_raw_text_here",all=True)

To apply only a specific set of cleaning operations:

cleantext.clean_words("your_raw_text_here",
                      all=False,           # If True, run every cleaning operation
                      extra_spaces=True,   # Remove extra whitespace
                      stemming=True,       # Stem the words
                      stopwords=True,      # Remove stop words
                      lowercase=True,      # Convert to lowercase
                      numbers=True,        # Remove all digits
                      punct=True,          # Remove all punctuation
                      stp_lang='english')  # Language for stop words
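How `all` interacts with the individual flags is not spelled out above. The sketch below shows one plausible reading, assuming `all=True` switches every operation on regardless of the other flags and `all=False` enables only the flags set to `True`. The function `enabled_operations`, the `OPERATIONS` tuple, and its ordering are hypothetical, not part of cleantext; `stp_lang` is left out because it configures the stop-word step rather than enabling one.

```python
from typing import List

# Hypothetical list of toggleable steps, named after the keyword arguments above.
# The order is an assumption for illustration, not cleantext's documented order.
OPERATIONS = ("lowercase", "numbers", "punct", "extra_spaces", "stopwords", "stemming")


def enabled_operations(all: bool = False, **flags: bool) -> List[str]:
    # The parameter is named `all` to mirror the library's keyword,
    # even though it shadows the built-in all() inside this function.
    if all:
        # all=True switches every operation on, whatever the other flags say.
        return list(OPERATIONS)
    # Otherwise keep only the operations explicitly set to True, in a fixed order.
    return [op for op in OPERATIONS if flags.get(op, False)]
```

For example, `enabled_operations(numbers=True, lowercase=True)` yields `['lowercase', 'numbers']`, while `enabled_operations(all=True)` yields every step.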

Examples

import cleantext
cleantext.clean('This is A s$ample !!!! tExt3% to   cleaN566556+2+59*/133', extra_spaces=True, lowercase=True, numbers=True, punct=True)

Returns:

'this is a sample text to clean'
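The first example can be reproduced with the standard library alone, which also shows why the order of operations matters: deleting `!!!!` and the digits leaves doubled spaces behind, so whitespace has to be collapsed last. This is a hypothetical reimplementation; `clean_sketch` is an illustrative name, and the step order is an assumption rather than cleantext's documented behavior.

```python
import re
import string


def clean_sketch(text: str) -> str:
    """Approximate clean(..., extra_spaces=True, lowercase=True, numbers=True, punct=True).

    Hypothetical stdlib reimplementation; the order of steps is an assumption.
    """
    text = text.lower()                                               # lowercase
    text = re.sub(r"\d", "", text)                                    # numbers
    text = text.translate(str.maketrans("", "", string.punctuation))  # punct
    # Collapse whitespace last: removing "!!!!" leaves a double space behind.
    return " ".join(text.split())                                     # extra_spaces


print(clean_sketch("This is A s$ample !!!! tExt3% to   cleaN566556+2+59*/133"))
```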

import cleantext
cleantext.clean_words('This is A s$ample !!!! tExt3% to   cleaN566556+2+59*/133', all=True)

Returns:

['sampl','text','clean']

cleantext 1.1.3


License

MIT
For any questions, issues, bugs, or suggestions, please visit here