PyTLDR Python package - PyPI

A module that performs automatic article summarization.

PyTLDR project description

A Python module for automatically summarizing articles, text files and web pages.

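License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.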

Installation

Using pip or easy_install

You can install PyTLDR using pip or easy_install:

pip install pytldr

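Usage

A simple example program using the pytldr module can be found at https://github.com/jaijuneja/PyTLDR/blob/master/example.py

In its current form, this module contains three different implementations of automatic text summarization:

Using the TextRank algorithm (based on PageRank)
Using Latent Semantic Analysis
Using sentence relevance scores

Note that all three implementations are extractive: they simply extract the most relevant sentences from the input text. They do not formulate their own sentences (such algorithms are called "abstractive" and are still at a primitive stage).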

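Sentence tokenization

PyTLDR comes with a built-in sentence tokenizer that is used for summarization. The tokenizer performs stemming in several languages, as shown below, as well as stop-word removal. You can also specify your own list of stop words.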

from pytldr.nlp.tokenizer import Tokenizer

tokenizer = Tokenizer(language='english', stopwords=None, stemming=True)
# Note that if stopwords=None then the tokenizer loads stopwords from a bundled data-set
# You can alternatively specify a text file or provide a list of words
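To illustrate what stop-word removal and stemming do to a sentence, here is a minimal pure-Python sketch. It is not PyTLDR's tokenizer: the `STOPWORDS` set, `split_sentences`, `crude_stem` and `tokenize_sentence` are hypothetical names made up for this example, and the suffix stripping is only a toy stand-in for a real stemmer.

```python
import re

# Hypothetical stop-word list for illustration; PyTLDR bundles its own data set.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in"}


def split_sentences(text):
    """Split text into sentences on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def crude_stem(word):
    """Strip a few common English suffixes (a toy stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize_sentence(sentence, stopwords=STOPWORDS, stemming=True):
    """Lower-case, drop stop words, and optionally stem the remaining words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    words = [w for w in words if w not in stopwords]
    return [crude_stem(w) for w in words] if stemming else words


sentences = split_sentences("The cats are sleeping. Dogs barked loudly!")
tokens = [tokenize_sentence(s) for s in sentences]
```

After this step, sentences that use different surface forms of the same word (for example "cats" and "cat") share a token, which is what the summarizers rely on when comparing sentences.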

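Note that a tokenizer is used to initialize the summarizer objects, as shown below.

TextRank summarization

Sentences are ranked using the PageRank algorithm, where "votes" or "in-links" are represented by the words shared between sentences.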
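The ranking idea can be sketched in a few lines of pure Python. This is an illustration under stated assumptions rather than PyTLDR's implementation: the function names are hypothetical, links are weighted by the raw count of shared words, and the damping factor of 0.85 and the fixed iteration count are conventional PageRank choices that do not come from the PyTLDR documentation.

```python
def similarity(words_a, words_b):
    """Number of distinct words the two tokenized sentences share."""
    return len(set(words_a) & set(words_b))


def textrank(sentences, damping=0.85, iterations=50):
    """Return one PageRank-style score per tokenized sentence."""
    n = len(sentences)
    if n == 0:
        return []
    # weights[i][j] is the strength of the link between sentences i and j.
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0 for j in range(n)]
        for i in range(n)
    ]
    out_totals = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j "votes" for i in proportion to their shared words.
            votes = sum(
                scores[j] * weights[j][i] / out_totals[j]
                for j in range(n)
                if out_totals[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * votes)
        scores = new_scores
    return scores


tokenized = [["cat", "sat", "mat"], ["cat", "mat", "rug"], ["dog", "park"]]
scores = textrank(tokenized)
```

In this toy input the first two sentences share two words and therefore reinforce each other, while the third sentence shares nothing and only receives the baseline score.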

from pytldr.summarize.textrank import TextRankSummarizer
from pytldr.nlp.tokenizer import Tokenizer

tokenizer = Tokenizer('english')
summarizer = TextRankSummarizer(tokenizer)

# If you don't specify a tokenizer when initializing a summarizer then the
# English tokenizer will be used by default
summarizer = TextRankSummarizer()  # English tokenizer used

# This object creates a summary using the summarize method:
# e.g. summarizer.summarize(text, length=5, weighting='frequency', norm=None)

# The length parameter specifies the length of the summary, either as a
# number of sentences, or a percentage of the original text

# The summarizer can take as input...
# 1. A string:
summary = summarizer.summarize("Some long article bla bla...", length=4)

# 2. A text file:
summary = summarizer.summarize("/path/to/file.txt", length=0.25)
# Above summary is a quarter of the length of the original text

# 3. A URL (must start with http://):
summary = summarizer.summarize("http://newsite.com/some_article")
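The dual meaning of the length parameter (a sentence count or a fraction of the original text) could be handled along the following lines. This is only a sketch: `resolve_length` and `pick_top` are hypothetical helpers, and the rounding rule (round up, at least one sentence) is an assumption, not something the PyTLDR documentation specifies.

```python
import math


def resolve_length(length, num_sentences):
    """Turn a sentence count or a fraction into a number of sentences."""
    if num_sentences <= 0:
        return 0
    if 0 < length < 1:
        # A fraction of the original text, rounded up, at least one sentence.
        return max(1, math.ceil(length * num_sentences))
    # Otherwise treat it as a sentence count, capped at what is available.
    return min(int(length), num_sentences)


def pick_top(sentences, scores, length):
    """Keep the highest-scoring sentences, returned in their original order."""
    k = resolve_length(length, len(sentences))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


summary = pick_top(["a", "b", "c", "d"], [0.1, 0.9, 0.3, 0.8], length=0.5)
```

Returning the selected sentences in their original order keeps an extractive summary readable, since the sentences are copied verbatim from the text.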

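Latent Semantic Analysis (LSA) summarization

Reduces the dimensionality of the article to several "topic" clusters using singular value decomposition, and selects the sentences that are most relevant to these topics. This is a fairly abstract summarization algorithm.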
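For readers unfamiliar with the decomposition, it can be written out as follows. The notation is the usual LSA convention and is an assumption here, not taken from PyTLDR's source: A is a term-by-sentence matrix with one row per word and one column per sentence.

```latex
A = U \Sigma V^{\mathsf{T}}, \qquad
\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r), \qquad
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r \ge 0
```

Under this reading, each row of V^T describes how strongly every sentence expresses one topic, and the singular value sigma_i indicates how important topic i is in the article as a whole.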

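This module contains two different implementations of the LSA algorithm, as described in two academic papers:

J. Steinberger and K. Jezek (2004). Using latent semantic analysis in text summarization and summary evaluation.
Ozsoy, M., Alpaslan, F., and Cicekli, I. (2011). Text summarization using latent semantic analysis.

The more recent Ozsoy et al. implementation is invoked by default, but both classes have the same interface.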

from pytldr.summarize.lsa import LsaSummarizer, LsaOzsoy, LsaSteinberger

summarizer = LsaOzsoy()
summarizer = LsaSteinberger()
summarizer = LsaSummarizer()  # This is identical to the LsaOzsoy object

summary = summarizer.summarize(text, topics=4, length=5, binary_matrix=True, topic_sigma_threshold=0.5)
# topics specifies the number of topics to cluster the article into.
# topic_sigma_threshold removes all topics with a singular value less than a given
# percentage of the largest singular value.
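The effect of topics and topic_sigma_threshold can be illustrated on a plain list of singular values. The helper below is a hypothetical sketch of the filtering rule described in the comments above, not code from PyTLDR, and the singular values are made-up numbers.

```python
def keep_topics(singular_values, topics=4, topic_sigma_threshold=0.5):
    """Return the indices of the topics that survive both limits."""
    if not singular_values:
        return []
    # Rank topics from the largest singular value to the smallest.
    order = sorted(
        range(len(singular_values)),
        key=lambda i: singular_values[i],
        reverse=True,
    )
    cutoff = topic_sigma_threshold * singular_values[order[0]]
    # Keep at most `topics` topics, dropping any that fall below the cutoff.
    return [i for i in order[:topics] if singular_values[i] >= cutoff]


kept = keep_topics([4.0, 2.5, 1.9, 0.3], topics=4, topic_sigma_threshold=0.5)
```

With a threshold of 0.5 and a largest singular value of 4.0, the cutoff is 2.0, so only the first two topics are retained even though four were requested.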

Contact

If you have any questions or have encountered an error, feel free to contact me at jai -dot- juneja -at- gmail -dot- com.

PyTLDR 0.1.4
