LFExtractor - Python package - PyPI

A linguistic tool for extracting and searching linguistic features

Detailed description of the LFExtractor Python project

Linguistic Feature Extractor

Description

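A corpus linguistics tool for extracting and searching linguistic features in a text or a corpus.
The main version has 95 built-in linguistic features, while the thesis project version has 98. The three features removed from the main version are words per utterance, number of utterances, and number of overlaps, which are generally not available in ordinary corpora.
More than two-thirds of these features come from Biber et al. (2006), and 42 of them also appear in Biber (1988). These features are commonly used as part of the multidimensional (MD) analysis framework.
The program was mainly tested on two corpora that are accessible online, the British Academic Spoken Corpus and the Michigan Corpus of Academic English, but for copyright reasons it is tested here on test_sample.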

Prerequisites

Programming languages:
- Python 3.6+: check with the command: python --version (Download Page)
- Java 1.8+: check with the command: java -version (Download Page)
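The two version checks above can also be scripted. The following is a minimal sketch using only the standard library; it relies on two facts about Java: version 8 and earlier report versions in the legacy "1.8.0_292" form while Java 9+ report "11.0.2"-style numbers, and java -version writes its report to stderr. The helper names java_feature_version and check_prerequisites are hypothetical and are not part of LFExtractor.

```python
import re
import subprocess
import sys
from typing import Optional


def java_feature_version(version_output: str) -> Optional[int]:
    """Return the Java feature release (8, 11, 17, ...) from `java -version` output.

    Java 8 and earlier use the legacy scheme '1.8.0_292'; Java 9+ use '11.0.2' or '17'.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)  # legacy scheme: "1.8.0_292" -> 8
    return int(first)  # modern scheme: "11.0.2" -> 11, "17-ea" -> 17


def check_prerequisites() -> None:
    """Print whether Python 3.6+ and Java 1.8+ appear to be available."""
    if sys.version_info < (3, 6):
        print("Python 3.6+ is required; found", sys.version.split()[0])
    else:
        print("Python OK:", sys.version.split()[0])
    try:
        # `java -version` writes its report to stderr, not stdout
        result = subprocess.run(
            ["java", "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except OSError:
        print("Java was not found on PATH")
        return
    version = java_feature_version(result.stderr)
    if version is None or version < 8:
        print("Java 1.8+ is required; got:", result.stderr.strip() or "no version output")
    else:
        print("Java OK: feature release", version)


if __name__ == "__main__":
    check_prerequisites()
```

Normalizing both numbering schemes to a single feature release number lets one comparison (version >= 8) cover "1.8+" as well as later Java releases.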
Python packages

Package	Description	Pip download
stanfordcorenlp	A Python wrapper for Stanford CoreNLP	pip install stanfordcorenlp
pandas	Used for storing extracted feature frequencies	pip install pandas

In addition, the program makes heavy use of built-in packages, especially the built-in re package for regular expressions.
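Because the corpus is POS tagged into word_TAG tokens before features are searched, a feature can be expressed as a regular expression over the tagged text. The following is a minimal sketch, using only the standard re module, of how a regex match might be reported with a window of left and right context words, in the tab-separated format of the result example shown under "Processing a set of files". The function extract_with_context is hypothetical and is not LFExtractor's implementation.

```python
import re
from typing import List


def extract_with_context(tagged_text: str, pattern: str, left: int = 0, right: int = 0) -> List[str]:
    """Return every match of `pattern` in a POS-tagged text with `left` words
    before it and `right` words after it, formatted as:
    left context, TAB, 【 match 】, TAB, right context."""
    results = []
    for match in re.finditer(pattern, tagged_text):
        before = tagged_text[:match.start()].split()
        after = tagged_text[match.end():].split()
        # Guard left == 0: before[-0:] would return the whole list
        left_ctx = " ".join(before[-left:]) if left > 0 else ""
        right_ctx = " ".join(after[:right])
        results.append(f"{left_ctx}\t【 {match.group().strip()} 】\t{right_ctx}")
    return results


if __name__ == "__main__":
    text = "away_RB by_IN the_DT whole_JJ thing_NN In_IN fact_NN"
    # Hypothetical determiner + adjective + noun pattern over word_TAG tokens
    for line in extract_with_context(text, r"\S+_DT \S+_JJ \S+_NN", 2, 2):
        print(line)
```

On this input the sketch prints the same line as the README's result example; the pattern here is only an illustration and is not the ART/ADJ/NOUN definition from features_set.py.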

Installation

Download the project directly from this page and cd into the project folder.
Or install it via pip: pip install LFExtractor (or pip3 install LFExtractor)

Usage

Path to StanfordCoreNLP

The first time you use the program, specify the path to your StanfordCoreNLP directory in text_processor.py in the LFE folder:

nlp = StanfordCoreNLP("/path/to/StanfordCoreNLP/")

Example: nlp = StanfordCoreNLP("/Users/wzx/p_package/stanford-corenlp-4.1.0")
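Before editing text_processor.py, it may help to confirm that the directory really contains a CoreNLP distribution. The following is a minimal sketch, assuming the standard layout of a CoreNLP download such as stanford-corenlp-4.1.0, where the main jar is named stanford-corenlp-<version>.jar and sits next to companion -models, -sources, and -javadoc jars. find_corenlp_jar is a hypothetical helper, not part of LFExtractor or the stanfordcorenlp wrapper.

```python
import glob
import os
from typing import Optional


def find_corenlp_jar(corenlp_dir: str) -> Optional[str]:
    """Return the path of the main stanford-corenlp-*.jar in corenlp_dir, or None."""
    if not os.path.isdir(corenlp_dir):
        return None
    jars = glob.glob(os.path.join(corenlp_dir, "stanford-corenlp-*.jar"))
    # Skip the companion jars that ship alongside the main jar
    companions = ("-models.jar", "-sources.jar", "-javadoc.jar")
    main_jars = [jar for jar in jars if not jar.endswith(companions)]
    return sorted(main_jars)[0] if main_jars else None


if __name__ == "__main__":
    path = "/path/to/StanfordCoreNLP/"  # replace with the directory you pass to StanfordCoreNLP
    jar = find_corenlp_jar(path)
    print(jar if jar else f"No CoreNLP jar found in {path}")
```

If the helper returns None, the path is probably pointing at the wrong level (for example the parent folder of stanford-corenlp-4.1.0) rather than at the distribution itself.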

Processing a set of files

from LFE.extractor import CorpusLFE

lfe = CorpusLFE('/directory/to/the/corpus/under/analysis/')

# get frequency data, the tagged corpus and the extracted features (default behavior)
lfe.corpus_feature_fre_extraction()
# lfe.corpus_feature_fre_extraction(normalized_rate=100, save_tagged_corpus=True, save_extracted_features=True, left=0, right=0)

# change the normalized_rate, turn off the tagged text, and keep the extracted text with the specified context:
# frequencies are normalized per 1000 words; each extracted feature is shown with 2 words before and 3 words after it
lfe.corpus_feature_fre_extraction(1000, False, True, 2, 3)

# get frequency data only
lfe.corpus_feature_fre_extraction(save_tagged_corpus=False, save_extracted_features=False)

# get the tagged corpus only
lfe.save_tagged_corpus()

# get the extracted features only
lfe.save_corpus_extracted_features()
# lfe.save_corpus_extracted_features(left=0, right=0)
# set how many words to display on each side of the target pattern
lfe.save_corpus_extracted_features(2, 3)

# extract and save a specific linguistic feature by feature name
# to see the built-in features' names, use `show_feature_names()`
from LFE.extractor import *
print(show_feature_names())
# Six letter words and longer, Contraction, Agentless passive, By passive...

# specify which feature to extract and save
lfe.save_corpus_one_extracted_feature_by_name('Six letter words and longer')

# extract and save a specific linguistic feature by regex, for example the phrase 'you know',
# along with 2 words on each side
lfe.save_corpus_one_extracted_feature_by_regex(r'you_\S+ know_\S+', 2, 2, feature_name='You Know')
# Remember the '_\S+' after each word: the corpus is automatically POS tagged, so every token has the form word_TAG.

# for more complex structures, features_set.py can be used; for example, to extract an "article + adj + noun" structure
from LFE import features_set as fs

ART = fs.ART
ADJ = fs.ADJ
NOUN = fs.NOUN
lfe.save_corpus_one_extracted_feature_by_regex(rf'{ART}{ADJ}{NOUN}', 2, 2, 'Noun phrase')
# result example (using test_sample): away_RB by_IN	【 the_DT whole_JJ thing_NN 】	In_IN fact_NN
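The normalized_rate argument scales raw counts to a common text length (100 words in the commented signature, 1000 words in the second call above). The following is a minimal sketch of that arithmetic, assuming a frequency is the raw count multiplied by the rate and divided by the number of word tokens, with punctuation-only tokens excluded from the word count. The two regex patterns and all function names are hypothetical illustrations, not the definitions used by LFExtractor.

```python
import re
from typing import Dict, List


def word_tokens(tagged_text: str) -> List[str]:
    """Return the word part of each word_TAG token, skipping punctuation-only tokens."""
    words = [token.rsplit("_", 1)[0] for token in tagged_text.split()]
    return [w for w in words if re.search(r"[A-Za-z0-9]", w)]


def normalized_frequency(count: int, total_words: int, normalized_rate: int = 100) -> float:
    """Scale a raw count to occurrences per `normalized_rate` words."""
    if total_words == 0:
        return 0.0
    return count * normalized_rate / total_words


def feature_frequencies(tagged_text: str, features: Dict[str, str], normalized_rate: int = 100) -> Dict[str, float]:
    """Count each regex feature in the tagged text and normalize by the word count."""
    total = len(word_tokens(tagged_text))
    result = {}
    for name, pattern in features.items():
        count = len(re.findall(pattern, tagged_text))
        result[name] = normalized_frequency(count, total, normalized_rate)
    return result


# Hypothetical feature patterns over word_TAG tokens (not LFExtractor's definitions)
FEATURES = {
    "Six letter words and longer": r"(?<!\S)[A-Za-z]{6,}_\S+",
    "You Know": r"you_\S+ know_\S+",
}

if __name__ == "__main__":
    sample = "you_PRP know_VBP the_DT whole_JJ thing_NN was_VBD absolutely_RB fascinating_JJ ._."
    print(feature_frequencies(sample, FEATURES, normalized_rate=100))
    # {'Six letter words and longer': 25.0, 'You Know': 12.5}
```

The sample has 8 word tokens (the final "._." is punctuation), so 2 long words give 2 × 100 / 8 = 25.0 per 100 words; with normalized_rate=1000 the same count would be 250.0.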

Processing a single text


Processing part of a corpus

from LFE.extractor import *

lfe = CorpusLFE('/directory/to/the/corpus/under/analysis/')
# call get_filepath_list, select the files you want to examine, and construct a list
fp_list = lfe.get_filepath_list()
# loop through the list and use the functionalities mentioned above to get the results you want
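One way to build the list of selected files is to filter the paths by file name. The following is a minimal sketch, assuming get_filepath_list returns a list of path strings (an assumption; the README does not state its return type). select_files is a hypothetical helper, and it uses the standard fnmatch module for shell-style patterns such as "lecture*.txt".

```python
import fnmatch
import os
from typing import Iterable, List


def select_files(filepaths: Iterable[str], pattern: str) -> List[str]:
    """Keep only the paths whose file name matches a shell-style pattern.

    fnmatchcase is used so matching is case-sensitive on every platform.
    """
    return [fp for fp in filepaths if fnmatch.fnmatchcase(os.path.basename(fp), pattern)]


if __name__ == "__main__":
    # Hypothetical paths standing in for the output of lfe.get_filepath_list()
    paths = ["/corpus/lecture01.txt", "/corpus/seminar02.txt", "/corpus/sub/lecture03.txt"]
    print(select_files(paths, "lecture*.txt"))
```

The filtered list can then be looped over as the comment in the snippet above suggests; matching on the base name only means files in subfolders are selected by name regardless of where they sit.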


LFExtractor 1.0.1
