A simple utility that takes a sentence and outputs information about the Academic Word List (AWL) words it contains
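awlify
A very basic utility that takes the text of a sentence and outputs the same text, annotated with information about which of its words are in the Academic Word List.
Installation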
pip install awlify
If you have not used spaCy on your system before, you will need to install the model used here with the following command:
python -m spacy download en_core_web_sm
Testing
python -m unittest
Usage in a file
from awlify import awlify
result = awlify('please inform me of the academic words in this sentence')
print(result)
{"data": {"sentence": "please inform me of the academic words in this sentence", "awl_words": [{"index": 5, "word": "academic", "meta": {"head": "academy", "sublist": 5}}]}}
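The returned structure can be consumed with ordinary dictionary access. The following is a minimal sketch rather than part of the package: `summarize` is a hypothetical helper, and because the output above is printed with double quotes, the sketch assumes the result may be either a JSON string or an already-parsed dict and handles both. The sample value is copied from the example above so the snippet runs without awlify installed.

```python
import json

# Sample output copied from the example above; in real use this value
# would come from awlify(...).
raw = (
    '{"data": {"sentence": "please inform me of the academic words in this '
    'sentence", "awl_words": [{"index": 5, "word": "academic", '
    '"meta": {"head": "academy", "sublist": 5}}]}}'
)


def summarize(result):
    """Return (word, headword, sublist) tuples from an awlify-style result."""
    # Assumption: the result is either a JSON string or a parsed dict.
    parsed = json.loads(result) if isinstance(result, str) else result
    return [
        (entry["word"], entry["meta"]["head"], entry["meta"]["sublist"])
        for entry in parsed["data"]["awl_words"]
    ]


summary = summarize(raw)
for word, head, sublist in summary:
    print(f"{word}: headword {head}, sublist {sublist}")
```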
Usage from the command line
python -m awlify 'this is a sentence to check'
{"data": {"sentence": "this is a sentence to check", "awl_words": []}}
Expected input/output
Output format:
{
    "data": {
        "sentence": "THIS IS THE ORIGINAL SENTENCE",
        "awl_words": [
            {
                "index": INDEX_OF_AWL_WORD_FOUND,
                "word": "AWL_WORD_FOUND",
                "meta": {
                    "head": "THE_HEADWORD_FROM_THE_AWL",
                    "sublist": THE_AWL_SUBLIST_OF_THE_WORD
                }
            }
        ]
    }
}
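The format above can be checked mechanically before the data is used downstream. This is a sketch only: `is_valid_result` is a hypothetical helper, not something awlify provides, and it assumes the result has already been parsed into a Python dict with an integer `index` and `sublist`, as in the examples in this document.

```python
def is_valid_result(result):
    """Check that a parsed result matches the documented output format."""
    data = result.get("data") if isinstance(result, dict) else None
    if not isinstance(data, dict):
        return False
    if not isinstance(data.get("sentence"), str):
        return False
    words = data.get("awl_words")
    if not isinstance(words, list):
        return False
    for entry in words:
        if not isinstance(entry, dict):
            return False
        meta = entry.get("meta")
        # Every entry needs an index, the word itself, and its AWL metadata.
        if not (
            isinstance(entry.get("index"), int)
            and isinstance(entry.get("word"), str)
            and isinstance(meta, dict)
            and isinstance(meta.get("head"), str)
            and isinstance(meta.get("sublist"), int)
        ):
            return False
    return True


empty = {"data": {"sentence": "this is a sentence", "awl_words": []}}
print(is_valid_result(empty))
```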
Example input for a simple sentence (no AWL words):
simple_sentence = awlify('this is a sentence')
Example output for a simple sentence (no AWL words):
{
    "data": {
        "sentence": "this is a sentence",
        "awl_words": []
    }
}
Example input for a complex sentence (several AWL words):
complex_sentence = awlify('the economic recovery is ongoing and potentially problematic')
Example output for a complex sentence (several AWL words):
{
    "data": {
        "sentence": "the economic recovery is ongoing and potentially problematic",
        "awl_words": [
            {
                "index": 1,
                "word": "economic",
                "meta": {
                    "head": "economy",
                    "sublist": 1
                }
            },
            {
                "index": 2,
                "word": "recovery",
                "meta": {
                    "head": "recover",
                    "sublist": 6
                }
            },
            {
                "index": 6,
                "word": "potentially",
                "meta": {
                    "head": "potential",
                    "sublist": 2
                }
            }
        ]
    }
}
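One way to use the `index` field is to mark the AWL words inline in the original sentence. The sketch below is an illustration, not a feature of awlify: `annotate` is a hypothetical helper, and it assumes that `index` is a zero-based token position that lines up with whitespace splitting, which holds for the examples in this document but may not hold for sentences containing punctuation. To stay safe, it only annotates a token when it matches the reported word.

```python
def annotate(result):
    """Append the AWL sublist number in brackets to each AWL word."""
    tokens = result["data"]["sentence"].split()
    for entry in result["data"]["awl_words"]:
        i = entry["index"]
        # Skip entries whose index does not line up with whitespace tokens.
        if 0 <= i < len(tokens) and tokens[i] == entry["word"]:
            tokens[i] = f"{tokens[i]}[{entry['meta']['sublist']}]"
    return " ".join(tokens)


# Parsed form of the complex-sentence output shown above.
complex_result = {
    "data": {
        "sentence": "the economic recovery is ongoing and potentially problematic",
        "awl_words": [
            {"index": 1, "word": "economic", "meta": {"head": "economy", "sublist": 1}},
            {"index": 2, "word": "recovery", "meta": {"head": "recover", "sublist": 6}},
            {"index": 6, "word": "potentially", "meta": {"head": "potential", "sublist": 2}},
        ],
    }
}

annotated = annotate(complex_result)
print(annotated)
# the economic[1] recovery[6] is ongoing and potentially[2] problematic
```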
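Notes
Sentence tokenization is currently implemented with spaCy, which is somewhat heavier than strictly necessary, since none of the package's more advanced features are used.
In theory a simple regex would suffice, so I may add an option for that in the future, for use cases that do not really need the full power of spaCy.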
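As an illustration of the regex idea, a tokenizer along these lines might be enough for plain English sentences. This is a sketch under assumptions, not the package's implementation: `WORD_RE` and `tokenize` are hypothetical names, and the token boundaries (and therefore the indices) may differ from spaCy's, which treats punctuation as separate tokens and splits contractions.

```python
import re

# Runs of letters, optionally joined by internal apostrophes or hyphens,
# so "it's" and "well-known" each stay a single token.
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def tokenize(sentence):
    """Split a sentence into lowercase word tokens, dropping punctuation."""
    return [match.group(0).lower() for match in WORD_RE.finditer(sentence)]


tokens = tokenize("The economic recovery is ongoing.")
print(tokens)
# ['the', 'economic', 'recovery', 'is', 'ongoing']
```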
References
Coxhead, Averil (2000). A New Academic Word List. TESOL Quarterly, 34(2): 213-238.