Information Extraction framework in Python
IEPY is a tool for Information Extraction focused on Relation Extraction.
To give an example of Relation Extraction, suppose we are trying to find a birth date in:
"John von Neumann (December 28, 1903 – February 8, 1957) was a Hungarian and
American pure and applied mathematician, physicist, inventor and polymath."
Then IEPY's task is to identify "John von Neumann" and "December 28, 1903" as the subject and object entities of the "was born in" relation.
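To make the shape of that task concrete, here is a minimal sketch of the input and output involved. It does not use IEPY's API: the `Relation` tuple, the `BIRTH_PATTERN` regular expression and the `extract_birth` helper are hypothetical names invented for this illustration, and a single hand-written pattern stands in for the learned or rule-based extractors a real system would use.

```python
import re
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    """A (subject, relation name, object) triple. Hypothetical type, not IEPY's."""
    subject: str
    name: str
    obj: str


# Assumed surface pattern: "<Person> (<Month> <day>, <year> ..."
# This only covers sentences shaped like the example above.
BIRTH_PATTERN = re.compile(
    r"(?P<person>[^()]+?) \((?P<date>[A-Z][a-z]+ \d{1,2}, \d{4})"
)


def extract_birth(text: str) -> Optional[Relation]:
    """Return the "was born in" relation found at the start of text, if any."""
    match = BIRTH_PATTERN.match(text)
    if match is None:
        return None
    return Relation(match.group("person"), "was born in", match.group("date"))


sentence = (
    "John von Neumann (December 28, 1903 – February 8, 1957) was a Hungarian and "
    "American pure and applied mathematician, physicist, inventor and polymath."
)
relation = extract_birth(sentence)
print(relation)
```

A pattern like this breaks as soon as the wording changes, which is the kind of brittleness that motivates a dedicated relation extraction tool.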
It is aimed at:
- users needing to perform Information Extraction on a large dataset.
- scientists wanting to experiment with new IE algorithms.
Features
- A corpus annotation tool with a web-based UI
- An active learning relation extraction tool pre-configured with convenient defaults.
- A rule based relation extraction tool for cases where the documents are semi-structured or high precision is required.
- A web-based user interface that:
  - Allows layman users to control some aspects of IEPY.
  - Allows decentralization of human input.
- A shallow entity ontology with coreference resolution via Stanford CoreNLP
- An easily hackable active learning core, ideal for scientists wanting to experiment with new algorithms.
Installation
Install the required packages:
sudo apt-get install build-essential python3-dev liblapack-dev libatlas-dev gfortran openjdk-7-jre
Then simply install with pip:
pip install iepy
For detailed installation instructions, see the Read the Docs page.
Running the tests
If you are contributing to the project and want to run the tests, all you have to do is:
- Make sure your JAVAHOME is correctly set. Read more about it here
- In the root of the project run
nosetests
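Before running the tests, a quick sanity check of the environment can save a confusing failure. The sketch below is an assumption-laden helper, not part of IEPY: `check_javahome` is a hypothetical name, and "correctly set" is interpreted here only as "the JAVAHOME variable exists and points to a path that exists". The installation documentation remains the authority on what value the variable should hold.

```python
import os
from typing import Mapping, Tuple


def check_javahome(environ: Mapping[str, str] = os.environ) -> Tuple[bool, str]:
    """Report whether JAVAHOME is set and points to an existing path.

    Hypothetical helper: it checks existence only, not that the path
    is a working Java installation.
    """
    value = environ.get("JAVAHOME")
    if not value:
        return False, "JAVAHOME is not set"
    if not os.path.exists(value):
        return False, f"JAVAHOME points to a missing path: {value}"
    return True, f"JAVAHOME = {value}"


ok, message = check_javahome()
print(message)
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary instead of the real process environment.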
Learn more
The full documentation is available on Read the Docs.