Convert the PubTator format to tagtog's anndoc format

Detailed description of the PubTator2Anndoc Python project


Converts annotations in PubTator format to the tagtog format.

PubTator format

A PubTator file has the following layout:

<PMID>|t|<TITLE>
<PMID>|a|<ABSTRACT>
<PMID>  <START OFFSET 1>    <LAST OFFSET 1> <MENTION 1> <TYPE 1>    <IDENTIFIER 1>
<PMID>  <START OFFSET 2>    <LAST OFFSET 2> <MENTION 2> <TYPE 2>    <IDENTIFIER 2>

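where:

  • The first line contains the title of the paper.
  • The second line contains the abstract of the paper.
  • Each following line holds one annotation in tab-separated format:
    • PMID
    • Start offset
    • End offset
    • Mention (the entity text)
    • Entity type
    • Identifier (the canonical form)

For example,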

20085714|t|Autosomal-dominant striatal degeneration is caused by a mutation in the phosphodiesterase 8B gene.
20085714|a|Autosomal-dominant striatal degeneration is caused by a mutation in the phosphodiesterase 8B gene. Autosomal-dominant striatal degeneration (ADSD) is an autosomal-dominant movement disorder affecting the striatal part of the basal ganglia. ADSD is characterized by bradykinesia, dysarthria, and muscle rigidity. These symptoms resemble idiopathic Parkinson disease, but tremor is not present. Using genetic linkage analysis, we have mapped the causative genetic defect to a 3.25 megabase candidate region on chromosome 5q13.3-q14.1. A maximum LOD score of 4.1 (Theta = 0) was obtained at marker D5S1962. Here we show that ADSD is caused by a complex frameshift mutation (c.94G>C+c.95delT) in the phosphodiesterase 8B (PDE8B) gene, which results in a loss of enzymatic phosphodiesterase activity. We found that PDE8B is highly expressed in the brain, especially in the putamen, which is affected by ADSD. PDE8B degrades cyclic AMP, a second messenger implied in dopamine signaling. Dopamine is one of the main neurotransmitters involved in movement control and is deficient in Parkinson disease. We believe that the functional analysis of PDE8B will help to further elucidate the pathomechanism of ADSD as well as contribute to a better understanding of movement disorders.
20085714    72  92  phosphodiesterase 8B    Gene    8622
20085714    99  139 Autosomal-dominant striatal degeneration    Disease 609161
20085714    671 678 c.94G>C Mutation    c|SUB|G|94|C
20085714    679 687 c.95delT    Mutation    c|DEL|95|T
20085714    696 716 phosphodiesterase 8B    Gene    8622
20085714    981 989 Dopamine    Chemical    D004298
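
Counting characters in the example above suggests that offsets are measured over the title, one separator character, and then the abstract, and that the end offset is exclusive. The sketch below checks this for the first two annotations. The helper name `mention_matches` is made up for illustration, and the single-space separator is an assumption (any one-character separator gives the same offsets).

```python
# Title and the beginning of the abstract, copied from the example record.
title = (
    "Autosomal-dominant striatal degeneration is caused by a mutation "
    "in the phosphodiesterase 8B gene."
)
abstract_start = (
    "Autosomal-dominant striatal degeneration is caused by a mutation "
    "in the phosphodiesterase 8B gene. Autosomal-dominant striatal "
    "degeneration (ADSD) is an autosomal-dominant movement disorder"
)

# Assumption: offsets index into title + one separator character + abstract.
text = title + " " + abstract_start


def mention_matches(text: str, start: int, end: int, mention: str) -> bool:
    """Return True if text[start:end] (end exclusive) equals the mention."""
    return text[start:end] == mention


print(mention_matches(text, 72, 92, "phosphodiesterase 8B"))
print(mention_matches(text, 99, 139, "Autosomal-dominant striatal degeneration"))
```

A check like this is a cheap way to catch off-by-one offsets before converting a file.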

For more details on the PubTator format, see this link
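
The layout described in this section could be read with a small parser such as the one below. This is a minimal sketch, not the PubTator2Anndoc implementation: the names `Mention`, `Document`, and `parse_pubtator` are hypothetical, and it assumes that records are separated by blank lines and that every annotation line has all six tab-separated columns.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional

# Matches "<PMID>|t|<TITLE>" and "<PMID>|a|<ABSTRACT>" lines.
TEXT_LINE = re.compile(r"^([^|\t]+)\|([ta])\|(.*)$")


@dataclass
class Mention:
    pmid: str
    start: int
    end: int
    text: str
    entity_type: str
    identifier: str


@dataclass
class Document:
    pmid: str
    title: str = ""
    abstract: str = ""
    mentions: List[Mention] = field(default_factory=list)


def parse_pubtator(lines: Iterable[str]) -> Iterator[Document]:
    """Yield one Document per record (assumed to be blank-line separated)."""
    doc: Optional[Document] = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current record, if any.
            if doc is not None:
                yield doc
                doc = None
            continue
        match = TEXT_LINE.match(line)
        if match:
            pmid, kind, text = match.groups()
            if doc is None:
                doc = Document(pmid=pmid)
            if kind == "t":
                doc.title = text
            else:
                doc.abstract = text
            continue
        # Anything else is treated as an annotation line.
        cols = line.split("\t")
        if len(cols) < 6:
            raise ValueError(f"expected 6 tab-separated columns: {line!r}")
        pmid, start, end, text, entity_type, identifier = cols[:6]
        if doc is None:
            doc = Document(pmid=pmid)
        doc.mentions.append(
            Mention(pmid, int(start), int(end), text, entity_type, identifier)
        )
    # Emit the last record when the input does not end with a blank line.
    if doc is not None:
        yield doc
```

Because the function accepts any iterable of lines, an open file handle can be passed directly. Identifiers that themselves contain pipes, such as c|SUB|G|94|C, are left intact because annotation lines are split on tabs only.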

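tagtog format: anndoc

tagtog uses two files: an HTML (.html) file and a JSON (.ann.json) file that holds the annotations.
The HTML file is divided into parts, each with its own ID.
The JSON file contains the document-specific annotations, followed by en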
