A Python package that uses multiple Docker images to generate document profiles or extract metadata from text in parallel



DocProfiler

A Python package that uses multiple Docker images and NLP tools/frameworks to generate document profiles and extract metadata from text in parallel.

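Abstract

The amount of unstructured data keeps growing, including text documents, social media text, blogs, and articles. In information retrieval, extracting metadata and generating profiles can improve the performance of text/document retrieval, and can also help us analyze and understand data ranging from completely unstructured to semi-structured. The motivation behind this Python library is to bring open-source NLP tools/techniques together in an efficient and simple way, and to run them in parallel with the help of Docker images and Python's asynchronous features.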
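The parallel processing described above can be illustrated with a minimal asyncio sketch. It is not DocProfiler's actual implementation: `fake_service`, `run_all`, and `timed_run` are hypothetical names, and `asyncio.sleep` stands in for the HTTP request to each container. It only shows why concurrent calls finish in roughly the time of the slowest service and not the sum of all of them.

```python
import asyncio
import time


async def fake_service(name, delay):
    """Stand-in for one HTTP call to an NLP container (hypothetical)."""
    await asyncio.sleep(delay)  # simulates network and model latency
    return name, delay


async def run_all(delays):
    """Start every simulated call at once and wait for all of them."""
    results = await asyncio.gather(
        *(fake_service(name, delay) for name, delay in delays.items())
    )
    return dict(results)


def timed_run(delays):
    """Run all simulated calls concurrently and return (results, wall-clock seconds)."""
    start = time.perf_counter()
    results = asyncio.run(run_all(delays))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    delays = {"keyphrases": 0.15, "ner": 0.2, "entity_linking": 0.3, "summary": 0.05}
    results, elapsed = timed_run(delays)
    print(f"sum of delays: {sum(delays.values()):.2f}s, wall clock: {elapsed:.2f}s")
```

With real services, the coroutine body would be an asynchronous HTTP request, but the gathering pattern stays the same.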

NLP Tools/Frameworks

Task                              | Framework/Model     | Docker Image (GPU support available)            | Port
Unsupervised Keyphrase Extraction | SIFRank-2020        | docker pull aayushpatel007/sifrank-keyphrases   | 5000
Named Entity Recognition          | FlairNer            | docker pull aayushpatel007/flair_ner            | 5001
Entity Linking                    | TAGME               | docker pull aayushpatel007/tagme-entity-linking | 5002
Text Summarization                | TextRank            | docker pull aayushpatel007/text-summarization   | 5003
GeoParsing                        | Mordecai (Upcoming) | --Upcoming--                                    | N.A
Language Detection                | --Upcoming--        | --Upcoming--                                    | N.A
Readability Analysis              | --Upcoming--        | --Upcoming--                                    | N.A

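Running the Docker containers

To generate a document profile, you need to run the containers listed above. (You do not have to run all of them; choose the tasks you need and run only the matching containers.) However, the ports listed above must be opened for the containers you run.

For example: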


docker container run -d -p 5000:5000 aayushpatel007/sifrank-keyphrases 0 # Replace 0 with -1 when running on a CPU.

docker container run -d -p 5001:5001 aayushpatel007/flair_ner 1 # By default uses the ner-ontonotes trained model. If running on a CPU, you can replace 1 with 0.

docker container run -d -p 5002:5002 aayushpatel007/tagme-entity-linking 0.2 "tag_me_api_token" # Note: the entity-linking container requires a TAGME API token.

docker container run -d -p 5003:5003 aayushpatel007/text-summarization
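Before generating a profile, it can help to confirm that the published ports are actually reachable. The following is a minimal sketch using only the Python standard library. The port numbers come from the table above, while `SERVICE_PORTS`, `port_is_open`, and `check_services` are hypothetical names introduced for this illustration. A successful TCP connection only shows that something is listening, not that the model inside the container has finished loading.

```python
import socket

# Ports taken from the table above; the keys are labels chosen for this sketch.
SERVICE_PORTS = {
    "sifrank-keyphrases": 5000,
    "flair_ner": 5001,
    "tagme-entity-linking": 5002,
    "text-summarization": 5003,
}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, unreachable hosts
        return False


def check_services(host, ports=None):
    """Map each service label to whether its port accepts connections."""
    ports = SERVICE_PORTS if ports is None else ports
    return {name: port_is_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, is_up in check_services("127.0.0.1").items():
        print(f"{name}: {'reachable' if is_up else 'not reachable'}")
```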


Using DocProfiler:

pip3 install docprofiler==1.0.1

from DocProfiler import docprofiler as d


text = """THREE WEEKS ago, Phil Morgan, head of the financial-services initiative at the Welsh Development Agency, took an exhibition to Bristol, to 'sell' to that city's business community the attractions of relocating in south-east Wales. Although the day-long event was one of a series that will shortly go to Reading and then to places along the M25 corridor around London, the Bristol visit touched a raw nerve. Relations have never been good between Bristol and South Wales, and the English city resented what it saw as the seduction of one of its own growth industries. The move illustrates the aggressive policy that is being followed in order to attract financial-services companies to South Wales. Cardiff, as the main centre for the sector in south-east Wales, has never had a particularly strong indigenous financial industry. Before Mr Peter Walker, then secretary of state for Wales, launched his financial services initiative to build the city's financial nexus three years ago, effectively all that the Welsh capital had to offer was: the Bank of Wales, set up in the 1972; one medium-sized building society, the Principality, 28th in the societies' pecking order; the venture capital group, 3i; and one major incomer, Chemical Bank. Mr Walker's initiative had immediate results. NM Rothschild has been the five-star name to arrive, but others - including National Provident Institution, Banque Nationale de Paris, Axa, the French insurance giant, Willis Wrightson (also in insurance), and stockbrokers Bell, Lawrie, White - have strengthened the sector. While this financial muscle has greatly added to the city's commercial depth, it has not yet turned Cardiff into an important financial centre. 'You still have to look at Cardiff as an emerging financial city,' says Peter Davies, Rothschild's director of corporate finance in Wales. 
'It is not possible to get institutional support for a major issue, and if a growing business wants to raise equity capital, it has really only one choice - 3i.' Rothschild's own venture fund is based in London, and of two attempts by the WDA to set up funds, the Cardiff Consortium came to nothing and the Welsh Venture Capital Fund was eventually closed. That situation is about to change. Meirion Thomas, an executive director of the WDA, says that a new fund, Venture Link, is raising Pounds 5m from Welsh institutions, such as the local-authority pension funds, and expects to be in business within the next few months. Venture Link, which has two other funds in Britain, has opened an office in Cardiff, and will gear its lending to the bottom end of the market, offering capital between Pounds 25,000 and Pounds 150,000. 'The signs are that people within Wales are looking much more to institutions within the country for their needs,' says Mr Thomas. 'The Bank of Wales, for instance, is getting more into the risk-capital business, and that is encouraging.' Venture Link is not the only one interested in Cardiff. Credit Lyonnais, the French bank, having already opened a dozen regional centres in Britain, is considering an office in Cardiff, which it currently serves from Bristol through a former WDA official. Other entrants are in the pipeline, according to Mr Morgan. At 3i, Nigel Guy, the Cardiff director, says there is plenty of money for lending, but that the city lacks a large corporate base, such as that of the West Midlands, to sustain a major financial sector. Last year, 3i completed Pounds 18m of business in 50 investments, but the 12 months to March 1991 'was not so good. The position in Cardiff reflected what is happening in the rest of the UK economy.' Even so, Mr Guy says there is considerable interest in management buy-outs, boosted by companies drawing back into their core activities and seeking to sell what they see as peripheral businesses. 
'There has been an upsurge in interest in this sector in the past four months,' he says, 'including a couple of seven-figure deals, such as that at British Rotatherm, which was bought from its Scandinavian parents.' This activity encourages Mr Morgan at the financial services initiative, who says that the 'temperature is still good, despite the recession. The downturn in the City of London provides us in Cardiff with an opportunity to attract firms to a location that provides them with a better cost base. Both wage and salary levels, and property prices, are much more attractive here than in the south east of England.' Mr Morgan has been aiming particularly at the insurance companies, such as Axa, believing that, as the continental concerns grow increasingly larger in preparation for the single market in 1992, they will want to have national networks of offices. 'We want them to look at Cardiff and the rest of south-east Wales as a potential location,' he says, offering the example of DAS, a German concern, which late last year opened an office in Bedwas, between Cardiff and Newport. 'A number of overseas banks, such as Credit Agricole and Paribas, of France, and the Canadian Imperial Bank of Commerce, have already financed deals in Wales. 'The electronics-based Gooding Group has Japan's C Itoh and the American Citibank among its major shareholders; and with this level of interest, I am convinced we shall be seeing major banks of their standing actually opening offices in Cardiff before long."""

# Each URL points at one running container; the ports match the docker commands above.
data, final_time = d.generate_profile(
    text,
    URL_LIST=[
        'http://host_ip_addr:5001/flairner',
        'http://host_ip_addr:5003/textrank',
        'http://host_ip_addr:5002/tagme',
        'http://host_ip_addr:5000/sifrank',
    ],
    no_of_keyphrases=10,
)

print(data)
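Since not every container has to be running, the URL list can be assembled from only the tasks you need. The helper below is a hypothetical convenience, not part of the docprofiler package. The endpoint paths (`sifrank`, `flairner`, `tagme`, `textrank`) are taken from the example call, the ports follow the container table, and the task names are labels invented for this sketch.

```python
# Endpoint paths come from the example call; ports follow the container table.
# The task names on the left are labels invented for this sketch.
ENDPOINTS = {
    "keyphrases": (5000, "sifrank"),
    "ner": (5001, "flairner"),
    "entity_linking": (5002, "tagme"),
    "summary": (5003, "textrank"),
}


def build_url_list(host, tasks=None):
    """Return service URLs for the selected tasks (all tasks when tasks is None)."""
    selected = list(ENDPOINTS) if tasks is None else list(tasks)
    unknown = [task for task in selected if task not in ENDPOINTS]
    if unknown:
        raise ValueError(f"unknown tasks: {unknown}")
    return [
        f"http://{host}:{ENDPOINTS[task][0]}/{ENDPOINTS[task][1]}"
        for task in selected
    ]


if __name__ == "__main__":
    # Only NER and summarization containers are assumed to be running here.
    print(build_url_list("host_ip_addr", ["ner", "summary"]))
```

The resulting list could then be passed as the `URL_LIST` argument of `generate_profile`.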

Output

{
  'DOC': '1',
  'Tagme-entities': [
    'Welsh Development Agency',
    'Bristol',
    'M25 motorway',
    'Wales',
    'Cardiff',
    'Peter Walker Baron Walker of Worcester',
    'Carole King',
    'Galaxy Nexus',
    'Bank of Wales',
    'Venture capital',
    'Chemical Bank',
    'BNP Paribas',
    'Paris',
    'AXA',
    'Equity finance',
    '3i',
    'N M Rothschild amp Sons',
    'Cardiff City F C',
    'Pension',
    'Cr dit Lyonnais',
    'Recession',
    'South East England',
    'East of England',
    'Insurance',
    'Cardiff University',
    'Bedwas',
    'Newport Wales',
    'Cr dit Agricole',
    'Canadian Imperial Bank of Commerce',
    'Japan',
    'Itochu',
    'Citibank'
  ],
  'Keyphrases': [
    'financial-services initiative',
    'phil morgan',
    'welsh development agency',
    'business community',
    'south-east wales',
    'welsh institutions',
    'strong indigenous financial industry',
    'bristol',
    'venture capital group',
    'french insurance giant'
  ],
  'Summary': "THREE WEEKS ago, Phil Morgan, head of the financial-services initiative at the Welsh Development Agency, took an exhibition to Bristol, to 'sell' to that city's business community the attractions of relocating in south-east Wales. Before Mr Peter Walker, then secretary of state for Wales, launched his financial services initiative to build the city's financial nexus three years ago, effectively all that the Welsh capital had to offer was: the Bank of Wales, set up in the 1972; one medium-sized building society, the Principality, 28th in the societies' pecking order; the venture capital group, 3i; and one major incomer, Chemical Bank. 'You still have to look at Cardiff as an emerging financial city,' says Peter Davies, Rothschild's director of corporate finance in Wales. 'It is not possible to get institutional support for a major issue, and if a growing business wants to raise equity capital, it has really only one choice - 3i.' Rothschild's own venture fund is based in London, and of two attempts by the WDA to set up funds, the Cardiff Consortium came to nothing and the Welsh Venture Capital Fund was eventually closed. Meirion Thomas, an executive director of the WDA, says that a new fund, Venture Link, is raising Pounds 5m from Welsh institutions, such as the local-authority pension funds, and expects to be in business within the next few months. At 3i, Nigel Guy, the Cardiff director, says there is plenty of money for lending, but that the city lacks a large corporate base, such as that of the West Midlands, to sustain a major financial sector. Both wage and salary levels, and property prices, are much more attractive here than in the south east of England.' Mr Morgan has been aiming particularly at the insurance companies, such as Axa, believing that, as the continental concerns grow increasingly larger in preparation for the single market in 1992, they will want to have national networks of offices. 
'We want them to look at Cardiff and the rest of south-east Wales as a potential location,' he says, offering the example of DAS, a German concern, which late last year opened an office in Bedwas, between Cardiff and Newport.",
  'GPE': [
    'South Wales,',
    'South Wales. Cardiff,',
    'Wales,',
    'France,',
    'Bedwas,',
    'Cardiff',
    'Bristol',
    'Newport.',
    'Wales.',
    'Bristol,',
    'Britain,',
    'Reading',
    "England.'",
    'Cardiff,',
    'Wales',
    'UK',
    'London,'
  ],
  'ORG': [
    "Japan's C Itoh",
    'Venture Link',
    'Banque Nationale de Paris, Axa,',
    'DAS,',
    'Chemical Bank.',
    'the City of London',
    'Willis Wrightson',
    'the Principality, 28th',
    'Cardiff. Credit Lyonnais,',
    'the Cardiff Consortium',
    'Gooding Group',
    'Wales,',
    'Venture Link,',
    'NM Rothschild',
    "'The Bank of Wales,",
    'Axa,',
    'WDA',
    'Paribas,',
    'the Welsh Development Agency,',
    'WDA,',
    'British Rotatherm,',
    'the Canadian Imperial Bank of Commerce,',
    'Citibank',
    'the Bank of Wales,',
    'state',
    'Credit Agricole'
  ],
  'PERSON': [
    'Nigel Guy,',
    'Peter Walker,',
    'Bell, Lawrie, White',
    'Thomas.',
    "Peter Davies, Rothschild's",
    'Guy',
    'Morgan',
    'Meirion Thomas,',
    'Phil Morgan,',
    'Morgan.',
    "Walker's"
  ],
  'LOC': [
    'the West Midlands,'
  ],
  'NORP': [
    'Welsh',
    'French',
    'English',
    'Scandinavian',
    'American',
    'German'
  ],
  'EVENT': [

  ],
  'DATE': [
    'the next few months.',
    'Last year, 3i',
    'the 12 months to March 1991',
    'THREE WEEKS ago,',
    'three years ago,',
    '1992,',
    'late last year',
    'the 1972;',
    "the past four months,'"
  ],
  'MONEY': [
    'Pounds 25,000',
    'Pounds 5m',
    'Pounds 18m'
  ],
  'ADDITIONAL': [
    '50',
    'M25',
    'one',
    '150,000.',
    'two',
    'dozen'
  ],
  'Time_by_SIFRank_keyphrases': 1.515525030998106,
  'Time_by_EntityLinking': 3.566188880999107,
  'Time_by_TextSummarization': 0.07235045199922752,
  'Time_by_FlairNER': 2.02276362200064
}

Total time taken by docprofiler : 3.9456855
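Two things stand out in this output. Many entity strings carry trailing punctuation ('Wales,', 'Wales.', "England.'"), and the four per-service timings add up to about 7.18 seconds while the total is about 3.95 seconds, which reflects the services running in parallel. The sketch below shows one possible way to post-process such a profile. `clean_entities` and `sequential_time` are hypothetical helpers, not part of docprofiler, and the cleanup rule (strip surrounding punctuation, drop duplicates) is an assumption about what a caller might want.

```python
import string


def clean_entities(entities):
    """Strip surrounding punctuation/whitespace and drop duplicates, keeping order."""
    seen = set()
    cleaned = []
    for entity in entities:
        text = entity.strip(string.punctuation + string.whitespace)
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


def sequential_time(profile):
    """Sum the per-service timings stored under the 'Time_by_*' keys."""
    return sum(
        value for key, value in profile.items() if key.startswith("Time_by_")
    )


if __name__ == "__main__":
    # A small excerpt of the profile shown above.
    profile = {
        "GPE": ["Wales,", "Wales.", "Wales", "England.'", "Bristol,", "Bristol"],
        "Time_by_SIFRank_keyphrases": 1.515525030998106,
        "Time_by_EntityLinking": 3.566188880999107,
        "Time_by_TextSummarization": 0.07235045199922752,
        "Time_by_FlairNER": 2.02276362200064,
    }
    print(clean_entities(profile["GPE"]))
    print(f"sum of per-service times: {sequential_time(profile):.2f}s")
```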
