Scraping citation text from PubMed search results with BeautifulSoup and Python?

Posted 2024-04-25 12:49:10


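So I'm trying to get the AMA-format citation for every article in a PubMed search. The following code only tries to get the citation data for the first article: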

import requests
import xlsxwriter
from bs4 import BeautifulSoup


URL = 'https://pubmed.ncbi.nlm.nih.gov/?term=infant+formula&size=200'
response = requests.get(URL)

html_soup = BeautifulSoup(response.text, 'html5lib')
article_containers = html_soup.find_all('article', class_ = 'labs-full-docsum')

first_article = article_containers[0]
citation_text = first_article.find('div', class_ = 'docsum-wrap').find('div', class_ = 'result-actions-bar').div.div.find('div', class_ = 'content').div.div.text

print(citation_text)

The script prints an empty line, even though when I inspect the page in Google Chrome the text is clearly visible in that div.

Does this have something to do with JavaScript? If so, how can I fix it?
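One way to answer the JavaScript question yourself is to look at the HTML that `requests` actually received instead of Chrome's Elements panel, which shows the page after its scripts have run (View Page Source shows the raw HTML). Below is a minimal sketch that uses only the standard library's `html.parser` and collects the text inside every element carrying a given class. The class name `content` is taken from the selector in the question; I haven't checked that PubMed's current markup still uses it, and the simple depth tracking assumes reasonably well-formed markup. `ClassTextCollector` and `texts_for_class` are hypothetical helper names.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextCollector(HTMLParser):
    """Collect the text inside every element whose class list contains `target`."""

    def __init__(self, target: str) -> None:
        super().__init__()
        self.target = target
        self.depth = 0      # > 0 while we are inside a matching element
        self.current = []   # text fragments of the element being read
        self.results = []   # one whitespace-normalised string per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested element inside a match
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.target in classes:
                self.depth = 1
                self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:  # closed the matching element itself
            self.results.append(" ".join("".join(self.current).split()))

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def texts_for_class(raw_html: str, class_name: str) -> list[str]:
    """Return the text of each element with `class_name`, as the server sent it."""
    parser = ClassTextCollector(class_name)
    parser.feed(raw_html)
    parser.close()
    return parser.results
```

Passing `response.text` from the script above: if elements that look filled in the browser come back as empty strings, the text is being added after the page loads, which is the case the answer handles by requesting the citation data directly.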


1 answer

Answer #1 · Posted 2024-04-25 12:49:10

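The citation text isn't part of the search-results HTML that `requests` receives; the page loads it separately. This script therefore requests it directly for each article and prints the AMA-format citation for every article at the given URL: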

import json
import requests
from bs4 import BeautifulSoup


URL = 'https://pubmed.ncbi.nlm.nih.gov/?term=infant+formula&size=200'
response = requests.get(URL)

html_soup = BeautifulSoup(response.text, 'html5lib')

for article in html_soup.select('article'):
    print(article.select_one('.labs-docsum-title').get_text(strip=True, separator=' '))
    citation_id = article.input['value']
    data = requests.get('https://pubmed.ncbi.nlm.nih.gov/{citation_id}/citations/'.format(citation_id=citation_id)).json()
    # uncomment this to print all data:
    # print(json.dumps(data, indent=4))
    print(data['ama']['orig'])
    print('-' * 80)

Prints:

Review of Infant Feeding: Key Features of Breast Milk and Infant Formula .
Martin CR, Ling PR, Blackburn GL. Review of Infant Feeding: Key Features of Breast Milk and Infant Formula. Nutrients. 2016;8(5):279. Published 2016 May 11. doi:10.3390/nu8050279
--------------------------------------------------------------------------------
Prebiotics in infant formula .
Vandenplas Y, De Greef E, Veereman G. Prebiotics in infant formula. Gut Microbes. 2014;5(6):681-687. doi:10.4161/19490976.2014.972237
--------------------------------------------------------------------------------
Effects of infant formula composition on long-term metabolic health.
Lemaire M, Le Huërou-Luron I, Blat S. Effects of infant formula composition on long-term metabolic health. J Dev Orig Health Dis. 2018;9(6):573-589. doi:10.1017/S2040174417000964
--------------------------------------------------------------------------------
Selenium in infant formula milk.
He MJ, Zhang SQ, Mu W, Huang ZW. Selenium in infant formula milk. Asia Pac J Clin Nutr. 2018;27(2):284-292. doi:10.6133/apjcn.042017.12
--------------------------------------------------------------------------------

... and so on.
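The loop above sends one request per article with no timeout and no error handling, so any single failure stops the whole run. Here is a sketch of a more defensive version of the same idea, assuming the `/citations/` endpoint and the `data['ama']['orig']` layout used in the answer (as far as I know neither is a documented API, so both may change). It swaps `requests` for the standard library's `urllib.request` so it runs without extra installs. `collect_citations`, `fetch_json`, and the 0.5 s pause are hypothetical names and values of my own, not anything PubMed specifies; the PMIDs are the values the answer reads from `article.input['value']`.

```python
import json
import time
import urllib.request
from typing import Callable, Iterable, Optional

# Endpoint taken from the answer above; PubMed may change or remove it.
CITATIONS_URL = "https://pubmed.ncbi.nlm.nih.gov/{pmid}/citations/"


def citations_url(pmid: str) -> str:
    """Build the per-article citations URL from a PubMed ID."""
    return CITATIONS_URL.format(pmid=pmid)


def ama_citation(data: dict) -> Optional[str]:
    """Pull the AMA string out of the citations JSON, or None if it is missing."""
    return data.get("ama", {}).get("orig")


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Download and decode one JSON document, giving up after `timeout` seconds."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def collect_citations(pmids: Iterable[str],
                      fetch: Callable[[str], dict] = fetch_json,
                      delay: float = 0.5) -> dict[str, str]:
    """Return {pmid: AMA citation}, skipping IDs whose request or decoding fails."""
    results = {}
    for i, pmid in enumerate(pmids):
        if i and delay:
            time.sleep(delay)  # hypothetical pause between requests
        try:
            citation = ama_citation(fetch(citations_url(pmid)))
        except (OSError, ValueError) as exc:  # network errors / bad JSON
            print(f"skipping {pmid}: {exc}")
            continue
        if citation:
            results[pmid] = citation
    return results
```

To keep using `requests`, pass a `fetch` function that calls `session.get(url, timeout=10)`, then `raise_for_status()` and `.json()`: requests' exceptions subclass `OSError` and its JSON decoding errors subclass `ValueError`, so the same `except` clause catches them. Making `fetch` a parameter also lets you test the loop without touching the network.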
