How can I use BeautifulSoup to find hidden elements that only appear when switching tabs?

import requests
from bs4 import BeautifulSoup

def test_select_files(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    # Print the text of every table cell found in the static HTML
    td_tags = soup.find_all('td')
    for tag in td_tags:
        print(tag.text.strip())

test_select_files('https://www.encodeproject.org/experiments/ENCSR000EEC/')

1 answer

User

#1 · Posted 2024-05-26 19:55:04

You should be able to get all the information you need from the JSON embedded in the returned HTML:

from bs4 import BeautifulSoup
import requests
import json

r = requests.get('https://www.encodeproject.org/experiments/ENCSR000EEC/')
soup = BeautifulSoup(r.content, 'html.parser')

# The page carries its data in a <script type="application/json"> tag
json_data = soup.find('script', type='application/json').string
data = json.loads(json_data)

# One line per file: accession, file format (padded to 10 characters), output type
for file in data['files']:
    print(f"{file['accession']}  {file['file_format']:10}  {file['output_type']}")

This gives you output that starts like this:

ENCFF000XTK  bam         alignments
ENCFF000XTL  bam         alignments
ENCFF000XTM  bigBed      peaks
ENCFF000XTP  bigWig      signal
ENCFF000XTZ  fastq       reads
ENCFF000XUA  fastq       reads
ENCFF001VKJ  bed         peaks
ENCFF002CUG  bed         optimal IDR thresholded peaks
ENCFF715UNN  bigBed      optimal IDR thresholded peaks
ENCFF836BQL  bam         unfiltered alignments

I suggest you print(data) to see what other information is available for each file.
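
Printing the whole structure can be a lot to read, so here is a minimal sketch of a more targeted way to explore it. It assumes data['files'] is a list of dictionaries, as the loop in the answer implies. The helper names summarize_fields and show_record, the sample_data stand-in, and the example_extra_field key are all hypothetical and only there so the snippet runs without a network request.

```python
import json

def summarize_fields(files):
    """Return a sorted list of every key that appears in any file record."""
    keys = set()
    for record in files:
        keys.update(record.keys())
    return sorted(keys)

def show_record(record, indent=2):
    """Return one record as indented JSON text, which is easier to scan than print(data)."""
    return json.dumps(record, indent=indent, sort_keys=True)

# Hypothetical stand-in for the parsed JSON; in practice, pass data['files'] from the answer.
# 'example_extra_field' is a made-up key, not a real field of the site's data.
sample_data = {
    "files": [
        {"accession": "ENCFF000XTK", "file_format": "bam", "output_type": "alignments"},
        {"accession": "ENCFF000XTZ", "file_format": "fastq", "output_type": "reads",
         "example_extra_field": "placeholder"},
    ]
}

print(summarize_fields(sample_data["files"]))
print(show_record(sample_data["files"][0]))
```

With the real page you would call summarize_fields(data['files']) to see which fields exist, then show_record on a single entry to look at its values.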
