Python 3: extracting a specific line from a web page's source code

-1 votes

1 answer

1354 views

Asked 2025-04-18 11:59

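My code uses urlopen, but that fetches the content of the entire page. Is there a way to extract only one specific line of the page source, so that my program runs more efficiently?

For example, I would like to print line 135 of the source of this link: www.ncbi.nlm.nih.gov/snp/?term=273898673?term=273898673

My code: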

from urllib.request import urlopen

# urlopen needs an explicit scheme; a bare "www..." string raises ValueError
query = "http://www.ncbi.nlm.nih.gov/snp/?term=273898673?term=273898673"
data = urlopen(query)
html = data.read()
codec = data.info().get_param('charset', 'utf8')
data = html.decode(codec)
print(data)

Is there any way to customize urlopen() to do this?
P.S. I am using Python 3.x.


1 Answer

You can use enumerate to keep track of the line number and pick out a specific line, without reading the whole response into memory at once:

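import urllib.request

response = urllib.request.urlopen('http://www.ncbi.nlm.nih.gov/snp/?term=273898673?term=273898673')
for line_number, line in enumerate(response):
    # enumerate counts from 0, so line 135 has index 134
    if line_number == 134:
        # In Python 3 each line is a bytes object; decode it before printing
        print(line.decode('utf-8', errors='replace'))
        # Stop reading once the target line has been printed
        break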
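
The same idea can be packaged as a small helper. The following is only a sketch: the function name nth_line, its 1-based numbering, and the UTF-8 default are assumptions made for illustration, not part of urllib. It uses itertools.islice to skip ahead and stop right after the requested line, and it is demonstrated on an in-memory stream so it runs without network access.

```python
import io
from itertools import islice


def nth_line(lines, n, encoding="utf-8"):
    """Return line n (1-based) from an iterable of bytes lines, or None.

    nth_line is a hypothetical helper, not a urllib function.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    # islice skips the first n - 1 lines and yields only line n,
    # so nothing after that line is read from the stream.
    for raw in islice(lines, n - 1, n):
        return raw.decode(encoding, errors="replace").rstrip("\r\n")
    # The stream ended before line n was reached.
    return None


# Offline demonstration: a BytesIO object stands in for an HTTP response.
sample = io.BytesIO(b"first\nsecond\nthird\n")
print(nth_line(sample, 2))  # second
```

Since the object returned by urlopen can be iterated line by line, as the loop above does, passing it in place of sample, for example nth_line(response, 135), should behave the same way. Note that this only avoids reading and decoding the rest of the page on the client side; an HTTP request has no notion of source lines, so urlopen itself cannot ask the server for line 135 alone.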

Answered 2025-04-18 by Python大师
