Python BeautifulSoup: img tag inside a div returns the wrong link when parsed

Posted 2024-04-20 02:03:50


I have this code:

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

theurl = 'http://es.ninemanga.com/chapter/Dragon%20Ball%20Multiverse/279006.html'

# Send a browser-like User-Agent with the request.
# Note: theurl already ends in '.html', so this requests '...279006.html.html'.
req = Request(theurl + '.html', headers={'User-Agent': 'Mozilla/5.0'})
thepage = urlopen(req).read()
soup = BeautifulSoup(thepage, "html.parser")

# For each div with class "pic_box", print the src of the img with id "manga_pic_1".
for divs in soup.findAll('div', {"class": "pic_box"}):
    temp = divs.find('img', {"id": "manga_pic_1"})
    temp1 = temp.get('src')
    print(temp1 + "\n")

I want to get every div tag with the class pic_box and, inside each of them, all the img tags and their src attributes.
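
To make that goal concrete, here is a minimal sketch that runs against a small inline HTML string instead of the live page. The markup in SAMPLE_HTML, the example.com URLs, and the helper name collect_pic_sources are all made up for illustration; the real page's structure may differ.

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (an assumption, not the site's HTML).
SAMPLE_HTML = """
<div class="pic_box"><img id="manga_pic_1" src="http://example.com/a.jpg"></div>
<div class="pic_box"><img id="manga_pic_2" src="http://example.com/b.jpg"></div>
<div class="other"><img src="http://example.com/ignored.jpg"></div>
"""

def collect_pic_sources(html):
    """Return the src of every img nested in a div with class pic_box."""
    soup = BeautifulSoup(html, "html.parser")
    sources = []
    for div in soup.find_all("div", class_="pic_box"):
        for img in div.find_all("img"):
            src = img.get("src")
            if src:  # skip img tags that have no src attribute
                sources.append(src)
    return sources

print(collect_pic_sources(SAMPLE_HTML))

Unlike a lookup by a single id, this loops over every img in each matching div, so a div holding several images contributes all of its src values.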

I do this correctly with soup.findAll('div', {"class": "pic_box"}) followed by temp.get('src'), but for some reason I get:

http://a8.ninemanga.com/es_manga/43/555/279006/4c58c372ca4561627e5a01f6c841290e.jpg

instead of:

https://c5.ninemanga.com/es_manga/43/555/279006/939559ac8d7af80cf6b4ead0ada4f718.jpg

Are they blocking my request, or am I doing something wrong?
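
As a first diagnostic step, it may help to see exactly which parts of the two URLs differ. The sketch below uses only the standard library; the helper name url_differences is made up for illustration. It does not explain why the server returns a different image, it only narrows down what changed.

from urllib.parse import urlsplit

# The two URLs quoted in the question.
got = "http://a8.ninemanga.com/es_manga/43/555/279006/4c58c372ca4561627e5a01f6c841290e.jpg"
expected = "https://c5.ninemanga.com/es_manga/43/555/279006/939559ac8d7af80cf6b4ead0ada4f718.jpg"

def url_differences(a, b):
    """Return {part: (value_in_a, value_in_b)} for each URL part that differs."""
    pa, pb = urlsplit(a), urlsplit(b)
    parts = {
        "scheme": (pa.scheme, pb.scheme),
        "host": (pa.netloc, pb.netloc),
        "directory": (pa.path.rsplit("/", 1)[0], pb.path.rsplit("/", 1)[0]),
        "filename": (pa.path.rsplit("/", 1)[1], pb.path.rsplit("/", 1)[1]),
    }
    # Keep only the parts whose two values are not equal.
    return {name: pair for name, pair in parts.items() if pair[0] != pair[1]}

for name, (a_val, b_val) in url_differences(got, expected).items():
    print(f"{name}: {a_val} != {b_val}")

For these two URLs the directory is identical, while the scheme, the host, and the file name differ.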

repl to test it

referenced link in theurl variable from which I want to extract 'src'

