
Published 2024-04-26 22:10:31



import urllib.request
import urllib.parse

url = "http://pitts.emory.edu/dia/image_details.cfm?ID=17250"
f = urllib.request.urlopen(url)


</div> <div class="col-sm-6"> <P> <b>Book Title:</b> <A HREF="book_detail.cfm?ID=2449">The Holy Bible containing the Old and New Testaments, according to the authorised version. With illustrations by Gustave Doré</a> </p> <P> <b>Author:</b> Doré, Gustave, 1832-1883 </p> <P> <b>Image Title:</b> Baptism of Jesus </p> <P> <b>Scripture Reference:</b><ul><li>John 1 (<a href='search.cfm?biblicalbook=John&biblicalbookchapter=1'>further images</a> / <a rel='shadowbox;height=500;width=600' href='http://www.commonenglishbible.com/explore/passage-lookup/?query=John+1'>scripture text</a>)</li></ul> </p> <P> <b>Description:</b> John the Baptist baptizes Jesus in the Jordan River; the Holy Spirit appears overhead in the form of a dove. The artist, Gustave Doré (1832-1883), has placed his signature at the lower left of the woodcut, and the engraver’s signature, A. Ligny, is located at the lower right. </P> <P> <A HREF="book_list.cfm?ID=2449">Click here </a> for additional images available from this book. </P> <p>For information on licensing this image, please send an email, including a link to the image, to <a href="mailto:dia@emory.edu?subject=Licensing%20Image%20From%20DIA - 17250">dia@emory.edu</a> </p> </div>






import urllib.request
import urllib.parse
from bs4 import BeautifulSoup

url = "http://pitts.emory.edu/dia/image_details.cfm?ID=17250"
f = urllib.request.urlopen(url)

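soup = BeautifulSoup(f, 'html.parser')
# Locate the <b>Description:</b> label, then its enclosing <p>
# (string= is the current name for the deprecated text= argument)
label = soup.find("b", string="Description:")
parent = label.parent
# Drop the label so only the description text remains, then print it
label.decompose()
print(parent.get_text(strip=True))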



import bs4

markup = """
<div class="col-sm-6">
    <P>
        <b>Book Title:</b>
        <A HREF="book_detail.cfm?ID=2449">The Holy Bible containing the Old and New Testaments, according to the authorised version. With illustrations by Gustave Doré</a>
    </p>
    <P>
        <b>Author:</b> Doré, Gustave, 1832-1883
    </p>
    <P>
        <b>Image Title:</b> Baptism of Jesus
    </p>
    <P>
        <b>Scripture Reference:</b><ul><li>John 1 (<a href='search.cfm?biblicalbook=John&biblicalbookchapter=1'>further images</a> / <a rel='shadowbox;height=500;width=600' href='http://www.commonenglishbible.com/explore/passage-lookup/?query=John+1'>scripture text</a>)</li></ul>
    </p>
    <P>
        <b>Description:</b> John the Baptist baptizes Jesus in the Jordan River; the Holy Spirit appears overhead in the form of a dove. The artist, Gustave Doré (1832-1883), has placed his signature at the lower left of the woodcut, and the engraver’s signature, A. Ligny, is located at the lower right.
    </P>
    <P>
        <A HREF="book_list.cfm?ID=2449">Click here
        </a> for additional images available from this book.
    </P>
    <p>For information on licensing this image, please send an email, including a link to the image, to 
        <a href="mailto:dia@emory.edu?subject=Licensing%20Image%20From%20DIA - 17250">dia@emory.edu</a>
    </p>
</div>
"""

soup = bs4.BeautifulSoup(markup, "html.parser")

# Select every <p> whose direct child <b> contains the label text.
# :-soup-contains() is the current spelling of the deprecated :contains().
for el in soup.select('p:has(> b:-soup-contains("Description:"))'):
    # strip() with no argument trims surrounding whitespace; strip('') did nothing
    print(el.get_text().strip().replace('Description: ', ''))


John the Baptist baptizes Jesus in the Jordan River; the Holy Spirit appears overhead in the form of a dove. The artist, Gustave Doré (1832-1883), has placed his signature at the lower left of the woodcut, and the engraver’s signature, A. Ligny, is located at the lower right. 


from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://pitts.emory.edu/dia/image_details.cfm?ID=17250")

soup = BeautifulSoup(html, 'html.parser')
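# Relies on the description being the fifth <p> on the page,
# so it breaks if the page layout changes
page = soup.find_all('p')[4].get_text()
print(page)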
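
For comparison, the same extraction can be sketched without BeautifulSoup, using only the standard library's `html.parser`. This is a minimal sketch, not one of the original answers: the names `LabelledTextParser` and `extract_labelled_text` are hypothetical, the sample markup is a shortened stand-in for the page above, and it assumes the label sits in a `<b>` tag inside the same `<p>` as the text it describes.

```python
from html.parser import HTMLParser


class LabelledTextParser(HTMLParser):
    """Collect the text that follows <b>label</b> up to the closing </p>."""

    def __init__(self, label):
        super().__init__(convert_charrefs=True)
        self.label = label
        self.in_bold = False     # currently inside a <b> element
        self.capturing = False   # label seen, collecting text until </p>
        self.done = False        # stop after the first matching paragraph
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <B> and <b> both arrive as "b"
        if tag == "b":
            self.in_bold = True

    def handle_endtag(self, tag):
        if tag == "b":
            self.in_bold = False
        elif tag == "p" and self.capturing:
            self.capturing = False
            self.done = True

    def handle_data(self, data):
        if self.done:
            return
        if self.in_bold and data.strip() == self.label:
            self.capturing = True
        elif self.capturing:
            self.chunks.append(data)

    def result(self):
        # Join the collected pieces and collapse runs of whitespace
        return " ".join("".join(self.chunks).split())


def extract_labelled_text(html, label="Description:"):
    parser = LabelledTextParser(label)
    parser.feed(html)
    parser.close()
    return parser.result()


# Shortened stand-in for the real page markup (an assumption for illustration)
sample = (
    "<div><P> <b>Image Title:</b> Baptism of Jesus </p> "
    "<P> <b>Description:</b> John the Baptist baptizes Jesus "
    "in the Jordan River. </P></div>"
)
description = extract_labelled_text(sample)
print(description)
```

Matching on the label rather than on a paragraph index keeps the lookup working if paragraphs are added or reordered, at the cost of depending on the exact label text.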

