使用Beaufifulsoup和请求从网站上抓取内容

2024-04-25 14:00:38 发布

您现在位置:Python中文网/ 问答频道 /正文

UPDATE=我的脚本提取以下文本,但我仍在努力获取所需的信息。你知道吗

[<button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-nmd-cs2-pk-ba7189.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-nmd-cs2-pk-w-ba7212.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-nmd-r2-pk-w-ba7560.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-ultraboost-x-bb0879.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/books-all-gone-book-2016.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/converse-ctas-modern-hi-156645c.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/converse-ctas-modern-hi-156646c.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/new-balance-m576-lifestyle-m576-pgw.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-jordan-13-retro-low-310810-407.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-jordan-4-retro-308497-117.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/puma-clyde-cny-fm-363637-01.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/puma-creeper-white-black-364462-01.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/puma-creeper-wrinkled-patent-364465-01.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/reebok-zoku-runner-ultk-is-bd5852.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/staple-fila-solid-pique-polo-1702p3795-blk.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/staple-fila-camo-poly-jkt-170203584-camo.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-eqt-support-adv-bb2791.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-eqt-support-adv-pk-ba7496.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-equipment-support-ultra-ba7474.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-nmd-r2-pk-bb2910.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/asics-gel-kayano-trainer-knit-h7s4n-4545.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-jordan-13-retro-414571-122.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-jordan-15-retro-881429-400.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-jordan-6-retro-384664-113.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-max-woven-boot-921854-002.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-max-woven-boot-921854-001.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-sock-racer-og-875837-001.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-nikelab-air-max-1-pinnacle-859554-400.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/puma-clyde-premium-core-362632-03.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/reebok-cl-lthr-golden-neutrals-bd3744.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/reebok-club-c-85-gum-bs7635.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10356/form_key/Ayxpa0t2JpTEfPBd/')" title="SHOP NOW" type="button"><span><span>SHOP NOW</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10346/form_key/Ayxpa0t2JpTEfPBd/')" title="SHOP NOW" type="button"><span><span>SHOP NOW</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10341/form_key/Ayxpa0t2JpTEfPBd/')" title="SHOP NOW" type="button"><span><span>SHOP NOW</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10336/form_key/Ayxpa0t2JpTEfPBd/')" title="SHOP NOW" type="button"><span><span>SHOP NOW</span></span></button>]

我目前正试图从这篇文章中提取“form\u key”信息。在本例中,form键是“Ayxpa0t2JpTEfPBd”,这是我要提取和打印的文本

你能告诉我如何提取和打印这些信息吗。提前谢谢!你知道吗


Tags: comhttptitlewwwitbuttonshopnow
2条回答

可以使用正则表达式提取form_key

In [1]: s = 'http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10356/form_key/Ayxpa0t2JpTEfPBd/'

In [2]: import re

In [3]: m = re.search('.*/form_key/([^/]+)/.*', s)

In [4]: m.group(1)
Out[4]: 'Ayxpa0t2JpTEfPBd'

因此,为了匹配您的示例,您可以执行以下操作:

import re

s = """onclick="setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10356/form_key/Ayxpa0t2JpTEfPBd/')"><span><span>SHOP NOW</span></span></button>"""
m = re.search('.*/form_key/([^/]+)/.*', s)

if m:
    print m.group(1)

这段代码搜索页面中的按钮,选择一个按钮,获取onclick属性,然后获取表单键。正则表达式是从罗伯特的回答中使用的,所以一定要向他表示感谢!你知道吗

import requests
from bs4 import BeautifulSoup
import re

url = "http://www.urbanjunglestore.com/"

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

req = requests.request("GET", url, headers=headers, verify=False)
response = BeautifulSoup(req.content,
                         "html.parser")

all_buttons = response.find_all("button", title="SHOP NOW") 

one_button = all_buttons[0]

onclick_attribute = one_button['onclick']  # this gets the text of the onclick attribute

def get_form_key_from_onclick_attr(attr_text):
    """ use a regex to extract the form key from the onclick attribute text """
    results = re.search('.*/form_key/([^/]+)/.*', attr_text)
    return results.group(1)

get_form_key_from_onclick_attr(onclick_attribute)

相关问题 更多 >