Getting the code from Inspect Element using Python

4 votes

1 answer

23428 views

数据工程师

Asked 2025-04-18 16:44

In Safari, I can right-click and choose "Inspect Element", and a lot of code appears. Is there a way to get this code with Python? Ideally, I would like to save it to a file.

More specifically, I want to find the image links on this page: http://500px.com/popular. I can see them through "Inspect Element", but I would like to extract them with Python.

Tags: data extraction, web scraping, code parsing, file saving, Safari, image link extraction, Inspect Element

1 Answer

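One way to get a web page's source code is to use the Beautiful Soup library. There is a tutorial you can look at. Below is the code from that page; the comments are mine. The code no longer works because the example site it uses has changed, but the concept should help you do what you want. Hope this helps.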

from bs4 import BeautifulSoup
# If Python2:
#from urllib2 import urlopen
# If Python3 (urllib2 has been split into urllib.request and urllib.error):
from urllib.request import urlopen

BASE_URL = "http://www.chicagoreader.com"

def get_category_links(section_url):
    # Download the raw HTML of the page into a variable called html. (Inspect
    # Element shows the page after scripts have run, so it may contain more.)
    html = urlopen(section_url).read()
    # Parse the HTML. The "lxml" parser requires the lxml package.
    soup = BeautifulSoup(html, "lxml")
    # The next two lines will change depending on what you're looking for. This
    # line is looking for <dl class="boccat">.
    boccat = soup.find("dl", "boccat")
    # This line organizes what is found in the above line into a list of
    # hrefs (i.e. links). find_all is the current name for findAll.
    category_links = [BASE_URL + dd.a["href"] for dd in boccat.find_all("dd")]
    return category_links
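
For the image links the question asks about, the same idea can be sketched roughly as follows. This is only an illustration under some assumptions: it uses the standard library's html.parser instead of Beautiful Soup so that it runs without extra packages, it works on a small made-up HTML string rather than the real 500px page, and the names ImageLinkParser, extract_image_links, save_lines and SAMPLE_HTML are hypothetical. If a page builds its images with JavaScript, the downloaded source may not contain the links that Inspect Element shows.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collect the src of every <img> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Turn relative paths such as /photos/1.jpg into full URLs.
            self.links.append(urljoin(self.base_url, src))


def extract_image_links(html, base_url):
    """Return the image URLs found in an HTML string, in document order."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def save_lines(lines, path):
    """Write one entry per line to a UTF-8 text file."""
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <img src="/photos/1.jpg" alt="first">
  <img src="https://cdn.example.com/photos/2.jpg">
  <img alt="no source attribute">
</body></html>
"""

links = extract_image_links(SAMPLE_HTML, "http://example.com/popular")
```

To use it on a real page, the string returned by urlopen(url).read().decode("utf-8") could be passed in place of SAMPLE_HTML, and save_lines(links, "links.txt") would write the result to a file.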

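Edit 1: The solution above gives a general approach to web scraping, but I agree with the comments on the question. For this site, using the API is definitely the better choice. Thanks to yuvi for pointing this out. The API can be found at https://github.com/500px/PxMagic.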

Edit 2: For getting links to the popular photos, there is an example. Below is the Python code from that example. You need to install the API library first.

import fhp.api.five_hundred_px as f
import fhp.helpers.authentication as authentication
from pprint import pprint
key = authentication.get_consumer_key()
secret = authentication.get_consumer_secret()

client = f.FiveHundredPx(key, secret)
results = client.get_photos(feature='popular')

i = 0
PHOTOS_NEEDED = 2
for photo in results:
    pprint(photo)
    i += 1
    if i == PHOTOS_NEEDED:
        break
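
As a side note, the counter-and-break loop above can also be written with itertools.islice. The sketch below is only an illustration: fake_results is a made-up generator standing in for the API results (it assumes only that the results can be iterated over), and take_first is a hypothetical helper name.

```python
from itertools import islice


def take_first(items, n):
    """Return at most n items from any iterable, stopping after n items."""
    return list(islice(items, n))


def fake_results():
    # Hypothetical stand-in for the API results: a stream of photo records.
    for photo_id in range(1, 6):
        yield {"id": photo_id, "name": f"photo-{photo_id}"}


PHOTOS_NEEDED = 2
first_photos = take_first(fake_results(), PHOTOS_NEEDED)
```

One small difference from the manual loop: with n set to 0 this returns an empty list, whereas a loop that checks the counter only after printing would never hit the break.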

Answered 2025-04-18 by Python大师
