Displaying scraped results in a Django template

2 votes

3 answers

2660 views

Asked 2025-04-15 23:33

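I'm building a scraping site with Django. For some reason the code below renders only one image, whereas I want it to output every image, every link, and every price. Can anyone help? (Also, if you know how to put this data into a database model so that I don't have to scrape the site every time, I'd welcome the suggestion, though that is probably a separate question.) Thanks!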

Here is the template:

{% extends "base.html" %}

{% block title %}Boats{% endblock %}

{% block content %}

<img src="{{ fetch_boats }}"/>

{% endblock %}

Here is views.py:

#views.py
from django.shortcuts import render_to_response
from django.template.loader import get_template
from django.template import Context
from django.http import Http404, HttpResponse
from fetch_images import fetch_imagery

def fetch_it(request):
    fi = fetch_imagery()
    return render_to_response('fetch_image.html', {'fetch_boats' : fi})

Here is the fetch_images module:

#fetch_images.py
from BeautifulSoup import BeautifulSoup
import re
import urllib2

def fetch_imagery():
    response = urllib2.urlopen("http://www.boattrader.com/search-results/Type")
    html = response.read()

#create a beautiful soup object
    soup = BeautifulSoup(html)

#all boat images have attribute height=165
    images = soup.findAll("img",height="165")
    for image in images:
        return image['src'] #print th url of the image only

# all links to detailed boat information have class lfloat
    links = soup.findAll("a", {"class" : "lfloat"})
    for link in links:
        return link['href']
        #print link.string

# all prices are spans and have the class rfloat
    prices = soup.findAll("span", { "class" : "rfloat" })
    for price in prices:
        return price
        #print price.string

Finally, in case it's needed, here is the URL mapping in the urlconf:

from django.conf.urls.defaults import *
from mysite.views import fetch_it

urlpatterns = patterns('', ('^fetch_image/$', fetch_it))

django web scraping url routing template rendering database models image fetching data display

3 Answers

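I searched online for a long time for an example of displaying scraped data, and this post helped me a lot. The modules involved have changed slightly since the question was first posted, so I wanted to update it and share the code I needed.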

It's a nice example because it shows how to run some Python code at request time and produce simple content without involving a database or model classes.

Assuming you have a Django project to which you can add these changes, you should be able to visit <your-base-url>/fetch_boats and see a bunch of boat pictures.

views.py

import urllib.request

from bs4 import BeautifulSoup
from django.shortcuts import render


def fetch_boats(request):
    # A generator is fine here because the template loops over it only once.
    fi = fetch_imagery()
    return render(request, "fetch_boats.html", {"boat_images": fi})


def fetch_imagery():
    response = urllib.request.urlopen("http://www.boattrader.com")
    html = response.read()
    soup = BeautifulSoup(html, features="html.parser")
    # src=True skips <img> tags that have no src attribute, which would
    # otherwise raise KeyError in the loop below.
    images = soup.find_all("img", src=True)

    for image in images:
        yield image["src"]

urls.py

from django.urls import path
from .views import fetch_boats

urlpatterns = [
    path('fetch_boats', fetch_boats, name='fetch_boats'),
]

templates/fetch_boats.html

{% extends 'base.html' %}
{% block title %} ~~~&lt; Boats &gt;~~~ {% endblock title %}
{% block content %}

    {% for image in boat_images %}
        <br /><br />
        <img src="{{ image }}" />
    {% endfor %}

{% endblock content %}

Answered 2025-04-15 by Python大师


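This is slightly off-topic, but scraping consumes a lot of CPU time, memory, and bandwidth, so I think it should run asynchronously in the background, where it doesn't block other work.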

It's a really nice idea, though :)
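
As a rough illustration of what "in the background" could look like, here is a minimal sketch that serves whatever was scraped last and refreshes it in a separate thread once it goes stale. ScrapeCache, fake_fetch, and the 600-second default are all made up for this example. A thread inside the web process is only the simplest thing to show; a scheduled job or a task queue is the more usual setup.

```python
import threading
import time


class ScrapeCache:
    """Holds the latest scrape result and refreshes it off the request path."""

    def __init__(self, fetch, max_age_seconds=600):
        self._fetch = fetch  # callable that performs the slow scrape
        self._max_age = max_age_seconds
        self._lock = threading.Lock()
        self._data = []
        self._fetched_at = None
        self._refreshing = False

    def _refresh(self):
        try:
            data = list(self._fetch())
            with self._lock:
                self._data = data
                self._fetched_at = time.monotonic()
        finally:
            # Always clear the flag so a failed scrape can be retried later.
            with self._lock:
                self._refreshing = False

    def get(self):
        """Return cached data immediately; start a refresh if it is stale."""
        with self._lock:
            stale = (
                self._fetched_at is None
                or time.monotonic() - self._fetched_at > self._max_age
            )
            start = stale and not self._refreshing
            if start:
                self._refreshing = True
            data = list(self._data)
        if start:
            threading.Thread(target=self._refresh, daemon=True).start()
        return data


def fake_fetch():
    # Stand-in for the real scraper (hypothetical data).
    return ["/img/a.jpg", "/img/b.jpg"]


cache = ScrapeCache(fake_fetch)
first = cache.get()  # empty list: the first refresh has only just been started
```

A view would call cache.get() instead of scraping directly, so a request never waits on the remote site; the trade-off is that the very first visitors see an empty page until the first refresh finishes.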

Answered 2025-04-15 by Python大师


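Your fetch_imagery function needs some work: because you use return rather than yield, the function ends as soon as it reaches the first return image['src'] (I'm assuming all of your returns are in the same function, as your code shows).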
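
To make the difference concrete, here is a minimal sketch that needs no network access. The function names and the sample URLs are made up for illustration; only the return/yield behaviour is the point.

```python
def first_only(urls):
    # Mirrors the loop in the question: return exits on the first pass.
    for url in urls:
        return url


def all_as_list(urls):
    # Collect every item, then return once after the loop has finished.
    results = []
    for url in urls:
        results.append(url)
    return results


def all_lazily(urls):
    # yield turns the function into a generator that produces every item.
    for url in urls:
        yield url


sample = ["/img/a.jpg", "/img/b.jpg", "/img/c.jpg"]  # hypothetical URLs
print(first_only(sample))        # /img/a.jpg
print(all_as_list(sample))       # ['/img/a.jpg', '/img/b.jpg', '/img/c.jpg']
print(list(all_lazily(sample)))  # ['/img/a.jpg', '/img/b.jpg', '/img/c.jpg']
```

Note that a generator can be consumed only once, so if the template needs to loop over the results twice, the list version is the safer choice.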

Also, I'm guessing you want fetch_imagery to return a list or tuple (or to be a generator function), in which case your template should look like this:

{% block content %}
    {% for image in fetch_boats %}
        <img src="{{ image }}" />
    {% endfor %}
{% endblock %}

This code simply iterates over every item in your list (image URLs, in your case) and creates an img tag for each one.
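
Since you also want the links and prices, one option is to return a single list of records rather than three separate values. This is a minimal sketch that assumes the three lists line up one-to-one, which the page may not guarantee; build_listings and the sample values are hypothetical.

```python
def build_listings(image_urls, links, prices):
    # zip pairs items by position and stops at the shortest list, so a
    # missing price silently drops the trailing listings.
    return [
        {"image": image, "link": link, "price": price}
        for image, link, price in zip(image_urls, links, prices)
    ]


listings = build_listings(
    ["/img/a.jpg", "/img/b.jpg"],
    ["/boat/1", "/boat/2"],
    ["$10,000", "$25,000"],
)
for listing in listings:
    print(listing["image"], listing["link"], listing["price"])
```

In the template, dictionary keys are reachable with dot lookup, so inside a for loop over these records you could write {{ boat.image }}, {{ boat.link }}, and {{ boat.price }}.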

Answered 2025-04-15 by Python大师

