Scraping multiple URLs on Kickstarter with BeautifulSoup

Published 2024-05-16 13:05:13


I am trying to get the number of backers for each reward tier of every project on the Kickstarter website. My input is a list of URLs.

I have run into several problems:

  1. I cannot load the URLs from a txt file; my code only works when the URLs are hard-coded in a Python list (see the sketch after this list).

    An example of the data I am trying to get is shown below (in fact, this is a snippet of the data I scraped from the two links in the code below):

    ..... ..... , Pledge US$ 500 or more About €325 , Pledge CA$ 2,500 or more About €1,625 , 5 backers , 2 backers , ........ .......

  2. For each project I need to write the results shown above into a single row of a CSV file. So the first cell of the row would be the project link (or the title, if it can be scraped with BeautifulSoup); the second column should be a series of values made up of the number of backers for each pledge tier (a sketch that writes such rows follows the code at the end of this post). An example using the data above:

"project link" , "pledge__backer-count" 5 backers , "pledge__amount" $500

I am struggling with the part of the code that loops over the URL list. The first part was copied from an example online and works well. Thanks in advance for your help; I really need this for my thesis.

from requests import get
from requests.exceptions import RequestException
from contextlib import closing
from bs4 import BeautifulSoup
import re

def simple_get(url):
    """
    Attempts to get the content at `url` by making an HTTP GET request.
    If the content-type of response is some kind of HTML/XML, return the
    text content, otherwise return None
    """
    try:
        with closing(get(url, stream=True)) as resp:
            if is_good_response(resp):
                return resp.content
            else:
                return None

    except RequestException as e:
        log_error('Error during requests to {0} : {1}'.format(url, str(e)))
        return None

def is_good_response(resp):
    """
    Returns true if the response seems to be HTML, false otherwise
    """
    content_type = resp.headers['Content-Type'].lower()
    return (resp.status_code == 200 
            and content_type is not None 
            and content_type.find('html') > -1)

def log_error(e):
    """
    It is always a good idea to log errors. 
    This function just prints them, but you can
    make it do anything.
    """
    print(e)



urls = [
    'https://www.kickstarter.com/projects/socialismmovie/socialism-an-american-story?ref=home_potd',
    'https://www.kickstarter.com/projects/1653847368/the-cuban-a-film-about-the-power-of-music-over-alz?ref=home_new_and_noteworthy'
]


for url in urls:
    # Download the page; simple_get() returns None on failure
    Project_raw = simple_get(url)
    if Project_raw is None:
        continue

    Project_bs4 = BeautifulSoup(Project_raw, 'lxml')

    # Pledge amounts (e.g. "Pledge US$ 500 or more") and backer counts (e.g. "5 backers")
    Backers_offers = Project_bs4.find_all("h2", class_="pledge__amount")
    Backers_per_offer = Project_bs4.find_all("span", class_="pledge__backer-count")
    Offers_plus_Backers = Backers_offers + Backers_per_offer

    print(Offers_plus_Backers)
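
For problem 2, a minimal sketch of how this loop could be extended to write one CSV row per project, pairing each pledge amount with its backer count; the filename backers.csv and the zip-based pairing are assumptions rather than anything already in the code:

import csv

# Sketch (assumed approach): one CSV row per project ->
# project URL, then "pledge amount backer-count" pairs
with open('backers.csv', 'w', newline='', encoding='utf-8') as f:  # 'backers.csv' is a hypothetical filename
    writer = csv.writer(f)
    for url in urls:
        Project_raw = simple_get(url)
        if Project_raw is None:
            continue
        Project_bs4 = BeautifulSoup(Project_raw, 'lxml')
        amounts = [h2.get_text(strip=True) for h2 in Project_bs4.find_all("h2", class_="pledge__amount")]
        counts = [s.get_text(strip=True) for s in Project_bs4.find_all("span", class_="pledge__backer-count")]
        row = [url]
        # zip pairs the i-th amount with the i-th count and truncates to the shorter list
        for amount, count in zip(amounts, counts):
            row.append('{0} {1}'.format(amount, count))
        writer.writerow(row)

Note that this assumes Kickstarter renders the pledge amounts and the backer counts in the same order; if a tier has no backer-count span, the pairing can drift, so the two lists may need to be matched per pledge card instead.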
