Getting prices from Craigslist with BeautifulSoup

2 votes

1 answer

1135 views

Asked 2025-04-17 19:54

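I've only just started learning Python (a few days in), mostly by reading other people's code, especially on StackOverflow. The code I'm trying to write uses the BeautifulSoup library to get the pid and the corresponding price for motorcycles on Craigslist. I know there are plenty of other ways to do this, but my current code looks like this: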

from bs4 import BeautifulSoup
from urllib2 import urlopen
u = ""
count = 0
while count < 9:
    site = "http://sfbay.craigslist.org/mca/" + str(u)
    html = urlopen(site)
    soup = BeautifulSoup(html)
    postings = soup('p', {"class":"row"})
    f = open("pid.txt", "a")
    for post in postings:
        x = post.getText()
        y = post['data-pid']
        prices = post.findAll("span", {"class":"itempp"})
        if prices == "":
            w = 0
        else:
            z = str(prices)
            z = z[:-8]
            w = z[24:]
        filewrite = str(count) + " " + str(y) + " " + str(w) + '\n'
        print y
        print w
        f.write(filewrite)
    count = count + 1
    index = 100 * count
    print "index is" + str(index)
    u = "index" + str(index) + ".html"

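This code runs fine, and I plan to optimize it as I learn more. The problem I have right now is that entries with an empty price still show up. Am I missing something obvious? Thanks.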

Tags: code optimization, data extraction, web scraping, learning to program, beautifulsoup, conditional filtering, Craigslist

1 Answer

The problem is in how you compare prices. You wrote:

prices = post.findAll("span", {"class":"itempp"})

In BS, .findAll returns a list of elements. When you compare that list to an empty string, the result is always False, so the branch that sets w = 0 never runs:

>>> [] == ""
False
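
To see the same thing with an actual BeautifulSoup result, here is a small sketch. It assumes Python 3 and uses find_all (the current spelling of findAll). The HTML snippet is made up to resemble a listing without a price; it is not copied from Craigslist.

```python
from bs4 import BeautifulSoup

# Hypothetical listing markup with no price span (not real Craigslist HTML).
html = '<p class="row" data-pid="111"><a href="#">Old scooter</a></p>'
soup = BeautifulSoup(html, "html.parser")
post = soup.find("p", {"class": "row"})

prices = post.find_all("span", {"class": "itempp"})

print(isinstance(prices, list))  # find_all returns a list subclass (ResultSet)
print(len(prices))               # no matching span, so the result is empty
print(prices == "")              # a list is never equal to a string
print(prices == [])              # an empty result compares equal to []
print(not prices)                # an empty list is falsy
```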

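Change if prices == "": to if prices == []: and it will work. A more idiomatic way to write the same check is if not prices:, since an empty list is falsy.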
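
Putting the fix into your loop body might look roughly like the sketch below. It assumes Python 3; the extract_price helper and the sample HTML are illustrative names and data of my own, modeled on the selectors in your question. It also reads the price with get_text() instead of slicing str(prices), which is a swap from your original approach rather than a required part of the fix.

```python
from bs4 import BeautifulSoup

# Made-up markup modeled on the selectors in the question; real pages may differ.
SAMPLE_HTML = """
<p class="row" data-pid="111"><span class="itempp">$3500</span></p>
<p class="row" data-pid="222"></p>
"""


def extract_price(post):
    """Return the price text for one posting, or 0 when no price is listed."""
    prices = post.find_all("span", {"class": "itempp"})
    if not prices:  # empty result: this listing has no price span
        return 0
    return prices[0].get_text(strip=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = [(post["data-pid"], extract_price(post))
        for post in soup.find_all("p", {"class": "row"})]
print(rows)
```

Returning 0 for a missing price mirrors the w = 0 in your code; if you would rather skip such listings entirely, filter them out before writing to the file.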

Hope this helps.

Answered 2025-04-17 by Python大师
