Building a web spider in Python with BeautifulSoup4 and urllib2

Posted 2024-04-26 22:09:02

Here is the code I have so far.

import urlparse
import urllib2
from bs4 import BeautifulSoup  # was missing; BeautifulSoup is used below

# Craigslist city subdomains to search
c_s_l = ["anchorage", "fairbanks", "kenai", "juneau"]

for city in c_s_l:
    # Debug: show the search URL built for this city
    # print "https://%s.craigslist.org/search/cta?query=sprinter" % city
    base = "https://%s.craigslist.org" % city  # substitute the city into the site URL
    url = base + "/search/cta?query=sprinter"
    response = urllib2.urlopen(url)
    html = response.read()
    soup = BeautifulSoup(html, 'html.parser')
    # Print every result-title link on the search results page
    for a in soup.find_all('a', class_='result-title hdrlnk'):
        print a

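Eventually I want to scale this up to all of the Craigslist sites. For now, though, I'm trying to work out how to get rid of the results I don't want, so that only the Sprinter listings are left. Thanks for any suggestions and help.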
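
One possible way to drop unwanted results is to test each link's title text before printing it. The following is only a minimal sketch, not a tested Craigslist solution: `keep_listing`, the `excluded` keyword "parts", and the sample titles and hrefs are all hypothetical, made up for illustration. In the loop above, the title would come from `a.get_text()` and the link from `a.get('href')`.

```python
from __future__ import print_function  # lets print() work on Python 2 as well as 3


def keep_listing(title, required=("sprinter",), excluded=()):
    """Return True if title has every required word and no excluded word."""
    text = title.lower()
    has_required = all(word.lower() in text for word in required)
    has_excluded = any(word.lower() in text for word in excluded)
    return has_required and not has_excluded


# Hypothetical (title, href) pairs standing in for a.get_text() and a.get('href')
sample = [
    ("2015 Mercedes Sprinter 2500 cargo van", "/cto/d/one.html"),
    ("Sprinter roof rack, parts only", "/cto/d/two.html"),
    ("Ford Transit 250", "/cto/d/three.html"),
]

# Keep titles that mention "sprinter" but not the (assumed) unwanted word "parts"
wanted = [(t, h) for t, h in sample if keep_listing(t, excluded=("parts",))]
for title, href in wanted:
    print(title, href)
```

Matching is by case-insensitive substring, so the lists of required and excluded words would need tuning against real listing titles.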

