Getting a text file with temperature values

import urllib2
from bs4 import BeautifulSoup

#CSV
f = open('wunder-data.txt', 'w')

#months, days
for m in range(1, 13):
    for d in range(1, 32):

        #get if already gone through month
        if (m == 2 and d > 28):
            break
        elif (m in [4, 6, 9, 11] and d > 30):
            break

        #open wunderground.com url
        timestamp = '2009' + str(m) + str(d)
        print "Getting data for " + timestamp
        url = "http://www.wunderground.com/history/airport/KBUF/2009/" + str(m) + "/" + str(d) + "/DailyHistory.html"
        page = urllib2.urlopen(url)

    #get temp from page
    soup = BeautifulSoup(page)
    #dayTemp = soup.body.nobr.b.string
    dayTemp = soup.findAll(attrs={"class":"nobr"})[4].span.string

    #Format month for timestamp
    if len(str(m)) < 2:
        mStamp = '0' + str(m)
    else:
        mStamp = str(m)

    #Format day for timestamp
    if len(str(d)) < 2:
        dStamp = '0' + str(d)
    else:
        dStamp = str(d)

    #Build timestamp
    timestamp = '2009' + mStamp + dStamp

    #Write timestamp and temperature to file
    f.write(timestamp + ',' + dayTemp + '\n')

# Done getting data! Close file.
f.close()

1 answer

User

#1 · Posted 2024-04-20 00:14:22

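There is a problem with the code's indentation. The lines from #get if already.. through page = urllib2.urlopen(url) are indented one level deeper than the rest, so they are the only part that runs inside the inner loop. Parsing the page content and writing to the file happen in the outer loop. That is why you only scrape the last day of each month (and because the inner loop is defined to run up to the 31st for every month, only the break checks keep it from requesting invalid dates).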
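To see the effect in isolation, here is a minimal sketch in Python 3 syntax that needs no network access. The function fetch is a hypothetical stand-in for urllib2.urlopen, and misindented and fixed are names made up for this illustration; the only difference between them is the indentation of one line.

```python
import calendar

def fetch(month, day):
    # Hypothetical stand-in for urllib2.urlopen(url): returns a label only.
    return "page for 2009-%02d-%02d" % (month, day)

def misindented():
    processed = []
    for m in range(1, 13):
        days_in_month = calendar.monthrange(2009, m)[1]
        for d in range(1, days_in_month + 1):
            page = fetch(m, d)      # inner loop: runs once per day
        processed.append(page)      # outer loop: runs once per month
    return processed

def fixed():
    processed = []
    for m in range(1, 13):
        days_in_month = calendar.monthrange(2009, m)[1]
        for d in range(1, days_in_month + 1):
            page = fetch(m, d)
            processed.append(page)  # inner loop: runs once per day
    return processed

print(len(misindented()), misindented()[:2])
# 12 ['page for 2009-01-31', 'page for 2009-02-28']
print(len(fixed()))
# 365
```

In the first version only the last page fetched in each month survives until the processing step, which matches the symptom in the question.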

You can use datetime to iterate properly over the days of the year, i.e.:

import datetime
import urllib2

d = datetime.datetime(2009, 1, 1)
end_date = datetime.datetime(2010, 1, 1)
delta = datetime.timedelta(days=1)
while d < end_date:
    print "Getting data for " + d.strftime("%Y-%m-%d")
    # The URL path is /year/month/day/, so month comes before day
    url = "http://www.wunderground.com/history/airport/KBUF/2009/%d/%d/DailyHistory.html" % (d.month, d.day)
    page = urllib2.urlopen(url)

    #process web content and write to file

    d += delta

# Done getting data! Close file.
# (f is the output file opened in the question's code)
f.close()
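As a rough sketch of how the date handling could look in Python 3, the generator below yields a zero-padded timestamp and a URL for every day of a year. Using strftime("%Y%m%d") would replace the manual mStamp/dStamp padding from the question. The names daily_requests and BASE_URL are introduced here for illustration only, the URL pattern is copied from the question and not checked against the live site, and fetching and parsing are deliberately left out.

```python
import datetime

# URL prefix taken from the question; assumed, not verified against the site.
BASE_URL = "http://www.wunderground.com/history/airport/KBUF"

def daily_requests(year):
    """Yield (timestamp, url) for every calendar day of the given year."""
    d = datetime.date(year, 1, 1)
    end_date = datetime.date(year + 1, 1, 1)
    delta = datetime.timedelta(days=1)
    while d < end_date:
        # strftime zero-pads month and day, e.g. 20090105
        timestamp = d.strftime("%Y%m%d")
        # Path order is /year/month/day/
        url = "%s/%d/%d/%d/DailyHistory.html" % (BASE_URL, d.year, d.month, d.day)
        yield timestamp, url
        d += delta

requests_2009 = list(daily_requests(2009))
print(len(requests_2009))
# 365
print(requests_2009[0])
# ('20090101', 'http://www.wunderground.com/history/airport/KBUF/2009/1/1/DailyHistory.html')
```

Each pair can then be passed to whatever fetch-and-parse step you use, with the file write kept inside the same loop body so it runs once per day.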
