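How to extract data from a Met Office JSON download using Python
I am using Python 3.4.
I have started a project to download the UK Met Office weather forecast data (in JSON format) and use it as a weather compensator for my home heating system. I have successfully downloaded the JSON data file from the Met Office, and now I want to extract the information I need. I can do this by converting the file to a string and using .find and int to pull out the data, but that feels crude (although it works). Since JSON is supposed to be a widely used data interchange format, there must be a better way to handle this. I have come across things like json.load and json.loads, as well as json.JSONDecoder.decode, but I have had no success with them, and in truth I do not really know what I am doing!
My code is: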
import urllib.request
import json
#Comment: THIS IS THE CALL TO GET THE MET OFFICE FILE FROM THE INTERNET
#Comment: **** = my personal met office API key, which I had better keep to myself
response = urllib.request.urlopen('http://datapoint.metoffice.gov.uk/public/data/val/wxfcs/all/json/354037?res=3hourly&key=****')
FCData = response.read()
FCDataStr = str(FCData)
#Comment: END OF THE CALL TO GET MET OFFICE FILE FROM THE INTERNET
#Comment: Example of data extraction
ChPos = FCDataStr.find('"DV"') #Find "DV"
ChPos = FCDataStr.find('"dataDate"', ChPos, ChPos+50) #Find "dataDate"
FileDataDate = FCDataStr[ChPos+12:ChPos+22] #Extract the date of the file
#Comment: And so on
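When I use json.loads(FCDataStr), I get the following error message:
"ValueError: Expecting value: line 1 column 1 (char 0)"
If I remove the leading b' and the trailing ', the error goes away (see below). Printing the string form of the JSON file with print(FCDataStr) gives: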
b'{"SiteRep":{"Wx":{"Param":[{"name":"F","units":"C","$":"Feels Like Temperature"},{"name":"G","units":"mph","$":"Wind Gust"},{"name":"H","units":"%","$":"Screen Relative Humidity"},{"name":"T","units":"C","$":"Temperature"},{"name":"V","units":"","$":"Visibility"},{"name":"D","units":"compass","$":"Wind Direction"},{"name":"S","units":"mph","$":"Wind Speed"},{"name":"U","units":"","$":"Max UV Index"},{"name":"W","units":"","$":"Weather Type"},{"name":"Pp","units":"%","$":"Precipitation Probability"}]},"DV":{"dataDate":"2014-07-29T20:00:00Z","type":"Forecast","Location":{"i":"354037","lat":"51.7049","lon":"-2.9022","name":"USK","country":"WALES","continent":"EUROPE","elevation":"43.0","Period":[{"type":"Day","value":"2014-07-29Z","Rep":[{"D":"NNW","F":"22","G":"11","H":"51","Pp":"4","S":"9","T":"24","V":"VG","W":"7","U":"7","$":"900"},{"D":"NW","F":"19","G":"16","H":"61","Pp":"8","S":"11","T":"22","V":"EX","W":"8","U":"1","$":"1080"},{"D":"NW","F":"16","G":"20","H":"70","Pp":"1","S":"11","T":"18","V":"VG","W":"2","U":"0","$":"1260"}]},{"type":"Day","value":"2014-07-30Z","Rep":[{"D":"NW","F":"13","G":"16","H":"84","Pp":"0","S":"7","T":"14","V":"VG","W":"0","U":"0","$":"0"},{"D":"WNW","F":"12","G":"13","H":"90","Pp":"0","S":"7","T":"13","V":"VG","W":"0","U":"0","$":"180"},{"D":"WNW","F":"13","G":"11","H":"87","Pp":"0","S":"7","T":"14","V":"GO","W":"1","U":"1","$":"360"},{"D":"SW","F":"18","G":"9","H":"67","Pp":"0","S":"4","T":"19","V":"VG","W":"1","U":"2","$":"540"},{"D":"WNW","F":"21","G":"13","H":"56","Pp":"0","S":"9","T":"22","V":"VG","W":"3","U":"6","$":"720"},{"D":"W","F":"21","G":"20","H":"55","Pp":"0","S":"11","T":"23","V":"VG","W":"3","U":"6","$":"900"},{"D":"W","F":"18","G":"22","H":"57","Pp":"0","S":"11","T":"21","V":"VG","W":"1","U":"2","$":"1080"},{"D":"WSW","F":"16","G":"13","H":"80","Pp":"0","S":"7","T":"16","V":"VG","W":"0","U":"0","$":"1260"}]},{"type":"Day","value":"2014-07-31Z","Rep":[{"D":"SW","F":"14","G":"11","H":"91","Pp":"0","S":"4","T":"15","V"
:"GO","W":"0","U":"0","$":"0"},{"D":"SW","F":"14","G":"11","H":"92","Pp":"0","S":"4","T":"14","V":"GO","W":"0","U":"0","$":"180"},{"D":"SW","F":"15","G":"11","H":"89","Pp":"3","S":"7","T":"16","V":"GO","W":"3","U":"1","$":"360"},{"D":"WSW","F":"17","G":"20","H":"79","Pp":"28","S":"11","T":"18","V":"GO","W":"3","U":"2","$":"540"},{"D":"WSW","F":"18","G":"22","H":"72","Pp":"34","S":"11","T":"20","V":"GO","W":"10","U":"5","$":"720"},{"D":"WSW","F":"18","G":"22","H":"66","Pp":"13","S":"11","T":"20","V":"VG","W":"7","U":"5","$":"900"},{"D":"WSW","F":"17","G":"22","H":"69","Pp":"36","S":"11","T":"19","V":"VG","W":"10","U":"2","$":"1080"},{"D":"WSW","F":"16","G":"16","H":"84","Pp":"6","S":"9","T":"17","V":"GO","W":"2","U":"0","$":"1260"}]},{"type":"Day","value":"2014-08-01Z","Rep":[{"D":"SW","F":"16","G":"13","H":"91","Pp":"4","S":"7","T":"16","V":"GO","W":"7","U":"0","$":"0"},{"D":"SW","F":"15","G":"11","H":"93","Pp":"5","S":"7","T":"16","V":"GO","W":"7","U":"0","$":"180"},{"D":"SSW","F":"15","G":"11","H":"93","Pp":"7","S":"7","T":"16","V":"GO","W":"7","U":"1","$":"360"},{"D":"SSW","F":"17","G":"18","H":"79","Pp":"14","S":"9","T":"18","V":"GO","W":"7","U":"2","$":"540"},{"D":"SSW","F":"17","G":"22","H":"74","Pp":"43","S":"11","T":"19","V":"GO","W":"10","U":"5","$":"720"},{"D":"SW","F":"16","G":"22","H":"81","Pp":"48","S":"11","T":"18","V":"GO","W":"10","U":"5","$":"900"},{"D":"SW","F":"16","G":"18","H":"80","Pp":"55","S":"9","T":"17","V":"GO","W":"12","U":"1","$":"1080"},{"D":"SSW","F":"15","G":"16","H":"89","Pp":"38","S":"7","T":"16","V":"GO","W":"9","U":"0","$":"1260"}]},{"type":"Day","value":"2014-08-02Z","Rep":[{"D":"S","F":"14","G":"11","H":"94","Pp":"15","S":"7","T":"15","V":"GO","W":"7","U":"0","$":"0"},{"D":"SSE","F":"14","G":"11","H":"94","Pp":"16","S":"7","T":"15","V":"GO","W":"7","U":"0","$":"180"},{"D":"S","F":"14","G":"13","H":"93","Pp":"36","S":"7","T":"15","V":"GO","W":"10","U":"1","$":"360"},{"D":"S","F":"15","G":"20","H":"84","Pp":"62","S":"11","T":"17","
V":"GO","W":"14","U":"2","$":"540"},{"D":"SSW","F":"16","G":"22","H":"78","Pp":"63","S":"11","T":"18","V":"GO","W":"14","U":"5","$":"720"},{"D":"WSW","F":"16","G":"27","H":"66","Pp":"59","S":"13","T":"19","V":"VG","W":"14","U":"5","$":"900"},{"D":"WSW","F":"15","G":"25","H":"68","Pp":"39","S":"13","T":"18","V":"VG","W":"10","U":"2","$":"1080"},{"D":"SW","F":"14","G":"16","H":"80","Pp":"28","S":"9","T":"15","V":"VG","W":"0","U":"0","$":"1260"}]}]}}}}'
Using:
DecodedJSON = json.loads(FCDataStr)
print(DecodedJSON)
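gives a result that looks very similar to the original FCDataStr.
How do I go on to extract data from the file (for example the temperature, wind speed and so on for each three-hourly forecast)?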
3 Answers
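I have been parsing the Met Office data output.
Thanks to the replies above, I now have a solution that works for me.
I am writing the data I am interested in to a CSV file: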
import sys
import os
import urllib.request
import json
### THIS IS THE CALL TO GET THE MET OFFICE FILE FROM THE INTERNET
response = urllib.request.urlopen('http://datapoint.metoffice.gov.uk/public/data/val/wxobs/all/json/3351?res=hourly&key=<my key>')
FCData = response.read()
FCDataStr = FCData.decode('utf-8')
### END OF THE CALL TO GET THE MET OFFICE FILE FROM THE INTERNET
# Converts JSON data to a dictionary object
FCData_Dic = json.loads(FCDataStr)
# Open output file for appending
fName = '<my filename>'
if not os.path.exists(fName):
    print(fName, 'does not exist')
    sys.exit()
fOut = open(fName, 'a')
# Loop through each day, will nearly always be 2 days,
# unless run at midnight.
i = 0  # index of the current day (Period)
j = 0  # index of the current hourly report (Rep) within that day
for k in range(24):
    # there will be 24 values altogether
    # find the first hour value for the first day
    DateZ = FCData_Dic['SiteRep']['DV']['Location']['Period'][i]['value']
    hhmm = FCData_Dic['SiteRep']['DV']['Location']['Period'][i]['Rep'][j]['$']
    Temperature = FCData_Dic['SiteRep']['DV']['Location']['Period'][i]['Rep'][j]['T']
    Humidity = FCData_Dic['SiteRep']['DV']['Location']['Period'][i]['Rep'][j]['H']
    DewPoint = FCData_Dic['SiteRep']['DV']['Location']['Period'][i]['Rep'][j]['Dp']
    recordStr = '{},{},{},{},{}\n'.format(DateZ, hhmm, Temperature, Humidity, DewPoint)
    fOut.write(recordStr)
    j = j + 1
    if hhmm == '1380':
        # '1380' minutes after midnight is 23:00, the last report of the day,
        # so move on to the first report of the next day
        i = i + 1
        j = 0
fOut.close()
print('Records added to', fName)
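As a possible variation on the script above, the same export can be sketched by iterating over the Period and Rep lists directly instead of tracking i and j by hand. The sample data, the observation_rows helper and the in-memory buffer are assumptions made for illustration; a real script would load the downloaded response and open a file.

```python
import csv
import io
import json

# Small hand-made sample in the same shape as the Met Office response (assumption).
SAMPLE = '''{"SiteRep": {"DV": {"Location": {"Period": [
  {"type": "Day", "value": "2014-07-29Z", "Rep": [
    {"T": "24", "H": "51", "Dp": "13", "$": "1320"},
    {"T": "22", "H": "61", "Dp": "14", "$": "1380"}]},
  {"type": "Day", "value": "2014-07-30Z", "Rep": [
    {"T": "14", "H": "84", "Dp": "11", "$": "0"}]}
]}}}}'''


def observation_rows(data):
    """Yield (date, minutes, temperature, humidity, dew point) for every report."""
    for period in data['SiteRep']['DV']['Location']['Period']:
        for rep in period['Rep']:
            # .get() returns None if a field is missing from a report
            yield (period['value'], rep['$'], rep.get('T'), rep.get('H'), rep.get('Dp'))


rows = list(observation_rows(json.loads(SAMPLE)))

# Write to an in-memory buffer here; a real script would open a file instead.
buffer = io.StringIO()
csv.writer(buffer, lineterminator='\n').writerows(rows)
print(buffer.getvalue())
```

Iterating the lists avoids having to know in advance how many reports there are or at which time the day rolls over.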
For anyone puzzling over how to use the Met Office three-hourly forecast data, here is the solution I am using:
import urllib.request
import json
### THIS IS THE CALL TO GET THE MET OFFICE FILE FROM THE INTERNET
response = urllib.request.urlopen('http://datapoint.metoffice.gov.uk/public/data/val/wxfcs/all/json/**YourLocationID**?res=3hourly&key=**your_api_key**')
FCData = response.read()
FCDataStr = FCData.decode('utf-8')
### END OF THE CALL TO GET MET OFFICE FILE FROM THE INTERNET
#Converts JSON data to a dictionary object
FCData_Dic = json.loads(FCDataStr)
#The following are examples of extracting data from the dictionary object.
#The JSON data is heavily nested.
#Each [] goes one level down, usually defined with {} in the JSON data.
dataDate = (FCData_Dic['SiteRep']['DV']['dataDate'])
print('dataDate =',dataDate)
#There are also [] in the JSON data, which are referenced with integers,
# starting from [0]
#Here, the [0] refers to the first day's block of data defined with [].
DateDay0 = (FCData_Dic['SiteRep']['DV']['Location']['Period'][0]['value'])
print('DateDay0 =',DateDay0)
#The second [0] picks out each of the first day's forecast data, in this case the time, referenced by '$'
TimeOfFC = (FCData_Dic['SiteRep']['DV']['Location']['Period'][0]['Rep'][0]['$'])
print('TimeOfFC =',TimeOfFC)
#Ditto for the temperature.
Temperature = int((FCData_Dic['SiteRep']['DV']['Location']['Period'][0]['Rep'][0]['T']))
print('Temperature =',Temperature)
#Ditto for the weather Type (a code number).
WeatherType = int((FCData_Dic['SiteRep']['DV']['Location']['Period'][0]['Rep'][0]['W']))
print('WeatherType =',WeatherType)
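The same indexing can be put in a loop to collect every three-hourly forecast rather than only the first one. The following is a minimal sketch using a trimmed, hand-made sample in the shape of the data shown in the question; the names forecasts and units are made up for illustration, and it assumes '$' holds minutes after midnight, which matches the values 0, 180, ..., 1260 in that data.

```python
import json

# Trimmed sample in the same shape as the forecast in the question (assumption:
# the real response has the same keys, only more of them).
SAMPLE = '''{"SiteRep": {
  "Wx": {"Param": [
    {"name": "T", "units": "C", "$": "Temperature"},
    {"name": "S", "units": "mph", "$": "Wind Speed"}]},
  "DV": {"dataDate": "2014-07-29T20:00:00Z", "Location": {"Period": [
    {"type": "Day", "value": "2014-07-29Z", "Rep": [
      {"T": "24", "S": "9", "$": "900"},
      {"T": "22", "S": "11", "$": "1080"}]},
    {"type": "Day", "value": "2014-07-30Z", "Rep": [
      {"T": "14", "S": "7", "$": "0"}]}]}}}}'''

data = json.loads(SAMPLE)

# Map each parameter code to its units, e.g. 'T' -> 'C'
units = {p['name']: p['units'] for p in data['SiteRep']['Wx']['Param']}

forecasts = []
for period in data['SiteRep']['DV']['Location']['Period']:
    for rep in period['Rep']:
        minutes = int(rep['$'])  # assumed to be minutes after midnight
        forecasts.append({
            'date': period['value'],
            'time': '{:02d}:{:02d}'.format(minutes // 60, minutes % 60),
            'temperature': int(rep['T']),
            'wind_speed': int(rep['S']),
        })

for fc in forecasts:
    print(fc['date'], fc['time'],
          fc['temperature'], units['T'],
          fc['wind_speed'], units['S'])
```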
Hope this helps someone!
Here is the problem:
FCDataStr = str(FCData)
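When you call str on a bytes object, you get the string representation of that bytes object: wrapped in quotes, prefixed with b, and with special characters backslash-escaped.
If you want to decode binary data into text, you need to use the decode method: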
FCDataStr = FCData.decode('utf-8')
(I am guessing UTF-8, because JSON is normally supposed to be UTF-8 unless stated otherwise.)
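To make the difference concrete, here is a small sketch that compares the two conversions on a short, made-up bytes value standing in for response.read():

```python
import json

# Made-up stand-in for the bytes returned by response.read() (assumption).
raw = b'{"SiteRep": {"DV": {"dataDate": "2014-07-29T20:00:00Z"}}}'

as_repr = str(raw)             # the repr of the bytes object, including b'...'
as_text = raw.decode('utf-8')  # the actual JSON text

print(as_repr[:3])   # the first characters are b'{
print(as_text[:2])   # the first characters are {"

try:
    json.loads(as_repr)
except ValueError as err:
    # The leading b is not valid JSON, hence the error at char 0
    print('json.loads failed:', err)

print(json.loads(as_text)['SiteRep']['DV']['dataDate'])
```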
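In more detail:
urllib.request.urlopen returns an http.client.HTTPResponse object, which behaves like a binary file (it implements io.RawIOBase).
In Python 3.4 you cannot pass this object straight to json.load, because that expects a text-file-like object, that is, one whose read method returns str rather than bytes. You could wrap the HTTPResponse in an io.BufferedReader, wrap that in an io.TextIOWrapper (specifying encoding='utf-8'), and pass the result to json.load, but that is probably more work than you want to do.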
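For completeness, the wrapping approach might look roughly like the sketch below. An io.BytesIO object stands in for the HTTP response here (an assumption, so that the example runs without a network call or an API key); whether a real response also needs the io.BufferedReader layer is not checked here.

```python
import io
import json

# Stand-in for the HTTP response (assumption): a binary file-like object.
binary_stream = io.BytesIO(b'{"SiteRep": {"DV": {"type": "Forecast"}}}')

# TextIOWrapper decodes the bytes as they are read, so json.load sees str
text_stream = io.TextIOWrapper(binary_stream, encoding='utf-8')
data = json.load(text_stream)

print(data['SiteRep']['DV']['type'])
```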
So the simplest fix is what you were already trying to do, just with decode instead of str:
data_bytes = response.read()
data_str = data_bytes.decode('utf-8')
data_dict = json.loads(data_str)
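After that, do not try to get at the data through data_str: that is just a string holding the JSON encoding of your data; data_dict is the actual data.
For example, to find the dataDate of the DV in SiteRep, you just do this: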
data_dict['SiteRep']['DV']['dataDate']
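That gives you a string such as '2014-07-31T14:00:00Z'. You will probably still want to convert it to a datetime.datetime object (because JSON only understands a few basic types: strings, numbers, lists and dicts). But this is far better than using find on data_str or guessing at offsets.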
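One possible way to do that conversion is sketched below. It assumes the timestamp always has the form shown in the question's data, with a trailing Z meaning UTC.

```python
from datetime import datetime, timezone

# Value as it appears in the question's data
data_date = '2014-07-29T20:00:00Z'

# Parse the fixed format, then mark the result as UTC (the trailing Z)
parsed = datetime.strptime(data_date, '%Y-%m-%dT%H:%M:%SZ').replace(tzinfo=timezone.utc)

print(parsed.isoformat())  # 2014-07-29T20:00:00+00:00
```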
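My guess is that you found some sample code written for Python 2.x, where you can convert between byte strings and Unicode strings by calling the appropriate constructor without specifying an encoding. The default then comes from sys.getdefaultencoding(), which is normally 'ascii' in Python 2 and so is enough for ASCII-only JSON like this; doing that is wrong, but it happens to work. If that is the case, you may want to find better sample code to learn from...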