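A tool for collecting ACS and geospatial data from the Census API
Detailed description of the autocensus Python project
autocensus
A Python package for collecting American Community Survey (ACS) data from the Census API, along with associated geospatial points and boundaries, in a pandas dataframe. Uses asyncio/aiohttp to request data concurrently.
This package is under active development, and breaking changes to its API are expected.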
Installation
autocensus requires Python 3.7 or higher. Install as follows:
pip install autocensus
To run autocensus, you must specify a Census API key, either via the census_api_key keyword argument (as shown in the example below) or by setting the environment variable CENSUS_API_KEY.
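The two ways of supplying a key can be sketched as follows. This is a minimal illustration, not autocensus's actual implementation: resolve_api_key is a hypothetical helper, and letting the keyword argument take precedence over the environment variable is an assumption.

```python
import os
from typing import Optional


def resolve_api_key(census_api_key: Optional[str] = None) -> str:
    """Hypothetical helper: pick a Census API key from the two documented sources."""
    # Assumption: an explicitly passed key wins over the environment variable
    key = census_api_key or os.environ.get('CENSUS_API_KEY')
    if not key:
        raise ValueError(
            'No Census API key found; pass census_api_key or set CENSUS_API_KEY'
        )
    return key
```

Setting CENSUS_API_KEY once in your shell profile keeps the key out of scripts and notebooks that you might share.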
Example
from autocensus import Query

# Configure query
query = Query(
    estimate=5,
    years=[2014, 2015, 2016, 2017],
    variables=['B01002_001E', 'B03001_001E', 'DP03_0025E', 'S0503_C02_077E'],
    for_geo='tract:*',
    in_geo=['state:08', 'county:005'],
    # Fill in the following with your actual Census API key
    census_api_key='Your Census API key'
)

# Run query and collect output in dataframe
dataframe = query.run()
Output:
name | geo_id | geo_type | year | date | variable_code | variable_label | variable_concept | annotation | value | percent_change | difference | centroid | internal_point | geometry |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Census Tract 151, Arapahoe County, Colorado | 1400000US08005015100 | tract | 2014 | 2014-12-31 | B01002_001E | Median age - Total | Median Age by Sex | | 45.7 | | | POINT (…) | POINT (…) | MULTIPOLYGON (…) |
Census Tract 151, Arapahoe County, Colorado | 1400000US08005015100 | tract | 2015 | 2015-12-31 | B01002_001E | Median age - Total | Median Age by Sex | | 45.2 | -1.1 | -0.5 | POINT (…) | POINT (…) | MULTIPOLYGON (…) |
Census Tract 151, Arapahoe County, Colorado | 1400000US08005015100 | tract | 2016 | 2016-12-31 | B01002_001E | Median age - Total | Median Age by Sex | | 45.9 | 1.6 | 0.7 | POINT (…) | POINT (…) | MULTIPOLYGON (…) |
Census Tract 151, Arapahoe County, Colorado | 1400000US08005015100 | tract | 2017 | 2017-12-31 | B01002_001E | Median age - Total | Median Age by Sex | | 45.7 | -0.4 | -0.2 | POINT (…) | POINT (…) | MULTIPOLYGON (…) |
Census Tract 49.51, Arapahoe County, Colorado | 1400000US08005004951 | tract | 2014 | 2014-12-31 | B01002_001E | Median age - Total | Median Age by Sex | | 26.4 | | | POINT (…) | POINT (…) | MULTIPOLYGON (…) |
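The geo_id values in this output follow the Census Bureau's GEOID convention: a three-digit summary level, a geographic variant and component, the literal US, and then the FIPS codes of the geography. As a sketch of how a tract-level geo_id breaks down, the hypothetical helper below (not part of autocensus) splits one into its parts. It assumes a tract-level ID, where the FIPS portion is 2 state digits, 3 county digits, and 6 tract digits.

```python
def parse_tract_geo_id(geo_id: str) -> dict:
    """Hypothetical helper: split a tract-level geo_id into its components."""
    prefix, fips = geo_id.split('US')
    if len(fips) != 11:
        raise ValueError(f'Expected 11 FIPS digits for a tract, got {len(fips)}')
    return {
        'summary_level': prefix[:3],  # 140 = census tract
        'state': fips[:2],
        'county': fips[2:5],
        'tract': fips[5:],
    }


parts = parse_tract_geo_id('1400000US08005015100')
print(parts)
```

The state and county components correspond to the in_geo=['state:08', 'county:005'] filter in the query, and the tract component 015100 corresponds to Census Tract 151.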
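Joining geospatial data
autocensus will automatically join geospatial data (centroids, representative points, and geometry) for the following geography types for years 2013 and on: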
- Nation-level
  - nation
  - region
  - division
  - state
  - urban area
  - zip code tabulation area
  - county
  - congressional district
  - metropolitan statistical area/micropolitan statistical area
  - combined statistical area
  - american indian area/alaska native area/hawaiian home land
  - new england city and town area
- State-level
  - alaska native regional corporation
  - block group
  - county subdivision
  - tract
  - place
  - public use microdata area
  - state legislative district (upper chamber)
  - state legislative district (lower chamber)
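For queries spanning earlier years, these geometry fields will be populated with null values. (Census boundary shapefiles are not available for years prior to 2013.)
If you don't need geospatial data, set the keyword arg join_geography to False when initializing your query: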
query = Query(
    estimate=5,
    years=[2014, 2015, 2016, 2017],
    variables=['B01002_001E', 'B03001_001E', 'DP03_0025E', 'S0503_C02_077E'],
    for_geo='tract:*',
    in_geo=['state:08', 'county:005'],
    join_geography=False
)
If join_geography is False, the centroid, internal_point, and geometry columns will not be included in your results.
Caching
To improve performance across queries, autocensus caches shapefiles on disk by default. The cache location varies by platform:
- Linux: /home/{username}/.cache/autocensus
- Mac: /Users/{username}/Library/Application Support/Caches/autocensus
- Windows: C:\\Users\\{username}\\AppData\\Local\\socrata\\autocensus
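To locate the cache from a script, the platform-specific paths listed above can be encoded in a small lookup. The sketch below only mirrors the documented locations: documented_cache_dir is a hypothetical helper, not an autocensus function, and it does not reflect how the package itself resolves the directory.

```python
from pathlib import PurePosixPath, PureWindowsPath
from typing import Union


def documented_cache_dir(
    system: str, username: str
) -> Union[PurePosixPath, PureWindowsPath]:
    """Hypothetical helper: return the cache directory listed above for a platform.

    `system` uses the values returned by platform.system().
    """
    if system == 'Linux':
        return PurePosixPath('/home', username, '.cache', 'autocensus')
    if system == 'Darwin':  # macOS
        return PurePosixPath(
            '/Users', username, 'Library', 'Application Support', 'Caches', 'autocensus'
        )
    if system == 'Windows':
        return PureWindowsPath(
            'C:\\', 'Users', username, 'AppData', 'Local', 'socrata', 'autocensus'
        )
    raise ValueError(f'No documented cache location for {system!r}')
```

Knowing the location is mainly useful for inspecting or clearing cached shapefiles by hand.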
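Publishing to Socrata
If socrata-py is installed, you can publish query results (or dataframes containing the results of multiple queries) directly to Socrata via the Query.to_socrata method.
Credentials
You must have a Socrata account with appropriate permissions on the domain to which you are publishing. By default, autocensus will look for your Socrata account credentials under the following pairs of common environment variables: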
- SOCRATA_KEY_ID, SOCRATA_KEY_SECRET
- SOCRATA_USERNAME, SOCRATA_PASSWORD
- MY_SOCRATA_USERNAME, MY_SOCRATA_PASSWORD
- SODA_USERNAME, SODA_PASSWORD
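A lookup over these pairs might look like the sketch below. It is an illustration only: find_socrata_credentials is a hypothetical helper, and both the search order (the order listed above) and the rule that a pair counts only when both variables are set are assumptions rather than documented behavior.

```python
import os
from typing import Mapping, Optional, Tuple

# The pairs of environment variables listed above
CREDENTIAL_PAIRS = [
    ('SOCRATA_KEY_ID', 'SOCRATA_KEY_SECRET'),
    ('SOCRATA_USERNAME', 'SOCRATA_PASSWORD'),
    ('MY_SOCRATA_USERNAME', 'MY_SOCRATA_PASSWORD'),
    ('SODA_USERNAME', 'SODA_PASSWORD'),
]


def find_socrata_credentials(
    env: Mapping[str, str] = os.environ,
) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: return the first complete credential pair, if any."""
    for id_var, secret_var in CREDENTIAL_PAIRS:
        # Assumption: a pair only counts if both variables are set and non-empty
        if env.get(id_var) and env.get(secret_var):
            return env[id_var], env[secret_var]
    return None
```

Passing a plain dict as env makes it easy to check which pair would be picked up without touching the real environment.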
Alternatively, you can supply credentials explicitly by way of the auth keyword argument:
import os

auth = (os.environ['MY_SOCRATA_KEY'], os.environ['MY_SOCRATA_KEY_SECRET'])
query.to_socrata('some-domain.data.socrata.com', auth=auth)
Example: Create a new dataset
# Run query and publish results as a new dataset on Socrata domain
query.to_socrata(
    'some-domain.data.socrata.com',
    name='Average Commute Time by Colorado County, 2013–2017',  # Optional
    description='5-year estimates from the American Community Survey'  # Optional
)
Example: Replace rows in an existing dataset
# Run query and publish results to an existing dataset on Socrata domain
query.to_socrata('some-domain.data.socrata.com', dataset_id='xxxx-xxxx')
Example: Create a new dataset from multiple queries
from autocensus import Query
from autocensus.socrata import to_socrata
import pandas as pd

# County-level query
county_query = Query(
    estimate=5,
    years=range(2013, 2018),
    variables=['DP03_0025E'],
    for_geo='county:*',
    in_geo='state:08'
)
county_dataframe = county_query.run()

# State-level query
state_query = Query(
    estimate=5,
    years=range(2013, 2018),
    variables=['DP03_0025E'],
    for_geo='state:08'
)
state_dataframe = state_query.run()

# Concatenate dataframes and upload to Socrata
combined_dataframe = pd.concat([county_dataframe, state_dataframe])
to_socrata(
    'some-domain.data.socrata.com',
    dataframe=combined_dataframe,
    name='Average Commute Time by Colorado County with Statewide Averages, 2013–2017',  # Optional
    description='5-year estimates from the American Community Survey'  # Optional
)
Topics
autocensus includes some prebuilt lists of ACS variables organized around topics such as race, education, and housing. They live in the autocensus.topics module:
import autocensus
from autocensus import Query

query = Query(
    estimate=5,
    years=[2014, 2015, 2016, 2017],
    # Housing variables: B25035_001E, B25064_001E, B25077_001E
    variables=autocensus.topics.housing,
    for_geo='tract:*',
    in_geo=['state:08', 'county:005']
)
Topics currently included are population, race, education, income, and housing.
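Because a topic is described as a list of variable codes, it can presumably be combined with other codes before being passed as variables. The sketch below assumes topics are plain lists of strings. It uses a stand-in list copied from the housing comment in the example above instead of importing autocensus, and merge_variables is a hypothetical helper.

```python
from typing import Iterable, List


def merge_variables(*groups: Iterable[str]) -> List[str]:
    """Hypothetical helper: combine variable lists, dropping duplicates in order."""
    # dict keys preserve insertion order, so the first occurrence wins
    return list(dict.fromkeys(code for group in groups for code in group))


# Stand-in for autocensus.topics.housing, per the comment in the example above
housing = ['B25035_001E', 'B25064_001E', 'B25077_001E']
variables = merge_variables(housing, ['B01002_001E', 'B25064_001E'])
print(variables)
```

Deduplicating while keeping order avoids listing the same variable twice when a custom code already appears in a topic.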
Known issues
SSL errors
To disable SSL verification, specify verify_ssl=False when initializing your Query:
query = Query(
    estimate=5,
    years=[2014, 2015, 2016, 2017],
    variables=['B01002_001E', 'B03001_001E', 'DP03_0025E', 'S0503_C02_077E'],
    for_geo='tract:*',
    in_geo=['state:08', 'county:005'],
    verify_ssl=False
)