Looping through URLs to create CSVs and a DataFrame for pandas

Posted 2024-06-16 12:10:12


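I want to pull data from URLs that follow the structure shown below and meet specific date requirements, and put that information into a CSV for local use.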

http://web.mta.info/developers/data/nyct/turnstile/turnstile_190629.txt

The 6-digit sequence at the end of the URL is a year-month-day indicator: 190629

I am collecting data for March through June (03-06) of 2016-2019 (16-19). If a URL exists, I create a CSV for it, then merge all of them into a single CSV to feed into a DataFrame.

This works, but it is slow, and I know it is not the most Pythonic way to do it.

import requests
import pandas as pd
import itertools

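# years, months, days as zero-padded strings (each day listed once,
# so no URL is requested or concatenated twice)
date_list = [['16', '17', '18', '19'],
             ['03', '04', '05', '06'],
             ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10',
              '11', '12', '13', '14', '15', '16', '17', '18', '19', '20',
              '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31']]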
date_combo = []

# - create year - month - date combos
# - link: https://stackoverflow.com/questions/798854/all-combinations-of-a-list-of-lists

for sub_list in itertools.product(*date_list):
    date_combo.append(sub_list)

url_lead = 'http://web.mta.info/developers/data/nyct/turnstile/turnstile_'
url_list = []

# - this checks the url is valid and adds them to a list
for year, month, day in date_combo:
    concat_url = url_lead + year + month + day + '.txt'
    response = requests.get(concat_url)
    if response.status_code == 200:
        # ---- creates a list of active urls
        url_list.append(concat_url)
        # ---- this creates individual csvs ---- change path for saving locally
        # ---- filename is date
        df = pd.read_csv(concat_url, header = 0, sep = ',')
        df.to_csv(r'/Users/.../GitHub/' + year + month + day + '.csv')

# - this creates a master df for all urls
dfs = [pd.read_csv(url,header = 0, sep = ',') for url in url_list]
df = pd.concat(dfs, ignore_index = True)
df.to_csv(r'/Users/.../GitHub/seasonal_mta_data_01.csv')

My code runs as expected, but I would appreciate any suggestions!


1 answer
User
#1 · posted 2024-06-16 12:10:12

I can't think of much. Here are a few things I would do differently:

import itertools

import pandas as pd
import requests

# more concise construction of date_combo
date_list = [range(16, 20), range(3, 7), range(1, 32)]
date_combo = list(itertools.product(*date_list))

url_lead = 'http://web.mta.info/developers/data/nyct/turnstile/turnstile_'
url_list = []
dfs = []

# - this checks the url is valid and adds them to a list
for year, month, day in date_combo:
    # year, month, day are integers
    # so we use f string here
    concat_url = f'{url_lead}{year}{month:02}{day:02}.txt'

    response = requests.get(concat_url)
    if response.status_code == 200:
        url_list.append(concat_url)

        # append to dfs and save csv
        dfs.append(pd.read_csv(concat_url, header = 0, sep = ','))
        dfs[-1].to_csv(f'/Users/.../GitHub/{year}{month:02}{day:02}.csv')

# we don't need to request the txt files again:
df = pd.concat(dfs, ignore_index = True)
df.to_csv(r'/Users/.../GitHub/seasonal_mta_data_01.csv')
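
Two further tweaks may be worth trying; treat the following as a minimal sketch rather than a tested solution. First, the loop above still downloads each existing file twice: once with `requests.get` and again inside `pd.read_csv(concat_url)`. Parsing the response body directly avoids the second download. Second, generating real calendar dates skips impossible ones such as April 31. The helper names `candidate_dates`, `build_url` and `fetch_frames`, the 30-second timeout, and the use of a `requests.Session` are my own choices for illustration, not something from the question.

```python
import io
from datetime import date, timedelta

import pandas as pd
import requests

URL_LEAD = 'http://web.mta.info/developers/data/nyct/turnstile/turnstile_'


def candidate_dates(years, months):
    """Yield every real calendar date in the given (4-digit) years and months."""
    for year in years:
        day = date(year, 1, 1)
        while day.year == year:
            if day.month in months:
                yield day
            day += timedelta(days=1)


def build_url(day):
    """Build the file URL; %y%m%d gives the 6-digit suffix, e.g. 190629."""
    return f'{URL_LEAD}{day:%y%m%d}.txt'


def fetch_frames(days, session=None):
    """Request each URL once and parse the body of every 200 response."""
    # Assumption: reusing one Session keeps the connection open between requests.
    session = session or requests.Session()
    frames = []
    for day in days:
        response = session.get(build_url(day), timeout=30)
        if response.status_code == 200:
            # Parse the text we already have instead of fetching the URL again.
            frames.append(pd.read_csv(io.StringIO(response.text)))
    return frames
```

It could then be used as `dfs = fetch_frames(candidate_dates(range(2016, 2020), range(3, 7)))` followed by the same `pd.concat(dfs, ignore_index=True)` as above. Note that `pd.concat` raises an error on an empty list, so it is worth checking that at least one file was found.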
