从URL读取Zip文件,并将Ms Access文件转换为数据帧

2024-05-13 01:09:03 发布

您现在位置:Python中文网/ 问答频道 /正文

我写此消息是为了寻求帮助,以便找到此python代码的解决方案。我正在尝试生成FAA PMA零件的可视化,并与我公司的零件数据库进行比较。 我已经使用PowerBI、SQL查询和Excel文件完成了这项工作,但它不是在线可视化。每个月我都必须从URL(https://rgl.faa.gov/Regulatory_and_Guidance_Library/rgPMA.nsf/f6a80b2e3c91131686257bcf0063a042/bc1b3424cc899bee86257beb006237ae/$FILE/PMA.zip)下载FAA PMA零件,以更新此可视化

也就是说,为了从上面提到的Url中读取Ms Access文件并将其转换为数据帧,我尝试编写了这些python代码

然而,由于以下原因,我没有成功完成这项任务:

首先,我尝试使用sqlalchemy,但出现了以下错误

    from zipfile import ZipFile
    from io import BytesIO
    import urllib
    from urllib.request import urlopen
    from sqlalchemy import create_engine
    import pandas as pd
    
    r = urlopen("https://rgl.faa.gov/Regulatory_and_Guidance_Library/rgPMA.nsf/f6a80b2e3c91131686257bcf0063a042/bc1b3424cc899bee86257beb006237ae/$FILE/PMA.zip").read()
    file = ZipFile(BytesIO(r))
    pma_accdb = file.open("PMA.accdb")
    
    connection_string = (r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};'
                         r'DBQ=pma_accdb;'
                         r'ExtendedAnsiSQL=1;'
                                              )
    connection_url = f"access+pyodbc:///?odbc_connect={urllib.parse.quote_plus(connection_string)}"
    engine = create_engine(connection_url)
    sql_df = pd.read_sql("SELECT * FROM Parts", con=engine)
    
    print(sql_df.head())

错误:回溯(最近一次呼叫上次): 文件“C:/Users/thiago.ribeiro/.PyCharmCE2019.1/config/scratches/scratch.py”,第17行,在 引擎=创建引擎(连接url) 文件“C:\Users\thiago.ribeiro\AppData\Local\Programs\Python38\lib\site packages\sqlalchemy\engine\uem>init\uuum.py”,第500行,在create\u engine中 返回策略。创建(*args,**kwargs) 文件“C:\Users\thiago.ribeiro\AppData\Local\Programs\Python38\lib\site packages\sqlalchemy\engine\strategies.py”,第61行,在create中 入口点=u.\u获取\u入口点() 文件“C:\Users\thiago.ribeiro\AppData\Local\Programs\Python38\lib\site packages\sqlalchemy\engine\url.py”,第172行,在_get\u entrypoint中 cls=registry.load(名称) 文件“C:\Users\thiago.ribeiro\AppData\Local\Programs\Python38\lib\site packages\sqlalchemy\util\langhelpers.py”,第267行,已加载 提升exc.NOSUCH模块错误( sqlalchemy.exc.NoSuchModuleError:无法加载插件:sqlalchemy.方言:access.pyodbc

之后,我尝试使用pyodbc,但也出现了以下错误

    from zipfile import ZipFile
    from io import BytesIO
    from urllib.request import urlopen
    import pyodbc
    import pandas as pd
    
    r = urlopen("https://rgl.faa.gov/Regulatory_and_Guidance_Library/rgPMA.nsf/f6a80b2e3c91131686257bcf0063a042/bc1b3424cc899bee86257beb006237ae/$FILE/PMA.zip").read()
    file = ZipFile(BytesIO(r))
    pma_accdb = file.open("PMA.accdb")
    
    driver = '{Microsoft Access Driver (*.mdb, *.accdb)}'
    filepath = pma_accdb
    
    myDataSources = pyodbc.dataSources()
    access_driver = myDataSources['MS Access Database']
    
    cnxn = pyodbc.connect(driver=access_driver, dbq=filepath, autocommit=True)
    crsr = cnxn.cursor()
    crsr.execute("SELECT * FROM Parts")
    
    sql_df = pd.read_sql("SELECT * FROM Parts", con=engine)
    
    print(sql_df.head())

错误:回溯(最近一次呼叫上次): 第17行,在 cnxn=pyodbc.connect(driver=access\u driver,dbq=filepath,autocommit=True) pyodbc.Error:('HY000','[HY000][Microsoft][Driver-ODBC-Microsoft-Access]常规错误无法打开进程0x2efc线程0x4a20 DBC 0xd39ea788的注册表项临时(易失性)Ace DSN

综上所述,您能否帮助我了解我的代码哪里出错,以及如何改进代码,以便从url读取MS Access并将其转换为数据帧


Tags: 文件fromimporturlsqlaccesssqlalchemydriver
1条回答
网友
1楼 · 发布于 2024-05-13 01:09:03

我得到一个朋友的帮助。这就是解决方案:


        from zipfile import ZipFile
        from io import BytesIO
        import urllib
        from urllib.request import urlopen
        from sqlalchemy import create_engine
        import pandas as pd
        
        resp = urlopen("https://rgl.faa.gov/Regulatory_and_Guidance_Library/rgPMA.nsf/f6a80b2e3c91131686257bcf0063a042/bc1b3424cc899bee86257beb006237ae/$FILE/PMA.zip").read()
        zipfile = ZipFile(BytesIO(resp))
        
        connection_string = (r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};'
        r'DBQ=' + zipfile.extract('PMA.accdb') + ';'
        r'ExtendedAnsiSQL=1;'
        )
        connection_url = f"access+pyodbc:///?odbc_connect={urllib.parse.quote_plus(connection_string)}"
        engine = create_engine(connection_url)
        sql_df = pd.read_sql("SELECT * FROM Parts", con=engine)

相关问题 更多 >