Writing raw SQL in Python to manipulate DataFrames

2 Answers

User
Answer 1 · edited 2024-06-16 09:50:51

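As their source code shows, both R's sqldf and Python's pandasql actually run against an in-memory SQLite instance (for R, SQLite is the default backend). You can therefore reproduce the same functionality with pandas' own SQL methods (read_sql and to_sql) by interfacing with SQLAlchemy, which is what pandasql does under the hood. Specifically, consider the following example:
Load all the DataFrames you need from your environment into an in-memory SQLite database: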
import numpy as np
import pandas as pd
from sqlalchemy import create_engine

# IN-MEMORY DATABASE (NO PATH SPECIFIED)
engine = create_engine('sqlite://')

dates = pd.date_range('2018-01-01', '2018-06-22', freq='D')

df1 = pd.DataFrame({'current_date': np.random.choice(dates, 50),
                    'analysis_tool': 'pandas',
                    'num_value': np.random.randint(100, size=50)*1000},
                   columns=['current_date', 'analysis_tool', 'num_value'])

df2 = pd.DataFrame({'current_date': np.random.choice(dates, 50),
                    'analysis_tool': 'r',
                    'num_value': np.random.randint(100, size=50)*1000},
                   columns=['current_date', 'analysis_tool', 'num_value'])

df3 = pd.DataFrame({'current_date': np.random.choice(dates, 50),
                    'analysis_tool': 'sas',
                    'num_value': np.random.randint(100, size=50)*1000},
                   columns=['current_date', 'analysis_tool', 'num_value'])

df1.to_sql("df1", con=engine, if_exists='replace')
df2.to_sql("df2", con=engine, if_exists='replace')
df3.to_sql("df3", con=engine, if_exists='replace')
Run whatever SQL queries you need to update and manipulate the data:
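from sqlalchemy import text

# QUERIES RUN IN A TRANSACTION
with engine.begin() as cn:
    # text() wraps raw SQL strings, as SQLAlchemy 2.x requires
    cn.execute(text("UPDATE df1 SET analysis_tool = 'python pandas'"))
    # column is analysis_tool; current_date is quoted because it is also an SQLite keyword
    cn.execute(text('INSERT INTO df3 (analysis_tool, "current_date", num_value) VALUES (:tool, :dt, :val)'),
               {'tool': 'sas', 'dt': '2018-06-23', 'val': 51000})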
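To illustrate what "run in a transaction" buys you, here is a minimal sketch using the standard-library sqlite3 module as a stand-in for the SQLAlchemy engine. Its `with con:` block has the same commit-on-success, rollback-on-error behaviour as `engine.begin()`. The table contents and the deliberately wrong table name `no_such_table` are made up for the illustration.

```python
import sqlite3

# Stand-in for the in-memory engine: a plain sqlite3 connection
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE df1 (analysis_tool TEXT, num_value INTEGER)")
con.execute("INSERT INTO df1 VALUES ('pandas', 1000)")
con.commit()

try:
    # Everything inside the block is committed together, or not at all
    with con:
        con.execute("UPDATE df1 SET analysis_tool = 'python pandas'")
        # Hypothetical mistake: this table does not exist, so the statement fails
        con.execute("INSERT INTO no_such_table VALUES (1)")
except sqlite3.OperationalError:
    pass  # the whole block was rolled back, including the UPDATE

tool = con.execute("SELECT analysis_tool FROM df1").fetchone()[0]
print(tool)  # still 'pandas', the UPDATE did not survive
con.close()
```

Because the second statement fails, the earlier UPDATE is undone as well, so the table is never left half-modified.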
Read the result back into a pandas DataFrame:
strSQL = """SELECT * FROM df1
            UNION ALL
            SELECT * FROM df2
            UNION ALL
            SELECT * FROM df3;"""

df_all = pd.read_sql(strSQL, engine)

engine.dispose()      # IN-MEMORY DATABASE DESTROYED
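The three steps above (load, query, read back) can be folded into a small helper that behaves roughly like pandasql. This is only a sketch: the function name `query_frames`, the table names `left_t` and `right_t`, and the sample data are all hypothetical, and it uses a plain sqlite3 connection (which pandas also accepts) rather than a SQLAlchemy engine.

```python
import sqlite3
import pandas as pd

def query_frames(sql, **frames):
    """Load each DataFrame into a throwaway in-memory SQLite database,
    run `sql` against it, and return the result as a DataFrame."""
    con = sqlite3.connect(":memory:")
    try:
        for name, frame in frames.items():
            # The keyword argument name becomes the table name
            frame.to_sql(name, con, index=False)
        return pd.read_sql(sql, con)
    finally:
        con.close()  # the in-memory database disappears here

# Made-up sample data
left = pd.DataFrame({"id": [1, 2, 3], "tool": ["pandas", "r", "sas"]})
right = pd.DataFrame({"id": [1, 2], "num_value": [1000, 2000]})

result = query_frames(
    "SELECT l.tool, r.num_value FROM left_t l "
    "JOIN right_t r ON l.id = r.id ORDER BY l.id",
    left_t=left,
    right_t=right,
)
print(result)
```

Passing the frames explicitly avoids reaching into globals(), at the cost of a slightly longer call.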

User
Answer 2 · edited 2024-06-16 09:50:51

You don't need SQL to do this in pandas. You can join two DataFrames like this:
df1.set_index('patient_id').join(df2.set_index('patid'))
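One thing to watch when translating a SQL JOIN this way: DataFrame.join defaults to a left join, whereas a bare SQL JOIN is an inner join. A small sketch with made-up data (the column `visits` and all values are assumptions; only `patient_id`, `patid` and `ECD` come from the question):

```python
import pandas as pd

# Made-up sample data
df1 = pd.DataFrame({"patient_id": [1, 2, 3], "ECD": ["1234", "9999", "1234"]})
df2 = pd.DataFrame({"patid": [1, 2, 4], "visits": [5, 7, 2]})

# Index-based join: keeps every row of df1 (left join by default)
joined = df1.set_index("patient_id").join(df2.set_index("patid"))

# Closer to SQL's "JOIN ... ON a.patient_id = b.patid": an inner merge
inner = df1.merge(df2, left_on="patient_id", right_on="patid", how="inner")

print(joined)
print(inner)
```

Here `joined` keeps patient 3 with a missing `visits` value, while `inner` only contains patients 1 and 2. Passing `how='inner'` to join would give the inner behaviour as well.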
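You can also create a column based on a condition, similar to CASE WHEN ECD='1234' THEN 'ACTIVE' ELSE 'INACTIVE' END AS ACTIVE_INACTIVE, by doing something like this:
[code snippet missing from the source]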
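A minimal sketch of one common way to build such a conditional column, using numpy.where. This is not necessarily the code the answer originally showed, and the sample data is made up:

```python
import numpy as np
import pandas as pd

# Made-up sample data
df1 = pd.DataFrame({"patient_id": [1, 2, 3], "ECD": ["1234", "9999", "1234"]})

# Equivalent of: CASE WHEN ECD='1234' THEN 'ACTIVE' ELSE 'INACTIVE' END
df1["ACTIVE_INACTIVE"] = np.where(df1["ECD"] == "1234", "ACTIVE", "INACTIVE")

print(df1)
```

For more than two branches, numpy.select takes a list of conditions and a matching list of values, which maps more directly onto a multi-branch CASE WHEN.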
If you really do need to use SQL, you can install pandasql:
sudo -H pip3 install pandasql
Then you can use it the way you would expect:
from pandasql import sqldf

pysqldf = lambda q: sqldf(q, globals())

q = """SELECT *,
              CASE WHEN a.ECD='1234' THEN 'ACTIVE' ELSE 'INACTIVE' END AS ACTIVE_INACTIVE
       FROM df1 a
       JOIN df2 b ON a.patient_id = b.patid;"""

print(pysqldf(q).head())
