Querying Hadoop from Python

Published 2024-04-24 08:10:20


Hopefully this question can be resolved. At the moment, this works:

import pyodbc, sys, os
import pandas as pd

def get_data(SQL_statement):  # insert HQL statement with the usual '''<QUERY>'''
    pyodbc.autocommit = True
    # Connection settings - DSN can be replaced with STG or DEV as required, depending on where you want to connect.
    conn = pyodbc.connect("DSN=HDP_PROD", autocommit=True)
    cursor = conn.cursor()
    # V1.1 -- config settings to limit the TEZ container size, preventing out-of-memory errors; the query takes slightly longer to run.
    cursor.execute("set hive.tez.container.size=8192")
    cursor.execute("set hive.auto.convert.join.noconditionaltask.size=6553")
    #cursor.execute("set hive.auto.convert.join=false")
    # Creates a df from the SQL/HQL statement
    df = pd.read_sql(SQL_statement, conn)
    # Returns the df to memory
    return df


HIVE = get_data('''SELECT *
                   FROM sp_commercial.INTERACTIONS_LAST6M''')

If I add a WHERE condition to the SELECT statement above, the function errors out.
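
For illustration, the failing call looks roughly like the one below (the column name interaction_date and the date value are placeholders, not the real schema):

HIVE = get_data('''SELECT *
                   FROM sp_commercial.INTERACTIONS_LAST6M
                   WHERE interaction_date >= '2024-01-01'
                ''')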

So, how can I query Hue/Hadoop from Python with a WHERE condition?
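
One possible sketch of this (a minimal example, assuming the same DSN=HDP_PROD setup as above, assuming the Hive ODBC driver accepts ? parameter markers, and using the placeholder column interaction_date) is to pass the filter value through the params argument of pd.read_sql instead of splicing it into the HQL string, so the driver handles quoting of the literal:

import pyodbc
import pandas as pd

def get_data_filtered(SQL_statement, params=None):
    # Same connection and TEZ settings as get_data above (DSN=HDP_PROD is assumed).
    conn = pyodbc.connect("DSN=HDP_PROD", autocommit=True)
    cursor = conn.cursor()
    cursor.execute("set hive.tez.container.size=8192")
    cursor.execute("set hive.auto.convert.join.noconditionaltask.size=6553")
    # params are bound by the driver via '?' markers, so literals in the
    # WHERE clause do not have to be quoted by hand.
    df = pd.read_sql(SQL_statement, conn, params=params)
    conn.close()
    return df

# Usage sketch -- 'interaction_date' and the cut-off value are placeholders.
HIVE = get_data_filtered('''SELECT *
                            FROM sp_commercial.INTERACTIONS_LAST6M
                            WHERE interaction_date >= ?''',
                         params=['2024-01-01'])

If the driver does not support parameter markers, another thing worth checking is that string literals inside the triple-quoted query are wrapped in single quotes, since an unquoted date or text value in the WHERE clause would keep Hive from parsing the statement.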


Tags: to, import, df, execute, sql, size, as, conn