Python 3.6: connecting to MS SQL Server for large DataFrames

Posted 2024-04-20 04:12:22


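I'm a new Python programmer and also a new data scientist, so please forgive anything that sounds dumb. I won't go into details unless someone is curious, but essentially I need to connect to Microsoft SQL Server and upload a relatively large (~500k rows) Pandas DF, and I need to do it almost daily as the project currently stands.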

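It doesn't have to be a Pandas DF. I've read about using odo for csv files, but I haven't been able to get anything to work. The issue I'm having is that I can't BULK INSERT the DF because the file is not on the same machine as the SQL Server instance. I keep getting errors like the following: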

pyodbc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Incorrect syntax near the keyword 'IF'. (156) (SQLExecDirectW)")

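Since I've tried different SQL statements, you can replace IF with whatever was the first COL_NAME in the CREATE statement. I'm using SQLAlchemy to create the engine and connect to the database. This may go without saying, but the pd.to_sql() method is too slow for the amount of data I'm moving, which is why I need something faster.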

I'm using Python 3.6, by the way. Below I've written down most of the things I've attempted that did not succeed.

import pandas as pd
from sqlalchemy import create_engine
import numpy as np
# One column of random integers; the column name must be passed as a one-element list
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 1)), columns=['test_col'])
address = 'mssql+pyodbc://uid:pw@server/path/database?driver=SQL Server'
engine = create_engine(address)
connection = engine.raw_connection()
cursor = connection.cursor()
# Attempt 1 <- This failed to even create a table at the cursor_execute statement so my issues could be way in the beginning here but I know that I have a connection to the SQL Server because I can use pd.to_sql() to create tables successfully (just incredibly slowly for my tables of interest)
create_statement = """
DROP TABLE test_table
CREATE TABLE test_table (test_col)
"""
cursor.execute(create_statement)
test_insert = '''
INSERT INTO test_table
(test_col)
values ('abs');
'''
cursor.execute(test_insert)

# Attempt 2 <- From iabdb WordPress blog I came across
def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))
records = [str(tuple(x)) for x in take_rates.values]

insert_ = """
INSERT INTO test_table
("A")
VALUES
"""

for batch in chunker(records, 2): # This would be set to 1000 in practice I hope
    print(batch)
    rows = str(batch).strip('[]')
    print(rows)
    insert_rows = insert_ + rows
    print(insert_rows)
    cursor.execute(insert_rows)
    #conn.commit() # don't know when I would need to commit

conn.close()

# Attempt 3 # From a related Stack Exchange Post
# create the table but first drop if it already exists
command = """DROP TABLE IF EXISTS test_table
CREATE TABLE test_table # these columns are from my real dataset
"Serial Number" serial primary key,
"Dealer Code" text,
"FSHIP_DT" timestamp without time zone,
;"""
cursor.execute(command)
connection.commit()

# stream the data using 'to_csv' and StringIO(); then use sql's 'copy_from' function
output = io.StringIO()
# ignore the index
take_rates.to_csv(output, sep='~', header=False, index=False)
# jump to start of stream
output.seek(0)
contents = output.getvalue()
cur = connection.cursor()
# null values become ''
cur.copy_from(output, 'Config_Take_Rates_TEST', null="")
connection.commit()
cur.close()

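SQL Server doesn't seem to want to play nice with me... I'd like to apologize for the rough formatting. I've been working with this script for weeks now, but finally decided to try to organize something for StackOverflow. Thank you very much for any help anyone can offer!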


2 Answers

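"DROP TABLE IF EXISTS test_table" looks like invalid T-SQL syntax (SQL Server only accepts it from version 2016 onward). You can do something like this instead: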

if (object_id('test_table') is not null) 
DROP TABLE test_table
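
As a rough illustration of how the check above could be sent from Python, the sketch below only builds the T-SQL batch as a string. The helper name `recreate_table_sql` and the `VARCHAR(50)` column type are assumptions made for this example; the question does not say which types the real table uses. Table and column names are interpolated directly, so this should only be used with identifiers you control.

```python
def recreate_table_sql(table, columns):
    """Return a T-SQL batch that drops `table` if it exists, then recreates it.

    `columns` is a list of (column_name, sql_type) pairs. The types are
    supplied by the caller and are assumptions in this sketch.
    """
    col_defs = ", ".join(
        "[{}] {}".format(name, sql_type) for name, sql_type in columns
    )
    return (
        "IF OBJECT_ID(N'{t}', N'U') IS NOT NULL\n"
        "    DROP TABLE [{t}];\n"
        "CREATE TABLE [{t}] ({cols});"
    ).format(t=table, cols=col_defs)


sql = recreate_table_sql("test_table", [("test_col", "VARCHAR(50)")])
print(sql)
# With a live pyodbc connection (not created here), the batch would be run as:
#     cursor.execute(sql)
#     connection.commit()
```

Note that every column gets an explicit type here; a CREATE TABLE statement that lists a column name without a type is rejected by SQL Server.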

If you only need to replace the existing table, truncate it and use the bcp utility to upload the data. It's much faster.

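from subprocess import call

# Empty the existing table first, e.g. cursor.execute(command) followed by connection.commit()
command = "TRUNCATE TABLE test_table"
# bcp treats every line of the file as data, so write it without a header row
take_rates.to_csv('take_rates.csv', sep='\t', header=False, index=False)
# server, user, password, database and error_file are placeholders for your own values
call(['bcp', 'test_table', 'in', 'take_rates.csv', '-S', server, '-U', user, '-P', password,
      '-d', database, '-c', '-t', '\t', '-r', '\n', '-e', error_file])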

You need to install the bcp utility (yum install mssql-tools on CentOS/RedHat).
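
Where installing bcp is not possible, the batching idea from Attempt 2 in the question can still be used, with `?` parameter placeholders instead of values pasted into the SQL string. The following is only a sketch under stated assumptions: `build_inserts` and `PARAM_BUDGET` are names invented here, and the column names in the usage lines are made up. SQL Server allows at most 1000 rows in one VALUES list and rejects requests with more than 2100 parameters, so the batch size is capped on both counts.

```python
MAX_ROWS_PER_INSERT = 1000  # SQL Server limit on rows in a single VALUES list
PARAM_BUDGET = 2000         # conservative; requests above 2100 parameters fail


def chunker(seq, size):
    """Yield consecutive slices of `seq` holding at most `size` items each."""
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))


def build_inserts(table, columns, rows, batch_size=MAX_ROWS_PER_INSERT):
    """Yield (sql, params) pairs, one parameterized multi-row INSERT per batch."""
    # Keep every statement under both the row limit and the parameter budget
    batch_size = min(batch_size, MAX_ROWS_PER_INSERT, PARAM_BUDGET // len(columns))
    col_list = ", ".join("[{}]".format(c) for c in columns)
    row_placeholder = "(" + ", ".join(["?"] * len(columns)) + ")"
    for batch in chunker(rows, batch_size):
        values = ", ".join([row_placeholder] * len(batch))
        sql = "INSERT INTO [{}] ({}) VALUES {}".format(table, col_list, values)
        # Flatten the batch row by row so values line up with the placeholders
        params = [value for row in batch for value in row]
        yield sql, params


# Hypothetical data; with a DataFrame, rows could come from df.values.tolist()
rows = [(1, "a"), (2, "b"), (3, "c")]
for sql, params in build_inserts("test_table", ["id", "name"], rows, batch_size=2):
    print(sql, params)
    # With a live pyodbc cursor (not created here): cursor.execute(sql, params)
# A single connection.commit() after the loop is enough to persist all batches.
```

This also speaks to the commit question left open in Attempt 2: committing once after the last batch keeps the whole upload in one transaction, at the cost of a larger transaction log for very big uploads.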
