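I am new to Python and new to data science, so please forgive anything that sounds stupid. I won't go into the details unless someone is curious, but basically I need to connect to Microsoft SQL Server and upload a relatively large (about 500k rows) pandas DataFrame, and I need to do this almost daily as the project currently stands.
It doesn't have to be a pandas DataFrame. I have read about using odo with CSV files, but I couldn't get anything to work. The problem I am running into is that I can't bulk insert the DataFrame, because the file is not on the same machine as the SQL Server instance. I keep getting the following error: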
pyodbc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Incorrect syntax near the keyword 'IF'. (156) (SQLExecDirectW)")
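Depending on which SQL statement I tried, the IF in that message is replaced by the first COL_NAME in the CREATE statement. I am using SQLAlchemy to create the engine and connect to the database. It probably goes without saying, but the pd.to_sql() method is too slow for the amount of data I am moving, so I need something faster.
By the way, I am using Python 3.6. Below are most of the things I have tried without success.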
import pandas as pd
from sqlalchemy import create_engine
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(100, 1)), columns=list('test_col'))
address = 'mssql+pyodbc://uid:pw@server/path/database?driver=SQL Server'
engine = create_engine(address)
connection = engine.raw_connection()
cursor = connection.cursor()
# Attempt 1 <- This failed to even create a table at the cursor_execute statement so my issues could be way in the beginning here but I know that I have a connection to the SQL Server because I can use pd.to_sql() to create tables successfully (just incredibly slowly for my tables of interest)
create_statement = """
DROP TABLE test_table
CREATE TABLE test_table (test_col)
"""
cursor.execute(create_statement)
test_insert = '''
INSERT INTO test_table
(test_col)
values ('abs');
'''
cursor.execute(test_insert)
# Attempt 2 <- From iabdb WordPress blog I came across
def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))
records = [str(tuple(x)) for x in take_rates.values]
insert_ = """
INSERT INTO test_table
("A")
VALUES
"""
for batch in chunker(records, 2): # This would be set to 1000 in practice I hope
    print(batch)
    rows = str(batch).strip('[]')
    print(rows)
    insert_rows = insert_ + rows
    print(insert_rows)
    cursor.execute(insert_rows)
    #conn.commit() # don't know when I would need to commit
conn.close()
# Attempt 3 # From a related Stack Exchange Post
# create the table but first drop if it already exists
command = """DROP TABLE IF EXISTS test_table
CREATE TABLE test_table # these columns are from my real dataset
"Serial Number" serial primary key,
"Dealer Code" text,
"FSHIP_DT" timestamp without time zone,
;"""
cursor.execute(command)
connection.commit()
# stream the data using 'to_csv' and StringIO(); then use sql's 'copy_from' function
output = io.StringIO()
# ignore the index
take_rates.to_csv(output, sep='~', header=False, index=False)
# jump to start of stream
output.seek(0)
contents = output.getvalue()
cur = connection.cursor()
# null values become ''
cur.copy_from(output, 'Config_Take_Rates_TEST', null="")
connection.commit()
cur.close()
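Microsoft SQL Server just doesn't seem to play nicely on my end... I'd like to apologize for the rough formatting. I have been working on this script for a few weeks, but only now decided to try to organize something for StackOverflow. Many thanks for any help anyone can offer!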
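The batching idea from Attempt 2 can also be written with bound parameters instead of pasting `str(tuple)` into the SQL text. The following is only a sketch under stated assumptions: it uses the standard-library `sqlite3` module as a stand-in so that it runs without a server, the helper name `insert_in_batches` and the sample rows are hypothetical, and the rows are plain tuples (from a DataFrame they could come from `df.itertuples(index=False, name=None)`). pyodbc uses the same `?` placeholder style, so the same shape should carry over to a raw pyodbc connection.

```python
import sqlite3


def chunker(seq, size):
    """Return successive slices of `seq` holding at most `size` items."""
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))


def insert_in_batches(connection, table, columns, rows, batch_size=1000):
    """Insert `rows` (a list of tuples) using bound parameters, committing per batch."""
    col_list = ", ".join('"{}"'.format(c) for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    sql = "INSERT INTO {} ({}) VALUES ({})".format(table, col_list, placeholders)
    cursor = connection.cursor()
    # Assumption: on a pyodbc cursor, setting cursor.fast_executemany = True
    # here typically speeds up executemany; sqlite3 has no such attribute.
    inserted = 0
    for batch in chunker(rows, batch_size):
        cursor.executemany(sql, batch)  # one parameterized statement, many rows
        connection.commit()             # commit after each batch
        inserted += len(batch)
    cursor.close()
    return inserted


# Hypothetical stand-in data and an in-memory database
rows = [(i, "dealer_{}".format(i % 7)) for i in range(2500)]
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE test_table ("Serial Number" INTEGER, "Dealer Code" TEXT)')
total = insert_in_batches(conn, "test_table", ["Serial Number", "Dealer Code"], rows)
count = conn.execute("SELECT COUNT(*) FROM test_table").fetchone()[0]
```

Binding parameters avoids the quoting problems of building the VALUES list by hand, and committing once per batch answers the "when do I commit" question in one reasonable way; committing once at the end is also an option.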
"DROP TABLE IF EXISTS test_table" looks like invalid T-SQL syntax (SQL Server only accepts it from version 2016 onward). You can do this:
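One form that also works on older SQL Server versions is a guard on `OBJECT_ID`. The sketch below is not necessarily the exact statement the answer had in mind: the helper name `drop_and_create_sql` and the `VARCHAR(50)` column type are assumptions for illustration. It also gives every column a data type, which T-SQL requires and which the CREATE TABLE in Attempt 1 omitted.

```python
def drop_and_create_sql(table, columns):
    """Build a T-SQL batch that drops `table` if it exists, then recreates it.

    `columns` is a list of (column_name, tsql_type) pairs.
    """
    cols = ", ".join("[{}] {}".format(name, sql_type) for name, sql_type in columns)
    return (
        # OBJECT_ID(..., N'U') is NULL when no user table of that name exists
        "IF OBJECT_ID(N'{0}', N'U') IS NOT NULL DROP TABLE [{0}];\n"
        "CREATE TABLE [{0}] ({1});"
    ).format(table, cols)


# Hypothetical table definition matching the question's test table
statement = drop_and_create_sql("test_table", [("test_col", "VARCHAR(50)")])
print(statement)
```

The resulting string can be passed to `cursor.execute(statement)` followed by `connection.commit()`. The table and column names are interpolated directly into the SQL, so they should come from trusted code and never from user input.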
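If you only need to replace an existing table, truncate it and upload the data with the bcp utility. It is much faster.
You need to install the bcp utility (yum install mssql-tools on CentOS/RedHat).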
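The truncate-then-bcp approach could be scripted from Python roughly as follows. This is a sketch under assumptions: the helper names, the `dbo.test_table` table, the sample rows, and the `server`/`database`/`uid`/`pw` values are placeholders, and the data is assumed not to contain the `~` delimiter (bcp does not understand CSV quoting). The code only writes the data file and assembles the command line; it does not run bcp. With pandas, the file could instead be written with `df.to_csv(path, sep='~', header=False, index=False)` as in Attempt 3.

```python
import csv
import os
import tempfile


def write_delimited(rows, path, delimiter="~"):
    """Write rows to `path` without a header, one record per line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle, delimiter=delimiter).writerows(rows)


def build_bcp_command(table, data_file, server, database, user, password,
                      delimiter="~"):
    """Assemble a `bcp ... in ...` argument list for character-mode import."""
    return [
        "bcp", table, "in", data_file,
        "-S", server,      # server (and instance) name
        "-d", database,    # target database
        "-U", user,        # SQL Server login
        "-P", password,
        "-c",              # character data rather than native format
        "-t" + delimiter,  # field terminator
        # Assumption: depending on the platform, the row terminator may also
        # need to be set explicitly with -r.
    ]


# Hypothetical sample data and connection details
rows = [(1, "A100"), (2, "B200")]
data_path = os.path.join(tempfile.mkdtemp(), "test_table.dat")
write_delimited(rows, data_path)
command = build_bcp_command("dbo.test_table", data_path,
                            "server", "database", "uid", "pw")
# After running TRUNCATE TABLE dbo.test_table on the server,
# subprocess.run(command, check=True) would start the upload.
```

Because bcp runs on the client and streams the file over the network, it sidesteps the BULK INSERT limitation that the file must be readable from the server machine.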