I have a database table from which I am trying to retrieve two columns across more than 5 million rows.
The following Python code works perfectly and quickly (for the full 5+ million rows, retrieving via query and writing to CSV takes about 3 minutes):
import pandas as pd
import teradatasql
hostname = "myhostname.domain.com"
username = "myusername"
password = "mypassword"
with teradatasql.connect(host=hostname, user=username, password=password, encryptdata=True) as conn:
df = pd.read_sql("SELECT COL1, COL2 FROM MY_TABLE", conn)
df.to_csv(mypath, sep = '\t', index = False)
The following R code with the teradatasql package works for small values of n, the explicitly supplied count of rows to retrieve. However, when n is large enough (and it really isn't that large), or when I ask it to retrieve the full 5+ million row dataset, it takes an enormous amount of time or essentially never returns.
Any idea what is going on?
library(teradatasql)
dbconn <- DBI::dbConnect(
teradatasql::TeradataDriver(),
host = 'myhostname.domain.com',
user = 'myusername', password = 'mypassword'
)
dbExecute(dbconn, "SELECT COL1, COL2 FROM MY_TABLE")
[1] 5348946
system.time(dbGetQuery(dbconn, "SELECT COL1, COL2 FROM MY_TABLE", n = 10))
user system elapsed
0.084 0.016 1.496
system.time(dbGetQuery(dbconn, "SELECT COL1, COL2 FROM MY_TABLE", n = 100))
user system elapsed
0.104 0.024 1.548
system.time(dbGetQuery(dbconn, "SELECT COL1, COL2 FROM MY_TABLE", n = 1000))
user system elapsed
0.488 0.036 1.826
system.time(dbGetQuery(dbconn, "SELECT COL1, COL2 FROM MY_TABLE", n = 10000))
user system elapsed
7.484 0.100 9.413
system.time(dbGetQuery(dbconn, "SELECT COL1, COL2 FROM MY_TABLE", n = 100000))
user system elapsed
767.824 4.648 782.518
system.time(dbGetQuery(dbconn, "SELECT COL1, COL2 FROM MY_TABLE", n = 5348946))
< DOES NOT RETURN IN HOURS >
Here is some version information for reference:
> packageVersion('teradatasql')
[1] ‘17.0.0.2’
> version
_
platform x86_64-pc-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 3
minor 6.1
year 2019
month 07
day 05
svn rev 76782
language R
version.string R version 3.6.1 (2019-07-05)
nickname Action of the Toes
The teradatasql driver constructs a large data.frame in memory from the fetched result set rows, and this is very slow. To get good fetch performance, you need to limit the number of rows fetched from the result set at a time.
Below are the results of some informal performance tests that fetched rows from a two-column table consisting of an integer column and a varchar(100) column. Performance was best when fetching 100 rows at a time.