如何优化我的SQLAlchemy实例?
我正在使用SQLAlchemy来获取大量数据。简单来说,我们有一个“交付”表,这个表通过多个其他表相连,而这些表又可以连接到更多的表。我需要从这些相连的表中获取数据,以便创建一个交付的JSON文件。
目前,我访问这些数据的方式非常慢。有没有人能给我一些建议(无论是具体的还是一般性的)来提高性能?下面是我访问数据的代码,我不想把所有的表定义都贴出来,因为它们比较大。所有表的定义都是标准的,没有设置懒加载之类的。
# Get the deliveries for the specified date
delivery_rs = session.query(Delivery).join(Order) \
.filter(and_(Delivery.DespatchDateTime.between(start_date, end_date), Order.ProductionSite == site_map.get(site))).all()
# Setup our rowcount for the metadata later
rowcount = 0
# Go through each delivery in the resultset, formulate the full job/delivery/client/customer data and add it to the data array
for delivery in delivery_rs:
# Add to our rowcount
rowcount = rowcount + 1
# Add the jobs to our job array
job_deliveries = delivery.JobDeliveries
jobs = []
quantity = 0
for job_delivery in job_deliveries:
job = job_delivery.Job
web_ref = job.ClientJobReference
if web_ref and not re.match(r'^CCW_', web_ref):
web_ref = ""
elif web_ref:
web_ref = re.sub(r'^CCW_', '', web_ref)
jobs.append({
"web_ref": "CCW_{}".format(web_ref) if web_ref else "",
"name": job.JobName,
# The artwork is stored in S3, so provide a link
"thumbnail": "https://example.com/{}.png".format(web_ref) if web_ref else ""
})
quantity = quantity + job_delivery.Quantity
# Format our delivery data
if delivery.AddressContact:
address_contact = delivery.AddressContact
contact_data = {
"title": title_map.get(address_contact.Title),
"name": address_contact.ContactName,
"email": address_contact.ContactEmail,
"phone": address_contact.ContactNumber
}
else:
contact_data = {}
order = delivery.Order
client = order.Client
delivery_method = delivery.DeliveryMethod
address = delivery.Address
result["data"].append(
{
"order_number": order.OrderSequenceId,
"quantity": quantity,
"method": delivery_method.Name,
"client": client.Name,
"end_client": client.EndCustomer,
"jobs": jobs,
"contact": contact_data,
"address": {
"business": address.BusinessName,
"postcode": address.PostCode,
"town": address.Town,
"county": address.County,
"country": address.Country.Name,
"lines": [
address.AddressLine1,
address.AddressLine2
]
}
}
)
我还没有尝试太多,因为网上有很多关于优化SQLAlchemy的信息,但我不太确定在我的情况下什么方法会有效。收集大约100个交付的数据需要15秒(大多数交付都有多个工作)。我尝试过把每个表都连接到查询上,但结果没有返回任何数据。
1 个回答
0
预加载或者说急切加载所有相关的数据,可能会大大提高性能。否则,每次你访问一个没有加载的关系时,就会执行另一个查询,这样会很慢。
在你的例子中,使用 SQLAlchemy 2.0 的 select
查询方式,可能看起来像这样(未经测试):
from sqlalchemy.sql import select
q = select(Delivery).join(Order).where(and_(Delivery.DespatchDateTime.between(start_date, end_date), Order.ProductionSite == site_map.get(site)))
q = q.options(
# We are already doing a join, so just use that join to load in the related orders and then join again to get those clients.
joinedload(Delivery.Order).joinedload(Order.Client)
# Load all these separately.
selectinload(Delivery.DeliveryMethod),
selectinload(Delivery.Address),
selectinload(Delvery.AddressContact),
# But here get this association object and the job too.
selectinload(Delivery.JobDeliveries).joinedload(JobDelivery.Job)
)
delivery_rs = session.scalars(q).all()
很难分辨哪些是类,哪些是关系,哪些是值,因为看起来所有的名称都是用大驼峰命名法,所以我不太确定我理解得对不对。
为了检查你是否获取了所有的关系,可以开启 echo=True
并启动你的脚本。所有的查询应该在开始时就执行,而不是在你的循环里面。你可以在里面放一个打印语句之类的。