Python Django - 将数组与模型比较
我有一个Oracle数据库,无法在里面添加新表,所以我在Django里创建了一个sqlite数据库,主要是为了把Oracle数据库里的数据同步到sqlite。
目前,Oracle数据库里大约有50万条数据。
Oracle数据库里的主键是递增的,但这并没有保证。有时候网络会出现问题,导致Django和Oracle数据库之间的连接中断,这样在同步的时候我就会漏掉一些值。
因此,我在Django里设计了一个模型:
class sequential_missing(models.Model):
database = models.CharField(max_length=200, primary_key=True)
row = models.IntegerField(primary_key=True)
基本上,数据库里有一行数据在Oracle那边是缺失的,我会对比sqlite数据库中缺失的连续值,找出在Oracle数据库里实际上是空缺的那个连续数字。这样可以加快处理速度,而不需要检查所有缺失的连续值。
整个功能的实现如下:
def checkMissing(maxValue, databaseObjects, databaseName):
missingValues = []
#############SECTION 1##########################
print "Database:" + databaseName
print "Checking for Missing Sequential Numbers"
set_of_pk_values = set(databaseObjects.objects.all().values_list('pk', flat=True))
set_one_to_max_value = set(xrange(1, maxValue+1))
missingValues = set_one_to_max_value.difference(set_of_pk_values)
#############SECTION 1##########################
#Even though missingValues could be enough, but the problem is that not even Oracle can
#guarantee the automatic incremented number is sequential, hence we would look up the values
#we thought it was missing, and remove them from missingValues, which should be faster than
#checking all of them in the oracle database
#############SECTION 2##########################
print "Checking for numbers that are empty, Current Size:" + str(len(missingValues))
emptyRow = []
for idx, val in enumerate(missingValues):
found = False
for items in sequential_missing.objects.all():
if(items.row == val and items.database == databaseName):
found = True
#print "Database:" + str(items.row) + ", Same as Empty Row:" + str(val)
if(found == True):
emptyRow.append(val)
#############SECTION 2##########################
#############SECTION 3##########################
print "Removing empty numbers, Current Size:" + str(len(missingValues)) + ", Empty Row:" + str(len(emptyRow))
missingValuesCompared = []
for idx, val in enumerate(missingValues):
found = False
for items in emptyRow:
if(val == items):
found = True
#print "Empty Row:" + str(items) + ", same as Missing Values:" + str(val)
if(found == False):
missingValuesCompared.append(val)
print "New Size:" + str(len(missingValuesCompared))
return missingValuesCompared
#############SECTION 3##########################
代码分为三个部分:
找出缺失的连续值
检查模型中的值,看看是否有匹配的,并且是相同的
创建一个新的数组,不包括第二部分中包含的行。
问题是第二部分的处理时间很长,复杂度是O(n^2),因为它需要遍历整个数据库,检查这些行是否原本是空的。
有没有更快的方法来做到这一点,同时又能尽量少占用内存呢?
编辑:
使用ROW IN会好很多,
setItem = []
for items in missingValues:
setItem.append(items)
print "Items in setItem:" + str(len(setItem))
currentCounter = 0
currentEndCounter = 500
counterIncrement = 500
emptyRowAppend = []
end = False
firstPass = False
while(end == False):
emptyRow = sequential_missing.objects.filter(database=databaseName, row__in = setItem[currentCounter:currentEndCounter])
for items in emptyRow:
emptyRowAppend.append(items.row)
if(firstPass == True):
end = True
if ((currentEndCounter+counterIncrement)>maxValue):
currentCounter += counterIncrement
currentEndCounter = maxValue
firstPass = True
else:
currentCounter += counterIncrement
currentEndCounter += counterIncrement
print "Removing empty numbers," + "Empty Row Append Size:" + str(len(emptyRowAppend)) + ", Missing Value Size:" + str(len(missingValues)) + ", Set Item Size:" + str(len(setItem)) + ", Empty Row:" + str(len(emptyRowAppend))
missingValuesCompared = []
for idx, val in enumerate(missingValues):
found = False
for items in emptyRowAppend:
if(val == items):
found = True
break
if(found == False):
missingValuesCompared.append(val)
1 个回答
1
你可以把这段代码
emptyRow = []
for idx, val in enumerate(missingValues):
found = False
for items in sequential_missing.objects.all():
if(items.row == val and items.database == databaseName):
found = True
#print "Database:" + str(items.row) + ", Same as Empty Row:" + str(val)
if(found == True):
emptyRow.append(val)
换成
emptyRow = sequential_missing.objects.filter(database=databaseName,row__in = missingValues)
这样,你就可以对数据库发出一个查询。不过这样做会把所有缺失的值拼接成一个字符串,然后插入到查询中。你可以试试看这样做是否可行。
如果不行的话,你应该把缺失的值和顺序缺失的对象按值排序,这样你就能更快地找到需要的项目。可以像这样:
sort(missingValues)
val_index = 0
for item in sequential_missing.objects.all().order_by('row'):
while (val_index < len(missingValues) and item.row>missingValues[val_index]):
val_index += 1
if (item.row == missingValues[val_index]):
emptyRow.append(item.row)