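Codility Genomic Range Query
I recently discovered Codility and am working through the demo training. I wrote a solution to the Genomic Range Query problem using a dynamic programming approach. It works, but it scores only 87% instead of the 100% I expected.
Does anyone know why?
You can find the problem in the Prefix Sums section of Codility Training; just start a test to see the problem statement.
Thanks!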
def solution(S, P, Q):
    # write your code in Python 2.6
    S = list(S)
    sol = [[0]*len(S),[0]*len(S),[0]*len(S),[0]*len(S)]
    mapping = {"A":1, "C":2, "G":3, "T":4}
    for i in range(0,len(S)):
        if S[i] == 'A':
            sol[0][i] += 1
        elif S[i] == 'C':
            sol[1][i] += 1
        elif S[i] == 'G':
            sol[2][i] += 1
        elif S[i] == 'T':
            sol[3][i] += 1
        if i < len(S)-1:
            sol[0][i+1] = sol[0][i]
            sol[1][i+1] = sol[1][i]
            sol[2][i+1] = sol[2][i]
            sol[3][i+1] = sol[3][i]
    for n in range(0, len(P)):
        l = P[n]
        r = Q[n]
        pre_sum = [0,0,0,0]
        if l > 0:
            pre_sum = [sol[0][l],sol[1][l],sol[2][l],sol[3][l]]
        post_sum = [sol[0][r],sol[1][r],sol[2][r],sol[3][r]]
        if post_sum[0]-pre_sum[0] > 0:
            P[n] = 1
        elif post_sum[1]-pre_sum[1] > 0:
            P[n] = 2
        elif post_sum[2]-pre_sum[2] > 0:
            P[n] = 3
        elif post_sum[3]-pre_sum[3] > 0:
            P[n] = 4
        else:
            P[n] = mapping[S[P[n]]]
    return P
    pass
9 Answers
1
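For each position i = 0, 1, ..., N-1 we can compute the distance to the nearest occurrence of each nucleotide type at or before that position (the nucleotide at the current position counts, with distance 0). A value of -1 means the type has not occurred yet.
The distance array pre_dists looks roughly like this: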
    |  C   A   G   C   C   T   A |
----|----------------------------|
  A | -1   0   1   2   3   4   0 |
  C |  0   1   2   0   0   1   2 |
  G | -1  -1   0   1   2   3   4 |
  T | -1  -1  -1  -1  -1   0   1 |
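From these distances I can get the minimal impact factor of any slice: a type occurs in S[p..q] exactly when its distance at q is non-negative and q minus that distance is at least p.
My implementation in Python: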
def solution(S, P, Q):
    N = len(S)
    M = len(P)
    # impact factors
    I = {'A': 1, 'C': 2, 'G': 3, 'T': 4}
    # distance from current position to the nearest nucleotide
    # for each nucleotide type (previous or current nucleotide are considered)
    # e.g. current position is 'A' => the distance dist[0] = 0, index 0 for type A
    #                          'C' => the distance dist[1] = 0, index 1 for type C
    pre_dists = [[-1]*N,[-1]*N,[-1]*N,[-1]*N]
    # initial values
    pre_dists[I[S[0]]-1][0] = 0
    for i in range(1, N):
        for t in range(4):
            if pre_dists[t][i-1] >= 0:
                # increase the distances
                pre_dists[t][i] = pre_dists[t][i-1] + 1
        # reset distance for current nucleotide type
        pre_dists[I[S[i]]-1][i] = 0
    # result keeper
    res = [0]*M
    for k in range(M):
        p = P[k]
        q = Q[k]
        if pre_dists[0][q] >= 0 and q - pre_dists[0][q] >= p:
            res[k] = 1
        elif pre_dists[1][q] >= 0 and q - pre_dists[1][q] >= p:
            res[k] = 2
        elif pre_dists[2][q] >= 0 and q - pre_dists[2][q] >= p:
            res[k] = 3
        else:
            res[k] = 4
    return res
Hope this helps. Thanks!
2
If anyone is still interested in this exercise, here is my Python solution (it scored 100/100 on Codility):
def solution(S, P, Q):
    count = []
    for i in range(3):
        count.append([0]*(len(S)+1))
    for index, i in enumerate(S):
        count[0][index+1] = count[0][index] + (i == 'A')
        count[1][index+1] = count[1][index] + (i == 'C')
        count[2][index+1] = count[2][index] + (i == 'G')
    result = []
    for i in range(len(P)):
        start = P[i]
        end = Q[i]+1
        if count[0][end] - count[0][start]:
            result.append(1)
        elif count[1][end] - count[1][start]:
            result.append(2)
        elif count[2][end] - count[2][start]:
            result.append(3)
        else:
            result.append(4)
    return result
5
Here is an algorithm that scores 100, runs in O(N+M), and uses no language-specific tricks such as the in or contains operators:
Let's define prefix as:
 * the last index of a particular nucleotide at or before the current position. If there is no such occurrence, put -1.
 *
 *
 * indexes:     0  1  2  3  4  5  6
 * factors:     2  1  3  2  2  4  1
 *              C  A  G  C  C  T  A
 *
 * prefix : A  -1  1  1  1  1  1  6
 *          C   0  0  0  3  4  4  4
 *          G  -1 -1  2  2  2  2  2
 *          T  -1 -1 -1 -1 -1  5  5
 *
 * With prefix defined this way, the minimal factor for a query is easy to compute.
 * The subsequence S[p]S[p+1]...S[q-1]S[q] has the lowest factor (checked in this order, first match wins):
 * 1 if prefix index [A][q] >= p
 * 2 if prefix index [C][q] >= p
 * 3 if prefix index [G][q] >= p
 * 4 if prefix index [T][q] >= p
Here is my implementation of this idea.
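For readers who want something concrete, the prefix idea described in this answer could be sketched in Python roughly as follows. This is an illustration rather than the answerer's code; the names `solution_last_index`, `letters`, and `last` are made up for the example, and S is assumed to contain only A, C, G, and T.

```python
def solution_last_index(S, P, Q):
    """Sketch: answer each query from 'last index at or before i' tables."""
    letters = "ACG"  # T is the fallback answer, so it needs no table
    # last[t][i] = last index <= i where letters[t] occurs, or -1 if none
    last = [[-1] * len(S) for _ in letters]
    for t, letter in enumerate(letters):
        seen = -1
        for i, ch in enumerate(S):
            if ch == letter:
                seen = i
            last[t][i] = seen

    result = []
    for p, q in zip(P, Q):
        for t in range(len(letters)):
            # letters[t] lies inside S[p..q] iff its last occurrence
            # at or before q is not to the left of p
            if last[t][q] >= p:
                result.append(t + 1)
                break
        else:
            # no A, C, or G in the slice, so it can only contain T
            result.append(4)
    return result


print(solution_last_index("CAGCCTA", [2, 5, 0], [4, 5, 6]))  # [2, 4, 1]
```

Building the tables takes one pass over S per nucleotide type and each query does at most three lookups, which is where the O(N+M) bound comes from.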
7
This approach also works perfectly and scores 100/100.
def solution(S, P, Q):
    res = []
    for i in range(len(P)):
        if 'A' in S[P[i]:Q[i]+1]:
            res.append(1)
        elif 'C' in S[P[i]:Q[i]+1]:
            res.append(2)
        elif 'G' in S[P[i]:Q[i]+1]:
            res.append(3)
        else:
            res.append(4)
    return res
2
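Oh, I was working on this one too. Debugging took me a long time, but in the end I got the full 100.
For example, when S='AGT', P=[1] and Q=[2], the function should return 3 (the impact factor of G), but your code (and my first attempt) returns 4 (the impact factor of T). Taking pre_sum at index l also subtracts the nucleotide at position l itself, so the first character of the slice is ignored.
I think this change fixes it: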
        if l > 0:
            pre_sum = [sol[0][l-1],sol[1][l-1],sol[2][l-1],sol[3][l-1]]
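To see the off-by-one in isolation, here is a compact sketch of the same inclusive prefix-count approach with the l-1 correction applied. It is a rewrite for illustration, not the asker's exact code: the names `solution_fixed`, `letters`, and `counts` are assumptions made for the example, and S is assumed to contain only A, C, G, and T.

```python
def solution_fixed(S, P, Q):
    """Sketch: inclusive prefix counts, subtracting the count at l-1."""
    letters = "ACGT"
    # counts[t][i] = number of letters[t] in S[0..i], inclusive of i
    counts = [[0] * len(S) for _ in letters]
    for i, ch in enumerate(S):
        for t, letter in enumerate(letters):
            prev = counts[t][i - 1] if i > 0 else 0
            counts[t][i] = prev + (ch == letter)

    result = []
    for l, r in zip(P, Q):
        for t in range(len(letters)):
            # Subtract everything strictly before l so that S[l] stays in
            # the slice; using counts[t][l] here would drop S[l].
            before = counts[t][l - 1] if l > 0 else 0
            if counts[t][r] - before > 0:
                result.append(t + 1)
                break
    return result


print(solution_fixed("AGT", [1], [2]))  # [3]
print(solution_fixed("CAGCCTA", [2, 5, 0], [4, 5, 6]))  # [2, 4, 1]
```

Unlike the original, this sketch builds a new result list instead of overwriting P, which keeps the input untouched while the answers are being computed.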