I have two long lists, f and e. Each value in f corresponds to the value at the same index in e. For example:
f = ["a", "b", "c", "d", "e", "a", "a", "c", "c", "c", "c", "d", "e", ...]
e = ["A", "B", "C", "D", "E", "A", "A", "C", "C", "C", "C", "D", "E", ...]
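I want to create a list containing n elements from f, and a corresponding list of the matching n elements from e. So basically, the indices of the elements taken from f will be the same as the indices of the elements taken from e.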
f_sub = ["b", ...]
e_sub = ["B", ...]
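To make the selection step concrete, here is a minimal sketch. It assumes the indices to pull out are already known; the helper name `take_parallel` and the index list `picked` are hypothetical and only serve the illustration.

```python
def take_parallel(f, e, picked):
    """Return the elements of f and e found at the indices in `picked`."""
    f_sub = [f[i] for i in picked]
    e_sub = [e[i] for i in picked]
    return f_sub, e_sub


f = ["a", "b", "c", "d", "e", "a", "a", "c"]
e = ["A", "B", "C", "D", "E", "A", "A", "C"]

# Pick the element at index 1 from both lists.
f_sub, e_sub = take_parallel(f, e, [1])
print(f_sub, e_sub)
```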
After that, I want to remove those n elements from list f, and the corresponding elements from list e, while preserving the order of f.
f_new = ["a", "c", "d", "e", "a", "a", "c", "c", "c", "c", "d", "e", ...]
e_new = ["A", "C", "D", "E", "A", "A", "C", "C", "C", "C", "D", "E", ...]
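The removal step can be sketched the same way. This assumes the picked indices are known up front; `drop_parallel` and `picked` are hypothetical names. It builds new lists in a single pass instead of deleting in place, which keeps the original order.

```python
def drop_parallel(f, e, picked):
    """Return copies of f and e without the elements at the indices in `picked`."""
    drop = set(picked)  # set membership tests are constant time on average
    f_new = [x for i, x in enumerate(f) if i not in drop]
    e_new = [x for i, x in enumerate(e) if i not in drop]
    return f_new, e_new


f = ["a", "b", "c", "d", "e", "a", "a", "c"]
e = ["A", "B", "C", "D", "E", "A", "A", "C"]

# Drop the element at index 1 from both lists; everything else keeps its order.
f_new, e_new = drop_parallel(f, e, [1])
print(f_new, e_new)
```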
I have already done this, but it is too expensive for me; the code runs very slowly.
import codecs, random, time
from collections import Counter, defaultdict
from itertools import dropwhile

if __name__ == "__main__":
    print "Importing English corpus ..."
    f = codecs.open("../corpus/corpus.en", encoding = "utf-8").readlines()
    init_f = f
    print "Importing Turkish corpus ..."
    e = codecs.open("../corpus/corpus.tr", encoding = "utf-8").readlines()

    print "Creating dictionary ..."
    trans = defaultdict()
    for d in range(len(f)):
        trans[f[d]] = e[d]

    print "Calculating occurences in corpus ..."
    cnt = Counter(f)

    print "Creating test data ..."
    f_test = open("../dataset/test.en", "w")
    e_test = open("../dataset/test.tr", "w")
    cntr = 0
    for a in range(len(f)):
        if cnt[f[a]] == 1:
            print str(cntr+1) + " : 5000"
            f_test.write(f[a].encode("utf-8"))
            e_test.write(e[a].encode("utf-8"))
            f.remove(f[a])
            e.remove(e[a])
            cnt[f[a]] = 0
            cntr += 1
            if cntr == 5000:
                break
    f_test.close()
    e_test.close()

    print "Creating development data ..."
    f_dev = open("../dataset/dev.en", "w")
    e_dev = open("../dataset/dev.tr", "w")
    cntr = 0
    for b in range(len(f)):
        if cnt[f[b]] == 1:
            print str(cntr+1) + " : 5000"
            f_dev.write(f[b].encode("utf-8"))
            e_dev.write(e[b].encode("utf-8"))
            f.remove(f[b])
            e.remove(e[b])
            cnt[f[b]] = 0
            cntr += 1
            if cntr == 5000:
                break
    f_dev.close()
    e_dev.close()

    print "Creating train data ..."
    f_train = open("../dataset/train.en", "w")
    e_train = open("../dataset/train.tr", "w")
    for c in range(len(f)):
        print str(c+1) + " : " + str(len(f))
        f_train.write(f[c].encode("utf-8"))
        e_train.write(e[c].encode("utf-8"))
    f_train.close()
    e_train.close()
Is there a fast way to do this?
Thanks.
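One likely source of the slowdown is `list.remove`, which scans the list for the first equal value on every call, so calling it inside a loop over the corpus costs quadratic time. A possible direction, sketched here in Python 3 under the assumption that the goal is "lines that occur exactly once in f go to test first, then to dev, and everything else goes to train", is to count once and distribute the pairs in a single pass without removing anything. The function name `split_corpus` and its parameters are hypothetical, and the small lists stand in for the real corpus files.

```python
from collections import Counter


def split_corpus(f, e, n_test, n_dev):
    """Split parallel lists into test, dev and train lists of (f, e) pairs.

    Assumption: a line of f that occurs exactly once is sent to test until
    test holds n_test pairs, then to dev until dev holds n_dev pairs.
    All remaining pairs go to train, in their original order.
    """
    counts = Counter(f)  # one pass to count occurrences
    test, dev, train = [], [], []
    for src, tgt in zip(f, e):  # one pass to distribute the pairs
        unique = counts[src] == 1
        if unique and len(test) < n_test:
            test.append((src, tgt))
        elif unique and len(dev) < n_dev:
            dev.append((src, tgt))
        else:
            train.append((src, tgt))
    return test, dev, train


f = ["a", "b", "c", "d", "e", "a", "a", "c"]
e = ["A", "B", "C", "D", "E", "A", "A", "C"]

test, dev, train = split_corpus(f, e, n_test=1, n_dev=1)
print(test)
print(dev)
print(train)
```

Each resulting list can then be written out once, for example with one loop per output file, so no element is ever deleted from the middle of a list.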