Python: take N elements that occur only once from a list, preserving the indices of the picked elements

Posted 2024-04-25 22:01:58


I have two long lists; each value in f corresponds to the value in e at the same index. For example:

f = ["a", "b", "c", "d", "e", "a", "a", "c", "c", "c", "c", "d", "e", ...]
e = ["A", "B", "C", "D", "E", "A", "A", "C", "C", "C", "C", "D", "E", ...]

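I want to create a list containing n elements of f that occur only once, along with a list of the corresponding n elements of e. So basically, the indices of the elements picked from f will be the same as the indices of the elements picked from e.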

f_sub = ["b", ...]
e_sub = ["B", ...]

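After that, I want to remove those n elements from list f and the corresponding elements from list e, while preserving the order of the remaining elements.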

f_new = ["a", "c", "d", "e", "a", "a", "c", "c", "c", "c", "d", "e", ...]
e_new = ["A", "C", "D", "E", "A", "A", "C", "C", "C", "C", "D", "E", ...]
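To make the goal concrete, here is a minimal sketch of the split on the toy lists above (without the elided `...` entries). The helper name `split_unique` is hypothetical, and the sketch is written for Python 3, whereas the script further down is Python 2. It assumes "occurs only once" means the value appears exactly once in f, and that picking the first n such values is acceptable.

```python
from collections import Counter


def split_unique(f, e, n):
    """Move up to n values that occur exactly once in f, plus their
    partners in e, into separate lists; keep the rest in original order."""
    if len(f) != len(e):
        raise ValueError("f and e must have the same length")
    counts = Counter(f)  # one pass over f to count occurrences
    f_sub, e_sub, f_new, e_new = [], [], [], []
    # zip walks both lists in lockstep, so indices stay aligned
    for f_item, e_item in zip(f, e):
        if len(f_sub) < n and counts[f_item] == 1:
            f_sub.append(f_item)
            e_sub.append(e_item)
        else:
            f_new.append(f_item)
            e_new.append(e_item)
    return f_sub, e_sub, f_new, e_new


f = ["a", "b", "c", "d", "e", "a", "a", "c", "c", "c", "c", "d", "e"]
e = ["A", "B", "C", "D", "E", "A", "A", "C", "C", "C", "C", "D", "E"]
f_sub, e_sub, f_new, e_new = split_unique(f, e, 1)
print(f_sub, e_sub)  # ['b'] ['B']
```

Building new lists instead of deleting from the originals means nothing has to be shifted, and the pairs are matched by position rather than by value.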

I have already implemented this, but it is too expensive for my data; the code runs very slowly.

import codecs, random, time

from collections import Counter, defaultdict
from itertools import dropwhile

if __name__ == "__main__":
    print "Importing English corpus ..."
    f = codecs.open("../corpus/corpus.en", encoding = "utf-8").readlines()
    init_f = f

    print "Importing Turkish corpus ..."
    e = codecs.open("../corpus/corpus.tr", encoding = "utf-8").readlines()


    print "Creating dictionary ..."
    trans = defaultdict()
    for d in range(len(f)):
        trans[f[d]] = e[d]

    print "Calculating occurences in corpus ..."
    cnt = Counter(f)

    print "Creating test data ..."
    f_test = open("../dataset/test.en", "w")
    e_test = open("../dataset/test.tr", "w")
    cntr = 0
    for a in range(len(f)):
        if cnt[f[a]] == 1:
            print str(cntr+1) + " : 5000"
            f_test.write(f[a].encode("utf-8"))
            e_test.write(e[a].encode("utf-8"))
            f.remove(f[a])
            e.remove(e[a])
            cnt[f[a]] = 0
            cntr += 1
            if cntr == 5000:
                break
    f_test.close()
    e_test.close()

    print "Creating development data ..."
    f_dev = open("../dataset/dev.en", "w")
    e_dev = open("../dataset/dev.tr", "w")
    cntr = 0
    for b in range(len(f)):
        if cnt[f[b]] == 1:
            print str(cntr+1) + " : 5000"
            f_dev.write(f[b].encode("utf-8"))
            e_dev.write(e[b].encode("utf-8"))
            f.remove(f[b])
            e.remove(e[b])
            cnt[f[b]] = 0
            cntr += 1
            if cntr == 5000:
                break
    f_dev.close()
    e_dev.close()

    print "Creating train data ..."
    f_train = open("../dataset/train.en", "w")
    e_train = open("../dataset/train.tr", "w")
    for c in range(len(f)):
        print str(c+1) + " : " + str(len(f))
        f_train.write(f[c].encode("utf-8"))
        e_train.write(e[c].encode("utf-8"))
    f_train.close()
    e_train.close()

Is there a fast way to do this?

Thank you.
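One likely reason the script is slow is that `list.remove` scans the list for the first matching value and then shifts every later element, so each call costs time proportional to the list length. It also removes by value rather than by position, and deleting while looping over `range(len(f))` shifts the indices under the loop. A possible alternative, sketched below for Python 3, is to count once and route each (f, e) pair into test, dev, or train in a single pass. The function name `partition_corpus` and its size parameters are hypothetical; the question uses 5000 for both sizes.

```python
from collections import Counter


def partition_corpus(f, e, test_size, dev_size):
    """Split aligned lists into (test, dev, train) lists of (f, e) pairs.

    Lines that occur exactly once in f fill test first, then dev;
    everything else goes to train, in original order.
    """
    counts = Counter(f)  # single counting pass, no per-item rescans
    test, dev, train = [], [], []
    for pair in zip(f, e):
        is_unique = counts[pair[0]] == 1
        if is_unique and len(test) < test_size:
            test.append(pair)
        elif is_unique and len(dev) < dev_size:
            dev.append(pair)
        else:
            train.append(pair)
    return test, dev, train


# Tiny illustrative corpus (made-up data, not from the question)
f_lines = ["x", "a", "y", "a", "z", "w"]
e_lines = ["X", "A", "Y", "A", "Z", "W"]
test, dev, train = partition_corpus(f_lines, e_lines, test_size=2, dev_size=1)
print(test)   # [('x', 'X'), ('y', 'Y')]
print(dev)    # [('z', 'Z')]
print(train)  # [('a', 'A'), ('a', 'A'), ('w', 'W')]
```

Each resulting list can then be written out once, for example by opening the two output files with `encoding="utf-8"` and writing the first and second item of every pair. Printing a progress line per element, as the original loops do, also adds noticeable overhead on a large corpus.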

