Shuffle two lists at once with the same order

130 votes

8 answers

124103 views

Asked 2025-04-18 04:12

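I am using the movie_reviews dataset from the nltk library, which contains a large number of documents. My task is to evaluate predictive performance on these reviews with and without data preprocessing, and to compare the two. The problem is that the lists documents and documents2 contain the same documents, and I need to shuffle them while keeping the order identical in both lists. I cannot shuffle them separately, because every call to shuffle produces a different result. So I need to shuffle both lists at once, in the same order, because the final comparison depends on the order. I am using Python 2.7.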

An example (in reality the strings are already tokenized, but that is not the point):

documents = [(['plot : two teen couples go to a church party , '], 'neg'),
             (['drink and then drive . '], 'pos'),
             (['they get into an accident . '], 'neg'),
             (['one of the guys dies'], 'neg')]

documents2 = [(['plot two teen couples church party'], 'neg'),
              (['drink then drive . '], 'pos'),
              (['they get accident . '], 'neg'),
              (['one guys dies'], 'neg')]

After shuffling both lists, I need to get a result like this:

documents = [(['one of the guys dies'], 'neg'),
             (['they get into an accident . '], 'neg'),
             (['drink and then drive . '], 'pos'),
             (['plot : two teen couples go to a church party , '], 'neg')]

documents2 = [(['one guys dies'], 'neg'),
              (['they get accident . '], 'neg'),
              (['drink then drive . '], 'pos'),
              (['plot two teen couples church party'], 'neg')]

I have the following code:

import random

import nltk
from nltk.corpus import movie_reviews, stopwords

def cleanDoc(doc):
    # Lowercase, drop stopwords and tokens of 2 characters or fewer, then stem
    stopset = set(stopwords.words('english'))
    stemmer = nltk.PorterStemmer()
    clean = [token.lower() for token in doc if token.lower() not in stopset and len(token) > 2]
    final = [stemmer.stem(word) for word in clean]
    return final

documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]

documents2 = [(list(cleanDoc(movie_reviews.words(fileid))), category)
              for category in movie_reviews.categories()
              for fileid in movie_reviews.fileids(category)]

# This is where documents and documents2 need to be shuffled in the same order,
# with random.shuffle or by some other means


8 Answers

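A simple and quick way is to use random.seed() together with random.shuffle(). Reseeding with the same value before each shuffle reproduces the same random order, provided the lists have the same length.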

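import random

a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]
seed = random.random()
random.seed(seed)     # fix the generator state
random.shuffle(a)     # lists have no shuffle() method; use random.shuffle
random.seed(seed)     # reset to the same state so b gets the same permutation
random.shuffle(b)
print(a)
print(b)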

>>[3, 1, 4, 2, 5]
>>[8, 6, 9, 7, 10]

This approach also works when memory constraints prevent you from handling both lists at the same time.
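Reseeding the module-level generator also changes the random sequence seen by any other code that uses the random module. A minimal sketch of the same idea with a private random.Random instance per call follows; the helper name shuffle_with_seed and the seed value are assumptions made for illustration, not part of the answer above.

```python
import random

def shuffle_with_seed(items, seed):
    """Return a shuffled copy of items.

    Calls with the same seed and the same length apply the same permutation.
    """
    rng = random.Random(seed)  # private generator; global state is untouched
    shuffled = list(items)     # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    return shuffled

a = [1, 2, 3, 4, 5]
b = [6, 7, 8, 9, 10]
seed = 12345  # arbitrary value chosen for this example

a_shuffled = shuffle_with_seed(a, seed)
b_shuffled = shuffle_with_seed(b, seed)

# Each element of b_shuffled is still the partner of the element
# at the same position in a_shuffled.
print(a_shuffled)
print(b_shuffled)
```

Because each list is shuffled in its own call, the two lists never need to be combined into one structure.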

Answered 2025-04-18 by Python大师


Shuffle any number of lists at the same time.

from random import shuffle

def shuffle_list(*ls):
  l =list(zip(*ls))

  shuffle(l)
  return zip(*l)

a = [0,1,2,3,4]
b = [5,6,7,8,9]

a1,b1 = shuffle_list(a,b)
print(a1,b1)

a = [0,1,2,3,4]
b = [5,6,7,8,9]
c = [10,11,12,13,14]
a1,b1,c1 = shuffle_list(a,b,c)
print(a1,b1,c1)

Output:

$ (0, 2, 4, 3, 1) (5, 7, 9, 8, 6)
$ (4, 3, 0, 2, 1) (9, 8, 5, 7, 6) (14, 13, 10, 12, 11)

Note:
shuffle_list() returns tuples, not lists.
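If lists are needed, or if the inputs might be empty or of different lengths, a slightly more defensive variant could look like the sketch below. The name shuffle_lists and the decision to raise ValueError on a length mismatch are assumptions for illustration. The check is there because zip() silently truncates to the shortest input.

```python
import random

def shuffle_lists(*sequences):
    """Shuffle any number of equal-length sequences with one shared permutation.

    Returns a list of new lists; the inputs are left unchanged.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    if length == 0:
        # zip(*[]) yields nothing, so build the empty results explicitly
        return [[] for _ in sequences]
    rows = list(zip(*sequences))   # one tuple per position across all inputs
    random.shuffle(rows)           # shuffle the positions once
    return [list(column) for column in zip(*rows)]

a1, b1 = shuffle_lists([0, 1, 2, 3, 4], [5, 6, 7, 8, 9])
print(a1, b1)
```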

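P.S.
shuffle_list() also accepts numpy.array() inputs, although the results are still tuples (of NumPy scalars), not arrays.

import numpy as np

a = np.array([1,2,3])
b = np.array([4,5,6])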

a1,b1 = shuffle_list(a,b)
print(a1,b1)

Output:

$ (3, 1, 2) (6, 4, 5)

Answered 2025-04-18 by Python大师


import numpy as np
from sklearn.utils import shuffle

a = ['a', 'b', 'c','d','e']
b = [1, 2, 3, 4, 5]

a_shuffled, b_shuffled = shuffle(np.array(a), np.array(b))
print(a_shuffled, b_shuffled)

#random output
#['e' 'c' 'b' 'd' 'a'] [5 3 2 4 1]


Answered 2025-04-18 by Python大师


I found a simple way to do this: shuffle an array of indices once, then use it to index both arrays.

import numpy as np
a = np.array([0,1,2,3,4])
b = np.array([5,6,7,8,9])

indices = np.arange(a.shape[0])
np.random.shuffle(indices)

a = a[indices]
b = b[indices]
# a, array([3, 4, 1, 2, 0])
# b, array([8, 9, 6, 7, 5])
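The index-permutation idea does not depend on NumPy. A minimal sketch for plain Python lists, such as the question's documents and documents2, follows; the sample data here is made up for illustration.

```python
import random

# Made-up stand-ins for the raw and preprocessed documents
documents = [("doc a raw", "neg"), ("doc b raw", "pos"), ("doc c raw", "neg")]
documents2 = [("doc a clean", "neg"), ("doc b clean", "pos"), ("doc c clean", "neg")]

# Shuffle a list of positions once...
indices = list(range(len(documents)))
random.shuffle(indices)

# ...then reorder both lists with the same positions
documents_shuffled = [documents[i] for i in indices]
documents2_shuffled = [documents2[i] for i in indices]

print(documents_shuffled)
print(documents2_shuffled)
```

Keeping the indices list around also makes it possible to map shuffled items back to their original positions later.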

Answered 2025-04-18 by Python大师


You can do it like this:

import random

a = ['a', 'b', 'c']
b = [1, 2, 3]

c = list(zip(a, b))

random.shuffle(c)

# zip(*c) yields tuples; wrap each in list() if you need lists
a, b = zip(*c)

print(a)
print(b)

[OUTPUT]
('a', 'c', 'b')
(1, 3, 2)

Of course, this is just an example with simple lists, but you can adapt the same approach to your case.
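As a minimal sketch of that adaptation, the zip approach applied to the question's example data might look like this. The variable name paired is an assumption for illustration, and the results are converted back to lists because zip(*...) produces tuples.

```python
import random

documents = [(['plot : two teen couples go to a church party , '], 'neg'),
             (['drink and then drive . '], 'pos'),
             (['they get into an accident . '], 'neg'),
             (['one of the guys dies'], 'neg')]

documents2 = [(['plot two teen couples church party'], 'neg'),
              (['drink then drive . '], 'pos'),
              (['they get accident . '], 'neg'),
              (['one guys dies'], 'neg')]

# Pair each raw document with its preprocessed counterpart
paired = list(zip(documents, documents2))

# Shuffle the pairs, so both members of a pair move together
random.shuffle(paired)

# Split the pairs back into two lists
documents, documents2 = (list(group) for group in zip(*paired))

# Sanity check: the labels still line up position by position
for raw, cleaned in zip(documents, documents2):
    assert raw[1] == cleaned[1]
```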

Answered 2025-04-18 by Python大师
