Return a random word from a word list in Python

6 votes

8 answers

18612 views

Asked 2025-04-15 14:29

I want to pick a random word from a file in Python, but I suspect my method below is neither the best nor the most efficient. Please help.

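import fileinput
import _random
# Read every line of the word file into a list (trailing newlines included)
file = [line for line in fileinput.input("/etc/dictionaries-common/words")]
rand = _random.Random()
# Scale a float in [0, 1) to an index in [0, len(file)); end="" because each line already ends in "\n"
print(file[int(rand.random() * len(file))], end="")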

Tags: efficiency, optimization, file operations, random selection

8 Answers

>>> import random
>>> random.choice(list(open('/etc/dictionaries-common/words')))
'jaundiced\n'

This is efficient in terms of human time.

Incidentally, your implementation is the same as the one in the standard library's random.py (in Python 2):

def choice(self, seq):
    """Choose a random element from a non-empty sequence."""
    return seq[int(self.random() * len(seq))]
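For comparison, a minimal sketch of the question's approach written against the public random module rather than the private _random one (in CPython, random.Random is built on top of _random.Random). The helper name random_word and the sample list are hypothetical; passing a seeded random.Random instance makes the result reproducible.

```python
import random


def random_word(words, rng=random):
    """Pick one element of the non-empty sequence `words` uniformly at random.

    `rng` may be the `random` module itself or a `random.Random` instance.
    """
    # int(rng.random() * n) maps a float in [0, 1) to an index in [0, n)
    return words[int(rng.random() * len(words))]


words = ["apple", "banana", "cherry"]
rng = random.Random(42)  # public class; a fixed seed makes runs reproducible
print(random_word(words, rng))
print(rng.choice(words))  # the standard-library shortcut
```

In practice random.choice() is the simpler choice; the hand-written index is shown only to mirror the question.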

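Measuring time performance

I wondered how the proposed solutions compare in relative performance. The linecache-based approach is clearly the most popular. How much slower is the random.choice one-liner than the honest algorithm implemented in select_random_line()?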

# nadia_known_num_lines   9.6e-06 seconds 1.00
# nadia                   0.056 seconds 5843.51
# jfs                     0.062 seconds 1.10
# dcrosta_no_strip        0.091 seconds 1.48
# dcrosta                 0.13 seconds 1.41
# mark_ransom_no_strip    0.66 seconds 5.10
# mark_ransom_choose_from 0.67 seconds 1.02
# mark_ransom             0.69 seconds 1.04

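(Each function was called 10 times, so these are warm-cache timings. The second column is seconds per call; the last column is the ratio to the previous, faster row.)

These results show that, in this case, the simple solution (dcrosta) is faster than the more complex one (mark_ransom).

The code used for the comparison (as a gist):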

import linecache
import random
from collections.abc import Sequence
from timeit import default_timer


WORDS_FILENAME = "/etc/dictionaries-common/words"


def measure(func):
    """Register `func` in the list of functions to benchmark."""
    measure.func_to_measure.append(func)
    return func
measure.func_to_measure = []


@measure
def dcrosta():
    # Read all lines, strip the newlines, then pick one
    with open(WORDS_FILENAME) as f:
        words = [line.strip() for line in f]
    return random.choice(words)


@measure
def dcrosta_no_strip():
    with open(WORDS_FILENAME) as f:
        words = list(f)
    return random.choice(words)


def select_random_line(filename):
    # Single pass: line k (0-based) replaces the selection with probability 1/(k+1)
    selection = None
    count = 0
    with open(filename) as f:
        for line in f:
            if random.randint(0, count) == 0:
                selection = line.strip()
            count += 1  # incremented for every line, not only on selection
    return selection


@measure
def mark_ransom():
    return select_random_line(WORDS_FILENAME)


def select_random_line_no_strip(filename):
    selection = None
    count = 0
    with open(filename) as f:
        for line in f:
            if random.randint(0, count) == 0:
                selection = line
            count += 1  # incremented for every line, not only on selection
    return selection


@measure
def mark_ransom_no_strip():
    return select_random_line_no_strip(WORDS_FILENAME)


def choose_from(iterable):
    """Choose a random element from a finite `iterable`.

    If `iterable` is a sequence then use `random.choice()` for efficiency.

    Return tuple (random element, total number of elements)
    """
    if isinstance(iterable, Sequence):
        return (random.choice(iterable) if iterable else None), len(iterable)

    selection, i = None, None
    for i, item in enumerate(iterable):
        if random.randint(0, i) == 0:
            selection = item

    return selection, (i + 1 if i is not None else 0)


@measure
def mark_ransom_choose_from():
    with open(WORDS_FILENAME) as f:
        return choose_from(f)


@measure
def nadia():
    global total_num_lines
    with open(WORDS_FILENAME) as f:
        total_num_lines = sum(1 for _ in f)

    # linecache.getline() numbers lines from 1; line 0 would return ''
    line_number = random.randint(1, total_num_lines)
    return linecache.getline(WORDS_FILENAME, line_number)


@measure
def nadia_known_num_lines():
    # Reuses total_num_lines computed by nadia(), which is measured first
    line_number = random.randint(1, total_num_lines)
    return linecache.getline(WORDS_FILENAME, line_number)


@measure
def jfs():
    with open(WORDS_FILENAME) as f:
        return random.choice(list(f))


def timef(func, number=1000, timer=default_timer):
    """Return number of seconds it takes to execute `func()`."""
    start = timer()
    for _ in range(number):
        func()
    return (timer() - start) / number


def main():
    # Average time per call over 10 calls for each registered function
    times = {f.__name__: timef(f, number=10) for f in measure.func_to_measure}

    # Print from fastest to slowest; last column is the ratio to the previous row
    maxname_len = max(map(len, times))
    last = None
    for name in sorted(times, key=times.__getitem__):
        ratio = times[name] / last if last else 1
        print("%s %4.2g seconds %.2f" % (name.ljust(maxname_len), times[name], ratio))
        last = times[name]


if __name__ == "__main__":
    main()
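The "honest algorithm" in select_random_line() is single-pass reservoir sampling with a reservoir of one item. Below is a minimal standalone sketch of the same idea (the name reservoir_choice and the seeded generator are illustrative assumptions) that also checks empirically that each of three items is picked about a third of the time.

```python
import random
from collections import Counter


def reservoir_choice(iterable, rng=random):
    """Return a uniformly random item from a finite iterable in one pass.

    The k-th item (1-based) replaces the current pick with probability 1/k,
    so each of n items ends up selected with probability 1/n.
    Returns None for an empty iterable.
    """
    selection = None
    for k, item in enumerate(iterable, start=1):
        if rng.randrange(k) == 0:  # randrange(k) is 0 with probability 1/k
            selection = item
    return selection


rng = random.Random(0)
trials = 30000
counts = Counter(reservoir_choice(iter("abc"), rng) for _ in range(trials))
print({item: round(n / trials, 2) for item, n in sorted(counts.items())})
```

Because it never needs len(), this works on a file object or any other iterator, but it draws one random number per line, which is likely part of why it trails random.choice in the timings above.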

Answered 2025-04-15 by Python大师


Another solution is to use linecache.getline:

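import linecache
import random
with open('/etc/dictionaries-common/words') as f:
    total_num_lines = sum(1 for _ in f)  # count the lines once
# linecache numbers lines from 1, not 0
line_number = random.randint(1, total_num_lines)
linecache.getline('/etc/dictionaries-common/words', line_number)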

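According to the documentation:

The linecache module allows one to get any line from any file, while attempting to optimize internally, using a cache, the common case where many lines are read from a single file.

Edit: you can compute the total number of lines once and store it, since the dictionary file is unlikely to change.

Answered 2025-04-15 by Python大师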
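Following that edit, a minimal sketch that counts the lines only once per file (cached with functools.lru_cache) and draws a 1-based line number for linecache.getline. The helper names count_lines and random_line are hypothetical, and the sketch assumes a non-empty file that does not change while the program runs.

```python
import linecache
import random
import tempfile
from functools import lru_cache


@lru_cache(maxsize=None)
def count_lines(filename):
    """Count the lines of `filename` once; assumes the file does not change."""
    with open(filename) as f:
        return sum(1 for _ in f)


def random_line(filename):
    """Return a random line (without its newline) from a non-empty file."""
    n = count_lines(filename)
    # linecache numbers lines from 1, so draw from 1..n inclusive
    return linecache.getline(filename, random.randint(1, n)).rstrip("\n")


# Demo on a small temporary file (stand-in for the real word list)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")

print(random_line(tmp.name))
```

Note that linecache keeps all of the file's lines in memory after the first call, so memory use is similar to reading the file into a list. If the file can change, count_lines.cache_clear() and linecache.checkcache(filename) discard the stale data.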


The random module has a function called choice() that does exactly what you need:

import random

words = [line.strip() for line in open('/etc/dictionaries-common/words')]
print(random.choice(words))

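Also note that this approach assumes each word is on its own line in the file. If the file is large, or you do this often, you may find that rereading the file every time slows your program down.

Answered 2025-04-15 by Python大师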
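To avoid rereading the file on every call, one option is to load the words once and reuse them. Below is a minimal sketch using functools.lru_cache; the helper names load_words and random_word are hypothetical, and the sketch assumes the file fits in memory and does not change while the program runs.

```python
import random
import tempfile
from functools import lru_cache


@lru_cache(maxsize=None)
def load_words(filename):
    """Read the word file once; later calls return the cached tuple."""
    with open(filename) as f:
        stripped = (line.strip() for line in f)
        return tuple(word for word in stripped if word)  # skip blank lines


def random_word(filename):
    """Pick a random word without touching the disk after the first call."""
    return random.choice(load_words(filename))


# Demo on a small temporary file (stand-in for the real word list)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\n\ngamma\n")

for _ in range(3):
    print(random_word(tmp.name))
```

A tuple is returned so the cached value cannot be modified accidentally by a caller.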
