The nth word in a text

Question

How can I find the nth word in a text?

For example:

my_txt("hello to you all" , 3)

all

I don't want to use any built-in functions... and this is not homework :D

Answer 1

Since everything is, in one way or another, a built-in function, I'm going to ignore your claim of not wanting to use built-in functions.

def my_txt(text, n):
    return text.split()[n]

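The main drawback of this approach is that punctuation stays attached to the words. I'll leave it as an exercise for you to figure out how to strip it. :)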
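
One possible way to deal with the punctuation, as a minimal sketch only: trim ASCII punctuation from both ends of each token using string.punctuation. The name nth_word_no_punct is made up for this illustration, n is assumed to be zero-based like the function above, and non-ASCII punctuation is not handled.

```python
import string

def nth_word_no_punct(text, n):
    # Hypothetical helper, not part of the original answer.
    # Split on whitespace, then trim ASCII punctuation from both ends of each token.
    words = [token.strip(string.punctuation) for token in text.split()]
    # Drop tokens that consisted of punctuation only (e.g. a lone "-").
    words = [word for word in words if word]
    return words[n]  # raises IndexError if there are fewer than n + 1 words
```

Punctuation inside a word, such as the apostrophe in "don't", is left alone because only the ends of each token are trimmed.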

Answer 2

The obvious way to do it is:

"hello to you all".split()[3]

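The 80s way to do it is to walk the text character by character, keeping track of the state of what you have found. There are probably better ways, but that's the idea. Notice that either way you end up using a lot of "built-in" functionality; I just avoid the ones that make it straightforward, like the one above.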

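def my_txt(text, target):
    # Scan character by character; target is a zero-based word index.
    # Start at -1 and "after whitespace" so the first word counts as word 0
    # even when the text begins with spaces.
    count = -1
    last_was_space = True
    start = end = 0
    for index, letter in enumerate(text):
        if letter.isspace():
            if not last_was_space:
                end = index  # the word that just finished ends here
            last_was_space = True
        elif last_was_space:
            # First character of a new word.
            last_was_space = False
            count += 1
            if count > target:
                return text[start:end]
            elif count == target:
                start = index
    # The target word was the last one in the text.
    if count == target:
        return text[start:].strip()
    raise ValueError("Word not found")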
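
For comparison, the same stop-as-soon-as-you-find-it idea can be sketched with a lazy regex scan. This leans heavily on the standard library, so it goes against the spirit of the question; it is only here to show the early exit. The name nth_word_lazy is hypothetical, n is assumed to be a non-negative zero-based index, and a "word" is assumed to be any run of non-whitespace characters.

```python
import re
from itertools import islice

def nth_word_lazy(text, n):
    # Hypothetical name; a sketch, not the original answer's method.
    # re.finditer yields matches lazily, so scanning stops once word n is found.
    matches = re.finditer(r"\S+", text)
    # Skip the first n matches and take the next one, or None if there is none.
    match = next(islice(matches, n, None), None)
    if match is None:
        raise ValueError("Word not found")
    return match.group()
```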

Answer 3

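OK, you asked for it. You need a "split into words" function. Here it is. It assumes that "words" are delimited by whitespace.

No built-in functions, nothing imported, no methods of built-in types, not even anything as fancy as +=. And it's tested.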

C:\junk>\python15\python
Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> def mysplit(s):
...     words = []
...     inword = 0
...     for c in s:
...         if c in " \r\n\t": # whitespace
...             inword = 0
...         elif not inword:
...             words = words + [c]
...             inword = 1
...         else:
...             words[-1] = words[-1] + c
...     return words
...
>>> mysplit('')
[]
>>> mysplit('x')
['x']
>>> mysplit('foo')
['foo']
>>> mysplit('  foo')
['foo']
>>> mysplit('  foo    ')
['foo']
>>> mysplit('\nfoo\tbar\rzot ugh\n\n   ')
['foo', 'bar', 'zot', 'ugh']
>>>
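
To get from the word list back to the original question, a small wrapper can walk the result and count by hand. This is a sketch in current Python 3 syntax rather than 1.5.2; mysplit is restated so the block runs on its own, and my_txt here is an assumed zero-based lookup matching the question's example, not something the answer itself provided.

```python
def mysplit(s):
    # Same logic as the session above, restated so this block is self-contained.
    words = []
    inword = False
    for c in s:
        if c in " \r\n\t":  # whitespace ends the current word
            inword = False
        elif not inword:
            words = words + [c]  # start a new word
            inword = True
        else:
            words[-1] = words[-1] + c  # extend the current word
    return words

def my_txt(text, n):
    # Count words manually instead of calling len() or indexing blindly.
    i = 0
    for word in mysplit(text):
        if i == n:
            return word
        i = i + 1
    raise ValueError("Word not found")
```

With this, my_txt("hello to you all", 3) gives 'all', as in the question.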
