How do I download NLTK data?

Updated answer: NLTK works fine with Python 2.7. I had 3.2. I uninstalled 3.2 and installed 2.7, and now it works!!

I have installed NLTK and am trying to download the NLTK data. All I did was follow the instructions on this page: http://www.nltk.org/data.html

I downloaded NLTK, installed it, and then tried to run the following code:

>>> import nltk
>>> nltk.download()

It gave me the following error message:

Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    nltk.download()
AttributeError: 'module' object has no attribute 'download'
 Directory of C:\Python32\Lib\site-packages

I tried both nltk.download() and nltk.downloader(); both gave error messages.

I then used help(nltk) to inspect the package, and it showed the following:

NAME
    nltk

PACKAGE CONTENTS
    align
    app (package)
    book
    ccg (package)
    chat (package)
    chunk (package)
    classify (package)
    cluster (package)
    collocations
    corpus (package)
    data
    decorators
    downloader
    draw (package)
    examples (package)
    featstruct
    grammar
    help
    inference (package)
    internals
    lazyimport
    metrics (package)
    misc (package)
    model (package)
    parse (package)
    probability
    sem (package)
    sourcedstring
    stem (package)
    tag (package)
    test (package)
    text
    tokenize (package)
    toolbox
    tree
    treetransforms
    util
    yamltags

FILE
    c:\python32\lib\site-packages\nltk

I do see downloader listed there, so I don't know why it isn't working. Python 3.2.2, Windows Vista.


3 Answers

Don't name your file nltk.py. I used the same code and had named my file nltk.py; I got the same error as you, and after I renamed the file it ran fine.
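
A quick way to check for this kind of shadowing (a minimal sketch, not part of the original answer; the paths mentioned are only examples) is to look at where the imported module actually lives:

>>> import nltk
>>> nltk.__file__   # should point into site-packages, e.g. c:\python32\lib\site-packages\nltk\__init__.py

If it points at your own script instead (e.g. a local nltk.py in your working directory), rename or delete that file, restart the interpreter, and import again.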

Try:

nltk.download('all')

This downloads all the data, so there is no need to fetch the packages one by one.
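
If you need the data in a non-default location, nltk.download() also accepts a download_dir argument, and NLTK can be told to search that directory via nltk.data.path (a sketch; the directory below is just an example):

>>> import nltk
>>> nltk.download('all', download_dir='D:/my_nltk_data')  # example target directory
>>> nltk.data.path.append('D:/my_nltk_data')              # make NLTK search it when loading resources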

TL;DR

To download a particular dataset/model, use the nltk.download() function, e.g. if you want to download the punkt sentence tokenizer:

$ python3
>>> import nltk
>>> nltk.download('punkt')
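
After the download you can verify that the tokenizer actually loads (a minimal check, not part of the original answer):

>>> from nltk import sent_tokenize
>>> sent_tokenize('This is one sentence. This is another.')
['This is one sentence.', 'This is another.']

The same resource can also be fetched without opening an interpreter, using the downloader's command-line entry point:

$ python3 -m nltk.downloader punkt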

If you are not sure which data/models you need, you can start with the basic list of data + models:

>>> import nltk
>>> nltk.download('popular')

It will download the list of 'popular' resources, which includes:

<collection id="popular" name="Popular packages">
      <item ref="cmudict" />
      <item ref="gazetteers" />
      <item ref="genesis" />
      <item ref="gutenberg" />
      <item ref="inaugural" />
      <item ref="movie_reviews" />
      <item ref="names" />
      <item ref="shakespeare" />
      <item ref="stopwords" />
      <item ref="treebank" />
      <item ref="twitter_samples" />
      <item ref="omw" />
      <item ref="wordnet" />
      <item ref="wordnet_ic" />
      <item ref="words" />
      <item ref="maxent_ne_chunker" />
      <item ref="punkt" />
      <item ref="snowball_data" />
      <item ref="averaged_perceptron_tagger" />
    </collection>
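
As a quick sanity check that the popular collection ended up where NLTK can find it (a minimal sketch, not part of the original answer), try one of the corpora it contains, e.g. stopwords:

>>> from nltk.corpus import stopwords
>>> 'the' in stopwords.words('english')
True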

EDIT

In case anyone is trying to avoid errors when downloading the larger datasets from nltk, from https://stackoverflow.com/a/38135306/610569:

$ rm /Users/<your_username>/nltk_data/corpora/panlex_lite.zip
$ rm -r /Users/<your_username>/nltk_data/corpora/panlex_lite
$ python

>>> import nltk
>>> dler = nltk.downloader.Downloader()
>>> dler._update_index()
>>> dler._status_cache['panlex_lite'] = 'installed' # Trick the index into treating panlex_lite as if it's already installed.
>>> dler.download('popular')

UPDATE

From v3.2.5 onwards, NLTK has a more informative error message when an nltk_data resource cannot be found, e.g.:

>>> from nltk import word_tokenize
>>> word_tokenize('x')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/l/alvas/git/nltk/nltk/tokenize/__init__.py", line 128, in word_tokenize
    sentences = [text] if preserve_line else sent_tokenize(text, language)
  File "/Users//alvas/git/nltk/nltk/tokenize/__init__.py", line 94, in sent_tokenize
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
  File "/Users/alvas/git/nltk/nltk/data.py", line 820, in load
    opened_resource = _open(resource_url)
  File "/Users/alvas/git/nltk/nltk/data.py", line 938, in _open
    return find(path_, path + ['']).open()
  File "/Users/alvas/git/nltk/nltk/data.py", line 659, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  Searched in:
    - '/Users/alvas/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************
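
A common defensive pattern (not part of the original answer) is to probe for a resource first and download it only when the lookup fails:

>>> import nltk
>>> try:
...     nltk.data.find('tokenizers/punkt')
... except LookupError:
...     nltk.download('punkt')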
