How do I download NLTK data?

Updated answer: NLTK works fine with Python 2.7. I had 3.2. I uninstalled 3.2 and installed 2.7, and now it works!!

I have installed NLTK and am trying to download the NLTK data. All I did was follow the instructions on this page: http://www.nltk.org/data.html

I downloaded NLTK, installed it, and then tried to run the following code:

>>> import nltk
>>> nltk.download()

It gave me the following error message:

Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    nltk.download()
AttributeError: 'module' object has no attribute 'download'
 Directory of C:\Python32\Lib\site-packages

I tried both nltk.download() and nltk.downloader(); both gave error messages.

I then used help(nltk) to inspect the package, and it showed the following:

NAME
    nltk

PACKAGE CONTENTS
    align
    app (package)
    book
    ccg (package)
    chat (package)
    chunk (package)
    classify (package)
    cluster (package)
    collocations
    corpus (package)
    data
    decorators
    downloader
    draw (package)
    examples (package)
    featstruct
    grammar
    help
    inference (package)
    internals
    lazyimport
    metrics (package)
    misc (package)
    model (package)
    parse (package)
    probability
    sem (package)
    sourcedstring
    stem (package)
    tag (package)
    test (package)
    text
    tokenize (package)
    toolbox
    tree
    treetransforms
    util
    yamltags

FILE
    c:\python32\lib\site-packages\nltk

I do see downloader listed there, so I don't know why it isn't working. Python 3.2.2, Windows Vista.


3 Answers

Don't name your file nltk.py. I used the same code and had named my file nltk.py; I got the same error as you, and after I renamed the file it ran fine.
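
A quick way to check for this kind of shadowing (a minimal sketch, not part of the original answer; the paths mentioned are only examples) is to look at where the imported module actually lives:

>>> import nltk
>>> nltk.__file__   # should point into site-packages, e.g. c:\python32\lib\site-packages\nltk\__init__.py

If it points at your own script instead (e.g. a local nltk.py in your working directory), rename or delete that file, restart the interpreter, and import again.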

Try:

nltk.download('all')

This downloads all the data, so there is no need to fetch the packages one by one.
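
If you need the data in a non-default location, nltk.download() also accepts a download_dir argument, and NLTK can be told to search that directory via nltk.data.path (a sketch; the directory below is just an example):

>>> import nltk
>>> nltk.download('all', download_dir='D:/my_nltk_data')  # example target directory
>>> nltk.data.path.append('D:/my_nltk_data')              # make NLTK search it when loading resources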

TL;DR

To download a particular dataset/model, use the nltk.download() function, e.g. if you want to download the punkt sentence tokenizer:

$ python3
>>> import nltk
>>> nltk.download('punkt')
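
After the download you can verify that the tokenizer actually loads (a minimal check, not part of the original answer):

>>> from nltk import sent_tokenize
>>> sent_tokenize('This is one sentence. This is another.')
['This is one sentence.', 'This is another.']

The same resource can also be fetched without opening an interpreter, using the downloader's command-line entry point:

$ python3 -m nltk.downloader punkt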

If you are not sure which data/models you need, you can start with the basic list of data + models:

>>> import nltk
>>> nltk.download('popular')

It will download the list of 'popular' resources, which includes:

<collection id="popular" name="Popular packages">
      <item ref="cmudict" />
      <item ref="gazetteers" />
      <item ref="genesis" />
      <item ref="gutenberg" />
      <item ref="inaugural" />
      <item ref="movie_reviews" />
      <item ref="names" />
      <item ref="shakespeare" />
      <item ref="stopwords" />
      <item ref="treebank" />
      <item ref="twitter_samples" />
      <item ref="omw" />
      <item ref="wordnet" />
      <item ref="wordnet_ic" />
      <item ref="words" />
      <item ref="maxent_ne_chunker" />
      <item ref="punkt" />
      <item ref="snowball_data" />
      <item ref="averaged_perceptron_tagger" />
    </collection>
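
As a quick sanity check that the popular collection ended up where NLTK can find it (a minimal sketch, not part of the original answer), try one of the corpora it contains, e.g. stopwords:

>>> from nltk.corpus import stopwords
>>> 'the' in stopwords.words('english')
True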

EDIT

In case anyone is trying to avoid errors when downloading the larger datasets from nltk, from https://stackoverflow.com/a/38135306/610569:

$ rm /Users/<your_username>/nltk_data/corpora/panlex_lite.zip
$ rm -r /Users/<your_username>/nltk_data/corpora/panlex_lite
$ python

>>> import nltk
>>> dler = nltk.downloader.Downloader()
>>> dler._update_index()
>>> dler._status_cache['panlex_lite'] = 'installed' # Trick the index into treating panlex_lite as if it's already installed.
>>> dler.download('popular')

UPDATE

From v3.2.5 onwards, NLTK has a more informative error message when an nltk_data resource cannot be found, e.g.:

>>> from nltk import word_tokenize
>>> word_tokenize('x')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/l/alvas/git/nltk/nltk/tokenize/__init__.py", line 128, in word_tokenize
    sentences = [text] if preserve_line else sent_tokenize(text, language)
  File "/Users//alvas/git/nltk/nltk/tokenize/__init__.py", line 94, in sent_tokenize
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
  File "/Users/alvas/git/nltk/nltk/data.py", line 820, in load
    opened_resource = _open(resource_url)
  File "/Users/alvas/git/nltk/nltk/data.py", line 938, in _open
    return find(path_, path + ['']).open()
  File "/Users/alvas/git/nltk/nltk/data.py", line 659, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  Searched in:
    - '/Users/alvas/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************
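
A common defensive pattern (not part of the original answer) is to probe for a resource first and download it only when the lookup fails:

>>> import nltk
>>> try:
...     nltk.data.find('tokenizers/punkt')
... except LookupError:
...     nltk.download('punkt')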
