Django Haystack memory error

2 votes
2 answers
679 views
Asked 2025-04-18 09:02

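I'm using Django Haystack with Elasticsearch, and I want to run rebuild_index or update_index on 55149 documents, but I get a memory error. I assume it's because there are too many documents, but how can I solve this? Note that I want to be able to index about 200,000 documents.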

python manage.py rebuild_index

WARNING: This will irreparably remove EVERYTHING from your search index in connection 'default'.
Your choices after this are to restore from backups or rebuild via the `rebuild_index` command.
Are you sure you wish to continue? [y/N] y
Removing all documents from your index because you said so.
All documents removed.
Indexing 55149 processs
Traceback (most recent call last):
  File "manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 399, in execute_from_command_line
    utility.execute()
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 392, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/django/core/management/base.py", line 242, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/django/core/management/base.py", line 285, in execute
    output = self.handle(*args, **options)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/haystack/management/commands/rebuild_index.py", line 16, in handle
    call_command('update_index', **options)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 159, in call_command
    return klass.execute(*args, **defaults)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/django/core/management/base.py", line 285, in execute
    output = self.handle(*args, **options)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/haystack/management/commands/update_index.py", line 195, in handle
    return super(Command, self).handle(*items, **options)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/django/core/management/base.py", line 385, in handle
    label_output = self.handle_label(label, **options)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/haystack/management/commands/update_index.py", line 221, in handle_label
    self.update_backend(label, using)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/haystack/management/commands/update_index.py", line 267, in update_backend
    do_update(backend, index, qs, start, end, total, self.verbosity)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/haystack/management/commands/update_index.py", line 89, in do_update
    backend.update(index, current_qs)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/haystack/backends/elasticsearch_backend.py", line 183, in update
    self.conn.bulk_index(self.index_name, 'modelresult', prepped_docs, id_field=ID)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/pyelasticsearch/client.py", line 96, in decorate
    return func(*args, query_params=query_params, **kwargs)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/pyelasticsearch/client.py", line 388, in bulk_index
    query_params=query_params)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/pyelasticsearch/client.py", line 238, in send_request
    **({'data': request_body} if body else {}))
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/requests/sessions.py", line 425, in post
    return self.request('POST', url, data=data, **kwargs)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/requests/sessions.py", line 383, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/requests/sessions.py", line 486, in send
    r = adapter.send(request, **kwargs)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/requests/adapters.py", line 330, in send
    timeout=timeout
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 480, in urlopen
    body=body, headers=headers)
  File "/home/vagrant/blook/local/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 285, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python2.7/httplib.py", line 958, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 992, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 954, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 812, in _send_output
    msg += message_body
MemoryError

2 Answers

0

I solved this by giving the virtual machine more memory (it previously had only 384 MB) and by freeing up some space.

Increasing TIMEOUT in the settings may also help.
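
For reference, a minimal sketch of where TIMEOUT goes in a Haystack connection, assuming the stock Elasticsearch backend. The URL, the index name and the value 60 are placeholders, not taken from the question; adjust them to your setup.

```python
# settings.py (sketch): URL, INDEX_NAME and the TIMEOUT value are assumed placeholders.
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',  # assumed local Elasticsearch node
        'INDEX_NAME': 'haystack',         # assumed index name
        'TIMEOUT': 60,                    # seconds to wait for a backend response
    },
}
```

A longer timeout only helps when the bulk request is slow; it does not reduce the memory needed to build the request.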

3

Try a smaller batch size, which you can set with --batch-size=XXX.

The default batch size is 1000, so start with a smaller number and then increase it gradually.
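
The traceback in the question ends in httplib at `msg += message_body`, which is the point where the whole bulk request body is held in memory at once. A smaller batch means a smaller body per request. The sketch below only illustrates that idea in plain Python; it is not Haystack's code, and `iter_batches`, `bulk_body` and the fake documents are hypothetical names made up for the example.

```python
import json


def iter_batches(items, batch_size):
    """Yield successive lists holding at most batch_size items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch


def bulk_body(docs):
    """Serialize one batch as newline-delimited JSON (a stand-in for a bulk request body)."""
    return "\n".join(json.dumps(doc) for doc in docs) + "\n"


# Hypothetical documents, generated lazily so the full set is never in memory.
docs = ({"id": i, "text": "x" * 100} for i in range(55149))

# Only one batch and its serialized body exist at a time.
body_sizes = [len(bulk_body(batch)) for batch in iter_batches(docs, 100)]
print(len(body_sizes), "requests, largest body:", max(body_sizes), "characters")
```

With Haystack itself, the equivalent is simply `python manage.py update_index --batch-size=100`, lowering or raising the number until the run fits in memory.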
