Kibana Index Management does not update the document count

I have started using elasticsearch-dsl to work with Elasticsearch and Kibana, following this guide: https://elasticsearch-dsl.readthedocs.io/en/latest/index.html#persistence-example

Everything seems to work fine. However, when I refresh the stats in Kibana's Index Management panel, the document count does not update until I run a search (it may be a coincidence, but I doubt it).

This is the code I use to insert documents into Elasticsearch:

import datetime

from elasticsearch_dsl.connections import connections

connections.create_connection(hosts=['localhost'])

# df is a pandas DataFrame with one row per document to index
for index, doc in df.iterrows():
    new_cluster = Cluster(meta={'id': doc.url_hashed},
                          title=doc.title,
                          cluster=doc.cluster,
                          url=doc.url,
                          paper=doc.paper,
                          published=doc.published,
                          entered=datetime.datetime.now())
    new_cluster.save()

where Cluster is the custom Document class that defines the structure of the index:

from datetime import datetime
from elasticsearch_dsl import Document, Date, Integer, Keyword, Text
from elasticsearch_dsl.connections import connections

class Cluster(Document):
    title = Text(analyzer='standard', fields={'raw': Keyword()})
    cluster = Integer()
    url = Text()
    paper = Text()
    published = Date()
    entered = Date()

    class Index:
        name = 'cluster'

    def save(self, **kwargs):
        return super(Cluster, self).save(**kwargs)
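
As a side note, the persistence example in the guide also calls init() on the Document subclass once, so that the index and its mapping exist before the first document is saved. A minimal sketch (run once, e.g. right after create_connection):

# create the "cluster" index and its mapping in Elasticsearch
Cluster.init()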

This is the panel I am looking at: https://www.screencast.com/t/zpEhv66Np After running the for loop above and clicking the "Reload index" button in Kibana, the numbers stay the same. They only change once I run a search from my script (added just for testing):

from elasticsearch_dsl import Search

s2 = Search(using=client, index="cluster")   # client: the low-level Elasticsearch client
test_df = pd.DataFrame(d.to_dict() for d in s2.scan())

Why does this happen? Thanks a lot!


1 Answer

First of all, you have a single node (which probably acts as both master and data node), and Index Management shows your index health as yellow. That means no replica shards are assigned: with only one node a replica cannot be allocated, because a replica is a copy of a primary shard that has to live on a different node (so one replica requires at least two data nodes). To bring the cluster back to green, set the number of replicas for this index to 0:

PUT /<YOUR_INDEX>/_settings
{
    "index" : {
        "number_of_replicas" : 0
    }
}
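
If you prefer to do this from Python rather than the Kibana console, here is a minimal sketch using the low-level client that elasticsearch-dsl already manages (the exact parameter name, body vs. settings, depends on your elasticsearch-py version):

from elasticsearch_dsl.connections import connections

es = connections.get_connection()  # the low-level Elasticsearch client
es.indices.put_settings(index="cluster",
                        body={"index": {"number_of_replicas": 0}})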

As for the document count: after a bulk of indexing operations, a flush has to happen before the documents are written to disk. From the docs:

Flushing an index is the process of making sure that any data that is currently only stored in the transaction log is also permanently stored in the Lucene index. When restarting, Elasticsearch replays any unflushed operations from the transaction log into the Lucene index to bring it back into the state that it was in before the restart. Elasticsearch automatically triggers flushes as needed, using heuristics that trade off the size of the unflushed transaction log against the cost of performing each flush.

Once each operation has been flushed it is permanently stored in the Lucene index.

Basically, when you index N documents in a batch you do not see them immediately, because they have not yet been written to the Lucene index. You can trigger a flush manually once the bulk operation has finished:

POST /<YOUR_INDEX>/_flush

and then check the number of documents in the index:

GET _cat/indices?v&s=index
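
The same two steps from Python, again as a sketch against the low-level elasticsearch-py client:

es = connections.get_connection()
es.indices.flush(index="cluster")            # persist pending operations (like POST /cluster/_flush)
print(es.count(index="cluster")["count"])    # number of documents Elasticsearch reports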

You can also have the index refreshed automatically every N seconds, for example:

PUT /<YOUR_INDEX>/_settings
{
    "index" : {
        "refresh_interval" : "1s"
    }
} 
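
Or, from Python, a sketch that sets the interval and/or forces an immediate refresh after your loop (again assuming the standard elasticsearch-py client methods):

es = connections.get_connection()
es.indices.put_settings(index="cluster",
                        body={"index": {"refresh_interval": "1s"}})
es.indices.refresh(index="cluster")  # or trigger a single refresh right away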

You can read more about this in the docs, but my advice is: as long as the document count ends up matching the number of documents you indexed, do not worry about it, and use the Kibana Dev Tools rather than the Index Management GUI.
