Indexing a DataFrame into Elasticsearch with Python

Posted 2024-03-29 06:06:13


I'm trying to index some pandas DataFrames into Elasticsearch, but I'm having trouble parsing the generated JSON. I think the problem comes from the mapping. My code is below.

import logging
from pprint import pprint
from elasticsearch import Elasticsearch
import pandas as pd

def create_index(es_object, index_name):
    created = False
    # index settings
    settings = {
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 0
        },
        "mappings": {
            "danger": {
                "dynamic": "strict",
                "properties": {
                    "name": {
                        "type": "text"
                    },
                    "first_name": {
                        "type": "text"
                    },
                    "age": {
                        "type": "integer"
                    },
                    "city": {
                        "type": "text"
                    },
                    "sex": {
                        "type": "text",
                    },
                }
            }
        }
    }

    try:
        if not es_object.indices.exists(index_name):
            # ignore=400 suppresses the "index already exists" error
            es_object.indices.create(index=index_name, ignore=400, body=settings)
            print('Created Index')
        created = True
    except Exception as ex:
        print(str(ex))
    finally:
        return created


def store_record(elastic_object, index_name, record):
    is_stored = True
    try:
        outcome = elastic_object.index(index=index_name,doc_type='danger', body=record)
        print(outcome)
    except Exception as ex:
        print('Error in indexing data')


data = [['Hook', 'James','90', 'Austin','M'],['Sparrow','Jack','15', 'Paris', 'M'],['Kent','Clark','13', 'NYC', 'M'],['Montana','Hannah','28','Las Vegas', 'F'] ]
df = pd.DataFrame(data,columns=['name', 'first_name', 'age', 'city', 'sex'])
result = df.to_json(orient='records')
result = result[1:-1]
es = Elasticsearch()
if es is not None:
    if create_index(es, 'cracra'):
        out = store_record(es, 'cracra', result)
        print('Data indexed successfully')

I get the following error:

[error output not preserved]

I don't know where it comes from. If anyone can help me fix this, I'd be very grateful.

Thanks a lot!


1 answer
User
#1 · Posted 2024-03-29 06:06:13

Try removing the extra trailing commas from the mapping:

"mappings": {
  "danger": {
    "dynamic": "strict",
    "properties": {
      "name": {
        "type": "text"
      },
      "first_name": {
        "type": "text"
      },
      "age": {
        "type": "integer"
      },
      "city": {
        "type": "text"
      },
      "sex": {
        "type": "text", <  here
      }, <  and here
    }
  }
}
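Note that the trailing commas only break things when the mapping is written as raw JSON text. In the question the settings are a Python dict, and Python accepts trailing commas in dict literals; assuming the client serializes the dict with a standard JSON encoder such as `json.dumps`, the request body never contains them. A minimal standard-library check of both cases:

```python
import json

# A Python dict literal accepts trailing commas, as in the question's settings.
settings = {
    "mappings": {
        "properties": {
            "sex": {"type": "text",},
        },
    },
}

# Serializing the dict produces valid JSON with no trailing commas.
serialized = json.dumps(settings)
print(serialized)

# The same structure typed by hand as JSON text, trailing commas included, is rejected.
raw = '{"mappings": {"properties": {"sex": {"type": "text",},},},}'
try:
    json.loads(raw)
    raw_is_valid = True
except json.JSONDecodeError as err:
    raw_is_valid = False
    print("invalid JSON:", err.msg)
```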

Update

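It looks like the index was created successfully and the problem lies in indexing the data. As Nishant Saini pointed out, you are probably trying to index several documents at once. That can be done with the Bulk API. Here is an example of a correct request that indexes two documents: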

[bulk request example not preserved]
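A quick way to see why the question's body cannot be indexed as one document: slicing the brackets off the JSON array (`result[1:-1]`) leaves several objects separated by commas, which is not a single JSON document. This sketch rebuilds the question's rows with the standard library; `json.dumps` with compact separators stands in for `df.to_json(orient='records')`, an assumption about its exact formatting.

```python
import json

# The question's rows, one dict per row (the shape df.to_dict(orient="records") gives).
rows = [
    {"name": "Hook", "first_name": "James", "age": "90", "city": "Austin", "sex": "M"},
    {"name": "Sparrow", "first_name": "Jack", "age": "15", "city": "Paris", "sex": "M"},
    {"name": "Kent", "first_name": "Clark", "age": "13", "city": "NYC", "sex": "M"},
    {"name": "Montana", "first_name": "Hannah", "age": "28", "city": "Las Vegas", "sex": "F"},
]

# Compact JSON array, similar in shape to the to_json output.
result = json.dumps(rows, separators=(",", ":"))

# The question strips the surrounding brackets...
body = result[1:-1]
print(body)

# ...which leaves four objects joined by commas: not one JSON document.
try:
    json.loads(body)
    is_single_document = True
except json.JSONDecodeError as err:
    is_single_document = False
    print("not a single document:", err.msg)
```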

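In a Bulk API request body, each document must appear on its own line, preceded by a line of metadata. In this case the metadata contains only the id to assign to the document.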
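A minimal sketch of building such a body by hand, assuming the index name is given in the URL (for example `POST /cracra/_bulk`) so each metadata line carries only an `_id`. The function name `build_bulk_body` and the sequential ids are illustrative choices, not part of any library.

```python
import json

def build_bulk_body(docs):
    """Build an NDJSON Bulk API body: for each document, an action line
    holding only the _id, then the document source on the next line."""
    lines = []
    for doc_id, doc in enumerate(docs, start=1):
        # Action/metadata line; the target index is assumed to come from the URL.
        lines.append(json.dumps({"index": {"_id": str(doc_id)}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(doc))
    # Every line, including the last, must end with a newline.
    return "\n".join(lines) + "\n"

docs = [
    {"name": "Hook", "first_name": "James", "age": 90, "city": "Austin", "sex": "M"},
    {"name": "Sparrow", "first_name": "Jack", "age": 15, "city": "Paris", "sex": "M"},
]
body = build_bulk_body(docs)
print(body, end="")
```

The resulting string is sent as the request body with `Content-Type: application/x-ndjson`.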

You can build this request by hand, or use the Elasticsearch helpers for Python, which take care of adding the correct metadata.
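A minimal sketch with the bulk helper, assuming an `elasticsearch-py` version whose `elasticsearch.helpers.bulk` accepts action dicts with `_index`, `_id` and `_source` keys. `generate_actions` and `index_dataframe` are hypothetical names. With sequential ids, re-running the script overwrites the same documents instead of duplicating them; dropping `_id` lets Elasticsearch generate ids itself.

```python
def generate_actions(index_name, docs):
    """Yield one action per document in the dict form accepted by
    elasticsearch.helpers.bulk: metadata keys plus the document under _source."""
    for doc_id, doc in enumerate(docs, start=1):
        yield {"_index": index_name, "_id": str(doc_id), "_source": doc}


def index_dataframe(es, index_name, df):
    """Index every row of a pandas DataFrame with the bulk helper.

    `es` is an Elasticsearch client; nothing here runs until this is called."""
    # Imported here so generate_actions can be used without the client installed.
    from elasticsearch.helpers import bulk

    # One dict per DataFrame row, e.g. {"name": "Hook", "first_name": "James", ...}
    records = df.to_dict(orient="records")
    # bulk() returns (number of successful actions, list of errors)
    return bulk(es, generate_actions(index_name, records))


# The action generator itself needs neither Elasticsearch nor pandas:
sample = [{"name": "Hook", "first_name": "James"}, {"name": "Kent", "first_name": "Clark"}]
actions = list(generate_actions("cracra", sample))
print(actions[0])
```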
