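A tool for importing raw JSON data into ElasticSearch with a single command
Detailed description of the jsonpyes Python project
jsonpyes
Alexander Liu
- Import raw JSON data files into ElasticSearch with a single command
Very fast: 4 to 10 times faster when processing large data sets.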
Installation
pip install jsonpyes
Notice: Before using pip to install jsonpyes, you first need to install python-pip on your system. (Supports Python 2.7, 3.3, 3.4, 3.5, 3.6)
Instructions:
There are 3 ways to process raw JSON data for ElasticSearch:
1. Only validate the raw JSON data
2. Import the data into ElasticSearch without validating it
3. Validate the data and, if validation succeeds, import it into ElasticSearch
A valid JSON file here means a file made up of many lines of data, with one JSON object per line.
For example, the file valid_data.json and its content:
{"key1": "valueA", "key2": {"sub_key1": "value2A", "sub_key2": ["Good", "Morning"]}}
{"key1": "valueB", "key2": {"sub_key1": "value2B", "sub_key2": ["Good", "Afternoon"]}}
...
{"key1": "valueC", "key2": {"sub_key1": "value2C", "sub_key2": ["Good", "Evening"]}}
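To make the one-object-per-line format concrete, here is a minimal sketch of how such a file could be checked line by line with Python's standard json module. This is not the jsonpyes implementation; the function name find_invalid_lines and the choice to skip blank lines are assumptions made for illustration.

```python
import json


def find_invalid_lines(lines):
    """Return (line_number, error_message) for each line that is not valid JSON."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored rather than reported
        try:
            json.loads(line)
        except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
            problems.append((number, str(exc)))
    return problems


sample = [
    '{"key1": "valueA", "key2": {"sub_key1": "value2A", "sub_key2": ["Good", "Morning"]}}',
    '{"key1": "valueB", "key2": ',  # truncated on purpose to trigger an error
]
problems = find_invalid_lines(sample)
print([number for number, _ in problems])  # [2]
```

With a real file, the same function can be fed an open file object, for example find_invalid_lines(open("valid_data.json")), since iterating over a file yields its lines.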
Features
1. Validate JSON data
jsonpyes --data raw_data.json --check
If the JSON data file is valid:
If the JSON data file is invalid:
2. Import only, without validating
jsonpyes --data raw_data.json --bulk http://localhost:9200 --import --index myindex2 --type mytype2
Notice: If the raw JSON data file is invalid, jsonpyes will not import it.
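For readers unfamiliar with how documents reach ElasticSearch in bulk, the sketch below builds a request body in the newline-delimited format accepted by the ElasticSearch _bulk endpoint: an action line followed by the document itself. It only constructs the text and sends nothing. It is an illustration, not how jsonpyes is implemented. The function name build_bulk_body is hypothetical, and the "_type" field is an assumption that fits older ElasticSearch versions, since newer releases no longer use mapping types.

```python
import json


def build_bulk_body(docs, index, doc_type):
    """Build a _bulk request body: one action line, then one source line, per document."""
    # Assumption: "_type" is included to mirror the --type flag; newer
    # ElasticSearch versions expect only "_index" here.
    action = json.dumps({"index": {"_index": index, "_type": doc_type}})
    lines = []
    for doc in docs:
        lines.append(action)
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk format must end with a newline


body = build_bulk_body(
    [{"key1": "valueA"}, {"key1": "valueB"}], "myindex2", "mytype2"
)
print(body)
```

A body like this would typically be sent with an HTTP POST to the _bulk path of the server (here http://localhost:9200/_bulk) using the Content-Type application/x-ndjson.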
Alternatively, import with multi-threading enabled:
jsonpyes --data raw_data.json --bulk http://localhost:9200 --import --index myindex2 --type mytype2 --thread 8
jsonpyes supports multi-threading when importing data into ElasticSearch.
Multi-threading comparison
Without multi-threading
With 8 threads, jsonpyes cuts the file into pieces and distributes them evenly among the workers
As you can see, the two containers have the same docs loaded. With --thread 8 the import can be several times faster, usually 5 to 10 times.
The actual speedup depends on your computer/server resources.
This was tested on a Linux x64 laptop with 4 GB RAM and a 2.4 GHz Intel i5.
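The idea of cutting a file into pieces and handing them to workers can be sketched as follows. This is a minimal illustration under stated assumptions, not the jsonpyes code: split_evenly and import_chunk are hypothetical names, and the worker only counts its lines instead of contacting ElasticSearch.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(items, parts):
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # the first `extra` chunks get one more item
        chunks.append(items[start:start + size])
        start += size
    return chunks


def import_chunk(chunk):
    """Hypothetical worker: stands in for sending one chunk to ElasticSearch."""
    return len(chunk)


lines = ['{"n": %d}' % n for n in range(10)]
chunks = split_evenly(lines, 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    counts = list(pool.map(import_chunk, chunks))
print(counts)  # [3, 3, 2, 2]
```

Because the work is dominated by network requests rather than computation, threads can overlap the waiting time, which is consistent with the speedup described above. The actual gain still depends on the machine and the server.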
And it works.
3. Validate and import
jsonpyes --data raw_data.json --bulk http://localhost:9200 --import --index myindex1 --type mytype1 --check
And it works.