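Python utilities for interacting with .avro/.avsc files
Detailed description of the ctodd-python-lib-avro Python project
Christopher H. Todd's Python library for interacting with AVRO/AVSC
The ctodd-python-lib-avro project is responsible for interacting with Apache AVRO. This includes converting to and from byte arrays, writing and reading .avro files, writing and reading .avsc files, and other minor quality-of-life wrappers.
The library relies on Python's avro-python3 package and wraps it with custom, specific exception handling, simpler interactions, and a more functional style, to reduce the amount of code in projects that deal with AVRO.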
Dependencies
Python Packages
- avro-python3>=1.8.2
- simplejson>=3.16.0
Libraries
avro_converter_helpers.py
Avro converter helpers. This library is used to convert AVRO into other formats (.json first).
Functions:
def convert_avro_file_to_json(avro_filename, json_filename=None):
"""
Purpose:
Convert an .avro file into a .json file
Args:
avro_filename (String): Path/filename of the .avro file to convert to .json
json_filename (String): Path/filename of the .json file to generate. If None
is specified, use the same path as the .avro file and change the extension
Returns:
json_filename (String): Path/filename of the .json file generated
"""
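The default described for json_filename (same path, extension swapped) could be derived roughly as follows. This is a minimal sketch using only the standard library; `default_json_filename` is a hypothetical helper name and is not taken from the library itself.

```python
import os


def default_json_filename(avro_filename):
    """Hypothetical helper: keep the path, swap the extension for .json."""
    base, _extension = os.path.splitext(avro_filename)
    return base + ".json"


print(default_json_filename("./data/test_data.avro"))
```

Using `os.path.splitext` rather than string replacement avoids touching an ".avro" substring that happens to appear earlier in the path.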
avro_exceptions.py
File for holding the custom exception types that will be raised by the avro_helpers libraries
Exception Types:
class AvroTestException(Exception):
"""
Purpose:
The AvscInvalid will be raised when reading the .avsc raises an exception
"""
class AvscInvalid(Exception):
"""
Purpose:
The AvscInvalid will be raised when reading the .avsc raises an exception
"""
class AvscNotFound(Exception):
"""
Purpose:
The AvscNotFound will be raised when trying to read a .avsc file
that cannot be found.
"""
class AvroNotFound(Exception):
"""
Purpose:
The AvroNotFound will be raised when trying to read a .avro file
that cannot be found.
"""
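Calling code can catch these specific exception types instead of a generic error. The sketch below defines a local stand-in for `AvroNotFound` so it runs on its own; `open_avro_or_raise` is a hypothetical helper, and the library may detect missing files differently.

```python
import os


class AvroNotFound(Exception):
    """Local stand-in mirroring the library's exception of the same name."""


def open_avro_or_raise(avro_filename):
    """Hypothetical helper: open a .avro file, raising AvroNotFound if missing."""
    if not os.path.isfile(avro_filename):
        raise AvroNotFound("No .avro file found at {0}".format(avro_filename))
    return open(avro_filename, "rb")


try:
    open_avro_or_raise("./data/does_not_exist.avro")
except AvroNotFound as err:
    # The caller can react to the missing file specifically.
    print("Handled:", err)
```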
avro_general_helpers.py
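Avro general helpers. This library is used to interact with .avro files in ways that are not specific to reading or writing.
Functions: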
N/A
avro_reading_helpers.py
Avro reading helpers. This library is used to aid in the task of reading .avro files.
Functions:
def get_record_from_avro_generator(avro_filename):
"""
Purpose:
Generator of records from a .avro filename (with path in the filename)
Args:
avro_filename (String): Path/filename of the .avro file to get records from
Yields:
avro_record (Record Obj from .avro): Record read from the .avro file
"""
def get_record_from_avro_buffered(avro_filename):
"""
Purpose:
Get all records from a .avro filename (with path in the filename) as a buffered list
Args:
avro_filename (String): Path/filename of the .avro file to get records from
Returns:
avro_records (List of Record Objs from .avro): List of Records read from
the .avro file
"""
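The difference between the generator and buffered readers can be sketched directly against avro-python3. This is an illustration under assumptions, not the library's implementation: `iter_avro_records` and `read_avro_records` are hypothetical names, and only `DataFileReader` and `DatumReader` come from avro-python3.

```python
def iter_avro_records(avro_filename):
    """Yield records one at a time (generator style)."""
    # Imported inside the function so the sketch can be loaded
    # even when avro-python3 is not installed.
    from avro.datafile import DataFileReader
    from avro.io import DatumReader

    with open(avro_filename, "rb") as avro_file:
        reader = DataFileReader(avro_file, DatumReader())
        try:
            for record in reader:
                yield record
        finally:
            reader.close()


def read_avro_records(avro_filename):
    """Return every record in a list (buffered style)."""
    return list(iter_avro_records(avro_filename))
```

The generator form keeps only one record in memory at a time, which suits large files; the buffered form is more convenient when the whole file fits in memory and the records are needed more than once.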
avro_schema_helpers.py
Avro schema helpers. This library is used to interact with .avsc files.
Functions:
def get_schema_from_avsc_file(avsc_filename):
"""
Purpose:
Get the file schema from an .avsc filename (with path in the filename)
Args:
avsc_filename (String): Path/filename of the .avsc file to get the schema from
Returns:
avro_schema (AVRO Schema Object): Schema object from the avro library
"""
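An .avsc file is a JSON document, so its contents can be inspected with the standard library alone. The sketch below is an illustration only: `load_avsc_as_dict` is a hypothetical helper, the schema is made up for the example, and the result is a plain dict rather than the AVRO schema object that the library function returns.

```python
import json
import os
import tempfile


def load_avsc_as_dict(avsc_filename):
    """Hypothetical helper: load the JSON content of an .avsc file."""
    with open(avsc_filename, "r") as avsc_file:
        return json.load(avsc_file)


# Illustrative schema, written to a temporary .avsc file and read back.
example_schema = {
    "type": "record",
    "name": "TestRecord",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "label", "type": "string"},
    ],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    avsc_path = os.path.join(tmp_dir, "test_schema.avsc")
    with open(avsc_path, "w") as avsc_file:
        json.dump(example_schema, avsc_file)
    schema_dict = load_avsc_as_dict(avsc_path)

field_names = [field["name"] for field in schema_dict["fields"]]
print(schema_dict["name"], field_names)
```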
avro_writing_helpers.py
Avro writing helpers. This library is used to aid in the task of writing .avro files.
Functions:
def write_raw_records_to_avro(raw_records, avro_filename, avro_schema):
"""
Purpose:
Write Records to .avro File
Args:
raw_records (List of Dicts): List of Records to Write to AVRO as Bytes
avro_filename (String): Filename and path of .avro to write
avro_schema (AVRO Schema Object): Schema object from the avro library
Returns:
N/A
"""
def serialize_data(raw_records, avro_schema):
"""
Purpose:
Serialize records as bytes
Args:
raw_records (List of Dicts): List of Records to Serialize
avro_schema (AVRO Schema Object): Schema object from the avro library
Returns:
serialized_records (List of Byte Array): Records Serialized into Byte-Array
"""
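For orientation, writing and serializing with avro-python3 directly might look like the sketch below. It is written under assumptions and is not the library's code: `serialize_records` and `write_records` are hypothetical names, while `DataFileWriter`, `DatumWriter`, and `BinaryEncoder` are the avro-python3 classes being wrapped.

```python
import io


def serialize_records(raw_records, avro_schema):
    """Encode each dict as Avro binary; return a list of bytes objects."""
    # Imported inside the function so the sketch can be loaded
    # even when avro-python3 is not installed.
    from avro.io import BinaryEncoder, DatumWriter

    datum_writer = DatumWriter(avro_schema)
    serialized_records = []
    for record in raw_records:
        buffer = io.BytesIO()
        datum_writer.write(record, BinaryEncoder(buffer))
        serialized_records.append(buffer.getvalue())
    return serialized_records


def write_records(raw_records, avro_filename, avro_schema):
    """Append each dict to a new .avro container file."""
    from avro.datafile import DataFileWriter
    from avro.io import DatumWriter

    with open(avro_filename, "wb") as avro_file:
        writer = DataFileWriter(avro_file, DatumWriter(), avro_schema)
        try:
            for record in raw_records:
                writer.append(record)
        finally:
            # close() flushes any buffered block to the file.
            writer.close()
```

The serialized bytes carry no schema or file header, so a consumer needs the same schema to decode them; the .avro container file written by `write_records` embeds the schema.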
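Example Scripts
Example executable Python scripts/modules for testing and interacting with the library. These examples show use cases for the library and can be used as templates for developing with the library or for one-off development efforts.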
read_avro_file.py
Purpose:
Read an .avro File
Steps:
- Either
- Read .avro File as Buffered List
- Read .avro File as Generator
function call:
python3 read_avro_file.py {--avro=avro_filename}
example call:
python3 read_avro_file.py --avro="./data/test_data.avro"
read_avsc_file.py
Purpose:
Read an .avsc File to get the schema
Steps:
- Read .avsc Schema
function call:
python3 read_avsc_file.py {--avsc=avsc_filename}
example call:
python3 read_avsc_file.py --avsc="./avsc/test_schema.avsc"
write_avro_file.py
Purpose:
Write an .avro File
Steps:
- Write .avro File
function call:
python3.6 write_avro_file.py {--avro=avro_filename} \
{--avsc=avsc_filename}
example call:
python3.6 write_avro_file.py --avro="./data/generated_data.avro" \
--avsc="./avsc/test_schema.avsc"
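The `--avro` and `--avsc` options used in the calls above could be parsed with the standard library's argparse, as in this minimal sketch. `parse_cli_args` and the `dest` names are assumptions made for illustration; the actual scripts may parse their arguments differently.

```python
import argparse


def parse_cli_args(argv=None):
    """Hypothetical parser for the --avro/--avsc options shown above."""
    parser = argparse.ArgumentParser(description="Write an .avro file")
    parser.add_argument("--avro", dest="avro_filename",
                        help="Path/filename of the .avro file")
    parser.add_argument("--avsc", dest="avsc_filename",
                        help="Path/filename of the .avsc schema file")
    return parser.parse_args(argv)


args = parse_cli_args([
    "--avro=./data/generated_data.avro",
    "--avsc=./avsc/test_schema.avsc",
])
print(args.avro_filename, args.avsc_filename)
```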
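Notes
- Relies on f-string notation, which is only available in Python 3.6 and later. A refactor to remove f-strings would allow development with Python 3.0.x through 3.5.x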
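The refactor mentioned in the note amounts to replacing each f-string with an equivalent `str.format` call, which works on every Python 3 release. The message and variable names below are made up for illustration.

```python
avro_filename = "./data/test_data.avro"
record_count = 3

# Python 3.6+ only: f-string interpolation.
with_f_string = f"Read {record_count} records from {avro_filename}"

# Works on Python 3.0 through 3.5 as well: str.format.
with_format = "Read {0} records from {1}".format(record_count, avro_filename)

print(with_f_string == with_format)
```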
TODO
- The unit test framework is in place, but tests are lacking