Python utilities for interacting with .avro/.avsc files

ctodd-python-lib-avro: Python project description


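Christopher H. Todd's Python Library for AVRO/AVSC

The ctodd-python-lib-avro project is responsible for interacting with Apache AVRO. This includes converting to and from byte arrays, writing and reading .avro files, writing and reading .avsc files, and other minor quality-of-life wrappers.

The library relies on Python's avro-python3 package and wraps it with custom/specific exception handling, simpler interactions, and a more functional style, to reduce the amount of code in projects that deal with AVRO.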

Dependencies

Python Packages

  • avro-python3>=1.8.2
  • SimpleJSON=3.16.0

avro_converter_helpers.py

Converter Helpers. This library is used to convert .avro files to other formats (.json to start with).

Functions:

def convert_avro_file_to_json(avro_filename, json_filename=None):
    """
    Purpose:
        Convert an .avro file into a .json file
    Args:
        avro_filename (String): Path/filename of the .avro file to convert to .json
        json_filename (String): Path/filename of the .json file to generate. if none
            is specified, just use the same .avro path and change the extension
    Yields:
        json_filename (String): Path/filename of the .json file generated
    """

avro_exceptions.py

File holding the custom exception types that will be raised by the avro_helpers libraries

Exception Types:

class AvroTestException(Exception):
    """
    Purpose:
        The AvscInvalid will be raised when reading the .avsc raises an exception
    """
class AvscInvalid(Exception):
    """
    Purpose:
        The AvscInvalid will be raised when reading the .avsc raises an exception
    """
class AvscNotFound(Exception):
    """
    Purpose:
        The AvscNotFound will be raised when trying to Read a .avsc file
        that cannot be found.
    """
class AvroNotFound(Exception):
    """
    Purpose:
        The AvroNotFound will be raised when trying to Read a .avro file
        that cannot be found.
    """

avro_general_helpers.py

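General Helpers. This library is used to interact with .avro files in ways not directly related to reading or writing them.

Functions: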

N/A

avro_reading_helpers.py

Reading Helpers. This library is used to aid in the tasks of reading .avro files.

Functions:

def get_record_from_avro_generator(avro_filename):
    """
    Purpose:
        Generator of records from a .avro filename (with path in the filename)
    Args:
        avro_filename (String): Path/filename of the .avro file to get records from
    Yields:
        avro_record (Record Obj from .avro): Record read from the .avro file
    """
def get_record_from_avro_buffered(avro_filename):
    """
    Purpose:
        Buffered Get records from a .avro filename (with path in the filename)
    Args:
        avro_filename (String): Path/filename of the .avro file to get records from
    Returns:
        avro_records (List of Record Objs from .avro): List of Records read from
            the .avro file
    """

avro_schema_helpers.py

Avro Schema Helpers. This library is used to interact with .avsc files.

Functions:

def get_schema_from_avsc_file(avsc_filename):
    """
    Purpose:
        Get the file schema from an .avsc filename (with path in the filename)
    Args:
        avsc_filename (String): Path/filename of the .avsc file to get the schema from
    Return:
        avro_schema (AVRO Schema Object): Schema object from the avro library
    """

avro_writing_helpers.py

Writing Helpers. This library is used to aid in the tasks of writing .avro files.

Functions:

def write_raw_records_to_avro(raw_records, avro_filename, avro_schema):
    """
    Purpose:
        Write Records to .avro File
    Args:
        raw_records (List of Dicts): List of Records to Write to AVRO as Bytes
        avro_filename (String): Filename and path of .avro to write
        avro_schema (AVRO Schema Object): Schema object from the avro library
    Returns:
        N/A
    """
def serialize_data(raw_records, avro_schema):
    """
    Purpose:
        Serialize a record as bytes
    Args:
        raw_records (List of Dicts): List of Records to Serialize
        avro_schema (AVRO Schema Object): Schema object from the avro library
    Return:
        serialized_records (List of Byte Array): Records Serialized into Byte-Array
    """

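Example Scripts

Example executable Python scripts/modules for testing and interacting with the library. These examples show use cases for the library and can be used as templates for developing with the library or for one-off development efforts.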

read_avro_file.py

    Purpose:
        Read an .avro File

    Steps:
        - Either
            - Read .avro File as Buffered List
            - Read .avro File as Generator

    function call:
        python3 read_avro_file.py {--avro=avro_filename}

    example call:
        python3 read_avro_file.py --avro="./data/test_data.avro"

read_avsc_file.py

    Purpose:
        Read an .avsc File to get the schema

    Steps:
        - Read .avsc Schema

    function call:
        python3 read_avsc_file.py {--avsc=avsc_filename}

    example call:
        python3 read_avsc_file.py --avsc="./avsc/test_schema.avsc"

write_avro_file.py

    Purpose:
        Write an .avro File

    Steps:
        - Either
            - Write .avro File

    function call:
        python3.6 write_avro_file.py {--avro=avro_filename} \
            {--avsc=avsc_filename}

    example call:
        python3.6 write_avro_file.py --avro="./data/generated_data.avro" \
            --avsc="./avsc/test_schema.avsc"

Notes

  • Relies on f-string notation, which is limited to Python 3.6 and later. A refactor to remove these would allow development with Python 3.0.x through 3.5.x
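As an illustration of the refactor mentioned in the note, an f-string can be replaced by an equivalent `str.format` call, which works on Python versions before 3.6. The variable names and message are made up for this example.

```python
avro_filename = "./data/test_data.avro"
record_count = 3

# Python 3.6+ only: f-string interpolation.
with_f_string = f"Read {record_count} records from {avro_filename}"

# Works on earlier Python 3 releases as well: str.format.
with_format = "Read {} records from {}".format(record_count, avro_filename)

print(with_f_string == with_format)
```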

TODO

  • The unittest framework is in place, but tests are lacking
