这是一个用于编码的子模块包


deepair-encoder的Python项目详细描述

深空编码器

此包用于对计算机兼容数据帧的数据字段进行编码。

包装结构

deepair_encoder

——编码器.py -init.py ——公用事业 ——编码器工具.py -init.py ——记录器.py ——拆分器.py

1个目录,6个文件

依赖关系

注意:运行此包需要以下python3包:

  • 努比
  • scipy
  • 熊猫
  • sklearn
  • 制表
  • 全面质量管理

函数声明

下面是可用于DeepAir-dev的包中函数的签名。

编码器.py

下面是可以通过将此模块导入为from deepair_encoder.encoder import <function_name>来访问的函数。

encode_username

def encode_username(df, drop_field=True):
    '''
        Username one-hot encoder.
        inputs:
            df: Dataframe which have username column (pandas df series)
            drop_field: a flag if the usename column should be dropped or not after encoding (bool)
        return:
            df: Dataframe which have only username indicator [0/1] (pandas df series)
    '''

encode_discounts

def encode_discounts(df, drop_field=True):
    '''
        Discounts encoder.
        inputs:
            df: dataframe which has discounts column (pandas df series)
            drop_field: a flag if the discounts column should be dropped or not after encoding (bool)
        return:
            df: a dataframe with 3 new columns 'PROMOCODE', 'RES', 'LFG' and discounts droped if drop_field = True
    '''

minmax_score

def minmax_score(df, fields, keyval=['requestid', 'direction_onward'], drop_field=True, verbose=True):
    '''
        scorer function to normalize from 0-1 inversely to the value.
        inputs:
            df: Dataframe which contains fields (pandas df series)
            fields: Dataframe columns on which the score is calculated (list)
            keyval: Group the data by this key values (list)
            drop_field: Indicator to replace fields by new scored fields (bool)
            verbose: Progress bar indicator (bool)
        return:
            field: Dataframe with a score for totalprice relative to whole requestid-direction, by index, by faregroup  (pandas df series)
    '''

minmax_normalize

def minmax_normalize(df, fields, keyval=['requestid', 'direction_onward'], drop_field=True, verbose=True):
    '''
        normalizer function to normalize from 0-1 proportional to the value.
        inputs:
            df: Dataframe which contains fields (pandas df series)
            fields: Dataframe columns on which the normalication is calculated (list)
            keyval: Group the data by this key values (list)
            drop_field: Indicator to replace fields by normalized new fields (bool)
            verbose: Progress bar indicator (bool)
        return:
            field: Dataframe with a score for totalprice relative to whole requestid-direction, by index, by faregroup  (pandas df series)
    '''

encode_totalprice

def encode_totalprice(df):
    '''
        total price encoder.
        inputs:
            df: Dataframe which have totalprice and totaltaxes column (pandas df series)
        return:
            df: Dataframe with a score for totalprice relative to whole requestid-direction, by index, by faregroup  (pandas df series)
    '''

encode_bookingid

def encode_bookingid(df):
    '''
        Bookingid flag one-hot encoder.
        inputs:
            df: Dataframe which have Bookingid column (pandas df series)
        return:
            df: Dataframe which have only Bookingid indicator [0/1] (pandas df series)
    '''

encode_faregroup

def encode_faregroup(raw_file, df, verbose=False):
    '''
        Faregroup encoder.
        inputs:
            raw_file: raw_file location where the faregroup_definition is available
            df: Dataframe which have faregroup column (pandas df series)
            verbose: Optional, if more detailed log is needed
        return:
            field: Dataframe with extra fields with faregroup attributes [-1/0/1] (pandas df series)
            Note: -1 = not available, 0 = available for a fee, 1 = available for free (at no charge)
    '''

encode_datetime

def encode_datetime(df, field, verbose=False):
    '''
        Datetime encoder.
        inputs:
            df: Dataframe which have 'field' column (pandas df series)
            field: the field that has to be encoded
        return:
            df: Dataframe with 6 ecoded values capturing TOD, DOW and WOY
    '''

encode_advanced_purchase

def encode_advanced_purchase(df, dptr_field='departuredate', sales_field='utctimestamp'):
    '''
        advanced purchase encoder.
        inputs:
            df: Dataframe which have dptr_field and sales_field column (pandas df series)
        return:
            df: Dataframe advanced purchase column
    '''

encode_los

def encode_los(df, verbose=False):
    '''
        los and trip_type encoder.
        inputs:
            df: Dataframe which have departuredate, requestid and direction column (pandas df series)
        return:
            df: Dataframe los and trip_type encodes column
    '''

encode_airports

def encode_airports(df, fields, reference_file_path, verbose=True):
    '''
        encode the airports.
        inputs:
            df                      : Dataframe which have airports (pandas df series)
            fields                  : list of fields that you want to run this
                                        function on (list)
            reference_file_path     : path to the codes file (string)
            verbose                 : Indicator for progress bar (bool)
        return:
            df: Dataframe encoded with airports values
    '''

encode_city

def encode_city(df, fields, reference_file_path, verbose=True):
    '''
        encode the airports.
        inputs:
            df                      : Dataframe which have city (pandas df series)
            fields                  : list of fields that you want to run this
                                        function on (list)
            reference_file_path     : path to the codes file (string)
            verbose                 : Indicator for progress bar (bool)
        return:
            df: Dataframe encoded with airports values
    '''

utils

此子包包含低于encoder模块级别的工具,即编码器使用的那些模块。

编码器工具

下面是可以通过将此模块导入为from deepair_encoder.utils.encoder_tools import <function_name>来访问的函数。

one_hot_encoder

def one_hot_encoder(df, fields, fields_drop=True, verbose=True, classes=None):
    '''
        Converts the fields into one hot encoding.
        inputs:
            df: Dataframe containing all those fields (pandas df)
            fields: Dataframe columns that you want to convert to one hot (string)
            verbose: Indicator for progress bar (bool)
            classes: Dictionary for the desired columns (dict)
        return:
            df: updated dataframe (pandas df)
    '''

integer_encoder

def integer_encoder(df, fields, verbose=True):
    '''
        Converts the fields into integer encoding.
        inputs:
            df: Dataframe containing all those fields (pandas df)
            fields: Dataframe columns that you want to convert to one hot (string)
            verbose: Indicator for progress bar (bool)
        returns:
            df: updated dataframe (pandas df)
    '''

get_wom

def get_wom(field):
    '''
        Function for week of the month.
        inputs:
            field: Dataframe column which have timestamp (pandas df series)
        return:
            field: Dataframe column which have wom (pandas df series)
    '''

get_dow

def get_dow(field):
    '''
        Function for day of the week.
        inputs:
            field: Dataframe column which have timestamp (pandas df series)
        return:
            field: Dataframe column which have dow (pandas df series)
    '''

get_month

def get_month(field):
    '''
        Function for month.
        inputs:
            field: Dataframe column which have timestamp (pandas df series)
        return:
            field: Dataframe column which have only month (pandas df series)
    '''

get_year

def get_year(field):
    '''
        Function for year.
        inputs:
            field: Dataframe column which have timestamp (pandas df series)
        return:
            field: Dataframe column which have only year (pandas df series)
    '''

obj2num

def obj2num(df, fields, verbose=True):
    '''
        Converts the fields into numeric from obj data type.
        inputs:
            df: Dataframe containing all those fields (pandas df)
            fields: Dataframe columns that you want to convert to numeric (string)
            verbose: Indicator for progress bar (bool)
        returns:
            df: updated dataframe (pandas df)
    '''

记录器

下面是可以通过将此模块导入为from deepair_encoder.utils.logger import <function_name>来访问的函数。

unique_stats

def unique_stats(df):

_log_with_timestamp

def _log_with_timestamp(message):
    '''
        prints message on console
        input :
            message     : msg to print (string)
    '''

分离器

下面是可以通过将此模块导入为from deepair_encoder.utils.splitters import <function_name>来访问的函数。

type_splitter

def type_splitter(data, keys=['passengertypes'], newheadings=[['ADT', 'CHD', 'INF']]):
    '''
        Split the column (need to make it more generic).
        inputs:
            data: dataframe which contain this column (pd dataframe)
            keys: list of column fields to split(list)
            newheadings: list of list for header columns in keys (list)
        returns:
            data: updated dataframe (pandas df)
    '''

split_datewise

def split_datewise(df, directory='', postfix='', mode='w'):
    '''
        Split a Dataframe according to dates in UTC-Timestamp:

        input :-
            DF          : dataframe which contain this column (pd dataframe)
            directory   : path to the target directory (string)
            postfix     : any postfix you want to add (string)
            mode        : writing mode ('w':writing, 'a': append)
    '''

推荐PyPI第三方库


热门话题
scrip ropemode soco tms generatite find fabric2 prototypes mar headless zigate iperf pymontecarlo blockchain 口哨声 predicate tjson 格点 metro goodman hail 剪刀 luck mining terminals 橱柜 introspection 熔融 opensky tlm eida repackage ixml 剂量 uvision smd neodroid many2one lizard adw0rd wagtailgmaps cvxopt dnb peeper mediaellent anto 分词 槟榔 bibcure apim