Ziggy Python package - PyPI

Python modules for Hadoop streaming

Ziggy project description

Ziggy

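ziggy provides a collection of Python methods for Hadoop streaming. Ziggy is useful for building complex MapReduce programs, using Hadoop for batch processing of many files, Monte Carlo processes, graph algorithms, and common utility tasks (e.g. sort, search). Typical usage often looks like this: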

#!/usr/bin/env python

import ziggy.hdmc as hdmc
from glob import glob

# script_to_run, output_filename and argument_string are supplied by the caller
files_to_process = glob("/some/path/*")
results = hdmc.submit_checkpoint_inline(script_to_run, output_filename, files_to_process, argument_string)

To install, run:

python setup.py hadoop
python setup.py install

Ziggy was written by Dan McClary, Ph.D. and originates from the Amaral Lab at Northwestern University.

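Installation details

Unsurprisingly, ziggy requires a Hadoop cluster. To make ziggy work with your cluster, edit the setup.cfg file before running python setup.py hadoop. This ensures that ziggy creates the correct configuration files for its modules.

setup.cfg currently contains 3 definitions that must be specified. These are: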

Hadoop home
The HADOOP_HOME for your system. For example, the default on our clusters at Northwestern is /usr/local/hadoop.
Number of map tasks
The total number of map tasks for which your cluster is configured. The default is 20.
Shared tmp space
This is the path to a shared space (usually via NFS) available to all nodes on your Hadoop cluster. While this space is not necessary for building and executing custom Hadoop-streaming calls, the “checkpointing” calls in HDMC require a shared directory from which to coordinate task and data distribution.

Once you have set these to your liking, run python setup.py hadoop to create the hadoop_config.py module. Then run python setup.py install to install ziggy.
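
As a rough illustration of the three settings, the sketch below parses setup.cfg-style text with Python's standard configparser module. The section name hadoop and the keys hadoop-home, num-map-tasks and shared-tmp-space are hypothetical names chosen for this example; check the setup.cfg shipped with ziggy for the real ones. The shared path is a placeholder.

```python
import configparser

# Hypothetical section and key names; the setup.cfg shipped with ziggy may differ.
EXAMPLE_CFG = """
[hadoop]
hadoop-home = /usr/local/hadoop
num-map-tasks = 20
shared-tmp-space = /path/to/shared/tmp
"""

def read_hadoop_settings(text):
    """Parse the three required settings from setup.cfg-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["hadoop"]
    return {
        "hadoop_home": section["hadoop-home"],
        "num_map_tasks": section.getint("num-map-tasks"),
        "shared_tmp_space": section["shared-tmp-space"],
    }

settings = read_hadoop_settings(EXAMPLE_CFG)
print(settings)
```

Reading the map-task count as an integer catches a mistyped value before any Hadoop call is built.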

Ziggy's features

hdmc

hdmc provides 3 basic ways of interacting with a Hadoop server. Import it with import ziggy.hdmc. The types of interaction are:

Call assembly
Building and executing custom Hadoop streaming calls. This is done using the build_generic_hadoop_call and execute_and_wait methods, among others (see the examples).
Monte Carlo mapping
Running Monte Carlo-type operations by providing only a mapping script and a number of iterations. This is done using methods such as submit_inline (see the examples).
Data/argument distribution
Processing several datafiles or a list of arguments in parallel across mappers. This is done using methods such as submit_checkpoint_inline (see the examples).

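It is worth noting that Monte Carlo mapping and data distribution violate the spirit of Hadoop. However, they do provide a very simple way to emulate traditional compute-cluster tasks without requiring cluster management along the lines of SGE or Torque. Similarly, they do not require a real cluster, just a Hadoop installation.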
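
To make the Monte Carlo idea concrete, here is a minimal sketch of the kind of script that could be run many times: one independent estimate of pi per invocation. It assumes the script simply prints its result to standard output; how ziggy invokes the script and collects its output is not described here, and the name estimate_pi is hypothetical.

```python
import random

def estimate_pi(samples, rng):
    """Estimate pi by sampling points uniformly in the unit square."""
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples

if __name__ == "__main__":
    # Each run is an independent trial; averaging many runs tightens the estimate.
    print(estimate_pi(100000, random.Random()))
```

Because every run is independent, the trials need no communication with each other, which is what makes this pattern easy to spread across mappers.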

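hdfs

ziggy provides methods for interacting with the HDFS distributed file system from within Python. These methods can be accessed by importing, for example, import ziggy.hdmc.hdfs. The method calls mimic those found under hadoop dfs.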

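Utilities

ziggy provides a number of simple utilities for manipulating very large data sets with Hadoop. The utilities provided include: search, grep, numerical sort, and ASCII sort. Each is accessible under ziggy.util. Note that ziggy.util.search reports filenames and line numbers within an HDFS directory or file, while ziggy.util.grep returns the lines themselves.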
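
The difference between the two result shapes can be illustrated locally, without Hadoop. The functions below are plain-Python stand-ins written for this example, not ziggy's implementation: search_lines reports where a pattern occurs, and grep_lines returns the matching lines.

```python
import re

def search_lines(pattern, named_lines):
    """Return (filename, line_number) pairs for lines matching pattern."""
    regex = re.compile(pattern)
    return [(name, number)
            for name, lines in named_lines.items()
            for number, line in enumerate(lines, start=1)
            if regex.search(line)]

def grep_lines(pattern, named_lines):
    """Return the matching lines themselves."""
    regex = re.compile(pattern)
    return [line
            for lines in named_lines.values()
            for line in lines
            if regex.search(line)]

# In-memory stand-in for files: filename -> list of lines.
data = {"a.txt": ["alpha", "beta hadoop"], "b.txt": ["hadoop streaming"]}
print(search_lines("hadoop", data))
print(grep_lines("hadoop", data))
```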

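GraphReduce
Although Hadoop's map/reduce paradigm is not well suited to graph algorithms, the GraphReduce module allows some graph analysis on a Hadoop cluster. Analysis is currently limited to: degree-based measures, shortest-path-based measures, PageRank measures, and connected-component measures. With the exception of PageRank, all path-derived measures rely on a parallel breadth-first search. For more information, see the epydoc documentation. The module is accessed by importing `ziggy.GraphReduce`.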
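
For readers unfamiliar with breadth-first search in a map/reduce setting, the sketch below runs a common iterative formulation locally: each map step proposes distance + 1 to the neighbours of every reached node, and each reduce step keeps the minimum distance proposed per node. This is a generic illustration under that assumption, not the GraphReduce code, and all function names are hypothetical.

```python
INF = float("inf")

def bfs_map(node, distance, neighbours):
    """Emit the node's own distance plus a candidate distance for each neighbour."""
    yield node, distance
    if distance != INF:
        for neighbour in neighbours:
            yield neighbour, distance + 1

def bfs_reduce(candidates):
    """Keep the smallest distance proposed for a node."""
    return min(candidates)

def bfs_distances(graph, source):
    """Repeat map and reduce rounds until no distance changes."""
    distances = {node: INF for node in graph}
    distances[source] = 0
    while True:
        grouped = {}
        for node, neighbours in graph.items():
            for key, value in bfs_map(node, distances[node], neighbours):
                grouped.setdefault(key, []).append(value)
        updated = {node: bfs_reduce(values) for node, values in grouped.items()}
        if updated == distances:
            return distances
        distances = updated

# Adjacency lists; every neighbour must also appear as a key.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": []}
result = bfs_distances(graph, "a")
print(result)
```

Each round corresponds to one map/reduce job, so the number of jobs grows with the depth of the search; unreachable nodes keep an infinite distance.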

Examples

Building a custom Hadoop streaming call:

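import ziggy.hdmc as hdmc
import ziggy.hdmc.hdfs as hdfs
# placeholder paths; replace with your own
localfilename = '/path/to/local/input_file'
hdfs_input_filename = 'hdfs_relative/input_filename'
# load data to hdfs
hdfs.copyToHDFS(localfilename, hdfs_input_filename)
mapper = '/path/to/mapper.py'
reducer = '/path/to/reducer.py'
output_filename = 'hdfs_relative/output_filename'
# placeholder: list of files the mappers require
supporting_files = ['/path/to/supporting_file.py']
maps = 20
# build the streaming call, then run it and wait for completion
hadoop_call = hdmc.build_generic_hadoop_call(mapper, reducer, hdfs_input_filename, output_filename, supporting_files, maps)
hdmc.execute_and_wait(hadoop_call)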
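
The mapper and reducer given to a Hadoop streaming call are ordinary executables that read lines from standard input and write tab-separated key/value lines to standard output. As a sketch of what /path/to/mapper.py and /path/to/reducer.py might contain, here is a word count expressed as two functions. The function names are hypothetical, and in practice each would live in its own script that loops over sys.stdin.

```python
from itertools import groupby

def map_lines(lines):
    """Mapper: emit 'word<TAB>1' for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word

def reduce_lines(sorted_lines):
    """Reducer: sum the counts of each word; input must be sorted by key."""
    pairs = (line.split("\t") for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))

# Local simulation of map -> sort -> reduce, the order Hadoop streaming applies.
mapped = sorted(map_lines(["to be or", "not to be"]))
for line in reduce_lines(mapped):
    print(line)
```

Simulating the pipeline locally like this is a cheap way to check a mapper and reducer before submitting them to a cluster.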

Building a Monte Carlo job:

import ziggy.hdmc as hdmc
mapper = '/path/to/job_with_needs_to_be_done_many_times.py'
iterations = 1000
output_file = 'output_filename'
hdmc.submit_inline(mapper, output_file, iterations)

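Building a task distribution job:

import ziggy.hdmc as hdmc
# placeholder URLs; replace with your own list of URL strings
url_list = ['http://example.com/a', 'http://example.com/b']
mapper = '/path/to/script/which/takes/a/url/as/sys.argv[1].py'
output_filename = 'output_file_name'
supporting_files = []
# files=False: the list entries are arguments (URLs here), not data files
hdmc.submit_checkpoint_inline(mapper, output_filename, url_list, supporting_files, files=False)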

Building a data distribution job:

import ziggy.hdmc as hdmc
from glob import glob
# a list of filenames, usually provided by glob
file_list = glob('/some/path/*')
mapper = '/path/to/script/which/takes/a/filename/as/sys.argv[1].py'
output_filename = 'output_file_name'
# placeholder: filenames the mapper needs
supporting_files = ['/path/to/supporting_file.py']
# files=True: the list entries are data files
hdmc.submit_checkpoint_inline(mapper, output_filename, file_list, supporting_files, files=True)
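
The mapper in the two checkpointing examples is described as a script that receives one URL or filename as sys.argv[1]. A minimal sketch of such a script for the filename case follows. Counting lines is an arbitrary stand-in for real work, count_lines is a hypothetical name, and writing the result to standard output is an assumption about how results are collected.

```python
import sys

def count_lines(path):
    """Return the number of lines in the file at path."""
    with open(path) as handle:
        return sum(1 for _ in handle)

if __name__ == "__main__" and len(sys.argv) == 2:
    # The file to process arrives as the first command-line argument.
    filename = sys.argv[1]
    print("%s\t%d" % (filename, count_lines(filename)))
```

Keeping the work in a function separate from the argument handling lets the same logic be tested locally on a single file.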

Ziggy 0.1.3.1