rpatterson.stripdupes Python package - PyPI

Strip duplicated sequences of lines from files

Detailed description of the rpatterson.stripdupes Python project

Installation

$ easy_install rpatterson.stripdupes

Usage

See the help message of the stripdupes console script.

>>> import subprocess
>>> popen = subprocess.Popen(
...     [stripdupes_script, '--help'],
...     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
>>> print popen.stdout.read()
Usage: stripdupes [options]
Strip duplicated sequences of lines.
Options:
  -h, --help  show this help message and exit
  -m NUM, --min=NUM  Minimum length of duplicated sequence.  If
                     NUM is less than one, use a proportion of the
                     total number of lines, otherwise NUM is a
                     number of lines. [default: 0.01]
  -p REGEXP, --pattern=REGEXP
                        Regular expression pattern used to
                        normalize strings in sequences of strings.
                        The default matches all whitespace. Use an
                        empty string to disable. [default: '\s+']
  -r STRING, --repl=STRING
                        String to replace matches of pattern with
                        for normalizing strings in sequences of
                        strings. [default: ' ']
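
The --pattern and --repl options describe a normalization step that is applied before lines are compared. A minimal sketch of that step with the standard re module, assuming Python 3; the helper name normalize is hypothetical and is not part of the package:

```python
import re

def normalize(line, pattern=r'\s+', repl=' '):
    """Replace every match of pattern with repl; an empty pattern disables it."""
    if not pattern:
        return line
    return re.sub(pattern, repl, line)

# Lines that differ only in whitespace compare equal once normalized.
print(normalize('foo\t\n') == normalize('foo\n'))  # True
# With normalization disabled, the trailing tab makes them differ.
print(normalize('foo\t\n', pattern='') == normalize('foo\n', pattern=''))  # False
```

The trailing newline matters here: it is part of the whitespace run that gets collapsed, so 'foo\t\n' and 'foo\n' both become 'foo ' followed by nothing else.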

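When the combined contents of the given input files include duplicated sequences of lines that are longer than the minimum threshold, the output does not repeat those sequences.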

>>> input = """\
... foo
... foo
... bar
... baz
... qux
... quux
... foo
... bar
... baz
... qux
... bah
... blah1
... quux
... blah
... quux
... fin
... """

>>> import cStringIO
>>> from rpatterson import stripdupes
>>> for line in stripdupes.strip(
...     cStringIO.StringIO(input).readlines()): print line,
foo
bar
baz
qux
quux
bah
blah1
blah
fin

>>> input = """\
... blah
... quux
... bah
... foo
... foo\t
... bar
... baz
... qux
... quux
... foo
... bar
... baz
... qux
... fin
... fin
... fin
... null
... fin
... """

>>> for line in stripdupes.strip(
...     cStringIO.StringIO(input).readlines()): print line,
blah
quux
bah
foo
bar
baz
qux
fin
null
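
Both sample inputs above are short (16 and 18 lines), so the default proportional threshold of 0.01 works out to well under one line, and the printed results coincide with dropping every line that was already seen after whitespace normalization. The sketch below reproduces only that special case in Python 3. The name strip_seen is hypothetical, and this is not the package's algorithm, which works on duplicated sequences with a minimum length:

```python
import re

def strip_seen(lines, pattern=r'\s+', repl=' '):
    """Yield each line whose normalized form has not been yielded before."""
    seen = set()
    for line in lines:
        # Normalize only when a pattern is given (None or '' disables it).
        key = re.sub(pattern, repl, line) if pattern else line
        if key not in seen:
            seen.add(key)
            yield line

text = "foo\nfoo\t\nbar\nbaz\nfoo\nbar\nfin\n"
# Prints foo, bar, baz and fin, one per line.
print(''.join(strip_seen(text.splitlines(keepends=True))), end='')
```

For longer inputs the threshold grows with the number of lines, so short repeats would be kept and this simplification would no longer match.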

Make sure that odd sequences, such as empty or single-item ones, can be handled.

>>> list(stripdupes.strip([]))
[]
>>> list(stripdupes.strip(['foo']))
['foo']

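Check the case where the length of the duplicated sequence is right at the minimum length. With the default minimum of 0.01, a single duplicated item is kept in a 150-item sequence but stripped from a 149-item sequence.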

>>> seq = range(149)+[0]
>>> len(seq)
150
>>> seq[0] == seq[149]
True
>>> len(list(stripdupes.strip(seq, pattern=None)))
150

>>> seq = range(148)+[0]
>>> len(seq)
149
>>> seq[0] == seq[148]
True
>>> len(list(stripdupes.strip(seq, pattern=None)))
148
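
The two sessions above differ only in total length: the single trailing duplicate survives among 150 items but is stripped among 149. One reading that is consistent with both results is that a proportional --min value is multiplied by the number of lines and rounded to the nearest whole number, which gives 2 and 1 respectively. That rounding rule is an assumption inferred from the output, not something the package documents, and min_length below is a hypothetical helper written for Python 3:

```python
def min_length(num, total):
    """Turn a --min style value into a line count (assumed rounding rule)."""
    if num < 1:
        # NUM below one is a proportion of the total number of lines.
        return round(num * total)
    # Otherwise NUM is already a number of lines.
    return int(num)

for total in (150, 149):
    # A one-item duplicate meets the threshold only when the threshold is 1.
    print(total, min_length(0.01, total))
```

Under this assumption the threshold is 2 for 150 items, so a one-item repeat is too short to strip, and 1 for 149 items, so it is removed.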

Changelog

0.1 - 2009-05-27

Initial release
