Python address_extractor package - PyPI

A script to extract US-style street addresses from a text file.

$ address_extractor
1600 Pennsylvania Ave NW, Washington, DC 20500 ^D
1 lines in input
,1600 Pennsylvania Ave NW,Washington DC 20500
$ address_extractor -o output.csv input.csv
4361 lines in input
*snip*
11 lines unable to be parsed
$ ls
output.csv

address_extractor takes text, or a text file, containing address-like data with one address per line, and parses it into a uniform format using the usaddress package.
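As a rough illustration of the parsing step, the sketch below groups the components returned by a usaddress-style tagger into a fixed set of columns. It is not the package's actual code: the function name `to_row`, the column choice, and the injectable `tagger` parameter are assumptions made for illustration. When no tagger is passed, it falls back to `usaddress.tag`, which returns a mapping of labels to text plus an address type.

```python
# Labels that usaddress assigns to the parts of a street line, in reading order.
STREET_LABELS = (
    "AddressNumber",
    "StreetNamePreDirectional",
    "StreetName",
    "StreetNamePostType",
    "StreetNamePostDirectional",
)


def to_row(line, tagger=None):
    """Return a (street, city, state, zip) tuple for one address line.

    `tagger` is a hypothetical hook: any callable with the same interface
    as usaddress.tag, i.e. it returns (mapping of label -> text, address type).
    """
    if tagger is None:
        import usaddress  # third-party; imported lazily so the hook can replace it

        tagger = usaddress.tag
    tagged, _address_type = tagger(line)
    # Join whichever street-level components were found, skipping missing ones.
    street = " ".join(tagged[label] for label in STREET_LABELS if label in tagged)
    return (
        street,
        tagged.get("PlaceName", ""),
        tagged.get("StateName", ""),
        tagged.get("ZipCode", ""),
    )
```

Passing the tagger in explicitly keeps the sketch testable without usaddress installed; real input may also make `usaddress.tag` raise `usaddress.RepeatedLabelError`, which this sketch does not handle.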

Installation

From PyPI via pip:

pip install address_extractor

This installs the module as well as the command-line script address_extractor.

Command-line usage

address_extractor [-h] [-o OUTPUT] [--remove-post-zip] [input]

positional arguments:
  input                 the input file. Defaults to stdin.

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        the output file. Defaults to stdout.
  --remove-post-zip, -r
                        when scanning the input lines, remove everything after
                        a sequence of 5 digits followed by a comma. The
                        parsing library used by this script chokes on
                        addresses containing this kind of information, often a
                        county name.
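
The effect of --remove-post-zip can be approximated with a single regular expression. This is a minimal sketch based only on the help text above, not the script's own implementation; the name `strip_post_zip` is hypothetical, and dropping the comma together with the trailing text is an assumption.

```python
import re

# A run of five digits (a ZIP code) followed by a comma, then the rest of the line.
_POST_ZIP = re.compile(r"(\d{5}),.*$")


def strip_post_zip(line):
    """Keep the text up to and including the 5-digit run; drop the comma and the rest.

    Assumption: the comma itself is removed along with the trailing text.
    Lines without a 5-digit run followed by a comma are returned unchanged
    (apart from the trailing newline).
    """
    return _POST_ZIP.sub(r"\1", line.rstrip("\n"))
```

For example, an input such as "123 Main St, Springfield, IL 62704, Sangamon County" would be cut back to end at the ZIP code.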

Lines that cannot be parsed are printed to STDERR. They can be saved to a file using standard bash redirection:

$ address_extractor -o good_addresses.csv has_some_bad_addresses.txt 2> bad_addresses.txt

Usage as a module

address_extractor can be used as a Python module:

>>> import address_extractor
>>> address_extractor.main(input=input_file_object, output=output_file_object, remove_post_zip=a_bool)

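This implementation has a few minor issues:

If sys.stdin or sys.stdout is used as the input or output, respectively, the file object is still closed afterwards. This causes problems for any later attempt to use it.
Bad lines are still printed to sys.stderr, which may not be what is expected.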
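
Both issues can be worked around from the calling side. The sketch below uses only the standard library; the helper names `duplicate_stream` and `run_capturing_errors` are hypothetical, and the stderr capture assumes the module looks up `sys.stderr` at the time it writes, which the source does not confirm.

```python
import contextlib
import io
import os
import sys


def duplicate_stream(stream, mode):
    """Return a new file object on a duplicate of `stream`'s file descriptor.

    Closing the duplicate leaves the original stream open, so it is safe to
    hand the duplicate to code that closes its input or output.
    """
    return os.fdopen(os.dup(stream.fileno()), mode)


def run_capturing_errors(func, **kwargs):
    """Call func(**kwargs) and return everything it wrote to sys.stderr."""
    buffer = io.StringIO()
    with contextlib.redirect_stderr(buffer):
        func(**kwargs)
    return buffer.getvalue()


# Possible use with the module (not executed here):
#
#   import address_extractor
#   bad_lines = run_capturing_errors(
#       address_extractor.main,
#       input=duplicate_stream(sys.stdin, "r"),
#       output=duplicate_stream(sys.stdout, "w"),
#       remove_post_zip=False,
#   )
```

Duplicating the descriptor needs a real underlying file, so it will not work on in-memory streams such as io.StringIO.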

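Version and stability

This package was uploaded as version 0.1.0. It has no tests and very little error checking. It began as a quick-and-dirty script that I decided to release as a package in order to get familiar with the process.

Questions, comments, and requests are welcome on the GitHub page!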

address_extractor 0.1.0.post1
