A stream-oriented CSV modification tool

Detailed description of the csvsed Python project


A stream-oriented CSV modification tool. Like a stripped-down "sed" command, but for tabular data.

tl;dr

Install:

$ pip install csvsed

Use:

# given a sample CSV
$ cat sample.csv

Employee ID,Age,Wage,Status
8783,47,"104,343,873.83","All good, but nowhere to go."
2003,32,"98,878,784.00",A-OK

# modify that data with a series of `csvsed` pipes
$ cat sample.csv |
  csvsed -c Wage 's/,//g' |                         # remove commas from the Wage column
  csvsed -c Status 'y/A-Z/a-z/' |                   # convert Status to all lowercase
  csvsed -c Status 's/.*(ok|good).*/\1/' |          # restrict to keywords 'ok' & 'good'
  csvsed -c Age 'e/xargs -I {} echo "{}*2" | bc/'   # double the Age column

Employee ID,Age,Wage,Status
8783,94,104343873.83,good
2003,64,98878784.00,ok

Installation

$ pip install csvsed

Usage and Examples

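Installing the csvsed Python package also installs the csvsed command-line tool. Run csvsed --help for all command-line options; here are some examples to get you going. Given the input file sample.csv: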

Employee ID,Age,Wage,Status
8783,47,"104,343,873.83","All good, but nowhere to go."
2003,32,"98,878,784.00",A-OK

Remove the thousands separators from the "Wage" column using the "s" (substitute) modifier:

$ cat sample.csv | csvsed -c Wage s/,//g
Employee ID,Age,Wage,Status
8783,47,104343873.83,"All good, but nowhere to go."
2003,32,98878784.00,A-OK
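
For readers who want to see what the substitution amounts to, here is a minimal standard-library sketch of the same per-column edit. It is not csvsed's implementation; the helper name `substitute_column` and the in-memory `SAMPLE` string are assumptions made for illustration.

```python
import csv
import io
import re

# Illustrative copy of sample.csv, held in memory for the sketch.
SAMPLE = (
    'Employee ID,Age,Wage,Status\n'
    '8783,47,"104,343,873.83","All good, but nowhere to go."\n'
    '2003,32,"98,878,784.00",A-OK\n'
)

def substitute_column(text, column, pattern, repl):
    """Apply re.sub to one named column of CSV text and return the new CSV text."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader)
    idx = header.index(column)      # raises ValueError if the column is missing
    writer.writerow(header)
    for row in reader:
        row[idx] = re.sub(pattern, repl, row[idx])
        writer.writerow(row)        # fields still containing commas stay quoted
    return out.getvalue()

result = substitute_column(SAMPLE, "Wage", r",", "")
print(result, end="")
```

Only the selected column is touched; the Status field keeps its embedded comma and therefore its quotes.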

Convert and extract some text using the "s" (substitute) and "y" (transliterate) modifiers:

$ cat sample.csv | csvsed -c Status 's/^All (.*),.*/\1/' \
| csvsed -c Status 's/^A-(.*)/\1/' \
| csvsed -c Status 'y/a-z/A-Z/'
Employee ID,Age,Wage,Status
8783,47,"104,343,873.83",GOOD
2003,32,"98,878,784.00",OK
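
The "y" modifier maps each character in one set to the character at the same position in another. A rough sketch of that idea in plain Python follows; the helpers `expand_ranges` and `transliterate` are hypothetical names, and the range handling is an assumption about how a spec such as a-z is read, not a description of csvsed's internals.

```python
def expand_ranges(spec):
    """Expand 'a-z' style ranges into the full character sequence."""
    out = []
    i = 0
    while i < len(spec):
        # A range is three characters: start, '-', end.
        if i + 2 < len(spec) and spec[i + 1] == "-":
            start, end = ord(spec[i]), ord(spec[i + 2])
            out.extend(chr(c) for c in range(start, end + 1))
            i += 3
        else:
            out.append(spec[i])
            i += 1
    return "".join(out)

def transliterate(value, src, dst):
    """Replace each character of src found in value with its counterpart in dst."""
    # str.maketrans raises ValueError if the two sets differ in length.
    table = str.maketrans(expand_ranges(src), expand_ranges(dst))
    return value.translate(table)

print(transliterate("good", "a-z", "A-Z"))
print(transliterate("A-OK", "A-Z", "a-z"))
```

Characters outside the source set, such as the hyphen in "A-OK", pass through unchanged.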

Square the "Age" column using the "e" (execute) modifier:

$ cat sample.csv | csvsed -c Age 'e/xargs -I {} echo "{}^2" | bc/'
Employee ID,Age,Wage,Status
8783,2209,"104,343,873.83","All good, but nowhere to go."
2003,1024,"98,878,784.00",A-OK

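However, this calls the external program once for every value in the column, which is inefficient for large datasets. A more efficient approach is a "continuous" mode program. Assume the following id2name.py program, which reads a single-column CSV (the employee ID) from stdin and writes a CSV to stdout with each ID converted to a name: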

#!/usr/bin/env python
import sys, csvkit
table = {
  '8783': 'ElfenKyng',
  '2003': 'Stradivarius',
}
# NOTE: *not* using csvkit's reader because it reads-ahead
# causing problems since this must be stream-oriented...
writer = csvkit.CSVKitWriter(sys.stdout)
while True:
  item = sys.stdin.readline()
  if not item:
    break
  item = item.strip()
  writer.writerow([table[item] if item in table else item])
  sys.stdout.flush()

The following will then efficiently convert the "Employee ID" column to names:

$ cat sample.csv | csvsed -c 'Employee ID' 'e|./id2name.py|c'
Employee ID,Age,Wage,Status
ElfenKyng,47,"104,343,873.83","All good, but nowhere to go."
Stradivarius,32,"98,878,784.00",A-OK
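
To illustrate why a continuous program is cheaper, the sketch below starts one child process and exchanges a single line with it per value, instead of spawning a process per value. It shows the general pattern only and is not how csvsed is implemented; the inline `CHILD` script is a hypothetical stand-in for id2name.py, and `convert_all` is an illustrative name.

```python
import subprocess
import sys

# Hypothetical stand-in for id2name.py: one line in, one line out, flushed each time.
CHILD = r'''
import sys
table = {"8783": "ElfenKyng", "2003": "Stradivarius"}
while True:
    line = sys.stdin.readline()
    if not line:
        break
    key = line.strip()
    sys.stdout.write(table.get(key, key) + "\n")
    sys.stdout.flush()
'''

def convert_all(values):
    """Send each value to a single long-running child and collect its replies."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered pipes on the parent side
    )
    results = []
    try:
        for value in values:
            proc.stdin.write(value + "\n")
            proc.stdin.flush()
            # Relies on the child answering every input line with exactly one line.
            results.append(proc.stdout.readline().rstrip("\n"))
    finally:
        proc.stdin.close()   # EOF tells the child to exit its loop
        proc.wait()
        proc.stdout.close()
    return results

names = convert_all(["8783", "2003", "9999"])
print(names)
```

The pattern only works if the child flushes after every line, which is why id2name.py calls sys.stdout.flush() and avoids a read-ahead reader.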
