A stream-oriented CSV modification tool
csvsed: Python project description
A stream-oriented CSV modification tool. Like a stripped-down "sed", but for tabular data.
tl;dr
Install:
$ pip install csvsed
Use:
# given a sample CSV
$ cat sample.csv
Employee ID,Age,Wage,Status
8783,47,"104,343,873.83","All good, but nowhere to go."
2003,32,"98,878,784.00",A-OK

# modify that data with a series of `csvsed` pipes
$ cat sample.csv |
    csvsed -c Wage s/,//g |                          # remove commas from the Wage column
    csvsed -c Status 'y/A-Z/a-z/' |                  # convert Status to all lowercase
    csvsed -c Status 's/.*(ok|good).*/\1/' |         # restrict to keywords 'ok' & 'good'
    csvsed -c Age 'e/xargs -I {} echo "{}*2" | bc/'  # double the Age column
Employee ID,Age,Wage,Status
8783,94,104343873.83,good
2003,64,98878784.00,ok
Installation
$ pip install csvsed
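Usage and examples
Installing the csvsed Python package also installs the csvsed command-line tool. Use csvsed --help for all command-line options, but here are some examples to get you going. Given the input file sample.csv: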
Employee ID,Age,Wage,Status
8783,47,"104,343,873.83","All good, but nowhere to go."
2003,32,"98,878,784.00",A-OK
Remove the thousands separators from the "Wage" column using the "s" (substitute) modifier:
$ cat sample.csv | csvsed -c Wage s/,//g
Employee ID,Age,Wage,Status
8783,47,104343873.83,"All good, but nowhere to go."
2003,32,98878784.00,A-OK
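To make the column-scoped substitution concrete, here is a minimal Python sketch of the idea, not csvsed's actual code. It parses each row with the standard csv module so the quoted commas inside Wage and Status never split a field, runs re.sub on the one named column only, and writes the row back out. The function name sed_substitute is hypothetical, and mapping sed's "g" flag to count=0 (every match) versus count=1 (first match only) is an assumption modelled on sed.

```python
import csv
import io
import re


def sed_substitute(csv_text, column, pattern, repl, count=0):
    """Hypothetical sketch: apply re.sub to one named column of CSV text.

    count=0 replaces every match (like sed's "g" flag); count=1 replaces
    only the first match in each cell.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    idx = header.index(column)  # raises ValueError if the column is missing
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")  # quotes only when needed
    writer.writerow(header)
    for row in reader:
        row[idx] = re.sub(pattern, repl, row[idx], count=count)
        writer.writerow(row)
    return out.getvalue()


sample = (
    "Employee ID,Age,Wage,Status\n"
    '8783,47,"104,343,873.83","All good, but nowhere to go."\n'
    '2003,32,"98,878,784.00",A-OK\n'
)

print(sed_substitute(sample, "Wage", ",", ""), end="")
```

Because the writer re-quotes only fields that still contain a delimiter, the Wage values lose their quotes once the commas are gone, while the Status value keeps them.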
Convert/extract some text using the "s" (substitute) and "y" (transliterate) modifiers:
$ cat sample.csv \
    | csvsed -c Status 's/^All (.*),.*/\1/' \
    | csvsed -c Status 's/^A-(.*)/\1/' \
    | csvsed -c Status 'y/a-z/A-Z/'
Employee ID,Age,Wage,Status
8783,47,"104,343,873.83",GOOD
2003,32,"98,878,784.00",OK
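GNU sed's "y" command maps characters one-for-one and does not expand ranges, but the example above relies on "a-z" meaning the whole alphabet. A sketch of that behaviour in Python, assuming ranges are expanded before the translation table is built. expand_ranges and transliterate are hypothetical helpers, and the two re.sub calls mirror the two "s" steps of the pipeline.

```python
import re


def expand_ranges(spec):
    """Expand 'a-z' style ranges into the full list of characters."""
    chars = []
    i = 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "-":
            # "x-y" covers every code point from x through y inclusive
            chars.extend(chr(c) for c in range(ord(spec[i]), ord(spec[i + 2]) + 1))
            i += 3
        else:
            chars.append(spec[i])
            i += 1
    return "".join(chars)


def transliterate(value, source, dest):
    """Hypothetical y/source/dest/: map each source char to the dest char."""
    src, dst = expand_ranges(source), expand_ranges(dest)
    if len(src) != len(dst):
        raise ValueError("source and destination sets must be the same length")
    return value.translate(str.maketrans(src, dst))


for status in ["All good, but nowhere to go.", "A-OK"]:
    status = re.sub(r"^All (.*),.*", r"\1", status)  # keep the word after "All"
    status = re.sub(r"^A-(.*)", r"\1", status)       # drop a leading "A-"
    print(transliterate(status, "a-z", "A-Z"))       # prints GOOD, then OK
```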
Square the "Age" column using the "e" (execute) modifier:
$ cat sample.csv | csvsed -c Age 'e/xargs -I {} echo "{}^2" | bc/'
Employee ID,Age,Wage,Status
8783,2209,"104,343,873.83","All good, but nowhere to go."
2003,1024,"98,878,784.00",A-OK
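A minimal sketch of per-value execution, assuming each cell is written to the command's stdin and its trimmed stdout becomes the new cell. This is consistent with the xargs/bc example but is an assumption, not csvsed's documented internals. exec_per_cell is a hypothetical name, a POSIX shell is assumed, and the demo swaps bc for the running Python interpreter so it works without bc installed. Note that every value pays for its own shell and process.

```python
import shlex
import subprocess
import sys


def exec_per_cell(values, command):
    """Hypothetical per-cell execution: one shell + process per value.

    Each value is written to the command's stdin; its stdout, with
    surrounding whitespace stripped, becomes the new cell value.
    """
    results = []
    for value in values:
        proc = subprocess.run(
            command,
            shell=True,          # allows pipelines such as "... | bc"
            input=value + "\n",
            capture_output=True,
            text=True,
            check=True,          # raise CalledProcessError on a non-zero exit
        )
        results.append(proc.stdout.strip())
    return results


# Portable stand-in for 'xargs -I {} echo "{}^2" | bc': square the number
# read from stdin using the current Python interpreter.
SQUARE = (
    shlex.quote(sys.executable)
    + " -c 'import sys; print(int(sys.stdin.read()) ** 2)'"
)

print(exec_per_cell(["47", "32"], SQUARE))  # ['2209', '1024']
```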
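However, this calls the external program once for every value in the column, which is quite inefficient for large data sets... So let's make it more efficient by using a "continuous" mode program. Given the following id2name.py program, which reads CSV with a single column (the employee ID) on stdin and writes CSV to stdout with each ID converted to a name: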
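#!/usr/bin/env python
import sys
import csvkit

table = {'8783': 'ElfenKyng', '2003': 'Stradivarius'}

# NOTE: *not* using csvkit's reader because it reads-ahead
# causing problems since this must be stream-oriented...
writer = csvkit.CSVKitWriter(sys.stdout)

while True:
    item = sys.stdin.readline()
    if not item:
        break  # EOF: csvsed has closed our stdin
    item = item.strip()
    # translate known IDs, pass unknown values through unchanged
    writer.writerow([table[item] if item in table else item])
    sys.stdout.flush()  # emit each answer immediately, one line per input line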
Then the following will efficiently convert the "Employee ID" column to names:
$ cat sample.csv | csvsed -c 'Employee ID' 'e|./id2name.py|c'
Employee ID,Age,Wage,Status
ElfenKyng,47,"104,343,873.83","All good, but nowhere to go."
Stradivarius,32,"98,878,784.00",A-OK
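The continuous mode can be sketched the same way: start the command once and talk to it line by line. This is a hypothetical illustration, not csvsed's implementation. It assumes the trailing "c" in 'e|./id2name.py|c' selects that mode, and that the child answers every input line with exactly one flushed output line, which is why id2name.py calls sys.stdout.flush() and avoids a reader that reads ahead. exec_continuous and the inline stand-in for id2name.py are illustrative names only, and a POSIX shell is assumed.

```python
import shlex
import subprocess
import sys


def exec_continuous(values, command):
    """Hypothetical continuous mode: one process for the whole column.

    Each value is sent as one line, and exactly one line is read back
    before the next value is sent. The child must flush after every line,
    or readline() below would block forever.
    """
    proc = subprocess.Popen(
        command,
        shell=True,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered pipes on the parent's side
    )
    results = []
    try:
        for value in values:
            proc.stdin.write(value + "\n")
            proc.stdin.flush()
            results.append(proc.stdout.readline().rstrip("\r\n"))
    finally:
        proc.stdin.close()  # EOF ends the child's read loop
        proc.stdout.close()
        proc.wait()
    return results


# Inline stand-in for id2name.py: same table, same read-one/write-one/flush loop.
CHILD = (
    "import sys\n"
    'table = {"8783": "ElfenKyng", "2003": "Stradivarius"}\n'
    'for line in iter(sys.stdin.readline, ""):\n'
    "    item = line.strip()\n"
    "    print(table.get(item, item), flush=True)\n"
)
ID2NAME = shlex.quote(sys.executable) + " -c " + shlex.quote(CHILD)

print(exec_continuous(["8783", "2003", "9999"], ID2NAME))
# ['ElfenKyng', 'Stradivarius', '9999']
```

If the child buffered its output or read ahead, the parent's readline() would wait for a line that never arrives; this lock-step protocol is the price of starting only one process for the whole column.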