Python csv reader: how to pass the output to another script from the command line
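I have two scripts, a mapper and a reducer. Both read their input through a csv reader. The mapper should take its input from a tab-delimited text file, dataset.csv, and the reducer should take the mapper's output as its input. I want to save the reducer's output to a text file, output.txt. What command should I use to do this?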
Mapper:

    #!/usr/bin/python
    import sys, csv

    reader = csv.reader(sys.stdin, delimiter='\t')
    writer = csv.writer(sys.stdout, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)

    for line in reader:
        if len(line) > 5:  # parse only lines in the forum_node.tsv file
            if line[5] == 'question':
                _id = line[0]
                student = line[3]  # author_id
            elif line[5] != 'node_type':
                _id = line[7]
                student = line[3]  # author_id
            else:
                continue  # ignore header
            # emit one "thread id <TAB> author id" pair per row
            print('{0}\t{1}'.format(_id, student))
Reducer:

    #!/usr/bin/python
    import sys, csv

    reader = csv.reader(sys.stdin, delimiter='\t')
    writer = csv.writer(sys.stdout, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)

    oldID = None
    students = []

    for line in reader:
        if len(line) != 2:
            continue  # skip rows that are not "id <TAB> student" pairs
        thisID, thisStudent = line
        # a change of ID means the previous thread is complete, so print it
        if oldID and oldID != thisID:
            print('Thread: {0}, students: {1}'.format(oldID, ', '.join(students)))
            students = []
        oldID = thisID
        students.append(thisStudent)

    # print the last thread, which has no following row to trigger it
    if oldID is not None:
        print('Thread: {0}, students: {1}'.format(oldID, ', '.join(students)))
1 Answer
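Chain the two scripts together with a pipe:

    python mapper.py < dataset.csv | python reducer.py > output.txt

The `< dataset.csv` part feeds the CSV file to `mapper.py` on its stdin (standard input). The `|` then passes the output (stdout) of the previous command to the next command, which is `python reducer.py`, and `> output.txt` saves that script's output to a file named `output.txt`.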
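For cases where the same chain has to be started from Python rather than from a shell, the redirections described above can be sketched with the standard `subprocess` module. This is only an illustration, not part of the original scripts: the helper name `run_pipeline` and its parameters are made up for this example.

```python
import subprocess


def run_pipeline(commands, input_path, output_path):
    """Run `cmd1 < input_path | cmd2 | ... > output_path`.

    `commands` is a list of argument lists, one per pipeline stage.
    Returns the exit code of every stage, in order.
    (Hypothetical helper, written for illustration only.)
    """
    with open(input_path, "rb") as src, open(output_path, "wb") as dst:
        procs = []
        upstream = src  # the first stage reads the input file on stdin
        for i, cmd in enumerate(commands):
            is_last = i == len(commands) - 1
            proc = subprocess.Popen(
                cmd,
                stdin=upstream,
                # the last stage writes to the output file, the others to a pipe
                stdout=dst if is_last else subprocess.PIPE,
            )
            # Close our copy of the previous pipe so the upstream process
            # can receive SIGPIPE if this stage exits early.
            if upstream is not src:
                upstream.close()
            upstream = proc.stdout
            procs.append(proc)
        # Wait for every stage to finish and collect the exit codes.
        return [p.wait() for p in procs]
```

With the file names from the question, the call would look like `run_pipeline([["python", "mapper.py"], ["python", "reducer.py"]], "dataset.csv", "output.txt")`. Since the reducer only compares each ID with the previous one, it presumably expects rows with the same ID to arrive next to each other; if the mapper output is not already grouped that way, a `["sort"]` stage could be placed between the two scripts.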