blocks - send input to a Python subprocess pipeline

Question

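I'm testing subprocess pipelines with Python. I'm aware that I can do what the programs below do directly in Python, but that's not the point. I just want to test the pipeline so I know how to use it.

My system is Linux Ubuntu 9.04 with the default Python 2.6.

I started with this documentation example.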

from subprocess import Popen, PIPE
p1 = Popen(["grep", "-v", "not"], stdout=PIPE)
p2 = Popen(["cut", "-c", "1-10"], stdin=p1.stdout, stdout=PIPE)
output = p2.communicate()[0]
print output

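That works, but since p1's stdin is not redirected, I have to type input in the terminal to feed the pipe. When I type ^D to close stdin, I get the output I want.

However, I want to send data to the pipe from a Python string variable. First I tried writing to stdin: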

p1 = Popen(["grep", "-v", "not"], stdin=PIPE, stdout=PIPE)
p2 = Popen(["cut", "-c", "1-10"], stdin=p1.stdout, stdout=PIPE)
p1.stdin.write('test\n')
output = p2.communicate()[0] # blocks forever here

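That didn't work. I tried using p2.stdout.read() on the last line instead, but it also blocks. I added p1.stdin.flush() and p1.stdin.close(), but that didn't help either. Then I moved to communicate():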

p1 = Popen(["grep", "-v", "not"], stdin=PIPE, stdout=PIPE)
p2 = Popen(["cut", "-c", "1-10"], stdin=p1.stdout, stdout=PIPE)
p1.communicate('test\n') # blocks forever here
output = p2.communicate()[0]

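Still no luck.

I noticed that running a single process (like p1 above, with p2 removed) works perfectly. Passing a file handle to p1 (stdin=open(...)) also works. So the question is:

Is it possible to pass data to a pipeline of two or more subprocesses in Python without blocking? If not, why not?

I'm aware I could run a shell and run the pipeline inside it, but that's not what I want.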
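
For reference, here is a minimal sketch of a variant that is expected not to block for a small input. It is written for Python 3 (bytes literals), which is an assumption beyond the Python 2.6 setup described here. The parts that differ from the attempts above are `close_fds=True` on the second process and the explicit `p1.stdout.close()` in the parent. The idea is that `cut` should not inherit a copy of the write end of `grep`'s stdin pipe, because `grep` only sees end-of-file once every copy of that write end is closed. In Python 2.6 `close_fds` defaults to False, so the argument has to be passed explicitly there.

```python
from subprocess import Popen, PIPE

# First stage: reads from a pipe we control, writes to a pipe read by p2.
p1 = Popen(["grep", "-v", "not"], stdin=PIPE, stdout=PIPE)

# Second stage: close_fds=True keeps it from inheriting p1.stdin's write end.
p2 = Popen(["cut", "-c", "1-10"], stdin=p1.stdout, stdout=PIPE, close_fds=True)

# The parent no longer needs its copy of the pipe between p1 and p2.
p1.stdout.close()

# Small input: fits in the pipe buffer, so a plain write does not block.
p1.stdin.write(b"test\n")
p1.stdin.close()  # flushes and signals end-of-file to grep

output = p2.communicate()[0]
p1.wait()
print(output)
```

This only covers inputs small enough to fit in the OS pipe buffer; larger inputs need a concurrent writer or a select loop.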

UPDATE 1: Following Aaron Digulla's hint below, I'm now trying to use threads to make it work.

First I tried running p1.communicate in a thread.

p1 = Popen(["grep", "-v", "not"], stdin=PIPE, stdout=PIPE)
p2 = Popen(["cut", "-c", "1-10"], stdin=p1.stdout, stdout=PIPE)
t = threading.Thread(target=p1.communicate, args=('some data\n',))
t.start()
output = p2.communicate()[0] # blocks forever here

That doesn't work. I tried other combinations, such as switching to .write() and p2.read(). Nothing worked. Now let's try the opposite approach:

def get_output(subp):
    output = subp.communicate()[0] # blocks on thread
    print 'GOT:', output

p1 = Popen(["grep", "-v", "not"], stdin=PIPE, stdout=PIPE)
p2 = Popen(["cut", "-c", "1-10"], stdin=p1.stdout, stdout=PIPE)
t = threading.Thread(target=get_output, args=(p2,)) 
t.start()
p1.communicate('data\n') # blocks here.
t.join()

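The code ends up blocking somewhere: in the spawned thread, in the main thread, or both. So it didn't work. If you know how to make it work, it would be easier if you could provide working code. I'm still trying here.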
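
A hedged sketch of how the threaded approach might be made to work, assuming the blocking comes from the second process inheriting the first process's stdin pipe. It uses Python 3 syntax (an assumption; the setup here is Python 2.6), and the helper name `feed` is hypothetical. The writer thread pushes the data and then closes the pipe, while the main thread collects the output of the last stage.

```python
import threading
from subprocess import Popen, PIPE

def feed(pipe, data):
    """Write all data to the pipe, then close it to signal end-of-file."""
    try:
        pipe.write(data)
    finally:
        pipe.close()

p1 = Popen(["grep", "-v", "not"], stdin=PIPE, stdout=PIPE)
# close_fds=True: cut must not hold a copy of the write end of grep's stdin.
p2 = Popen(["cut", "-c", "1-10"], stdin=p1.stdout, stdout=PIPE, close_fds=True)
p1.stdout.close()  # the parent does not read from the middle of the pipeline

data = b"hello world\n" * 100000  # larger than a typical pipe buffer
writer = threading.Thread(target=feed, args=(p1.stdin, data))
writer.start()

output = p2.communicate()[0]  # reads concurrently with the writer thread
writer.join()
p1.wait()
print(len(output))
```

Writing from a separate thread matters here because the input is larger than the pipe buffer: a single-threaded write could block while nobody drains the output of the last stage.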

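UPDATE 2

Paul Du Bois answered below with some information, so I did more tests. I read the entire subprocess.py module and understood how it works, so I tried applying exactly that to my code.

I'm on Linux, but since I was testing with threads, my first approach was to replicate the exact Windows threading code from the communicate() method in subprocess.py, but for two processes instead of one. Here is the complete listing of what I tried: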

import os
from subprocess import Popen, PIPE
import threading

def get_output(fobj, buffer):
    while True:
        chunk = fobj.read() # BLOCKS HERE
        if not chunk:
            break
        buffer.append(chunk)

p1 = Popen(["grep", "-v", "not"], stdin=PIPE, stdout=PIPE)
p2 = Popen(["cut", "-c", "1-10"], stdin=p1.stdout, stdout=PIPE)

b = [] # create a buffer
t = threading.Thread(target=get_output, args=(p2.stdout, b))
t.start() # start reading thread

for x in xrange(100000):
    p1.stdin.write('hello world\n') # write data
    p1.stdin.flush()
p1.stdin.close() # close input...
t.join()

Well, that didn't work either. Even after p1.stdin.close() has been called, p2.stdout.read() still blocks.

Then I tried the POSIX code from subprocess.py:

import os
from subprocess import Popen, PIPE
import select

p1 = Popen(["grep", "-v", "not"], stdin=PIPE, stdout=PIPE)
p2 = Popen(["cut", "-c", "1-10"], stdin=p1.stdout, stdout=PIPE)

numwrites = 100000
to_read = [p2.stdout]
to_write = [p1.stdin]
b = [] # create buffer

while to_read or to_write:
    read_now, write_now, xlist = select.select(to_read, to_write, [])
    if read_now:
        data = os.read(p2.stdout.fileno(), 1024)
        if not data:
            p2.stdout.close()
            to_read = []
        else:
            b.append(data)

    if write_now:
        if numwrites > 0:
            numwrites -= 1
            p1.stdin.write('hello world!\n')
            p1.stdin.flush()
        else:
            p1.stdin.close()
            to_write = []

print b

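This also blocks, on select.select(). By adding prints throughout the code, I found out the following:

Reading works. The code reads many times during execution.
Writing also works. Data is written to p1.stdin.
When numwrites runs out, p1.stdin.close() is called.
When select() starts blocking, only to_read has anything in it, namely p2.stdout. to_write is already empty.
The os.read() call always returns something, so p2.stdout.close() is never called.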
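
For comparison, here is a sketch of a select-based loop that is expected to terminate, under the assumption that the second process must not inherit p1.stdin. It is written for Python 3 (an assumption beyond the Python 2.6 setup here) and writes with `os.write` in small chunks so that a write reported as ready by `select` does not block.

```python
import os
import select
from subprocess import Popen, PIPE

p1 = Popen(["grep", "-v", "not"], stdin=PIPE, stdout=PIPE)
# close_fds=True so that cut does not keep grep's stdin pipe open.
p2 = Popen(["cut", "-c", "1-10"], stdin=p1.stdout, stdout=PIPE, close_fds=True)
p1.stdout.close()

pending = memoryview(b"hello world!\n" * 100000)  # data still to be written
in_fd = p1.stdin.fileno()
out_fd = p2.stdout.fileno()
to_read = [out_fd]
to_write = [in_fd]
chunks = []

while to_read or to_write:
    read_now, write_now, _ = select.select(to_read, to_write, [])
    if read_now:
        data = os.read(out_fd, 4096)
        if data:
            chunks.append(data)
        else:
            to_read = []  # end-of-file from the last stage
    if write_now:
        # 512 bytes is the minimum PIPE_BUF guaranteed by POSIX.
        written = os.write(in_fd, pending[:512])
        pending = pending[written:]
        if len(pending) == 0:
            p1.stdin.close()  # lets grep see end-of-file
            to_write = []

output = b"".join(chunks)
p2.stdout.close()
p1.wait()
p2.wait()
print(len(output))
```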

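Conclusion from both tests: closing the stdin of the first process in the pipeline (grep in the example) does not make it pass its buffered output on to the next process and exit.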
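
One likely explanation for this conclusion, offered as an assumption rather than a confirmed diagnosis, is that some other process still holds a copy of the write end of the pipe. A reader only sees end-of-file once every copy of the write end is closed. The sketch below shows that rule with a plain `os.pipe()` and no subprocesses; `w_copy` is a hypothetical stand-in for a descriptor inherited by another child. It uses Python 3 syntax.

```python
import os
import select

r, w = os.pipe()
w_copy = os.dup(w)  # stands in for a copy inherited by another process

os.write(w, b"data\n")
os.close(w)  # the "parent" closes its write end

first = os.read(r, 1024)  # the buffered data is delivered normally

# A second write end is still open, so the reader gets no end-of-file yet:
# select() times out instead of reporting the read end as ready.
readable_before, _, _ = select.select([r], [], [], 0.2)

os.close(w_copy)  # close the last remaining write end

# Now the read end is reported ready, and read() returns b"" (end-of-file).
readable_after, _, _ = select.select([r], [], [], 0.2)
eof = os.read(r, 1024)
os.close(r)

print(first, readable_before, readable_after == [r], eof)
```

In the pipeline above, the descriptor playing the role of `w_copy` would be p1.stdin as inherited by the second child, which is why passing `close_fds=True` when starting p2 is worth trying.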

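Is there no way to make it work?

PS: I don't want to use a temporary file. I've already tested with files and I know that works. And I don't want to use Windows.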

