Why can't Python read stdin input as a dictionary?

Asked 2025-04-18 17:02

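I know I'm probably doing something silly, but here goes. I'm working on an assignment for the Udacity course "Intro to Map Reduce and Hadoop". The task is to write a mapper and a reducer that count how many times a word occurs in a dataset (the bodies of forum posts).

I have a rough idea of how to do it, but I can't get Python to read the data from standard input into the reducer as a dictionary.

Here is my approach so far:

The mapper reads the data (in this case, a test string embedded in the code) and outputs a dictionary of words and their counts for each forum post: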

#!/usr/bin/python
import sys
import csv
import re
from collections import Counter


def mapper():
    # Each input row is tab-separated; the post body is the fifth field
    reader = csv.reader(sys.stdin, delimiter='\t')

    for line in reader:
        body = line[4]
        # Lowercase the body and split it into words
        words = re.findall(r'\w+', body.lower())
        c = Counter(words)
        # Print one dict of word counts per post
        print dict(c)


test_text = """\"\"\t\"\"\t\"\"\t\"\"\t\"This is one sentence sentence\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"Also one sentence!\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"Hey!\nTwo sentences!\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"One. Two! Three?\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"One Period. Two Sentences\"\t\"\"
\"\"\t\"\"\t\"\"\t\"\"\t\"Three\nlines, one sentence\n\"\t\"\"
"""

# This function allows you to test the mapper with the provided test string
def main():
    import StringIO
    sys.stdin = StringIO.StringIO(test_text)
    mapper()
    sys.stdin = sys.__stdin__

if __name__ == "__main__":
    main()

The word counts for each forum post are written to standard output like this:

{'this': 1, 'is': 1, 'one': 1, 'sentence': 2}
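To make it clear where that line comes from, here is a minimal sketch of the counting step applied to the body of the first test row. It is written for Python 3, which is an assumption (the code in this post is Python 2), and `count_words` is a hypothetical helper name that does not appear in the original mapper.

```python
import re
from collections import Counter


def count_words(body):
    """Return a plain dict mapping each lowercased word in body to its count."""
    # \w+ matches runs of letters, digits and underscores, so punctuation is dropped
    words = re.findall(r'\w+', body.lower())
    return dict(Counter(words))


# Body of the first row in test_text
counts = count_words("This is one sentence sentence")
print(counts)
```

Note that what reaches the reducer is only the printed text of this dict, not the dict object itself.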

The reducer should then read that line from standard input as a dictionary:

#!/usr/bin/python
import sys
from collections import Counter, defaultdict
for line in sys.stdin.readlines():
    print dict(line)

But this fails with the following error message:

ValueError: dictionary update sequence element #0 has length 1; 2 is required

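This means (if I understand correctly) that each line is being read as a text string rather than as a dictionary. How can I make Python understand that the input line is a dictionary? I have tried using Counter and defaultdict, but I either hit the same problem or each character gets read in as a separate list element, which is not what I want either.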
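The behaviour can be reproduced without Hadoop. The sketch below (Python 3, an assumption; the variable names are illustrative) shows that `dict()` applied to a string iterates over its characters, so the first element `'{'` has length 1 instead of being a key/value pair, while `ast.literal_eval()` parses the text as a Python literal and returns a real dict.

```python
import ast

# One line as the reducer would receive it from stdin (text, with a trailing newline)
line = "{'this': 1, 'is': 1, 'one': 1, 'sentence': 2}\n"

# dict() treats the string as a sequence of items; each item is one character
try:
    dict(line)
    dict_error = None
except ValueError as exc:
    dict_error = str(exc)

print(dict_error)

# literal_eval safely evaluates a string containing only Python literals
parsed = ast.literal_eval(line.strip())
print(type(parsed), parsed['sentence'])
```

`ast.literal_eval()` only accepts literals (strings, numbers, tuples, lists, dicts and so on), so it is a safer choice than `eval()` for input like this.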

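Ideally, I want the reducer to read the dictionary on each line and then add in the values from the next line, so that after the second line the values are {'this':1,'is':1,'one':2,'sentence':3,'also':1}, and so on.

Thanks,
JR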
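The running total described here can be sketched with `collections.Counter`, whose `update()` method adds counts together instead of overwriting them the way `dict.update()` does. This is a Python 3 sketch under the assumption that the two dicts below are what the mapper emits for the first two test rows; the variable names are illustrative.

```python
from collections import Counter

# Word counts for "This is one sentence sentence" and "Also one sentence!"
first = {'this': 1, 'is': 1, 'one': 1, 'sentence': 2}
second = {'also': 1, 'one': 1, 'sentence': 1}

total = Counter()
total.update(first)   # counts from the first post
total.update(second)  # counts are added to the existing ones, not replaced

print(dict(total))
```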

1 Answer

Thanks to @keyser, ast.literal_eval() worked for me. This is what I have now:

#!/usr/bin/python
import sys
import ast
from collections import Counter

c = Counter()
for line in sys.stdin:
    # Parse the printed dict literal back into a real dict
    lineDict = ast.literal_eval(line)
    # Counter.update adds the counts to the running totals
    c.update(lineDict)
print c.most_common()
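For anyone on Python 3, the same idea can be sketched as a function that takes any iterable of lines, which makes it easy to try without piping real input. This is an assumption-laden sketch rather than the course's reference solution: `reduce_counts` is a hypothetical name, and skipping blank lines is an added safeguard that the code above does not have.

```python
import ast
import io
from collections import Counter


def reduce_counts(stream):
    """Sum the per-post word-count dicts found one per line in stream."""
    total = Counter()
    for line in stream:
        line = line.strip()
        if not line:
            # literal_eval raises on empty input, so skip blank lines
            continue
        total.update(ast.literal_eval(line))
    return total


# Stand-in for sys.stdin: the mapper output for the first two test rows
sample = io.StringIO(
    "{'this': 1, 'is': 1, 'one': 1, 'sentence': 2}\n"
    "{'also': 1, 'one': 1, 'sentence': 1}\n"
)
result = reduce_counts(sample)
print(result.most_common(2))  # [('sentence', 3), ('one', 2)]
```

In an actual job, the function would be called as `reduce_counts(sys.stdin)`.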
