How do I use the Unix uniq -c command in Python code?

Posted 2024-05-23 14:59:01


I need to count how many times each word occurs in a paragraph, and print each word together with its number of occurrences.

For example, if the paragraph is

how are you now? Are you better now?

then the output should be:

how-1
are-2
you-2
now-2
better-1

I tried using subprocess:

from subprocess import call
sen=raw_input("enter:")
call(["uniq", "-c",sen])

But the function needs a file as input, and I don't want to pass it a file. How should I do this?


3 Answers

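As a note on Dimitris Jim's answer (I would have posted this as a comment, but I don't have enough rep): you also need to sort the input, because uniq only collapses adjacent duplicate lines. You can do that in Python by replacing the regex statement with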

sen_list = sen.split(" ")
sen_list.sort()
sen = '\n'.join(sen_list)

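I'm sure there is a way to do this with the Linux sort command as well. Similarly, you could use tr ' ' '\n' to replace spaces with newlines instead of doing it in Python.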
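For what it's worth, here is a rough sketch of that idea in Python 3, letting tr, sort and uniq do all the work and using subprocess only to feed text through them. The function name count_words_with_pipeline is made up for illustration, and it assumes tr, sort and uniq are on the PATH and that the input has no punctuation to strip.

```python
import subprocess

def count_words_with_pipeline(text):
    # Equivalent of the shell pipeline: tr ' ' '\n' | sort | uniq -c
    # (hypothetical helper; assumes tr, sort and uniq are installed)
    stages = [["tr", " ", "\\n"], ["sort"], ["uniq", "-c"]]
    data = text
    for argv in stages:
        # Feed the previous stage's output to the next command's stdin
        data = subprocess.run(
            argv, input=data, capture_output=True, text=True, check=True
        ).stdout
    counts = []
    for line in data.splitlines():
        parts = line.split()  # uniq -c prints lines like "      2 you"
        if len(parts) == 2:   # skip blank tokens caused by repeated spaces
            counts.append((parts[1], int(parts[0])))
    return counts

if __name__ == "__main__":
    for word, count in count_words_with_pipeline("how are you now are you better now"):
        print(f"{word}-{count}")
```

Running each stage separately keeps shell quoting out of the picture; the words come back in sorted order, not in the order they first appeared.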

For completeness, here is how to solve it in Python:

import re, collections

paragraph = "how are you now? Are you better now?"

splitter = re.compile(r'\W')  # raw string, so \W reaches the regex engine unchanged
counts = collections.Counter(word.lower() 
                             for word in splitter.split(paragraph) 
                             if word)
for word, count in counts.most_common():
    print(count, word)
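If you want the exact word-count format from the question, with words in the order they first appear, something like the following should work. It is only a sketch: word_counts is a name made up here, and it relies on Counter keeping insertion order, which holds on Python 3.7 and later.

```python
import re
from collections import Counter

def word_counts(paragraph):
    # \w+ matches runs of letters, digits and underscores, so punctuation is dropped
    words = re.findall(r"\w+", paragraph.lower())
    return Counter(words)

paragraph = "how are you now? Are you better now?"
for word, count in word_counts(paragraph).items():
    print(f"{word}-{count}")
# prints:
# how-1
# are-2
# you-2
# now-2
# better-1
```

Lowercasing before counting is what makes "Are" and "are" land in the same bucket.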

If you really want to know how to count with uniq, then:

from subprocess import Popen, PIPE

sen = raw_input("Enter: ")
sen = sen.lower().split() # Remove capitals and split into list of words
# Sort list to provide correct count ("-c" option counts only consecutive repeats)
# If you want to get consecutives, just don't sort.
sen.sort()
sen = "\n".join(sen) # Put each word into its own line (for input)
# uniq accepts input from stdin
p = Popen(["uniq", "-c"], stdin=PIPE, stdout=PIPE)
out = p.communicate(sen)[0].split("\n") # Pass the input and get the output (make it a list by splitting on newlines)
counts = [] # Parse output and put it into a list
for x in out:
    if not x: continue # Skip empty lines (usually appears at the end of output string)
    counts.append(tuple(x.split())) # Split the line into tuple(number, word) and add it to counts list

# And if you want a nice output like you presented in Q:
for x in counts:
    print x[1]+"-"+x[0]

Note 1: This is definitely not the way to do it. You really should write it in Python.

Note 2: This was tested on Cygwin and Ubuntu 12.04, with the same results.

Note 3: uniq is not a function; it is a command, i.e. a program stored at /bin/uniq and /usr/bin/uniq.
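The uniq listing in this answer uses Python 2 syntax (raw_input and the print statement). On Python 3, a roughly equivalent sketch could look like the one below. It is an adaptation, not the code that was tested above: uniq_counts is a made-up name, subprocess.run stands in for Popen and communicate, and the sentence is passed as an argument so the example runs without prompting.

```python
import subprocess

def uniq_counts(sentence):
    # Lowercase, split on whitespace, and sort so that equal words are adjacent
    # ("uniq -c" only counts consecutive repeats)
    words = sorted(sentence.lower().split())
    if not words:
        return []
    # uniq reads from stdin when no file is given; text=True makes I/O use str
    result = subprocess.run(
        ["uniq", "-c"],
        input="\n".join(words) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    pairs = []
    for line in result.stdout.splitlines():
        count, word = line.split()  # each line looks like "      2 you"
        pairs.append((word, int(count)))
    return pairs

for word, count in uniq_counts("how are you now are you better now"):
    print(f"{word}-{count}")
# prints:
# are-2
# better-1
# how-1
# now-2
# you-2
```

As with the original, punctuation is not stripped, so "now?" and "now" would be counted as different words.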
