I have a txt file. How can I get the dictionary keys and print the lines of text they appear in?

# this function will get just the unique words without the stop words.
def openFiles(openFile):
    for i in openFile:
        i = i.strip()
        linelist.append(i)
        b = i.lower()
        thislist = b.split()
        for a in thislist:
            if a in stopwords:
                continue
            else:
                wordlist.append(a)
    #print wordlist

# this dictionary is used to count the number of times each stop
countdict = {}
def countWords(this_list):
    for word in this_list:
        depunct = word.strip(punctuation)
        if depunct in countdict:
            countdict[depunct] += 1
        else:
            countdict[depunct] = 1

3 answers

User

#1 · edited 2024-04-25 12:12:35

words = {}

# Map each word to its occurrence count and the set of lines it appears on.
with open("test.txt", "r") as openFile:
    for line in openFile:
        for word in line.strip().lower().split():
            wordDict = words.setdefault(word, {'count': 0, 'line': set()})
            wordDict['count'] += 1
            # store the line without its trailing newline
            wordDict['line'].add(line.rstrip("\n"))

print(words)
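
To print each key together with the lines it appears on, instead of dumping the whole dictionary, one option is to loop over the result. This is only a minimal sketch under a few assumptions: `build_index` and `sample_lines` are hypothetical names introduced for illustration, an in-memory list stands in for the file, and a list replaces the set so the lines keep their original order.

```python
def build_index(lines):
    """Map each lowercased word to its count and the lines containing it."""
    words = {}
    for line in lines:
        text = line.strip()
        for word in text.lower().split():
            entry = words.setdefault(word, {'count': 0, 'line': []})
            entry['count'] += 1
            # Record each distinct line text once, even if the word repeats in it.
            if text not in entry['line']:
                entry['line'].append(text)
    return words


# Hypothetical sample input standing in for open("test.txt").
sample_lines = ["The itsy bitsy spider\n", "went up the spout\n"]

index = build_index(sample_lines)
for word in sorted(index):
    entry = index[word]
    print("%s (%d):" % (word, entry['count']))
    for text in entry['line']:
        print("    " + text)
```

Passing an open file object instead of `sample_lines` should work the same way, since the function only iterates over lines.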

User

#2 · edited 2024-04-25 12:12:35

from collections import defaultdict

target = 'itsy'
word_summary = defaultdict(list)
with open('itsy.txt', 'r') as f:
    lines = f.readlines()

for idx, line in enumerate(lines):
    words = [w.strip().lower() for w in line.split()]
    for word in words:
        word_summary[word].append(idx)

unique_words = len(word_summary)
target_occurrence = len(word_summary[target])
# sort the line numbers so they print in ascending order
line_nums = sorted(set(word_summary[target]))

print("There are %s unique words." % unique_words)
print("There are %s occurrences of '%s'" % (target_occurrence, target))
print("'%s' is found on lines %s" % (target, ', '.join([str(i+1) for i in line_nums])))

User

#3 · edited 2024-04-25 12:12:35

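If you process the input text file line by line, you can maintain a second dictionary that maps word -> List<line>. Every word in a line gets an entry. It might look something like the code below. Bear in mind that I am not very familiar with Python, so I may have missed some syntactic shortcuts.

For example: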

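from string import punctuation

countdict = {}
linedict = {}
# text_file is assumed to be an open file object (any iterable of lines)
for line in text_file:
    # split the line into words; iterating over the string itself yields characters
    for word in line.split():
        depunct = word.strip(punctuation)
        if depunct in countdict:
            countdict[depunct] += 1
        else:
            countdict[depunct] = 1

        # add entry for word in the line dict if not there already
        if depunct not in linedict:
            linedict[depunct] = []

        # now add the word -> line entry
        linedict[depunct].append(line)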

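One modification you may want to make is to prevent duplicates from being added to linedict when a word appears twice in the same line.

The code above assumes you only want to read the text file once.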
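
The duplicate check could be sketched as follows. This is an illustration rather than a definitive version: `index_lines` and `sample` are hypothetical names, line numbers are stored instead of the line text, and words are lowercased, which the answer above does not do.

```python
from collections import defaultdict
from string import punctuation


def index_lines(lines):
    """Return (countdict, linedict) for an iterable of text lines."""
    countdict = defaultdict(int)
    linedict = defaultdict(list)
    for lineno, line in enumerate(lines, 1):
        for word in line.lower().split():
            depunct = word.strip(punctuation)
            if not depunct:
                continue  # token consisted of punctuation only
            countdict[depunct] += 1
            # Lines are visited in order, so a duplicate can only be the last entry.
            if not linedict[depunct] or linedict[depunct][-1] != lineno:
                linedict[depunct].append(lineno)
    return countdict, linedict


# Hypothetical sample input standing in for the text file.
sample = ["Spider, spider!", "The spider went up."]
counts, where = index_lines(sample)
for word in sorted(where):
    for lineno in where[word]:
        print("%s: line %d: %s" % (word, lineno, sample[lineno - 1]))
```

Storing line numbers keeps the dictionary small and makes the duplicate test a cheap comparison with the last entry; the line text can still be looked up when printing.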
