I have a txt file. How can I take the dictionary keys and print the lines of text they occur in?

2 votes

3 answers

3016 views

Asked 2025-04-17 05:23

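I have a text file. I wrote code that finds the unique words in the file and counts how many times each one occurs. Now I need to figure out how to print the lines on which each of those words occurs. How can I do that?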

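Here is a sample output:

Analyze what file: itsy_bitsy_spider.txt
Concordance for file itsy_bitsy_spider.txt
itsy : Total Count: 2
    Line:1: The itsy bitsy spider climbed up the water spout
    Line:4: The itsy bitsy spider climbed up the water spout again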

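from string import punctuation

linelist = []      # every stripped line of the file, in order
wordlist = []      # every word that is not a stop word
stopwords = set()  # assumption: the asker's stop-word list is not shown

# This function collects the words of the file, skipping the stop words.
def openFiles(openFile):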
    for i in openFile:
        i = i.strip()
        linelist.append(i)
        b = i.lower()
        thislist = b.split()
        for a in thislist:
            if a in stopwords:
                continue
            else:
                wordlist.append(a)
    #print wordlist




# This dictionary counts the number of times each word occurs.
countdict = {}
def countWords(this_list):
    for word in this_list:
        depunct = word.strip(punctuation)
        # the update must sit inside the loop so that every word is tallied
        if depunct in countdict:
            countdict[depunct] += 1
        else:
            countdict[depunct] = 1


3 Answers

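If you are reading the input text file line by line, you can maintain a second dictionary that maps each word to the list of lines it occurs on. That is, for every word in every line, you add an entry. It might look something like the following. Bear in mind that I am not very familiar with Python, so there may be syntactic shortcuts I have missed.

For example: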

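from string import punctuation

countdict = {}
linedict = {}
# text_file is assumed to be an open file object (or any iterable of lines)
for line in text_file:
    # split the line into words; iterating the string itself yields characters
    for word in line.split():
        depunct = word.strip(punctuation)
        if depunct in countdict:
            countdict[depunct] += 1
        else:
            countdict[depunct] = 1

        # add entry for word in the line dict if not there already
        if depunct not in linedict:
            linedict[depunct] = []

        # now add the word -> line entry
        linedict[depunct].append(line)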

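One modification you may need: if the same word occurs twice on one line, prevent that line from being added to the word's list twice.

The code above assumes you only want to read the text file once.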
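The duplicate-line caveat can be handled in several ways. The following is a minimal sketch of one of them, not the answerer's own code: it stores 1-based line numbers instead of line text and only appends a number when it differs from the last one recorded for that word. The function name `build_line_index` and the sample lines are made up for illustration.

```python
from string import punctuation


def build_line_index(lines):
    """Map each word to the 1-based numbers of the lines it occurs on."""
    linedict = {}
    for number, line in enumerate(lines, start=1):
        for word in line.lower().split():
            depunct = word.strip(punctuation)
            if not depunct:
                continue  # the token was pure punctuation
            numbers = linedict.setdefault(depunct, [])
            # Lines are visited in order, so a repeat on the same line
            # can only match the last number recorded for this word.
            if not numbers or numbers[-1] != number:
                numbers.append(number)
    return linedict


# Hypothetical sample input: "itsy" and "the" both repeat on line 1.
sample = [
    "The itsy bitsy spider, the ITSY one",
    "climbed up the spout",
]
index = build_line_index(sample)
print(index["itsy"])  # [1]
print(index["the"])   # [1, 2]
```

Because the file is still read only once, this keeps the single-pass property of the loop above; a `set` per word would also work, at the cost of sorting the numbers before printing.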

Answered 2025-04-17 by Python大师


words = {}

# the with block closes the file even if an error occurs
with open("test.txt", "r") as openFile:
    for line in openFile:
        for word in line.strip().lower().split():
            # one record per word: its total count and the set of lines it occurs on
            wordDict = words.setdefault(word, { 'count': 0, 'line': set() })
            wordDict['count'] += 1
            wordDict['line'].add(line)

print(words)


Answered 2025-04-17 by Python大师


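In programming, we sometimes need to process data, for example by taking it from one place and putting it somewhere else. The process is like moving house: things go from one box into another.

Along the way we use tools such as functions and variables. A function is like a small machine: you give it input and it gives you output. A variable is like a box that holds the data you need.

When writing code, we sometimes find that the data does not show up the way we want. Then we have to check the code carefully to see where it went wrong, just as we would check whether something ended up in the wrong place during a move.

In short, programming is a problem-solving process that takes repeated trial and adjustment to get right.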

from collections import defaultdict

target = 'itsy'
word_summary = defaultdict(list)
with open('itsy.txt', 'r') as f:
    lines = f.readlines()

# record the 0-based index of the line for every occurrence of every word
for idx, line in enumerate(lines):
    words = [w.strip().lower() for w in line.split()]
    for word in words:
        word_summary[word].append(idx)

unique_words = len(word_summary)
target_occurence = len(word_summary[target])
# a set drops repeats on the same line; sort it so the lines print in file order
line_nums = sorted(set(word_summary[target]))

print("There are %s unique words." % unique_words)
print("There are %s occurrences of '%s'" % (target_occurence, target))
print("'%s' is found on lines %s" % (target, ', '.join([str(i+1) for i in line_nums])))
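Since the stored values are line indices, the text of each line can be looked up again when printing. The following is a rough sketch under assumptions, not part of the original answer: the function name `concordance`, its `stopwords` parameter, and the sample lines are hypothetical, and the label wording only approximates the sample output shown in the question.

```python
from collections import defaultdict
from string import punctuation


def concordance(lines, stopwords=()):
    """Return report rows: each word, its total count, and the lines it is on."""
    positions = defaultdict(list)  # word -> 0-based index of every occurrence
    for idx, line in enumerate(lines):
        for raw in line.lower().split():
            word = raw.strip(punctuation)
            if word and word not in stopwords:
                positions[word].append(idx)

    report = []
    for word in sorted(positions):
        hits = positions[word]
        report.append("%s : Total Count: %d" % (word, len(hits)))
        # set() collapses repeats on one line; sorted() restores file order
        for idx in sorted(set(hits)):
            report.append("    Line:%d: %s" % (idx + 1, lines[idx].strip()))
    return report


# Hypothetical sample input and stop-word list.
sample = [
    "The itsy bitsy spider climbed up the water spout",
    "Down came the rain",
]
for row in concordance(sample, stopwords={"the", "up"}):
    print(row)
```

Keeping every occurrence in the list means the total count and the line list come from the same structure, so no separate counting dictionary is needed.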

Answered 2025-04-17 by Python大师
