Find the support of each word in text d using Python

1 answer

Anonymous user

Answer #1 · posted 2024-04-25 00:21:12

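This seems to work. The variable p_length is the maximum phrase length collected into the dictionary (I set it to 3), p_size is the phrase length used for the ranked list (I set it to 3; the smaller it is, the higher the top frequencies, of course), and rank is the number of entries in the final ranked list (I set it to 25). These three settings are near the top of the script. Keep p_size no larger than p_length, because only phrases of up to p_length words are ever counted. The length of the ranking list that the script prints (see the end of def top_list():) is the total number of distinct phrases of 1 to p_length words.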
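To make p_length and the printed "Length of ranking list" concrete, here is a small sketch that collects phrases the same way histogram1 does: every start position contributes phrases of 1 up to p_length words. The helper name phrases_up_to and the sample sentence are made up for illustration; they are not part of the answer's code.

```python
# Illustrative only: phrases_up_to is a hypothetical helper, not part of the answer's script.
def phrases_up_to(words, p_length):
    """Yield each run of 1..p_length consecutive words, in the order histogram1 builds them."""
    for start in range(len(words)):
        for n in range(1, p_length + 1):
            # Skip lengths that would run past the last word
            if start + n <= len(words):
                yield ' '.join(words[start:start + n])

words = "the cat sat on the mat".split()
phrases = list(phrases_up_to(words, 3))
print(len(phrases))       # 15 phrases collected (6 one-word + 5 two-word + 4 three-word)
print(len(set(phrases)))  # 14 distinct phrases, because "the" occurs twice
```

The distinct count (14 here) is what top_list reports, since the dictionary keeps each phrase only once.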

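# Load the data: read b.txt and collect every word into one flat list
translist = []
with open("b.txt", 'r') as fin:
    for line in fin:
        # split() with no argument also drops the empty strings that split(' ') returns for blank lines and repeated spaces
        translist.extend(line.split())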

p_length = 3
p_size = 3
rank = 25

# Use a dictionary to build a histogram of phrase frequencies (the dictionary is not sorted by frequency)
def histogram1(translist,p_length):
    global dict1
    dict1 = dict()
    phraseList = []
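    for transIndex in range(len(translist)):
        # Build the phrases of 1, 2, ..., p_length words that start at transIndex
        for i in range(p_length):
            # Skip lengths that would run past the last word
            if (transIndex+1+i) <= len(translist):
                phraseElementNow = translist[transIndex+i]
            else:
                continue
            if i > 0:
                # Extend the previous (i-word) phrase by one more word
                joinables = (newElement, phraseElementNow)
                newElement = ' '.join(joinables)
            else:
                newElement = phraseElementNow
            phraseList.append(newElement)
    # Count how often each phrase occurs
    for element2 in phraseList:
        if element2 not in dict1:
            dict1[element2] = 1
        else:
            dict1[element2] += 1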
    return dict1

#Create the ranked list of phrases vs their frequency.
def top_list():
    global topList
    topList = []
    for key, value in dict1.items():
        topList.append((value, key))
    topList.sort(reverse = True)
    print("Length of ranking list is: ") #Just a check
    print(len(topList))
    #print(topList[-(rank):])   Used this to check format of ranking list

# Print the top `rank` phrases that are exactly p_size words long (rank is set to 25 at the top of the script)
def short_list(p_size, rank):
    topTopList = []
    print("The "+str(rank)+" most common phrases "+str(p_size)+" words long are: ")
    for phrase in topList:
        phraseParts = phrase[1].split(' ')
        if len(phraseParts) == p_size:
            topTopList.append(phrase)
        else:
            continue
    for freq, word in topTopList[:rank]:
        wordParts = word.split(' ')
        wordForPrint = ';'.join(wordParts)
        completePrint = str(freq)+':'+wordForPrint
        print(completePrint)

print(histogram1(translist, p_length))
top_list()
short_list(p_size, rank)
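For comparison, the same pipeline can be written more compactly with collections.Counter and plain return values instead of the globals dict1 and topList. This is a sketch under my own assumptions, not the answer's code: the function names count_phrases, top_phrases and format_line are hypothetical, and a sample sentence stands in for the words read from b.txt. Reverse-sorting (frequency, phrase) tuples keeps the same tie-breaking as topList.sort(reverse = True).

```python
from collections import Counter

def count_phrases(words, p_length):
    """Count every phrase of 1..p_length consecutive words (what histogram1 builds)."""
    counts = Counter()
    for n in range(1, p_length + 1):
        # zip over n shifted views of the list yields each n-word window exactly once
        for window in zip(*(words[i:] for i in range(n))):
            counts[' '.join(window)] += 1
    return counts

def top_phrases(counts, p_size, rank):
    """Return the `rank` most frequent phrases that are exactly p_size words long."""
    same_size = [(freq, phrase) for phrase, freq in counts.items()
                 if len(phrase.split(' ')) == p_size]
    # Reverse-sorting (freq, phrase) tuples matches topList.sort(reverse = True)
    same_size.sort(reverse=True)
    return same_size[:rank]

def format_line(freq, phrase):
    """Render one entry the way short_list prints it, e.g. '2:the;cat'."""
    return str(freq) + ':' + ';'.join(phrase.split(' '))

# Stand-in for the words read from b.txt
words = "the cat sat on the mat the cat ran".split()
counts = count_phrases(words, p_length=3)
print("Length of ranking list is:", len(counts))
for freq, phrase in top_phrases(counts, p_size=2, rank=3):
    print(format_line(freq, phrase))
```

On the sample sentence this prints a ranking-list length of 20, followed by 2:the;cat, 1:the;mat and 1:sat;on. For the real data, build words from b.txt (for example by extending a list with line.split() for each line) and pass p_length=3, p_size=3 and rank=25 as in the answer.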
