当我运行程序时,矢量化根本不起作用。我的问题是什么?

2024-04-19 08:36:27 发布

您现在位置:Python中文网/ 问答频道 /正文

class Document: 
def __init__(self, doc_id):
    # create a new document with its ID
    self.id = doc_id
    # create an empty dictionary 
    # that will hold the term frequency (TF) counts
    self.tfs = {}

def tokenization(self, text):
    # split a title into words, 
    # using space " " as delimiter
    words = text.lower().split(" ")
    for word in words: 
        # for each word in the list
        if word in self.tfs: 
            # if it has been counted in the TF dictionary
            # add 1 to the count
            self.tfs[word] = self.tfs[word] + 1
        else:
            # if it has not been counted, 
            # initialize its TF with 1
            self.tfs[word] = 1


def save_dictionary(diction_data, file_path_name):
    f = open(file_path_name, "w+")

for key in diction_data:
    # Separate the key from the frequency with a space and
    # add a newline to the end of each key value pair
    f.write(key + " " + str(diction_data[key]) + "\n")

f.close()

def vectorize(self, data_path):
Documents = []
for i in range(1, 21):
    file_name = "./textfiles/"+ str(i) + ".txt"
    # create a new document with an ID
doc = Document(i+1)
    #Read the files
with open(file_name, 'r') as f:
    text = f.read()
    # compute the term frequencies
    #read in the files contents
doc.tokenization(text)
    # add the documents to the lists
Documents.append(doc)

save_dictionary(doc.tfs, "tf_" + str(doc.id) + ".txt")

DFS = {}
for doc in Documents:
    for word in doc.tfs:
        DFS[word] = DFS.get(word,0) + 1

    save_dictionary(doc.DFS, "DFS_" + str(doc.id) + ".txt")


vectorize("./")

我在上面添加了我正在使用的代码。当我运行它时,我什么也得不到。我的代码是错误的还是缩进问题的代码是正确的。我对python和一般的编码非常陌生,所以我希望缩进问题是问题之一,但我希望确保我使用的代码是正确的。如果你发现任何问题,请向我指出,我会做出改变来解决它们

提前谢谢你的帮助