spaCy tagger loss is zero during training

Posted 2024-04-25 15:08:40


I'm using this code to train a tagger in spaCy 2.3.0:

import random

import spacy
from spacy.util import compounding, minibatch

# posData, TAG_MAP, normalize, model and n_iter are defined elsewhere in my script
TRAIN_DATA = posData.train_data_getter()[:80000]
if model is not None:
    nlp = spacy.load(model)  # load existing spaCy model
    print("Loaded model '%s'" % model)
else:
    nlp = spacy.blank('fa')  # blank Persian pipeline

# add a tagger with the labels (and their attributes) from TAG_MAP
if "tagger" not in nlp.pipe_names:
    tagger = nlp.create_pipe("tagger")
    for tag, values in TAG_MAP.items():
        tagger.add_label(tag, values)
    nlp.add_pipe(tagger, first=True)

pipe_exceptions = ["tagger"]
other_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]
# nlp.tokenizer = Tokenizer(nlp.vocab)
with nlp.disable_pipes(*other_pipes):  # only train the tagger
    optimizer = nlp.begin_training()
    for i in range(n_iter):
        random.shuffle(TRAIN_DATA)
        losses = {}
        batches = minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001))
        for batch in batches:
            texts, annotations = zip(*batch)
            # normalize every text in the batch before updating
            texts = tuple(normalize(t, remove_punc=True) for t in texts)
            nlp.update(texts, annotations, sgd=optimizer, losses=losses)
        print("Losses", losses)

The problem is that the loss is always zero. What am I doing wrong?
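Separately from the version issue covered in the answer, it can be worth ruling out a data problem: the loop above calls normalize(t, remove_punc=True) on each text after its tags were assigned, so if that helper (defined elsewhere, not shown) deletes punctuation, a text may end up with fewer tokens than tags. The minimal sketch below flags such examples. It assumes the spaCy v2 training format (text, {"tags": [...]}); find_misaligned is a hypothetical helper, and its default whitespace tokenizer is only an approximation. For a real check you could pass spaCy's own tokenizer, e.g. lambda t: [tok.text for tok in nlp.make_doc(t)].

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# Assumed spaCy v2-style training example: (text, {"tags": [...]})
Example = Tuple[str, Dict[str, List[str]]]


def find_misaligned(
    examples: Iterable[Example],
    tokenize: Callable[[str], Sequence[str]] = str.split,
) -> List[int]:
    """Return indices of examples whose token count differs from their tag count.

    The default tokenizer (whitespace splitting) is an approximation; pass
    the pipeline's real tokenizer for an exact check.
    """
    bad = []
    for i, (text, annotations) in enumerate(examples):
        tags = annotations.get("tags", [])
        if len(tokenize(text)) != len(tags):
            bad.append(i)
    return bad


if __name__ == "__main__":
    # Illustrative data only, not taken from the question
    data = [
        ("I like green eggs", {"tags": ["N", "V", "J", "N"]}),
        ("Eat blue ham .", {"tags": ["V", "J", "N", "PUNCT"]}),
    ]
    # Simulate stripping punctuation after the tags were assigned
    stripped = [(t.replace(" .", ""), a) for t, a in data]
    print(find_misaligned(data))      # []
    print(find_misaligned(stripped))  # [1]
```

An empty result only means the counts match under the chosen tokenizer; it does not prove the tags are correct.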


1 answer
User
#1 · Posted 2024-04-25 15:08:40

Sorry, this is a bug in v2.3.0. It will be fixed in the upcoming v2.3.1. In the meantime, you can train the tagger with the spacy train command instead, or use v2.2.4.
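Since the answer names v2.3.0 as the affected release, a training script could check the installed version before it starts. This is a minimal sketch using only the standard library (importlib.metadata, Python 3.8+); the helper names are hypothetical, and the check simply encodes the versions mentioned in the answer.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def has_tagger_zero_loss_bug(v: str) -> bool:
    """True only for 2.3.0, the release the answer reports as affected."""
    return tuple(v.split(".")[:3]) == ("2", "3", "0")


def installed_spacy_version() -> Optional[str]:
    """Return the installed spaCy version string, or None if spaCy is absent."""
    try:
        return version("spacy")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_spacy_version()
    if v is None:
        print("spaCy is not installed")
    elif has_tagger_zero_loss_bug(v):
        print(f"spaCy {v}: tagger loss may stay at 0; upgrade to 2.3.1 or pin 2.2.4")
    else:
        print(f"spaCy {v}: not the release reported as affected")
```

The version test is a plain string comparison on the first three components, so it deliberately does not try to interpret pre-release or local version tags.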

If you'd like the fix sooner, you can also install spaCy from source from the current master branch (the fix is in commit b7107ac8).
