I am using this code to train a tagger in spaCy 2.3.0:
import random

import spacy
from spacy.util import minibatch, compounding

# model, n_iter, TAG_MAP, posData and normalize are defined elsewhere in the script
TRAIN_DATA = posData.train_data_getter()[:80000]
if model is not None:
    nlp = spacy.load(model)  # load existing spaCy model
    print("Loaded model '%s'" % model)
else:
    nlp = spacy.blank('fa')
if "tagger" not in nlp.pipe_names:
    tagger = nlp.create_pipe("tagger")
    for tag, values in TAG_MAP.items():
        tagger.add_label(tag, values)
    nlp.add_pipe(tagger, first=True)
pipe_exceptions = ["tagger"]
other_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]
# nlp.tokenizer = Tokenizer(nlp.vocab)
with nlp.disable_pipes(*other_pipes):  # only train the tagger
    optimizer = nlp.begin_training()
    for i in range(n_iter):
        random.shuffle(TRAIN_DATA)
        losses = {}
        batches = minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001))
        for batch in batches:
            texts, annotations = zip(*batch)
            # normalize every text in the batch before updating the model
            texts = tuple(normalize(t, remove_punc=True) for t in texts)
            nlp.update(texts, annotations, sgd=optimizer, losses=losses)
        print("Losses", losses)
The problem is that the loss is always zero. What am I doing wrong?
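Sorry, this is a bug in v2.3.0. It will be fixed in the upcoming v2.3.1. In the meantime, you can train the tagger with spacy train, or use v2.2.4. If you want the fix as soon as possible, you can also install from source from the current master branch (the fix is in commit b7107ac8).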
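If it helps, a training script can check the installed version before it starts, so a run on the affected release is not mistaken for a model that has stopped learning. This is only a minimal sketch: the helper name affected_by_zero_loss_bug is hypothetical, and the exact-string comparison is an assumption that treats only the release named 2.3.0 in the answer as affected.

```python
def affected_by_zero_loss_bug(version):
    """Return True only for spaCy 2.3.0, the release the answer names as buggy.

    Hypothetical helper: the exact-match rule is an assumption, not part of spaCy.
    """
    return version.strip() == "2.3.0"


if __name__ == "__main__":
    try:
        import spacy
    except ImportError:
        print("spaCy is not installed")
    else:
        if affected_by_zero_loss_bug(spacy.__version__):
            print("spaCy 2.3.0 detected: tagger losses may always be reported as zero")
        else:
            print("spaCy %s detected" % spacy.__version__)
```

The check is kept as a pure function on a version string so it can be exercised without spaCy installed; only the guarded block at the bottom reads spacy.__version__.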