在学习了word2vec的牛逼后,开始进入实战,解决问题了。
前言
在学习了word2vec的牛逼后,开始进入实战,解决问题了。
实战
添加依赖
<dependency>
<groupId>com.medallia.word2vec</groupId>
<artifactId>word2vecjava_2.11</artifactId>
<version>1.0-ALLENAI-4</version>
</dependency>
训练模型
由于语料比较小,各项参数,都调小了。
@Service
@Slf4j
public class Word2vecService {
public Word2VecModel train() {
try {
List<String> data = List.of("anarchism originated as a term of abuse first used against early working class radicals including the diggers of the english anarchism originated as a term of abuse first");
List list = Lists.transform(data, var11 -> Arrays.asList(var11.split(" ")));
Word2VecModel word2VecModel = Word2VecModel.trainer().setMinVocabFrequency(1).useNumThreads(4).setWindowSize(1).type(NeuralNetworkType.CBOW).setLayerSize(12).useNegativeSamples(5).setDownSamplingRate(1.0E-4D).setNumIterations(5).setListener((var1, var2) -> System.out.println(String.format("%s is %.2f%% complete", Format.formatEnum(var1), var2 * 100.0D))).train(list);
return word2VecModel;
} catch (InterruptedException e) {
log.error("exception:{}", e);
return null;
}
}
}
验证模型
curl http://localhost:8080/w2v/model
{"vocab":["of","a","abuse","anarchism","as","first","originated","term","the","against","class","diggers","early","english","including","radicals","used","working"]}
curl http://localhost:8080/w2v/matches?name=originated
[{"originated":1.0},{"used":0.31949775120966817},{"against":0.2685951628434606},{"working":0.243240735844125},{"first":0.21188438070768922},{"as":0.14340927658891997},{"anarchism":0.14088247689095754},{"a":0.11150486710031617},{"early":-0.03650820548138082},{"radicals":-0.08134581648415767}]R
作者:Wuxinshui
出处:http://wuxinshui.github.io
版权归作者所有,转载请注明出处