Polyglot NER: Massive Multilingual Named Entity Recognition


We build a Named Entity Recognition (NER) system for 40 languages using only language-agnostic methods. Our system relies solely on unsupervised methods for feature generation. We obtain training data for the NER task through a semi-supervised technique that does not rely on any language-specific or orthographic features. This approach allows us to scale to a large set of languages for which little human expertise and little human-annotated training data are available.

In Proceedings of the 2015 SIAM International Conference on Data Mining (SDM 2015)