Freshman or Fresher? Quantifying the geographic variation of language in online social media.


We present a new computational technique to detect and analyze statistically significant geographic variation in language. While previous approaches have primarily focused on lexical variation between regions, our method identifies words that demonstrate semantic and syntactic variation as well. We extend recently developed techniques for neural language models to learn word representations which capture differing semantics across geographical regions. In order to quantify this variation and ensure robust detection of true regional differences, we formulate a null model to determine whether observed changes are statistically significant. Our method is the first such approach to explicitly account for random variation due to chance while detecting regional variation in word meaning. Our analysis reveals interesting facets of language change at multiple scales of geographic resolution – from neighboring states to distant continents.

Proceedings of the Tenth International AAAI Conference on Web and Social Media (ICWSM 2016)
