Greetings! I am a PhD candidate in the Department of Computer Science at Stony Brook University, USA, and a member of the Data Science Lab, where I am advised by Prof. Steven Skiena. My research interests lie at the intersection of Text Mining, Machine Learning, and Computational Social Science. At the Data Science Lab, I have worked on several projects focused on representation learning for Natural Language Processing. My thesis focuses on developing statistical models for detecting and analyzing linguistic variation in social media [proposal]. I am fortunate to have collaborated on projects with Rami Al-Rfou and Bryan Perozzi at Stony Brook, Abhradeep Guha Thakurta, who is currently at Apple, and Yashar Mehdad from Yahoo! Research (now at Airbnb). I am also part of HLAB, where I work with Prof. Andrew Schwartz on analyzing language on social media with a human-centric focus.
In this paper, we propose methods to effectively adapt models learned on one domain to other domains using distributed word representations. First, we analyze the linguistic variation present across domains to identify key linguistic insights that can boost cross-domain performance. We propose methods to capture domain-specific semantics of word usage in addition to global semantics. We then demonstrate how to effectively use such domain-specific knowledge to learn NER models that outperform previous baselines in the domain adaptation setting. pdf
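As a rough illustration of combining domain-specific and global semantics, the hypothetical feature builder below concatenates two embeddings per token before they would be handed to an NER model. All names, dimensions, and vectors here are invented for the sketch; the paper's actual models are more involved:

```python
import numpy as np

def token_features(word, global_emb, domain_emb, dim=4):
    """Hypothetical feature builder: concatenate a global embedding with a
    domain-specific one, backing off to zeros for out-of-vocabulary words."""
    g = global_emb.get(word, np.zeros(dim))
    d = domain_emb.get(word, np.zeros(dim))
    return np.concatenate([g, d])

# toy embeddings: the same word carries a different domain-specific sense
global_emb = {"cell": np.array([0.2, 0.1, 0.0, 0.5])}
domain_emb = {"cell": np.array([0.9, 0.0, 0.3, 0.1])}  # e.g., a biomedical corpus
feats = token_features("cell", global_emb, domain_emb)  # an 8-dimensional feature vector
```

The concatenation lets a downstream tagger weigh the global sense of a word against its in-domain usage.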
We present Walklets, a novel approach for learning multiscale representations of vertices in a network. These representations clearly encode multiscale vertex relationships in a continuous vector space suitable for multi-label classification problems. Unlike previous work, the latent features generated using Walklets are analytically derivable and human interpretable. pdf
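The key idea in Walklets is to skip over steps of a random walk so that co-occurring vertex pairs are a fixed number of hops apart, yielding one training corpus per scale. A minimal sketch of that corpus-generation step, under the assumption that the resulting pairs are then fed to a Skipgram-style model (the graph, walk length, and seed below are toy choices):

```python
import random

def random_walk(adj, start, length, rng):
    """Take a uniform random walk of `length` vertices over an adjacency dict."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk

def walklet_pairs(walk, scale):
    """Skip over the walk so each co-occurring pair is exactly `scale` hops apart."""
    return [(walk[i], walk[i + scale]) for i in range(len(walk) - scale)]

rng = random.Random(0)
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # a small path graph
walk = random_walk(adj, 0, 6, rng)
pairs_scale2 = walklet_pairs(walk, 2)  # vertex pairs capturing scale-2 structure
```

Training separate embeddings on the scale-1, scale-2, ... corpora is what makes the representations multiscale and analytically derivable.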
We present a new computational technique to detect and analyze statistically significant geographic variation in language. While previous approaches have primarily focused on lexical variation between regions, our method identifies words that demonstrate semantic and syntactic variation as well. We extend recently developed techniques for neural language models to learn word representations which capture differing semantics across geographical regions. In order to quantify this variation and ensure robust detection of true regional differences, we formulate a null model to determine whether observed changes are statistically significant. Our method is the first such approach to explicitly account for random variation due to chance while detecting regional variation in word meaning. Our analysis reveals interesting facets of language change at multiple scales of geographic resolution – from neighboring states to distant continents. pdf
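The null-model idea can be illustrated with a generic permutation test: shuffle the region labels and recompute the statistic, to estimate how often chance alone produces a difference as large as the one observed. This is a minimal sketch only; the statistic, scores, and data below are toy stand-ins, not the paper's actual procedure over word embeddings:

```python
import numpy as np

def permutation_test(stat_fn, group_a, group_b, n_perm=2000, seed=0):
    """Null model via label shuffling: returns the observed statistic and an
    empirical p-value for it under random reassignment of group labels."""
    rng = np.random.default_rng(seed)
    observed = stat_fn(group_a, group_b)
    pooled = np.concatenate([group_a, group_b])
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat_fn(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

mean_gap = lambda a, b: abs(a.mean() - b.mean())
# toy stand-ins for per-context usage scores of one word in two regions
region_a = np.array([0.9, 1.1, 1.0, 0.95, 1.05])
region_b = np.array([0.2, 0.3, 0.25, 0.35, 0.22])
obs, p = permutation_test(mean_gap, region_a, region_b)  # small p: unlikely by chance
```

Words whose cross-region difference survives such a null model are the ones flagged as genuinely varying, rather than as artifacts of sampling noise.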
We propose a new computational approach for tracking and detecting statistically significant linguistic shifts in the meaning and usage of words. Such linguistic shifts are especially prevalent on the Internet, where the rapid exchange of ideas can quickly change a word’s meaning. Our meta-analysis approach constructs property time series of word usage, and then uses statistically sound change point detection algorithms to identify significant linguistic shifts. We demonstrate that our approach is scalable by tracking linguistic change across years of micro-blogging using Twitter, a decade of product reviews using a corpus of movie reviews from Amazon, and a century of written books using the Google Books Ngram corpus. Our analysis reveals interesting patterns of language usage change commensurate with each medium. Project Page
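The core step, scanning a word's property time series for a shift, can be sketched with a simple mean-shift criterion. This is only an illustration under toy data; the paper's method builds the time series from learned representations and assesses significance statistically rather than just picking the largest shift:

```python
import numpy as np

def detect_changepoint(series):
    """Return the index that best splits the series into two segments
    with maximally different means (a naive mean-shift criterion)."""
    series = np.asarray(series, dtype=float)
    best_idx, best_score = None, 0.0
    for t in range(2, len(series) - 1):  # require at least 2 points per side
        score = abs(series[t:].mean() - series[:t].mean())
        if score > best_score:
            best_idx, best_score = t, score
    return best_idx, best_score

# toy "word usage" series: stable at first, then a shift in usage
usage = [0.1, 0.12, 0.11, 0.1, 0.13, 0.55, 0.6, 0.58, 0.57]
idx, shift = detect_changepoint(usage)  # idx marks where the shift begins
```

A real pipeline would then ask whether a shift of this size is significant given the series' natural fluctuation, which is exactly the role of the change point detection machinery in the paper.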
We build a Named Entity Recognition (NER) system for 40 languages using only language-agnostic methods. Our system relies only on unsupervised methods for feature generation. We obtain training data for NER through a semi-supervised technique that does not rely on any language-specific or orthographic features. This approach allows us to scale to a large set of languages for which little human expertise and annotated training data is available. pdf
Training deep belief networks (DBNs) requires optimizing a non-convex function with an extremely large number of parameters. Dropout is a popular heuristic that has been shown in practice to avoid local minima when training these networks. We investigate the robustness and stability properties of dropout. We empirically validate our stability assertions for dropout in the context of convex ERMs and show that, surprisingly, dropout significantly outperforms L2 regularization based methods in prediction accuracy on several benchmark datasets. pdf
We induce networks on continuous space representations of words from the Polyglot and Skipgram models. We compare the structural properties of these networks and demonstrate that they differ from networks constructed through conventional methods. We also show that these networks exhibit a rich and varied community structure. pdf
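One natural way to induce such a language network, offered here as a plausible minimal sketch rather than the paper's exact construction, is to connect each word to its nearest neighbors in embedding space under cosine similarity:

```python
import numpy as np

def knn_graph(vectors, k=2):
    """Induce an undirected network by linking each vector to its k nearest
    neighbors under cosine similarity."""
    X = np.asarray(vectors, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize rows
    sims = X @ X.T                                    # pairwise cosine similarities
    np.fill_diagonal(sims, -np.inf)                   # forbid self-loops
    edges = set()
    for i in range(len(X)):
        for j in np.argsort(sims[i])[-k:]:            # k most similar neighbors
            edges.add((min(i, int(j)), max(i, int(j))))
    return edges

# toy word embeddings forming two tight clusters
vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
edges = knn_graph(vecs, k=1)  # each cluster becomes its own connected component
```

Community structure in the induced graph then reflects semantic clusters in the underlying embedding space.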
We investigate sex differences between male and female connectomes, identifying several discriminative features. One of our main findings is a statistically significant difference between the sexes at the pars orbitalis, a region that has been shown to function in language production. pdf
Freshman or Fresher? Quantifying the Geographic Variation of Internet Language
Vivek Kulkarni, Bryan Perozzi, Steven Skiena 10th International Conference on Web and Social Media (ICWSM 2016)
Statistically Significant Detection of Linguistic Change
Vivek Kulkarni, Rami Al-Rfou, Bryan Perozzi, Steven Skiena 24th International World Wide Web Conference (to appear in WWW 2015)
Polyglot-NER: Massive Multilingual Named Entity Recognition
Rami Al-Rfou, Vivek Kulkarni, Bryan Perozzi, Steven Skiena SIAM International Conference on Data Mining (SDM 2015)
A Paper Ceiling: Explaining the Persistent Underrepresentation of Females in Printed News Coverage.
Eran Shor, Arnout van de Rijt, Alex Miltsov, Vivek Kulkarni, and Steven Skiena American Sociological Review
Inducing Language Networks from Continuous Space Word Representations
Bryan Perozzi, Rami Al-Rfou, Vivek Kulkarni, Steven Skiena Fifth Workshop on Complex Networks (CompleNet 2014)
Sex Differences in the Human Connectome
Vivek Kulkarni, Jagat Pudipeddi Sastry, Leman Akoglu et al. Brain and Health Informatics, 2013
I interned at Yahoo! Research in the summer of 2016, and at Google during the summers of 2013 and 2015. Before graduate school, I spent a couple of years working for Microsoft and Juniper Networks.
Awarded the prestigious Renaissance Technologies Fellowship 2014-2017.
Stony Brook University Press Release: The Paper Ceiling – Women Underrepresented In Media Coverage
MIT Technology Review: Linguistic Mapping Reveals How Word Meanings Sometimes Change Overnight
Please email me if you would like to get in touch!