Welcome to my website!
I am an undergraduate currently in my senior year at UC Berkeley studying the intersection of computer science and linguistics. In Fall 2008 I will begin the Ph.D. program at Carnegie Mellon's Language Technologies Institute.
Hebrew verbs use a root-and-pattern system, where a three-consonant root is lexicalized in one or more of seven verbal paradigms. Each verb, then, is a pairing of a root, a paradigm, and a meaning. An inflected verb's form is quite predictable, the meaning less so; many verbs have idiosyncratic meanings, but there are some regularities and tendencies which need to be accounted for, e.g. certain frequent alternations between paradigms for a common root. My analysis addresses the following questions:
I argue that construction grammar is an appropriate theoretical framework capable of accounting for the complexities of such a system. In particular, I use the Embodied Construction Grammar formalism to represent the necessary constructions in a manner suitable for automated analysis and simulation. Moreover, I argue that many features of the system are consistent with the notion of language as a best-fit cognitive phenomenon.
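To make the root-and-pattern idea concrete, here is a minimal Python sketch. It stores each verb as a (root, paradigm, meaning) triple and derives the surface form by slotting the root consonants into a paradigm template. The templates are simplified, transliterated third-person masculine singular past-tense shapes. They ignore spirantization, guttural and weak roots, and the other complications of real Hebrew morphology. The names `TEMPLATES`, `interdigitate`, and `Verb` are hypothetical illustrations, not part of the ECG analysis.

```python
from typing import NamedTuple, Tuple

# Simplified 3rd-person masculine singular past-tense templates for the
# seven paradigms (binyanim). Digits mark the slots for root consonants 1-3.
# Assumption: transliterated forms; spirantization, gutturals, and weak roots are ignored.
TEMPLATES = {
    "pa'al": "1a2a3",
    "nif'al": "ni12a3",
    "pi'el": "1i2e3",
    "pu'al": "1u2a3",
    "hif'il": "hi12i3",
    "huf'al": "hu12a3",
    "hitpa'el": "hit1a2e3",
}


def interdigitate(root: Tuple[str, str, str], paradigm: str) -> str:
    """Fill a paradigm's template slots with the root's consonants."""
    template = TEMPLATES[paradigm]
    return "".join(root[int(ch) - 1] if ch.isdigit() else ch for ch in template)


class Verb(NamedTuple):
    root: Tuple[str, str, str]
    paradigm: str
    meaning: str  # listed, not computed: meanings can be idiosyncratic

    def form(self) -> str:
        # The form, unlike the meaning, follows from root + paradigm.
        return interdigitate(self.root, self.paradigm)


# One root lexicalized in three paradigms, each with its own meaning.
LEXICON = [
    Verb(("g", "d", "l"), "pa'al", "grew"),
    Verb(("g", "d", "l"), "pi'el", "raised"),
    Verb(("g", "d", "l"), "hif'il", "enlarged"),
]

for verb in LEXICON:
    print(verb.paradigm, verb.form(), verb.meaning)
```

The split mirrors the point above. `form()` is computed because the form is predictable, while `meaning` is simply listed because it is not.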
With Jerry Feldman, I'm now exploring computational ramifications of the ECG analysis for an honors thesis.
In Fall 2007 I worked with fellow student Will Chang to develop a statistical model that would aid linguistic analysis of texts in Picurís, a Northern Tiwa language of New Mexico. A database of 28 stories in the language was compiled, and students in a recent linguistics course began the painstaking process of identifying the meanings of morphemes (meaning-bearing word fragments) in the texts.
Our model is a Hidden Markov Model over syllables; it predicts (a) the grouping of syllables within each word into morphemes (segmentation), and (b) a tag for each morpheme indicating its category, roughly its part of speech (classification). Trained with the EM algorithm, the model makes reasonable predictions with just a few labeled examples.
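A minimal Python sketch can show how a single HMM over syllables yields a segmentation and a classification in one pass. It assumes each hidden state is a (tag, boundary) pair. It uses two invented tags (`STEM`, `SUF`), invented syllables, and hand-set probabilities in place of EM-trained ones, and it shows only Viterbi decoding. The state space and tag set of our actual model may differ.

```python
import math

# Hypothetical state space: each hidden state pairs a morpheme tag with a
# boundary marker, "B" = this syllable begins a new morpheme, "I" = it
# continues the current one. The best state sequence therefore gives a
# segmentation and a tag for every morpheme at the same time.
STATES = [("STEM", "B"), ("STEM", "I"), ("SUF", "B")]

# Toy, hand-set parameters; in practice these would be estimated with EM.
START = {("STEM", "B"): 1.0}
TRANS = {
    ("STEM", "B"): {("STEM", "I"): 0.6, ("SUF", "B"): 0.4},
    ("STEM", "I"): {("STEM", "I"): 0.3, ("SUF", "B"): 0.7},
    ("SUF", "B"): {("SUF", "B"): 1.0},
}
EMIT = {
    ("STEM", "B"): {"ka": 0.5, "mi": 0.3, "ta": 0.1, "na": 0.1},
    ("STEM", "I"): {"ka": 0.1, "mi": 0.5, "ta": 0.2, "na": 0.2},
    ("SUF", "B"): {"ka": 0.05, "mi": 0.05, "ta": 0.5, "na": 0.4},
}


def logp(p):
    """Log-probability, with log(0) mapped to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(syllables):
    """Return the most probable state sequence for one word's syllables."""
    scores = [{s: logp(START.get(s, 0.0)) + logp(EMIT[s].get(syllables[0], 0.0))
               for s in STATES}]
    backptrs = [{}]
    for t in range(1, len(syllables)):
        scores.append({})
        backptrs.append({})
        for s in STATES:
            prev = max(STATES, key=lambda r: scores[t - 1][r] + logp(TRANS[r].get(s, 0.0)))
            scores[t][s] = (scores[t - 1][prev] + logp(TRANS[prev].get(s, 0.0))
                            + logp(EMIT[s].get(syllables[t], 0.0)))
            backptrs[t][s] = prev
    path = [max(STATES, key=lambda s: scores[-1][s])]
    for t in range(len(syllables) - 1, 0, -1):
        path.append(backptrs[t][path[-1]])
    return path[::-1]


def to_morphemes(syllables, path):
    """Group syllables into (morpheme, tag) pairs using the boundary markers."""
    morphemes = []
    for syl, (tag, boundary) in zip(syllables, path):
        if boundary == "B" or not morphemes:
            morphemes.append([syl, tag])
        else:
            morphemes[-1][0] += syl
    return [tuple(m) for m in morphemes]


word = ["ka", "mi", "ta"]  # made-up syllables, not Picurís data
path = viterbi(word)
print(to_morphemes(word, path))  # [('kami', 'STEM'), ('ta', 'SUF')]
```

EM training (Baum-Welch) would re-estimate `START`, `TRANS`, and `EMIT` from expected counts computed with the forward-backward algorithm. The decoding step stays the same.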
In Spring 2007 I worked with Srini Narayanan on the problem of identifying whether a verb's subject was being used metonymically or literally. I developed and tested a classifier for metonymic vs. literal sentences, and found that, at least for a particular category of verbs, deciding whether the subject is literal or metonymic essentially reduces to determining whether the subject refers to a person.
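The finding can be illustrated with a minimal Python sketch of the reduced decision rule. The verb category is not specified here, so a communication verb like "announce" is an assumed example. `PERSON_NOUNS` is a hypothetical stand-in for whatever lexical resource decides personhood. This is not the classifier that was actually built.

```python
# Hypothetical stand-in for a lexical resource that can tell whether a
# noun denotes a person (e.g., by checking a noun ontology).
PERSON_NOUNS = {"president", "spokesman", "teacher", "he", "she"}


def classify_subject(subject_head: str) -> str:
    """Label a subject 'literal' if it denotes a person, else 'metonymic'.

    Assumes the verb belongs to a class whose literal subjects are people
    (e.g., 'announce'); for other verb classes the test would differ.
    """
    return "literal" if subject_head.lower() in PERSON_NOUNS else "metonymic"


# "The president announced ..." vs. "The White House announced ..."
print(classify_subject("President"))    # literal
print(classify_subject("White House"))  # metonymic
```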
In future work I hope to develop a semi-supervised approach to identifying whether a noun is being used literally or metonymically, which would scale better than a fully-supervised classifier and thus be useful for a variety of natural language understanding applications. As identifying and categorizing the subject, object, and any other arguments can be done with existing lexical resources and tools, the chief difficulty is in determining the semantic categories for a particular verb’s literal arguments.
I plan to investigate whether this can be done with an active learning approach, wherein the system attempts to cluster a novel verb with known ones (using a lexical resource as an ontology of verb senses), and asks the user to specify selectional restrictions if it can’t.
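The proposed loop can be outlined as a minimal Python sketch, under several assumptions. Each verb is described by a hypothetical feature set standing in for its position in a verb-sense ontology. Clustering is simplified to nearest-neighbor matching by Jaccard similarity with an arbitrary 0.5 threshold. The only restriction learned is the semantic category of the literal subject. `KNOWN`, `subject_restriction`, and the feature names are illustrative, not an existing resource's API.

```python
from typing import Callable, Dict, FrozenSet, Tuple

# Hypothetical seed data: verb -> (features from a verb-sense ontology,
# semantic category required of its literal subject).
KNOWN: Dict[str, Tuple[FrozenSet[str], str]] = {
    "announce": (frozenset({"communicate", "inform", "say"}), "person"),
    "eat": (frozenset({"consume", "ingest"}), "animate"),
}


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Overlap between two feature sets, from 0.0 (disjoint) to 1.0 (equal)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def subject_restriction(verb: str, features: FrozenSet[str],
                        ask_user: Callable[[str], str],
                        threshold: float = 0.5) -> str:
    """Inherit the restriction of the most similar known verb, or ask the user."""
    nearest, similarity = max(
        ((known, jaccard(features, feats)) for known, (feats, _) in KNOWN.items()),
        key=lambda pair: pair[1],
    )
    if similarity >= threshold:
        restriction = KNOWN[nearest][1]
    else:
        restriction = ask_user(verb)  # the active-learning query
    KNOWN[verb] = (features, restriction)  # later verbs can match this one
    return restriction


# "declare" overlaps enough with "announce" to inherit its restriction;
# "evaporate" matches nothing, so the (here simulated) user is asked.
print(subject_restriction("declare", frozenset({"communicate", "say", "state"}),
                          ask_user=lambda v: "unknown"))
print(subject_restriction("evaporate", frozenset({"change", "vaporize"}),
                          ask_user=lambda v: "substance"))
```

Storing each newly resolved verb in `KNOWN` means that a user's answer reduces the number of future queries, which is the point of the active-learning setup.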
My programming language of choice is Python; I've also used Java, C, C++, and Scheme. For the web I use JavaScript and PHP.
I play the violin and enjoy table tennis and photography.
My favorite fonts are: Zapf Humanist/Optima, Segoe UI, Georgia, and Lucida Bright.
Curriculum Vitae (PDF)