
Nathan Schneider

Welcome to my website! Please select one of the categories on the left.

Abstract

I am an undergraduate currently in my senior year at UC Berkeley studying the intersection of computer science and linguistics. In Fall 2008 I will begin the Ph.D. program at Carnegie Mellon's Language Technologies Institute.

Academic Interests

Computer Science

Natural language processing (NLP) and understanding (NLU), and computational linguistics—including:

I would also like to learn more about:

Linguistics

I prefer the theoretical framework of cognitive linguistics; topics of interest include:

Research

Hebrew Verb Morphology & ECG

Hebrew verbs use a root-and-pattern system, in which a three-consonant root is lexicalized in one or more of seven verbal paradigms. Each verb, then, is a pairing of a root, a paradigm, and a meaning. An inflected verb's form is quite predictable; its meaning is less so. Many verbs have idiosyncratic meanings, but there are regularities and tendencies that need to be accounted for, e.g. certain frequent alternations between paradigms for a common root. My analysis addresses the following questions:

  1. What are the forms and meanings of the morphological components of verbs—roots, paradigms, stems, and inflectional affixes?
  2. How do the forms and meanings of these constructions combine to yield actual verbs in sentences?
  3. How can these constructions be formalized in a structured representation that can be used for computational analysis?

I argue that construction grammar is an appropriate theoretical framework capable of accounting for the complexities of such a system. In particular, I use the Embodied Construction Grammar formalism to represent the necessary constructions in a manner suitable for automated analysis and simulation. Moreover, I argue that many features of the system are consistent with the notion of language as a best-fit cognitive phenomenon.
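As a rough illustration of the root, paradigm, and meaning pairing described above, the following Python sketch fills a root's consonants into a paradigm template. It is not ECG notation and not the analysis itself: the `Verb` class, the `TEMPLATES` table, and the `interdigitate` and `stem` functions are hypothetical names, the two templates are simplified shapes for a single inflected form each, and phonological alternations (such as spirantization) are ignored.

```python
from dataclasses import dataclass

# Hypothetical, simplified templates: each "C" is a slot for one root consonant.
# Only two of the seven paradigms are shown, with one inflected form each.
TEMPLATES = {
    "pa'al": "CaCaC",
    "hif'il": "hiCCiC",
}


@dataclass(frozen=True)
class Verb:
    """A verb as a pairing of a root, a paradigm, and a meaning."""
    root: tuple    # three root consonants, transliterated
    paradigm: str  # key into TEMPLATES
    meaning: str   # gloss; stored, not derived from root + paradigm


def interdigitate(root, template):
    """Fill the root consonants, in order, into the template's C slots."""
    if template.count("C") != len(root):
        raise ValueError("template slots and root consonants do not match")
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in template)


def stem(verb):
    """Build the (simplified) form of a verb from its root and paradigm."""
    return interdigitate(verb.root, TEMPLATES[verb.paradigm])


# Illustrative entries for the root k-t-v; glosses are approximate.
wrote = Verb(("k", "t", "v"), "pa'al", "write")
dictated = Verb(("k", "t", "v"), "hif'il", "dictate")
print(stem(wrote), stem(dictated))
```

The meaning is stored on each entry rather than computed, which mirrors the observation that a verb's form is largely predictable while its meaning often is not.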

With Jerry Feldman, I'm now exploring computational ramifications of the ECG analysis for an honors thesis.

Picurís Tagger

In Fall 2007 I worked with fellow student Will Chang to develop a statistical model that would aid linguistic analysis of texts in Picurís, a Northern Tiwa language of New Mexico. A database of 28 stories in the language was compiled, and students in a recent linguistics course began the painstaking process of identifying the meanings of morphemes (meaning-bearing word fragments) in the texts.

Our model is a Hidden Markov Model over syllables; it predicts (a) the grouping of syllables within each word into morphemes (segmentation), and (b) a tag for each morpheme indicating its category/"part of speech" (classification). Trained with the EM algorithm, the model makes reasonable predictions with just a few labeled examples.
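To make the joint segmentation and classification idea concrete, here is a minimal Python sketch of Viterbi decoding over syllables. It makes several assumptions that are not taken from the actual tagger: states are encoded as `B-TAG` (a syllable that begins a morpheme) and `I-TAG` (a syllable that continues one), the syllables `ta`, `ki`, `mo` and all probabilities are invented toy values rather than Picurís data, and the parameters are set by hand instead of being estimated with EM.

```python
import math

NEG_INF = float("-inf")


def logs(probs):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in probs.items()}


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most probable state sequence for the observed syllables."""
    # best[t][s] = (log score of best path ending in s at time t, backpointer)
    best = [{s: (log_start.get(s, NEG_INF) + log_emit[s].get(obs[0], NEG_INF), None)
             for s in states}]
    for t in range(1, len(obs)):
        column = {}
        for s in states:
            prev = max(states,
                       key=lambda p: best[t - 1][p][0] + log_trans[p].get(s, NEG_INF))
            score = best[t - 1][prev][0] + log_trans[prev].get(s, NEG_INF)
            column[s] = (score + log_emit[s].get(obs[t], NEG_INF), prev)
        best.append(column)
    path = [max(states, key=lambda s: best[-1][s][0])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return path[::-1]


def to_morphemes(syllables, path):
    """Group syllables into (morpheme, tag) pairs using the B-/I- states."""
    groups = []
    for syl, state in zip(syllables, path):
        kind, tag = state.split("-")
        if kind == "B" or not groups:
            groups.append(([syl], tag))
        else:
            groups[-1][0].append(syl)
    return [("".join(syls), tag) for syls, tag in groups]


# Toy, hand-set parameters (assumptions for illustration only).
STATES = ["B-STEM", "I-STEM", "B-AFFIX"]
START = logs({"B-STEM": 0.6, "B-AFFIX": 0.4})
TRANS = {
    "B-STEM": logs({"I-STEM": 0.5, "B-AFFIX": 0.4, "B-STEM": 0.1}),
    "I-STEM": logs({"B-AFFIX": 0.6, "I-STEM": 0.2, "B-STEM": 0.2}),
    "B-AFFIX": logs({"B-STEM": 0.7, "B-AFFIX": 0.3}),
}
EMIT = {
    "B-STEM": logs({"ta": 0.7, "ki": 0.2, "mo": 0.1}),
    "I-STEM": logs({"ki": 0.7, "ta": 0.2, "mo": 0.1}),
    "B-AFFIX": logs({"mo": 0.8, "ta": 0.1, "ki": 0.1}),
}

word = ["ta", "ki", "mo"]
path = viterbi(word, STATES, START, TRANS, EMIT)
print(to_morphemes(word, path))
```

Folding the morpheme boundary into the state label lets a single decoding pass produce both the segmentation and the tags. Missing transitions (for example, `B-AFFIX` to `I-STEM`) are treated as impossible.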

Metonymy Classification

In Spring 2007 I worked with Srini Narayanan on the problem of identifying whether a given verb is being used metonymically. I developed and tested a classifier for metonymic vs. literal sentences, and found that, at least for a particular category of verbs, deciding whether the subject is literal or metonymic essentially reduces to determining whether the subject refers to a person.
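The reduction described above can be pictured with a very small Python sketch. This is not the classifier that was built and tested: the `PERSON_NOUNS` lexicon and both function names are hypothetical, and the direction of the rule (a person subject counts as literal, anything else as metonymic) is an assumption about the verb category in question.

```python
# Hypothetical toy lexicon of person-denoting subject heads. A real system
# would consult a lexical resource or a named-entity tagger instead.
PERSON_NOUNS = {"senator", "teacher", "spokesman", "she", "he"}


def subject_is_person(subject_head):
    """Return True if the subject's head word is in the toy person lexicon."""
    return subject_head.lower() in PERSON_NOUNS


def classify(subject_head):
    """Assumed rule: person subject -> literal, otherwise metonymic."""
    return "literal" if subject_is_person(subject_head) else "metonymic"


print(classify("Senator"), classify("Washington"))
```

Under these assumptions, all of the difficulty moves into `subject_is_person`, which is where lexical resources would do the real work.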

In future work I hope to look for a semi-supervised approach to identifying whether a noun is being used literally or metonymically, which would scale better than a fully supervised classifier and thus be useful for a variety of natural language understanding applications. Since identifying and categorizing the subject, object, and any other arguments can be done with existing lexical resources and tools, the chief difficulty is determining the semantic categories for a particular verb’s literal arguments.

I plan to investigate whether this can be done with an active learning approach, wherein the system attempts to cluster a novel verb with known ones (using a lexical resource as an ontology of verb senses), and asks the user to specify selectional restrictions if it cannot.
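One way such a loop might look is sketched below in Python. Everything here is an assumption made for illustration: the `senses` dictionary stands in for a lexical resource, Jaccard overlap of sense labels stands in for clustering, and the `threshold` value and the `ask_user` callback are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two sets of sense labels, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def restrictions_for(verb, senses, known, ask_user, threshold=0.5):
    """Return selectional restrictions for `verb`, asking the user only if needed.

    senses: maps each verb to a set of sense labels (stand-in for an ontology).
    known:  maps already-analyzed verbs to their selectional restrictions.
    """
    if verb in known:
        return known[verb]
    best, best_score = None, 0.0
    for other in known:
        score = jaccard(senses.get(verb, set()), senses.get(other, set()))
        if score > best_score:
            best, best_score = other, score
    if best is not None and best_score >= threshold:
        restrictions = known[best]      # borrow from the most similar known verb
    else:
        restrictions = ask_user(verb)   # fall back to a human annotator
    known[verb] = restrictions
    return restrictions
```

A call such as `restrictions_for("declare", senses, known, ask_user)` would query the user only when no known verb is similar enough, so annotation effort goes to verbs the system cannot place by itself.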

Selected Coursework

Computer Science

Linguistics

Languages

Other

Potpourri

Activities

Clubs

Programming

My programming language of choice is Python; I've also used Java, C, C++, and Scheme. For the web I use JavaScript and PHP.

Hobbies

I play the violin and enjoy table tennis and photography.

Typography

My favorite fonts are: Zapf Humanist/Optima, Segoe UI, Georgia, and Lucida Bright.

Firefox Add-ons

I've found that these add-ons greatly enhance my Firefox browsing experience:

CV

Curriculum Vitae[+PDF]

last updated 29 april 2008