CS6120: Natural Language Processing

Homework 3

Assigned: Thursday, 13 March 2013
Due: 11:59pm, Friday, 28 March 2013

Instructions

If you collaborated with others, you must write down with whom you worked on the assignment. If this changes from problem to problem, then you should write down this information separately with each problem.
Email the instructor the requested written answers, code, and instructions on how to (compile and) run the code.

Context-free parsing [45 points]

Using the programming language of your choice, write a recognizer using Earley's algorithm as discussed in class. The parser should read grammars in the format from Homework 2 and read input sentences in the space-separated format you used above to output generated sentences. In other words, you should be able to parse the sentences generated by your own and others' grammars. To get sufficient speed, you may want to implement the left-corner indexing trick discussed in class. (Other tricks may not offer as much benefit for a given implementation time within the scope of this assignment.)

You should pass a grammar file and a start symbol as arguments to your parser and then read sentences from the standard input, like so:

./parse grammar S1 < sentences

For sentences in the language defined by the grammar, output true; otherwise, false.

Since this is an unweighted parser, you don't have to do the bookkeeping of weights or decide among competing alternatives when attaching completed rules.

Extra credit: Competitive grammar writing [5 points]

Since you're writing unweighted parsers, they won't be able to score sentences by their log-likelihood. We will, however, be able to see how many of other people's grammatical sentences your parser and S1 subgrammar of grammar5 are able to produce.