Assigned: Tuesday, 27 February 2013
Due: 11:59pm, Friday, 14 March 2013
Generating sentences from a grammar [10 points]
First, download the initial grammar: grammar1. The file contains self-explanatory comments, which are delimited by #. The basic format is:
1 VP Verb NP

Ignore the leading "1" for now. The first symbol is the left-hand side of a context-free rewrite rule; the remaining one or more symbols make up the right-hand side, which may contain both terminals and non-terminals. There is no typographic distinction enforced between terminals and non-terminals; rather, any symbol that does not appear on a left-hand side is by definition a terminal. You may add new non-terminals, but don't add any new terminals. We'll explain why below.
Now let's get down to work:
Write a random sentence generator in the language of your choice. When handing it in, make sure we can run it like so:
./generate grammar1 5

to print five random sentences from the grammar grammar1 to standard output.
You should see output like:
the king rides every horse .
Your program should start with the START symbol and recursively choose rewrite rules until termination. Each rule is preceded by a non-negative real number. In grammar1, there are two possible expansions of NP, with weights 20 and 1. This means you should pick the first option with probability 20/(20 + 1).
Don't hardwire this grammar into your program. Read it anew each
time it runs, so that you can modify the grammar later.
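
If it helps to see the overall shape of such a program, here is a minimal sketch (in Python, though any language is fine). It assumes only what is described above: rules of the form "weight LHS RHS ...", comments delimited by #, and a start symbol named START; the function names are just illustrative, not a required design.

#!/usr/bin/env python3
# Minimal sketch of a weighted random sentence generator.
# Usage: ./generate grammar1 5
import random
import sys
from collections import defaultdict

def load_grammar(path):
    # Map each left-hand-side symbol to a list of (weight, right-hand side) pairs.
    rules = defaultdict(list)
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()   # drop comments and blank lines
            if not line:
                continue
            fields = line.split()
            rules[fields[1]].append((float(fields[0]), fields[2:]))
    return rules

def expand(symbol, rules):
    # A symbol with no rules of its own is, by definition, a terminal.
    if symbol not in rules:
        return [symbol]
    weights, rhss = zip(*rules[symbol])
    rhs = random.choices(rhss, weights=weights)[0]  # pick an expansion in proportion to its weight
    return [word for sym in rhs for word in expand(sym, rules)]

if __name__ == "__main__":
    grammar, count = load_grammar(sys.argv[1]), int(sys.argv[2])
    for _ in range(count):
        print(" ".join(expand("START", grammar)))

Note that a naive recursion like this can run very deep (or fail to terminate) if a recursive rule is weighted heavily, which is worth keeping in mind when you adjust weights below.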
To see what happens with different weights,
copy grammar1
to grammar2
and
modify it. For instance, you can assert that "the" is a more common determiner than "a" or "another" like so:

1 Det a
1 Det another
...
4 Det the
1 Det this

In particular, "the" would now have probability 1/3 (its weight, 4, divided by the total weight of all the Det rules), compared with 1/12 for each of the others.
Play around with the weights in grammar2
and see how your generated sentences change. Can you
make the average sentence length much longer or
shorter?
Hand in the probabilized generation program, your modified grammar2, and ten random sentences.
Describing English with context-free rules [20 points]
Copy grammar2 to grammar3. Modify the latter so that, in addition to the strings of grammar2, it can also generate phenomena like:
do coconuts speak ?
who does Arthur suggest she carry ?
are they suggesting Arthur ride to Camelot ?
the Holy Grail was covered by a yellow fruit .
do not speak !
Arthur will have been riding for eight nights .
Arthur and Guinevere migrate frequently .
he knows what they are covering with that story .
the king drank to the castle that was his home .

Of course, you don't want to generate just the nine sentences above, but others with the same constructions.
Hand in your expanded grammar3.
Create a final grammar, grammar4. Modify it to model some new constructions of English that you find interesting. Explain what you are doing in comments in your grammar file. Hand in grammar4.
Just to be safe, use a backoff bigram model [5 points]
Even after working on your grammar for a while, you won't be able to generate all grammatical sentences of English. While it might be OK for some applications to restrict their output to a subset of English, you often want to handle input covering a much wider range of expressions. We've therefore included a second subgrammar in the file, headed by the nonterminal S2. This subgrammar is simply a tag bigram model. Copy grammar4 to grammar5, and set S2's weight to something greater than its initial value of 0. You might also tinker with the transitions among the different tags. Unless you set S2's weight overly high, you shouldn't see too many word-salad sentences. (For fun, though, you could reverse the mixing proportions of the two grammars and see what happens.)
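
For example, if the top-level rules in your copy look something like the following (the exact symbols and starting weights may differ in your file):

99 START S1
1 START S2

then roughly one generated sentence in a hundred would come from the bigram subgrammar.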
Addendum: Competitive grammar writing
Besides the instructor and TA looking over your shoulder, what's
your incentive to come up with interesting constructions in the
question above or a good balance between S1 and S2? The answer is simple and
trendy: gamification!
Using the final grammar5
you hand in, we'll generate
some sentences from the S1
subgrammar and judge their
grammaticality. We'll then try to parse your grammatical output
using other students' grammars. The winning grammar will be the one that assigns the highest average log likelihood to the set of grammatical sentences. A nominal number of extra credit
points—and glory!—will be granted.
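
To make the scoring criterion concrete, here is a rough sketch of how such an average might be computed; the exact evaluation details, such as the base of the logarithm, are our assumption here.

import math

# Rough sketch of the scoring criterion: the mean log probability a grammar
# assigns to a set of grammatical test sentences (higher, i.e. less negative, is better).
def average_log_likelihood(probabilities):
    # probabilities: one value per test sentence, each strictly greater than zero
    return sum(math.log2(p) for p in probabilities) / len(probabilities)

A grammar that cannot parse some test sentence at all assigns it probability zero, which drags the average down to negative infinity; that is one more reason to keep the S2 backoff subgrammar in the mix.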