# Computation – Context-Free Grammars

September 9, 2009

**Well-defined Rules**

Instead of thinking of patterns that may be matched by some strings, our aim here is to generate all the words of a language by following some well-defined rules. The rules are read as ways of rewriting the string generated so far: each rule lets us replace one of the symbols in the current string by a string of other symbols. For example, changing ‘S’ in this string to ‘at’:

- mSt => m(at)t => matt

Time for a proper example, consider the following:

- S = B | (S) | S+S | S-S | SxS | S/S
- B = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Now we just have to know where to start, and it is common to start with the letter ‘S’, which you can imagine, as with automata, to mean **Start State**. Once we have ‘S’, we can use any of the rules for ‘S’ to replace it. For example:

**S**=>**SxS**=>**Sx(S)**=>**Sx(S-S)**=>**Bx(S-S)**=>**5x(S-S)**=>**5x(B-S)**=>**5x(B-B)**=>**5x(3-B)**=>**5x(3-4)**

We call this a **derivation** from the grammar.
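To make the rewriting process concrete, here is a small sketch in Python (the grammar stored as an ordinary dict; `apply_rule` is a hypothetical helper, not part of any library) that replays a leftmost derivation of the same string as above, one rule application per step:

```python
# The example grammar, as a mapping from each non-terminal
# to the list of strings it may be rewritten to.
rules = {
    "S": ["B", "(S)", "S+S", "S-S", "SxS", "S/S"],
    "B": list("0123456789"),
}

def apply_rule(string, symbol, replacement):
    """Rewrite the leftmost occurrence of a non-terminal symbol."""
    assert replacement in rules[symbol], "not a production of the grammar"
    return string.replace(symbol, replacement, 1)

# A leftmost derivation of 5x(3-4), one rule application per step.
s = "S"
for symbol, replacement in [("S", "SxS"), ("S", "B"), ("B", "5"),
                            ("S", "(S)"), ("S", "S-S"), ("S", "B"),
                            ("B", "3"), ("S", "B"), ("B", "4")]:
    s = apply_rule(s, symbol, replacement)
print(s)  # 5x(3-4)
```

The order of replacements differs from the derivation shown in the text (here the leftmost non-terminal is always rewritten first), but the resulting string is the same.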

The symbols ‘S’ and ‘B’ are really only used to help us construct the required string. The letters we want our final string to be built from are called **terminal symbols**; in the example grammar these are the digits 0–9, the operator symbols +, -, x and /, and the brackets ( and ). A **terminal symbol** cannot be replaced. The other symbols, namely ‘S’ and ‘B’, are called **non-terminal symbols** or sometimes, but rarely, **variables**.

**Context-Free Grammars**

———————————————————————————————–

**Definition 15:** A **context-free grammar** or **CFG** is given by the following data:

- An alphabet Σ of terminal symbols, also called the **object alphabet**.
- An alphabet Ξ of non-terminal symbols, the elements of which are also referred to as **auxiliary characters**, **placeholders** or, in some books, **variables**; we require Ξ ∩ Σ = Ø.
- A special non-terminal symbol S ∈ Ξ, called the **start symbol**.
- A finite set of **production rules**, each of the form R –> Y, where R ∈ Ξ and Y is a string over Σ ∪ Ξ.

The rules are called context-free because they tell us that we may replace any occurrence of a non-terminal symbol as directed, regardless of the symbols surrounding it.

**Definition 16:** A string X ∈ (Σ U Ξ)* is **generated** by a grammar ‘G’ if there is a sequence of strings

S = X_{0} => X_{1} => ··· => X_{n} = X

such that each step X_{i} => X_{i+1} is obtained by the application of one of G’s production rules to a non-terminal symbol occurring in X_{i}, as follows: let R ∈ Ξ occur in X_{i} and assume that there is a production rule R –> Y. Then X_{i+1} is obtained from X_{i} by replacing one occurrence of ‘R’ in X_{i} by the string ‘Y’.

———————————————————————————————–
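The derivation condition in Definition 16 can be checked mechanically. Below is a hypothetical sketch of a function that verifies a single step X_{i} => X_{i+1} against the example grammar, and then checks the whole derivation from the text:

```python
# The example grammar from earlier in the post.
rules = {
    "S": ["B", "(S)", "S+S", "S-S", "SxS", "S/S"],
    "B": list("0123456789"),
}

def is_derivation_step(x, y, rules):
    """True if y results from replacing one occurrence of some
    non-terminal R in x by the right-hand side Y of a rule R -> Y."""
    for r, alternatives in rules.items():
        for i, ch in enumerate(x):
            if ch == r:
                for alt in alternatives:
                    if x[:i] + alt + x[i + 1:] == y:
                        return True
    return False

# The derivation from the text, checked step by step.
chain = ["S", "SxS", "Sx(S)", "Sx(S-S)", "Bx(S-S)", "5x(S-S)",
         "5x(B-S)", "5x(B-B)", "5x(3-B)", "5x(3-4)"]
ok = all(is_derivation_step(a, b, rules) for a, b in zip(chain, chain[1:]))
print(ok)  # True
```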

Let’s see another example. Suppose we are asked to give a grammar for the language of all non-empty strings over the alphabet {0, 1} that start with a ‘0’.

So, how do we make sure that our word starts with a 0? We could use the production rule S –> 0S. But eventually we run into a problem: sooner or later we have to allow 1s into the word, yet we certainly can’t use S –> 1S! So we have to include another non-terminal symbol, let’s say A. Now we can use these production rules:

- S –> 0A
- A –> 0A | 1A | ε

For completeness, yes, it is possible to do this with just one non-terminal:

- S –> S0 | S1 | 0

*Note: the non-terminal symbols are ‘S’ and ‘A’, and the terminal symbols are 0 and 1; ε denotes the empty string and is not itself a terminal symbol.*
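To gain confidence that the one-non-terminal grammar really generates the intended language, here is a sketch (the function name and encoding are my own, not from the post) that enumerates every word the grammar generates up to a given length and compares the result with the direct description “non-empty, starts with 0”. The enumeration is safe here because no rule shrinks the string, so overlong intermediate forms can be pruned:

```python
from itertools import product

rules = {"S": ["S0", "S1", "0"]}

def generated_up_to(rules, start, max_len):
    """All terminal strings of length <= max_len generated by the grammar.
    Assumes no rule shrinks the string, so longer forms can be pruned."""
    seen, frontier, words = {start}, {start}, set()
    while frontier:
        nxt = set()
        for s in frontier:
            # Position of the leftmost non-terminal, if any.
            i = next((k for k, c in enumerate(s) if c in rules), None)
            if i is None:                      # no non-terminal left: a word
                if len(s) <= max_len:
                    words.add(s)
            elif len(s) <= max_len:            # prune overlong forms
                for alt in rules[s[i]]:
                    t = s[:i] + alt + s[i + 1:]
                    if t not in seen:
                        seen.add(t)
                        nxt.add(t)
        frontier = nxt
    return words

# Direct description: non-empty strings over {0, 1} starting with '0'.
expected = {"".join(bits) for n in range(1, 4)
            for bits in product("01", repeat=n) if bits[0] == "0"}
print(generated_up_to(rules, "S", 3) == expected)  # True
```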

———————————————————————————————–

**Definition 17:** A language ζ over Σ is **context-free** if there is a context-free grammar whose alphabet of terminal symbols is Σ and which generates ζ.

———————————————————————————————–

**Parsing**

When a compiler deals with a piece of code it has to parse it. In other words, it has to break it down into its constituent parts in a way that makes sense. Similarly, if you were to write a program that could take (possibly quite long) arithmetic expressions as input and evaluate them, it would have to break down each expression to decide in which order to carry out the various operations.

When we give a derivation we automatically provide one way of breaking down the given string. Parsing is an attempt to carry out the opposite process: to take a string and find out how to assemble it from simpler parts. Instead of finding a derivation it is often more meaningful to create a **parse tree**, which gives a better overview of how the various bits fit together.
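The parse tree for the derivation of 5x(3-4) can be written down in code; here is a sketch using nested (non-terminal, children) tuples, with terminals as plain strings (the tuple encoding is just one convenient choice, not part of the original post):

```python
# Parse tree for 5x(3-4): each internal node is a (non-terminal,
# children) pair, each leaf a terminal symbol.
tree = ("S", [
    ("S", [("B", ["5"])]),
    "x",
    ("S", ["(",
           ("S", [("S", [("B", ["3"])]),
                  "-",
                  ("S", [("B", ["4"])])]),
           ")"]),
])

def leaves(node):
    """Concatenate the terminal symbols at the leaves, left to right."""
    if isinstance(node, str):
        return node
    _, children = node
    return "".join(leaves(child) for child in children)

print(leaves(tree))  # 5x(3-4)
```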

We can read off the generated string from the leaves of the parse tree; for the derivation above this gives **5x(3-4).** As well as the generated string, the tree also shows which production rules were used!

However, for strings such as **5x3-4** with no brackets, we can produce two different parse trees. If there are two different parse trees for the same word, then we say that the word is **ambiguously generated** and that the grammar is **ambiguous**. Where it matters, it is always best to try to give an **unambiguous grammar**. Yet it is not always possible to do this.
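To see why ambiguity matters, here is a sketch of the two parse trees for 5x3-4, written as nested (operator, left, right) tuples and evaluated bottom-up (a simplified encoding of my own, with ‘x’ read as multiplication); the two readings give different answers:

```python
# Two parse trees for 5x3-4: one groups (5x3)-4, the other 5x(3-4).
tree_a = ("-", ("x", 5, 3), 4)
tree_b = ("x", 5, ("-", 3, 4))

def evaluate(tree):
    """Evaluate a parse tree bottom-up ('x' is multiplication)."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    l, r = evaluate(left), evaluate(right)
    return l * r if op == "x" else l - r

print(evaluate(tree_a), evaluate(tree_b))  # 11 -5
```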

**The Limitations**

As you may be aware, there are languages that are not context-free. For example:

{*a*^{n}*b*^{n}*c*^{n} | *n* ∈ N}

This language, over the alphabet {*a, b, c*}, is not context-free. There is a formal result, the **Pumping Lemma** for context-free languages, which gives a property that every context-free language must have; showing that a language lacks this property proves that it is not context-free.
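Although no context-free grammar generates this language, a program can decide membership by direct counting; a minimal sketch (the function name is my own):

```python
def in_anbncn(word):
    """Membership test for { a^n b^n c^n | n in N } by counting --
    simple for a program, but beyond the power of any CFG."""
    n = len(word) // 3
    return word == "a" * n + "b" * n + "c" * n

print(in_anbncn("aabbcc"), in_anbncn("aabcbc"))  # True False
```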

If you wish to find out more on the pumping lemma for CFGs, go here: http://en.wikipedia.org/wiki/Pumping_lemma_for_context-free_languages

Next: Computational Complexity