# 2.2 Normal Forms for CFG

A grammar is said to be in **Chomsky Normal Form (CNF)** if all of its grammar rules follow one of the two patterns:

- X => YZ (exactly 2 non-terminals on the right side)
- X => a (exactly 1 terminal on the right side)

So the parse tree is a binary tree!

## CFG to CNF

### Step 1: Deal with nullable symbols.

First we have to find nullable symbols. Simple. Using a stack, go from all of , for each symbols X goes by, if there is a rule such that all of the symbols on the right hand side are nullable, then this symbol X is nullable.

To Deal with nullable symbols in the grammar:

- remove all nullable symbols
- if there is a rule that has only 1 nullable symbol on the right, then change this rule into
*original rule | rule without this symbol*(X => aYb, if Y is nullable then change it to X=> aYb | ab) - similarly, if there is a rule that has m nullable symbols, replace this rule into conbination of possible rules(exist or not exist)

### Step 2: Eliminate unit rules.

(A, C) is a unit pair if A=>B, B=>C. This kind of forms is not allowed in CNF.

To find any unit pairs, first we mark all nonterminal Symbol(A,A). If (A, B) is marked and A=>C, then we mark(C,B). Continue this procedure until no more pair could be marked.

To deal with unit rules in the grammar G:

- create a new empty grammar G'
- for each unit pair (A, B), if B=>a in G, then add A=>a to G'

### Step 3: Eliminate all symbols that aren't generating.

Using a stack, go from terminal symbols only to S, mark any symbol meets, then delete unmarked symbols.

### Step 4: Eliminate all symbols that aren't reachable.

Similarly, using a stack, go from S to terminal symbols, mark any symbol meets, then delete unmarked symbols.

The sequence of 3 and 4 matters, if we remove unreachable then ungenerating, we will have some rules that has some symbol X on the left hand side that start state cannot go to(two states that has no connections). Because we could go to this X before we move all ungenerating rules, so it is left.

### Step 5: Substitute terminal symbols.

For any grammar where **n>1**, if is a terminal symbol, then we substitue using and add a new rule .

### Step 6: Split long rules.

For any grammar where n>2, make a new set of rules:

Where are new nonterminal symbols.

So every context-free language has a CNF grammar that derives all of the language except {}.