# Computation – Sets & Languages

September 10, 2009 Leave a comment

## Sets & Languages

In order for a computer to organise a textual input, the computer has to be able to recognise certain strings, such as, in parsing a programming language, it has to be able to find ‘if’ or ‘while’.

The first part of Computation deals with how we describe to a computer what string to look for, how to come up with the right description to solve the problem, and how to do more advanced things such as check whether a piece of code is correctly formed, making sure that opening and closing brackets match, or checking that a webpage is written in valid HTML.

## Symbols, Words and Strings – Definitions

Input to a computer is given in symbols, such as letters and numbers. A collection of symbols is an alphabet. An alphabet is usually denoted by the symbol ∑.

Symbols can be combined into strings, which can be called ‘words’. We can have words consisting of any number of symbols, including zero. A word with zero symbols is called the empty string, and is referred to by ϵ.

A space can be a symbol, and we can use the symbol ı_ı to show it. (Though in these blog posts, I’ll just use _ since there’s no single character for ı_ı).

The length of a string can be shown by a modulus symbol. For example, |s| is the length of the string s. |ϵ| = 0, obviously, because it is the empty string.

## Languages

A language is a set of words that can be created with an alphabet. Some examples of languages are the empty language, {}, the language consisting of the empty word {ϵ}, or languages consisting of some number of strings, {ϵ,1,aba,abba,abbba,abbbba}. A language is referred to with the symbol **L**.

We can use set notation to define a language. For example, say we wanted the language of all words that could be made with the alphabet of the symbols ‘a’, ‘b’, and ‘c’. We can define this language as {s | s is a word over the alphabet {a,b,c}}. The | should be read as ‘where’, so this is ‘All languages consisting of the word ‘s’, where s is a word over the alphabet {a,b,c}’

## Concatenation

Two strings can be concatenated, that is, put together. So the strings Hel and lo could be concatenated into the single string Hello. To show this, we use a concatenation symbol as such: Hel · lo = Hello.

For multiple concatentation, we can use a power symbol. For example, h³ = hhh. h° = ϵ (ie. the string consisting of 0 numbers of h). We can also use n to denote an arbitrary number of concatenations. For example, a^{n} is the language {ϵ, a, aa, aaa, aaaa, aaaaa, …}.

Using set notation, we can write it as **L** = {a^{n} | n ∈ N}. This can be read as ‘The language **L** consists of all strings a^{n} where n is a whole number’. We could also exclude the empty string by saying ‘where n is a positive whole number’ with the symbol N^{+}.

We can also concatenate two languages as follows: **L**1 · **L**2 = {st | s ∈ **L**1 and t ∈ **L**2}. ie. ‘the set of all concatenated strings where s is a string from **L**1 and t is a string from **L**2′.