Why the Set of Languages over Any Non Empty Alphabet Is Uncountable

Published on 2019-12-27

Warning: this is not very rigorous, I just want to provide an intuitive explanation.

I have seen a lot of different proofs on why the set of languages over an alphabet \Sigma is uncountable, however the one I'm about to show you now is the simplest in my opinion. It involves using Cantor's diagonalisation argument and forming a couple of bijections.

Firstly, we can see that if we prove that the set of languages over the alphabet \Sigma = \{0\} denoted by \mathcal{L} is uncountable, then the set of languages over any non-empty alphabet is uncountable, as we can just take a one element subset of the alphabet, form a bijection to \{0\}, and as the set of languages with that subset is uncountable, and this set is a subset of all languages possible from the alphabet, then the set of languages for the whole alphabet must be uncountable. Thus it is sufficient to prove that the set of languages over the alphabet \Sigma = \{0\} is uncountable.

Ok, first of all, let us observe the set S of all possible finite-length strings formed from the alphabet \Sigma = \{0\}. S will just be strings of a finite number of 0s concatenated i.e.,

S = \{0, 00, 000, 0000, ... \}

We can create an injective function
S \rightarrow \mathbb{N}

s \mapsto |s|
where |s| is the length of the string s. Clearly, this is injective, and thus the set S is countable.

As S is countable, we can assign a unique natural number to each element of S, i.e., we can say that S is of the form

S = \{s_1, s_2, s_3, ...\}

Let L \in \mathcal{L}. Then, \forall s_i \in S, either s_i \in L or s_i \notin L. Thus, we can represent L as so:


 \begin{tabular}{||c c c c c c||} 
 \hline
 $S:$ & $s_1$ & $s_2$ & $s_3$ & $s_4$ & ... \\ [0.5ex] 
 \hline\hline
 $L:$ & 0 & 1 & 1 & 0 & ...\\ 
 \hline
\end{tabular}

where a 0 means that s_i \notin L and a 1 means that s_i \in L for the s_i in that column.

Now, we want to prove that \mathcal{L} is uncountable. We will do this using a proof by contradiction. Let us assume that \mathcal{L} is countable. This means that there must be some enumeration of elements in \mathcal{L} where we assign a unique natural number to each element of \mathcal{L}. Let us take one such enumeration.
For this enumeration, we can create a similar representation in terms of 0s and 1s for all the L \in \mathcal{L}:



 \begin{tabular}{||c c c c c c||} 
 \hline
 $S:$ & $s_1$ & $s_2$ & $s_3$ & $s_4$ & ... \\ [0.5ex] 
 \hline\hline
 $L_1:$ & 0 & 1 & 1 & 0 & ...\\ 
 \hline
 $L_2:$ & 1 & 0 & 0 & 1 & ...\\ 
 \hline
 $L_3:$ & 1 & 1 & 0 & 0 & ...\\ 
 \hline
 $L_4:$ & 0 & 0 & 0 & 1 & ...\\ 
 \hline
\end{tabular}

and so on. If we can prove that this enumeration does not contain every language over \Sigma, then we know that this enumeration is not valid as it is not surjective. Now, let us create a language L', where the column containing s_i is 0 if it is 1 for L_i, and 1 if it is 0 for L_i. That is, we take the complement of every element in the diagonal of the above table:


 \begin{tabular}{||c c c c c c||}
 \hline
 $S:$ & $s_1$ & $s_2$ & $s_3$ & $s_4$ & ... \\ [0.5ex] 
 \hline\hline
 $L_1:$ & {\bfseries 0} & 1 & 1 & 0 & ...\\ 
 \hline
 $L_2:$ & 1 & {\bfseries 0} & 0 & 1 & ...\\ 
 \hline
 $L_3:$ & 1 & 1 & {\bfseries 0} & 0 & ...\\ 
 \hline
 $L_4:$ & 0 & 0 & 0 & {\bfseries 1} & ...\\ 
 \hline
\end{tabular}

So in this case, L' includes s_1, s_2, s_3 but doesn't include s_4 as that is what the complement of every element in the above diagonal represents. We can see that the i^{th} element in L' differs with the i^{th} element in L_i, and thus L' is different from every other L_i. Thus this enumeration does not cover every element in \mathcal{L}. As the above argument can be applied to any enumeration, no such enumeration can exist. Thus we cannot form a bijection to \mathbb{N}, and so \mathcal{L} is uncountable.

References:

Email a comment to the public inbox!


Creative Commons License
This work by Chinmaya Krishnan Mahesh is licensed under a Creative Commons Attribution 4.0 International License. Source Code.