Thursday, November 13, 2008

Lexical analysis

Lexical analysis is the first step of compiling a program. It involves converting a sequence of characters into a sequence of tokens. Programs that perform lexical analysis, known as lexers or lexical analysers, are used for such conversions.

Tokens created by lexical analysis have two parts: one is the block of text corresponding to the token (known as the lexeme) and the other is the token type. So, to give an example, a sequence like "a=1+2" after tokenisation will look like
Lexeme  Token
a       Variable
=       EqualsOperator
1       NumericConstant
+       AdditionOperator
2       NumericConstant
Now that we understand the basic concept of lexical analysis, we might be tempted to write a lexer :) . Writing lexers by hand, though possible, can be a waste of time. There are programs known as "lexer generators" -- yes, you guessed it right, these are programs that generate lexer programs.

A lexer generator takes an input file with details about the required characteristics of a lexer and generates the lexer. One lexer generator that seems to be popular is "flex". The input file for flex consists of 3 parts: the definitions part, the rules part and the user code section.

Each of the parts is separated by a line containing "%%". So a simple rules section in flex would look like

%%
sayhello    printf("Hello");

This just makes "sayhello" a pattern whose action is printf("Hello");. Flex can do much, much more than this. There are features to optimise memory usage to suit the lexer you design.
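A more complete flex input file showing all three sections might look like the sketch below. The file name hello.l and the rules are my own illustrative choices; %option noyywrap just tells flex to stop at end of input without needing a yywrap function.

```
%{
/* definitions section: C code copied verbatim into the generated lexer */
#include <stdio.h>
%}
%option noyywrap

%%
sayhello    printf("Hello");            /* literal pattern */
[0-9]+      printf("NUMBER(%s)", yytext); /* regex pattern; yytext holds the lexeme */
.|\n        ;                           /* ignore everything else */
%%

/* user code section */
int main(void) {
    yylex();    /* start scanning standard input */
    return 0;
}
```

You would build and run it with something like "flex hello.l && cc lex.yy.c -o hello"; flex writes the generated lexer to lex.yy.c by default.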
