In this post, I will introduce Lex and show how to use it.
Compiler Structure
What is Lex?
Lex is a computer program that generates lexical analyzers (“scanners” or “lexers”). Lex is commonly used with the yacc parser generator. Lex reads an input stream specifying the lexical analyzer and outputs source code implementing the lexer in the C programming language.
How to write Lex?
The following is an example Lex file for the flex version of Lex. It recognizes strings of digits (unsigned integers) in the input and prints them out.
/* Definitions section */
%{
/* Everything between %{ and %} is copied verbatim
   to the top of the generated C program.
   Include header files and declare variables here. */
#include <stdio.h>
%}
POS_INTEGER ([+]?[0-9]+)
NEG_INTEGER ([-][0-9]+)
INTEGER ([-+]?[0-9]+)
/* Named definitions like the ones above are a more elegant way to write regular expressions. */
%%
 /* Rules section (required). The %% line above separates it from the definitions section. */
 /* The rules section is for writing regular
    expressions that recognize tokens. When a pattern
    matches, its action is executed:
    pattern { the things you want to do; }
    Comments in this section must be indented, otherwise
    flex tries to read them as patterns. */
[0-9]+ { printf("Saw an integer: %s\n", yytext); }
 /* {POS_INTEGER} { printf("Saw an integer: %s\n", yytext); }
    Writing the rule with a named definition, as above, gives the same
    result for plain digits and also accepts a leading "+". */
.|\n { /* Ignore and do nothing */ }
 /* "." is a wildcard that matches any character except the line feed \n */
%%
/* C code section. The %% line above separates it from the rules section. */
/* Everything in the C code section is copied to the
   bottom of the generated C program. */
int main(void)
{
    /* Call the lexer, then quit. */
    yylex();
    return 0;
}
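To get a feel for what the generated scanner does, here is a rough hand-written C equivalent of the single `[0-9]+` rule above. It is only a sketch: the function name `scan_integers` and its buffer-based interface are made up for this post, and the code that flex writes to lex.yy.c is table-driven and looks quite different.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of Lex): scans `input`, appends one
   "Saw an integer: ..." line to `out` for every run of digits, and
   returns how many integers were reported. All other characters are
   skipped, like the `.|\n` rule does. */
static int scan_integers(const char *input, char *out, size_t out_size)
{
    int count = 0;
    size_t used = 0;
    const char *p = input;

    if (out_size > 0)
        out[0] = '\0';

    while (*p != '\0') {
        if (isdigit((unsigned char)*p)) {
            const char *start = p;
            /* Consume digits for as long as they keep coming. */
            while (isdigit((unsigned char)*p))
                p++;
            int n = snprintf(out + used, out_size - used,
                             "Saw an integer: %.*s\n",
                             (int)(p - start), start);
            if (n < 0 || (size_t)n >= out_size - used) {
                /* Output buffer is full: drop the partial line and stop. */
                if (out_size > 0)
                    out[used] = '\0';
                break;
            }
            used += (size_t)n;
            count++;
        } else {
            p++;    /* ignore everything else */
        }
    }
    return count;
}
```

Calling `scan_integers("abc 123 x45", buf, sizeof buf)` fills `buf` with two lines, one for 123 and one for 45. The Lex file expresses the same behavior in two rules.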
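When more than one rule can match the input, the lexer resolves the conflict as follows:
- It always chooses the rule that matches the longest string.
- If several rules match the same length, it chooses the rule listed first in the file.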
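The two tie-breaking rules above can be sketched in plain C. This is a toy model, not how flex is implemented: the matcher functions `match_if` and `match_ident` and the selector `pick_rule` are hypothetical names standing in for a keyword rule `"if"` listed first and an identifier rule `[a-z]+` listed second.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Each toy "rule" reports how many characters it matches at the start
   of s (0 means no match). These stand in for Lex patterns. */
static size_t match_if(const char *s)       /* pattern: "if"   */
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static size_t match_ident(const char *s)    /* pattern: [a-z]+ */
{
    size_t n = 0;
    while (islower((unsigned char)s[n]))
        n++;
    return n;
}

/* Rules in file order: index 0 is the rule listed first. */
static size_t (*const rules[])(const char *) = { match_if, match_ident };

/* Returns the index of the winning rule, or -1 if nothing matches.
   The length of the winning match is stored in *len. */
static int pick_rule(const char *s, size_t *len)
{
    int best = -1;
    *len = 0;
    for (int i = 0; i < (int)(sizeof rules / sizeof rules[0]); i++) {
        size_t n = rules[i](s);
        /* Strictly greater: a later rule only wins if it matches more
           characters, so ties go to the rule listed first. */
        if (n > *len) {
            *len = n;
            best = i;
        }
    }
    return best;
}
```

With these two rules, the input `if` goes to rule 0, because both rules match two characters and the first one wins. The input `ifx` goes to rule 1, because the identifier pattern matches three characters and the keyword only two.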
Lex predefined variables, functions, and macros
Name | Description |
---|---|
char* yytext | Pointer to matched string. |
int yyleng | Length of matched string. |
int yylex(void) | Function call to invoke lexer and return token. |
int yywrap(void) | Return 1 if no more files to be read. |
int yymore(void) | Append the next matched string to the current yytext instead of replacing it. |
int yyless(int n) | Keep the first n characters of the match in yytext and return the rest to the input stream, where they are scanned again. |
FILE* yyin | Input stream pointer. |
FILE* yyout | Output stream pointer. |
ECHO | Write yytext to yyout. |
BEGIN | Switch to another start condition. |
REJECT | Discard the current match and try the next-best matching rule. |
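The effect of yyless is easier to see in a small model. The sketch below is plain C and does not use flex at all: `struct toy_scanner`, `toy_match`, and `toy_less` are hypothetical names invented for this post, with `text` and `leng` playing the roles of yytext and yyleng.

```c
#include <stddef.h>
#include <string.h>

/* Toy scanner state. The field names are made up for illustration and
   are not flex internals. */
struct toy_scanner {
    const char *input;  /* the whole input string             */
    size_t pos;         /* index of the next unread character */
    char text[64];      /* plays the role of yytext           */
    size_t leng;        /* plays the role of yyleng           */
};

/* Pretend a rule just matched the next len characters of the input.
   The caller must make sure that many characters are still unread. */
static void toy_match(struct toy_scanner *s, size_t len)
{
    if (len >= sizeof s->text)
        len = sizeof s->text - 1;
    memcpy(s->text, s->input + s->pos, len);
    s->text[len] = '\0';
    s->leng = len;
    s->pos += len;
}

/* Like yyless(n): keep the first n characters of the match and hand
   the rest back to the input so they are scanned again. */
static void toy_less(struct toy_scanner *s, size_t n)
{
    if (n >= s->leng)
        return;                 /* nothing to give back */
    s->pos -= s->leng - n;      /* rewind the read position */
    s->text[n] = '\0';
    s->leng = n;
}
```

If the scanner has just matched `foobar` and the action calls `toy_less(&s, 3)`, `text` becomes `foo` and the next match starts again at `bar`.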
Condition
// TODO
How to compile a Lex file
flex scanner.l
gcc -o scanner lex.yy.c -lfl
./scanner < <input_C_file>