Lexical analyzer in C++

To program in the C++ programming language
To understand the lexical analysis phase of program
compilation
Assignment: The first phase of compilation is called scanning or lexical
analysis. This phase interprets the input program as a sequence of
characters and produces a sequence of tokens, which will be used by
the parser. Write a C++ program that implements a simple scanner for a
source file given as a command-line argument. The format of the
tokens is described below. You may assume that the input is
syntactically correct.

Your program should build a symbol table (a hash table is a good
choice) that contains an entry for each token found in the input.
When all the input has been read, your program should produce a
summary report that lists every token that appeared in the input, the
number of times each token appears, and the class of each token.
Your program should also list how many times tokens of each class
appeared in the input.

The token format:
keyword -> if | then | else | begin | end
identifier -> character | character identifier
integer -> digit | digit integer
real -> integer.integer
special -> ( | ) | [ | ] | + | - | = | , | ;
digit -> 0|1|2|3|4|5|6|7|8|9
character -> a|b|c … |z|A|B|C … |Z
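
The grammar above can be turned into a small classification routine. The following is a minimal sketch, not part of the assignment or of the posted answer: the names TokenClass, classify, allOf, isDigitChar and isLetterChar are hypothetical, and the Unknown class is an added assumption for lexemes that match no rule. Keywords are compared in lower case because case does not distinguish them.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Token classes from the grammar, plus Unknown (an assumption) for anything else.
enum class TokenClass { Keyword, Identifier, Integer, Real, Special, Unknown };

static bool isDigitChar(char c)  { return c >= '0' && c <= '9'; }
static bool isLetterChar(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }

// True if s is non-empty and every character satisfies pred.
static bool allOf(const std::string& s, bool (*pred)(char)) {
    return !s.empty() && std::all_of(s.begin(), s.end(), pred);
}

// Classifies one lexeme according to the token format.
TokenClass classify(const std::string& lexeme) {
    static const std::string keywords[] = {"if", "then", "else", "begin", "end"};
    static const std::string specials = "()[]+-=,;";

    // special -> exactly one of the nine listed characters
    if (lexeme.size() == 1 && specials.find(lexeme[0]) != std::string::npos)
        return TokenClass::Special;

    // integer -> one or more digits
    if (allOf(lexeme, isDigitChar))
        return TokenClass::Integer;

    // real -> integer.integer (digits on both sides of a single dot)
    const auto dot = lexeme.find('.');
    if (dot != std::string::npos &&
        allOf(lexeme.substr(0, dot), isDigitChar) &&
        allOf(lexeme.substr(dot + 1), isDigitChar))
        return TokenClass::Real;

    // keyword or identifier -> letters only; keywords match regardless of case
    if (allOf(lexeme, isLetterChar)) {
        std::string lower = lexeme;
        std::transform(lower.begin(), lower.end(), lower.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        for (const std::string& kw : keywords)
            if (lower == kw) return TokenClass::Keyword;
        return TokenClass::Identifier;
    }
    return TokenClass::Unknown;
}
```

Checking keywords before identifiers matters here, since every keyword also satisfies the identifier rule.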
More details:
Case is not used to distinguish keywords or identifiers.
The delimiters are space, tab, newline, and the special
characters.
The token classes that should be recognized are keyword,
identifier, integer, real and special.
Your final program must compile and run using a C++ compiler
(your choice).
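
The delimiter rule can be illustrated with a short splitting routine. This is a sketch under stated assumptions rather than a reference solution: splitLexemes is a hypothetical name, whitespace is dropped, each special character is emitted as its own lexeme, and '.' is kept inside a lexeme so that a real such as 34.4545 stays in one piece.

```cpp
#include <string>
#include <vector>

// Splits text into lexemes. Whitespace separates lexemes and is discarded;
// a special character ends the current lexeme and becomes a lexeme itself.
std::vector<std::string> splitLexemes(const std::string& text) {
    const std::string whitespace = " \t\n";
    const std::string specials = "()[]+-=,;";
    std::vector<std::string> lexemes;
    std::string current;

    // Pushes the lexeme collected so far, if any.
    auto flush = [&]() {
        if (!current.empty()) {
            lexemes.push_back(current);
            current.clear();
        }
    };

    for (char c : text) {
        if (whitespace.find(c) != std::string::npos) {
            flush();
        } else if (specials.find(c) != std::string::npos) {
            flush();
            lexemes.push_back(std::string(1, c));
        } else {
            current += c;  // letters, digits and '.' accumulate into one lexeme
        }
    }
    flush();  // the input may end without a trailing delimiter
    return lexemes;
}
```

Calling splitLexemes on a whole file's contents yields the lexemes in order, ready to be classified and counted.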

The code with indentation is at
http://pastebin.com/0A0JU7DZ
A commented version is at http://pastebin.com/jSrW1g5M
Input file samp.txt:
===============================
if sample sampeui
then else
34 + 34.4545+ 44=sample
(this)(begin)[end]
===============================

#include <iostream>
#include <fstream>
#include <string>
#include <cstring>
using namespace std;
#define KW_COUNT 5 //count of keywords
#define SC_COUNT 9 //count of special characters
#define MAX_SYMBOLS 100 //Maximum number of symbols the program can handle. Increase if needed
#define MAX_TOK_SIZE 256 //Max token size
class LexParser{
private:
static const string keywords[]; //list of keywords
static const char spChar[]; //list of special characters
enum tokenClass { KEYWORD, IDENTIFIER, INTEGER, REAL, SPECIAL};
struct symbol{
string token; //token
int count; //token count
tokenClass tc; //token class
};
symbol symbolTable[MAX_SYMBOLS]; //list of all symbols
int symbolCount;
char nextToken; //Holds the first character of the next token, not the entire token
string error;
public:
LexParser(){
symbolCount = 0;
error = "";
nextToken = 0;
}
bool parse(char* fn){
ifstream fin;
string s;
fin.open(fn);
if( !fin.is_open() ){ //a failed open sets failbit, which bad() does not report
error = "Cannot open the file";
fin.close();
return false;
}
nextToken = fin.get();
//Parse through entire file
while( !fin.eof() ){
//Read the next token and according to the type
readToken( fin, s );
if( isKeyword(s) )
addSymbol(s, KEYWORD);
else if ( isInteger(s) ){
if ( nextToken == '.' ){
//if "." is found after an integer, this is a real number, so parse the next token and
//add it as a real number
string s2;
nextToken = fin.get();
readToken(fin, s2);
s = s + "." + s2;
addSymbol( s, REAL );
} else
addSymbol(s, INTEGER);
}
else if ( isSpecialChar(s) )
addSymbol( s, SPECIAL );
else if( isIdentifier(s) )
addSymbol(s, IDENTIFIER);
else{
error = "Error occurred. Unknown token \"" + s + "\"";
return false;
}
}
fin.close();
return true;
}
string getError(void){
return error;
}
//This function reads the next token. Tokens are separated by delimiters and special
//symbols. The function identifies those delimiters and special characters and returns the token
void readToken(ifstream& fin, string &s){
char scanstring[] = "\n\t ()[]+-=,;.";
char spacestring[] = "\n\t ";
char specialstring[] = "()[]+-=,;";
char token[MAX_TOK_SIZE];
int i=0;
//Special characters are handled separately
if( strchr(specialstring, nextToken) ){
token[i++] = nextToken;
nextToken = fin.get();
while( strchr(spacestring, nextToken) && fin.good() ){
nextToken = fin.get();
}
token[i] = '\0';
s = string(token);
return;
}
//This loop will handle all tokens except special characters
while ( fin.good() ){
if( strchr(scanstring, nextToken) ){
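while( strchr(spacestring, nextToken) && fin.good() ){
nextToken = fin.get();
}
token[i] = '\0';
s = string(token);
return;
}
else
token[i++] = nextToken;
nextToken = fin.get();
}
//End of file reached while reading a token: return what was collected so far
token[i] = '\0';
s = string(token);
}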
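bool isKeyword(string& s){
//Assumed body (original loop not recoverable): compare s against each entry of keywords[]
for(int i=0; i<KW_COUNT; i++)
if( s == keywords[i] )
return true;
return false;
}
bool isSpecialChar(string& s){
//Assumed body (original not recoverable): a one-character token found in spChar[]
for(int i=0; i<SC_COUNT; i++)
if( s.length() == 1 && s.at(0) == spChar[i] )
return true;
return false;
}
bool isIdentifier(string& s){
for(int i=0; i<(int)s.length(); i++)
if( !( (s.at(i) >='a' && s.at(i) <='z') || (s.at(i) >='A' && s.at(i) <='Z') ) )
return false;
return true;
}
bool isInteger(string& s){
for(int i=0; i<(int)s.length(); i++)
if( s.at(i) <'0' || s.at(i) >'9' )
return false;
return true;
}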
/*
* This function adds symbols to the symbolTable. If a symbol
* already exists, it increases the symbol count instead of adding it again.
*/
void addSymbol(string& tok, tokenClass tc){
for( int i=0; i
