I have written a simple tokenizer that splits a command line into separate words and prints each word on its own line. I am trying to:
Make the program close if the first word of a command line is "quit".
Recognize instructions such as "Pickup", "Save", and "Go", after which the program should look at the next token.
My idea has been to use a simple switch with cases to check for these commands, but I cannot figure out where to place it.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
char command[256];
int commandIndex;
char token[32];
int isWhiteSpace(char character) {
if (character == ' ') {
return 1;
}
else if(character == '\t') {
return 1;
}
else if(character < ' ') {
return 1;
}
else {
return 0;
}
}
char* getToken() {
int index = 0;
// Skip white spaces
while(commandIndex<256 && isWhiteSpace(command[commandIndex])) {
commandIndex ++;
}
// If at end of line, return an empty token
if(commandIndex>=256) {
token[0] = 0;
return token;
}
// Capture token
while(commandIndex<256 && !isWhiteSpace(command[commandIndex])) {
token[index] = command[commandIndex];
index++;
commandIndex ++;
}
token[index] = 0;
return token;
}
void main() {
printf("Zeta - Version 2.0\n");
while(1) {
printf("Command: ");
gets_s(command);
commandIndex = 0;
char* token = getToken();
while (strcmp(token,"") != 0) {
printf("%s\n", token);
token = getToken();
}
}
}
A little reorganization of the loop you have in main will do it.
int main() {
printf("Zeta - Version 2.0\n");
bool done = false;
while (!done) {
printf("Command: ");
gets_s(command);
commandIndex = 0;
char* token = getToken();
if (strcmp(token, "quit") == 0) {
done = true;
} else if (strcmp(token, "pickup") == 0) {
doPickup();
} else if (strcmp(token, "save") == 0) {
char * filename = getToken();
doSave(filename);
} ...
}
return 0;
}
You can't use a switch statement with strings, so you just use a bunch of if ... else if ... statements to check for each command. There are other approaches, but this one required the fewest changes from the code you already have.
In the example, under the handling for "save" I showed how you can just call getToken again to get the next token on the same command line.
(Note that I also fixed the return value for main. Some compilers will let you use void, but that's not standard so it's best if you don't do that.)
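As for the other approaches: a common alternative to a long if/else chain is a lookup table that maps each command name to a handler. The sketch below is C++ (std::map and std::function do not exist in plain C; there, the equivalent would be an array of name/function-pointer structs searched with strcmp). The names toLower, Handler, commands and dispatch are hypothetical, and the handlers only build a description string instead of doing real work. It also lower-cases the first word, since the question mentions "Pickup" and "Save" while the comparisons above use lowercase.

```cpp
#include <algorithm>
#include <cctype>
#include <functional>
#include <map>
#include <sstream>
#include <string>

// Lower-cases a word so "Pickup", "PICKUP" and "pickup" all match.
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// A handler gets the rest of the line, so it can read further tokens itself
// (the same idea as calling getToken() again for "save").
using Handler = std::function<std::string(std::istringstream&)>;

// Hypothetical command table: the handlers only describe what they would do.
const std::map<std::string, Handler> commands = {
    {"pickup", [](std::istringstream& rest) {
         std::string item;
         rest >> item;
         return "pick up " + item;
     }},
    {"save", [](std::istringstream& rest) {
         std::string filename;
         rest >> filename;
         return "save to " + filename;
     }},
};

// Handles one command line. Returns false when the first word is "quit".
bool dispatch(const std::string& line, std::string& result) {
    std::istringstream words(line);
    std::string first;
    result.clear();
    if (!(words >> first)) return true;  // blank line: nothing to do
    first = toLower(first);
    if (first == "quit") return false;
    auto it = commands.find(first);
    if (it == commands.end()) {
        result = "unknown command: " + first;
    } else {
        result = it->second(words);
    }
    return true;
}
```

Adding a command then means adding one entry to the table rather than another else-if branch; the main loop simply keeps calling dispatch() until it returns false.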
My program exits after every single command, and I am unable to find a logical reason why. I have checked all my loops and if statements for exit conditions but was not able to locate any.
The program includes many classes and functions, but here is main:
int main()
{
int local_location = 0;
vector<string>Inventory = { "", "", "" };
unordered_set<string> excl = { "in", "on", "the", "with" };
string word;
array<string, 2> command;
size_t n = 0;
command.at(1) = "";
command.at(0) = "";
while (n < command.size() && cin >> word) {
auto search = excl.find(word);
if (search == excl.end())
command.at(n++) = word;
}
if (command.at(0) == "look") {
look(command.at(1), local_location, Inventory);
}
else if (command.at(0) == "get") {
look(command.at(1), local_location, Inventory);
}
else if (command.at(0) == "drop") {
look(command.at(1), local_location, Inventory);
}
else if (command.at(0) == "bag") {
bag(Inventory);
}
else if (command.at(0) == "go") {
look(command.at(1), local_location, Inventory);
}
}
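Your while loop stops as soon as two words have been read (n reaches command.size()), so main processes exactly one command and then returns, which ends the program. Loop over standard input instead, and reset n after processing each command: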
while(cin>>word)
{
auto search = excl.find(word);
if (search == excl.end())
command.at(n++) = word;
if (n== command.size())
{
// process the command
// reset n=0
}
}
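One caveat with the skeleton above: a one-word command such as "bag" never brings n up to command.size(), so the program keeps waiting for a second word. Reading a whole line at a time avoids that. A minimal sketch, assuming each command is typed on one line; parseCommand is a hypothetical helper name:

```cpp
#include <array>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_set>

// Splits one input line into a verb and an optional object, skipping filler
// words. A one-word command such as "bag" is complete at the end of the line.
std::array<std::string, 2> parseCommand(const std::string& line) {
    static const std::unordered_set<std::string> excl = {"in", "on", "the", "with"};
    std::array<std::string, 2> command;  // both elements start out empty
    std::istringstream words(line);
    std::string word;
    std::size_t n = 0;
    while (n < command.size() && words >> word) {
        if (excl.find(word) == excl.end())
            command.at(n++) = word;
    }
    return command;
}
```

In main this would sit inside `while (std::getline(std::cin, line)) { auto command = parseCommand(line); ... }`, with the existing if/else chain on command.at(0) and command.at(1) inside the loop; n is reset automatically because it is local to each call.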
I admittedly am an extreme C++ novice, so please forgive me for my probably very naive question.
I am writing code that should parse an assembly language file into its fundamental parts, to be translated into machine language at a second stage.
I have built a parser class, but I have not managed to open the external .asm text file and feed it to the various functions that make up the class.
More specifically, the problems are with the constructor.
I attach the full code I wrote below:
// parses .asm assembly files
#include <iostream>
#include <fstream>
#include <varargs.h>
#include <string>
using namespace std;
class parser
{
private:
istream inputfile;
char inputname[30];
string line;
bool endfile;
bool a_command, l_command, c_command;
string parsedLine, destParsedLine, compParsedLine, jumpParsedLine;
public:
// default parser constructor
parser()
{
}
//parser(char* argv[])
//{
// reader(argv[]);
//}
// opens input file
string reader(char* argv[])
{
strcpy(inputname,argv[1]);
strcat(inputname,".asm");
// opens input .asm file
ifstream inputfile(inputname);
// reads first line
getline(inputfile,line);
if (line[0] == '/' || line.empty())
inputfile.ignore(line.length(),'\n');
return line;
}
// checks if at end file
bool hasMoreCommands()
{
a_command = false;
l_command = false;
c_command = false;
endfile = false;
if (inputfile.eof())
endfile = true;
return endfile;
}
// advances read of inputfile
void advance()
{
if (line[0] == '/' || line.length() == 0)
inputfile.ignore(line.length(),'\n');
getline(inputfile,line);
}
/* function for labelling the type of command (address,computation,label) */
bool commandType()
{
if (line[0] == '#')
a_command = true;
else if (line[0] == '(')
l_command = true;
else
c_command = true;
return a_command, l_command, c_command;
}
// function to select parsing function
string selector()
{
if (a_command || l_command)
symbol();
else if (c_command)
{
dest();
comp();
jump();
string parsedLine = destParsedLine + compParsedLine + jumpParsedLine;
}
return parsedLine;
}
// function returning address or label symbol
string symbol()
{
if (a_command)
string parsedLine = line.substr(1);
else if (l_command)
string parsedLine = line.substr(1,line.length()-1);
return parsedLine;
}
// functions returning computation destination
string dest()
{
size_t equal = line.find('='); //no '=' found = returns 'npos'
string destParsedLine = line.substr(0,equal);
return destParsedLine;
}
string comp()
{
size_t equal = line.find('=');
size_t semicolon = line.find(';');
string compParsedLine = line.substr(equal,semicolon);
return compParsedLine;
}
string jump()
{
size_t semicolon = line.find(';');
string jumpParsedLine = line.substr(semicolon);
return jumpParsedLine;
}
};
// main program
int main (int argc, char *argv[])
{
bool endfile = false;
string parsedLine;
int count = 0;
if ((argc != 2) || (strchr(argv[1],'.') != NULL))
{
cout << argv[0] << ": assembly .asm file argument should be supplied, without .asm extension\n";
return 1;
}
parser attempt1 = parser();
attempt1.reader(argv[]);
while (!endfile)
{
attempt1.hasMoreCommands();
if (endfile)
return 0;
if (count > 0)
attempt1.advance();
attempt1.commandType();
attempt1.selector();
cout << parsedLine << endl; //debugging purposes
count++;
}
}
I provide the name of the .asm text file to open on the command line (the .asm file is located in the same folder as this .cpp file).
Hence I need to use varargs.h, which I suppose may be part of the problem.
When I try to build this, Visual Studio 2008 gives me the following two errors:
1 error C2512: 'std::basic_istream<_Elem,_Traits>' : no appropriate default constructor available line 21
2 error C2059: syntax error : ']' line 137
Help appreciated, and insults tolerated, thanks :)
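Your class uses std::istream for the inputfile member, but does not initialize it. That will not work: std::istream has no default constructor (it has to be constructed from a stream buffer), which is exactly what error C2512 is reporting.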
In this situation, your class would need to use std::ifstream instead for its inputfile member, and then call its open() method before trying to read from it.
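Also, your reader() method is ignoring the inputfile member, instead creating a local variable of the same name to read from. You need to get rid of that local variable, and instead call open() on your class member.
The second error (C2059) comes from attempt1.reader(argv[]); in main: argv[] is not a valid expression there. Pass the array itself, as in attempt1.reader(argv);.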
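A minimal sketch of what that looks like, trimmed to the file handling (the parsing functions are left out). It takes the base name as a std::string rather than argv, and advance() returning bool and the current() accessor are assumptions of this sketch, not part of the original class: looping on advance()'s return value replaces the eof() check, which only becomes true after a read has already failed.

```cpp
#include <fstream>
#include <string>

// The stream is a std::ifstream *member* (default-constructible, unlike
// std::istream) that reader() opens; no local variable shadows it.
class parser {
private:
    std::ifstream inputfile;
    std::string line;

public:
    // Opens "<name>.asm". Returns false if the file could not be opened.
    bool reader(const std::string& name) {
        inputfile.open(name + ".asm");
        return inputfile.is_open();
    }

    // Reads the next line that is neither empty nor a comment (a line
    // starting with '/'). Returns false once the end of the file is reached.
    bool advance() {
        while (std::getline(inputfile, line)) {
            if (!line.empty() && line[0] != '/')
                return true;
        }
        return false;
    }

    const std::string& current() const { return line; }
};
```

In main this would be used as `parser p; if (!p.reader(argv[1])) return 1; while (p.advance()) cout << p.current() << '\n';`.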
Following @Remy Lebeau's suggestions, the modified code below at least compiles (although it still does not do what it is supposed to do):
// parses .asm assembly files
#include <iostream>
#include <fstream>
#include <varargs.h>
#include <string>
using namespace std;
class parser
{
private:
ifstream inputfile; // ifstream, as suggested: default-constructible and has open()
char inputname[30];
string line;
bool endfile;
bool a_command, l_command, c_command;
string parsedLine, destParsedLine, compParsedLine, jumpParsedLine;
public:
// default parser constructor
parser()
{
}
// ignores inputfile line if comment or empty
void ignoreline()
{
if (line[0] == '/' || line.empty())
inputfile.ignore(line.length(),'\n');
}
// composes inputfile name and opens input file
void reader(char* argv[])
{
strcpy(inputname,argv[1]);
strcat(inputname,".asm");
// opens input .asm file
inputfile.open(inputname, fstream::in);
// reads first line
getline(inputfile,line);
ignoreline();
}
// checks if at end file
bool hasMoreCommands()
{
a_command = false;
l_command = false;
c_command = false;
endfile = false;
if (inputfile.eof())
endfile = true;
return endfile;
}
// advances read of inputfile
void advance()
{
ignoreline();
getline(inputfile,line);
}
/* function for labelling the type of command (address,computation,label) */
bool commandType()
{
if (line[0] == '#')
a_command = true;
else if (line[0] == '(')
l_command = true;
else
c_command = true;
return a_command, l_command, c_command;
}
// function to select parsing function
string selector()
{
if (a_command || l_command)
symbol();
else if (c_command)
{
dest();
comp();
jump();
string parsedLine = destParsedLine + compParsedLine + jumpParsedLine;
}
return parsedLine;
}
// function returning address or label symbol
string symbol()
{
if (a_command)
string parsedLine = line.substr(1);
else if (l_command)
string parsedLine = line.substr(1,line.length()-1);
return parsedLine;
}
// functions returning computation destination
string dest()
{
size_t equal = line.find('='); //no '=' found = returns 'npos'
string destParsedLine = line.substr(0,equal);
return destParsedLine;
}
string comp()
{
size_t equal = line.find('=');
size_t semicolon = line.find(';');
string compParsedLine = line.substr(equal,semicolon);
return compParsedLine;
}
string jump()
{
size_t semicolon = line.find(';');
string jumpParsedLine = line.substr(semicolon);
return jumpParsedLine;
}
};
// main program
int main (int argc, char *argv[])
{
bool endfile = false;
string parsedLine;
int count = 0;
if ((argc != 2) || (strchr(argv[1],'.') != NULL))
{
cout << argv[0] << ": assembly .asm file argument should be supplied, without .asm extension\n";
return 1;
}
parser attempt1 = parser();
attempt1.reader(argv);
while (!endfile)
{
attempt1.hasMoreCommands();
if (endfile)
return 0;
if (count > 0)
attempt1.advance();
attempt1.commandType();
attempt1.selector();
cout << parsedLine << endl;
count++;
}
return 0;
}
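Some reasons it still does not work, going by the code as posted: main's own endfile is never updated (the value returned by hasMoreCommands() is discarded), and main prints its own parsedLine, which is never assigned; selector(), symbol(), dest(), comp() and jump() declare local string variables that shadow the members of the same name, so the members are never set; `return a_command, l_command, c_command;` uses the comma operator and returns only c_command; and std::string::substr(pos, count) takes a length, not an end position, as its second argument. The sketch below addresses only the last point, assuming the dest=comp;jump layout that dest(), comp() and jump() suggest, with "dest=" and ";jump" both optional. CFields and splitCInstruction are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// The three fields of a "dest=comp;jump" line (dest and jump may be empty).
struct CFields {
    std::string dest, comp, jump;
};

// Note: substr(pos, count) takes a *length* as its second argument,
// so comp's length is computed as end - start.
CFields splitCInstruction(const std::string& line) {
    CFields f;
    const std::size_t eq = line.find('=');
    const std::size_t semi = line.find(';');
    const std::size_t compStart = (eq == std::string::npos) ? 0 : eq + 1;
    const std::size_t compEnd = (semi == std::string::npos) ? line.size() : semi;
    if (eq != std::string::npos) f.dest = line.substr(0, eq);
    f.comp = line.substr(compStart, compEnd - compStart);
    if (semi != std::string::npos) f.jump = line.substr(semi + 1);
    return f;
}
```

Returning the fields instead of writing to members also sidesteps the shadowing problem, since there is nothing for a local variable to hide.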
I'm trying to create a lexer for a functional language. One of its methods should return the next token of a file each time it is called.
For example:
func main() {
var MyVar : integer = 3+2;
}
Every time the next method is called, it should return the next token in that sequence; for this example, the output should look like this:
func
main
(
)
{
var
MyVar
:
integer
=
3
+
2
;
}
However, the result I get is not what I expected:
func
main(
)
{
var
MyVar
:
integer
=
3+
2
}
Here is my method:
token_t Lexer::next() {
token_t ret;
std::string token_tmp;
bool IsSimpleQuote = false; // check char --> '...'
bool IsDoubleQuote = false; // check string --> "..."
bool IsComment = false; // check comments --> `...`
bool IterWhile = true;
while (IterWhile) {
bool IsInStc = (IsDoubleQuote || IsSimpleQuote || IsComment);
std::ifstream file_tmp(this->CurrentFilename);
if (this->eof) break;
char chr = this->File.get();
char next = file_tmp.seekg(this->CurrentCharIndex + 1).get();
++this->CurrentCharInCurrentLineIndex;
++this->CurrentCharIndex;
{
if (!IsInStc && !IsComment && chr == '`') IsComment = true; else if (!IsInStc && IsComment && chr == '`') { IsComment = false; continue; }
if (IsComment) continue;
if (!IsInStc && chr == '"') IsDoubleQuote = true;
else if (!IsInStc && chr == '\'') IsSimpleQuote = true;
else if (IsDoubleQuote && chr == '"') IsDoubleQuote = false;
else if (IsSimpleQuote && chr == '\'') IsSimpleQuote = false;
}
if (chr == '\n') {
++this->CurrentLineIndex;
this->CurrentCharInCurrentLineIndex = -1;
}
token_tmp += chr;
if (!IsInStc && IsLangDelim(chr)) IterWhile = false;
}
if (token_tmp.size() > 1 && System::Text::EndsWith(token_tmp, ";") || System::Text::EndsWith(token_tmp, " ")) token_tmp.pop_back();
++this->NbrOfTokens;
location_t pos;
pos.char_pos = this->CurrentCharInCurrentLineIndex;
pos.filename = this->CurrentFilename;
pos.line = this->CurrentLineIndex;
SetToken_t(&ret, token_tmp, TokenList::ToToken(token_tmp), pos);
return ret;
}
Here is the function IsLangDelim:
bool IsLangDelim(char chr) {
return (chr == ' ' || chr == '\t' || TokenList::IsSymbol(CharToString(chr)));
}
TokenList is a namespace that contains the list of tokens, as well as some functions (like IsSymbol in this case).
I have already tried other versions of this method, but the result is almost always the same.
Do you have any idea how to improve this method?
One solution to your problem is to use std::regex. The syntax is a little difficult at first, but once you understand it, you will find yourself using it again and again.
It is well suited to finding tokens.
The specific criteria can be expressed in the regex string.
For your case, I would use: std::regex re(R"#((\w+|\d+|[;:\(\)\{\}\+\-\*\/\%\=]))#");
This means:
\w+ : one or more word characters (letters, digits or underscore), i.e. a word
\d+ : one or more digits, i.e. an integer number (strictly redundant, since \w already matches digits)
Or any single meaningful operator or punctuation character (like '+', '-', '{' and so on)
You can extend the regex for anything else you are searching for. You can also apply another regex to a regex result.
Please see the example below. It creates the output you showed from the input you provided.
With this approach, your whole task takes only one statement in main.
#include <iostream>
#include <string>
#include <algorithm>
#include <regex>
#include <iterator> // std::ostream_iterator
// Our test data (raw string).
std::string testData(
R"#(func main() {
var MyVar : integer = 3+2;
}
)#");
std::regex re(R"#((\w+|\d+|[;:\(\)\{\}\+\-\*\/\%\=]))#");
int main(void)
{
std::copy(
std::sregex_token_iterator(testData.begin(), testData.end(), re, 1),
std::sregex_token_iterator(),
std::ostream_iterator<std::string>(std::cout, "\n")
);
return 0;
}
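If you want to keep the one-token-per-call interface of Lexer::next(), the same kind of regex can be wrapped around std::sregex_iterator. A sketch, assuming the whole file has already been read into a string; RegexLexer is a hypothetical name, \d+ is dropped because \w+ already covers digits, and characters the pattern does not match (quotes, backticks and so on) are silently skipped, so strings and comments would need extra alternatives:

```cpp
#include <regex>
#include <string>
#include <utility>

// Each call to next() yields one token, like the Lexer::next() in the question.
class RegexLexer {
public:
    explicit RegexLexer(std::string source)
        : source_(std::move(source)),
          it_(source_.cbegin(), source_.cend(), pattern()) {}

    // The iterator points into source_, so copying would leave it dangling.
    RegexLexer(const RegexLexer&) = delete;
    RegexLexer& operator=(const RegexLexer&) = delete;

    // Stores the next token in `token`; returns false when none are left.
    bool next(std::string& token) {
        if (it_ == std::sregex_iterator()) return false;
        token = it_->str();
        ++it_;
        return true;
    }

private:
    static const std::regex& pattern() {
        static const std::regex re(R"(\w+|[;:(){}+\-*/%=])");
        return re;
    }

    std::string source_;  // declared before it_, so it is initialized first
    std::sregex_iterator it_;
};
```

Each next() call only advances the iterator, so the file is not reopened or re-read per token as in the original loop.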
You are trying to parse everything in a single loop, which makes the code very complicated. Instead, I suggest something like this:
struct token { ... };
struct lexer {
vector<token> tokens;
string source;
std::size_t pos = 0; // current position in source; must start at 0
bool parse_ident() {
if (!is_alpha(source[pos])) return false;
auto start = pos;
while(pos < source.size() && is_alnum(source[pos])) ++pos;
tokens.push_back({ token_type::ident, source.substr(start, pos - start) });
return true;
}
bool parse_num() { ... }
bool parse_comment() { ... }
...
bool parse_whitespace() { ... }
void parse() {
while(pos < source.size()) {
if (!parse_comment() && !parse_ident() && !parse_num() && ... && !parse_whitespace()) {
throw error{ "unexpected character at position " + std::to_string(pos) };
}
}
}
};
This is the standard structure I use when lexing files for any scripting language I've written. Lexing is usually greedy, so you don't need to bother with regex (which is effective, but slower, unless you use some heavily template-based implementation). Just define your parse_* functions, make sure each returns false if it didn't parse a token, and make sure they are called in the correct order.
The order usually doesn't matter, except that:
operators need to be checked from longest to shortest;
a number in the style .123 might be incorrectly recognized as the . operator (so you need to make sure that no digit follows the .);
numbers and identifiers look very similar, except that identifiers start with a non-digit.
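A runnable version of that structure, filled in just enough to lex the example from the question. Everything beyond the outline above is an assumption of this sketch: the token_type values, identifiers allowing '_', the particular symbol list ("==" is included only to show why longest-first matters), and std::runtime_error standing in for error. Comments and strings are left out.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

enum class token_type { ident, number, symbol };

struct token {
    token_type type;
    std::string text;
};

struct lexer {
    std::vector<token> tokens;
    std::string source;
    std::size_t pos = 0;

    // Character at pos + offset, or '\0' past the end.
    char peek(std::size_t offset = 0) const {
        return pos + offset < source.size() ? source[pos + offset] : '\0';
    }
    static bool isAlpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) || c == '_'; }
    static bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

    // Each parse_* returns false, consuming nothing, if its kind does not start here.
    bool parse_whitespace() {
        if (!std::isspace(static_cast<unsigned char>(peek()))) return false;
        ++pos;
        return true;
    }
    bool parse_ident() {
        if (!isAlpha(peek())) return false;
        std::size_t start = pos;
        while (isAlpha(peek()) || isDigit(peek())) ++pos;
        tokens.push_back({token_type::ident, source.substr(start, pos - start)});
        return true;
    }
    bool parse_num() {
        if (!isDigit(peek())) return false;
        std::size_t start = pos;
        while (isDigit(peek())) ++pos;
        tokens.push_back({token_type::number, source.substr(start, pos - start)});
        return true;
    }
    // Longest operators first, so "==" is not split into "=" "=".
    bool parse_symbol() {
        static const char* const symbols[] = {"==", "(", ")", "{", "}", ":", ";", "=", "+", "-", "*", "/"};
        for (const char* s : symbols) {
            std::string sym(s);
            if (source.compare(pos, sym.size(), sym) == 0) {
                tokens.push_back({token_type::symbol, sym});
                pos += sym.size();
                return true;
            }
        }
        return false;
    }
    void parse() {
        while (pos < source.size()) {
            if (!parse_whitespace() && !parse_ident() && !parse_num() && !parse_symbol())
                throw std::runtime_error("unexpected character at position " + std::to_string(pos));
        }
    }
};
```

Because each function only looks at the characters in front of pos, "main(" and "3+" split naturally without any delimiter bookkeeping.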
My task is to fill out a function that works with a test driver, which feeds it a random string on every run. The function has to convert the first character of every word to uppercase and every other character to lowercase.
It mostly works, but my code won't capitalize the very first character, nor the first letter of a word that is preceded by a period, like:
.word
The 'w' in this case would remain lower.
Here is my source:
void camelCase(char line[])
{
int index = 0;
bool lineStart = true;
for (index;line[index]!='\0';index++)
{
if (lineStart)
{
line[index] = toupper(line[index]);
lineStart = false;
}
if (line[index] == ' ')
{
if (ispunct(line[index]))
{
index++;
line[index] = toupper(line[index]);
}
else
{
index++;
line[index] = toupper(line[index]);
}
}else
line[index] = tolower(line[index]);
}
lineStart = false;
}
Here's a solution that should work and is a bit less complicated in my opinion:
#include <iostream>
#include <cctype>
void camelCase(char line[]) {
bool active = true;
for(int i = 0; line[i] != '\0'; i++) {
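// Casting to unsigned char avoids undefined behavior for negative char values.
if(std::isalpha(static_cast<unsigned char>(line[i]))) {
if(active) {
line[i] = std::toupper(static_cast<unsigned char>(line[i]));
active = false;
} else {
line[i] = std::tolower(static_cast<unsigned char>(line[i]));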
}
} else if(line[i] == ' ') {
active = true;
}
}
}
int main() {
char arr[] = "hELLO, wORLD!"; // Hello, World!
camelCase(arr);
std::cout << arr << '\n';
}
The variable active tracks whether the next letter should be transformed to an uppercase letter. As soon as we have transformed a letter to uppercase form, active becomes false and the program starts to transform letters into lowercase form. If there's a space, active is set to true and the whole process starts again.
Solution using std::string
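#include <algorithm>
#include <cctype>
#include <string>
// A letter is capitalized only right after a blank, so ".word" stays ".word".
void toCamelCase(std::string & s)
{
char previous = ' ';
auto f = [&](char current){
auto uc = static_cast<unsigned char>(current);
char result = (std::isblank(static_cast<unsigned char>(previous)) && std::isalpha(uc)) ? std::toupper(uc) : std::tolower(uc);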
previous = current;
return result;
};
std::transform(s.begin(),s.end(),s.begin(),f);
}
I have a stream of characters coming over the serial port like this:
FILE1,FILE2,FILE3,
I'm trying to read them in like this:
char* myFiles[20];
boolean done = false;
int fileNum = 0;
int charPos = 0;
char character;
while (!done)
{
if (Serial.available())
{
character = Serial.read();
if ((character == '\n') || (character == '\r'))
{
done = true;
}
else if (character == ',')
{
myFiles[fileNum][charPos] = '\0';
fileNum++;
charPos = 0;
}
else
{
myFiles[fileNum][charPos] = character;
charPos++;
}
}
}
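When I try to print the first value like this:
Serial.println(myFiles[0]);
I get a continuous stream of characters.
What am I doing wrong?
What you are doing wrong is not allocating any memory for your strings: char* myFiles[20]; gives you 20 uninitialized pointers, so myFiles[fileNum][charPos] = character writes to whatever address each pointer happens to hold.
Here's one way to do this: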
#include <vector>
#include <string>
std::vector<std::string> myFiles;
std::string file;
bool done = false;
char character;
while (!done)
{
if (Serial.available())
{
character = Serial.read();
if ((character == '\n') || (character == '\r'))
{
if (!file.empty()) myFiles.push_back(file); // keep a last name that had no trailing comma
done = true;
}
else if (character == ',')
{
myFiles.push_back(file);
file = "";
}
else
{
file += character;
}
}
}
Serial.println(myFiles[0].c_str());
Since you are programming in C++, you should learn how to use std::vector and std::string; they will save you a lot of grief.
If std::vector and std::string are not available to you (apparently the case on Arduino), a quick hack is to preallocate a fixed amount of memory for your strings by replacing
char* myFiles[20];
with
char myFiles[20][100];
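A sketch of that quick hack with the bounds checks it needs, so that a long name or too many names cannot write past the arrays. To make it runnable off the board, the characters come from a C string instead of Serial.read(); splitFileList, MaxFiles and MaxNameLen are hypothetical names. On the Arduino you would apply the same per-character logic inside the Serial.available() loop.

```cpp
const int MaxFiles = 20;
const int MaxNameLen = 100;  // includes the terminating '\0'

// Splits "FILE1,FILE2,FILE3," into the preallocated array, one character at a
// time as it would arrive from Serial.read(). Stops at '\n', '\r' or the end
// of the string. Overlong names are truncated and extra names are dropped.
// Returns the number of names stored.
int splitFileList(const char* input, char files[][MaxNameLen], int maxFiles) {
    int fileNum = 0;
    int charPos = 0;
    for (const char* p = input; *p != '\0' && *p != '\n' && *p != '\r'; ++p) {
        if (fileNum >= maxFiles) break;          // no room for another name
        if (*p == ',') {
            files[fileNum][charPos] = '\0';      // terminate the current name
            ++fileNum;
            charPos = 0;
        } else if (charPos < MaxNameLen - 1) {   // leave room for '\0'
            files[fileNum][charPos++] = *p;
        }
    }
    // Terminate a final name that was not followed by a comma.
    if (fileNum < maxFiles && charPos > 0) {
        files[fileNum][charPos] = '\0';
        ++fileNum;
    }
    return fileNum;
}
```

Every name is '\0'-terminated before it is counted, which is what keeps Serial.println(myFiles[0]) from running on into the rest of memory.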