I found a project from a few years ago that does some simple command-line parsing. While I really like its functionality, it does not support parsing special characters such as <, >, and &. I attempted to add support for these characters by adding conditions similar to the ones the existing code uses to detect whitespace, escape characters, and quotes:
bool _isQuote(char c) {
if (c == '\"')
return true;
else if (c == '\'')
return true;
return false;
}
bool _isEscape(char c) {
if (c == '\\')
return true;
return false;
}
bool _isWhitespace(char c) {
if (c == ' ')
return true;
else if(c == '\t')
return true;
return false;
}
.
.
.
What I added:
bool _isLeftCarrot(char c) {
if (c == '<')
return true;
return false;
}
bool _isRightCarrot(char c) {
if (c == '>')
return true;
return false;
}
and so on for the rest of the special characters.
I also tried the same approach as the existing code in the parse method:
std::list<std::string> parse(const std::string& args) {
std::stringstream ain(args); // iterates over the input string
ain >> std::noskipws; // ensures not to skip whitespace
std::list<std::string> oargs; // list of strings where we will store the tokens
std::stringstream currentArg("");
currentArg >> std::noskipws;
// current state
enum State {
InArg, // scanning the string currently
InArgQuote, // scanning the string that started with a quote currently
OutOfArg // not scanning the string currently
};
State currentState = OutOfArg;
char currentQuoteChar = '\0'; // used to differentiate between ' and "
// ex. "sample'text"
char c;
std::stringstream ss;
std::string s;
// iterate character by character through input string
while(!ain.eof() && (ain >> c)) {
// if current character is a quote
if(_isQuote(c)) {
switch(currentState) {
case OutOfArg:
currentArg.str(std::string());
case InArg:
currentState = InArgQuote;
currentQuoteChar = c;
break;
case InArgQuote:
if (c == currentQuoteChar)
currentState = InArg;
else
currentArg << c;
break;
}
}
// if current character is whitespace
else if (_isWhitespace(c)) {
switch(currentState) {
case InArg:
oargs.push_back(currentArg.str());
currentState = OutOfArg;
break;
case InArgQuote:
currentArg << c;
break;
case OutOfArg:
// nothing
break;
}
}
// if current character is escape character
else if (_isEscape(c)) {
switch(currentState) {
case OutOfArg:
currentArg.str(std::string());
currentState = InArg;
case InArg:
case InArgQuote:
if (ain.eof())
{
currentArg << c;
throw(std::runtime_error("Found Escape Character at end of file."));
}
else {
char c1 = c;
ain >> c;
if (c != '\"')
currentArg << c1;
ain.unget();
ain >> c;
currentArg << c;
}
break;
}
}
What I added in the parse method:
// if current character is left angle bracket (<)
else if(_isLeftCarrot(c)) {
// convert from char to string and push onto list
ss << c;
ss >> s;
oargs.push_back(s);
}
// if current character is right angle bracket (>)
else if(_isRightCarrot(c)) {
ss << c;
ss >> s;
oargs.push_back(s);
}
.
.
.
else {
switch(currentState) {
case InArg:
case InArgQuote:
currentArg << c;
break;
case OutOfArg:
currentArg.str(std::string());
currentArg << c;
currentState = InArg;
break;
}
}
}
if (currentState == InArg) {
oargs.push_back(currentArg.str());
s.clear();
}
else if (currentState == InArgQuote)
throw(std::runtime_error("Starting quote has no ending quote."));
return oargs;
}
parse returns the tokens as a list of strings.
However, I am running into issues with a specific test case when the special character is attached to the end of the input. For example, the input
foo-bar&
will return this list: [{&},{foo-bar}] instead of what I want: [{foo-bar},{&}]
I'm struggling to fix this issue. I am new to C++, so any advice along with some explanation would be a great help.
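When you handle one of your characters, you need to do the same sort of thing the original code does when it encounters a space: look at currentState and, if you are in the middle of an argument, push the current argument onto the list (and reset the state, since you are no longer in one) before pushing the special character. Right now foo-bar is still sitting in currentArg when & is pushed; it only gets added by the check after the loop, which is why it ends up after &.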
I have a comma-delimited string that I want to store in a string vector. The string and vector are:
string s = "1, 10, 'abc', 'test, 1'";
vector<string> v;
Ideally I want the strings 'abc' and 'test, 1' to be stored without the single quotes as below, but I can live with storing them with single quotes:
v[0] = "1";
v[1] = "10";
v[2] = "abc";
v[3] = "test, 1";
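#include <string>
#include <vector>
using namespace std;
// Reads the next token from s starting at index start, stores it in token,
// and advances start past the following comma. Returns false when no tokens remain.
bool nextToken(const string &s, string::size_type &start, string &token)
{
    token.clear();
    start = s.find_first_not_of(" \t", start); // skip leading blanks
    if (start == string::npos)
        return false;
    string::size_type end;
    if (s[start] == '\'') // quoted token: runs to the closing quote
    {
        ++start;
        end = s.find('\'', start);
    }
    else // unquoted token: runs to the next blank or comma
        end = s.find_first_of(" \t,", start);
    if (end == string::npos)
    {
        token = s.substr(start);
        start = s.size();
    }
    else
    {
        token = s.substr(start, end-start);
        // if we stopped on a quote or blank, skip ahead to the separating comma
        if ((s[end] != ',') && ((end = s.find(',', end + 1)) == string::npos))
            start = s.size();
        else
            start = end + 1;
    }
    return true;
}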
int main()
{
    string s = "1, 10, 'abc', 'test, 1'", token;
    vector<string> v;
    string::size_type start = 0;
    while (nextToken(s, start, token))
        v.push_back(token);
}
What you need to do here is write a parser that splits the string the way you want. Here is a parsing function for you:
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>
using namespace std;
vector<string> parse_string(string master) {
    char temp; //the current character
    bool encountered = false; //for checking if there is a single quote
    string curr_parse; //the current string
    vector<string>result; //the return vector
    for (size_t i = 0; i < master.size(); ++i) { //while still in the string
        temp = master[i]; //current character
        switch (temp) { //switch depending on the character
        case '\'': //if the character is a single quote
            if (encountered) encountered = false; //if we already found a single quote, reset encountered
            else encountered = true; //if we haven't found a single quote, set encountered to true
            [[fallthrough]];
        case ',': //if it is a comma
            if (!encountered) { //if we have not found a single quote
                result.push_back(curr_parse); //put our current string into our vector
                curr_parse = ""; //reset the current string
                break; //go to next character
            }//if we did find a single quote, go to the default, and push_back the comma
            [[fallthrough]];
        default: //if it is a normal character
            if (encountered && isspace(static_cast<unsigned char>(temp))) curr_parse.push_back(temp); //inside quotes, keep the whitespace
            else if (isspace(static_cast<unsigned char>(temp))) break; //outside quotes, trash the whitespace and go to the next character
            else if (temp == '\'') break; //if the current character is a single quote, trash it and go to the next character.
            else curr_parse.push_back(temp); //if all of the above failed, put the character into the current string
            break; //go to the next character
        }
    }
    if (!curr_parse.empty()) result.push_back(curr_parse); //keep an unquoted last item, e.g. the 2 in "1, 2"
    //remove empty strings from the vector with the erase-remove idiom;
    //erasing inside an index loop would skip the element after each erased one
    result.erase(remove(result.begin(), result.end(), string()), result.end());
    return result;
}
This parses your string as you want it to, and returns a vector. Then, you can use it in your program:
#include <iostream>
int main() {
string s = "1, 10, 'abc', 'test, 1'";
vector<string> v = parse_string(s);
for (int i = 0; i < v.size(); ++i) {
cout << v[i] << endl;
}
}
and it properly prints out:
1
10
abc
test, 1
A proper solution would require a parser implementation. If you just need a quick hack, write a cell-reading function. C++14's std::quoted manipulator is a great help here; its only drawback is that it requires a stream, which is easily solved with an istringstream (see the second function). Note that the format of your string is CELL COMMA CELL COMMA ... CELL. (The if statement with an initializer in get also requires C++17.)
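#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;
istream& get_cell(istream& is, string& s)
{
    char c;
    if (!(is >> c)) // skips ws; at the end of the stream there is no cell to read
        return is;
    is.unget(); // puts back in the stream the last read character
    if (c == '\'')
        return is >> quoted(s, '\'', '\\'); // the first character of the cell is ' - read quoted
    else
        return getline(is, s, ','), is.unget(); // read unquoted, but put back comma - we need it later, in get function
}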
vector<string> get(const string& s)
{
istringstream iss{ s };
string cell;
vector<string> r;
while (get_cell(iss, cell))
{
r.push_back( cell );
char comma;
iss >> comma; // expect a cell separator
if (comma != ',')
break; // cell separator not found; we are at the end of stream/string - break the loop
}
if (char c; iss >> c) // we reached the end of what we understand - probe the end of stream
throw "ill formed";
return r;
}
And this is how you use it:
int main()
{
string s = "1, 10, 'abc', 'test, 1'";
try
{
auto v = get(s);
}
catch (const char* e)
{
cout << e;
}
}
I can't understand how I would do this.
The input will be:
3
13894
30-something
-Ex42
and the output needs to be:
13894
30
Ex42
The main assignment is to make a function that converts a duodecimal number into the decimal format. I have figured that part out and don't need help with it. I've basically cut out all the code surrounding the duodecimal conversion and just included the stuff I can't figure out.
#include <iostream>
#include <string>
#include <iomanip>
using namespace std;
int to_decimal(const string& str);
int main () {
string str; // Initializes string str for input
cin.ignore (256, '\n'); //ignores the 3 from the input
while (getline(cin, str)) {
//runs str through to_decimal and outputs
cout << to_decimal(str) << endl;
}
}
int to_decimal(const string& str) {
int f = 0;
string localString; // Initialize local string variable
//sets local string to the same as inputted string
localString = str; //used for local string erasing
//This is the idea I have been working on and I cant figure it out
for (unsigned x = 0; x < localString.length(); x++) {
f = localString.at(x);
if (isdigit(f)) {
} else if (f == 'E'){
} else if (f == 'e') {
} else if (f == 'X') {
} else if (f == 'x') {
} else if (f == '-') {
} else if (f == ' ') {
} else {
f = localString.length() - x;
localString.erase(x, f);
break;
}
}
}
I am a bit confused. You say that you need to convert duodecimal numbers to decimal; however, in your sample output only the line that has Ex is converted, yet 30-something stays 30, as if it were not converted, and 30 in duodecimal is 36 in decimal. The same goes for the number 13894.
Assuming that you really want to convert all of the lines from duodecimal to decimal, you can base your solution on the standard library function std::stoi(), which can convert a string in any base from 2 to 36. It requires that digits greater than 9 be encoded using the letters in alphabetical order, A to Z (in either case). So you simply need to convert every x to a and every e to b. Example:
int to_decimal(const string& str) {
bool foundDigit = false;
std::string transformedString;
for (auto c : str) {
if (std::isdigit(c) || c == 'E' || c =='e' || c == 'X' || c == 'x') {
foundDigit = true;
// If needed, convert the character.
if (c == 'E' || c == 'e') {
c = 'b';
} else if (c == 'X' || c == 'x') {
c = 'a';
}
transformedString += c;
} else if (foundDigit) {
// Skip everything to the end of the line, if we've already found some digits
break;
}
}
return std::stoi(transformedString, 0, 12);
}
If you just want to extract the characters and then do the conversion yourself, then you can do something like this:
#include <iostream>
#include <string>
#include <sstream>
bool isNumber(const char c)
{
switch (c) {
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
case '8':
case '9':
case '0':
case 'e':
case 'E':
case 'x':
case 'X':
case '-':
return true;
default:
return false;
}
}
std::string getNumber(std::istream& in)
{
std::stringstream s;
for (char c;in.get(c);)
{
if (isNumber(c)) {
s << c;
break;
}
}
for (char c;in.get(c);)
{
if (!isNumber(c))
break;
s << c;
}
return s.str();
}
int main()
{
std::string bla = "3\n13894\n 30-something\n-Ex42\n";
std::stringstream klaf{ bla };
for (std::string s;(s = getNumber(klaf)) != "";) //<- use a local stringstream as input to test
//for (std::string s;(s = getNumber(std::cin)) != "";) //<- use std::cin for input
{
std::cout << s << '\n';
}
}
This outputs:
3
13894
30-
e
-Ex42
So, not exactly what you were after, but it should at least give you a starting point to improve from. For example, you may want to remove - from isNumber and then change the logic in getNumber to accept it only as the first character of a new number.
I was reading the question "Parsing a comma-delimited std::string" on how to split a string by commas (someone gave me the link in my previous question), and one of the answers was:
stringstream ss( "1,1,1,1, or something else ,1,1,1,0" );
vector<string> result;
while( ss.good() )
{
string substr;
getline( ss, substr, ',' );
result.push_back( substr );
}
But what if my string looked like the following, and I wanted to split only on the top-level commas (the ones outside <>), ignoring any commas inside <>?
<a,b>,<c,d>,,<d,l>,
I want to get:
<a,b>
<c,d>
"" //Empty string
<d,l>
""
Given: <a,b>,,<c,d> it should return: <a,b> and "" and <c,d>
Given: <a,b>,<c,d> it should return: <a,b> and <c,d>
Given: <a,b>, it should return: <a,b> and ""
Given: <a,b>,,,<c,d> it should return: <a,b> and "" and "" and <c,d>
In other words, my program should behave just like the solution above that splits on , (supposing there were no commas other than the top-level ones).
Here are some suggested solutions and their problems:
Delete all top-level commas: this would treat the following two inputs the same way, although they shouldn't be:
<a,b>,<c,d>
<a,b>,,<c,d>
Replace all top-level commas with some other character and use the algorithm above: I can't choose a replacement character, since any value could appear in the rest of my string.
Adding to @Carlos' answer: apart from regex (take a look at my comment), you can implement the substitution like the following (here, I actually build a new string):
#include <algorithm>
#include <iostream>
#include <string>
int main() {
std::string str;
getline(std::cin,str);
std::string str_builder;
for (auto it = str.begin(); it != str.end(); it++) {
static bool flag = false;
if (*it == '<') {
flag = true;
}
else if (*it == '>') {
flag = false;
str_builder += *it;
}
if (flag) {
str_builder += *it;
}
}
}
Why not replace one set of commas with some known-to-not-clash character, then split it by the other commas, then reverse the replacement?
So replace the commas that are inside the <> with something, do the string split, replace again.
I think what you want is something like this:
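#include <string>
#include <vector>
using namespace std;
int main() {
    vector<string> result;
    string s = "<a,b>,,<c,d>";
    string current;          // the field being built
    bool in_string = false;  // true while inside <...>
    for (size_t i = 0; i < s.size(); i++) {
        if (s[i] == '<') {
            current += s[i];
            in_string = true;
        }
        else if (s[i] == '>') {
            current += s[i];
            in_string = false;
        }
        else if (!in_string && s[i] == ',') {
            result.push_back(current);  // a top-level comma ends the field (it may be empty)
            current.clear();
        }
        else
            current += s[i];
    }
    result.push_back(current);  // the last field: "<c,d>" here, or "" after a trailing comma
}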
Here is a possible implementation that scans the string one character at a time and splits it on commas (',') unless they are masked between brackets ('<' and '>').
Algo:
assume starting outside brackets
loop for each character:
    if not a comma, or if inside brackets:
        store the character in the current item
        if a < bracket: note that we are inside brackets
        if a > bracket: note that we are outside brackets
    else (an unmasked comma):
        store the current item as a string into the resulting vector
        clear the current item
store the last item into the resulting vector
Only 10 lines and my rubber duck agreed that it should work...
C++ implementation: I will use a vector<char> to hold the current item because it is easier to build it one character at a time.
std::vector<std::string> parse(const std::string& str) {
std::vector<std::string> result;
bool masked = false;
std::vector<char> current; // stores chars of the current item
for (const char c : str) {
if (masked || (c != ',')) {
current.push_back(c);
switch (c) {
case '<': masked = true; break;
case '>': masked = false;
}
}
else { // unmasked comma: store item and prepare next
current.push_back('\0'); // a terminating null for the vector data
result.push_back(std::string(&current[0]));
current.clear();
}
}
// do not forget the last item...
current.push_back('\0');
result.push_back(std::string(&current[0]));
return result;
}
I tested it with all your example strings and it gives the expected results.
Seems quite straightforward to me.
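#include <sstream>
#include <string>
#include <vector>
using namespace std;
vector<string> customSplit(string s)
{
    vector<string> results;
    int level = 0; // nesting depth of <...>; commas split only at depth 0
    stringstream ss;
    for (char c : s)
    {
        switch (c)
        {
        case ',':
            if (level == 0)
            {
                results.push_back(ss.str());
                stringstream temp;
                ss.swap(temp); // Clear ss for the new string.
            }
            else
            {
                ss << c;
            }
            break;
        case '<':
            level += 2; // +2 here, then -1 below: net +1 for '<'
            [[fallthrough]];
        case '>':
            level -= 1; // net -1 for '>'
            [[fallthrough]];
        default:
            ss << c; // brackets and ordinary characters are kept
        }
    }
    results.push_back(ss.str());
    return results;
}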
I am learning C++, so bear with me, and I apologize in advance for any idiocy.
I am trying to write some code that matches the first word on each line in a file called "command.txt" to either "num_lines", "num_words", or "num_chars".
If the first word of the first line does not match the previously mentioned words, it reads the next line.
Once it hits a matching word (first words only!) it prints out the matching word.
Here is all of my code:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
ifstream comm_in("commands.txt"); // opens file
string command_name = "hi"; // stores command from file
bool is_command() {
if (command_name == "num_words" || command_name == "num_chars" || command_name == "num_lines") {
return true;
} else {
return false;
}
}
// FIND a first word of a line in file THAT MATCHES "num_words", "num_chars" or "num_lines"
void get_command() {
string line;
char c;
while (!is_command()) { // if command_name does not match a command
// GET NEXT LINE OF FILE TO STRING
getline(comm_in, line);
// SUPPOSED TO GET THE FIRST WORD OF A STRING (CANT USE SSTREAM)
for (int i = 0; i < line.size(); i++) { // increment through line
c = line[i]; // assign c as index value of line
if (c == ' ' || c == '\t') { // if c is a space/tab
break; // end for loop
} else {
command_name += c; // concatenate c to command_name
} // if
} // for
} // while
return;
}
int main() {
get_command();
cout << command_name; // supposed to print "num_lines"
}
The contents of the command.txt file:
my bear is happy
and that it
great ha
num_lines sigh
It compiles properly, but when I run it in my terminal, nothing shows up; it doesn't seem to ever stop loading.
How can I fix this?
Unless you really want to hate yourself in the morning (so to speak), you want to get out of the habit of using global variables. You'll also almost certainly find life easier if you break get_command into (at least) two functions, one specifically to get the first word from the string containing the line.
I'd write the code more like this:
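#include <fstream>
#include <iostream>
#include <string>
bool is_cmd(std::string const &s) {
    return s == "num_words" || s == "num_chars" || s == "num_lines";
}
// Reads one line and stores its first space/tab-delimited word in word.
// Returns false only at end of input; a blank line gives an empty word.
bool first_word(std::istream &is, std::string &word) {
    std::string line;
    if (!std::getline(is, line))
        return false;
    auto start = line.find_first_not_of(" \t");
    if (start == std::string::npos) {
        word.clear();  // blank line: without this check, substr(npos) would throw
        return true;
    }
    auto end = line.find_first_of(" \t", start);
    word = line.substr(start, end - start);  // end == npos takes the rest of the line
    return true;
}
void get_command(std::istream &is) {
    std::string cmd;
    while (first_word(is, cmd))  // stops at end of file instead of looping forever
        if (is_cmd(cmd)) {
            std::cout << cmd;
            break;
        }
}
int main() {
    std::ifstream in("commands.txt");
    get_command(in);
}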
This still isn't perfect (e.g., badly formed input could still cause it to fail) but at least it's a move in what I'd say is a better direction.
If something goes wrong and you reach the end of the file, the loop never stops: getline keeps failing, command_name never becomes a valid command, and the while condition stays true. You should change getline(comm_in, line) to if(!getline(comm_in, line)) break;, or better yet, use getline as the loop condition.
You also have to reset command_name for each pass:
while(getline(comm_in, line))
{
command_name = "";
for(int i = 0; i < line.size(); i++)
{
c = line[i];
if(c == ' ' || c == '\t')
break;
else
command_name += c;
}
if(is_command())
break;
}
// FIND a first word of a line in file THAT MATCHES "num_words", "num_chars" or "num_lines"
void get_command()
{
string line;
char c;
while (!is_command()) { // if command_name does not match a command
// GET NEXT LINE OF FILE TO STRING
if(getline(comm_in, line),comm_in.fail()){
// end reading
break;
}
//clear
command_name = "";
// SUPPOSED TO GET THE FIRST WORD OF A STRING (CANT USE SSTREAM)
for (int i = 0; i < line.size(); i++) { // increment through line
c = line[i]; // assign c as index value of line
if (c == ' ' || c == '\t') { // if c is a space/tab
break; // end for loop
} else {
command_name += c; // concatenate c to command_name
} // if
} // for
} // while
return;
}
The key problem is that you never clear command_name.
You also need to check whether you have reached the end of the file.
P.S.: if(getline(comm_in, line),comm_in.fail()) is equivalent to if(!getline(comm_in, line)).
In the code below, "\b" appears to remove a character from the string, but the string's size increases, as if the character were still inside it but not visible.
while (true) {
c = _getch();
if (c=='\r') {break;}
else if (c=='\b') { cout<<"\b"<<" "<<"\b"; s+="\b \b"; }
else {cout<<"*"; s=s+c;}
}
For instance, the size of the string (abc"\b"d), in which "c is removed and replaced by d", is still 5.
I would like to know how to efficiently handle the backspace in this circumstance.
If you are reading character by character into a string, you could do something like this:
std::string mystr;
while (true) {
c = _getch();
if (c=='\r') {break;}
if(c == '\b')
{
// This will remove last character from your string
if(mystr.size () > 0)
{
cout<<"\b"<<" "<<"\b";
mystr.resize (mystr.size () - 1);
// or mystr.pop_back() in C++11
}
}
else
{
cout<<"*";
mystr += c;
}
}
You need to "physically" remove the last character from the string when you get a backspace:
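while (true) {
    c = _getch();
    if (c=='\r') {
        break;
    }
    if (c=='\b') {
        if (s.length() > 0) {
            cout<<"\b"<<" "<<"\b"; // erase the last '*' on screen only if there is something to erase
            s = s.substr(0, s.length()-1); // std::string has substr, not substring
        }
    }
    else {cout<<"*"; s=s+c;}
}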
As an optimization, we can trim s instead of reassigning, as suggested by Jason:
s.resize(s.size() -1);
(While we're at it, we could save s.length() (or s.size()) into a local variable to avoid the extra call - assuming the compiler, knowing about std::string, doesn't do it already).
for(char c=_getch(); c!='\r'; c=_getch())
    if(c=='\b') {
        if(!mystr.empty()) // pop_back() on an empty string is undefined behavior
            mystr.pop_back();
    }
    else
        mystr.push_back(c);