C++ string parsing - c++

All:
I have a question about string parsing.
Suppose I have a string like "+12+400-500+2:+13-50-510+20-66+20:"
How can I calculate the total of each segment (a ':' marks the end of a segment)? So far the only approach I can think of is a for loop that walks the string and checks each +/- sign, but I doubt that is a good general method for this kind of problem :(
For example, the first segment, +12+400-500+2 = -86, and the second segment is
+13-50-510+20-66+20 = -573
1) The number of operands varies (but they are always integers)
2) The number of segments varies
3) I need to do it in C++ or C.
I do not think this is a very simple question for most newbies, and this is not homework. :)
best,

Since the string ends in a colon, it is easy to use find and substr to separate out parts of the string partitioned by ':', like this:
string all("+12+400-500+2:+13-50-510+20-66+20:");
string::size_type pos = 0;  // find returns an unsigned size_type, not int
for (;;) {
    string::size_type next = all.find(':', pos);
    if (next == string::npos) break;
    string expr(all.substr(pos, (next-pos)+1));
    cout << expr << endl;
    pos = next+1;
}
This splits the original string into parts
+12+400-500+2:
and
+13-50-510+20-66+20:
Since istreams accept a leading plus as well as a leading minus, you can parse out the numbers using the >> operator:
istringstream iss(expr);
int n;
while (iss >> n) {  // test the extraction itself, so nothing is printed after a failed read
    cout << n << endl;
}
With these two parts in hand, you can easily total up the individual numbers, and produce the desired output.
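Putting the two parts together might look like the sketch below. The function name sumSegments is made up for illustration, and the sketch assumes that every segment is terminated by ':' and that anything after the last ':' can be ignored.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns one total per ':'-terminated segment of `all`.
// Text after the last ':' (if any) is ignored.
std::vector<int> sumSegments(const std::string& all) {
    std::vector<int> totals;
    std::string::size_type pos = 0;
    for (;;) {
        const std::string::size_type next = all.find(':', pos);
        if (next == std::string::npos) break;

        // Segment without its trailing ':'.
        std::istringstream iss(all.substr(pos, next - pos));
        int total = 0;
        int n;
        while (iss >> n) {  // ">>" consumes the leading '+' or '-' of each number
            total += n;
        }
        totals.push_back(total);
        pos = next + 1;
    }
    return totals;
}
```

For the string in the question this yields the two totals -86 and -573.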

You need to separate operands and operators. To do this you can use two queues, one for operands and one for operators.

Split by :, then by +, then by -. Translate the pieces into int and there you are.

Your expression language seems regular: you could use a regex library - like boost::regex - to match the numbers, the signs, and the segments in groups directly, with something like
((([+-])([0-9]+))+:)+
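A rough sketch of the regex idea, using std::regex from C++11 rather than boost::regex. The function name sumSegmentsRegex and the two patterns are illustrative assumptions: one pattern validates the whole input, the other walks it token by token.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: validates the whole string, then sums each segment.
// Returns an empty vector when the input does not fit the expected shape.
std::vector<int> sumSegmentsRegex(const std::string& all) {
    // Whole input: one or more segments, each a run of signed integers ending in ':'.
    static const std::regex whole("(([+-][0-9]+)+:)+");
    // Single token: a signed integer (group 1) or a segment terminator.
    static const std::regex token("([+-][0-9]+)|:");

    std::vector<int> totals;
    if (!std::regex_match(all, whole)) return totals;

    int total = 0;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(all.begin(), all.end(), token); it != end; ++it) {
        if ((*it)[1].matched) {
            total += std::stoi((*it)[1].str());  // stoi accepts the leading sign
        } else {                                 // the token was ':'
            totals.push_back(total);
            total = 0;
        }
    }
    return totals;
}
```

Compared with the find/substr loop, this version also rejects malformed input such as an unsigned first number or a missing final ':'.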


How does "to_string" work in array of string?

I want to combine the first 4 characters of each line in a txt file and compare the result with a keyword I have, but when I combine the characters I get the sum of their ASCII values instead.
How can I solve this problem? My code is below.
When I debugged it, I saw that the string variable search was "321".
int main() {
    ifstream file("sentence.txt");
    if (file.is_open()) {
        string line;
        while (getline(file, line)) {
            string search = to_string(line[0] + line[1] + line[2]); // you see what I mean
            if ("dog" == search) {
                cout << "there is dog";
            }
            else {
                cout << "there is no dog"<<endl;
            }
        }
    }
}
The function std::to_string() is designed to convert a number into a string representation. It is not what you need.
There is no need to create the new string search to check whether the string line starts with the string "dog".
Creating the new string search is inefficient.
Instead, you could write for example
if ( line.compare( 0, 3, "dog" ) == 0 ) {
    cout << "there is dog";
}
else {
    cout << "there is no dog" << endl;
}
Or, if your compiler supports C++20, you can also write:
if ( line.starts_with( "dog" ) ) {
    cout << "there is dog";
}
else {
    cout << "there is no dog" << endl;
}
line[0], line[1], and line[2] are chars, not std::strings. char is an integer type, so adding two chars together results in a single integer that is the sum of the two operands. It does not produce a std::string that is the concatenation of the two chars.
To get a substring of a std::string use the substr member function:
std::string search = line.substr(0, 3);
Or, if you actually need to construct a std::string from individual chars, use the constructor that accepts a std::initializer_list<char>:
std::string search{line[0], line[1], line[2]};
A string made from the first characters of line can be obtained via std::string::substr. In this case I'd actually prefer the constructor that takes two iterators:
std::string first3chars{line.begin(),line.begin()+3};
Take care with lines that are shorter than 3 characters: line.begin()+3 is then past the end of the string.
Your code adds the values of three chars. Adding chars via + does not concatenate them, and if it did, why call std::to_string on the result? char is an integer type and what you see as 321 is the result of adding the number representations of the first 3 characters in line.
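For lines that may be shorter than the keyword, compare and substr are the forgiving choices, because both clamp the requested length to what is actually in the string. A small sketch (the helper names hasPrefix and firstChars are invented for this example):

```cpp
#include <string>

// True when `line` begins with `prefix`; safe for lines shorter than the prefix.
bool hasPrefix(const std::string& line, const std::string& prefix) {
    // compare(pos, len, str) clamps len to the characters actually available,
    // so a short line simply compares unequal instead of reading out of range.
    return line.compare(0, prefix.size(), prefix) == 0;
}

// Builds a string from up to the first `count` characters of `line`.
std::string firstChars(const std::string& line, std::string::size_type count) {
    return line.substr(0, count);  // substr also clamps count to the line length
}
```

The iterator-pair constructor has no such safety net, so it needs an explicit length check first.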
Is there a way for you to cast those chars (which appear to be treated as an integer type) back into char type? That might resolve the issue if to_string then concatenates those 3 inputs into one; additionally, IntelliSense should help explain the parameters and the return value.
The problem with this code is that when you access an element of a string you get a character, which is an ASCII number, so when you try to sum two characters you are adding their ASCII codes.
In your specific case, as you want sequential characters, the best solution would probably be to use the substr function for strings. Otherwise, you would probably need to convert one of the characters to a string and then "add" the other characters to it.

Why when a character array is compared to another character array, output is wrong but when character array is compared to a string output is correct? [duplicate]

Question - The translation from the Berland language into the Birland language is not an easy task. Those languages are very similar: a berlandish word differs from a birlandish word with the same meaning a little: it is spelled (and pronounced) reversely. For example, a Berlandish word code corresponds to a Birlandish word edoc. However, it's easy to make a mistake during the «translation». Vasya translated word s from Berlandish into Birlandish as t. Help him: find out if he translated the word correctly.
Input -
The first line contains word s, the second line contains word t. The words consist of lowercase Latin letters. The input data do not contain unnecessary spaces. The words are not empty and their lengths do not exceed 100 symbols.
Output -
If the word t is a word s, written reversely, print YES, otherwise print NO.
When I write this code, the output is wrong -
int main(){
    char s[100000],a[100000];
    cin >> s >> a;
    strrev(s);
    if(s==a){
        cout << "YES";
    }else{cout << "NO";}
}
But when I write this code, the output is correct -
int main(){
    char s[100000];
    string a;
    cin >> s >> a;
    strrev(s);
    if(s==a){
        cout << "YES";
    }else{cout << "NO";}
}
Why is it like this? Is there a rule that a character array cannot be compared to another character array, and if so, why can it be compared to a string?
Remember that arrays naturally decay to pointers to their first elements, and it's such pointers that you are comparing.
In short, what you're really doing is:
if(&s[0] == &a[0])
And those two pointers will never be equal.
To compare the contents of character arrays, you need to use strcmp() or similar function instead, eg:
if(strcmp(s, a) == 0)
Since you're programming in C++, please use std::string for all your strings. There are overloads for the == operator that do the right thing if you have std::string values.
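With std::string on both sides, the whole task needs neither strrev (which is not standard) nor strcmp. A minimal sketch, where the function name isReversed is just an illustrative choice:

```cpp
#include <algorithm>
#include <string>

// True when `t` is `s` written backwards.
bool isReversed(const std::string& s, const std::string& t) {
    // Walk s from the back while walking t from the front; no copy is made.
    return s.size() == t.size() && std::equal(s.rbegin(), s.rend(), t.begin());
}
```

The program would then read both words with cin >> s >> t and print YES or NO depending on the result.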

How to arrange the loops to check for numbers

I have a program that reads a credit card number. I want to add a check that exactly 16 digits are entered, with no letters and as many spaces as wanted (although they don't count towards the digits). Is there a function or set of functions to do this, or should I just write a bunch of while loops and if statements that use isdigit() and isalpha() and go through the array one element at a time?
char cardNum[32];
cout << "Enter credit card number: ";
cin.getline(cardNum, 32); //Read in the entire line
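There are numerous things you could do. One idea is to use std::find_if with a custom predicate. For example:
bool IsCharIllegal(char ch)
{
    // return true or false based on whatever your exact requirements are;
    // here: anything other than a digit or a space is illegal
    return !std::isdigit(static_cast<unsigned char>(ch)) && ch != ' ';
}
Then:
char* last = cardNum + std::strlen(cardNum);  // stop at the terminator, not at the end of the buffer
auto itFound = std::find_if(cardNum, last, IsCharIllegal);
if(itFound != last)
{
    // invalid character was entered!
}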
There is a case to be made for using std::string instead of a raw char array too:
std::string cardNum;
cout << "Enter credit card number: ";
std::getline(std::cin, cardNum);  // operator>> would stop at the first space
Followed by:
auto itFound = std::find_if(cardNum.begin(), cardNum.end(), IsCharIllegal);
if(itFound != cardNum.end())
{
    // invalid character was entered!
}
That helps avoid the magic 32 and also allows inputs of any length.
I would use a regular expression to match this pattern. If you're using C++11, you can use the built-in <regex> header: http://www.cplusplus.com/reference/regex/
Otherwise, take a look here for some alternative libraries you can use: C++: what regex library should I use?
Unfortunately, I'm not very good at regular expressions, but the following should match 16 digits with optional spaces in between.
(\d[ ]*){16}
If you're looking for more info on regular expressions, here is a good cheat sheet I use often: http://regexlib.com/CheatSheet.aspx
I also like to test my expressions using this site: http://regexpal.com/
So something like this would be allowed?
"123 456 789 01 23456"
Use std::string and the free-standing std::getline function. It's better than the member function of the same name, because it doesn't force you to deal with pointers.
std::string line;
std::getline(std::cin, line);
if (!std::cin)
{
    // catastrophic input failure
}
Then you have a string as in my example above in line. You could use std::find_if to verify that there are no illegal characters and std::count_if to make sure there are exactly 16 digits, but I think writing your own loop (not "bunch of loops") would yield more readable code here.
By the way, beware of isdigit! For historical reasons, you must cast its argument to unsigned char in order to use it safely.
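The single hand-written loop could look roughly like this. The function name isValidCardNumber is hypothetical, and the rule it encodes (exactly 16 digits, spaces allowed anywhere, everything else rejected) is taken from the question.

```cpp
#include <cctype>
#include <string>

// True when `line` holds exactly 16 digits and otherwise only spaces.
bool isValidCardNumber(const std::string& line) {
    int digits = 0;
    for (char ch : line) {
        // Cast first: passing a negative char to std::isdigit is undefined behaviour.
        if (std::isdigit(static_cast<unsigned char>(ch))) {
            ++digits;
        } else if (ch != ' ') {
            return false;  // letter, punctuation, tab, ...
        }
    }
    return digits == 16;
}
```

Called on the line read by std::getline, it accepts "123 456 789 01 23456" and rejects input that is too short or contains a letter.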
There is an atoll function in the cstdlib. http://www.cplusplus.com/reference/cstdlib/atoll/
It converts a character string into a long long, which can hold a 16-digit credit card number. However, it stops as soon as it meets a non-numeric character, so
atoll("123456abc890")
would return 123456
If you want to check character by character, you could use atoi on each character (as a one-character string), then reassemble the string from the characters that pass.
http://www.cplusplus.com/reference/cstdlib/atoi/

How to capture 0-2 groups in C++ regular expressions and print them?

Edit 3
I went with the good old custom parsing approach, as I got stuck with the regular expression. It didn't turn out to be that bad, as the file contents can be tokenized quite neatly and the tokens can be parsed in a loop with a very simple state machine. For those who want to check, there's a snippet of code doing this with range-for, ifstream iterators and a custom stream tokenizer in my other Stack Overflow question. These techniques considerably reduce the complexity of writing a custom parser.
I'd like to tokenize the file contents, the first part in capture groups of two and then the rest line by line. I have a semi-functional solution, but I'd like to learn how to make it better, that is, without "extra processing" to make up for my lack of knowledge of capture groups. Some preliminaries follow, and at the end a more exact question (the line
const std::regex expression("([^:]+?)(^:|$)");
...is the one I'd like to ask about in combination with processing the results of it).
The files are basically defined like this:
definition_literal : value_literal
definition_literal : value_literal
definition_literal : value_literal
definition_literal : value_literal
HOW TO INTERPRET THE FOLLOWING SECTION OF ROWS
[DATA ROW 1]
[DATA ROW 2]
...
[DATA ROW n]
Each of the data rows consists of a certain number of either integers or floating point numbers separated by whitespace. Each row has as many numbers as the others (e.g. each row could have four integers). So the "interpretation section" basically describes this format in plain text on one row.
I have an almost working solution that reads such files like this:
int main()
{
    std::ifstream file("xyz", std::ios_base::in);
    if(file.good())
    {
        std::stringstream file_memory_buffer;
        file_memory_buffer << file.rdbuf();
        std::string str = file_memory_buffer.str();
        file.close();
        const std::regex expression("([^:]+?)(^:|$)");
        std::smatch result;
        const std::sregex_token_iterator end;
        for(std::sregex_token_iterator i(str.begin(), str.end(), expression); i != end; ++i)
        {
            std::cout << (*i) << std::endl;
        }
    }
    return EXIT_SUCCESS;
}
With the regex defined in expression, it now prints the <value> parts of the definition file, then the interpretation part and then the data rows one by one. If I change the regex to
"([^:]+?)(:|$)"
...it prints all the lines tokenized in groups of one, almost like I would like to, but how do I tokenize the first part in groups of two and the rest line by line?
Any pointers, code, explanations are truly welcomed. Thanks.
EDIT:
As I already noted to Tom Kerr, with some additional points: this is also a rehearsal, or coding kata if you will, in not writing a custom parser, to see if I could (or we could :-)) accomplish this with regex. I know regex isn't the most efficient tool here, but it doesn't matter.
What I'd hope to have is something like a list of tuples of header information (tuple of size 2), then the INTERPRET line (tuple of size 1), which I could use to choose a function on what to do with the data lines (tuple of size 1).
Yep, the "HOW TO INTERPRET" line is contained in a set of well-defined strings and I could just read line by line from the beginning, splitting strings along the way, until one of the INTERPRET lines is met. This regex solution is not the most efficient method, I know, but it is more of a coding kata to get myself to write something other than custom parsers (and it's been quite some time since I last wrote C++, so this is rehearsal in that sense too).
EDIT 2
I have managed to get access to the tuples (in the context of this question) by changing the iterator type, like so
const std::sregex_iterator end;
for(std::sregex_iterator i(str.begin(), str.end(), expression); i != end; ++i)
{
    std::cout << "0: " << (*i)[0] << std::endl;
    std::cout << "1: " << (*i)[1] << std::endl;
    std::cout << "2: " << (*i)[2] << std::endl;
    std::cout << "***" << std::endl;
}
Though this is still way off what I'd like to have; there's something wrong with the regular expression I'm trying to use. In any event, this new find, another kind of iterator, helps too.
I believe the re you are attempting is this:
TEST(re) {
    static const boost::regex re("^([^:]+) : ([^:]+)$");
    std::string str = "a : b";
    CHECK(boost::regex_match(str, re));
    CHECK(!boost::regex_match("a:a : bbb", re));
    CHECK(!boost::regex_match("aaa : b:b", re));
    boost::smatch what;
    CHECK(boost::regex_match(str, what, re, boost::match_extra));
    CHECK_EQUAL(3, what.size());
    CHECK_EQUAL(str, what[0]);
    CHECK_EQUAL("a", what[1]);
    CHECK_EQUAL("b", what[2]);
}
I'm not sure I would recommend regex in this instance though. I think you'll find simply reading a line at a time, splitting on :, and then trimming the spaces more manageable.
I guess if you can't depend on the line below as a sentinel, then it would be more difficult. Usually I would expect a format like this to be obvious from that line, not from the format of each line of the header.
HOW TO INTERPRET THE FOLLOWING SECTION OF ROWS
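The line-at-a-time alternative can be sketched as follows. The names ParsedFile, trim and parseFile are invented for this example, and the sketch rests on one loud assumption: the first non-empty line without a ':' is the interpretation line, and everything after it is a data row.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type for the three parts of the file.
struct ParsedFile {
    std::vector<std::pair<std::string, std::string>> header;  // (definition, value)
    std::string interpretation;                               // the sentinel line
    std::vector<std::string> dataRows;                        // raw, untokenized rows
};

// Strips leading and trailing spaces, tabs and carriage returns.
std::string trim(const std::string& s) {
    const std::string::size_type first = s.find_first_not_of(" \t\r");
    if (first == std::string::npos) return "";
    const std::string::size_type last = s.find_last_not_of(" \t\r");
    return s.substr(first, last - first + 1);
}

ParsedFile parseFile(std::istream& in) {
    ParsedFile result;
    std::string line;
    bool inHeader = true;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty()) continue;
        if (!inHeader) {
            result.dataRows.push_back(line);
            continue;
        }
        const std::string::size_type colon = line.find(':');
        if (colon != std::string::npos) {
            // "definition : value" pair: split at the first ':' and trim both sides.
            result.header.emplace_back(trim(line.substr(0, colon)),
                                       trim(line.substr(colon + 1)));
        } else {
            // Assumption: the first line without ':' is the interpretation line.
            result.interpretation = line;
            inHeader = false;
        }
    }
    return result;
}
```

Because parseFile takes a std::istream&, it works with a std::ifstream for real files and with a std::istringstream for quick experiments.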

Reversed offset tokenizer

I have a string to tokenize. Its form is HHmmssff where H, m, s, f are digits.
It's supposed to be tokenized into four 2-digit numbers, but I need it to also accept short-hand forms, like sff so it interprets it as 00000sff.
I wanted to use boost::tokenizer's offset_separator but it seems to work only with positive offsets and I'd like to have it work sort of backwards.
Ok, one idea is to pad the string with zeroes from the left, but maybe the community comes up with something uber-smart. ;)
Edit: Additional requirements have just come into play.
The basic need for a smarter solution was to handle all cases, like f, ssff, mssff, etc. but also accept a more complete time notation, like HH:mm:ss:ff with its short-hand forms, e.g. s:ff or even s: (this one's supposed to be interpreted as s:00).
In the case where the string ends with : I can obviously pad it with two zeroes as well, then strip out all separators leaving just the digits and parse the resulting string with spirit.
But it seems like it would be a bit simpler if there was a way to make the offset tokenizer work backwards from the end of the string (offsets -2, -4, -6, -8) and lexically cast the numbers to ints.
I keep preaching BNF notation. If you can write down the grammar that defines your problem, you can easily convert it into a Boost.Spirit parser, which will do it for you.
TimeString := LongNotation | ShortNotation
LongNotation := Hours Minutes Seconds Fraction
Hours := digit digit
Minutes := digit digit
Seconds := digit digit
Fraction := digit digit
ShortNotation := ShortSeconds Fraction
ShortSeconds := digit
Edit: additional constraint
VerboseNotation := [ [ [ Hours ':' ] Minutes ':' ] Seconds ':' ] Fraction
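The VerboseNotation rule is small enough to hand-code, so here is a plain C++ sketch of it rather than a Boost.Spirit parser. The function name parseVerbose is hypothetical. Following the question rather than the grammar, an empty field (as in "s:") is assumed to mean zero, and the fields are right-aligned into {HH, mm, ss, ff}.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Parses "HH:mm:ss:ff" and its short-hands ("s:ff", "s:") into {HH, mm, ss, ff}.
std::array<int, 4> parseVerbose(const std::string& s) {
    std::vector<int> parts;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type colon = s.find(':', start);
        const std::string field = (colon == std::string::npos)
                                      ? s.substr(start)
                                      : s.substr(start, colon - start);
        if (field.size() > 2) throw std::invalid_argument("field too long");
        int value = 0;
        for (char ch : field) {
            if (ch < '0' || ch > '9') throw std::invalid_argument("non-digit character");
            value = value * 10 + (ch - '0');
        }
        parts.push_back(value);  // an empty field, as in "s:", yields 0
        if (colon == std::string::npos) break;
        start = colon + 1;
    }
    if (parts.size() > 4) throw std::invalid_argument("too many fields");

    std::array<int, 4> fields{};  // zero-initialised: missing leading fields stay 0
    const std::size_t shift = fields.size() - parts.size();
    for (std::size_t i = 0; i < parts.size(); ++i) {
        fields[shift + i] = parts[i];  // right-align: the last part is always ff
    }
    return fields;
}
```

Splitting first and right-aligning afterwards gives the "count from the end" behaviour without needing negative offsets.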
In response to the comment "Don't mean to be a performance freak, but this solution involves some string copying (input is a const & std::string)".
If you really care about performance so much that you can't use a big old library like regex, won't risk a BNF parser, don't want to assume that std::string::substr will avoid a copy with allocation (and hence can't use STL string functions), and can't even copy the string chars into a buffer and left-pad with '0' characters:
void parse(const string &s) {
    string::const_iterator current = s.begin();
    int HH = 0;
    int mm = 0;
    int ss = 0;
    int ff = 0;
    switch(s.size()) {
        case 8:
            HH = (*(current++) - '0') * 10;
        case 7:
            HH += (*(current++) - '0');
        case 6:
            mm = (*(current++) - '0') * 10;
            // ... you get the idea.
        case 1:
            ff += (*current - '0');
        case 0: break;
        default: throw logic_error("invalid date");
        // except that this code goes so badly wrong if the input isn't
        // valid that there's not much point objecting to the length...
    }
}
But fundamentally, just 0-initialising those int variables is almost as much work as copying the string into a char buffer with padding, so I wouldn't expect to see any significant performance difference. I therefore don't actually recommend this solution in real life, just as an exercise in premature optimisation.
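For comparison, the padding approach mentioned above fits in a few lines. This is only a sketch for the digits-only form; the function name parseTimecode is made up, and the validation rules (at most eight characters, digits only) are assumptions.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Parses the digits-only form ("HHmmssff" or any right-aligned short-hand
// such as "sff") into {HH, mm, ss, ff}.
std::array<int, 4> parseTimecode(const std::string& s) {
    if (s.size() > 8) throw std::invalid_argument("too many digits");
    for (char ch : s) {
        if (!std::isdigit(static_cast<unsigned char>(ch)))
            throw std::invalid_argument("non-digit character");
    }
    // Left-pad to exactly eight digits so that every field sits at a fixed offset.
    const std::string padded = std::string(8 - s.size(), '0') + s;

    std::array<int, 4> fields{};
    for (std::size_t i = 0; i < fields.size(); ++i) {
        fields[i] = (padded[2 * i] - '0') * 10 + (padded[2 * i + 1] - '0');
    }
    return fields;
}
```

The one temporary string is at most eight characters long, which typically fits in the small-string buffer of common standard library implementations.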
Regular expressions come to mind. Something like "^0*?(\\d?\\d?)(\\d?\\d?)(\\d?\\d?)(\\d?\\d?)$" with boost::regex. For a full eight-digit string the submatches will provide you with the digit values; the groups fill from the left, so a short-hand form has to be left-padded with zeroes first. It shouldn't be difficult to adapt to your other format with colons between numbers (see sep61.myopenid.com's answer). boost::regex is among the fastest regex parsers out there.