iterate char by char through vector of strings - c++

I want to iterate char by char in a vector of strings. In my code I created a nested loop to iterate over each string, but somehow I get an out-of-range error from the vector.
void splitVowFromCons(std::vector<std::string>& userData, std::vector<std::string>& allCons, std::vector<std::string>& allVows){
for ( int q = 0; q < userData.size(); q++){
std::string userDataCheck = userData.at(q);
for ( int r = 0; r < userDataCheck.size(); r++){
if ((userDataCheck.at(r) == 'a') || (userDataCheck.at(r) == 'A') || (userDataCheck.at(r) == 'e') || (userDataCheck.at(r) == 'E') || (userDataCheck.at(r) == 'i') || (userDataCheck.at(r) == 'I') || (userDataCheck.at(r) == 'o') || (userDataCheck.at(r) == 'O') || (userDataCheck.at(r) == 'u') || (userDataCheck.at(r) == 'U')){
allVows.push_back(userData.at(r));
}
else if ((userDataCheck.at(r) >= 'A' && userDataCheck.at(r) <= 'Z') || (userDataCheck.at(r) >= 'a' && userDataCheck.at(r) <= 'z')){
allCons.push_back(userData.at(r));
}
else {
continue;;
}
}
}
}

The error here is in these lines:
allVows.push_back(userData.at(r));
allCons.push_back(userData.at(r));
The r variable is your index into the current string, but here you're using it to index into the vector, which looks like a typo to me. You can make this less error-prone using range-based for loops:
for (const std::string& str : userData) {
for (char c : str) {
if (c == 'a' || c == 'A' || ...) {
allVows.push_back(std::string(1, c)); // allVows holds strings, so wrap the char
}
else if (...) {
....
}
}
}
which I hope you'll agree also has the benefit of being more readable due to less noise. You can further simplify your checks with a few standard library functions:
for (const std::string& str : userData) {
for (char c : str) {
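if (!std::isalpha(static_cast<unsigned char>(c))) continue; // skip non-alphabetical
char cap = static_cast<char>(std::toupper(static_cast<unsigned char>(c))); // capitalise the char
if (cap == 'A' || cap == 'E' || cap == 'I' || cap == 'O' || cap == 'U') {
allVows.push_back(std::string(1, c)); // allVows holds strings, so wrap the char
}
else {
allCons.push_back(std::string(1, c));
}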
}
}
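A complete, compilable version of this loop, keeping the question's signature, might look like the sketch below. Because the output vectors hold std::string, each char is wrapped with std::string(1, c). The unsigned char casts are there because passing a negative char value to std::isalpha or std::toupper is undefined behaviour.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Sketch: same signature as in the question, except that the input is taken
// by const reference since it is only read.
void splitVowFromCons(const std::vector<std::string>& userData,
                      std::vector<std::string>& allCons,
                      std::vector<std::string>& allVows) {
    for (const std::string& str : userData) {
        for (char c : str) {
            const auto uc = static_cast<unsigned char>(c);
            if (!std::isalpha(uc)) continue;  // skip digits, punctuation, spaces
            const char cap = static_cast<char>(std::toupper(uc));
            if (cap == 'A' || cap == 'E' || cap == 'I' || cap == 'O' || cap == 'U') {
                allVows.push_back(std::string(1, c));  // one-character string
            } else {
                allCons.push_back(std::string(1, c));
            }
        }
    }
}
```

Each letter keeps its original case in the output; only the comparison uses the upper-case copy.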

Since this question is really about debugging, I think it is a nice illustration of how the C++ standard algorithms can reduce the effort needed to see what is wrong with non-working code.
Here is how it can be restructured:
bool isVowel(char letter)
{
return letter == 'A' || letter == 'a' ||
letter == 'E' || letter == 'e' ||
letter == 'I' || letter == 'i' ||
letter == 'O' || letter == 'o' ||
letter == 'U' || letter == 'u';
}
bool isConsonant(char letter)
{
// Cast first: std::isalpha is undefined for negative char values.
return std::isalpha(static_cast<unsigned char>(letter)) && !isVowel(letter);
}
void categorizeLetters(const std::vector<std::string> &words, std::vector<char> &vowels, std::vector<char> &consonants)
{
for( const std::string &word : words){
std::copy_if(word.begin(), word.end(), std::back_inserter(vowels), isVowel);
std::copy_if(word.begin(), word.end(), std::back_inserter(consonants), isConsonant);
}
}
With a solution like this, you avoid the error-prone index-based access that led to your problem. The code is also more readable and easier to understand.

Related

String and character comparison

I am new to programming and I need to search any string to see if it includes only the letters a,b,c,d,e or f. The minute the program finds a letter that is not one of those the program should return false. Here is my function
bool is_favorite(string word){
int length = word.length(); // "word" is the string.
int index = 0;
while (index < length) {
if ((word[index] == 'a') || (word[index] == 'b') || (word[index] == 'c')||
(word[index] == 'd')|| (word[index] == 'e')|| (word[index] == 'f')) {
return true;
}
else {
return false;
}
index++;
}
}
Thank you very much for any help! :)
The moment the return statement is encountered, the function is exited. This means that the moment any of the characters 'a', 'b', 'c', 'd', 'e', 'f' is encountered while iterating, due to the return statement the function will be exited immediately.
You can use std::string::find_first_not_of as shown below:
std::string input = "somearbitrarystring";
std::string validChars = "abcdef";
std::size_t found = input.find_first_not_of(validChars);
if(found != std::string::npos)
std::cout << "Found nonfavorite character " <<input[found]<<" at position "<<found<< std::endl;
else
{
std::cout<<"Only favorite characters found"<<std::endl;
}
If you unroll the loop by hand, you will spot the problem immediately:
if ((word[0] == 'a') || (word[0] == 'b') || (word[0] == 'c')||
(word[0] == 'd')|| (word[0] == 'e')|| (word[0] == 'f')) {
return true;
}
else {
return false;
}
if ((word[1] == 'a') || (word[1] == 'b') || (word[1] == 'c')||
(word[1] == 'd')|| (word[1] == 'e')|| (word[1] == 'f')) {
return true;
}
else {
return false;
}
//...
That is, the return value depends only on the first element.
"The minute the program finds a letter that is not one of those the program should return false" means
if ((word[0] != 'a') && (word[0] != 'b') && (word[0] != 'c') &&
(word[0] != 'd') && (word[0] != 'e') && (word[0] != 'f')) {
return false;
}
if ((word[1] != 'a') && (word[1] != 'b') && (word[1] != 'c') &&
(word[1] != 'd') && (word[1] != 'e') && (word[1] != 'f')) {
return false;
}
// ...
// After checking all the characters, you know that all of them were in
// your desired set, so you can return unconditionally.
return true;
or, with a loop:
while (index < length) {
if ((word[index] != 'a') && (word[index] != 'b') && (word[index] != 'c') &&
(word[index] != 'd') && (word[index] != 'e') && (word[index] != 'f')) {
return false;
}
index++;
}
return true;
bool is_favorite(string word){
return ( word.find_first_not_of( "abcdef" ) == std::string::npos );
}
It returns true if, and only if, there are only the characters 'a' through 'f' in the string. Any other character ends the search immediately.
And if you exchange string word with const string & word, your function will not have to create a copy of each word you pass to it, but work on a read-only reference to it, improving efficiency.
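Putting both suggestions together, a minimal sketch of the const-reference version could look like this. Note two consequences of this approach: an empty string counts as a favorite (there is no character outside the set), and upper-case letters such as 'A' are rejected.

```cpp
#include <string>

// Takes the word by const reference, so no copy is made on each call.
bool is_favorite(const std::string& word) {
    // find_first_not_of returns npos when every character is in "abcdef".
    return word.find_first_not_of("abcdef") == std::string::npos;
}
```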
bool is_favorite(string word){
int length = word.length(); // "word" is the string.
int index = 0;
while (index < length) {
if (word[index] > 'f' || word[index] < 'a')
return false;
index++;
}
return true;
}
The return true is logically in the wrong place in your code.
Your version returns true as soon as it finds one letter that is a through f. It's premature to conclude that the whole string is valid at that point, because there may yet be an invalid character later in the string.
bool is_favorite(string word){
int length = word.length(); // "word" is the string.
int index = 0;
while (index < length) {
if ((word[index] == 'a') || (word[index] == 'b') || (word[index] == 'c')||
(word[index] == 'd')|| (word[index] == 'e')|| (word[index] == 'f')) {
return true; // This is premature.
}
else {
return false;
}
index++;
}
}
Here is a minimal change that shows where the return true should be: after the loop. The return true is reached if and only if the loop did not detect any invalid character.
bool is_favorite(string word){
int length = word.length(); // "word" is the string.
int index = 0;
while (index < length) {
if ((word[index] == 'a') || (word[index] == 'b') || (word[index] == 'c')||
(word[index] == 'd')|| (word[index] == 'e')|| (word[index] == 'f')) {
// Do nothing here
}
else {
return false;
}
index++;
}
return true;
}
Obviously now that the affirmative block of the if is empty, you could refactor a little and only check for the negative condition. The logic of it should read closely to the way you described the problem in words:
"The minute the program finds a letter that is not one of those the program should return false."
bool is_favorite(string word){
int length = word.length(); // "word" is the string.
int index = 0;
while (index < length) {
if (!is_letter_a_through_f(word[index]))
return false;
index++;
}
return true;
}
I replaced your large logical check against many characters with a function in the above code to make it more readable. I trust you can write that function without difficulty. My own preference is to keep statements short so that they are readable, and so that when you read the code, you can hold the control-flow logic in your short-term memory without being overloaded by the mechanics of the letter comparison.
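The answer names is_letter_a_through_f but does not show its body, so the helper below is only one possible sketch. It assumes the letters 'a' to 'f' have consecutive character codes, which holds for ASCII.

```cpp
#include <string>

// Hypothetical body for the helper named in the answer.
// Assumes 'a'..'f' are contiguous in the execution character set (true for ASCII).
bool is_letter_a_through_f(char letter) {
    return letter >= 'a' && letter <= 'f';
}

bool is_favorite(const std::string& word) {
    for (char letter : word) {
        if (!is_letter_a_through_f(letter))
            return false;  // first invalid letter ends the search
    }
    return true;  // reached only if no invalid letter was found
}
```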

Reduce cyclomatic complexity

I'm writing an NMEAParser library. As its name suggests, it parses NMEA sentences. Nothing crazy.
Its entry point is a function that accepts an NMEA string as its only parameter and looks at its beginning to pass it to the right decoder. Here is the function:
bool NMEAParser::dispatch(const char *str) {
if (!str[0]) {
return false;
}
//check NMEA string type
if (str[0] == '$') {
//PLSR245X
if (str[1] == 'P' && str[2] == 'L' && str[3] == 'S' && str[4] == 'R' && str[5] == ',' && str[6] == '2' && str[7] == '4' && str[8] == '5' && str[9] == ',') {
if (str[10] == '1')
return parsePLSR2451(str);
if (str[10] == '2')
return parsePLSR2452(str);
if (str[10] == '7')
return parsePLSR2457(str);
} else if (str[1] == 'G' && str[2] == 'P') {
//GPGGA
if (str[3] == 'G' && str[4] == 'G' && str[5] == 'A')
return parseGPGGA(str);
//GPGSA
else if (str[3] == 'G' && str[4] == 'S' && str[5] == 'A')
return parseGPGSA(str);
//GPGSV
else if (str[3] == 'G' && str[4] == 'S' && str[5] == 'V')
return parseGPGSV(str);
//GPRMC
else if (str[3] == 'R' && str[4] == 'M' && str[5] == 'C')
return parseGPRMC(str);
//GPVTG
else if (str[3] == 'V' && str[4] == 'T' && str[5] == 'G')
return parseGPVTG(str);
//GPTXT
else if (str[3] == 'T' && str[4] == 'X' && str[5] == 'T')
return parseGPTXT(str);
//GPGLL
else if (str[3] == 'G' && str[4] == 'L' && str[5] == 'L')
return parseGPGLL(str);
}
//HCHDG
else if (str[1] == 'H' && str[2] == 'C' && str[3] == 'H' && str[4] == 'D' && str[5] == 'G')
return parseHCHDG(str);
}
return false;
}
The problem I have is that this function's cyclomatic complexity is quite high, and SonarQube complains about it.
It's not really a problem as the code is quite easy to read. But I was wondering how I could reduce its complexity while still keeping it simple to read and efficient.
You can simplify this quite a lot:
if (std::string_view{str, 10} == "$PLSR,245,")
{
switch (str[10])
{
case '1' : return parsePLSR2451(str);
case '2' : return parsePLSR2452(str);
case '7' : return parsePLSR2457(str);
}
}
else if (std::string_view{str + 1, 2} == "GP")
{
auto s = std::string_view{str + 3, 3};
if (s == "GGA")
return parseGPGGA(str);
if (s == "GSA")
return parseGPGSA(str);
// ... etc
}
else if (std::string_view{str + 1, 5} == "HCHDG")
{
return parseHCHDG(str);
}
return false;
No extra strings are constructed either, so it should be at least as efficient.
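One caveat: std::string_view{str, 10} reads ten characters regardless of where the terminator is, so a sentence shorter than that would be read past its end. A bounds-safe variant can build a single view over the whole string and use substr, which clamps the length. The sketch below assumes C++17 and a non-null str; the Sentence enum and classify function are hypothetical stand-ins for the real parse* calls, and only some sentence types are shown.

```cpp
#include <string_view>

// Hypothetical result type standing in for the parser calls.
enum class Sentence { Unknown, PLSR2451, PLSR2452, PLSR2457, GPGGA, GPGSA, HCHDG };

Sentence classify(const char* str) {
    const std::string_view sv{str};  // length comes from the null terminator

    if (sv.substr(0, 10) == "$PLSR,245,") {
        if (sv.size() > 10) {
            switch (sv[10]) {
                case '1': return Sentence::PLSR2451;
                case '2': return Sentence::PLSR2452;
                case '7': return Sentence::PLSR2457;
                default:  break;
            }
        }
        return Sentence::Unknown;
    }
    if (sv.substr(0, 3) == "$GP") {
        // sv has at least 3 characters here, so position 3 is valid for substr.
        const std::string_view type = sv.substr(3, 3);
        if (type == "GGA") return Sentence::GPGGA;
        if (type == "GSA") return Sentence::GPGSA;
        return Sentence::Unknown;
    }
    if (sv.substr(0, 6) == "$HCHDG") return Sentence::HCHDG;
    return Sentence::Unknown;
}
```

Building the view costs one pass to find the terminator, which is the price paid for the length checks.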

How can I evaluate a QChar object in a switch?

Going through a bunch of code, looking to improve it.
I came across this bit:
if (c == '<' || c == '>') {
pattern.append("\\b");
} else if (c == 'a') {
pattern.append("[a-zA-Z]");
} else if (c == 'A') {
pattern.append("[^a-zA-Z]");
} else if (c == 'h') {
pattern.append("[A-Za-z_]");
} else if (c == 'H') {
pattern.append("[^A-Za-z_]");
} else if (c == 'c' || c == 'C') {
ignorecase = (c == 'c');
} else if (c == 'l') {
pattern.append("[a-z]");
} else if (c == 'L') {
pattern.append("[^a-z]");
} else if (c == 'o') {
pattern.append("[0-7]");
} else if (c == 'O') {
pattern.append("[^0-7]");
} else if (c == 'u') {
pattern.append("[A-Z]");
} else if (c == 'U') {
pattern.append("[^A-Z]");
} else if (c == 'x') {
pattern.append("[0-9A-Fa-f]");
} else if (c == 'X') {
pattern.append("[^0-9A-Fa-f]");
} else if (c == '=') {
pattern.append("?");
} else {
pattern.append('\\');
pattern.append(c);
}
If c was a char, this would be easy to turn into a switch. c is a QChar;
How should I turn a QChar into an integer and reliably compare it to the various cases >, = etc.?
A QChar is a wrapper for a 16-bit UTF-16 character.
You can retrieve the value using QChar::unicode() that returns an unsigned short.
You can then write your switch like this:
QChar c;
switch (c.unicode()) {
case u'a':
...
}
Be careful with your case labels: if you use 8-bit char literals, it might not work as expected.
For instance, é might be 0xE9 (Latin-1, UTF-16), 0x82 (CP437), or even 0xC3 0xA9 (UTF-8, which will not compile as it needs 2 characters).
The solution is to use UTF-16 literals, which have been part of C++ since C++11.
For example, u'é' will always be compiled as a char16_t (~unsigned short) of value 0x00E9.
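To show the shape of the resulting switch without depending on Qt, here is a small sketch that takes the UTF-16 code unit directly. The function name classFor is hypothetical, it covers only a few of the question's branches, and in Qt code the argument would be the value returned by c.unicode().

```cpp
#include <string>

// Hypothetical helper: maps a UTF-16 code unit to a character class.
// In Qt code the caller would pass c.unicode() for a QChar c.
std::string classFor(char16_t unit) {
    switch (unit) {
        case u'a': return "[a-zA-Z]";
        case u'A': return "[^a-zA-Z]";
        case u'l': return "[a-z]";
        case u'L': return "[^a-z]";
        case u'o': return "[0-7]";
        case u'O': return "[^0-7]";
        default:   return "";  // not handled in this sketch
    }
}
```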
You can define something like a dictionary, by which I mean a map:
int main(int argc, char* argv[])
{
QMap<QChar, QString> myMap{{'a', "[a-zA-Z]"},{'X', "[^0-9A-Fa-f]"}, {'h', "[A-Za-z_]"}};
QString regex{};
regex.append(myMap.value('a', ""));
regex.append(myMap.value('5', ""));
regex.append(myMap.value('X', ""));
qDebug() << "myRegex: " << regex;
return 0;
}

Algorithm for template argument deduction (as strings)?

For some reason, I want to implement something similar to what C++ compilers use to deduce template arguments. With a known set of template parameters like
"T0", "T1", "T2"...
Given 2 strings like:
str_param = "vector<T0>"
str_arg = "vector<float>"
The result should be that "T0" is mapped to "float":
map["T0"]=="float"
I don't need a full-featured template preprocessor, which means, I'll be satisfied if I can just handle the cases where the template argument can be literally deduced. No need to consider things like "typedef" in the context.
In other words, if I use the resulting map to replace the template parameters in str_param, it should become str_arg. If that is not possible, I consider it a failure to match.
I currently have a problem handling cases like:
str_param = "T1*"
str_arg = "int**"
Where expected result is:
map["T1"]=="int*"
My algorithm mistakes it as:
map["T1"]=="int"
Putting my poor algorithm here:
std::vector<std::string> templ_params({"T1"});
std::vector<std::string> templ_args(templ_params.size());
std::string str_param = "T1*";
std::string str_arg = "int**";
const char* p_str_param = str_param.c_str();
const char* p_str_arg = str_arg.c_str();
while (*p_str_param != 0 && *p_str_arg != 0)
{
while (*p_str_param == ' ' || *p_str_param == '\t') p_str_param++;
while (*p_str_arg == ' ' || *p_str_arg == '\t') p_str_arg++;
if (*p_str_param == 0 || *p_str_arg == 0) break;
if (*p_str_param != *p_str_arg)
{
std::string templ_param;
std::string templ_arg;
while (*p_str_param == '_' ||
(*p_str_param >= 'a' && *p_str_param <= 'z') ||
(*p_str_param >= 'A' && *p_str_param <= 'Z') ||
(*p_str_param >= '0' && *p_str_param <= '9'))
templ_param += *(p_str_param++);
while (*p_str_param == ' ' || *p_str_param == '\t') p_str_param++;
char end_marker = *p_str_param;
const char* p_str_arg_end = p_str_arg;
while (*p_str_arg_end != end_marker) p_str_arg_end++;
while (*(p_str_arg_end - 1) == ' ' || *(p_str_arg_end - 1) == '\t')
p_str_arg_end--;
while (p_str_arg<p_str_arg_end) templ_arg += *(p_str_arg++);
for (size_t i=0; i < templ_params.size(); i++)
{
if (templ_params[i]==templ_param)
{
templ_args[i]=templ_arg;
break;
}
}
}
else
{
p_str_param++;
p_str_arg++;
}
}

Concise way to say equal to set of values in C++

For example, I have the following condition:
if (str[i] == '(' ||
str[i] == ')' ||
str[i] == '+' ||
str[i] == '-' ||
str[i] == '/' ||
str[i] == '*')
My question is: is there a concise way to say that this value is one of a set of values in C++?
You can search for single character str[i] in a string with your special characters:
std::string("()+-/*").find(str[i]) != std::string::npos
Not glorious because it is C instead of C++, but the C standard library is always accessible from C++ code, and my first idea as an old dinosaur would be:
if (strchr("()+-/*", str[i]) != NULL)
Simple and compact
You may use the following:
const char s[] = "()+-/*";
if (std::any_of(std::begin(s), std::end(s), [&](char c){ return c == str[i]; })) {
// ...
}
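One subtlety with the approaches above: strchr treats the terminating null as part of the searched string, and std::begin/std::end over a char array also cover the terminator, so both report a match when str[i] is '\0'. A std::string_view built from the literal excludes the terminator and avoids constructing a std::string on every test. The helper name is_one_of below is hypothetical, and the sketch assumes C++17.

```cpp
#include <string_view>

// Hypothetical helper: true if c is one of the characters in set.
// The view built from a string literal does not include the terminator.
constexpr bool is_one_of(char c, std::string_view set) {
    return set.find(c) != std::string_view::npos;
}
```

At the call site this reads as `if (is_one_of(str[i], "()+-/*"))`.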
It really depends on your application. For such a small check, and depending on the context, one acceptable option could be to use a macro:
#include <iostream>
#include <string>
#define IS_DELIMITER(c) ((c == '(') || \
(c == ')') || \
(c == '+') || \
(c == '-') || \
(c == '/') || \
(c == '*') )
int main(void)
{
std::string s("TEST(a*b)");
for(std::size_t i = 0; i < s.size(); i++)
std::cout << "s[" << i << "] = " << s[i] << " => "
<< (IS_DELIMITER(s[i]) ? "Y" : "N") << std::endl;
return 0;
}
A more C++ish way of doing it would be to use an inline function
inline bool isDelimiter(const char & c)
{
return ((c == '(') || (c == ')') || (c == '+') ||
(c == '-') || (c == '/') || (c == '*') );
}
This post might be interesting then : Inline functions vs Preprocessor macros
Maybe not "more concise", but I think this style is succinct and expressive at the point of the test.
Of course is_arithmetic_punctuation needn't be a lambda if you're going to use it more than once. It could be a function or a function object.
auto is_arithmetic_punctuation = [](char c)
{
switch(c)
{
case '(':
case ')':
case '+':
case '-':
case '/':
case '*':
return true;
default:
return false;
}
};
if (is_arithmetic_punctuation(str[i]))
{
// ...
}