Reading white spaces crashes Parser. Why?

I'm ultimately trying to code a shell, so I need to be able to parse commands. I'm trying to convert every word and special symbol into tokens, while ignoring whitespaces. It works when characters separating tokens are | & < > however as soon as I use a single whitespace character, the program crashes. Why is that?
I'm a student, and I realize the way I came up with separating the tokens is rather crude. My apologies.
#include <iostream>
#include <stdio.h>
#include <string>
#include <cctype>
using namespace std;
#define MAX_TOKENS 10
int main()
{
//input command for shell
string str;
string tokens[MAX_TOKENS] = { "0", "0", "0", "0", "0", "0", "0", "0", "0", "0" };
int token_index = 0;
int start_token = 0;
cout << "Welcome to the Shell: Please enter valid command: " << endl << endl;
cin >> str;
for (unsigned int index = 0; index < str.length(); index++)
{
//if were at end of the string, store the last token
if (index == (str.length() - 1)) tokens[token_index++] = str.substr(start_token, index - start_token + 1);
//if char is a whitespace store the token
else if (isspace(str.at(index)) && (index - start_token > 0))
{
tokens[token_index++] = str.substr(start_token, index - start_token);
start_token = index + 1;
}
//if next char is a special char - store the existing token, and save the special char
else if (str[index] == '|' || str[index] == '<' || str[index] == '>' || str[index] == '&')
{
//stores the token before our special character
if ((index - start_token != 0)) //this if stops special character from storing twice
{
//stores word before Special character
tokens[token_index++] = str.substr(start_token, index - start_token);
}
//stores the current special character
tokens[token_index++] = str[index];
if (isspace(str.at(index + 1))) start_token = index + 2;
else start_token = index + 1;
}
}
cout << endl << "Your tokens are: " << endl;
for (int i = 0; i < token_index; i++)
{
cout << i << " = " << tokens[i] << endl;
}
return 0;
}

A few things:
Check that token_index is less than MAX_TOKENS before using it again after each increment, otherwise you have a buffer overflow. If you change tokens to be a std::vector then you can use the at() syntax as a safety net for that.
The expression index - start_token has type unsigned int so it can never be less than 0. Instead you should be doing index > start_token as your test.
str.at(index) throws an exception if index is out of range. However you never catch the exception; depending on your compiler, this may just look like the program crashing. Wrap main()'s code in a try...catch(std::exception &) block.
Finally, this is a long shot but I will mention it for completeness. Originally in C89, isspace and the other is* functions had to take a non-negative argument (or EOF). They were designed so that the compiler could implement them via an array lookup, so passing in a signed char with a negative value would cause undefined behaviour. This has not been relaxed in later versions of C and C++: the argument must still be representable as an unsigned char or be equal to EOF. To eliminate this as a possibility from your code, use isspace( (unsigned char)str.at(index) ), or even better, use the C++ locale interface.
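To make the points above concrete, here is a minimal sketch of the tokenizer rewritten with a std::vector and the unsigned char cast. The helper name tokenize is my own and not from the question. It also assumes the whole line should be tokenized: note that cin >> str stops reading at the first whitespace character, so the caller would have to read the line with std::getline instead.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (name is an assumption): splits a command line into
// words and the single-character operators | & < >, skipping whitespace.
std::vector<std::string> tokenize(const std::string& line)
{
    std::vector<std::string> tokens;
    std::string current;  // the word being built up

    // Stores the pending word, if there is one.
    auto flush = [&]() {
        if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    };

    for (char ch : line) {
        // Cast to unsigned char so negative char values are never passed in.
        if (std::isspace(static_cast<unsigned char>(ch))) {
            flush();
        } else if (ch == '|' || ch == '&' || ch == '<' || ch == '>') {
            flush();
            tokens.push_back(std::string(1, ch));
        } else {
            current += ch;
        }
    }
    flush();  // store the last word, if any
    return tokens;
}
```

To use it, read the input with std::getline(std::cin, line) and pass line to tokenize. Because the vector grows as needed, there is no MAX_TOKENS limit to overflow, and no index arithmetic that could wrap around.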

Related

How to solve this problem trying to iterate a string?

I'm trying to invert the case of some strings, and I did it, but I have some extra characters in my return, is it a memory problem? Or because of the length?
char* invertirCase(char* str){
int size = 0;
char* iterator = str;
while (*iterator != '\0') {
size++;
iterator++;
}
char* retorno = new char[size];
for (int i = 0; i < size; i++) {
//if capital letter:
if (str[i] < 96 && str[i] > 64) {
retorno[i] = str[i] + 32;
}
// if lower case:
else if (str[i] > 96 && str[i] < 123) {
retorno[i] = str[i] - 32;
}
//if its not a letter
else {
retorno[i] = str[i];
}
}
return retorno;
}
For example, if I try to use this function with the value "Write in C" it should return "wRITE IN c", but instead it returns "wRITE IN cýýýýÝݱ7ŽÓÝ" and I don't understand where those extra characters are coming from.
PS: I know I could use a length function, but this is from school, so I can't do that in this case.
Add 1 to the size of the char array:
char* retorno = new char[size+1];
Then add a null terminator before returning retorno:
retorno[size] = '\0';
Your output string is not null-terminated
When you iterate through the input string, you increment size until you reach null. That means the null is not copied to the output string. After you exit the loop, you should increment size once more to capture the end.
As an aside, it's probably a good idea to constrain size to some maximum (while(*iterator != '\0' && size < MAXSIZE)) in case someone passes a non-terminated string into your function. If you hit the max size condition, you'd need to explicitly add the null at the end of your output.
Your string should be null terminated; that is what you are looking for when you get the initial size of the string. When you create the new string, you should allocate size+1 chars of space, and retorno[size] should then be set to a null terminating character (i.e. '\0'). When you print a char* using printf or cout (or similar mechanisms), it keeps printing characters until it finds the null terminating character, which is why you are getting garbage values after your expected output.
On another note, C++ has helpful functions like std::islower / std::isupper and std::tolower / std::toupper (declared in <cctype>).
From what I can tell, there could be 2 things going on here:
Like everyone here mentioned, the absence of a null terminating character ('\0') at the end of your char array could be causing this.
It could be the way you are printing results of your retorno character string outside of your invertirCase() function.
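I tested out your function in C++14, C++17 and C++20 and it returned the correct result each time, both with the null terminating character at the end of the retorno char array and without it. Without the terminator, though, a correct-looking result is only luck: printing then reads past the end of the array, which is undefined behaviour.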
Try printing your result inside of your function before returning it, to identify if this is being caused inside of your function or outside of it. Like so:
char* invertirCase(char* str){
// [... truncated code here]
for (int i = 0; i < size; i++) {
// [... truncated code here]
}
cout << " **** TESTING INSIDE FUNCTION ****" << endl;
cout << "-- Testing index iteration" << endl;
for (int i = 0; i < size; i++) {
cout << retorno[i];
}
cout << endl;
cout << "-- Testing iterator iteration" << endl;
for (char* iterator = retorno; *iterator != '\0'; iterator++) {
cout << *iterator;
}
cout << endl;
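cout << "-- Testing range-based for loop" << endl;
// A range-based for cannot iterate a raw char*, so wrap the buffer in a
// std::string_view (C++17, needs #include <string_view>).
for (char character : std::string_view(retorno, size)) {
cout << character;
}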
cout << " **** END TESTING ****" << endl;
cout << endl;
return retorno;
}
This way you can tell whether the problem occurs inside your function or in the way you print the result afterwards.

Using find_first_of with a string instead of a set of predefined characters in c++

I want to take in a code, for example ABC, and check whether the characters in the code appear in that exact order in a string. For example, with the code ABC, the string HAPPYBIRTHDAYCACEY meets the criteria. The string TRAGICBIRTHDAYCACEY with the code ABC, however, does not pass, because there's a "c" before the "b" after the "a". I want to use the find_first_of function to search through my string, but I want to check for any of the characters in "code" without knowing what characters are in "code" beforehand. Here is my program so far:
#include <iostream>
#include <string>
using namespace std;
int main() {
string code, str, temp;
int k = 0;
int pos = 0;
cin >> code >> str;
while (k < code.size()) {
pos = str.find_first_of(code,pos);
temp[k] = str[pos];
++k;
++pos;
}
cout << temp << endl; // debug. This is just outputs a newline when i
//run the program
if (temp == code) {
cout << "PASS" << endl;
}
else {
cout << "FAIL" << endl;
}
return 0;
}
I think your best bet is to find just the first character, once found, find the next in the remainder of the string, repeat until end of string or all characters found (and return false or true, respectively).
I don't think there's anything builtin for this. If the characters would need to appear directly after each other, you could use std::string::find() which searches for a substring, but that is not what you want.
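Here is a rough sketch of that idea. isSubsequence is the plain "find each character in the remainder" loop described above. Note that under this check TRAGICBIRTHDAYCACEY also passes, because an A, a B and a C do appear in that order. matchesStrictOrder is my guess at the stricter rule from the question: at each step, the first character found from the part of the code that is still unmatched has to be the next expected one. Both function names, and that reading of the rule, are assumptions.

```cpp
#include <string>

// Plain subsequence test: find each code character in the remainder of str.
bool isSubsequence(const std::string& code, const std::string& str)
{
    std::string::size_type pos = 0;
    for (char c : code) {
        pos = str.find(c, pos);
        if (pos == std::string::npos) {
            return false;  // ran out of string before matching all of code
        }
        ++pos;  // continue after the character just matched
    }
    return true;
}

// Stricter variant (assumed rule): search for ANY still-unmatched code
// character; the first one found must be the next one in order.
bool matchesStrictOrder(const std::string& code, const std::string& str)
{
    std::string::size_type pos = 0;
    for (std::string::size_type k = 0; k < code.size(); ++k) {
        pos = str.find_first_of(code.substr(k), pos);
        if (pos == std::string::npos || str[pos] != code[k]) {
            return false;
        }
        ++pos;
    }
    return true;
}
```

Neither version writes into an empty string the way temp[k] = str[pos] does in the question, which is why the debug line there printed nothing useful.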

how to find a substring or string literal

I am trying to write a code that will search userInput for the word "darn" and if it is found, print out "Censored". if it is not found, it will just print out the userInput. It works in some cases, but not others. If the userInput is "That darn cat!", it will print out "Censored". However, if the userInput is "Dang, that was scary!", it also prints out "Censored". I am trying to use find() to search for the string literal "darn " (the space is because it should be able to determine between the word "darn" and words like "darning". I am not worrying about punctuation after "darn"). However, it seems as though find() is not doing what I would like. Is there another way I could search for a string literal? I tried using substr() but I couldn't figure out what the index and the len should be.
#include <iostream>
#include <string>
using namespace std;
int main() {
string userInput;
userInput = "That darn cat.";
if (userInput.find("darn ") > 0){
cout << "Censored" << endl;
}
else {
cout << userInput << endl;
} //userText.substr(0, 7)
return 0;
}
The problem here is your condition. std::string::find returns a value of type std::string::size_type, which is an unsigned integer type. That means it can never be less than 0, which means
if (userInput.find("darn ") > 0)
will be true unless userInput starts with "darn " (the only case where find returns 0). When find doesn't find anything, it returns std::string::npos, which is the largest value of that type and therefore also greater than 0. What you need to do is compare against npos, like
if (userInput.find("darn ") != std::string::npos)
Do note that userInput.find("darn ") will not work in all cases. If userInput is just "darn" or "Darn" then it won't match. The space needs to be handled as a separate element. For example:
std::string::size_type position = userInput.find("darn");
if (position != std::string::npos) {
// now you can check which character is at userInput[position + 4]
}
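As one way to fill in that check, the sketch below treats a match as a whole word only when the characters on both sides of it are not letters. The helper name containsWord is made up for illustration, and the search is case-sensitive, so "Darn" would still need separate handling.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true if `word` occurs in `text` with no letter
// directly before or after it.
bool containsWord(const std::string& text, const std::string& word)
{
    if (word.empty()) {
        return false;
    }
    std::string::size_type pos = text.find(word);
    while (pos != std::string::npos) {
        const std::string::size_type end = pos + word.size();
        const bool startOk =
            pos == 0 || !std::isalpha(static_cast<unsigned char>(text[pos - 1]));
        const bool endOk =
            end == text.size() || !std::isalpha(static_cast<unsigned char>(text[end]));
        if (startOk && endOk) {
            return true;
        }
        pos = text.find(word, pos + 1);  // keep looking after this match
    }
    return false;
}
```

With this, "That darn cat." and "Oh darn!" are censored, while "Dang, that was scary!" and "I was darning socks" are printed unchanged.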
std::search and std::string::replace were made for this:
#include <iostream>
#include <string>
#include <algorithm>
using namespace std;
int main() {
string userInput;
userInput = "That darn cat is a varmint.";
static const string bad_words [] = {
"darn",
"varmint"
};
for(auto&& bad : bad_words)
{
const auto size = distance(bad.begin(), bad.end());
auto i = userInput.begin();
while ((i = std::search(i, userInput.end(), bad.begin(), bad.end())) != userInput.end())
{
// modify this part to allow more or fewer leading letters from the offending words
constexpr std::size_t leading_letters = 1;
// never allow the whole word to appear - hide at least the last letter
auto leading = std::min(leading_letters, std::size_t(size - 1));
auto replacement = std::string(i, i + leading) + std::string(size - leading, '*');
userInput.replace(i, i + size, replacement.begin(), replacement.end());
i += size;
}
}
cout << userInput << endl;
return 0;
}
expected output:
That d*** cat is a v******.

I am writing a code to convert lowercase letters to uppercase letters using arrays

I have the code but it prints the letters in uppercase but also prints some weird characters afterwards. I just wanted to know how to just get the letters.
using namespace std;
int main()
{
const int SIZE = 81; // Constant for size of an array
const int MIN_LOWERCASE = 97; // Start of lowercase letters in ASCII
const int MAX_LOWERCASE = 122; // End of lowercase letters in ASCII
char line[SIZE]; // Initializing character line for input
cout << "Enter a string of 80 or fewer characters:\n";
cin.getline(line,SIZE); // Getting input from the user.
for (int count = 0; count < SIZE; count++)
{
if (line[count] >= MIN_LOWERCASE && line[count] <= MAX_LOWERCASE) // Checking whether the selected letter is in the reange of lowercase letters.
{
line[count] - 32;
cout << static_cast<char>(line[count] - 32); // converting and displaying lowercase letters to uppercase letters.
}
else
{
cout << static_cast<char>(line[count]);//Displaying the same character if it is in uppercase.
}
}
cout << endl;
system("pause");
return 0;
}
You need to use the actual size of the text that you read. Else you will print extra characters.
for (int count = 0; count < strlen(line); count++)
You might need #include <cstring> to use strlen().
cout << "Enter a string of 80 or fewer characters:\n";
cin.getline(line,SIZE); // Getting input from the user.
int strLen = strlen(line); // length of the text that was actually read
for (int count = 0; count < strLen; count++)
{
if (line[count] >= MIN_LOWERCASE && line[count] <= MAX_LOWERCASE) // Checking whether the selected letter is in the range of lowercase letters.
{
cout << static_cast<char>(line[count] - 32); // converting and displaying lowercase letters to uppercase letters.
}
else
{
cout << static_cast<char>(line[count]);//Displaying the same character if it is in uppercase.
}
}
Your loop runs SIZE (81) times no matter how long the string is.
How getline / cin / scanf and similar functions fill a char[] is easiest to see with an example:
Say the array is declared as char c[10]; and the input is "abcd".
First, every c[i] starts with an unknown value, because c is a local variable. (If it were a global variable, you could assume c[i] = 0.)
Second, reading the input only changes c[i] for 0<=i<=4: the four characters of the input plus the terminating '\0'.
At this point c = { 'a', 'b', 'c', 'd', '\0', ?, ?, ?, ?, ?}. (? denotes an unknown value.)
Third, you loop i from 0 to the size of the array, so the output is "abcd" followed by the '\0' and five unknown characters.
So you can fix the bug by looping only while c[i] != '\0'.
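A small sketch of that fix is below. The function name toUpperAscii is my own, and it assumes plain ASCII input like the original program does. The loop stops at the terminator instead of walking the whole array.

```cpp
#include <string>

// Hypothetical helper: returns an upper-cased copy of a C string,
// reading only up to the terminating '\0' (ASCII letters only).
std::string toUpperAscii(const char* line)
{
    std::string result;
    for (int i = 0; line[i] != '\0'; ++i) {
        char c = line[i];
        if (c >= 'a' && c <= 'z') {
            c = static_cast<char>(c - ('a' - 'A'));  // same as subtracting 32
        }
        result += c;
    }
    return result;
}
```

Calling it on the buffer filled by cin.getline(line, SIZE) never touches the uninitialized part of the array, so no stray characters are printed.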
The idiomatic way of doing this in C++ is
#include <string>
#include <locale>
#include <algorithm>
#include <iterator>
#include <iostream>
int main()
{
std::locale loc(""); //< the current system locale
std::string line; //< will contain the input line
std::cout << "Enter a string of 80 or fewer characters:\n";
std::getline(std::cin,line);
std::string upper; //< will contain the output
// This is the "key" of everything
std::transform(line.begin(),line.end(), // transform the entire input...
std::back_inserter(upper), // by writing into the back of the output string ...
[&loc](auto c){ return std::toupper(c,loc); }); // the result of std::toupper applied to every character, using the system locale
std::cout << "The transformed string is:\n" << upper << std::endl;
std::cout << "The transformed string is:\n" << lower << std::endl;
return 0;
}
// look ma! No pointers, array sizes, overflows or explicit memory management.
// And it works consistently with the language your computer is set up for.

Brute Force Character Generation in C++

So I'm trying to make a brute force string generator to match and compare strings in CUDA. Before I start trying to mess around with a language I don't know I wanted to get one working in C++. I currently have this code.
#include <iostream>
#include <string>
#include <cstdlib>
using namespace std;
int sLength = 0;
int count = 0;
int charReset = 0;
int stop = 0;
int maxValue = 0;
string inString = "";
static const char charSet[] = //define character set to draw from
"0123456789"
"!##$%^&*"
"abcdefghijklmnopqrstuvwxyz"
"ABCDEFGHIJKLMNOPQRSTUVWXYZ";
int stringLength = sizeof(charSet) - 1;
char genChars()
{
return charSet[count]; //Get character and send to genChars()
}
int main()
{
cout << "Length of string to match?" << endl;
cin >> sLength;
cout << "What string do you want to match?" << endl;
cin >> inString;
string sMatch(sLength, ' ');
while(true)
{
for (int y = 0; y < sLength; y++)
{
sMatch[y] = genChars(); //get the characters
cout << sMatch[y];
if (count == 74)
{
charReset + 1;
count = 0;
}
if (count == 2147000000)
{
count == 0;
maxValue++;
}
}
count++;
if (sMatch == inString) //check for string match
{
cout << endl;
cout << "It took " << count + (charReset * 74) + (maxValue*2147000000) << " randomly generated characters to match the strings." << endl;
cin >> stop;
}
cout << endl;
}
}
Now this code runs and compiles but it doesn't exactly do what I want it to. It will do 4 of the same character, EX. aaaa or 1111 and then go onto the next without incrementing like aaab or 1112. I've tried messing around with things like this
for (int x = 0; x < sLength; x++)
{
return charSet[count-sLength+x];
}
Which in my mind should work but to no avail.
You basically just need to increment a counter, then convert the count to a number in base (size of char array).
Here's an example which does normal numbers up to base 16.
http://www.daniweb.com/code/snippet217243.html
You should be able to replace
char NUMS[] = "0123456789ABCDEF";
with your set of characters and figure it out from there. This might not generate a large enough string using a uint, but you should be able to break it up into chunks from there.
Imagine your character array was "BAR", so you would want to convert to a base 3 number using your own symbols instead of 0 1 and 2.
What this does is perform a modulus to determine the character, then divide by the base until the number becomes zero. What you would do instead is repeat 'B' until your string length was reached instead of stopping when you hit zero.
Eg: A four character string generated from the number 14:
14%3 = 2, so it would push charSet[2] to the beginning of the empty string, "R";
Then it would divide by 3, which using integer math would = 4. 4%3 is 1, so "A".
It would divide by 3 again, (1) 1%3 is again 1, so "A".
It would divide by 3 again, (0) -- The example would stop here, but since we're generating a string we continue pushing 0 "B" until we reach our 4 characters.
Final output: BAAR
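In code, that conversion could look roughly like this. The function name numberToString and its parameters are placeholders of mine. It fills the string from the right and leaves the untouched leading positions as charSet[0], which is the padding step described above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: writes n in base charSet.size() using the characters
// of charSet as digits, left-padded with charSet[0] to exactly `length`.
// Digits that do not fit into `length` characters are silently dropped.
std::string numberToString(unsigned long long n, const std::string& charSet,
                           std::size_t length)
{
    if (charSet.empty()) {
        return std::string();
    }
    const std::size_t base = charSet.size();
    std::string out(length, charSet[0]);  // start with all "zero" digits
    for (std::size_t i = length; i > 0 && n > 0; --i) {
        out[i - 1] = charSet[n % base];  // least significant digit goes last
        n /= base;
    }
    return out;
}
```

Calling numberToString(14, "BAR", 4) follows the same steps as the worked example and yields BAAR. Counting n upwards from 0 then walks through every candidate string of that length.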
For an approach which could generate much larger strings, you could use an array of ints the size of your string, (call it positions), initialize all the ints to zero and do something like this on each iteration:
i = 0;
positions[i]++;
while (positions[i] == base)
{
positions[i] = 0;
if (++i == length) break; // length = size of positions; every combination has been generated
positions[i]++;
}
Then you would go through the whole array, and build the string up using charSet[positions[i]] to determine what each character is.
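A fuller sketch of this counter approach is below. All three function names are made up for illustration. The carry loop also stops when it runs off the end of the array, which is what signals that every combination has been tried.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Advances the "odometer" by one. Returns false once it has wrapped around,
// i.e. every combination has been produced.
bool nextCombination(std::vector<std::size_t>& positions, std::size_t base)
{
    for (std::size_t i = 0; i < positions.size(); ++i) {
        if (++positions[i] < base) {
            return true;   // no carry needed
        }
        positions[i] = 0;  // wrap this digit and carry into the next one
    }
    return false;
}

// Builds the candidate string from the current positions.
std::string render(const std::vector<std::size_t>& positions,
                   const std::string& charSet)
{
    std::string s;
    s.reserve(positions.size());
    for (std::size_t p : positions) {
        s += charSet[p];
    }
    return s;
}

// Returns how many candidates were tried until target was matched,
// or 0 if target cannot be built from charSet.
unsigned long long bruteForce(const std::string& target,
                              const std::string& charSet)
{
    if (charSet.empty()) {
        return 0;
    }
    std::vector<std::size_t> positions(target.size(), 0);
    unsigned long long attempts = 0;
    do {
        ++attempts;
        if (render(positions, charSet) == target) {
            return attempts;
        }
    } while (nextCombination(positions, charSet.size()));
    return 0;
}
```

Because each position is stored separately, the string length is not limited by the range of a single integer the way the base-conversion approach is. Only the attempt counter can eventually overflow.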