concatenating tokens of an old cstring into a new c-string - c++

Our professor gave us a palindrome assignment, and in this assignment we need to write a function that removes all punctuation marks and spaces and converts uppercase letters to lowercase letters in a c-string.
The problem I am getting is that when I debug/run it, after I enter the c-string for the function, it gives a "Debug Assertion failed" error and outputs only the lowercase version of the c-string input. Does anyone have suggestions for how I can fix or improve this piece of code?
Update: I fixed my error by tokenizing the string as geeksforgeeks did. But now the problem I am getting is, when concatenating tokens of s cstring into new_s c-string, it only concatenates the first token of s to new_s.
This is my code:
#define _CRT_SECURE_NO_WARNINGS //I added this because my IDE kept giving me error saying strtok is unsafe and I should use strtok_s.
#include <iostream>
#include <iomanip>
#include <cstring>
using namespace std;
/*This method removes all spaces and punctuation marks from its c-string as well as change any uppercase letters to lowercase. **/
void removePuncNspace(char s[])
{
char new_s[50], *tokenptr;
//convert from uppercase to lowercase
for (int i = 0; i < strlen(s); i++) (char)tolower(s[i]);
//use a cstring function and tokenize s into token pointer and eliminate spaces and punctuation marks
tokenptr = strtok(s, " ,.?!:;");
//concatenate the first token into a c-string.
strcpy_s(new_s,tokenptr);
while (tokenptr != NULL)
{
tokenptr = strtok('\0', " ,.?!:;"); //tokenize rest of the string
}
while (tokenptr != NULL)
{
// concat rest of the tokens to a new cstring. include the \0 NULL as you use a cstrig function to concatenate the tokens into a c-string.
strcat_s(new_s, tokenptr);
}
//copy back into the original c - string for the pass by reference.
strcpy(s, new_s);
}
My output is:
Enter a line:
Did Hannah see bees? Hannah did!
Did is palindrome

Firstly, as @M.M said, when you want to continue tokenizing the same string, you should call strtok(NULL, ".."), not strtok('\0', "..").
Secondly, your program logic doesn't make much sense. You split the string s into substrings, but never actually concatenate them to new_s. By the time you get to the second while, tokenptr is surely NULL, so you never enter the loop.
To fix your code I merged the two whiles into a single one and added an if to not call strcat(new_s, tokenptr) if tokenptr is NULL.
void removePuncNspace(char s[])
{
char new_s[50], *tokenptr;
//convert from uppercase to lowercase
for (size_t i = 0; i < strlen(s); i++) s[i] = (char)tolower((unsigned char)s[i]); //store the result; calling tolower alone leaves s unchanged
//use a cstring function and tokenize s into token pointer and eliminate spaces and punctuation marks
tokenptr = strtok(s, " ,.?!:;");
//concatenate the first token into a c-string.
strcpy(new_s,tokenptr);
while (tokenptr != NULL)
{
tokenptr = strtok(nullptr, " ,.?!:;"); //tokenize rest of the string
if (tokenptr != NULL)
strcat(new_s, tokenptr);
}
//copy back into the original c - string for the pass by reference.
strcpy(s, new_s);
}
P.S: I used the non-secure versions of cstring functions because, for some reason, my compiler doesn't like the secure ones.

Related

Null terminator carrying through when indexing string

I am trying to pull specific characters from a string array and assign them to defined indices in a new variable. I am having issues with what I expect is the null terminator, as there appears to be a random assortment of undefined characters at the end of my strings.
I am new to coding in C++, and lower level programming in general. Note that the function "charBi" works perfectly, but it no longer works when assigning the output of "charBi" to the variable "binar" in the "strBi" function. I realize the code is probably not great, but any help is welcome, especially as it relates to getting rid of the random characters at the end of my "binar" string.
Thanks!
#include <iostream>
#include <array>
using namespace std;
//Program meant to output a string of binary for an input word or phrase
//library of letter and binary pairs
char letterNumber[27][10]={"A01000001","B01000010","C01000011","D01000100","E01000101","F01000110","G01000111",
"H01001000","I01001001","J01001010","K01001011","L01001100","M01001101","N01001110",
"O01001111","P01010000","Q01010001","R01010010","S01010011","T01010100","U01010101",
"V01010110","W01010111","X01011000","Y01011001","Z01011010"," 01011111"};
//finds binary number associated with input character. One character input
string charBi(char inputVar){ //WHY DOES THIS ONLY WORK IF THE FUNCTION IS A STRING?
//loop setup
int n=0;
int last=sizeof(letterNumber)/sizeof(letterNumber[0]); // equal 27
//loops through each of the strings in letterNumber
while (n!=last) {
if (letterNumber[n][0]==inputVar){ // if the letter is equal to input letter
char bina[8]; //number of numbers following a letter
for(int i=1;i<9;i++){ // writes the number associated with the letter to bina
bina[i-1]=letterNumber[n][i]; // assigns number to specific index
}
return bina; //BINA DEFINED AS CHAR, BUT OUTPUTTING AS STRING
}
n++;
}
}
//forms binary string of numbers for input phrase
string strBi(string strg){ //WHY DOES THIS ONLY WORK IF THE FUNCTION IS A STRING?
char binar[8*strg.length()]; //binary for each character is 8 numbers long
for(int i=0;i<strg.length();i++){ // for every letter in the input phrase
string chbi=charBi(strg[i]); //gets binary for individual letter from charBi function
cout<<"charbi sends: "<<chbi<<endl; //for debugging
for(int n=0;n<8;n++){ //for every 1 or 0 in the binary for an idividual letter
binar[(8*i)+n]=chbi[n]; // assign in order to string binar
}
cout<<"binar updates to: "<<binar<<endl; //for debugging
getchar(); //for debugging
}
return binar; //BINAR DEFINED AS CHAR, BUT OUTPUTTING AS STRING
}
int main(){
cout<<"final string is: "<<strBi("HELLO WORLD");
return 0;
}
Since you didn't properly terminate your arrays, the program's behavior is undefined.
In order to store a k-letter string, you need to use a k+1-element array and terminate it – char bina[9] = {}; and char binar[8*strg.length() + 1] = {}; should do the trick.
But you can simplify things a bit by leaving C behind:
#include <algorithm>
#include <map>
#include <string>
std::map<char, std::string> letterNumber =
{{'A', "01000001"},
{'B', "01000010"},
// ...
{' ', "01011111"}};
//forms binary string of numbers for input phrase
std::string strBi(const std::string& strg)
{
std::string binar;
binar.reserve(8 * strg.size());
std::for_each(strg.begin(), strg.end(), [&binar](char c) { binar += letterNumber[c]; });
return binar;
}
Make binar one character longer (char binar [8 * strg.length() + 1];) and set the last character to NUL (just before returning, do binar[8 * strg.length()] = '\0';).
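The "k characters need k+1 elements" rule from the answers above can be sketched for a single table row. This is a minimal illustration, assuming rows shaped like the question's letterNumber entries (one letter followed by 8 digits); the function name digitsOf is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Copies the 8 binary digits that follow the letter in one table row
// (for example "A01000001") into a 9-element buffer and terminates it.
std::string digitsOf(const char (&row)[10])
{
    char bina[9];                      // 8 digits + 1 terminator
    for (std::size_t i = 0; i < 8; ++i)
        bina[i] = row[i + 1];          // skip the leading letter
    bina[8] = '\0';                    // without this, the conversion below reads past the array
    return bina;                       // std::string copies up to the terminator
}
```

The conversion from a char array to std::string stops at the first '\0' it finds, which is why the unterminated 8-element array in the question produced trailing garbage.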

Retrieve char defined in string

I'm currently writing an assembler and VM program. My assembler reads in a .asm file and converts it to byte code that my VM then runs.
Currently I read in a line from my assembly file, break that line into its components, and then determine what the line contains (is it a directive or an instruction).
getline(assemblyFile, line);
istringstream iss(line);
vector<string> instruction{
std::istream_iterator<std::string>(iss),{}
};
This gives me a vector of strings that has been working well for me up to this point. If my directive is an int, I'm able to retrieve it simply by saying
mem[dataCounter] = stoi(instruction[VALUE]);
This was also working well when I was using ASCII values for my characters. However, I'm trying now to be able to provide either ASCII representation, or use a notation like
J .BYT 'J'
Where the first J is a label, the .BYT tells me what data type it is, and my 'J' is the byte I'm wanting to store in my byte array. If I don't use quotes,
J .BYT J
the following works nicely
mem[dataCounter] = int(instruction[VALUE].c_str()[0]);
(gives me the decimal/byte value), where instruction is the whole line split into words, and VALUE is an index of 2. If I use the former, it of course returns the first quote. Not using quotes may be the solution in and of itself; however, I'm also having trouble reading in special characters, such as spaces or newline characters. In the case of spaces, my directive looks like
SPACE .BYT ' '
which returns me a vector that has four elements, "SPACE", ".BYT", "'" and "'", and in the case of my newline which I've been attempting as
NEWLN .BYT \n
I have three elements with the last being "\n".
In none of these cases have I been able to find yet a way to retrieve the characters I am attempting to represent in my .asm file to their equivalent char/decimal value. I would like to continue to use string as it's been convenient and changing would require a fair bit of refactoring, but can be done to support the functionality.
What methods/functions are available that can help me retrieve these characters, in particular the special characters?
I would use strtok() and treat special characters with caution.
For example, I would examine whether the token is a newline, and if it is explicitly state it.
For the ' ', I would search for it in the string, and if found, remember its information (starting position for example in the string) and then erase it from the string. Afterwards, I would split into tokens.
Minimal Example for demonstrative purposes only:
#include <cstdio>
#include <cstring>
#include <string>
#include <iostream>
int main ()
{
//std::string str ="SPACE .BYT \n";
//std::string str = "J .BYT 'J'";
std::string str ="SPACE .BYT ' '";
std::size_t start_position_to_erase = str.find("' '");
if(start_position_to_erase != std::string::npos) {
std::cout << "Found: " << std::string(str, start_position_to_erase, 3) << std::endl; // third argument is a length, not an end position
str.erase(start_position_to_erase, 3);
}
char * pch;
printf ("Splitting string \"%s\" into tokens:\n", str.c_str());
pch = strtok (&str[0]," "); // strtok writes into the buffer, so pass the mutable &str[0] rather than casting away const from c_str()
while (pch != NULL)
{
if(pch[0] == '\n')
printf ("\\n");
else
printf ("%s\n",pch);
pch = strtok (NULL, " ");
}
return 0;
}
Output:
Found: ' '
Splitting string "SPACE .BYT " into tokens:
SPACE
.BYT
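Another way to approach the question is to parse the operand from the whole line instead of from the whitespace-split vector, since splitting on whitespace is what destroys ' '. The sketch below is an assumption-laden illustration, not the asker's assembler: the names decodeEscape and parseByteOperand are hypothetical, only a handful of escapes are recognized, and trailing comments or numeric operands are not handled.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: maps the character after a backslash to the character
// it names. Only a few escapes are handled; extend as the assembler needs.
static bool decodeEscape(char code, char& out)
{
    switch (code) {
        case 'n':  out = '\n'; return true;
        case 't':  out = '\t'; return true;
        case '0':  out = '\0'; return true;
        case '\\': out = '\\'; return true;
        case '\'': out = '\''; return true;
        default:   return false;
    }
}

// Extracts the byte named by a ".BYT" directive line such as
//   SPACE .BYT ' '    J .BYT 'J'    J .BYT J    NEWLN .BYT \n    NEWLN .BYT '\n'
// Returns false if the line has no recognizable single-character operand.
bool parseByteOperand(const std::string& line, char& out)
{
    const std::string directive = ".BYT";
    std::size_t pos = line.find(directive);
    if (pos == std::string::npos) return false;

    // The operand is whatever follows the directive, minus leading blanks.
    pos = line.find_first_not_of(" \t", pos + directive.size());
    if (pos == std::string::npos) return false;
    std::string operand = line.substr(pos);

    // Strip a surrounding pair of single quotes, keeping what is between them.
    if (operand.size() >= 3 && operand.front() == '\'') {
        std::size_t close = operand.rfind('\'');
        if (close == 0) return false;            // no closing quote
        operand = operand.substr(1, close - 1);
    }

    if (operand.size() == 1) { out = operand[0]; return true; }
    if (operand.size() == 2 && operand[0] == '\\')
        return decodeEscape(operand[1], out);
    return false;
}
```

With this shape, the label and directive can still come from the existing vector of strings, and only the value is taken from the raw line, so the rest of the assembler would not need to change much.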

how to fix strtok compiler error?

Hi, I am writing code that reads a text file and puts each word it finds into an array of C strings. The code works, but I run into problems when the text file itself contains a mistake. For example, my program crashes if there is a double space in a sentence such as "the car  goes fast": it stops at "car". Looking at my code, I believe this is because of strtok. I think that to fix the problem I need to make strtok produce a token from the next value, but I am not sure how to do so.
my code
#include <iostream>
#include <fstream>
#include <string.h>
#include <stdlib.h>
using namespace std;
int main() {
ifstream file;
file.open("text.txt");
string line;
char * wordList[10000];
int x=0;
while (getline(file,line)){
// initialize a sentence
char *sentence = (char*) malloc(sizeof(char)*line.length());
strcpy(sentence,line.c_str());
// intialize a pointer
char* word;
// this gives us a pointer to the first instance of a space, comma, etc.,
// that is, the characters in "sentence" will be read into "word"
// until it reaches one of the token characters (space, comma, etc.)
word = strtok(sentence, " ,!;:.?");
// now we can utilize a while loop, so every time the sentence comes to a new
// token character, it stops, and "word" will equal the characters from the last
// token character to the new character, giving you each word in the sentence
while (NULL != word){
wordList[x]=word;
printf("%s\n", wordList[x]);
x++;
word = strtok(NULL," ,!;:.?");
}
}
printf("done");
return 0;
}
I know some of the code is in C++ and some is in C, but I am trying to do most of it in C.
The problem might be that you are not allocating enough space for a null terminated string.
char *sentence = (char*) malloc(sizeof(char)*line.length());
strcpy(sentence,line.c_str());
If you need to capture "abc", you need 3 elements for the characters and another one for the terminating null character, i.e. 4 characters total.
The value of the argument to malloc needs to be increased by 1.
char *sentence = (char*) malloc(line.length()+1);
strcpy(sentence,line.c_str());
It's not clear why you are using malloc in a C++ program. I suggest using new[] instead (paired with delete[] once the words are no longer needed).
char *sentence = new char[line.length()+1];
strcpy(sentence,line.c_str());
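The buffer sizing can also be left to the standard library. The following is a minimal sketch, assuming it is acceptable to copy each token into a std::string instead of keeping raw pointers in wordList; the name splitWords is hypothetical. Note that strtok treats a run of consecutive delimiters as a single separator, so a double space on its own does not produce an empty token.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits one line into words with strtok, working on a
// private, NUL-terminated copy so the std::string itself is never modified.
std::vector<std::string> splitWords(const std::string& line)
{
    // line.size() characters plus one element for the terminating '\0'.
    std::vector<char> buffer(line.begin(), line.end());
    buffer.push_back('\0');

    std::vector<std::string> words;
    for (char* word = std::strtok(buffer.data(), " ,!;:.?");
         word != nullptr;
         word = std::strtok(nullptr, " ,!;:.?")) {
        words.push_back(word);   // copy the token before the buffer goes away
    }
    return words;
}
```

Since the vector owns the copy, nothing has to be freed by hand, and the off-by-one in the malloc size cannot occur.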

Why does a C++ string need a \0?

I was hoping that I could get some further explanation. I was told that I need to explicitly add \0 to the end of a string. Apparently this is for the C++ string class and that it is actually an array of characters that seems to be parsed under the hood. I was told that we must use the \0 in order to tell where the end of the string is as seen below:
int main()
{
char str[6] = {'H', 'e', 'l', 'l', 'o', '\0'};
cout << str << endl;
return 0;
}
However, if I have a user input their name, for example, I don't believe that C++ automatically uses the \0 to terminate the string. So the argument that the \0 must be there to know where the string ends makes no sense. Why can't we use the .length() function to account for the length of the string?
I wrote the following program to illustrate that the length of the input can be found from the .length() function.
int main()
{
string firstName;
cout << "Enter your first name: ";
cin >> firstName;
cout << "First Name = " << firstName << endl;
cout << "String Length = " << firstName.length() << endl;
return 0;
}
So, if the user inputs the name "Tom". Then the output would be the following:
First Name = Tom
String Length = 3
I brought this to my professor's attention and also this article http://www.cplusplus.com/reference/string/string/length/
and I was told that is why I am in college because it cannot be done this way. Can anyone offer any insight, since I don't understand what I am missing?
The "C string" was adopted into C++ from the C language. The C language did not have a string type. Strings in C were represented as an array of char, and the string was terminated with the NUL byte (\0). A plain string literal in C++ still has these semantics.
The C++ string type maintains the length within the object, as you say, so in a string, the NUL is not required. To get a "C string" from a string, you can use the c_str() method on the string. This is useful if you need to pass the contents of the C++ string to a function that only understands the NUL terminated variety.
std::string s("a string"); // s is initialized,
// the length is computed when \0 is encountered.
assert(s.size() == sizeof("a string")-1);
// sizeof string literal includes the \0
assert(s.c_str()[s.size()] == '\0');
// c_str() includes the \0
In your first program, you are initializing an array of char with an initializer list. The initialization is equivalent to the following:
char str[6] = "Hello";
This style of initializing an array of char is a special allowance that C++ provides since it is the syntax accepted by C.
In your second program, you are getting the name from the standard input. When C++ scans the input to populate the string argument, it essentially scans byte by byte until it encounters a separator (whitespace characters, by default). Whether and how the string stores a NUL byte internally is not something you need to manage; since C++11, c_str() and data() are guaranteed to return a NUL-terminated sequence.
You're not missing anything per se. The null terminator is used on character arrays to indicate the end. However, the string class takes care of all of that for you. The length attribute is a perfectly acceptable way of doing it since you're using strings.
However, if you're using a character array, then yes, you would need to check if you're on the null terminator, as you may not know the length of your string.
The following will give you no issues.
int length = 2;
char str[] = "AB";
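However, try the following, and you'll see some issues.
const int length = 5; // must be a constant to be usable as an array size
char str[length + 1] = "ABCDE"; // +1 makes room for automatic \0
char str2[length + 1] = "ABC"; // the unused elements are filled with \0
Try the second snippet using your for loop method with the known length: the first array gives you ABCDE, but the second gives you "ABC" followed by characters that are not part of the string, because the array holds [A][B][C][\0][\0][\0] and a loop that counts to length runs past the terminator. If the array had been filled some other way (for example, by strcpy into an uninitialized buffer), the bytes after the \0 would be leftover junk. Make length larger and you'll see more of it.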
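The two conventions discussed in these answers can be put side by side in a short sketch. The helper names below are hypothetical and exist only to illustrate the point: a char array has to be walked to its terminator, while std::string already knows its size.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Counts characters in a char array the C way: walk until the terminator.
std::size_t countUntilTerminator(const char* text)
{
    std::size_t n = 0;
    while (text[n] != '\0') ++n;
    return n;
}

// Returns true when the C-string view of s has the same length as s itself,
// which holds whenever s contains no embedded '\0' characters.
bool sameLengthAsCString(const std::string& s)
{
    return std::strlen(s.c_str()) == s.size();
}
```

For a name such as "Tom" both measurements agree on 3. They only diverge when the std::string holds a '\0' in the middle, which std::string allows and a C string cannot represent.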

Split a wstring by specified separator

I have a std::wstring variable that contains some text, and I need to split it by a separator. How could I do this? I would rather not use Boost, because it generates some warnings. Thank you.
EDIT 1
this is an example text:
hi how are you?
and this is the code:
typedef boost::tokenizer<boost::char_separator<wchar_t>, std::wstring::const_iterator, std::wstring> Tok;
boost::char_separator<wchar_t> sep;
Tok tok(this->m_inputText, sep);
for(Tok::iterator tok_iter = tok.begin(); tok_iter != tok.end(); ++tok_iter)
{
cout << *tok_iter;
}
the results are:
hi
how
are
you
?
I don't understand why the last character is always split off into another token...
In your code, question mark appears on a separate line because that's how boost::tokenizer works by default.
If your desired output is four tokens ("hi", "how", "are", and "you?"), you could
a) change char_separator you're using to
boost::char_separator<wchar_t> sep(L" ", L"");
b) use boost::split which, I think, is the most direct answer to "split a wstring by specified character"
#include <string>
#include <iostream>
#include <vector>
#include <boost/algorithm/string.hpp>
int main()
{
std::wstring m_inputText = L"hi how are you?";
std::vector<std::wstring> tok;
split(tok, m_inputText, boost::is_any_of(L" "));
for(std::vector<std::wstring>::iterator tok_iter = tok.begin();
tok_iter != tok.end(); ++tok_iter)
{
std::wcout << *tok_iter << '\n';
}
}
test run: https://ideone.com/jOeH9
You're default constructing boost::char_separator. The documentation says:
The function std::isspace() is used to identify dropped delimiters and std::ispunct() is used to identify kept delimiters. In addition, empty tokens are dropped.
Since std::ispunct(L'?') is true, it is treated as a "kept" delimiter, and reported as a separate token.
Hi you can use wcstok function
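For a split that depends on neither Boost nor wcstok, std::wstring's own find and substr are enough. This is a minimal sketch under the assumption that the separator is a single wide character and that empty pieces between adjacent separators should be kept; the name splitOn is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits text at every occurrence of separator.
// Empty pieces (from adjacent separators) are kept.
std::vector<std::wstring> splitOn(const std::wstring& text, wchar_t separator)
{
    std::vector<std::wstring> parts;
    std::size_t start = 0;
    for (;;) {
        std::size_t end = text.find(separator, start);
        if (end == std::wstring::npos) {
            parts.push_back(text.substr(start));   // last piece
            return parts;
        }
        parts.push_back(text.substr(start, end - start));
        start = end + 1;                           // continue after the separator
    }
}
```

Called with the example text and L' ' as the separator, this keeps "you?" together as one piece, because only the given separator ends a token.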
You said you don't want boost so...
This is maybe a weird approach to use in C++, but I used it once in a MUD where I needed a lot of tokenization in C.
take this block of memory assigned to the char * chars:
char chars[] = "I like to fiddle with memory";
If you need to tokenize on a space character:
create an array of char* called splitvalues big enough to store all tokens
walk a pointer through chars until the value it points to is '\0'
if the start of the current token is not already set, set splitvalues[counter] to the address of the current character
if the value is ' ', write 0 there
and increment counter
When you finish, the original string has been destroyed, so do not use it; instead you have the array of strings pointing to the tokens. The count of tokens is the counter variable (the upper bound of the array).
The approach is this:
iterate over the string and, on the first character of each token, update the token start pointer
convert each char you need to split on to zero, which means string termination in C
count how many times you did this
PS. Not sure if you can use a similar approach in a Unicode environment, though.