creating a string split function in C++ [duplicate] - c++

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Splitting a string in C++
I'm trying to create a function that mimics the behavior of the getline() function, with the option to use a delimiter to split the string into tokens.
The function accepts 2 strings (the second is passed by reference) and a char for the delimiter. It loops through each character of the first string, copying it to the second string, and stops looping when it reaches the delimiter. It returns true if the first string has more characters after the delimiter and false otherwise. The position of the last character is saved in a static variable.
For some reason the program is going into an infinite loop and is not executing anything:
const int LINE_SIZE = 160;
bool strSplit(string sFirst, string & sLast, char cDelim) {
static int iCount = 0;
for(int i = iCount; i < LINE_SIZE; i++) {
if(sFirst[i] != cDelim)
sLast[i-iCount] = sFirst[i];
else {
iCount = i+1;
return true;
}
}
return false;
}
The function is used in the following way:
while(strSplit(sLine, sToken, '|')) {
cout << sToken << endl;
}
Why is it going into an infinite loop, and why is it not working?
I should add that I'm interested in a solution without using istringstream, if that's possible.

It is not exactly what you asked for, but have you considered std::istringstream and std::getline?
// UNTESTED
std::istringstream iss(sLine);
while(std::getline(iss, sToken, '|')) {
std::cout << sToken << "\n";
}
EDIT:
Why is it going into an infinite loop, and why is it not working?
We can't know, you didn't provide enough information. Try to create an SSCCE and post that.
I can tell you that the following line is very suspicious:
sLast[i-iCount] = sFirst[i];
This line will result in undefined behavior (including, perhaps, what you have seen) in any of the following conditions:
i >= sFirst.size()
i-iCount >= sLast.size()
i-iCount < 0
It appears to me likely that all of those conditions are true. If the passed-in string is, for example, shorter than 160 characters, or if iCount ever grows to be bigger than the offset of the first delimiter, then you'll get undefined behavior.

LINE_SIZE is probably larger than the number of characters in the string object, so the code runs off the end of the string's storage, and pretty much anything can happen.
Instead of rolling your own, string::find does what you need.
std::string::size_type pos = 0;
std::string::size_type new_pos = sFirst.find('|', pos);
The call to find finds the first occurrence of '|' that's at or after the position 'pos'. If it succeeds, it returns the index of the '|' that it found. If it fails, it returns std::string::npos. Use it in a loop, and after each match, copy the text from [pos, new_pos) into the target string, and update pos to new_pos + 1.
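The loop described above could be sketched roughly as follows. This is only an illustration, not the asker's function: the name splitOn and the choice to return a std::vector are assumptions made here, and empty tokens between adjacent delimiters are kept.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (name and return type are assumptions, not from the post):
// splits `text` on `delim` using std::string::find, keeping empty tokens.
std::vector<std::string> splitOn(const std::string& text, char delim) {
    std::vector<std::string> tokens;
    std::string::size_type pos = 0;
    std::string::size_type new_pos;
    while ((new_pos = text.find(delim, pos)) != std::string::npos) {
        tokens.push_back(text.substr(pos, new_pos - pos)); // copy [pos, new_pos)
        pos = new_pos + 1;                                  // step past the delimiter
    }
    tokens.push_back(text.substr(pos)); // whatever follows the last delimiter
    return tokens;
}
```

Because the position lives in a local variable rather than a static one, the function can be called again on a different string without carrying state over from the previous call.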

Are you sure it's the strSplit() function that doesn't return, or is it your caller's while loop that's infinite?
Shouldn't your caller loop be something like:
while(strSplit(sLine, sToken, '|')) {
cout << sToken << endl;
cin >> sLine;
}
-- edit --
If the value of sLine makes strSplit() return true, then the while loop becomes infinite. So do something to change the value of sLine on each iteration of the loop, e.g. read a new value with cin.

Check this out
std::vector<std::string> spliString(const std::string &str,
const std::string &separator)
{
vector<string> ret;
string::size_type strLen = str.length();
char *buff;
char *pch;
buff = new char[strLen + 1];
buff[strLen] = '\0';
std::copy(str.begin(), str.end(), buff);
pch = strtok(buff, separator.c_str());
while(pch != NULL)
{
ret.push_back(string(pch));
pch = strtok(NULL, separator.c_str());
}
delete[] buff;
return ret;
}

Related

What is the reason behind the debugging getting stopped abruptly in the following code?

Here is the code to count how many times a string, entered by the user, occurs in the file temp.txt. If, for example, we want love to be counted, then matches like love, lovely, beloved should be considered. We also want to count the total number of words in the temp.txt file.
I am doing a line by line reading here, not word by word.
Why does the debugging stop at totalwords += counting(line)?
/*this code is not working to count the words*/
#include<iostream>
#include<fstream>
#include<string>
using namespace std;
int totalwords{0};
int counting(string line){
int wordcount{0};
if(line.empty()){
return 1;
}
if(line.find(" ")==string::npos){wordcount++;}
else{
while(line.find(" ")!=string::npos){
int index=0;
index = line.find(" ");
line.erase(0,index);
wordcount++;
}
}
return wordcount;
}
int main() {
ifstream in_file;
in_file.open("temp.txt");
if(!in_file){
cerr<<"PROBLEM OPENING THE FILE"<<endl;
}
string line{};
int counter{0};
string word {};
cout<<"ENTER THE WORD YOU WANT TO COUNT IN THE FILE: ";
cin>>word;
int n {0};
n = ( word.length() - 1 );
while(getline(in_file>>ws,line)){
totalwords += counting(line);
while(line.find(word)!=string::npos){
counter++;
int index{0};
index = line.find(word);
line.erase(0,(index+n));
}
}
cout<<endl;
cout<<counter<<endl;
cout<<totalwords;
return 0;
}
line.erase(0, index); doesn't erase the space, you need
line.erase(0, index + 1);
Your code reveals a few problems...
First, counting a single word for an empty line doesn't appear correct to me. Second, erasing again and again from the string is pretty inefficient: with every such operation all of the subsequent characters are copied towards the front. If you really wanted to erase, you might rather search from the end of the string to avoid that. But you can actually do so without ever modifying the string if you use the second parameter of std::string::find (which defaults to 0, so it has been transparent to you...):
int index = line.find(' ' /*, 0*/); // first call; 0 is default, thus implicit
index = line.find(' ', index + 1); // subsequent call
Note that using the character overload is more efficient if you search for a single character anyway. However, this variant doesn't consider other whitespace such as tabs.
Additionally, the variant as posted in the question doesn't consider more than one subsequent whitespace! In your erasing variant – which erases one character too few, by the way – you would need to skip incrementing the word count if you find the space character at index 0.
However I'd go with a totally new approach, looking at each character separately; you need a stateful loop in that case, though, i.e. you need to remember whether you are already within a word or not. It might look e.g. like this:
size_t wordCount = 0; // note: prefer an unsigned type, negative values
// are meaningless anyway
// size_t is especially fine as it is guaranteed to be
// large enough to hold the number of characters any
// string can contain
bool inWord = false;
for(char c : line)
{
if(isspace(static_cast<unsigned char>(c)))
// you can check for *any* white space that way...
// note the cast to unsigned, which is necessary as isspace accepts
// an int and a bare char *might* be signed, thus result in negative
// values
{
// no word any more...
inWord = false;
}
else if(inWord)
{
// well, nothing to do, we already discovered a word earlier!
//
// as we actually don't do anything here you might just skip
// this block and check for the opposite: if(!inWord)
}
else
{
// OK, this is the start of a word!
// so now we need to count a new one!
++wordCount;
inWord = true;
}
}
Now you might want to break words at punctuation characters as well, so you might actually want to check for:
if(isspace(static_cast<unsigned char>(c)) || ispunct(static_cast<unsigned char>(c)))
A bit shorter is the following variant:
if(/* space or punctuation */)
{
inWord = false;
}
else
{
wordCount += !inWord; // adds 1 only when a new word starts, 0 otherwise
inWord = true;
}
Finally: All code is written freely, thus unchecked – if you find a bug, please fix yourself...
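For reference, the character-by-character approach can be wrapped into a small self-contained function. This is a sketch under assumptions: the name countWords is made up here, and both whitespace and punctuation are treated as separators, which may or may not match what you want to count as a word.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: counts maximal runs of characters that are neither
// whitespace nor punctuation. The string itself is never modified.
std::size_t countWords(const std::string& line) {
    std::size_t wordCount = 0;
    bool inWord = false;
    for (char c : line) {
        // cast first: passing a negative char to isspace/ispunct is undefined
        const unsigned char uc = static_cast<unsigned char>(c);
        if (std::isspace(uc) || std::ispunct(uc)) {
            inWord = false;       // separator: we are between words
        } else {
            wordCount += !inWord; // count only the first character of a word
            inWord = true;
        }
    }
    return wordCount;
}
```

With this definition an empty line contributes 0 words and consecutive spaces do not produce extra counts.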
debugging getting stopped abruptly
Does debugging indeed stop at the indicated line? I observed instead that the program hangs within the while loop in counting. You may make this visible by inserting an indicator output (marked by HERE in following code):
int counting(string line){
int wordcount{0};
if(line.empty()){
return 1;
}
if(line.find(" ")==string::npos){wordcount++;}
else{
while(line.find(" ")!=string::npos){
int index=0;
index = line.find(" ");
line.erase(0,index);
cout << '.'; // <--- HERE: indicator output
wordcount++;
}
}
return wordcount;
}
As Jarod42 pointed out, the erase call you are using misses the space itself. That's why you are finding spaces and “counting words” forever.
There is also an obvious misconception about words and separators of words visible in your code:
empty lines don't contain words
consecutive spaces don't indicate words
words may be separated by non-spaces (parentheses for example)
Finally, as already mentioned: if the problem is about counting total words, it's not necessary to discuss the other parts. And after the test (see HERE) above, it also appears to be independent of file input. So your code could be reduced to something like this:
#include <iostream>
#include <string>
int counting(std::string line) {
int wordcount = 0;
if (line.empty()) {
return 1;
}
if (line.find(" ") == std::string::npos) {
wordcount++;
} else {
while (line.find(" ") != std::string::npos) {
int index = 0;
index = line.find(" ");
line.erase(0, index);
wordcount++;
}
}
return wordcount;
}
int main() {
int totalwords = counting("bla bla");
std::cout << totalwords;
return 0;
}
And in this form, it's much easier to see if it works. We expect to see a 2 as output. To get there, it's possible to try correcting your erase call, but the result would then still be wrong (1) since you are actually counting spaces. So it's better to take the time and carefully read Aconcagua's insightful answer.

Recognize string formatting Debug Assertion

I have a runtime problem with code below.
The purpose is to "recognize" the formats (%s %d etc) within the input string.
To do this, it returns an integer that matches the data type.
Then the extracted types are manipulated/handled in other functions.
I want to clarify that my purpose isn't to write formatted types in a string (snprintf etc.) but only to recognize/extract them.
The problem is the crash of my application with error:
Debug Assertion Failed!
Program:
...ers\Alex\source\repos\TestProgram\Debug\test.exe
File: minkernel\crts\ucrt\appcrt\convert\isctype.cpp
Line: 36
Expression: c >= -1 && c <= 255
My code:
#include <iostream>
#include <cstring>
enum Formats
{
TYPE_INT,
TYPE_FLOAT,
TYPE_STRING,
TYPE_NUM
};
typedef struct Format
{
Formats Type;
char Name[5 + 1];
} SFormat;
SFormat FormatsInfo[TYPE_NUM] =
{
{TYPE_INT, "d"},
{TYPE_FLOAT, "f"},
{TYPE_STRING, "s"},
};
int GetFormatType(const char* formatName)
{
for (const auto& format : FormatsInfo)
{
if (strcmp(format.Name, formatName) == 0)
return format.Type;
}
return -1;
}
bool isValidFormat(const char* formatName)
{
for (const auto& format : FormatsInfo)
{
if (strcmp(format.Name, formatName) == 0)
return true;
}
return false;
}
bool isFindFormat(const char* strBufFormat, size_t stringSize, int& typeFormat)
{
bool foundFormat = false;
std::string stringFormat = "";
for (size_t pos = 0; pos < stringSize; pos++)
{
if (!isalpha(strBufFormat[pos]))
continue;
if (!isdigit(strBufFormat[pos]))
{
stringFormat += strBufFormat[pos];
if (isValidFormat(stringFormat.c_str()))
{
typeFormat = GetFormatType(stringFormat.c_str());
foundFormat = true;
}
}
}
return foundFormat;
}
int main()
{
std::string testString = "some test string with %d arguments"; // crash application
// std::string testString = "%d some test string with arguments"; // not crash application
size_t stringSize = testString.size();
char buf[1024 + 1];
memcpy(buf, testString.c_str(), stringSize);
buf[stringSize] = '\0';
for (size_t pos = 0; pos < stringSize; pos++)
{
if (buf[pos] == '%')
{
if (buf[pos + 1] == '%')
{
pos++;
continue;
}
else
{
char bufFormat[1024 + 1];
memcpy(bufFormat, buf + pos, stringSize);
bufFormat[stringSize] = '\0';
int typeFormat;
if (isFindFormat(bufFormat, stringSize, typeFormat))
{
std::cout << "type = " << typeFormat << "\n";
// ...
}
}
}
}
}
As I commented in the code, the application crashes with the first string (%d in the middle), while with the second string (%d at the start) everything works.
I also wanted to ask: is there a better or more efficient way to recognize types like "%d", "%s", etc. within a string (not necessarily by returning an int)?
Thanks.
Let's take a look at this else clause:
char bufFormat[1024 + 1];
memcpy(bufFormat, buf + pos, stringSize);
bufFormat[stringSize] = '\0';
The variable stringSize was initialized with the size of the original format string. Let's say it's 30 in this case.
Let's say you found the %d code at offset 20. You're going to copy 30 characters, starting at offset 20, into bufFormat. That means you're copying 20 characters past the end of the original string. You could possibly read off the end of the original buf, but that doesn't happen here because buf is large. The third line sets a NUL into the buffer at position 30, again past the end of the data, but your memcpy copied the NUL from buf into bufFormat, so that's where the string in bufFormat will end.
Now bufFormat contains the string "%d arguments." Inside isFindFormat you search for the first isalpha character. Possibly you meant isalnum here? Because we can only get to the isdigit line if the isalpha check passes, and if it's isalpha, it's not isdigit.
In any case, after isalpha passes, isdigit will definitely return false so we enter that if block. Your code will find the right type here. But, the loop doesn't terminate. Instead, it continues scanning up to stringSize characters, which is the stringSize from main, that is, the size of the original format string. But the string you're passing to isFindFormat only contains the part starting at '%'. So you're going to scan past the end of the string and read whatever's in the buffer, which will probably trigger the assertion error you're seeing.
There's a lot more going on here. You're mixing and matching std::string and C strings; see if you can use std::string::substr instead of copying. You can use std::string::find to find characters in a string. If you have to use C strings, use strcpy instead of memcpy followed by the addition of a NUL.
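As a rough sketch of the std::string::find route: the function name findFormats and the idea of returning the conversion characters in a vector are assumptions for illustration, and only the bare one-letter specifiers d, f and s from the question are recognized (no widths, flags or precision).

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the conversion characters (d, f, s) that
// directly follow a '%' in `text`, treating "%%" as an escaped percent sign.
std::vector<char> findFormats(const std::string& text) {
    std::vector<char> found;
    std::string::size_type pos = 0;
    while ((pos = text.find('%', pos)) != std::string::npos) {
        if (pos + 1 >= text.size()) {
            break; // lone '%' at the very end, nothing follows it
        }
        const char next = text[pos + 1];
        if (next == 'd' || next == 'f' || next == 's') {
            found.push_back(next);
        }
        pos += 2; // skip the '%' and the character after it (covers "%%" too)
    }
    return found;
}
```

Every index is checked against text.size(), so nothing is read past the end of the string, and no character is ever passed to isalpha or isdigit.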
You could just delegate this to a regex engine, which was born to search through strings.
Since C++11 there's direct support; all you have to do is
#include <regex>
Then you can match against strings using various functions, for instance regex_match, which together with an smatch lets you find your target in just a few lines of code using the standard library:
std::smatch sm;
std::regex_match(testString.cbegin(), testString.cend(), sm, str_exp);
where str_exp is your regex describing what you want to find.
sm now holds every string matched against your regex, which you can print this way:
for (int i = 0; i < sm.size(); ++i)
{
std::cout << "Match:" << sm[i] << std::endl;
}
EDIT:
To better express the result you would achieve, I'll include a simple sample below:
// target string to be searched against
string target_string = "simple example no.%d is: %s";
// pattern to look for
regex str_exp("(%[sd])");
// match object
smatch sm;
// iteratively search your pattern on the string, excluding parts of the string already matched
cout << "My format strings extracted:" << endl;
while (regex_search(target_string, sm, str_exp))
{
std::cout << sm[0] << std::endl;
target_string = sm.suffix();
}
You can easily add any format specifier you want by modifying the str_exp regex.
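If you would rather not overwrite the target string on each iteration, std::sregex_iterator walks over all matches for you. A minimal sketch, where the function name extractFormats and the pattern %[sdf] are choices made here for illustration; note that this simple pattern would also pick up the "%d" inside an escaped "%%d".

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every "%s", "%d" or "%f" found in `text`,
// in order of appearance, without modifying `text`.
std::vector<std::string> extractFormats(const std::string& text) {
    static const std::regex pattern("%[sdf]");
    std::vector<std::string> result;
    const std::sregex_iterator end; // default-constructed iterator marks the end
    for (std::sregex_iterator it(text.begin(), text.end(), pattern); it != end; ++it) {
        result.push_back(it->str()); // the text of the current match
    }
    return result;
}
```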

std::string returning inappropriate value

I wrote a program which performs string compression using counts of repeated characters. The program in C++ is:
#include<iostream>
#include<cstring>
std::string compressBad(std::string str)
{
std::string mystr = "";
int count = 1;
char last = str[0];
for (int i = 0; i < str.length();++i)
{
if(str[i] == last)
count++;
else
{
std::string lastS = last+"";
std::string countS = std::to_string(count);
mystr.append(lastS);
mystr.append(countS);
//mystr = mystr + last + count;
count = 1;
last = str[i];
}
}
std::string lastS = last+"";
std::string countS = std::to_string(count);
mystr.append(lastS);
mystr.append(countS);
return mystr;
//return mystr+last+count;
}
int main()
{
std::string str;
std::getline(std::cin, str);
std::string str2 = compressBad(str);
std::cout<<str2;
/*if (str.length() < str2.length())
std::cout<<str;
else
std::cout<<str2;*/
std::cout<<std::endl;
return 0;
}
A few examples of running this:
Input : sssaaddddd
Output : ùÿÿ*425
Output it should print : s3a2d5
Second example:
Input : sssaaddd
Output: ùÿÿ*423
Output it should print : s3a2d3
I also implemented the same concept in Java and there it is working fine. The java implementation is here
Why is this problem happening with the above code?
There may be other issues in your code, but I think that this line might be to blame:
std::string lastS = last+"";
Here, you're trying to convert the character last to a string by concatenating the empty string to the end. Unfortunately, in C++ this is interpreted to mean "take the numeric value of the character last, then add that to a pointer that points to the empty string, producing a new pointer to a character." This pointer points into random memory, hence the garbage you're seeing. (Notice that this is quite different from how Java works!)
Try changing this line to read
std::string lastS(1, last);
This will initialize lastS to be a string consisting of just the character stored in last.
Another option would be to use an ostringstream:
std::ostringstream myStr;
myStr << last << count;
// ...
return myStr.str();
This eliminates all the calls to .append() and std::to_string and is probably a lot easier to read.
last + "" doesn't do what you think.
Just do
mystr.append(1, last);
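Putting the append(1, last) fix into the whole function might look like the sketch below. Two things differ from the posted code and are assumptions on my part: the function is renamed to compress, and the loop starts at index 1, because starting at 0 with count already set to 1 would count the first character twice (s4 instead of s3). An empty input is returned unchanged.

```cpp
#include <cstddef>
#include <string>

// Hypothetical corrected version of the run-length compression in the question.
std::string compress(const std::string& str) {
    if (str.empty()) {
        return ""; // nothing to compress; also avoids reading str[0]
    }
    std::string result;
    char last = str[0];
    int count = 1; // str[0] has been counted once already
    for (std::size_t i = 1; i < str.size(); ++i) {
        if (str[i] == last) {
            ++count;
        } else {
            result.append(1, last);           // one copy of the character
            result += std::to_string(count);  // followed by its run length
            last = str[i];
            count = 1;
        }
    }
    result.append(1, last); // flush the final run
    result += std::to_string(count);
    return result;
}
```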

Incrementing pointers for *char in a while loop

Here is what I have:
char* input = new char [input_max];
char* inputPtr = input;
I want to use the inputPtr to traverse the input array. However I am not sure what will correctly check whether or not I have reached the end of the string:
while (*inputPtr++)
{
// Some code
}
or
while (*inputPtr != '\0')
{
inputPtr++;
// Some code
}
or a more elegant option?
Assuming input string is null-terminated:
for(char *inputPtr = input; *inputPtr; ++inputPtr)
{
// some code
}
Keep in mind that the example you posted may not give the results you want. In your while loop condition, you're always performing a post-increment. When you're inside the loop, you've already passed the first character. Take this example:
#include <iostream>
using namespace std;
int main()
{
const char *str = "apple\0";
const char *it = str;
while(*it++)
{
cout << *it << '_';
}
}
This outputs:
p_p_l_e__
Notice the missing first character and the extra _ underscore at the end. Check out this related question if you're confused about pre-increment and post-increment operators.
I would do:
inputPtr = input; // init inputPtr always at the last moment.
while (*inputPtr != '\0') { // Assume the string ends with \0
// some code
inputPtr++; // After "some code" (instead of what you wrote).
}
Which is equivalent to the for-loop suggested by greatwolf. It's a personal choice.
Be careful, with both of your examples, you are testing the current position and then you increment. Therefore, you are using the next character!
Assuming input isn't null terminated:
char* input = new char [input_max];
for (char* inputPtr = input; inputPtr < input + input_max;
inputPtr++) {
inputPtr[0]++;
}
for the null terminated case:
for (char* inputPtr = input; inputPtr[0]; inputPtr++) {
inputPtr[0]++;
}
but generally this is as good as you can get. Using std::vector, or std::string may enable cleaner and more elegant options though.
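For comparison, here is the same "increment every character" loop body written against std::string, where the container knows its own length and no terminator check or pointer arithmetic is needed. The function name bumpEach is made up for this sketch.

```cpp
#include <string>

// Hypothetical example: applies the same operation as inputPtr[0]++ above
// to every character of a std::string, using a range-based for loop.
void bumpEach(std::string& text) {
    for (char& c : text) { // reference, so the string itself is modified
        ++c;
    }
}
```

The loop visits exactly text.size() characters, so it behaves the same whether or not the data happens to contain an embedded '\0'.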

C++ - Attempting to use string functions to reverse an input string

As part of a homework assignment I need to be able to take an input string and manipulate it several ways using a list of string functions. The first function takes a string and reverses it using a for loop. This is what I have:
#include <iostream>
#include <string>
namespace hw06
{
typedef std::string::size_type size_type;
//reverse function
std::string reverse( const std::string str );
}
// Program execution begins here.
int main()
{
std::string inputStr;
std::cout << "Enter a string: ";
std::getline( std::cin, inputStr );
std::cout << "Reversed: " << hw06::reverse( inputStr )
<< std::endl;
return 0;
}
//reverse function definition
std::string hw06::reverse( const std::string str )
{
std::string reverseStr = "";
//i starts as the last digit in the input. It outputs its current
//character to the return value "tempStr", then goes down the line
//adding whatever character it finds until it reaches position 0
for( size_type i = (str.size() - 1); (i >= 0); --i ){
reverseStr += str.at( i );
}
return reverseStr;
}
The program asks for input, then returns this error:
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::at
I'm really at a loss as to what I'm doing wrong here. The loop seems correct to me, so am I misunderstanding how to reference the function?
Unless you really want to write a loop, it's probably easier to just do something like:
std::string reverse(std::string const &input) {
return std::string(input.rbegin(), input.rend());
}
The problem is that your loop never terminates. You have as your condition i >= 0, but size_type is unsigned, so 0 - 1 == 2^(sizeof(size_t) * 8) - 1, which is certainly out of the range of your string. Therefore, you need to pick something else as your termination condition. One option is you can use i != std::string::npos, but that feels wrong. You're probably better off with something like:
for (size_type i = str.size(); i != 0; ) {
reverseStr += str.at(--i);
}
EDIT: I did some checking on i != std::string::npos. It should be well-defined and OK. However, it still seems like the Wrong Way To Do It.
As Andreas Grapentin said, the problem is that std::string::size() returns a size_t which is required by the standard to be an unsigned type. So it will always be >= 0 and when you hit 0 and decrement it, you will go to some really large, positive number.
Consider something like this:
std::string hw06::reverse(const std::string &str)
{
std::string reverseStr;
for(size_t i = str.size(); i != 0; i--)
reverseStr += str.at(i - 1);
return reverseStr;
}
I'm not keen on answering homework questions, but seeing some of the answers, I couldn't resist this:
std::string hw06::reverse(const std::string &str)
{ return std::string(str.rbegin(), str.rend()); }
Simple, clean and least wasteful if you can't do it in-place.
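If modifying the string in place is acceptable (the assignment's signature takes a const string, so this is an assumption), std::reverse from <algorithm> avoids the copy altogether. The wrapper name reverseInPlace is hypothetical.

```cpp
#include <algorithm>
#include <string>

// Hypothetical in-place variant: swaps characters from both ends towards
// the middle instead of building a second string.
void reverseInPlace(std::string& str) {
    std::reverse(str.begin(), str.end());
}
```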
As other answers say, the problem is in the loop. I'll suggest using the following "goes to" operator :)
for(size_t i = str.size(); i --> 0;)
{
}
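Filled in, that countdown loop could look like the sketch below (the function name reverseCountdown is made up here). The condition i-- > 0 compares the old value of i against 0 and only then decrements, so the body sees indices size()-1 down to 0 and the loop stops cleanly even though i is unsigned.

```cpp
#include <cstddef>
#include <string>

// Hypothetical example of the "i --> 0" countdown with an unsigned index.
std::string reverseCountdown(const std::string& str) {
    std::string out;
    out.reserve(str.size());
    for (std::size_t i = str.size(); i-- > 0;) {
        out += str[i]; // i is already decremented here: size()-1, ..., 0
    }
    return out;
}
```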
Use i-- and not --i, or you will decrease the value of i before getting the char and run into loop problems.