Splitting function in C++ - c++

I am trying to write a function that takes a string and a delimiter as input and returns an array of strings. For some reason, the following code crashes with a segmentation fault. What could be the problem?
char** split(string thing, char delimiter){
thing+='\0';//add null to signal end of string
char**split_string = new char*[100];
int i=0,j=0,l=0; //indexes- i is the ith letter in the string
// j is the jth letter in the lth string of the new array
int length = thing.length();
while (i < length){
if ((thing[i]!=delimiter && thing[i]!='\0')){
split_string[l][j]=thing[i];
j++;
}
else {
j=0; //reset j-value
l++;
}
i++;
}
return split_string;
}

After doing
char**split_string = new char*[100];
You still need to initialize each of the 100 char * pointers you created.
static const size_t str_len = 50; //assuming length will not exceed
for( size_t ix = 0 ; ix < 100 ; ++ix){
split_string[ix] = new char[str_len];
}
Also, when writing to split_string, make sure you do not exceed the allocated memory (50 characters per string in this case, including the terminating null) and that you never produce more than 100 substrings.
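To make the allocation and bounds advice above concrete, here is a minimal sketch that sizes every allocation from the input instead of relying on fixed limits. The names split_cstr and free_split are hypothetical, and the null-pointer sentinel at the end of the array is an assumption of this sketch, not something the question requires.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: splits `text` on `delimiter` into a null-terminated
// array of heap-allocated C strings. Release the result with free_split.
char** split_cstr(const std::string& text, char delimiter) {
    // Count the pieces first so the pointer array has exactly the right size.
    std::size_t pieces = 1;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == delimiter) ++pieces;
    }

    char** result = new char*[pieces + 1];   // +1 for the terminating null pointer
    std::size_t start = 0;
    for (std::size_t p = 0; p < pieces; ++p) {
        std::size_t end = text.find(delimiter, start);
        if (end == std::string::npos) end = text.size();
        const std::size_t len = end - start;
        result[p] = new char[len + 1];        // +1 for the terminating '\0'
        std::memcpy(result[p], text.data() + start, len);
        result[p][len] = '\0';
        start = end + 1;                      // continue after the delimiter
    }
    result[pieces] = nullptr;                 // sentinel marks the end of the array
    return result;
}

// Frees every string and then the pointer array itself.
void free_split(char** parts) {
    assert(parts != nullptr);
    for (std::size_t i = 0; parts[i] != nullptr; ++i) delete[] parts[i];
    delete[] parts;
}
```

The caller walks the array until it reaches the null pointer and must call free_split exactly once when done.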

Better split a std::string to std::vector<std::string>. Use the function below
#include <sstream>
#include <string>
#include <vector>
std::vector<std::string> split(std::string str, char delim) {
std::vector<std::string> result;
std::stringstream ss(str);
std::string token;
while (getline(ss, token, delim))
result.push_back(token);
return result;
}

1) Allocate memory for each substring (something like split_string[l] = new char[100]) whenever you find a new substring.
2) Since you do not know the number of substrings in advance, consider using a vector, for example vector<string> split_string. In the loop, whenever you find a new substring, push it onto the vector. At the end, the vector holds all the split strings.
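A minimal sketch of the vector approach described above, assuming a single-character delimiter. The names split_to_vector and check_split_to_vector are made up for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: collects the pieces of `text` between delimiters.
std::vector<std::string> split_to_vector(const std::string& text, char delimiter) {
    std::vector<std::string> parts;
    std::string current;                       // the substring being built
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] == delimiter) {
            parts.push_back(current);          // a substring just ended
            current.clear();
        } else {
            current += text[i];
        }
    }
    parts.push_back(current);                  // the piece after the last delimiter
    return parts;
}

// Usage check: two adjacent delimiters produce an empty piece.
void check_split_to_vector() {
    std::vector<std::string> parts = split_to_vector("a b  c", ' ');
    assert(parts.size() == 4);
    assert(parts[2].empty());
}
```

The vector grows as needed, so there is no limit on the number or length of substrings and nothing to free by hand.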

Each char * has to be individually initialized like this.
int len = 100;
char**split_string = new char*[len]; // len holds the number of pointers
for(int k = 0; k < len; k++) {
split_string[k] = new char[500]; // holds len of string (here Max word size is considered 500)
}
In C++, sticking to std::string is recommended, as it reduces complexity and improves readability.
Without a manually appended '\0', the while loop exits just before reaching the terminator and the last substring is lost. In that case, change while (i < length) to while (i <= length).
Using vector < string >:
vector<string> split(string thing, char delimiter){
int len = 100;
vector<string> v;
char c[500];
int i=0,j=0;
int length = thing.length();
while (i <= length){
if ( thing[i] != delimiter && thing[i] != '\0') {
c[j]=thing[i];
j++;
}
else {
c[j] = '\0';
v.push_back(c);
memset(c, 0, sizeof c);
j = 0;
}
i++;
}
return v;
}

Related

c++ Split char array without use of any library

I've been running into a weird issue where the split code produces the correct output when I printf inside the function, but produces incorrect output when I read the results from an instance afterwards.
Question: How do I get the correct output when using it through an instance? (See usage below.)
Here is the code:
typedef struct SplitText
{
int splitLen;
char* splitTxt[100];
char* subTxt(char* text, int index, int len)
{
char subTxt_[1000];
int count = 0;
for (int i = 0; i < 1000; i++)
subTxt_[i] = '\0';
for (int i = index; i < index + len; i++)
subTxt_[count++] = text[i];
return subTxt_;
}
void split(char* text, char sep)
{
char separator[3] = { '<', sep, '>' };
int textLen = strlen(text);
int splitIndex = 0;
int splitCount = 0;
for (int t = 0; t < textLen; t++)
{
if (text[t] == separator[0] && text[t + 1] == separator[1] && text[t + 2] == separator[2])
{
if (splitIndex != 0)
splitIndex += 3;
splitTxt[splitCount] = subTxt(text, splitIndex, t - splitIndex);
splitIndex = t;
//correct output
printf(splitTxt[splitCount]);
printf("\n");
splitCount++;
}
}
splitLen = splitCount;
}
}SplitText;
Usage:
SplitText st;
st.split("testing<=>split<=>function<=>", '=');
for (int i = 0; i < st.splitLen; i++)
{
//incorrect output
printf(st.splitTxt[i]);
printf("\n");
}
printf("--------\n");
This:
char* subTxt(char* text, int index, int len)
{
char subTxt_[1000];
...
return subTxt_;
}
is undefined behavior. Returning a pointer to a local (stack) variable or local array results in weird behavior like this.
Typically, the contents behind the returned pointer get corrupted when another function is invoked: the memory that subTxt_ occupied is overwritten with the stack variables of the next call.
Better:
char* subTxt(char* text, int index, int len)
{
char *subTxt_ = new char[1000];
...
return subTxt_;
}
And then make sure whoever invokes subTxt remembers to delete [] on the returned pointer.
Or just use std::string and be done with it (unless this is an academic exercise).
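As a rough illustration of the std::string route, subTxt could return its result by value, so there is neither a dangling pointer nor a delete [] to remember. The signature below is an assumption for this sketch; the original takes char* and int.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns `len` characters of `text` starting at `index`, by value.
// The returned string owns its own copy of the characters.
std::string subTxt(const std::string& text, std::size_t index, std::size_t len) {
    assert(index <= text.size());   // substr throws std::out_of_range otherwise
    return text.substr(index, len); // len is clamped to the end of text
}
```

For example, subTxt("testing<=>split", 10, 5) yields "split".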
Also, this loop indexes ahead of t:
for (int t = 0; t < textLen; t++)
{
if (text[t] == separator[0] && text[t + 1] == separator[1] && text[t + 2] == separator[2])
When t == textLen-1, text[t+1] is the terminating null and text[t+2] lies past the end of the string. Short-circuit evaluation of && happens to prevent the out-of-bounds read here (unless sep is '\0'), but it is safer to look backwards instead. Change it to:
for (int t = 2; t < textLen; t++)
{
if (text[t-2] == separator[0] && text[t -1] == separator[1] && text[t] == separator[2])
And do similar fixups with t within the block as well.
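One way to avoid the manual index arithmetic altogether is to let std::string::find locate the separator. This is only a sketch under that assumption; split_on_token is a hypothetical name and the separator is passed as a whole string such as "<=>".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on every occurrence of `token`.
std::vector<std::string> split_on_token(const std::string& text, const std::string& token) {
    assert(!token.empty());                    // an empty token would never advance
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    std::string::size_type hit = text.find(token, start);
    while (hit != std::string::npos) {
        parts.push_back(text.substr(start, hit - start));
        start = hit + token.size();            // skip over the separator
        hit = text.find(token, start);
    }
    // Keep a trailing piece only if something follows the last separator.
    if (start < text.size()) parts.push_back(text.substr(start));
    return parts;
}
```

With the input from the question, "testing<=>split<=>function<=>", this yields the three pieces testing, split and function.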
You could write a free split_string function instead of a struct/class.
Your code still looks quite C-like with its fixed-size char arrays, which limits usability and stability (out-of-bounds array bugs).
Strings in C++ are usually of type std::string, and C++17 adds std::string_view for making views onto a string (no data is copied, but the string_view is only valid for as long as the string it views is alive).
If you don't know the number of substrings up front, you should not use a fixed-size array but a std::vector, which resizes internally as needed.
This is what a split_string function could look like in current C++. Note that the code expresses what it is doing more clearly than C-style code, which mostly shows how it is done.
std::vector<std::string_view> split_string(std::string_view string, std::string_view delimiters)
{
std::vector<std::string_view> substrings;
if(delimiters.size() == 0ul)
{
substrings.emplace_back(string);
return substrings;
}
auto start_pos = string.find_first_not_of(delimiters);
auto end_pos = start_pos;
auto max_length = string.length();
while(start_pos < max_length)
{
end_pos = std::min(max_length, string.find_first_of(delimiters, start_pos));
if(end_pos != start_pos)
{
substrings.emplace_back(&string[start_pos], end_pos - start_pos);
start_pos = string.find_first_not_of(delimiters, end_pos);
}
}
return substrings;
}
Take a look at std::string_view.
It lets you avoid allocating memory, and it has a built-in substr function.
Just be careful when printing with printf: a string_view is not necessarily null-terminated, so "%s" prints past the end of the view, up to the end of the underlying string. Pass the length explicitly with "%.*s" instead.
See the printf documentation.
for(auto view : container_with_string_views)
printf("%.*s", (int)view.size(), view.data());
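The precision form of the format can be checked in isolation with snprintf. This sketch assumes C++17 for std::string_view; format_view and its 64-byte buffer are invented for the illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical helper: formats a view between brackets. The "%.*s" precision
// limits the output to view.size() characters, even though view.data() may
// point into the middle of a longer string.
std::string format_view(std::string_view view) {
    char buffer[64];                           // longer views are truncated
    int written = std::snprintf(buffer, sizeof buffer, "[%.*s]",
                                static_cast<int>(view.size()), view.data());
    assert(written >= 0);                      // a negative value signals an encoding error
    return std::string(buffer);
}
```

Given a view of the first five characters of "hello world", format_view returns "[hello]", whereas a plain "%s" would run on to the end of the underlying string.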

storing a string to a 2d char array

I am trying to store a sentence in a 2D array by separating its words, so that each row of the array contains one word from the sentence. Here is what I think I should do.
//Logic
//given string mystring
string mystring = "testing the arrays";
//create a 2d char array to hold 4 words with 10 max size
char 2darr[4][10] = {" "};
int x = 0;
for (int i = 0,j=0; i <mystring.length(); i++)
{
if (mystring(i) != ' ')
2darr[x][j++] = mystring(i); //copy the each character to the first row
else
2darr[x][j++] = '\0';
++x; // goes to next row
j = 0; //reset j for new row
}
Is there a better way to do this? I think my logic is a little off as well
The better way to do this is:
1) There is no need to check spaces. For this to occur, you can use std::istringstream with operator >> to obtain each word in a loop.
2) Use strncpy to copy the string into the 2 dimensional array
3) You need to make sure that the string does not exceed the bounds of the array, and that you have no more than 4 separate words.
Here is an example:
#include <iostream>
#include <string>
#include <vector>
#include <sstream>
#include <cstring>
int main()
{
char arr2d[4][10] = {};
std::string mystring = "testing the arrays";
// create an input stream containing the test string
std::istringstream strm(mystring);
std::string word;
// this is the word count
int curCount = 0;
// obtain each word and copy to the array
while (strm >> word && curCount < 4)
strncpy(arr2d[curCount++], word.c_str(), 9);
// output the results
for (int i = 0; i < curCount; ++i)
std::cout << arr2d[i] << "\n";
}
Output:
testing
the
arrays
The expression char 2darr[4][10] = {" "} only sets the first row to " "; all remaining characters are zero-initialized to '\0'. That is probably OK, since '\0' is the terminator of C strings.
Variable names can't start with a digit; call it arr2d instead.
String character access is mystring[i], not mystring(i).
You only indented the lines in the else branch. In C++, if you don't enclose a block in curly braces, the else only captures the first statement, so what you effectively wrote is:
else {
2darr[x][j++] = '\0';
}
++x; // goes to next row
j = 0; //reset j for new row
Corrected code is
std::string mystring = "testing the arrays";
//create a 2d char array to hold 4 words with 10 max size
char arr2d[4][10] = { };
int x = 0;
for (int i = 0, j = 0; i < mystring.length(); i++)
{
if (mystring[i] != ' ') {
arr2d[x][j++] = mystring[i]; //copy the each character to the first row
}
else {
arr2d[x][j++] = '\0';
++x; // goes to next row
j = 0; //reset j for new row
}
}
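The loop above still trusts the input to contain at most 4 words of at most 9 characters. A possible bounds-checked variant is sketched below; fill_words, kMaxWords and kMaxLen are hypothetical names, and silently dropping characters that do not fit is a design choice of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

const std::size_t kMaxWords = 4;
const std::size_t kMaxLen = 10;   // per row, including the terminating '\0'

// Copies space-separated words into `rows` and returns the number of rows used.
// Extra words are ignored and over-long words are truncated.
std::size_t fill_words(const std::string& text, char (&rows)[kMaxWords][kMaxLen]) {
    std::size_t row = 0, col = 0;
    bool in_word = false;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == ' ') {
            if (in_word) {                     // a word just ended: terminate its row
                rows[row][col] = '\0';
                ++row;
                col = 0;
                in_word = false;
            }
            continue;                          // repeated spaces are skipped
        }
        if (row == kMaxWords) break;           // no room for another word
        in_word = true;
        assert(row < kMaxWords && col < kMaxLen);
        if (col < kMaxLen - 1) rows[row][col++] = text[i];  // keep room for '\0'
    }
    if (in_word) {                             // terminate the final word
        rows[row][col] = '\0';
        ++row;
    }
    return row;
}
```

For "testing the arrays" the function fills three rows and returns 3.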

Why doesn't String* return all Strings?

The following code is part of my project in C++98, so I'm not allowed to use vectors and such. The purpose of this function is to break a single line of text into an array of strings using the given delimiter; size is the number of words I need to return. The problem is that when I debug and check nums towards the end, its size has changed to 4 and it only returned the first word, shown character by character, as if nums were now a char*. I have changed the code many times, but I don't know where I went wrong. Any advice?
string* Split(string ss,char delimeter,int size)
{
string *nums=new string[size];
int index_c, index_sw=0;
for (int i = 0; i < size; i++)
{
for(unsigned int j=0;j<ss.length();j++)
{
if (ss.at(j) == delimeter)
{
index_c = j;
nums[i] = ss.substr(index_sw, index_c);
index_sw += index_c;
i++;
}
}
break;
}
return nums;
}
Since you do not know the number of words in the string ss beforehand, you cannot specify size when calling the Split function. Not knowing size you will not be able to allocate memory to nums.
So you are better off using a vector of strings. As pointed out, vector is available in C++98.
Then your modified Split function will look like this:
vector<string> Split(string ss, char delimiter)
{
vector<string> vs;
int index_c, index_sw=0, j;
for(j=0;j<ss.length();j++)
{
if (ss.at(j) == delimiter)
{
index_c = j;
vs.push_back(ss.substr(index_sw, index_c - index_sw));
index_sw = index_c + 1;
}
}
vs.push_back(ss.substr(index_sw, j - index_sw));
return vs;
}
which then can be called like this:
vector<string> ret = Split("This is a stackoverflow answer", ' ');

C++. How can I free memory correctly?

I have written code to find and remove the longest word in a string without using library functions. Everything works fine, but when I free the memory, the result is wrong (an empty line is displayed). If I remove the call to the memory release function, everything works correctly, but memory is leaked.
How do I fix it? Please help me.
#include <iostream>
using namespace std;
int length(char *text) // string length
{
char *begin = text;
while(*text++);
return text - begin - 1;
}
int size(char **text) // size of two-dimensional array
{
int i = 0;
while(text[i]) i++;
return i;
}
void free_memory(char **text)
{
for(int i=0; i<size(text); i++)
delete text[i];
delete [] text;
}
char **split(char *text, char delim)
{
int words = 1;
int len = length(text);
for(int i=0; i<len; i++)
if(text[i] == delim) words++;
char **result = new char*[words + 1];
int j = 0, t = 0;
for(int i=0; i<words; i++)
{
result[i] = new char[len];
while(text[j] != delim && text[j] != '\0') result[i][t++] = text[j++];
j++;
t = 0;
}
result[words + 1] = nullptr;
return result;
}
char *strcat(char *source, char *destination)
{
char *begin = destination;
while(*destination) destination++;
*destination++ = ' ';
while(*source) *destination++ = *source++;
return begin;
}
char *removeWord(char *in_string)
{
char **words = split(in_string, ' ');
int max = length(words[0]);
int j = 0;
for(int i=0; i<size(words); i++)
if(max < length(words[i]))
{
max = length(words[i]);
j = i;
}
int index;
char *result;
if(!j) index = 1;
else index = 0;
result = words[index];
for(int i=0; i<size(words); i++)
if(i != j && i != index)
result = strcat(words[i], result);
free_memory(words); // I want free memory here
return result;
}
int main()
{
char text[] = "audi and volkswagen are the best car";
cout << removeWord(text) << endl;
return 0;
}
In fact, this is C style programming - not C++. I see that your aim is to implement everything from scratch, possibly for practicing. But even then, your code is not designed/structured properly.
Besides that, you also have several bugs in your code:
result[words + 1] = nullptr; must be result[words] = nullptr;
You need result[i][t] = '\0'; after the while loop in split
delete text[i] must be delete [] text[i]
You cannot assign to your result pointer memory from words, then free it and then return it for use by the caller.
There is at least one further bug in the second half of removeWord. It would be tedious to try to understand what you are trying to do there.
You might want to start with a simpler task. You also should proceed step-by-step and check each function for correctness independently first and not implement everything and then test. Also take a look at the tool valgrind for memory checking - if you use Linux.
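The first three fixes listed above, applied to split and free_memory, might look like the following sketch. It keeps the question's no-library style. The const qualifiers and the len + 1 allocation are additions of this sketch; the extra byte is needed because a word can be as long as the whole input and still needs its terminator.

```cpp
#include <cassert>

// String length without library functions, as in the question.
int length(const char* text) {
    const char* begin = text;
    while (*text) ++text;
    return static_cast<int>(text - begin);
}

char** split(const char* text, char delim) {
    assert(text != nullptr);
    int words = 1;
    int len = length(text);
    for (int i = 0; i < len; i++)
        if (text[i] == delim) words++;

    char** result = new char*[words + 1];
    int j = 0;
    for (int i = 0; i < words; i++) {
        result[i] = new char[len + 1];   // longest possible word plus '\0'
        int t = 0;
        while (text[j] != delim && text[j] != '\0') result[i][t++] = text[j++];
        result[i][t] = '\0';             // terminate each word
        j++;                             // skip the delimiter
    }
    result[words] = nullptr;             // last valid index is words, not words + 1
    return result;
}

void free_memory(char** text) {
    for (int i = 0; text[i] != nullptr; i++)
        delete[] text[i];                // matches new char[...]
    delete[] text;
}
```

The fourth point still applies: anything that must outlive free_memory has to be copied into separately allocated storage first.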
The way you free memory correctly is to use RAII:
Only use new and new[] in constructors
Pair those with delete and delete[] in the corresponding destructor
Use automatic storage duration objects as much as possible
If you are specifically not using std::string and std::vector etc, for reasons of learning pointers, you will end up writing some small number of classes that resemble string and vector and unique_ptr, and then you go about programming as if you were using the std versions.
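As a small illustration of that RAII pattern, here is a sketch of a class that owns a char buffer. CharBuffer and its append member are hypothetical, written without library string functions and assuming C++11 for the deleted copy operations.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal owner of a char buffer: new[] in the constructor,
// delete[] in the destructor, nothing for the caller to free.
class CharBuffer {
public:
    explicit CharBuffer(std::size_t size) : data_(new char[size + 1]), size_(size) {
        data_[0] = '\0';                       // start as an empty string
    }
    ~CharBuffer() { delete[] data_; }

    // Copying is disabled so two objects never delete the same buffer.
    CharBuffer(const CharBuffer&) = delete;
    CharBuffer& operator=(const CharBuffer&) = delete;

    const char* get() const { return data_; }
    std::size_t size() const { return size_; }

    // Appends a C string; returns false (and changes nothing) if it does not fit.
    bool append(const char* text) {
        assert(text != nullptr);
        std::size_t used = 0;
        while (data_[used]) ++used;
        std::size_t extra = 0;
        while (text[extra]) ++extra;
        if (used + extra > size_) return false;
        for (std::size_t i = 0; i <= extra; ++i) data_[used + i] = text[i];  // copies '\0' too
        return true;
    }

private:
    char* data_;
    std::size_t size_;
};
```

A CharBuffer declared as a local variable releases its memory automatically when it goes out of scope, including on early returns.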
You have two issues. First is that result is assigned to a memory location in words. Second, is that you're storing the result of strcat in words[i] which will likely not have enough room (see strcat documentation).
result = new char[length(in_string)+1]; // +1 for space for null char
result[0] = '\0'; // start empty so strcat has a terminated string to append to
// the old loop reversed the word order -- if you want to keep doing
// that, make this a descending loop
for(int i=0; i<size(words); i++)
if(i != j && i != index)
strcat(result, words[i]);
free_memory(words);
return result;
In the original code, result points into words, so when you free words, what result points to is freed as well. With a separate allocation as above, you then need to free result in main().
int main()
{
char text[] = "audi and volkswagen are the best car";
char * result = removeWord(text);
cout << result << endl;
delete[] result;
return 0;
}

Get string content from string pointer C++

I'm working on a project where I have to read the contents of a file and then analyze them. I'm having trouble getting the string out of a pointer that holds the address of the data I need.
string lePapel(vector<char> vec){
string *str, s;
int i, j = 0;
vector<char> aux;
aux.resize(6);
for (i = 57; i <= 62; i++){
aux[j] = vec[i];
j++;
}
str = new string[aux.size()];
for (i = 0; i < 6; i++){ str[i] = aux[i]; }
return s;
}
So, the file contains in the array positions from 57 to 62 the word: ABCB4, but when returning the string s my output is A only as expected because of the pointer.
The thing is that I have been trying to find a solution and storing the whole content from vec[57] to vec[64] into the string s and returning it, and the closest that I got to returning anything plausible was using a pointer.
So, now to my question, how can I iterate the *str pointer and copy the whole content to s and return it?
Thanks in advance
I'd suggest you to not use pointers on string in your case. The following code is probably what you want :
#include <iostream>
#include <string>
#include <vector>
using namespace std;
string lePapel(vector<char> vec){
int j = 0;
vector<char> aux;
aux.resize(6);
for (int i = 57; i <= 62; i++){
aux[j] = vec[j]; // reads vec[0..5] for this small demo; use vec[i] to read positions 57 to 62
j++;
}
string str;
str.reserve(6);
for (int i = 0; i < 6; i++){ str.push_back(aux[i]); }
return str;
}
int main() {
char x[6] = {'A', 'B', 'C', 'B', '4', ' '}; // six elements, because lePapel reads six characters
vector<char> vec(x, x + 6);
string s = lePapel(vec);
cout << s;
return 0;
}
About reserving space to your vector : c++ vector::reserve
Same for strings : reserve for strings
The dynamic array of string objects and the whole aux vector seem completely needless here (unless there's some other purpose for them in your code). Additionally, str is currently causing a memory leak because you never delete it when you're finished.
A much simpler approach is just to append the characters one-at-a-time to the s string object (assuming it's a std::string):
string lePapel(vector<char> vec) {
string s;
for (int i = 57; i <= 62; i++) {
s += vec[i];
}
return s;
}
There are various ways to make the code even shorter (and more efficient) than that though, if you really want to.
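One such shorter form is the iterator-range constructor of std::string. The sketch below is hedged: slice_to_string is a made-up name, and the inclusive first/last parameters stand in for the fixed positions 57 and 62 from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Copies vec[first] through vec[last] (inclusive) into a string in one step.
std::string slice_to_string(const std::vector<char>& vec, std::size_t first, std::size_t last) {
    assert(first <= last && last < vec.size());   // the range must lie inside vec
    return std::string(vec.begin() + first, vec.begin() + last + 1);
}
```

For the question's data the call would be slice_to_string(vec, 57, 62).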
EDIT: If you still need/want to iterate your dynamic array and concatenate the contents into s, here's how you could do it:
for (i = 0; i < 6; i++) s += str[i];
delete [] str; //<-- very important!
Short answer: you don't want a string *, you want a char *. What you created is an array of string objects. Each string object manages its own character data internally. Also, sizeof(std::string) (implementation-defined: 8 bytes with some older GCC libraries, commonly 24 or 32 bytes today) is a lot bigger than sizeof(char) (1 byte), so the second character you store is a whole string object away from the first instead of being adjacent to it.
There are a lot of other C++ style and safety concerns, but I'll stick with the question. ;)
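The difference between one string of characters and an array of string objects can be sketched as follows. Both function names are hypothetical, and the exact value of sizeof(std::string) depends on the implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Builds ONE std::string from `count` characters starting at `chars`.
// Inside the string, the characters are stored contiguously.
std::string from_chars(const char* chars, std::size_t count) {
    assert(chars != nullptr);
    return std::string(chars, count);
}

// Storage taken by the objects of a `new std::string[count]` array,
// not counting any character data they allocate separately.
std::size_t bytes_for_string_array(std::size_t count) {
    return count * sizeof(std::string);
}
```

Six characters in a single string need six contiguous bytes of character data, whereas new string[6] creates six full string objects that each hold one character.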