Splitting a string based on multiple string separators in c++

I'm writing my code in C++98 using only standard libraries.
I'm trying to write some code to split a string into multiple substrings, each delimited by the string "OK" or the string "ERROR".
Each substring should be put in the mysubstring array.
This is the code I wrote for a single separator:
void split_string()
{
for (unsigned short k=0;k<10;k++)
{
mysubstring[k]=""; //resetting all substrings
}
string separator = "OK";
size_t pos = 0;
unsigned short index=0;
while ((pos = str_to_split.find(separator)) != std::string::npos) {
mysubstring[index] = str_to_split.substr(0, pos); // text before the separator
str_to_split.erase(0, pos + separator.length()); // remove that text and the separator
index++;
}
}
This single separator version works fine.
Then I tried to upgrade to the two separators:
void split_string()
{
for (unsigned short k=0;k<10;k++)
{
mysubstring[k]=""; //resetting all substrings
}
string okseparator = "OK";
string koseparator = "ERROR";
size_t okpos = 0;
size_t kopos = 0;
unsigned short index=0;
while ((okpos = string_to_split.find(okseparator)) != std::string::npos)
{
while ((kopos = string_to_split.find(koseparator)) != std::string::npos)
{
if (okpos <= kopos)
{
mysubstring[index] = string_to_split.substr(0, okpos + okseparator.length());
string_to_split.erase(0, okpos + okseparator.length());
index++;
}
else
{
mysubstring[index] = string_to_split.substr(0, kopos + koseparator.length());
string_to_split.erase(0, kopos + koseparator.length());
index++;
}
}
}
while ((kopos = string_to_split.find(koseparator)) != std::string::npos)
{
mysubstring[index] = string_to_split.substr(0, kopos + koseparator.length());
string_to_split.erase(0, kopos + koseparator.length());
index++;
}
}
The idea here is that I stay inside the first while loop until all "OK" separators are consumed, then enter the last while loop to finish off any "ERROR" separators that are left.
The substrings should enter the mysubstring array in the same order in which they appear in the original string_to_split.
Sadly, I can't get it to work. Could you help me?
Example to test and verify:
#include <iostream>
#include <string>
using namespace std;
void split_string();
string str_to_split = "skdjfnsdjknfjk OK fkjsnfjksdnfjnsdjkfn ERROR skjdfnjksdnf OK sjkdnfjksdnfjERROR jnfjnsdjfnsjdknfjkn OK";
string mysubstring[10]; // assumed declaration: the size matches the reset loop in split_string()
int main()
{
split_string();
return 0;
}

Figured it out:
void split_string()
{
for (unsigned short k=0;k<10;k++)
{
mysubstring[k]=""; //resetting all substrings
}
string okseparator = "OK";
string koseparator = "ERROR";
size_t okpos = 0;
size_t kopos = 0;
unsigned short index=0;
while (1)
{
okpos = string_to_split.find(okseparator);
kopos = string_to_split.find(koseparator);
if (okpos < kopos)
{
mysubstring[index] = string_to_split.substr(0, okpos + okseparator.length());
string_to_split.erase(0, okpos + okseparator.length());
index++;
}
else if (okpos > kopos)
{
mysubstring[index] = string_to_split.substr(0, kopos);
string_to_split.erase(0, kopos + koseparator.length());
index++;
}
else
{
break;
}
}
}
I get the position for both separators but I consider only the closest one.
The while(1) loop terminates when both separators have the same position, which only happens when neither is found and both find calls return string::npos (the maximum size_t value).
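The same "closest separator wins" idea can be sketched without the global array and without erasing from the input. This is only a sketch under a few assumptions: the function name split_on_ok_or_error is made up for illustration, std::vector replaces the fixed 10-element array so there is no index to overflow, and every piece keeps its trailing separator (as in the two-separator attempt above). It sticks to C++98 features.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` after every "OK" or "ERROR",
// whichever occurs first from the current position.
// Each piece keeps its trailing separator; text after the last
// separator is not returned.
std::vector<std::string> split_on_ok_or_error(const std::string& text)
{
    const std::string ok = "OK";
    const std::string ko = "ERROR";
    std::vector<std::string> pieces;
    std::size_t start = 0;

    while (true)
    {
        const std::size_t okpos = text.find(ok, start);
        const std::size_t kopos = text.find(ko, start);
        if (okpos == std::string::npos && kopos == std::string::npos)
        {
            break; // no separator left
        }

        // npos is the largest size_t value, so a separator that was
        // not found can never be the smaller position.
        const bool ok_first = okpos < kopos;
        const std::size_t end = ok_first ? okpos + ok.length()
                                         : kopos + ko.length();
        pieces.push_back(text.substr(start, end - start));
        start = end; // continue searching right after the separator
    }
    return pieces;
}
```

Searching from a moving start offset leaves the input string untouched, so the caller can still inspect it after the split.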

Related

C++ program to count repeated words in a cstring

I've been working on a C++ program. I've worked out the logic, but I can't get it to run. The question is:
Task: Write a program, using functions only, with the following features.
1. The program reads paragraph(s) from a file and stores them in a string.
2. The program then counts the occurrences of each word in the paragraph(s) and stores every word with its number of occurrences.
3. If a word appears more than once in the whole string, it should be stored only once, along with its total number of occurrences.
4. The output described above (in part 3) must be stored in a new file.
Sample input:
is the is and the is and the and is and only that is
Sample output:
is 5
the 3
and 4
only 1
that 1
I'll cut straight to the occurrence-counting function that I've written.
My logic is to store each token in a character array, then compare that array with the main character array and increment a counter:
void occurances() {
char* string = getInputFromFile();
char separators[] = ",.\n\t ";
char* token;
char* nextToken;
char* temp[100];
token = strtok_s(string, separators, &nextToken);
cout << temp;
int counter = 0;
int i = 0;
while ((token != NULL)) {
temp[i] = token;
i++;
for (int i = 0; i < strlen(string); i++) {
for (int j = 0; j < 100; j++) {
if ((strcmp(token, *temp)) == 0) {
counter++;
}
}
cout << temp << " : " << counter << endl;
}
if (token != NULL) {
token = strtok_s(NULL, separators, &nextToken);
}
}
}
I know this code is preposterous, but could anyone be kind enough to give me a clue? I'm new to C++. Thank you.

If you store the tokens in an array, that array has to grow dynamically, because the number of tokens is not known in advance. And according to the task description, you cannot use C++ standard containers, so the dynamic array has to be implemented manually, for example:
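#include <cstdint> // SIZE_MAX
#include <cstdlib> // std::realloc, std::free
#include <cstring> // std::strcmp, std::strtok
#include <iostream>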
std::size_t increase_capacity_value(std::size_t capacity) {
if (capacity == 0) {
return 1;
}
else if (capacity < (SIZE_MAX / 2)) {
return capacity * 2;
}
return SIZE_MAX;
}
bool increase_array_capacity(char**& tokens_array, std::size_t*& tokens_count, std::size_t& capacity) {
const std::size_t new_capacity = increase_capacity_value(capacity);
if (new_capacity <= capacity) {
return false;
}
const std::size_t tokens_array_byte_size = new_capacity * sizeof(char*);
char** const new_tokens_array = static_cast<char**>(std::realloc(tokens_array, tokens_array_byte_size));
if (new_tokens_array == nullptr) {
return false;
}
tokens_array = new_tokens_array;
const std::size_t tokens_count_byte_size = new_capacity * sizeof(std::size_t);
std::size_t* const new_tokens_count = static_cast<std::size_t*>(std::realloc(tokens_count, tokens_count_byte_size));
if (new_tokens_count == nullptr) {
return false;
}
tokens_count = new_tokens_count;
capacity = new_capacity;
return true;
}
bool add_token(char* token, char**& tokens_array, std::size_t*& tokens_count, std::size_t& array_size, std::size_t& array_capacity) {
if (array_size == array_capacity) {
if (!increase_array_capacity(tokens_array, tokens_count, array_capacity)) {
return false;
}
}
tokens_array[array_size] = token;
tokens_count[array_size] = 1;
++array_size;
return true;
}
std::size_t* get_token_count_storage(char* token, char** tokens_array, std::size_t* tokens_count, std::size_t array_size) {
for (std::size_t i = 0; i < array_size; ++i) {
if (std::strcmp(token, tokens_array[i]) == 0) {
return tokens_count + i;
}
}
return nullptr;
}
bool process_token(char* token, char**& tokens_array, std::size_t*& tokens_count, std::size_t& array_size, std::size_t& array_capacity) {
std::size_t* token_count_ptr = get_token_count_storage(token, tokens_array, tokens_count, array_size);
if (token_count_ptr == nullptr) {
if (!add_token(token, tokens_array, tokens_count, array_size, array_capacity)) {
return false;
}
}
else {
++(*token_count_ptr);
}
return true;
}
int main() {
char string[] = "is the is and the is and the and is and only that is";
char separators[] = ",.\n\t ";
std::size_t token_array_capacity = 0;
std::size_t token_array_size = 0;
char** tokens_array = nullptr;
std::size_t* tokens_count = nullptr;
char* current_token = std::strtok(string, separators);
while (current_token != nullptr) {
if (!process_token(current_token, tokens_array, tokens_count, token_array_size, token_array_capacity)) {
break;
}
current_token = std::strtok(nullptr, separators);
}
// print the report only if all tokens were processed
if (current_token == nullptr) {
for (std::size_t i = 0; i < token_array_size; ++i) {
std::cout << tokens_array[i] << " : " << tokens_count[i] << std::endl;
}
}
std::free(tokens_array);
std::free(tokens_count);
}

Okay, what if I want to store each token once in an array, and then replace it with a new word while deleting the duplicates in the character array?
That is also a possible solution. In the general case, however, the memory for the current token also has to be allocated dynamically, because the token lengths are not known in advance either:
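#include <cstddef> // std::size_t
#include <cstdlib> // std::realloc, std::free
#include <cstring> // std::strpbrk, std::strspn, std::strlen, std::strcpy, std::strcmp
#include <iostream>
void replace_chars(char* str, const char* chars_to_replace) {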
while (str && *str != '\0') {
str = std::strpbrk(str, chars_to_replace);
if (str == nullptr) {
break;
}
const std::size_t number_of_delimiters = std::strspn(str, chars_to_replace);
for (std::size_t i = 0; i < number_of_delimiters; ++i) {
str[i] = '\0';
}
str += number_of_delimiters;
}
}
bool keep_token(char*& token_storage, const char* new_token) {
if (new_token == nullptr) {
return false;
}
const std::size_t current_token_len = token_storage ? std::strlen(token_storage) : 0;
const std::size_t requried_token_len = std::strlen(new_token);
if (token_storage == nullptr || current_token_len < requried_token_len) {
token_storage =
static_cast<char*>(std::realloc(token_storage, (requried_token_len + 1) * sizeof(char)));
if (token_storage == nullptr) {
return false;
}
}
std::strcpy(token_storage, new_token);
return true;
}
std::size_t count_tokens_and_replace(char* str, std::size_t str_len, const char* token) {
std::size_t number_of_tokens = 0;
std::size_t i = 0;
while (i < str_len) {
while (i < str_len && str[i] == '\0') ++i; // skip zeroed characters without reading past the end
if (std::strcmp(str + i, token) == 0) {
replace_chars(str + i, token);
++number_of_tokens;
}
i += std::strlen(str + i);
}
return number_of_tokens;
}
int main() {
char string[] = "is the is and the is and the and is and only that is";
char separators[] = ",.\n\t ";
const std::size_t string_len = std::strlen(string);
replace_chars(string, separators);
std::size_t i = 0;
char* token = nullptr;
while (true) {
while (i < string_len && string[i] == '\0') ++i;
if (i == string_len || !keep_token(token, string + i)) break;
std::cout << token << " : " << count_tokens_and_replace(string + i, string_len - i, token) << std::endl;
}
std::free(token);
}
However, if it is known that a token can never be longer than N characters, a static char array can hold the current token, which removes the dynamic memory allocation from the code.
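A minimal sketch of that fixed-size variant follows. The bound MAX_TOKEN_LEN and the name keep_token_fixed are assumptions made for illustration; they stand in for the "N" mentioned above and for the keep_token function, and are not part of the original answer.

```cpp
#include <cstddef>
#include <cstring>

// Assumed upper bound on the token length (the "N" from the text).
const std::size_t MAX_TOKEN_LEN = 63;

// Hypothetical replacement for keep_token: copies new_token into a
// caller-provided fixed-size buffer instead of reallocating.
// Returns false if there is no token or it does not fit.
bool keep_token_fixed(char (&storage)[MAX_TOKEN_LEN + 1], const char* new_token)
{
    if (new_token == nullptr || std::strlen(new_token) > MAX_TOKEN_LEN)
    {
        return false;
    }
    std::strcpy(storage, new_token); // safe: the length was checked above
    return true;
}
```

In main, the token would then be declared as char token[MAX_TOKEN_LEN + 1] and the final std::free(token) call would no longer be needed.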

pattern matching (codejam round 1A previous year) solution not working

I am trying the previous year's Code Jam Round 1A question
link to question
I have submitted this code (start reading from the main function, for ease):
#include <bits/stdc++.h>
using namespace std;
#define range(t) for (int i = 0; i < t; i++)
#define rangeG(i, t) for (i = 0; i < t; i++)
#define printVec(vec) \
for (auto c : vec) \
{ \
cout << c << endl; \
}
vector<string> separate(string s)
{
vector<string> result;
range(s.size())
{
if (s[i] == '*')
{
string temp = s.substr(0, i + 1);
if (temp.size() > 1)
{
result.push_back(temp);
}
s = s.substr(i, s.size());
i = 0;
}
else if (i == (s.size() - 1))
{
string temp = s.substr(0, i + 1);
result.push_back(temp);
s = s.substr(i, s.size());
}
}
return result;
}
void removeAsterisk(string &s)
{
s.erase(remove(s.begin(), s.end(), '*'), s.end());
}
bool setStart(string s, string &start)
{
bool possible = 1;
removeAsterisk(s);
range(min(s.size(), start.size()))
{
if (s[i] != start[i])
{
possible = 0;
}
}
if (possible)
{
if (s.size() >= start.size())
{
start = s;
}
}
return possible;
}
bool setEnd(string s, string &end)
{
bool possible = 1;
removeAsterisk(s);
range(min(s.size(), end.size()))
{
if (s[s.size() - 1 - i] != end[end.size() - 1 - i])
{
possible = 0;
}
}
if (possible)
{
if (s.size() >= end.size())
{
end = s;
}
}
return possible;
}
void solve()
{
int n;
cin >> n;
vector<string> allS;
bool possible = 1;
string start = "";
string end = "";
string middle = "";
string result = "";
while (n--)
{
string str;
cin >> str;
if (count(str.begin(), str.end(), '*') == 0)
{
result = str;
}
vector<string> temp = separate(str);
for (string s : temp)
{
if (s[0] != '*')
{
possible = setStart(s, start);
}
if (s[s.size() - 1] != '*')
{
possible = setEnd(s, end);
}
if (possible && count(s.begin(), s.end(), '*') == 0)
{
result = s;
break;
}
if (s[0] == '*' && s[s.size() - 1] == '*')
{
removeAsterisk(s);
middle += s;
}
}
}
if (possible)
{
if (result.size() == 0)
{
result = start + middle + end;
}
cout << result << "\n";
}
else
{
cout << "*\n";
}
}
int main()
{
ios_base::sync_with_stdio(false);
cin.tie(NULL);
int t = 0;
cin >> t;
range(t)
{
cout << "Case #" << i + 1 << ": ";
solve();
}
return 0;
}
It seems correct to me and I have tested it many times on many examples, but it fails test set 1 (exactly one * (asterisk) character, which is always the first character of the string). Can anyone tell me what's wrong?
For help, you can consider the code of the first-ranked contestant here (it has all solutions; check only the "pattern matching" task). I know that the wrong answer is an edge case, and if it passes test set 1 then it will pass the others.

How do I find the size of a char array?

How should I go about finding the length of a char array in C++? I've tried two methods already, but they both have resulted in the wrong number of characters in the array. I've used strlen and the sizeof operator so far, to no avail.
void countOccurences(char *str, string word)
{
char *p;
string t = "true";
string f = "false";
vector<string> a;
p = strtok(str, " ");
while (p != NULL)
{
a.push_back(p);
p = strtok(NULL, " ");
}
int c = 0;
for (int i = 0; i < a.size(); i++)
{
if (word == a[i])
{
c++;
}
}
int length = sizeof(str); //This is where I'm having the problem
string result;
cout << length << "\n";
if (length % 2 != 0)
{
if (c % 2 == 0)
{
result = "False";
}
else
{
result = "True";
}
}
else
{
if (c % 2 == 0)
{
result = "True";
}
else
{
result = "False";
}
}
if (strlen(str) != 0)
{
cout << result;
}
}
int boolean()
{
char str[1000];
cin.getline(str, sizeof(str));
string word = "not";
countOccurences(str, word);
return 0;
}

sizeof(str) is wrong. It gives you the size of a pointer (str is a pointer), which is a fixed number, normally either 4 or 8 depending on your platform.
std::strlen(str) is correct, but strtok writes a \0 over each delimiter it finds in your array before you try to obtain the size. strlen stops at the first \0 and gives you the number of characters preceding it.
Call strlen before strtok and save its return value in a variable.
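The ordering can be sketched as below. The function count_tokens is a hypothetical stand-in for the tokenizing part of countOccurences, not code from the question; it only shows the length being measured before strtok touches the buffer.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: counts the space-separated tokens in `str` and
// reports, through `length`, how many characters the buffer held
// before tokenizing. std::strtok modifies `str` in place.
std::size_t count_tokens(char* str, std::size_t& length)
{
    // Measure first: strtok overwrites delimiters with '\0'.
    length = std::strlen(str);

    std::size_t tokens = 0;
    for (char* p = std::strtok(str, " "); p != NULL; p = std::strtok(NULL, " "))
    {
        ++tokens;
    }
    return tokens;
}
```

Calling std::strlen on the same buffer after this function returns only reports the length of the first token, which is the effect described above.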

Here you can find a modern C++ solution:
#include <cstring> // strlen
#include <iostream>
#include <string_view>
#include <string>
#include <type_traits>
template<typename String>
inline std::size_t StrLength(String&& str)
{
using PureString = std::remove_const_t<std::remove_reference_t<String>>; // strip the reference first, otherwise const is not removed
if constexpr(std::is_same_v<char, PureString>){
return 1;
}
else
if constexpr(std::is_same_v<char*, PureString>){
return strlen(str);
}
else{
return str.length();
}
}
template<
typename String,
typename Lambda,
typename Delim = char
>
void ForEachWord(String&& str, Lambda&& lambda, Delim&& delim = ' ')
{
using PureStr = std::remove_reference_t<std::remove_reference_t<String>>;
using View = std::basic_string_view<typename PureStr::value_type>;
auto start = 0;
auto view = View(str);
while(true)
{
auto wordEndPos = view.find_first_of(delim, start);
auto word = view.substr(start, wordEndPos-start);
if (word.length() > 0){
lambda(word);
}
if (wordEndPos == PureStr::npos)
{
break;
}
start = wordEndPos + StrLength(delim);
}
}
int main() {
std::string text = "This is not a good sentence.";
auto cnt = 0;
ForEachWord(
text,
[&](auto word)
{
//some code for every word... like counting or printing
if (word == "not" ){
++cnt;
}
},
' '
);
std::cout << cnt << "\n";
}

The "end of a string" is marked by the char '\0'; check for that character to stop the search.
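As a rough illustration, a hand-written loop that stops at the terminator looks like this. The name length_until_terminator is made up for the example; in practice std::strlen does the same job.

```cpp
#include <cstddef>

// Hypothetical hand-written equivalent of std::strlen:
// walk the array until the '\0' terminator is reached.
std::size_t length_until_terminator(const char* str)
{
    std::size_t n = 0;
    while (str[n] != '\0')
    {
        ++n;
    }
    return n;
}
```

Like std::strlen, this stops at the first '\0', so it reports a shorter length once strtok has written terminators into the buffer.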

Delete first and last 'X' character from character array

I'm trying to delete the first 'w' and the last 'w' from a string.
I managed to delete the first 'w' but couldn't delete the last one. Here is my code:
char str1[80], *pstr1, *pstr2;
cout << "Enter a String:\n";
gets_s(str1);
pstr1 = str1;
pstr2 = new char[strlen(str1)];
int n = strlen(str1) + 1, k = 0, i = 0;
bool s = true;
while (k < n+1)
{
if (strncmp((pstr1 + k), "w", 1) != 0)
{
*(pstr2 + i) = *(pstr1 + k);
i++;
k++;
}
else if(s == true)
{
k++;
s = false;
}
else
{
*(pstr2 + i) = *(pstr1 + k);
i++;
k++;
}
}

Make your life easy and use std::string with find_first_of, find_last_of and erase.
#include <string>
void erase_first_of(std::string& s, char c)
{
auto pos = s.find_first_of(c);
if (pos != std::string::npos)
{
s.erase(pos, 1);
}
}
void erase_last_of(std::string& s, char c)
{
auto pos = s.find_last_of(c);
if (pos != std::string::npos)
{
s.erase(pos, 1);
}
}
#include <iostream>
int main()
{
std::string s = "hellow, worldw\n";
erase_first_of(s, 'w');
erase_last_of(s, 'w');
std::cout << s;
}

use std::regex to find contents of a function

Let's say I have a main function with some arbitrary code:
void main(){
//Some random code
int a = 5;
int b = a + 7;
}
and the text of this function is stored inside an std::string:
std::string mystring("void main(){ //Some random code int a = 5; int b = a + 7;}");
I want to use std::regex to extract the body of the function. So the result I would get back is:
"//Some random code int a = 5; int b = a + 7;"
My issue is that I do not know how to write the regular expression to get what I want. Here is the code I have right now:
std::string text("void main(){ //Some random code int a = 5; int b = a + 7;}");
std::regex expr ("void main()\\{(.*?)\\}");
std::smatch matches;
if (std::regex_match(text, matches, expr)) {
for (int i = 1; i < matches.size(); i++) {
std::string match (matches[i].first, matches[i].second);
std::cout << "matches[" << i << "] = " << match << std::endl;
}
}
My regex is completely off and returns no matches. What should my regex be for this to work?

As discussed in the comments OP only wants to "extract the text inside the function body, regardless of what that text is".
@OP:
Your regex is wrong because you don't escape the parentheses in main().
Changing the regex to "void main\\(\\)\\{(.*?)\\}" will work.
I also recommend using size_t for i in your for loop so that you don't compare signed with unsigned values (std::smatch::size() returns size_t).
#include <iostream>
#include <regex>
int main()
{
std::string text("void main(){ //Some random code int a = 5; int b = a + 7;}");
std::regex expr("void main\\(\\)\\{(.*?)\\}");
std::smatch matches;
if (std::regex_match(text, matches, expr)) {
for (size_t i = 1; i < matches.size(); i++) {
std::string match(matches[i].first, matches[i].second);
std::cout << "matches[" << i << "] = " << match << std::endl;
}
}
}
Output:
matches[1] = //Some random code int a = 5; int b = a + 7;
This solution fails for the input "void main(){ while(true){ //Some random code int a = 5; int b = a + 7; } }"
The easiest solution to this would be to change the regex to "^void main\\(\\)\\{(.*?)\\}$" but that requires the input to start with "void main(){" and end with "}"
As proposed by Revolver_Ocelot you can also add some whitespace matching into the regex to make it a little bit more flexible.
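How the lazy group treats nested braces depends on which matching function is used. With std::regex_search the group ends at the first closing brace. std::regex_match has to consume the whole input, so the group stretches to the last one. The helper below is a hypothetical sketch for trying both; the name capture_body and the whole_input flag are not from the answer.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns capture group 1 of the pattern, or an
// empty string if nothing matches. `whole_input` selects
// std::regex_match (entire input must match) over std::regex_search.
std::string capture_body(const std::string& text, bool whole_input)
{
    const std::regex expr("void main\\(\\)\\{(.*?)\\}");
    std::smatch m;
    const bool found = whole_input ? std::regex_match(text, m, expr)
                                   : std::regex_search(text, m, expr);
    return found ? m[1].str() : std::string();
}
```

For the input "void main(){ while(true){ a; } }", the search variant captures " while(true){ a; " and the whole-input variant captures " while(true){ a; } ". Neither variant counts braces, so trailing text after the function still breaks the match.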

As suggested, for your use case it would probably be best to rely on plain string search and brace matching.
#include <iostream>
#include <string>
std::string getBody(const std::string& functionDef, const std::string& text)
{
size_t pos = 0;
do
{
if ((pos = text.find(functionDef, pos)) == std::string::npos)
continue;
pos += functionDef.length();
size_t firstSemicolon = text.find(";", pos);
size_t firstOpen = text.find("{", pos);
size_t firstClose = text.find("}", pos);
if (firstSemicolon != std::string::npos && firstSemicolon < firstOpen) //Only function declaration
continue;
if (firstOpen == std::string::npos || firstClose == std::string::npos || firstClose < firstOpen) //Mismatch
continue;
size_t bodyStart = pos = firstOpen + 1;
size_t bracesCount = 1;
do
{
firstOpen = text.find("{", pos);
firstClose = text.find("}", pos);
if (firstOpen == std::string::npos && firstClose == std::string::npos)//Mismatch
{
pos = std::string::npos;
continue;
}
//npos is always larger
if (firstOpen < firstClose)
{
bracesCount++;
pos = firstOpen + 1;
}
else if (firstOpen > firstClose)
{
bracesCount--;
if (bracesCount == 0)
{
size_t bodySize = firstClose - bodyStart;
return text.substr(bodyStart, bodySize);
}
pos = firstClose + 1;
}
else
{
//Something went terribly wrong...
pos = std::string::npos;
continue;
}
} while (pos != std::string::npos);
}
while (pos != std::string::npos);
return std::string();
}
int main()
{
std::string text("void main(); int test(); void main(){ while(true){ //Some {random} code int a = 5; int b = a + 7; } } int test(){ return hello; } ");
std::cout << getBody("void main()", text) << std::endl;
std::cout << getBody("int test()", text) << std::endl;
}
Output:
while(true){ //Some {random} code int a = 5; int b = a + 7; }
return hello;
The code can also handle newlines and skips function declarations. I tried to write it as clearly as possible.
If there are still questions feel free to ask.
