How to split a string? [duplicate] - c++

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Is There A Built-In Way to Split Strings In C++?
I have a string variable that contains:
string var = "this_is.that_alos"; How can I extract only "that_alos" from it?
That is, I need only the second part, "that_alos".

std::string var = "this_is.that_alos";
std::string part = var.substr(var.find_first_of('.'));
If you want to exclude the '.', just add 1:
std::string part = var.substr(var.find_first_of('.') + 1);
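The snippets above assume the string contains a '.'. If it does not, find_first_of returns std::string::npos: substr(npos) throws std::out_of_range, and npos + 1 wraps around to 0, so the second form silently returns the whole string. A minimal sketch that checks for this first (the name AfterFirstDot and the choice to return an empty string are assumptions made for illustration):

```cpp
#include <string>

// Hypothetical helper: returns the text after the first '.',
// or an empty string if there is no '.' at all.
std::string AfterFirstDot(const std::string& s) {
    const std::size_t pos = s.find('.');
    if (pos == std::string::npos) {
        return std::string();  // assumed policy: no dot means no second part
    }
    return s.substr(pos + 1);  // skip the '.' itself
}
```

Whether "no dot" should give an empty string, the whole string, or an error depends on the caller.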

In C, the usual function is strchr(3):
char * second_half = strchr(var, '.');
Of course, this is just a new pointer to the middle of the string. If you start writing into this string, it changes the one and only copy of it, which is also pointed to by char *var. (Again, assuming C.)

std::string var = "this_is.that_alos";
std::string::iterator start = std::find(var.begin(), var.end(), '.');
std::string second(start + 1, var.end());
But you should first check that start is not var.end(), which is what std::find returns if var doesn't contain any dot; in that case start + 1 is invalid.

If you don't want to use string indexing or iterators, you can use std::getline:
std::string var = "this_is.that_alos";
std::istringstream iss(var);
std::string first, second;
std::getline(iss, first, '.');
std::getline(iss, second, '.');

The easiest solution would be to use boost::regex (soon to be
std::regex). It's not clear what the exact requirements are,
but they should be easy to specify using a regular expression.
Otherwise (and using regular expressions could be considered
overkill for this), I'd use the usual algorithms, which work for
any sequence. In this case, std::find will find the first '.',
use it with reverse iterators to find the last, etc.
--
James Kanze
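The std::find approach with reverse iterators mentioned above could be sketched as follows. The function name AfterLastDot is an assumption, as is the choice to return the whole string when there is no '.'.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: returns the text after the last '.'.
// If there is no '.', std::find returns rend(), whose base() is begin(),
// so the whole string is returned.
std::string AfterLastDot(const std::string& s) {
    auto rit = std::find(s.rbegin(), s.rend(), '.');
    // rit.base() is the forward iterator one past the '.' that was found.
    return std::string(rit.base(), s.end());
}
```

Searching forward with s.begin() and s.end() instead finds the first '.', as in the earlier iterator answer.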

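Use strtok to split the string, with "." as the delimiter.
For example:
std::string var = "this_is.that_alos";
// strtok (from <cstring>) writes '\0' over each delimiter, so var is modified in place
char* first = std::strtok(&var[0], ".");
char* second = std::strtok(nullptr, "."); // null if there is no second part
std::string SubPart = second ? second : "";
Hope it helps.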

Related

Quick regex_search/replace, or clear indication of replacement?

I must browse a collection of strings to replace a pattern and save the changes.
The saving operation is (very) expensive and out of my hands, so I would like to know beforehand if the replacement did anything.
I can use std::regex_search to learn whether the pattern is present in my input, and use capture groups to store details in a std::smatch. std::regex_replace does not seem to explicitly tell me whether it did anything.
The patterns and strings are arbitrarily long and complicated; running regex_replace after a regex_search seems wasteful.
I can directly compare the input and output to search for a discrepancy but that too is uncomfortable.
Is there a simple way either to observe regex_replace to determine its impact, or to use the smatch filled by regex_search to do a faster replacement?
Thanks in advance.
No, regex_replace doesn't provide this information, and yes, you can do it with a regex_search loop.
For example like this:
std::regex pattern("...");
std::string replacement_format = "...";
std::string input = "......"; // a very, very long string
std::string output, replacement;
std::smatch match;
auto begin = input.cbegin();
int replacements = 0;
while (std::regex_search(begin, input.cend(), match, pattern)) {
    output += match.prefix();
    replacement = match.format(replacement_format);
    if (match[0] != replacement) {
        replacements++;
    }
    output += replacement;
    begin = match.suffix().first;
}
output.append(begin, input.cend());
if (replacements > 0) {
    // process output ...
}
As regex_replace creates a copy of your string you could simply compare the replaced string with the original one and only "store" the new one if they differ.
For C++14 it seems that regex_replace returns an iterator to the last place it has written to:
https://www.cplusplus.com/reference/regex/regex_replace/ says: "Versions 5 and 6 return an iterator that points to the element past the last character written to the sequence pointed by out."
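The compare-after-replace idea could look roughly like this. It is only a sketch: the helper name ReplaceIfChanged and its in-place interface are assumptions, and it still pays for one full copy of the string.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: runs regex_replace and reports whether the result
// differs from the input. `text` is only overwritten when something changed.
bool ReplaceIfChanged(std::string& text,
                      const std::regex& pattern,
                      const std::string& format) {
    std::string replaced = std::regex_replace(text, pattern, format);
    if (replaced == text) {
        return false;  // nothing to save
    }
    text = std::move(replaced);
    return true;
}
```

The expensive save can then be skipped whenever the function returns false. A match whose replacement is identical to the matched text also counts as "unchanged" here, which is the same criterion the regex_search loop above uses.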

Allow user to pass a separator character by doubling it in C++

I have a C++ function that accepts strings in below format:
<WORD>: [VALUE]; <ANOTHER WORD>: [VALUE]; ...
This is the function:
std::wstring ExtractSubStringFromString(const std::wstring String, const std::wstring SubString) {
    std::wstring S = std::wstring(String), SS = std::wstring(SubString), NS;
    size_t ColonCount = 0, SeparatorCount = 0;
    WCHAR Separator = L';';
    ColonCount = std::count(S.begin(), S.end(), L':');
    SeparatorCount = std::count(S.begin(), S.end(), Separator);
    if ((SS.find(Separator) != std::wstring::npos) || (SeparatorCount > ColonCount))
    {
        // SEPARATOR NEED TO BE ESCAPED, BUT DON'T KNOW TO DO THIS.
    }
    if (S.find(SS) != std::wstring::npos)
    {
        NS = S.substr(S.find(SS) + SS.length() + 1);
        if (NS.find(Separator) != std::wstring::npos) { NS = NS.substr(0, NS.find(Separator)); }
        if (NS[NS.length() - 1] == L']') { NS.pop_back(); }
        return NS;
    }
    return L"";
}
The function above correctly outputs MANGO if I use it like this:
ExtractSubStringFromString(L"[VALUE: MANGO; DATA: NOTHING]", L"VALUE")
However, when the value itself contains separators, which I tried to escape by doubling them (;;), I still get MANGO instead of ;MANGO;:
ExtractSubStringFromString(L"[VALUE: ;;MANGO;;; DATA: NOTHING]", L"VALUE")
Here, the value assigner is the colon and the separator is the semicolon. I want to let users pass literal colons and semicolons to my function by doubling them, just as double quotes, single quotes, and other characters are escaped in many scripting and programming languages and in the parameters of many command-line programs.
I thought hard but couldn't think of a way to do it. Can anyone please help me with this?
Thanks in advance.
You should search the string for ;; and replace it with a temporary filler character or string, which can later be found and replaced with the real value.
So basically:
1) Search through the string and replace all instances of ;; with \tempFill. It is best to pick a combination of characters that is highly unlikely to appear in the original string.
2) Parse the string.
3) Replace all instances of \tempFill with ;
Note: It would be wise to assert that your \tempFill (or whatever you choose as the filler) is not in the original string, to prevent a bug. You could use a character such as \n and make sure there are none in the original string.
Disclaimer:
I can almost guarantee there are cleaner and more efficient ways to do this, but this is the simplest way.
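The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the asker's function: the names ReplaceAll and ExtractValue are hypothetical, the filler is assumed to be the control character L'\x01', only doubled semicolons are handled (not doubled colons), and a run such as ;;; is read left to right as one escaped semicolon followed by a real separator.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replaces every occurrence of `from` with `to`,
// scanning left to right.
std::wstring ReplaceAll(std::wstring s, const std::wstring& from, const std::wstring& to) {
    if (from.empty()) return s;
    for (std::size_t pos = 0; (pos = s.find(from, pos)) != std::wstring::npos; pos += to.size()) {
        s.replace(pos, from.size(), to);
    }
    return s;
}

// Hypothetical helper: returns the value stored under `key`, or L"" if absent.
std::wstring ExtractValue(std::wstring text, const std::wstring& key) {
    const std::wstring filler(1, L'\x01');  // assumed never to occur in real input
    assert(text.find(filler) == std::wstring::npos);

    // Strip the enclosing brackets, if present.
    if (!text.empty() && text.front() == L'[') text.erase(0, 1);
    if (!text.empty() && text.back() == L']') text.pop_back();

    // 1) Hide the escaped separators behind the filler.
    text = ReplaceAll(text, L";;", filler);

    // 2) Parse: walk the fields between the remaining (real) separators.
    std::size_t start = 0;
    while (start <= text.size()) {
        std::size_t end = text.find(L';', start);
        if (end == std::wstring::npos) end = text.size();
        std::wstring field = text.substr(start, end - start);
        field.erase(0, field.find_first_not_of(L' '));  // trim leading spaces
        const std::size_t colon = field.find(L':');
        if (colon != std::wstring::npos && field.compare(0, colon, key) == 0) {
            std::wstring value = field.substr(colon + 1);
            value.erase(0, value.find_first_not_of(L' '));
            // 3) Turn the filler back into a literal semicolon.
            return ReplaceAll(value, filler, L";");
        }
        start = end + 1;
    }
    return L"";
}
```

With the input from the question, ExtractValue(L"[VALUE: ;;MANGO;;; DATA: NOTHING]", L"VALUE") yields ;MANGO;.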
First, as the substring does not need to be split, I assume that it does not need to be pre-processed to filter escaped separators.
Then, on the main string, the simplest way IMHO is to filter the escaped separators as you search for them in the string. Pseudocode (assuming the enclosing [] have been removed):
last_index = begin_of_string
index_of_current_substring = begin_of_string
loop: search for a separator starting at last_index - if not found, exit loop
    ok: found one at ix
    if char at ix+1 is a separator (meaning we have an escaped separator)
        remove character at ix from string by copying all characters after it one step to the left
        last_index = ix+1
        continue loop
    else this is a true separator
        search for a colon in [ index_of_current_substring, ix [
        if not found: error, incorrect string
        say found at c
        compare key_string with string[ index_of_current_substring, c [
        if equal - ok, we found the key
            value is string[ c+2 (skip a space after the colon), ix [
            return value - search is finished
        else - it is not our key, just continue searching
            index_of_current_substring = ix+1
            last_index = index_of_current_substring
            continue loop
It should now be easy to convert that to C++.
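One possible C++ rendering of the pseudocode is sketched below. It is not a line-by-line translation: instead of shifting characters left inside the input, it copies each field into a scratch string while collapsing doubled separators, and it treats the end of the input as a final separator. The name FindValue, the narrow std::string type, and the space trimming around keys and values are assumptions.

```cpp
#include <string>

// Hypothetical helper. Assumes the enclosing [] have already been removed.
// Returns the value for `key`, or "" if the key is not found.
std::string FindValue(const std::string& text, const std::string& key,
                      char sep = ';', char assign = ':') {
    std::string field;  // current field, with escaped separators collapsed
    std::size_t i = 0;
    while (i <= text.size()) {
        const bool atEnd = (i == text.size());
        if (!atEnd && text[i] == sep && i + 1 < text.size() && text[i + 1] == sep) {
            field += sep;  // doubled separator: keep one copy, skip both
            i += 2;
            continue;
        }
        if (atEnd || text[i] == sep) {  // true separator, or end of input
            const std::size_t first = field.find_first_not_of(' ');
            const std::size_t colon = field.find(assign);
            if (first != std::string::npos && colon != std::string::npos &&
                field.compare(first, colon - first, key) == 0) {
                const std::size_t valueStart = field.find_first_not_of(' ', colon + 1);
                return valueStart == std::string::npos ? std::string()
                                                       : field.substr(valueStart);
            }
            field.clear();  // not our key: start the next field
            ++i;
            continue;
        }
        field += text[i++];
    }
    return std::string();
}
```

Because escapes are resolved during the scan, no filler character is needed and nothing has to be asserted about the input.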

Remove spaces from string before period and comma

I could have a string like:
During this time , Bond meets a stunning IRS agent , whom he seduces .
I need to remove the extra spaces before the comma and before the period throughout my string. I tried copying it into a vector of chars, skipping push_back only when the current char was " " and the following char was "." or ",", but it did not work. I know there is probably a simple way to do it, maybe using trim(), find(), or erase(), or some kind of regex, but I am not very familiar with regex.
A solution could be (using regex library):
std::string fix_string(const std::string& str) {
    static const std::regex rgx_pattern("\\s+(?=[\\.,])");
    std::string rtn;
    rtn.reserve(str.size());
    std::regex_replace(std::back_insert_iterator<std::string>(rtn),
                       str.cbegin(),
                       str.cend(),
                       rgx_pattern,
                       "");
    return rtn;
}
This function takes a string as input and "fixes the spaces problem".
In a loop, search for the string " ," and, if you find one, replace it with ",":
std::string str = "...";
while( true ) {
    auto pos = str.find( " ," );
    if( pos == std::string::npos )
        break;
    str.replace( pos, 2, "," );
}
Do the same for " .". If you need to handle other whitespace characters such as tabs, use a regex with an appropriate character group.
I don't know how to use regex in C++, and I'm not sure whether C++ supports PCRE regex; anyway, I'm posting this answer for the regex itself (I can delete it if it doesn't work in C++).
You can use this regex:
\s+(?=[,.])
First, there is no need to use a vector of char: you could very well do the same using a std::string.
Then, your approach can't work because your copy is independent of the position of the space. Unfortunately, you have to remove only the spaces around the punctuation, and not those between words.
Modifying your code slightly, you could delay the copy of spaces until you see the first non-space character: if it's not punctuation, you copy a space before the character; otherwise you just copy the non-space char (thus getting rid of the spaces).
Similarly, once you've copied a punctuation character, just loop and ignore the following spaces until the first non-space char.
I could have written the code. It would have been shorter. But I prefer letting you finish your homework with a full understanding of the approach.
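For readers who want to check their own attempt, the "delay the copy of spaces" idea might be sketched like this. It covers only the first half of the description (dropping spaces before a comma or period); spaces after punctuation are kept, since that is what the question asks for. The function name and the choice to preserve all pending spaces before ordinary characters are assumptions.

```cpp
#include <string>

// Hypothetical helper: drops runs of spaces that come directly before
// ',' or '.', and copies everything else unchanged.
std::string RemoveSpacesBeforePunctuation(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    std::size_t pending = 0;  // spaces seen but not yet copied
    for (char c : in) {
        if (c == ' ') {
            ++pending;        // delay the decision until the next non-space
            continue;
        }
        if (c != ',' && c != '.') {
            out.append(pending, ' ');  // ordinary character: keep the spaces
        }
        pending = 0;          // punctuation: the pending spaces are dropped
        out += c;
    }
    out.append(pending, ' '); // trailing spaces are kept as they were
    return out;
}
```

Only the space character is treated as whitespace here; tabs would need an extra check.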

C++ Strings and delimiters [duplicate]

This question already has answers here:
How do I iterate over the words of a string?
(84 answers)
Closed 7 years ago.
So I'm writing a program where I write some data to a file and read from it. The problem is I don't know how to handle this. I read from the file and receive a string containing a lot of different data, separated by the delimiter "|".
string data ="FirstName|LastName|Signature|Height";
So my question is: is there a way to separate this string nicely and store each of the values in a separate variable?
P.S. I have tried this so far. I found the functions substr() and find(), which I could use to find the delimiter and take out a value, but they don't give me the correct value, so I think I'm doing something wrong. Only the fname value is correct.
const string DELIM = "|";
string fname = data.substr(0, data.find(DELIM));
string lname = data.substr(1, data.find(DELIM));
string signature = data.substr(2, data.find(DELIM));
string height = data.substr(3, data.find(DELIM));
You did not understand how substr() works. The first parameter is not a count of the delimiters found; it is the index at which to start, and the second parameter is the number of characters to take. See the doc. You should do the same for the find function, which also accepts a start index. Something like this:
string const DELIM = "|";
auto const firstDelim = data.find(DELIM);
auto const secondDelim = data.find(DELIM, firstDelim + 1); // begin after the first delim
// and so on
auto fname = data.substr(0, firstDelim);
auto lname = data.substr(firstDelim + 1, secondDelim - firstDelim - 1); // second argument is a length
// and so on
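The "and so on" steps can be folded into a loop. The sketch below is one way to do that; the function name SplitFields and the use of a single-character delimiter are assumptions.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `data` on every occurrence of `delim`.
// Empty fields are kept, so "a||b" gives three fields.
std::vector<std::string> SplitFields(const std::string& data, char delim) {
    std::vector<std::string> fields;
    std::size_t start = 0;
    while (true) {
        const std::size_t end = data.find(delim, start);  // search from `start`
        if (end == std::string::npos) {
            fields.push_back(data.substr(start));          // last field
            break;
        }
        fields.push_back(data.substr(start, end - start)); // length, not end index
        start = end + 1;                                   // continue after the delimiter
    }
    return fields;
}
```

For the string in the question, the result holds FirstName, LastName, Signature and Height at indices 0 to 3, which can then be assigned to fname, lname, signature and height.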

C++: Splitting a string with multiple delimiters and keep the delimiters in the results?

Is there a good way to split a string (in C or C++) by multiple delimiters while keeping the delimiters as part of the split strings? The only way I've found to do this is using regex, and I'd rather not have to pull in another library just to do this. (I'm using STL for strings, not using Boost.)
Without regexp, though I'm not sure if it's faster or slower:
vector<string> split(const string& stringToSplit)
{
    vector<string> result;
    size_t pos = 0, lastPos = 0;
    // each piece runs up to and including the delimiter that ends it
    while ((pos = stringToSplit.find_first_of(";,|", lastPos)) != string::npos)
    {
        result.push_back(stringToSplit.substr(lastPos, pos-lastPos+1));
        lastPos = pos+1;
    }
    // remainder after the last delimiter (empty if the string ends with one)
    result.push_back(stringToSplit.substr(lastPos));
    return result;
}
You can make use of a lookahead to do that. Split with the expression:
(?=,)
This works for comma delimiters; add the other delimiters you want to split on (perhaps in a character class: [ ... ]).
So, this,is,an,example becomes: this ,is ,an ,example (i.e. the delimiter goes with the term following it).
Otherwise, use a lookbehind (meaning (?<=,)) to get: this, is, an, example.
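In C++ specifically, std::regex with its default ECMAScript grammar supports lookahead but not lookbehind. A way to get the "delimiter stays with the preceding term" result with std::regex is to match the pieces rather than split between them. The sketch below does that; the function name, the delimiter set ;,| (borrowed from the non-regex answer above), and the match-based pattern are assumptions, and this is a swapped-in technique rather than the lookaround split described above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: each piece is a run of non-delimiters followed by
// one delimiter, or a final run of non-delimiters with no delimiter.
std::vector<std::string> SplitKeepingDelimiters(const std::string& text) {
    static const std::regex token("[^;,|]*[;,|]|[^;,|]+");
    std::vector<std::string> parts;
    for (std::sregex_iterator it(text.begin(), text.end(), token), end; it != end; ++it) {
        parts.push_back(it->str());
    }
    return parts;
}
```

For this,is,an,example the pieces are this, then is, then an, (each with its trailing comma) and finally example.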