C++ split string with space and punctuation chars - c++

I want to split a string in C++ that contains spaces and punctuation.
e.g. str = "This is a dog; A very good one."
I want to get "This" "is" "a" "dog" "A" "very" "good" "one" one by one.
It's quite easy with a single delimiter using getline, but I don't know all the delimiters in advance. They can be any punctuation characters.
Note: I don't want to use Boost!

Use std::find_if() with a lambda to find the first delimiter:
auto it = std::find_if(str.begin(), str.end(), [](unsigned char element) -> bool {
    // taking unsigned char keeps std::isspace/std::ispunct well-defined for non-ASCII bytes
    return std::isspace(element) || std::ispunct(element);
});
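To get every word rather than just the first delimiter, the same predicate can be applied in a loop. The following is only a sketch: the names is_delim and split_words are made up for illustration, and std::find_if_not requires C++11.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: true for characters treated as delimiters.
static bool is_delim(unsigned char c) {
    return std::isspace(c) || std::ispunct(c);
}

// Hypothetical name: collects every run of non-delimiter characters.
std::vector<std::string> split_words(const std::string& str) {
    std::vector<std::string> words;
    auto it = str.begin();
    while (it != str.end()) {
        // Skip delimiters to reach the start of the next word.
        auto first = std::find_if_not(it, str.end(), is_delim);
        if (first == str.end()) break;
        // The word ends at the next delimiter (or at the end of the string).
        auto last = std::find_if(first, str.end(), is_delim);
        words.emplace_back(first, last);
        it = last;
    }
    return words;
}
```

Calling split_words(str) on the example string should give the eight words in order, with the ';' and '.' dropped.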

So, starting at the first position, you find the first valid token. You can use
index = str.find_first_not_of (yourDelimiters);
Then you have to find the first delimiter after this, so you can do
delimIndex = str.substr (index).find_first_of (yourDelimiters);
your first word will then be
// since delimIndex will essentially be the length of the word
word = str.substr (index, delimIndex);
Then you truncate your string and repeat. You have to, of course, handle all of the cases where find_first_not_of and find_first_of return npos, which means that character was/was not found, but I think that's enough to get started.
Btw, I'm not claiming that this is the best method, but it works...
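The steps above could look roughly like the sketch below. It is a variation, not the exact procedure described: instead of truncating the string it passes a start position to find_first_of and find_first_not_of. The function name split_by_any and the delims parameter are made up for the example.

```cpp
#include <string>
#include <vector>

// Hypothetical name: splits str on any character contained in delims.
std::vector<std::string> split_by_any(const std::string& str,
                                      const std::string& delims) {
    std::vector<std::string> words;
    // Start of the first token: first character that is not a delimiter.
    std::string::size_type start = str.find_first_not_of(delims);
    while (start != std::string::npos) {
        // End of the token: first delimiter at or after start.
        std::string::size_type end = str.find_first_of(delims, start);
        // substr clamps the count, so end == npos takes the rest of the string.
        words.push_back(str.substr(start, end - start));
        // With end == npos this returns npos and the loop stops.
        start = str.find_first_not_of(delims, end);
    }
    return words;
}
```

For the question's string, split_by_any(str, " ;.") would be one way to call it; any other punctuation you expect has to be listed in delims.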

vmpstr's solution works, but could be a bit tedious.
Some months ago, I wrote a C library that does what you want.
http://wiki.gosub100.com/doku.php?id=librerias:c:cadenas
Documentation has been written in Spanish (sorry).
It doesn't need external dependencies. Try with splitWithChar() function.
Example of use:
#include <stdio.h>
#include "string_functions.h"
int main(void){
char yourString[]= "This is a dog; A very good one.";
char* elementsArray[8];
int nElements;
int i;
/*------------------------------------------------------------*/
printf("Character split test:\n");
printf("Base String: %s\n",yourString);
nElements = splitWithChar(yourString, ' ', elementsArray);
printf("Found %d element.\n", nElements);
for (i=0;i<nElements;i++){
printf ("Element %d: %s\n", i, elementsArray[i]);
}
return 0;
}
The original string "yourString" is modified by splitWithChar(), so be careful.
Good luck :)

C++, unlike Java, doesn't provide an elegant built-in way to split a string by a delimiter. You can use the Boost library for this, but if you want to avoid it, a hand-written loop will suffice.
vector<string> split(string s) {
vector<string> words;
string word = "";
for(char x: s) {
if(x == ' ' or x == ',' or x == '?' or x == ';' or x == '!'
or x == '.') {
if(word.length() > 0) {
words.push_back(word);
word = "";
}
}
else
word.push_back(x);
}
if(word.length() > 0) {
words.push_back(word);
}
return words;
}

Related

How to find certain substring in string and then go back to certain character?

I save messages in string and I need to make filter function that finds user specified word in those messages. I've split each message by '\n' so the example of one chat would be:
user1:Hey, man\nuser2:Hey\nuser1:What's up?\nuser2:Nothing, wbu?\n etc.
Now user could ask to search for word up and I've implemented a search like this:
for (auto it = msg.cbegin(); (it = std::find(it, msg.cend(), str)) != msg.cend(); it++)
and I could put that string into stringstream and use getline to \n, but how do I go backwards to previous \n so I can get full message? Also, what about first message, cause it doesn't start with \n?
Since you said you split the strings, I imagine you have a vector of strings in which you want to find, for example, up. You would do something like this:
for (const auto& my_string: vector_of_strings){
if (my_string.find("up") != string::npos) {
// message containing up is my_string
}
}
In case you haven't split the strings in a vector you can use this func inspired by this:
vector<string> split(const string& s, const string& delimiter){
vector<string> ret;
size_t last = 0;
size_t next = 0;
while ((next = s.find(delimiter, last)) != string::npos) {
ret.emplace_back(s.substr (last, next - last));
// skip the whole delimiter, which may be longer than one character
last = next + delimiter.size();
}
ret.emplace_back(s.substr(last));
return ret;
}
If this function doesn't work you can always take a look at How do I iterate over the words of a string?
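If the chat is kept as one string, the "go backwards to the previous \n" part of the question can be handled with rfind. This is a sketch only: the name message_containing is invented here, and it assumes the search word is non-empty and contains no '\n'.

```cpp
#include <string>

// Hypothetical helper: returns the whole message (line) that contains the
// first occurrence of word at or after position from, or "" if none.
std::string message_containing(const std::string& chat,
                               const std::string& word,
                               std::string::size_type from = 0) {
    std::string::size_type hit = chat.find(word, from);
    if (hit == std::string::npos) return "";
    // Search backwards from the hit for the previous '\n'.
    std::string::size_type prev = chat.rfind('\n', hit);
    // No previous '\n' means the hit is in the first message.
    std::string::size_type begin =
        (prev == std::string::npos) ? 0 : prev + 1;
    // The message ends at the next '\n' (or at the end of the chat).
    std::string::size_type end = chat.find('\n', hit);
    return chat.substr(begin, end - begin);
}
```

The first message needs no special handling beyond the npos check, because rfind returns npos when there is no earlier '\n'.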

Split a string in C++ after a space, if more than 1 space leave it in the string

I need to split a string by single spaces and store it into an array of strings. I can achieve this using the function boost::split, but what I am not able to achieve is this:
If there is more than one space, I want to keep the extra space in the vector.
For example:
(underscore denotes space)
This_is_a_string. gets split into: A[0]=This A[1]=is A[2]=a A[3]=string.
This__is_a_string. gets split into: A[0]=This A[1]=_is A[2]=a A[3]=string.
How can I implement this?
Thanks
For this, you can use a combination of the find and substr functions for string parsing.
Suppose there was just a single space everywhere, then the code would be:
while (str.find(" ") != string::npos)
{
string temp = str.substr(0,str.find(" "));
ans.push_back(temp);
str = str.substr(str.find(" ")+1);
}
The additional request you have raised suggests that we call the find function after we are sure that it is not looking at leading spaces. For this, we can iterate over the leading spaces to count how many there are, and then call the find function to search from thereon. To use the find function from say after x positions (because there are x leading spaces), the call would be str.find(" ",x).
You should also take care of corner cases such as when the entire string is composed of spaces at any point. In that case the while condition in the current form will not terminate. Add the x parameter there as well.
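A possible sketch of that idea follows. It tracks a position in the original string instead of repeatedly cutting str, and uses find_first_not_of to skip the leading spaces; the function name split_keep_extra_spaces is made up for the example. As written, trailing spaces are dropped, which may or may not be what you want.

```cpp
#include <string>
#include <vector>

// Hypothetical name: splits on single spaces; extra spaces stay attached
// to the front of the token that follows them.
std::vector<std::string> split_keep_extra_spaces(const std::string& str) {
    std::vector<std::string> ans;
    std::string::size_type pos = 0;  // start of the current token
    while (pos < str.size()) {
        // x is the first non-space at or after pos (skips leading spaces).
        std::string::size_type x = str.find_first_not_of(' ', pos);
        if (x == std::string::npos) break;  // only spaces remain
        // Search for the separating space after the leading ones.
        std::string::size_type end = str.find(' ', x);
        // The token keeps its leading spaces; substr clamps when end is npos.
        ans.push_back(str.substr(pos, end - pos));
        if (end == std::string::npos) break;
        pos = end + 1;  // step over the single separating space
    }
    return ans;
}
```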
This is by no means the most elegant solution, but it will get the job done:
void bizarre_string_split(const std::string& input,
std::vector<std::string>& output)
{
std::size_t begin_break = 0;
std::size_t end_break = 0;
// count how many spaces we need to add onto the start of the next substring
std::size_t append = 0;
while (end_break != std::string::npos)
{
std::string temp;
end_break = input.find(' ', begin_break);
temp = input.substr(begin_break, end_break - begin_break);
// if the string is empty it is because end_break == begin_break
// this happens because the first char of the substring is whitespace
if (!temp.empty())
{
std::string temp2;
while (append)
{
temp2 += ' ';
--append;
}
temp2 += temp;
output.push_back(temp2);
}
else
{
++append;
}
begin_break = end_break + 1;
}
}

Implementing a find-and-replace procedure in C++

Just for fun, I'm trying to write a find-and-replace procedure like word processors have. I was wondering whether someone could help me figure out what I'm doing wrong (I'm getting a Timeout error) and could help me write a more elegant procedure.
#include <iostream>
#include <string>
void find_and_replace(std::string& text, const std::string& fword, const std::string& rword)
{
for (std::string::iterator it(text.begin()), offend(text.end()); it != offend;)
{
if (*it != ' ')
{
std::string::iterator wordstart(it);
std::string thisword;
while (*(it+1) != ' ' && (it+1) != offend)
thisword.push_back(*++it);
if (thisword == fword)
text.replace(wordstart, it, rword);
}
else {
++it;
}
}
}
int main()
{
std::string S("Yo, dawg, I heard you like ...");
std::string f("dawg");
std::string w("dog");
// Replace every instance of the word "dawg" with "dog":
find_and_replace(S, f, w);
std::cout << S;
return 0;
}
A find-and-replace like most editors have would involve regular
expressions. If all you're looking for is for literal
replacements, the function you need is std::search, to find
the text to be replaced, and std::string::replace, to do the
actual replacement. The only real issue you'll face:
std::string::replace can invalidate your iterators. You could
always start the search at the beginning of the string, but this
could lead to endless looping, if the replacement text contained
the search string (e.g. something like s/vector/std::vector/).
You should convert the iterator returned from std::search
to an offset into the string before doing the replace (offset
= iter - str.begin()), and convert it back to an iterator after
(iter = str.begin() + offset + replacement.size()). (The
addition of replacement.size() is to avoid rescanning the text
you just inserted, which can lead to an infinite loop, for the
same reasons as presented above.)
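A minimal sketch of this approach, for literal (not whole-word) replacement, might look like the following. The function name replace_all is invented for the example; the early return for an empty search string is an added safeguard, not something the answer mentions.

```cpp
#include <algorithm>
#include <string>

// Hypothetical name: replaces every literal occurrence of from with to.
void replace_all(std::string& text, const std::string& from,
                 const std::string& to) {
    if (from.empty()) return;  // an empty pattern would match everywhere
    std::string::size_type offset = 0;
    while (true) {
        // Resume the search after the text inserted last time.
        auto it = std::search(text.begin() + offset, text.end(),
                              from.begin(), from.end());
        if (it == text.end()) break;
        // Convert the iterator to an offset before replace() invalidates it.
        offset = static_cast<std::string::size_type>(it - text.begin());
        text.replace(offset, from.size(), to);
        // Skip the replacement so it is never rescanned.
        offset += to.size();
    }
}
```

With from = "vector" and to = "std::vector", skipping the inserted text is what keeps the loop from running forever.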
Using text.replace may invalidate any iterators into text (i.e., both it and offend), so this isn't safe.
Copying each character into a temporary string (which is created and destroyed every time you start a new word) is wasteful.
The simplest thing that could possibly work is to:
1. use find to find the first matching substring: it returns a position which won't be invalidated when you replace substrings
2. check whether:
2.1. your substring is either at the start of the text, or preceded by a word separator
2.2. your substring is either at the end of the text, or followed by a word separator
3. if 2.1 and 2.2 are true, replace the substring
4. if you replaced it, increase position (from 1) by the length of your replacement string
5. otherwise increase position by the length of the string you searched for
6. repeat from 1, this time starting your find from position (from 4/5)
7. end when step 1 returns position std::string::npos.
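The steps above can be sketched as follows. What counts as a word separator is an assumption here (whitespace or punctuation, via the made-up helper is_separator), and replace_whole_words is likewise a name chosen only for the example.

```cpp
#include <cctype>
#include <string>

// Assumption: a word separator is any whitespace or punctuation character.
static bool is_separator(unsigned char c) {
    return std::isspace(c) || std::ispunct(c);
}

// Hypothetical name: replaces fword with rword only where it is a whole word.
void replace_whole_words(std::string& text, const std::string& fword,
                         const std::string& rword) {
    if (fword.empty()) return;
    std::string::size_type pos = text.find(fword);
    while (pos != std::string::npos) {
        std::string::size_type after = pos + fword.size();
        // At the start of the text, or preceded by a separator.
        bool starts_word = pos == 0 || is_separator(text[pos - 1]);
        // At the end of the text, or followed by a separator.
        bool ends_word = after == text.size() || is_separator(text[after]);
        if (starts_word && ends_word) {
            text.replace(pos, fword.size(), rword);
            pos += rword.size();  // skip what was just inserted
        } else {
            pos += fword.size();  // skip the rejected match
        }
        pos = text.find(fword, pos);  // positions stay valid after replace
    }
}
```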
1) You don't push the first symbol of the found word into the "thisword" variable.
2) You use only the space symbol ' ' as a separator, but what about the comma ','? Your program will find the word "dawg," not "dawg".
The following code works, but you should think about other word separators. Do you really need to replace only whole words, or just any sequence of symbols?
#include <iostream>
#include <string>
void find_and_replace(std::string& text, const std::string& fword, const std::string& rword)
{
for (std::string::iterator it(text.begin()), offend(text.end()); it != offend;)
{
if (*it != ' ' && *it != ',')
{
std::string::iterator wordstart(it);
std::string thisword;
while ((it) != offend && *(it) != ' ' && *(it) != ',')
thisword.push_back(*it++);
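if (thisword == fword)
{
// replace() may invalidate iterators, so go through an offset and refresh them
std::string::size_type pos = wordstart - text.begin();
text.replace(pos, thisword.size(), rword);
it = text.begin() + pos + rword.size();
offend = text.end();
}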
}
else {
++it;
}
}
}
int main()
{
std::string S("Yo, dawg, I heard you like ...");
std::string f("dawg");
std::string w("dog");
// Replace every instance of the word "dawg" with "dog":
find_and_replace(S, f, w);
std::cout << S;
return 0;
}

C++ How to get string/char in between 2 words

I have a string such as
AD#Andorra
I have a few questions:
How do I check whether
AD?Andorra exists,
where ? is a wildcard that could be a comma or hex or dollar sign or other value?
Then, after confirming that AD?Andorra exists, how do I get the value of ?
Thanks,
Chen
The problem can be solved generally with a regular expression match. However, for the specific problem you presented, this would work:
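std::string input = getinput();
char at2 = '\0';
if (input.size() > 2) {
    at2 = input[2];   // remember the wildcard character
    input[2] = '#';   // normalize it so a plain comparison works
}
if (input == "AD#Andorra") {
    // match, and char of interest is in at2;
} else {
    // doesn't match
}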
If the ? is supposed to represent a string also, then you can do something like this:
bool find_inbetween (std::string input,
std::string &output,
const std::string front = "AD",
const std::string back = "Andorra") {
if ((input.size() < front.size() + back.size())
|| (input.compare(0, front.size(), front) != 0)
|| (input.compare(input.size()-back.size(), back.size(), back) != 0)) {
return false;
}
output = input.substr(front.size(), input.size()-front.size()-back.size());
return true;
}
If you are on C++11/use Boost (which I strongly recommend!) use regular expressions. Once you gain some level of understanding all text processing becomes easy-peasy!
#include <regex> // or #include <boost/regex.hpp>
//! \return A separating character or 0, if str does not match the pattern
char getSeparator(const char* str)
{
using namespace std; // change to "boost" if not on C++11
static const regex re("^AD(.)Andorra$");
cmatch match;
if (regex_match(str, match, re))
{
return *(match[1].first);
}
return 0;
}
Assuming your character is always the third one (index 2), use the string function substr:
your_string.substr(2, 1)
If you are using C++11, I recommend using regex instead of searching the string directly.

Parsing a string by a delimiter in C++

Ok, so I need some info parsed, and I would like to know the best way to do it.
Here is the string that I need to parse. The delimiter is "^":
John Doe^Male^20
I need to parse the string into name, gender, and age variables. What would be the best way to do it in C++? I was thinking about looping with the condition while (!string.empty()), assigning all characters up to the '^' to a string, and then erasing what I have already assigned. Is there a better way of doing this?
You can use getline in C++ stream.
istream& getline(istream& is, string& str, char delimiter = '\n')
change delimiter to '^'
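For the record in the question, this could look something like the sketch below. The Person struct and the parse_person function are made up for illustration; the age is converted with a second string stream, which is just one of several ways to do it.

```cpp
#include <sstream>
#include <string>

// Hypothetical record type for the three fields in the question.
struct Person {
    std::string name;
    std::string gender;
    int age = 0;
};

// Hypothetical name: parses "name^gender^age"; returns false if a field
// is missing or the age is not a number.
bool parse_person(const std::string& line, Person& out) {
    std::istringstream in(line);
    std::string age_text;
    // Each getline call reads up to (and discards) the next '^'.
    if (!std::getline(in, out.name, '^')) return false;
    if (!std::getline(in, out.gender, '^')) return false;
    if (!std::getline(in, age_text, '^')) return false;
    // Convert the last field to an int.
    std::istringstream age_in(age_text);
    return static_cast<bool>(age_in >> out.age);
}
```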
You have a few options. One good option, if you can use Boost, is the split algorithm provided in its string library. You can check out this SO question to see the Boost answer in action: How to split a string in c
If you cannot use boost, you can use string::find to get the index of a character:
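string str = "John Doe^Male^20";
string::size_type last = 0;
string::size_type cPos;
while ((cPos = str.find('^', last)) != string::npos)
{
    string sub = str.substr(last, cPos - last);
    // Do something with the field
    last = cPos + 1;
}
// The text after the final '^' is the last field ("20" here)
string sub = str.substr(last);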
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] = "This is a sample string";
char * pch;
printf ("Looking for the 's' character in \"%s\"...\n",str);
pch=strchr(str,'s');
while (pch!=NULL)
{
printf ("found at %d\n",(int)(pch-str+1));
pch=strchr(pch+1,'s');
}
return 0;
}
You can do something like this with a char array, using strchr to find each '^'.
You have a number of choices but I would use strtok(), myself. It would make short work of this.
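A rough sketch of the strtok() route is shown below, wrapped in a function whose name (split_with_strtok) is made up here. Two caveats worth knowing: strtok modifies the buffer it scans, so the sketch works on a copy, and it treats consecutive delimiters as one, so empty fields are skipped. It also keeps internal state, so it is not safe to use from several threads at once.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical name: splits input on any character in delims using strtok.
std::vector<std::string> split_with_strtok(const std::string& input,
                                           const char* delims) {
    // strtok writes '\0' into its buffer, so work on a mutable copy.
    std::vector<char> buffer(input.begin(), input.end());
    buffer.push_back('\0');
    std::vector<std::string> tokens;
    // The first call passes the buffer; later calls pass nullptr to continue.
    for (char* tok = std::strtok(buffer.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.emplace_back(tok);
    }
    return tokens;
}
```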