c++ splitting string on non alphabetic characters


I am reading a file line by line. I want to split each line on non-alphabetic characters and, if possible, remove all non-alphabetic characters at the same time so I don't have to do it later.
I would like to use isalpha, but I can't figure out how to use it with str.find() or similar functions, as those usually take a single delimiter as a string.
while(getline(fileToOpen,str))
{
unsigned int pos= 0;
string token;
//transform(str.begin(),str.end(),str.begin(),::tolower);
while (pos = str.find_first_not_of("abcdefghijklmnopqrstuvwxyzQWERTYUIOPASDFGHJKLZXCVBNM"))
{
token = str.substr(0, pos);
//transform(str.begin(),str.end(),str.begin(),::tolower);
Node<t>* ptr=search(token,root);
if (ptr!=NULL)
{
ptr->count++;
cout<<token<<" already in tree.Count "<<ptr->count<<"\n";
}
else
{
insert(token,root);
cout<<token<<" added to tree.\n";
}
ptr=NULL;
str.erase(0, pos);
}
}
This is my latest attempt, which doesn't work. All of the examples I could find were based on str.find("single delimiter"), which is no good to me.
Update: I found a way to use isalpha:
template<typename t>
void Tree<t>::readFromFile(string filename)
{
string str;
ifstream fileToOpen(filename.c_str());
if (fileToOpen.is_open())
{
while(getline(fileToOpen,str))
{
unsigned int pos= 0;
string token;
//transform(str.begin(),str.end(),str.begin(),::tolower);
while (pos = find_if(str.begin(),str.end(),aZCheck)!=str.end()!=string::npos)
{
token = str.substr(0, pos);
transform(token.begin(),token.end(),token.begin(),::tolower);
Node<t>* ptr=search(token,root);
if (ptr!=NULL)
{
ptr->count++;
// cout<<token<<" already in tree.Count "<<ptr->count<<"\n";
}
else
{
insert(token,root);
cout<<token<<" added to tree.\n";
}
ptr=NULL;
str.erase(0, pos);
}
}
fileToOpen.close();
}
else
cout<<"Unable to open file!\n";
}
template<typename t>
inline bool Tree<t>::aZCheck(char c)
{
return !isalpha(c);
}
But the issue still persists: the string is split into single characters instead of words. Also, is whitespace considered valid by isalpha?

#include <algorithm>
#include <cctype>
...
template<typename t>
void Tree<t>::readFromFile(std::string filename)
{
std::string str;
std::ifstream fileToOpen(filename.c_str());
if (fileToOpen.is_open())
{
for (std::string::iterator pos, prev; std::getline(fileToOpen, str); )
{
for (pos = std::find_if(str.begin(), str.end(), isalpha); pos != str.end();
pos = std::find_if(prev, str.end(), isalpha))
{
prev = std::find_if_not(pos, str.end(), isalpha);
std::string token(pos, prev);
std::transform(token.begin(), token.end(), token.begin(), ::tolower);
Node<t>* ptr = search(token, root);
if (ptr != NULL)
{
ptr->count++;
// cout<< token << " already in tree.Count "<<ptr->count<<"\n";
}
else
{
insert(token, root);
cout << token << " added to tree.\n";
}
}
}
fileToOpen.close();
}
else
cout<<"Unable to open file!\n";
}
Online demo
Also, since you say you want to save time, it would help if your insert function did a little more: insert the value with a counter of 1 if it is not in the tree, and simply increment the counter if it is. That saves you from traversing the tree twice (once in search, once in insert), which matters because your tree may be unbalanced.
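The single-traversal idea can be sketched as follows. This is only an illustration: std::map stands in for the asker's Tree (whose full interface is not shown), and the names WordCounts and count_word are hypothetical. std::map::operator[] creates a zero-initialised counter when the key is missing and returns a reference to it, so one lookup covers both the insert and the increment.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the tree: std::map keeps keys ordered and
// value-initialises a missing counter to 0 on first access.
using WordCounts = std::map<std::string, int>;

// Insert the word if it is absent, then bump its counter; one lookup only.
// Returns the updated count.
int count_word(WordCounts& counts, const std::string& word)
{
    return ++counts[word];
}
```

A hand-written tree could offer the same thing through a find-or-insert member that returns the node, so the caller only has to increment its count.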

Try this test case. There are two problems.
1 - pos is 0 when a delimiter is found at the start of the string (initially or after truncation).
This causes the while loop to exit. Compare against npos in the loop condition instead.
2 - You have to advance the position past the delimiter when you erase, otherwise
it finds the same one over and over.
int pos= 0;
string token;
string str = "Thisis(asdfasdfasdf)and!this)))";
while ((pos=str.find_first_not_of("abcdefghijklmnopqrstuvwxyzQWERTYUIOPASDFGHJKLZXCVBNM"))!= string::npos )
{
if ( pos != 0 )
{
// Found a token
token = str.substr(0, pos);
cout << "Found: " << token << endl;
}
else
{
// Found another delimiter
// Just move on to next one
}
str.erase(0, pos+1); // Always remove pos+1 to get rid of delimiter
}
// Cover the last (or only) token
if ( str.length() > 0 )
{
token = str;
cout << "Found: " << token << endl;
}
Outputs >>
Found: Thisis
Found: asdfasdfasdf
Found: and
Found: this
Press any key to continue . . .
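As a variation on the loop above, the same tokenising can be done with isalpha rather than a hand-written alphabet, without erasing from the string. The sketch below is a minimal version under stated assumptions: split_alpha is a hypothetical name, and lowercasing is included only because the question does it. In the default "C" locale std::isalpha is false for whitespace, digits and punctuation, so all of those act as delimiters.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: returns the lowercase alphabetic runs found in `line`.
std::vector<std::string> split_alpha(const std::string& line)
{
    // Take the character as unsigned char: passing a negative char value
    // to std::isalpha is undefined behaviour.
    auto is_alpha = [](unsigned char c) { return std::isalpha(c) != 0; };

    std::vector<std::string> words;
    auto first = std::find_if(line.begin(), line.end(), is_alpha);
    while (first != line.end())
    {
        // [first, last) is one maximal run of letters
        auto last = std::find_if_not(first, line.end(), is_alpha);
        std::string word(first, last);
        std::transform(word.begin(), word.end(), word.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        words.push_back(word);
        first = std::find_if(last, line.end(), is_alpha);
    }
    return words;
}
```

Calling split_alpha on the test string used above should yield the same four tokens, lowercased.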

Related

Remove the words from a string that start with a certain character

I have to create a function in C++ that would remove all the words from a string that start with a certain character inputted by a user. For example, if I were to have a string "She made up her mind to meet up with him in the morning" and a substring "m", I would like my string to be "She up her to up with him in the".
I believe I would need to find the occurrences of "m", erase it and all the characters after it till the space " ". Would that be the right approach and if so what would be the best methods to use in this case?
With your kind help I have altered and added code a little bit. The first function 'GetNextWord' seems to be working alright, however, there is definitely something wrong with my function, which is supposed to strip the words, as I am not getting any output. Here is the code:
string GetNextWord(string& s, size_t pos) {
string word;
char del = ' ';
int i = 0;
for (int i = 0; i < s.length(); i++) {
if (s[i] != del) {
word += s[i];
}
else break;
}
return word;
}
string StripWordsThatBeginWithLetter(string& s, char c) {
string result;
string word;
size_t pos = 0;
while (true)
{
word = GetNextWord(s, pos);
pos += word.size() + 1;
if (word.size() == 0)
{
break;
}
if (word[0] == c) {
size_t inx = 0;
inx = s.find(word[0]);
s.erase(inx, word.length());
}
else result = s;
}
return result;
}

Here's a hint. I'm guessing this is a homework problem. And I'm probably giving too much away.
std::string GetNextWord(const std::string &s, size_t pos)
{
std::string word;
// your code goes here to return a string that includes all the chars starting from s[pos] until the start of the next word (including trailing whitespace)
// return an empty string if at the end of the string
return word;
}
std::string StripWordsThatBeginWithLetter(const std::string& s, char c)
{
std::string result;
std::string word;
size_t pos = 0;
while (true)
{
word = GetNextWord(s, pos);
pos += word.size();
if (word.size() == 0)
{
break;
}
// your code on processing "word" goes here with respect
// to "c" goes here
}
return result;
}

A simple example in French. You are a gentleman and don't want to say "merde" too often, so you decide not to say any word starting with 'm'.
This program will help you:
"je suis beau merde je le sais" becomes "je suis beau je le sais"
#include <string>
#include <algorithm>
#include <iostream>
int main(){
std::string str ("je suis beau merde je le sais");
const auto forbiden_start ((const char) 'm');
std::cout << "initial rude string (word starting with \'" << forbiden_start << "\') : " << str << std::endl;
auto i (str.begin ());
auto wait (false);
std::for_each (str.begin (), str.end (), [&i, &forbiden_start, &wait] (const auto& c) {
if (wait) {
if (c == ' ') {
wait = false; return;
}
}
else {
if (c == forbiden_start) {
wait = true;
}
else *i++ = c;
}
});
if (i != str.end ()) str.erase (i, str.end ());
std::cout << "polite string : " << str << std::endl;
return 0;
}
Not everything is tested (the separator is " "), but this is the idea.
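One thing to watch in the in-place version above: as written it appears to start skipping at any occurrence of the letter, including one in the middle of a word. A word-by-word variant avoids that. The following is a sketch only, assuming words are separated by whitespace and that collapsing runs of spaces is acceptable; strip_words_starting_with is a hypothetical name.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: drops every whitespace-separated word whose first
// character equals `c`, and joins the survivors with single spaces.
std::string strip_words_starting_with(const std::string& s, char c)
{
    std::istringstream in(s);
    std::string word, result;
    while (in >> word)              // operator>> skips runs of whitespace
    {
        if (word[0] == c)
            continue;               // word starts with the forbidden letter
        if (!result.empty())
            result += ' ';
        result += word;
    }
    return result;
}
```

With the sentence from the question and 'm', this should return "She up her to up with him in the".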

how to split a sentence into strings using recursion in c++

string strings[10];
void split(string s){
int curr=0,start=0,end=0,i=0;
while(i<=len(s)){
if(s[i]==' ' or i == len(s)){
end = i;
string sub;
sub.append(s,start,end-start);
strings[curr] = sub;
start = end + 1;
curr += 1 ;
}
i++;
}
}
for example if the input is " computer laptop screen desktop mouse " then the output string should be:
computer
laptop
screen
desktop
mouse
I have done this successfully with loops but failed with recursion.
Can anyone help me write split() using recursion?
Thank you

This solution assumes you want only words from the string to enter your array and that you want to split on some predetermined string delimiter like <space>" " or <double-dash>"--".
If you need to keep the void function signature, here is one:
void split_rec(string str_array[], size_t arr_index,
string s, string delimiter) {
if (s == "") {
return;
}
size_t str_index = s.find(delimiter);
string word = s.substr(0, str_index);
if (word != "") {
str_array[arr_index++] = word;
}
// find type functions return string::npos if they don't find.
str_index = s.find_first_not_of(delimiter, str_index);
if (str_index == string::npos) {
return;
}
return split_rec(str_array, arr_index, s.substr(str_index), delimiter);
}
But I would recommend returning the size of the array so you communicate what the function is doing more accurately. Like this:
size_t split_rec(string str_array[], size_t arr_index,
string s, string delimiter) {
if (s == "") {
return arr_index;
}
size_t str_index = s.find(delimiter);
string word = s.substr(0, str_index);
if (word != "") {
str_array[arr_index++] = word;
}
str_index = s.find_first_not_of(delimiter, str_index);
if (str_index == string::npos) {
return arr_index;
}
return split_rec(str_array, arr_index, s.substr(str_index), delimiter);
}
Then the call is like this:
string strings[10];
// I left some extra spaces in this string.
string str = " computer laptop screen desktop mouse ";
size_t strings_len = split_rec(strings, 0, str, " ");
cout << "Array is length " << strings_len << endl;
for (size_t i = 0; i < strings_len; i++) {
cout << strings[i] << endl;
}
Array is length 5
computer
laptop
screen
desktop
mouse
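If a fixed-size array is not a requirement, the same recursion can collect into a std::vector and walk the original string by index instead of copying substrings on every call. This is a sketch under those assumptions, splitting on single spaces only; split_words_rec is a hypothetical name.

```cpp
#include <string>
#include <vector>

// Hypothetical variant: appends the space-separated words of s[pos..] to `out`.
void split_words_rec(const std::string& s, std::string::size_type pos,
                     std::vector<std::string>& out)
{
    pos = s.find_first_not_of(' ', pos);        // skip leading spaces
    if (pos == std::string::npos)
        return;                                 // base case: no word left
    std::string::size_type end = s.find(' ', pos);
    out.push_back(s.substr(pos, end - pos));    // end == npos takes the rest
    if (end != std::string::npos)
        split_words_rec(s, end, out);           // recurse on the remainder
}
```

The initial call passes 0 as the starting position and an empty vector to fill.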

how to point to delimiter at fixed position in std::string

Say I have text with '#' as a delimiter, for example:
std::string key = "012#txt1#txt2#txt3#txt4# #some other text:"
I have to insert modified text between the 5th '#' and the 6th '#' (the two shown above with a space between them).
To accomplish this I need to find the 5th and 6th '#'.
I wrote a small piece of code, but it's not doing what I expect. It always returns the first '#' found. Can someone please advise me?
std::string temp = key;
size_t found = 0;
size_t pos_key = temp.find('#');
while( ( found !=5 )&& ( pos_key != std::string::npos ) )
{
found++;
temp.find_first_of('#', pos_key + 1 );
temp.erase(0, pos_key );
}
std::cout << " the pos key is " << pos_key << std::endl ;

There are a couple of problems. First, you never update pos_key, so you are stomping all over your string when you call erase (I am not sure why you are calling erase at all). If you need to find the nth symbol you can use a function like:
size_t find_nth(const std::string & line, const std::string & symbol, size_t nth)
{
size_t pos = 0;
size_t counter = 0;
while (counter < nth && (pos = line.find(symbol, pos)) != std::string::npos)
{
counter++; // found a match so increment
pos++; // increment so we search for the next one
}
return pos;
}
And you can see it running in this Live Example
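Note that, as written, the loop above increments pos after the final match, so the value it returns is one past the nth symbol. If the exact index is wanted, a variant along these lines may be easier to use. It is a sketch, not the answerer's code: find_nth_index is a hypothetical name, it searches for a single char, and nth is 1-based.

```cpp
#include <cstddef>
#include <string>

// Returns the index of the nth (1-based) occurrence of `symbol` in `line`,
// or std::string::npos if there are fewer than `nth` occurrences.
std::size_t find_nth_index(const std::string& line, char symbol, std::size_t nth)
{
    if (nth == 0)
        return std::string::npos;               // there is no "0th" occurrence
    std::size_t pos = line.find(symbol);        // first occurrence
    while (--nth > 0 && pos != std::string::npos)
        pos = line.find(symbol, pos + 1);       // step to the next occurrence
    return pos;
}
```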

It seems you have two problems.
First you are not remembering the position of the '#' when you find it, you need to assign the return value of the std::string::find_first_of function to pos_key.
Second you keep deleting the contents of the string up to the position you find. That throws off all the position information you got from the std::string::find_first_of function.
I think this might be what you need:
int main()
{
std::string key = "012#txt1#txt2#txt3#txt4# #some other text:";
std::string temp = key;
size_t found = 0;
size_t pos_key = temp.find('#');
while((found != 5) && (pos_key != std::string::npos))
{
found++;
// this line does nothing with the found position
// temp.find_first_of('#', pos_key + 1);
// instead record the position of the latest '#'
pos_key = temp.find_first_of('#', pos_key + 1);
// this line just deletes most of the string
// for no apparent reason
// temp.erase(0, pos_key);
}
std::cout << " the pos key is " << pos_key << std::endl;
}
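Once both positions are known, the text between them can be swapped with std::string::replace. The sketch below wraps the whole task in one hypothetical helper, replace_field, assuming a single-character delimiter and a 1-based field number; it is an illustration of the idea rather than a drop-in fix for the code above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replaces the text between the nth and (n+1)th
// delimiter (n is 1-based) with `text`. Returns false, leaving `s`
// untouched, if n is 0 or the string has fewer than n+1 delimiters.
bool replace_field(std::string& s, char delim, std::size_t n, const std::string& text)
{
    if (n == 0)
        return false;
    std::size_t open = std::string::npos;   // delimiter seen before `close`
    std::size_t close = s.find(delim);      // delimiter currently examined
    for (std::size_t seen = 1; close != std::string::npos; ++seen)
    {
        if (seen == n + 1)
        {
            // open is the nth delimiter, close the (n+1)th
            s.replace(open + 1, close - open - 1, text);
            return true;
        }
        open = close;
        close = s.find(delim, close + 1);
    }
    return false;
}
```

For the key in the question, replace_field(key, '#', 5, "new") should turn "# #" into "#new#".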

set<string>: how to list not strings starting with given string and ending with `/`?

for example we have in our set:
bin/obj/Debug/CloudServerPrototype/ra.write.1.tlog
bin/obj/Debug/CloudServerPrototype/rc.write.1.tlog
bin/obj/Debug/vc100.idb
bin/obj/Debug/vc100.pdb
So this is what I tried, based on this great answer:
#include <iostream>
#include <algorithm>
#include <set>
#include <string>
#include <iterator>
using namespace std;
struct get_pertinent_part
{
const std::string given_string;
get_pertinent_part(const std::string& s)
:given_string(s)
{
}
std::string operator()(const std::string& s)
{
std::string::size_type first = 0;
if (s.find(given_string) == 0)
{
first = given_string.length() + 1;
}
std::string::size_type count = std::string::npos;
std::string::size_type pos = s.find_last_of("/");
if (pos != std::string::npos && pos > first)
{
count = pos + 1 - first;
}
return s.substr(first, count);
}
};
void directory_listning_without_directories_demo()
{
set<string> output;
set<string> demo_set;
demo_set.insert("file1");
demo_set.insert("file2");
demo_set.insert("folder/file1");
demo_set.insert("folder/file2");
demo_set.insert("folder/folder/file1");
demo_set.insert("folder/folder/file2");
demo_set.insert("bin/obj/Debug/CloudServerPrototype/ra.write.1.tlog");
demo_set.insert("bin/obj/Debug/CloudServerPrototype/rc.write.1.tlog");
demo_set.insert("bin/obj/Debug/vc100.idb");
demo_set.insert("bin/obj/Debug/vc100.pdb");
std::transform(demo_set.begin(),
demo_set.end(),
std::inserter(output, output.end()),
get_pertinent_part("bin/obj/Debug/"));
std::copy(output.begin(),
output.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
int main()
{
directory_listning_without_directories_demo();
cin.get();
return 0;
}
This outputs:
CloudServerPrototype/
file1
file2
folder/
folder/folder/
vc100.idb
vc100.pdb
and we are given the string bin/obj/Debug/. We want to cout:
vc100.idb
vc100.pdb
CloudServerPrototype/
How to do such thing?

Quick example of what you want to do.
string::find(): http://www.cplusplus.com/reference/string/string/find/
string::substr(): http://www.cplusplus.com/reference/string/string/substr/
string str = "bin/obj/Debug/vc100.pdb";
string checkString ("bin/obj/Debug");
// Check if the string starts with the check string
if (str.find(checkString) == 0){
// Check if the last character is a "/"
if(str.substr(str.length()-1,1) == "/"){
// Output starting at the end of the check string and for
// the difference in length between the strings.
cout << str.substr(checkString.length(), (str.length() - checkString.length()) ) << endl;
}
}

It's not clear with which part of the problem you are stuck, so here is a starter for you.
To get the parts of the strings between "given string" and the final '/' (where present):
std::string get_pertinent_part(const std::string& s)
{
std::string::size_type first = 0;
if (s.find(given_string) == 0)
{
first = given_string.length() + 1;
}
std::string::size_type count = std::string::npos;
std::string::size_type pos = s.find_last_of("/");
if (pos != std::string::npos && pos > first)
{
count = pos + 1 - first;
}
return s.substr(first, count);
}
To insert these parts into a new set (output) to guarantee uniqueness you can use the following:
std::transform(your_set.begin(),
your_set.end(),
std::inserter(output, output.end()),
get_pertinent_part);
You may wish to pass given_string into get_pertinent_part(), in which case you'll need to convert it to a functor:
struct get_pertinent_part
{
const std::string given_string;
get_pertinent_part(const std::string& s)
:given_string(s)
{
}
std::string operator()(const std::string& s)
{
std::string::size_type first = 0;
//
// ...same code as before...
//
return s.substr(first, count);
}
};
You can then call it this way:
std::transform(your_set.begin(),
your_set.end(),
std::inserter(output, output.end()),
get_pertinent_part("bin/obj/Debug"));
To output the new set:
std::copy(output.begin(),
output.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
Sorting the results is left as an exercise.
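The functor above cuts at the last '/', which is why nested entries such as folder/folder/ show up. If the goal is only the direct children of the given directory, cutting at the first '/' after the prefix is one way to get there. The following is a sketch under two assumptions: the prefix is passed with its trailing '/', and list_children is a hypothetical name.

```cpp
#include <set>
#include <string>

// Hypothetical helper: for every path under `prefix`, keep only the first
// component after the prefix (with its trailing '/' if it is a directory).
std::set<std::string> list_children(const std::set<std::string>& paths,
                                    const std::string& prefix)
{
    std::set<std::string> children;
    for (const std::string& path : paths)
    {
        if (path.compare(0, prefix.size(), prefix) != 0)
            continue;                               // not under the prefix
        std::string rest = path.substr(prefix.size());
        std::string::size_type slash = rest.find('/');
        if (slash != std::string::npos)
            rest.erase(slash + 1);                  // keep "dir/" only
        if (!rest.empty())
            children.insert(rest);                  // the set removes duplicates
    }
    return children;
}
```

Because the result is a std::set, the two CloudServerPrototype files collapse into a single CloudServerPrototype/ entry and the output comes back sorted.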

The easiest way I can think of, using the standard C functions, would be:
const char * string1 = "bin/obj/Debug/"; // the prefix, including its trailing '/'
const char * string2 = "bin/obj/Debug/CloudServerPrototype/rc.write.1.tlog";
char result[64];
// the above code is just to bring the strings into this example
const char * position = strstr(string2, string1); // strstr(haystack, needle)
size_t substringLength;
if(position != NULL){
position += strlen(string1); // skip past the prefix
const char * slash = strchr(position, '/');
// stop at the next '/', or take the rest of the string if there is none
substringLength = (slash != NULL) ? (size_t)(slash - position) : strlen(position);
if(substringLength >= sizeof(result)) substringLength = sizeof(result) - 1;
strncpy(result, position, substringLength);
result[substringLength] = '\0'; // strncpy does not add the terminator here
}else{
strcpy(result, string2); // this case is for when the prefix is not found
}
cout << result;
The first step is to find the prefix, string1, in the string we are analyzing, string2. Once we have the starting point (assuming it is there at all), we add the length of the prefix to it using pointer arithmetic, then compute the length of the next path component by subtracting that position from the position of the next '/', which is found with strchr(position, '/'). Then we simply copy that substring into a buffer, terminate it, and print it with cout.
I am sure there is a fancy way of doing this with std::string, but I'll leave that to anyone who can better explain C++ strings. I never did manage to get comfortable with them, haha

c++: I am trying to reverse the order of words in string (not the whole string)

#include <iostream>
#include <vector>
using namespace std;
void RevStr (char *str)
{
if(*str !=0)
{
vector<char> v1;
while((*str != ' ')&&(*str !=0))
v1.push_back(*str++);
// trying to not add space in the last word of string
if(*str !=0)
{
v1.push_back(' ');
str++;
}
RevStr(str);
cout<<*str;
}
}
int main()
{
RevStr("hello world!");
cout<<endl;
}
I want to change the order of the words in the string, for example " how are you" => "you are how".
I am having a problem: it's not printing correctly (it prints only w). Please help me and tell me what I did wrong. I know that I should not call "cout<<*str;" since I am storing the arrays of char on the stack (recursion), but I don't know what I need to do instead.

C++ makes it simple:
#include <algorithm>
#include <iterator>
#include <vector>
#include <string>
#include <iostream>
#include <sstream>
std::string reverse(std::string const& text)
{
std::stringstream inStream(text);
std::stringstream outStream;
std::vector<std::string> words;
std::copy(std::istream_iterator<std::string>(inStream), std::istream_iterator<std::string>(), std::back_inserter(words));
std::copy(words.rbegin(), words.rend(), std::ostream_iterator<std::string>(outStream, " "));
return outStream.str();
}
int main()
{
std::cout << reverse("Hello World") << "\n";
}

A common approach to do this is to reverse the entire string first, then for each word, reverse the letters in the word. So no recursion is necessary. You might find it easier to give this a try (yes, I know this isn't exactly an answer to your question :) ).
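The two-pass idea described above can be sketched on a std::string like this. It assumes words are separated by single spaces and works in place; reverse_words is a hypothetical name.

```cpp
#include <algorithm>
#include <string>

// Reverses the order of space-separated words in place.
void reverse_words(std::string& s)
{
    std::reverse(s.begin(), s.end());           // "how are you" -> "uoy era woh"
    auto word_start = s.begin();
    while (word_start != s.end())
    {
        auto word_end = std::find(word_start, s.end(), ' ');
        std::reverse(word_start, word_end);     // restore the letters of one word
        if (word_end == s.end())
            break;                              // that was the last word
        word_start = word_end + 1;              // skip the space
    }
}
```

The first reversal puts the words in the right order but spells each one backwards; the per-word reversal then fixes the spelling.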

Use cout << str, not cout << *str to print a string. There's an operator<< overload for char *. But maybe that's not what you're trying to do; I can't quite follow your logic, in any event.

You're losing the "hello" part.
The algorithm you seem to be going for does this:
1. each call to RevStr isolates the first word in the string it is passed as a parameter
2. calls RevStr with the remainder of the string
3. prints the word it isolated at step 1 as the stack unwinds
Basically, you should be printing the v1 data.
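Those steps can be sketched as a small recursive function. To keep it easy to check, this version returns the reversed sentence as a std::string instead of printing from inside the recursion; that choice, and the name rev_words, are assumptions made for the illustration rather than part of the original code.

```cpp
#include <string>

// Recursive sketch: isolate the first word, recurse on the rest, then
// append the isolated word after the result as the stack unwinds.
std::string rev_words(const char* str)
{
    while (*str == ' ') ++str;              // skip leading spaces
    if (*str == '\0') return "";            // nothing left: end of recursion

    std::string word;                       // plays the role of v1
    while (*str != ' ' && *str != '\0')
        word += *str++;

    std::string rest = rev_words(str);      // later words come out first
    return rest.empty() ? word : rest + ' ' + word;
}
```

Printing the returned string once in main gives the same effect as printing each isolated word while unwinding.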

I would strongly advise making use of some of the functionality exposed via std::string as a place to start.
One way you might do this would look like this:
std::string ReverseString(std::string s)
{
std::stack<std::string > stack;
std::string tmpstr = "";
std::string newstr = "";
size_t strsize = s.size();
size_t pos = 0; size_t tmppos = 0;
size_t i = 0; size_t stacksize = 0;
while( pos < strsize )
{
tmppos = s.find(" ", pos, 1); // starting at pos, look for " "
if (tmppos == std::string::npos) // std::string::npos => reached end
{
tmppos = strsize; // don't forget the last item.
}
tmpstr = s.substr(pos, tmppos-pos); // split the string.
stack.push(tmpstr); // push said string onto the stack
pos = tmppos+1;
}
stacksize = stack.size();
for ( i = 0; i < stacksize; i++ )
{
tmpstr = stack.top(); // grab string from top of the stack
stack.pop(); // stacks being LIFO, we're getting
if ( i != 0 ) // everything backwards.
{
newstr.append(" "); // add preceding whitespace.
}
newstr.append(tmpstr); // append word.
}
return newstr;
}
It's by no means the best or fastest way to achieve this; there are many other ways you could do it (Jerry Coffin mentions using std::vector with an iterator, for example), but as you have the power of C++ there, to me it would make sense to use it.
I've done it this way so you could use a different delimiter if you wanted to.
In case you're interested, you can now use this with:
int main(int argc, char** argv)
{
std::string s = "In Soviet Russia String Format You";
std::string t = ReverseString(s);
std::cout << t << std::endl;
}

Given that it's a char*, this reverses it in place (i.e., it doesn't require extra memory proportional to the incoming 'str'). This avoids converting it to a std::string (not that converting is a bad idea; it's just that it's a char* to start with).
void reverse_words(char* str)
{
char* last = strlen(str) + str;
char *s, *e;
std::reverse(str,last);
for(s=e=str; e != last; e++)
{
if(*e == ' ')
{
std::reverse(s,e);
s = e+1;
}
}
std::reverse(s,e);
}

void Reverse(const string& text)
{
list<string> words;
string temp;
for ( auto cur = text.begin(); cur != text.end(); ++cur)
{
if (*cur == ' ')
{
words.push_front(temp);
temp.clear();
}
else
{
temp += *cur;
}
}
if (! temp.empty())
{
words.push_front(temp);
}
for_each(words.begin(), words.end(), [](const string& word) { cout << word << " "; });
cout << endl;
}

void swap(char* c1, char* c2) {
char tmp = *c1;
*c1 = *c2;
*c2 = tmp;
}
void reverse(char* s, char* e) {
if (s == NULL || e == NULL)
return;
while(s < e)
swap(s++, e--);
}
void reverse_words(char* line) {
if (line == NULL)
return;
reverse(line, line+strlen(line)-1);
char *s = line;
char *e;
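while (*s != '\0') {
e = s;
while (*e != ' ' && *e != '\0') ++e;
reverse(s, e-1); // e is one past the word, so the word ends at e-1
if (*e == '\0') break; // last word: don't step past the terminator
s = e+1; // skip the space
}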
}
