Finding all occurrences of a character in a string - c++

I have comma delimited strings I need to pull values from. The problem is these strings will never be a fixed size. So I decided to iterate through the groups of commas and read what is in between. In order to do that I made a function that returns every occurrence's position in a sample string.
Is this a smart way to do it? Is this considered bad code?
#include <string>
#include <iostream>
#include <vector>
#include <Windows.h>
using namespace std;
vector<int> findLocation(string sample, char findIt);
int main()
{
string test = "19,,112456.0,a,34656";
char findIt = ',';
vector<int> results = findLocation(test,findIt);
return 0;
}
vector<int> findLocation(string sample, char findIt)
{
vector<int> characterLocations;
for(int i =0; i < sample.size(); i++)
if(sample[i] == findIt)
characterLocations.push_back(sample[i]);
return characterLocations;
}

vector<int> findLocation(string sample, char findIt)
{
vector<int> characterLocations;
for(int i =0; i < sample.size(); i++)
if(sample[i] == findIt)
characterLocations.push_back(sample[i]);
return characterLocations;
}
As currently written, this will simply return a vector containing the int representations of the characters themselves, not their positions, which is what you really want, if I read your question correctly.
Replace this line:
characterLocations.push_back(sample[i]);
with this line:
characterLocations.push_back(i);
And that should give you the vector you want.

If I were reviewing this, I would see this and assume that what you're really trying to do is tokenize a string, and there's already good ways to do that.
Best way I've seen to do this is with boost::tokenizer. It lets you specify how the string is delimited and then gives you a nice iterator interface to iterate through each value.
using namespace boost;
string sample = "Hello,My,Name,Is,Doug";
escaped_list_seperator<char> sep("" /*escape char*/, ","/*seperator*/, "" /*quotes*/)
tokenizer<escaped_list_seperator<char> > myTokens(sample, sep)
//iterate through the contents
for (tokenizer<escaped_list_seperator<char>>::iterator iter = myTokens.begin();
iter != myTokens.end();
++iter)
{
std::cout << *iter << std::endl;
}
Output:
Hello
My
Name
Is
Doug
Edit If you don't want a dependency on boost, you can also use getline with an istringstream as in this answer. To copy somewhat from that answer:
std::string str = "Hello,My,Name,Is,Doug";
std::istringstream stream(str);
std::string tok1;
while (stream)
{
std::getline(stream, tok1, ',');
std::cout << tok1 << std::endl;
}
Output:
Hello
My
Name
Is
Doug
This may not be directly what you're asking but I think it gets at your overall problem you're trying to solve.

Looks good to me too, one comment is with the naming of your variables and types. You call the vector you are going to return characterLocations which is of type int when really you are pushing back the character itself (which is type char) not its location. I am not sure what the greater application is for, but I think it would make more sense to pass back the locations. Or do a more cookie cutter string tokenize.

Well if your purpose is to find the indices of occurrences the following code will be more efficient as in c++ giving objects as parameters causes the objects to be copied which is insecure and also less efficient. Especially returning a vector is the worst possible practice in this case that's why giving it as a argument reference will be much better.
#include <string>
#include <iostream>
#include <vector>
#include <Windows.h>
using namespace std;
vector<int> findLocation(string sample, char findIt);
int main()
{
string test = "19,,112456.0,a,34656";
char findIt = ',';
vector<int> results;
findLocation(test,findIt, results);
return 0;
}
void findLocation(const string& sample, const char findIt, vector<int>& resultList)
{
const int sz = sample.size();
for(int i =0; i < sz; i++)
{
if(sample[i] == findIt)
{
resultList.push_back(i);
}
}
}

How smart it is also depends on what you do with those subtstrings delimited with commas. In some cases it may be better (e.g. faster, with smaller memory requirements) to avoid searching and splitting and just parse and process the string at the same time, possibly using a state machine.

Related

How to get a word vector from a string?

I want to store words separated by spaces into single string elements in a vector.
The input is a string that may end or may not end in a symbol( comma, period, etc.)
All symbols will be separated by spaces too.
I created this function but it doesn't return me a vector of words.
vector<string> single_words(string sentence)
{
vector<string> word_vector;
string result_word;
for (size_t character = 0; character < sentence.size(); ++character)
{
if (sentence[character] == ' ' && result_word.size() != 0)
{
word_vector.push_back(result_word);
result_word = "";
}
else
result_word += character;
}
return word_vector;
}
What did I do wrong?
Your problem has already been resolved by answers and comments.
I would like to give you the additional information that such functionality is already existing in C++.
You could take advantage of the fact that the extractor operator extracts space separated tokens from a stream. Because a std::string is not a stream, we can put the string first into an std::istringstream and then extract from this stream vie the std:::istream_iterator.
We could life make even more easier.
Since roundabout 10 years we have a dedicated, special C++ functionality for splitting strings into tokens, explicitely designed for this purpose. The std::sregex_token_iterator. And because we have such a dedicated function, we should simply use it.
The idea behind it is the iterator concept. In C++ we have many containers and always iterators, to iterate over the similar elements in these containers. And a string, with similar elements (tokens), separated by a delimiter, can also be seen as such a container. And with the std::sregex:token_iterator, we can iterate over the elements/tokens/substrings of the string, splitting it up effectively.
This iterator is very powerfull and you can do really much much more fancy stuff with it. But that is too much for here. Important is that splitting up a string into tokens is a one-liner. For example a variable definition using a range constructor for iterating over the tokens.
See some examples below:
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <iterator>
#include <algorithm>
#include <regex>
const std::regex delimiter{ " " };
const std::regex reWord{ "(\\w+)" };
int main() {
// Some debug print function
auto print = [](const std::vector<std::string>& sv) -> void {
std::copy(sv.begin(), sv.end(), std::ostream_iterator<std::string>(std::cout, "\n")); std::cout << "\n"; };
// The test string
std::string test{ "word1 word2 word3 word4." };
//-----------------------------------------------------------------------------------------
// Solution 1: use istringstream and then extract from there
std::istringstream iss1(test);
// Define a vector (CTAD), use its range constructor and, the std::istream_iterator as iterator
std::vector words1(std::istream_iterator<std::string>(iss1), {});
print(words1); // Show debug output
//-----------------------------------------------------------------------------------------
// Solution 2: directly use dedicated function sregex_token iterator
std::vector<std::string> words2(std::sregex_token_iterator(test.begin(), test.end(), delimiter, -1), {});
print(words2); // Show debug output
//-----------------------------------------------------------------------------------------
// Solution 3: directly use dedicated function sregex_token iterator and look for words only
std::vector<std::string> words3(std::sregex_token_iterator(test.begin(), test.end(), reWord, 1), {});
print(words3); // Show debug output
//-----------------------------------------------------------------------------------------
// Solution 4: Use such iterator in an algorithm, to copy data to a vector
std::vector<std::string> words4{};
std::copy(std::sregex_token_iterator(test.begin(), test.end(), reWord, 1), {}, std::back_inserter(words4));
print(words4); // Show debug output
//-----------------------------------------------------------------------------------------
// Solution 5: Use such iterator in an algorithm for direct output
std::copy(std::sregex_token_iterator(test.begin(), test.end(), reWord, 1), {}, std::ostream_iterator<std::string>(std::cout,"\n"));
return 0;
}
You added the index instead of the character:
vector<string> single_words(string sentence)
{
vector<string> word_vector;
string result_word;
for (size_t i = 0; i < sentence.size(); ++i)
{
char character = sentence[i];
if (character == ' ' && result_word.size() != 0)
{
word_vector.push_back(result_word);
result_word = "";
}
else
result_word += character;
}
return word_vector;
}
Since your mistake was only due to the reason, that you named your iterator variable character even though it is actually not a character, but rather an iterator or index, I would like to suggest to use a ranged-base loop here, since it avoids this kind of confusion. The clean solution is obviously to do what #ArminMontigny said, but I assume you are prohibited to use stringstreams. The code would look like this:
#include <iostream>
#include <string>
#include <vector>
using namespace std;
vector<string> single_words(string sentence)
{
vector<string> word_vector;
string result_word;
for (char& character: sentence) // Now `character` is actually a character.
{
if (character==' ' && result_word.size() != 0)
{
word_vector.push_back(result_word);
result_word = "";
}
else
result_word += character;
}
word_vector.push_back(result_word); // In your solution, you forgot to push the last word into the vector.
return word_vector;
}
int main() {
string sentence="Maybe try range based loops";
vector<string> result= single_words(sentence);
for(string& word: result)
cout<<word<<" ";
return 0;
}

How to delete part of a string c++ [duplicate]

I got a string and I want to remove all the punctuations from it. How do I do that? I did some research and found that people use the ispunct() function (I tried that), but I cant seem to get it to work in my code. Anyone got any ideas?
#include <string>
int main() {
string text = "this. is my string. it's here."
if (ispunct(text))
text.erase();
return 0;
}
Using algorithm remove_copy_if :-
string text,result;
std::remove_copy_if(text.begin(), text.end(),
std::back_inserter(result), //Store output
std::ptr_fun<int, int>(&std::ispunct)
);
POW already has a good answer if you need the result as a new string. This answer is how to handle it if you want an in-place update.
The first part of the recipe is std::remove_if, which can remove the punctuation efficiently, packing all the non-punctuation as it goes.
std::remove_if (text.begin (), text.end (), ispunct)
Unfortunately, std::remove_if doesn't shrink the string to the new size. It can't because it has no access to the container itself. Therefore, there's junk characters left in the string after the packed result.
To handle this, std::remove_if returns an iterator that indicates the part of the string that's still needed. This can be used with strings erase method, leading to the following idiom...
text.erase (std::remove_if (text.begin (), text.end (), ispunct), text.end ());
I call this an idiom because it's a common technique that works in many situations. Other types than string provide suitable erase methods, and std::remove (and probably some other algorithm library functions I've forgotten for the moment) take this approach of closing the gaps for items they remove, but leaving the container-resizing to the caller.
#include <string>
#include <iostream>
#include <cctype>
int main() {
std::string text = "this. is my string. it's here.";
for (int i = 0, len = text.size(); i < len; i++)
{
if (ispunct(text[i]))
{
text.erase(i--, 1);
len = text.size();
}
}
std::cout << text;
return 0;
}
Output
this is my string its here
When you delete a character, the size of the string changes. It has to be updated whenever deletion occurs. And, you deleted the current character, so the next character becomes the current character. If you don't decrement the loop counter, the character next to the punctuation character will not be checked.
ispunct takes a char value not a string.
you can do like
for (auto c : string)
if (ispunct(c)) text.erase(text.find_first_of(c));
This will work but it is a slow algorithm.
Pretty good answer by Steve314.
I would like to add a small change :
text.erase (std::remove_if (text.begin (), text.end (), ::ispunct), text.end ());
Adding the :: before the function ispunct takes care of overloading .
The problem here is that ispunct() takes one argument being a character, while you are trying to send a string. You should loop over the elements of the string and erase each character if it is a punctuation like here:
for(size_t i = 0; i<text.length(); ++i)
if(ispunct(text[i]))
text.erase(i--, 1);
#include <iostream>
#include <string>
#include <algorithm>
using namespace std;
int main() {
string str = "this. is my string. it's here.";
transform(str.begin(), str.end(), str.begin(), [](char ch)
{
if( ispunct(ch) )
return '\0';
return ch;
});
}
#include <iostream>
#include <string>
using namespace std;
int main()
{
string s;//string is defined here.
cout << "Please enter a string with punctuation's: " << endl;//Asking for users input
getline(cin, s);//reads in a single string one line at a time
/* ERROR Check: The loop didn't run at first because a semi-colon was placed at the end
of the statement. Remember not to add it for loops. */
for(auto &c : s) //loop checks every character
{
if (ispunct(c)) //to see if its a punctuation
{
c=' '; //if so it replaces it with a blank space.(delete)
}
}
cout << s << endl;
system("pause");
return 0;
}
Another way you could do this would be as follows:
#include <ctype.h> //needed for ispunct()
string onlyLetters(string str){
string retStr = "";
for(int i = 0; i < str.length(); i++){
if(!ispunct(str[i])){
retStr += str[i];
}
}
return retStr;
This ends up creating a new string instead of actually erasing the characters from the old string, but it is a little easier to wrap your head around than using some of the more complex built in functions.
I tried to apply #Steve314's answer but couldn't get it to work until I came across this note here on cppreference.com:
Notes
Like all other functions from <cctype>, the behavior of std::ispunct
is undefined if the argument's value is neither representable as
unsigned char nor equal to EOF. To use these functions safely with
plain chars (or signed chars), the argument should first be converted
to unsigned char.
By studying the example it provides, I am able to make it work like this:
#include <string>
#include <iostream>
#include <cctype>
#include <algorithm>
int main()
{
std::string text = "this. is my string. it's here.";
std::string result;
text.erase(std::remove_if(text.begin(),
text.end(),
[](unsigned char c) { return std::ispunct(c); }),
text.end());
std::cout << text << std::endl;
}
Try to use this one, it will remove all the punctuation on the string in the text file oky.
str.erase(remove_if(str.begin(), str.end(), ::ispunct), str.end());
please reply if helpful
i got it.
size_t found = text.find('.');
text.erase(found, 1);

C++ Put string in array

I keep seeing similar questions to mine, however, I can't seem to find one that helps my situation. Honestly, it seems like such a mundane question, I shouldn't be asking it, but here I am 2 weeks latter, still with no answer.
{
string word;
ArrayWithWords[d] = word;
d++;
}
Every time this loop runs, I want to put word in position d of the array. Other examples I've found only turn the string into char*.
The array will be used more than once and having a solid value, if that's what it's called, is far more preferred. I'd like to avoid using a pointer.
Just use a vector of strings.
#include <string>
#include <vector>
int main()
{
std::vector<std::string> ArrayWithWords(10);
size_t d = 5; // something between 0 and 9
std::string word;
ArrayWithWords[d] = word;
d++;
}
Pretty much the same thing that was just posted but a little bit more old school.
#include <string>
using namespace std;
int main()
{
string stringArray[10];
string word;
word = "hello";
for (int i = 0; i < 10; i++)
{
stringArray[i] = string(word);
}
}

What is the most efficent way to turn a string of integers e.g. (1,5,73,2) into an array of integers?

I have had a similar problem while coding in Java, and in that instance, I used str1.split(",") to change the string of integers into an array of them.
Is there a method in C++ that has a similar function to Java's split method, or is the best way using a for loop to achieve the same goals?
Using std::istringstream to parse this out would certainly be more convenient.
But the question being asked is what's going to be most "efficient". And, for better or for worse, #include <iostream> is not known for its efficiency.
A simple for loop will be hard to beat, for efficiency's sake.
Assuming that the input doesn't contain any whitespace, only commas and digits:
std::vector<int> split(const std::string &s)
{
std::vector<int> r;
if (!s.empty())
{
int n=0;
for (char c:s)
{
if (c == ',')
{
r.push_back(n);
n=0;
}
else
n=n*10 + (c-'0');
}
r.push_back(n);
}
return r;
}
Feel free to benchmark this again any istream or istream_iterator-based approach.
If ou already know the number of elements in your string, the fastest method is to use the c function sscanf which is a lot faster that istringstream (http://lewismanor.blogspot.fr/2013/07/stdstringstream-vs-scanf.html):
#include <cstdio>
#include <iostream>
using namespace std;
int main() {
const char * the_string= "1,2,3,45";
int numbers[4];
sscanf(the_string, "%d,%d,%d,%d", numbers+0, numbers+1, numbers+2, numbers+3);
// verification
for(int i=0; i<4; i++)
cout<<numbers[i]<<endl;
}

sequence of delimiters in function strtok

im trying to obtain tokens with function strtok() in C++. Is very simple when you use just 1 delimiter like:
token = strtok(auxiliar,"[,]");. This will cut auxiliar everytime the function finds [,,or].
What I want is obtain tokens with a sequence of delimiters like: [,]
It is posible doing that with strtok function? I cannot find the way.
Thank you!
If you want strtok to treat [,] as a single token, this cannot be done. strtok always treats whatever you pass in the delimiters string as individual, 1-character delimiters.
Beyond this, it's best to not use strtok in C++ anyway. It is not re-entrant (eg, you can't nest calls), not type-safe, and very easy to use in a way that creates nasty bugs.
The simplest solution is to simply search withing a std::string for the particular delimiter you want, in a loop. If you need more sophisticated functionality, there are tokenizers in the Boost library, and I've also posted code to do more comprehensive tokenizing using only the Standard Library, here.
The code I've linked above also treats delimiters as single characters, but I think the code could be extended in the way you desire.
If this is really C++, you should use std::string and not C strings.
Here's an example that uses only the STL to split a std::string into a std::vector:
#include <cstddef>
#include <string>
#include <vector>
std::vector<std::string> split(std::string str, std::string sep) {
std::vector<std::string> vec;
size_t i = 0, j = 0;
do {
i = str.find(sep, j);
vec.push_back( str.substr(j, i-j) );
j = i + sep.size();
} while (i != str.npos);
return vec;
}
int main() {
std::vector<std::string> vec = split("This[,]is[[,]your, string", "[,]");
// vec is contains "This", "is[", "your, string"
return 0;
}
If you can use the new C++11 features, you can do it with regex and token iterators. For example:
regex reg("\[,\]");
const sregex_token_iterator end;
string aux(auxilar);
for(sregex_token_iterator iter(aux.begin(), aux.end(), reg); iter != end; ++iter) {
cout << *iter << endl;
}
This example is from the Wrox book Professional C++.
If you can use the boost library I think this will do what you want it to do - not totally sure though as your question is a little unclear
#include <iostream>
#include <vector>
#include <string>
#include <boost/tokenizer.hpp>
int main(int argc, char *argv[])
{
std::string data("[this],[is],[some],[weird],[fields],[data],[I],[want],[to],[split]");
boost::tokenizer<boost::char_separator<char> > tokens(data, boost::char_separator<char>("],["));
std::vector<std::string> words(tokens.begin(), tokens.end());
for(std::vector<std::string>::const_iterator i=words.begin(),end=words.end(); i!=end; ++i)
{
std::cout << '\'' << *i << "'\n";
}
return 0;
}
This produces the following output
'this'
'is'
'some'
'weird'
'fields'
'data'
'I'
'want'
'to'
'split'