Find duplicate entry in string in c++

Find duplicate entry in string in c++ - c++

I have different type of string and I need to find out the duplicate entry in this string.
Different type of string
string aStr= "1.1,1.2,1.3, 1"; //output should be duplicate entry
string aStr= "1.1,1.2,1.3"; //Ouput NO duplicate entry
string aStr= "1,2,1"; //o/p duplicate entry
string aStr = "1,2,3";//output No duplicate
I have tried as
std::vector < std::string > aOutString;
std::set <int> full_list;
std::set <std::string> aEntryInDouble;
int aNum1;
boost::split(aOutString, aStr , boost::is_any_of(","));
for(size_t it = 0; it < aOutString.size(); ++it)
{
if (aOutString[it].find('.') != std::string::npos)
{
//If entry is like "1.1,1.2,1.3"
if( !aEntryInDouble.insert(aOutString[it]).second )
{
aDup = false;
break;
}
}
else
{
//For entry "1,2,1"
aNum1 = std::atoi(aOutString[it].c_str());
if(aNum1)
{
if( !full_list.insert(aNum1).second )
{
aDup = false;
break;
}
}
I am not able to find out the solution for entry string "string aStr= "1.1,1.2,1.3, 1";
Please help me to find out the solution.
Thanks,

Here's an algorithm:
Split the input on commas. You will create a list of everything that is comma-delimited. You will then create a set from the list that may contain duplicates. This will remove all duplicates for you. If the list's length is equal to the set's length, there are no duplicates. Otherwise, constructing the set removed duplicates and it is shorter than the list.
Here's the code in C++. I took the tokenize from this answer and made a few modifications. Also, here's the coliru.
#include <iostream>
#include <vector>
#include <string>
#include <unordered_set>
std::vector<std::string> split(const std::string& str, const std::string& delimiters = ",") {
std::vector<std::string> tokens;
// Skip delimiters at beginning.
auto lastPos = str.find_first_not_of(delimiters, 0);
// Find first "non-delimiter".
auto pos = str.find_first_of(delimiters, lastPos);
while(std::string::npos != pos || std::string::npos != lastPos) {
// Found a token, add it to the vector.
tokens.push_back(str.substr(lastPos, pos - lastPos));
// Skip delimiters. Note the "not_of"
lastPos = str.find_first_not_of(delimiters, pos);
// Find next "non-delimiter"
pos = str.find_first_of(delimiters, lastPos);
}
return tokens;
}
bool has_dupes(const std::vector<std::string>& v) {
std::unordered_set<std::string> s(v.cbegin(), v.cend());
return s.size() != v.size();
}
std::string detect_duplicates(const std::string& s) {
auto v = split(s);
return has_dupes(v) ? "duplicates" : "no duplicates";
}
int main() {
// dupes
std::string dupes = "1,2,3,4,1";
std::cout << detect_duplicates(dupes) << '\n';
// no dupes
std::string no_dupes = "1,2,3";
std::cout << detect_duplicates(no_dupes) << '\n';
}

Related

Problem with only reading certain values in a string

For the following string string s1 = "172.16.254.01";I only want to read the numbers values and push them into a vector. It works well with this problem, but if I want to do it to let's say string s1 = "172.16.254...01"it will also push back two empty strings into my vector which I don't want to do. This will ruin my new vector, because not only do I have two empty strings but the size of the vector will also increase which will be problamatic in my case.
This is my code:
string s1 = "172.16.254.01";
vector<string> res;
string delimiter = ".";
size_t pos = 0;
while ((pos = s1.find(delimiter)) != std::string::npos) {
res.push_back(s1.substr(0, pos));
s1.erase(0, pos + delimiter.length());
}
res.push_back(s1);
for (auto value : res)
{
cout << value << endl;
}
This will output:
172
16
254
01
As you can see I use "." as a delimiter and read everything into a new string until it hits another ".". How can I manipulate my loop so that it fulfills the given requirements I mentioned at the start?

You need to check that between two delimiters there is a non-empty substring.
For example if the source string starts from the delimiter "." like ".1" then in the first iteration of the while loop an empty string will be pushed on the vector
while ((pos = s1.find(delimiter)) != std::string::npos) {
res.push_back(s1.substr(0, pos));
This statement after the while loop
res.push_back(s1);
can also append an empty string to the vector.
And there is no great sense to erase the source string.
I can suggest the following solution.
#include <iostream>
#include <string>
#include <vector>
int main()
{
std::string s( "172.16.254...01" );
std::vector<std::string> v;
const char delim = '.';
for (std::string::size_type pos = 0; pos != std::string::npos; )
{
auto last = s.find( delim, pos );
if (pos != last)
{
pos = s.find_first_not_of( ' ', pos );
if (pos != last)
{
v.push_back( s.substr( pos, last - pos ) );
}
pos = last;
}
if (pos != std::string::npos) ++pos;
}
for (const auto &item : v)
{
std::cout << item << ' ';
}
std::cout << '\n';
}
The program output is
172 16 254 01

Your code is not skipping unnecessary delimiter. It is directly inserting the values into res. So you should try something like this to skip unnecessary delimiter:
if (pos < s1.size())
{
if (s1.substr(pos + 1, delimiter.size()) == delimiter)
{
// Erase the extra delimiter
s1.erase(pos + 1, delimiter.length()); continue;
}
}
Final code:
#include <iostream>
#include <vector>
#include <string>
int main()
{
std::string s1 = "172.16.254...01";
std::vector<std::string> res;
std::string delimiter = ".";
size_t pos = 0;
while ((pos = s1.find(delimiter)) != std::string::npos)
{
if (s1.substr(pos + 1, delimiter.size()) == delimiter)
{
// Erase the extra delimiter
s1.erase(pos + 1, delimiter.length()); continue;
}
res.push_back(s1.substr(0, pos));
s1.erase(0, pos + delimiter.length());
}
for (auto value : res)
{
std::cout << value << std::endl;
}
}
Also, you should consider not using the following line in your code:
using namespace std;
...as it's considered as bad practice. Instead use std:: everytime.

Getting the words from a sentence and storing them in a vector of strings

Alright, guys ...
Here's my set that has all the letters. I'm defining a word as consisting of consecutive letters from the set.
const char LETTERS_ARR[] = {"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"};
const std::set<char> LETTERS_SET(LETTERS_ARR, LETTERS_ARR + sizeof(LETTERS_ARR)/sizeof(char));
I was hoping that this function would take in a string representing a sentence and return a vector of strings that are the individual words in the sentence.
std::vector<std::string> get_sntnc_wrds(std::string S) {
std::vector<std::string> retvec;
std::string::iterator it = S.begin();
while (it != S.end()) {
if (LETTERS_SET.count(*it) == 1) {
std::string str(1,*it);
int k(0);
while (((it+k+1) != S.end()) && (LETTERS_SET.count(*(it+k+1) == 1))) {
str.push_back(*(it + (++k)));
}
retvec.push_back(str);
it += k;
}
else {
++it;
}
}
return retvec;
}
For instance, the following call should return a vector of the strings "Yo", "dawg", etc.
std::string mystring("Yo, dawg, I heard you life functions, so we put a function inside your function so you can derive while you derive.");
std::vector<std::string> mystringvec = get_sntnc_wrds(mystring);
But everything isn't going as planned. I tried running my code and it was putting the entire sentence into the first and only element of the vector. My function is very messy code and perhaps you can help me come up with a simpler version. I don't expect you to be able to trace my thought process in my pitiful attempt at writing that function.

Try this instead:
#include <vector>
#include <cctype>
#include <string>
#include <algorithm>
// true if the argument is whitespace, false otherwise
bool space(char c)
{
return isspace(c);
}
// false if the argument is whitespace, true otherwise
bool not_space(char c)
{
return !isspace(c);
}
vector<string> split(const string& str)
{
typedef string::const_iterator iter;
vector<string> ret;
iter i = str.begin();
while (i != str.end())
{
// ignore leading blanks
i = find_if(i, str.end(), not_space);
// find end of next word
iter j = find_if(i, str.end(), space);
// copy the characters in [i, j)
if (i != str.end())
ret.push_back(string(i, j));
i = j;
}
return ret;
}
The split function will return a vector of strings, each element containing one word.
This code is taken from the Accelerated C++ book, so it's not mine, but it works. There are other superb examples of using containers and algorithms for solving every-day problems in this book. I could even get a one-liner to show the contents of a file at the output console. Highly recommended.

It's just a bracketing issue, my advice is (almost) never put in more brackets than are necessary, it's only confuses things
while (it+k+1 != S.end() && LETTERS_SET.count(*(it+k+1)) == 1) {
Your code compares the character with 1 not the return value of count.
Also although count does return an integer in this context I would simplify further and treat the return as a boolean
while (it+k+1 != S.end() && LETTERS_SET.count(*(it+k+1))) {

You should use the string steam with std::copy like so:
#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>
#include <vector>
int main() {
std::string sentence = "And I feel fine...";
std::istringstream iss(sentence);
std::vector<std::string> split;
std::copy(std::istream_iterator<std::string>(iss),
std::istream_iterator<std::string>(),
std::back_inserter(split));
// This is to print the vector
for(auto iter = split.begin();
iter != split.end();
++iter)
{
std::cout << *iter << "\n";
}
}

I would use another more simple approach based on member functions of class std::string. For example
const char LETTERS[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
std::string s( "This12 34is 56a78 test." );
std::vector<std::string> v;
for ( std::string::size_type first = s.find_first_of( LETTERS, 0 );
first != std::string::npos;
first = s.find_first_of( LETTERS, first ) )
{
std::string::size_type last = s.find_first_not_of( LETTERS, first );
v.push_back(
std::string( s, first, last == std::string::npos ? std::string::npos : last - first ) );
first = last;
}
for ( const std::string &s : v ) std::cout << s << ' ';
std::cout << std::endl;

Here you make 2 mistakes, I have correct in the following code.
First, it should be
while (((it+k+1) != S.end()) && (LETTERS_SET.count(*(it+k+1)) == 1))
and, it should move to next by
it += (k+1);
and the code is
std::vector<std::string> get_sntnc_wrds(std::string S) {
std::vector<std::string> retvec;
std::string::iterator it = S.begin();
while (it != S.end()) {
if (LETTERS_SET.count(*it) == 1) {
std::string str(1,*it);
int k(0);
while (((it+k+1) != S.end()) && (LETTERS_SET.count(*(it+k+1)) == 1)) {
str.push_back(*(it + (++k)));
}
retvec.push_back(str);
it += (k+1);
}
else {
++it;
}
}
return retvec;
}
The output have been tested.

set<string>: how to list not strings starting with given string and ending with `/`?

for example we have in our set:
bin/obj/Debug/CloudServerPrototype/ra.write.1.tlog
bin/obj/Debug/CloudServerPrototype/rc.write.1.tlog
bin/obj/Debug/vc100.idb
bin/obj/Debug/vc100.pdb
So this is what I tried based on this grate answer:
#include <iostream>
#include <algorithm>
#include <set>
#include <string>
#include <iterator>
using namespace std;
struct get_pertinent_part
{
const std::string given_string;
get_pertinent_part(const std::string& s)
:given_string(s)
{
}
std::string operator()(const std::string& s)
{
std::string::size_type first = 0;
if (s.find(given_string) == 0)
{
first = given_string.length() + 1;
}
std::string::size_type count = std::string::npos;
std::string::size_type pos = s.find_last_of("/");
if (pos != std::string::npos && pos > first)
{
count = pos + 1 - first;
}
return s.substr(first, count);
}
};
void directory_listning_without_directories_demo()
{
set<string> output;
set<string> demo_set;
demo_set.insert("file1");
demo_set.insert("file2");
demo_set.insert("folder/file1");
demo_set.insert("folder/file2");
demo_set.insert("folder/folder/file1");
demo_set.insert("folder/folder/file2");
demo_set.insert("bin/obj/Debug/CloudServerPrototype/ra.write.1.tlog");
demo_set.insert("bin/obj/Debug/CloudServerPrototype/rc.write.1.tlog");
demo_set.insert("bin/obj/Debug/vc100.idb");
demo_set.insert("bin/obj/Debug/vc100.pdb");
std::transform(demo_set.begin(),
demo_set.end(),
std::inserter(output, output.end()),
get_pertinent_part("bin/obj/Debug/"));
std::copy(output.begin(),
output.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
int main()
{
directory_listning_without_directories_demo();
cin.get();
return 0;
}
This outputs:
CloudServerPrototype/
file1
file2
folder/
folder/folder/
vc100.idb
vc100.pdb
and we are given with bin/obj/Debug/string. We want to cout:
vc100.idb
vc100.pdb
CloudServerPrototype/
How to do such thing?

Quick example of what you want to do.
String.find(): http://www.cplusplus.com/reference/string/string/find/
String.subStr(): http://www.cplusplus.com/reference/string/string/substr/
string str = "bin/obj/Debug/vc100.pdb";
string checkString ("bin/obj/Debug");
// Check if string starts with the check string
if (str.find(checkString) == 0){
// Check if last letter if a "/"
if(str.substr(str.length()-1,1) == "/"){
// Output strating at the end of the check string and for
// the differnce in the strings.
cout << str.substr(checkString.length(), (str.length() - checkString.length()) ) << endl;
}
}

It's not clear with which part of the problem you are stuck, so here is a starter for you.
To get the parts of the strings between "given string" and the final '/' (where present):
std::string get_pertinent_part(const std::string& s)
{
std::string::size_type first = 0;
if (s.find(given_string) == 0)
{
first = given_string.length() + 1;
}
std::string::size_type count = std::string::npos;
std::string::size_type pos = s.find_last_of("/");
if (pos != std::string::npos && pos > first)
{
count = pos + 1 - first;
}
return s.substr(first, count);
}
To insert these parts into a new set (output) to guarantee uniqueness you can use the following:
std::transform(your_set.begin(),
your_set.end(),
std::inserter(output, output.end()),
get_pertinent_part);
You may wish to pass given_string into get_pertinent_part(), in which case you'll need to convert it to a functor:
struct get_pertinent_part
{
const std::string given_string;
get_pertinent_part(const std::string& s)
:given_string(s)
{
}
std::string operator()(const std::string& s)
{
std::string::size_type first = 0;
//
// ...same code as before...
//
return s.substr(first, count);
}
};
You can then call it this way:
std::transform(your_set.begin(),
your_set.end(),
std::inserter(output, output.end()),
get_pertinent_part("bin/obj/Debug"));
To output the new set:
std::copy(output.begin(),
output.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
Sorting the results is left as an exercise.

The easiest way I can think of, using the standard C functions, would be:
char * string1 = "bin/obj/Debug"
char * string2 = "bin/obj/Debug/CloudServerPrototype/rc.write.1.tlog"
char result[64];
// the above code is just to bring the strings into this example
char * position = strstr(string1, string2);
int substringLength;
if(position != NULL){
position += strlen(string2);
substringLength = strchr(position, '/') - position;
strncpy(result, position, substringLength);
}else{
strcpy(result, string1); // this case is for when your first string is not found
}
cout << result;
The first thing that occurs, is finding the substring, string1, in the string we are analyzing, being string2. Once we found the starting point, and assuming it was there at all, we add the length of that substring to that starting point using pointer arithmatic, and then find the resulting string's length by subtracting the starting position from the ending position, which is found with strchr(position, '/'). Then we simply copy that substring into a buffer and it's there to print with cout.
I am sure there is a fancy way of doing this with std::string, but I'll leave that to anyone who can better explain c++ strings, I never did manage to get comfortable with them, haha

Tokenize a string and include delimiters in C++

I'm tokening with the following, but unsure how to include the delimiters with it.
void Tokenize(const string str, vector<string>& tokens, const string& delimiters)
{
int startpos = 0;
int pos = str.find_first_of(delimiters, startpos);
string strTemp;
while (string::npos != pos || string::npos != startpos)
{
strTemp = str.substr(startpos, pos - startpos);
tokens.push_back(strTemp.substr(0, strTemp.length()));
startpos = str.find_first_not_of(delimiters, pos);
pos = str.find_first_of(delimiters, startpos);
}
}

The C++ String Toolkit Library (StrTk) has the following solution:
std::string str = "abc,123 xyz";
std::vector<std::string> token_list;
strtk::split(";., ",
str,
strtk::range_to_type_back_inserter(token_list),
strtk::include_delimiters);
It should result with token_list have the following elements:
Token0 = "abc,"
Token1 = "123 "
Token2 = "xyz"
More examples can be found Here

I now this a little sloppy, but this is what I ended up with. I did not want to use boost since this is a school assignment and my instructor wanted me to use find_first_of to accomplish this.
Thanks for everyone's help.
vector<string> Tokenize(const string& strInput, const string& strDelims)
{
vector<string> vS;
string strOne = strInput;
string delimiters = strDelims;
int startpos = 0;
int pos = strOne.find_first_of(delimiters, startpos);
while (string::npos != pos || string::npos != startpos)
{
if(strOne.substr(startpos, pos - startpos) != "")
vS.push_back(strOne.substr(startpos, pos - startpos));
// if delimiter is a new line (\n) then addt new line
if(strOne.substr(pos, 1) == "\n")
vS.push_back("\\n");
// else if the delimiter is not a space
else if (strOne.substr(pos, 1) != " ")
vS.push_back(strOne.substr(pos, 1));
if( string::npos == strOne.find_first_not_of(delimiters, pos) )
startpos = strOne.find_first_not_of(delimiters, pos);
else
startpos = pos + 1;
pos = strOne.find_first_of(delimiters, startpos);
}
return vS;
}

I can't really follow your code, could you post a working program?
Anyway, this is a simple tokenizer, without testing edge cases:
#include <iostream>
#include <string>
#include <vector>
using namespace std;
void tokenize(vector<string>& tokens, const string& text, const string& del)
{
string::size_type startpos = 0,
currentpos = text.find(del, startpos);
do
{
tokens.push_back(text.substr(startpos, currentpos-startpos+del.size()));
startpos = currentpos + del.size();
currentpos = text.find(del, startpos);
} while(currentpos != string::npos);
tokens.push_back(text.substr(startpos, currentpos-startpos+del.size()));
}
Example input, delimiter = $$:
Hello$$Stack$$Over$$$Flow$$$$!
Tokens:
Hello$$
Stack$$
Over$$
$Flow$$
$$
!
Note: I would never use a tokenizer I wrote without testing! please use boost::tokenizer!

if the delimiters are characters and not strings, then you can use strtok.

It depends on whether you want the preceding delimiters, the following delimiters, or both, and what you want to do with strings at the beginning and end of the string that may not have delimiters before/after them.
I'm going to assume you want each word, with its preceding and following delimiters, but NOT any strings of delimiters by themselves (e.g. if there's a delimiter following the last string).
template <class iter>
void tokenize(std::string const &str, std::string const &delims, iter out) {
int pos = 0;
do {
int beg_word = str.find_first_not_of(delims, pos);
if (beg_word == std::string::npos)
break;
int end_word = str.find_first_of(delims, beg_word);
int beg_next_word = str.find_first_not_of(delims, end_word);
*out++ = std::string(str, pos, beg_next_word-pos);
pos = end_word;
} while (pos != std::string::npos);
}
For the moment, I've written it more like an STL algorithm, taking an iterator for its output instead of assuming it's always pushing onto a collection. Since it depends (for the moment) in the input being a string, it doesn't use iterators for the input.

Is There A Built-In Way to Split Strings In C++?

well is there? by string i mean std::string

Here's a perl-style split function I use:
void split(const string& str, const string& delimiters , vector<string>& tokens)
{
// Skip delimiters at beginning.
string::size_type lastPos = str.find_first_not_of(delimiters, 0);
// Find first "non-delimiter".
string::size_type pos = str.find_first_of(delimiters, lastPos);
while (string::npos != pos || string::npos != lastPos)
{
// Found a token, add it to the vector.
tokens.push_back(str.substr(lastPos, pos - lastPos));
// Skip delimiters. Note the "not_of"
lastPos = str.find_first_not_of(delimiters, pos);
// Find next "non-delimiter"
pos = str.find_first_of(delimiters, lastPos);
}
}

There's no built-in way to split a string in C++, but boost provides the string algo library to do all sort of string manipulation, including string splitting.

Yup, stringstream.
std::istringstream oss(std::string("This is a test string"));
std::string word;
while(oss >> word) {
std::cout << "[" << word << "] ";
}

STL strings
You can use string iterators to do your dirty work.
std::string str = "hello world";
std::string::const_iterator pos = std::find(string.begin(), string.end(), ' '); // Split at ' '.
std::string left(str.begin(), pos);
std::string right(pos + 1, str.end());
// Echoes "hello|world".
std::cout << left << "|" << right << std::endl;

void split(string StringToSplit, string Separators)
{
size_t EndPart1 = StringToSplit.find_first_of(Separators)
string Part1 = StringToSplit.substr(0, EndPart1);
string Part2 = StringToSplit.substr(EndPart1 + 1);
}

The answer is no. You have to break them up using one of the library functions.
Something I use:
std::vector<std::string> parse(std::string l, char delim)
{
std::replace(l.begin(), l.end(), delim, ' ');
std::istringstream stm(l);
std::vector<std::string> tokens;
for (;;) {
std::string word;
if (!(stm >> word)) break;
tokens.push_back(word);
}
return tokens;
}
You can also take a look at the basic_streambuf<T>::underflow() method and write a filter.

What the heck... Here's my version...
Note: Splitting on ("XZaaaXZ", "XZ") will give you 3 strings. 2 of those strings will be empty, and won't be added to theStringVector if theIncludeEmptyStrings is false.
Delimiter is not any element in the set, but rather matches that exact string.
inline void
StringSplit( vector<string> * theStringVector, /* Altered/returned value */
const string & theString,
const string & theDelimiter,
bool theIncludeEmptyStrings = false )
{
UASSERT( theStringVector, !=, (vector<string> *) NULL );
UASSERT( theDelimiter.size(), >, 0 );
size_t start = 0, end = 0, length = 0;
while ( end != string::npos )
{
end = theString.find( theDelimiter, start );
// If at end, use length=maxLength. Else use length=end-start.
length = (end == string::npos) ? string::npos : end - start;
if ( theIncludeEmptyStrings
|| ( ( length > 0 ) /* At end, end == length == string::npos */
&& ( start < theString.size() ) ) )
theStringVector -> push_back( theString.substr( start, length ) );
// If at end, use start=maxSize. Else use start=end+delimiter.
start = ( ( end > (string::npos - theDelimiter.size()) )
? string::npos : end + theDelimiter.size() );
}
}
inline vector<string>
StringSplit( const string & theString,
const string & theDelimiter,
bool theIncludeEmptyStrings = false )
{
vector<string> v;
StringSplit( & v, theString, theDelimiter, theIncludeEmptyStrings );
return v;
}

There is no common way doing this.
I prefer the boost::tokenizer, its header only and easy to use.

C strings
Simply insert a \0 where you wish to split. This is about as built-in as you can get with standard C functions.
This function splits on the first occurance of a char separator, returning the second string.
char *split_string(char *str, char separator) {
char *second = strchr(str, separator);
if(second == NULL)
return NULL;
*second = '\0';
++second;
return second;
}

A fairly simple method would be to use the c_str() method of std::string to get a C-style character array, then use strtok() to tokenize the string. Not quite as eloquent as some of the other solutions listed here, but it's easy and works.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Find duplicate entry in string in c++ - c++

Related

Problem with only reading certain values in a string

Getting the words from a sentence and storing them in a vector of strings

set<string>: how to list not strings starting with given string and ending with `/`?

Tokenize a string and include delimiters in C++

Is There A Built-In Way to Split Strings In C++?

Categories

Resources