Case insensitive std::set of strings


How do you get case-insensitive insertion or search of a string in a std::set?
For example-
std::set<std::string> s;
s.insert("Hello");
s.insert("HELLO"); //not allowed, string already exists.

You need to define a custom comparator:
struct InsensitiveCompare {
bool operator() (const std::string& a, const std::string& b) const {
return strcasecmp(a.c_str(), b.c_str()) < 0;
}
};
std::set<std::string, InsensitiveCompare> s;
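strcasecmp is a POSIX function declared in <strings.h>. If it is not available, you may try stricmp (_stricmp on Windows). strcoll is sometimes suggested as well, but it performs locale-dependent collation and is not guaranteed to ignore case.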

std::set lets you supply your own comparator as a template argument (as do the other ordered associative containers, such as std::map). You can then perform any type of comparison you like.
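A minimal sketch of such a comparator, using only the standard library. The names ci_less and ci_set are made up for this illustration, and the comparison is byte-wise via std::tolower, so it assumes single-byte text in the current C locale:

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// Hypothetical comparator: orders strings by their lowercased characters.
struct ci_less {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                // unsigned char avoids undefined behavior for negative char values
                return std::tolower(x) < std::tolower(y);
            });
    }
};

// Hypothetical alias: a set that treats "Hello" and "HELLO" as the same key.
using ci_set = std::set<std::string, ci_less>;
```

With this alias, inserting "Hello" and then "HELLO" into a ci_set leaves one element, and count("hELLO") finds it, because both insertion and lookup go through the same comparator.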

This is a generic solution that also works with string types other than std::string (tested with std::wstring, std::string_view, char const*). Basically, anything that defines a range of characters should work.
The key point is to use boost::as_literal, which lets the comparator treat null-terminated character arrays, character pointers and ranges uniformly.
Generic code ("iset.h"):
#pragma once
#include <set>
#include <algorithm>
#include <boost/algorithm/string.hpp>
#include <boost/range/as_literal.hpp>
// Case-insensitive generic string comparator.
struct range_iless
{
template< typename InputRange1, typename InputRange2 >
bool operator()( InputRange1 const& r1, InputRange2 const& r2 ) const
{
// include the standard begin() and end() as well as any custom overloads for ADL
using std::begin; using std::end;
// Treat null-terminated character arrays, character pointers and ranges uniformly.
// This just creates cheap iterator ranges (it doesn't copy container arguments)!
auto ir1 = boost::as_literal( r1 );
auto ir2 = boost::as_literal( r2 );
// Compare case-insensitively.
return std::lexicographical_compare(
begin( ir1 ), end( ir1 ),
begin( ir2 ), end( ir2 ),
boost::is_iless{} );
}
};
// Case-insensitive set for any Key that consists of a range of characters.
template< class Key, class Allocator = std::allocator<Key> >
using iset = std::set< Key, range_iless, Allocator >;
Usage example ("main.cpp"):
#include "iset.h" // above header file
#include <iostream>
#include <string>
#include <string_view>
// Output range to stream.
template< typename InputRange, typename Stream, typename CharT >
void write_to( Stream& s, InputRange const& r, CharT const* sep )
{
for( auto const& elem : r )
s << elem << sep;
s << std::endl;
}
int main()
{
iset< std::string > s1{ "Hello", "HELLO", "world" };
iset< std::wstring > s2{ L"Hello", L"HELLO", L"world" };
iset< char const* > s3{ "Hello", "HELLO", "world" };
iset< std::string_view > s4{ "Hello", "HELLO", "world" };
write_to( std::cout, s1, " " );
write_to( std::wcout, s2, L" " );
write_to( std::cout, s3, " " );
write_to( std::cout, s4, " " );
}

From what I have read, this is more portable than stricmp(), because stricmp() is not part of the standard library; it is only provided as an extension by most compiler vendors. My solution below therefore rolls its own comparison.
#include <string>
#include <cctype>
#include <iostream>
#include <set>
struct caseInsensitiveLess
{
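bool operator()(const std::string& x, const std::string& y) const
{
const std::string::size_type xs = x.size();
const std::string::size_type ys = y.size();
// Compare only the characters both strings have in common.
const std::string::size_type bound = xs < ys ? xs : ys;
for (std::string::size_type i = 0; i < bound; ++i)
{
// Cast to unsigned char first: passing a negative char to tolower is undefined.
const int cx = std::tolower(static_cast<unsigned char>(x[i]));
const int cy = std::tolower(static_cast<unsigned char>(y[i]));
if (cx < cy)
return true;
if (cy < cx)
return false;
}
// All shared characters match, so the shorter string sorts first.
// (Returning false here would make "abc" and "abcd" count as duplicates.)
return xs < ys;
}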
};
int main()
{
std::set<std::string, caseInsensitiveLess> ss1;
std::set<std::string> ss2;
ss1.insert("This is the first string");
ss1.insert("THIS IS THE FIRST STRING");
ss1.insert("THIS IS THE SECOND STRING");
ss1.insert("This IS THE SECOND STRING");
ss1.insert("This IS THE Third");
ss2.insert("this is the first string");
ss2.insert("this is the first string");
ss2.insert("this is the second string");
ss2.insert("this is the second string");
ss2.insert("this is the third");
for ( auto& i: ss1 )
std::cout << i << std::endl;
std::cout << std::endl;
for ( auto& i: ss2 )
std::cout << i << std::endl;
}
Output with the case-insensitive set and the regular set showing the same ordering:
This is the first string
THIS IS THE SECOND STRING
This IS THE Third
this is the first string
this is the second string
this is the third

Related

Vector searching

Basically I have a load of words in my string vector vector<string> words.
I need to make a function that searches for all the words with "ly" throughout my vector and return them, for example (golly, helpfully, mostly, nearly).
How do I use the std::find_if function to do this or is there any other way that I can do this?
I also need to find words that are longer than 7 letters in my vector, do I still use the std::find_if function with >=7 or something else?

First of all, for what you have asked, the standard library has a more appropriate algorithm than std::find_if, namely std::copy_if.
Secondly, you need a different list of words for each case. This suggests a template function that wraps std::copy_if and accepts a custom predicate (e.g. a lambda function).
Therefore I would suggest something like the following:
#include <algorithm> // std::copy_if
#include <iterator> // std::cbegin, std::cend
template<typename Container, typename Predicate>
auto getElelmentsOf(const Container& container, const Predicate condition) /* noexcept */
{
Container result;
std::copy_if(std::cbegin(container), std::cend(container), std::back_inserter(result),
condition);
return result;
}
Now you could write something like
// all the words with "ly"
const auto words_with_ly = [](const auto& ele) {
return ele.find(std::string{ "ly" }) != std::string::npos;
};
const auto elemtsOfLy = getElelmentsOf(words, words_with_ly); // function call
// find words that are longer than 7 letters
const auto words_with_size_7_more = [](const auto& ele) { return ele.size() > 7; };
const auto elemtsOfsize7More = getElelmentsOf(words, words_with_size_7_more); // function call

You can use std::copy_if to get all elements that satisfy some conditions.
#include <iostream>
#include <vector>
#include <string>
#include <algorithm> // for std::copy_if
#include <iterator> // for std::back_inserter
using std::vector;
using std::string;
int main(void) {
vector<string>words={
"golly", "hoge", "lyric", "helpfully",
"mostly", "abcdefg", "nearly", "terrible"
};
vector<string> res_ly, res_7;
// get all words that contains "ly"
std::copy_if(words.begin(), words.end(), std::back_inserter(res_ly),
[](const string& x){ return x.find("ly") != string::npos; });
// get all words that are longer than 7 letters
std::copy_if(words.begin(), words.end(), std::back_inserter(res_7),
[](const string& x){ return x.length() > 7; });
// print what we got
std::cout << "words with \"ly\":\n";
for (const string& s : res_ly) std::cout << " " << s << '\n';
std::cout << "\nwords longer than 7 letters:\n";
for (const string& s : res_7) std::cout << " " << s << '\n';
return 0;
}
Output:
words with "ly":
golly
lyric
helpfully
mostly
nearly
words longer than 7 letters:
helpfully
terrible
If you want to use std::find_if, you can repeat searching like this:
#include <iostream>
#include <vector>
#include <string>
#include <algorithm> // for std::find_if
#include <iterator> // for std::next
using std::vector;
using std::string;
int main(void) {
vector<string>words={
"golly", "hoge", "lyric", "helpfully",
"mostly", "abcdefg", "nearly", "terrible"
};
vector<string> res_ly;
// get all words that contains "ly"
for (vector<string>::iterator start = words.begin(); ;) {
vector<string>::iterator next = std::find_if(start, words.end(),
[](const string& x){ return x.find("ly") != string::npos; });
if (next == words.end()) {
break;
} else {
res_ly.push_back(*next);
start = std::next(next, 1);
}
}
// print what we got
std::cout << "words with \"ly\":\n";
for (const string& s : res_ly) std::cout << " " << s << '\n';
return 0;
}

I could suggest the following solution.
#include <iostream>
#include <string>
#include <vector>
#include <iterator>
#include <algorithm>
std::vector<std::string> copy_strings( const std::vector<std::string> &v, const std::string &s )
{
auto present = [&s]( const auto &item )
{
return item.find( s ) != std::string::npos;
};
auto n = std::count_if( std::begin( v ), std::end( v ), present );
std::vector<std::string> result;
result.reserve( n );
std::copy_if( std::begin( v ), std::end( v ),
std::back_inserter( result ),
present );
return result;
}
int main()
{
std::vector<std::string> v =
{
"golly", "helpfully", "mostly", "nearly"
};
auto result = copy_strings( v, "ly" );
for (const auto &item : result )
{
std::cout << item << ' ';
}
std::cout << '\n';
return 0;
}
The program output is
golly helpfully mostly nearly

Counting the appearance of words in a vector and listing those in a list, C++

I have a C++ vector containing separate words and I need to count how many times each word appears, using a list. I try to iterate through the list, but I am failing at the comparison of the two STL containers, i.e. checking whether the next word is already in my list or not. If not, I want to add that word to my list with a count of 1. I have a struct that counts the times a word appeared in the text.
The following code returns a list of words and numbers, but not every word in my vector, and I can't see why.
struct counter{
string word;
int sum = 1;
counter(){};
counter(string word): word(word){};
};
list<counter> list_count(vector<string> &text){
list<counter> word_count;
list<counter>::iterator it = word_count.begin();
for(string t:text){
if(it != word_count.end()){
it -> sum++;
} else {
word_count.push_back(counter(t));
}
++it;
}
return word_count;
}
Thank you in advance.

list<counter> list_count(const vector<string>& text) {
list<counter> word_count;
for (const string& t : text) {
auto it = std::find_if(word_count.begin(), word_count.end(),
[&](const counter& c){ return c.word == t; });
if (it != word_count.end()) {
it -> sum++;
} else {
word_count.push_back(counter(t));
}
}
return word_count;
}
Untested code.

You are not actually searching the std::list at all. On every loop iteration through the std::vector, you need to search the entire std::list from front to back, eg:
#include <string>
#include <list>
#include <vector>
#include <algorithm>
using namespace std;
struct counter {
string word;
int sum = 1;
counter(const string &word): word(word) {}
};
list<counter> list_count(const vector<string> &text) {
list<counter> word_count;
for(const string &t: text) {
// perform an actual search here!
list<counter>::iterator it = find_if(
word_count.begin(), word_count.end(),
[&](counter &c){ return (c.word == t); }
);
if (it != word_count.end()) {
it->sum++;
} else {
word_count.emplace_back(t);
}
}
return word_count;
}
That being said, a std::list is a poor solution for counting elements. A better solution is to use a std::(unordered_)map instead (unless you need to preserve the order of the words found, which neither one will do), eg:
#include <string>
#include <map>
#include <vector>
using namespace std;
map<string, int> list_count(const vector<string> &text) {
map<string, int> word_count;
for(const string &t: text) {
word_count[t]++;
}
return word_count;
}
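If the order in which words were first seen does matter, one option is to pair a std::vector of counts with a std::unordered_map that remembers each word's position in that vector. The following is only a sketch of that idea; the function name count_in_order is hypothetical and not part of the answers here:

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: counts words while keeping first-seen order.
std::vector<std::pair<std::string, int>>
count_in_order(const std::vector<std::string>& text) {
    std::vector<std::pair<std::string, int>> counts;     // result, in first-seen order
    std::unordered_map<std::string, std::size_t> index;  // word -> position in counts
    for (const std::string& t : text) {
        auto found = index.find(t);
        if (found == index.end()) {
            index.emplace(t, counts.size());  // remember where the new entry lives
            counts.emplace_back(t, 1);
        } else {
            ++counts[found->second].second;   // bump the existing count
        }
    }
    return counts;
}
```

Each lookup is an average constant-time hash lookup instead of a linear scan of a list, at the cost of storing every distinct word twice.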

You are trying to use an inefficient approach. The standard class template std::list does not provide random access to its elements. Each new element is appended to the end of the list, and to find out whether an element is already present, the list has to be traversed sequentially.
It would be much more efficient to use the standard container std::map. Moreover, in this container the words will be ordered.
For example you could declare
std::map<std::string, size_t> counters;
Nevertheless, if you want to use the list, the function can look as shown in the demonstrative program below.
#include <iostream>
#include <string>
#include <list>
#include <vector>
#include <iterator>
#include <algorithm>
struct counter
{
std::string word;
size_t n = 0;
counter() = default;
counter( const std::string &word ): word( word ), n( 1 ){}
};
std::list<counter> list_count( const std::vector<std::string> &text )
{
std::list<counter> word_count;
for ( const auto &s : text )
{
auto it = std::find_if( std::begin( word_count ), std::end( word_count ),
[&s]( const auto &c ) { return c.word == s; } );
if ( it == std::end( word_count ) )
{
word_count.push_back( s );
}
else
{
++it->n;
}
}
return word_count;
}
int main()
{
std::vector<std::string> v { "first", "second", "first" };
auto word_count = list_count( v );
for ( const auto &c : word_count )
{
std::cout << c.word << ": " << c.n << '\n';
}
return 0;
}
Its output is
first: 2
second: 1
Note that the definition of the struct counter is redundant. You could use the standard class std::pair instead, as shown here.
#include <iostream>
#include <string>
#include <utility>
#include <list>
#include <vector>
#include <iterator>
#include <algorithm>
std::list<std::pair<std::string, size_t>> list_count( const std::vector<std::string> &text )
{
std::list<std::pair<std::string, size_t>> word_count;
for ( const auto &s : text )
{
auto it = std::find_if( std::begin( word_count ), std::end( word_count ),
[&s]( const auto &p ) { return p.first == s; } );
if ( it == std::end( word_count ) )
{
word_count.emplace_back( s, 1 );
}
else
{
++it->second;
}
}
return word_count;
}
int main()
{
std::vector<std::string> v { "first", "second", "first" };
auto word_count = list_count( v );
for ( const auto &p : word_count )
{
std::cout << p.first << ": " << p.second << '\n';
}
return 0;
}
If you use std::map, the function becomes very simple.
#include <iostream>
#include <string>
#include <vector>
#include <map>
std::map<std::string, size_t> list_count( const std::vector<std::string> &text )
{
std::map<std::string, size_t> word_count;
for ( const auto &s : text )
{
++word_count[s];
}
return word_count;
}
int main()
{
std::vector<std::string> v { "first", "second", "first" };
auto word_count = list_count( v );
for ( const auto &p : word_count )
{
std::cout << p.first << ": " << p.second << '\n';
}
return 0;
}
Using the list is efficient only when the vector of strings is sorted, because then equal words are adjacent and only the last list element has to be checked.
Here is a demonstrative program.
#include <iostream>
#include <string>
#include <list>
#include <vector>
struct counter
{
std::string word;
size_t n = 0;
counter() = default;
counter( const std::string &word ): word( word ), n( 1 ){}
};
std::list<counter> list_count( const std::vector<std::string> &text )
{
std::list<counter> word_count;
for ( const auto &s : text )
{
if ( word_count.empty() || word_count.back().word != s )
{
word_count.push_back( s );
}
else
{
++word_count.back().n;
}
}
return word_count;
}
int main()
{
std::vector<std::string> v { "A", "B", "B", "C", "C", "C", "D", "D", "E" };
auto word_count = list_count( v );
for ( const auto &c : word_count )
{
std::cout << c.word << ": " << c.n << '\n';
}
return 0;
}
Its output is
A: 1
B: 2
C: 3
D: 2
E: 1

Count the number of unique words (case does not matter for this count)

Hey, so I'm having trouble figuring out the code to count the number of unique words. My thought process in terms of pseudocode was to first make a vector, something like vector<string> unique_word_list; then have the program read each line with something like while(getline(fin,line)). The hard part for me is coming up with the code that checks the vector (array) to see whether the string is already in there. If it's in there, I just increase the word count (simple enough), but if it's not, I add a new element to the vector. I would really appreciate it if someone could help me out here. I feel like this is not hard, but for some reason I can't think of the code for comparing the string with what's inside the array and determining whether it's a unique word or not.

Don't use a vector - use a container that maintains uniqueness, like std::set or std::unordered_set. Just convert the string into lower case (using std::tolower) before you add it:
std::set<std::string> words;
std::string next;
while (file >> next) {
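// std::tolower also has a <locale> overload, so passing the bare name can fail
// to compile; the unsigned char parameter avoids undefined behavior for negative chars.
std::transform(next.begin(), next.end(), next.begin(),
[](unsigned char c) { return static_cast<char>(std::tolower(c)); });
words.insert(next);
}
std::cout << "We have " << words.size() << " unique words.\n";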
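For std::unordered_set there is also a variant that keeps each word's original spelling: instead of lowercasing before insertion, supply a hash and an equality functor that both look only at the lowercased form. This is a sketch under that assumption; lowered, lower_hash, lower_equal and word_set are hypothetical names, and the case folding is byte-wise via std::tolower:

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical helper: returns a lowercased copy of s.
inline std::string lowered(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hash and equality must agree: both consider only the lowercased form.
struct lower_hash {
    std::size_t operator()(const std::string& s) const {
        return std::hash<std::string>{}(lowered(s));
    }
};
struct lower_equal {
    bool operator()(const std::string& a, const std::string& b) const {
        return lowered(a) == lowered(b);
    }
};

// Hypothetical alias: "Hello" and "HELLO" land in the same slot.
using word_set = std::unordered_set<std::string, lower_hash, lower_equal>;
```

The set's size() is then the number of unique words regardless of case, and the stored element is whichever spelling was inserted first. The extra copies made by lowered() are the price for keeping this sketch short.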

I can't resist writing an answer that makes use of C++'s beautiful standard library. I'd do it like this, with a std::set:
#include <algorithm>
#include <cctype>
#include <string>
#include <set>
#include <fstream>
#include <iterator>
#include <iostream>
int main()
{
std::ifstream ifile("test.txt");
std::istream_iterator<std::string> it{ifile};
std::set<std::string> uniques;
std::transform(it, {}, std::inserter(uniques, uniques.begin()),
[](std::string str) // make it lower case, so case doesn't matter anymore
{
std::transform(str.begin(), str.end(), str.begin(), ::tolower);
return str;
});
// display the unique elements
for(auto&& elem: uniques)
std::cout << elem << " ";
// display the size:
std::cout << std::endl << uniques.size();
}
You can also define a new string type in which you change the char_traits so the comparison becomes case-insensitive. This is the code you'd need (much more lengthy than before, but you may end up reusing it), the char_traits overload is copy/pasted from cppreference.com:
#include <algorithm>
#include <cctype>
#include <string>
#include <set>
#include <fstream>
#include <iterator>
#include <iostream>
struct ci_char_traits : public std::char_traits<char> {
static bool eq(char c1, char c2) { return toupper(c1) == toupper(c2); }
static bool ne(char c1, char c2) { return toupper(c1) != toupper(c2); }
static bool lt(char c1, char c2) { return toupper(c1) < toupper(c2); }
static int compare(const char* s1, const char* s2, size_t n) {
while ( n-- != 0 ) {
if ( toupper(*s1) < toupper(*s2) ) return -1;
if ( toupper(*s1) > toupper(*s2) ) return 1;
++s1; ++s2;
}
return 0;
}
static const char* find(const char* s, size_t n, char a) {
while ( n-- != 0 ) {
if ( toupper(*s) == toupper(a) ) return s;
++s;
}
return nullptr; // char_traits::find must return a null pointer when a is not found
}
};
using ci_string = std::basic_string<char, ci_char_traits>;
// need to overload the insertion and extraction operators,
// otherwise we cannot use them with our new type
std::ostream& operator<<(std::ostream& os, const ci_string& str) {
return os.write(str.data(), str.size());
}
std::istream& operator>>(std::istream& os, ci_string& str) {
std::string tmp;
os >> tmp;
str.assign(tmp.data(), tmp.size());
return os;
}
int main()
{
std::ifstream ifile("test.txt");
std::istream_iterator<ci_string> it{ifile};
std::set<ci_string> uniques(it, {}); // that's it
// display the unique elements
for (auto && elem : uniques)
std::cout << elem << " ";
// display the size:
std::cout << std::endl << uniques.size();
}

C++ - Joining properties of a vector<custom_class> into a string

I have a std::vector<custom_class> which I would like to join into a comma separated string.
I found the code:
std::stringstream s;
std::string delimeter = ",";
copy(v.begin(), v.end(), std::ostream_iterator<int>(s, delimeter.c_str()));
which is great to join a vector of a single type, such as int. However, I would like to join only a certain property of my custom_class.
Can I use copy to copy and join only a certain property of my custom_class?
For example, my vector<custom_class> looks like:
v[0].A = 1
v[0].B = 2
v[0].C = 3
v[1].A = 1
v[1].B = 2
v[1].C = 3
v[2].A = 1
v[2].B = 2
v[2].C = 3
v[3].A = 1
v[3].B = 2
v[3].C = 3
And I'd like to use the std::copy to only join those values of property B (as an example) to return the value:
2,2,2,2
Is something like this possible without looping through v explicitly?

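You can use the standard algorithm std::transform instead of std::copy. Note that, like the std::copy version, this writes the delimiter after every element, including the last one.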
For example
std::transform( v.begin(), v.end(), std::ostream_iterator<int>( s, "," ),
[]( const custom_class &c ) { return c.B; } );
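Another way is to use the algorithm std::accumulate, declared in header <numeric>, together with the function std::to_string. The lambda takes the accumulator by value, because since C++20 std::accumulate passes it as an rvalue, which cannot bind to a non-const lvalue reference. The result again ends with a trailing comma.
For example
std::string s = std::accumulate( v.begin(), v.end(), std::string(),
[]( std::string s, const custom_class &c )
{
s += std::to_string( c.B ) + ',';
return s;
} );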

Joining a string is a little odd, since you need to treat empty containers specially. So it might be easiest to roll your own algorithm. Here's one that takes an extractor predicate argument:
#include <iterator>
#include <sstream>
#include <string>
#include <utility>
template <typename C, typename E>
std::string join(char const * delim, C const & c, E && e)
{
using std::begin;
using std::end;
auto it = begin(c), eit = end(c);
if (it == eit) { return {}; }
std::ostringstream os;
os << e(*it);
while (++it != eit) { os << delim << e(*it); }
return os.str();
}
Usage example:
#include <functional>
#include <iostream>
#include <vector>
int main()
{
std::vector<std::pair<int, int>> v { { 1, 4 }, { 2, 8 }, { 3, 19 }};
std::cout << join(" | ", v, std::mem_fn(&std::pair<int, int>::second)) << "\n";
}
If you just want to print out the elements themselves without applying an extractor, you can pass some kind of "identity" extractor, for example a suitable instance of std::forward. We can in fact bake this in as default arguments:
template <typename C,
typename E = typename C::value_type const &(&)(typename C::value_type const &)>
std::string join(char const * delim,
C const & c,
E && e = static_cast<typename C::value_type const &(&)(typename C::value_type const &)>(std::forward))
Now we can say e.g.:
std::vector<int> w { 1, 4, 2, 8, 3, 19 };
std::cout << join(", ", w) << "\n";

Map doesn't sort with regard to comparator c++

I'm trying to solve an issue where I'm inserting chars into a map of type <char, int>. If the char already exists in the map, I increase the int by 1. I have created my own comparator for prioritizing the elements within the map. The priority doesn't work the way I hoped, since in the end the output doesn't follow the order.
#include <iostream>
#include <string>
#include <map>
#include <iterator>
using namespace std;
struct classcomp {
bool operator()(const int& a, const int& b) const {
return a < b;
}
};
bool isPresent(map<char,int,classcomp> mymap, char c){
return (mymap.find('b') != mymap.end());
}
int main(){
string input="dadbadddddddcabca";
map<char,int,classcomp> mymap;
char temp;
for(string::iterator it = input.begin(); it!=input.end(); ++it){
temp = *it;
if(!isPresent(mymap, temp))
mymap.insert(pair<char,int>(*it,1));
else
mymap[temp]++;
}
for (auto& x: mymap) {
cout << x.first << ": " << x.second << '\n';
}
return 0;
}
Gives the following output:
a: 4
b: 2
c: 2
d: 8

std::map is designed to be sorted by key, and providing a comparator for the value type does not change anything. Imagine you had a std::map<char,char>: how would the map know that your comparator was meant for the values (even if that were possible)?
So the solution is either to use a container that can be sorted by multiple keys, like boost::multi_index, or simply to create a second, reversed map:
#include <iostream>
#include <string>
#include <map>
#include <iterator>
using namespace std;
int main(){
string input="dadbadddddddcabca";
map<char,int> mymap;
for(string::iterator it = input.begin(); it!=input.end(); ++it){
mymap[*it]++;
}
map<int,char> reversemap;
for (auto& x: mymap) {
reversemap.insert( make_pair( x.second, x.first ) );
}
for (auto& x: reversemap ) {
cout << x.first << ": " << x.second << '\n';
}
return 0;
}
Notice that your pre-check for element existence is completely redundant: std::map's operator[] creates a new element and value-initializes it if the key does not exist.
You may notice that the output now omits some entries (though it is sorted), because counts that occur more than once collide as keys. If that is not what you need, change the type of reversemap from map to multimap, which allows duplicate keys.
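The multimap variant could be sketched as follows. The function name by_frequency is hypothetical, and the sketch wraps the counting and the reversal into one helper for brevity:

```cpp
#include <map>
#include <string>

// Hypothetical helper: groups characters by how often they occur.
std::multimap<int, char> by_frequency(const std::string& input) {
    std::map<char, int> counts;
    for (char c : input) {
        ++counts[c];  // operator[] value-initializes a missing count to 0
    }
    std::multimap<int, char> sorted;  // duplicate counts are kept, unlike std::map
    for (const auto& entry : counts) {
        sorted.emplace(entry.second, entry.first);
    }
    return sorted;
}
```

Iterating over the result visits the characters in ascending order of their count; characters with equal counts appear in the order they were inserted, which here is alphabetical because counts is itself sorted by key.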

The comparator is used to sort the chars and not the ints.
It is sorting the keys and seems to work just fine - a b c d.

map sorts its entries by key, not value. The char keys are implicitly converted to int when they are passed to your classcomp::operator().

Why does isPresent search for the literal 'b'
mymap.find('b') != mymap.end()
and not for its parameter c?
mymap.find(c) != mymap.end()

Maybe this is what you wanted:
#include <algorithm> // std::sort
#include <iostream>
#include <map>
#include <string>
#include <utility> // std::pair
#include <vector>
int main() {
std::string input="dadbadddddddcabca";
typedef std::map< char, int > map_t;
map_t mymap;
for ( std::string::const_iterator it = input.begin(), e = input.end(); it != e; ++it ) {
++mymap[ *it ]; // operator[] value-initializes a missing count to 0 before the increment
}
typedef std::pair< typename map_t::key_type, typename map_t::mapped_type > pair_t;
std::vector< pair_t > sortedByValue;
sortedByValue.assign( mymap.begin(), mymap.end() );
std::sort( sortedByValue.begin(), sortedByValue.end(), []( const pair_t & left, const pair_t & right ) {
return left.second < right.second;
// change to
// return left.second > right.second;
// for descend order
} );
for ( const auto & x: sortedByValue ) {
std::cout << x.first << ": " << x.second << std::endl;
}
}
