Case-insensitive string comparison in C++ [closed]

Case-insensitive string comparison in C++ [closed] - c++

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
This question's answers are a community effort. Edit existing answers to improve this post. It is not currently accepting new answers or interactions.
What is the best way of doing case-insensitive string comparison in C++ without transforming a string to all uppercase or all lowercase?
Please indicate whether the methods are Unicode-friendly and how portable they are.

Boost includes a handy algorithm for this:
#include <boost/algorithm/string.hpp>
// Or, for fewer header dependencies:
//#include <boost/algorithm/string/predicate.hpp>
std::string str1 = "hello, world!";
std::string str2 = "HELLO, WORLD!";
if (boost::iequals(str1, str2))
{
// Strings are identical
}

The trouble with boost is that you have to link with and depend on boost. Not easy in some cases (e.g. android).
And using char_traits means all your comparisons are case insensitive, which isn't usually what you want.
This should suffice. It should be reasonably efficient. Doesn't handle unicode or anything though.
bool iequals(const string& a, const string& b)
{
unsigned int sz = a.size();
if (b.size() != sz)
return false;
for (unsigned int i = 0; i < sz; ++i)
if (tolower(a[i]) != tolower(b[i]))
return false;
return true;
}
Update: Bonus C++14 version (#include <algorithm>):
bool iequals(const string& a, const string& b)
{
return std::equal(a.begin(), a.end(),
b.begin(), b.end(),
[](char a, char b) {
return tolower(a) == tolower(b);
});
}
Update: C++20 version using std::ranges:
#include <ranges>
#include <algorithm>
#include <string>
bool iequals(const std::string_view& lhs, const std::string_view& rhs) {
auto to_lower{ std::ranges::views::transform(std::tolower) };
return std::ranges::equal(lhs | to_lower, rhs | to_lower);
}

Take advantage of the standard char_traits. Recall that a std::string is in fact a typedef for std::basic_string<char>, or more explicitly, std::basic_string<char, std::char_traits<char> >. The char_traits type describes how characters compare, how they copy, how they cast etc. All you need to do is typedef a new string over basic_string, and provide it with your own custom char_traits that compare case insensitively.
struct ci_char_traits : public char_traits<char> {
static bool eq(char c1, char c2) { return toupper(c1) == toupper(c2); }
static bool ne(char c1, char c2) { return toupper(c1) != toupper(c2); }
static bool lt(char c1, char c2) { return toupper(c1) < toupper(c2); }
static int compare(const char* s1, const char* s2, size_t n) {
while( n-- != 0 ) {
if( toupper(*s1) < toupper(*s2) ) return -1;
if( toupper(*s1) > toupper(*s2) ) return 1;
++s1; ++s2;
}
return 0;
}
static const char* find(const char* s, int n, char a) {
while( n-- > 0 && toupper(*s) != toupper(a) ) {
++s;
}
return s;
}
};
typedef std::basic_string<char, ci_char_traits> ci_string;
The details are on Guru of The Week number 29.

If you are on a POSIX system, you can use strcasecmp. This function is not part of standard C, though, nor is it available on Windows. This will perform a case-insensitive comparison on 8-bit chars, so long as the locale is POSIX. If the locale is not POSIX, the results are undefined (so it might do a localized compare, or it might not). A wide-character equivalent is not available.
Failing that, a large number of historic C library implementations have the functions stricmp() and strnicmp(). Visual C++ on Windows renamed all of these by prefixing them with an underscore because they aren’t part of the ANSI standard, so on that system they’re called _stricmp or _strnicmp. Some libraries may also have wide-character or multibyte equivalent functions (typically named e.g. wcsicmp, mbcsicmp and so on).
C and C++ are both largely ignorant of internationalization issues, so there's no good solution to this problem, except to use a third-party library. Check out IBM ICU (International Components for Unicode) if you need a robust library for C/C++. ICU is for both Windows and Unix systems.

Are you talking about a dumb case insensitive compare or a full normalized Unicode compare?
A dumb compare will not find strings that might be the same but are not binary equal.
Example:
U212B (ANGSTROM SIGN)
U0041 (LATIN CAPITAL LETTER A) + U030A (COMBINING RING ABOVE)
U00C5 (LATIN CAPITAL LETTER A WITH RING ABOVE).
Are all equivalent but they also have different binary representations.
That said, Unicode Normalization should be a mandatory read especially if you plan on supporting Hangul, Thaï and other asian languages.
Also, IBM pretty much patented most optimized Unicode algorithms and made them publicly available. They also maintain an implementation : IBM ICU

boost::iequals is not utf-8 compatible in the case of string.
You can use boost::locale.
comparator<char,collator_base::secondary> cmpr;
cout << (cmpr(str1, str2) ? "str1 < str2" : "str1 >= str2") << endl;
Primary -- ignore accents and character case, comparing base letters only. For example "facade" and "Façade" are the same.
Secondary -- ignore character case but consider accents. "facade" and "façade" are different but "Façade" and "façade" are the same.
Tertiary -- consider both case and accents: "Façade" and "façade" are different. Ignore punctuation.
Quaternary -- consider all case, accents, and punctuation. The words must be identical in terms of Unicode representation.
Identical -- as quaternary, but compare code points as well.

My first thought for a non-unicode version was to do something like this:
bool caseInsensitiveStringCompare(const string& str1, const string& str2) {
if (str1.size() != str2.size()) {
return false;
}
for (string::const_iterator c1 = str1.begin(), c2 = str2.begin(); c1 != str1.end(); ++c1, ++c2) {
if (tolower(static_cast<unsigned char>(*c1)) != tolower(static_cast<unsigned char>(*c2))) {
return false;
}
}
return true;
}

You can use strcasecmp on Unix, or stricmp on Windows.
One thing that hasn't been mentioned so far is that if you are using stl strings with these methods, it's useful to first compare the length of the two strings, since this information is already available to you in the string class. This could prevent doing the costly string comparison if the two strings you are comparing aren't even the same length in the first place.

I'm trying to cobble together a good answer from all the posts, so help me edit this:
Here is a method of doing this, although it does transforming the strings, and is not Unicode friendly, it should be portable which is a plus:
bool caseInsensitiveStringCompare( const std::string& str1, const std::string& str2 ) {
std::string str1Cpy( str1 );
std::string str2Cpy( str2 );
std::transform( str1Cpy.begin(), str1Cpy.end(), str1Cpy.begin(), ::tolower );
std::transform( str2Cpy.begin(), str2Cpy.end(), str2Cpy.begin(), ::tolower );
return ( str1Cpy == str2Cpy );
}
From what I have read this is more portable than stricmp() because stricmp() is not in fact part of the std library, but only implemented by most compiler vendors.
To get a truly Unicode friendly implementation it appears you must go outside the std library. One good 3rd party library is the IBM ICU (International Components for Unicode)
Also boost::iequals provides a fairly good utility for doing this sort of comparison.

str1.size() == str2.size() && std::equal(str1.begin(), str1.end(), str2.begin(), [](auto a, auto b){return std::tolower(a)==std::tolower(b);})
You can use the above code in C++14 if you are not in a position to use boost. You have to use std::towlower for wide chars.

Short and nice. No other dependencies, than extended std C lib.
strcasecmp(str1.c_str(), str2.c_str()) == 0
returns true if str1 and str2 are equal.
strcasecmp may not exist, there could be analogs stricmp, strcmpi, etc.
Example code:
#include <iostream>
#include <string>
#include <string.h> //For strcasecmp(). Also could be found in <mem.h>
using namespace std;
/// Simple wrapper
inline bool str_ignoreCase_cmp(std::string const& s1, std::string const& s2) {
if(s1.length() != s2.length())
return false; // optimization since std::string holds length in variable.
return strcasecmp(s1.c_str(), s2.c_str()) == 0;
}
/// Function object - comparator
struct StringCaseInsensetiveCompare {
bool operator()(std::string const& s1, std::string const& s2) {
if(s1.length() != s2.length())
return false; // optimization since std::string holds length in variable.
return strcasecmp(s1.c_str(), s2.c_str()) == 0;
}
bool operator()(const char *s1, const char * s2){
return strcasecmp(s1,s2)==0;
}
};
/// Convert bool to string
inline char const* bool2str(bool b){ return b?"true":"false"; }
int main()
{
cout<< bool2str(strcasecmp("asd","AsD")==0) <<endl;
cout<< bool2str(strcasecmp(string{"aasd"}.c_str(),string{"AasD"}.c_str())==0) <<endl;
StringCaseInsensetiveCompare cmp;
cout<< bool2str(cmp("A","a")) <<endl;
cout<< bool2str(cmp(string{"Aaaa"},string{"aaaA"})) <<endl;
cout<< bool2str(str_ignoreCase_cmp(string{"Aaaa"},string{"aaaA"})) <<endl;
return 0;
}
Output:
true
true
true
true
true

See std::lexicographical_compare:
// lexicographical_compare example
#include <iostream> // std::cout, std::boolalpha
#include <algorithm> // std::lexicographical_compare
#include <cctype> // std::tolower
// a case-insensitive comparison function:
bool mycomp (char c1, char c2) {
return std::tolower(c1) < std::tolower(c2);
}
int main () {
char foo[] = "Apple";
char bar[] = "apartment";
std::cout << std::boolalpha;
std::cout << "Comparing foo and bar lexicographically (foo < bar):\n";
std::cout << "Using default comparison (operator<): ";
std::cout << std::lexicographical_compare(foo, foo + 5, bar, bar + 9);
std::cout << '\n';
std::cout << "Using mycomp as comparison object: ";
std::cout << std::lexicographical_compare(foo, foo + 5, bar, bar + 9, mycomp);
std::cout << '\n';
return 0;
}
Demo

Visual C++ string functions supporting unicode: http://msdn.microsoft.com/en-us/library/cc194799.aspx
the one you are probably looking for is _wcsnicmp

FYI, strcmp() and stricmp() are vulnerable to buffer overflow, since they just process until they hit a null terminator. It's safer to use _strncmp() and _strnicmp().

The Boost.String library has a lot of algorithms for doing case-insenstive comparisons and so on.
You could implement your own, but why bother when it's already been done?

For my basic case insensitive string comparison needs I prefer not to have to use an external library, nor do I want a separate string class with case insensitive traits that is incompatible with all my other strings.
So what I've come up with is this:
bool icasecmp(const string& l, const string& r)
{
return l.size() == r.size()
&& equal(l.cbegin(), l.cend(), r.cbegin(),
[](string::value_type l1, string::value_type r1)
{ return toupper(l1) == toupper(r1); });
}
bool icasecmp(const wstring& l, const wstring& r)
{
return l.size() == r.size()
&& equal(l.cbegin(), l.cend(), r.cbegin(),
[](wstring::value_type l1, wstring::value_type r1)
{ return towupper(l1) == towupper(r1); });
}
A simple function with one overload for char and another for whar_t. Doesn't use anything non-standard so should be fine on any platform.
The equality comparison won't consider issues like variable length encoding and Unicode normalization, but basic_string has no support for that that I'm aware of anyway and it isn't normally an issue.
In cases where more sophisticated lexicographical manipulation of text is required, then you simply have to use a third party library like Boost, which is to be expected.

Doing this without using Boost can be done by getting the C string pointer with c_str() and using strcasecmp:
std::string str1 ="aBcD";
std::string str2 = "AbCd";;
if (strcasecmp(str1.c_str(), str2.c_str()) == 0)
{
//case insensitive equal
}

Assuming you are looking for a method and not a magic function that already exists, there is frankly no better way. We could all write code snippets with clever tricks for limited character sets, but at the end of the day at somepoint you have to convert the characters.
The best approach for this conversion is to do so prior to the comparison. This allows you a good deal of flexibility when it comes to encoding schemes, which your actual comparison operator should be ignorant of.
You can of course 'hide' this conversion behind your own string function or class, but you still need to convert the strings prior to comparison.

I wrote a case-insensitive version of char_traits for use with std::basic_string in order to generate a std::string that is not case-sensitive when doing comparisons, searches, etc using the built-in std::basic_string member functions.
So in other words, I wanted to do something like this.
std::string a = "Hello, World!";
std::string b = "hello, world!";
assert( a == b );
...which std::string can't handle. Here's the usage of my new char_traits:
std::istring a = "Hello, World!";
std::istring b = "hello, world!";
assert( a == b );
...and here's the implementation:
/* ---
Case-Insensitive char_traits for std::string's
Use:
To declare a std::string which preserves case but ignores case in comparisons & search,
use the following syntax:
std::basic_string<char, char_traits_nocase<char> > noCaseString;
A typedef is declared below which simplifies this use for chars:
typedef std::basic_string<char, char_traits_nocase<char> > istring;
--- */
template<class C>
struct char_traits_nocase : public std::char_traits<C>
{
static bool eq( const C& c1, const C& c2 )
{
return ::toupper(c1) == ::toupper(c2);
}
static bool lt( const C& c1, const C& c2 )
{
return ::toupper(c1) < ::toupper(c2);
}
static int compare( const C* s1, const C* s2, size_t N )
{
return _strnicmp(s1, s2, N);
}
static const char* find( const C* s, size_t N, const C& a )
{
for( size_t i=0 ; i<N ; ++i )
{
if( ::toupper(s[i]) == ::toupper(a) )
return s+i ;
}
return 0 ;
}
static bool eq_int_type( const int_type& c1, const int_type& c2 )
{
return ::toupper(c1) == ::toupper(c2) ;
}
};
template<>
struct char_traits_nocase<wchar_t> : public std::char_traits<wchar_t>
{
static bool eq( const wchar_t& c1, const wchar_t& c2 )
{
return ::towupper(c1) == ::towupper(c2);
}
static bool lt( const wchar_t& c1, const wchar_t& c2 )
{
return ::towupper(c1) < ::towupper(c2);
}
static int compare( const wchar_t* s1, const wchar_t* s2, size_t N )
{
return _wcsnicmp(s1, s2, N);
}
static const wchar_t* find( const wchar_t* s, size_t N, const wchar_t& a )
{
for( size_t i=0 ; i<N ; ++i )
{
if( ::towupper(s[i]) == ::towupper(a) )
return s+i ;
}
return 0 ;
}
static bool eq_int_type( const int_type& c1, const int_type& c2 )
{
return ::towupper(c1) == ::towupper(c2) ;
}
};
typedef std::basic_string<char, char_traits_nocase<char> > istring;
typedef std::basic_string<wchar_t, char_traits_nocase<wchar_t> > iwstring;

I've had good experience using the International Components for Unicode libraries - they're extremely powerful, and provide methods for conversion, locale support, date and time rendering, case mapping (which you don't seem to want), and collation, which includes case- and accent-insensitive comparison (and more). I've only used the C++ version of the libraries, but they appear to have a Java version as well.
Methods exist to perform normalized compares as referred to by #Coincoin, and can even account for locale - for example (and this a sorting example, not strictly equality), traditionally in Spanish (in Spain), the letter combination "ll" sorts between "l" and "m", so "lz" < "ll" < "ma".

Just use strcmp() for case sensitive and strcmpi() or stricmp() for case insensitive comparison. Which are both in the header file <string.h>
format:
int strcmp(const char*,const char*); //for case sensitive
int strcmpi(const char*,const char*); //for case insensitive
Usage:
string a="apple",b="ApPlE",c="ball";
if(strcmpi(a.c_str(),b.c_str())==0) //(if it is a match it will return 0)
cout<<a<<" and "<<b<<" are the same"<<"\n";
if(strcmpi(a.c_str(),b.c_str()<0)
cout<<a[0]<<" comes before ball "<<b[0]<<", so "<<a<<" comes before "<<b;
Output
apple and ApPlE are the same
a comes before b, so apple comes before ball

Late to the party, but here is a variant that uses std::locale, and thus correctly handles Turkish:
auto tolower = std::bind1st(
std::mem_fun(
&std::ctype<char>::tolower),
&std::use_facet<std::ctype<char> >(
std::locale()));
gives you a functor that uses the active locale to convert characters to lowercase, which you can then use via std::transform to generate lower-case strings:
std::string left = "fOo";
transform(left.begin(), left.end(), left.begin(), tolower);
This also works for wchar_t based strings.

Just a note on whatever method you finally choose, if that method happens to include the use of strcmp that some answers suggest:
strcmp doesn't work with Unicode data in general. In general, it doesn't even work with byte-based Unicode encodings, such as utf-8, since strcmp only makes byte-per-byte comparisons and Unicode code points encoded in utf-8 can take more than 1 byte. The only specific Unicode case strcmp properly handle is when a string encoded with a byte-based encoding contains only code points below U+00FF - then the byte-per-byte comparison is enough.

As of early 2013, the ICU project, maintained by IBM, is a pretty good answer to this.
http://site.icu-project.org/
ICU is a "complete, portable Unicode library that closely tracks industry standards." For the specific problem of string comparison, the Collation object does what you want.
The Mozilla Project adopted ICU for internationalization in Firefox in mid-2012; you can track the engineering discussion, including issues of build systems and data file size, here:
https://groups.google.com/forum/#!topic/mozilla.dev.platform/sVVpS2sKODw
https://bugzilla.mozilla.org/show_bug.cgi?id=724529 (tracker)
https://bugzilla.mozilla.org/show_bug.cgi?id=724531 (build system)

Looks like above solutions aren't using compare method and implementing total again so here is my solution and hope it works for you (It's working fine).
#include<iostream>
#include<cstring>
#include<cmath>
using namespace std;
string tolow(string a)
{
for(unsigned int i=0;i<a.length();i++)
{
a[i]=tolower(a[i]);
}
return a;
}
int main()
{
string str1,str2;
cin>>str1>>str2;
int temp=tolow(str1).compare(tolow(str2));
if(temp>0)
cout<<1;
else if(temp==0)
cout<<0;
else
cout<<-1;
}

A simple way to compare two string in c++ (tested for windows) is using _stricmp
// Case insensitive (could use equivalent _stricmp)
result = _stricmp( string1, string2 );
If you are looking to use with std::string, an example:
std::string s1 = string("Hello");
if ( _stricmp(s1.c_str(), "HELLO") == 0)
std::cout << "The string are equals.";
For more information here: https://msdn.microsoft.com/it-it/library/e0z9k731.aspx

If you don't want to use Boost library then here is solution to it using only C++ standard io header.
#include <iostream>
struct iequal
{
bool operator()(int c1, int c2) const
{
// case insensitive comparison of two characters.
return std::toupper(c1) == std::toupper(c2);
}
};
bool iequals(const std::string& str1, const std::string& str2)
{
// use std::equal() to compare range of characters using the functor above.
return std::equal(str1.begin(), str1.end(), str2.begin(), iequal());
}
int main(void)
{
std::string str_1 = "HELLO";
std::string str_2 = "hello";
if(iequals(str_1,str_2))
{
std::cout<<"String are equal"<<std::endl;
}
else
{
std::cout<<"String are not equal"<<std::endl;
}
return 0;
}

If you have to compare a source string more often with other strings one elegant solution is to use regex.
std::wstring first = L"Test";
std::wstring second = L"TEST";
std::wregex pattern(first, std::wregex::icase);
bool isEqual = std::regex_match(second, pattern);

bool insensitive_c_compare(char A, char B){
static char mid_c = ('Z' + 'a') / 2 + 'Z';
static char up2lo = 'A' - 'a'; /// the offset between upper and lowers
if ('a' >= A and A >= 'z' or 'A' >= A and 'Z' >= A)
if ('a' >= B and B >= 'z' or 'A' >= B and 'Z' >= B)
/// check that the character is infact a letter
/// (trying to turn a 3 into an E would not be pretty!)
{
if (A > mid_c and B > mid_c or A < mid_c and B < mid_c)
{
return A == B;
}
else
{
if (A > mid_c)
A = A - 'a' + 'A';
if (B > mid_c)/// convert all uppercase letters to a lowercase ones
B = B - 'a' + 'A';
/// this could be changed to B = B + up2lo;
return A == B;
}
}
}
this could probably be made much more efficient, but here is a bulky version with all its bits bare.
not all that portable, but works well with whatever is on my computer (no idea, I am of pictures not words)

An easy way to compare strings that are only different by lowercase and capitalized characters is to do an ascii comparison. All capital and lowercase letters differ by 32 bits in the ascii table, using this information we have the following...
for( int i = 0; i < string2.length(); i++)
{
if (string1[i] == string2[i] || int(string1[i]) == int(string2[j])+32 ||int(string1[i]) == int(string2[i])-32)
{
count++;
continue;
}
else
{
break;
}
if(count == string2.length())
{
//then we have a match
}
}

Related

C++ How can I use std::string::find to be case insensitive? [duplicate]

I am using std::string's find() method to test if a string is a substring of another. Now I need case insensitive version of the same thing. For string comparison I can always turn to stricmp() but there doesn't seem to be a stristr().
I have found various answers and most suggest using Boost which is not an option in my case. Additionally, I need to support std::wstring/wchar_t. Any ideas?

You could use std::search with a custom predicate.
#include <locale>
#include <iostream>
#include <algorithm>
using namespace std;
// templated version of my_equal so it could work with both char and wchar_t
template<typename charT>
struct my_equal {
my_equal( const std::locale& loc ) : loc_(loc) {}
bool operator()(charT ch1, charT ch2) {
return std::toupper(ch1, loc_) == std::toupper(ch2, loc_);
}
private:
const std::locale& loc_;
};
// find substring (case insensitive)
template<typename T>
int ci_find_substr( const T& str1, const T& str2, const std::locale& loc = std::locale() )
{
typename T::const_iterator it = std::search( str1.begin(), str1.end(),
str2.begin(), str2.end(), my_equal<typename T::value_type>(loc) );
if ( it != str1.end() ) return it - str1.begin();
else return -1; // not found
}
int main(int arc, char *argv[])
{
// string test
std::string str1 = "FIRST HELLO";
std::string str2 = "hello";
int f1 = ci_find_substr( str1, str2 );
// wstring test
std::wstring wstr1 = L"ОПЯТЬ ПРИВЕТ";
std::wstring wstr2 = L"привет";
int f2 = ci_find_substr( wstr1, wstr2 );
return 0;
}

The new C++11 style:
#include <algorithm>
#include <string>
#include <cctype>
/// Try to find in the Haystack the Needle - ignore case
bool findStringIC(const std::string & strHaystack, const std::string & strNeedle)
{
auto it = std::search(
strHaystack.begin(), strHaystack.end(),
strNeedle.begin(), strNeedle.end(),
[](unsigned char ch1, unsigned char ch2) { return std::toupper(ch1) == std::toupper(ch2); }
);
return (it != strHaystack.end() );
}
Explanation of the std::search can be found on cplusplus.com.

why not use Boost.StringAlgo:
#include <boost/algorithm/string/find.hpp>
bool Foo()
{
//case insensitive find
std::string str("Hello");
boost::iterator_range<std::string::const_iterator> rng;
rng = boost::ifind_first(str, std::string("EL"));
return rng;
}

Why not just convert both strings to lowercase before you call find()?
tolower
Notice:
Inefficient for long strings.
Beware of internationalization issues.

Since you're doing substring searches (std::string) and not element (character) searches, there's unfortunately no existing solution I'm aware of that's immediately accessible in the standard library to do this.
Nevertheless, it's easy enough to do: simply convert both strings to upper case (or both to lower case - I chose upper in this example).
std::string upper_string(const std::string& str)
{
string upper;
transform(str.begin(), str.end(), std::back_inserter(upper), toupper);
return upper;
}
std::string::size_type find_str_ci(const std::string& str, const std::string& substr)
{
return upper(str).find(upper(substr) );
}
This is not a fast solution (bordering into pessimization territory) but it's the only one I know of off-hand. It's also not that hard to implement your own case-insensitive substring finder if you are worried about efficiency.
Additionally, I need to support
std::wstring/wchar_t. Any ideas?
tolower/toupper in locale will work on wide-strings as well, so the solution above should be just as applicable (simple change std::string to std::wstring).
[Edit] An alternative, as pointed out, is to adapt your own case-insensitive string type from basic_string by specifying your own character traits. This works if you can accept all string searches, comparisons, etc. to be case-insensitive for a given string type.

If you want “real” comparison according to Unicode and locale rules, use ICU’s Collator class.

Also make sense to provide Boost version: This will modify original strings.
#include <boost/algorithm/string.hpp>
string str1 = "hello world!!!";
string str2 = "HELLO";
boost::algorithm::to_lower(str1)
boost::algorithm::to_lower(str2)
if (str1.find(str2) != std::string::npos)
{
// str1 contains str2
}
or using perfect boost xpression library
#include <boost/xpressive/xpressive.hpp>
using namespace boost::xpressive;
....
std::string long_string( "very LonG string" );
std::string word("long");
smatch what;
sregex re = sregex::compile(word, boost::xpressive::icase);
if( regex_match( long_string, what, re ) )
{
cout << word << " found!" << endl;
}
In this example you should pay attention that your search word don't have any regex special characters.

#include <iostream>
using namespace std;
template <typename charT>
struct ichar {
operator charT() const { return toupper(x); }
charT x;
};
template <typename charT>
static basic_string<ichar<charT> > *istring(basic_string<charT> &s) { return (basic_string<ichar<charT> > *)&s; }
template <typename charT>
static ichar<charT> *istring(const charT *s) { return (ichar<charT> *)s; }
int main()
{
string s = "The STRING";
wstring ws = L"The WSTRING";
cout << istring(s)->find(istring("str")) << " " << istring(ws)->find(istring(L"wstr")) << endl;
}
A little bit dirty, but short & fast.

I love the answers from Kiril V. Lyadvinsky and CC. but my problem was a little more specific than just case-insensitivity; I needed a lazy Unicode-supported command-line argument parser that could eliminate false-positives/negatives when dealing with alphanumeric string searches that could have special characters in the base string used to format alphanum keywords I was searching against, e.g., Wolfjäger shouldn't match jäger but <jäger> should.
It's basically just Kiril/CC's answer with extra handling for alphanumeric exact-length matches.
/* Undefined behavior when a non-alpha-num substring parameter is used. */
bool find_alphanum_string_CI(const std::wstring& baseString, const std::wstring& subString)
{
/* Fail fast if the base string was smaller than what we're looking for */
if (subString.length() > baseString.length())
return false;
auto it = std::search(
baseString.begin(), baseString.end(), subString.begin(), subString.end(),
[](char ch1, char ch2)
{
return std::toupper(ch1) == std::toupper(ch2);
}
);
if(it == baseString.end())
return false;
size_t match_start_offset = it - baseString.begin();
std::wstring match_start = baseString.substr(match_start_offset, std::wstring::npos);
/* Typical special characters and whitespace to split the substring up. */
size_t match_end_pos = match_start.find_first_of(L" ,<.>;:/?\'\"[{]}=+-_)(*&^%$##!~`");
/* Pass fast if the remainder of the base string where
the match started is the same length as the substring. */
if (match_end_pos == std::wstring::npos && match_start.length() == subString.length())
return true;
std::wstring extracted_match = match_start.substr(0, match_end_pos);
return (extracted_match.length() == subString.length());
}

The Most Efficient Way
Simple and Fast.
Performance is guaranteed to be linear, with an initialization cost of 2 * NEEDLE_LEN comparisons. (glic)
#include <cstring>
#include <string>
#include <iostream>
int main(void) {
std::string s1{"abc de fGH"};
std::string s2{"DE"};
auto pos = strcasestr(s1.c_str(), s2.c_str());
if(pos != nullptr)
std::cout << pos - s1.c_str() << std::endl;
return 0;
}

wxWidgets has a very rich string API
wxString
it can be done with (using the case conversion way)
int Contains(const wxString& SpecProgramName, const wxString& str)
{
wxString SpecProgramName_ = SpecProgramName.Upper();
wxString str_ = str.Upper();
int found = SpecProgramName.Find(str_);
if (wxNOT_FOUND == found)
{
return 0;
}
return 1;
}

How to check if a string is all lowercase and alphanumerics?

Is there a method that checks for these cases? Or do I need to parse each letter in the string, and check if it's lower case (letter) and is a number/letter?

You can use islower(), isalnum() to check for those conditions for each character. There is no string-level function to do this, so you'll have to write your own.

Assuming that the "C" locale is acceptable (or swap in a different set of characters for criteria), use find_first_not_of()
#include <string>
bool testString(const std::string& str)
{
std::string criteria("abcdefghijklmnopqrstuvwxyz0123456789");
return (std::string::npos == str.find_first_not_of(criteria);
}

It's not very well known, but a locale actually does have functions to determine characteristics of entire strings at a time. Specifically, the ctype facet of a locale has a scan_is and a scan_not that scan for the first character that fits a specified mask (alpha, numeric, alphanumeric, lower, upper, punctuation, space, hex digit, etc.), or the first that doesn't fit it, respectively. Other than that, they work a bit like std::find_if, returning whatever you passed as the "end" to signal failure, otherwise returning a pointer to the first item in the string that doesn't fit what you asked for.
Here's a quick sample:
#include <locale>
#include <iostream>
#include <iomanip>
int main() {
std::string inputs[] = {
"alllower",
"1234",
"lower132",
"including a space"
};
// We'll use the "classic" (C) locale, but this works with any
std::locale loc(std::locale::classic());
// A mask specifying the characters to search for:
std::ctype_base::mask m = std::ctype_base::lower | std::ctype_base::digit;
for (int i=0; i<4; i++) {
char const *pos;
char const *b = &*inputs[i].begin();
char const *e = &*inputs[i].end();
std::cout << "Input: " << std::setw(20) << inputs[i] << ":\t";
// finally, call the actual function:
if ((pos=std::use_facet<std::ctype<char> >(loc).scan_not(m, b, e)) == e)
std::cout << "All characters match mask\n";
else
std::cout << "First non-matching character = \"" << *pos << "\"\n";
}
return 0;
}
I suspect most people will prefer to use std::find_if though -- using it is nearly the same, but can be generalized to many more situations quite easily. Even though this has much narrower applicability, it's not really a lot easier to user (though I suppose if you're scanning large chunks of text, it might well be at least a little faster).

You could use the tolower & strcmp to compare if the original_string and the tolowered string.And do the numbers individually per character.
(OR) Do both per character as below.
#include <algorithm>
static inline bool is_not_alphanum_lower(char c)
{
return (!isalnum(c) || !islower(c));
}
bool string_is_valid(const std::string &str)
{
return find_if(str.begin(), str.end(), is_not_alphanum_lower) == str.end();
}
I used the some info from:
Determine if a string contains only alphanumeric characters (or a space)

Just use std::all_of
bool lowerAlnum = std::all_of(str.cbegin(), str.cend(), [](const char c){
return isdigit(c) || islower(c);
});
If you don't care about locale (i.e. the input is pure 7-bit ASCII) then the condition can be optimized into
[](const char c){ return ('0' <= c && c <= '9') || ('a' <= c && c <= 'z'); }

If your strings contain ASCII-encoded text and you like to write your own functions (like I do) then you can use this:
bool is_lower_alphanumeric(const string& txt)
{
for(char c : txt)
{
if (!((c >= '0' and c <= '9') or (c >= 'a' and c <= 'z'))) return false;
}
return true;
}

Case insensitive std::string.find()

I am using std::string's find() method to test if a string is a substring of another. Now I need case insensitive version of the same thing. For string comparison I can always turn to stricmp() but there doesn't seem to be a stristr().
I have found various answers and most suggest using Boost which is not an option in my case. Additionally, I need to support std::wstring/wchar_t. Any ideas?

You could use std::search with a custom predicate.
#include <locale>
#include <iostream>
#include <algorithm>
using namespace std;
// templated version of my_equal so it could work with both char and wchar_t
template<typename charT>
struct my_equal {
my_equal( const std::locale& loc ) : loc_(loc) {}
bool operator()(charT ch1, charT ch2) {
return std::toupper(ch1, loc_) == std::toupper(ch2, loc_);
}
private:
const std::locale& loc_;
};
// find substring (case insensitive)
template<typename T>
int ci_find_substr( const T& str1, const T& str2, const std::locale& loc = std::locale() )
{
typename T::const_iterator it = std::search( str1.begin(), str1.end(),
str2.begin(), str2.end(), my_equal<typename T::value_type>(loc) );
if ( it != str1.end() ) return it - str1.begin();
else return -1; // not found
}
int main(int arc, char *argv[])
{
// string test
std::string str1 = "FIRST HELLO";
std::string str2 = "hello";
int f1 = ci_find_substr( str1, str2 );
// wstring test
std::wstring wstr1 = L"ОПЯТЬ ПРИВЕТ";
std::wstring wstr2 = L"привет";
int f2 = ci_find_substr( wstr1, wstr2 );
return 0;
}

The new C++11 style:
#include <algorithm>
#include <string>
#include <cctype>
/// Try to find in the Haystack the Needle - ignore case
bool findStringIC(const std::string & strHaystack, const std::string & strNeedle)
{
auto it = std::search(
strHaystack.begin(), strHaystack.end(),
strNeedle.begin(), strNeedle.end(),
[](unsigned char ch1, unsigned char ch2) { return std::toupper(ch1) == std::toupper(ch2); }
);
return (it != strHaystack.end() );
}
Explanation of the std::search can be found on cplusplus.com.

why not use Boost.StringAlgo:
#include <boost/algorithm/string/find.hpp>
bool Foo()
{
//case insensitive find
std::string str("Hello");
boost::iterator_range<std::string::const_iterator> rng;
rng = boost::ifind_first(str, std::string("EL"));
return rng;
}

Why not just convert both strings to lowercase before you call find()?
tolower
Notice:
Inefficient for long strings.
Beware of internationalization issues.

Since you're doing substring searches (std::string) and not element (character) searches, there's unfortunately no existing solution I'm aware of that's immediately accessible in the standard library to do this.
Nevertheless, it's easy enough to do: simply convert both strings to upper case (or both to lower case - I chose upper in this example).
std::string upper_string(const std::string& str)
{
string upper;
transform(str.begin(), str.end(), std::back_inserter(upper), toupper);
return upper;
}
std::string::size_type find_str_ci(const std::string& str, const std::string& substr)
{
return upper(str).find(upper(substr) );
}
This is not a fast solution (bordering into pessimization territory) but it's the only one I know of off-hand. It's also not that hard to implement your own case-insensitive substring finder if you are worried about efficiency.
Additionally, I need to support
std::wstring/wchar_t. Any ideas?
tolower/toupper in locale will work on wide-strings as well, so the solution above should be just as applicable (simple change std::string to std::wstring).
[Edit] An alternative, as pointed out, is to adapt your own case-insensitive string type from basic_string by specifying your own character traits. This works if you can accept all string searches, comparisons, etc. to be case-insensitive for a given string type.

If you want “real” comparison according to Unicode and locale rules, use ICU’s Collator class.

Also make sense to provide Boost version: This will modify original strings.
#include <boost/algorithm/string.hpp>
string str1 = "hello world!!!";
string str2 = "HELLO";
boost::algorithm::to_lower(str1)
boost::algorithm::to_lower(str2)
if (str1.find(str2) != std::string::npos)
{
// str1 contains str2
}
or using perfect boost xpression library
#include <boost/xpressive/xpressive.hpp>
using namespace boost::xpressive;
....
std::string long_string( "very LonG string" );
std::string word("long");
smatch what;
sregex re = sregex::compile(word, boost::xpressive::icase);
if( regex_match( long_string, what, re ) )
{
cout << word << " found!" << endl;
}
In this example you should pay attention that your search word don't have any regex special characters.

#include <iostream>
using namespace std;
template <typename charT>
struct ichar {
operator charT() const { return toupper(x); }
charT x;
};
template <typename charT>
static basic_string<ichar<charT> > *istring(basic_string<charT> &s) { return (basic_string<ichar<charT> > *)&s; }
template <typename charT>
static ichar<charT> *istring(const charT *s) { return (ichar<charT> *)s; }
int main()
{
string s = "The STRING";
wstring ws = L"The WSTRING";
cout << istring(s)->find(istring("str")) << " " << istring(ws)->find(istring(L"wstr")) << endl;
}
A little bit dirty, but short & fast.

I love the answers from Kiril V. Lyadvinsky and CC. but my problem was a little more specific than just case-insensitivity; I needed a lazy Unicode-supported command-line argument parser that could eliminate false-positives/negatives when dealing with alphanumeric string searches that could have special characters in the base string used to format alphanum keywords I was searching against, e.g., Wolfjäger shouldn't match jäger but <jäger> should.
It's basically just Kiril/CC's answer with extra handling for alphanumeric exact-length matches.
/* Undefined behavior when a non-alpha-num substring parameter is used. */
bool find_alphanum_string_CI(const std::wstring& baseString, const std::wstring& subString)
{
/* Fail fast if the base string was smaller than what we're looking for */
if (subString.length() > baseString.length())
return false;
auto it = std::search(
baseString.begin(), baseString.end(), subString.begin(), subString.end(),
[](char ch1, char ch2)
{
return std::toupper(ch1) == std::toupper(ch2);
}
);
if(it == baseString.end())
return false;
size_t match_start_offset = it - baseString.begin();
std::wstring match_start = baseString.substr(match_start_offset, std::wstring::npos);
/* Typical special characters and whitespace to split the substring up. */
size_t match_end_pos = match_start.find_first_of(L" ,<.>;:/?\'\"[{]}=+-_)(*&^%$##!~`");
/* Pass fast if the remainder of the base string where
the match started is the same length as the substring. */
if (match_end_pos == std::wstring::npos && match_start.length() == subString.length())
return true;
std::wstring extracted_match = match_start.substr(0, match_end_pos);
return (extracted_match.length() == subString.length());
}

The Most Efficient Way
Simple and Fast.
Performance is guaranteed to be linear, with an initialization cost of 2 * NEEDLE_LEN comparisons. (glic)
#include <cstring>
#include <string>
#include <iostream>
int main(void) {
std::string s1{"abc de fGH"};
std::string s2{"DE"};
auto pos = strcasestr(s1.c_str(), s2.c_str());
if(pos != nullptr)
std::cout << pos - s1.c_str() << std::endl;
return 0;
}

wxWidgets has a very rich string API
wxString
it can be done with (using the case conversion way)
int Contains(const wxString& SpecProgramName, const wxString& str)
{
wxString SpecProgramName_ = SpecProgramName.Upper();
wxString str_ = str.Upper();
int found = SpecProgramName.Find(str_);
if (wxNOT_FOUND == found)
{
return 0;
}
return 1;
}

How do I check if a C++ string is an int?

When I use getline, I would input a bunch of strings or numbers, but I only want the while loop to output the "word" if it is not a number.
So is there any way to check if "word" is a number or not? I know I could use atoi() for
C-strings but how about for strings of the string class?
int main () {
stringstream ss (stringstream::in | stringstream::out);
string word;
string str;
getline(cin,str);
ss<<str;
while(ss>>word)
{
//if( )
cout<<word<<endl;
}
}

Another version...
Use strtol, wrapping it inside a simple function to hide its complexity :
inline bool isInteger(const std::string & s)
{
if(s.empty() || ((!isdigit(s[0])) && (s[0] != '-') && (s[0] != '+'))) return false;
char * p;
strtol(s.c_str(), &p, 10);
return (*p == 0);
}
Why strtol ?
As far as I love C++, sometimes the C API is the best answer as far as I am concerned:
using exceptions is overkill for a test that is authorized to fail
the temporary stream object creation by the lexical cast is overkill and over-inefficient when the C standard library has a little known dedicated function that does the job.
How does it work ?
strtol seems quite raw at first glance, so an explanation will make the code simpler to read :
strtol will parse the string, stopping at the first character that cannot be considered part of an integer. If you provide p (as I did above), it sets p right at this first non-integer character.
My reasoning is that if p is not set to the end of the string (the 0 character), then there is a non-integer character in the string s, meaning s is not a correct integer.
The first tests are there to eliminate corner cases (leading spaces, empty string, etc.).
This function should be, of course, customized to your needs (are leading spaces an error? etc.).
Sources :
See the description of strtol at: http://en.cppreference.com/w/cpp/string/byte/strtol.
See, too, the description of strtol's sister functions (strtod, strtoul, etc.).

The accepted answer will give a false positive if the input is a number plus text, because "stol" will convert the firsts digits and ignore the rest.
I like the following version the most, since it's a nice one-liner that doesn't need to define a function and you can just copy and paste wherever you need it.
#include <string>
...
std::string s;
bool has_only_digits = (s.find_first_not_of( "0123456789" ) == std::string::npos);
EDIT: if you like this implementation but you do want to use it as a function, then this should do:
bool has_only_digits(const string s){
return s.find_first_not_of( "0123456789" ) == string::npos;
}

You might try boost::lexical_cast. It throws an bad_lexical_cast exception if it fails.
In your case:
int number;
try
{
number = boost::lexical_cast<int>(word);
}
catch(boost::bad_lexical_cast& e)
{
std::cout << word << "isn't a number" << std::endl;
}

If you're just checking if word is a number, that's not too hard:
#include <ctype.h>
...
string word;
bool isNumber = true;
for(string::const_iterator k = word.begin(); k != word.end(); ++k)
isNumber &&= isdigit(*k);
Optimize as desired.

Use the all-powerful C stdio/string functions:
int dummy_int;
int scan_value = std::sscanf( some_string.c_str(), "%d", &dummy_int);
if (scan_value == 0)
// does not start with integer
else
// starts with integer

You can use boost::lexical_cast, as suggested, but if you have any prior knowledge about the strings (i.e. that if a string contains an integer literal it won't have any leading space, or that integers are never written with exponents), then rolling your own function should be both more efficient, and not particularly difficult.

Ok, the way I see it you have 3 options.
1: If you simply wish to check whether the number is an integer, and don't care about converting it, but simply wish to keep it as a string and don't care about potential overflows, checking whether it matches a regex for an integer would be ideal here.
2: You can use boost::lexical_cast and then catch a potential boost::bad_lexical_cast exception to see if the conversion failed. This would work well if you can use boost and if failing the conversion is an exceptional condition.
3: Roll your own function similar to lexical_cast that checks the conversion and returns true/false depending on whether it's successful or not. This would work in case 1 & 2 doesn't fit your requirements.

Here is another solution.
try
{
(void) std::stoi(myString); //cast to void to ignore the return value
//Success! myString contained an integer
}
catch (const std::logic_error &e)
{
//Failure! myString did not contain an integer
}

Since C++11 you can make use of std::all_of and ::isdigit:
#include <algorithm>
#include <cctype>
#include <iostream>
#include <string_view>
int main([[maybe_unused]] int argc, [[maybe_unused]] char *argv[])
{
auto isInt = [](std::string_view str) -> bool {
return std::all_of(str.cbegin(), str.cend(), ::isdigit);
};
for(auto &test : {"abc", "123abc", "123.0", "+123", "-123", "123"}) {
std::cout << "Is '" << test << "' numeric? "
<< (isInt(test) ? "true" : "false") << std::endl;
}
return 0;
}
Check out the result with Godbolt.

template <typename T>
const T to(const string& sval)
{
T val;
stringstream ss;
ss << sval;
ss >> val;
if(ss.fail())
throw runtime_error((string)typeid(T).name() + " type wanted: " + sval);
return val;
}
And then you can use it like that:
double d = to<double>("4.3");
or
int i = to<int>("4123");

I have modified paercebal's method to meet my needs:
typedef std::string String;
bool isInt(const String& s, int base){
if(s.empty() || std::isspace(s[0])) return false ;
char * p ;
strtol(s.c_str(), &p, base) ;
return (*p == 0) ;
}
bool isPositiveInt(const String& s, int base){
if(s.empty() || std::isspace(s[0]) || s[0]=='-') return false ;
char * p ;
strtol(s.c_str(), &p, base) ;
return (*p == 0) ;
}
bool isNegativeInt(const String& s, int base){
if(s.empty() || std::isspace(s[0]) || s[0]!='-') return false ;
char * p ;
strtol(s.c_str(), &p, base) ;
return (*p == 0) ;
}
Note:
You can check for various bases (binary, oct, hex and others)
Make sure you don't pass 1, negative value or value >36 as base.
If you pass 0 as the base, it will auto detect the base i.e for a string starting with 0x will be treated as hex and string starting with 0 will be treated as oct. The characters are case-insensitive.
Any white space in string will make it return false.

Comparing wstring with ignoring the case

I am sure this would have been asked before but couldn't find it. Is there any built in (i.e. either using std::wstring's methods or the algorithms) way to case insensitive comparison the two wstring objects?

If you don't mind being tied to Microsoft implementation you can use this function defined in <string.h>
int _wcsnicmp(
const wchar_t *string1,
const wchar_t *string2,
size_t count
);
But if you want best performance/compatibility/functionality ratio you will probably have to look at boost library (part of it is stl anyway). Simple example (taken from different answer to different question):
#include <boost/algorithm/string.hpp>
std::wstring wstr1 = L"hello, world!";
std::wstring wstr2 = L"HELLO, WORLD!";
if (boost::iequals(wstr1, wstr2))
{
// Strings are identical
}

Using the standard library:
bool comparei(wstring stringA , wstring stringB)
{
transform(stringA.begin(), stringA.end(), stringA.begin(), toupper);
transform(stringB.begin(), stringB.end(), stringB.begin(), toupper);
return (stringA == stringB);
}
wstring stringA = "foo";
wstring stringB = "FOO";
if(comparei(stringA , stringB))
{
// strings match
}

You can use std::tolower() to convert the strings to lowercase or use the function wcscasecmp to do a case insensitive compare on the c_str()'s.
Here is a comparison functor you can use directly as well:
struct ci_less_w
{
bool operator() (const std::wstring & s1, const std::wstring & s2) const
{
#ifndef _WIN32
return wcscasecmp(s1.c_str(), s2.c_str()) < 0;
#else
return _wcsicmp(s1.c_str(), s2.c_str()) < 0;
#endif
}
};

Talking about English right ?! though I would go with my lovely Boost :)
bool isequal(const std::wstring& first, const std::wstring& second)
{
if(first.size() != second.size())
return false;
for(std::wstring::size_type i = 0; i < first.size(); i++)
{
if(first[i] != second[i] && first[i] != (second[i] ^ 32))
return false;
}
return true;
}

You could use the boost string algorithms library. Its a header only library as long as you're not going to do regex. So you can do that very easily.
http://www.boost.org/doc/libs/1_39_0/doc/html/string_algo.html

#include <algorithm>
#include <string>
#include <cstdio>
bool icase_wchar_cmp(wchar_t a, wchar_t b)
{
return std::toupper(a) == std::toupper(b);
}
bool icase_cmp(std::wstring const& s1, std::wstring const& s2)
{
return (s1.size() == s2.size()) &&
std::equal(s1.begin(), s1.end(), s2.begin(),
icase_wchar_cmp);
}
int main(int argc, char** argv)
{
using namespace std;
wstring str1(L"Hello"), str2(L"hello");
wprintf(L"%S and %S are %S\n", str1.c_str(), str2.c_str(),
icase_cmp(str1,str2) ? L"equal" : L"not equal");
return 0;
}

If you need that the string will always make case insensitive comparation (when using operators == or !=), then a possible elegant solution is to redefine char_traits::compare method.
Define your own structure. Example
struct my_wchar_traits: public std::char_traits< wchar_t>
{
static int compare( const char_type* op1, const char_type* op2, std::size_t num)
{
// Implementation here... any of the previous responses might help...
}
};
Then, define your own case insensitive string:
typedef std::basic_string< wchar_t, my_wchar_traits> my_wstring;

You can use mismatch() or lexicographical_compare(). This is suggested by Scott Meyers in Effecitve STL, item 35.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Case-insensitive string comparison in C++ [closed] - c++

str1.size() == str2.size() && std::equal(str1.begin(), str1.end(), str2.begin(), [](auto a, auto b){return std::tolower(a)==std::tolower(b);}) You can use the above code in C++14 if you are not in a position to use boost. You have to use std::towlower for wide chars.

Visual C++ string functions supporting unicode: http://msdn.microsoft.com/en-us/library/cc194799.aspx the one you are probably looking for is _wcsnicmp

FYI, strcmp() and stricmp() are vulnerable to buffer overflow, since they just process until they hit a null terminator. It's safer to use _strncmp() and _strnicmp().

The Boost.String library has a lot of algorithms for doing case-insenstive comparisons and so on. You could implement your own, but why bother when it's already been done?

Doing this without using Boost can be done by getting the C string pointer with c_str() and using strcasecmp: std::string str1 ="aBcD"; std::string str2 = "AbCd";; if (strcasecmp(str1.c_str(), str2.c_str()) == 0) { //case insensitive equal }

If you have to compare a source string more often with other strings one elegant solution is to use regex. std::wstring first = L"Test"; std::wstring second = L"TEST"; std::wregex pattern(first, std::wregex::icase); bool isEqual = std::regex_match(second, pattern);

Related

C++ How can I use std::string::find to be case insensitive? [duplicate]

How to check if a string is all lowercase and alphanumerics?

Case insensitive std::string.find()

How do I check if a C++ string is an int?

Comparing wstring with ignoring the case

Categories

Resources