portable usage of boost::locale::transform - c++

I am implementing a substring search and would like to make it "accent-neutral" (it might also be called a rough search): if I search for "aba" in "rábano", the search should succeed.
In "Find substring in string using locale" there is a working answer:
#include <locale>
#include <string>
#include <boost/locale.hpp>
std::string NormalizeString(const std::string & input)
{
std::locale loc = boost::locale::generator()("");
const boost::locale::collator<char>& collator = std::use_facet<boost::locale::collator<char> >(loc);
std::string result = collator.transform(boost::locale::collator_base::primary, input);
return result;
}
The only issue with this solution is that transform adds several bytes to the end of the string. In my case it is "\x1\x1\x1\x1\x0\x0\x0": four bytes with the value 1 followed by several zero bytes.
Of course it is easy to erase these bytes, but I would not like to rely on such subtle implementation details. (The code is supposed to be cross-platform.)
Is there a more reliable way?

As @R. Martinho Fernandes said, it looks impossible to implement such a search with Boost.
I found a solution in the Chrome sources. It uses ICU.
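#include <string>
#include <utility>
#include <unicode/ucol.h>
#include <unicode/uloc.h>
#include <unicode/usearch.h>
// This class is for speeding up multiple StringSearchIgnoringCaseAndAccents()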
// with the same |find_this| argument. |find_this| is passed as the constructor
// argument, and precomputation for searching is done only at that timing.
class CStringSearchIgnoringCaseAndAccents
{
public:
explicit CStringSearchIgnoringCaseAndAccents(std::u16string find_this);
~CStringSearchIgnoringCaseAndAccents();
// Returns true if |in_this| contains |find_this|. If |match_index| or
// |match_length| are non-NULL, they are assigned the start position and total
// length of the match.
bool SearchIn(const std::u16string& in_this, size_t* match_index = nullptr, size_t* match_length = nullptr);
private:
std::u16string _find_this;
UStringSearch* _search_handle;
};
CStringSearchIgnoringCaseAndAccents::CStringSearchIgnoringCaseAndAccents(std::u16string find_this) :
_find_this(std::move(find_this)),
_search_handle(nullptr)
{
// usearch_open requires a valid string argument to be searched, even if we
// want to set it by usearch_setText afterwards. So, supplying a dummy text.
const std::u16string& dummy = _find_this;
UErrorCode status = U_ZERO_ERROR;
_search_handle = usearch_open((const UChar*)_find_this.data(), _find_this.size(),
(const UChar*)dummy.data(), dummy.size(), uloc_getDefault(), NULL, &status);
if (U_SUCCESS(status)) {
UCollator* collator = usearch_getCollator(_search_handle);
ucol_setStrength(collator, UCOL_PRIMARY);
usearch_reset(_search_handle);
}
}
CStringSearchIgnoringCaseAndAccents::~CStringSearchIgnoringCaseAndAccents()
{
if (_search_handle) usearch_close(_search_handle);
}
bool CStringSearchIgnoringCaseAndAccents::SearchIn(const std::u16string& in_this, size_t* match_index, size_t* match_length)
{
UErrorCode status = U_ZERO_ERROR;
usearch_setText(_search_handle, (const UChar*) in_this.data(), in_this.size(), &status);
// Default to basic substring search if usearch fails. According to
// http://icu-project.org/apiref/icu4c/usearch_8h.html, usearch_open will fail
// if either |find_this| or |in_this| are empty. In either case basic
// substring search will give the correct return value.
if (!U_SUCCESS(status)) {
size_t index = in_this.find(_find_this);
if (index == std::u16string::npos) {
return false;
}
else {
if (match_index)
*match_index = index;
if (match_length)
*match_length = _find_this.size();
return true;
}
}
int32_t index = usearch_first(_search_handle, &status);
if (!U_SUCCESS(status) || index == USEARCH_DONE) return false;
if (match_index)
{
*match_index = static_cast<size_t>(index);
}
if (match_length)
{
*match_length = static_cast<size_t>(usearch_getMatchedLength(_search_handle));
}
return true;
}
Usage:
CStringSearchIgnoringCaseAndAccents searcher(a_utf16_string_what);
bool found = searcher.SearchIn(a_utf16_string_where);

Even though this is an old question, I decided to post my solution because it might help someone (or someone can tell me if I am wrong). I used the Boost.Locale text conversion functions. First I applied normalization form D (NFD), which separates base characters from their combining accents. Then I kept only the ASCII bytes (values below 128), which drops the combining marks. Finally, I applied a simple lower-case conversion. It worked for your problem (and for mine), but I am not sure whether it applies to every case. Here's the solution:
#include <iostream>
#include <algorithm>
#include <cstdlib>
#include <iterator>
#include <string>
#include <locale>
#include <boost/locale.hpp>
static std::locale loc = boost::locale::generator()("en_US.UTF-8");
std::string NormalizeString(const std::string & input)
{
std::string s_norm = boost::locale::normalize(input, boost::locale::norm_nfd, loc);
std::string s;
// Keep ASCII bytes only; taking unsigned char makes the test independent of whether char is signed.
std::copy_if(s_norm.begin(), s_norm.end(), std::back_inserter(s), [](unsigned char ch){return ch<128;} );
return boost::locale::to_lower(s, loc);
}
void find_norm(const std::string& input, const std::string& query) {
if (NormalizeString(input).find(NormalizeString(query)) != std::string::npos)
std::cout << query << " found in " << input << std::endl;
else
std::cout << query << " not found in " << input << std::endl;
}
int main(int argc, char *argv[])
{
find_norm("rábano", "aba");
find_norm("rábano", "aaa");
return EXIT_SUCCESS;
}
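The "decompose, then drop the accents" idea above can be made a little more targeted than discarding every non-ASCII byte. The following is only a sketch under stated assumptions: the input is valid UTF-8 that has already been normalized to NFD (for example with boost::locale::normalize), and only the Combining Diacritical Marks block U+0300..U+036F is removed. The function name strip_combining_marks is hypothetical and not part of Boost or ICU.

```cpp
#include <cstddef>
#include <string>

// Removes UTF-8 encoded combining marks in the range U+0300..U+036F.
// Assumption: `nfd` is valid UTF-8 that is already decomposed (NFD).
std::string strip_combining_marks(const std::string& nfd)
{
    std::string out;
    out.reserve(nfd.size());
    for (std::size_t i = 0; i < nfd.size(); ++i) {
        const unsigned char b0 = static_cast<unsigned char>(nfd[i]);
        if ((b0 == 0xCC || b0 == 0xCD) && i + 1 < nfd.size()) {
            const unsigned char b1 = static_cast<unsigned char>(nfd[i + 1]);
            // U+0300..U+033F encode as CC 80..BF, U+0340..U+036F as CD 80..AF.
            const bool is_mark = (b0 == 0xCC && b1 >= 0x80 && b1 <= 0xBF) ||
                                 (b0 == 0xCD && b1 >= 0x80 && b1 <= 0xAF);
            if (is_mark) {
                ++i;        // skip the second byte of the mark as well
                continue;
            }
        }
        out.push_back(nfd[i]);
    }
    return out;
}
```

Unlike the byte filter, this keeps other non-ASCII characters (Cyrillic, CJK and so on) intact, so the result can still be lower-cased and searched with find().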

Related

C++ Brute Force attack function does not return results

so I'm currently working on a brute force attacker project in C++. I've managed to get it working, but one problem that I'm facing is that if the program actually managed to get a correct guess, the function still goes on. I think the problem is that the program fails to return a guess. Take a look at my code:
(Sorry for the mess, by the way, I'm not that experienced in C++ - I used to code in Python/JS.)
#include <iostream>
#include <cstdlib>
#include <string>
std::string chars = "abcdefghijklmnopqrstuvwxyz";
std::string iterateStr(std::string s, std::string guess, int pos);
std::string crack(std::string s);
std::string iterateChar(std::string s, std::string guess, int pos);
int main() {
crack("bb");
return EXIT_SUCCESS;
}
// this function iterates through the letters of the alphabet
std::string iterateChar(std::string s, std::string guess, int pos) {
for(int i = 0; i < chars.length(); i++) {
// sets the char to a certain letter from the chars variable
guess[pos] = chars[i];
// if the position reaches the end of the string
if(pos == s.length()) {
if(guess.compare(s) == 0) {
break;
}
} else {
// else, recursively call the function
std::cout << guess << " : " << s << std::endl;
iterateChar(s, guess, pos+1);
}
}
return guess;
}
// this function iterates through the characters in the string
std::string iterateStr(std::string s, std::string guess, int pos) {
for(int i = 0; i < s.length(); i++) {
guess = iterateChar(s, guess, i);
if(s.compare(guess) == 0) {
return guess;
}
}
return guess;
}
std::string crack(std::string s) {
int len = s.length();
std::string newS(len, 'a');
std::string newGuess;
newGuess = iterateStr(s, newS, 0);
return newGuess;
}
Edit : Updated code.
The main flaw in the posted code is that the recursive function returns a string (the guessed password) without a clear indication for the caller that the password was found.
Passing all the strings around by value is also a potential efficiency problem, but the OP should be more worried about snippets like this:
guess[pos] = chars[i]; // 'chars' contains the alphabet
if(pos == s.length()) {
if(guess.compare(s) == 0) {
break;
}
}
Here guess and s are strings of the same length. If that length is 2 (the OP's last example), guess[2] refers to the position past the last character, so writing to it is out of bounds, and the subsequent call to guess.compare(s) compares only the two characters "inside".
The loop inside iterateStr does nothing useful either, and the pos parameter is unused.
Rather than fixing this attempt, it may be better to rewrite it from scratch:
#include <iostream>
#include <string>
#include <utility>
// Sets up the variables and starts the brute force search
template <class Predicate>
auto crack(std::string const &src, size_t length, Predicate is_correct)
-> std::pair<bool, std::string>;
// Implements the brute force search in a single recursive function. It uses a
// lambda to check the password, instead of passing it directly
template <class Predicate>
bool recursive_search(std::string const &src, std::string &guess, size_t pos,
Predicate is_correct);
// Helper function, for testing purposes
void test_cracker(std::string const &alphabet, std::string const &password);
int main()
{
test_cracker("abcdefghijklmnopqrstuvwxyz", "dance");
test_cracker("abcdefghijklmnopqrstuvwxyz ", "go on");
test_cracker("0123456789", "42");
test_cracker("0123456789", "one"); // <- 'Password not found.'
}
void test_cracker(std::string const &alphabet, std::string const &password)
{
auto [found, pwd] = crack(alphabet, password.length(),
[&password] (std::string const &guess) { return guess == password; });
std::cout << (found ? pwd : "Password not found.") << '\n';
}
// Brute force recursive search
template <class Predicate>
bool recursive_search(std::string const &src, std::string &guess, size_t pos,
Predicate is_correct)
{
if ( pos + 1 == guess.size() )
{
for (auto const ch : src)
{
guess[pos] = ch;
if ( is_correct(guess) )
return true;
}
}
else
{
for (auto const ch : src)
{
guess[pos] = ch;
if ( recursive_search(src, guess, pos + 1, is_correct) )
return true;
}
}
return false;
}
template <class Predicate>
auto crack(std::string const &src, size_t length, Predicate is_correct)
-> std::pair<bool, std::string>
{
if ( src.empty() )
return { length == 0 && is_correct(src), src };
std::string guess(length, src[0]);
return { recursive_search(src, guess, 0, is_correct), guess };
}
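The same "tell the caller whether the password was found" idea can also be expressed with std::optional instead of a pair. This is only a sketch, assuming a C++17 compiler; the names try_positions and crack_optional are hypothetical and not part of the answer above.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Fills guess[pos..] with every combination of characters from `alphabet`.
// Returns the first guess accepted by `is_correct`, or std::nullopt.
template <class Predicate>
std::optional<std::string> try_positions(const std::string& alphabet, std::string& guess,
                                         std::size_t pos, Predicate is_correct)
{
    if (pos == guess.size()) {
        // Every position is filled: this is a complete candidate.
        if (is_correct(guess)) return guess;
        return std::nullopt;
    }
    for (const char ch : alphabet) {
        guess[pos] = ch;
        if (auto found = try_positions(alphabet, guess, pos + 1, is_correct))
            return found;   // propagate success immediately, stopping the search
    }
    return std::nullopt;
}

template <class Predicate>
std::optional<std::string> crack_optional(const std::string& alphabet, std::size_t length,
                                          Predicate is_correct)
{
    std::string guess(length, '\0');   // placeholder, overwritten before any check
    return try_positions(alphabet, guess, 0, is_correct);
}
```

A caller would then write something like `if (auto pwd = crack_optional(alphabet, 2, check)) std::cout << *pwd;`, so an empty result cannot be mistaken for a guess.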
I've tried your code, even with the modified version of your iterateStr() function. I used the word abduct, as it is quicker to search for. When stepping through the debugger, I noticed that your iterateChar() function was not returning when a match was found. I also noticed that the string s being passed in had a length of 6, whereas the guess string updated on each iteration had a length of 7. You might want to step through your code and check this.
For example, at one specific iteration the s string contains abduct but the guess string contains aaaabjz; on the next iteration the guess string contains aaaabkz. This might explain why the loop or function continues even when you think a match has been found.
The difference in lengths could be your culprit.
Also, when stepping through your modified code:
for ( size_t i = 0; i < s.length(); i++ ) {
guess = iterCh( s, guess, i );
std::cout << "in the iterStr loop\n";
if ( guess.compare( s ) == 0 ) {
return guess;
}
}
return guess;
in your iterateStr() function, the loop always calls guess = iterCh( s, guess, i ); and the code never prints "in the iterStr loop\n". Your iterateChar function runs through the entire sequence of characters without ever finding and returning a match. I even tried the word abs, as it is easier and quicker to step through in the debugger, and I got the same kind of results.

C++ How can I use std::string::find to be case insensitive? [duplicate]

I am using std::string's find() method to test if a string is a substring of another. Now I need case insensitive version of the same thing. For string comparison I can always turn to stricmp() but there doesn't seem to be a stristr().
I have found various answers and most suggest using Boost which is not an option in my case. Additionally, I need to support std::wstring/wchar_t. Any ideas?
You could use std::search with a custom predicate.
#include <locale>
#include <iostream>
#include <algorithm>
using namespace std;
// templated version of my_equal so it could work with both char and wchar_t
template<typename charT>
struct my_equal {
my_equal( const std::locale& loc ) : loc_(loc) {}
bool operator()(charT ch1, charT ch2) {
return std::toupper(ch1, loc_) == std::toupper(ch2, loc_);
}
private:
const std::locale& loc_;
};
// find substring (case insensitive)
template<typename T>
int ci_find_substr( const T& str1, const T& str2, const std::locale& loc = std::locale() )
{
typename T::const_iterator it = std::search( str1.begin(), str1.end(),
str2.begin(), str2.end(), my_equal<typename T::value_type>(loc) );
if ( it != str1.end() ) return it - str1.begin();
else return -1; // not found
}
int main(int arc, char *argv[])
{
// string test
std::string str1 = "FIRST HELLO";
std::string str2 = "hello";
int f1 = ci_find_substr( str1, str2 );
// wstring test
std::wstring wstr1 = L"ОПЯТЬ ПРИВЕТ";
std::wstring wstr2 = L"привет";
int f2 = ci_find_substr( wstr1, wstr2 );
return 0;
}
The new C++11 style:
#include <algorithm>
#include <string>
#include <cctype>
/// Try to find in the Haystack the Needle - ignore case
bool findStringIC(const std::string & strHaystack, const std::string & strNeedle)
{
auto it = std::search(
strHaystack.begin(), strHaystack.end(),
strNeedle.begin(), strNeedle.end(),
[](unsigned char ch1, unsigned char ch2) { return std::toupper(ch1) == std::toupper(ch2); }
);
return (it != strHaystack.end() );
}
Explanation of the std::search can be found on cplusplus.com.
Why not use Boost.StringAlgo?
#include <boost/algorithm/string/find.hpp>
bool Foo()
{
//case insensitive find
std::string str("Hello");
boost::iterator_range<std::string::const_iterator> rng;
rng = boost::ifind_first(str, std::string("EL"));
return !rng.empty(); // a non-empty range means a match was found
}
Why not just convert both strings to lowercase before you call find()?
tolower
Notice:
Inefficient for long strings.
Beware of internationalization issues.
Since you're doing substring searches (std::string) and not element (character) searches, there's unfortunately no existing solution I'm aware of that's immediately accessible in the standard library to do this.
Nevertheless, it's easy enough to do: simply convert both strings to upper case (or both to lower case - I chose upper in this example).
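#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>
std::string upper_string(const std::string& str)
{
    std::string upper;
    // Go through unsigned char: passing a negative char to toupper is undefined behavior.
    std::transform(str.begin(), str.end(), std::back_inserter(upper),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return upper;
}
std::string::size_type find_str_ci(const std::string& str, const std::string& substr)
{
    return upper_string(str).find(upper_string(substr));
}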
This is not a fast solution (bordering into pessimization territory) but it's the only one I know of off-hand. It's also not that hard to implement your own case-insensitive substring finder if you are worried about efficiency.
"Additionally, I need to support std::wstring/wchar_t. Any ideas?"
tolower/toupper in locale will work on wide strings as well, so the solution above should be just as applicable (simply change std::string to std::wstring).
[Edit] An alternative, as pointed out, is to adapt your own case-insensitive string type from basic_string by specifying your own character traits. This works if you can accept all string searches, comparisons, etc. to be case-insensitive for a given string type.
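The character-traits alternative mentioned above could look roughly like the following. It is a minimal sketch for char only, using the classic approach of deriving from std::char_traits<char>; the names ci_char_traits and ci_string are hypothetical, and the comparison is limited to what std::toupper handles in the current C locale.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Character traits that compare characters without regard to case.
struct ci_char_traits : std::char_traits<char> {
    static char up(char c) {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    static bool eq(char a, char b) { return up(a) == up(b); }
    static bool lt(char a, char b) { return up(a) < up(b); }
    static int compare(const char* a, const char* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }
    static const char* find(const char* s, std::size_t n, char c) {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], c)) return s + i;
        return nullptr;
    }
};

// Every search and comparison on this string type ignores case.
using ci_string = std::basic_string<char, ci_char_traits>;
```

With this type, `ci_string("FIRST HELLO").find("hello")` performs a case-insensitive search. The trade-off is that ci_string is a different type from std::string, so converting between the two requires copying the characters.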
If you want “real” comparison according to Unicode and locale rules, use ICU’s Collator class.
It also makes sense to provide a Boost version. Note that this modifies the original strings.
#include <boost/algorithm/string.hpp>
std::string str1 = "hello world!!!";
std::string str2 = "HELLO";
boost::algorithm::to_lower(str1);
boost::algorithm::to_lower(str2);
if (str1.find(str2) != std::string::npos)
{
// str1 contains str2
}
Or use the Boost.Xpressive library:
#include <boost/xpressive/xpressive.hpp>
using namespace boost::xpressive;
....
std::string long_string( "very LonG string" );
std::string word("long");
smatch what;
sregex re = sregex::compile(word, boost::xpressive::icase);
// regex_search finds the word anywhere in the string; regex_match would require the whole string to match
if( regex_search( long_string, what, re ) )
{
cout << word << " found!" << endl;
}
In this example, make sure that the search word does not contain any regex special characters.
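One way to deal with the special-character caveat is to escape the search word before compiling it. The sketch below swaps in std::regex from the standard library rather than Boost.Xpressive, so that it is self-contained; escape_regex and contains_icase are hypothetical helper names, and the escaped set is the usual ECMAScript metacharacters.

```cpp
#include <regex>
#include <string>

// Prefixes every ECMAScript regex metacharacter with a backslash,
// so the result matches `literal` character for character.
std::string escape_regex(const std::string& literal)
{
    static const std::string special = R"(\^$.|?*+()[]{})";
    std::string out;
    for (const char c : literal) {
        if (special.find(c) != std::string::npos)
            out.push_back('\\');
        out.push_back(c);
    }
    return out;
}

// Case-insensitive "contains" built on an escaped pattern.
bool contains_icase(const std::string& haystack, const std::string& needle)
{
    const std::regex re(escape_regex(needle),
                        std::regex::ECMAScript | std::regex::icase);
    return std::regex_search(haystack, re);
}
```

Without the escaping step, a needle such as "." would match any character instead of a literal dot.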
#include <iostream>
using namespace std;
template <typename charT>
struct ichar {
operator charT() const { return toupper(x); }
charT x;
};
template <typename charT>
static basic_string<ichar<charT> > *istring(basic_string<charT> &s) { return (basic_string<ichar<charT> > *)&s; }
template <typename charT>
static ichar<charT> *istring(const charT *s) { return (ichar<charT> *)s; }
int main()
{
string s = "The STRING";
wstring ws = L"The WSTRING";
cout << istring(s)->find(istring("str")) << " " << istring(ws)->find(istring(L"wstr")) << endl;
}
A little bit dirty, but short & fast.
I love the answers from Kiril V. Lyadvinsky and CC. but my problem was a little more specific than just case-insensitivity; I needed a lazy Unicode-supported command-line argument parser that could eliminate false-positives/negatives when dealing with alphanumeric string searches that could have special characters in the base string used to format alphanum keywords I was searching against, e.g., Wolfjäger shouldn't match jäger but <jäger> should.
It's basically just Kiril/CC's answer with extra handling for alphanumeric exact-length matches.
/* Undefined behavior when a non-alpha-num substring parameter is used. */
bool find_alphanum_string_CI(const std::wstring& baseString, const std::wstring& subString)
{
/* Fail fast if the base string was smaller than what we're looking for */
if (subString.length() > baseString.length())
return false;
auto it = std::search(
baseString.begin(), baseString.end(), subString.begin(), subString.end(),
[](wchar_t ch1, wchar_t ch2)
{
return std::towupper(ch1) == std::towupper(ch2); // wide-character version, needs <cwctype>
}
);
if(it == baseString.end())
return false;
size_t match_start_offset = it - baseString.begin();
std::wstring match_start = baseString.substr(match_start_offset, std::wstring::npos);
/* Typical special characters and whitespace to split the substring up. */
size_t match_end_pos = match_start.find_first_of(L" ,<.>;:/?\'\"[{]}=+-_)(*&^%$#@!~`");
/* Pass fast if the remainder of the base string where
the match started is the same length as the substring. */
if (match_end_pos == std::wstring::npos && match_start.length() == subString.length())
return true;
std::wstring extracted_match = match_start.substr(0, match_end_pos);
return (extracted_match.length() == subString.length());
}
The Most Efficient Way
Simple and Fast.
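Performance is guaranteed to be linear, with an initialization cost of 2 * NEEDLE_LEN comparisons (glibc). Note that strcasestr is a non-standard extension provided by glibc and the BSDs, so it is not available on every platform.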
#include <cstring>
#include <string>
#include <iostream>
int main(void) {
std::string s1{"abc de fGH"};
std::string s2{"DE"};
auto pos = strcasestr(s1.c_str(), s2.c_str());
if(pos != nullptr)
std::cout << pos - s1.c_str() << std::endl;
return 0;
}
wxWidgets has a very rich string API in wxString.
It can be done like this (using the case-conversion approach):
int Contains(const wxString& SpecProgramName, const wxString& str)
{
wxString SpecProgramName_ = SpecProgramName.Upper();
wxString str_ = str.Upper();
int found = SpecProgramName_.Find(str_); // search the upper-cased copy, not the original
if (wxNOT_FOUND == found)
{
return 0;
}
return 1;
}

How to generate 'consecutive' c++ strings?

I would like to generate consecutive C++ strings like e.g. in cameras: IMG001, IMG002 etc. being able to indicate the prefix and the string length.
I have found a solution where I can generate random strings from concrete character set: link
But I cannot find the thing I want to achieve.
A possible solution:
#include <iostream>
#include <string>
#include <sstream>
#include <iomanip>
std::string make_string(const std::string& a_prefix,
size_t a_suffix,
size_t a_max_length)
{
std::ostringstream result;
result << a_prefix <<
std::setfill('0') <<
std::setw(a_max_length - a_prefix.length()) <<
a_suffix;
return result.str();
}
int main()
{
for (size_t i = 0; i < 100; i++)
{
std::cout << make_string("IMG", i, 6) << "\n";
}
return 0;
}
See online demo at http://ideone.com/HZWmtI.
Something like this would work
#include <string>
#include <iomanip>
#include <sstream>
std::string GetNextNumber( int &lastNum )
{
std::stringstream ss;
ss << "IMG";
ss << std::setfill('0') << std::setw(3) << lastNum++;
return ss.str();
}
int main()
{
int x = 1;
std::string s = GetNextNumber( x );
s = GetNextNumber( x );
return 0;
}
You can call GetNextNumber repeatedly with an int reference to generate new image numbers. You can always use sprintf but it won't be the c++ way :)
const int max_size = 7 + 1; // maximum size of the name plus one
char buf[max_size];
for (int i = 0 ; i < 1000; ++i) {
sprintf(buf, "IMG%.04d", i);
printf("The next name is %s\n", buf);
}
#include <cstdio>
#include <string>
std::string seq_gen(const char* prefix) {
    static int counter = 0;
    char buf[64]; // longer results are truncated to 63 characters
    // snprintf never writes past the buffer; the original wrote through an uninitialized pointer
    std::snprintf(buf, sizeof buf, "%s%03d", prefix, counter++);
    return buf;
}
This returns your prefix followed by a number zero-padded to three digits. If you want a longer string, provide a longer prefix and change the %03d in the code above to whatever padding width you want.
Well, the idea is rather simple. Just store the current number and increment it each time new string is generated. You can implement it to model an iterator to reduce the fluff in using it (you can then use standard algorithms with it). Using Boost.Iterator (it should work with any string type, too):
#include <boost/iterator/iterator_facade.hpp>
#include <sstream>
#include <iomanip>
// can't come up with a better name
template <typename StringT, typename OrdT>
struct ordinal_id_generator : boost::iterator_facade<
ordinal_id_generator<StringT, OrdT>, StringT,
boost::forward_traversal_tag, StringT
> {
ordinal_id_generator(
const StringT& prefix = StringT(),
typename StringT::size_type suffix_length = 5, OrdT initial = 0
) : prefix(prefix), suffix_length(suffix_length), ordinal(initial)
{}
private:
StringT prefix;
typename StringT::size_type suffix_length;
OrdT ordinal;
friend class boost::iterator_core_access;
void increment() {
++ordinal;
}
bool equal(const ordinal_id_generator& other) const {
return (
ordinal == other.ordinal
&& prefix == other.prefix
&& suffix_length == other.suffix_length
);
}
StringT dereference() const {
std::basic_ostringstream<typename StringT::value_type> ss;
ss << prefix << std::setfill('0')
<< std::setw(suffix_length) << ordinal;
return ss.str();
}
};
And example code:
#include <string>
#include <iostream>
#include <iterator>
#include <algorithm>
typedef ordinal_id_generator<std::string, unsigned> generator;
int main() {
std::ostream_iterator<std::string> out(std::cout, "\n");
std::copy_n(generator("IMG"), 5, out);
// can even behave as a range
std::copy(generator("foo", 1, 2), generator("foo", 1, 4), out);
return 0;
}
Take a look at the standard library's string streams. Have an integer that you increment, and insert into the string stream after every increment. To control the string length, there's the concept of fill characters, and the width() member function.
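As a small illustration of the fill character and width() member function mentioned above, here is a sketch; numbered_name is a hypothetical helper name.

```cpp
#include <sstream>
#include <string>

// Builds prefix + number, left-padding the number with zeros to `digits` characters.
std::string numbered_name(const std::string& prefix, unsigned number, std::streamsize digits)
{
    std::ostringstream out;
    out << prefix;
    out.fill('0');       // the fill character stays set on the stream
    out.width(digits);   // the width applies to the next formatted insertion only
    out << number;
    return out.str();
}
```

Note that the width is a minimum: a number with more digits than requested is written in full rather than truncated.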
You have many ways of doing that.
The generic one would be to, like the link that you showed, have an array of possible characters. Then after each iteration, you start from right-most character, increment it (that is, change it to the next one in the possible characters list) and if it overflowed, set it to the first one (index 0) and go the one on the left. This is exactly like incrementing a number in base, say 62.
In your specific example, you are better off with creating the string from another string and a number.
If you like *printf, you can write a string with "IMG%04d" and have the parameter go from 0 to whatever.
If you like stringstream, you can similarly do so.
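The generic "increment the right-most character and carry to the left" approach described above can be sketched as follows. The function name next_in_sequence is hypothetical, and the sketch assumes a non-empty alphabet and that every character of the string occurs in it.

```cpp
#include <cstddef>
#include <string>

// Advances `s` to its successor over `alphabet`, like adding 1 to a number
// written in base alphabet.size(). Returns false when the string wraps
// around to all-first-characters.
// Precondition: alphabet is non-empty and every character of s occurs in it.
bool next_in_sequence(std::string& s, const std::string& alphabet)
{
    for (std::size_t i = s.size(); i-- > 0; ) {
        const std::size_t k = alphabet.find(s[i]);
        if (k + 1 < alphabet.size()) {
            s[i] = alphabet[k + 1];   // no overflow at this position: done
            return true;
        }
        s[i] = alphabet[0];           // overflow: reset and carry to the left
    }
    return false;
}
```

Starting from "00" with the alphabet "0123456789", repeated calls yield "01", "02", and so on; a prefix such as "IMG" would be prepended separately.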
What exactly do you mean by consecutive strings?
Since you've mentioned that you're using C++ strings, try using the std::string::append method.
string str, str2;
str.append("A");
str.append(str2);
Lookup http://www.cplusplus.com/reference/string/string/append/ for more overloaded calls of the append function.
It's pseudocode, but you'll understand what I mean :D
int counter = 0, retval;
do
{
char filename[MAX_PATH];
sprintf(filename, "IMG00%d", counter++);
if(retval = CreateFile(...))
//ok, return
}while(!retval);
You have to keep a counter that is increased every time you get a new name. This counter has to be saved when your application ends and loaded when your application starts.
Could be something like this:
class NameGenerator
{
public:
NameGenerator()
: m_counter(0)
{
// Code to load the counter from a file
}
~NameGenerator()
{
// Code to save the counter to a file
}
std::string get_next_name()
{
// Combine your preferred prefix with your counter
// Increase the counter
// Return the string
}
private:
int m_counter;
};
NameGenerator my_name_generator;
Then use it like this:
std::string my_name = my_name_generator.get_next_name();
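One possible way to fill in that skeleton is shown below. It is only a sketch: the class name FileBackedNameGenerator, the plain-text state file, and the constructor parameters are all assumptions made for illustration, and error handling is limited to falling back to zero when the file cannot be read.

```cpp
#include <fstream>
#include <iomanip>
#include <sstream>
#include <string>
#include <utility>

class FileBackedNameGenerator {
public:
    FileBackedNameGenerator(std::string state_file, std::string prefix, int digits)
        : state_file_(std::move(state_file)), prefix_(std::move(prefix)),
          digits_(digits), counter_(0)
    {
        // Load the counter; start from 0 if the file is missing or unreadable.
        std::ifstream in(state_file_);
        if (!(in >> counter_)) counter_ = 0;
    }
    ~FileBackedNameGenerator()
    {
        // Save the counter so the next run continues the sequence.
        std::ofstream out(state_file_, std::ios::trunc);
        out << counter_;
    }
    // Copies would save the counter twice, so copying is disabled.
    FileBackedNameGenerator(const FileBackedNameGenerator&) = delete;
    FileBackedNameGenerator& operator=(const FileBackedNameGenerator&) = delete;

    std::string get_next_name()
    {
        std::ostringstream name;
        name << prefix_ << std::setfill('0') << std::setw(digits_) << counter_++;
        return name.str();
    }
private:
    std::string state_file_;
    std::string prefix_;
    int digits_;
    int counter_;
};
```

Saving in the destructor means the counter is lost if the program crashes; writing the file after every call would be safer but slower.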

How to get the tail of a std::string?

How to retrieve the tail of a std::string?
If wishes could come true, it would work like that:
string tailString = sourceString.right(6);
But this seems to be too easy, and doesn't work...
Any nice solution available?
Optional question: How to do it with the Boost string algorithm library?
ADDED:
The method should be safe even if the original string is shorter than 6 characters.
There is one caveat to be aware of: if substr is called with a position past the end of the string (greater than its size), an out_of_range exception is thrown.
Therefore:
std::string tail(std::string const& source, size_t const length) {
if (length >= source.size()) { return source; }
return source.substr(source.size() - length);
} // tail
You can use it as:
std::string t = tail(source, 6);
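To see the caveat in action, the following sketch checks which start positions make substr throw; substr_throws is a hypothetical helper used only for this demonstration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns true when s.substr(pos) throws std::out_of_range.
bool substr_throws(const std::string& s, std::size_t pos)
{
    try {
        (void)s.substr(pos);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

A position equal to size() is still valid and yields an empty string; only positions greater than size() throw. This is also why the naive source.size() - 6 is dangerous for short strings: the unsigned subtraction wraps around to a huge position.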
Using the substr() method and the size() of the string, simply get the last part of it:
string tail = source.substr(source.size() - 6);
For handling case of a string smaller than the tail size see Benoit's answer (and upvote it, I don't see why I get 7 upvotes while Benoit provides a more complete answer!)
You could do:
std::string tailString = sourceString.substr((sourceString.length() >= 6 ? sourceString.length()-6 : 0), std::string::npos);
Note that npos is the default argument and may be omitted. If the string is shorter than 6 characters, this expression extracts the whole string.
This should do it:
string str("This is a test");
string sub = str.substr(std::max<int>(str.size()-6,0), str.size());
or even shorter, since substr defaults its second parameter to the end of the string:
string str("This is a test");
string sub = str.substr(std::max<int>(str.size()-6,0));
You can use iterators to do this:
#include <iostream>
#include <string>
using namespace std;
int main ()
{
const char *line = "short line for testing";
// 1 - start iterator
// 2 - end iterator
string temp(line);
if (temp.length() >= 8) { // probably want at least one or two chars
// otherwise exception is thrown
int cut_len = temp.length()-6;
string cut (temp.begin()+cut_len,temp.end());
cout << "cut is: " << cut << endl;
} else {
cout << "Nothing to cut!" << endl;
}
return 0;
}
Output:
cut is: esting
Since you also asked for a solution using the boost library:
#include "boost/algorithm/string/find.hpp"
std::string tail(std::string const& source, size_t const length)
{
boost::iterator_range<std::string::const_iterator> tailIt = boost::algorithm::find_tail(source, length);
return std::string(tailIt.begin(), tailIt.end());
}
Try the substr method.
I think using iterators is the C++ way.
Something like this:
#include <string>
#include <iostream>
#include <algorithm>
#include <iterator>
using namespace std;
std::string tail(const std::string& str, size_t length){
string s_tail;
if(length < str.size()){
std::reverse_copy(str.rbegin(), str.rbegin() + length, std::back_inserter(s_tail));
}
return s_tail;
}
int main(int argc, char* argv[]) {
std::string s("mystring");
std::string s_tail = tail(s, 6);
cout << s_tail << endl;
s_tail = tail(s, 10);
cout << s_tail << endl;
return 0;
}
Try the following:
std::string tail(&source[(source.length() > 6) ? (source.length() - 6) : 0]);
string tail = source.substr(source.size() - min<size_t>(6, source.size()));
