find if string starts with sub string using std::equal - c++

Could you please point out what I am doing wrong here?
auto is_start_with = [](std::string const& whole_string, std::string const& starting_substring)->bool{
if (starting_substring.size() > whole_string.size()) return false;
return std::equal(begin(starting_substring), end(starting_substring), begin(whole_string));
};
It always returns true.
I know there are many other solutions, but I want to know what the error is here.
EDIT:
Debugging!
P.S. I tried it in another main file, entering the strings directly, and it worked!
Edit 2:
I deleted the two tolower transforms before the comparison and it worked!
std::transform(std::begin(fd_name), std::end(fd_name), std::begin(fd_name), ::tolower);
std::transform(std::begin(type_id), std::end(type_id), std::begin(type_id_lower), ::tolower);

I would not use such long identifiers as whole_string or starting_substring. It is clear enough from the parameter declaration that the lambda deals with strings. Names that are too long make the code less readable.
There is also no point in using the general functions std::begin and std::end; the lambda is written specifically for strings.
You could also use a single return statement. For example:
auto is_start_with = []( std::string const &source, std::string const &target )
{
return !( source.size() < target.size() ) &&
std::equal( target.begin(), target.end(), source.begin() );
};
Or even like
auto is_start_with = []( std::string const &source, std::string const &target )
{
return ( not ( source.size() < target.size() ) ) &&
std::equal( target.begin(), target.end(), source.begin() );
};
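One possible explanation for the "always true" result is worth spelling out, although it is only a guess about code the question does not show. If type_id_lower was still empty when std::transform wrote to std::begin(type_id_lower), the string would not grow (and the write is undefined behaviour), so the prefix passed to the lambda may have been empty, and std::equal over an empty range returns true. The sketch below restates the lambda as a plain function and adds a hypothetical helper, to_lower_copy (not from the question), whose destination always has the right size.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Prefix test from the answer above, written as a plain function.
bool is_start_with(std::string const& source, std::string const& target)
{
    return !(source.size() < target.size()) &&
           std::equal(target.begin(), target.end(), source.begin());
}

// Hypothetical helper: returns a lower-cased copy of s. The transform writes
// back into the copy itself, so the destination is never too small.
std::string to_lower_copy(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}
```

With these definitions an empty target matches any source, which is consistent with the symptom described in the question.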

Related

How can I concisely find all digits in a string without using a loop?

I want to get all digits in a std::string but without using a loop (myself; what the code I'm calling uses, I don't mind). An alternative view of the request is: remove all non-digits from the string, leaving only the digits. I know that I can find all digits in a string using code like this:
std::string get_digits(std::string input) {
std::string::size_type next_digit(0u);
for (std::string::size_type pos(0u);
input.npos != (pos = input.find_first_of("0123456789", pos)); // resume the search at pos
++pos) {
input[next_digit++] = input[pos];
}
input.resize(next_digit);
return input;
}
However, this function uses a loop. std::string doesn't provide a function find_all() or something! Ideally, the string is manipulated in-place (the code above moves it but it is easily changed to take a reference).
When there are multiple alternatives, I promise to post profiling results of how well the different approaches work on some lengthy text.
One way would be to use std::copy_if (or std::remove_if):
std::string get_digits(std::string input) {
std::string result;
std::copy_if(
input.begin(),
input.end(),
std::back_inserter(result),
[](char c) { return '0' <= c && c <= '9'; });
return result;
}
Obviously this uses a loop internally, but you said you don't care about that...
Edit: With std::remove_if:
std::string get_digits_remove(std::string input) {
auto itErase = std::remove_if(
input.begin(),
input.end(),
[](char c) { return !('0' <= c && c <= '9'); });
input.erase(itErase, input.end());
return input;
}
Although I primarily had hoped for 5 quick answers (which wasn't achieved, sigh) the answers and comments led to some interesting approaches I hadn't thought of myself. My personal expectation had been that the answers effectively would result in:
If you want to be fast, use
input.erase(std::remove_if(input.begin(), input.end(),
[](unsigned char c){ return !std::isdigit(c); }),
input.end());
If you want to be concise, use
text = std::regex_replace(text, std::regex(R"(\D)"), "");
Instead, there were a number of approaches I hadn't even considered:
Use a recursive function!
Use std::partition() which seems to require extra work (retain the characters which will be thrown out) and changes the order.
Use std::stable_partition() which seems to require even more work but doesn't change the order.
Use std::sort() and extract the substring with the relevant characters, although I don't know how to make that one retain the original sequence of characters. Just using a stable version doesn't quite do it.
Putting the different approaches together and using a number of variations on how to classify the characters led to a total of 17 versions of roughly the same operation (the code is on github). Most of the versions use std::remove_if() and std::string::erase() but differ in the classification of digits.
1. remove_if() with [](char c){ return d.find(c) == d.npos; }
2. remove_if() with [](char c){ return std::find(d.begin(), d.end(), c) == d.end(); }
3. remove_if() with [](char c){ return !std::binary_search(d.begin(), d.end(), c); }
4. remove_if() with [](char c){ return !('0' <= c && c <= '9'); }
5. remove_if() with [](unsigned char c){ return !std::isdigit(c); } (the char is passed as unsigned char to avoid undefined behavior in case c is a char with a negative value)
6. remove_if() with std::not1(std::ptr_fun(static_cast<int(*)(int)>(&std::isdigit))) (the cast is necessary to determine the correct overload: std::isdigit() happens to be overloaded).
7. remove_if() with [&](char c){ return !hash.count(c); }
8. remove_if() with [&](char c){ return filter[c]; } (the code initializing the table actually uses a loop)
9. remove_if() with [&](char c){ return !std::isdigit(c, locale); }
10. remove_if() with [&](char c){ return !ctype.is(std::ctype_base::digit, c); }
11. str.erase(std::partition(str.begin(), str.end(), [](unsigned char c){ return std::isdigit(c); }), str.end())
12. str.erase(std::stable_partition(str.begin(), str.end(), [](unsigned char c){ return std::isdigit(c); }), str.end())
13. the "sort approach" described in one of the answers
14. the copy_if() approach described in one of the answers
15. the recursive approach described in one of the answers
16. text = std::regex_replace(text, std::regex(R"(\D)"), ""); (I didn't manage to get this to work on icc)
17. like 16 but with an already built regular expression
I have run the benchmark on a MacOS notebook. Since results like this are reasonably easy to graph with Google Charts, here is a graph of the results (although with the versions using regexps removed as these would cause the graph to scale such that the interesting bit isn't really visible). The results of the benchmarks in form of a table:
    test                            clang      gcc      icc
 1  use_remove_if_str_find          22525    26846    24815
 2  use_remove_if_find              31787    23498    25379
 3  use_remove_if_binary_search     26709    27507    37016
 4  use_remove_if_compare            2375     2263     1847
 5  use_remove_if_ctype              1956     2209     2218
 6  use_remove_if_ctype_ptr_fun      1895     2304     2236
 7  use_remove_if_hash              79775    60554    81363
 8  use_remove_if_table              1967     2319     2769
 9  use_remove_if_locale_naive      17884    61096    21301
10  use_remove_if_locale             2801     5184     2776
11  use_partition                    1987     2260     2183
12  use_stable_partition             7134     4085    13094
13  use_sort                        59906   100581    67072
14  use_copy_if                      3615     2845     3654
15  use_recursive                    2524     2482     2560
16  regex_build                    758951   531641        -
17  regex_prebuild                 775450   519263        -
I would start with a nice primitive function that composes the std algorithms you want to use:
template<class Container, class Test>
void erase_remove_if( Container&& c, Test&& test ) {
using std::begin; using std::end;
auto it = std::remove_if( begin(c), end(c), std::forward<Test>(test) );
c.erase( it, end(c) );
}
then we write save digits:
std::string save_digits( std::string s ) {
erase_remove_if( s,
[](char c){
if (c > '9') return true;
return c < '0';
}
);
return s;
}
You can do this in-place with std::partition:
std::string get_digits(std::string& input)
{
auto split =
std::partition( std::begin(input), std::end(input), [](char c){return ::isdigit(c);} );
size_t len = std::distance( std::begin(input), split );
input.resize( len );
return input;
}
std::partition does not guarantee order, so if order matters, use std::stable_partition
// terrible no-loop solution
void getDigs(const char* inp, char* dig)
{
if (!*inp)
return;
if (*inp>='0' && *inp<='9')
{
*dig=*inp;
dig++;
*dig=0;
}
getDigs(inp+1,dig);
}
Maybe the simple answer suffices?
std::string only_the_digits(std::string s)
{
s.erase(std::remove_if(s.begin(), s.end(),
[](char c) { return !::isdigit(c); }), s.end());
return s;
}
The downside of this approach is that it unconditionally creates a copy of the input data. If there are lots of digits, then that's OK, since we're reusing that object. Alternatively, you can make this function just modify the string in-place (void strip_non_digits(std::string &).)
But if there are only few digits and you want to leave the input untouched, then you may prefer to create a new (small) output object and not copy the input. This can be done with a referential view of the input string, e.g. as provided by the Fundamentals TS, and using copy_if:
std::string only_the_digits(std::experimental::string_view sv)
{
std::string result;
std::copy_if(sv.begin(), sv.end(), std::back_inserter(result), ::isdigit);
return result;
}
No-loop solution in 4 steps (but with error checking, more than 4 statements):
1) sort the string, using a suitable sort (ascending order)
... now all digits will be together, concatenated
2) use std::string::find_first_of() to find the index of the first digit
(be sure to check that a digit was found)
3) use std::string::find_last_of() to find the index of the last digit
(be sure to check that a digit was found)
4) use std::string::substr() and the 2 previous indexes to extract the digits
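The four steps above can be sketched roughly as follows. The function name sorted_digits is made up for this illustration, and note that sorting does not preserve the original order of the digits.

```cpp
#include <algorithm>
#include <string>

// Sketch of the sort-based steps; the digits come out in ascending order,
// not in the order in which they appeared in the input.
std::string sorted_digits(std::string input)
{
    static const char digits[] = "0123456789";
    std::sort(input.begin(), input.end());                        // step 1
    std::string::size_type first = input.find_first_of(digits);   // step 2
    if (first == std::string::npos)
        return std::string();                                     // no digit found
    std::string::size_type last = input.find_last_of(digits);     // step 3
    return input.substr(first, last - first + 1);                 // step 4
}
```

Because the digit characters have consecutive values, they end up adjacent after the sort, so a single substr() call covers all of them.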
This is about as succinct as I can get it I think.
std::string get_digits(std::string input)
{
input.erase(std::stable_partition(
std::begin(input),
std::end(input),
::isdigit),
std::end(input));
return input;
}
Features:
passes sink argument by value to take advantage of copy elision in c++11
preserves order of digits.
No user code - uses only peer-reviewed stl functions. Chance of bugs - zero.
This would be the stl-style iterator-based approach:
template<class InIter, class OutIter>
OutIter collect_digits(InIter first, InIter last, OutIter first_out)
{
return std::copy_if(first, last, first_out, ::isdigit);
}
This has a number of advantages:
input can be any iterable range of chars, not just strings
can be chained by virtue of returning the output iterator
allows destination container/iterator (including an ostream_iterator)
with a little love, it could be made to handle unicode chars etc
fun example:
#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <iterator>
template<class InIter, class OutIter>
OutIter collect_digits(InIter first, InIter last, OutIter first_out)
{
return std::copy_if(first, last, first_out, ::isdigit);
}
using namespace std;
int main()
{
char chunk1[] = "abc123bca";
string chunk2 { "def456fed" };
vector<char> chunk3 = { 'g', 'h', 'i', '7', '8', '9', 'i', 'h', 'g' };
string result;
auto pos = collect_digits(begin(chunk1), end(chunk1), back_inserter(result));
pos = collect_digits(begin(chunk2), end(chunk2), pos);
collect_digits(begin(chunk3), end(chunk3), pos);
cout << "first collect: " << result << endl;
cout << "second collect: ";
collect_digits(begin(chunk3),
end(chunk3),
collect_digits(begin(chunk2),
end(chunk2),
collect_digits(begin(chunk1),
end(chunk1),
ostream_iterator<char>(cout))));
cout << endl;
return 0;
}
I use this one-liner macro as long as #include <regex> comes before it or you otherwise include that:
#define DIGITS_IN_STRING(a) std::regex_replace(a, std::regex(R"([\D])"), "")

How to make sure user enters allowed enum

I have to write a program with an enum State, which holds the 50 two-letter state abbreviations (NY, FL, etc.). I need to make a program that asks for the user's info, and the user needs to type in the 2 letters corresponding to the state. How can I check that their input is valid, i.e. matches a 2-letter state defined in enum State{AL,...,WY}? I suppose I could make one huge if statement checking if input == "AL" || ... || input == "WY" {do stuff} else{ error input does not match state }, but having to do that for all 50 states would get a bit ridiculous. Is there an easier way to do this?
Also if Enum State is defined as {AL, NY, FL}, how could I cast a user input, which would be a string, into a State? If I changed the States to {"AL", "NY", "FL"} would that be easier or is there another way to do it?
Unfortunately, C++ does not provide a portable way to convert an enum to a string and vice versa. A possible solution is to populate a std::map<std::string,State> (or a hash map) and do a lookup on conversion. You can either populate such a map manually or create a simple script that generates a function in a .cpp file to populate the map, and call that script during the build process. Your script can also generate if/else statements instead of populating a map.
Another, more difficult but more flexible and stable, solution is to use a compiler such as LLVM and write a plugin that generates such a function from the syntax tree produced by the compiler.
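The std::map lookup mentioned above might look like the following minimal sketch. The enum is shortened to three states for illustration, and the NOSTATE sentinel and the name parse_state are assumptions made for this example, not part of the question.

```cpp
#include <map>
#include <string>

// Shortened enum for illustration; NOSTATE is an assumed "invalid" marker.
enum State { AL, NY, FL, NOSTATE };

// Returns the matching State, or NOSTATE when the input is not a known code.
State parse_state(std::string const& input)
{
    static const std::map<std::string, State> table = {
        { "AL", AL }, { "NY", NY }, { "FL", FL }
    };
    std::map<std::string, State>::const_iterator it = table.find(input);
    return it == table.end() ? NOSTATE : it->second;
}
```

The same lookup both validates the input and performs the conversion, so no separate chain of comparisons is needed. The lookup is case-sensitive as written.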
The simplest method is to use an STL std::map, but for academic exercises that may not be permitted (for example it may be required to use only techniques covered in the course material).
Unless explicitly initialised, enumerations are integer numbered sequentially starting from zero. Given that, you can scan a lookup-table of strings, and cast the matching index to an enum. For example:
enum eUSstate
{
AL, AK, AZ, ..., NOSTATE
} ;
eUSstate string_to_enum( std::string inp )
{
static const int STATES = 50 ;
std::string lookup[STATES] = { "AL", "AK", "AZ" ... } ;
int i = 0 ;
for( i = 0; i < STATES && lookup[i] != inp; i++ )
{
// do nothing
}
return static_cast<eUSstate>(i) ;
}
If perhaps you don't want to rely on a brute-force cast and maintaining a look-up table in the same order as the enumerations, then a lookup table having both the string and the matching enum may be used.
eUSstate string_to_enum( std::string inp )
{
static const int STATES = 50 ;
struct
{
std::string state_string ;
eUSstate state_enum ;
} lookup[STATES] { {"AL", AL}, {"AK", AK}, {"AZ", AZ} ... } ;
eUSstate ret = NOSTATE ;
for( int i = 0; ret == NOSTATE && i < STATES; i++ )
{
if( lookup[i].state_string == inp )
{
ret = lookup[i].state_enum ;
}
}
return ret ;
}
The look-up can be optimised by taking advantage of alphabetical ordering and performing a binary search, but for 50 states it is hardly worth it.
What you need is a table. Because the enums are linear,
a simple table of strings would be sufficient:
char const* const stateNames[] =
{
// In the same order as in the enum.
"NY",
"FL",
// ...
};
Then:
char const* const* entry
= std::find( std::begin( stateNames ), std::end( stateNames ), userInput );
if ( entry == std::end( stateNames ) ) {
// Illegal input...
} else {
State value = static_cast<State>( entry - std::begin( stateNames ) );
// ...
}
Alternatively, you can have an array of:
struct StateMapping
{
State enumValue;
char const* name;
struct OrderByName
{
bool operator()( StateMapping const& lhs, StateMapping const& rhs ) const
{
return std::strcmp( lhs.name, rhs. name ) < 0;
}
bool operator()( StateMapping const& lhs, std::string const& rhs ) const
{
return lhs.name < rhs;
}
bool operator()( std::string const& lhs, StateMapping const& rhs ) const
{
return lhs < rhs.name;
}
};
};
StateMapping const states[] =
{
{ NY, "NY" },
// ...
};
sorted by the key, and use std::lower_bound:
StateMapping::OrderByName cmp;
StateMapping const* entry =
std::lower_bound( std::begin( states ), std::end( states ), userInput, cmp );
if ( entry == std::end( states ) || cmp( userInput, *entry ) ) {
// Illegal input...
} else {
State value = entry->enumValue;
// ...
}
The latter is probably slightly faster, but for only fifty
entries, I doubt you'll notice the difference.
And of course, you don't write this code manually; you generate
it with a simple script. (In the past, I had code which would
parse the C++ source for the enum definitions, and generate the
mapping functionality from them. It's simpler than it sounds,
since you can ignore large chunks of the C++ code, other than
for keeping track of the various nestings.)
The solution is simple, but only for 2 characters in the string (as in your case):
#include <stdio.h>
#include <stdint.h>
enum TEnum
{
AL = 'LA',
NY = 'YN',
FL = 'LF'
};
int _tmain(int argc, _TCHAR* argv[])
{
char* input = "NY";
//char* input = "AL";
//char* input = "FL";
switch( *(uint16_t*)input )
{
case AL:
printf("input AL");
break;
case NY:
printf("input NY");
break;
case FL:
printf("input FL");
break;
}
return 0;
}
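In the example above I used an enumeration with two-character codes (this is legal, although the value of a multi-character constant is implementation-defined) and passed the input string to the switch statement. I tested it and it works. Note the reversed character order in the enumeration values.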
Ciao

Is there a safe alternative to std::equal?

std::equal() is unsafe because the function cannot know whether it will overrun the length of the second container to be compared. That is:
std::vector< int > v( 100 );
std::vector< int > w( 10 );
bool same = std::equal( v.begin(), v.end(), w.begin() );
...will result in a buffer overrun for w.
Naturally we can test for these things (v.size() == w.size()), but compilers like Visual Studio 2010 still report the function itself as unsafe. And indeed it is unsafe in some fundamental sense: a team of programmers of varying levels of experience will eventually forget to compare sizes.
A safe alternative is easy to implement.
template< typename Iter1, typename Iter2 >
bool equal_safe( Iter1 begin1, Iter1 end1, Iter2 begin2, Iter2 end2 )
{
while( begin1 != end1 && begin2 != end2 )
{
if( *begin1 != *begin2 )
{
return false;
}
++begin1;
++begin2;
}
return begin1 == end1 && begin2 == end2;
}
But is there a safe alternative in the standard library?
C++14 adds a version of std::equal to the standard library that takes two pairs of iterators, similar to your equal_safe. The same goes for std::mismatch and std::is_permutation.
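As a small sketch of that C++14 overload (the wrapper name same_contents is invented for this example):

```cpp
#include <algorithm>
#include <vector>

// Requires C++14. The two-range overload of std::equal returns false when
// the ranges differ in length instead of reading past the shorter one.
bool same_contents(std::vector<int> const& v, std::vector<int> const& w)
{
    return std::equal(v.begin(), v.end(), w.begin(), w.end());
}
```

Called with the 100-element and 10-element vectors from the question, this returns false without touching memory outside w.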
vector has an operator== that first checks the size. In your example, just use the condition v==w.
I have wanted such a feature myself. I have not been able to find any facilities in the standard library.
If you are willing to use Boost, Boost.Range has equal, which I think is what you are looking for: http://www.boost.org/doc/libs/1_53_0/libs/range/doc/html/range/reference/algorithms/non_mutating/equal.html
I had the same problem and solved it by checking the sizes of the vectors before calling std::equal.
std::vector< int > v( 100 );
std::vector< int > w( 10 );
bool same = (v.size() == w.size()) && std::equal( v.begin(), v.end(), w.begin() );
You can also use std::lexicographical_compare twice to determine if either sequence is less than the other.
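A minimal sketch of that idea, assuming element types with an ordinary operator< such as int (the name equal_by_ordering is made up here):

```cpp
#include <algorithm>
#include <vector>

// For ints, two sequences are equal when neither compares less than the
// other. std::lexicographical_compare takes both end iterators, so it never
// reads past the shorter range.
bool equal_by_ordering(std::vector<int> const& v, std::vector<int> const& w)
{
    return !std::lexicographical_compare(v.begin(), v.end(), w.begin(), w.end()) &&
           !std::lexicographical_compare(w.begin(), w.end(), v.begin(), v.end());
}
```

This works before C++14 but may traverse the sequences twice, so it is more of a workaround than a replacement for a bounds-checked std::equal.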

C++ storing functions and operators in a structure

How can I improve the data structure used to store functions in an arithmetic parser that converts from infix to postfix notation?
At the moment I am using an array of char arrays:
const char *funct[] = { "sin", "cos", "tan"... };
char text[] = "tan";
This implementation is a little confusing and leads to comparisons like the following when we test whether a string is a function name:
if ( ( strcmp( funct[0], text ) == 0 ) || ( strcmp( funct[1], text ) == 0 ) || ( strcmp( funct[2], text ) == 0 ) )
{
... do something
}
(or to a for-loop version).
If there are a lot of functions (and a lot of comparisons), the index referencing leads to errors and is not clear. The indices also have to be changed whenever we remove or add a function.
How can I improve such a structure so that it is easy to read, easy to maintain and easy to scale up?
I was thinking about enum
typedef enum
{
Fsin=0,
Fcos,
Ftan
} TFunctions;
which results in
if ( ( strcmp( funct[Fsin], text ) == 0 ) || ( strcmp( funct[Fcos], text ) == 0 ) || ( strcmp( funct[Ftan], text ) == 0 ) )
{
...
}
but there may be a better solution...
You can use std::map.
enum functions
{
sin,
cos,
tan
};
std::map<std::string, unsigned char> func_map;
func_map["sin"] = sin;
func_map["cos"] = cos;
func_map["tan"] = tan;
// then:
std::string text = "cos";
std::map<std::string, unsigned char>::iterator it;
it = func_map.find(text);
if(it != func_map.end())
{
// ELEMENT FOUND
unsigned char func_id = it->second;
}
else
{
// NOT FOUND
}
For the fastest code you may use some kind of map, as follows:
typedef std::map<std::string, func_t> func_map;
func_map fm;
fm["sin"] = sin_func(); // get value of this entry from somewhere
fm["cos"] = cos_func(); // for example sin_func or cos_func
auto i = fm.find( "sin" );
if( i != fm.end() ) {
func_t f = i->second; // value found, we may use it.
}
Also, if there really are a lot of items, you may use std::unordered_map instead of std::map.
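A rough sketch of the std::unordered_map variant follows. It assumes that func_t is a pointer to a function taking and returning double, which the answer above does not specify; the wrapper functions and find_function are hypothetical names.

```cpp
#include <cmath>
#include <string>
#include <unordered_map>

// Assumed signature for a unary math function (not given in the answer).
typedef double (*func_t)(double);

// Small wrappers so that the table stores ordinary function pointers.
static double sin_wrapper(double x) { return std::sin(x); }
static double cos_wrapper(double x) { return std::cos(x); }
static double tan_wrapper(double x) { return std::tan(x); }

// Looks up a function by name; returns a null pointer for unknown names.
func_t find_function(std::string const& name)
{
    static const std::unordered_map<std::string, func_t> table = {
        { "sin", sin_wrapper }, { "cos", cos_wrapper }, { "tan", tan_wrapper }
    };
    std::unordered_map<std::string, func_t>::const_iterator it = table.find(name);
    return it == table.end() ? nullptr : it->second;
}
```

Adding or removing a function then means editing one table entry, with no indices to keep in sync.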

how to parse POST body / GET arguments?

So I need to parse a string such as login=julius&password=zgadnij&otherArg=Value with N arguments, each of which has a value. Such strings appear in GET arguments and in POST request bodies. How can I create a parser for such strings using Boost?
split on &
split the resulting parts on =
URL-decode both (!) the name and the value part
No regex needed.
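The three steps above can be sketched with the standard library alone. This is only an illustration: the names url_decode and parse_query are invented here, the decoder treats '+' as a space (the usual form-encoding convention), and malformed % escapes are copied through unchanged.

```cpp
#include <cctype>
#include <cstdlib>
#include <map>
#include <string>

// Decodes %XX escapes and turns '+' into a space.
std::string url_decode(std::string const& in)
{
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '+') {
            out += ' ';
        } else if (in[i] == '%' && i + 2 < in.size() &&
                   std::isxdigit(static_cast<unsigned char>(in[i + 1])) &&
                   std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
            const char hex[3] = { in[i + 1], in[i + 2], '\0' };
            out += static_cast<char>(std::strtol(hex, nullptr, 16));
            i += 2;  // skip the two hex digits just consumed
        } else {
            out += in[i];
        }
    }
    return out;
}

// Splits on '&', then on the first '=', and decodes both name and value.
std::map<std::string, std::string> parse_query(std::string const& query)
{
    std::map<std::string, std::string> args;
    std::string::size_type start = 0;
    while (start <= query.size()) {
        std::string::size_type amp = query.find('&', start);
        if (amp == std::string::npos) amp = query.size();
        std::string part = query.substr(start, amp - start);
        if (!part.empty()) {
            std::string::size_type eq = part.find('=');
            std::string name = part.substr(0, eq);
            std::string value =
                eq == std::string::npos ? std::string() : part.substr(eq + 1);
            args[url_decode(name)] = url_decode(value);
        }
        start = amp + 1;
    }
    return args;
}
```

A std::map keeps only the last value for a repeated name; a multimap or a vector of pairs would be needed if duplicates matter.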
In this question's case, as Tomalak mentioned, a regular expression may be a
little overkill.
If your real input is more complex and a regular expression is needed, does
the following code illustrate the usage?
#include <iostream>
#include <string>
#include <vector>
#include <boost/regex.hpp>
int main() {
using namespace std;
using namespace boost;
string s = "login=julius&password=zgadnij&otherArg=Value";
regex re_amp("&"), re_eq("=");
typedef sregex_token_iterator sti;
typedef vector< string > vs;
typedef vs::iterator vsi;
sti i( s.begin(), s.end(), re_amp, -1 ), sti_end;
vs config( i, sti_end ); // split on &
for ( vsi i = config.begin(), e = config.end(); i != e; ++ i ) {
// split on =
vs setting( sti( i->begin(), i->end(), re_eq, -1 ), sti_end );
for ( vsi i2 = setting.begin(), e2 = setting.end(); i2 != e2; ++ i2 ) {
cout<< *i2 <<endl;
}
}
}
Hope this helps