how to parse POST body / GET arguments? - c++

So I need to parse a string such as login=julius&password=zgadnij&otherArg=Value with N arguments, each of which has a value. You find such strings in GET arguments and in POST request bodies. How can I create a parser for such strings using Boost?

split on &
split the resulting parts on =
URL-decode both (!) the name and the value part
No regex needed.
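The steps above can be sketched with the standard library alone. This is a minimal sketch, not a complete form parser: the names url_decode and parse_query are made up for illustration, '+' is treated as a space (form encoding), malformed % escapes are copied through unchanged, and each part is split on its first = only.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: decode %XX escapes and '+' (a space in form encoding).
// Malformed escapes are copied through unchanged.
std::string url_decode(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '+') {
            out += ' ';
        } else if (in[i] == '%' && i + 2 < in.size()
                   && std::isxdigit(static_cast<unsigned char>(in[i + 1]))
                   && std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
            const std::string hex = in.substr(i + 1, 2);
            out += static_cast<char>(std::strtol(hex.c_str(), nullptr, 16));
            i += 2;  // skip the two hex digits just consumed
        } else {
            out += in[i];
        }
    }
    return out;
}

// Hypothetical helper: split on '&', split each part on the first '=',
// then URL-decode both the name and the value.
std::vector<std::pair<std::string, std::string>> parse_query(const std::string& query)
{
    std::vector<std::pair<std::string, std::string>> args;
    std::size_t start = 0;
    while (start <= query.size()) {
        std::size_t amp = query.find('&', start);
        if (amp == std::string::npos) amp = query.size();
        const std::string part = query.substr(start, amp - start);
        if (!part.empty()) {
            const std::size_t eq = part.find('=');
            const std::string name = part.substr(0, eq);
            const std::string value =
                (eq == std::string::npos) ? std::string() : part.substr(eq + 1);
            args.emplace_back(url_decode(name), url_decode(value));
        }
        start = amp + 1;
    }
    return args;
}
```

A vector of pairs keeps the original order and allows repeated names; a std::map would work as well if each name is known to appear only once.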

In this question's case, as Tomalak mentioned, a regular expression may be a
little overkill.
If your real input is more complex and a regular expression is needed, does
the following code illustrate the usage?
#include <boost/regex.hpp>
#include <iostream>
#include <string>
#include <vector>
int main() {
using namespace std;
using namespace boost;
string s = "login=julius&password=zgadnij&otherArg=Value";
regex re_amp("&"), re_eq("=");
typedef sregex_token_iterator sti;
typedef vector< string > vs;
typedef vs::iterator vsi;
sti i( s.begin(), s.end(), re_amp, -1 ), sti_end;
vs config( i, sti_end ); // split on &
for ( vsi i = config.begin(), e = config.end(); i != e; ++ i ) {
// split on =
vs setting( sti( i->begin(), i->end(), re_eq, -1 ), sti_end );
for ( vsi i2 = setting.begin(), e2 = setting.end(); i2 != e2; ++ i2 ) {
cout<< *i2 <<endl;
}
}
}
Hope this helps

Related

find if string starts with sub string using std::equal

May you please point me to what is the wrong thing I am doing here?
auto is_start_with = [](std::string const& whole_string, std::string const& starting_substring)->bool{
if (starting_substring.size() > whole_string.size()) return false;
return std::equal(begin(starting_substring), end(starting_substring), begin(whole_string));
};
It always returns true.
I know there are many other solutions, but I want to know what the error is here.
EDIT :
Debugging!
P.S. I tried it in another main file, entering the strings directly, and it worked!!
Edit 2:
I deleted two to-lower transforms before the comparison and it worked!
std::transform(std::begin(fd_name), std::end(fd_name), std::begin(fd_name), ::tolower);
std::transform(std::begin(type_id), std::end(type_id), std::begin(type_id_lower), ::tolower);
I would not use such long identifiers as whole_string or starting_substring. It is clear enough from the parameter declarations that the lambda deals with strings. Overly long names make the code less readable.
And there is no point in using the general functions std::begin and std::end; the lambda is written specifically for strings.
Also, you could use only one return statement. For example:
auto is_start_with = []( std::string const &source, std::string const &target )
{
return !( source.size() < target.size() ) &&
std::equal( target.begin(), target.end(), source.begin() );
};
Or even like
auto is_start_with = []( std::string const &source, std::string const &target )
{
return ( not ( source.size() < target.size() ) ) &&
std::equal( target.begin(), target.end(), source.begin() );
};
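Regarding the two transforms from Edit 2: the second one writes its result through std::begin(type_id_lower). If type_id_lower is shorter than type_id at that point (this is a guess, since the surrounding code is not shown), that write is undefined behaviour and could explain the odd result. If the transforms were only there to make the comparison case-insensitive, one option is to fold the case inside the comparison itself. A minimal sketch, with the made-up name starts_with_icase, handling ASCII case only:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: true if `source` begins with `target`, ignoring ASCII case.
// No lowercase copies are made, so there is no destination buffer to size.
bool starts_with_icase(const std::string& source, const std::string& target)
{
    if (target.size() > source.size()) return false;
    return std::equal(target.begin(), target.end(), source.begin(),
                      [](unsigned char a, unsigned char b) {
                          return std::tolower(a) == std::tolower(b);
                      });
}
```

The parameters of the predicate are unsigned char because std::tolower has undefined behaviour for negative values other than EOF.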

algorithm for making optimise string from sub strings

Suppose I have a collection of substrings, for example:
string a = {"cat","sensitive","ate","energy","tense"}
Then the output for this should be as follows:
catensesensitivenergy
How can I do this?
This is known as the shortest common superstring problem, and it is NP-hard, so if you need an exact solution you cannot do much better than trying all possibilities and choosing the best one.
One possible exponential solution is to generate all permutations of the input strings, greedily build the shortest common superstring for each permutation (a permutation fixes the order of the strings, and it is possible to prove that for a fixed order the greedy algorithm always works correctly), and choose the best one.
Using user2040251's suggestion:
#include <string>
#include <vector>
#include <iostream>
#include <algorithm>
std::string merge_strings( const std::vector< std::string > & pool )
{
std::string retval;
for( auto s : pool )
if( retval.empty() )
retval.append( s );
else if( std::search( retval.begin(), retval.end(), s.begin(), s.end() ) == retval.end() )
{
size_t len = std::min( retval.size(), s.size() );
for( ; len; --len )
if( retval.substr( retval.size() - len ) == s.substr( 0, len ) )
{
retval.append( s.substr( len ) );
break;
}
if( !len )
retval.append( s );
}
return retval;
}
std::string shortest_common_supersequence( std::vector< std::string > & pool )
{
std::sort( pool.begin(), pool.end() );
std::string buffer;
std::string best_reduction = merge_strings( pool );
while( std::next_permutation( pool.begin(), pool.end() ) )
{
buffer = merge_strings( pool );
if( buffer.size() < best_reduction.size() )
best_reduction = buffer;
}
return best_reduction;
}
int main( int argc, char ** argv )
{
std::vector< std::string > a{"cat","sensitive","ate","energy","tense"};
std::vector< std::string > b{"cat","sensitive","ate","energy","tense","sit"};
std::vector< std::string > c{"personal","ate","energy","tense","gyroscope"};
std::cout << "best a --> \"" << shortest_common_supersequence( a ) << "\"\n";
std::cout << "best b --> \"" << shortest_common_supersequence( b ) << "\"\n";
std::cout << "best c --> \"" << shortest_common_supersequence( c ) << "\"\n";
return 0;
}
Output:
best a --> "catensensitivenergy"
best b --> "catensensitivenergy"
best c --> "atensenergyroscopersonal"
Break the problem down and see what we have. Start with only two strings: we must check which suffix of one string is the longest prefix of the other. This gives us the order for the best concatenation.
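The two-string case can be sketched as follows. The names overlap and merge_pair are made up for illustration, and the search is a plain brute-force scan from the longest candidate downwards.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: length of the longest suffix of `a` that is also a prefix of `b`.
std::size_t overlap(const std::string& a, const std::string& b)
{
    for (std::size_t len = std::min(a.size(), b.size()); len > 0; --len)
        if (a.compare(a.size() - len, len, b, 0, len) == 0)
            return len;
    return 0;
}

// Hypothetical helper: shortest string that starts with `a` and ends with `b`.
std::string merge_pair(const std::string& a, const std::string& b)
{
    return a + b.substr(overlap(a, b));
}
```

For two words a and b, comparing merge_pair(a, b) with merge_pair(b, a) tells which order gives the shorter concatenation.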
Now, with a set of n words, what do we do? We start by building a trie containing every word (a key for each word). If a word is a duplicate of another, we can easily flag it as such while building the prefix tree.
I made a quick implementation of a regular Trie. You can find it here.
We have the tools to build a directed graph linking two words whenever a suffix of the first is a prefix of the second. The weight of each edge depends on the length of that shared suffix.
To do so, for each word w of the input set, we must see which words we can reach with a suffix of w:
We walk down the trie using the suffix. We will end up in a node (or not).
From this node, provided it exists, we scan the remaining subtree to see which words are available.
If a given suffix of length l yields a match with a prefix of word w', then we add an edge w → w', with weight length(w') - l.
If such an edge already exists, we just update the weight to keep the lowest.
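As a rough illustration of the edge weights, here is a sketch that builds the same graph as an adjacency matrix. Note the substitution: it uses a brute-force suffix/prefix comparison instead of the trie walk described above, which is simpler but slower. The names NO_EDGE, longest_overlap and build_overlap_graph are made up for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Marker for "no suffix of w is a prefix of w'".
const std::size_t NO_EDGE = std::numeric_limits<std::size_t>::max();

// Length of the longest suffix of `a` that is also a prefix of `b` (brute force).
std::size_t longest_overlap(const std::string& a, const std::string& b)
{
    for (std::size_t len = std::min(a.size(), b.size()); len > 0; --len)
        if (a.compare(a.size() - len, len, b, 0, len) == 0)
            return len;
    return 0;
}

// weight[i][j] = length(words[j]) - l, where l is the longest overlap from
// words[i] to words[j]; taking the longest overlap gives the lowest weight.
std::vector<std::vector<std::size_t>> build_overlap_graph(const std::vector<std::string>& words)
{
    const std::size_t n = words.size();
    std::vector<std::vector<std::size_t>> weight(n, std::vector<std::size_t>(n, NO_EDGE));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (i == j) continue;
            const std::size_t l = longest_overlap(words[i], words[j]);
            if (l > 0) weight[i][j] = words[j].size() - l;
        }
    }
    return weight;
}
```

For the words cat, ate and tense, the path cat → ate → tense costs 1 + 3 = 4, and adding the length of the starting word (3) gives 7, the length of "catense".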
From there, the graph is set and we must find the shortest path that runs through every vertex (i.e. word) exactly once. If the graph is complete, this is the Traveling Salesman Problem. Most of the time, the graph won't be complete.
Still, it remains an NP-hard problem. In more "technical" terms, the problem at hand is to find the shortest Hamiltonian path of a digraph.
Note: Given a Hamiltonian path (if it exists) with cost C and starting vertex (word) W, the superstring length is given by:
L_super = L_W + C
Note: If two words have no suffix linking them to another word, then the graph is not connected and there is no Hamiltonian path.

How to make sure user enters allowed enum

I have to write a program with an Enum State, which holds the 50 two-letter state abbreviations (NY, FL, etc.). The program asks for the user's info, and the user needs to type in the two letters corresponding to the state. How can I check that their input is valid, i.e. that it matches a two-letter state defined in Enum State{AL,...,WY}? I suppose I could make one huge if statement checking if input == "AL" || ... || input == "WY" {do stuff} else{ error input does not match state }, but having to do that for all 50 states would get a bit ridiculous. Is there an easier way to do this?
Also if Enum State is defined as {AL, NY, FL}, how could I cast a user input, which would be a string, into a State? If I changed the States to {"AL", "NY", "FL"} would that be easier or is there another way to do it?
Unfortunately, C++ does not provide a portable way to convert an enum to a string and vice versa. A possible solution is to populate a std::map<std::string,State> (or a hash map) and do a lookup on conversion. You can either populate such a map manually or create a simple script that generates a function in a .cpp file to populate the map, and call that script during the build process. Your script can also generate if/else statements instead of populating a map.
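A minimal sketch of the manually populated map, assuming a shortened State enum with three entries (the real one would list all 50) and a made-up function name parse_state:

```cpp
#include <map>
#include <string>

// Shortened for illustration; the real enum would list all 50 states.
enum State { AL, NY, FL };

// Hypothetical helper: returns true and sets `out` if `input` names a known
// state; returns false and leaves `out` untouched otherwise.
bool parse_state(const std::string& input, State& out)
{
    static const std::map<std::string, State> table = {
        {"AL", AL}, {"NY", NY}, {"FL", FL}
    };
    const std::map<std::string, State>::const_iterator it = table.find(input);
    if (it == table.end()) return false;
    out = it->second;
    return true;
}
```

The lookup is case-sensitive, so "ny" is rejected; the input could be upper-cased first if that is not wanted.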
Another, more difficult but more flexible and stable, solution is to use a compiler such as LLVM and write a plugin that generates such a function from the syntax tree produced by the compiler.
The simplest method is to use an STL std::map, but for academic exercises that may not be permitted (for example it may be required to use only techniques covered in the course material).
Unless explicitly initialised, enumerations are integer numbered sequentially starting from zero. Given that, you can scan a lookup-table of strings, and cast the matching index to an enum. For example:
enum eUSstate
{
AL, AK, AZ, ..., NOSTATE
} ;
eUSstate string_to_enum( std::string inp )
{
static const int STATES = 50 ;
std::string lookup[STATES] = { "AL", "AK", "AZ" ... } ;
int i = 0 ;
for( i = 0; i < STATES && lookup[i] != inp; i++ )
{
// do nothing
}
return static_cast<eUSstate>(i) ;
}
If perhaps you don't want to rely on a brute-force cast and maintaining a look-up table in the same order as the enumerations, then a lookup table having both the string and the matching enum may be used.
eUSstate string_to_enum( std::string inp )
{
static const int STATES = 50 ;
struct
{
std::string state_string ;
eUSstate state_enum ;
} lookup[STATES] { {"AL", AL}, {"AK", AK}, {"AZ", AZ} ... } ;
eUSstate ret = NOSTATE ;
for( int i = 0; ret == NOSTATE && i < STATES; i++ )
{
if( lookup[i].state_string == inp )
{
ret = lookup[i].state_enum ;
}
}
return ret ;
}
The look-up can be optimised by taking advantage of alphabetical ordering and performing a binary search, but for 50 states it is hardly worth it.
What you need is a table. Because the enums are linear,
a simple table of strings would be sufficient:
char const* const stateNames[] =
{
// In the same order as in the enum.
"NY",
"FL",
// ...
};
Then:
char const* const* entry
= std::find( std::begin( stateNames ), std::end( stateNames ), userInput );
if (entry == std::end( stateNames ) ) {
// Illegal input...
} else {
State value = static_cast<State>( entry - std::begin( stateNames ) );
// ...
}
Alternatively, you can have an array of:
struct StateMapping
{
State enumValue;
char const* name;
struct OrderByName
{
bool operator()( StateMapping const& lhs, StateMapping const& rhs ) const
{
return std::strcmp( lhs.name, rhs. name ) < 0;
}
bool operator()( StateMapping const& lhs, std::string const& rhs ) const
{
return lhs.name < rhs;
}
bool operator()( std::string const& lhs, StateMapping const& rhs ) const
{
return lhs < rhs.name;
}
};
};
StateMapping const states[] =
{
{ NY, "NY" },
// ...
};
sorted by the key, and use std::lower_bound:
StateMapping::OrderByName cmp;
StateMapping const* entry =
std::lower_bound( std::begin( states ), std::end( states ), userInput, cmp );
if ( entry == std::end( states ) || cmp( userInput, *entry ) ) {
// Illegal input...
} else {
State value = entry->enumValue;
// ...
}
The latter is probably slightly faster, but for only fifty
entries, I doubt you'll notice the difference.
And of course, you don't write this code manually; you generate
it with a simple script. (In the past, I had code which would
parse the C++ source for the enum definitions, and generate the
mapping functionality from them. It's simpler than it sounds,
since you can ignore large chunks of the C++ code, other than
for keeping track of the various nestings.)
The solution is simple, but only for 2 characters in the string (as in your case):
#include <stdio.h>
#include <stdint.h>
enum TEnum
{
AL = 'LA',
NY = 'YN',
FL = 'LF'
};
int _tmain(int argc, _TCHAR* argv[])
{
const char* input = "NY";
//const char* input = "AL";
//const char* input = "FL";
switch( *(const uint16_t*)input )
{
case AL:
printf("input AL");
break;
case NY:
printf("input NY");
break;
case FL:
printf("input FL");
break;
}
return 0;
}
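In the above example I used an enumeration with two-character constants (this is legal, although the value of a multi-character constant is implementation-defined) and passed the input string to the switch statement. I tested it and it works! Notice the reversed character order in the enumeration.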
Ciao

C++ storing functions and operators in a structure

How to improve a data structure for storing functions in arithmetic parser converting from infix to postfix notation?
At this moment I am using an array of char arrays:
char *funct[] = { "sin", "cos", "tan"... };
char text[] = "tan";
This implementation is a little confusing and leads to comparisons like the following when we test whether the text is a function:
if ( ( strcmp ( funct[0], text ) == 0 ) || ( strcmp ( funct[1], text ) == 0 ) || ( strcmp ( funct[2], text ) == 0 ) )
{
... do something
}
(or to a for-loop version).
If there are a lot of functions (and a lot of comparisons), the index referencing leads to errors and is not clear. It is also necessary to change the indices whenever we remove or add a function.
How can I improve such a structure so that it is easy to read, easy to maintain and easy to scale up?
I was thinking about enum
typedef enum
{
Fsin=0,
Fcos,
Ftan
} TFunctions;
which results in
if ( ( strcmp ( funct[Fsin], text ) == 0 ) || ( strcmp ( funct[Fcos], text ) == 0 ) || ( strcmp ( funct[Ftan], text ) == 0 ) )
{
...
}
but there may be a better solution...
You can use std::map.
enum functions
{
sin,
cos,
tan
};
std::map<std::string, unsigned char> func_map;
func_map["sin"] = sin;
func_map["cos"] = cos;
func_map["tan"] = tan;
// then:
std::string text = "cos";
std::map<std::string, unsigned char>::iterator it;
it = func_map.find(text);
if(it != func_map.end())
{
// ELEMENT FOUND
unsigned char func_id = it->second;
}
else
{
// NOT FOUND
}
For the fastest code you may have some kind of map as follows:
typedef std::map<std::string, func_t> func_map;
func_map fm;
fm["sin"] = sin_func(); // get value of this entry from somewhere
fm["cos"] = cos_func(); // for example sin_func or cos_func
auto i = fm.find( "sin" );
if( i != fm.end() ) {
func_t f = i->second; // value found, we may use it.
}
Also, if there really are a lot of items, you may use std::unordered_map instead of std::map.
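To make the idea concrete, here is a sketch with std::unordered_map. The answer above does not say what func_t is, so this assumes it is a pointer to a unary function on double; the wrappers fn_sin, fn_cos, fn_tan and the lookup function find_function are made-up names.

```cpp
#include <cmath>
#include <string>
#include <unordered_map>

// Assumption: every parser function takes one double and returns a double.
typedef double (*func_t)(double);

// Small wrappers, so that no address of an overloaded std:: function is taken.
double fn_sin(double x) { return std::sin(x); }
double fn_cos(double x) { return std::cos(x); }
double fn_tan(double x) { return std::tan(x); }

// Hypothetical helper: returns the function registered under `name`,
// or nullptr if `name` is not a known function.
func_t find_function(const std::string& name)
{
    static const std::unordered_map<std::string, func_t> table = {
        {"sin", fn_sin}, {"cos", fn_cos}, {"tan", fn_tan}
    };
    const auto it = table.find(name);
    return it == table.end() ? nullptr : it->second;
}
```

Adding or removing a function then means touching one line of the table, and no index has to be kept in sync.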

Concise lists/vectors in C++

I'm currently translating an algorithm in Python to C++.
This line EXCH_SYMBOL_SETS = [["i", "1", "l"], ["s", "5"], ["b", "8"], ["m", "n"]]
is now
vector<vector<char>> exch_symbols;
vector<char> vector_1il;
vector_1il.push_back('1');
vector_1il.push_back('i');
vector_1il.push_back('l');
vector<char> vector_5s;
vector_5s.push_back('5');
vector_5s.push_back('s');
vector<char> vector_8b;
vector_8b.push_back('8');
vector_8b.push_back('b');
vector<char> vector_mn;
vector_mn.push_back('m');
vector_mn.push_back('n');
exch_symbols.push_back(vector_1il);
exch_symbols.push_back(vector_5s);
exch_symbols.push_back(vector_8b);
exch_symbols.push_back(vector_mn);
I hate having an intermediate named variable for each inner vector of the 2-D vector. I'm not really familiar with multidimensional data structures in C++. Is there a better way?
What's happening afterwards is this:
multimap<char, char> exch_symbol_map;
/*# Insert all possibilities
for symbol_set in EXCH_SYMBOL_SETS:
for symbol in symbol_set:
for symbol2 in symbol_set:
if symbol != symbol2:
exch_symbol_map[symbol].add(symbol2)*/
void insert_all_exch_pairs(const vector<vector<char>>& exch_symbols) {
for (vector<vector<char>>::const_iterator symsets_it = exch_symbols.begin();
symsets_it != exch_symbols.end(); ++symsets_it) {
for (vector<char>::const_iterator sym1_it = symsets_it->begin();
sym1_it != symsets_it->end(); ++sym1_it) {
for (vector<char>::const_iterator sym2_it = symsets_it->begin();
sym2_it != symsets_it->end(); ++sym2_it) {
if (sym1_it != sym2_it) {
exch_symbol_map.insert(pair<char, char>(*sym1_it, *sym2_it));
}
}
}
}
}
So this algorithm should work in one way or another with the representation here. The goal is that EXCH_SYMBOL_SETS can be easily changed later to include new groups of chars or add new letters to existing groups. Thank you!
I would refactor: instead of vector<char>, use std::string as the inner container, i.e.
vector<string> exch_symbols;
exch_symbols.push_back("1il");
exch_symbols.push_back("s5");
exch_symbols.push_back("b8");
exch_symbols.push_back("mn");
then change your insert method:
void insert_all_exch_pairs(const vector<string>& exch_symbols)
{
for (vector<string>::const_iterator symsets_it = exch_symbols.begin(); symsets_it != exch_symbols.end(); ++symsets_it)
{
for (string::const_iterator sym1_it = symsets_it->begin(); sym1_it != symsets_it->end(); ++sym1_it)
{
for (string::const_iterator sym2_it = symsets_it->begin(); sym2_it != symsets_it->end(); ++sym2_it)
{
if (sym1_it != sym2_it)
exch_symbol_map.insert(pair<char, char>(*sym1_it, *sym2_it));
}
}
}
}
You could shorten it by getting rid of the intermediate values
vector<vector<char> > exch_symbols(4, vector<char>()); //>> is not valid in C++98 btw.
//exch_symbols[0].reserve(3)
exch_symbols[0].push_back('i');
etc.
You could also use boost.assign or something similar
EXCH_SYMBOL_SETS = [["i", "1", "l"], ["s", "5"], ["b", "8"], ["m", "n"]] then becomes
vector<vector<char>> exch_symbols(list_of(vector<char>(list_of('i')('1')('l')))(vector<char>(list_of('s')('5'))(list_of('m')('n'))) (not tested and never used it with nested vectors, but it should be something like this)
For your real question of...
how could I translate L = [A, [B], [[C], D]] to C++ ... at all!
There is no direct translation - you've switched from storing values of the same type to storing values of variable type. Python allows this because it's a dynamically typed language, not because it has a nicer array syntax.
There are ways to replicate the behaviour in C++ (e.g. a vector of boost::any or boost::variant, or a user-defined container class that supports this behaviour), but it's never going to be as easy as it is in Python.
Your code:
vector<char> vector_1il;
vector_1il.push_back('1');
vector_1il.push_back('i');
vector_1il.push_back('l');
Concise code:
char values[] = "1il";
vector<char> vector_1il(&values[0], &values[3]);
Is it fine with you?
If you want to use std::string as suggested by Nim, then you can use even this:
//Concise form of what Nim suggested!
std::string s[] = {"1il", "5s", "8b", "mn"};
vector<std::string> exch_symbols(&s[0], &s[4]);
Rest you can follow Nim's post. :-)
In C++0x the statement
vector<string> EXCH_SYMBOL_SETS={"i1l", "s5", "b8", "mn"} ;
compiles and works fine. Sadly enough the apparently similar statement
vector<vector<char>> EXCH_SYMBOL_SETS={{'i','1','l'},{'s','5'}, {'b','8'}, {'m','n'}};
doesn't work :-(.
This is implemented in g++ 4.5.0 or later; you should add the -std=c++0x option. I think this feature is not yet available in Microsoft C++ (VC10), and I don't know the status of other compilers.
I know that this is an old post, but in case anyone stumbles across it, C++ has gotten MUCH better at dealing with this stuff:
In C++11 the first code block can simply be rewritten as:
std::vector<std::string> exch_symbols {"1il", "5s", "8b", "mn"};
This isn't special to string either, we can nest vector like so:
std::vector<std::vector<int>> vov {{1,2,3}, {2,3,5,7,11}};
And here's the entire code in c++14-style, with an added cout at the end:
#include <iostream>
#include <map>
#include <string>
#include <vector>
void add_all_char_pairs (std::multimap<char, char> & mmap, const std::string & str)
{
// we choose not to add {str[i], str[i]} pairs for some reason...
const int s = str.size();
for (int i1 = 0; i1 < s; ++i1)
{
char c1 = str[i1];
for (int i2 = i1 + 1; i2 < s; ++i2)
{
char c2 = str[i2];
mmap.insert({c1, c2});
mmap.insert({c2, c1});
}
}
}
auto all_char_pairs_of_each_str (const std::vector<std::string> & strs)
{
std::multimap<char, char> mmap;
for (auto & str : strs)
{
add_all_char_pairs(mmap, str);
}
return mmap;
}
int main ()
{
std::vector<std::string> exch_symbols {"1il", "5s", "8b", "mn"};
auto mmap = all_char_pairs_of_each_str(exch_symbols);
for (auto e : mmap)
{
std::cout << e.first << e.second << std::endl;
}
}