Using templates for implementing a generic string parser

I am trying to come up with a generic solution for parsing strings (with a given format). For instance, I would like to be able to parse a string containing a list of numeric values (integers or floats) and return a std::vector. This is what I have so far:
template<typename T, typename U>
T parse_value(const U& u) {
throw std::runtime_error("no parser available");
}
template<typename T>
std::vector<T> parse_value(const std::string& s) {
std::vector<std::string> parts;
boost::split(parts, s, boost::is_any_of(","));
std::vector<T> res;
std::transform(parts.begin(), parts.end(), std::back_inserter(res),
[](const std::string& s) { return boost::lexical_cast<T>(s); });
return res;
}
Additionally, I would like to be able to parse strings containing other types of values. For instance:
struct Foo { /* ... */ };
template<>
Foo parse_value(const std::string& s) {
/* parse string and return a Foo object */
}
The reason to maintain a single "hierarchy" of parse_value functions is that I sometimes want to parse an optional value (which may or may not exist), using boost::optional. Ideally, I would like to have just a single parse_optional_value function that delegates to the corresponding parse_value function:
template<typename T>
boost::optional<T> parse_optional_value(const boost::optional<std::string>& s) {
if (!s) return boost::optional<T>();
return boost::optional<T>(parse_value<T>(*s));
}
So far, my current solution does not work (the compiler cannot deduce the exact function to use). I guess the problem is that my solution relies on deducing the template value based on the return type of parse_value functions. I am not really sure how to fix this (or even whether it is possible to fix it, since the design approach could just be totally flawed). Does anyone know a way to solve what I am trying to do? I would really appreciate if you could just point me to a possible way to address the issues that I am having with my current implementation. BTW, I am definitely open to completely different ideas for solving this problem too.

You cannot overload functions based on return value [1]. This is precisely why the standard IO library uses the construct:
std::cin >> a >> b;
which may not be your cup of tea -- many people don't like it, and it is truly not without its problems -- but it does a nice job of providing a target type to the parser. It also has the advantage over a static parse<X>(const std::string&) prototype that it allows for chaining and streaming, as above. Sometimes that's not needed, but in many parsing contexts it is essential, and the use of operator>> is actually a pretty cool syntax. [2]
The standard library doesn't do what would be far and away the coolest thing, which is to skip string constants scanf style and allow interleaved reading.
vector<int> integers;
std::cin >> "[" >> interleave(integers, ",") >> "]";
However, that could be defined. (Possibly it would be better to use an explicit wrapper around the string literals, but actually I prefer it like that; but if you were passing a variable you'd want to use a wrapper).
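As a rough sketch of how that could be defined, here is a version that uses an explicit wrapper around the literal rather than a bare string. The names expect, interleaved and interleave are made up for this illustration (they are not standard library facilities), the separator is a single character, and the list is assumed to be non-empty.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical wrapper: matches a fixed literal in the input.
struct expect {
    const char* text;
};

inline std::istream& operator>>(std::istream& in, expect e) {
    typedef std::istream::traits_type traits;
    in >> std::ws;  // skip leading whitespace, scanf style
    for (const char* p = e.text; *p; ++p) {
        if (in.get() != traits::to_int_type(*p)) {
            in.setstate(std::ios::failbit);  // literal did not match
            break;
        }
    }
    return in;
}

// Hypothetical wrapper: reads values separated by `sep` into a vector.
template <typename T>
struct interleaved {
    std::vector<T>& out;
    char sep;
};

template <typename T>
interleaved<T> interleave(std::vector<T>& out, char sep) {
    return interleaved<T>{out, sep};
}

template <typename T>
std::istream& operator>>(std::istream& in, interleaved<T> list) {
    typedef std::istream::traits_type traits;
    T value{};
    while (in >> value) {
        list.out.push_back(value);
        in >> std::ws;
        if (in.peek() != traits::to_int_type(list.sep)) break;  // no more items
        in.get();  // consume the separator
    }
    return in;
}
```

With these in place, something like in >> expect{"["} >> interleave(integers, ',') >> expect{"]"} reads "[1, 2, 3]" from a stream and leaves the failbit set if a literal does not match.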
[1] With the new auto declaration, the reason for this becomes even clearer.
[2] IO manipulators, on the other hand, are a cruel joke. And error handling is pathetic. But you can't have everything.
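Applied to the question, one possible way to give the parser a target type is to pass the result as an output parameter and overload on that, then keep parse_value<T> as a thin wrapper. This is only a sketch under a few assumptions: parse_into is a made-up name, std::istringstream and std::getline stand in for boost::split and boost::lexical_cast, and std::optional (C++17) stands in for boost::optional.

```cpp
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Scalar case: anything operator>> knows how to read.
template <typename T>
void parse_into(const std::string& s, T& out) {
    std::istringstream in(s);
    if (!(in >> out)) throw std::runtime_error("cannot parse: " + s);
}

// List case: chosen by overload resolution because the target is a vector.
template <typename T>
void parse_into(const std::string& s, std::vector<T>& out) {
    out.clear();
    std::istringstream in(s);
    std::string part;
    while (std::getline(in, part, ',')) {
        T value{};
        parse_into(part, value);
        out.push_back(value);
    }
}

// The return type is named explicitly; the overload is picked by the argument.
template <typename T>
T parse_value(const std::string& s) {
    T result{};
    parse_into(s, result);
    return result;
}

template <typename T>
std::optional<T> parse_optional_value(const std::optional<std::string>& s) {
    if (!s) return std::nullopt;
    return parse_value<T>(*s);
}
```

A custom type such as Foo would then get its own parse_into(const std::string&, Foo&) overload, declared before parse_value, and parse_optional_value<Foo> would pick it up without further changes.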

Here is an example from the libsass parser:
const char* interpolant(const char* src) {
return recursive_scopes< exactly<hash_lbrace>, exactly<rbrace> >(src);
}
// Match a single character literal.
// Regex equivalent: /(?:x)/
template <char chr>
const char* exactly(const char* src) {
return *src == chr ? src + 1 : 0;
}
where rules like these can be passed to the lex method as template arguments.
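To show how such rules compose, here is a small self-contained sketch in the same spirit. It is not the actual libsass code: the matcher typedef, the digit and hash_number rules, and these simplified sequence and one_plus combinators are written for this example only.

```cpp
// A rule takes a position and returns the position after the match,
// or a null pointer if it does not match.
typedef const char* (*matcher)(const char*);

// Match a single character literal.
template <char chr>
const char* exactly(const char* src) {
    return *src == chr ? src + 1 : nullptr;
}

// Match `first`, then `second` right after it.
template <matcher first, matcher second>
const char* sequence(const char* src) {
    const char* rest = first(src);
    return rest ? second(rest) : nullptr;
}

// Match `m` one or more times.
template <matcher m>
const char* one_plus(const char* src) {
    const char* rest = m(src);
    if (!rest) return nullptr;
    while (const char* next = m(rest)) rest = next;
    return rest;
}

// Hypothetical rule: a single decimal digit.
inline const char* digit(const char* src) {
    return (*src >= '0' && *src <= '9') ? src + 1 : nullptr;
}

// Hypothetical rule: '#' followed by one or more digits.
inline const char* hash_number(const char* src) {
    return sequence< exactly<'#'>, one_plus<digit> >(src);
}
```

Because each rule is an ordinary function, a composed rule such as hash_number can itself be used as a template argument of another rule.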

Related

std::string aware options for vsprintf

I have an old MUD codebase in C (>80k lines) that uses printf-style string formatting. It is pervasive -- almost every bit of text runs through calls to either sprintf or a wrapper around vsprintf. However, I have recently moved to compiling with g++ to take advantage of the STL, and would like to use std::string (actually a derived class for default case-insensitive comparisons) where it makes sense.
Obviously, you can't pass std::string as one of the variadic arguments to any of the printf functions: I need .c_str() in every case. I don't want to do that, mostly because I don't want to modify 2000+ calls to printf functions. My question is: how can I make a std::string aware vsprintf?
The way I see it, I have two options: write my own printf functions that iterate through the arguments, converting each std::string to its data() (or c_str()) pointer before passing it to std::vsprintf, or borrow the guts of printf and roll my own. The first option sounds like less work, obviously.
Of course, a better option is if someone has done this before, but my googling is yielding nothing. Any tips on what the best option would look like?
EDIT:
This question was closed as a duplicate of How to use C++ std::ostream with printf-like formatting?, which I don't believe answers the question. I'm not asking how to output strings with std::ostream vs the old C printf. I'm asking for help with a patch solution for an old C codebase that makes extensive use of sprintf/vsprintf, without rewriting thousands of calls to those functions to use output streams.

You can make your own printf wrapper, that extracts char const* from std::string. E.g.:
#include <iostream>
#include <string>
#include <cstdio>
template<class T>
inline auto to_c(T&& arg) -> decltype(std::forward<T>(arg)) {
return std::forward<T>(arg);
}
inline char const* to_c(std::string const& s) { return s.c_str(); }
inline char const* to_c(std::string& s) { return s.c_str(); }
template<class... Args>
int my_printf(char const* fmt, Args&&... args) {
return std::printf(fmt, to_c(args)...);
}
int main() {
std::string name = "World";
my_printf("Hello, %s!\n", name);
}
Or, better, switch to a modern C++ formatting library, such as fmt.
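Since the codebase in the question mostly uses sprintf, the same idea can be sketched for building a std::string. The name my_sprintf is made up here, and the sketch assumes a C++11 compiler where std::snprintf with a null buffer reports the required length.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Pass everything through unchanged...
template <class T>
const T& to_c(const T& value) { return value; }

// ...except std::string, which becomes a C string.
inline const char* to_c(const std::string& s) { return s.c_str(); }

// Hypothetical sprintf-like helper that returns the formatted text.
template <class... Args>
std::string my_sprintf(const char* fmt, const Args&... args) {
    // First pass: ask snprintf how many characters are needed.
    int needed = std::snprintf(nullptr, 0, fmt, to_c(args)...);
    if (needed < 0) return std::string();  // formatting error

    // Second pass: format into a buffer of exactly the right size.
    std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);
    std::snprintf(buffer.data(), buffer.size(), fmt, to_c(args)...);
    return std::string(buffer.data(), static_cast<std::size_t>(needed));
}
```

One caveat: a class derived from std::string would match the generic to_c template rather than the std::string overload, so it would need its own to_c overload.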

The common advice is Boost.Format
Taking their example:
// printf directives's type-flag can be used to pass formatting options :
std::cout << format("_%1$4d_ is : _%1$#4x_, _%1$#4o_, and _%1$s_ by default\n") % 18;
// prints "_ 18_ is : _0x12_, _ 022_, and _18_ by default\n"
Now this assumes std::ostream&, so you'll need a std::stringstream to use a std::string as the backing buffer.
PS. using a derived class for case-insensitive comparisons sounds like a bad idea waiting to bite you. You just need a custom order; all the STL functions that assume ordering have overloads to support custom orderings.
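A minimal sketch of the custom-ordering idea, assuming plain ASCII-style case folding with std::tolower is good enough; ci_less is a made-up name:

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical comparator: orders strings while ignoring case.
struct ci_less {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

// A plain std::string key with a case-insensitive order; no derived class.
typedef std::map<std::string, int, ci_less> ci_map;
```

The same comparator can be handed to std::set, std::sort and the other algorithms that take an ordering.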

C++ What is wrong about using this approach instead of enums when I want a string representation?

There are several questions around concerning this topic (e.g. here and here). I am a bit surprised how lengthy the proposed solutions are. Also, I am a bit lazy and would like to avoid maintaining an extra list of strings for my enums.
I came up with the following and I wonder if there is anything fundamentally wrong with my approach...
class WEEKDAY : public std::string{
public:
static const WEEKDAY MONDAY() {return WEEKDAY("MONDAY");}
static const WEEKDAY TUESDAY(){return WEEKDAY("TUESDAY");}
/* ... and so on ... */
private:
WEEKDAY(std::string s):std::string(s){};
};
I still have to type the name/string representation more than once, but at least now it's all on a single line for each possible value, and in total it does not take many more lines than a plain enum. Using these WEEKDAYS looks almost identical to using enums:
bool isAWorkingDay(WEEKDAY w){
if (w == WEEKDAY::MONDAY()){return true;}
/* ... */
return false;
}
and it's straightforward to get the "string representation" (well, in fact it is just a string)
std::cout << WEEKDAY::MONDAY() << std::endl;
I am still relatively new to C++ (not in writing but in understanding ;), so maybe there are things that can be done with enums that cannot be done with such kind of constants.

You could use the preprocessor to avoid duplicating the names:
#define WEEKDAY_FACTORY(DAY) \
static const WEEKDAY DAY() {return WEEKDAY(#DAY);}
WEEKDAY_FACTORY(MONDAY)
WEEKDAY_FACTORY(TUESDAY)
// and so on
Whether the deduplication is worth the obfuscation is a matter of taste. It would be more efficient to use an enumeration rather than a class containing a string in most places; I'd probably do that, and only convert to a string when needed. You could use the preprocessor to help with that in a similar way:
char const * to_string(WEEKDAY w) {
switch (w) {
#define CASE(DAY) case DAY: return #DAY;
CASE(MONDAY)
CASE(TUESDAY)
// and so on
}
return "UNKNOWN";
}

Piping from Istringstream into templates

I have the following questions: I have a map from string to string which is called psMap. I.e. psMap["a"]="20", psMap["b"]="test", psMap["c"]="12.5", psMap["d"]="1" (true) so the map stores string-expressions of various basic-data types.
The following function foo should (given a key), copy the mapped value to a corresponding type variable, i.e;
int aa;
foo("a", aa);
=> aa=20.
Explicitly, I want to have one function for all possible data types (so no manual casts), so I tried templates, exploiting the automatic conversion of istringstream, namely
template<class PARAMTYPE>
void foo(string _name, PARAMTYPE& _dataType) {
PARAMTYPE buff;
istringstream(psMap[_name]) >> buff;
_dataType = buff;
}
The problem is, that the ">>" operation gives an error: Error: no match for »operator>>« in »std::basic_stringstream<char>((* ....
What is going wrong here? Does the stringstream not recognize the correct data type and tries to pipe into an abstract type of "template"? How could I make my code work?
Thank you for your effort :)

You've created a temporary std::istream, which means that it
cannot bind to a non-const reference. Some of the >> are
member functions, and they will work, but others are free
functions with the signature:
std::istream& operator>>( std::istream&, TargetType& );
and these will not work (or even compile).
To avoid the problem, either just declare a std::istringstream
and use it, or call a member function on the temporary which
does nothing, but returns a (non-const) reference:
std::istringstream( psMap[name] ).ignore(0) >> buff;
(Personally, I find the separate variable more readable.)
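The separate-variable version might look like the sketch below. The bool return value and the lookup with find are additions for illustration, not part of the original foo; psMap is assumed to be a global std::map<std::string, std::string> as described in the question.

```cpp
#include <map>
#include <sstream>
#include <string>

// Assumed to match the map from the question.
std::map<std::string, std::string> psMap;

// Copies the value stored under `name` into `out`.
// Returns false (leaving `out` untouched) if the key is missing
// or the stored text cannot be converted to T.
template <class T>
bool foo(const std::string& name, T& out) {
    std::map<std::string, std::string>::const_iterator it = psMap.find(name);
    if (it == psMap.end()) return false;

    std::istringstream in(it->second);  // named stream, so every operator>> works
    T buff;
    if (!(in >> buff)) return false;
    out = buff;
    return true;
}
```

Using find instead of psMap[name] also avoids inserting an empty entry when the key does not exist.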

The function takes its second parameter by reference, so if you call
foo("a", aa);
without the '&', it should be fine (the way you tried it, an operator>> for a pointer would have been needed). You also need to modify the last line of the template:
_dataType = buff;

Try this implementation:
template<class R>
R get_value(const std::string& name) {
R result{};
std::istringstream buffer{psMap[name]};
buffer >> result;
return result;
}
client code:
int x = get_value<int>("a");
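Also, avoid identifiers that start with an underscore. Names that begin with an underscore followed by an uppercase letter (or that contain a double underscore) are reserved for the implementation everywhere, and any name beginning with an underscore is reserved in the global namespace.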

C# String.Format with Parameters standard equivalent in C++?

I have a lot of C# Code that I have to write in C++. I don't have much experience in C++.
I am using Visual Studio 2012 to build. The project is a static library in C++ (not in C++/CLI).
In many places they were using String.Format, like this:
C#
String.Format("Some Text {0}, some other Text {1}", parameter0, parameter1);
Now, I know similar things have been asked before, but it is not clear to me what the most standard/safe way to do this is.
Would it be safe to use something like sprintf or printf? I have read some people saying that they are not standard. Something like this? (Would this be the C++ way, or is it more the C way?)
C++ (or is it C?)
char buffer [50];
int n, a=5, b=3;
n=sprintf (buffer, "Some Text %d, some other Text %d", a, b);
Other people suggested to do your own class, and I saw many different implementations.
For the time being, I have a class that uses std::to_string, ostringstream, std::string.replace and std::string.find, with templates. My class is rather limited, but for the cases I have in the C# code, it works. I don't know whether this is the most efficient way (or even correct at all):
C++
template <typename T>
static std::string ToString(T Number)
{
std::ostringstream stringStream;
stringStream << Number;
std::string string = stringStream.str();
return string;
};
template <typename T,unsigned S>
static std::string Format(const std::string& stringValue, const T (&parameters)[S])
{
std::string stringToReturn = std::string(stringValue);
for (int i = 0; i < S; ++i)
{
std::string toReplace = "{"+ std::to_string(i) +"}";
size_t f = stringToReturn.find(toReplace);
if(std::string::npos != f)
stringToReturn.replace(f, toReplace.length(), ToString(parameters[i]));
}
return stringToReturn;
};
//I have some other overloads that call the Format function that receives an array.
template <typename T>
static std::string Format(const std::string& stringValue, const T parameter, const T parameter2)
{
T parameters[] = {parameter, parameter2};
return Format(stringValue, parameters);
};
And I need my code to work both in Linux and Windows, so I need different compilers to be able to build it, that is why I need to be sure I am using a standard way. And my environment can not be updated so easily, so I can not use C++11. I can not use Boost either, because I can not be sure I will be able to add the libraries in the different environments I need it to work.
What is the best approach I can take in this case?

Here's a 1-header library I've been writing just for that purpose: fakeformat
Test:
REQUIRE(ff::format("{2}ff{1}").with('a').also_with(7).now()=="7ffa");
The library is configurable, so that you can start parameter indexing from 0. You can also write a wrapper, so that it would look exactly like String.Format.
It builds on linux and doesn't need c++11.
There's no standard way yet...
Or, you could use Boost.Locale formatting
Here it is, with indices starting from 0:
#include ...
struct dotnet_config {
static const char scope_begin='{';
static const char scope_end='}';
static const char separator=',';
static const char equals='=';
static const size_t index_begin=0;
static bool string_to_key(std::string const& to_parse,int& res) {
std::istringstream ss(to_parse);
ss.imbue(std::locale::classic());
ss >> res;
if (!ss.fail() && ss.eof())
return true;
return false;
}
};
template <typename T1>
std::string Format (std::string const& format_string,T1 p1) {
return ff::formatter<dotnet_config>(format_string).with(p1).now();
}
template <typename T1,typename T2>
std::string Format (std::string const& format_string,T1 p1,T2 p2) {
return ff::formatter<dotnet_config>(format_string).with(p1).with(p2).now();
}
int main() {
std::cout<<Format("test={0}",42)<<std::endl;
std::cout<<Format("{0}!={1}",33,42)<<std::endl;
return 0;
}
Output:
test=42
33!=42

sprintf works if all you have are non-object types (or you manually convert them to C-strings, or convert them to strings and then call the c_str() member function). You may want the extra protection against buffer overflow that snprintf provides.
If you're willing to learn more to do what you have to, you can use the Boost Format library. I'm sure you can write a script to convert String.Format calls to Boost's syntax.
If you can't use Boost, and you can't use C++11, you have to go with sprintf and be careful about buffer overflow (possibly snprintf if you can rely on your compiler having it). You might want to write a script to wrap all the parameters so that they all convert to strings:
String.Format("Some Text {0}, some other Text {1}", to_printf(p0), to_printf(p1));
Also, note that C's format doesn't use braces. So that's a big problem. You may need to implement your own variadic function.
If everything is simple like {0}, you can probably write a script to replace most instances of String.Format (and none of the more complicated ones) with something like
`mystring = "Some Text "+tostring(p0)+", some other Text "+tostring(p1);`
which wouldn't be the most efficient way, but most likely won't matter unless you're doing thousands of formats per second. Or possibly slightly more efficient (no intermediate strings):
`mystring = static_cast<std::ostringstream&>(std::ostringstream().flush() << "Some Text " << p0 << ", some other Text " << p1).str();`
which creates a temporary stream. Calling flush() returns an lvalue reference to that temporary, which makes the non-member operator<< overloads (such as the one for string literals) usable on it.

Why don't you use the << operator to format your string?
string strOutput;
stringstream strn;
int i = 10;
float f = 20.0f;
strn << "Sally scored "<<i<< " out of "<<f << ". She failed the test!";
strOutput = strn.str(); // operator>> would stop at the first space
cout << strOutput;

Changing Visual C++ output of inf

In Visual C++, if I have a double with the value inf, and I output it using a stream:
double myval = std::numeric_limits<double>::infinity();
std::ostringstream msg;
msg << "This is infinite: " << myval;
The result is "1.#INF".
Is there an easy way to make it print simply "inf" or "INF"? This string appears in text that will subsequently be parsed, and extra characters are causing us problems.
I thought of overloading the stream operator for double, but double is a built-in type.
I confess I can't figure out exactly how to search for an answer to the basic question...
Thanks!

This is possible, but somewhat non-trivial, and the correct way to do it is fairly obscure.
As an aside, I'll note that when you do something like msg << myval; only one of the operands has to be a user-defined type, which is the case here (even though you didn't define it, an ostringstream is still officially a user-defined type). That's more or less irrelevant though. The existing overload of operator<< will work fine; you don't need to provide your own.
I think of a stream as a "matchmaker". You have a stream buffer to handle the actual I/O, and a locale to handle the formatting. Thinking of things that way, the solution becomes fairly clear: since what you want to change is the formatting, and formatting is handled by a locale, you need to change the locale.
A locale, however, is really a heterogeneous collection. Specifically, it's a collection of facet classes. In this case, the facet we care about is the num_put facet. The num_put facet class has virtual do_put member functions for various types. The one we care about in this case is double:
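template <class charT, class OutputIterator = std::ostreambuf_iterator<charT> >
class num_put : public std::num_put<charT, OutputIterator> {
    typedef std::num_put<charT, OutputIterator> base;
public:
    // Names from a dependent base class are not found unqualified.
    typedef typename base::iter_type iter_type;
    typedef typename base::char_type char_type;

    virtual iter_type do_put(iter_type i,
                             std::ios_base& b,
                             char_type fill,
                             double v) const
    {
        if (v == std::numeric_limits<double>::infinity()) {
            static const char inf[] = "inf";
            // Copy the three characters only, not the terminating '\0'.
            i = std::copy(inf, inf + 3, i);
        }
        else {
            // Let the default facet format every other value.
            i = base::do_put(i, b, fill, v);
        }
        return i;
    }
};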
To use it, you imbue the stream in question with a locale that includes that facet:
int main() {
const char *d="0";
std::locale loc(std::locale::classic(), new num_put<char>);
std::cout.imbue(loc);
std::cout << 1.0/atoi(d);
return 0;
}
I should add, however, that this was slapped together pretty quickly, and testing is extremely minimal. It works for the test case, and probably for other narrow streams. At a guess, it probably needs more work before it'll work correctly with a wide stream though.
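For the ostringstream case from the question, the same technique can be sketched for narrow streams only, which lets the facet be a plain class instead of a template. The names inf_num_put and describe are made up for this example, and negative infinity is deliberately left to the default formatting.

```cpp
#include <algorithm>
#include <limits>
#include <locale>
#include <sstream>
#include <string>

// Narrow-stream facet that prints positive infinity as "inf".
class inf_num_put : public std::num_put<char> {
protected:
    iter_type do_put(iter_type out, std::ios_base& str,
                     char_type fill, double v) const override {
        if (v == std::numeric_limits<double>::infinity()) {
            static const char text[] = "inf";
            return std::copy(text, text + 3, out);  // skip the '\0'
        }
        // Everything else uses the default formatting.
        return std::num_put<char>::do_put(out, str, fill, v);
    }
};

// Builds the message from the question with the facet installed.
inline std::string describe(double v) {
    std::ostringstream msg;
    // The locale takes ownership of the facet allocated with new.
    msg.imbue(std::locale(std::locale::classic(), new inf_num_put));
    msg << "This is infinite: " << v;
    return msg.str();
}
```

Imbuing only the one ostringstream keeps the change local, so other streams in the program keep their usual output.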
