Named parameter string formatting in C++

I'm wondering if there is a library like Boost Format, but which supports named parameters rather than positional ones. This is a common idiom in e.g. Python, where you have a context to format strings with that may or may not use all available arguments, e.g.
mouse_state = {}
mouse_state['button'] = 0
mouse_state['x'] = 50
mouse_state['y'] = 30
#...
"You clicked %(button)s at %(x)d,%(y)d." % mouse_state
"Targeting %(x)d, %(y)d." % mouse_state
Are there any libraries that offer the functionality of those last two lines? I would expect it to offer an API something like:
PrintFMap(string format, map<string, string> args);
In Googling I have found many libraries offering variations of positional parameters, but none that support named ones. Ideally the library has few dependencies so I can drop it easily into my code. C++ won't be quite as idiomatic for collecting named arguments, but probably someone out there has thought more about it than me.
Performance is important, in particular I'd like to keep memory allocations down (always tricky in C++), since this may be run on devices without virtual memory. But having even a slow one to start from will probably be faster than writing it from scratch myself.

The fmt library supports named arguments:
print("You clicked {button} at {x},{y}.",
      arg("button", "b1"), arg("x", 50), arg("y", 30));
And as a syntactic sugar you can even (ab)use user-defined literals to pass arguments:
print("You clicked {button} at {x},{y}.",
      "button"_a="b1", "x"_a=50, "y"_a=30);
For brevity the namespace fmt is omitted in the above examples.
Disclaimer: I'm the author of this library.
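For completeness, here is a self-contained version of the examples above with the namespace spelled out (assuming a reasonably recent version of the library; the exact header layout can vary between versions):
#include <fmt/format.h>

int main()
{
    using namespace fmt::literals; // enables the "name"_a shorthand

    fmt::print("You clicked {button} at {x},{y}.\n",
               fmt::arg("button", "b1"), fmt::arg("x", 50), fmt::arg("y", 30));
    fmt::print("Targeting {x}, {y}.\n", "x"_a = 50, "y"_a = 30);
}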

I've always been critical of C++ I/O (especially formatting) because in my opinion it is a step backward with respect to C. Formats need to be dynamic, and it makes perfect sense, for example, to load them from an external resource such as a file or a parameter.
I'd never tried to actually implement an alternative before, however, and your question made me make an attempt, investing some weekend hours on this idea.
Sure enough, the problem was more complex than I thought (for example, just the integer formatting routine is 200+ lines), but I think that this approach (dynamic format strings) is more usable.
You can download my experiment from this link (it's just a .h file) and a test program from this link (test is probably not the correct term, I used it just to see if I was able to compile).
The following is an example
#include "format.h"
#include <iostream>
using format::FormatString;
using format::FormatDict;
int main()
{
std::cout << FormatString("The answer is %{x}") % FormatDict()("x", 42);
return 0;
}
It differs from the boost::format approach because it uses named parameters, and because the format string and the format dictionary are meant to be built separately (and, for example, passed around). Also, I think that formatting options should be part of the string (as with printf) and not in the code.
FormatDict uses a trick for keeping the syntax reasonable:
FormatDict fd;
fd("x", 12)
  ("y", 3.141592654)
  ("z", "A string");
FormatString is instead just parsed from a const std::string& (I decided to pre-parse format strings; a slower but probably acceptable approach would be to just pass the string and re-parse it each time).
The formatting can be extended for user defined types by specializing a conversion function template; for example
struct P2d
{
    int x, y;
    P2d(int x, int y)
        : x(x), y(y)
    {
    }
};

namespace format {
    template<>
    std::string toString<P2d>(const P2d& p, const std::string& parms)
    {
        return FormatString("P2d(%{x}; %{y})") % FormatDict()
            ("x", p.x)
            ("y", p.y);
    }
}
After that, a P2d instance can simply be placed in a formatting dictionary.
Also it's possible to pass parameters to a formatting function by placing them between % and {.
For now I only implemented an integer formatting specialization that supports
Fixed size with left/right/center alignment
Custom filling char
Generic base (2-36), lower or uppercase
Digit separator (with both custom char and count)
Overflow char
Sign display
I've also added some shortcuts for common cases, for example
"%08x{hexdata}"
is a hex number with 8 digits, padded with '0's.
"%026/2,8:{bindata}"
is a 24-bit binary number (as required by "/2") with digit separator ":" every 8 bits (as required by ",8:").
Note that the code is just an idea. For example, for now I simply prevented copies, when it's probably reasonable to allow storing both format strings and dictionaries (for dictionaries, however, it's important to provide a way to avoid copying an object just because it needs to be added to a FormatDict; while IMO this is possible, it also raises non-trivial problems about lifetimes).
UPDATE
I've made a few changes to the initial approach:
Format strings can now be copied
Formatting for custom types is done using template classes instead of functions (this allows partial specialization)
I've added a formatter for sequences (two iterators). Syntax is still crude.
I've created a github project for it, with boost licensing.

The answer appears to be, no, there is not a C++ library that does this, and C++ programmers apparently do not even see the need for one, based on the comments I have received. I will have to write my own yet again.

Well I'll add my own answer as well, not that I know (or have coded) such a library, but to answer to the "keep the memory allocation down" bit.
As always I can envision some kind of speed / memory trade-off.
On the one hand, you can parse "Just In Time":
class Formatter:
    def __init__(self, format): self._string = format
    def compute(self, context):
        # (__contains, __extract and __replace left as an exercise)
        for k, v in context.items():
            while self.__contains(k):
                left, variable, right = self.__extract(k)
                self._string = left + self.__replace(variable, v) + right
This way you don't keep a "parsed" structure at hand, and hopefully most of the time you'll just insert the new data in place (unlike Python, C++ strings are not immutable).
However it's far from being efficient...
On the other hand, you can build a fully constructed tree representing the parsed format. You will have several classes like: Constant, String, Integer, Real, etc... and probably some subclasses / decorators as well for the formatting itself.
I think however that the most efficient approach would be some kind of mix of the two:
explode the format string into a list of Constant, Variable
index the variables in another structure (a hash table with open-addressing would do nicely, or something akin to Loki::AssocVector).
There you are: you're done with only 2 dynamically allocated arrays (basically). If you want to allow a same key to be repeated multiple times, simply use a std::vector<size_t> as a value of the index: good implementations should not allocate any memory dynamically for small sized vectors (VC++ 2010 doesn't for less than 16 bytes worth of data).
When evaluating the context itself, look up the instances. You then parse the formatter "just in time", check it against the current type of the value with which to replace it, and process the format.
Pros and cons:
- Just In Time: you scan the string again and again
- One Parse: requires a lot of dedicated classes, possibly many allocations, but the format is validated on input. Like Boost it may be reused.
- Mix: more efficient, especially if you don't replace some values (allow some kind of "null" value), but delaying the parsing of the format delays the reporting of errors.
Personally I would go for the One Parse scheme, trying to keep the allocations down using boost::variant and the Strategy Pattern as much as I could.
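To make the mixed scheme concrete, here is a minimal sketch (all names are mine; a real implementation would pre-index the variables and keep per-variable format options, as described above):
#include <iostream>
#include <map>
#include <string>
#include <vector>

// A pre-parsed format is a flat list of tokens:
// either literal text or the name of a "%{name}" placeholder.
struct Token {
    bool is_variable;
    std::string text;
};

std::vector<Token> parseFormat(const std::string& fmt)
{
    std::vector<Token> tokens;
    std::string::size_type pos = 0;
    while (pos < fmt.size()) {
        std::string::size_type open = fmt.find("%{", pos);
        if (open == std::string::npos) {            // trailing literal
            tokens.push_back({false, fmt.substr(pos)});
            break;
        }
        if (open > pos)                             // literal before placeholder
            tokens.push_back({false, fmt.substr(pos, open - pos)});
        std::string::size_type close = fmt.find('}', open);
        if (close == std::string::npos) {           // unterminated: keep as literal
            tokens.push_back({false, fmt.substr(open)});
            break;
        }
        tokens.push_back({true, fmt.substr(open + 2, close - open - 2)});
        pos = close + 1;
    }
    return tokens;
}

// Render the pre-parsed tokens against a context; unknown keys render empty,
// so a context may carry more values than a given format uses.
std::string render(const std::vector<Token>& tokens,
                   const std::map<std::string, std::string>& args)
{
    std::string out;
    for (const Token& t : tokens) {
        if (!t.is_variable) { out += t.text; continue; }
        auto it = args.find(t.text);
        if (it != args.end()) out += it->second;
    }
    return out;
}

int main()
{
    const auto fmt = parseFormat("Targeting %{x}, %{y}.");
    std::map<std::string, std::string> ctx{{"x", "50"}, {"y", "30"}, {"button", "0"}};
    std::cout << render(fmt, ctx) << '\n'; // prints: Targeting 50, 30.
}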

Given that Python itself is written in C and that formatting is such a commonly used feature, you might be able (ignoring copyright issues) to rip the relevant code from the Python interpreter and port it to use STL maps rather than Python's native dicts.

I've written a library for this purpose; check it out on GitHub.
Contributions are welcome.


Maxima: creating a function that acts on parts of a string

Context: I'm using Maxima on a platform that also uses KaTeX. For various reasons related to content management, this means that we are regularly using Maxima functions to generate the necessary KaTeX commands.
I'm currently trying to develop a group of functions that will facilitate generating different sets of strings corresponding to KaTeX commands for various symbols related to vectors.
Problem
I have written the following function makeKatexVector(x), which takes a string, list or list-of-lists and returns the same type of object, with each string wrapped in \vec{} (i.e. makeKatexVector(string) returns \vec{string} and makeKatexVector(["a","b"]) returns ["\vec{a}", "\vec{b}"] etc).
/* Flexible Make KaTeX Vector Version of List Items */
makeKatexVector(x) := block([placeHolderList : x],
    if stringp(x) /* Special Handling if x is Just a String */
    then placeHolderList : concat("\vec{", x, "}")
    else if listp(x[1]) /* check to see if it is a list of lists */
    then for j:1 thru length(x)
        do placeHolderList[j] : makelist(concat("\vec{", k, "}"), k, x[j])
    else if listp(x) /* check to see if it is just a list */
    then placeHolderList : makelist(concat("\vec{", k, "}"), k, x)
    else placeHolderList : "makeKatexVector error: not a list-of-lists, a list or a string",
    return(placeHolderList));
Although I have my doubts about the efficiency or elegance of the above code, it seems to return the desired expressions; however, I would like to modify this function so that it can distinguish between single- and multi-character strings.
In particular, I'd like multi-character strings like x_1 to be returned as \vec{x}_1 and not \vec{x_1}.
In fact, I'd simply like to modify the above code so that \vec{} is wrapped around the first character of the string, regardless of how many characters there may be.
My Attempt
I was ready to tackle this with brute force (e.g. transcribing each character of a string into a list and then reassembling); however, the real programmer on the project suggested I look into "Regular Expressions". After exploring that endless rabbit hole, I found the command regex_subst; however, I can't find any Maxima documentation for it, and am struggling to reproduce the examples in the related documentation here.
Once I can work out the appropriate regex to use, I intend to implement this in the above code using an if statement, such as:
if slength(x) >1
then {regex command}
else {regular treatment}
If anyone knows of helpful resources on any of these fronts, I'd greatly appreciate any pointers at all.
Looks like you got the regex approach working, that's great. My advice about handling subscripted expressions in TeX, however, is to avoid working with names which contain underscores in Maxima, and instead work with Maxima expressions with indices, e.g. foo[k] instead of foo_k. While writing foo_k is a minor convenience in Maxima, you'll run into problems pretty quickly, and in order to straighten it out you might end up piling one complication on another.
E.g. Maxima doesn't know there's any relation between foo, foo_1, and foo_k -- those have no more in common than foo, abc, and xyz. What if there are 2 indices? foo_j_k will become something like foo_{j_k} by the preceding approach -- what if you want foo_{j, k} instead? (Incidentally the two are foo[j[k]] and foo[j, k] when represented by subscripts.) Another problematic expression is something like foo_bar_baz. Does that mean foo_bar[baz], foo[bar_baz] or foo_bar_baz?
The code for tex(x_y) yielding x_y in TeX is pretty old, so it's unlikely to go away, but over the years I've increasingly come to feel that it should be avoided. However, the last time it came up and I proposed disabling it, there were enough people who supported it that we ended up keeping it.
Something that might be helpful, there is a function texput which allows you to specify how a symbol should appear in TeX output. For example:
(%i1) texput (v, "\\vec{v}");
(%o1) "\vec{v}"
(%i2) tex ([v, v[1], v[k], v[j[k]], v[j, k]]);
$$\left[ \vec{v} , \vec{v}_{1} , \vec{v}_{k} , \vec{v}_{j_{k}} ,
\vec{v}_{j,k} \right] $$
(%o2) false
texput can modify various aspects of TeX output; you can take a look at the documentation (see ? texput).
While I didn't expect to work this out on my own, after several hours I made some progress, so I figured I'd share it here in case anyone else can benefit from the time I put in.
To load the regex package in wxMaxima, at least in the MacOS version, simply type load("sregex");. I didn't have this loaded, and was trying to work through our custom platform, which cost me several hours.
Take note that many of the arguments in the linked documentation by Dorai Sitaram occur in reverse, or in a different order, than they do in their corresponding Maxima versions.
Not all the "pregexp" functions exist in Maxima.
In addition, escaping special characters varied in important ways between wxMaxima, the inline Maxima compiler (running within the Ace editor) and the actual rendered version on our platform; in particular, the inline compiler often returned false for expressions that compiled properly in wxMaxima and on the platform. Because I didn't have sregex loaded in wxMaxima from the beginning, I lost a lot of time to this.
Finally, the regex expression that achieved the desired substitution, in my case, was:
regex_subst("\vec{\\1}", "([[:alpha:]])", "v_1");
which returns vec{v}_1 in wxMaxima (N.B. none of my attempts to get wxMaxima to return \vec{v}_1 were successful; escaping the backslash just does not seem to work; fortunately, the usual escaped version \\vec{\\1} does return the desired form).
I have yet to adjust the code for the rest of the function, but I doubt that will be of use to anyone else, and wanted to be sure to post an update here, before anyone else took time to assist me.
Always interested in better methods / practices or any other pointers / feedback.
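For anyone following along, the pieces above might combine into something like the following (an untested sketch; the argument order and escaping follow the regex_subst call that worked for me, and the "^" anchor is my assumption for matching only the leading character):
makeKatexVectorString(s) :=
    if slength(s) > 1
    then regex_subst("\\vec{\\1}", "^([[:alpha:]])", s) /* wrap only the first character */
    else concat("\\vec{", s, "}");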

How to store parsed function expressions for plugging-in many times?

As the topic indicates, my program needs to read several function expressions and plug in different variables many times. Parsing the whole expression again every time I need to plug in a new value is definitely way too ugly, so I need a way to store the parsed expression.
The expression may look like 2x + sin(tan(5x)) + x^2. Oh, and the very important point -- I'm using C++.
Currently I have three ideas on it, but none of them is very elegant:
Storing the S-expression as a tree and evaluating it by recursion. It may be the old-school way to handle this, but it's ugly, and I would have to handle different numbers of parameters (like + vs. sin).
Composing anonymous functions with boost::lambda. It may work nicely, but personally I don't like boost.
Writing a small python/lisp script, using its native lambda expressions, and calling it via IPC... Well, this is crazy.
So, any ideas?
UPDATE:
I did not try to implement support for parentheses or functions with only one parameter, like sin().
I tried the second way first, but I did not use boost::lambda; instead I used a gcc feature for creating (fake) anonymous functions that I found here. The resulting code is 340 lines, and does not work correctly because of scoping and a subtle issue with the stack.
Using lambda could not make it better, and I don't know whether it could handle scoping correctly. So sorry for not testing boost::lambda.
Storing the parsed string as S-expressions would definitely work, but the implementation would be even longer -- maybe ~500 lines? My project is not one of those gigantic projects with tens of thousands of lines of code, so devoting so much energy to maintaining that kind of twisted code, which would not be used very often, does not seem like a nice idea.
So finally I tried the third method -- it's awesome! The Python script is only 50 lines, pretty neat and easy to read. But, on the other hand, it would also make Python a prerequisite of my program. It's not that bad on *nix machines, but on Windows... I guess it would be very painful for non-programmers to install Python. The same goes for Lisp.
However, my final solution is opening bc as a subprocess. Maybe it's a bad choice for most situations, however, it fits me well.
On the other hand, for projects that work only under *nix or that already have Python as a prerequisite, personally I recommend the third way if the expression is simple enough to be parsed with a hand-written parser. If it's very complicated, like Hurkyl said, you could consider creating a mini-language.
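For the record, the tree representation itself (idea 1) is quite small; it's the hand-written parser that eats the lines. A bare-bones sketch of the evaluation side, with the node set cut down to a minimum and all names made up:
#include <cmath>
#include <iostream>
#include <memory>

// Base node: every expression can be evaluated at a given x.
struct Expr {
    virtual ~Expr() {}
    virtual double eval(double x) const = 0;
};
struct Const : Expr {
    double v;
    explicit Const(double v) : v(v) {}
    double eval(double) const { return v; }
};
struct Var : Expr { // the variable x
    double eval(double x) const { return x; }
};
struct Add : Expr { // binary node: two children
    std::unique_ptr<Expr> l, r;
    Add(Expr* l, Expr* r) : l(l), r(r) {}
    double eval(double x) const { return l->eval(x) + r->eval(x); }
};
struct Sin : Expr { // unary node: one child, so arities differ per node type
    std::unique_ptr<Expr> arg;
    explicit Sin(Expr* a) : arg(a) {}
    double eval(double x) const { return std::sin(arg->eval(x)); }
};

int main()
{
    // x + sin(x): built once, evaluated many times.
    Add tree(new Var, new Sin(new Var));
    for (double x = 0.0; x <= 1.0; x += 0.25)
        std::cout << tree.eval(x) << '\n';
}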
Why not use a scripting language designed for exactly this kind of purpose? There are several such languages floating around, but my experience is with lua.
I use lua to do this kind of thing "all the time". The code to embed and parse an expression like that is very small. It would look something like this (untested):
#include <iostream>
#include <string>

extern "C" {
#include <lua.h>
#include <lualib.h>
#include <lauxlib.h>
}

int main()
{
    std::string my_expression = "2*x + math.sin( math.tan( x ) ) + x * x";

    // Initialise lua and load the standard libraries (math included).
    lua_State * L = luaL_newstate();
    luaL_openlibs(L);

    // Create your function and load it into lua
    std::string fn = "function myfunction(x) return " + my_expression + " end";
    luaL_dostring(L, fn.c_str());

    // Use your function
    for (int i = 0; i < 10; ++i)
    {
        // add the function to the stack
        lua_getglobal(L, "myfunction");
        // add the argument to the stack
        lua_pushnumber(L, i);
        // Make the call, using one argument and expecting one result.
        // stack looks like this : FN ARG
        lua_pcall(L, 1, 1, 0);
        // stack looks like this now : RESULT
        // so get the result and print it
        double result = lua_tonumber(L, -1);
        std::cout << i << " : " << result << std::endl;
        // The result is still on the stack, so clean it up.
        lua_pop(L, 1);
    }
    lua_close(L);
}

How do I handle combinations of behaviours?

I am considering the problem of validating real numbers of various formats, because this is very similar to a problem I am facing in design.
Real numbers may come in different combinations of formats, for example:
1. with/without sign at the front
2. with/without a decimal point (if no decimal point, then perhaps number of decimals can be agreed beforehand)
3. base 10 or base 16
We need to allow for each combination, so there are 2x2x2=8 combinations. You can see that the complexity increases exponentially with each new condition imposed.
In OO design, you would normally allocate a class for each number format (e.g. in this case, we have 8 classes), and each class would have a separate validation function. However, with each new condition, you have to double the number of classes required and it soon becomes a nightmare.
In procedural programming, you use 3 flags (i.e. has_sign, has_decimal_point and number_base) to identify the property of the real number you are validating. You have a single function for validation. In there, you would use the flags to control its behaviour.
// This is part of the validation function
if (has_sign)
    check_sign();
for (int i = 0; i < len; i++)
{
    if (has_decimal_point)
    {
        // Check if number[i] is '.' and handle it if so; if not, continue.
    }
    if (number_base == BASE10)
    {
        // number[i] must be between 0-9
    }
    else if (number_base == BASE16)
    {
        // number[i] must be between 0-9, A-F
    }
}
Again, the complexity soon gets out of hand as the function becomes cluttered with if statements and flags.
I am sure that you have come across design problems of this nature before - a number of independent differences which result in difference in behaviour. I would be very interested to hear how have you been able to implement a solution without making the code completely unmaintainable.
Would something like the bridge pattern have helped?
In OO design, you would normally allocate a class for each number format (e.g. in this case, we have 8 classes), and each class would have a separate validation function.
No no no no no. At most, you'd have a type for representing Numeric Input (in case String doesn't make it); another one for Real Number (in most languages you'd pick a built-in type, but anyway); and a Parser class, which has the knowledge to take a Numeric Input and transform it into a Real Number.
To be more general, one difference of behaviour in and by itself doesn't automatically map to one class. It can just be a property inside a class. Most importantly, behaviours should be treated orthogonally.
If (imagining that you write your own parser) you may have a sign or not, a decimal point or not, and hex or not, you have three independent sources of complexity and it would be ok to find three pieces of code, somewhere, that treat one of these issues each; but it would not be ok to find, anywhere, 2^3 = 8 different pieces of code that treat the different combinations in an explicit way.
Imagine that you add a new choice: suddenly, you remember that numbers might have an "e" (such as 2.34e10) and you want to be able to support that. With the orthogonal strategy, you'll have one more independent source of complexity, the fourth one. With your strategy, the 8 cases would suddenly become 16! Clearly a no-no.
I don't know why you think that the OO solution would involve a class for each number pattern. My OO solution would be to use a regular expression class. And if I was being procedural, I would probably use the standard library strtod() function.
You're asking for a parser, use one:
http://www.pcre.org/
http://www.complang.org/ragel/
sscanf
boost::lexical_cast
and plenty of other alternatives...
Also: http://en.wikipedia.org/wiki/Parser_generator
Now, how do I handle complexity for this kind of problem? Well, if I can, I reformulate.
In your case, using a parser generator (or regular expression) is using a DSL (Domain Specific Language), that is a language more suited to the problem you're dealing with.
Design pattern and OOP are useful, but definitely not the best solution to each and every problem.
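To make that concrete, here is a small C++11 sketch of the regular-expression route; the patterns are illustrative, not a complete numeric grammar:
#include <iostream>
#include <regex>
#include <string>

// Sign, decimal point and base are each one orthogonal piece of the
// pattern, not 2x2x2 separate validators.
bool isValidNumber(const std::string& s, bool base16)
{
    static const std::regex dec("[+-]?[0-9]+(\\.[0-9]+)?");
    static const std::regex hex("[+-]?[0-9A-Fa-f]+(\\.[0-9A-Fa-f]+)?");
    return std::regex_match(s, base16 ? hex : dec);
}

int main()
{
    std::cout << isValidNumber("-12.5", false) << '\n'; // 1
    std::cout << isValidNumber("1A.F", true)   << '\n'; // 1
    std::cout << isValidNumber("1A", false)    << '\n'; // 0
}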
Sorry, but since I use VB, what I do is a base function, then I combine an evaluator function. So I'll fake-code it the way I have done it:
function getrealnumber(number as int) { return getrealnumber(number.tostring) }
function getrealnumber(number as float) { return getrealnumber(number.tostring) }
function getrealnumber(number as double) { return getrealnumber(number.tostring) }
function getrealnumber(number as string) {
    if ishex() { return evaluation() }
    if issigned() { return evaluation() }
    if isdecimal() { return evaluation() }
}
And so forth; it's up to you to figure out how to do binary and octal.
You don't kill a fly with a hammer.
I really feel that using an object-oriented solution for your problem is extreme overkill. Just because you can design an object-oriented solution doesn't mean you have to force one onto every problem you have.
From my experience, almost every time there is difficulty in finding an OOD solution to a problem, it probably means that OOD is not appropriate. OOD is just a tool, not a god in itself. It should be used to solve large-scale problems, not problems like the one you presented.
So, to give you an actual answer (as someone mentioned above): use regular expressions; every solution beyond that is just overkill.
If you insist on an OOD solution... Well, since all the formats you presented are orthogonal to each other, I don't see any need to create a class for every possible combination. I would create a class for each format and pass my input through each; in that case the complexity grows linearly.

What new capabilities do user-defined literals add to C++?

C++11 introduces user-defined literals which will allow the introduction of new literal syntax based on existing literals (int, hex, string, float) so that any type will be able to have a literal presentation.
Examples:
// imaginary numbers
std::complex<long double> operator "" _i(long double d) // cooked form
{
    return std::complex<long double>(0, d);
}
auto val = 3.14_i; // val = complex<long double>(0, 3.14)

// binary values
int operator "" _B(const char*); // raw form
int answer = 101010_B; // answer = 42

// std::string
std::string operator "" _s(const char* str, size_t /*length*/)
{
    return std::string(str);
}
auto hi = "hello"_s + " world"; // + works, "hello"_s is a string not a pointer

// units
assert(1_kg == 2.2_lb); // give or take 0.00462262 pounds
At first glance this looks very cool but I'm wondering how applicable it really is, when I tried to think of having the suffixes _AD and _BC create dates I found that it's problematic due to operator order. 1974/01/06_AD would first evaluate 1974/01 (as plain ints) and only later the 06_AD (to say nothing of August and September having to be written without the 0 for octal reasons). This can be worked around by having the syntax be 1974-1/6_AD so that the operator evaluation order works but it's clunky.
So what my question boils down to is this, do you feel this feature will justify itself? What other literals would you like to define that will make your C++ code more readable?
Updated syntax to fit the final draft on June 2011
At first sight, it seems to be simple syntactic sugar.
But when looking deeper, we see it's more than syntactic sugar, as it extends the C++ user's options to create user-defined types that behave exactly like distinct built-in types. In this, this little "bonus" is a very interesting C++11 addition to C++.
Do we really need it in C++?
I see few uses in the code I wrote in the past years, but just because I didn't use it in C++ doesn't mean it's not interesting for another C++ developer.
In C++ (and in C, I guess) we had compiler-defined literal suffixes to type integer numbers as short or long integers, real numbers as float or double (or even long double), and character strings as normal or wide chars.
In C++, we had the possibility to create our own types (i.e. classes), with potentially no overhead (inlining, etc.). We had the possibility to add operators to those types, to have them behave like similar built-in types, which enables C++ developers to use matrices and complex numbers as naturally as they would have if these had been added to the language itself. We can even add cast operators (which is usually a bad idea, but sometimes it's just the right solution).
We still missed one thing to have user-types behave as built-in types: user-defined literals.
So, I guess it's a natural evolution for the language, but to be as complete as possible: "If you want to create a type, and you want it to behave as much as possible like a built-in type, here are the tools..."
I'd guess it's very similar to .NET's decision to make every primitive a struct, including booleans, integers, etc., and have all structs derive from Object. This decision alone puts .NET far beyond Java's reach when working with primitives, no matter how much boxing/unboxing hacks Java will add to its specification.
Do YOU really need it in C++?
This question is for YOU to answer. Not Bjarne Stroustrup. Not Herb Sutter. Not whatever member of C++ standard committee. This is why you have the choice in C++, and they won't restrict a useful notation to built-in types alone.
If you need it, then it is a welcome addition. If you don't, well... Don't use it. It will cost you nothing.
Welcome to C++, the language where features are optional.
Bloated??? Show me your complexes!!!
There is a difference between bloated and complex (pun intended).
As shown by Niels at What new capabilities do user-defined literals add to C++?, being able to write a complex number is one of the two features added "recently" to C and C++:
// C89:
MyComplex z1 = { 1, 2 } ;
// C99: You'll note I is a macro, which can lead
// to very interesting situations...
double complex z1 = 1 + 2*I;
// C++:
std::complex<double> z1(1, 2) ;
// C++11: You'll note that "i" won't ever bother
// you elsewhere
std::complex<double> z1 = 1 + 2_i ;
Now, both C99 "double complex" type and C++ "std::complex" type are able to be multiplied, added, subtracted, etc., using operator overloading.
But in C99, they just added another type as a built-in type, and built-in operator overloading support. And they added another built-in literal feature.
In C++, they just used existing features of the language, saw that the literal feature was a natural evolution of the language, and thus added it.
In C, if you need the same notation enhancement for another type, you're out of luck until your lobbying to add your quantum wave functions (or 3D points, or whatever basic type you're using in your field of work) to the C standard as a built-in type succeeds.
In C++11, you can just do it yourself:
Point p = 25_x + 13_y + 3_z ; // 3D point
Is it bloated? No, the need is there, as shown by how both C and C++ complexes need a way to represent their literal complex values.
Is it wrongly designed? No, it's designed as every other C++ feature, with extensibility in mind.
Is it for notation purposes only? No, as it can even add type safety to your code.
For example, let's imagine a CSS oriented code:
css::Font::Size p0 = 12_pt ; // Ok
css::Font::Size p1 = 50_percent ; // Ok
css::Font::Size p2 = 15_px ; // Ok
css::Font::Size p3 = 10_em ; // Ok
css::Font::Size p4 = 15 ; // ERROR : Won't compile !
It is then very easy to enforce strong typing on the assignment of values.
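For illustration, here is a sketch of how such literals might be declared. The Size type and its members are hypothetical, since the example above doesn't show the declarations; note the literal operators take long double, so the floating forms 12.0_pt etc. are used (integer forms would need unsigned long long overloads):
#include <cassert>

namespace css { namespace Font {
// A strongly typed size: the literal operators below are the only
// intended way to produce one, so a bare 15 cannot convert to it.
class Size {
public:
    enum Unit { PT, PERCENT, PX, EM };
    Size(double value, Unit unit) : value_(value), unit_(unit) {}
    double value() const { return value_; }
    Unit unit() const { return unit_; }
private:
    double value_;
    Unit unit_;
};
} }

css::Font::Size operator "" _pt(long double v)
{ return css::Font::Size(static_cast<double>(v), css::Font::Size::PT); }
css::Font::Size operator "" _px(long double v)
{ return css::Font::Size(static_cast<double>(v), css::Font::Size::PX); }

int main()
{
    css::Font::Size p0 = 12.0_pt; // Ok
    css::Font::Size p2 = 15.0_px; // Ok
    // css::Font::Size p4 = 15;   // ERROR: won't compile, no conversion from int
    assert(p0.unit() == css::Font::Size::PT);
    assert(p2.value() == 15.0);
}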
Is it dangerous?
Good question. Can these functions be namespaced? If yes, then Jackpot!
Anyway, like everything, you can kill yourself if a tool is used improperly. C is powerful, and you can shoot your head off if you misuse the C gun. C++ has the C gun, but also the scalpel, the taser, and whatever other tool you'll find in the toolkit. You can misuse the scalpel and bleed yourself to death. Or you can build very elegant and robust code.
So, like every C++ feature, do you really need it? It is the question you must answer before using it in C++. If you don't, it will cost you nothing. But if you do really need it, at least, the language won't let you down.
The date example?
Your error, it seems to me, is that you are mixing operators:
1974/01/06AD
    ^  ^  ^
This can't be avoided: / is an operator, so the compiler must interpret it. And, AFAIK, that is a good thing.
To find a solution for your problem, I would write the literal in some other way. For example:
"1974-01-06"_AD ; // ISO-like notation
"06/01/1974"_AD ; // french-date-like notation
"jan 06 1974"_AD ; // US-date-like notation
19740106_AD ; // integer-date-like notation
Personally, I would choose the integer and the ISO dates, but it depends on YOUR needs. Which is the whole point of letting the user define its own literal names.
Here's a case where there is an advantage to using user-defined literals instead of a constructor call:
#include <bitset>
#include <iostream>

template<char... Bits>
struct checkbits
{
    static const bool valid = false;
};

template<char High, char... Bits>
struct checkbits<High, Bits...>
{
    static const bool valid = (High == '0' || High == '1')
                              && checkbits<Bits...>::valid;
};

template<char High>
struct checkbits<High>
{
    static const bool valid = (High == '0' || High == '1');
};

template<char... Bits>
inline constexpr std::bitset<sizeof...(Bits)>
operator"" _bits() noexcept
{
    static_assert(checkbits<Bits...>::valid, "invalid digit in binary string");
    return std::bitset<sizeof...(Bits)>((char []){Bits..., '\0'});
}

int main()
{
    auto bits = 0101010101010101010101010101010101010101010101010101010101010101_bits;
    std::cout << bits << std::endl;
    std::cout << "size = " << bits.size() << std::endl;
    std::cout << "count = " << bits.count() << std::endl;
    std::cout << "value = " << bits.to_ullong() << std::endl;
    // This triggers the static_assert at compile time.
    auto badbits = 2101010101010101010101010101010101010101010101010101010101010101_bits;
    // This throws at run time.
    std::bitset<64> badbits2("2101010101010101010101010101010101010101010101010101010101010101");
}
The advantage is that a run-time exception is converted to a compile-time error.
You couldn't add the static assert to the bitset ctor taking a string (at least not without string template arguments).
It's very nice for mathematical code. Off the top of my head, I can see uses for the following operators:
deg for degrees. That makes writing absolute angles much more intuitive.
double operator ""_deg(long double d)
{
    // returns radians
    return d * M_PI / 180;
}
It can also be used for various fixed point representations (which are still in use in the field of DSP and graphics).
int operator ""_fix(long double d)
{
    // returns d as a 1.15.16 fixed point number
    return (int)(d * 65536.0f);
}
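A quick usage sketch for the _deg literal above; self-contained, with M_PI replaced by an explicit constant since the standard doesn't guarantee it:
#include <cmath>
#include <iostream>

const double kPi = 3.141592653589793; // avoids relying on the non-standard M_PI

double operator ""_deg(long double d)
{
    // returns radians
    return d * kPi / 180.0;
}

int main()
{
    std::cout << std::sin(180.0_deg) << '\n'; // ~1.2e-16, i.e. zero
}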
These look like nice examples of how to use it. They help make constants in code more readable. It's another tool for making code unreadable as well, but we already have so many tools for abuse that one more does not hurt much.
UDLs are namespaced (and can be imported by using declarations/directives, but you cannot explicitly namespace a literal like 3.14std::i), which means there (hopefully) won't be a ton of clashes.
The fact that they can actually be templated (and constexpr'd) means that you can do some pretty powerful stuff with UDLs. Bigint authors will be really happy, as they can finally have arbitrarily large constants, calculated at compile time (via constexpr or templates).
I'm just sad that we won't see a couple useful literals in the standard (from the looks of it), like s for std::string and i for the imaginary unit.
The amount of coding time that will be saved by UDLs is actually not that high, but the readability will be vastly increased and more and more calculations can be shifted to compile-time for faster execution.
Bjarne Stroustrup talks about UDL's in this C++11 talk, in the first section on type-rich interfaces, around 20 minute mark.
His basic argument for UDLs takes the form of a syllogism:
"Trivial" types, i.e., built-in primitive types, can only catch trivial type errors. Interfaces with richer types allow the type system to catch more kinds of errors.
The kinds of type errors that richly typed code can catch have impact on real code. (He gives the example of the Mars Climate Orbiter, which infamously failed due to a dimensions error in an important constant).
In real code, units are rarely used. People don't use them, because incurring runtime compute or memory overhead to create rich types is too costly, and using pre-existing C++ templated unit code is so notationally ugly that no one uses it. (Empirically, no one uses it, even though the libraries have been around for a decade).
Therefore, in order to get engineers to use units in real code, we needed a device that (1) incurs no runtime overhead and (2) is notationally acceptable.
Let me add a little bit of context. For our work, user-defined literals are much needed. We work on MDE (Model-Driven Engineering). We want to define models and metamodels in C++. We actually implemented a mapping from Ecore to C++ (EMF4CPP).
The problem comes when defining model elements as classes in C++. We are taking the approach of transforming the metamodel (Ecore) to templates with arguments. The arguments of the template are the structural characteristics of types and classes. For example, a class with two int attributes would be something like:
typedef ::ecore::Class< Attribute<int>, Attribute<int> > MyClass;
However, it turns out that every element in a model or metamodel usually has a name. We would like to write:
typedef ::ecore::Class< "MyClass", Attribute< "x", int>, Attribute<"y", int> > MyClass;
But neither C++ nor C++0x allows this, as strings are prohibited as arguments to templates. You can write the name char by char, but this is admittedly a mess. With proper user-defined literals, we could write something similar. Say we use "_n" to identify model element names (I'm not using the exact syntax, just sketching the idea):
typedef ::ecore::Class< MyClass_n, Attribute< x_n, int>, Attribute<y_n, int> > MyClass;
Finally, having those definitions as templates helps us a lot to design algorithms for traversing the model elements, model transformations, etc. that are really efficient, because type information, identification, transformations, etc. are determined by the compiler at compile time.
Supporting compile-time dimension checking is the only justification required.
auto force = 2_N;
auto dx = 2_m;
auto energy = force * dx;
assert(energy == 4_J);
See for example PhysUnits-CT-Cpp11, a small C++11, C++14 header-only library for compile-time dimensional analysis and unit/quantity manipulation and conversion. Simpler than Boost.Units, does support unit symbol literals such as m, g, s, metric prefixes such as m, k, M, only depends on standard C++ library, SI-only, integral powers of dimensions.
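To illustrate how such compile-time checking can work, here is a toy sketch (not the PhysUnits-CT-Cpp11 API): the dimension exponents live in the type, so a unit mismatch is a compile error rather than a wrong answer:
#include <iostream>

// Quantity<m, kg, s>: exponents of metres, kilograms and seconds.
template<int M, int KG, int S>
struct Quantity {
    double value;
    explicit Quantity(double v) : value(v) {}
};

// Multiplying two quantities adds their dimension exponents.
template<int M1, int KG1, int S1, int M2, int KG2, int S2>
Quantity<M1 + M2, KG1 + KG2, S1 + S2>
operator*(Quantity<M1, KG1, S1> a, Quantity<M2, KG2, S2> b)
{
    return Quantity<M1 + M2, KG1 + KG2, S1 + S2>(a.value * b.value);
}

typedef Quantity<1, 0, 0>  Length;   // m
typedef Quantity<1, 1, -2> Force;    // kg*m/s^2
typedef Quantity<2, 1, -2> Energy;   // kg*m^2/s^2

Force  operator "" _N(long double v) { return Force(static_cast<double>(v)); }
Length operator "" _m(long double v) { return Length(static_cast<double>(v)); }

int main()
{
    Energy energy = 2.0_N * 2.0_m; // ok: force times length has energy's dimensions
    // Force oops = 2.0_N * 2.0_m; // would not compile: dimensions don't match
    std::cout << energy.value << '\n'; // 4
}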
Hmm... I have not thought about this feature yet. Your sample was well thought out and is certainly interesting. C++ is very powerful as it is now, but unfortunately the syntax used in pieces of code you read is at times overly complex. Readability is, if not everything, then at least a great deal, and such a feature would be geared toward more readability. If I take your last example
assert(1_kg == 2.2_lb); // give or take 0.00462262 pounds
... I wonder how you'd express that today. You'd have a KG and a LB class and you'd compare implicit objects:
assert(KG(1.0f) == LB(2.2f));
And that would do as well. With types that have longer names or types that you have no hopes of having such a nice constructor for sans writing an adapter, it might be a nice addition for on-the-fly implicit object creation and initialization. On the other hand, you can already create and initialize objects using methods, too.
But I agree with Nils on mathematics. C and C++ trigonometry functions for example require input in radians. I think in degrees though, so a very short implicit conversion like Nils posted is very nice.
Ultimately, it's going to be syntactic sugar, but it will have a slight effect on readability. And it will probably be easier to write some expressions too (sin(180.0_deg) is easier to write than sin(deg(180.0))). And then there will be people who abuse the concept. But then, language-abusive people should use very restrictive languages rather than something as expressive as C++.
Ah, my post says basically nothing except: it's going to be okay, the impact won't be too big. Let's not worry. :-)
I have never needed or wanted this feature (but this could be the Blub effect). My knee jerk reaction is that it's lame, and likely to appeal to the same people who think that it's cool to overload operator+ for any operation which could remotely be construed as adding.
C++ is usually very strict about the syntax used - barring the preprocessor there is not much you can use to define a custom syntax/grammar. E.g. we can overload existing operators, but we cannot define new ones - IMO this is very much in tune with the spirit of C++.
I don't mind some ways for more customized source code - but the point chosen seems very isolated to me, which confuses me most.
Even intended use may make it much harder to read source code: a single letter may have far-reaching side effects that in no way can be identified from the context. With symmetry to u, l and f, most developers will choose single letters.
This may also turn scoping into a problem: using single letters in the global namespace will probably be considered bad practice, and the tools that are supposed to make mixing libraries easier (namespaces and descriptive identifiers) will probably be defeated.
I see some merit in combination with "auto", also in combination with a unit library like boost units, but not enough to merit this addition.
I wonder, however, what clever ideas we come up with.
I used user-defined literals for binary strings like this:
"asd\0\0\0\1"_b
using std::string(str, n) constructor so that \0 wouldn't cut the string in half. (The project does a lot of work with various file formats.)
This was helpful also when I ditched std::string in favor of a wrapper for std::vector.
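For reference, a sketch of that literal; _b is the name from my example, using the raw-with-length form so the embedded NULs survive:
#include <cassert>
#include <cstddef>
#include <string>

// The explicit length keeps embedded '\0' bytes in the result.
std::string operator "" _b(const char* str, std::size_t n)
{
    return std::string(str, n);
}

int main()
{
    std::string s = "asd\0\0\0\1"_b;
    assert(s.size() == 7);  // all seven bytes kept
    assert(s[3] == '\0');   // the embedded NUL survived
}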
The line noise in those operator declarations is huge, and they're horrible to read. Let me know: did they justify that new syntax addition with any kind of examples? For instance, do they have a couple of programs that already use C++0x?
For me, this part:
auto val = 3.14_i
Does not justify this part:
std::complex<double> operator ""_i(long double d) // cooked form
{
    return std::complex<double>(0, d);
}
Not even if you'd use the i-syntax in 1000 other lines as well. If you write that much, you probably write 10000 lines of something else alongside it too. Especially when you will still probably write this almost everywhere:
std::complex<double> val = 3.14i
The 'auto' keyword may be justified though, only perhaps. But let's take just C++, because it's better than C++0x in this aspect.
std::complex<double> val = std::complex(0, 3.14);
It's like... that simple. Even though all the std:: and angle brackets are rather lame if you use them almost everywhere. I won't start guessing what syntax C++0x has for turning std::complex into complex.
complex = std::complex<double>;
That's perhaps something straightforward, but I don't believe it's that simple in C++0x.
typedef std::complex<double> complex;
complex val = std::complex(0, 3.14);
Perhaps? >:)
Anyway, the point is: writing 3.14i instead of std::complex(0, 3.14); does not save you much time overall, except in a few super-special cases.