C++ should I use templates, I'm about to create a lexer, why should it be limited chars? [closed] - c++

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I'm about to create a lexer for a project. A proof of concept exists, the idea works and whatnot. I was about to start writing it when I realised:
Why chars?
(I'm moving away from C; I'm still fairly suspicious of the standard libraries, and I found it easier to deal in char* for offsets and such than to learn about strings.)
Why not wchar_t or something, ints, or indeed any type (given it has some defined operations)?
So should I use a template? So far it seems like yes, I should, but there are two counter-arguments I can consider:
Firstly, modular compilation: the moment I write "template", it must go in a header file / be available with its implementation to whatever uses it. (It's not a matter of hiding source code; I don't mind having to show the code, as it will be free (as in freedom) software.) This means extra parsing and things like that.
My C background screams not to do this; I seem to want separate .o files. I understand why I can't, by the way; I'm not asking for a way around it.
Separate object files speed up compilation, because the makefile (you tell it, or have it use -MM with the compiler to figure it out for itself) won't run the compilation command for things that haven't changed, and so forth.
Secondly, with templates, I know of no way to specify what a type must do, other than have the user realise it when something fails (you know how Java has an extends keyword). I suspect that C++11 builds on this, as meta-programming is a large chapter in "The C++ Programming Language", 4th edition.
Are these reasons important these days? I learned with the following burned into my mind:
"You are not creating one huge code file that gets compiled, you create little ones that are linked" and templates seem to go against this.
I'm sure G++ parses very quickly, but it also optimises. If it spends a lot of time optimising one file, it'll re-do that optimisation every time it sees it in a translation unit, whereas with separate object files it does a bit (general optimisations) only once, and perhaps a bit more if you use LTO (link-time optimisation).
Or I could create a class that every input to the lexer derives from and use that (generic programming I believe it's called) but my C-roots say "eww virtuals" and urge me towards the char*
I understand this is quite open, I just don't know where to draw the line between using a template, and not using a template.

Templates don't have to be in the header! If you have only a few instantiations, you can explicitly instantiate the class and function templates in suitable translation units. That is, a template would be split into three parts:
A header declaring the templates.
A header including the first and implementing the templates, but otherwise only included by the third set of files.
Source files including the headers in 2. and explicitly instantiating the templates with the corresponding types.
Users of these templates would only include the header and never the implementation header. An example where this can be done is IOStreams: there are basically just two instantiations, one for char and one for wchar_t. Yes, you can instantiate the streams for other types, but I doubt that anybody would do so (I sometimes question whether anybody uses streams with a character type other than char, but probably people do).
That said, the concepts used by templates are, indeed, not explicitly represented in the source, and C++11 doesn't add any facilities to do so either. There were discussions on adding concepts to C++, but so far they are not part of any standard. There is a "concepts lite" proposal which, I think, will be included in C++14.
However, in practice I haven't found that to be much of a problem: it is quite possible to document the concepts and use things like static_assert() to potentially produce nicer error messages. The problem is more that many concepts are actually more restrictive than the underlying algorithms, and that the extra slack is sometimes quite useful.
Here is a brief and somewhat made-up example of how to implement and instantiate the template. The idea is to implement something like std::basic_ostream but merely provide our scaled-down version of a string output operator:
// simple-ostream.hpp
#include "simple-streambuf.hpp"

template <typename CT>
class simple_ostream {
    simple_streambuf<CT>* d_sbuf;
public:
    simple_ostream(simple_streambuf<CT>* sbuf);
    simple_streambuf<CT>* rdbuf() { return this->d_sbuf; } // should be inline
};

template <typename CT>
simple_ostream<CT>& operator<< (simple_ostream<CT>&, CT const*);
Except for the rdbuf() member, the above is merely a class definition with a few member declarations and a function declaration. The rdbuf() function is implemented directly to show that you can mix and match a visible implementation, where performance is necessary, with an external implementation, where decoupling is more important. The class template simple_streambuf used here is meant to be similar to std::basic_streambuf and is, at least, declared in the header "simple-streambuf.hpp".
// simple-ostream.tpp
// the implementation, only included to create explicit instantiations
#include "simple-ostream.hpp"

template <typename CT>
simple_ostream<CT>::simple_ostream(simple_streambuf<CT>* sbuf): d_sbuf(sbuf) {}

template <typename CT>
simple_ostream<CT>& operator<< (simple_ostream<CT>& out, CT const* str) {
    for (; *str; ++str) {
        out.rdbuf()->sputc(*str);
    }
    return out;
}
This implementation header is only included when explicitly instantiating the class and function templates. For example, the instantiations for char would look like this:
// simple-ostream-char.cpp
#include "simple-ostream.tpp"

// instantiate all class members for simple_ostream<char>:
template class simple_ostream<char>;
// instantiate the free-standing operator:
template simple_ostream<char>& operator<< <char>(simple_ostream<char>&, char const*);
Any use of the simple_ostream<CT> would just include simple-ostream.hpp. For example:
// use-simple-ostream.cpp
#include "simple-ostream.hpp"

int main()
{
    simple_streambuf<char> sbuf;
    simple_ostream<char>   out(&sbuf);
    out << "hello, world\n";
}
Of course, to build an executable you will need both use-simple-ostream.o and simple-ostream-char.o, but assuming the template instantiations are part of a library this doesn't really add any complexity. The only real headache is when a user wants to use the class template with an unexpected instantiation, say char16_t, when only char and wchar_t are provided: in this case the user would need to create the explicit instantiations themselves or, if necessary, include the implementation header.
In case you want to try the example out, below is a somewhat simple-minded and sloppy (because it is header-only) implementation of simple_streambuf<CT>:
#ifndef INCLUDED_SIMPLE_STREAMBUF
#define INCLUDED_SIMPLE_STREAMBUF

#include <iostream>

template <typename CT> struct stream;

template <>
struct stream<char> {
    static std::ostream& get() { return std::cout; }
};

template <>
struct stream<wchar_t> {
    static std::wostream& get() { return std::wcout; }
};

template <typename CT>
struct simple_streambuf
{
    void sputc(CT c) {
        stream<CT>::get().rdbuf()->sputc(c);
    }
};

#endif

Yes, it should be limited to chars. Why? Because you're asking...
I have little experience with templates, but when I used templates the necessity arose naturally, I didn't need to try to use templates.
My 2 cents, FWIW.

1: Firstly, modular compilation, the moment I write "template" it must go in a header file…
That's not a real argument. You have the ability to use C, C++, structs, classes, templates, classes with virtual functions, and all the other benefits of a multi-paradigm language. You're not coerced into taking an all-or-nothing approach with your designs, and you can mix and match these functionalities based on your design's needs. So you can use templates where they are an appropriate tool, and other constructs where templates are not ideal. It's hard to know when that will be until after you have had experience using them all. Template/header-only libraries are popular, but one of the reasons the approach is used is that they simplify linking and the build process, and can reduce dependencies if designed well. If they are designed poorly, then yes, they can result in an explosion in compile times. That's not the language's fault -- it's the implementor's design.
Heck, you could even put your implementations behind C opaque types and use templates for everything, keeping the core template code visible to exactly one translation unit.
2: Secondly, with templates, I know of no way to specify what a type must do…
That is generally regarded as a feature. Instantiation can result in further instantiations, which can in turn select different implementations and specializations -- this is the domain of template metaprogramming. Often, all you really need to do is instantiate the implementation, which results in evaluation of the type and parameters. This -- simulating "concepts" and verifying interfaces -- can increase your build times, however. Furthermore, it may not be the best design, because deferring instantiation is in many cases preferable.
If you just need to brute-force instantiate all your variants, one approach would be to create a separate translation which does just that -- you don't even need to link it to your library; add it to a unit test or some separate target. That way, you could validate instantiation and functionalities are correct without significant impact to your clients including/linking to the library.
Are these reasons important these days?
No. Build times are of course very important, but I think you just need to learn the right tool to use, and when and why some implementations must be abstracted (or put behind compilation firewalls) when/if you need fast builds and scalability for large projects. So yes, they are important, but a good design can strike a good balance between versatility and build times. Also remember that template metaprogramming is capable of moving a significant amount of program validation from runtime to compile time. So a hit on compile times does not have to be bad, because it can save you from a lot of runtime validations/issues.
I'm sure G++ parses very quickly, but it also optimises, if it spends a lot of time optimising one file, it'll re-do that optimisation every time it sees that in a translation unit…
Right; That redundancy can kill fast build times.
whereas with separate object files, it does a bit (general optimisations) only once, and perhaps a bit more if you use LTO (link-time optimisation) … Separate object files speed up compilation, because the makefile (you tell it, or have it use -MM with the compiler to figure it out for itself) won't run the compilation command for things that haven't changed, and so forth.
Not necessarily so. First, many object files produce a lot of demand on the linker. Second, it multiplies the work because you have more translations, so reducing object files is a good thing. This really depends on the structure of your libraries and dependencies. Some teams go in the opposite direction (I do quite regularly) and use an approach which produces few object files. This can make your builds many times faster with complex projects because you eliminate redundant work for the compiler and linker. For best results, you need a good understanding of the process and your dependencies. In large projects, translation/object reductions can result in builds which are many times faster. This is often referred to as a "Unity Build". Large Scale C++ Design by John Lakos is a great read on dependencies and C++ project structures, although it's rather dated at this point, so you should not take every bit of advice at face value.
So the short answer is: Use the best tool for the problem at hand -- a good designer will use many available tools. You're far from exhausting the capabilities of the tools and build systems. A good understanding of these subjects will take years.

Related

Compiling template header in C++

I have a main.cpp and a node.h (template) file. It seems to work when I compile only main.cpp, which includes node.h. I am wondering: is it okay not to compile node.h?
C++ compilers generally require that the definitions of all templates be visible in every translation unit in which they are used (the one real exception is if you only allow particular specializations to be used and those specializations are instantiated somewhere, in that particular instance you can get away with hiding the implementation).
Whether you split template declarations from their definitions as you describe is really just a matter of style. Personally I don't care for that as it makes it that much harder to find the actual code for any given template.
However if the code you are dealing with is large enough (as many boost libraries are, for example), then it may well make sense to implement the public template in terms of many private parts and it can well make sense to have those parts be split into their own headers. But again, so long as all the needed code is available in every translation unit it is simply style and one choice is really not any "better" than another so long as it is consistent.

How does the compiler define the classes in type_traits?

In C++11 and later, the <type_traits> header contains many classes for type checking, such as std::is_empty, std::is_polymorphic, std::is_trivially_constructible and many others.
While we use these classes just like normal classes, I cannot figure out any way one could possibly write the definitions of these classes. No amount of SFINAE (even with C++14/17 rules) or any other method seems able to tell whether a class is polymorphic, empty, or satisfies other properties. A class that is empty still occupies a positive amount of space, as the class must have a unique address.
How then, might compilers define such classes in C++? Or perhaps it is necessary for the compiler to be intrinsically aware of these class names and parse them specially?
Back in the olden days, when people were first fooling around with type traits, they wrote some really nasty template code in attempts to write portable code to detect certain properties. My take on this was that you had to put a drip-pan under your computer to catch the molten metal as the compiler overheated trying to compile this stuff. Steve Adamczyk, of Edison Design Group (provider of industrial-strength compiler frontends), had a more constructive take on the problem: instead of writing all this template code that takes enormous amounts of compiler time and often breaks them, ask me to provide a helper function.
When type traits were first formally introduced (in TR1, 2006), there were several traits that nobody knew how to implement portably. Since TR1 was supposed to be exclusively library additions, these couldn't count on compiler help, so their specifications allowed them to get an answer that was occasionally wrong, but they could be implemented in portable code.
Nowadays, those allowances have been removed; the library has to get the right answer. The compiler help for doing this isn't special knowledge of particular templates; it's a function call that tells you whether a particular class has a particular property. The compiler can recognize the name of the function, and provide an appropriate answer. This provides a lower-level toolkit that the traits templates can use, individually or in combination, to decide whether the class has the trait in question.

Elegant way to move templated method definitions out of header file

In one of my C++ projects, I make use of many templates. Since I cannot put them in *.cpp files, the whole function definitions currently live in the headers.
But that messes up the header files on the one hand, and leads to long compile times on the other. How can I handle the implementations of templated functions in a clean way?
There isn't really a requirement that templates have to be in the header. They can very well be in a translation unit. The only requirement is that compiler is either able to instantiate them implicitly when they are used or that they get explicitly instantiated.
Whether separating the templatized code into headers and non-headers on that basis is feasible depends pretty much on what is being done. It works rather well, e.g., for the IOStreams library, because in practice it is only instantiated for the character types char and wchar_t. Writing the explicit instantiations is fairly straightforward, and even if there are a couple more character types, e.g., char16_t and char32_t, it stays feasible. On the other hand, separating templates like std::vector<T> in a similar way is rather infeasible.
Using general templates like std::vector<T> in interfaces between subsystems quickly starts to become a major problem: while concrete instantiations or selected instantiations are OK, as the subsystem can be implemented without being a template, using arbitrary instantiations would force an entire system to be all templates. Doing so is infeasible in any real-world application, which is often a couple of million lines of code on the small end.
What this amounts to is to use compilation-firewalls which are fully typed and don't use arbitrary templates between subsystems. To ease the use of the subsystem interfaces there may be thin template wrappers which, e.g., convert one container type into another container or which type-erase the template parameter where feasible. It needs to be recognized, however, that the compilation separation generally comes at a run-time performance cost: calling a virtual function is a lot more expensive than calling an inline function. Thus, the abstractions between subsystems may be very different from those within subsystems. For example, iterators are great internal abstractions. Between subsystems, a specific container, e.g., a std::vector<X> for some type X, tends to be more effective.
Note that the build-time interactions of templates are inherent in their dependency on specific instantiations. That is, even if there is a rather different system for declarations and template definitions, e.g., in the form of a module system rather than using header files, it won't become feasible to make everything a template. Using templates for local flexibility works great but they don't work without fixing the instantiations globally in large projects.
Finally a plug: here is a write-up on how to organize sources implementing templates.
You can just create a new header file library_detail.hpp and include it in your original header.
I've seen some people naming this implementation-header file using .t or .template extensions as well.
One thing that can help is to factor out functionality that doesn't depend on the template arguments into non templated helper functions that can be defined in an implementation file.
As an example, if you were implementing your own vector class you could factor out most of the memory management into a non template class that just works with an untyped array of bytes and have the definitions of its member functions in a .cpp file. The functions that actually need to know the type of the elements are implemented in the templated class in a header and delegate much of the work to the non-templated helpers.

Creating serializeable unique compile-time identifiers for arbitrary UDT's

I would like a generic way to create unique compile-time identifiers for any C++ user defined types.
for example:
    unique_id<my_type>::value == 0    // true
    unique_id<other_type>::value == 1 // true
I've managed to implement something like this using preprocessor meta programming, the problem is, serialization is not consistent. For instance if the class template unique_id is instantiated with other_type first, then any serialization in previous revisions of my program will be invalidated.
I've searched for solutions to this problem, and found several ways to implement this with non-consistent serialization if the unique values are compile-time constants. If RTTI or similar methods, like boost::sp_typeinfo, are used, then the unique values are obviously not compile-time constants and extra overhead is present. An ad-hoc solution to this problem would be instantiating all of the unique_ids in a separate header in the correct order, but this causes additional maintenance and boilerplate code, which is no different from using an enum unique_id{my_type, other_type};.
A good solution to this problem would be using user-defined literals, unfortunately, as far as I know, no compiler supports them at this moment. The syntax would be 'my_type'_id; 'other_type'_id; with udl's.
I'm hoping somebody knows a trick that allows implementing serialize-able unique identifiers in C++ with the current standard (C++03/C++0x), I would be happy if it works with the latest stable MSVC and GNU-G++ compilers, although I expect if there is a solution, it's not portable.
I would like to make clear, that using mpl::set or similar constructs like mpl::vector and filtering, does not solve this problem, because the scope of the meta-set/vector is limited and actually causes more problems than just preprocessor meta programming.
A while back I added a build step to one project of mine, which allowed me to write #script_name(args) in a C++ source file and have it automatically replaced with the output of the associated script, for instance ./script_name.pl args or ./script_name.py args.
You may balk at the idea of polluting the language into nonstandard C++, but all you'd have to do is write #sha1(my_type) to get the unique integer hash of the class name, regardless of build order and without the need for explicit instantiation.
This is just one of many possible nonstandard solutions, and I think a fairly clean one at that. There's currently no great way to impose an arbitrary, consistent ordering on your classes without just specifying it explicitly, so I recommend you simply give in and go the explicit instantiation route; there's nothing really wrong with centralising the information, but as you said it's not all that different from an enumeration, which is what I'd actually use in this situation.
Persistence of data is a very interesting problem.
My first question would be: do you really want serialization ? If you are willing to investigate an alternative, then jump to the next section.
If you're still there, I think you have not given the typeid solution all its due.
// static detection
template <typename T>
size_t unique_id()
{
    static size_t const id = some_hash(typeid(T)); // or boost::sp_typeinfo
    return id;
}

// dynamic detection
template <typename T>
size_t unique_id(T const& t)
{
    return some_hash(typeid(t)); // no memoization possible
}
Note: I am using a local static to avoid the order-of-initialization issue, in case this value is required before main is entered.
It's pretty similar to your unique_id<some_type>::value, and even though it's computed at runtime, it's only computed once, and the result (for the static detection) is then memoized for future calls.
Also note that it's fully generic: no need to explicitly write the function for each type.
It may seem silly, but the issue of serialization is that you have a one-to-one mapping between the type and its representation:
you need to version the representation, so as to be able to decode "older" versions
dealing with forward compatibility is pretty hard
dealing with cyclic reference is pretty hard (some framework handle it)
and then there is the issue of moving information from one version to another --> deserializing older versions becomes messy and frustrating
For persistent saves, I usually recommend using a dedicated BOM. Think of the saved data as a message to your future self. And I usually go the extra mile and propose the awesome Google Protocol Buffers library:
Backward and Forward compatibility baked-in
Several output formats -> human-readable (for debugging) or binary
Several languages can read/write the same messages (C++, Java, Python)
Pretty sure that you will have to implement your own extension to make this happen; I've not seen nor heard of any such construct for compile time. MSVC offers __COUNTER__ for the preprocessor, but I know of no template equivalent.

C++ Template Specialization Compilation

I'm going to outline my problem in detail to explain what I'm trying to achieve, the question is in the last paragraph if you wish to ignore the details of my problem.
I have a problem with a class design in which I wish to pass a value of any type into push() and pop() functions which will convert the value passed into a string representation that will be appended to a string inside the class, effectively creating a stream of data. The reverse will occur for pop(), taking the stream and converting several bytes at the front of the stream back into a specified type.
Making push() and pop() templates tied with stringstream is an obvious solution. However, I wish to use this functionality inside a DLL in which I can change the way the string is stored (encryption or compression, for example) without recompilation of clients. A template of type T would need to be recompiled if the algorithm changes.
My next idea was to just use functions such as pushByte(), pushInt(), popByte(), popInt() etc. This would allow me to change the implementation without recompilation of clients, since they rely only on a static interface. This would be fine. However, it isn't so flexible. If a value was changed from a byte to a short, for example, all instances of pushByte() corresponding to that value would need to be changed to pushShort(), similarly for popByte() to popShort(). Overloading pop() and push() to combat this would cause conflictions in types (causing explicit casting, which would end up causing the same problem anyway).
With the above ideas, I could create a working class. However, I wondered how specialized templates are compiled. If I created push<byte>() and push<short>(), it would be a type specific overload, and the change from byte to short would automatically switch the template used, which would be ideal.
Now, my question is, if I used specialized templates only to simulate this kind of overloading (without a template of type T), would all specializations compile into my DLL allowing me to dispatch a new implementation without client recompilation? Or are specialized templates selected or dropped in the same way as a template of type T at client compilation time?
First of all, you can't just have specialized templates without a base template to specialize. It's just not allowed. You have to start with a template, then you can provide specializations of it.
You can explicitly instantiate a template over an arbitrary set of types, and have all those instantiations compiled into your DLL, but I'm not sure this will really accomplish much for you. Ultimately, templates are basically a compile-time form of polymorphism, and you seem to need (at least a limited form of) run-time polymorphism.
I'd probably just use overloading. The problem that I'd guess you're talking about arises with something on the order of:
    int a;
    byte b;
    a = pop();
    b = pop();
Where you'd basically just be overloading pop on the return type (which, as we all know, isn't allowed). I'd avoid that pretty simply -- instead of returning the value, pass a reference to the value to be modified:
    int a;
    byte b;
    pop(a);
    pop(b);
This not only lets overload resolution work, but at least to me looks cleaner as well (though maybe I've just written too much assembly language, so I'm accustomed to things like "pop ax").
It sounds like you have 2 opposing factors:
You want your clients to be able to push/pop/etc. every numeric type. Templates seem like a natural solution, but this is at odds with a consistent (only needs to be compiled once) implementation.
You don't want your clients to have to recompile when you change implementation aspects. The pimpl idiom seems like a natural solution, but this is at odds with a generic (works with any type) implementation.
From your description, it sounds like you only care about numeric types, not arbitrary T's. You can declare specializations of your template for each of them explicitly in a header file, and define them in a source file, and clients will use the specializations you've defined rather than compiling their own. The specializations are a form of compile time polymorphism. Now you can combine it with runtime polymorphism -- implement the specializations in terms of an implementation class that is type agnostic. Your implementation class could use boost::variant to do this since you know the range of possible T's ahead of time (boost::variant<int, short, long, ...>). If boost isn't an option for you, you can come up with a similar scheme yourself so long as you have a finite number of Ts you care about.