Template Metaprogramming - I still don't get it :( - c++

I have a problem... I don't understand template metaprogramming.
The problem is, that I’ve read a lot about it, but it still doesn’t make much sense to me.
Fact nr.1: Template Metaprogramming is faster
template <int N>
struct Factorial
{
enum { value = N * Factorial<N - 1>::value };
};
template <>
struct Factorial<0>
{
enum { value = 1 };
};
// Factorial<4>::value == 24
// Factorial<0>::value == 1
void foo()
{
int x = Factorial<4>::value; // == 24
int y = Factorial<0>::value; // == 1
}
So this metaprogram is faster ... because of the constant literal.
BUT: Where in the real world do we have constant literals? Most programs I use react on user input.
FACT nr. 2 : Template metaprogramming can accomplish better maintainability.
Yeah, the factorial example may be maintainable, but when it comes to complex functions, I and most other C++ programmers can't read them.
Also, the debugging options are very poor (or at least I don't know how to debug).
When does template metaprogramming make sense?

Just as factorial is not a realistic example of recursion in non-functional languages, neither is it a realistic example of template metaprogramming. It's just the standard example people reach for when they want to show you recursion.
In writing templates for realistic purposes, such as in everyday libraries, often the template has to adapt what it does depending on the type parameters it is instantiated with. This can get quite complex, as the template effectively chooses what code to generate, conditionally. This is what template metaprogramming is; if the template has to loop (via recursion) and choose between alternatives, it is effectively like a small program that executes during compilation to generate the right code.
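As a small illustration of a template "choosing what code to generate", here is a minimal sketch. The trait name param_type is hypothetical (not from any library mentioned here); it picks pass-by-value for cheap scalar types and pass-by-const-reference for everything else, and the decision is made entirely by the compiler.

```cpp
#include <string>
#include <type_traits>

// Hypothetical trait: decide how a function should accept a parameter of type T.
// Scalars (ints, floats, pointers, ...) are cheap to copy; anything else is
// taken by const reference.
template <typename T>
struct param_type {
    typedef typename std::conditional<std::is_scalar<T>::value,
                                      T,
                                      const T&>::type type;
};

// The compiler makes the choice separately for each instantiation.
static_assert(std::is_same<param_type<int>::type, int>::value,
              "int is passed by value");
static_assert(std::is_same<param_type<std::string>::type,
                           const std::string&>::value,
              "std::string is passed by const reference");
```

Nothing here runs at run time; the "program" is the conditional selection between two types.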
Here's a really nice tutorial from the boost documentation pages (actually excerpted from a brilliant book, well worth reading).
http://www.boost.org/doc/libs/1_39_0/libs/mpl/doc/tutorial/representing-dimensions.html

I use template metaprogramming for SSE swizzling operators to optimize shuffles at compile time.
SSE swizzles ('shuffles') can only be masked with a byte literal (an immediate value), so we created a 'mask merger' template class that merges masks at compile time when multiple shuffles occur:
template <unsigned target, unsigned mask>
struct _mask_merger
{
enum
{
ROW0 = ((target >> (((mask >> 0) & 3) << 1)) & 3) << 0,
ROW1 = ((target >> (((mask >> 2) & 3) << 1)) & 3) << 2,
ROW2 = ((target >> (((mask >> 4) & 3) << 1)) & 3) << 4,
ROW3 = ((target >> (((mask >> 6) & 3) << 1)) & 3) << 6,
MASK = ROW0 | ROW1 | ROW2 | ROW3,
};
};
This works and produces remarkably good code, with no generated-code overhead and little extra compile time.
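To make the arithmetic concrete, here is a sketch that repeats the same template under the name mask_merger (renamed only because names starting with an underscore are reserved at global scope) and checks two cases by hand. The two mask constants are the usual 2-bits-per-lane encoding and are my own choice of test values.

```cpp
// Lane i of the result selects the 2-bit field of `target` that is named by
// lane i of `mask`. Same expressions as in the answer above.
template <unsigned target, unsigned mask>
struct mask_merger {
    enum {
        ROW0 = ((target >> (((mask >> 0) & 3) << 1)) & 3) << 0,
        ROW1 = ((target >> (((mask >> 2) & 3) << 1)) & 3) << 2,
        ROW2 = ((target >> (((mask >> 4) & 3) << 1)) & 3) << 4,
        ROW3 = ((target >> (((mask >> 6) & 3) << 1)) & 3) << 6,
        MASK = ROW0 | ROW1 | ROW2 | ROW3
    };
};

// 0xE4 is binary 11 10 01 00: lane i reads lane i (the identity shuffle).
// 0x1B is binary 00 01 10 11: the four lanes in reverse order.
static_assert(mask_merger<0x1B, 0xE4>::MASK == 0x1B,
              "merging with the identity leaves a mask unchanged");
static_assert(mask_merger<0x1B, 0x1B>::MASK == 0xE4,
              "reversing twice gives the identity");
```

Since MASK is an enumerator, it is a compile-time constant and can be used wherever an immediate value is required.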

So this metaprogram is faster ... because of the constant literal.
BUT: Where in the real world do we have constant literals?
Most programs I use react on user input.
That's why it's hardly ever used for values. Usually, it is used on types: using types to compute and generate new types.
There are many real-world uses, some of which you're already familiar with even if you don't realize it.
One of my favorite examples is that of iterators. They're mostly designed just with generic programming, yes, but template metaprogramming is useful in one place in particular:
To patch up pointers so they can be used as iterators. An iterator must expose a handful of typedefs, such as value_type. Pointers don't do that.
So code such as the following (basically identical to what you find in Boost.Iterator)
template <typename T>
struct value_type {
typedef typename T::value_type type;
};
template <typename T>
struct value_type<T*> {
typedef T type;
};
is a very simple template metaprogram, but a very useful one. It lets you get the value type of any iterator type T, whether it is a pointer or a class, simply by writing value_type<T>::type.
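Here is a minimal sketch of how such a trait gets used. The sum function is a hypothetical example of mine; the trait is repeated so the snippet stands alone. The standard library's std::iterator_traits plays the same role.

```cpp
#include <type_traits>
#include <vector>

template <typename T>
struct value_type {
    typedef typename T::value_type type;   // class-based iterators
};
template <typename T>
struct value_type<T*> {
    typedef T type;                        // raw pointers
};

// Hypothetical algorithm, written once for pointers and class iterators alike.
template <typename Iter>
typename value_type<Iter>::type sum(Iter first, Iter last) {
    typedef typename value_type<Iter>::type V;
    V total = V();
    for (; first != last; ++first) total += *first;
    return total;
}

static_assert(std::is_same<value_type<int*>::type, int>::value,
              "pointer case");
static_assert(std::is_same<value_type<std::vector<double>::iterator>::type,
                           double>::value,
              "class iterator case");
```

Calling sum(arr, arr + n) on a plain array and sum(v.begin(), v.end()) on a vector both go through the same function template.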
And I think the above has some very clear benefits when it comes to maintainability. Your algorithm operating on iterators only has to be implemented once. Without this trick, you'd have to make one implementation for pointers, and another for "proper" class-based iterators.
Tricks like boost::enable_if can be very valuable too. You have an overload of a function which should be enabled for a specific set of types only. Rather than defining an overload for each type, you can use metaprogramming to specify the condition and pass it to enable_if.
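A minimal sketch of the idea, using std::enable_if from <type_traits> (the standard-library counterpart, available since C++11) rather than the Boost version. The function is_even is a made-up example: one overload is enabled for every integral type, the other for every floating-point type.

```cpp
#include <type_traits>

// Enabled only when T is an integral type.
template <typename T>
constexpr typename std::enable_if<std::is_integral<T>::value, bool>::type
is_even(T x) {
    return x % 2 == 0;
}

// Enabled only when T is a floating-point type: the value must be a whole
// number and that whole number must be even.
template <typename T>
constexpr typename std::enable_if<std::is_floating_point<T>::value, bool>::type
is_even(T x) {
    return static_cast<T>(static_cast<long long>(x)) == x
        && is_even(static_cast<long long>(x));
}

static_assert(is_even(4), "int overload");
static_assert(!is_even(4.5), "double overload");
```

Calling is_even with, say, a std::string fails to compile because neither overload is enabled for that type.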
Earwicker already mentioned another good example, a framework for expressing physical units and dimensions. It allows you to express computations with physical units attached, and enforces the result type. Multiplying meters by meters yields a number of square meters. Template metaprogramming can be used to automatically produce the right type.
But most of the time, template metaprogramming is used (and useful) in small, isolated cases, basically to smooth out bumps and exceptional cases, to make a set of types look and behave uniformly, allowing you to use generic programming more efficiently.

Seconding the recommendation for Alexandrescu's Modern C++ Design.
Templates really shine when you're writing a library that has pieces which can be assembled combinatorically in a "choose a Foo, a Bar and a Baz" approach, and you expect users to make use of these pieces in some form that is fixed at compile time. For example, I coauthored a data mining library that uses template metaprogramming to let the programmer decide what DecisionType to use (classification, ranking or regression), what InputType to expect (floats, ints, enumerated values, whatever), and what KernelMethod to use (it's a data mining thing). We then implemented several different classes for each category, such that there were several dozen possible combinations.
Implementing 60 separate classes to do this would have involved a lot of annoying, hard-to-maintain code duplication. Template metaprogramming meant that we could implement each concept as a code unit, and give the programmer a simple interface for instantiating combinations of these concepts at compile-time.
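To give a flavour of the "choose a Foo and a Bar" style, here is a heavily reduced sketch. Every name in it (Classification, Regression, LinearKernel, QuadraticKernel, Predictor) is invented for illustration and has nothing to do with the real library; the point is only that each concept is written once and the combinations are produced by instantiation.

```cpp
// Decision policies: how a raw score is turned into an answer.
struct Classification {
    static constexpr int decide(double score) { return score >= 0 ? 1 : -1; }
};
struct Regression {
    static constexpr double decide(double score) { return score; }
};

// Kernel policies: how two values are combined into a score.
struct LinearKernel {
    static constexpr double apply(double a, double b) { return a * b; }
};
struct QuadraticKernel {
    static constexpr double apply(double a, double b) {
        return (a * b + 1) * (a * b + 1);
    }
};

// One template covers every Decision x Kernel combination.
template <class Decision, class Kernel>
struct Predictor {
    static constexpr auto predict(double w, double x)
        -> decltype(Decision::decide(Kernel::apply(w, x))) {
        return Decision::decide(Kernel::apply(w, x));
    }
};

static_assert(Predictor<Classification, LinearKernel>::predict(2.0, -3.0) == -1,
              "classification returns a class label");
```

Two decisions and two kernels give four predictors from five small definitions; adding a third axis multiplies the combinations without multiplying the code.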
Dimensional analysis is also an excellent example, but other people have covered that.
I also once wrote some simple compile-time pseudo-random number generators just to mess with people's heads, but that doesn't really count IMO.

The factorial example is about as useful for real-world TMP as "Hello, world!" is for common programming: It's there to show you a few useful techniques (recursion instead of iteration, "else-if-then" etc.) in a very simple, relatively easy to understand example that doesn't have much relevance for your every-day coding. (When was the last time you needed to write a program that emitted "Hello, world"?)
TMP is about executing algorithms at compile-time and this implies a few obvious advantages:
Since these algorithms failing means your code doesn't compile, failing algorithms never make it to your customer and thus can't fail at the customer's. For me, during the last decade this was the single-most important advantage that led me to introduce TMP into the code of the companies I worked for.
Since the result of executing template-meta programs is ordinary code that's then compiled by the compiler, all advantages of code generating algorithms (reduced redundancy etc.) apply.
Of course, since they are executed at compile-time, these algorithms won't need any run-time and will thus run faster. TMP is mostly about compile-time computing with a few, mostly small, inlined functions sprinkled in between, so compilers have ample opportunities to optimize the resulting code.
Of course, there are disadvantages, too:
The error messages can be horrible.
There's no debugging.
The code is often hard to read.
As always, you'll just have to weigh the advantages against the disadvantages in every case.
As for a more useful example: Once you have grasped type lists and basic compile-time algorithms operating on them, you might understand the following:
typedef
type_list_generator< signed char
, signed short
, signed int
, signed long
>::result_type
signed_int_type_list;
typedef
type_list_find_if< signed_int_type_list
, exact_size_predicate<8>
>::result_type
int8_t;
typedef
type_list_find_if< signed_int_type_list
, exact_size_predicate<16>
>::result_type
int16_t;
typedef
type_list_find_if< signed_int_type_list
, exact_size_predicate<32>
>::result_type
int32_t;
This is (slightly simplified) actual code I wrote a few weeks ago. It picks the appropriate types from a type list, replacing the #ifdef orgies common in portable code. It doesn't need maintenance, works without adaptation on every platform your code might need to be ported to, and emits a compile error if the current platform doesn't have the right type.
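The type-list helpers used above are not shown, so here is a rough stand-in for the same idea written with C++11 variadic templates. The names identity and first_with_bits and the my_int typedefs are hypothetical; this is a sketch of the technique, not the code from the answer.

```cpp
#include <climits>
#include <cstddef>
#include <type_traits>

template <typename T>
struct identity { typedef T type; };

// Only declared, never defined: if no candidate matches, the search runs off
// the end of the list, names this incomplete type, and compilation fails.
template <std::size_t Bits, typename... Ts>
struct first_with_bits;

// Take the head of the list if it has exactly Bits bits, otherwise recurse
// into the tail. std::conditional picks one branch; only that branch is
// instantiated by the trailing ::type.
template <std::size_t Bits, typename T, typename... Rest>
struct first_with_bits<Bits, T, Rest...> {
    typedef typename std::conditional<
        sizeof(T) * CHAR_BIT == Bits,
        identity<T>,
        first_with_bits<Bits, Rest...>
    >::type::type type;
};

typedef first_with_bits<8,  signed char, short, int, long, long long>::type my_int8;
typedef first_with_bits<32, signed char, short, int, long, long long>::type my_int32;

static_assert(sizeof(my_int32) * CHAR_BIT == 32, "picked a 32-bit type");
```

On a platform without, say, a 32-bit integer type among the candidates, the my_int32 typedef simply fails to compile, which is the behaviour described above.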
Another example is this:
template< typename TFunc, typename TFwdIter >
typename func_traits<TFunc>::result_t callFunc(TFunc f, TFwdIter begin, TFwdIter end);
Given a function f and a sequence of strings, this will dissect the function's signature, convert the strings from the sequence into the right types, and call the function with these objects. And it's mostly TMP inside.

Here's one trivial example, a binary constant converter, from a previous question here on StackOverflow:
C++ binary constant/literal
template< unsigned long long N >
struct binary
{
enum { value = (N % 10) + 2 * binary< N / 10 > :: value } ;
};
template<>
struct binary< 0 >
{
enum { value = 0 } ;
};
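For comparison, the same computation can be written as a constexpr function, assuming a C++11 compiler. The name binary_value is mine; the arithmetic is the same digit-by-digit recursion as the template.

```cpp
// Reads the decimal digits of n as binary digits, e.g. 1101 -> 13.
// Like the template version, it does not check that every digit is 0 or 1.
constexpr unsigned long long binary_value(unsigned long long n) {
    return n == 0 ? 0 : (n % 10) + 2 * binary_value(n / 10);
}

static_assert(binary_value(1101) == 13, "8 + 4 + 0 + 1");
```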

TMP does not necessarily mean faster or more maintainable code. I used the boost spirit library to implement a simple SQL expression parser that builds an evaluation tree structure. While the development time was reduced since I had some familiarity with TMP and lambda, the learning curve is a brick wall for "C with classes" developers, and the performance is not as good as a traditional LEX/YACC.
I see template metaprogramming as just another tool in my tool-belt. When it works for you, use it; if it doesn't, use another tool.

Scott Meyers has been working on enforcing code constraints using TMP.
It's quite a good read:
http://www.artima.com/cppsource/codefeatures.html
In this article he introduces the concept of sets of types (not a new concept, but his work builds on top of it). He then uses TMP to make sure that, no matter in what order you specify the members of a set, two sets made of the same members are equivalent. This requires being able to sort and reorder a list of types and compare them at compile time, generating compile-time errors when they do not match.

I suggest you read Modern C++ Design by Andrei Alexandrescu. It is probably one of the best books on real-world uses of C++ template metaprogramming, and it describes many problems for which C++ templates are an excellent solution.

TMP can be used for anything from ensuring dimensional correctness (ensuring that mass divided by time cannot be assigned to a velocity variable, but distance divided by time can) to optimizing matrix operations by removing temporary objects and merging loops when many matrices are involved.
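A minimal sketch of the dimensional-correctness idea, assuming only three base dimensions. The Quantity template and the typedef names are hypothetical; real libraries are far more complete.

```cpp
#include <type_traits>

// Exponents of mass, length and time are part of the type.
template <int M, int L, int T>
struct Quantity {
    double value;
};

// Multiplying adds exponents, dividing subtracts them.
template <int M1, int L1, int T1, int M2, int L2, int T2>
constexpr Quantity<M1 + M2, L1 + L2, T1 + T2>
operator*(Quantity<M1, L1, T1> a, Quantity<M2, L2, T2> b) {
    return Quantity<M1 + M2, L1 + L2, T1 + T2>{a.value * b.value};
}

template <int M1, int L1, int T1, int M2, int L2, int T2>
constexpr Quantity<M1 - M2, L1 - L2, T1 - T2>
operator/(Quantity<M1, L1, T1> a, Quantity<M2, L2, T2> b) {
    return Quantity<M1 - M2, L1 - L2, T1 - T2>{a.value / b.value};
}

typedef Quantity<1, 0, 0>  Mass;
typedef Quantity<0, 1, 0>  Length;
typedef Quantity<0, 0, 1>  Time;
typedef Quantity<0, 1, -1> Velocity;

constexpr Velocity v = Length{100.0} / Time{20.0};   // OK: length / time
// Velocity bad = Mass{1.0} / Time{2.0};             // would not compile

static_assert(v.value == 5.0, "100 / 20");
```

The commented-out line is rejected because Quantity<1, 0, -1> and Quantity<0, 1, -1> are unrelated types.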

'static const' values work as well. And pointers-to-member. And don't forget the world of types (explicit and deduced) as compile-time arguments!
BUT : Where in the real World do we have constant Literals ?
Suppose you have some code that has to run as fast as possible. It contains the critical inner loop of your CPU-bound computation in fact. You'd be willing to increase the size of your executable a bit to make it faster. It looks like:
double innerLoop(const bool b, const vector<double> & v)
{
// some logic involving b
for (vector<double>::const_iterator it = v.begin(); it != v.end(); ++it)
{
// significant logic involving b
}
// more logic involving b
return ....
}
The details aren't important, but the use of 'b' is pervasive in the implementation.
Now, with templates, you can refactor it a bit:
template <bool b> double innerLoop_templ_B(const vector<double> & v) { ... same as before ... }
double innerLoop(const bool b, const vector<double> & v)
{ return b ? innerLoop_templ_B<true>(v) : innerLoop_templ_B<false>(v); }
Any time you have a relatively small, discrete, set of values for a parameter you can automatically instantiate separate versions for them.
Consider the possibilities when 'b' is based on CPU detection. You can run a differently-optimized set of code depending on run-time detection. All from the same source code, or you can specialize some functions for some sets of values.
As a concrete example, I once saw some code that needed to merge some integer coordinates. Coordinate system 'a' was one of two resolutions (known at compile time), and coordinate system 'b' was one of two different resolutions (also known at compile time). The target coordinate system needed to be the least common multiple of the two source coordinate systems. A library was used to compute the LCM at compile time and instantiate code for the different possibilities.
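I don't know which library that code used, but the compile-time LCM itself is easy to sketch with recursive templates. The names gcd, lcm and MergedGrid below are my own.

```cpp
// Euclid's algorithm, unrolled by the compiler through recursive instantiation.
template <unsigned A, unsigned B>
struct gcd {
    static const unsigned value = gcd<B, A % B>::value;
};
template <unsigned A>
struct gcd<A, 0> {
    static const unsigned value = A;   // recursion ends when the remainder is 0
};

template <unsigned A, unsigned B>
struct lcm {
    static const unsigned value = A / gcd<A, B>::value * B;
};

// Hypothetical user: the target resolution is fixed per instantiation.
template <unsigned ResA, unsigned ResB>
struct MergedGrid {
    static const unsigned resolution = lcm<ResA, ResB>::value;
};

static_assert(MergedGrid<4, 6>::resolution == 12, "lcm(4, 6)");
```

Since C++17 the standard library also offers constexpr std::gcd and std::lcm in <numeric>, which would replace the hand-written templates.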

Related

How to create optimized variations of the same algorithm without too much copy/paste code? (C++)

When I write C++ code for realtime optimised purposes, such as audio or graphics processing, I run into the following problem quite often:
I need several variations of a piece of code, but with only some tiny change in their inner loops. Most often this required variation is how the algorithm outputs the results, i.e. should it replace the previous data in the output buffer, should it add to the previous data in the buffer, or should it multiply it, etc.
Copy-pasting the whole method and changing just couple of characters from one line in the inner loop feels like an awful way to do things.
One way to cure this fairly efficiently would be to always use some simple blending formula which you give parameters, such as:
*p_buffer++ = A*(*p_buffer) + B*new_value + C;
Then you could simply give A, B and C to that algorithm and be done with it. But that feels like a waste of CPU cycles. Also if the buffer hasn't been initialized with valid float values and there's a NAN, then the results might also be NAN, even if you intend to multiply the previous value in the buffer with 0.
So is there another efficient way of creating these kinds of variations for a method without sacrificing speed? I'm hoping to have separate methods for each of the variations I use. For example:
Render();
RenderAdditive();
RenderMultiplicative();
RenderSomeOtherWay();
EDIT:
Based on the answers, I went by defining the method as:
template<int TYPE> void Render(...)
{
if constexpr (TYPE == 0)
*p_output_buffer++ = data;
if constexpr (TYPE == 1)
*p_output_buffer++ += data;
if constexpr (TYPE == 2)
*p_output_buffer++ *= data;
}
This approach works great.
Thank you, everyone, for helping!
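For readers who want something they can compile, here is a self-contained variant of the same idea, assuming C++17 for if constexpr. The names BlendMode, store, blended and render are hypothetical. Using an enum instead of bare integers makes the call sites readable, and the Replace branch never reads the old buffer contents, which sidesteps the NaN concern mentioned above.

```cpp
enum class BlendMode { Replace, Add, Multiply };

// Writes one sample; only the branch selected by Mode is compiled in.
template <BlendMode Mode>
constexpr void store(float& dst, float src) {
    if constexpr (Mode == BlendMode::Replace)
        dst = src;          // old value is never read
    else if constexpr (Mode == BlendMode::Add)
        dst += src;
    else
        dst *= src;
}

// Small helper so the result of a single store can be checked at compile time.
template <BlendMode Mode>
constexpr float blended(float dst, float src) {
    store<Mode>(dst, src);
    return dst;
}

// The loop is written once; each instantiation gets its own inner loop.
template <BlendMode Mode>
void render(float* out, const float* in, int count) {
    for (int i = 0; i < count; ++i)
        store<Mode>(out[i], in[i]);
}

static_assert(blended<BlendMode::Add>(1.0f, 2.0f) == 3.0f, "1 + 2");
```

A call then looks like render<BlendMode::Add>(buffer, samples, n).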

How do I declare an arbitrary depth of pair of pairs?

A pair of pair of ints can be declared as: std::pair<int, std::pair<int, int> > A;
Similarly a pair of pair of pair of ints as std::pair<int, std::pair<int, std::pair<int, int> > >A;
I want to declare an arbitrary "pair of pairs" in my code. i.e., Depending on some value (known only at runtime), I want to have either a pair of pair of ints (n = 1) or pair of pair of pair of ints (n = 2) and so on. Was wondering how do I do it efficiently in C++?
Below is a snippet code in Python:
import numpy as np
n = 4 # a value known at runtime
m = 2 # a value known at runtime
def PP(A, j):
A_s = []
if j == n-1:
for i in range(1, m):
A_s.append((i, A[i]))
else:
for i in range(1, m):
A_c = A[i]
A_s.append((i, PP(A_c, j+1)))
return A_s
j = 0
# The dimension of A is known at runtime.
# Will have to create np.ones((m, m, m, m, m)) if n = 5
A = np.ones((m, m, m, m))
B = PP(A, 0)
Unlike Python, C++ is a statically-typed language. So if the structure or size of what you want to store isn't known until run time, you can't use the type itself, like nested pairs, to describe the specific structure. Instead what you do is use C++ types that can resize dynamically. Namely, std::vector<int> is an idiomatic and efficient way to store a dynamic number of ints in C++.
If you really want a tree-like structure as in your Python example ([(1, [(1, [(1, [(1, 1.0)])])])]), this is possible in C++, too. But it's a bit more work. See for instance Binary Trees in C++: Part 1.
C++ (see this website and n3337, the C++11 draft standard) has both std::variant and std::optional (in C++17). It seems that you want to have some tagged union (like the abstract syntax trees inside C++ compilers) and smart pointers (e.g. std::unique_ptr).
Maybe you want something like
class Foo; // forward declaration
class Foo {
std::variant<std::unique_ptr<Foo>, std::pair<int,int>> fields;
/// many other things, in particular for the C++ rule of five
};
In contrast to Python, each C++ value has a known type.
You could combine both cleverly, and you might be interested by other standard containers (in particular std::vector). Please read a good C++ programming book and be aware of the rule of five.
Take inspiration from existing open source software on github or gitlab (such as Qt, RefPerSys, Wt, Fish, Clang static analyzer, GCC, and many others)
Read also the documentation of your C++ compiler (e.g. GCC) and of your debugger (e.g. GDB). If you use a recent GCC, compile with all warnings and debug info, e.g. g++ -Wall -Wextra -g. Document by writing your coding conventions, and for a large enough codebase, consider writing your GCC plugin to enforce them. A recent GCC (so GCC 10 in October 2020) has interesting static analysis options, that you could try.
Consider using the Clang static analyzer (see also this draft report, and the DECODER and CHARIOT European projects, and MILEPOST GCC) on your C++ code base.
Read also papers about C++ in recent ACM SIGPLAN conferences.
You could also define your class Matrix, or use existing libraries providing them (e.g. boost), perhaps adapting this answer to C++.
Remember that the sizeof of every C++ type is known at compile time. AFAIK, C++ doesn't have any flexible array members (like C does).
As others have already said, you cannot do it. Not at runtime, at least.
However, it also doesn't seem useful in general. Let's say you have the following type (like yours, but with a level of nesting known at compile time) and you declare an object of that type
using your_type = std::pair<int, std::pair<int, std::pair<int, int>>>;
your_type obj1{1,{2,{3,4}}};
This is no different from
std::array<int,4> obj2{1,2,3,4};
where you have this correspondence:
obj1.first == obj2[0]
obj1.second.first == obj2[1]
obj1.second.second.first == obj2[2]
obj1.second.second.second == obj2[3]
And to stress that they are actually the same thing, think about how you would implement a size function for such a type; it could be a metafunction like this (probably there are better ways to do it):
template<typename T, typename Ts>
constexpr int size(std::pair<T,Ts>) {
if constexpr (std::is_same_v<T,Ts>)
return 2;
else
return 1 + size(Ts{});
}
In other words they are isomorphic.
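The same correspondence can also be checked purely at the type level, without constructing any objects. This is a sketch of an alternative formulation; flat_size, nested and flat are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>
#include <utility>

// Anything that is not a pair counts as one element.
template <typename T>
struct flat_size : std::integral_constant<std::size_t, 1> {};

// A pair contributes the element counts of both of its halves.
template <typename A, typename B>
struct flat_size<std::pair<A, B> >
    : std::integral_constant<std::size_t,
                             flat_size<A>::value + flat_size<B>::value> {};

typedef std::pair<int, std::pair<int, std::pair<int, int> > > nested;

// The nested pair holds as many ints as this array.
typedef std::array<int, flat_size<nested>::value> flat;

static_assert(std::is_same<flat, std::array<int, 4> >::value,
              "three levels of nesting hold four ints");
```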
Now your requirement is to write your_type in such a way that the level of nesting is known at run time. As already said, that's not possible, but guess what is isomorphic to the type you imagine? std::vector<int>!
So my answer is: you don't need that; just use std::vector instead.
As a final comment, based on my answer it seems reasonable that, if you really want a type which behaves like an arbitrarily nested pair, you can just write a wrapper around std::vector. But I don't see why you might really need to do it.

C++ architecture for compile-time array dimensionality and named indexing

I can work with code in C++, but it's not where I spend most of my time. I usually work in another language, where, over the course of my career, I have put together a well defined architecture for building predictor/corrector (e.g Kalman filter) type algorithms that are easily maintained and modified. For the sake of a ground up deployment of a recently designed filter, I am hoping to replicate this architecture within a C++ framework. Hopefully, we can get the same level of extensibility built into the deployed product, so I don't need to keep jumping back-and-forth to another language whenever I want to modify the model being used by the filter.
The idea here is that we're going to have an array that contains a bunch of different information about the state of a given system. Let's say, for example, we have an object with a position and orientation in 3D... We'll use a quaternion for the orientation, but the specifics of that aren't super important.
Here's some pseudo-code to demonstrate what I'm trying to accomplish:
function build_model()
model.add_state('quaternion',[0;0;0;1],[1;1;1]);
model.add_state('position',[0;0;0],[10;10;10]);
model.add_input('velocity',[0;0;0]);
model.add_input('angular_rate',[0;0;0]);
model.add_noise('velocity_noise',[1;1;1]);
model.add_noise('angular_rate_noise',0.01*[1;1;1]);
end
where the above have the form:
add_state(state_name, initial_state_estimate, init_error_std_deviation_estimate)
add_input(input_name, initial_input_value)
add_noise(noise_name, noise_std_deviation)
After build_model() has been called, I end up with a bunch of information about the estimator.
The state space is of dimension 7
The state error space is of dimension 6
The input vector is of dimension 6
The "process noise" vector is of dimension 6
Further (indexed from 0), I have some arrays, such that:
state[0:3] holds the quaternion
state[4:6] holds the position
state_err[0:2] holds quaternion error
state_err[3:5] holds position error
input[0:2] holds velocity
input[3:5] holds angular_rate
process_noise[0:2] holds velocity noise
process_noise[3:5] holds angular rate noise
... but, I don't want a bunch of hard-coded indices... in fact, once the model is built, the rest of the code should be designed to be completely agnostic to the positions/dimensions/etc of the variables/model/state/error-space etc.
Since the estimator and the model don't really care about each other, I try to keep them encapsulated... i.e. the estimator just has state/error/noise of known dimensions and processes it with functions of a generic format, and then the model specific stuff is presented in the appropriate format. This, unfortunately, makes using an indexed array (rather than a struct or something) preferable.
Essentially what I'm looking for, is a pre-compiler way to associate names (like a structure) and indices (like an array) with the same data... ideally building it up piece by piece using simple language as shown above, to a final dimension, determined by the pre-compiler based on the model definition, to be used for defining the size of various arrays within the estimator runtime algorithm.
I'm not looking for someone to do this for me, but I'd love a push in the right direction. Good architecture early pays dividends in the long run, so I'm willing to invest some time to get it right.
So, a couple of things I've thought about:
There are definitely ways to do this at run-time with dynamic memory and things like std::vector, structures, enums, and so forth. But, since the deployed version of this is going to be running in real-time, performance is an issue... besides, all of this stuff shouldn't need to happen at run-time anyway. If we had a sufficiently sophisticated precompiler, it could just calculate all of this out, define some constants/macros/whatever to manipulate the model by name while using indices behind the scenes... unfortunately, fancy precompiler stuff is a pretty niche area that I have little experience with.
It seems like template meta-programming and/or macros might be a way to go, but I'm hesitant to dive head-first into that without guidance, and I recognize that this is shady at best in terms of modern software design.
I could always write code to write the C++ code for me... i.e. spit out a bunch of #defines or enums for the indices by name, as well as the dimensionality of the model/estimator components, and just copy paste this into the C++ code... but that feels wrong for different reasons. On the other hand, that's one way to get a "sufficiently sophisticated pre-compiler".
Giving up on the compile-time dimensioning of my arrays would also solve the problem, but since all of this is constant once computed, run-time seems like the wrong place for it...
So, is there an elegant solution out there? I'd hate to just brute force this, but I don't see a clear alternative. Also, much of the above may be WAY OFF for any number of reasons... apologies if so, and I appreciate any input you might have :-)
I ended up getting most of the way there using template meta-programming... [see below]
I'd like to find a way to add the state to state_enum and define its corresponding set_state struct at the same time ie:
add_state(quaternion,{0,0,0,1},{1,1,1})
just for cleanliness and to prevent one happening without the other... if anyone has ideas on how to do this (preferably without using __COUNTER__ or boost), let me know. Thanks!
#include <cstdio>
#include <iostream>
struct state_enum{
enum{quaternion,position,last};
};
template <int state_num> struct set_state{
static constexpr double x0[] = {};
static constexpr double sx0[] = {};
};
template <> struct set_state<state_enum::quaternion>{
static constexpr double x0[] = {0,0,0,1};
static constexpr double sx0[] = {1,1,1};
};
template <> struct set_state<state_enum::position>{
static constexpr double x0[] = {0,0,0};
static constexpr double sx0[] = {2,2,2};
};
template <int state_num> struct state{
enum{
m_x = sizeof(set_state<state_num>::x0)/sizeof(set_state<state_num>::x0[0]),
m_dx = sizeof(set_state<state_num>::sx0)/sizeof(set_state<state_num>::sx0[0])
};
enum{
m_x_cummulative = state<state_num-1>::m_x_cummulative+m_x,
m_dx_cummulative=state<state_num-1>::m_dx_cummulative+m_dx,
i_x0=state<state_num-1>::m_x_cummulative,
i_dx0=state<state_num-1>::m_dx_cummulative,
i_x1=state<state_num-1>::m_x_cummulative+m_x-1,
i_dx1=state<state_num-1>::m_dx_cummulative+m_dx-1
};
};
template <> struct state<-1>{
enum{m_x = 0, m_dx=0};
enum{m_x_cummulative = 0, m_dx_cummulative=0, i_x0 = 0, i_dx0=0, i_x1 = 0, i_dx1=0};
};
int main(int argc, const char * argv[]) {
std::cout << "Summary of model indexing and dimensions...\n\n";
std::printf("%-32s %02i\n","quaternion first state index",state<state_enum::quaternion>::i_x0);
std::printf("%-32s %02i\n","quaternion final state index",state<state_enum::quaternion>::i_x1);
std::printf("%-32s %02i\n","position first state index",state<state_enum::position>::i_x0);
std::printf("%-32s %02i\n","position final state index",state<state_enum::position>::i_x1);
std::printf("%-32s %02i\n","full state vector dimensionality",state<state_enum::last>::m_x_cummulative);
std::cout << "\n";
std::printf("%-32s %02i\n","quaternion first error index",state<state_enum::quaternion>::i_dx0);
std::printf("%-32s %02i\n","quaternion final error index",state<state_enum::quaternion>::i_dx1);
std::printf("%-32s %02i\n","position first error index",state<state_enum::position>::i_dx0);
std::printf("%-32s %02i\n","position final error index",state<state_enum::position>::i_dx1);
std::printf("%-32s %02i\n","full error vector dimensionality",state<state_enum::last>::m_dx_cummulative);
std::cout << "\n\n";
return 0;
}
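One possible direction for keeping each state's name and dimensions closer together, assuming C++14 constexpr functions are acceptable: describe the model as a constexpr table and compute every offset from it. This is only a sketch with hypothetical names (StateId, StateSpec, specs, first_index, ...), and it covers just the dimensions, not the initial values.

```cpp
// Names of the states; state_count doubles as the number of states.
enum StateId { quaternion, position, state_count };

struct StateSpec {
    int dim;       // entries in the state vector
    int err_dim;   // entries in the error-state vector
};

// One row per state, in StateId order. Dimensions are those from the question.
constexpr StateSpec specs[state_count] = {
    {4, 3},   // quaternion
    {3, 3},   // position
};

// Offsets are sums over the preceding rows, evaluated by the compiler.
constexpr int first_index(int id) {
    int offset = 0;
    for (int i = 0; i < id; ++i) offset += specs[i].dim;
    return offset;
}
constexpr int last_index(int id) { return first_index(id) + specs[id].dim - 1; }

constexpr int first_err_index(int id) {
    int offset = 0;
    for (int i = 0; i < id; ++i) offset += specs[i].err_dim;
    return offset;
}

constexpr int state_dim = first_index(state_count);
constexpr int error_dim = first_err_index(state_count);

static_assert(first_index(position) == 4 && last_index(position) == 6,
              "position occupies state[4..6]");
static_assert(state_dim == 7 && error_dim == 6, "totals match the question");
```

state_dim and error_dim are constant expressions, so they can size std::array members or plain arrays in the estimator. Adding a state still means touching the enum and the table, so it does not fully answer the single-declaration wish.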

vector<double>::size_type versus alternatives

My background is mostly in R, SAS, and VBA, and I'm trying to learn some C++. I've picked "Accelerated C++" (Koenig, Moo) as one of my first books on the subject. My theoretical background in comp. sci. is admittedly not the most robust, which perhaps explains why I'm confused by points like these.
I have a question about a piece of code similar to the following:
#include <iostream>
#include <vector>
int main() {
double input;
std::vector<double> example_vector;
while (std::cin >> input) {
example_vector.push_back(input);
}
std::vector<double>::size_type vector_size;
vector_size = example_vector.size();
return 0;
}
As I understand it, vector_size is "large enough" to hold the size of example_vector, no matter how large example_vector might be. I'm not sure I understand what this means: is vector_size (in this case) capable of representing an integer larger than, say, long long x;, so that std::cout << vector_size; would print a value that's different from std::cout << x;? How/why?
What this question boils down to is that the standard does not mandate what actual type is returned by the vector<T>::size() method. Different implementations may make different choices.
So, if you wish to assign the value returned by a call to size() to a variable, what type should you use for that variable? In order to write code that is portable across different implementations, you need a way to name that type, recognising the fact that different implementations of the standard library may use different types.
The answer is that vector<T> provides the type that you should use. It is
vector<T>::size_type
One thing that you do need to understand, and get used to, with C++, is that the standard does need to cater for significant variation between different implementations.
std::vector<T>::size_type is exactly what the standard implies, the type currently used by your implementation to allow storing the size.
Historically, architectures have changed, requiring code parsers, retesting and other basic wastes of time. By following the standard, your code will work on any compliant platform, now and in the future.
So your code can work on 16-bit architectures and, theoretically, on 256-bit architectures at such time as they are supported.
So, while you are most likely only ever going to see it as size_t, if you change platforms you don't have to worry. And more importantly, whoever is maintaining your code doesn't have to either.
In the vector class (and many others) they have a typedef:
typedef **implementation_defined** size_type;
A "size_type" may or may not be size_t. That said, size_t is always large enough to describe the size of any object in memory (32 bits or 64 bits, as the case may be). It is likely that size_type will also be large enough to cover all your memory (at least as long as you program on a desktop computer), but that is not guaranteed.
Additionally, to know the size of a variable, and thus how much it can hold, you use sizeof.
So this would give you the actual size of the size_type type:
std::cout << sizeof(std::vector<double>::size_type) << std::endl;
which is going to be 4 or 8 depending on your processor and compiler. With 8-bit bytes, the largest value the type can hold is 2 to the power of (that sizeof x 8), minus 1 (i.e. 2^32 - 1 or 2^64 - 1).
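If the goal is the largest representable value rather than the byte count, std::numeric_limits gives it directly. A small sketch (vec_size and largest are just local names):

```cpp
#include <limits>
#include <type_traits>
#include <vector>

typedef std::vector<double>::size_type vec_size;

// The standard requires a container's size_type to be an unsigned integer type.
static_assert(std::is_unsigned<vec_size>::value, "size_type is unsigned");

// Largest value the type can represent on this implementation.
constexpr vec_size largest = std::numeric_limits<vec_size>::max();

// For an unsigned type, converting -1 wraps around to that same maximum.
static_assert(largest == static_cast<vec_size>(-1), "all bits set");
```

Printing largest with std::cout shows the concrete number for your platform. Whether it exceeds the maximum of long long depends on the implementation, which is why the question has no single answer.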

When does it make sense to typedef basic data types?

A company's internal c++ coding standards document states that even for basic data types like int, char, etc. one should define own typedefs like "typedef int Int". This is justified by advantage of portability of the code.
However, are there general considerations or advice about when (meaning for which types of projects) this really makes sense?
Thanks in advance..
Typedefing int to Int offers almost no advantage at all (it provides no semantic benefit, and leads to absurdities like typedef long Int on other platforms to remain compatible).
However, typedefing int to e.g. int32_t (along with long to int64_t, etc.) does offer an advantage, because you are now free to choose the data-type with the relevant width in a self-documenting way, and it will be portable (just switch the typedefs on a different platform).
In fact, most compilers offer a stdint.h which contains all of these definitions already.
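In C++11 and later the same names are available from <cstdint>. A short sketch of relying on them, with the width assumption stated as a compile-time check (PacketHeader is a made-up example struct):

```cpp
#include <climits>
#include <cstdint>

// The exact-width types are optional in the standard: they exist only on
// platforms that actually have integers of that width.
static_assert(sizeof(std::int32_t) * CHAR_BIT == 32, "int32_t is 32 bits wide");
static_assert(sizeof(std::int64_t) * CHAR_BIT == 64, "int64_t is 64 bits wide");

// Hypothetical record whose field widths matter, e.g. for a file format.
struct PacketHeader {
    std::uint16_t length;
    std::uint32_t sequence;
};
```

Here the type name documents the width, so no project-specific typedef layer is needed.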
That depends. The example you cite:
typedef int Int;
is just plain dumb. It's a bit like defining a constant:
const int five = 5;
Just as there is zero chance of the variable five ever becoming a different number, the typedef Int can only possibly refer to the primitive type int.
OTOH, a typedef like this:
typedef unsigned char byte;
makes life easier on the fingers (though it has no portability benefits), and one like this:
typedef unsigned long long uint64;
is both easier to type and more portable, since on Windows you would write this instead (I think):
typedef unsigned __int64 uint64;
Rubbish.
"Portability" is nonsense, because int is always an int. If they think they want something like an integer type that's 32 bits wide, then the typedef should be typedef int int32_t;, because then you are naming a real invariant, and can actually ensure that this invariant holds, via the preprocessor etc.
But this is, of course, a waste of time, because you can use <cstdint>, either in C++0x, or by extensions, or use Boost's implementation of it anyway.
Typedefs can help describe the semantics of a data type. For instance, if you typedef float distance_t;, you're letting the developer in on how the values of distance_t will be interpreted. You might be saying that the values may never be negative: what is -1.23 kilometers? In this scenario, negative distances might simply not make sense.
Of course, typedefs do not constrain the domain of the values in any way. They are just a way to make code more readable (or at least they should), and to convey extra information.
The portability issues your workplace seems to be referring to arise when you want to ensure that a particular data type is always the same size, no matter what compiler is used. For instance:
#if defined(TURBO_C_COMPILER)
typedef long int32;
#elif defined(MSVC_32_BIT_COMPILER)
typedef int int32;
// further #elif branches for other compilers
...
#endif
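A hand-maintained typedef like this is only as good as the branch that was picked. One way to guard it, sketched here assuming a C++11 compiler (older code bases used BOOST_STATIC_ASSERT or a negative-array-size trick instead), is to check the width right next to the typedef. The plain typedef below stands in for whichever branch the real header selects, and width_in_bits_of_int32 is a made-up helper.

```cpp
#include <cassert>
#include <climits>   // CHAR_BIT

// Stand-in for the branch chosen by the compiler checks (assumption: int is
// 32 bits wide on this platform, as on most current desktop compilers).
typedef int int32;

// Breaks the build, instead of silently misbehaving, if the assumption is wrong.
static_assert(sizeof(int32) * CHAR_BIT == 32, "int32 is not 32 bits wide here");

inline unsigned width_in_bits_of_int32()
{
    return static_cast<unsigned>(sizeof(int32) * CHAR_BIT);
}
```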
typedef int Int is a dreadful idea... people will wonder if they're looking at C++, it's hard to type, visually distracting, and the only vaguely imaginable rationalisation for it is flawed, but let's put it out there explicitly so we can knock it down:
if one day say a 32-bit app is being ported to 64-bit, and there's lots of stupid code that only works for 32-bit ints, then at least the typedef can be changed to keep Int at 32 bits.
Critique: if the system is littered with code that's so badly written (i.e. not using an explicitly 32-bit type from cstdint), it's overwhelmingly likely to have other parts of the code that now need to be using 64-bit ints but will get stuck at 32 bits via the typedef. Code that interacts with library/system APIs using ints is likely to be given Ints, resulting in truncated handles that work until they happen to fall outside the 32-bit range, etc. The code will need a complete reexamination before being trustworthy anyway. Having this justification floating around in people's minds can only discourage them from using explicitly-sized types where they are actually useful ("what are you doing that for?" "portability?" "but Int's for portability, just use that").
That said, the coding rules might be meant to encourage typedefs for things that are logically distinct types, such as temperatures, prices, speeds, distances, etc. In that case, typedefs can be vaguely useful in that they allow an easy way to recompile the program to, say, upgrade from float precision to double, downgrade from a real type to an integral one, or substitute a user-defined type with some special behaviours. It's quite handy for containers too, so that there's less work and less client impact if the container is changed, although such changes are usually a little painful anyway: the container APIs are designed to be a bit incompatible so that the important parts must be reexamined rather than compiling but not working, or silently performing dramatically worse than before.
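A small sketch of what that looks like in practice. Price, PriceList and total are hypothetical names chosen for this example; the point is only that client code is written against the aliases, so changing the precision or the container means editing one line each.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical domain aliases: switch double to float (or to a fixed-point
// class), or std::vector to another sequence container, right here.
typedef double Price;
typedef std::vector<Price> PriceList;

// Written purely in terms of the aliases, so it needs no edits when they change.
inline Price total(const PriceList& prices)
{
    return std::accumulate(prices.begin(), prices.end(), Price());
}
```

This only helps as long as client code sticks to operations the replacement type also supports, which is the "little painful" part mentioned above.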
It's essential to remember though that a typedef is only an "alias" to the actual underlying type, and doesn't actually create a new distinct type, so people can pass any value of that same type without getting any kind of compiler warning about type mismatches. This can be worked around with a template such as:
template <typename T, int N>
struct Distinct
{
Distinct(const T& t) : t_(t) { }
operator T&() { return t_; }
operator const T&() const { return t_; }
T t_;
};
typedef Distinct<float, 42> Speed;
But, it's a pain to make the values of N unique... you can perhaps have a central enum listing the distinct values, or use __LINE__ if you're dealing with one translation unit and no multiple typedefs on a line, or take a const char* from __FILE__ as well, but there's no particularly elegant solution I'm aware of.
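One commonly used variation, sketched here rather than taken from the answer above, keys the wrapper on an empty tag type instead of an int, so uniqueness comes from the tag's name rather than a hand-picked number. SpeedTag, DistanceTag and travel_time are hypothetical names, and the checks assume a C++11 compiler with <type_traits>.

```cpp
#include <cassert>
#include <type_traits>

// Plain typedefs are interchangeable: both names denote float.
typedef float PlainSpeed;
typedef float PlainDistance;
static_assert(std::is_same<PlainSpeed, PlainDistance>::value,
              "typedefs of the same type are the same type");

// Same wrapper as before, but parameterised on a tag type instead of an int.
template <typename T, typename Tag>
struct Distinct
{
    Distinct(const T& t) : t_(t) { }
    operator T&() { return t_; }
    operator const T&() const { return t_; }
    T t_;
};

struct SpeedTag { };      // hypothetical tags; they only need distinct names
struct DistanceTag { };
typedef Distinct<float, SpeedTag> Speed;
typedef Distinct<float, DistanceTag> Distance;

static_assert(!std::is_same<Speed, Distance>::value,
              "the wrappers are distinct types");
static_assert(!std::is_convertible<Speed, Distance>::value,
              "a Speed cannot be passed where a Distance is expected");

inline float travel_time(Distance d, Speed s)
{
    return d.t_ / s.t_;
}
```

Because the constructor is implicit, a raw float can still be passed for either parameter; marking it explicit closes that gap at the cost of more typing at call sites.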
(One classic article from 10 or 15 years ago demonstrated how you could create templates for types that knew of several orthogonal units, keeping counters of the current "power" in each, and adjusting the type as multiplications, divisions etc were performed. For example, you could declare something like Meters m; Time t; Acceleration a = m / t / t; and have it check all the units were sensible at compile time.)
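A heavily simplified sketch of that idea, not the article's actual implementation: the type carries one integer exponent per base unit (only metres and seconds here), multiplication adds the exponents and division subtracts them. Quantity and acceleration are made-up names; Meters, Time and Acceleration follow the example above.

```cpp
#include <cassert>

// The exponents of metres (M) and seconds (S) are part of the type.
template <int M, int S>
struct Quantity
{
    explicit Quantity(double v) : value(v) { }
    double value;
};

// Addition is only defined for identical dimensions.
template <int M, int S>
Quantity<M, S> operator+(Quantity<M, S> a, Quantity<M, S> b)
{
    return Quantity<M, S>(a.value + b.value);
}

// Multiplication adds exponents, division subtracts them.
template <int M1, int S1, int M2, int S2>
Quantity<M1 + M2, S1 + S2> operator*(Quantity<M1, S1> a, Quantity<M2, S2> b)
{
    return Quantity<M1 + M2, S1 + S2>(a.value * b.value);
}

template <int M1, int S1, int M2, int S2>
Quantity<M1 - M2, S1 - S2> operator/(Quantity<M1, S1> a, Quantity<M2, S2> b)
{
    return Quantity<M1 - M2, S1 - S2>(a.value / b.value);
}

typedef Quantity<1, 0>  Meters;
typedef Quantity<0, 1>  Time;
typedef Quantity<1, -2> Acceleration;

inline Acceleration acceleration(Meters m, Time t)
{
    return m / t / t;   // the result type works out to Quantity<1, -2>
    // return m / t;    // would not compile: Quantity<1, -1> is not an Acceleration
}
```

All of the checking happens in the type system, so the generated code should be the same as for plain doubles.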
Is this a good idea anyway? Most people clearly consider it overkill, as almost nobody ever does it. Still, it can be useful, and I have used it on several occasions where it was easy to do and/or where accidentally misassigned values would have been particularly dangerous.
I suppose that the main reason is portability of your code. For example, once you assume a 32-bit integer type in the program, you need to be sure that int on the other platform is also 32 bits long. A typedef in a header helps you localize the changes to your code in one place.
I would like to point out that it could also be useful for people who work in a different language. For instance, if you speak Spanish and your code is all in Spanish, wouldn't you want the type definitions in Spanish too? Just something to consider.