Impact of extra unused code from overloaded or templated function? - c++

I am writing an interface class for point cloud registration using the PCL library. This means that I need to use its classes, which are for the most part templated. However, I will not know the type of data the user wants to use before run-time. I'm OK with storing a few possibly-null pointers to data and objects that I might not need, because they are too few to have a meaningful impact on memory usage, and there will only be one object of my class.
However, I will also have to duplicate some of my code one way or another, because it is going to use the underlying templated code of PCL. For example, I might need the following:
template<typename PointT>
void process_cloud(pcl::PointCloud<PointT> &input_cloud);
I'm going to need 3-4 instantiations of this function (and a couple of others) to be able to handle types unknown until run-time. However, I'm going to end up using only one of them. If these functions are non-trivial in size, what sort of impact can I expect on performance?
If it is non-negligible, how can I alleviate it? I tried to figure out ways that don't need duplicate code, but I can't find a way to handle templated code polymorphically without writing templated code of my own.
If I have to make do with this design, is there any way to optimize the memory layout so as to minimize the performance hit of cache misses? For example, can I guarantee that my universally-needed functions will be close together and not watered down by the potentially never-called instantiations?
I thought about templating the whole class. This will make code more local because each instantiation will group together the functions that will be called in tandem (same data type). It will also introduce more code bloat by creating copies of code that didn't need to be templated. To avoid this extra bloat, the best I can come up with is conceptually this:
template<typename PointT>
class Processor {
public:
    void process_cloud(pcl::PointCloud<PointT> &input_cloud);
    ...
};
class Interface {
public:
    // ...
    // bunch of common functions
    // ...
    // Instantiations I'm going to need. Pointers to save space.
    // Could also be std::optional if pointers turn out to be unneeded.
    std::unique_ptr<Processor<pcl::PointXYZ>> p1;
    ...
};
This should produce a memory layout where the common functions are grouped together, because they are defined in Interface. Every point type also has the functions used on it grouped together, because they are defined in separate classes. It's a little less readable, though. Any cleaner ways to help the compiler understand that template instantiations with the same argument are going to be used in tandem and should be local? Will it maybe realize this and do it automatically?

You can experiment with this by putting an explicit instantiation in a separate compilation unit vs. putting them all in the same compilation unit.
My guess is the difference won't be large, as the only difference would be in ITLB misses: fewer in the separate case, due to the code being more local.
You will get slightly more total code, but if you only use one instance, the rest should not influence the runtime, since it never pollutes the caches and might be swapped out at some point.
That is, unless the compiler and linker decide they want to sort the functions by the 3rd letter of their mangled names, or for some slightly better reason.
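A minimal sketch of that experiment, following the Processor split from the question (file names and bodies hypothetical; include guards omitted):
// processor.h — the template definition, visible to every user
#include <pcl/point_cloud.h>

template <typename PointT>
class Processor {
public:
    void process_cloud(pcl::PointCloud<PointT> &input_cloud) { /* ... */ }
};

// processor_xyz.cpp — one explicit instantiation per translation unit,
// so each point type's generated code lands in its own object file
#include "processor.h"
#include <pcl/point_types.h>
template class Processor<pcl::PointXYZ>;

// processor_xyzrgb.cpp — as above; to compare against the "all in the same
// compilation unit" layout, merge these instantiations into one .cpp instead
#include "processor.h"
#include <pcl/point_types.h>
template class Processor<pcl::PointXYZRGB>;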

Related

Is it better to use pointers for member class/object variables?

When I am trying to keep a clean header file with regard to #includes, I find that I can do a better job by making all my members pointers to classes/objects that require inclusion of other header files, because this way I can use forward declarations instead of #includes.
But I am starting to wonder if this is a good rule-of-thumb or not.
So for example take the following class - option 1:
class QAudioDeviceInfo;
class QAudioInput;
class QIODevice;
class QAudioRx
{
public:
QAudioRx();
private:
QAudioDeviceInfo *mp_audioDeviceInfo;
QAudioInput *mp_audioInput;
QIODevice *mp_inputDevice;
};
However this can be re-written like this - option 2:
#include "QAudioDeviceInfo.h"
#include "QAudioInput.h"
#include "QIODevice.h"
class QAudioRx
{
public:
QAudioRx();
private:
QAudioDeviceInfo m_audioDeviceInfo;
QAudioInput m_audioInput;
QIODevice m_inputDevice;
};
Option 1 is generally my preference because it's a nicer, cleaner header to include. However, option 2 does not require dynamic allocation/de-allocation of resources.
Maybe this could be seen as a subjective/opinion-based question (and I know the opinion-police will be all over this), so let's avoid "I like" and "I prefer" and stick to facts/key-points.
So my question is, in general, what is the best practice (if there is one) and why? Or if there is not one best practice (normally the case) when is it good to use each one?
edit
Sorry, this was really meant for private variables - not public ones.
Use pointers if you need pointers. Pointers are not for avoiding includes. Includes are not evil. When you need to use them, use them. Simple rules equal happy programming!
Using pointers in your case will bring a lot of unnecessary overhead. You may have to allocate your objects dynamically, which is not a good choice for your case, and you also have to deal with memory management manually. Simply put, there is no reason to use pointers here unless your case demands it (for example, if you want the object to outlive the pointer itself, as in a non-ownership pattern).
There is a trade-off between compilation time and compile-time dependencies (which usually correlate with the number of include files needed) on one hand, and simplicity and efficiency on the other.
Option 1 will compile faster, but almost certainly run slower. Every time you access a member variable there is an indirection through a pointer, and the objects that the pointers refer to must be allocated and managed separately. If the allocation is dynamic then that has an extra cost, and the additional logic to manage the separate objects risks introducing bugs. For a start you have raw pointers, so you must define (or delete) a copy constructor, copy assignment, and probably destructor, and maybe move constructor and move assignment (your example is missing all of those, and so is probably horribly broken).
Option 2 requires some additional includes, and so everything that uses QAudioRx needs to be recompiled when any of those headers change. That tightly couples QAudioRx to those other types. In a large project that can be significant (but in a small one it probably doesn't matter - if the time to build your whole project is measured in seconds then it's certainly not worth it!)
However, I would say your option 1 is almost never a good idea. If you care about reducing dependencies and build times then use the pImpl idiom (see the pimpl-idiom tag for related questions) instead:
#include <memory>

class QAudioRx
{
public:
    QAudioRx();
    ~QAudioRx(); // must be defined in the .cpp, where QAudioRxImpl is complete
private:
    class QAudioRxImpl;
    std::unique_ptr<QAudioRxImpl> m_impl;
};
(N.B. My use of std::unique_ptr<QAudioRxImpl> assumes the impl object will be dynamically allocated, because that's the most common approach; the solution can be adjusted if the object is managed differently.)
Now all the member variables are part of the "impl" object, which is not defined in the header. This means that access to the members from within the impl class is direct, not through pointers, but access from the QAudioRx must perform one indirection to call a function on the impl class.
This still reduces dependencies, but makes it simpler to manage the extra complexity because there are no raw pointers so the special member functions will do something sensible automatically. By default it will be movable, but not copyable, and will clean up in the destructor.
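For completeness, a minimal sketch of the matching .cpp, assuming Qt5-style headers by these names (member details simplified):
// QAudioRx.cpp — the impl class and the special members live here
#include "QAudioRx.h"
#include <QAudioDeviceInfo>
#include <QAudioInput>
#include <QIODevice>

class QAudioRx::QAudioRxImpl
{
public:
    QAudioDeviceInfo audioDeviceInfo;  // direct members: no per-member indirection
    QAudioInput audioInput;            // default-constructed here for brevity
    QIODevice* inputDevice = nullptr;  // QIODevice is abstract, so held by pointer
};

QAudioRx::QAudioRx() : m_impl(new QAudioRxImpl) {}
QAudioRx::~QAudioRx() = default;  // QAudioRxImpl is complete at this point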
When you are working on a small project where the number of classes is small, you should go with option 2.
But when you are working on a very large project, in which the number of classes is very large and the compilation time of the project is one of the criteria shaping its architecture, go with option 1.
In most cases option 1 is just fine. In real-world applications you need to combine both options, choosing a trade-off between compilation time and clean-looking code (without pointers and naked new (dynamic allocation) for objects).

Class methods VS Class static functions VS Simple functions - Performance-wise?

OK, here's what I want :
I have written several REALLY demanding functions (mostly operating on bitmaps, etc) which have to be as fast as possible
Now, let's also mention that these functions may also be grouped by type, or even by the type of variable on which they operate.
And the thing is: apart from the implementation of the algorithms themselves, what should I do - from a technical point of view - in order not to mess up the speed?
And now, I'm considering the following scenarios :
Create them as simple functions and just pass the necessary parameters as arguments
Create a class (for 'grouping'/organisation purposes) and just declare them as static
Create a class per type, e.g. create a class for working on bitmaps, create a new instance of that class for every bitmap (e.g. Bitmap* myBitmap = new Bitmap(1010);), and operate on it with its member methods (e.g. myBitmap->getFirstBitSet())
Now, which of these approaches is the fastest? Is there really any difference between straight simple functions and Class-encapsulated static functions, performance-wise? Any other scenario that would be preferable, which I haven't mentioned?
Sidenote : I'm using the clang++ compiler, for Mac OS X 10.6.8. (if that makes any difference)
At the CPU level, there is only one kind of function, and it very much resembles the C kind. You could craft your own, but...
As it turns out, C++ being built with efficiency in mind maps most functions directly to call instructions:
a namespace level function is like a regular C function
a static method is like a namespace level function (from a call point of view)
a non-static method is very similar to a static method, except an implicit this parameter is passed on top of the other parameters (one pointer)
All those 3 have the exact same kind of performance.
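A minimal sketch of those three equivalent call styles (names hypothetical):
#include <cstdio>

struct Bitmap { unsigned bits; };

// 1. Namespace-level function: a plain call, just like C.
unsigned firstBitSet(const Bitmap& b) { return b.bits & (~b.bits + 1); }

// 2. Static method: identical call cost, only the name scope differs.
struct BitmapOps {
    static unsigned firstBitSet(const Bitmap& b) { return b.bits & (~b.bits + 1); }
};

// 3. Non-static method: the same, plus one implicit 'this' pointer argument.
struct BitmapObj {
    unsigned bits;
    unsigned firstBitSet() const { return bits & (~bits + 1); }
};

int main() {
    Bitmap b{0x28};
    BitmapObj o{0x28};
    // All three print 8, the lowest set bit of 0x28
    std::printf("%u %u %u\n", firstBitSet(b), BitmapOps::firstBitSet(b), o.firstBitSet());
}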
On the other hand, virtual methods have a slight overhead. There was a C++ technical report on performance which estimated the overhead compared to a non-virtual method between 10% and 15% (from memory) for empty functions. Meaning that for any function with meat inside (ie, doing real work), the overhead itself is close to getting lost in the noise. The real cost comes from the inhibition of inlining unless the virtual call can be deduced at compile-time.
There is absolutely no difference between classic old C functions and static methods of classes. The difference is only aesthetic. If you have multiple C functions that have certain relation between them, you can:
group them into a class;
place them into a namespace;
The difference will again be aesthetic. Most likely this will improve readability.
If these C functions share some static data, it would make sense (if possible) to define this data as private static data members of a class. In this case the variant with the class would be preferable over the variant with a namespace.
I would discourage you from creating a dummy instance. This will be misleading to the reader of the source code.
Creating an instance for every bitmap is possible and can even be favorable. Especially if you call methods on this instance several times in a typical scenario.

Single-use class

In a project I am working on, we have several "disposable" classes. What I mean by disposable is that they are a class where you call some methods to set up the info, and then you call what equates to a doit function. You doit once and throw them away. If you want to doit again, you have to create another instance of the class. The reason they're not reduced to single functions is that they must store state from after they doit, so the user can get information about what happened, and it seems not very clean to return a bunch of things through reference parameters. It's not a singleton, but not a normal class either.
Is this a bad way to do things? Is there a better design pattern for this sort of thing? Or should I just give in and make the user pass in a boatload of reference parameters to return a bunch of things through?
What you describe is not a class (state + methods to alter it), but an algorithm (map input data to output data):
result_t do_it(parameters_t);
Why do you think you need a class for that?
Sounds like your class is basically a parameter block in a thin disguise.
There's nothing wrong with that IMO, and it's certainly better than a function with so many parameters it's hard to keep track of which is which.
It can also be a good idea when there's a lot of input parameters - several setup methods can set up a few of those at a time, so that the names of the setup functions give more clue as to which parameter is which. Also, you can cover different ways of setting up the same parameters using alternative setter functions - either overloads or with different names. You might even use a simple state-machine or flag system to ensure the correct setups are done.
However, it should really be possible to recycle your instances without having to delete and recreate. A "reset" method, perhaps.
As Konrad suggests, this is perhaps misleading. The reset method shouldn't be seen as a replacement for the constructor - it's the constructor's job to put the object into a self-consistent initialised state, not the reset method's. Objects should be self-consistent at all times.
Unless there's a reason for making cumulative-running-total-style do-it calls, the caller should never have to call reset explicitly - it should be built into the do-it call as the first step.
I still decided, on reflection, to strike that out - not so much because of Jalf's comment, but because of the hairs I had to split to argue the point ;-) - Basically, I figure I almost always have a reset method for this style of class, partly because my "tools" usually have multiple related kinds of "do it" (e.g. "insert", "search" and "delete" for a tree tool), and a shared mode. The mode is just some input fields, in parameter-block terms, but that doesn't mean I want to keep re-initializing. But just because this pattern happens a lot for me doesn't mean it should be a point of principle.
I even have a name for these things (not limited to the single-operation case) - "tool" classes. A "tree_searching_tool" will be a class that searches (but doesn't contain) a tree, for example, though in practice I'd have a "tree_tool" that implements several tree-related operations.
Basically, even parameter blocks in C should ideally provide a kind of abstraction that gives it some order beyond being just a bunch of parameters. "Tool" is a (vague) abstraction. Classes are a major means of handling abstraction in C++.
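A minimal sketch of this parameter-block / "tool" style, with setup methods, a do-it call, and state queried afterwards (all names hypothetical):
#include <string>
#include <utility>

class LineCountingTool {
public:
    // Setup methods name the parameters, instead of one long argument list.
    void setText(std::string text) { text_ = std::move(text); }
    void setSkipBlankLines(bool skip) { skipBlank_ = skip; }

    void doIt() {
        reset();  // built into do-it, so callers never call reset explicitly
        bool blank = true;
        for (char c : text_) {
            if (c == '\n') {
                if (!(skipBlank_ && blank)) ++lines_;
                blank = true;
            } else if (c != ' ' && c != '\t') {
                blank = false;
            }
        }
        if (!text_.empty() && text_.back() != '\n' && !(skipBlank_ && blank))
            ++lines_;  // count a final line with no trailing newline
    }

    // Results are read back afterwards, instead of via out-parameters.
    int lineCount() const { return lines_; }

private:
    void reset() { lines_ = 0; }

    std::string text_;
    bool skipBlank_ = false;
    int lines_ = 0;
};
For example, after setText("a\n\nb\n") and setSkipBlankLines(true), a doIt() call leaves lineCount() at 2.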
I have used a similar design and wondered about this too. A fictitious, simplified example could look like this:
FileDownloader downloader(url);
downloader.download();
downloader.result(); // get the path to the downloaded file
To make it reusable I store it in a boost::scoped_ptr:
boost::scoped_ptr<FileDownloader> downloader;
// Download first file
downloader.reset(new FileDownloader(url1));
downloader->download();
// Download second file
downloader.reset(new FileDownloader(url2));
downloader->download();
To answer your question: I think it's ok. I have not found any problems with this design.
As far as I can tell you are describing a class that represents an algorithm. You configure the algorithm, then you run the algorithm and then you get the result of the algorithm. I see nothing wrong with putting those steps together in a class if the alternative is a function that takes 7 configuration parameters and 5 output references.
This structuring of code also has the advantage that you can split your algorithm into several steps and put them in separate private member functions. You can do that without a class too, but that can lead to the sub-functions having many parameters if the algorithm has a lot of state. In a class you can conveniently represent that state through member variables.
One thing you might want to look out for is that structuring your code like this could easily tempt you to use inheritance to share code among similar algorithms. If algorithm A defines a private helper function that algorithm B needs, it's easy to make that member function protected and then access that helper function by having class B derive from class A. It could also feel natural to define a third class C that contains the common code and then have A and B derive from C. As a rule of thumb, inheritance used only to share code in non-virtual methods is not the best way: it's inflexible, you end up having to take on the data members of the super class, and you break the encapsulation of the super class. In that situation, prefer factoring the common code out of both classes without using inheritance. You can factor that code into a non-member function, or you might factor it into a utility class that you then use without deriving from it.
YMMV - what is best depends on the specific situation. Factoring code into a common super class is the basis for the template method pattern, so when using virtual methods inheritance might be what you want.
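A minimal sketch of factoring shared code out instead of inheriting it (names hypothetical):
#include <numeric>
#include <vector>

namespace detail {
    // The shared step becomes a non-member function, not a protected
    // member of a common base class.
    inline double mean(const std::vector<double>& v) {
        return v.empty() ? 0.0 : std::accumulate(v.begin(), v.end(), 0.0) / v.size();
    }
}

// Neither algorithm derives from anything; both just call the helper.
class CenteringAlgorithm {
public:
    std::vector<double> run(std::vector<double> v) const {
        const double m = detail::mean(v);
        for (double& x : v) x -= m;
        return v;
    }
};

class ThresholdAlgorithm {
public:
    bool run(const std::vector<double>& v, double limit) const {
        return detail::mean(v) > limit;
    }
};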
Nothing especially wrong with the concept. You should try to set it up so that the objects in question can generally be auto-allocated vs having to be newed -- significant performance savings in most cases. And you probably shouldn't use the technique for highly performance-sensitive code unless you know your compiler generates it efficiently.
I disagree that the class you're describing "is not a normal class". It has state and it has behavior. You've pointed out that it has a relatively short lifespan, but that doesn't make it any less of a class.
Short-lived classes vs. functions with out-params:
I agree that your short-lived classes are probably a little more intuitive and easier to maintain than a function which takes many out-params (or 1 complex out-param). However, I suspect a function will perform slightly better, because you won't be taking the time to instantiate a new short-lived object. If it's a simple class, that performance difference is probably negligible. However, if you're talking about an extremely performance-intensive environment, it might be a consideration for you.
Short-lived classes: creating new vs. re-using instances:
There's plenty of examples where instances of classes are re-used: thread-pools, DB-connection pools (probably darn near any software construct ending in 'pool' :). In my experience, they seem to be used when instantiating the object is an expensive operation. Your small, short-lived classes don't sound like they're expensive to instantiate, so I wouldn't bother trying to re-use them. You may find that whatever pooling mechanism you implement, actually costs MORE (performance-wise) than simply instantiating new objects whenever needed.

How to deal with the idea of "many small functions" for classes, without passing lots of parameters?

Over time I have come to appreciate the mindset of many small functions, and I really do like it a lot, but I'm having a hard time losing my shyness to apply it to classes, especially ones with more than a handful of nonpublic member variables.
Every additional helper function clutters up the interface, since often the code is class specific and I can't just use some generic piece of code.
(To my limited knowledge, anyway, still a beginner, don't know every library out there, etc.)
So in extreme cases, I usually create a helper class which becomes the friend of the class that needs to be operated on, so it has access to all the nonpublic guts.
An alternative are free functions that need parameters, but even though premature optimization is evil, and I haven't actually profiled or disassembled it...
I still DREAD the mere thought of passing all the stuff I need sometimes, even just as reference, even though that should be a simple address per argument.
Is all this a matter of preference, or is there a widely used way of dealing with that kind of stuff?
I know that trying to force stuff into patterns is a kind of anti pattern, but I am concerned about code sharing and standards, and I want to get stuff at least fairly non painful for other people to read.
So, how do you guys deal with that?
Edit:
Some examples that motivated me to ask this question:
About the free functions:
DeadMG was confused about making free functions work...without arguments.
My issue with those functions is that, unlike member functions, free functions only know about data if you give it to them, unless global variables and the like are used.
Sometimes, however, I have a huge, complicated procedure I want to break down for the sake of readability and understanding, but there are so many different variables which get used all over the place that passing all the data to free functions, which are agnostic to every bit of member data, looks simply nightmarish.
The example (linked in the original post) is a snippet of a function that converts data into a format that my mesh class accepts.
It would take all of those parameters to refactor this into a "finalizeMesh" function, for example.
At this point it's part of a huge compute-mesh-data function, and bits of dimension info, sizes and scaling info are used all over the place, interwoven.
That's what I mean by "free functions need too many parameters sometimes".
I think it shows bad style, and is not necessarily a symptom of being irrational per se, I hope :P.
I'll try to clear things up more along the way, if necessary.
Every additional helper function clutters up the interface
A private helper function doesn't.
I usually create a helper class which becomes the friend of the class that needs to be operated on
Don't do this unless it's absolutely unavoidable. You might want to break up your class's data into smaller nested classes (or plain old structs), then pass those around between methods.
I still DREAD the mere thought of passing all the stuff I need sometimes, even just as reference
That's not premature optimization, that's a perfectly acceptable way of preventing/reducing cognitive load. You don't want functions taking more than three parameters. If there are more than three, consider packaging your data in a struct or class.
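A minimal sketch of that struct-packaging suggestion, loosely echoing the asker's "finalizeMesh" example (all names hypothetical):
#include <cstdio>

// One named bundle instead of five-plus loose parameters.
struct MeshBuildParams {
    int   width  = 0;
    int   height = 0;
    float scaleX = 1.0f;
    float scaleY = 1.0f;
    bool  flipNormals = false;
};

void finalizeMesh(const MeshBuildParams& p) {
    std::printf("%dx%d mesh, scale (%g, %g)%s\n",
                p.width, p.height, p.scaleX, p.scaleY,
                p.flipNormals ? ", flipped normals" : "");
}

int main() {
    MeshBuildParams p;
    p.width = 64;
    p.height = 64;
    p.scaleX = p.scaleY = 0.5f;
    finalizeMesh(p);  // one argument; each field is named where it is set
}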
I sometimes have the same problems as you have described: increasingly large classes that need too many helper functions to be accessed in a civilized manner.
When this occurs I try to separate the class into multiple smaller classes, if that is possible and convenient.
Scott Meyers states in Effective C++ that friend classes or functions is mostly not the best option, since the client code might do anything with the object.
Maybe you can try nested classes that deal with the internals of your object. Another option is helper functions that use the public interface of your class; put them into a namespace related to your class.
Another way to keep your classes free of cruft is to use the pimpl idiom. Hide your private implementation behind a pointer to a class that actually implements whatever it is that you're doing, and then expose a limited subset of features to whoever is the consumer of your class.
// Your public API in foo.h (note: only foo.cpp should #include foo_impl.h)
class FooImpl;

class Foo {
public:
    bool func(int i); // defined in foo.cpp, where FooImpl is complete
private:
    FooImpl* impl_;
};
There are many ways to implement this. The Boost pimpl template in the Vault is pretty good. Using smart pointers is another useful way of handling this, too.
http://www.boost.org/doc/libs/1_46_1/libs/smart_ptr/sp_techniques.html#pimpl
An alternative are free functions that need parameters, but even though premature optimization is evil, and I haven't actually profiled or disassembled it... I still DREAD the mere thought of passing all the stuff I need sometimes, even just as reference, even though that should be a simple address per argument.
So, let me get this entirely straight. You haven't profiled or disassembled. But somehow, you intend on ... making functions work ... without arguments? How, exactly, do you propose to program without using function arguments? Member functions are no more or less efficient than free functions.
More importantly, you come up with lots of logical reasons why you know you're wrong. I think the problem here is in your head, which possibly stems from you being completely irrational, and nothing that any answer from any of us can help you with.
Generic algorithms that take parameters are the basis of modern object-oriented programming - that's the entire point of both templates and inheritance.

How best to switch from template mess to clean classes architecture (C++)?

Assuming a largish template library with around 100 files containing around 100 templates with overall more than 200,000 lines of code. Some of the templates use multiple inheritance to make the usage of the library itself rather simple (i.e. inherit from some base templates and only having to implement certain business rules).
All that exists (grown over several years), "works" and is used for projects.
However, compilation of projects using that library consumes a growing amount of time and it takes quite some time to locate the source for certain bugs. Fixing often causes unexpected side effects or is quite difficult, because some interdependent templates need changing. Testing is nearly impossible due to the sheer amount of functions.
Now, I would really like to simplify the architecture to use less templates and more specialized smaller classes.
Is there any proven way to go about that task? What would be a good place to start?
I'm not sure I see how/why templates are the problem, and why plain non-templated classes would be an improvement. Wouldn't that just mean even more classes, less type safety and so larger potential for bugs?
I can understand simplifying the architecture, refactoring and removing dependencies between the various classes and templates, but automatically assuming that "fewer templates will make the architecture better" is flawed imo.
I'd say that templates potentially allow you to build a much cleaner architecture than you'd get without them. Simply because you can make separate classes totally independent. Without templates, a class's functions which call into another class must know about that class, or an interface it inherits, in advance. With templates, this coupling isn't necessary.
Removing templates would only lead to more dependencies, not fewer.
The added type-safety of templates can be used to detect a lot of bugs at compile-time (Sprinkle your code liberally with static_assert's for this purpose)
Of course, the added compile-time may be a valid reason to avoid templates in some cases, and if you only have a bunch of Java programmers, who are used to thinking in "traditional" OOP terms, templates might confuse them, which can be another valid reason to avoid templates.
But from an architecture point of view, I think avoiding templates is a step in the wrong direction.
Refactor the application, sure, it sounds like that's needed. But don't throw away one of the most useful tools for producing extensible and robust code just because the original version of the app misused it. Especially if you're already concerned with the amount of code, removing templates will most likely lead to more lines of code.
You need automated tests. That way, in ten years' time when your successor has the same problem, he can refactor the code (probably to add more templates, because he thinks it will simplify usage of the library) and know it still meets all test cases. Similarly, the side effects of any minor bug fixes will be immediately visible (assuming your test cases are good).
Other than that: "divide and conquer".
Write unit tests where the new code must do the same as the old code. That's one tip, at least.
Edit: if you deprecate old code that you have replaced with new functionality, you can phase over to the new code little by little.
Well, the problem is that the template way of thinking is very different from the object-oriented, inheritance-based way. It's hard to answer anything other than "redesign the whole thing and start from scratch".
Of course, there may be a simple way for a particular case. We can't tell without knowing more about what you have.
The fact that the template solution is so difficult to maintain is an indication of a poor design anyway.
Some points (note: these things are not evil in themselves; but if you want to move to non-template code, this can help):
Look up your static interfaces. Where do templates depend on what functions exist? Where do they need typedefs?
Put the common parts in an abstract base class. A good example is when you happen to stumble over the CRTP idiom. You can just replace it with an abstract base class having virtual functions (see the sketch after this list).
Look up integer lists. If you find your code uses integral lists like list<1, 3, 3, 1, 3>, you can replace them with std::vector, if all the code using them can live with runtime values instead of constant expressions.
Look up type traits. Much typical templated code checks whether some typedef exists, or whether some method exists. Abstract base classes solve these two issues by using pure virtual methods, and by inheriting typedefs from the base. Often, typedefs are only needed to trigger hideous features like SFINAE, which would then be superfluous too.
Look up expression templates. If your code uses expression templates to avoid creating temporaries, you will have to eliminate them and use the traditional way of returning/passing temporaries to the operators involved.
Look up function objects. If you find your code uses function objects, you can change them to use abstract base classes too, and have something like void run(); to call them (or if you want to keep using operator(), better still - it can be virtual too).
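A minimal sketch of the CRTP-to-virtual replacement from the second point above (names hypothetical):
// Before: CRTP — the base is templated on its own derived class.
template <typename Derived>
struct ShapeCrtp {
    double area() const { return static_cast<const Derived*>(this)->areaImpl(); }
};

struct CircleCrtp : ShapeCrtp<CircleCrtp> {
    double radius = 1.0;
    double areaImpl() const { return 3.14159 * radius * radius; }
};

// After: an ordinary abstract base class with a virtual function.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Circle : Shape {
    double radius = 1.0;
    double area() const override { return 3.14159 * radius * radius; }
};
The trade is the usual one described earlier in the thread: a small virtual-call overhead and lost inlining, in exchange for ordinary compilation and run-time polymorphism.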
As I understand, you are most concerned with build times, and the maintainability of your library?
First, don't try to "fix" all at once.
Second, understand what you fix. Template complexity is often there for a reason, e.g. to enforce certain use, and to make the compiler help you not make a mistake. That reason might sometimes be taken too far, but throwing out 100 lines because "no one really knows what they do" shouldn't be taken lightly. Everything I suggest here can introduce really nasty bugs, you have been warned.
Third, consider cheaper fixes first: e.g. faster machines or distributed build tools. At least throw in all the RAM the boards will take, and throw out old disks. It does make a difference. One drive for the OS, one drive for the build is a cheap man's RAID.
Is the library well documented? That's your best chance at making it maintainable. Look into tools such as doxygen that help you create such documentation.
All considered? OK, now some suggestions for the build times ;)
Understand the C++ build model: every .cpp is compiled individually. That means many .cpp files with many headers = huge build. This is NOT advice to put everything into one .cpp file, though! However, one trick (!) that can speed up a build immensely is to create a single .cpp file that includes a bunch of .cpp files, and only feed that "master" file to the compiler. You can't do that blindly, though - you need to understand the types of errors this could introduce.
If you don't have one yet, get a separate build machine that you can remote into. You'll have to do a lot of almost-full builds to check whether you broke some include. You will want to run these on another machine, so they don't block you from working on something else. Long term, you'll need it for daily integration builds anyway ;)
Use precompiled headers. (scales better with fast machines, see above)
Check your header inclusion policy. While every file should be "independent" (i.e. include everything it needs in order to be included by someone else), don't include liberally. Unfortunately, I haven't yet found a tool to find unnecessary #include statements, but it might help to spend some time removing unused headers in "hotspot" files.
Create and use forward declarations for the templates you use. Often, you can include a header with forward declarations in many places, and use the full header only in a few specific ones. This can greatly help compile time. Check the <iosfwd> header to see how the standard library does that for I/O streams.
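A minimal sketch of such a forward-declaration header, modeled on <iosfwd> (file and type names hypothetical):
// mylib_fwd.h — cheap to include everywhere; declares, never defines
namespace mylib {
    template <typename FLOAT> class Matrix;  // forward declaration only

    typedef Matrix<float>  Matrixf;  // typedefs of instantiations are fine;
    typedef Matrix<double> Matrixd;  // nothing is actually instantiated here
}

// mylib/matrix.h — holds the full template definition; include it only in
// the few files that actually create or dereference a Matrix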
Overloads for templates used with few types: if you have a complex function template that is useful for only a very few types, like this:
// .h
template <typename FLOAT> // float or double only
FLOAT CalcIt(int len, FLOAT * values) { ... }
You can declare the overloads in the header, and move the template to the body:
// .h
float CalcIt(int len, float * values);
double CalcIt(int len, double * values);
// .cpp
template <typename FLOAT> // float or double only
FLOAT CalcItT(int len, FLOAT * values) { ... }
float CalcIt(int len, float * values) { return CalcItT(len, values); }
double CalcIt(int len, double * values) { return CalcItT(len, values); }
this moves the lengthy template to a single compilation unit.
Unfortunately, this is only of limited use for classes.
Check if the PIMPL idiom can move code from the headers into .cpp files.
The general rule hiding behind that is: separate the interface of your library from the implementation. Use comments, detail namespaces and separate .impl.h headers to mentally and physically isolate what should be known to the outside from how it is accomplished. This exposes the real value of your library (does it actually encapsulate complexity?), and gives you a chance to replace "easy targets" first.
More specific advise - and how useful the one given is - depends largely on the actual library.
Good luck!
As mentioned, unit tests are a good idea. Indeed, rather than breaking your code by introducing "simple" changes that are likely to ripple out, just focus on creating a suite of tests, and fixing non-compliance with the tests. Have an activity to update the tests when bugs come to light.
Beyond that, I would suggest upgrading your tools, if possible, to help with debugging template-related problems.
I've often come across legacy templates that were huge and required a lot of time and memory to instantiate, but didn't need to be. In those cases, the easiest way to cut out the fat was to take all of the code that didn't rely on any of the template arguments and hide it in separate functions defined in a normal translation unit. This also had the positive side-effect of triggering fewer recompiles when this code had to be slightly modified or documentation changed. It sounds rather obvious, but it's really surprising how often people write a class template and think that EVERYTHING it does has to be defined in the header, rather than just the code that needs the templated information.
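A minimal sketch of that fat-trimming (names hypothetical; the helper is defined inline here only to keep the example self-contained — in the legacy code it would move to a normal .cpp file):
#include <cstdio>

// Type-independent work, hoisted out of the template. In real code this
// lives in a translation unit, so it is compiled once, not per instantiation.
inline void parseConfigAndLog() {
    std::puts("shared setup: compiled once, no per-type copy");
}

// Only the thin, type-dependent part remains templated in the header.
template <typename T>
void process(T& item) {
    parseConfigAndLog();  // shared non-template code
    item.touch();         // the only line that actually needs T
}

struct Widget {
    void touch() { std::puts("widget touched"); }
};

int main() {
    Widget w;
    process(w);
}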
Another thing you might want to consider is how often you can clean up the inheritance hierarchies by making the templates "mixin" style instead of aggregations of multiple inheritance. See how many places you can get away with making one of the template arguments the name of the base class that it should derive from (the way boost::enable_shared_from_this works). Of course this typically only works well if the constructors take no arguments, so you don't have to worry about initializing anything correctly.
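A minimal sketch of that base-class-as-template-argument ("mixin") style (names hypothetical):
#include <cstdio>

struct Empty {};

// Each feature derives from whatever base it is given, so features chain
// by nesting instead of piling up multiple inheritance in one class.
template <typename Base = Empty>
struct Printable : Base {
    void print() const { std::puts("printable"); }
};

template <typename Base = Empty>
struct Serializable : Base {
    void serialize() const { std::puts("serializable"); }
};

// One linear chain instead of 'struct Document : Printable, Serializable'.
struct Document : Printable<Serializable<>> {};

int main() {
    Document d;
    d.print();      // from Printable
    d.serialize();  // from Serializable, further down the chain
}
Note that both mixins are default-constructed, which matches the caveat above about constructors taking no arguments.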