Dependencies between source and runtime error components - C++

I have an annoying dependency problem between components, and I would like to hear about several ways to resolve it.
Basically I have 3 components that depend almost acyclically on each other, except for a small dependency between the first and the last component. Concretely, this is a JIT compiler, but hopefully it is a widely occurring kind of abstract dependency that can come up in other circumstances.
The components are basically in sequence of flow dependency: source/AST generation, code generation and runtime. As the diagram makes clear, errors generated at runtime should be able to communicate IDs that can be correlated to source location items. The tricky part is that this ID is not necessarily an integer type (although it can be). Until now, SourceItemID was a type internal to the Source component, but now it seems it needs to be defined outside of it.
What would be good patterns to use here? I was thinking of maybe templatizing the runtime error type with the desired source location ID, roughly as sketched below.
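A sketch of what I mean (the names here are only illustrative):
#include <string>
// The runtime error type is parameterized on the source location identifier,
// so the runtime component never has to know what the Source component
// actually uses as its ID type.
template <typename SourceId>
struct RuntimeError {
    SourceId    source_item;   // correlates back to a source location item
    std::string message;
};
// The JIT would instantiate it with its own ID type, e.g.
// using JitError = RuntimeError<SourceItemID>;
int main() {
    RuntimeError<int> e{42, "division by zero"};  // e.g. plain integer IDs
    return e.source_item == 42 ? 0 : 1;
}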

The simplest solution is to define all the types and common behaviour used by your modules in an independent unit (possibly a single header) that all the real processing units use.
For minimum overhead/headaches and compatibility issues (these shared types could be useful elsewhere at some point for communication with other apps/plugins/whatever), try to keep those types POD if you can.
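For instance, in the JIT case the shared header could boil down to something like this (a sketch, assuming the ID can be reduced to a POD struct; the field names are invented):
// common_types.h -- shared by Source, CodeGen and Runtime; depends on nothing.
#pragma once
#include <cstdint>
namespace common {
// POD identifier that the runtime can carry around and Source can interpret.
struct SourceItemID {
    std::uint32_t file;    // index into the source file table
    std::uint32_t offset;  // byte offset within that file
};
inline bool operator==(SourceItemID a, SourceItemID b) {
    return a.file == b.file && a.offset == b.offset;
}
} // namespace common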
"Templatizing" things is not trivial. It is very powerful and expressive, but if you're looking at removing dependencies, my opinion is: try to see if you can make things simpler first.

Related

How to organize multiple data types in a C++ project

In our project (C++14) we have divided our software into several components by a functional breakdown of the system. Each module resides in its own namespace, embedded in a common namespace for the system. We use CMake as our build system, and each component is a static library that can be built separately and is linked together at the end.
Now, many components define their own data types as classes or structs, e.g. for time, for a collection of data fields to be processed together, and so on. These data structures are defined locally in the component where the data they contain is created.
But when I now have to access one of these data structures from another component, I have to include the header from that specific component and thereby create a dependency between the two components. And as this is a common approach, we have many dependencies between our software components, easily leading to cyclic dependencies. :(
In the C world I would have created a GlobalDataStructures.h and added all the data structures that are used throughout the software system.
What is the (modern) C++ approach to this?
What are best practices?
The idea behind the “C style” approach is basically sound.
As a general rule you want to define your data structures as far away from the root of the dependency tree as possible. If a cycle happens some of the data structures must move up at least one level towards the root to break the cycle.
At some point you end up at the dependency root. For a project with different components that means introducing a root component. It’s a component – in your case a static lib – like any other and holds data structures and functionality necessary and/or useful for the whole system. The tricky part is to not let that component become the cupboard under the kitchen sink – after all now you have a convenient place to put things without having to think about where they really belong. But that’s a people problem, not a technical one.
Usually I call the CMake target for the root component library projectname_core. I used to put all its contents into the projectname::core namespace. But it turned out that everybody in the team just wrote using namespace projectname::core; everywhere. Apparently the extra namespace doesn’t add any useful information, and putting the system-wide stuff into the projectname namespace works just as well. That’s what I do these days.
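As an illustration (all names invented), such a core component might expose little more than a header like this:
// Part of projectname_core: the root static library everything else may
// depend on. It depends on nothing but the standard library.
#pragma once
#include <chrono>
#include <cstdint>
#include <string>
namespace projectname {
// A time point shared across components.
using Timestamp = std::chrono::system_clock::time_point;
// A bundle of fields that several components pass around.
struct Sample {
    Timestamp     taken_at;
    std::uint32_t sensor_id;
    double        value;
    std::string   unit;
};
} // namespace projectname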
This is my area of expertise. I am the creator of POWER, the software architecture which models manufacturing. You can read my post on DTOs here, since I've gone beyond mere hardcoded ones: What is Data Transfer Object?
Now to answer your question: notice that in a distributed (OOP) architecture, you need to either attach the whole namespace to use the DTO, duplicate the DTO into each consuming namespace, or centralize the DTO into one shared namespace. In POWER, however, a process in one namespace can operate on an object instance it doesn't even know about, because the process is bound to the DTO's reference properties rather than to the DTO (and its type) itself. POWER is a zero-coupling architecture, because manufacturing is too.
Technically, processes should never share DTOs, even if the DTOs are highly similar or even structurally equivalent. Sharing modules or even DTOs creates centralization complexity, an organizational flaw I describe here, using Harvard Business Review as a source: http://www.powersemantics.com/p.html

C++ compile-time / runtime options and parameters, how to handle?

What is the proper way to handle compile-time and runtime options in a generic library? What is good practice for large software in which there are simply too many options for the user to bother about most of them?
Suppose the task is to write a large library to perform calculations on a number of datasets at the same time. There are numerous ways to perform these calculations, and the library must be highly configurable. Typically, there are options about how the calculation is performed as a whole. Then, each dataset has its own set of calculation options. Finally, each calculation has a number of tuning parameters, which must be set as well.
The library itself is generic, but each application which uses the library will work with a particular kind of dataset, for which the tuning parameters will take on certain values. Since they will not change throughout the life of the application, I make them known at application compile time. The way I would implement these tuning parameters in the library is through a Traits class, which contains the tuning parameters as static const members. Calibration of their final values is part of the development of the application.
The datasets will of course change depending on what the user feeds to the application, and therefore a number of runtime options must be provided as well (with intelligent defaults). Calibration of their default values is also part of the development of the application. I would implement these options as a Config class which contains them and can be changed at application startup (e.g. by parsing a config text file). It gets passed to the constructors of a lot of the classes in the library. Each class then calls Config::get_x for its specific option x.
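To make this concrete, here is roughly what I mean by the two mechanisms (the names and parameters are only examples, and I've written get_x as a lookup by name, though it could equally be one getter per option):
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
// Compile-time tuning parameters: fixed per application, calibrated during
// the development of that application.
struct DefaultTraits {
    static constexpr std::size_t max_iterations = 1000;
    static constexpr double      tolerance      = 1e-9;
};
// Runtime options: filled in at startup, e.g. from a config text file,
// then passed to the library classes that need them.
class Config {
public:
    double get_x(const std::string& name) const {
        auto it = values_.find(name);
        if (it == values_.end()) throw std::runtime_error("unknown option: " + name);
        return it->second;
    }
    void set_x(const std::string& name, double value) { values_[name] = value; }
private:
    std::unordered_map<std::string, double> values_;
};
// A library class parameterized on the Traits and fed a Config at construction.
template <typename Traits = DefaultTraits>
class Calculator {
public:
    explicit Calculator(const Config& cfg) : damping_(cfg.get_x("damping")) {}
    // Traits::max_iterations and Traits::tolerance would drive the algorithm.
private:
    double damping_;
};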
The thing I don't really like about this design is that both the Traits and the Config classes break encapsulation. Some options relate to specific parts of the library; most of the time, however, they don't. And having them suddenly sit next to each other annoys me, because they affect separate things in the code, which are often in different abstraction layers.
One solution I was thinking about is using multiple public inheritance for these different parts. A class which needs to know an option then casts the Config object, or calls the relevant Traits parent, to access it. Also, this passing along of Config to every class that needs it (or whose members need it) is very inelegant. Maybe Config should be a singleton?
You could keep your parameters in a single struct named Config (to stay with your wording) and make it a singleton.
Encapsulation is important to preserve class consistency, because a class is responsible for itself. But in your case, where the Config class must be accessible to everyone, that global accessibility is necessary. Furthermore, adding getters and setters to this kind of class will only add overhead (and in the best case your compiler will probably just inline them away).
Also, if you really want a Traits class to implement compile-time parameters, you should probably just have an initialization function (like the constructor of your library).
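A minimal sketch of such a singleton, with placeholder members:
#include <string>
// Plain aggregate holding every runtime option, reachable from anywhere.
struct Config {
    // Option values with sensible defaults, overwritten at startup.
    double      damping   = 0.5;
    int         threads   = 1;
    std::string input_dir = ".";
    // Meyers singleton: constructed on first use, no explicit teardown needed.
    static Config& instance() {
        static Config cfg;
        return cfg;
    }
};
// Usage anywhere in the library:
//   double d = Config::instance().damping;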

Elegant way of handling similar code

I have a software project that is working just fine.
Now, this project has to be adjusted to model a new, but related system.
What strategies are there to keep these two codes well organized?
They will have a codebase that is about 90% the same, but there are many functions which need slight adjustments.
I have thought of the following:
Different branches in the git-repository: perfect control of the two projects, but common changes have to be made in each of the branches separately.
Modelling different program modes with preprocessor conditionals (#ifdef Project1 ...): this keeps the changes local, but makes the code difficult to read.
I am not too satisfied by these solutions. Is there a better approach?
We have the same problem, and here is how we solve it:
We have only one branch in our git repository.
Besides the common files, we have different files for each configuration: access_for_config1.cpp, access_for_config2.cpp, ...
We use design patterns such as the factory pattern to abstract the specific parts away from the common part (see the sketch after this list).
For small, very specific parts in common files, we have an #ifdef section per configuration.
We have different rules in our makefile for each configuration: for a given configuration, we compile the common files plus its specific files and set the corresponding flag. Since we use Eclipse at the office, we also define different build configurations to get correct highlighting.
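Not our literal code, but a rough sketch of how the factory, the per-configuration files and a small #ifdef fit together (all names invented):
// access.h (common): the interface every configuration implements.
#pragma once
#include <memory>
#include <string>
struct Access {
    virtual ~Access() = default;
    virtual std::string read(const std::string& key) = 0;
};
// Each configuration defines this factory in its own file
// (access_for_config1.cpp or access_for_config2.cpp); the makefile links
// exactly one of them per build configuration.
std::unique_ptr<Access> make_access();
// access_for_config1.cpp (compiled only for configuration 1):
//   #include "access.h"
//   namespace {
//   struct Config1Access : Access {
//       std::string read(const std::string& key) override { return "c1:" + key; }
//   };
//   }
//   std::unique_ptr<Access> make_access() { return std::make_unique<Config1Access>(); }
// And in a common file, a small configuration-specific corner can still use #ifdef:
//   #ifdef PROJECT_CONFIG1
//       // tweak that only configuration 1 needs
//   #endif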
The advantage of this approach is that the common part always stays in sync, and each specific part is properly isolated.
But you have to be careful with pieces of code that are similar but not identical across configurations. For example, with similar (but not the same) code in different specific files, a bug may end up being fixed in only one configuration. This risk can be reduced by turning such pieces of code into common templates, or by rethinking the design to move more of them into the common part.
Hope this answer will help you.
What strategies are there to keep these two codes well organized? They will have a codebase that is about 90% the same
It's not exactly what you need, but just make sure you know about it.
Submodules allow foreign repositories to be embedded within a dedicated subdirectory of the source tree, always pointed at a particular commit.
Different branches in the git-repository: perfect control of the two projects, but common changes have to be made in each of the branches separately.
You can commit changes to one branch and then use cherry-pick to add them to any other branch you want.

Keeping modules independent, while still using each other

A big part of my C++ application uses classes to describe the data model, e.g. something like ClassType (which actually emulates reflection in plain C++).
I want to add a new module to my application and it needs to make use of these ClassTypes, but I would prefer not to introduce a dependency from my new module on ClassType.
So far I have the following alternatives:
Not making it independent and introducing a dependency on ClassType, with the risk of creating more 'spaghetti' dependencies in my application (this is my least-preferred solution)
Introducing a new class, e.g. IType, and letting my module depend only on IType; ClassType would then inherit from IType (roughly as sketched after this list).
Using strings as the identification method, and forcing the users of the new module to convert a ClassType to a string or vice versa where needed.
Using GUIDs (or even simple integers) as identification, which also requires conversions between GUIDs and ClassTypes.
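For example, the IType alternative would look roughly like this (the names are only illustrative):
#include <string>
#include <utility>
// What the new module would depend on: a minimal, stable interface.
struct IType {
    virtual ~IType() = default;
    virtual std::string name() const = 0;
};
// The existing reflection class would then implement it.
class ClassType : public IType {
public:
    explicit ClassType(std::string name) : name_(std::move(name)) {}
    std::string name() const override { return name_; }
    // ... the rest of the reflection machinery stays here ...
private:
    std::string name_;
};
// The new module only ever sees IType&:
void describe(const IType& type);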
How far should you try to go when decoupling modules in an application?
just introduce an interface and let all the other modules rely on the interface? (like the IType described above)
or decouple it even further by using other identifications like strings or GUIDs?
I'm afraid that by decoupling too far, the code becomes more unstable and more difficult to debug. I've seen one such example in Qt: signals and slots are linked using strings, and if you make a typo the functionality doesn't work, but it still compiles.
How far should you keep your modules decoupled?
99% of the time, if your design is based on reflection, then you have major issues with the design.
Generally speaking, something like
if (dynamic_cast<MyClass*>(x)) { /* ... */ }
else if (dynamic_cast<AnotherClass*>(x)) { /* ... */ }
else { /* ... */ }
is a poor design because it neglects polymorphism. If you're doing this, then the item x is in violation of the Liskov Substitution Principle.
Also, given that C++ already has RTTI, I don't see why you'd reinvent the wheel. That's what typeid and dynamic_cast are for.
I'll steer away from thinking about your reflection, and just look at the dependency ideas.
Decouple what it's reasonable to decouple. Coupling implies that if one thing changes, so must another. Your NewCode is using ClassType; if some aspects of it change then you surely must change NewCode - it can't be completely decoupled. Which of the following do you want to decouple from?
Semantics, what ClassType does.
Interface, how you call it.
Implementation, how it's implemented.
To my eyes the first two are reasonable coupling. But surely an implementation change should not require NewCode to change, so code to interfaces. We try to keep interfaces fixed; we tend to extend them rather than change them, keeping them backward-compatible if at all possible. Sometimes we use name/value pairs to try to make an interface extensible, and then hit the kind of typo errors you allude to. It's a trade-off between flexibility and "type-safety".
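To illustrate that trade-off with purely invented names: a name/value style interface is easy to extend but silently accepts typos, whereas a typed interface catches them at compile time.
#include <map>
#include <string>
// Extensible but typo-prone: nothing stops a caller asking for "colur".
struct DynamicProperties {
    std::map<std::string, std::string> values;
    std::string get(const std::string& key) const {
        auto it = values.find(key);
        return it == values.end() ? std::string{} : it->second;
    }
};
// Fixed but type-safe: a misspelled accessor simply does not compile.
struct TypedProperties {
    std::string colour;
    int         width = 0;
};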
It's a philosophical question; it depends on the type of module, and the trade-offs. I think I have personally done all of them at various times, except for the GUID to type mapping, which doesn't have any advantages over the string to type mapping in my opinion, and at least strings are readable.
I would say you need to look at what level of decoupling is required for the particular module, given the expected external usage and code organization, and go from there. You've hit all the conceptual methods as far as I know, and they are each useful in particular situations.
That's my opinion, anyway.

Can automated unit testing replace static type checking?

I've started to look into the whole unit testing/test-driven development idea, and the more I think about it, the more it seems to fill a similar role to static type checking. Both techniques provide a compile-time, rapid-response check for certain kinds of errors in your program. However, correct me if I'm wrong, but it seems that a unit test suite with full coverage would test everything static type checking would test, and then some. Or phrased another way, static type checks only go part of the way to "prove" that your program is correct, whereas unit tests will let you "prove" as much as you want (to a certain extent).
So, is there any reason to use a language with static type checking if you're using unit testing as well? A somewhat similar question was asked here, but I'd like to get into more detail. What specific advantages, if any, does static type checking have over unit tests? A few issues like compiler optimizations and intellisense come to mind, but are there other solutions for those problems? Are there other advantages/disadvantages I haven't thought of?
There is one immutable fact about software quality.
If it can't compile, it can't ship
In this rule, statically typed languages will win over dynamically typed languages.
OK, yes, this rule is not immutable. Web apps can ship without compiling (I've deployed many test web apps that didn't compile). But what is fundamentally true is
The sooner you catch an error, the cheaper it is to fix
A statically typed language will catch real errors at one of the earliest possible moments in the software development cycle. A dynamic language will not. Unit testing, if you are thorough to a superhuman level, can take the place of a statically typed language.
However, why bother? There are a lot of incredibly smart people out there who have written an entire error-checking system for you in the form of a compiler. If you're concerned about catching errors sooner, use a statically typed language.
Please do not take this post as a bashing of dynamic languages. I use dynamic languages daily and love them. They are incredibly expressive and flexible and allow for incredibly fascinating programs. However, in the case of early error reporting, they do lose to statically typed languages.
For any reasonably sized project, you just cannot account for all situations with unit tests only.
So my answer is "no", and even if you manage to account for all situations, you've thereby defeated the whole purpose of using a dynamic language in the first place.
If you want to program type-safely, you are better off using a type-safe language.
I would think that automated unit testing is important for dynamically typed languages, but that doesn't mean it can replace static type checking in the context you describe. In fact, some of those who use dynamic typing might actually be using it because they do not want the hassle of constant type-safety checks.
The advantages dynamically typed languages offer over statically typed languages go far beyond testing, and type safety is merely one aspect. Programming styles and design also differ greatly between dynamically and statically typed languages.
Besides, unit tests written to rigorously enforce type safety would suggest that the software shouldn't be dynamically typed after all, or that the design should be implemented in a statically typed language rather than a dynamic one.
Having 100% code coverage doesn't mean you have fully tested your application. Consider the following code:
if (qty > 3)
{
applyShippingDiscount();
}
else
{
chargeFullAmountForShipping();
}
I can get 100% code coverage if I pump in values of qty = 1 and qty = 4.
Now imagine my business condition was that "...for orders of 3 or more items I am to apply a discount to the shipping costs...". Then I would need to write tests that work on the boundaries. So I would design tests where qty is 2, 3 and 4. I still have 100% coverage but, more importantly, I have found a bug in my logic.
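For instance, here is a self-contained reconstruction of that boundary check (the function and its values are made up to match the snippet above):
#include <cassert>
// Hypothetical reconstruction of the logic under test.
double shipping_cost(int qty) {
    const double full = 10.0;
    return qty > 3 ? full * 0.5 : full;   // bug: the rule says "3 or more"
}
int main() {
    assert(shipping_cost(2) == 10.0);  // below the boundary: full price
    assert(shipping_cost(4) == 5.0);   // above the boundary: discounted
    assert(shipping_cost(3) == 5.0);   // on the boundary: fails, exposing the bug
}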
And that is the problem I have with focusing on code coverage alone. I think that, at best, you end up with a situation where the developer creates some initial tests based on the business rules, and then, in order to drive up the coverage number, references the code itself when designing new test cases.
Manifest typing (which I suppose is what you mean) is a form of specification; unit testing is much weaker, since it only provides examples. The important difference is that a specification declares what has to hold in every case, while a test only covers examples. You can never be sure that your tests cover all the boundary conditions.
People also tend to forget the value of declared types as documentation. For example if a Java method returns a List<String>, then I instantly know what I get, no need to read documentation, test cases or even the method code itself. Similarly for parameters: if the type is declared then I know what the method expects.
The value of declaring the type of local variables is much lower, since in well-written code the scope of a variable should be small. You can still use static typing there, though: instead of declaring the type, you let the compiler infer it. Languages like Scala or even C# allow you to do just that.
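The same point can be made in C++ (a small illustration with invented names): the declared return type documents the function, while auto keeps short-lived locals terse.
#include <string>
#include <vector>
// The declared return type tells callers exactly what they get back.
std::vector<std::string> collect_names() { return {"alice", "bob"}; }
void use() {
    // Locals can rely on inference; their scope is small anyway.
    auto names = collect_names();
    for (const auto& n : names) { (void)n; }
}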
Some styles of testing get closer to a specification; e.g. QuickCheck, or its Scala variant ScalaCheck, generates tests based on specifications, trying to guess the important boundaries.
I would word it a different way--if you don't have a statically-typed language, you had better have very thorough unit tests if you plan on doing anything "real" with that code.
That said, static typing (or rather, explicit typing) has some significant benefits over unit tests that make me prefer it generally. It creates much more understandable APIs and allows for quick viewing of the "skeleton" of an application (i.e. the entry points to each module or section of code) in a way that is much more difficult with a dynamically-typed language.
To sum up: in my opinion, given solid, thorough unit tests, the choice between a dynamically-typed language and a statically-typed language is mostly one of taste. Some people prefer one; others prefer the other. Use the right tool for the job. But this doesn't mean they're identical--statically-typed languages will always have an edge in certain ways, and dynamically-typed languages will always have an edge in certain different ways. Unit tests go a long way towards minimizing the disadvantages of dynamically-typed languages, but they do not eliminate them completely.
No.
But that's not the most important question. The most important question is: does it matter that it can't?
Consider the purpose of static type checking: avoiding a class of code defects (bugs). However, this has to be weighed in the context of the larger domain of all code defects. What matters most is not a comparison along a narrow sliver but a comparison across the depth and breadth of code quality, ease of writing correct code, etc. If you can come up with a development style / process which enables your team to produce higher quality code more efficiently without static type checking, then it's worth it. This is true even in the case where you have holes in your testing that static type checking would catch.
I suppose it could, if you are very thorough. But why bother? If the language is already checking that static types are correct, there is no point in testing them (since you get that for free).
Also, if you are using a statically typed language with an IDE, the IDE can show you errors and warnings even before you compile and test. I am not certain there are any automated unit testing tools that can do the same.
Given all the benefits of dynamic, late-binding languages, I suppose that's one of the values offered by unit tests. You'll still need to code carefully and intentionally, but that's the number one requirement for any kind of coding IMHO. Being able to write clear and simple tests helps prove the clarity and simplicity of your design and your implementation. It also provides useful clues for those who read your code later. But I don't think I'd count on it to detect mismatched types; in practice I don't find that type checking really catches many real errors anyway. It's just not a kind of error I find occurring in real code, if you have a clear and simple coding style in the first place.
For JavaScript, I would expect that JSLint will find almost all type-checking issues, primarily by suggesting alternative coding styles that decrease your exposure.
Type checking helps enforce contracts between components in a system. Unit testing (as the name implies) verifies the internal logic of components.
For a single unit of code, I think unit testing really can make static type checking unnecessary. But in a complex system, automated tests cannot verify all the multitude of ways that different components of the system might interact. For this, the use of interfaces (which are, in a sense, a kind of "contract" between components) becomes a useful tool for reducing potential errors. And interfaces require compile-time type checking.
I really enjoy programming in dynamic languages, so I'm certainly not bashing dynamic typing. This is just a problem that recently occurred to me. Unfortunately I don't really have any experience in using a dynamic language for a large and complex system, so I'd be interested to hear from other people whether this problem is real, or merely theoretical.