How can I convert a big C project to VC++?

I'm manually re-writing the code.
I have a big C program with 50+ .c files and 20+ .h files.
I need to convert them into a class so I can run multiple instances in a single exe.
I have no experience converting a C project to C++. Is there any guidance to follow?
I have done some small research with Google and have the following plan:
rename each .c file to .cpp and compile; fix every implicit conversion that C++ rejects by making it an explicit cast
remove all static keywords (for file scope) and resolve global name conflicts
create a global header file for the class declaration (class FOO); move all function and variable definitions into the class as members
for the macros and consts defined in other header files, include them with the extern "C" keyword
rename all functions in the .cpp files to FOO::function
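Here is a minimal sketch of what the core of that plan (steps 2-3 and 5) amounts to for one hypothetical file; counter and step are invented examples, not names from any real project. Former file-scope statics become data members, so each instance carries its own copy of what used to be global state.

// foo.hpp -- hypothetical illustration of the conversion.
// Before, in foo.c, this state was shared by every caller:
//   static int counter;
//   static void step(void) { counter++; }
class FOO {
public:
    void step() { ++counter_; }            // was: static void step(void)
    int count() const { return counter_; }
private:
    int counter_ = 0;                      // was: static int counter; (file scope)
};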

It sounds like your plan is to convert 50+ files into a single C++ class and instantiate multiple instances of that class. This is, at best, a severe misuse of C++ classes. In practice, it's unlikely to work, because you will still have only one thread of execution: only one of those objects can run at a time, and every time you do an I/O operation (for example), everything comes to a halt until the I/O operation finishes.
I know nothing about your particular case, but in general, if I were approaching this problem, I would keep my existing code and run multiple processes instead. I'd also write a shim class that manages communication with those processes using Inter-Process Communication (IPC), such as UNIX sockets or named pipes.
If you still feel that multiple instances are the way to go, then carve out a tiny fraction of your current source code and port it over, so that you can understand the issues.
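To make the multi-process suggestion concrete, here is a minimal sketch of such a shim on POSIX, using a plain pipe via popen (on Windows you would reach for _popen or CreateProcess with named pipes instead). The ./worker program and its line-oriented protocol are purely hypothetical.

#include <cstdio>
#include <stdexcept>
#include <string>

// Each instance of the shim owns one worker process and talks to it
// over a pipe; N instances means N independent processes.
class WorkerShim {
public:
    WorkerShim() : pipe_(popen("./worker", "r")) {
        if (!pipe_) throw std::runtime_error("failed to start worker");
    }
    ~WorkerShim() { if (pipe_) pclose(pipe_); }

    // Read one line of output produced by the worker process.
    std::string read_line() {
        char buf[256];
        if (!fgets(buf, sizeof buf, pipe_)) return {};
        return std::string(buf);
    }

private:
    FILE* pipe_;
};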


Can a C header file be considered as an interface? [closed]

I am learning about architecture from Robert C. Martin's book Clean Architecture. One of the main rules emphasized throughout the book is the Dependency Inversion Principle (DIP), which states that source code dependencies must point only inwards, toward higher-level policies. Trying to translate this into the embedded domain, assume two components: a scheduler and a timer. The scheduler is the high-level policy; it relies on the low-level timer driver and needs to call the APIs get_current_time() and set_timeout(). I would simply split the module into an implementation file timer.c and a header (an interface?) timer.h, and scheduler.c could simply include timer.h to use these APIs. The book portrayed this scenario as a breaking of the dependency rule and implied that an interface between the two components should be introduced to break the dependency.
To imitate that in C, for example, timer_abstract could contain a generic structure with pointers to functions:
struct timer_drv {
    uint32 (*get_current_time)(void);
    void (*set_timeout)(uint32 t);
};
This looks like over-design to me. Isn't a simple header file sufficient? Can a C header file be considered an interface?
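(For concreteness, here is one way the sketch above could be completed; the hw_* names are hypothetical, and uint32 is assumed to be a project typedef for uint32_t. A concrete driver fills in the table, and the scheduler calls through the struct without ever naming the driver.)

#include <stdint.h>

typedef uint32_t uint32;   /* assumed project typedef */

struct timer_drv {
    uint32 (*get_current_time)(void);
    void (*set_timeout)(uint32 t);
};

/* A concrete driver fills in the table... */
static uint32 hw_get_current_time(void) { return 0; /* e.g., read a hardware counter */ }
static void hw_set_timeout(uint32 t) { (void)t; /* e.g., program the hardware timer */ }

static const struct timer_drv hw_timer = { hw_get_current_time, hw_set_timeout };

/* ...and the scheduler only ever sees the abstract struct. */
static void scheduler_tick(const struct timer_drv *timer) {
    if (timer->get_current_time() >= 100u)
        timer->set_timeout(100u);
}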
In computing, an "interface" is a common boundary across which two or more components or subsystems exchange information.
A header file in C or C++ is a text file that contains a set of declarations and (possibly) macros that can be inserted into a compilation unit (a separate unit of source code, such as a source file), allowing that compilation unit to use those declarations and macros. In other words, #include "headerfile" within a source file is replaced by the content of headerfile by the C or C++ preprocessor before subsequent compilation.
Based on these definitions, I would not describe a header file as an interface.
A header file may define data types, declare variables, and declare functions. Multiple source files may include that header, and each will be able to use the data types, variables, and functions that are declared in that header. One compilation unit may include that header, and then define some (or all) of the functions declared in the header.
However, types, variables, and functions need not be placed in a header file. A programmer who is determined enough can manually copy the declarations and macros into every source file that uses them, and never use a header file at all. A C or C++ compiler cannot tell the difference, because all the preprocessor does is text substitution.
The logical grouping of declarations and macros is actually what represents an interface, not the means by which information about the interface is made available to compilation units. A header file is simply one (optional) means by which a set of declarations and macros can be made available to compilation units.
Of course, a header file is often used in practice to avoid errors in using a set of declarations and macros, so it can help make the interface represented by those declarations and macros easier to manage. Every compilation unit that #includes a header file receives the same content (unless affected by other preprocessor macros). This is much less error-prone than the programmer manually copying declarations into every source file that needs them. It is also easier to maintain: editing a header file means all compilation units can be rebuilt with visibility of the changes, whereas manually updating declarations and macros in every source file can introduce errors, because programmers are error-prone - for example, they may edit the declarations inconsistently between source files.
I think the reason you would want an interface for a timer is indeed to break dependencies. Since the Scheduler uses the Timer, everywhere Scheduler.o is linked, Timer.o must be linked as well, provided you use scheduler symbols that depend on timer symbols.
If you had used an interface for Timer, no link from Scheduler.o to Timer.o (or from Scheduler.so to Timer.so, if you prefer) would be required, nor would it be useful. You create an instance of Timer at runtime, likely pass it to the constructor of Scheduler, and Timer.o gets linked in elsewhere.
Now why would that be useful? Unit testing is one example: you can pass a Timer stub class to Scheduler's constructor and link against TimerTestStub.o instead. You can see that this way of working does break dependencies: Scheduler.o still requires a Timer, but which one is no longer decided at build time for Scheduler.so; it is decided higher up, by whoever passes the Timer instance to Scheduler's constructor.
This is also very useful for reducing build-time dependencies when using libraries. The real trouble starts when a dependency chain forms: Scheduler requires Timer, Timer requires class X, class X requires class Y, class Y requires class Z...
This may still look acceptable, but remember that every one of those classes could live in a different library.
If you then want to use Scheduler, you are forced to drag in a ton of include-path settings and likely do a ton of linking.
You can break dependencies by exposing only the functionality of Scheduler you really need in its interface; of course, you can use multiple interfaces.
You should make your own demo: write 10 classes, put them in 10 shared libraries, and make sure every class requires 3 other classes out of those 10. Now include one of those class headers in your main.cpp and see what you need to do to get it to build properly.
Then you need to think about breaking those dependencies.
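As a sketch of the dependency-breaking described above (all names hypothetical): Scheduler depends only on an abstract Timer, production code injects the real driver, and a unit test links a stub instead, so Scheduler.o never names a concrete timer.

#include <cstdint>

// The abstract interface: Scheduler has no link-time dependency on
// any concrete timer. A real driver or a test stub is injected at runtime.
struct Timer {
    virtual ~Timer() = default;
    virtual uint32_t get_current_time() = 0;
    virtual void set_timeout(uint32_t t) = 0;
};

class Scheduler {
public:
    explicit Scheduler(Timer& timer) : timer_(timer) {}
    void run_once() {
        if (timer_.get_current_time() >= deadline_)
            timer_.set_timeout(deadline_ += 100);
    }
private:
    Timer& timer_;
    uint32_t deadline_ = 100;
};

// The stub, linked only into the unit-test binary (TimerTestStub.o).
struct TimerStub : Timer {
    uint32_t now = 0;
    uint32_t last_timeout = 0;
    uint32_t get_current_time() override { return now; }
    void set_timeout(uint32_t t) override { last_timeout = t; }
};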

How to properly specify which function goes from which file?

The question is about organizing your own code.
Let's say I have multiple *.cpp files with corresponding headers, and I use some functions from these in other parts of the program.
After some time passes, I may start to forget which header and .cpp a certain function comes from, and looking at a simple
func();
tells me absolutely nothing.
I can only think of using namespaces, so I could later write
Module::func();
Are there any other ways? I've heard that using many namespaces isn't good practice, and a bunch of my projects have more than 5-10 .cpp files and headers.
You can restructure your project along OOP lines.
Every .cpp file then represents a single class.
In any part of the program you can see which object a called method belongs to.
If changing to OOP is a problem, you can (and arguably should) use namespaces.
Namespaces can only affect compile-time performance; no other issues come with them.
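A minimal sketch of the namespace approach (the module name render and its function are hypothetical): declare the functions inside a namespace in the header, define them in the matching .cpp, and every call site then names its module.

// render.hpp (shown inline here for brevity)
namespace render {
    void draw_frame();
}

// render.cpp
namespace render {
    void draw_frame() { /* ... */ }
}

// main.cpp -- would #include "render.hpp"
int main() {
    render::draw_frame();   // the call site tells you where to look
}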

Precompile script into objects inside C++ application

I need to provide my users the ability to write mathematical computations into the program. I plan to have a simple text interface with a few buttons including those to validate the script grammar, save etc.
Here's where it gets interesting. The functions the user writes need to execute at multi-megabyte line speeds in a communications application. So I need the speed of a compiled language but the usability of a script; a fully interpreted language just won't cut it.
My idea is to precompile the saved user modules into objects at initialization of the C++ application. I could then use these objects to execute the code when called upon. Here are the workflows I have in mind:
1) Testing (initial writing) of a script: write code in the editor, save, compile into an object (checking the grammar), run with test I/O, edit the code.
2) Use of the code (normal operation of the application): load the script from file, compile the script into an object, run the object code, run the object code, run the object code, etc.
I've looked into several off-the-shelf interpreters but can't find what I'm looking for. I considered Java, as it is pretty fast, but I would need to load the Java virtual machine, which means passing objects between C and the virtual machine... the interface is the bottleneck here. I really need to create a native C++ object running C++ code if possible. I also need to be able to run the code effectively on multiple processors in a controlled manner.
I'm not looking for the whole explanation on how to pull this off, as I can do my own research. I've been stalled for a couple days here now, however, and I really need a place to start looking.
As a last resort, I will create my own scripting language to fulfill the need, but that seems a waste with all the great interpreters out there. I've also considered taking an existing open-source compiler and slicing it up for the functionality I need... just not saving the compiled results to disk... I don't know. I would prefer to use a mainstream language if possible... but that's not required.
Any help would be appreciated. I know the idea I have here is not run-of-the-mill, but someone has to have done it before.
Thanks!
P.S.
One thought that just occurred to me while writing this: what about using a true C compiler to create object code, save it to disk as a DLL, then reload and run it inside "my" code? Can you do that with MS Visual Studio? I need to look at the licensing of the compiler... and at how to reload the library dynamically while the main application continues to run... hmmmmm. I could then just group the "functions" created by the user into library groups. OK, that's enough of this particular brain dump...
A possible solution could be to use gcc (MinGW, since you are on Windows) and build a DLL out of your user-defined code. The DLL should export just one function. You can use the Win32 API to handle the DLL (LoadLibrary/GetProcAddress, etc.). At the end of this job you have a C-style function pointer. The problem now is the arguments: if your computation has just one parameter you can do a cast to double (*func)(double), but if you have many parameters you need to match them.
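A minimal sketch of the Win32 side (the names user_module.dll and user_func are hypothetical, and the DLL must export the function with C linkage so the name is not mangled):

#include <windows.h>
#include <cstdio>

typedef double (*user_fn)(double);

int main() {
    // Load the DLL produced from the user's code.
    HMODULE lib = LoadLibraryA("user_module.dll");
    if (!lib) return 1;

    // Resolve the single exported entry point and call it.
    user_fn f = (user_fn)GetProcAddress(lib, "user_func");
    if (f) printf("%f\n", f(2.0));

    FreeLibrary(lib);   // unload before recompiling/reloading
    return 0;
}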
I think I've found a way to do this using standard C.
1) Standard C needs to be used because, when it is compiled into a DLL, the resulting interface is cross-compatible with multiple compilers. I plan to do my primary development with MS Visual Studio and compile objects in my application using gcc (the Windows version).
2) I will expose certain variables to the user (inputs and outputs) and standardize them across units. This allows multiple units to be developed with the same interface.
3) The user will only create the inside of the function, using standard C syntax and grammar. I will then wrap that function with text that fully defines the function and its environment (remember those variables I intend to expose?). I can also group multiple functions into a single executable unit (DLL) using name parameters. (A sketch of such a wrapper appears after this list.)
4) When the user wishes to test their function, I dump the DLL from memory, compile their code with my wrappers in gcc, and then reload the DLL into memory and run it. I would let them define inputs and outputs for testing.
5) Once the test/create step is complete, I have a compiled library that can be loaded at run time and handled via pointers. The inputs and outputs are standardized, so I always know what my I/O is.
6) The only problem with standardized I/O is that some of the inputs and outputs are likely to go unused. I need to see if I can put default values in or something.
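Here is a sketch of what such a generated wrapper might look like; everything in it (user_func, the I/O structs, the default values) is hypothetical, and the marked region is where the user's saved code would be pasted.

/* Generated by the application around the user's code. */
#ifdef _WIN32
#define EXPORT __declspec(dllexport)
#else
#define EXPORT
#endif

/* Standardized I/O: fixed names the user is told about (step 2). */
typedef struct { double A, B, C; } inputs_t;
typedef struct { double X, Y, Z; } outputs_t;

#ifdef __cplusplus
extern "C"
#endif
EXPORT void user_func(const inputs_t *in, outputs_t *out)
{
    double A = in->A, B = in->B, C = in->C;   /* expose the inputs */
    double X = 0, Y = 0, Z = 0;               /* defaults for unused outputs (step 6) */

    /* --- the user's code is pasted here verbatim --- */
    X = A * B + C;
    /* ------------------------------------------------ */

    out->X = X; out->Y = Y; out->Z = Z;       /* collect the outputs */
}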
So, to sum up:
Think of an app with a text box and a few buttons. You are told that your inputs are named A, B, and C and that your outputs are X, Y, and Z, of specified types. You then write a function using standard C code, with functions from the specified libraries (I'm thinking math, etc.).
So now you're done... you see a few boxes below for defining your inputs. You fill them in and hit the TEST button. This wraps your code in a function context, dumps the existing DLL from memory (if it exists), and compiles your code along with any other functions in the same group (another parameter you could define; to the user, basically just a name). It then runs the function through a function pointer, using the inputs defined in the UI. The outputs are sent back so the user can determine whether their function works. Any compilation errors are also shown to the user.
Now it's time to run for real. Of course I've kept track of which functions are where, so I dynamically open the DLL and load all the functions into memory via function pointers. I start shoving data into one side, and the functions give me the answers I need. There would be some overhead to track I/O and to make sure the functions are called in the right order, but the execution would run at compiled machine-code speed... which is my primary requirement.
Now... I have explained what I think will work, in two different ways. Can you think of anything that would keep this from working, or perhaps any advice/gotchas/lessons learned that would help me out? Anything from the type of interface, to tips on dynamically loading DLLs in this manner, to using the gcc compiler this way... would be most helpful.
Thanks!

Is it better to define global (extern) variables in a single header, or in their respective header files?

I'm working on a small software project which I hope to release in the future as open-source, so I was hoping to gather opinions on what the best currently accepted practices are regarding this issue.
The application itself is procedural, not object-oriented (there is no need for me to encapsulate the rendering functions or event-handling functions in a class), but some aspects of the application are heavily object-oriented (like the scripting console, which relies heavily on OO). The OO aspects of the code have the standard object.cpp and object.h files.
For the procedural part, I have my code split up into various files (e.g. main.cpp, render.cpp, events.cpp), each of which might have some global variables specific to that file. I also have a corresponding header file for each, declaring all the functions and variables (as extern) that I want to be accessible from other files. Then I just #include the right header when I need access to that function/variable from another source file.
I realized today that I also have another option: create a single globals.h header file, where I declare (as extern again) all the global variables and functions that are needed outside a specific source file. Then I could just #include this one file in all of the source files (instead of each individual header file like I do now). Also, using this method, if I needed to promote a variable/function from local to global, I could just add the entry to that header file.
The question: is it better practice to use a corresponding header file for every single .cpp file (and declare the variables/functions I want globally accessible in those headers), or to use a single header file declaring all globally accessible variables/functions?
Another quick update: most (but not all) of the globals are used this way because my application is multithreaded.
To me it is much better to have a header file corresponding to each implementation (.c or .cpp) file. You should think of your classes, structures, and functions as modules, and if you split your implementation, it is logical to split your declarations too.
Another thing is that when you modify a header file, every file that includes it has to be recompiled at the next build. And I can tell you that can take a long time. You can avoid rebuilding everything by properly splitting your declarations.
I would recommend having more headers and putting less in each of them. You do end up with a litany of includes, but that is simple to understand, and simple to edit if it's wrong.
Having one big globals header is harder to cope with if something goes wacky. If you did have to change something, that change is potentially far-reaching and high-risk.
More code isn't a bad thing in this case.
A minor point is that your compile times will increase super-linearly the more you put in that one big header, since each and every file has to process it. On an embedded project it is probably less of a worry, but in general having a lot in headers will start to weigh you down.
It's better to put them all in one file and not compile that file at all. If you have global variables you should be rethinking your design, especially if you're doing applications programming and not low-level systems programming.
As I've said in the comments below the question, the first thing to do would be to try and eliminate all global data. If this is not possible, rather than one big header, or throwing externs into each class' header, I'd follow a third approach.
Say your Event class needs to have a global instance. If you declare the global instance in event.cpp and extern it in event.hpp, then this essentially makes these files non-reusable anywhere else. Throwing it into a globals.cpp and globals.hpp pair is not ideal either, because every time that global header is modified, chances are your entire project will be rebuilt, since the header is included by everyone.
So the third option is to create an accompanying header and source file for each class that needs to have a global instance. So you'd declare the Event global instance in event_g.cpp and extern it in event_g.hpp.
Yes, it is ugly, and yes, it is tedious. But there's nothing pretty about global data to begin with.
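A sketch of that third approach, with the three files collapsed into one listing for brevity (Event and g_event are placeholder names):

// event.hpp -- the reusable class, with no global baggage.
class Event { /* ... */ };

// event_g.hpp -- only this pair knows a global instance exists.
// (in real files: an include guard plus #include "event.hpp")
extern Event g_event;

// event_g.cpp -- the single definition, compiled only where the global is wanted.
Event g_event;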

Multiple classes in a header file vs. a single header file per class

For whatever reason, our company has a coding guideline that states:
Each class shall have its own header and implementation file.
So if we wrote a class called MyString, we would need an associated MyString.h and MyString.cxx.
Does anyone else do this? Has anyone seen any compile-time performance repercussions as a result? Does 5000 classes in 10000 files compile just as quickly as 5000 classes in 2500 files? If not, is the difference noticeable?
[We code C++ and use GCC 3.4.4 as our everyday compiler]
The term here is translation unit, and you really want to (if possible) have one class per translation unit, i.e., one class implementation per .cpp file, with a corresponding .h file of the same name.
It's usually more efficient (from a compile/link standpoint) to do things this way, especially if you're doing things like incremental linking. The idea is that translation units are isolated, so that when one translation unit changes, you don't have to rebuild very much, as you would if you started lumping many abstractions into a single translation unit.
Also, you'll find that many errors/diagnostics are reported via file name ("Error in Myclass.cpp, line 22"), and it helps if there's a one-to-one correspondence between files and classes. (Or I suppose you could call it a two-to-one correspondence.)
Overwhelmed by thousands of lines of code?
Having one set of header/source files per class in a directory can seem like overkill. And if the number of classes heads toward 100 or 1000, it can even be frightening.
But having played with sources that follow the philosophy "let's put everything together", the conclusion is that only the person who wrote the file has any hope of not getting lost inside it. Even with an IDE, it is easy to miss things, because when you're playing with a source file of 20,000 lines, you just close your mind to anything not exactly related to your problem.
Real-life example: the class hierarchy defined in those thousand-line sources closed itself into a diamond inheritance, and some methods were overridden in child classes by methods with exactly the same code. This was easily overlooked (who wants to explore/check a 20,000-line source file?), and when the original method was changed (a bug fix), the effect was not as universal as expected.
Dependencies becoming circular?
I had this problem with templated code, but I saw similar problems with regular C++ and C code.
Breaking down your sources into one header per struct/class lets you:
speed up compilation, because you can use forward declarations of symbols instead of including whole definitions
have circular dependencies between classes (§) (i.e. class A has a pointer to B, and B has a pointer to A), as sketched after this list
In source-controlled code, class dependencies could otherwise lead to classes regularly being moved up and down the file just to make the header compile. You don't want to study the evolution of such moves when comparing the same file across versions.
Having separate headers makes the code more modular and faster to compile, and makes it easier to study its evolution through diffs between versions.
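A sketch of both points (file names hypothetical): a forward declaration is enough for a pointer or reference member, so neither header needs to include the other, and the cycle is broken at the header level.

// a.hpp
class B;        // forward declaration: no need to include b.hpp here
class A {
    B* b_;      // a pointer only needs the name, not the full definition
};

// b.hpp
class A;        // and the other way around
class B {
    A* a_;
};

// a.cpp and b.cpp would include both headers wherever the methods
// actually dereference these pointers.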
For my template program, I had to divide my headers into two files: the .HPP file containing the template class declaration/definition, and the .INL file containing the definitions of that class's methods.
Putting all this code inside one and only one header would mean putting the class definitions at the beginning of the file and the method definitions at the end.
And then, if someone needed only a small part of the code, with the one-header-only solution they would still have to pay for the slower compilation.
(§) Note that you can have circular dependencies between classes if you know which class owns which. This is a discussion about classes having knowledge of the existence of other classes, not about the shared_ptr circular-dependency antipattern.
One last word: headers should be self-sufficient.
One thing, though, must be respected by a solution of multiple headers and multiple sources:
whichever header you include, your source must compile cleanly.
Each header should be self-sufficient. You're supposed to be developing code, not treasure-hunting by grepping your 10,000+ file project to find which header defines the symbol in the 1,000-line header you need to include just because of one enum.
This means that each header either defines or forward-declares all the symbols it uses, or includes all the needed headers (and only the needed headers).
Question about circular dependencies
underscore_d asks:
Can you explain how using separate headers makes any difference to circular dependencies? I don't think it does. We can trivially create a circular dependency even if both classes are fully declared in the same header, simply by forward-declaring one in advance before we declare a handle to it in the other. Everything else seems to be great points, but the idea that separate headers facilitate circular dependencies seems way off
Let's say you have two class templates, A and B.
Let's say the definition of class A (resp. B) has a pointer to B (resp. A). Let's also say the methods of class A (resp. B) actually call methods of B (resp. A).
You have a circular dependency both in the definitions of the classes and in the implementations of their methods.
If A and B were normal classes, and A's and B's methods were in .CPP files, there would be no problem: you would use a forward declaration, have a header for each class definition, and then each CPP would include both HPPs.
But as you have templates, you actually have to reproduce the pattern above, but with headers only.
This means:
a definition header, A.def.hpp and B.def.hpp
an implementation header, A.inl.hpp and B.inl.hpp
for convenience, a "naive" header, A.hpp and B.hpp
Each header has the following traits:
In A.def.hpp (resp. B.def.hpp), you have a forward declaration of class B (resp. A), which enables you to declare a pointer/reference to that class.
A.inl.hpp (resp. B.inl.hpp) includes both A.def.hpp and B.def.hpp, which enables the methods of A (resp. B) to use the class B (resp. A).
A.hpp (resp. B.hpp) directly includes both A.def.hpp and A.inl.hpp (resp. B.def.hpp and B.inl.hpp).
Of course, all headers need to be self-sufficient and protected by header guards.
The naive user will include A.hpp and/or B.hpp, thus ignoring the whole mess.
And having that organization means the library writer can solve the circular dependencies between A and B while keeping both classes in separate files, easy to navigate once you understand the scheme.
Please note that this was an edge case (two templates knowing each other). I expect most code not to need that trick.
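A compressed sketch of the scheme for class A (B's three headers mirror it). The listing stands in for three separate files, so the #include directives refer to the files being described; the members are hypothetical.

// A.def.hpp -- the class template definition only.
#ifndef A_DEF_HPP
#define A_DEF_HPP
template <typename T> class B;     // forward declaration is enough here
template <typename T>
class A {
public:
    void call_b();
    B<T>* b = nullptr;             // a pointer needs only the name of B
};
#endif

// A.inl.hpp -- the method bodies, which need to see B in full.
#ifndef A_INL_HPP
#define A_INL_HPP
#include "A.def.hpp"
#include "B.def.hpp"               // full definition of B available from here on
template <typename T>
void A<T>::call_b() { /* may call methods through b here */ }
#endif

// A.hpp -- the convenience header the naive user includes.
#ifndef A_HPP
#define A_HPP
#include "A.def.hpp"
#include "A.inl.hpp"
#endif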
We do that at work; it's just easier to find stuff if the class and its files have the same name. As for performance, you really shouldn't have 5000 classes in a single project. If you do, some refactoring might be in order.
That said, there are instances when we have multiple classes in one file: when it's just a private helper class for the main class of the file.
+1 for separation. I just came onto a project where some classes are in files with a different name, or lumped in with another class, and it is impossible to find them in a quick and efficient manner. You can throw more resources at a build - you can't make up lost programmer time because (s)he can't find the right file to edit.
In addition to simply being "clearer", separating classes into separate files makes it easier for multiple developers not to step on each other's toes. There will be less merging when it comes time to commit changes to your version-control tool.
Most places where I have worked have followed this practice. I've actually written coding standards for BAE (Aust.) along with the reasons why, instead of just carving something in stone with no real justification.
Concerning your question about source files: it's not so much compile time as being able to find the relevant code snippet in the first place. Not everyone is using an IDE. And knowing that you just look for MyClass.h and MyClass.cpp really saves time compared to running "grep MyClass *.(h|cpp)" over a bunch of files and then filtering out the #include MyClass.h statements...
Mind you, there are workarounds for the impact of large numbers of source files on compile times. See Large Scale C++ Software Design by John Lakos for an interesting discussion.
You might also like to read Code Complete by Steve McConnell for an excellent chapter on coding guidelines. Actually, this book is a great read that I keep coming back to regularly.
N.B. You need the first edition of Code Complete, which is easily available online; the interesting section on coding and naming guidelines didn't make it into Code Complete 2.
It's common practice to do this, especially to be able to include the .h in the files that need it. Of course performance is affected, but try not to think about this problem until it arises :).
It's better to start with the files separated and afterwards try to merge the .h files that are commonly used together, to improve performance if you really need to. It all comes down to dependencies between files, and this is very specific to each project.
The best practice, as others have said, is to place each class in its own translation unit from a code maintenance and understandability perspective. However on large scale systems this is sometimes not advisable - see the section entitled "Make Those Source Files Bigger" in this article by Bruce Dawson for a discussion of the tradeoffs.
The same rule applies here, but it notes a few exceptions where multiple classes per file are allowed, like so:
Inheritance trees
Classes that are only used within a very limited scope
Some utilities that are simply placed in a general 'utils.h'
It is very helpful to have only one class per file, but if you do your building via bulk-build files that include all the individual C++ files, it makes for faster compilation, since startup time is relatively large for many compilers.
I found these guidelines particularly useful when it comes to header files:
http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Header_Files
Two words: Ockham's Razor. Keep one class per file, with the corresponding header in a separate file. If you do otherwise, like keeping a piece of functionality per file, then you have to create all kinds of rules about what constitutes a piece of functionality. There is much more to gain by keeping one class per file. And even half-decent tools can handle large quantities of files. Keep it simple, my friend.
I'm surprised that almost everyone is in favor of having one file per class. The problem with that is that, in the age of 'refactoring', one may have a hard time keeping the file and class names in sync. Every time you change a class name, you then have to change the file name too, which means you also have to make a change everywhere the file is included.
I personally group related classes into a single file and then give such a file a meaningful name that won't have to change even if a class name changes. Having fewer files also makes scrolling through a file tree easier.
I use Visual Studio on Windows and Eclipse CDT on Linux, and both have shortcut keys that take you straight to a class declaration, so finding a class declaration is easy and quick.
Having said that, I think once a project is completed, or its structure has 'solidified', and name changes become rare, it may make sense to have one class per file. I wish there were a tool that could extract classes and place them in distinct .h and .cpp files. But I don't see this as essential.
The choice also depends on the type of project one works on. In my opinion the issue doesn't deserve a black and white answer since either choice has pros and cons.