C/C++ huge header/source

In my C project, I am reading data from an obj file and an image file for opengl. All the data is combined into 1 header file.
Example (pseudocode):
vertices = {
0 , 2, 4,
....
};
normals = {
0, 0, 0,
....
};
texture_pixels = {
0, 0, 0,
...
};
The thing is that all this data adds up to a 15 MB header. Is it a bad idea to have this massive header? Will it cause any issues in my program?

Whether a big header file is better than other approaches depends on the application.
The header file is processed at compile time. If the file is infrequently compiled, or its compilation time is quick enough to be acceptable, there is no problem.
If the header file is frequently updated (say, more than once per day) even in its deployed configuration, perhaps the program could be rearchitected to read the equivalent data from a data file on the network, an SD card, a disk, or what-have-you.
Data files have their own weaknesses:
They are a separate piece apart from the program executable which may need to be kept in sync.
The file format is subject to big vs. little endian issues, unless it is coded in some character format (like XML).
How should the program find the data? Command line parameter, hard coded path, etc.
If the file cannot be found, what to do?
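One way to sidestep the endianness issue while keeping a binary data file is to fix the byte order in the file format and decode values byte by byte. A minimal sketch, assuming a hypothetical format that stores 32-bit values in little-endian order; the name read_u32_le is illustrative and does not come from the answer:

```c
#include <stdint.h>

/* Decode a 32-bit unsigned value stored little-endian (least significant
   byte first), regardless of the host CPU's byte order.
   Hypothetical helper: the file format is an assumption for illustration. */
uint32_t read_u32_le(const unsigned char *bytes)
{
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}
```

Because the shifts operate on values rather than on memory layout, the same code gives the same result on big-endian and little-endian machines.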

Usually you declare types and function prototypes in headers, and the only function bodies there should be inline or static functions.
Header files are merged into the source during preprocessing. This means a large header is copied into every source file that includes it.
It is not right to define variables in a header file. If you define a variable in a header and include it from several source files, you'll get an "already defined" link error, exactly as if you had defined two global variables with the same name.
If the header file is included only once, it is exactly like putting its contents in the source file.
In any other case the large data should be located in a source file, and the header should contain only the extern declarations of the variables.
The compilation time for a source file depends on its size after preprocessing. Therefore, reducing the file size by moving parts into headers won't improve compilation time. If you divide it into several source files, each part is compiled separately. This slows a full build a bit, but when you change one part the other parts don't need to be recompiled, so it is sometimes better.
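The extern arrangement described above might look like the following. This is a sketch only: the file names model_data.h and model_data.c, the float element type, and the vertex_count variable are assumptions for illustration, and both parts are shown in one listing so that it compiles as a single file.

```c
#include <stddef.h>

/* ---- would live in model_data.h (hypothetical name) ---- */
/* Declarations only: every file that includes the header sees these,
   but no storage is allocated here. */
extern const float vertices[];
extern const size_t vertex_count;

/* ---- would live in model_data.c (hypothetical name) ---- */
/* The single definition; the large initializer is compiled once. */
const float vertices[] = { 0.0f, 2.0f, 4.0f };
const size_t vertex_count = sizeof vertices / sizeof vertices[0];
```

With this split, editing the data only recompiles model_data.c, and any number of source files can include the header without causing duplicate definitions.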

Related

Including Source File in Another Source

I have a large static const byte array that I will be hard-coding. This ends up as about 500 lines of wall-of-text code that is a bit of an eyesore among the normal flow of the file. In addition, the byte contents of the array would likely be generated by a script.
static const uint8_t largeArray[0x4000] =
{
// About 500 lines of content here.
};
In this situation, is it acceptable style to create another .c file and include it in my original source file, so that the 500 lines turn into one? In general I despise including a .c in another .c, but I'm not sure what the recommended practice is for situations like this.
static const uint8_t largeArray[0x4000] =
{
#include "ArrayContents.c" // One line, but not nice file structure.
};

It is acceptable, and it typically happens where the large array is generated by some other process. You would usually want to give the file a different suffix, such as .hc instead of .c, so that people are not confused: it is neither a header file nor a source file in its own right.
An alternative is to put the content in a different .c file, containing just:
const uint8_t largeArray[0x4000] =
{
// About 500 lines of content here.
};
and then declare it extern in the file where you use it, e.g.:
extern const uint8_t largeArray[0x4000];

What is the difference between including a .c file and a .h file

A lot of the time when I look at other people's code, I see some including a .h file and some including a .c/.cpp file. What is the difference?

It depends on what is in the file(s).
The #include preprocessor directive simply inserts the referenced file at that point in the original file.
So what the actual compiler stage (which runs after the preprocessor) sees is the result of all that inserting.
Header files are generally designed and intended to be used via #include. Source files are not, but it sometimes makes sense. For instance when you have a C file containing just a definition and an initializer:
const uint8_t image[] = { 128, 128, 0, 0, 0, 0, ... lots more ... };
Then it makes sense to make this available to some piece of code by using #include. It's a C file since it actually defines (not just declares) a variable. Perhaps it's kept in its own file since the image is converted into C source from some other (image) format used for editing.
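The conversion step mentioned above (turning image bytes into C source) could be done by a small generator. The following is a minimal sketch of the formatting part only; the function name format_c_bytes and its interface are hypothetical, and a real tool would also read the image file and write the output to disk.

```c
#include <stdio.h>
#include <stddef.h>

/* Write `count` bytes as a comma-separated list of hex literals into `out`,
   suitable for use inside an array initializer.
   Returns the number of characters written, or -1 if `out` is too small.
   Hypothetical helper for illustration only. */
int format_c_bytes(const unsigned char *data, size_t count,
                   char *out, size_t out_size)
{
    size_t used = 0;
    if (out_size == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        int n = snprintf(out + used, out_size - used, "%s0x%02x",
                         i == 0 ? "" : ", ", (unsigned)data[i]);
        if (n < 0 || (size_t)n >= out_size - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

The generated text can then be wrapped in a definition such as the image array above, or written to a file that is pulled in with #include.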

.h files are called header files, they should not contain any code (unless it happens to contain information about a C++ templated object). They typically contain function prototypes, typedefs, #define statements that are used by the source files that include them. .c files are the source files. They typically contain the source code implementation of the functions that were prototyped in the appropriate header file.
Source: http://cboard.cprogramming.com/c-programming/60805-difference-between-h-c-files.html

You can look at the GCC website (https://gcc.gnu.org/onlinedocs/gcc/Invoking-G_002b_002b.html), which gives a good summary of the file extensions that you can use for C/C++:
C++ source files conventionally use one of the suffixes ‘.C’, ‘.cc’, ‘.cpp’, ‘.CPP’, ‘.c++’, ‘.cp’, or ‘.cxx’; C++ header files often use ‘.hh’, ‘.hpp’, ‘.H’, or (for shared template code) ‘.tcc’; and preprocessed C++ files use the suffix ‘.ii’. GCC recognizes files with these names and compiles them as C++ programs even if you call the compiler the same way as for compiling C programs (usually with the name gcc).

Including a header file with declarations is the main, recommended method, used almost everywhere, for keeping declarations consistent across a project. Including another source file is a different (very rare) kind of beast, and it is useful and possible under specific conditions:
There is a reason to split the code into separate source files even though it must be compiled as a single module. For example, there are different versions of some functions that shouldn't be visible from other modules, so they are declared static, and which version is included is controlled by compile options. Another reason is size and/or maintenance concerns.
The included file isn't compiled by itself as a project module, so its exported definitions don't conflict with the module into which it is included.
Here, I use the terms definition and declaration in the sense that the following are declarations:
extern int qq;
void f(int);
#define MYDATATYPE double
and the following are definitions:
int qq; // here, the variable is allocated and exported
void f(int x) { printf("%d\n", x); } // the same for function
(Also, declarations include C++ methods with bodies declared inside their class definition.)
Anyway, cases where another .c/.cxx/etc. file is included into a source file are very confusing and should be avoided unless there is a very real need. Sometimes a specific suffix (e.g. .tpl) is used for such files, to avoid confusing the reader.

Splitting .cpp files without code changes

I have a .cpp that's getting rather large, and for easy management I'd like to split it into a few files. However, there are numerous globals, and I'd like to avoid the upkeep of managing a bunch of extern declarations across different files. Is there a way to have multiple .cpp files act as a single file? In essence, I'd like a way to divide the code without the division being recognized by the compiler.

Is there a way to have multiple .cpp files act as a single file?
Yes. That is the definition of #include. When you #include a file, you make a textual substitution of the included file in place of the #include directive. Thus, multiple included files act together to form one translation unit.
In your case, chop the file into several bits. Do this exactly -- do not add or detract any lines of text. Do not add header guards or anything else. You may break your files at almost any convenient location. The limitations are: the break must not occur inside a comment, nor inside a string, and it must occur at the end of a logical line.
Name the newly-created partial files according to some convention. They are not fully-formed translation units, so don't name them *.cpp. They are not proper header files, so don't name them *.h. Rather, they are partially-complete translation units. Perhaps you could name them *.pcpp.
As for the basename, choose the original file name, with a sequentially-numbered suffix: MyProg01.pcpp, MyProg02.pcpp, etc.
Finally, replace your original file with a series of #include statements:
#include "MyProg01.pcpp"
#include "MyProg02.pcpp"
#include "MyProg03.pcpp"

Of course, you can always just #include the various CPP-files into one master file which is the one that the compiler sees. It's a very bad idea though, and you will eventually get into headaches far worse than refactoring the file properly.

Whilst you can define the same set of globals in many cpp files, you will get a separate instance of each as the compiler compiles each file, and these will then fail to link when they are combined.
The only answer is to put all your globals in their own file, then cut and paste them into a header file that contains extern declarations (this can easily be automated, but I find using the arrow keys to just paste 'extern' in front of them is quick and simple).
You could refactor everything, but often it's not worth the effort (except when you need to change something for other reasons).
You could try splitting the files, and then using the compiler to tell you which globals are needed by each new file, and re-introducing just those directly into each file, keeping the true globals separately.
If you don't want to do this, just #include the cpp files.

How to minimize compilation time in C++

I've written a script that generates a header file with constants such as version, SVN tag, and build number. Then, I have a class that creates a string with this information.
My problem is the following: because the file is regenerated on every build, the compiler detects that the header has changed and forces the recompilation of a large number of files. I guess the problem is where the header sits: my project is a library, and the generated header has to be part of the "interface to the world" header file (it must be public).
I need some advice to minimize this compilation time or to reduce the files forced to recompile.

In the header write something like:
extern const char *VERSION;
extern const char *TAG;
extern const char *BUILD_DATE;
and create a .c (or .cpp) file that will contain
const char *VERSION = "0.13";
const char *TAG = "v_0_13";
const char *BUILD_DATE = "2011-02-02 11:19 UTC+0100";
If your script updates the .c file, only that file will have to be recompiled, but the files that include your header won't.

Generate the constants in the implementation file.
Make sure the header doesn't get changed by your script.

The easiest way to solve this is to not make those constants in a header file. Instead, make functions in the header file which get these values. Then place the values themselves in a small cpp file which implements those functions. (Or place them in a header ONLY included in that cpp file). When you recompile, only that one file will need to be recompiled.
Also, you could look into distcc to speed up compilation if you have a few machines to spare.
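The accessor-function approach could be sketched as follows. The file names version.h and version.c and the function names are assumptions for illustration; the string values are simply reused from the example in the earlier answer. Both parts are shown in one listing so that it compiles as a single file.

```c
/* ---- would live in the public header, e.g. version.h (hypothetical) ---- */
/* Only prototypes: this header never changes when the build number does. */
const char *get_version(void);
const char *get_build_date(void);

/* ---- would live in version.c, the only file the script regenerates ---- */
static const char VERSION[] = "0.13";
static const char BUILD_DATE[] = "2011-02-02 11:19 UTC+0100";

const char *get_version(void)    { return VERSION; }
const char *get_build_date(void) { return BUILD_DATE; }
```

Since callers depend only on the prototypes, regenerating version.c triggers one recompilation plus a relink.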

If you're using gcc, you can try ccache. It caches object files based on a hash of the preprocessed output, so it will not recompile unless an actual change occurred.

Another way is to declare the constant values as, for example, extern const double PI; in a header such as "my_constants.h" and add one cpp file to the project with contents like:
#include "my_constants.h"
const double PI = 3.1415926535;
Then the actual values will only be compiled once, and changing a value only requires compiling that single file and linking the project again.

If you want to keep it as a single public header, you could add a pre-build step that takes your public header file and
filters out the version details, either removing them or replacing them with a fixed string, to a temp copy of the file
moves this to an internal version of your header only if it has changed, i.e. don't copy the file (+ update the timestamp) unless something other than the version has changed
then build your precompiled headers from this internal header file. You can still use the public header with the version details for the source files that need the version.
There's a moveifchanged script in the GCC sources you can borrow for this if you're on unix, or you can rig something up using fc in a batch file on Windows.

Object oriented solution:
Generally, you should put those often-refreshed constants into a cpp file, not an h file. Put them, for example, into a class. If you already have a class that creates a string from them and publishes it through a method, I'd put all those constants in the same cpp and add some public methods to access them from other source files.

Should every C or C++ file have an associated header file?

Should every .C or .cpp file have a header (.h) file for it?
Suppose there are the following C files:
Main.C
Func1.C
Func2.C
Func3.C
where main() is in Main.C file. Should there be four header files
Main.h
Func1.h
Func2.h
Func3.h
Or should there be only one header file for all .C files?
What is a better approach?

For a start, it would be unusual to have a main.h since there's usually nothing that needs to be exposed to the other compilation units at compile time. The main() function itself needs to be exposed for the linker or start-up code but they don't use header files.
You can have either one header file per C file or, more likely in my opinion, a header file for a related group of C files.
One example of that is if you have a BTree implementation and you've put add, delete, search and so on in their own C files to minimise recompilation when the code changes.
It doesn't really make sense in that case to have separate header files for each C file, as the header is the API. In other words, it's the view of the library as seen by the user. People who use your code generally care very little about how you've structured your source code, they just want to be able to write as little code as possible to use it.
Forcing them to include multiple distinct header files just so they can create, insert into, delete from, and search, a tree, is likely to have them questioning your sanity :-)
You would be better off with one btree.h file and a single btree.lib file containing all of the BTree object files that were built from the individual C files.
Another example can be found in the standard C headers.
We don't know for certain whether there are multiple C files for all the stdio.h functions (that's how I'd do it but it's not the only way) but, even if there were, they're treated as a unit in terms of the API.
You don't have to include stdio_printf.h, stdio_fgets.h and so on - there's a single stdio.h for the standard I/O part of the C runtime library.

Header files are not mandatory.
#include simply copies and pastes whatever file is included (including .c source files).
Commonly used in real-life projects are global header files like config.h and constants.h that contain commonly used information such as compile-time flags and project-wide constants.
A good design of a library API would be to expose an official interface with one set of header files and use an internal set of header files for implementation with all the details. This adds a nice extra layer of abstraction to a C library without adding unnecessary bloat.
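A common way to get this split in C is an opaque type: the public header exposes only a type name and functions, while the struct layout stays in an internal header. A minimal sketch, assuming a hypothetical Counter library (all names here are invented for illustration), with the three files shown in one listing so that it compiles as a single file:

```c
#include <stdlib.h>

/* ---- public header, e.g. counter.h (hypothetical) ---- */
/* Users see only an opaque type and the functions that operate on it. */
typedef struct Counter Counter;
Counter *counter_create(void);
void counter_increment(Counter *c);
int counter_value(const Counter *c);
void counter_destroy(Counter *c);

/* ---- internal header, e.g. counter_internal.h (hypothetical) ---- */
/* The layout is visible only to the library's own source files. */
struct Counter {
    int value;
};

/* ---- implementation, e.g. counter.c ---- */
Counter *counter_create(void)
{
    Counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->value = 0;
    return c;
}

void counter_increment(Counter *c) { c->value++; }
int counter_value(const Counter *c) { return c->value; }
void counter_destroy(Counter *c) { free(c); }
```

Client code can hold and pass Counter pointers but cannot touch the fields, so the internal layout can change without breaking users of the public header.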
Use common sense. C/C++ is not really for the ones without it.

I used to follow the "it depends" trend until I realized that consistency, uniformity and simplicity are more important than saving the effort to create a file, and that "standards are good even when they are bad".
What I mean is the following: a .cpp/.h pair of files is pretty much what all "modules" end up as anyway. Making the existence of both a requirement saves a lot of confusion and bad engineering.
For instance, when I see some interface of something in a header file, I know exactly where to search for / place its implementation. Conversely, if I need to expose the interface of something that was previously hidden in .cpp file (e.g. static function becoming global), I know exactly where to put it.
I've seen too many bad consequences of not following this simple rule. Unnecessary inline functions, breaking any kind of rules about encapsulation, (non)separation of the interface and implementation, misplaced code, to name a few -- all due to the fact that the appropriate sibling header or cpp file was never added.
So: always define both .h and .c files. Make it a standard, follow it, and safely rely on it. Life is much simpler this way, and simplicity is the most important thing in software.

Generally it's best to have a header file for each .c file, containing the declarations for functions etc in the .c file that you want to expose. That way, another .c file can include the .h file for the functions it needs, and won't need to be recompiled if a header file it didn't include got changed.

Generally there will be one .h file for each .c/.cpp file.

Bjarne Stroustrup explains it beautifully in his book "The C++ Programming Language":
The single header style of physical partitioning is most useful when the program is small and its parts are not intended for separate use. When namespaces are used, the logical structure of the program can still be explained in a single header file.
For larger Programs, the single header file approach is unworkable in a conventional file-based development environment. A change to the common header forces recompilation of the whole program, and updates of that single header by several programmers are error prone. Unless strong emphasis is placed on programming styles relying heavily on namespaces and classes, the logical structure deteriorates as program grows.
An alternative physical organization lets each logical module have its own header defining the facilities it provides. Each .c file then has a corresponding .h file specifying what it provides (its interface). Each .c module includes its own .h file and usually also other .h files that specify what it needs from other modules in order to implement the services advertised in its interface. This physical organization corresponds to the logical organization of a module. The multiple header approach makes it easy to determine the dependencies. The single header approach forces us to look at every declaration used by any module and decide if it's relevant. The simple fact is that maintenance of a code is invariably done with incomplete information and from a local perspective.
The better localization leads to less information to compile a module and thus faster compilation.

It depends. Usually your reason for having separate .c files will dictate whether you need separate .h files.

Generally, cpp/c files are for implementation and h/hpp files (hpp is not used often) are header files (prototypes and declarations only). A cpp file doesn't always have to have a header file associated with it, but it usually does, as the header file acts as a bridge between cpp files so that each cpp file can use code from another cpp file.
One thing that should be strongly enforced is keeping code out of header files! Too many times header files have broken the build, in projects of any size, because of redefinitions, and that happens simply when you include the header file in two different cpp files. Header files should also always be designed to be included multiple times. Cpp files should never be included.
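The usual way to make a header safe to include more than once in the same translation unit is an include guard. The sketch below imitates two inclusions of a hypothetical grid_point.h by repeating its contents twice in one file; the header name and the GridPoint type are assumptions for illustration.

```c
/* First inclusion of a hypothetical grid_point.h: the guard macro is not
   yet defined, so the contents are compiled. */
#ifndef GRID_POINT_H
#define GRID_POINT_H
typedef struct {
    int row;
    int column;
} GridPoint;
#endif /* GRID_POINT_H */

/* Second inclusion (for example via another header): the guard macro is
   now defined, so the preprocessor skips the body and no redefinition
   error occurs. */
#ifndef GRID_POINT_H
#define GRID_POINT_H
typedef struct {
    int row;
    int column;
} GridPoint;
#endif /* GRID_POINT_H */
```

Note that a guard only protects against repeated inclusion within one source file. It does not prevent link errors when a header that defines variables or non-inline functions is included from two different source files.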

It's all about what code needs to be aware of what other code. You want to reduce the amount other files are aware of to the bare minimum for them to do their jobs.
They need to know that a function exists, what types they need to pass into it, and what types it will return, but not what it's doing internally. Note that it's also important from the programmer's point of view to know what those types actually mean (e.g. which int is the row, and which is the column), but the code itself doesn't care. This is why naming the function and parameters sensibly is worthwhile.
As others have said, if there's nothing in a cpp file worth exposing to other parts of the code, as is normally the case with main.c, then there's no need for a header file.
It's occasionally worth putting everything you want to expose in a single header file (e.g, Func1and2and3.h), so that anything that knows about Func1 knows about Func2 as well, but I'm personally not keen on this, as it means that you tend to load a hell of a lot of junk along with the stuff you actually want.
Summary:
Imagine that you trust that someone can write code and that their algorithms, design, etc. are all good. You want to use code they've written. All you need to know is what to give them to get something to happen, what you should give it to, and what you'll get back. That's what needs to go in the header files.

I like putting interfaces into header files and implementation in cpp files. I don't like writing C++ where I need to add member variables and prototypes to the header and then the method again in the C++. I prefer something like:
// module.h
struct IModuleInterface : public IUnknown
{
    virtual void SomeMethod () = 0;
};
// module.cpp
class ModuleImpl : public IModuleInterface,
                   public CObject // a common object to do the reference
                                  // counting stuff for IUnknown (so we
                                  // can stick this object in a smart
                                  // pointer).
{
public:
    ModuleImpl () : m_MemberVariable (0)
    {
    }
    void SomeMethod ()
    {
        // implementation for the method in the interface
    }
private:
    int m_MemberVariable;
    void SomeInternalMethod ()
    {
        // some internal code that doesn't need to be in the interface
    }
    // whatever else we need
};
I find this is a really clean way of separating implementation and interface.

There is no better approach, only common and less common cases.
The more common case is when you have a class/function interface to declare/define. It's better to have only one .cpp/.c with the definitions, and one header for the declarations.
Giving them the same name makes it easy to understand that they are directly related.
But that's not a "rule"; it's the common way and the most efficient in almost all cases.
Now in some cases (like template classes or some tiny struct definitions) you won't need any .c/.cpp file, just the header. For example, we often have a virtual class interface defined in only a header file, with only pure virtual functions or trivial functions.
And in other rare cases (like a hypothetical main.c/.cpp file) it isn't always required to allow code from an external compilation unit to call the functions of a given compilation unit. The main function is one example (no header/declaration needed), but there are others, mostly code that "connects all the other parts together" and is not called by other parts of the application. That's very rare, but in this case a header makes no sense.

If your file exposes an interface - that is, if it has functions which will be called from other files - then it should have a header file. Otherwise, it shouldn't.

As already noted, generally, there will be one header (.h) file for each source (.c or .cpp) file.
However, you should look at the cohesiveness of the files. If the various source files provide separate, individually reusable sets of functions - an ideal organization - then you should certainly have one header per file. If, however, the three source files provide a composite set of functions (that is too big to fit into one file), then you would use a more complex organization. There would be one header for the external services used by the main program - and that would be used by other programs needing the same services. There would also be a second header used by the cooperating source files that provides 'internal' definitions shared by those files.
(Also noted by Pax): The main program does not normally need its own header - no other source code should be using the services it provides; it uses the services provided by other files.

If you want your compiled code to be used from another compilation unit, you will need header files. There are some situations in which you do not need or want a header.
The first such case is main.c/cpp files. Such a file is not meant to be used by other files, so there is no need for a header file.
In some cases you can have a header file that defines the behavior of a set of different implementations that live in a DLL loaded at runtime. There will be different sets of .c/.cpp files that implement variations of the same header. This can be common in plugin systems.
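The idea of one header with several interchangeable implementations can be sketched with a struct of function pointers. Everything here is hypothetical (the Codec type, the two implementations, and find_codec), and the actual run-time loading of a library is left out so that the listing stays self-contained.

```c
#include <string.h>

/* ---- shared header, e.g. codec.h (hypothetical) ---- */
/* The interface every implementation must provide. */
typedef struct {
    const char *name;
    int (*transform)(int input);
} Codec;

/* ---- one implementation, e.g. codec_double.c ---- */
static int double_transform(int input) { return input * 2; }
const Codec double_codec = { "double", double_transform };

/* ---- another implementation, e.g. codec_negate.c ---- */
static int negate_transform(int input) { return -input; }
const Codec negate_codec = { "negate", negate_transform };

/* The host program works only through the interface. In a real plugin
   system the Codec would be looked up in a library loaded at run time;
   here the selection is by name to keep the sketch self-contained. */
const Codec *find_codec(const char *name)
{
    if (strcmp(name, double_codec.name) == 0) return &double_codec;
    if (strcmp(name, negate_codec.name) == 0) return &negate_codec;
    return NULL;
}
```

The host code never names a concrete implementation's functions directly, which is what lets the implementations be swapped without touching the shared header.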

In general, I don't think there is any explicit relationship between .h and .c files. In many cases (probably most), a unit of code is a library of functionality with a public interface (.h) and an opaque implementation (.c). Sometimes a number of symbols are needed, like enums or macros, and you get a .h with no corresponding .c, and in a few circumstances you will have a lump of code with no public interface and no corresponding .h.
In particular, there are a number of times when the headers or implementations (seldom both) are so big and hairy that they end up being broken into many smaller files, for the sake of readability and the programmer's sanity.
