I will focus on libraries, though this could apply to a general application installation as well.
When we install a library (say, a C++ library), a novice user like me probably expects that all of its source code gets copied somewhere, with a few flags and path variables set, so that we can directly use #include statements in our own code and start using it.
But by inspection I can say that the exact source files are actually not copied; instead, pre-compiled object forms of the files are copied, except for the so-called *.h header files. (Simply because I cannot find the source files anywhere on the hard disk, only the header files.)
My Questions:
What is the behind-the-scenes method when we "install" something? What are all the typical locations that get affected in a Linux environment, and what is the typical importance/use of each of those locations?
What is the difference between "installing" a library and installing a new application into the Linux system via sudo apt-get or similar?
Finally, if I have a custom set of source files that are useful as a library and want to send them to another system, how would I "install" my own library there, in the same way as above?
Just to clarify, my primary interest is to learn, from your kind answers and literature pointers, the bigger picture of a typical installation (of an application or a library), to a level that I can cross-check, learn from, and redo if I want to.
(Question was removed, question addressed difference between header and object files) This is more a question of general programming. A header file is just the declaration of classes/functions/etc, it does nothing. All a header file does is say "hey, I exist, this is what I look like." That is to say it's just a declaration of signatures used later in the actual code. The object code is just the compiled and assembled, but not linked code. This diagram does a good job of explaining the steps of what we generally call the "compilation" process, but would better be called the "compilation, assembling, and linking process." Briefly, linking is pulling in all necessary object files, including those needed from the system, to create a running executable which you can use.
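To make that concrete, here is a minimal sketch (the file and function names are just for illustration):

// circle.h -- declaration only: "hey, I exist, this is what I look like"
#ifndef CIRCLE_H
#define CIRCLE_H
double circleArea(double radius);
#endif

// circle.cpp -- the definition that gets compiled and assembled into object code
#include "circle.h"
double circleArea(double radius) { return 3.14159265358979 * radius * radius; }

// main.cpp -- includes the header; the actual call is resolved at link time
#include <iostream>
#include "circle.h"
int main() { std::cout << circleArea(2.0) << '\n'; }

// Typical steps:
//   g++ -c circle.cpp -o circle.o   (compile + assemble -> object file, not yet runnable)
//   g++ -c main.cpp -o main.o
//   g++ main.o circle.o -o demo     (link -> running executable)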
(Now question 1) When you think about it, what is installation except the creation and modification of necessary files with the appropriate content? That's what installing is: placing the new files in the appropriate place, and then modifying configuration files if necessary. As to what "locations" are typically affected, you usually see binaries placed in /bin, /usr/bin and /usr/local/bin; libraries are typically placed in /lib or /usr/lib. Of course this varies depending on the distribution and the package. I think you'd find this page on Linux system directories to be an educational read. Remember though, anything can be placed pretty much anywhere and still work appropriately as long as you tell other things where to find it; these directories are just used because they keep things organized and allow for assumptions about where items, such as binaries, will be located.
(Now question 2) The only difference is that apt-get generally makes it easier: it installs the item you need, keeps track of installed items, and allows for easy removal of them later. In terms of the actual installation, if you do it correctly manually then the result should be the same. A package manager such as apt-get just makes life easier.
(Now question 3) If you want to do that, you could create your own package, or, if it's less involved, you could just create a script that moves the files to the appropriate locations on the system. However you want to do it is fine, as long as you get the items where they need to be. If you want to create a package yourself, it'd be a great learning experience, and there are plenty of tutorials online. Just find out what package system your flavor of Linux uses, then look for a tutorial on how to create packages of that type.
So the really big picture, in my opinion, of the installation process is just compilation (if necessary), then the moving of necessary files to their appropriate places on the system, and the modification of existing files on the system if necessary: Put your crap there, let the system know it's there if you need to.
I have an application dependent on many libraries. I am building everything from sources on an ubuntu machine. I want to remove any function/class that is not required by an application. Is there any tool to help with that?
P.S. I want to remove source code from the library not just symbols from object files.
The standard strip utility was created exactly for this.
I have now researched this a bit in the context of my own project and decided this was worth a full answer rather than just a comment. This answer is based on Apple's toolchain on macOS (which uses clang, rather than gcc), but I think things work in much the same way for both.
The key to this is enabling 'link time optimization' when building your libraries and executable(s). The mechanics of this are actually very simple - just pass -flto to gcc and ld on the command line. This has two effects:
Code (functions / methods) in object files or archives that is never called is omitted from the final executable.
The linker performs the sort of optimisations that the compiler can perform (such as function inlining), but with knowledge that extends across compilation unit boundaries.
It won't help you if you are linking against a shared library, but it might help if that shared library links with other (static) libraries which contain code that the shared library never calls.
On the upside, this reduced the size of my final executable by about 5%, which I'm pleased about. YMMV.
On the downside, my object files roughly doubled in size and sometimes link times increased dramatically (by something like a factor of 100). Then, if I re-linked, it was much faster. This behaviour might be a peculiarity of Apple's toolchain however. Perhaps it is stashing away some build intermediates somewhere on the first link. In any case, if you only enable this option for release builds it should not be a major issue.
There are more details of the full set of gcc command line options that control optimisation here: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html. Search that page for flto to narrow down your search.
And for a glimpse behind the scenes, see: https://gcc.gnu.org/onlinedocs/gccint/LTO-Overview.html
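As a rough illustration of the first effect, consider a toy setup like this (file names are hypothetical, and whether the unused function really disappears depends on your toolchain and linker plugin):

// mylib.cpp -- built into a static archive
int used()         { return 42; }
int never_called() { return 7; }   // candidate for removal at link time

// main.cpp -- only ever calls used()
int used();
int main() { return used(); }

// Build with LTO enabled everywhere (gcc-ar keeps the LTO information in the archive):
//   g++ -O2 -flto -c mylib.cpp -o mylib.o
//   gcc-ar rcs libmylib.a mylib.o
//   g++ -O2 -flto main.cpp -L. -lmylib -o app
// Comparing the symbols (e.g. with nm) with and without -flto shows what was dropped.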
Edit:
A bit more information about link times. Apple's linker creates some huge files in a directory called LTOCache when you link. I've not seen these before today so these look to be the build intermediates that speed up linking second time around. As for my initial link being so slow, this may in part be due to the fact that, in my case, these are created on an SMB server. But then again, the CPU was maxed out so maybe not.
OK, now that I understand the OP's requirements better, I have another answer that I think might better suit his needs. I think the way to tackle this is with a code coverage tool. After all, the problem is identifying what you can safely get rid of; actually stripping it out is easy.
My IDE (Visual Studio) has one of these built in but I think the OP is using gcc so the first port of call appears to be gcov. There are a number of commercial options, but they are expensive. There's also a potentially useful post here.
The other thing you need, of course, is a program that exercises all the parts of the library that you want to keep to give you a coverage report to work from, but it sounds like the OP already has that. A good IDE will also help as it makes navigating around the code so much easier. In Visual Studio, I find Jump to Definition and quick and easy 'bookmarking' to be key features.
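If gcov is the route taken, the basic workflow is roughly the following (assuming g++; the file names are placeholders):

// Build the library and a test program with coverage instrumentation:
//   g++ --coverage -O0 -c mylib.cpp -o mylib.o
//   g++ --coverage -O0 test_main.cpp mylib.o -o coverage_test
// Run the program that exercises everything you want to keep:
//   ./coverage_test                 (writes the .gcda count files)
// Then produce annotated source with execution counts:
//   gcov mylib.cpp                  (lines marked '#####' in mylib.cpp.gcov were never executed)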
I have a question about how large C++ projects with many components are supposed to be managed (I guess that's the best term). For all intents and purposes I'm a beginning programmer. I understand the basics of compiling, header files, etc., but I've never really worked on anything bigger than homework assignments.

So, let's take something like a game engine that has various components like a memory manager, renderer, physics simulation, and so on. How would one work on these components separately, but in a way that makes it easy to integrate back into the whole? For example, would you make a separate Visual Studio project for each piece with its own main? If you have one big project for everything, how would you work on one component without another, unfinished component potentially making it fail every compile?

I feel like I'm missing some major concept. Like, for projects with multiple programmers that have to check out portions to work on... do they grab all the code so they can compile, or do they set up their own temporary project to work on their bit? Both options sound wrong. You have to have a main function to compile, right?
I would very much appreciate anyone educating me on this topic, as I feel this is something I should already know and somehow missed completely.
When you are working with larger programs, it is customary to have one source file with a main program, and the rest (there can be many source files) are called from main. Then you need a build strategy. You can write a script file that compiles each of your source files and then links them all together. Unfortunately this can lead to long build times, so professional programmers make use of makefiles, which rebuild only the files that have changed.
As a further refinement, you can organize groups of sources into libraries and build the libraries separately and then link them with your remaining compiled source files.
Try looking up gmake (for Linux) to see how to build larger projects. I guess you are using Microsoft VC++, in which case compiled files have .obj extensions and libraries .lib extensions. Microsoft has its own way of building libraries, which is slightly more complicated than using gmake.
When you look further you'll come across shared libraries (dynamic link libraries on windows - DLLs).
This isn't really a great question for Stack Overflow's format. C++ does support language facilities for managing large code bases, like namespaces, classes, and header files. But your question seems to suggest a lack of perspective as to what they are for, or a limited understanding of the technical framework and process for contributing code to a software project, which isn't a C++-specific issue.
When working on a living project, a primary concern is dealing with complexity. Or, in other words, reducing the number of things you have to think about at any one point in time. What that means is if another programmer is working on the user interface, ideally your code in the physics engine shouldn't have to change to reflect those changes. So interfaces, for forming abstractions and hiding information, are essential.
Granted I'm pretty green as well, so I can't give any real solid advice. I only mention this point to give some perspective as to how vague your question is. If I understand your question correctly, you might enjoy a book like Code Complete 2 by McConnell.
Large projects are separated into pieces. Normally, you should have the ability to compile each piece separately. The best practice that I know of is to declare the interfaces among the various components, keeping dependencies as close to zero as possible, and then to build 'test' programs, which are small and serve two purposes: they test a small piece of code, and they have main().
The directory structure is usually:
yourlib/
lib/
ext-inc/
test/
other dirs/
...
The lib directory contains the output library object (.a, .so).
The ext-inc directory contains the headers that external code will use (sometimes called 'public' or just 'inc').
The test directory usually has a main.c (or .cpp) file and might have some more, as needed.
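A test program can be as small as this sketch (the header and class names are made up; the point is that only the test program owns main()):

// test/main.cpp -- tiny driver that exercises the library through its public headers
#include <cassert>
#include <yourlib/yourlib.h>   // served from ext-inc/, the public interface

int main()
{
    yourlib::Widget w;             // hypothetical class exported by the library
    assert(w.doSomething() == 0);  // exercise one small piece of behaviour
    return 0;
}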
When you checkout (svn) / clone (git) / sync (p4) / etc., you would take everything, but work only on your area. Once done, you merge/submit your changes into the main branch.
We need to implement a feature to our program that would sync 2 or more watched folders.
In reality, the folders will reside on different computers on the local network, but to narrow down the problem, let's assume the tool runs on a single computer, and has a list of watched folders that it needs to sync, so any changes to one folder should propagate to all others.
There are several problems I've thought about so far:
Deleting files is a valid change, so if folder A has a file but folder B doesn't, it could mean that the file was created in folder A and needs to propagate to folder B, but it could also mean that the file was deleted in folder B and needs to propagate to folder A.
Files might be changed/deleted simultaneously in several directories, and with conflicting changes, I need to somehow resolve the conflicts.
One or more of the folders might be offline at any time, so changes must be stored and later propagated to it when it comes online.
I am not sure what kind of help if any the community can offer here, but I'm thinking about these:
If you know of a tool that already does this, please point it out. Our product is closed-source and commercial, however, so its license must be compatible with that for us to be able to use it.
If you know of any existing literature or research on the problem (papers and such), please link to it. I assume that this problem would have been researched already.
Or, if you have general advice on the best way to approach this problem: which algorithms to use, how to resolve conflicts or race conditions if they exist, and other gotchas.
The OS is Windows, and I will be using Qt and C++ to implement it, if no tools or libraries exist.
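For what it's worth, detecting local changes looks straightforward with Qt's QFileSystemWatcher; it's the propagation and conflict handling I'm unsure about. A minimal sketch of the detection part (the paths are placeholders):

// watch.cpp -- minimal change detection; the sync/conflict logic is the open question
#include <QCoreApplication>
#include <QFileSystemWatcher>
#include <QDebug>

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);

    QFileSystemWatcher watcher;
    watcher.addPath("C:/watched/folderA");
    watcher.addPath("C:/watched/folderB");

    QObject::connect(&watcher, &QFileSystemWatcher::directoryChanged,
                     [](const QString &path) {
                         // Re-scan 'path', diff against the last known state, and
                         // queue the change for propagation to the other folders.
                         qDebug() << "Directory changed:" << path;
                     });

    return app.exec();
}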
It's not exceptionally hard. You just need to compare the relevant change journal records. Of course, in a distributed network you have to assume the clocks are synchronized.
And yes, if a complex file (anything you can't parse) is edited while the network is split, you cannot avoid problems. This is known as the CAP theorem: your system cannot be Consistent, Always Available, and also tolerant of Partitioning (going offline) all at the same time.
I'm building a very basic library, and it's the first project that I plan on releasing for others to use if they'd like. As such, I'd like to know some "best practices" as far as organization goes. The only thing that may be unique about my project is that in order for it to be used in a project, users would be required to extend certain abstract classes, which leads me to my first question:
A lot of libraries I've seen consist of a .a file and a single .h file. Is this best practice? Wouldn't it be better to expose all the public .h files so that users can choose which ones to include? If this is the preferred way of doing things, how exactly is it accomplished? What goes into that single .h file?
My second question involves dependencies. For example my current project relies on OpenGL, GLFW, and GLEW. Should I package those in some way with my project, or just make it the user's responsibility to ensure that they are installed?
Edit: Someone asked about my target OS. All of my dependencies are cross platform so I'm (perhaps naively) hoping to make my library cross platform as well.
Thanks for any and all help!
It really depends on the circumstances. If you have some fairly complex functionality that lives in a number of closely related functions, then one header is the right solution.
E.g. if you write a set of functions that draw something to the screen, and you need a few functions to configure/set up the environment, a few functions to define and place objects in the scene, a few functions to do the actual drawing/processing, and finally teardown, then using one header file is a good plan.
In the above case, it's also possible to have one "overall" header-file that includes several smaller ones. Particularly if you have fairly large classes, sticking them all in one file gets rather messy.
On the other hand, if you have one set of functions that deal with gases dissolved in liquids, another set of functions to calculate the strength/load capacity of a steel beam, and another set of functions to calculate the friction of a rubber tyre against a road surface, then they probably should have different headers - even if it's all feasible functionality to go in a "Physics/mechanics library".
It is rarely a good idea to supply third-party libraries with your library - yes, if you want to offer two downloads, one "all you need, just add water" and one "bare library", that's fine. But I don't want to spend three times longer than necessary to download your library simply because it also contains three other libraries that your code is using, which are already on my machine. However, do document what libraries are needed, and what you need to do to install them on your supported platforms (and what the supported platforms are). And document what versions of libraries you have tested - there's nothing worse than "getting the latest", only to find that the version something needs is two steps back...
(And as Jason C points out, licensing gets very messy once you have a few different packages that your code depends on, because your license then has to be compatible with ALL the other licenses - sometimes that's not even possible...)
You have options and it really depends on how convenient you choose to make it for developers using your libraries.
As for the headers, the general method for libraries of average complexity is to have a single header that a developer can include to get everything they need. A good method is, if you have multiple headers, create a single header with the same name as your library (not required, just common) and have it #include all the individual headers. Then distribute the single header and individual headers. That way your users have the option of #including just one to get everything, or #including individual ones if necessary.
E.g. in mylibrary.h:
#ifndef MYLIBRARY_H
#define MYLIBRARY_H
#include <mylibrary/something.h>
#include <mylibrary/another.h>
#include <mylibrary/lastone.h>
#endif
Ensure that your individual headers can be included standalone (i.e. they #include everything they need) if you want to provide that option to developers.
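For example, a hypothetical mylibrary/something.h that can be included on its own might look like this:

#ifndef MYLIBRARY_SOMETHING_H
#define MYLIBRARY_SOMETHING_H

#include <string>                 // everything this header itself needs...
#include <mylibrary/another.h>    // ...including other headers from the library

namespace mylibrary {

class Something {
public:
    explicit Something(const std::string &name);
    Another toAnother() const;    // hypothetical type declared in another.h
private:
    std::string name_;
};

}

#endif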
As for dependencies, you will want to make it the user's responsibility to ensure they are installed. The user is compiling their code and linking to your library, and so it is also the user's responsibility to link to dependent libraries. If you package third-party dependencies with your library you run many risks:
Breaking user's systems who already have dependencies installed.
As mentioned in Mats Petersson's answer, forcing users to download dependencies they already have.
Violating licensing rights on third-party libraries.
The best thing for you to do is clearly document the required dependencies.
For this there are not really standard "best practices". Any sane practice would be a good practice.
General question:
For unmanaged C++, what's better for internal code sharing?
Reuse code by sharing the actual source code? OR
Reuse code by sharing the library / dynamic library (+ all the header files)
Whichever it is: what's your strategy for reducing duplicate code (copy-paste syndrome) and code bloat?
Specific example:
Here's how we share the code in my organization:
We reuse code by sharing the actual source code.
We develop on Windows using VS2008, though our project actually needs to be cross-platform. We have many projects (.vcproj) committed to the repository; some might have its own repository, some might be part of a repository. For each deliverable solution (.sln) (e.g. something that we deliver to the customer), it will svn:externals all the necessary projects (.vcproj) from the repository to assemble the "final" product.
This works fine, but I'm quite worried about eventually the code size for each solution could get quite huge (right now our total code size is about 75K SLOC).
Also one thing to note is that we prevent all transitive dependencies. That is, each project (.vcproj) that is not an actual solution (.sln) is not allowed to svn:externals any other project even if it depends on it. This is because you could have 2 projects (.vcproj) that might depend on the same library (i.e. Boost) or project (.vcproj), so when you svn:externals both projects into a single solution, svn:externals will pull it in twice. So we carefully document all dependencies for each project, and it's up to the guy who creates the solution (.sln) to ensure all dependencies (including transitive ones) are svn:externals'd as part of the solution.
If we reuse code by using .lib / .dll instead, this would obviously reduce the code size for each solution, as well as eliminate the transitive dependency mentioned above where applicable (exceptions are, for example, third-party libraries/frameworks that use DLLs, like Intel TBB and the default Qt build).
Addendum: (read if you wish)
Another motivation to share source code might be summed up best by Dr. GUI:
On top of that, what C++ makes easy is not creation of reusable binary components; rather, C++ makes it relatively easy to reuse source code. Note that most major C++ libraries are shipped in source form, not compiled form. It's all too often necessary to look at that source in order to inherit correctly from an object—and it's all too easy (and often necessary) to rely on implementation details of the original library when you reuse it. As if that isn't bad enough, it's often tempting (or necessary) to modify the original source and do a private build of the library. (How many private builds of MFC are there? The world will never know . . .)
Maybe this is why, when you look at libraries like the Intel Math Kernel Library, their "lib" folder has "vc7", "vc8", "vc9" subfolders for each Visual Studio version. Scary stuff.
Or how about this assertion:
C++ is notoriously non-accommodating when it comes to plugins. C++ is extremely platform-specific and compiler-specific. The C++ standard doesn't specify an Application Binary Interface (ABI), which means that C++ libraries from different compilers or even different versions of the same compiler are incompatible. Add to that the fact that C++ has no concept of dynamic loading and each platform provides its own solution (incompatible with others) and you get the picture.
What's your thoughts on the above assertion? Does something like Java or .NET face these kinds of problems? e.g. if I produce a JAR file from Netbeans, will it work if I import it into IntelliJ as long as I ensure that both have compatible JRE/JDK?
People seem to think that C specifies an ABI. It doesn't, and I'm not aware of any standardised compiled language that does. To answer your main question, use of libraries is of course the way to go - I can't imagine doing anything else.
One good reason to share the source code: Templates are one of C++'s best features because they are an elegant way around the rigidity of static typing, but by their nature are a source-level construct. If you focus on binary-level interfaces instead of source-level interfaces, your use of templates will be limited.
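For example, something as simple as the following has to ship as source in a header; there is no practical way to provide a precompiled binary for every T a client might instantiate it with (the name is just for illustration):

// clamp.h -- the template body must be visible to every client translation unit
#ifndef CLAMP_H
#define CLAMP_H

template <typename T>
T clampValue(T value, T low, T high)
{
    if (value < low)  return low;
    if (value > high) return high;
    return value;
}

#endif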
We do the same. Trying to use binaries can be a real problem if you need to use shared code on different platforms, build environments, or even if you need different build options such as static vs. dynamic linking to the C runtime, different structure packing settings, etc.
I typically set projects up to build as much from source on-demand as possible, even with third-party code such as zlib and libpng. For those things that must be built separately, e.g. Boost, I typically have to build 4 or 8 different sets of binaries for the various combinations of settings needed (debug/release, VS7.1/VS9, static/dynamic), and manage the binaries along with the debugging information files in source control.
Of course, if everyone sharing your code is using the same tools on the same platform with the same options, then it's a different story.
I never saw shared libraries as a way to reuse code from an old project into a new one. I always thought it was more about sharing a library between different applications that you're developing at about the same time, to minimize bloat.
As far as copy-paste syndrome goes, if I copy and paste it in more than a couple places, it needs to be its own function. That's independent of whether the library is shared or not.
When we reuse code from an old project, we always bring it in as source. There's always something that needs tweaking, and it's usually safer to tweak a project-specific version than to tweak a shared version that can wind up breaking the previous project. Going back and fixing the previous project is out of the question because 1) it worked (and shipped) already, 2) it's no longer funded, and 3) the test hardware needed may no longer be available.
For example, we had a communication library that had an API for sending a "message", a block of data with a message ID, over a socket, pipe, whatever:
void Foo::Send(unsigned messageID, const void* buffer, size_t bufSize);
But in a later project, we needed an optimization: the message needed to consist of several blocks of data in different parts of memory concatenated together, and we couldn't (and didn't want to, anyway) do the pointer math to create the data in its "assembled" form in the first place, and the process of copying the parts together into a unified buffer was taking too long. So we added a new API:
void Foo::SendMultiple(unsigned messageID, const void** buffer, size_t* bufSize);
Which would assemble the buffers into a message and send it. (The base class's method allocated a temporary buffer, copied the parts together, and called Foo::Send(); subclasses could use this as a default or override it with their own, e.g. the class that sent the message on a socket would just call send() for each buffer, eliminating a lot of copies.)
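Sketched out, the base class's default amounted to something like this (simplified, with an explicit count parameter for the number of blocks, which the declaration above doesn't show):

#include <cstddef>
#include <cstring>
#include <vector>

void Foo::SendMultiple(unsigned messageID, const void** buffers,
                       size_t* bufSizes, size_t count)
{
    // Default: copy the pieces into one contiguous temporary buffer and reuse Send().
    size_t total = 0;
    for (size_t i = 0; i < count; ++i)
        total += bufSizes[i];

    std::vector<char> assembled(total);
    size_t offset = 0;
    for (size_t i = 0; i < count; ++i) {
        std::memcpy(assembled.data() + offset, buffers[i], bufSizes[i]);
        offset += bufSizes[i];
    }

    Send(messageID, assembled.data(), assembled.size());   // the single-buffer path
}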
Now, by doing this, we have the option of backporting (copying, really) the changes to the older version, but we're not required to backport. This gives the managers flexibility, based on the time and funding constraints they have.
EDIT: After reading Neil's comment, I thought of something that we do that I need to clarify.
In our code, we do lots of "libraries". LOTS of them. One big program I wrote had something like 50 of them. Because, for us and with our build setup, they're easy.
We use a tool that auto-generates makefiles on the fly, taking care of dependencies and almost everything. If there's anything strange that needs to be done, we write a file with the exceptions, usually just a few lines.
It works like this: The tool finds everything in the directory that looks like a source file, generates dependencies if the file changed, and spits out the needed rules. Then it makes a rule to take everything and ar/ranlib it into a libxxx.a file, named after the directory. All the objects and the library are put in a subdirectory that is named after the target platform (this makes cross-compilation easy to support). This process is then repeated for every subdirectory (except the object file subdirs). Then the top-level directory gets linked with all the subdirs' libraries into the executable, and a symlink is created, again named after the top-level directory.
So directories are libraries. To use a library in a program, make a symbolic link to it. Painless. Ergo, everything's partitioned into libraries from the outset. If you want a shared lib, you put a ".so" suffix on the directory name.
To pull in a library from another project, I just use a Subversion external to fetch the needed directories. The symlinks are relative, so as long as I don't leave something behind it still works. When we ship, we lock the external reference to a specific revision of the parent.
If we need to add functionality to a library, we can do one of several things. We can revise the parent (if it's still an active project and thus testable), tell Subversion to use the newer revision and fix any bugs that pop up. Or we can just clone the code, replacing the external link, if messing with the parent is too risky. Either way, it still looks like a "library" to us, but I'm not sure that it matches the spirit of a library.
We're in the process of moving to Mercurial, which has no "externals" mechanism so we have to either clone the libraries in the first place, use rsync to keep the code synced between the different repositories, or force a common directory structure so you can have hg pull from multiple parents. The last option seems to be working pretty well.