Finding unused files in a project

We are migrating our work repository, so I want to do a cull of all the unreferenced files that exist in the source tree before moving it into the nice fresh (empty) repository.
So far I have gone through by hand and found all the unreferenced files that I know about but I want to find out if I have caught them all. One way would be to manually move the project file by file to a new folder and see what sticks when compiling. That will take all week, so I need an automated tool.
What do people suggest?
Clarifications:
1) It is C++.
2) The files are mixed. I am looking for files that have been superseded by others but have been left to rot in the repository; for instance, file_iter.h is not referenced by any other file in the program but remains in the repository just in case someone wants to compile a version from 1996! Now that we are moving to a fresh repository, we can safely junk all the files that are no longer used.
3) Lint only finds unused includes - not unused files (I have the 7.5 manual in front of me).

You've tagged this post with c++, so I'm assuming that's the language in question. If that's the only thing that's in the repository then it shouldn't be too hard to grep all files in the repository for each filename to give you a good starting point. If the repository contains other files (metadata, support files, resources, etc) then you're probably going to need to do it manually.

I can't offer an existing tool for it, but I would expect that you can get a lot of this information from your build tools (with some effort, probably). Typically you can at least let the build tool print the commands it would run, without actually running them (e.g. the -n option of make and bjam does this). From that output you should be able to extract at least the used source files.
With the -MM option of g++ you can get all the non-system header files for the given source files. The output is in the form of a make rule, but with some filtering this shouldn't be a problem.
I don't know if this helps; it's just what I would try in your situation.

You can actually do this indirectly with Lint by running a "whole project analysis" (in which all files are analysed together rather than individually).
Configure it to ignore everything but unreferenced variable/enum/function etc warnings and it should give you a reasonable indicator of where the deadwood lies without those issues being obscured by any others in the codebase.

A static source code analysis tool like lint might do the job. Such tools can tell you if a piece of code will never be called.

Have you taken a look at Source-Navigator? It can be used as an IDE, but I found it to be very good at analyzing source code structure. For example, it can find out where and if a certain method is used in your source code.
I don't know if it's scriptable but it might be a good starting point for you.

Related

set output path for cmake generated files

My question is the following:
Is there a way to tell CMakeFiles where to generate its makefiles, such as cmake_install.cmake, CMakeCache.txt etc.?
More specifically, is there a way to set some commands in the CMakeFiles that specify where to output these generated files? I have tried to search around the web to find some answers, and most people say there's no explicit way of doing this, while others say I might be able to, using custom commands. Sadly, I'm not very strong in CMake, so I couldn't figure this out.
I'm currently using the CLion IDE and there you can specifically set the output path through the settings, but for flexibility reasons I would like as much as possible to be done through the CMakeFiles such that compiling from different computers isn't that big of a hassle.
I would also like to avoid explicitly adding additional command line arguments etc.
I hope someone might have an answer for me, thanks in advance!

You can't (easily) do this and you shouldn't try to do it.
The build tree is CMake's territory. It allows you some tiny amount of customization there (for instance you can specify where the final build artifacts will be placed through the *_OUTPUT_DIRECTORY target properties), but it does not give you any direct control over where intermediate files, like object files or internal make scripts used for bookkeeping are being placed.
This is a feature. You have no idea how all the build systems supported by CMake work internally. Maybe you can move that internal file to a different location in your build process, which is based on Unix Makefiles. But maybe that will also horribly break my build process, which is using Visual Studio. The bottom line is: You shouldn't have to care about this. CMake should take care of it, and by taking some freedom away from you, it ensures that it can actually do that job on all supported build toolchains.
But this might still be an unsatisfactory answer to you. You're the developer, shouldn't you be in full control of the results produced by your build? Of course you should, which is why CMake again grants you full control over what goes into the install tree. That is, whatever ends up in the install directory when you call make install (or whatever is the equivalent of installing in your build toolchain) is again under your control.
So you do control everything that matters: The source tree, the install tree, and that tiny portion of the build tree where the final build artifacts go. The rest of the build tree is off-limits for you and for good reasons.

Cleaning up a VC++ 6 project

I'm working with a very old and large VC++ 6 project and it's all messed up. There are unused files and folders everywhere, copies of folders, and it's just a mess to clean it up by hand in its current state.
It will be done eventually, but is there any simple way to check what files and folders are used when it does a clean compile?
The project settings don't help me at all because the project simply uses copies of folders and additional include directories.
Any suggestions?

Well, if you want to parse the compiler output, you can find out which files are actually used. I also found this when googling around; you might want to try it (I haven't tried it myself). My way would be to clean the build, list all source files, build, and for each source find its corresponding .obj. The ones without an .obj are not used. Note that this only works for source files; unused header files stay undetected.

VC6 will produce a makefile for you:
http://msdn.microsoft.com/en-us/library/aa233950%28v=vs.60%29.aspx
You can use the generated makefile (and the associated .dep file) as a starting point and edit it down to the list of files that get used in a build.
This will let you see the header files that the project depends on in addition to the .c/.cpp/.lib files that might show in the build log. One thing to keep in mind is that you'll probably also want to make sure you track the .dsw and .dsp workspace and project files.
If you're a bit adventurous, you might be able to convince the makefile to actually copy the source files to some other location for you with an appropriate override of certain macros and/or dependencies. But that would probably be more trouble than it's worth for a one-time effort.
Finally, there's a commercial product, CopyWiz by Kinook Software, that seems to have features that might do what you're looking for (and it supports VC++ 6). Note: I'm not sure if it will do what you want, but it may be worth a look.

Yes. Run Process Monitor from SysInternals. It can capture all file system events and filter them based on the path and other factors.
So, set the filter to the root of your source tree and to successful file reads only (VC looks for headers in many places), and build your project. You'll probably still see several thousand events, so save them to a file, sort by path, and remove duplicate paths (headers especially will have many duplicate entries).

Is there a way to work out all the required dependencies but without doing "./configure" - C

Anyone who has compiled from source knows how much of a pain it is to run "./configure" only to find that library X is missing. Worse yet, it spits out a silly line saying a cryptic lib file is missing, which you then have to type into a web browser, crossing your fingers that Google can find the answer for you...
I find that very repetitive, so my question is:
Is there a way to work out all the required dependencies but without doing "./configure"

Read the README* or INSTALL* files in the source distribution, if there are any, or look for any documentation on the website where you downloaded it from. If the package is well documented, dependencies will usually be listed somewhere.

Given that no specific package has been mentioned, I assume this is a generic "how to avoid using configure" question. From a source tarball, no, there is no automated way to work the dependencies out. That's what configure is for (you can always read the Makefiles and autoconf files and understand the dependencies manually, but then you'll miss configure very quickly). To avoid it, you need to use something other than the straight tarball, something that has already worked out the dependencies.
For example, you can switch to building source RPMs (or debs, depending on your system). Or you can use a system such as Gentoo, which is really good at working out the dependencies for you. But all of these require the package you're interested in to be available in their format, so they won't work for tarballs that you download from the source provider.

Read configure.ac/configure.in. Look for calls to AC_CHECK_LIB, AC_CHECK_LIBS, AC_SEARCH_LIBS, AM_PATH_* (some old packages that don't use pkg-config put their checks into the AM_* namespace for some reason), PKG_CHECK_MODULES (for pkg-config), AX_* (many autoconf-archive macros are written to check for uncommon dependencies) and any macro call that starts with an odd name (i.e., not AC_*, AM_* or AX_*; try grep '^[^A]'?).

One thing you can do that would be good for the community is to submit a bug report/feature request to the package maintainers. There are quite a few packages whose configure script does not abort on the first missing dependency, but runs to completion and then prints a summary of all the dependencies that are missing. That greatly reduces the tedium you describe. Unfortunately, "quite a few" translates to less than .00001 percent (this is a made up statistic). If you can convince the package maintainers to re-write their configure script to support this behavior, you will contribute to making the world a better place.
Good luck with that!

keeping Eclipse-generated makefiles in the version control - any issues to expect?

We work under Linux/Eclipse/C++ using Eclipse's "native" C++ projects (.cproject). The system comprises several C++ projects, all kept under svn version control using the integrated Subclipse plugin.
We want to have a script that would check out, compile and package the system, without us needing to drive this process manually from Eclipse as we do now.
I see that there are generated makefiles and support files (sources.mk, subdir.mk etc.) scattered around, which are not under version control (probably the Subclipse plugin is "clever" enough to exclude them). I guess I can put them under svn and use them in the script we need.
However, this feels shaky. Has anybody tried it? Are there any issues to expect? Are there recommended ways to achieve what we need?
N.B. I don't believe that the idea of adopting another build system will be accepted nicely, unless it's SUPER-smooth. We are a small company of 4 developers running full-steam ahead, and any additional overhead or learning curve will not be appreciated :)
thanks a lot in advance!

I would not recommend putting things that are generated in an external tool into version control. My favorite phrase for this tactic is "version the recipe, not the cake". Instead, you should use a third party tool like your script to manipulate Eclipse appropriately to generate these files from your sources, and then compile them. This avoids the risk of having one of these automatically generated files be out of sync with your root sources.
I'm not sure what your threshold for "super-smooth" is, but you might want to take a look at Maven2, which has a plugin for Eclipse projects to do just this.

I know that this is a big problem (I had exactly the same; in addition: maintaining a build-workspace in svn is a real pain!)
Problems I see:
You will get into problems as soon as somebody adds or changes project settings files but doesn't trigger a new build for all possible platforms! (makefiles aren't updated).
There is no overall makefile, so you cannot easily use the build order of your projects that Eclipse had calculated.
BTW: I wrote an Eclipse plugin that builds up a workspace from a given (textual) list of projects and then triggers the build. That's possible but also not an easy task.
Unfortunately I can't post the plugin somewhere because I wrote it for my former employer...

Automatic build ID

We're looking for a way to include some sort of build ID automatically in our builds. This needs to be portable (VC++, g++ on Linux and Mac) and automatic. VC++ is what matters most, since in the other environments we use custom Python build scripts so I can do whatever I want.
We use SVN, so we were looking at using the output of svnversion to write the revision to a header and include it. This has problems: if we put the file in SVN, it will appear as modified every time, but committing it would be superfluous and in a sense generate an infinite loop of increasing revisions. If we don't put the file in SVN and just create it as a pre-build step, the sources wouldn't be complete, as they'd need the pre-build step or Makefile to generate that file.
We could also use __DATE__, but we can't guarantee that the file that uses __DATE__ (i.e. writes it to a log file) will be compiled if some other file is modified, unless we "touch" it, but then we'd cause the project to be always out of date. We could touch it in the pre-build step, so it would get touched only if the rest of the project is out of date, thus not causing a spurious compile; but if VC++ computes the dependencies before the pre-build step, this wouldn't work (the file with __DATE__ won't get compiled).
Any interesting ideas?

We're using the output of svnversion, written to a header file and included. We omit the file from the repository and create it in a pre-build step; this has worked quite well for us. (I'm not sure why you object to using a pre-build step?)
We're currently using a Perl script to convert svnversion's output into a header file; I later found out that TortoiseSVN includes a subwcrev command (which has also been ported to Linux) that can do much of the same thing.

If you don't like the idea of an include file not in source control that is required for a build, consider a batch file or other build step that programmatically creates a file/include and call the svnversion within your build process.
Basically, GENERATE the file so you don't have an unversioned yet required file.
EDIT
Josh's subwcrev is probably the best idea.
Before that was implemented I wrote my own hacky tool to do the same thing - do replacement in a template file.

It could be as simple as (assuming your compile rules use CPPFLAGS, as make's built-in rules do):
% make CPPFLAGS="-DBUILD_NUMBER=`svnlook youngest /path/to/repo`"

I'd look at SvnRev. You can use it as a custom pre-build step in VS, or call it from a makefile, or whatever else you need to do, and it generates a header file that you can include in your other files that will give you what you need. There's good documentation on the site.
SubWCRev is another option, though the Linux port is newer, and I don't know that a Mac version exists. It's very useful on Windows for .NET (which I'm guessing isn't an issue for you, but I'm adding this for future reference), because it allows you to create a template file that can be used to generate, for example, the Properties file for a .NET assembly.

Automatic builds can typically be full, clean builds. In that case, you start in a clean directory and there would be no issue with __DATE__ in any case. Otherwise, see Paul Beckinham's idea.

Why not tie a GUID to it? Almost every language has support for generating one, or if yours doesn't, there are a lot of algorithms for that around.
(Although, if you do use subversion, I personally like Josh's idea better!)
