Handling really large multi language projects

Handling really large multi language projects - c++

I am working on an really large multi language project (1000+ Classes + Configs + Scripts), with files distributed over network drives. I am having trouble fighting through the code, since the available Tools are not helping. The main problem is finding things. For the C++ Part: VS with VAX can only find files and symbols which are in the solution. A lot of them are not. Same problem with Reshaper. Right now i am stuck with doing unindexed string and file searches, which is highly inefficient on a network drive. I heared that SourceInsight would be an option since it allows you to just specify the folders that are part of the project and than indexes them, but my company wont spent money on it.
So my question ist: what Tools are there available to fight through an incredible large amount of code? And if possible they should be low cost or even free/open source.

Check out -
ctags
cscope
idutils
snavigator
In every one of these tools, you would have to invest(*) some time in reading the documentation, and then building your index. Consider switching to an editor that will work with these tools.
(*): I do mean invest, because it will reap dividends once you do.
hope this helps,

If you need to maintain a large amount of code, you really should have a source code managment system, a lot of them will help you find text by indexing all the files
And Most of them will work with various language.
Otherwise you can install some indexer like Apache Lucene and index all your files...

You should take a look at LXR. This is used by many Linux kernel source listings.

Try ndexer http://code.google.com/p/ndexer/
promises to Handle extremely large codebases!

The Perl program ack is also worth a look -- think of it as multi-file grep on steroids. The new version (in what I would call late beta) even lets you specify regexes for the files to process as well as regexes to search for -- a feature I've used extensively since it came out (I've got a subproject with 30k lines in 300+ classes, where this feature has been very helpful). You can even chain the new ack with itself so you can subselect the files to process.

VS with VAX can only find files and symbols which are in the solution. A lot of them are not.
You can add all the files that are not in your solution and set them to not build in the settings. Your VS build will not be affected by this, but now VS knows about those files and you can search them along with your VS native files.

Related

What could be the simplest way to incorporate Windows WPP Software Tracing into SCons builds?

I ask my question in such a specific way because I am afraid that a more generic form could lead to excessively theoretic discussions of how the things should be done best and in the most appropriate way (like a question about pre and post-process actions in SCons).
WPP incorporation actually requires execution of an additional command (commands) before compilation of a file and only even if the build process finds necessity to compile the file without any regard to WPP.
I would remark that this is easily achieved with few lines of definitions in a shared Visual Studio property page file making this work for multiple files in multiple projects, folders, etc. in an absolutely transparent for developers way.
Thus I am wondering whether this can be done in a similarly simple way with SCons? I do not have any deep knowledge of either SCons or MSBuild frameworks; I work with them for simple practical use so I would truly appreciate a practical and useful advise.

Here's what I'd suggest.
SCons builds command lines from Environment() variables.
For example the compile command line for building shared object for c++ is stored in SHCXXCOM (and the variable for what is displayed to user when the command is run defaults to SHCXXCOM, but can be changed by modifying SHCXXCOMSTR).
Back to the problem at hand.
Assuming you have a limited number of build steps you want to wrap, you can do something like.
env['SHCXXCOM'] = [ 'MPP PRE COMMAND LINE', env['SHCXXCOM'], 'MPP POST COMMAND LINE']
You'll have to figure out which variables you need to do this with, but take a look at the manpage to figure that out.
https://scons.org/doc/production/HTML/scons-man.html
p.s. I've not tried this, but in theory it should work. Let us know if not.

HD Regular Expression Search

I am working on a project for my computer security class and I have a couple questions. I had an idea to write a program that would search the whole hard drive looking for email addresses. I am just looking for addresses stored in plain text since it would be hard to find anything otherwise. I figured the best way to find addresses would be to use a regular expression.
I wrote an application in C# that works fairly well but it I would like to see if anyone has any better ideas. I am completely up for writing this in another language since I'm assuming C# isn't the best for this type of thing. So far the application I created just starts at the C:/ and recursively locates all files on the drive skipping those that aren't accessible. It also skips all common image, video, audio, compressed, and files over 512mb. This speeds it up quite a bit but there is a small chance that a large file could contain something useful. It takes about 12 seconds to generate the list of files and I'm guessing about an hour to check them all. One downside is that it uses about 50% cpu while scanning.
I'm looking for ideas on how to improve the search. Is there a faster way, a more efficient way, a more thorough way, things like that? I was trying to think if there was any way that you could tell if the file would contain plain text strings or not. Just let me know if you have any cool ideas. Thanks.

To be honest, the easiest existing way to do this is to use grep. As you improve your program, compare your speeds to it, and when you get close, stop worrying about optimizing. Alternatively, take a look at its source for an example of an existing product that does what you're looking for.

As noted elsewhere, tools already exist for this if you install Win32 ports of UNIX tools. Alternatively, the Windows equivalent is:
for /r c:\ %i in (*.*) do findstr /i /r "regular expression" "%i"

you should just use grep + find. grep is optimized for searching files fast, and find is optimized for providing lists of appropriate files for things like this. people have spent a long time optimizing these tools - no need to reinvent the wheel.

I've been tasked with creating a tool that can diff and merge the configuration files for my company's product. The configurations are stored as either XML or URL-encoded strings. I'm looking for a library, preferably open source with a license compatible with commercial software, that can do these diffs. Our app is written in C++, so C++ libraries would be best, but I'm willing to look at libraries that are C#-specific since I can write a wrapper that exposes it to C++ via COM. Three-way diffs would be ideal, but two-way is acceptable. If it has an understanding of XML, that would also be a plus (since XML nodes can be reordered without changing the document, etc). Any library suggestions? Should I even consider writing my own diff tools in the hopes of giving it semantic knowledge of our formats?
Thanks to this similar question, I've already discovered this google library, which seems really great, but I'm still looking for other options. It also seems to be able to output the diffs in HTML format (using the <ins> and <del> tags that I didn't know existed before I discovered it), which could be really handy, but it seems to be a unified diff only. I'm going to need to display the results in a web browser, and probably have to build an interface for doing the merges in the browser as well. I don't expect a library to be able to help with these tasks, but it must produce output in a format that is amenable to me building this on top of it. I'm currently envisioning something along the lines of TortoiseMerge (side-by-side diffs, not unified), except browser-based. Any tips/tricks/design ideas on how to present this would be appreciated too.

Subversion comes with libsvn_diff and libsvn_delta licensed under Apache Software License.

Here is a C++ library that can diff what the author calls semistructured data. It deals nicely with HTML and XML. Since your data is XML it would make a lot of sense to use this instead of plain text diff. This is especially the case when the files are machine generated.
I am currently trying to use this library to build a tool that diffs Visual Studio project files. These are basically XML files and using a plain diff tool like Winmerge is too painful because Visual Studio pretty much mucks up the whole file by crazy reordering. The idea is to do some kind of a structured diff to address the problem.

For diffing the XML I would propose that you normalize it first: sort all the elements in alphabetic order, then generate a stream of tokens/xml that represents the original document but is independent of the original formatting. After running the diff, parse the result to get a tree containing what was added / removed.

C++ vim IDE. Things you'd need from it

I was going to create the C++ IDE Vim extendable plugin. It is not a problem to make one which will satisfy my own needs.
This plugin was going to work with workspaces, projects and its dependencies.
This is for unix like system with gcc as c++ compiler.
So my question is what is the most important things you'd need from an IDE? Please take in account that this is Vim, where almost all, almost, is possible.
Several questions:
How often do you manage different workspaces with projects inside them and their relationships between them? What is the most annoying things in this process.
Is is necessary to recreate "project" from the Makefile?
Thanks.
Reason to create this plugin:
With a bunch of plugins and self written ones we can simulate most of things. It is ok when we work on a one big "infinitive" project.
Good when we already have a makefile or jam file. Bad when we have to create our owns, mostly by copy and paste existing.
All ctags and cscope related things have to know about list of a real project files. And we create such ones. This <project#get_list_of_files()> and many similar could be a good project api function to cooperate with an existing and the future plugins.
Cooperation with an existing makefiles can help to find out the list of the real project files and the executable name.
With plugin system inside the plugin there can be different project templates.
Above are some reasons why I will start the job. I'd like to hear your one.

There are multiple problems. Most of them are already solved by independent and generic plugins.
Regarding the definition of what is a project.
Given a set of files in a same directory, each file can be the unique file of a project -- I always have a tests/ directory where I host pet projects, or where I test the behaviour of the compiler. On the opposite, the files from a set of directories can be part of a same and very big project.
In the end, what really defines a project is a (leaf) "makefile" -- And why restrict ourselves to makefiles, what about scons, autotools, ant, (b)jam, aap? And BTW, Sun-Makefiles or GNU-Makefiles ?
Moreover, I don't see any point in having vim know the exact files in the current project. And even so, the well known project.vim plugin already does the job. Personally I use a local_vimrc plugin (I'm maintaining one, and I've seen two others on SF). With this plugin, I just have to drop a _vimrc_local.vim file in a directory, and what is defined in it (:mappings, :functions, variables, :commands, :settings, ...) will apply to each file under the directory -- I work on a big project having a dozen of subcomponents, each component live in its own directory, has its own makefile (not even named Makefile, nor with a name of the directory)
Regarding C++ code understanding
Every time we want to do something complex (refactorings like rename-function, rename-variable, generate-switch-from-current-variable-which-is-an-enum, ...), we need vim to have an understanding of C++. Most of the existing plugins rely on ctags. Unfortunately, ctags comprehension of C++ is quite limited -- I have already written a few advanced things, but I'm often stopped by the poor information provided by ctags. cscope is no better. Eventually, I think we will have to integrate an advanced tool like elsa/pork/ionk/deshydrata/....
NB: That's where, now, I concentrate most of my efforts.
Regarding Doxygen
I don't known how difficult it is to jump to the doxygen definition associated to a current token. The first difficulty is to understand what the cursor is on (I guess omnicppcomplete has already done a lot of work in this direction). The second difficulty will be to understand how doxygen generate the page name for each symbol from the code.
Opening vim at the right line of code from a doxygen page should be simple with a greasemonkey plugin.
Regarding the debugger
There is the pyclewn project for those that run vim under linux, and with gdb as debugger. Unfortunately, it does not support other debuggers like dbx.
Responses to other requirements:
When I run or debug my compiled program, I'd like the option of having a dialog pop up which asks me for the command line parameters. It should remember the last 20 or so parameters I used for the project. I do not want to have to edit the project properties for this.
My BuildToolsWrapper plugin has a g:BTW_run_parameters option (easily overridden with project/local_vimrc solutions). Adding a mapping to ask the arguments to use is really simple. (see :h inputdialog())
work with source control system
There already exist several plugins addressing this issue. This has nothing to do with C++, and it must not be addressed by a C++ suite.

debugger
source code navigation tools (now I am using http://www.vim.org/scripts/script.php?script_id=1638 plugin and ctags)
compile lib/project/one source file from ide
navigation by files in project
work with source control system
easy acces to file changes history
rename file/variable/method functions
easy access to c++ help
easy change project settings (Makefiles, jam, etc)
fast autocomplette for paths/variables/methods/parameters
smart identation for new scopes (also it will be good thing if developer will have posibility to setup identation rules)
highlighting incorrect by code convenstion identation (tabs instead spaces, spaces after ";", spaces near "(" or ")", etc)
reformating selected block by convenstion

Things I'd like in an IDE that the ones I use don't provide:
When I run or debug my compiled program, I'd like the option of having a dialog pop up which asks me for the command line parameters. It should remember the last 20 or so parameters I used for the project. I do not want to have to edit the project properties for this.
A "Tools" menu that is configurable on a per-project basis
Ability to rejig the keyboard mappings for every possible command.
Ability to produce lists of project configurations in text form
Intelligent floating (not docked) windows for debugger etc. that pop up only when I need them, stay on top and then disappear when no longer needed.
Built-in code metrics analysis so I get a list of the most complex functions in the project and can click on them to jump to the code
Built-in support for Doxygen or similar so I can click in a Doxygen document and go directly to code. Sjould also reverse navigate from code to Doxygen.
No doubt someone will now say Eclipse can do this or that, but it's too slow and bloated for me.

Adding to Neil's answer:
integration with gdb as in emacs. I know of clewn, but I don't like that I have to restart vim to restart the debugger. With clewn, vim is integrated into the debugger, but not the other way around.

Not sure if you are developing on Windows, but if you are I suggest you check out Viemu. It is a pretty good VIM extension for Visual Studio. I really like Visual Studio as an IDE (although I still think VC6 is hard to beat), so a Vim extension for VS was perfect for me. Features that I would prefer worked better in a Vim IDE are:
The Macro Recording is a bit error prone, especially with indentation. I find I can easily and often record macros in Vim while I am editing code (eg. taking an enum defn from a header and cranking out a corresponding switch statement), but found that Viemu is a bit flakey in that deptartment.
The VIM code completion picks up words in the current buffer where Viemu hooks into the VS code completion stuff. This means if I have just created a method name and I want to ctrl ] to auto complete, Vim will pick it up, but Viemu won't.

For me, it's just down to the necessities
nice integration with ctags, so you can do jump to definition
intelligent completion, that also give you the function prototype
easy way to switch between code and headers
interactive debugging with breaakpoints, but maybe
maybe folding
extra bonus points for refactoring tools like rename or extract method
I'd say stay away from defining projects - just treat the entire file branch as part of the "project" and let users have a settings file to override that default
99% of the difference in speed I see between IDE and vim users is code lookup and navigation. You need to be able to grep your source tree for a phrase (or intelligently look for the right symbol using ctags), show all the hits, and switch to that file in like two or three keystrokes.
All the other crap like repository navigation or interactive debugging is nice, but there are other ways to solve those problems. I'd say drop the interactive debugging even. Just focus on what makes IDEs good editors - have a "big picture" view of your project, instead of single file.
In fact, are there any plugins for vim that already achieve this?

C++ Directory Restructuring

I have a source code of about 500 files in about 10 directories. I need to refactor the directory structure - this includes changing the directory hierarchy or renaming some directories.
I am using svn version control. There are two ways to refactor: one preserving svn history (using svn move command) and the other without preserving. I think refactoring preserving svn history is a lot easier using eclipse CDT and SVN plugin (visual studio does not fit at all for directory restructuring).
But right now since the code is not released, we have the option to not preserve history.
Still there remains the task of changing the include directives of header files wherever they are included. I am thinking of writing a small script using python - receives a map from current filename to new filename, and makes the rename wherever needed (using something like sed). Has anyone done this kind of directory refactoring? Do you know of good related tools?

If you're having to rewrite the #includes to do this, you did it wrong. Change all your #includes to use a very simple directory structure, at mot two levels deep and only using a second level to organize around architecture or OS dependencies (like sys/types.h).
Then change your make files to use -I include paths.
Voila. You'll never have to hack the code again for this, and compiles will blow up instantly if something goes wrong.
As far as the history part, I personally find it easier to make a clean start when doing this sort of thing; archive the old one, make a new repository v2, go from there. The counterargument is when there is a whole lot of history of changes, or lots of open issues against the existing code.
Oh, and you do have good tests, and you're not doing this with a release coming right up, right?

I would preserve the history, even if it takes a small amount of extra time. There's a lot of value in being able to read through commit logs and understand why function X is written in a weird way, or that this really is an off-by-one error because it was written by Oliver, who always gets that wrong.
The argument against preserving the history can be made for the following users:
your code might have embarrassing things, like profanity and fighting among developers
you don't care about the commit history of your code, because it's not going to change or be maintained in the future
I did some directory refactoring like this last year on our code base. If your code is reasonable structured at the beginning, you can do about 75-90% of the work using scripts written in your language of choice (I used Perl). In my case, we were moving from set of files all in one big directory, to a series of nested directories depending on namespaces. So, a file that declared the class protocols::serialization::SerializerBase was located in src/protocols/serialization/SerializerBase. The mapping from the old name to the new name was trivial, so that doing a find and replace on #includes in every source file in the tree was trivial, although it was a big change. There were a couple of weird edge cases that we had to fix by hand, but that seemed a lot better than either having to do everything by hand or having to write our own C++ parser.

Hacking up a shell script to do the svn moves is trivial. In tcsh it's foreach F ( $FILES ) ... end to adjust a set of files. Perl & Python offer better utility.
It really is worth saving the history. Especially when trying to track down some exotic bug. Those who do not learn from history are doomed to repeat it, or some such junk...
As for altering all the files... There was a similar question just the other day over at:
https://stackoverflow.com/questions/573430/
c-include-header-path-change-windows-to-linux/573531#573531

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js