How can I compare directories recursively, ignoring comments?

I would like to compare one complete folder with another and see all changes between them. Changes that affect only comments (marked with #) should be ignored, as should any whitespace or newline changes.
With diff -r I get nearly every file back, because in my case a version number was added to the header of every file.
Ideally this would be a graphical diff tool like Kompare.

I haven't tried it myself, but I would check out Meld for a nice, graphical tool. It supports comparison of directories (among other things) and is designed for software projects possibly under version control.
It runs on Linux, Windows and Mac OS.
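If a graphical tool is not a hard requirement, the filtering asked for in the question can also be scripted. Below is a minimal Python sketch, not a replacement for Meld or Kompare. The function names (normalized_lines, relative_files, compare_trees) are made up for this illustration, and it assumes that a "comment" is a whole line starting with #; trailing comments after code are not stripped.

```python
import os


def normalized_lines(path):
    """Return the file's lines with whole-line '#' comments, blank lines,
    and every whitespace character removed."""
    result = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            collapsed = "".join(line.split())  # drop all whitespace in the line
            if not collapsed or collapsed.startswith("#"):
                continue  # skip blank lines and whole-line comments
            result.append(collapsed)
    return result


def relative_files(root):
    """Collect the paths of all files below root, relative to root."""
    found = set()
    for current, _dirs, names in os.walk(root):
        for name in names:
            found.add(os.path.relpath(os.path.join(current, name), root))
    return found


def compare_trees(left, right):
    """Return (only_left, only_right, changed) as sorted lists of relative paths."""
    left_files = relative_files(left)
    right_files = relative_files(right)
    changed = [
        rel
        for rel in sorted(left_files & right_files)
        if normalized_lines(os.path.join(left, rel))
        != normalized_lines(os.path.join(right, rel))
    ]
    return sorted(left_files - right_files), sorted(right_files - left_files), changed
```

Only the files listed in changed differ in something other than comments or whitespace, so those are the ones worth opening in a graphical diff tool.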

Related

Enforcing coding styles in Visual Studio and VIM

I work with a medium-sized team of developers, with half being Linux developers using VIM on Ubuntu and MacVIM on OS X, and the other half being Windows developers using Visual Studio 2010 or later.
A fair bit of time has been wasted in the past when handling things like SVN operations failing due to mixed line endings, or changes to code reviews due to a mix of hard tabs versus spaces-as-tabs (and sometimes of varying lengths, i.e. 4 spaces vs 2 spaces vs 8 spaces, etc.), and I would like to put an end to it.
The plan is to adopt a common coding style we've designed, which is almost identical to the Linux Kernel coding style. For all developers, we could require them to run their code through the checkpatch.pl script used by Linux kernel devs, but our code includes C, C++, and C#, so we would need to generalize the rule checker script beyond just ANSI C.
Is there a generic way to implement a rule set for VIM, and another for Visual Studio? We'd like to generate a script that checks entire files, which could be hooked into our version control system so that it's run on code before commits complete, and perhaps also a run-time script to enforce the style as coders type.
Thank you.

EditorConfig seems to do exactly what you want in Vim, Visual Studio, and a lot of other editors and IDEs.

The run-time "script" is the best first line of defense. In this case, it will be various Vim and Visual Studio settings that help enforce your code style. That alone will catch quite a few problems. Keep in mind that this won't catch everything, but it will encourage the coding style you want.
I've worked across Linux & Visual Studio in a team before (and sometimes by myself). The whole Tabs/Spaces issue drove me nuts as there would be wholesale groups of lines that were either shifted WAY over or not enough. To solve this, I ended up using these three settings in Vim (also set similar values in Visual Studio), thus catching one class of issues at the root.
Vim
set expandtab
set shiftwidth=4 " Mainly for if/for/while/general {} block indentation. 4 spaces.
set softtabstop=-1 " Allows us to use the Tab key and have it act like shiftwidth.
Visual Studio
Insert spaces when Tab key is pressed.
Shift 4 spaces on indent inside code block
The key is getting rid of the Tab characters, or at least having both systems use the SAME SETTINGS (i.e. both using Tabs with the same values, or both substituting spaces for Tabs).
Something to watch out for:
Someone copying a file from one operating system to another and then checking it into SVN on that machine. SVN will blindly commit, say, a file with DOS line endings from a UNIX system, so you want to check out and commit each file on the same system only. Checking out, editing, and committing files all on the same OS should present no issues, as SVN can convert the line endings upon checkout. You can "fix" an affected file by loading it into Vim and converting the line endings to the OS you want by typing ":set fileformat=dos" (Windows style), ":set fileformat=unix" (Unix style), or ":set fileformat=mac" (classic Mac style), and then saving the file.
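Both problems described so far (tabs in indentation and mixed line endings) are easy to detect mechanically. Here is a rough Python sketch of such a check. It is not checkpatch.pl and covers only these two rules; the names check_file and main are invented for the example, and how it gets wired into SVN or another tool is left open.

```python
def check_file(path):
    """Return a list of human-readable problems found in one file."""
    with open(path, "rb") as handle:  # binary mode keeps the line endings intact
        data = handle.read()

    problems = []
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf
    if crlf and lf_only:
        problems.append(f"mixed line endings ({crlf} CRLF, {lf_only} LF)")

    for number, line in enumerate(data.splitlines(), start=1):
        # The indentation is the leading run of spaces and tabs.
        indent = line[: len(line) - len(line.lstrip(b" \t"))]
        if b"\t" in indent:
            problems.append(f"line {number}: tab character in indentation")
    return problems


def main(paths):
    """Print every problem and return 1 if any file failed, otherwise 0."""
    failed = False
    for path in paths:
        for problem in check_file(path):
            print(f"{path}: {problem}")
            failed = True
    return 1 if failed else 0
```

A wrapper script or commit hook could pass the changed file names to main() and use the return value as its exit status, so that a non-zero result blocks the commit.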
As far as the code style goes, as you probably already know, both Vim and Visual Studio offer lots of flexibility there. While I cannot give you the specific settings for Vim, the options to look at are
autoindent
cindent
cinoptions (implied from cindent)
cinkeys (implied from cindent)
comments (default is probably fine, but here for thoroughness)
So, you will want
set autoindent
and cindent should be automatically set when editing a C or C++ file. The defaults for cinoptions and cinkeys are ok for me, but I have tweaked them in the past when working with a different group.
Don't forget about using the '=' command over a selected range of lines to reformat the code! This can be very handy!
I shy away from the completely automatic SVN backend method because it may take longer than you expect to get it right, and when it screws up, it will probably take more time out of your day than you expect as well. After all, you really just want to be productive, right?
Discipline up front is key!

Handling really large multi language projects

I am working on a really large multi-language project (1000+ classes, configs and scripts), with files distributed over network drives. I am having trouble fighting through the code, since the available tools are not helping. The main problem is finding things. For the C++ part: VS with VAX can only find files and symbols which are in the solution, and a lot of them are not. ReSharper has the same problem. Right now I am stuck doing unindexed string and file searches, which is highly inefficient on a network drive. I heard that SourceInsight would be an option, since it lets you specify the folders that are part of the project and then indexes them, but my company won't spend money on it.
So my question is: what tools are available to fight through an incredibly large amount of code? If possible, they should be low-cost or even free/open source.

Check out -
ctags
cscope
idutils
snavigator
In every one of these tools, you would have to invest(*) some time in reading the documentation, and then building your index. Consider switching to an editor that will work with these tools.
(*): I do mean invest, because it will reap dividends once you do.
hope this helps,

If you need to maintain a large amount of code, you really should have a source code management system. A lot of them will help you find text by indexing all the files, and most of them work with various languages.
Otherwise you can install an indexer like Apache Lucene and index all your files...
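To show why an index beats repeated string searches on a network drive, here is a toy inverted index in Python. It is only a sketch of the idea, not how Lucene or ctags work internally; build_index, lookup, and the default extension list are assumptions made for this example.

```python
import os
import re
from collections import defaultdict

# Identifier-like words: a letter or underscore followed by letters, digits, underscores.
WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_index(root, extensions=(".h", ".hpp", ".c", ".cpp", ".cs")):
    """Map each identifier-like word to the set of (relative path, line number)
    pairs where it occurs in files below root."""
    index = defaultdict(set)
    for current, _dirs, names in os.walk(root):
        for name in names:
            if not name.endswith(extensions):
                continue  # only index files with the chosen extensions
            path = os.path.join(current, name)
            rel = os.path.relpath(path, root)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    for word in WORD.findall(line):
                        index[word].add((rel, number))
    return index


def lookup(index, word):
    """Return the sorted occurrences of word, or an empty list if it is unknown."""
    return sorted(index.get(word, ()))
```

The files are read once while building the index; after that, every lookup is a dictionary access in memory instead of another pass over the network drive.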

You should take a look at LXR. This is used by many Linux kernel source listings.

Try ndexer (http://code.google.com/p/ndexer/). It promises to handle extremely large codebases!

The Perl program ack is also worth a look -- think of it as multi-file grep on steroids. The new version (in what I would call late beta) even lets you specify regexes for the files to process as well as regexes to search for -- a feature I've used extensively since it came out (I've got a subproject with 30k lines in 300+ classes, where this feature has been very helpful). You can even chain the new ack with itself so you can subselect the files to process.

VS with VAX can only find files and symbols which are in the solution. A lot of them are not.
You can add all the files that are not in your solution and set them to not build in the settings. Your VS build will not be affected by this, but now VS knows about those files and you can search them along with your VS native files.

C++ vim IDE. Things you'd need from it

I am planning to create an extensible C++ IDE plugin for Vim. Making one that satisfies my own needs is not a problem.
The plugin would work with workspaces, projects and their dependencies.
It targets Unix-like systems with gcc as the C++ compiler.
So my question is: what are the most important things you'd need from an IDE? Please take into account that this is Vim, where almost everything is possible.
Several questions:
How often do you manage different workspaces with projects inside them and the relationships between them? What are the most annoying things in this process?
Is it necessary to recreate the "project" from the Makefile?
Thanks.
Reason to create this plugin:
With a bunch of plugins and self-written ones we can simulate most things. That is OK when we work on one big "infinite" project.
Good when we already have a makefile or jam file. Bad when we have to create our own, mostly by copying and pasting existing ones.
All ctags- and cscope-related things have to know the list of real project files, and we create such lists. Something like <project#get_list_of_files()>, and many similar functions, could be a good project API for cooperating with existing and future plugins.
Cooperation with existing makefiles can help to find out the list of real project files and the executable name.
With a plugin system inside the plugin, there can be different project templates.
Above are some reasons why I will start the job. I'd like to hear yours.

There are multiple problems. Most of them are already solved by independent and generic plugins.
Regarding the definition of what is a project.
Given a set of files in the same directory, each file can be the sole file of a project -- I always have a tests/ directory where I host pet projects, or where I test the behaviour of the compiler. Conversely, the files from a set of directories can be part of one very big project.
In the end, what really defines a project is a (leaf) "makefile" -- And why restrict ourselves to makefiles, what about scons, autotools, ant, (b)jam, aap? And BTW, Sun-Makefiles or GNU-Makefiles ?
Moreover, I don't see any point in having vim know the exact files in the current project. And even so, the well-known project.vim plugin already does the job. Personally I use a local_vimrc plugin (I'm maintaining one, and I've seen two others on SF). With this plugin, I just have to drop a _vimrc_local.vim file in a directory, and what is defined in it (:mappings, :functions, variables, :commands, :settings, ...) will apply to each file under the directory -- I work on a big project with a dozen subcomponents; each component lives in its own directory and has its own makefile (not even named Makefile, nor named after the directory).
Regarding C++ code understanding
Every time we want to do something complex (refactorings like rename-function, rename-variable, generate-switch-from-current-variable-which-is-an-enum, ...), we need vim to have an understanding of C++. Most of the existing plugins rely on ctags. Unfortunately, ctags' comprehension of C++ is quite limited -- I have already written a few advanced things, but I'm often stopped by the poor information provided by ctags. cscope is no better. Eventually, I think we will have to integrate an advanced tool like elsa/pork/ionk/deshydrata/....
NB: That's where, now, I concentrate most of my efforts.
Regarding Doxygen
I don't know how difficult it is to jump to the doxygen definition associated with the current token. The first difficulty is to understand what the cursor is on (I guess omnicppcomplete has already done a lot of work in this direction). The second difficulty will be to understand how doxygen generates the page name for each symbol from the code.
Opening vim at the right line of code from a doxygen page should be simple with a greasemonkey plugin.
Regarding the debugger
There is the pyclewn project for those that run vim under linux, and with gdb as debugger. Unfortunately, it does not support other debuggers like dbx.
Responses to other requirements:
When I run or debug my compiled program, I'd like the option of having a dialog pop up which asks me for the command line parameters. It should remember the last 20 or so parameters I used for the project. I do not want to have to edit the project properties for this.
My BuildToolsWrapper plugin has a g:BTW_run_parameters option (easily overridden with project/local_vimrc solutions). Adding a mapping to ask the arguments to use is really simple. (see :h inputdialog())
work with source control system
There already exist several plugins addressing this issue. This has nothing to do with C++, and it must not be addressed by a C++ suite.

debugger
source code navigation tools (now I am using http://www.vim.org/scripts/script.php?script_id=1638 plugin and ctags)
compile lib/project/one source file from ide
navigation by files in project
work with source control system
easy access to file changes history
rename file/variable/method functions
easy access to c++ help
easy change project settings (Makefiles, jam, etc)
fast autocomplete for paths/variables/methods/parameters
smart indentation for new scopes (it would also be good if the developer could set up the indentation rules)
highlighting of indentation that violates the code convention (tabs instead of spaces, spaces after ";", spaces near "(" or ")", etc.)
reformatting of a selected block according to the convention

Things I'd like in an IDE that the ones I use don't provide:
When I run or debug my compiled program, I'd like the option of having a dialog pop up which asks me for the command line parameters. It should remember the last 20 or so parameters I used for the project. I do not want to have to edit the project properties for this.
A "Tools" menu that is configurable on a per-project basis
Ability to rejig the keyboard mappings for every possible command.
Ability to produce lists of project configurations in text form
Intelligent floating (not docked) windows for debugger etc. that pop up only when I need them, stay on top and then disappear when no longer needed.
Built-in code metrics analysis so I get a list of the most complex functions in the project and can click on them to jump to the code
Built-in support for Doxygen or similar so I can click in a Doxygen document and go directly to code. Should also reverse navigate from code to Doxygen.
No doubt someone will now say Eclipse can do this or that, but it's too slow and bloated for me.

Adding to Neil's answer:
integration with gdb as in emacs. I know of clewn, but I don't like that I have to restart vim to restart the debugger. With clewn, vim is integrated into the debugger, but not the other way around.

Not sure if you are developing on Windows, but if you are I suggest you check out Viemu. It is a pretty good VIM extension for Visual Studio. I really like Visual Studio as an IDE (although I still think VC6 is hard to beat), so a Vim extension for VS was perfect for me. Features that I would prefer worked better in a Vim IDE are:
The Macro Recording is a bit error prone, especially with indentation. I find I can easily and often record macros in Vim while I am editing code (e.g. taking an enum definition from a header and cranking out a corresponding switch statement), but found that Viemu is a bit flaky in that department.
The VIM code completion picks up words in the current buffer where Viemu hooks into the VS code completion stuff. This means if I have just created a method name and I want to ctrl ] to auto complete, Vim will pick it up, but Viemu won't.

For me, it's just down to the necessities
nice integration with ctags, so you can do jump to definition
intelligent completion, that also give you the function prototype
easy way to switch between code and headers
interactive debugging with breakpoints, but maybe
maybe folding
extra bonus points for refactoring tools like rename or extract method
I'd say stay away from defining projects - just treat the entire file branch as part of the "project" and let users have a settings file to override that default
99% of the difference in speed I see between IDE and vim users is code lookup and navigation. You need to be able to grep your source tree for a phrase (or intelligently look for the right symbol using ctags), show all the hits, and switch to that file in like two or three keystrokes.
All the other crap like repository navigation or interactive debugging is nice, but there are other ways to solve those problems. I'd say drop the interactive debugging even. Just focus on what makes IDEs good editors - have a "big picture" view of your project, instead of single file.
In fact, are there any plugins for vim that already achieve this?

C++ Header files - put them in one directory or merged in a tree structure?

I have a substantial body of source code (OOFILE) which I'm finally putting up on Sourceforge. I need to decide if I should go with a monolithic include directory or keep the header files with the source tree.
I want to make this decision before pushing to the svn repo on SourceForge. I expect a lot of people who use it after that move will keep a working copy checked out directly from SF so won't want to change their structure.
The full source tree has about 262 files in 25 folders. There are a lot more classes than that suggests as due to conforming to 8.3 character names (yes it dates back to Win3.1) many classes are in one file. As I used to develop with ObjectMaster, that never bothered me but I will be splitting it up to conform to more recent trends to minimise the number of classes per file. From a quick skim of the class list, there are about 600 classes.
OOFILE is a cross-platform product expected to be built on Mac, Windows and assorted Unix platforms. As it started life on Mac, with compilers that point to include trees rather than flat include dirs, headers were kept with the source.
Later, mainly to keep some Visual Studio users happy, a build was reorganised with a single include directory. I'm trying to choose between those models.
The entire OOFILE product covers quite a few domains:
database front-end
range of database backends
simple 2D graphing engine for Mac and Windows
simple character-mode report-writer for trivial html and text listing
very rich banding report-writer with Mac and Windows Preview and Printing and cross-platform generation of text, RTF, HTML and XML reports
forms integration engine for easy CRUD forms binding to the database, with implementations on PowerPlant and MFC
cross-platform utility classes
file and directory manipulation
strings
arrays
XML and tag generation
Many people only want to use it on a single platform and some of those code areas are pure legacy (eg: PowerPlant UI framework on classic Mac). It therefore seems people would appreciate not having headers from those unwanted areas dumped in their monolithic include directory.
I started thinking about having an include directory split up into a few of the domains above and then realised that was sounding more like the original structure.
In summary, the choices seem to be:
1. Keep the original model, all headers adjacent to source - max flexibility at the cost of some complex includes in projects.
2. One include directory with everything inside.
3. Split includes by domain, so there may be about 6 directories for someone using the lot, but a pure database user would probably have a single directory.
From a Unix build aspect, the recommended structure has been 2. My situation is complicated by needing to keep Visual Studio and XCode users happy (sniff, CodeWarrior, how I doth miss thee!).
Edit - the chosen solution:
I went with four subdirectories in include. I started trying to divide them up further by platform but it just got very noisy very quickly.

Personally I would go with 2, or 3 if really pushed.
But whichever you choose, please make it crystal clear in the build instructions how to set up the include paths. Nothing dooms an open source project more than it being really difficult to build - developers want a quick out-of-the-box experience and if it involves faffing around with many undocumented environment variables (or whatever) most will simply go away.

C++ Directory Restructuring

I have a source code of about 500 files in about 10 directories. I need to refactor the directory structure - this includes changing the directory hierarchy or renaming some directories.
I am using svn version control. There are two ways to refactor: one preserving svn history (using the svn move command) and the other without preserving it. I think refactoring while preserving svn history is a lot easier using Eclipse CDT and the SVN plugin (Visual Studio does not fit at all for directory restructuring).
But right now since the code is not released, we have the option to not preserve history.
Still, there remains the task of changing the include directives for header files wherever they are included. I am thinking of writing a small Python script that receives a map from current filename to new filename and makes the replacement wherever needed (using something like sed). Has anyone done this kind of directory refactoring? Do you know of good related tools?
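The script idea above could look roughly like the following Python sketch. The names rewrite_includes and rewrite_file are hypothetical, and the sketch assumes the map is keyed by the include path exactly as it is written in the #include directive; paths that are spelled differently in different files would need extra handling.

```python
import re

# Matches the start of an #include line up to the opening quote or angle
# bracket (group 1), followed by the included path (group 2).
INCLUDE = re.compile(r'^(\s*#\s*include\s*["<])([^">]+)')


def rewrite_includes(source, renames):
    """Return source with each #include path replaced according to renames."""
    rewritten = []
    for line in source.splitlines(keepends=True):
        match = INCLUDE.match(line)
        if match and match.group(2) in renames:
            # Keep everything after the old path (closing delimiter,
            # trailing comment, line ending) exactly as it was.
            line = match.group(1) + renames[match.group(2)] + line[match.end():]
        rewritten.append(line)
    return "".join(rewritten)


def rewrite_file(path, renames):
    """Rewrite one file in place; return True if it was changed."""
    # newline="" keeps the file's existing line endings untouched.
    with open(path, encoding="utf-8", newline="") as handle:
        original = handle.read()
    updated = rewrite_includes(original, renames)
    if updated != original:
        with open(path, "w", encoding="utf-8", newline="") as handle:
            handle.write(updated)
    return updated != original
```

For example, rewrite_includes('#include "util/str.h"\n', {"util/str.h": "core/str.h"}) returns '#include "core/str.h"\n', while includes that are not in the map are left alone.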

If you're having to rewrite the #includes to do this, you did it wrong. Change all your #includes to use a very simple directory structure, at most two levels deep and only using a second level to organize around architecture or OS dependencies (like sys/types.h).
Then change your make files to use -I include paths.
Voila. You'll never have to hack the code again for this, and compiles will blow up instantly if something goes wrong.
As far as the history part, I personally find it easier to make a clean start when doing this sort of thing; archive the old one, make a new repository v2, go from there. The counterargument is when there is a whole lot of history of changes, or lots of open issues against the existing code.
Oh, and you do have good tests, and you're not doing this with a release coming right up, right?

I would preserve the history, even if it takes a small amount of extra time. There's a lot of value in being able to read through commit logs and understand why function X is written in a weird way, or that this really is an off-by-one error because it was written by Oliver, who always gets that wrong.
The argument against preserving the history can be made for the following users:
your code might have embarrassing things, like profanity and fighting among developers
you don't care about the commit history of your code, because it's not going to change or be maintained in the future
I did some directory refactoring like this last year on our code base. If your code is reasonably structured at the beginning, you can do about 75-90% of the work using scripts written in your language of choice (I used Perl). In my case, we were moving from a set of files all in one big directory to a series of nested directories depending on namespaces. So, a file that declared the class protocols::serialization::SerializerBase was located in src/protocols/serialization/SerializerBase. The mapping from the old name to the new name was trivial, so doing a find and replace on #includes in every source file in the tree was straightforward, although it was a big change. There were a couple of weird edge cases that we had to fix by hand, but that seemed a lot better than either having to do everything by hand or having to write our own C++ parser.

Hacking up a shell script to do the svn moves is trivial. In tcsh it's foreach F ( $FILES ) ... end to adjust a set of files. Perl & Python offer better utility.
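For the Python route, the svn moves can be planned as a list of commands before anything is executed. This is only a sketch: plan_svn_moves and run_moves are made-up names, the map of old to new paths is assumed to exist already, and the target directories are assumed to be present in the working copy.

```python
import subprocess


def plan_svn_moves(renames):
    """Return one 'svn move OLD NEW' argument list per renamed file,
    sorted by old path; entries whose path does not change are skipped."""
    return [
        ["svn", "move", old, new]
        for old, new in sorted(renames.items())
        if old != new
    ]


def run_moves(commands):
    """Execute the planned commands in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Printing the result of plan_svn_moves() first gives a dry run that can be reviewed before run_moves() touches the working copy.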
It really is worth saving the history. Especially when trying to track down some exotic bug. Those who do not learn from history are doomed to repeat it, or some such junk...
As for altering all the files... There was a similar question just the other day over at:
https://stackoverflow.com/questions/573430/c-include-header-path-change-windows-to-linux/573531#573531
