Doing a 'diff/st' and ignoring the first line if it matches a specific criterion - regex

In a repository for a well known open source project, all files contain a version string with a timestamp as their first line:
<?php // $Id: index.php,v 1.201.2.10 2009-04-25 21:18:24 stronk7 Exp $
Even if I don't really understand why they do this - since the files are already under version control -, I have to live with this.
The main problem is that if I try to 'st' or 'diff' a release to get an idea of what was changed from the previous one, every single file contained in the repository is obviously marked as modified and the diffs become unreadable and unmanageable.
I'm wondering if there's a way to ignoring the first lines doing a diff/st when they match a regexp.
The project is under cvs - cvs, yes, you've read correctly - and included in a bigger mercurial repository.

I don't know about cvs, but with hg you can use any external diff tool with the bundled extdiff extension, and any modern tool should have the ability to let you ignore diffs that match certain patterns.
I swear by Beyond Compare, which allows arbitrary syntax definition.
kdiff3 has preprocessor commands that you can pipe the input through.

If you try
man diff
you'll find
--ignore-matching-lines=RE Ignore changes whose lines all match RE.
search "ignore matching lines" on the web gives examples :
diff --unified --recursive --new-file
--ignore-matching-lines='[$]Author.[$]'
--ignore-matching-lines='[$]Date.[$]' ...
(http://www.cygwin.com/ml/cygwin-apps/2005-01/msg00000.html)
Thus try :
diff --ignore-matching-lines='[<][?]php [/][/] [$]Id:'

Related

git word diff regex strange behaviour

I'm using Git to version prose and have been trying git diff --word-diff to see changes within lines. I want to use the results generated in a script.
But the default way that --word-diff identifies a word seems flawed. So I've been experimenting with --word-diff-regex= options.
Problem
Here are the two main flaws I'm trying to deal with:
Added whitespace seems to be ignored. But whitespace can be quite important if trying to use the results programmatically.
For example, take this header from a Markdown (.md) file:
# Test file
Now, let's add some text to the end of it:
# Test file in Markdown
If I run git diff --word-diff on this:
# Test file {+in Markdown+}
But the space before the word "in" has not been included as part of the diff.
Empty lines are completely ignored.
Here's a standard git diff for the content of a file where I've removed a line and also added a couple of new lines -- one empty, the other with the text "Here's a new line."
This is a test file to see how word diff responds in certain situations.
-
I'll try removing lines and adding them to see what happens.
Here's another line so we can see what happens with line removals and additions. I want to see how `git diff --word-diff` handles it all!
+
+Here's a new line.
But here's git diff --word-diff for the same content:
This is a test file to see how word diff responds in certain situations.
I'll try removing lines and adding them to see what happens.
Here's another line so we can see what happens with line removals and additions. I want to see how `git diff --word-diff` handles it all!
{+Here's a new line.+}
The removed and added empty lines are completely ignored.
Desired results
Putting the two examples above together. Here's what I'd like to see:
# Test file{+ in Markdown+}
This is a test file to see how word diff responds in certain situations.
{--}
I'll try removing lines and adding them to see what happens.
Here's another line so we can see what happens with line removals and additions. I want to see how `git diff --word-diff` handles it all!
{++}
{+Here's a new line.+}
Things I've tried:
git diff --word-diff-regex='.' seems too granular for when new words share characters with existing words
git diff --word-diff-regex='[^ ]+|[ ]' seems to solve the first problem but, to be honest, I'm not actually sure why.
git diff --word-diff-regex='[^ ]+|[ ]|^$' -- I was hoping the ^$ on the end would help capture empty lines -- but it doesn't and, worse, it then seems to ignore the change that follows.
git diff --word-diff-regex='[^ ]+|[ ]|.{0}' creates same problem as the one before.
I'd be grateful if anyone could shed any light on how to do this, or at least share some knowledge on what's going on under the hood with --word-diff-regex.
The main thing that you're running into that's stopping you from having what you want, from https://git-scm.com/docs/diff-options, is:
A match that contains a newline is silently truncated(!) at the newline.
This is going to mean that word diffs are always going to ignore line diffs. I don't think you're going to achieve the results you want short of a custom diff generator.

How to search files in windows file explorer with specified extension name?

We can search files in windows 7 or higher version using the following tool:
(I don't have image uploading privilage. I mean the top-right area in windows file explorer.)
When I search for MATLAB files using "*.m", it not only returns *.m files, but also returns *.mp3, *.mp4 files. Is there any way to show *.m files exclusively?
Thanks!
I assume you used the quotation marks here to show the text you typed, because ironically the exact way how it should work is to put the search in quotation marks...
so
*.m
finds .mp3 as well as .m but
"*.m"
should only find the .m files. Alternatively you could also write
ext:".m"
which would guarantee that only extensions are searched. (Although I am not sure if this is ever necessary here, because while windows can have a dot in the filename and also can have files without extensions I am not sure if it is possible to have both at the same time.)
using the following
"*.m"
will solve your problem.You can find more information on regex to be used in msdn in the following link .Advanced query syntax
Above that, you can also take advantage of the wildcard character *.
For example, if you want to search for a file with a name ending with 024 or starting with 024 then you can put in the search box like *024.* or 024*.* respectively.
Here the * after . represents files with any extensions, if you want particular then mention extension line 024.png.
Explorer don't have a function of finding with RegEx.
You need to use Power-Shell instead of Win Explorer;
for example: where '(?i)Out' is a regex
Get-ChildItem -Path e:\temp -Recurse -File | Where-Object { $_.Name -match '(?i)Out' }
alternatively you can just simply search for your extension like this:
.extension
eg:
typing .exe will give you all the files with .exe extensions in a folder.
PS: Typing .xml OR .vmcx will give you both type of files. It is useful if you seek to make an archive of different kinds of files stored in different folders or locations.
You can get close to proper regex support from the mostly awesome Cygwin, and as a bonus you get most every linux tool running natively on linux. But it still doesnn't know that .* means "zero or more of anything", ^ means the start of a line (and $ the end), so some things are still weird.
And a startlingly large bunch of weird corner cases that only deranged perl programmers notice fail the test.
So many other things it gets wrong, but it's more workable than anything in any windows OS, plus you get perl, grep, diff, wget, curl, etc. -- the whole GNU lib for free.
If you want a full on bash shell with proper respect for regex, install the super neet-o Bash for Windows 10
Either will do what you want. And they're a billion times faster than that stupid search bar that takes off at 100 mph then crawls to 1 pixel per 10 minutes near the end.

Ignore multiline comments git diff

I'm trying to find the significant differences in C/C++ source code in which only source code changes. I know you can use the git diff -G<regex> but it seems very limiting in the kind of regexes that can be run. For example, it doesn't seem to offer a way to ignore multiline comments in C/C++.
Is there any way in git or preferably libgit2 to ignore comments (including multiline), whitespaces, etc. before a diff is run? Or a way of determining if a line from the diff output is a comment or not?
git diff -w to ignore whitespace differences.
You cannot ignore multiline comments because git is a versioning tool, not a language dependent interpreter. It doesn't know your code is C++. It does not parse files for semantics, so it cannot interpret what is comment and what isn't. In particular, it relies on diff (or a configured difftool) to compare text files and it expects a line-by-line comparison.
I agree with #andrew-c that what you are really asking is to compare the two pieces of code without comments. More specifically helpful, you are asking to compare the lines of code where all multiline comments have been turned into empty lines. You keep the blank lines there so you have the correct line numbers to reference on a normal copy.
So you could manually convert the two code states to blank out multiline comments... or you might look at building your own diff wrapper that did the stripping for you. But the latter is not likely to be worth the effort.
You can achieve this using git attributes and diff filters as described in Viewing git filters output when using meld as a diff tool to call a sed script, which however is pretty complex on its own if you want it to handle all cases like comment delimiters inside string literals etc.

git smart line and word diff

I'd like to git diff and combine the regular line-by-line diff with git diff --word-diff. The problem with line-by-line diffs is that they're unnecessary if I change one or two words and leave the line mostly intact--the chunking is too coarse. On the other hand if I change entire lines and use --word-diff, sometimes the diff algorithm will get confused and spit out incredibly confusing diffs, with lots of words inserted and deleted to "morph" one line into another.
Is there a way to specify that git should be smart about this and only --word-diff if it actually makes sense to do so (on a line-by-line basis, of course)?
The smartest thing I have found for git diff --word-diff or git diff --color-words are the predefined patterns that come with git (as used in --word-diff-regex or diff.wordregex). They might not be perfect, but give quite good results AFAICT.
A list of predefined diff drivers (they all have predefined word regexes too) is given in the docs for .gitattributes. It is further stated that
you still need to enable this with the attribute mechanism, via .gitattributes
So to activate the python pattern for all *.py files, you could issue the following command in your repo root:
echo "*.py diff=python" >> .gitattributes
If you are interested in what the different preset patterns actually look like, take a look at git's source code

Hints and tools for finding unmatched braces / preprocessor directives

This is one of my most dreaded C/C++ compiler errors:
file.cpp(3124) : fatal error C1004: unexpected end-of-file found
file.cpp includes almost a hundred header files, which in turn include other header files. It's over 3000 lines. The code should be modularized and structured, the source files smaller. We should refactor it. As a programmer there's always a wish list for improving things.
But right now, the code is a mess and the deadline is around the corner. Somewhere among all these lines—quite possibly in one of the included header files, not in the source file itself—there's apparently an unmatched brace, unmatched #ifdef or similar.
The problem is that when something is missing, the compiler can't really tell me where it is missing. It just knows that when it reached end of the file it wasn't in the right parser state.
Can you offer some tools or other hints / methodologies to help me find the cause for the error?
If the #includes are all in one place in the source file, you could try putting a stray closing brace in between the #includes. If you get an 'unmatched closing brace' error when you compile, you know it all balances up to that point. It's a slow method, but it might help you pinpoint the problem.
One approach: if you have Notepad++, open all of the files (no problem opening 100 files), search for { and } (Search -> Find -> Find in All Open Documents), note the difference in count (should be 1). Randomly close 10 files and see if the difference in count is still 1, if so continue, else the problem is in one of those files. Repeat.
Handy tip:
For each header file, auto-generate a source file which includes it, then optionally contains an empty main method, and does nothing else. Compile all of these files as test cases, although there's no point running them.
Provided that each header includes its own dependencies (which is a big "provided"), this should give you a better idea which header is causing the problem.
This tip is adapted from Google's published C++ style guide, which says that each component's source files should include the interface header for that component before any other header. This has the same effect, of ensuring that there is at least one source file which will fail to compile, and implicate that header, if there's anything wrong with it.
Of course it won't catch unmatched braces in macros, so if you use much in the way of macros, you may have to resort to running the preprocessor over your source file, and examining the result by hand and IDE.
Another handy tip, which is too late now:
Check in more often (on a private branch to keep unfinished code out of everyone else's way). That way, when things get nasty you can diff against the last thing that compiled, and focus on the changed lines.
Hints:
make small changes and recompile after each small change
for unmatched braces, use an editor/IDE that supports brace match hilighting
if all else failds, Ye Olde method of commenting out blocks of code using a binary chop approach works for me
Very late to the party, but on linux you can use fgrep -o
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
So if you fgrep -o { then you'll get a list of all the opening braces in your file.
You can then pipe that to wc -l and that will give you the number of opening braces in your file.
With a bit of bash arithmetic, you can do the same with closing braces, and then print the difference.
In the below example I'm using git status -s to get the short-format output of all modified files from my git repo, and then I'm iterating over them to find which files may have mismatched braces
for i in $(git status -s | awk '{print $2}'); do
open=$(fgrep -o { $i | wc -l); # count number of opening braces
close=$(fgrep -o } $i | wc -l); # count number of closing braces
diff=$((open-close)); # find difference
echo "$diff : $i"; # print difference for filename
done
I think using some editor with brace highlighting will help. There should also be some tools around that do automatic indention on code.
EDIT: Does this vim script help? It seems to do #ifdef highlighting.
Can you create a script that actually does all the including, and then write the whole stuff to a temporary file? Or try the compiler for helping you with that? Then you can use the bracket highlighting features of various editors to find the problem and you'll probably be able to identify the file. (An extra help can be to have your script add a comment around every included file).
This might not be relevant to your problem, but I was getting an "Unexpected #else" error when framing some header files in an #if/#else/#endif block.
I found that if I set the problem modules to not use pre-compiled headers, the problem went away. Something to do with the "#pragma hdrstop" should not be within an #if/#endif.
Have a look at this question (Highlighting unmatched brackets in vim)
Pre-compile your code first, this will create a big chunk with the include files stuffed into the same file. Then you can use those brace-matching scripts.
I recently ran into this situation while refactoring some code and I'm not a fan of any of the answers above. The problem is that they neglect to incorporate a pretty basic assumption:
Any given file (most likely) will have matching braces and if/endif macros.
While it is true one can have a C++ source file that opens a bracket or an if block and includes another module that closes it, I have never seen this in practice. Something like the following:
foo.cpp
namespace Blah{
#include "bar.h"
bar.h:
}; /// namespace Blah
So with the assumption that any given file / module contains matching sets of braces and preprocessor directives, this problem is now trivial to solve. We just need a script to read the files / count the brackets & directives that it finds and report mismatches.
I've implemented one in Ruby here, feel free to take and adapt to your needs:
https://gist.github.com/movitto/6c6d187f7a350c2d71480834892552ab
I just spent an hour with a problem like this. It was very hard to spot, but in the end, I had typed a double # on one line:
##ifdef FEATURE_X
...
That broke the world.
Easy one to search for though.
The simplest and fastest solution is to comment file from the end by consistent blocks. If the error still exists, then you not yet comment the open brace.