Git diff: ignore lines starting with a word - regex

As I have learned here, we can tell git diff to ignore lines starting with a * using:
git diff -G '^[[:space:]]*[^[:space:]*]'
How do I tell git to ignore lines starting with a word, or more (for example: * Generated at), not just a character?
This file shall be ignored, it contains only trivial changes:
- * Generated at 2018-11-21
+ * Generated at 2018-11-23
This file shall NOT be ignored, it contains NOT only trivial changes:
- * Generated at 2018-11-21
+ * Generated at 2018-11-23
+ * This line is important! Although it starts with a *

Git uses POSIX regular expressions, which do not support lookarounds. That is why @Myys 3's approach does not work. A not-so-elegant workaround could be something like this:
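git diff -G '^[[:space:]]*([^[:space:]*]|\*[[:space:]]*[^[:space:]G]|\*[[:space:]]G[^e]|\*[[:space:]]Ge[^n]|\*[[:space:]]Gen[^e]|\*[[:space:]]Gene[^r]|\*[[:space:]]Gener[^a]|\*[[:space:]]Genera[^t]|\*[[:space:]]Generat[^e]|\*[[:space:]]Generate[^d]).*'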
This will filter out all changes starting with "* Generated".
Test: https://regex101.com/r/kdv4V0/3

Considering you are ignoring changes that do NOT match your regex, you just have to put the words you want inside the expression within a lookahead group, like this:
git diff -G '^(?=.*Generated at)[[:space:]]*[^[:space:]*]'
Note that if you want to keep adding words to ignore, just keep adding these groups (don't forget the .*):
However, with this version a line that contains "Generated at" anywhere will be ignored. If you want to define exactly how the line should start, replace the . with [^[:word:]].
git diff -G '^(?=[^[:word:]]*Generated at)[[:space:]]*[^[:space:]*]'
You can have a look at its behaviour at:
Version 1: .*
https://regex101.com/r/kdv4V0/1
Version 2: [^[:word:]]*
https://regex101.com/r/kdv4V0/2

TL;DR: git diff -G cannot exclude changes; it can only include changes that match the regex.
Have a look at git diff: ignore deletion or insertion of certain regex
There torek explains how git log and git diff work and how the parameter -G works.
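Since -G can only select, one way around it is to post-process the diff output instead. The following is a minimal sketch, not taken from any of the answers: the function name only_trivial_changes is hypothetical, and it assumes a unified diff on stdin and a shell with grep -E and sed. It succeeds only when every added or removed line matches the given extended regex.

```shell
# Hypothetical helper: exit 0 when every added/removed line of the unified
# diff on stdin matches the extended regex in $1, non-zero otherwise.
only_trivial_changes() {
  # Keep changed lines, drop the "--- a/file" / "+++ b/file" headers,
  # strip the leading +/- marker, then look for a line that does NOT match.
  ! grep -E '^[+-]' | grep -Ev '^(\+\+\+|---) ' | sed 's/^[+-]//' |
    grep -Evq -- "$1"
}

# Usage (inside a repository): show the diff only when it is not trivial.
#   git diff -- path/to/file | only_trivial_changes '^[[:space:]]*\* Generated at ' \
#     || git diff -- path/to/file
```

One known limitation of this sketch: a removed line whose content itself starts with "-- " looks like a file header and is skipped.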

Related

Trying to get git diff to ignore comments using regex, doesn't seem to be working

My goal is to get git diff to ignore C comments. I've been using a basic regex, and printing the diff to another file (it doesn't print anything otherwise). I've also tried the reverse, to get the diff to only show the comments (I'll later see if I can reverse engineer it). However, it doesn't behave as it's supposed to. Here are a few examples of what I've tried:
Trying to get the diff to show only the lines that begin with /*:
git diff -w -G'(^(/\**)' master > text.diff
Getting the diff to show lines that start with either * or / or end with the same:
git diff -w -G'(^[/\*])|($[^/\*])' master > text.diff
Getting the diff to show only non-comment lines (see How to make 'git diff' ignore comments):
git diff -w -G'(^[^\*# /])|(^#\w)|(^\s+[^\*#/])' master > text.diff
I'm running it under WSL and using git version 2.17.1 for reference.
My goal is to get git diff to ignore C comments ...
This is difficult, because:
Git doesn't understand C code;
parsing C comments requires lexical analysis across multiple lines;
git diff breaks up the input into lines too early.
Your best bet is therefore not to do this directly with git diff at all. Instead:
1. Extract the file(s) to be compared from wherever they live (two commits, one commit and a regular file, one commit and an index copy of a file, etc).
2. Use a C-comment-stripper that does understand how to analyze C source and detect (and remove[1]) the comments.
3. Run the output of step 2 through some diff engine (regular diff, git diff, whatever you like).
If you wrap all of this up as a tool that git difftool can run, you'll get something serviceable and convenient. It will require generating lots of temporary files.
(Note that your attempt to use -G here is ultimately doomed. The -G expression will look for a comment within the changed lines, rather than whether the changed line is or is not in the middle of a long comment. Languages that have only comment-to-end-of-line, such as sh/bash, are more tractable than C. Backslash-newline sequences will still foil things though. See also Erik Aronesty's answer to the linked question.)
[1] Remember that in ANSI C, comments always separate tokens, so for ANSI C, replace comments with white-space, but in many traditional K&R compilers, comments simply vanish. This technique is used in place of the new-in-1989 token-pasting operator in some very old C code. You might want to support this mode by making step 2 have an option to leave out the white-space.
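The strip-then-diff steps above can be sketched as a small wrapper. This is an illustration under assumptions, not the answer's own tool: it assumes GCC is installed and uses its preprocessor in -fpreprocessed mode as the comment stripper, and both function names are hypothetical.

```shell
# Strip comments with GCC's preprocessor. -fpreprocessed suppresses macro
# expansion and most directive processing but still removes comments;
# -dD keeps #define lines, -P omits line markers, -x c forces the C language.
strip_c_comments() {
  gcc -fpreprocessed -dD -E -P -x c "$1"
}

# Take two already-extracted files, strip both, and diff the results.
# The exit status is diff's: 0 when the stripped files are identical.
diff_ignoring_c_comments() {
  old=$(mktemp) || return 2
  new=$(mktemp) || { rm -f "$old"; return 2; }
  strip_c_comments "$1" > "$old"
  strip_c_comments "$2" > "$new"
  diff -u "$old" "$new"
  status=$?
  rm -f "$old" "$new"
  return "$status"
}
```

Saved as a script that ends with diff_ignoring_c_comments "$1" "$2", this could be passed to git difftool -y --extcmd=path/to/script, which calls the command with the two file versions as arguments. The stripper may change line layout, so the resulting hunks will not line up exactly with the original source.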

git word diff regex strange behaviour

I'm using Git to version prose and have been trying git diff --word-diff to see changes within lines. I want to use the results generated in a script.
But the default way that --word-diff identifies a word seems flawed. So I've been experimenting with --word-diff-regex= options.
Problem
Here are the two main flaws I'm trying to deal with:
Added whitespace seems to be ignored. But whitespace can be quite important if trying to use the results programmatically.
For example, take this header from a Markdown (.md) file:
# Test file
Now, let's add some text to the end of it:
# Test file in Markdown
If I run git diff --word-diff on this:
# Test file {+in Markdown+}
But the space before the word "in" has not been included as part of the diff.
Empty lines are completely ignored.
Here's a standard git diff for the content of a file where I've removed a line and also added a couple of new lines -- one empty, the other with the text "Here's a new line."
This is a test file to see how word diff responds in certain situations.
-
I'll try removing lines and adding them to see what happens.
Here's another line so we can see what happens with line removals and additions. I want to see how `git diff --word-diff` handles it all!
+
+Here's a new line.
But here's git diff --word-diff for the same content:
This is a test file to see how word diff responds in certain situations.
I'll try removing lines and adding them to see what happens.
Here's another line so we can see what happens with line removals and additions. I want to see how `git diff --word-diff` handles it all!
{+Here's a new line.+}
The removed and added empty lines are completely ignored.
Desired results
Putting the two examples above together. Here's what I'd like to see:
# Test file{+ in Markdown+}
This is a test file to see how word diff responds in certain situations.
{--}
I'll try removing lines and adding them to see what happens.
Here's another line so we can see what happens with line removals and additions. I want to see how `git diff --word-diff` handles it all!
{++}
{+Here's a new line.+}
Things I've tried:
git diff --word-diff-regex='.' seems too granular for when new words share characters with existing words
git diff --word-diff-regex='[^ ]+|[ ]' seems to solve the first problem but, to be honest, I'm not actually sure why.
git diff --word-diff-regex='[^ ]+|[ ]|^$' -- I was hoping the ^$ on the end would help capture empty lines -- but it doesn't and, worse, it then seems to ignore the change that follows.
git diff --word-diff-regex='[^ ]+|[ ]|.{0}' creates same problem as the one before.
I'd be grateful if anyone could shed any light on how to do this, or at least share some knowledge on what's going on under the hood with --word-diff-regex.
The main obstacle to getting what you want is this behaviour, documented at https://git-scm.com/docs/diff-options:
A match that contains a newline is silently truncated(!) at the newline.
This means that word diffs will always ignore line-level differences such as added or removed empty lines. I don't think you're going to achieve the results you want short of a custom diff generator.
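As for why '[^ ]+|[ ]' picks up the added space: the same documentation states that every non-overlapping match of the regex is considered a word, and anything between matches is treated as whitespace and ignored. A rough way to inspect which "words" a regex yields is grep -o, which prints each match on its own line. This only approximates git's tokenizer; the helper name word_tokens is hypothetical, and grep -o is assumed to be available (as in GNU and BSD grep).

```shell
# Hypothetical helper: list the matches of extended regex $1 in stdin,
# one per line, roughly the "words" a --word-diff-regex would produce.
word_tokens() {
  grep -oE -- "$1"
}

# Only runs of non-space characters are words; the spaces between them
# fall between matches and are therefore ignored:
#   printf '%s\n' '# Test file in Markdown' | word_tokens '[^ ]+'
# Each single space is a word of its own, so an added space is a change:
#   printf '%s\n' '# Test file in Markdown' | word_tokens '[^ ]+|[ ]'
```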

how to search git log without a certain commit

Assume I have a branch master with 3 commits, whose commit messages are t123, b1 and b12 respectively.
* b90b03f (HEAD -> master) b12
* 27f7577 b1
* 7268b40 t123
And now, I want to use git log --grep <regex> to search the log without t123.
The result I want is
* b90b03f b12
* 27f7577 b1
So how do I use regex to meet the requirement?
It sounds like what you want is to include all commits whose commit messages do not match some pattern, but --grep includes commits that do match some pattern. But the answer to "how do I write a regexp that matches everything except some pattern" is: You don't.[1]
You don't need to, because you can use something else (or more precisely, something additional) to exclude the commit with the string "t123" in it. Specifically, if you look at the documentation for git log, you will find that it not only has its --grep=<pattern> option, but also an --invert-grep option:
--invert-grep
Limit the commits output to ones with log message that do not match
the pattern specified with --grep=<pattern>.
That is, instead of inventing some sort of inverse regular expression, you simply tell the command to invert the result from searching for the regular expression. Since your regexp is just a fixed string with no meta-characters in it:
git log --grep t123 --invert-grep
will do the job. (The = between --grep and the <pattern> part is optional for --grep.)
[1] It is, in some sense, not impossible; it's just way too difficult, inefficient, and most of all, unnecessary.
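The example from the question can be reproduced in a throwaway repository. A minimal sketch, assuming git is installed; the function names make_demo_repo and log_excluding are hypothetical, and the identity passed with -c is a placeholder.

```shell
# Hypothetical helper: create a repository in directory $1 containing the
# three empty commits from the question (t123, b1, b12, in that order).
make_demo_repo() {
  git init -q "$1" || return 1
  for msg in t123 b1 b12; do
    git -C "$1" -c user.name=demo -c user.email=demo@example.com \
      -c commit.gpgsign=false commit -q --allow-empty -m "$msg" || return 1
  done
}

# Hypothetical helper: one-line log of the current repository, leaving out
# commits whose message matches the pattern in $1.
log_excluding() {
  git log --oneline --invert-grep --grep="$1"
}

# Usage:
#   make_demo_repo /tmp/demo && (cd /tmp/demo && log_excluding t123)
```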

ignoring changes matching a string in git diff

I've made a single simple change to a large number of files that are version controlled in git and I'd like to be able to check that no other changes are slipping into this large commit.
The changes are all of the form
- "main()",
+ OOMPH_CURRENT_FUNCTION,
where "main()" could be the name of any function. I want to generate a diff of all changes that are not of this form.
The -G and -S options to git diff are tantalisingly close--they find changes that DO match a string or regexp.
Is there a good way to do this?
Attempts so far
Another question describes how regexes can be negated. Using this approach, I think the command should be
git diff -G '^((?!OOMPH_CURRENT_FUNCTION).)*$'
but this just returns the error message
fatal: invalid log-grep regex: Invalid preceding regular expression
so I guess git doesn't support this regex feature.
I also noticed that the standard unix diff has the -I option to "ignore changes whose lines all match RE". But I can't find the correct way to replace git's own diff with the unix diff tool.
Try the following:
$ git diff > full_diff.txt
$ git diff -G "your pattern" > matching_diff.txt
You can then compare the two like so:
$ diff matching_diff.txt full_diff.txt
If all changes match the pattern, full_diff.txt and matching_diff.txt will be identical, and the last diff command will not return anything.
If there are changes that do not match the pattern, the last diff will highlight those.
You can combine all of the above steps and avoid having to create two extra files like so:
diff <(git diff -G "your pattern") <(git diff) # works with other diff tools too
No more grep needed!
With Git 2.30 (Q1 2021), the "git diff" family of commands learned the "-I<regex>" option to ignore hunks whose changed lines all match the given pattern.
See commit 296d4a9, commit ec7967c (20 Oct 2020) by Michał Kępień (kempniu).
(Merged by Junio C Hamano -- gitster -- in commit 1ae0949, 02 Nov 2020)
diff: add -I<regex> that ignores matching changes
Signed-off-by: Michał Kępień
Add a new diff option that enables ignoring changes whose all lines (changed, removed, and added) match a given regular expression.
This is similar to the -I/--ignore-matching-lines option in standalone diff utilities and can be used e.g. to ignore changes which only affect code comments or to look for unrelated changes in commits containing a large number of automatically applied modifications (e.g. a tree-wide string replacement).
The difference between -G/-S and the new -I option is that the latter filters output on a per-change basis.
Use the 'ignore' field of xdchange_t for marking a change as ignored or not.
Since the same field is used by --ignore-blank-lines, identical hunk emitting rules apply for --ignore-blank-lines and -I.
These two options can also be used together in the same git invocation (they are complementary to each other).
Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to indicate that it is logically a "sibling" of xdl_mark_ignorable_regex() rather than its "parent".
diff-options now includes in its man page:
-I<regex>
--ignore-matching-lines=<regex>
Ignore changes whose all lines match <regex>.
This option may be specified more than once.
Examples:
git diff --ignore-blank-lines -I"ten.*e" -I"^[124-9]"
A small memleak in "diff -I<regexp>" has been corrected with Git 2.31 (Q1 2021).
See commit c45dc9c, commit e900d49 (11 Feb 2021) by Ævar Arnfjörð Bjarmason (avar).
(Merged by Junio C Hamano -- gitster -- in commit 45df6c4, 22 Feb 2021)
diff: plug memory leak from regcomp() on {log,diff} -I
Signed-off-by: Ævar Arnfjörð Bjarmason
Fix a memory leak in 296d4a9 ("diff: add -I that ignores matching changes", 2020-10-20, Git v2.30.0-rc0 -- merge listed in batch #3) by freeing the memory it allocates in the newly introduced diff_free().
This memory leak was intentionally introduced in 296d4a9, see the discussion on a previous iteration of it.
At that time freeing the memory was somewhat tedious, but since it isn't anymore with the newly introduced diff_free() let's use it.
Let's retain the pattern for diff_free_file() and add a diff_free_ignore_regex(), even though (unlike "diff_free_file") we don't need to call it elsewhere.
I think this will make for more readable code than gradually accumulating a giant diff_free() function, sharing "int i" across unrelated code etc.
Use git difftool to run a real diff.
Example: https://github.com/cben/kubernetes-discovery-samples/commit/b1e946434e73d8d1650c887f7d49b46dcbd835a6
I've created a script running diff the way I want to (here I'm keeping curl --verbose outputs in the repo, resulting in boring changes each time I rerun the curl):
#!/bin/bash
diff --recursive --unified=1 --color \
--ignore-matching-lines=serverAddress \
--ignore-matching-lines='^\* subject:' \
--ignore-matching-lines='^\* start date:' \
--ignore-matching-lines='^\* expire date:' \
--ignore-matching-lines='^\* issuer:' \
--ignore-matching-lines='^< Date:' \
--ignore-matching-lines='^< Content-Length:' \
--ignore-matching-lines='--:--:--' \
--ignore-matching-lines='{ \[[0-9]* bytes data\]' \
"$@"
And now I can run git difftool --dir-diff --extcmd=path/to/above/script.sh and see only interesting changes.
An important caveat about GNU diff -I aka --ignore-matching-lines: this merely prevents such lines from making a chunk "interesting", but when these changes appear in the same chunk as other non-ignored changes, it will still show them. I used --unified=1 above to reduce this effect by making chunks smaller (only 1 context line above and below each change).
I think that I have a different solution using pipes and grep. I had two files that needed to be checked for differences that didn't include ## and g:, so I did this (borrowing from here, here, and here):
$ git diff -U0 --color-words --no-index file1.tex file2.tex | grep -v -e "##" -e "g:"
and that seemed to do the trick. Colors still were there.
So I assume you could take a simpler git diff command/output and do the same thing. What I like about this is that it doesn't require making new files or redirection (other than a pipe).

Doing a 'diff/st' and ignoring the first line if it matches a specific criterion

In a repository for a well known open source project, all files contain a version string with a timestamp as their first line:
<?php // $Id: index.php,v 1.201.2.10 2009-04-25 21:18:24 stronk7 Exp $
Even if I don't really understand why they do this, since the files are already under version control, I have to live with this.
The main problem is that if I try to 'st' or 'diff' a release to get an idea of what was changed from the previous one, every single file contained in the repository is obviously marked as modified and the diffs become unreadable and unmanageable.
I'm wondering if there's a way to ignore the first lines when doing a diff/st if they match a regexp.
The project is under cvs - cvs, yes, you've read correctly - and included in a bigger mercurial repository.
I don't know about cvs, but with hg you can use any external diff tool with the bundled extdiff extension, and any modern tool should have the ability to let you ignore diffs that match certain patterns.
I swear by Beyond Compare, which allows arbitrary syntax definition.
kdiff3 has preprocessor commands that you can pipe the input through.
If you try
man diff
you'll find
--ignore-matching-lines=RE Ignore changes whose lines all match RE.
Searching the web for "ignore matching lines" gives examples:
diff --unified --recursive --new-file
--ignore-matching-lines='[$]Author.[$]'
--ignore-matching-lines='[$]Date.[$]' ...
(http://www.cygwin.com/ml/cygwin-apps/2005-01/msg00000.html)
Thus try:
diff --ignore-matching-lines='[<][?]php [/][/] [$]Id:'
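Note that --ignore-matching-lines only suppresses a hunk when all of its changed lines match, so a first-line change that sits right next to a real change is still shown. Another option, in the spirit of the kdiff3 preprocessor mentioned above, is to normalise the keyword before diffing. A minimal sketch, assuming sed, diff and mktemp are available; both function names are hypothetical.

```shell
# Collapse an expanded "$Id: ... $" keyword on the first line to "$Id$".
normalize_id_line() {
  sed '1s/\$Id:[^$]*\$/$Id$/' "$1"
}

# Diff two single files after normalising their first lines.
# The exit status is diff's: 0 when nothing but the keyword differs.
diff_ignoring_id() {
  old=$(mktemp) || return 2
  new=$(mktemp) || { rm -f "$old"; return 2; }
  normalize_id_line "$1" > "$old"
  normalize_id_line "$2" > "$new"
  diff -u "$old" "$new"
  status=$?
  rm -f "$old" "$new"
  return "$status"
}
```

This compares two individual files only; comparing whole directory snapshots would need a loop around it.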