Ignore multiline comments git diff - c++

I'm trying to find the significant differences in C/C++ source code, that is, the diffs in which actual code changes rather than comments or whitespace. I know you can use git diff -G<regex>, but it seems very limited in the kind of regexes it can handle. For example, it doesn't seem to offer a way to ignore multiline comments in C/C++.
Is there any way in git, or preferably libgit2, to ignore comments (including multiline ones), whitespace, etc. before a diff is run? Or a way of determining whether a line from the diff output is a comment or not?

Use git diff -w to ignore whitespace differences.
You cannot ignore multiline comments, because git is a versioning tool, not a language-aware parser. It doesn't know your code is C++. It does not parse files for semantics, so it cannot tell what is a comment and what isn't. In particular, it relies on its built-in diff (or a configured external diff tool) to compare text files, and that comparison works line by line.
I agree with @andrew-c that what you are really asking is to compare the two pieces of code without comments. More specifically, you are asking to compare the lines of code after all multiline comments have been turned into empty lines. You keep the blank lines so that the line numbers still match a normal copy of the file.
So you could manually convert the two code states to blank out multiline comments... or you might look at building your own diff wrapper that does the stripping for you. But the latter is not likely to be worth the effort.
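If you do want to try the blanking step described above, a minimal Python sketch could look like this. It is an illustration under stated assumptions, not a complete C++ lexer: the name blank_comments is made up here, and the function does not handle raw string literals, digit separators such as 1'000, or // comments continued with a backslash-newline.

```python
def blank_comments(source):
    """Replace C/C++ comments with spaces, keeping every newline in place.

    Sketch only: raw strings, digit separators and backslash-continued
    line comments are NOT handled (assumption: plain C-style input).
    """
    out = []
    i, n = 0, len(source)
    state = "code"  # one of: code, block, line, string, char
    while i < n:
        ch = source[i]
        nxt = source[i + 1] if i + 1 < n else ""
        if state == "code":
            if ch == "/" and nxt in ("*", "/"):
                # Comment opener: blank both characters and switch state.
                state = "block" if nxt == "*" else "line"
                out.append("  ")
                i += 2
                continue
            if ch == '"':
                state = "string"
            elif ch == "'":
                state = "char"
            out.append(ch)
        elif state == "block":
            if ch == "*" and nxt == "/":
                state = "code"
                out.append("  ")
                i += 2
                continue
            # Keep newlines so line numbers stay aligned with the original.
            out.append("\n" if ch == "\n" else " ")
        elif state == "line":
            if ch == "\n":
                state = "code"
                out.append("\n")
            else:
                out.append(" ")
        else:  # inside a string or character literal: copy verbatim
            if ch == "\\" and nxt:
                out.append(ch + nxt)  # keep escape sequences such as \" intact
                i += 2
                continue
            if (state == "string" and ch == '"') or (state == "char" and ch == "'"):
                state = "code"
            out.append(ch)
        i += 1
    return "".join(out)
```

Because comment text becomes spaces, the two blanked files should be compared with a whitespace-insensitive diff such as diff -w. Edits to the text of an existing comment then disappear from the output, although added or removed comment lines still show up as blank-line changes.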

You can achieve this using git attributes and diff filters, as described in "Viewing git filters output when using meld as a diff tool", to call a sed script. Such a script, however, gets pretty complex on its own if you want it to handle all cases, such as comment delimiters inside string literals.

Related

Trying to get git diff to ignore comments using regex, doesn't seem to be working

My goal is to get git diff to ignore C comments. I've been using a basic regex and printing the diff to another file (it doesn't print anything otherwise). I've also tried the reverse, getting the diff to show only the comments (I'll later see if I can reverse engineer it). However, it doesn't behave as it's supposed to. Here are a few examples of what I've tried:
Trying to get the diff to show only the lines that begin with /*:
git diff -w -G'(^(/\**)' master > text.diff
Getting the diff to show lines that start with either * or / or end with the same:
git diff -w -G'(^[/\*])|($[^/\*])' master > text.diff
Getting the diff to show only non-comment lines (see How to make 'git diff' ignore comments):
git diff -w -G'(^[^\*# /])|(^#\w)|(^\s+[^\*#/])' master > text.diff
I'm running it under WSL and using git version 2.17.1 for reference.
My goal is to get git diff to ignore C comments ...
This is difficult, because:
Git doesn't understand C code;
parsing C comments requires lexical analysis across multiple lines;
git diff breaks up the input into lines too early.
Your best bet is therefore not to do this directly with git diff at all. Instead:
Extract the file(s) to be compared from wherever they live (two commits, one commit and a regular file, one commit and an index copy of a file, etc).
Use a C-comment-stripper that does understand how to analyze C source and detect (and remove[1]) the comments.
Run the output of step 2 through some diff engine (regular diff, git diff, whatever you like).
If you wrap all of this up as a tool that git difftool can run, you'll get something serviceable and convenient. It will require generating lots of temporary files.
(Note that your attempt to use -G here is ultimately doomed. The -G expression will look for a comment within the changed lines, rather than whether the changed line is or is not in the middle of a long comment. Languages that have only comment-to-end-of-line, such as sh/bash, are more tractable than C. Backslash-newline sequences will still foil things though. See also Erik Aronesty's answer to the linked question.)
[1] Remember that in ANSI C, comments always separate tokens, so for ANSI C, replace each comment with white space. In many traditional K&R compilers, however, comments simply vanish, and some very old C code uses this in place of the new-in-1989 token-pasting operator (##). You might want to support this mode by giving step 2 an option to leave out the white space.
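The three steps above can be sketched in Python roughly as follows. This is a sketch under assumptions, not a finished difftool: read_blob, strip_comments and diff_without_comments are names invented for this example, the regex-based stripper stands in for a real C lexer (it ignores raw strings, trigraphs and backslash-newline), and the replacement parameter is the optional "leave out the white space" mode from the footnote.

```python
import difflib
import re
import subprocess

# Alternation order matters: string and character literals are matched first,
# so comment delimiters inside them are left alone.
_TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'     # string literal
    r"|'(?:\\.|[^'\\])*'"    # character literal
    r"|/\*.*?\*/"            # block comment, possibly spanning lines
    r"|//[^\n]*",            # line comment
    re.DOTALL,
)


def read_blob(rev, path):
    """Step 1: fetch one version of a file via `git show <rev>:<path>`."""
    result = subprocess.run(
        ["git", "show", f"{rev}:{path}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def strip_comments(source, replacement=" "):
    """Step 2: replace each comment with `replacement` (" " for ANSI C,
    "" to imitate old K&R compilers where comments simply vanish)."""
    def repl(match):
        text = match.group(0)
        if text.startswith("/*") or text.startswith("//"):
            return replacement
        return text  # a string or character literal: keep as-is
    return _TOKEN.sub(repl, source)


def _significant_lines(source):
    # Drop trailing whitespace and blank lines so that adding or removing a
    # comment-only line does not show up as a change.
    stripped = strip_comments(source)
    return [line.rstrip() for line in stripped.splitlines() if line.strip()]


def diff_without_comments(old, new, name="file"):
    """Step 3: unified diff of the two comment-free texts ("" if identical)."""
    return "\n".join(difflib.unified_diff(
        _significant_lines(old), _significant_lines(new),
        f"a/{name}", f"b/{name}", lineterm="",
    ))
```

A call such as diff_without_comments(read_blob("master", path), read_blob("HEAD", path), path) would then compare two commits. Since blank lines are dropped, the line numbers in the hunk headers refer to the stripped text, not to the original files.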

How to set up clang-format comment pragmas so multiline doxygen comments don't get touched?

I am trying to introduce clang-format to a couple of our projects at work (C and C++), but I am having trouble getting it to format multi-line Doxygen comments the way I want.
All comments have the same format:
/*! #brief Some text
*
* Some more text
*
* #verbatim
*
* A very long line of text that exceeds the clang-format column width but should not be touched
*
* #endverbatim
*/
I want clang-format to leave the verbatim blocks alone and not reflow them. I am using clang-format-6.0.
Turning ReflowComments off is not an option, as non-Doxygen comments must still be taken care of by clang-format.
I have tried various regular expressions in the CommentPragmas config item but to no avail:
#verbatim(.*\n)*.*#endverbatim to treat the entire verbatim block as a comment pragma. This is the ideal situation, as any other part of the Doxygen comment I do not mind being broken into multiple lines.
#brief(.*\n)+ to match the entire comment block as the pragma. I've also tried this with an arbitrary token at the end of the comment to act as an explicit end-of-block marker. This isn't ideal as it doesn't force the non-verbatim part of the comment to conform, but is a compromise I'm willing to live with if I have to.
Various other regexes I've seen in other discussions, adapted to fit our Doxygen markup.
All I've managed to get it to do so far is to leave the first line of the multi-line comment alone, if it happens to exceed the column limit. However, any following long line is still broken up.
The only other tool I have left in my box is to use // clang-format off and // clang-format on around these comments but again I'd like to avoid it if I can because:
a) it'll be quite tedious to add them throughout the code base
b) I'll have to surround the entire comments with these, rather than just the verbatim blocks (I haven't figured out if you can disable it just for a portion of a multi-line comment - I've only managed to get it working for an entire one, and even if that was possible the clang-format directives would end up in the generated Doxygen docs which is unacceptable)
c) I don't really like the way it looks in code.
Any help is appreciated. Thanks.
I ran into this issue too, and the only workaround I found was to use // clang-format off and // clang-format on.
Reflowing of comments by clang-format tends to:
break #page, #section, etc titles, and links generated from them (in rare cases).
break #startuml blocks, which have a specific syntax.
break #verbatim blocks.
See an example of usage in MySQL:
https://github.com/mysql/mysql-server/blob/8.0/storage/perfschema/pfs.cc
Update:
Filed a feature request on clang-format itself:
https://bugs.llvm.org/show_bug.cgi?id=44486

git smart line and word diff

I'd like to git diff and combine the regular line-by-line diff with git diff --word-diff. The problem with line-by-line diffs is that they're unnecessary if I change one or two words and leave the line mostly intact--the chunking is too coarse. On the other hand if I change entire lines and use --word-diff, sometimes the diff algorithm will get confused and spit out incredibly confusing diffs, with lots of words inserted and deleted to "morph" one line into another.
Is there a way to specify that git should be smart about this and only --word-diff if it actually makes sense to do so (on a line-by-line basis, of course)?
The smartest thing I have found for git diff --word-diff or git diff --color-words are the predefined patterns that come with git (as used in --word-diff-regex or diff.wordregex). They might not be perfect, but give quite good results AFAICT.
A list of predefined diff drivers (they all have predefined word regexes too) is given in the docs for .gitattributes. It is further stated that
you still need to enable this with the attribute mechanism, via .gitattributes
So to activate the python pattern for all *.py files, you could issue the following command in your repo root:
echo "*.py diff=python" >> .gitattributes
If you are interested in what the different preset patterns actually look like, take a look at git's source code (they are defined in userdiff.c).
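To get a feel for what a word regex does, here is a small Python illustration. It is not git's implementation: WORD_RE is a hypothetical pattern chosen for this example (not one of git's presets), and word_diff only mimics the [-removed-]{+added+} notation of --word-diff. The idea it shows is the documented one: every match of the regex is a word, and whatever lies between matches is ignored when looking for differences.

```python
import difflib
import re

# Hypothetical word pattern: identifiers, integers, or any single
# non-space character (so operators and brackets become their own words).
WORD_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def word_diff(old_line, new_line, word_re=WORD_RE):
    """Compare two lines word by word, where words are the regex matches."""
    old_words = word_re.findall(old_line)
    new_words = word_re.findall(new_line)
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.append(" ".join(old_words[i1:i2]))
            continue
        if i1 < i2:  # words only present in the old line
            pieces.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if j1 < j2:  # words only present in the new line
            pieces.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(pieces)


print(word_diff("total = count + 1", "total = count + 2"))
```

A coarser pattern (for example one that treats a whole call expression as a single word) produces fewer but larger changed pieces, which is the trade-off the predefined per-language patterns try to balance.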

Global substitution for latex commands in vim

I am writing a long document and I am frequently formatting some terms in italics. After some time I realized that maybe that is not what I want, so I would like to remove all the LaTeX commands that format text in italics.
Example:
\textit{Vim} is undoubtedly one of the best editors ever made. \textit{LaTeX} is an extremely powerful, intelligent typesetter. \textbd{Vim-LaTeX} aims at bringing together the best of both these worlds
How can I run a substitution command that recognizes all the instances of \textit{whatever} and changes them to just whatever without affecting different commands such as \textbd{Vim-LaTeX} in this example?
EDIT: Since the answer that technically solves the problem is Igor's, I will mark that one as correct. Nevertheless, Konrad's answer should be taken into account, as it shows the proper LaTeX strategy to follow.
You shouldn’t use formatting commands at all in your text.
LaTeX is built around the idea of semantic markup. So instead of saying “this text should be italic” you should mark up the text using its function. For instance:
\product{Vim} is undoubtedly one of the best editors ever made. \product{LaTeX}
is an extremely powerful, intelligent typesetter. \product{Vim-LaTeX} aims at
bringing together the best of both these worlds
… and then, in your preamble, a package, or a document class, you (re-)define a macro \product to set the formatting you want. That way, you can adapt the macro whenever you deem necessary without having to change the code.
Or, if you want to remove the formatting completely, just make the macro display its bare argument:
\newcommand*\product[1]{#1}
Use this substitution command (the g flag replaces every occurrence on a line, not just the first):
%s/\\textit{\([^}]*\)}/\1/g
If \textit can span multiple lines:
%! perl -e 'local $/; $_=<>; s/\\textit\{([^}]*)\}/$1/g; print;'
And you can do this without perl also:
%s/\\textit{\(\_.\{-}\)}/\1/g
Here:
\_. -- any character, including a newline
\{-} -- the non-greedy version of *.
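For comparison, the same substitution outside of Vim can be sketched in Python. The names TEXTIT_RE and strip_textit are made up for this example. Like the commands above, it assumes the argument of \textit contains no nested braces, so something like \textit{a \emph{b} c} would not be handled correctly.

```python
import re

# [^}]* also matches newlines in Python, so multi-line arguments work.
# Assumption: no nested braces inside the \textit{...} argument.
TEXTIT_RE = re.compile(r"\\textit\{([^}]*)\}")


def strip_textit(text):
    """Replace every \\textit{whatever} with just whatever."""
    return TEXTIT_RE.sub(r"\1", text)
```

Other commands such as \textbd{Vim-LaTeX} are left untouched, because the pattern only matches the literal command name \textit followed by an opening brace.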

Doing a 'diff/st' and ignoring the first line if it matches a specific criterion

In a repository for a well known open source project, all files contain a version string with a timestamp as their first line:
<?php // $Id: index.php,v 1.201.2.10 2009-04-25 21:18:24 stronk7 Exp $
Even if I don't really understand why they do this - since the files are already under version control -, I have to live with this.
The main problem is that if I try to 'st' or 'diff' a release to get an idea of what was changed from the previous one, every single file contained in the repository is obviously marked as modified and the diffs become unreadable and unmanageable.
I'm wondering if there's a way to ignore the first line when doing a diff/st if it matches a regexp.
The project is under cvs - cvs, yes, you've read correctly - and included in a bigger mercurial repository.
I don't know about cvs, but with hg you can use any external diff tool with the bundled extdiff extension, and any modern tool should have the ability to let you ignore diffs that match certain patterns.
I swear by Beyond Compare, which allows arbitrary syntax definition.
kdiff3 has preprocessor commands that you can pipe the input through.
If you try
man diff
you'll find
--ignore-matching-lines=RE Ignore changes whose lines all match RE.
Searching the web for "ignore matching lines" gives examples:
diff --unified --recursive --new-file \
  --ignore-matching-lines='[$]Author.*[$]' \
  --ignore-matching-lines='[$]Date.*[$]' ...
(http://www.cygwin.com/ml/cygwin-apps/2005-01/msg00000.html)
Thus try :
diff --ignore-matching-lines='[<][?]php [/][/] [$]Id:'
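If no suitable diff tool is at hand, the same idea can be sketched in Python. This is only an illustration with invented names (ID_LINE_RE, diff_ignoring_id_line). It also works differently from --ignore-matching-lines: instead of suppressing hunks whose changed lines all match, it drops a matching first line from both files before comparing them.

```python
import difflib
import re

# Matches the version header shown in the question, e.g. "<?php // $Id: ...".
ID_LINE_RE = re.compile(r"^<\?php // \$Id:")


def diff_ignoring_id_line(old_text, new_text):
    """Unified diff (as a list of lines) that skips a leading $Id: header."""
    def body(text):
        lines = text.splitlines()
        if lines and ID_LINE_RE.match(lines[0]):
            lines = lines[1:]  # drop only the first line, and only if it matches
        return lines

    return list(difflib.unified_diff(
        body(old_text), body(new_text), "old", "new", lineterm="",
    ))
```

An empty list means the two files differ at most in their version header. Because the header is removed before diffing, line numbers in the hunk headers are off by one relative to the original files.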