Drop commits by commit message in `git rebase` - regex

I would like to do a git rebase and drop commits whose commit messages match a certain regular expression. For example, it might work like
git rebase --drop="deletme|temporary" master
And this would do a rebase over master while dropping all commits containing the string deleteme or temporary.
Is is possible to do this with the standard Git tool? If not, is it possible with a third-party Git tool? In particular, I want it to be a single, noninteractive command.

This can be accomplished using the same method as I used in this answer.
First, we need to find the relevant commits. You can do that with something like:
git log --format=format:"%H %s" master..HEAD | grep -E "deleteme|temporary"
This will give you a list of commits with commit messages containing deleteme or temporary that are between master and your current branch. These are the commits that need to be dropped.
Save this bash script somewhere you can access it:
#!/bin/bash
for sha in $(git log --format=format:"%H %s" master..HEAD | grep -E "deleteme|temporary" | cut -d " " -f 1)
do
sha=${sha:0:7}
sed -i "s/pick $sha/drop $sha/" $#
done
Then run the rebase as:
GIT_SEQUENCE_EDITOR=/path/to/script.sh git rebase -i
This will automatically drop all commits that contain deleteme or temporary in their commit message.
As mentioned in my other answer:
[This script won't allow] you to customize what command is run to calculate which commits to use, but if this is an issue, you could probably pass in an environment variable to allow such customization.
Obligatory warning: Since a rebase rewrites history, this can be dangerous / disruptive for anyone else working on this branch. Be sure you clearly communicate what you have done with anyone you are collaborating with.

You could e. g. use interactive rebase. So do git rebase -i <first commit that should not be touched>, and then in vim where you have the list of commits, you can do :%s/^[^ ]* \([^ ]* issue\)/d \1/g to use drop stanza for all commits whose commit message starts with issue. But be aware that git rebase is not working optimally with merge commits. By default they are skipped and the history flattened, but you can try to keep them with parameters.

#Scott Weldon's answer works great for this usecase, however If the regex checks from the start of the message for example with (^(deleteme)|^(temporary)), then this won't work, since the start of the grep is the commit hash. So in that case you can use this instead
#!/bin/bash
for sha in $(git log --format=format:"%s %H" master..HEAD | grep -E "^(deleteme)|^(temporary)" | awk '{print $NF}')
do
sha=${sha:0:7}
sed -i "s/pick $sha/drop $sha/" $#
done
The core difference is that %sand %H swapped places, and therefore we search the last part of the string instead of the first part of the string by piping to awk '{print $NF}')
Also worth noting that this is called the same way as in Scott Weldon's answer:
GIT_SEQUENCE_EDITOR=/path/to/script.sh git rebase -i master

Related

Effective way of detecting X11 vs Wayland, preferrably with CMake

So I've done some Google searching and this is something that has very little knowledge out there. What would be an effective and foolproof way of detecting whether X11 or Wayland is in use, preferrably at compile-time and with CMake? I need to apply this to a C++ project of mine.
The accepted answer is very inaccurate and dangerous. It just runs loginctl to dump a large list of user-sessions and greps every line with a username or other string that matches the current user's name, which can lead to false positives and multiple matching lines. Calling whoami is also wasteful. So it's harmful, and inaccurate.
Here's a much better way to get the user's session details, by querying your exact username's details and grabbing their 1st session scope's id.
This is a Bash/ZSH-compatible one-liner solution:
if [ "$(loginctl show-session $(loginctl user-status $USER | grep -E -m 1 'session-[0-9]+\.scope' | sed -E 's/^.*?session-([0-9]+)\.scope.*$/\1/') -p Type | grep -ic "wayland")" -ge 1 ]; then
echo "Wayland!"
else
echo "X11"
fi
I really wish that loginctl had a "list all sessions just for a specific user", but it doesn't, so we have to resort to these tricks. At least my trick is a LOT more robust and should always work!
I assume you want to evaluate the display server during compile time, when calling CMake, instead of for every compilation. That's how CMake works and hot it should be used. One downside is, that you have to re-run CMake for every changed display server.
There is currently no default way to detect the running display server. Similar, there is no default code snippet to evaluate the display server by CMake. Just pick one way of detecting the display server that manually works for you or your environment respectively.
Call this code from CMake and store the result in a variable and use it for your C++ code.
For example loginctl show-session $(loginctl | grep $(whoami) | awk '{print $1}') -p Type works for me. The resulting CMake check is
execute_process(
"loginctl show-session $(loginctl | grep $(whoami) | awk '{print $1}') -p Type"
OUTPUT_VARIABLE result_display_server)
if ("${resulting_display_server}" EQUALS "Type=x11")
set(display_server_x11 TRUE)
else()
set(display_server_x11 FALSE)
endif()
Probably you have to fiddle around with the condition and check for Type=wayland or similar to get it properly working in your environment.
You can use display_server_x11 and write it into a config.h file to use it within C++ code.

Using regex in git shell when checking out multiple branches

If I have the following git shell command:
for branch in `git branch -a | grep remotes | grep -v master | sed 's/^.*old\/\(.*\)/\1/g'`; do git branch --track $branch remotes/old/$branch; done
This checks out every single remote branch that exists on the old remote and tracks them using the same name that they have on that remote. However, what if I wanted to slightly change the name that the local branches are checked out has?
What if I have the following remote branches:
release/1.2.1.0
release/1.2.1.1
And I want to check them out under the same parent folder release but I only want the last 3 digits in the version number. So I want my local branches to be:
release/2.1.0
release/2.1.1
I have a simple javascript regex that matches the last 3 digits of the version string: (?:\d\.)(\d.*)
This uses a non-matching group to toss out the first digit followed by the period. The question is, how do I apply that regex to the $branch variable in the git shell bash script above?
First, avoid git branch for loops like this. The correct tool here is git for-each-ref, which is designed to work with scripting languages (git branch is aimed at users and the output format may change in the future, for instance).
To loop over all remote-tracking branches, simply tell for-each-ref to scan the remote-tracking branch namespace. Since you want, more specifically, the remote named old, you can do that very easily by adding /old as well:
git for-each-ref refs/remotes/old
The output here defaults to a triple of objectname objecttype refname. We only care about the refname part (and we can use the :short modifier to drop refs/remotes/ as well, if we like, although we still need to drop the old/ too so we could get away without the modifier). Thus we want to include --format=%(refname:short).
Moving on to bash, bash has built-in regular expression support. Its RE syntax is not quite the same, though, so your existing RE must change. Here is one that probably works for your needs:
bash$ x=1.2.3.4
bash$ [[ $x =~ ([0-9]\.)([0-9.]+) ]] && echo ${BASH_REMATCH[2]}
2.3.4
(There is a bit of subtlety here: using $x changes the way the =~ match applies, which in our case is probably good. As an old school Unix person I generally prefer using expr myself, but in this case I might resort to doing this in Python, which has Perl-style REs, and Javascript/ECMAscript REs are modeled on Perl's. But all that is more or less irrelevant. The most important is that this RE is slightly sub-par as a version number matcher. For instance, it matches strings like "1.3..6". We're safe in that these are invalid branch names—double dots are verboten since they would conflict with the set subtraction syntax in gitrevisions—but it's generally a bit sloppy; with some work we could come up with a tighter expression. It also fails to match revisions starting with two or more digits, but your original RE did as well, so I left that in on purpose.)
Reading in a loop in shell, using -r is generally wise (see Etan Reisner's comment), although in this case we could omit it safely since git controls branch names. I will use it in the example just for form's sake.
Putting these all together:
warn() {
echo "warning: $#" 1>&2
}
# Given an input name release/\d\.(\d|\.)+, make
# a local branch named release/\2 (more or less).
make_local_release_branch() {
local relnum newname
relnum=${1#old/release/}
[[ $relnum =~ ([0-9]\.)([0-9.]+) ]] || {
warn "remote-tracking branch $1 does not conform to name style, ignored"
return
}
newname=release/${BASH_REMATCH[2]}
git rev-parse -q --verify refs/heads/$newname >/dev/null && {
warn "branch $newname already exists, remote-tracking branch $1 ignored"
return
}
git branch --track $branch $1
}
git for-each-ref --format='%(refname:short)' refs/remotes/old |
while read -r rmtbranch; do
case $rmtbranch in
old/release/[0-9]*) make_local_release_branch "$rmtbranch";;
*) warn "skipping remote branch $rmtbranch -- not old/release/[digit]";;
esac
done
(this whole thing is entirely untested).

How to substitute words in Git history & properly debug related problems?

I'm trying to remove sensitive data like passwords from my Git history. Instead of deleting whole files I just want to substitute the passwords with removedSensitiveInfo. This is what I came up with after browsing through numerous StackOverflow topics and other sites.
git filter-branch --tree-filter "find . -type f -exec sed -Ei '' -e 's/(aSecretPassword1|aSecretPassword2|aSecretPassword3)/removedSensitiveInfo/g' {} \;"
When I run this command it seems to be rewriting the history (it shows the commits it's rewriting and takes a few minutes). However, when I check to see if all sensitive data has indeed been removed it turns out it's still there.
For reference this is how I do the check
git grep aSecretPassword1 $(git rev-list --all)
Which shows me all the hundreds of commits that match the search query. Nothing has been substituted.
Any idea what's going on here?
I double checked the regular expression I'm using which seems to be correct. I'm not sure what else to check for or how to properly debug this as my Git knowledge quite rudimentary. For example I don't know how to test whether 1) my regular expression isn't matching anything, 2) sed isn't being run on all files, 3) the file changes are not being saved, or 4) something else.
Any help is very much appreciated.
P.S.
I'm aware of several StackOverflow threads about this topic. However, I couldn't find one that is about substituting words (rather than deleting files) in all (ASCII) files (rather than specifying a specific file or file type). Not sure whether that should make a difference, but all suggested solutions haven't worked for me.
git-filter-branch is a powerful but difficult to use tool - there are several obscure things you need to know to use it correctly for your task, and each one is a possible cause for the problems you're seeing. So rather than immediately trying to debug them, let's take a step back and look at the original problem:
Substitute given strings (ie passwords) within all text files (without specifying a specific file/file-type)
Ensure that the updated Git history does not contain the old password text
Do the above as simply as possible
There is a tailor-made solution to this problem:
Use The BFG... not git-filter-branch
The BFG Repo-Cleaner is a simpler alternative to git-filter-branch specifically designed for removing passwords and other unwanted data from Git repository history.
Ways in which the BFG helps you in this situation:
The BFG is 10-720x faster
It automatically runs on all tags and references, unlike git-filter-branch - which only does that if you add the extraordinary --tag-name-filter cat -- --all command-line option (Note that the example command you gave in the Question DOES NOT have this, a possible cause of your problems)
The BFG doesn't generate any refs/original/ refs - so no need for you to perform an extra step to remove them
You can express you passwords as simple literal strings, without having to worry about getting regex-escaping right. The BFG can handle regex too, if you really need it.
Using the BFG
Carefully follow the usage steps - the core bit is just this command:
$ java -jar bfg.jar --replace-text replacements.txt my-repo.git
The replacements.txt file should contain all the substitutions you want to do, in a format like this (one entry per line - note the comments shouldn't be included):
PASSWORD1 # Replace literal string 'PASSWORD1' with '***REMOVED***' (default)
PASSWORD2==>examplePass # replace with 'examplePass' instead
PASSWORD3==> # replace with the empty string
regex:password=\w+==>password= # Replace, using a regex
Your entire repository history will be scanned, and all text files (under 1MB in size) will have the substitutions performed: any matching string (that isn't in your latest commit) will be replaced.
Full disclosure: I'm the author of the BFG Repo-Cleaner.
Looks OK. Remember that filter-branch retains the original commits under refs/original/, e.g.:
$ git commit -m 'add secret password, oops!'
[master edaf467] add secret password, oops!
1 file changed, 4 insertions(+)
create mode 100644 secret
$ git filter-branch --tree-filter "find . -type f -exec sed -Ei '' -e 's/(aSecretPassword1|aSecretPassword2|aSecretPassword3)/removedSensitiveInfo/g' {} \;"
Rewrite edaf467960ade97ea03162ec89f11cae7c256e3d (2/2)
Ref 'refs/heads/master' was rewritten
Then:
$ git grep aSecretPassword `git rev-list --all`
edaf467960ade97ea03162ec89f11cae7c256e3d:secret:aSecretPassword2
but:
$ git lola
* e530e69 (HEAD, master) add secret password, oops!
| * edaf467 (refs/original/refs/heads/master) add secret password, oops!
|/
* 7624023 Initial
(git lola is my alias for git log --graph --oneline --decorate --all). Yes, it's in there, but under the refs/original name space. Clear that out:
$ rm -rf .git/refs/original
$ git reflog expire --expire=now --all
$ git gc
Counting objects: 6, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (6/6), done.
Total 6 (delta 0), reused 0 (delta 0)
and then:
$ git grep aSecretPassword `git rev-list --all`
$
(as always, run filter-branch on a copy of the repo Just In Case; and then removing original refs, expiring the reflog "now", and gc'ing, means stuff is Really Gone).

ignoring changes matching a string in git diff

I've made a single simple change to a large number of files that are version controlled in git and I'd like to be able to check that no other changes are slipping into this large commit.
The changes are all of the form
- "main()",
+ OOMPH_CURRENT_FUNCTION,
where "main()" could be the name of any function. I want to generate a diff of all changes that are not of this form.
The -G and -S options to git diff are tantalisingly close--they find changes that DO match a string or regexp.
Is there a good way to do this?
Attempts so far
Another question describes how regexs can be negated, using this approach I think the command should be
git diff -G '^((?!OOMPH_CURRENT_FUNCTION).)*$'
but this just returns the error message
fatal: invalid log-grep regex: Invalid preceding regular expression
so I guess git doesn't support this regex feature.
I also noticed that the standard unix diff has the -I option to "ignore changes whose lines all match RE". But I can't find the correct way to replace git's own diff with the unix diff tool.
Try the following:
$ git diff > full_diff.txt
$ git diff -G "your pattern" > matching_diff.txt
You can then compare the two like so:
$ diff matching_diff.txt full_diff.txt
If all changes match the pattern, full_diff.txt and matching_diff.txt will be identical, and the last diff command will not return anything.
If there are changes that do not match the pattern, the last diff will highlight those.
You can combine all of the above steps and avoid having to create two extra files like so:
diff <(git diff -G "your pattern") <(git diff) # works with other diff tools too
No more grep needed!
With Git 2.30 (Q1 2021), "git diff"(man) family of commands learned the "-I<regex>" option to ignore hunks whose changed lines all match the given pattern.
See commit 296d4a9, commit ec7967c (20 Oct 2020) by Michał Kępień (kempniu).
(Merged by Junio C Hamano -- gitster -- in commit 1ae0949, 02 Nov 2020)
diff: add -I<regex> that ignores matching changes
Signed-off-by: Michał Kępień
Add a new diff option that enables ignoring changes whose all lines (changed, removed, and added) match a given regular expression.
This is similar to the -I/--ignore-matching-lines option in standalone diff utilities and can be used e.g. to ignore changes which only affect code comments or to look for unrelated changes in commits containing a large number of automatically applied modifications (e.g. a tree-wide string replacement).
The difference between -G/-S and the new -I option is that the latter filters output on a per-change basis.
Use the 'ignore' field of xdchange_t for marking a change as ignored or not.
Since the same field is used by --ignore-blank-lines, identical hunk emitting rules apply for --ignore-blank-lines and -I.
These two options can also be used together in the same git invocation (they are complementary to each other).
Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to indicate that it is logically a "sibling" of xdl_mark_ignorable_regex() rather than its "parent".
diff-options now includes in its man page:
-I<regex>
--ignore-matching-lines=<regex>
Ignore changes whose all lines match <regex>.
This option may be specified more than once.
Examples:
git diff --ignore-blank-lines -I"ten.*e" -I"^[124-9]"
A small memleak in "diff -I<regexp>" has been corrected with Git 2.31 (Q1 2021).
See commit c45dc9c, commit e900d49 (11 Feb 2021) by Ævar Arnfjörð Bjarmason (avar).
(Merged by Junio C Hamano -- gitster -- in commit 45df6c4, 22 Feb 2021)
diff: plug memory leak from regcomp() on {log,diff} -I
Signed-off-by: Ævar Arnfjörð Bjarmason
Fix a memory leak in 296d4a9 ("diff: add -I that ignores matching changes", 2020-10-20, Git v2.30.0-rc0 -- merge listed in batch #3) by freeing the memory it allocates in the newly introduced diff_free().
This memory leak was intentionally introduced in 296d4a9, see the discussion on a previous iteration of it.
At that time freeing the memory was somewhat tedious, but since it isn't anymore with the newly introduced diff_free() let's use it.
Let's retain the pattern for diff_free_file() and add a diff_free_ignore_regex(), even though (unlike "diff_free_file") we don't need to call it elsewhere.
I think this will make for more readable code than gradually accumulating a giant diff_free() function, sharing "int i" across unrelated code etc.
Use git difftool to run a real diff.
Example: https://github.com/cben/kubernetes-discovery-samples/commit/b1e946434e73d8d1650c887f7d49b46dcbd835a6
I've created a script running diff the way I want to (here I'm keeping curl --verbose outputs in the repo, resulting in boring changes each time I rerun the curl):
#!/bin/bash
diff --recursive --unified=1 --color \
--ignore-matching-lines=serverAddress \
--ignore-matching-lines='^\* subject:' \
--ignore-matching-lines='^\* start date:' \
--ignore-matching-lines='^\* expire date:' \
--ignore-matching-lines='^\* issuer:' \
--ignore-matching-lines='^< Date:' \
--ignore-matching-lines='^< Content-Length:' \
--ignore-matching-lines='--:--:--' \
--ignore-matching-lines='{ \[[0-9]* bytes data\]' \
"$#"
And now I can run git difftool --dir-diff --extcmd=path/to/above/script.sh and see only interesting changes.
An important caveat about GNU diff -I aka --ignore-matching-lines: this merely prevents such lines from making a chunk "intersting" but when these changes appear in same chunk with other non-ignored changes, it will still show them. I used --unified=1 above to reduce this effect by making chunks smaller (only 1 context line above and below each change).
I think that I have a different solution using pipes and grep. I had two files that needed to be checked for differences that didn't include ## and g:, so I did this (borrowing from here and here and here:
$ git diff -U0 --color-words --no-index file1.tex file2.tex | grep -v -e "##" -e "g:"
and that seemed to do the trick. Colors still were there.
So I assume you could take a simpler git diff command/output and do the same thing. What I like about this is that it doesn't require making new files or redirection (other than a pipe).

Is my regex too greedy?

Background: We're using a tape library and the backup software NetWorker to back up data here. The client that's installed is fairly basic, and when we need to restore more than one target directory we create a script that simply calls X client instances in the background via a script with X of the following lines:
recover -c client-srv -t "Mon Dec 10 08:00:00" -s barckup-srv -d /dest/dir/ -f -a /src/dir &
The trouble is that different partitions/directories backed up from the same machine at the same time might be spread across several different tapes, and some of those tapes may have been removed from the library between the backup and restore.
Up until recently the only ways the people here have been finding out about which tapes are needed were to either wait for the library to complain that it doesn't have a particular tape, or to set up a fake restore in an crappy old desktop GUI client and hit a particular menu option. The first option is super bad when the tape turns out to be off-site and takes a day to get back, and the second is tedious and time-consuming.
Actual Question: I've written a "meta"-script that reads the script that we've already created with the commands above, feeds it into the interactive CLI client, and gets it to spit out what tapes are required, and if they're actually in the library. To do this, the script uses the following regular expressions to pull out necessary info:
# pull out a list of the -a targets
restore_targets="`sed 's/^.* -a \([^ ]*\) .*$/\1/' $rec_script`"
# pull out a list of -c clients
restore_clients="`sed 's/^.* -c \([^ ]*\) .*$/\1/' $rec_script`"
numclients=`echo $restore_clients | uniq | wc -l`
# pull out a list of -t dates
restore_dates="`sed 's/^.* -t \"\([^\"]*\)\" .*$/\1/' $rec_script`"
numdates=`echo $restore_dates | uniq | wc -l`
I am not terribly familiar with using s/\(x\)/\1/ types of regexes, to the point that I don't remember the name, but is this the best way of accomplishing what I am doing? The commands work, but I'm wondering if I'm using the .* needlessly.
\1 refers to the first capturing group. If you replace foo(.*?) with \1 and feed in foobar, the resulting text becomes bar, as \1 points to the text captured by the first capturing group.
As for your your question, it might be safer and easier to parse the arguments using Python (or another high-level scripting language):
>>> import shlex
>>> shlex.split('recover -c client-srv -t "Mon Dec 10 08:00:00" -s barckup-srv -d /dest/dir/ -f -a /src/dir &')
['recover', '-c', 'client-srv', '-t', 'Mon Dec 10 08:00:00', '-s', 'barckup-srv', '-d', '/dest/dir/', '-f', '-a', '/src/dir', '&']
Now, this is much easier to work with. The quotes are gone and all of the components of the command are nicely split up into a list.
If you want this to be completely foolproof, you could use argparse and implement your own parser for this command line pretty easily. This will enable you to easily get the info, but it might be overkill for your situation.
As for your actual question, you can dissect the regex:
^.* -t "([^\"]*)" .*$
This regex captures -t "foo \" bar", while a non-greedy version would stop at -t "foo \".