How to search git log without a certain commit - regex

Assume I have a branch master with 3 commits, whose commit messages are t123, b1 and b12 respectively.
* b90b03f (HEAD -> master) b12
* 27f7577 b1
* 7268b40 t123
Now I want to use git log --grep <regex> to search the log without t123.
The result I want is
* b90b03f b12
* 27f7577 b1
So how do I use regex to meet the requirement?

It sounds like what you want is to include all commits whose commit messages do not match some pattern, but --grep includes commits that do match some pattern. But the answer to "how do I write a regexp that matches everything except some pattern" is: You don't.[1]
You don't need to, because you can use something else (or more precisely, something additional) to exclude the commit with the string "t123" in it. Specifically, if you look at the documentation for git log, you will find that it not only has its --grep=<pattern> option, but also an --invert-grep option:
--invert-grep
Limit the commits output to ones with log message that do not match
the pattern specified with --grep=<pattern>.
That is, instead of inventing some sort of inverse regular expression, you simply tell the command to invert the result from searching for the regular expression. Since your regexp is just a fixed string with no meta-characters in it:
git log --grep t123 --invert-grep
will do the job. (The = between --grep and the <pattern> part is optional for --grep.)
[1] It is, in some sense, not impossible; it's just way too difficult, inefficient, and most of all, unnecessary.
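To make the semantics concrete, here is a minimal Python sketch of what --grep and --invert-grep do to a list of commits. The COMMITS list and the log_grep helper are made up for illustration; git does the real filtering itself, and Python's re syntax is not identical to git's default regex dialect.

```python
import re

# Hypothetical stand-in for the three commits in the question: (hash, message).
COMMITS = [
    ("b90b03f", "b12"),
    ("27f7577", "b1"),
    ("7268b40", "t123"),
]

def log_grep(commits, pattern, invert=False):
    """Keep commits whose message matches pattern.

    With invert=True, keep the complement instead, which is the idea
    behind combining --grep with --invert-grep.
    """
    regex = re.compile(pattern)
    return [(h, msg) for h, msg in commits
            if (regex.search(msg) is not None) != invert]

for commit_hash, message in log_grep(COMMITS, "t123", invert=True):
    print(commit_hash, message)
```

The pattern stays a plain positive match; only the selection is flipped, so no "negative" regex is needed.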

Related

POSIX ERE Regex - Creating Efficient Regex

I'm working to create some regex entries that are well-formed, and efficient. I'll place an emphasis on efficient, as these regex entries can see thousands of logs per second. Inefficient regex entries can cause severe performance impacts.
Question: Does regex101 (through one flavor) support POSIX ERE Regex? Googling shows that PCRE2 should support BRE+ERE and more.
Regex Type: POSIX ERE
Syslog App: rsyslog (EL7)
Sample Payload (Well formed - Sensitive Information Stripped):
Jul 10 00:00:00 Firewall-Name-Removed CEF:0|Fortinet|FortiGate-removed|1.2.3,build1111 (GA)|0000000013|forward traffic accept|5|start=Jul 10 2022 00:00:00 logver=604091966 deviceExternalId=FG9A9A9A9999999 dvchost=Firewall-Name-Removed ad.vd=root ad.eventtime=1111111111111111111 ad.tz=-9999 ad.logid=0000000013 cat=traffic ad.subtype=forward deviceSeverity=notice src=1.1.1.1 shost=RandomHost1 spt=62119 deviceInboundInterface=DII-Out ad.srcintfrole=lan ad.srcssid=SSID Has Been Removed ad.apsn=ABC123D ad.ap=CHL-07 ad.channel=157 ad.radioband=802.11ac n-only ad.signal=-40 ad.snr=55 dst=2.2.2.2 dpt=53 deviceOutboundInterface=DOI-Out ad.dstintfrole=undefined ad.srccountry=Reserved ad.dstcountry=CountryRemoved externalID=123456789 proto=00 act=accept ad.policyid=000 ad.policytype=policy ad.poluuid=UUID-Removed ad.policyname=policy_name_removed app=DNS ad.trandisp=noop ad.appid=16195 ad.app=DNS ad.appcat=Network.Service ad.apprisk=elevated ad.applist=UTM Name - Removed ad.duration=180 out=0 in=205 ad.sentpkt=0 ad.rcvdpkt=1 ad.utmaction=allow ad.countdns=1 ad.osname=Windows ad.srcswversion=10 ad.mastersrcmac=MAC removed ad.srcmac=MAC removed ad.srcserver=0 tz="-9999"
What I'm attempting to do is remove specific logs that are not required. Normally I'd do this at a SIEM level through something like routing rules (where I can utilize fields), but this isn't possible for the foreseeable future. In this particular case: I'm trying to exclude on the following pieces of information.
Source IP: Is in a specific range
deviceOutboundInterface: is DOI-Out
Current Regex: "\bsrc=1.1.1[4-5]{0,1}.[0-9]{0,3}\b.*?\bdeviceOutboundInterface=DOI-Out\b" (Regex101 link in PCRE2). If that is matched, the log is rejected (through the stop call). Otherwise, it moves onto the other entries to check for unnecessary logs.
Most of my regex entries are in the low double-digits because they're a lot simpler. Is there a better way to make the more complex regex more efficient?
Thank you for any insight you can offer.
You might be able to cut some time with:
src=1\.1\.1[4-5]{0,1}\.[0-9]{0,3}.*?deviceOutboundInterface=DOI-Out
changes:
remove word boundaries
change the . to \. in the IP address
regex101 reports 383 steps for the original and 301 for the new version, a potential saving of ~21%. Not terrible, but you'll want to make sure the removals are OK.
To be honest, what you have looks pretty good to me.
This RE reduces the number of steps on Reg101 from 383 to 270 (~ -29.5%):
src=1\.1\.1[45]?\.\d{0,3}.*?O[boundIter]*?face=DOI-Out
The original RE is already quite simple, matching only one pattern and one literal string, which makes it difficult to optimize. But we can do so if we know (from the documentation of the text in question, here the Log Message manual) that an even simpler pattern will not lead to ambiguities.
Changes:
matching literal text wherever possible
replacing the range '4-5' with the simple elements '45'
instead of matching the long 'deviceOutboundInterface=', use a pattern which will just barely match this string but could possibly match other words if they ever occurred in log messages - but we know they don't.
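Besides the step count, escaping the dots also changes what is matched. The sketch below checks this with Python's re module, which is an assumption of convenience: it is not the POSIX ERE engine rsyslog uses, and POSIX ERE itself does not define \b, \d or lazy quantifiers such as .*?, so results should be re-checked in the real pipeline. The two log lines are shortened, made-up samples.

```python
import re

# Patterns copied from the question and from the first answer.
ORIGINAL = r"\bsrc=1.1.1[4-5]{0,1}.[0-9]{0,3}\b.*?\bdeviceOutboundInterface=DOI-Out\b"
ESCAPED = r"src=1\.1\.1[4-5]{0,1}\.[0-9]{0,3}.*?deviceOutboundInterface=DOI-Out"

# Hypothetical, shortened log lines (not real traffic).
IN_RANGE = "src=1.1.14.7 spt=62119 deviceOutboundInterface=DOI-Out proto=00"
OUT_OF_RANGE = "src=101.1.14.9 spt=62119 deviceOutboundInterface=DOI-Out proto=00"

def is_dropped(pattern, line):
    """Return True if the line would hit the 'stop' rule for this pattern."""
    return re.search(pattern, line) is not None

for name, pattern in (("original", ORIGINAL), ("escaped", ESCAPED)):
    print(name, is_dropped(pattern, IN_RANGE), is_dropped(pattern, OUT_OF_RANGE))
```

With unescaped dots, "1.1" also accepts "101", so the original pattern drops the second line as well; the escaped version only drops the first.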

Git diff: ignore lines starting with a word

As I have learned here, we can tell git diff to ignore lines starting with a * using:
git diff -G '^[[:space:]]*[^[:space:]*]'
How do I tell git to ignore lines starting with a word, or more (for example: * Generated at), not just a character?
This file shall be ignored, it contains only trivial changes:
- * Generated at 2018-11-21
+ * Generated at 2018-11-23
This file shall NOT be ignored, it contains NOT only trivial changes:
- * Generated at 2018-11-21
+ * Generated at 2018-11-23
+ * This line is important! Although it starts with a *
Git uses POSIX regular expressions, which do not seem to support lookarounds. That is the reason why @Myys 3's approach does not work. A not so elegant workaround could be something like this:
git diff -G '^\s*([^\s*]|\*\s*[^\sG]|\*\sG[^e]|\*\sGe[^n]|\*\sGen[^e]|\*\sGene[^r]|\*\sGener[^a]|\*\sGenera[^t]|\*\sGenerat[^e]|\*\sGenerate[^d]).*'
This will filter out all changes starting with "* Generated".
Test: https://regex101.com/r/kdv4V0/3
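Writing that alternation by hand is error-prone, so it can be generated. The following Python sketch builds the same pattern for an arbitrary leading word; the helper name not_starting_with is hypothetical, it assumes the word consists of plain letters (no regex metacharacters), and it checks the result with Python's re rather than with git's own regex engine.

```python
import re

def not_starting_with(word):
    """Build a lookaround-free regex for lines that do NOT start with '* <word>'.

    One alternative per prefix of the word: the line is accepted as soon
    as it deviates from '* <word>' at some character.
    """
    alternatives = [r"[^\s*]", r"\*\s*[^\s" + word[0] + "]"]
    for i in range(1, len(word)):
        alternatives.append(r"\*\s" + word[:i] + "[^" + word[i] + "]")
    return r"^\s*(" + "|".join(alternatives) + ").*"

pattern = not_starting_with("Generated")
print(pattern)
print(bool(re.search(pattern, "* Generated at 2018-11-23")))
print(bool(re.search(pattern, "* This line is important! Although it starts with a *")))
```

For the word "Generated" this reproduces the pattern given above character for character.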
Considering you are ignoring changes that do NOT match your regex, you just have to put the words you want inside a lookahead group, like this:
git diff -G '^(?=.*Generated at)[[:space:]]*[^[:space:]*]'
Note that if you want to keep adding words to ignore, just keep adding these groups (don't forget the .*).
However, a line that contains "Generated at" anywhere will then be ignored. If you want to define exactly how the line should start, replace the . with [^[:word:]]:
git diff -G '^(?=[^[:word:]]*Generated at)[[:space:]]*[^[:space:]*]'
You can have a look at its behaviour at:
Version 1: .*
https://regex101.com/r/kdv4V0/1
Version 2: [^[:word:]]*
https://regex101.com/r/kdv4V0/2
TL;DR: git diff -G cannot exclude changes; it can only include changes that match the regex.
Have a look at "git diff: ignore deletion or insertion of certain regex".
There, torek explains how git log and git diff work and how the -G parameter works.

How do I configure Jenkins to strip the leading "origin/" in git branch parameter?

I'm using Jenkins with a branch parameter to specify the branch to build from. Other stuff downstream needs the branch name to not have the leading "origin/" -- just "feature/blahblah" or "bugfix/12345" or similar. The advanced settings for the parameter let me specify a branch filter via regex, but I'm a regex newbie and the solutions I've found in searching are language-dependent. The Jenkins documentation for this is sparse.
When a user clicks on "build with parameters", for the branch I want to see branch names that omit the leading "origin/". I'm not sure how to write a regex for Jenkins that will "consume" that part of the branch name before setting the parameter value.
I solved this problem once before, I'm pretty sure using Stack Overflow, but I can't find those hints now.
For the git branch parameter, set Branch Filter to:
origin/(.*)
I found the parentheses to be counter-intuitive, because if you don't specify a filter you get:
.*
(No parens.) If you are filtering stuff out, you use parens to indicate the part to keep.
I usually use a groovy script evaluated before the job, like:
def map = [:]
map['GIT_BRANCH'] = GIT_BRANCH - 'origin/'
return map
This uses the EnvInject plugin, as described in gitlab-plugin issue 444.
If you need to match several patterns while dropping the origin/ prefix, try the following:
origin/(develop.*|feature.*|bugfix.*)
This will list the develop, feature and bugfix branches without the leading origin/.
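The capture-group filters above can be tried outside Jenkins. This Python sketch assumes, based only on the behaviour described in these answers, that the filter must match the whole branch name and that group 1 (when present) becomes the displayed value; the helper name displayed_name is hypothetical, and Jenkins uses Java regexes, which agree with Python for patterns this simple.

```python
import re

def displayed_name(branch_filter, remote_branch):
    """Return the name shown in the parameter list, or None if filtered out.

    Assumption: a full match is required, and group 1 is kept when the
    filter has a capture group; otherwise the whole match is kept.
    """
    match = re.fullmatch(branch_filter, remote_branch)
    if match is None:
        return None
    return match.group(1) if match.re.groups else match.group(0)

print(displayed_name("origin/(.*)", "origin/feature/blahblah"))
print(displayed_name(".*", "origin/bugfix/12345"))
print(displayed_name("origin/(develop.*|feature.*|bugfix.*)", "origin/release/1.0"))
```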

How to match everything inside the first pair of square brackets

I'm trying to create a regular expression in sieve. The implementation of sieve that I'm using is Dovecot Pigeonhole.
I'm subscribed to github project updates and I receive emails from github with the subject in the format that looks like this:
Re: [Opserver] Create issues on Jira from Exception details page (#77)
There is a project name in square bracket included in the subject line. Here is the relevant part of my sieve script:
if address "From" "notifications@github.com" {
    if header :regex "subject" "\\[(.*)\\]" {
        set :lower :upperfirst "repository" "${1}";
        fileinto :create "Subscribtions.GitHub.${repository}"; stop;
    } else {
        fileinto :create "Subscribtions.GitHub"; stop;
    }
}
As you can see from the above, I'm moving the messages to appropriate project IMAP folders. So the message with the subject above will end up in Subscribtions.Github.Opserver
Unfortunately, there is one small problem with this script. If someone adds square brackets in the title of their github issue, the filter breaks. For example if the subject is:
[Project] [Please look at it] - very weird issue
The above filter will move the message to folder Subscribtions.Github.Project] [please look at it which is completely undesirable. I'd like it to be moved to Subscribtions.Github.Project anyway.
This happens because regular expressions are greedy by default, so they match the longest possible string. However, when I try to fix it the usual way, changing "\\[(.*)\\]" to "\\[(.*?)\\]", nothing seems to change.
How do I write this regular expression so that it acts as desired?
The answer is to change "\\[(.*)\\]" to "\\[([^]]*)\\]".
By reading the regex spec linked in the question, we discover that POSIX regular expressions are used. Unfortunately, those do not support non-greedy matches.
However, there is a workaround in this particular case, given above.
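The difference between the greedy pattern and the negated character class can be seen in a short Python sketch. Python's re is used here only for illustration (Pigeonhole uses POSIX regexes), and the helper name first_bracket is hypothetical; the subjects are the two from the question.

```python
import re

GREEDY = r"\[(.*)\]"        # original: runs to the LAST closing bracket
NEGATED = r"\[([^]]*)\]"    # fix: stops at the FIRST closing bracket

def first_bracket(subject, pattern=NEGATED):
    """Return the text captured by group 1, or None if nothing matches."""
    match = re.search(pattern, subject)
    return match.group(1) if match else None

subject = "[Project] [Please look at it] - very weird issue"
print(first_bracket(subject, GREEDY))
print(first_bracket(subject))
print(first_bracket("Re: [Opserver] Create issues on Jira from Exception details page (#77)"))
```

Because [^]]* can never consume a closing bracket, greediness no longer matters and no lazy quantifier is required.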

Doing a 'diff/st' and ignoring the first line if it matches a specific criterion

In a repository for a well known open source project, all files contain a version string with a timestamp as their first line:
<?php // $Id: index.php,v 1.201.2.10 2009-04-25 21:18:24 stronk7 Exp $
I don't really understand why they do this, since the files are already under version control, but I have to live with it.
The main problem is that if I try to 'st' or 'diff' a release to get an idea of what was changed from the previous one, every single file contained in the repository is obviously marked as modified and the diffs become unreadable and unmanageable.
I'm wondering if there's a way to ignore the first lines when doing a diff/st if they match a regexp.
The project is under cvs - cvs, yes, you've read correctly - and included in a bigger mercurial repository.
I don't know about cvs, but with hg you can use any external diff tool with the bundled extdiff extension, and any modern tool should have the ability to let you ignore diffs that match certain patterns.
I swear by Beyond Compare, which allows arbitrary syntax definition.
kdiff3 has preprocessor commands that you can pipe the input through.
If you try
man diff
you'll find
--ignore-matching-lines=RE Ignore changes whose lines all match RE.
Searching the web for "ignore matching lines" gives examples:
diff --unified --recursive --new-file \
  --ignore-matching-lines='[$]Author.*[$]' \
  --ignore-matching-lines='[$]Date.*[$]' ...
(http://www.cygwin.com/ml/cygwin-apps/2005-01/msg00000.html)
Thus try:
diff --ignore-matching-lines='[<][?]php [/][/] [$]Id:'
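The rule quoted from the man page ("ignore changes whose lines all match RE") can be sketched in Python with difflib. This is only an approximation under stated assumptions: difflib may group changed lines into hunks differently from GNU diff, the helper name significant_hunks is hypothetical, and the revision numbers and file contents in the sample data are invented.

```python
import difflib
import re

def significant_hunks(old_lines, new_lines, ignore_re):
    """Return (old, new) line groups for changes that are not ignorable.

    A change is ignored only if every removed and added line in it
    matches ignore_re.
    """
    pattern = re.compile(ignore_re)
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    hunks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        changed = old_lines[i1:i2] + new_lines[j1:j2]
        if all(pattern.search(line) for line in changed):
            continue  # only version-string lines changed
        hunks.append((old_lines[i1:i2], new_lines[j1:j2]))
    return hunks

ID_LINE = r"[<][?]php [/][/] [$]Id:"
old = ["<?php // $Id: index.php,v 1.201.2.10 2009-04-25 21:18:24 stronk7 Exp $", "echo 1;"]
new = ["<?php // $Id: index.php,v 1.201.2.11 2009-05-01 10:00:00 stronk7 Exp $", "echo 1;"]
print(significant_hunks(old, new, ID_LINE))
```

A file whose only difference is the $Id: line yields no hunks, while a file with any other changed line is still reported.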