Parsing CVS History Output - regex

I just need to get a list of the most recent changes from CVS and parse them.
Example: The CVS user "Lollerskates" checked in a file with spaces. But spaces are the delimiter! And then "skates" checked in a file with a space in a folder name.
% cvs history -c -a -D 2011-03-14
A 2011-03-15 00:17 +0000 jschmoe 1.1 CoolCode.java Awesome/Source/Java/src/com/widgets/foo/ambiguous/abstraction == <remote>
M 2011-03-15 00:17 +0000 sumbody 1.2 MoreCoolCode.java Awesome/Source/Java/src/com/widgets/foo/ambiguous/abstraction == <remote>
A 2011-03-15 00:17 +0000 lollerskates 1.123 This File Name Has Spaces.html Awesome/Source/Java/src/com/widgets/foo/ambiguous/abstraction == <remote>
A 2011-03-15 00:17 +0000 jschmoe 1.1 MyAwesomeProject.java Awesome/Source/Java/src/com/widgets/foo/ambiguous/abstraction == <remote>
M 2011-03-15 00:17 +0000 skates 1.5 BlahBlah.java Awesome/Source/Java/src/com/widgets/foo/content/block type/cart == <remote>
What is a reliable way to parse this?
Alternatively, is there a different CVS command with more easily parsable results?

This regex captures all of these:
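\w \d{4}-\d{2}-\d{2} \d{2}:\d{2} \+\d{4} (\w+)\s+(\d+\.\d+)\s+([\w\s]+\.\w+)\s+([\w\s/]+)== <remote>
The user is in group #1, the revision in group #2, the filename in group #3 and the path in group #4 (with a trailing space, since the path class also matches whitespace).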
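For use inside a shell script, the same idea can be sketched with bash's built-in regex matching. Bash uses POSIX ERE, so \w and \d have to be spelled out as bracket expressions. This is only a sketch under assumptions that are not in the question: the function name parse_cvs_line is made up, file names are assumed to end in a dot-extension, and directory names are assumed to contain no dot. It prints user, revision, file name and directory separated by |.

```shell
#!/usr/bin/env bash
# Sketch: split one line of `cvs history` output into its fields.
# Assumptions (not from the question): the file name ends in ".ext",
# and no directory name contains a dot.
parse_cvs_line() {
  local re='^[A-Z] [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2} [+-][0-9]{4} ([^ ]+) +([0-9.]+) +(.+\.[^ ./]+) +(.+[^ ]) +== '
  [[ $1 =~ $re ]] || return 1
  # BASH_REMATCH[1..4] = user, revision, file name, directory
  printf '%s|%s|%s|%s\n' \
    "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" \
    "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
}
```

Feeding it the output line by line, for example with `cvs history -c -a -D 2011-03-14 | while IFS= read -r l; do parse_cvs_line "$l"; done`, should give one |-separated record per change. Lines that do not fit the pattern make the function return non-zero instead of printing a guess.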

In this particular case, wouldn't cut be a better fit? That only works if the fields are fixed length, though.

Related

Extracting string to variable using regex bash

I have a string which is like:
Return-Path: bT.41aywtru20=krja5b54hplm=k29fsc7grl#fake.link.com
Received-SPF: pass (fake.link.com: Sender is authorized to use 'bt.41aywtru20=krja5b54hplm=k29fsc7grl#fake.link.com' in 'mfrom' identity (mechanism 'include:spf.smtp2go.com' matched)) receiver=pmxlab01.permission.email; identity=mailfrom; envelope-from="bt.41aywtru20=krja5b54hplm=k29fsc7grl#fake.link.com"; helo=e2i353.smtp2go.com; client-ip=103.2.141.97
Received: from e2i353.smtp2go.com (e2i353.smtp2go.com [103.2.141.97])
by mailserver.fake.com(Proxmox) with ESMTP id A4F983E1048
for <fake#fake.com>; Tue, 24 Aug 2021 14:47:20 +0100 (BST)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
d=smtpcorp.com; s=a1-4; h=Feedback-ID:X-Smtpcorp-Track:Message-Id:Subject:
Date:To:From:Reply-To:Sender:List-Unsubscribe;
bh=cTg4MkkE2uaIjpApjJYQFK3RgYiMF3bwCj8UZjFO4NE=; b=STU7lctit7L5LJ2tA3Re1fe4II
lXJbY/SBXTGqCHh9p4K86aLK5Bvz98Q7eR9xwjFib6x4NoZZ5L1fke0XQERd1eQvxkl9R+kRIGU8A
QOtrLPpt8coN8P+syoaTRR4pDJQG9OfJO1fON9OaOP8HwnEg/91ie6Cm+wQRxjwyat859uAcu89Xv
6/mrcequkSp6kfiQN4goZ7vMYJYfBYuooslbTciaK4SYIfxdINyrrWGA6QhJPobdW0uuedRNY5jBG
OdMbVmm7FTpxDJs51rB1PTIcFQ8W1oypcttqSgCjI+5eMVrabU/IoIxhX5F0Cn3zm7E9CHlaJuLt1
CRXVbwdw==;
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=fake.com; i=#fakelink.com; q=dns/txt; s=s575655;
t=1629812840; h=from : subject : to : message-id : date;
bh=cTg4MkkE2uaIjpApjJYQFK3RgYiMF3bwCj8UZjFO4NE=;
b=TEeEsPNLf7Wi6b8aaxE6JvfymfBKYjLq7izcUVrOXTW7sGIznxOA5udhfmDh15Fgp6Qgh
Kv5HX9uPNa8TEeoaJ+gV/4KERuscnc4GXEHwo0eclktx6f6JI5h1/q+qCe34+cN/EweaP5n
iOs+nrzsRuWn/iQ0Yck+b4IXVWHoTW8298xmBNuC1JF4jIVXREJFAC0nACfGU03OlpjDXf/
qvI6Ffnn5YGTNxgIkOdrtymaqOvjG9NM0PWtgSkvsTCJdUvxkrI+rRUG6ixiNi+vifqwvox
aQ6BRnMmeNK7A954Dy9r9r09QzbTthsBsi+lORKH7DntBKhm7Rb5/Q9j0xVA==
Received: from [10.176.58.103] (helo=SmtpCorp) by smtpcorp.com with esmtpsa
(TLS1.2:ECDHE_SECP256R1__RSA_SHA256__AES_256_GCM:256)
(Exim 4.94.2-S2G) (envelope-from <tomtest#fakelink.com>)
id 1mIWls-TRjyEC-AK for fake#fake.com; Tue, 24 Aug 2021 13:47:20 +0000
Received: from [10.86.20.232] (helo=DESKTOP-69OG2R3)
by smtpcorp.com with esmtpsa (TLS1.2:ECDHE_RSA_SECP256R1__AES_256_GCM:256)
(Exim 4.94.2-S2G) (envelope-from <fakename#fakelink.com>)
id 1mIWlr-9EFPsz-U0 for fake#fake.com; Tue, 24 Aug 2021 13:47:19 +0000
MIME-Version: 1.0
From: fake#fake2.com
To: fake#fake.com
Date: 24 Aug 2021 14:46:30 +0100
Subject: Test Email 2xM9e5Dj
Content-Type: multipart/alternative;
boundary=--boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e
Message-Id: <E1mIWlr-9EFPsz-U0#message-id.smtpcorp.com>
X-Smtpcorp-Track: 1XmW_r9EFeszl0.JChXLDDjoy7xH
Feedback-ID: 575655m:575655aVI_MaS:575655sNpPp5WOdD
X-Report-Abuse: Please forward a copy of this message, including all headers,
to <abuse-report#smtp2go.com>
----boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
This is a text message
----boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
This is a html message
----boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e--
This is stored in a variable called $emailText
I'm trying to use a regex to take the From part out of the text
From: fake#fake2.com
My regex skills aren't strong, but in my testing this looks like it works: (?<=From: ).*.
But when I try and take the text out, it appears I can't get the regex to go through properly.
echo [[ $emailText =~ (?<=From: ).*. ]]
Bash's regex matching (POSIX ERE) doesn't support lookbehind or lookahead assertions.
It is much easier to use a non-regex approach using awk here:
awk -F ': ' '$1 == "From" {print $2}' <<< "$emailText"
fake#fake2.com
With bash:
[[ "$emailText" =~ From:\ ([^$'\n']*) ]] && echo "${BASH_REMATCH[1]}"
Output:
fake#fake2.com
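The one-liner above matches the first "From: " found anywhere in the text, including one in the message body. A slightly stricter variant can be sketched as follows. The function name extract_from is hypothetical, and the sketch assumes the headers end at the first empty line and that the From header is not folded across several lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the first From: header.
# Only the header block is scanned (everything before the first empty line).
extract_from() {
  local line re='^From:[[:space:]]*(.*)$'
  while IFS= read -r line; do
    line=${line%$'\r'}             # tolerate CRLF line endings
    [[ -z $line ]] && return 1     # end of headers, nothing found
    if [[ $line =~ $re ]]; then
      printf '%s\n' "${BASH_REMATCH[1]}"
      return 0
    fi
  done <<< "$1"
  return 1
}
```

Called as `extract_from "$emailText"`, it uses a capture group where the original attempt used a lookbehind, which is the usual substitute in regex flavours without lookarounds.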
With your shown samples, please try the following awk code. It checks whether the first field is From: and, if so, prints the second field of that line.
awk '$1=="From:"{print $2}' Input_file
2nd solution: if there is only one From: entry in the whole file, try the following, which calls exit after printing the matched line so that the rest of Input_file is not read unnecessarily.
awk '$1=="From:"{print $2;exit}' Input_file
Assuming you only want the email terminus, here's a quick and dirty Awk script.
awk '/^$/ { exit 1 }   # blank line: the headers are over
/^From: .* <[^<>#]+#[^<>]+>/ {
  # split on the angle brackets; the address is the second piece
  split($0, g, /[<>]/); print g[2]; exit }
/^From: / { print $2; exit }' file.eml
This should work correctly for all these cases:
From: Real Name <real.name#example.com>
From: "Name, Real" <outlook#torture.example.com>
From: terminus#example.com
From: terminus#example.com (Real Name)
From: =?q?utf-8?Real_N=A3=E4me?= <real.name#example.com>
As especially the last example should convince you, you will need significantly more work if you also need the full name of the correspondent in normalized form.
If the line is expected to contain a mail address, you can match it first using awk (no lookarounds needed):
awk 'match($0, /^From: [^[:space:]#]+#[^[:space:]#]+$/) {
print $2
}' <<< "$emailText"
Output
fake#fake2.com

How to print initial letter of committer in git log?

I have prepared an alias to get a short log report in git
# excerpt from ~/.gitconfig
[alias]
lg = log --all --oneline --graph --decorate --pretty='%C(auto)%h %Cgreen%ai %C(reset)%C(auto)%s %d'
git lg generates one nice line per commit, but without information on the user:
* 623beff 2016-11-14 14:18:36 +0100 extended plotstyle option and automatic colors
But I want to see the initial letters of the committer real name (the full name is sometimes too long) in each line:
* 623beff 2016-11-14 14:18:36 +0100 (J.S.) extended plotstyle option and automatic colors
How can I get this result?
There is a way to get the first letter of the first name, using %<(3,trunc)%cN. It truncates the committer name to three columns, and the truncation marker ".." takes up two of them:
git log --all --oneline --graph --decorate --pretty='%C(auto)%h %Cgreen%ai %C(reset)%C(auto)(%<(3,trunc)%cN) %s %d'
output:
* 8759307 2009-01-15 16:11:48 +0000 (S..) Remove spurious code trying to tag a branch root before the mark was created. (HEAD -> master, origin/master, origin/HEAD)
* 939f999 2008-12-11 13:41:37 +0000 (S..) When just writing output file, do not try to devise lock target with no repository.
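If real initials such as (J.S.) are wanted, one option is to post-process the log in the shell. The following is only a sketch: the function names initials and short_log are made up, and the pipeline gives up the --graph drawing and the colours, because the output is re-assembled line by line. %x09 is git's placeholder for a tab character, used here as the field separator.

```shell
#!/usr/bin/env bash
# initials NAME: print the first character of every word, each followed by a dot.
initials() {
  local -a words
  local w out=''
  read -r -a words <<< "$1"        # split the name on whitespace
  for w in "${words[@]}"; do
    out+="${w:0:1}."
  done
  printf '%s\n' "$out"
}

# Hypothetical wrapper: let git emit tab-separated fields, add initials in bash.
short_log() {
  git log --all --pretty='tformat:%h%x09%ai%x09%cN%x09%s' "$@" |
  while IFS=$'\t' read -r hash date name subject; do
    printf '* %s %s (%s) %s\n' "$hash" "$date" "$(initials "$name")" "$subject"
  done
}
```

Running `short_log` inside a repository should print lines in the requested shape, one per commit.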

how to pre-fix a piece of text in github "git log" using shell-script

I need to turn each commit hash printed by git log into a GitHub link in an email, so the recipient can click on the link and go directly to the change.
I receive a long list containing lines with the text:
commit some_long_string_of_hexadecimals
and I need to transform this into:
commit https://github.com/account/repo/commit/some_long_string_of_hexadecimals
The log I am receiving contains any number of these entries, so I need the script to do this for every instance of some_long_string_of_hexadecimals.
Here are a few example log statements:
commit a98a897a67896a987698a769786a987a6987697a6
Author: Some Person <some#email.com>
Date: Thu Sep 29 09:48:52 2016 +0200
long message describing change.
commit a98a897a67896a987698a769786a987a6987697a6
Author: Some Person <some#email.com>
Date: Thu Sep 29 09:48:52 2016 +0200
more description
I'd like it to look like this:
commit https://github.com/account/repo/commit/a98a897a67896a987698a769786a987a6987697a6
Author: Some Person <some#email.com>
Date: Thu Sep 29 09:48:52 2016 +0200
added handling of running tests from within a docker container
How do I achieve this using a shell command ?
Thanks in advance.
awk '$1 == "commit" {$2 = "https://github.com/account/repo/commit/" $2} 1'
check if field 1 equals "commit"
if so, prepend to field 2
if line matched, print modified line, else print line as is
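One caveat with the awk version: it also rewrites a commit message line whose first word happens to be "commit", and assigning to $2 rebuilds the line with single spaces. A narrower variant can be sketched with sed, which only touches lines that begin with "commit " followed by hexadecimal digits. The function name linkify_commits is hypothetical, and the URL is the placeholder from the question.

```shell
#!/bin/sh
# Prefix the hash on "commit <hash>" lines with the repository URL.
# Indented message lines and other headers pass through unchanged.
linkify_commits() {
  sed -E 's#^commit ([0-9a-f]+)#commit https://github.com/account/repo/commit/\1#'
}
```

Usage would be `git log | linkify_commits`. The # delimiter avoids having to escape the slashes in the URL.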

Matching package version string in /bin/sh

I'm trying to map a given version string to a package version in a /bin/sh script:
if test "x$version" = "x"; then
version="latest";
info "Version parameter not defined, assuming latest";
else
info "Version parameter defined: $version";
info "Matching version to package version"
case "$version" in
[^4.0.]*)
$package_version='1.0.1'
;;
[^4.1.]*)
$package_version='1.1.1'
;;
[^4.2.]*)
$package_version='1.2.6'
;;
*)
critical "Unable to match requested version to package version"
exit 1
;;
esac
fi
However, when I run it I get an error:
23:38:47 +0000 INFO: Version parameter defined: 4.0.0
23:38:47 +0000 INFO: Matching Puppet version to puppet-agent package version (See http://docs.puppetlabs.com/puppet/latest/reference/about_agent.html for more details)
23:38:47 +0000 CRIT: Unable to match requested puppet version to puppet-agent version - Check http://docs.puppetlabs.com/puppet/latest/reference/about_agent.html
23:38:47 +0000 CRIT: Please file a bug report at https://github.com/petems/puppet-install-shell/
23:38:47 +0000 CRIT:
23:38:47 +0000 CRIT: Version: 4.0.0
I'm using the same kind of regex in another part of the script, and it seems to work there:
if test "$version" = 'latest'; then
apt-get install -y puppet-common puppet
else
case "$version" in
[^2.7.]*)
info "2.7.* Puppet deb package tied to Facter < 2.0.0, specifying Facter 1.7.4"
apt-get install -y puppet-common=$version-1puppetlabs1 puppet=$version-1puppetlabs1 facter=1.7.4-1puppetlabs1 --force-yes
;;
*)
apt-get install -y puppet-common=$version-1puppetlabs1 puppet=$version-1puppetlabs1 --force-yes
;;
esac
fi
What am I missing?
Full version of the script is here: https://github.com/petems/puppet-install-shell/blob/fix_puppet_agent_install/install_puppet_agent.sh
case ... esac in a POSIX shell script uses (glob-style) patterns, not regular expressions (while the two are distantly related, there are fundamental differences).
To get true regex matching in a sh script, you'd have to use expr with :, though it's probably not needed here.
To test for a prefix match, use <prefix>* in a case branch - case branches are always matched against the entire argument - no need for anchoring (which patterns don't support).
As an aside, what you're attempting would not even work for prefix matching as a regex. E.g., [^4.0.] is the same as [^.04] - i.e., a negated character class: it matches one character if it is neither . nor 0 nor 4.
When assigning to a variable in a POSIX shell script, do not use $.
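The difference between the negated bracket expression and a plain prefix pattern can be seen in a small sketch. The function name which_branch is made up for the demonstration, and the bracket is written as [!...], the POSIX spelling of the negation that bash also accepts as [^...].

```shell
#!/bin/sh
# which_branch VERSION: report which case branch a version string takes.
which_branch() {
  case "$1" in
    [!4.0.]*) echo "bracket" ;;   # first character is none of 4, 0 or .
    4.0.*)    echo "prefix"  ;;   # literal prefix 4.0. followed by anything
    *)        echo "default" ;;
  esac
}

which_branch 4.0.0   # prints "prefix": the bracket branch rejects the leading 4
which_branch 3.8.1   # prints "bracket": 3 is not in the set
which_branch 4.1.0   # prints "default"
```

This is also why the 2.7 branch in the other part of the script only appeared to work: [^2.7.]* matches any version that does not start with 2, 7 or a dot.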
To put it all together:
#!/bin/sh
if [ "$version" = "" ]; then
  version="latest";
  info "Version parameter not defined, assuming latest"
else
  info "Version parameter defined: $version";
  info "Matching version to package version"
  case "$version" in
    4.0.*)
      package_version='1.0.1'
      ;;
    4.1.*)
      package_version='1.1.1'
      ;;
    4.2.*)
      package_version='1.2.6'
      ;;
    *)
      critical "Unable to match requested version to package version"
      exit 1
      ;;
  esac
fi
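For completeness, the expr approach mentioned above might look like the following sketch. The function name version_matches is hypothetical. expr takes a basic regular expression (BRE) and anchors it at the start of the string by itself, so no ^ is needed.

```shell
#!/bin/sh
# version_matches VERSION BRE: succeed if VERSION starts with a match for BRE.
version_matches() {
  # The leading "x" on both sides keeps expr from mistaking an unusual
  # version string (for example "-1" or "match") for one of its operators.
  expr "x$1" : "x$2" >/dev/null
}

if version_matches "4.0.0" '4\.0\.'; then
  echo "4.0 series"
fi
```

Unlike the glob in the case statement, the dots have to be escaped here, because an unescaped . matches any character in a regular expression.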

Ignoring headers in log files being indexed in Splunk

Being relatively new to Splunk (ver 6) and even newer to regex, I have log files that I am trying to index that have a header that I need to ignore. There are 6 header lines. The first 4 all begin with * and the last two are blank lines. I'm assuming they are just carriage returns. I'm looking for help with the regular expression that will ignore these lines in the transforms.conf file when adding the data. Below is an example from the log file I want to add:
*******************************
*** This is a Header ***
*** 07:32:06 Tue Jan 07 ***
*******************************
Jan-07 07:32:06 SERVERNAME:somedatainfo
Jan-07 07:32:06 SERVERNAME:moredatainfo
On the forwarder or indexer you can set the sourcetype for the file that is being monitored.
# in inputs.conf
[monitor:///path/to/your/file]
sourcetype=new_sourcetype
On the indexer, you can get rid of the first lines like this:
# in props.conf
[new_sourcetype]
TRANSFORMS-test=ignore_header
# in transforms.conf
[ignore_header]
REGEX=^\*\*\*+
DEST_KEY=queue
FORMAT=nullQueue
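Before deploying, the pattern can be sanity-checked against a sample file outside Splunk. The sketch below assumes that, for an expression this simple, grep -E behaves the same way as Splunk's regex engine. The function name drop_header_lines is made up.

```shell
#!/bin/sh
# Print the input without the lines that the REGEX above would match,
# i.e. lines starting with three or more asterisks.
drop_header_lines() {
  grep -Ev '^\*\*\*+'
}
```

Running `drop_header_lines < sample.log` shows what would be left. The pattern does not match the two blank header lines, so those still appear in the grep output.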