Extracting string to variable using regex bash - regex

I have a string which is like:
Return-Path: bT.41aywtru20=krja5b54hplm=k29fsc7grl#fake.link.com
Received-SPF: pass (fake.link.com: Sender is authorized to use 'bt.41aywtru20=krja5b54hplm=k29fsc7grl#fake.link.com' in 'mfrom' identity (mechanism 'include:spf.smtp2go.com' matched)) receiver=pmxlab01.permission.email; identity=mailfrom; envelope-from="bt.41aywtru20=krja5b54hplm=k29fsc7grl#fake.link.com"; helo=e2i353.smtp2go.com; client-ip=103.2.141.97
Received: from e2i353.smtp2go.com (e2i353.smtp2go.com [103.2.141.97])
by mailserver.fake.com(Proxmox) with ESMTP id A4F983E1048
for <fake#fake.com>; Tue, 24 Aug 2021 14:47:20 +0100 (BST)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
d=smtpcorp.com; s=a1-4; h=Feedback-ID:X-Smtpcorp-Track:Message-Id:Subject:
Date:To:From:Reply-To:Sender:List-Unsubscribe;
bh=cTg4MkkE2uaIjpApjJYQFK3RgYiMF3bwCj8UZjFO4NE=; b=STU7lctit7L5LJ2tA3Re1fe4II
lXJbY/SBXTGqCHh9p4K86aLK5Bvz98Q7eR9xwjFib6x4NoZZ5L1fke0XQERd1eQvxkl9R+kRIGU8A
QOtrLPpt8coN8P+syoaTRR4pDJQG9OfJO1fON9OaOP8HwnEg/91ie6Cm+wQRxjwyat859uAcu89Xv
6/mrcequkSp6kfiQN4goZ7vMYJYfBYuooslbTciaK4SYIfxdINyrrWGA6QhJPobdW0uuedRNY5jBG
OdMbVmm7FTpxDJs51rB1PTIcFQ8W1oypcttqSgCjI+5eMVrabU/IoIxhX5F0Cn3zm7E9CHlaJuLt1
CRXVbwdw==;
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=fake.com; i=#fakelink.com; q=dns/txt; s=s575655;
t=1629812840; h=from : subject : to : message-id : date;
bh=cTg4MkkE2uaIjpApjJYQFK3RgYiMF3bwCj8UZjFO4NE=;
b=TEeEsPNLf7Wi6b8aaxE6JvfymfBKYjLq7izcUVrOXTW7sGIznxOA5udhfmDh15Fgp6Qgh
Kv5HX9uPNa8TEeoaJ+gV/4KERuscnc4GXEHwo0eclktx6f6JI5h1/q+qCe34+cN/EweaP5n
iOs+nrzsRuWn/iQ0Yck+b4IXVWHoTW8298xmBNuC1JF4jIVXREJFAC0nACfGU03OlpjDXf/
qvI6Ffnn5YGTNxgIkOdrtymaqOvjG9NM0PWtgSkvsTCJdUvxkrI+rRUG6ixiNi+vifqwvox
aQ6BRnMmeNK7A954Dy9r9r09QzbTthsBsi+lORKH7DntBKhm7Rb5/Q9j0xVA==
Received: from [10.176.58.103] (helo=SmtpCorp) by smtpcorp.com with esmtpsa
(TLS1.2:ECDHE_SECP256R1__RSA_SHA256__AES_256_GCM:256)
(Exim 4.94.2-S2G) (envelope-from <tomtest#fakelink.com>)
id 1mIWls-TRjyEC-AK for fake#fake.com; Tue, 24 Aug 2021 13:47:20 +0000
Received: from [10.86.20.232] (helo=DESKTOP-69OG2R3)
by smtpcorp.com with esmtpsa (TLS1.2:ECDHE_RSA_SECP256R1__AES_256_GCM:256)
(Exim 4.94.2-S2G) (envelope-from <fakename#fakelink.com>)
id 1mIWlr-9EFPsz-U0 for fake#fake.com; Tue, 24 Aug 2021 13:47:19 +0000
MIME-Version: 1.0
From: fake#fake2.com
To: fake#fake.com
Date: 24 Aug 2021 14:46:30 +0100
Subject: Test Email 2xM9e5Dj
Content-Type: multipart/alternative;
boundary=--boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e
Message-Id: <E1mIWlr-9EFPsz-U0#message-id.smtpcorp.com>
X-Smtpcorp-Track: 1XmW_r9EFeszl0.JChXLDDjoy7xH
Feedback-ID: 575655m:575655aVI_MaS:575655sNpPp5WOdD
X-Report-Abuse: Please forward a copy of this message, including all headers,
to <abuse-report#smtp2go.com>
----boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
This is a text message
----boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
This is a html message
----boundary_11_ddba370a-13e2-4ffc-8b36-0eb7a5cde80e--
This is stored in a variable called $emailText
I'm trying to use a regex to take the From part out of the text
From: fake#fake2.com
My regex isnt super strong, however my testing looks like this works: (?<=From: ).*.
But when I try and take the text out, it appears I can't get the regex to go through properly.
echo [[ $emailText =~ (?<=From: ).*. ]]

bash regex doesn't support lookbehind or lookahead assertions.
It is much easier to use a non-regex approach using awk here:
awk -F ': ' '$1 == "From" {print $2}' <<< "$emailText"
fake#fake2.com

With bash:
[[ "$emailText" =~ From:\ ([^$'\n']*) ]] && echo "${BASH_REMATCH[1]}"
Output:
fake#fake2.com

With your shown samples, attempts; please try following awk code. Simple explanation would be, checking condition if 1st field is From: then print 2nd field of that line.
awk '$1=="From:"{print $2}' Input_file
2nd solution: In case you have only 1 entry of From: in whole file then try following, where we can use exit function to exit from Input_file after printing the matched line, to stop un-necessary reading of whole Input_file.
awk '$1=="From:"{print $2;exit}' Input_file

Assuming you only want the email terminus, here's a quick and dirty Awk script.
awk '/^$/ { exit 1 }
/^From: .* <[^<>#]+#[^<>]+>/ {
split($0, g, /[<>]/); print g[1]; exit }
/^From: / { print $2; exit }' file.eml
This should work correctly for all these cases:
From: Real Name <real.name#example.com>
From: "Name, Real" <outlook#torture.example.com>
From: terminus#example.com
From: terminus#example.com (Real Name)
From: =?q?utf-8?Real_N=A3=E4me?= <real.name#example.com>
As especially the last example should convince you, you will need significantly more work if you also need the full name of the correspondent in normalized form.

If there should be a mail address present, you can match it first using awk (without the unsupported need for lookarounds)
awk 'match($0, /^From: [^[:space:]#]+#[^[:space:]#]+$/) {
print $2
}' <<< "$emailText"
Output
fake#fake2.com

Related

Why isn't booliean expression working

I am using the following regular expression to filter some junk emails
\bNeotube\b | \bNeotubeTV\b
Here is a sample of the junk email header:
Return-path: <uranus#pschic.info>
Envelope-to: coben#jesusmylord.org
Delivery-date: Thu, 25 May 2017 14:18:58 +0200
Received: from [45.59.120.18] (port=30375 helo=pschic.info)
by ok1057.kvchosting.com with esmtp (Exim 4.89)
(envelope-from <uranus#pschic.info>)
id 1dDrj8-0002X1-Se
for coben#jesusmylord.org; Thu, 25 May 2017 14:18:58 +0200
From: "NeotubeTV" <uranus#pschic.info>
Date: Thu, 25 May 2017 06:58:18 -0500
MIME-Version: 1.0
Subject: Free TV shows, Sports and New Movies on your TV In HD?
To: <coben#jesusmylord.org>
Message-ID: `
The above expression does not work. However, if I just use
\bNeotubeTV\b
the email is filtered. Why doesn't the above OR statement work?
Thanks for your help.
Chris
It's because you have spaces around the | which is saying that you want include those spaces in your regex.

how to pre-fix a piece of text in github "git log" using shell-script

I need to make a github commit (the text), from the git command git log into a link in an email. So the recipient can click on the link and go directly to the change.
I receive a long list containing lines with the text:
commit some_long_string_of_hexadecimals
and I need to transform this into:
commit https://github.com/account/repo/commit/some_long_string_of_hexadecimals
The log I am receiving contain n-amount of these logs, so I need the script to do this for all instances of this (some_long_string_of_hexadecimals).
Here are a few example log statements:
commit a98a897a67896a987698a769786a987a6987697a6
Author: Some Person <some#email.com>
Date: Thu Sep 29 09:48:52 2016 +0200
long message describing change.
commit a98a897a67896a987698a769786a987a6987697a6
Author: Some Person <some#email.com>
Date: Thu Sep 29 09:48:52 2016 +0200
more description
I'd like it to look like this:
commit https://github.com/account/repo/commit/a98a897a67896a987698a769786a987a6987697a6
Author: Some Person <some#email.com>
Date: Thu Sep 29 09:48:52 2016 +0200
added handling of running tests from within a docker container
How do I achieve this using a shell command ?
Thanks in advance.
awk '$1 == "commit" {$2 = "https://github.com/account/repo/commit/" $2} 1'
check if field 1 equals "commit"
if so, prepend to field 2
if line matched, print modified line, else print line as is

Using procmail to forward inline

I would like to use procmail to forward a message to another email address. Both the headers and body of the incoming message should be in the body of the outgoing message (inline forwarding).
Example incoming message:
From: outside#example.com
To: me#example.com
Subject: Test
Date: Mon, 03 Nov 2014 05:00:04 GMT
This is a test
The forwarded message should be like this:
From: me#example.com
To: thirdparty#example.com
Subject: Fwd: Test
Date: Mon, 03 Nov 2014 05:01:00 GMT
From: outside#example.com
To: me#example.com
Subject: Test
Date: Mon, 03 Nov 2014 05:00:04 GMT
This is a test
Can this be done using procmail, maybe in conjunction with something like formail?
Easy enough.
:0
* Some conditions, perhaps? Omit this line to forward unconditionally
* ^Subject:[ ]*\/.*
| (echo From: me#example.com; echo To: thirdparty#example.com; \
echo "Subject: Fwd: $MATCH"; echo; cat -) | $SENDMAIL -t
If you don't care about forwarding the original Subject header verbatim, this can be simplified additionally.
The -t flag to sendmail says to use whatever To: and Cc: headers are in the message to determine the recipient. I omitted generating a Date: because (most imitations of) Sendmail will do that for you.
The stuff in the square brackets should be one space and one tab, as usual.
If you want to keep a copy, either add Bcc: yourself (and take care to not have the incoming copy trigger a mail loop!) or change :0 to :0c which makes Procmail continue down the rest if the recipe file.

shell script: get server info from curl

i need some help getting the server info from a curl call done in a shell script.
In my script, I iterate over several URLs in a list and do a cURL on each URL.
But I struggle getting the server information as it is not in a static position of the result of cURL.
>curl -I -s $temp
where, $temp is some arbitrary URL, e.g. example.org
HTTP/1.1 200 OK
Accept-Ranges: bytes
Cache-Control: max-age=604800
Content-Type: text/html
Date: Mon, 03 Feb 2014 14:35:39 GMT
Etag: "359670651"
Expires: Mon, 10 Feb 2014 14:35:39 GMT
Last-Modified: Fri, 09 Aug 2013 23:54:35 GMT
Server: ECS (iad/19AB)
X-Cache: HIT
x-ec-custom-error: 1
Content-Length: 1270
Above the result for example.org.
Question: how do I extract the part where it says server?
The result should be such that
>echo $server
will yield (so basically the entire rest of the line after the "Server: ")
ECS (iad/19AB)
Thanks a bunch!
Using awk:
server=$(curl -I -s http://example.org | awk -F': ' '$1=="Server"{print $2}')
echo "$server"
ECS (cpm/F858)
OR else you can use grep -oP:
server=$(curl -I -s http://example.org | grep -oP 'Server: \K.+')
echo "$server"
ECS (cpm/F858)

Bash Script: sed/awk/regex to match an IP address and replace

I have a string in a bash script that contains a line of a log entry such as this:
Oct 24 12:37:45 10.224.0.2/10.224.0.2 14671: Oct 24 2012 12:37:44.583 BST: %SEC_LOGIN-4-LOGIN_FAILED: Login failed [user: root] [Source: 10.224.0.58] [localport: 22] [Reason: Login Authentication Failed] at 12:37:44 BST Wed Oct 24 2012
To clarify; the first IP listed there "10.224.0.2" was the machine the submitted this log entry, of a failed login attempt. Someone tried to log in, and failed, from the machine at the 2nd IP address in the log entry, "10.224.0.58".
I wish to replace the first occurrence of the IP address "10.224.0.2" with the host name of that machine, as you can see presently is is "IPADDRESS/IPADDRESS" which is useless having the same info twice. So here, I would like to grep (or similar) out the first IP and then pass it to something like the host command to get the reverse host and replace it in the log output.
I would like to repeat this for the 2nd IP "10.224.0.58". I would like to find this IP and also replace it with the host name.
It's not just those two specific IP address though, any IP address. So I want to search for 4 integers between 1 and 3, separated by 3 full stops '.'
Is regex the way forward here, or is that over complicating the issue?
Many thanks.
Replace a fixed IP address with a host name:
$ cat log | sed -r 's/10\.224\.0\.2/example.com/g'
Replace all IP addresses with a host name:
$ cat log | sed -r 's/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/example.com/g'
If you want to call an external program, it's easy to do that using Perl (just replace host with your lookup tool):
$ cat log | perl -pe 's/(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/`host \1`/ge'
Hopefully this is enough to get you started.
There's variou ways to find th IP addresses, here's one. Just replace "printf '<<<%s>>>' " with "host" or whatever your command name is in this GNU awk script:
$ cat tst.awk
{
subIp = gensub(/\/.*$/,"","",$4)
srcIp = gensub(/.*\[Source: ([^]]+)\].*/,"\\1","")
"printf '<<<%s>>>' " subIp | getline subName
"printf '<<<%s>>>' " srcIp | getline srcName
gsub(subIp,subName)
gsub(srcIp,srcName)
print
}
$
$ gawk -f tst.awk file
Oct 24 12:37:45 <<<10.224.0.2>>>/<<<10.224.0.2>>> 14671: Oct 24 2012 12:37:44.583 BST: %SEC_LOGIN-4-LOGIN_FAILED: Login failed [user: root] [Source: <<<10.224.0.58>>>] [localport: 22] [Reason: Login Authentication Failed] at 12:37:44 BST Wed Oct 24 2012
googled this one line command together. but was unable to pass the founded ip address to the ssh command:
sed -n 's/\([0-9]\{1,3\}\.\)\{3\}[0-9]\{1,3\}/\nip&\n/gp' test | grep ip | sed 's/ip//' | sort | uniq
the "test" is the file the sed command is searching for for the pattern