How to parse the following date using grep command in bash

The date appears in a JSON file as "ts":"2021-04-23T13:11:57Z" or "2021-05-05T07:22:54+05:00", and I want to read the string using grep.
I need help forming the regex for the last part, i.e. the time zone.
My current command is:
grep -Po '"ts":"\K([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-2][0-9]:[0-5][0-9]:[0-5][0-9]+Z
This works fine for the first format. How do I modify it so that it works for both formats?

With your shown samples, and using GNU grep's PCRE option (-P), you could try the following regex to match both timestamp formats.
grep -oP '(?:"ts":)?"\d{4}-\d{2}-\d{2}T(?:[01][0-9]|2[0-3]):(?:[0-4][0-9]|5[0-9]):(?:[0-4][0-9]|5[0-9])(?:Z"|\+(?:[0-4][0-9]|5[0-9]):(?:[0-4][0-9]|5[0-9])")' Input_file
Explanation: a detailed breakdown of the regex above.
(?:"ts":)? ##Non-capturing group that optionally matches "ts":.
"\d{4}-\d{2}-\d{2}T ##Matching " followed by 4 digits-2 digits-2 digits and a T.
(?: ##Starting 1st non-capturing group here.
[01][0-9]|2[0-3] ##Matching 00 to 19 or 20 to 23 to cover the 24 hours.
): ##Closing 1st non-capturing group, followed by a colon.
(?: ##Starting 2nd non-capturing group here.
[0-4][0-9]|5[0-9] ##Matching 00 to 59 for minutes.
) ##Closing 2nd non-capturing group here.
: ##Matching a colon.
(?: ##Starting 3rd non-capturing group here.
[0-4][0-9]|5[0-9] ##Matching 00 to 59 for seconds.
) ##Closing 3rd non-capturing group here.
(?: ##Starting 4th non-capturing group here.
Z"|\+ ##Matching Z" OR a literal + here.
(?: ##Starting non-capturing group here.
[0-4][0-9]|5[0-9] ##Matching 00 to 59 here.
) ##Closing non-capturing group here.
: ##Matching a colon.
(?: ##Starting non-capturing group here.
[0-4][0-9]|5[0-9] ##Matching 00 to 59 here.
)" ##Closing non-capturing group here, followed by "
) ##Closing 4th non-capturing group here.
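A range subpattern such as the hour alternation is easy to get subtly wrong, so it can help to test it in isolation. The following is a minimal sketch, assuming the intended hour range is 00 to 23; the names hour_re and matched are illustrative only and not part of the answer above.

```shell
#!/usr/bin/env bash
# Assumed hour pattern: 00-19 or 20-23 (two digits, anchored on both sides).
hour_re='^([01][0-9]|2[0-3])$'

matched=0
for h in {0..29}; do
    hh=$(printf '%02d' "$h")        # zero-pad: 0 -> 00, 7 -> 07
    if [[ $hh =~ $hour_re ]]; then
        matched=$((matched + 1))
    fi
done

# Count of two-digit values from 00 to 29 accepted as an hour.
echo "$matched"
```

If the count differs from 24, the subpattern either rejects a valid hour or accepts an invalid one.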

You can use the following to parse either time string from the line. You will first need to isolate the line beginning with "ts". For example, the following grep expression will do:
grep -Po '[0-9+TZ:-]{2,}'
This simply extracts each run of at least two characters ({2,}) drawn from the set [0-9+TZ:-].
Example Use
$ echo '"ts":"2021-04-23T13:11:57Z"' | grep -Po '[0-9+TZ:-]{2,}'
2021-04-23T13:11:57Z
and
$ echo '"ts":"2021-05-05T07:22:54+05:00"' | grep -Po '[0-9+TZ:-]{2,}'
2021-05-05T07:22:54+05:00
The normal caveats apply: you are better served by a JSON-aware utility like jq. That said, you can separate the values with grep, but you must take care in isolating the line.
You can use sed to isolate the line using the normal /match/s/find/replace/ form with a capture group and backreference. For example, you can use:
sed -En '/^[[:blank:]]*"ts"/s/^.*"([0-9+TZ:-]+)"$/\1/p'
This selects only lines beginning with ^[[:blank:]]*"ts" before extraction, and -n suppresses the normal printing of the pattern space so that only the wanted text is output, e.g.
Example Use
$ echo '"ts":"2021-04-23T13:11:57Z"' | sed -En '/^[[:blank:]]*"ts"/s/^.*"([0-9+TZ:-]+)"$/\1/p'
2021-04-23T13:11:57Z
and
$ echo '"ts":"2021-05-05T07:22:54+05:00"' | sed -En '/^[[:blank:]]*"ts"/s/^.*"([0-9+TZ:-]+)"$/\1/p'
2021-05-05T07:22:54+05:00
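Where grep -P is not available, the same extraction can be sketched with Bash's own [[ =~ ]] operator, which uses POSIX ERE. This is only an illustration under the assumption that the value always follows "ts": directly; the function name extract_ts is hypothetical.

```shell
#!/usr/bin/env bash
# extract_ts LINE
# Prints the timestamp found after "ts": in LINE, or returns 1 if none matches.
extract_ts() {
    # Date, T, time, then either Z or a +hh:mm / -hh:mm offset.
    local re='"ts":"([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(Z|[+-][0-9]{2}:[0-9]{2}))"'
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

extract_ts '"ts":"2021-04-23T13:11:57Z"'
extract_ts '"ts":"2021-05-05T07:22:54+05:00"'
```

Unlike the character-class approach, this rejects strings that merely consist of the right characters, such as "ts":"2021-05-05".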

For such a specific string, another option with a bit broader match could be
grep -Po '(?:"ts":)?"\K\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d(?:Z|[+-]\d\d:\d\d)(?=")' file
Explanation
(?:"ts":)? Optionally match "ts":
"\K Match " and clear the match buffer (forget what is matched so far)
\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d Match a date time like pattern with a T char in between
(?: Non capture group
Z Match a Z char
| Or
[+-]\d\d:\d\d Match + or - and 2 digits : 2 digits
) Close non capture group
(?=") Positive lookahead, assert " directly to the right
Output
2021-04-23T13:11:57Z
2021-05-05T07:22:54+05:00
Or using -E for extended regular expressions (which will include the outer double quotes)
grep -Eo '("ts":)?"[0-9]{4}-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9](Z|[+-][0-9][0-9]:[0-9][0-9])"' ./file
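To see the -P variant run end to end, here is a small sketch that writes a few sample lines to a temporary file and collects the matches. It assumes GNU grep built with PCRE support; the sample records and the variable names are made up for illustration, and the "ts": prefix is made mandatory here so that other quoted date strings are not picked up.

```shell
# Build a throwaway input file with two valid timestamps and one invalid value.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
{"ts":"2021-04-23T13:11:57Z","msg":"first"}
{"ts":"2021-05-05T07:22:54+05:00","msg":"second"}
{"ts":"not-a-date","msg":"third"}
EOF

# \K drops the "ts":" prefix from the output; (?=") requires the closing quote.
timestamps=$(grep -Po '"ts":"\K\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d(?:Z|[+-]\d\d:\d\d)(?=")' "$tmpfile")
rm -f "$tmpfile"

printf '%s\n' "$timestamps"
```

Only the first two lines produce output; the third is skipped because its value does not have the date-time shape.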

Related

Regex to match specific domain and its subfolder

I want to match a particular domain and its subdomain, no matter how it's entered. In the following example, I want to match test.com if nothing comes after it (only a slash or a query string), OR if a specific folder follows it, in this case named subfolder. Again, the subfolder could have a / or a query string after it.
Domain                                     Match
test.com                                   match
https://test.com                           match
https://test.com?foo=bar                   match
https://test.com/                          match
https://test.com/?foo=bar                  match
https://www.test.com                       match
https://www.test.com/subfolder             match
https://www.test.com/subfolder/            match
https://www.test.com/subfolder/?foo=bar    match
test.com/subfolder                         match
https://www.test.com/foo                   no match
test.com/foo                               no match
https://www.test.com/jason                 no match
https://www.test.com/jason?foo=bar         no match
Right now I have the following regex:
^(?:\S+://)?[^/]+/?$
The problem though is that it matches ANY domains, which is not what I need. I want to match a specific domain and a specific subfolder.
How is this possible?

You may use this regex:
^(?:https?://)?(?:www\.)?test\.com(?:/subfolder)?/?(?:\?\S*)?$
RegEx Demo
RegEx Details:
^: Start
(?:https?://)?: optionally match http:// or https://
(?:www\.)?: optionally match www.
test\.com: match test.com
(?:/subfolder)?: optionally match /subfolder
/?: optionally match a trailing /
(?:\?\S*)?: optionally match query string starting with ?
$: End
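The breakdown above can be tried against the sample URLs from the question with a short Bash sketch. Bash's [[ =~ ]] uses POSIX ERE, which has no (?:...) groups or \S, so this version assumes plain capturing groups and [^[:space:]] as equivalents; the names url_re and matches_test_com are hypothetical.

```shell
#!/usr/bin/env bash
# ERE rewrite of the pattern: optional scheme, optional www., test.com,
# optional /subfolder, optional trailing slash, optional query string.
url_re='^(https?://)?(www\.)?test\.com(/subfolder)?/?(\?[^[:space:]]*)?$'

matches_test_com() {
    [[ $1 =~ $url_re ]]
}

for url in 'test.com' 'https://test.com/?foo=bar' \
           'https://www.test.com/subfolder/' \
           'https://www.test.com/foo' 'test.com/foo'; do
    if matches_test_com "$url"; then
        printf '%-35s match\n' "$url"
    else
        printf '%-35s no match\n' "$url"
    fi
done
```

The first three URLs report a match and the last two do not, in line with the table in the question.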

With your shown samples, you could try the following.
^(?:(?:https?:\/\/)(?:www\.)?)?test\.com(?:(?:(?:\/)?(?:\/subfolder\/?)?(?:\/\?\S+\/?)?)?(?:\?\S+)?)?(?:\/)?$
Online demo for above regex
Explanation: Adding detailed explanation for above.
^ ##Starting of match here by caret sign.
(?: ##Starting non-capturing group here.
(?:https?:\/\/) ##In this non-capturing group which has http/https// in it to match.
(?:www\.)? ##In this non-capturing group keeping www. as an optional here.
)? ##Closing very first non-capturing group here.
test\.com ##Matching string test.com here. (1, calling it 1 for explanation purposes)
(?: ##Starting a non-capturing group here.
(?: ##Starting one more non-capturing group here.(2, calling it for explanation purposes only)
(?:\/)? ##Matching / optional in a non-capturing group here.
(?:\/subfolder\/?)? ##Matching /subfolder /(as optional) and whole non-capturing group as optional.
(?:\/\?\S+\/?)? ##Matching /? and all non-space characters followed by /(optional) in non-capturing group, keep this optional.
)? ##Closing (2) non-capturing group here.
(?: ##Starting non-capturing group here.
\?\S+ ##Matching ? non-spaces values here.
)? ##Closing non-capturing group here.
)? ##Closing (1) non-capturing group here.
(?: ##Starting non-capturing group here.
\/ ##Matching single / here.
)? ##Closing non-capturing group here, keeping it optional.
$ ##Mentioning $ to tell the end of value(match).

Bash regex for same sender and receiver with backreference

I am trying to write a regex (it must be a regex because I need it for fail2ban) that matches when
the receiver and the sender are the same person:
echo "from=<test#test.ch> to=<test#test.ch>" | grep -E -o '([^=]*\s)[ ]*\1'
What am I doing wrong?

You might use a pattern to match the format of the string between the brackets with a backreference to that capture.
from(=<[^\s#<>]+#[^\s#<>]+>)\s*to\1
Explanation
from Match literally
( Capture group 1
=< Match literally
[^\s#<>]+ Match 1+ times any char except a whitespace char or # < >
# Match literally
[^\s#<>]+ Again match 1+ times any char except a whitespace char or # < >
> Match literally
) Close group 1
\s*to\1 Match 0+ whitespace chars, to and the backreference to group 1
Regex demo | Bash demo
Use grep -P instead of -E for Perl compatible regular expressions.
For example
echo "from=<test#test.ch> to=<test#test.ch>" | grep -oP 'from(=<[^\s#<>]+#[^\s#<>]+>)\s*to\1'
A bit broader match could be capturing what is between the brackets
[^=\s]+(=<[^<>]+>)\s*[^=\s]+\1
Regex demo
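For checking log lines from a shell script rather than from fail2ban, the same idea can be sketched without a backreference: capture both addresses and compare them as strings. This is not a drop-in for a fail2ban filter, which needs a single regex; the function name same_sender_receiver is hypothetical.

```shell
#!/usr/bin/env bash
# same_sender_receiver LINE
# Succeeds when the from=<...> and to=<...> values in LINE are identical.
same_sender_receiver() {
    local re='from=<([^<>]+)>[[:space:]]+to=<([^<>]+)>'
    [[ $1 =~ $re ]] && [[ ${BASH_REMATCH[1]} == "${BASH_REMATCH[2]}" ]]
}

if same_sender_receiver 'from=<test#test.ch> to=<test#test.ch>'; then
    echo "same address"
fi
if ! same_sender_receiver 'from=<alice#test.ch> to=<bob#test.ch>'; then
    echo "different addresses"
fi
```

Quoting the right-hand side of == makes Bash compare the two captures literally instead of treating the second one as a glob pattern.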

Bash regex matching "0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93)."

In a Bash script I'm writing, I need to capture the /path/to/my/file.c and 93 in this line:
0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93).
0xffffffc0006e0584 is in another_function(char *arg1, int arg2) (/path/to/my/other_file.c:94).
With the help of regex101.com, I've managed to create this Perl regex:
^(?:\S+\s){1,5}\((\S+):(\d+)\)
but I hear that Bash doesn't understand \d or ?:, so I came up with this:
^([:alpha:]+[:space:]){1,5}\(([:alpha:]+):([0-9]+)\)
But when I try it out:
line1="0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93)."
regex="^([:alpha:]+[:space:]){1,5}\(([:alpha:]+):([0-9]+)\)"
[[ $line1 =~ $regex ]]
echo ${BASH_REMATCH[0]}
I don't get any match. What am I doing wrong? How can I write a Bash-compatible regex to do this?

You are right, Bash uses POSIX ERE and does not support \d shorthand character class, nor does it support non-capturing groups. See more regex features unsupported in POSIX ERE/BRE in this post.
Use
.*\((.+):([0-9]+)\)
Or even (if you need to grab the first (...) substring in a string):
\(([^()]+):([0-9]+)\)
Details
.* - any 0+ chars, as many as possible (may be omitted, only necessary if there are other (...) substrings and you only need to grab the last one)
\( - a ( char
(.+) - Group 1 (${BASH_REMATCH[1]}): any 1+ chars as many as possible
: - a colon
([0-9]+) - Group 2 (${BASH_REMATCH[2]}): 1+ digits
\) - a ) char.
See the Bash demo (or this one):
test='0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93).'
reg='.*\((.+):([0-9]+)\)'
# reg='\(([^()]+):([0-9]+)\)' # This also works for the current scenario
if [[ $test =~ $reg ]]; then
    echo "${BASH_REMATCH[1]}"
    echo "${BASH_REMATCH[2]}"
fi
Output:
/path/to/my/file.c
93
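Both sample lines from the question, including the one whose function signature contains its own parentheses, can be handled by wrapping the second pattern in a small function. This is a sketch only; the function name parse_location and the "file line" output format are assumptions made for the example.

```shell
#!/usr/bin/env bash
# parse_location LINE
# Prints "FILE LINENO" taken from the first (FILE:LINENO) group in LINE.
parse_location() {
    # [^()]+ cannot cross a parenthesis, so "(char *arg1, int arg2)" is skipped
    # because it contains no ":digits" before its closing parenthesis.
    local re='\(([^()]+):([0-9]+)\)'
    [[ $1 =~ $re ]] || return 1
    printf '%s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
}

parse_location '0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93).'
parse_location '0xffffffc0006e0584 is in another_function(char *arg1, int arg2) (/path/to/my/other_file.c:94).'
```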

In the first pattern you use \S+ which matches one or more non-whitespace chars. That is a broad match and will also match, for example, / which is not taken into account in the second pattern.
The pattern starts with [:alpha:] but the first char is a 0. You could use [:alnum:] instead. Since the repetition should also match _ that could be added as well. Also note that POSIX classes such as [:alpha:] are only valid inside a bracket expression, so they have to be written as [[:alpha:]].
Note that when using a quantifier on a capturing group, the group captures the value of the last iteration. So when using {1,5}, that group is effectively only there for the repetition: its value would be some_function followed by the whitespace char.
You might use:
^([[:alnum:]_]+[[:space:]]){1,5}\(((/[[:alpha:]]+)+\.[[:alpha:]]):([[:digit:]]+)\)\.$
Regex demo | Bash demo
Your code could look like
line1="0xffffffc0006e0584 is in some_function (/path/to/my/file.c:93)."
regex="^([[:alnum:]_]+[[:space:]]){1,5}\(((/[[:alpha:]]+)+\.[[:alpha:]]):([[:digit:]]+)\)\.$"
[[ $line1 =~ $regex ]]
echo ${BASH_REMATCH[2]}
echo ${BASH_REMATCH[4]}
Result
/path/to/my/file.c
93
Or a bit shorter version using \S and the values are in group 2 and 3
^([[:alnum:]_]+[[:space:]]){1,5}\((\S+\.[[:alpha:]]):([[:digit:]]+)\)\.$
Explanation
^ Start of string
([[:alnum:]_]+[[:space:]]){1,5} Repeat 1-5 times what is captured in group 1
\( match (
(\S+\.[[:alpha:]]) Capture group 2 Match 1+ non whitespace chars, . and an alphabetic character
: Match :
([[:digit:]]+) Capture group 3 Match 1+ digits
\)\. Match ).
$ End of string
See this page about bracket expressions
Regex demo

Match and Replace ![foo](/bar/) with Regex in SED

I'm trying to write a RegEx for SED to make it match and replace the following MarkDown text:
![something](/uploads/somethingelse)
with:
![something](uploads/somethingelse)
Now, in PCRE the matching pattern would be:
([\!]|^)(\[.*\])(\(\/bar[\/])
as tested on Regex101:
but on SED it's invalid.
I've tried a lot of combinations before asking, but I'm going crazy since I'm not a RegEx expert.
Which is the right SED regex to match and split that string in order to make the replacement with sed as described here?

The sed command you need should be run with the -E option as your regex is POSIX ERE compliant. That is, the capturing parentheses should be unescaped, and literal parentheses must be escaped (as in PCRE).
You may use
sed -E 's;(!\[.*])(\(/uploads/);\1(uploads/;g'
Details
(!\[.*]) - Capturing group 1:
! - a ! char (if you use "...", you need to escape it)
\[.*] - a [, then any 0+ chars and then ]
(\(/uploads/) - Capturing group 2:
\( - a ( char
/uploads/ - an /uploads/ substring.
The BRE pattern (the actual "quick fix" of your current pattern) will look like
sed 's;\(!\|^\)\(\[.*](\)/\(uploads/\);\1\2\3;g'
Note that the \(...\) define capturing groups, ( matches a literal (, and \| defines an alternation operator (a GNU extension; it is not part of POSIX BRE).
Details
\(!\|^\) - Capturing group 1: ! or start of string
\(\[.*](\) - Capturing group 2: a [, then 0+ chars, and then (
/ - a / char
\(uploads/\) - Capturing group 3: uploads/ substring
See the online sed demo
The ; regex delimiter helps eliminate escaping \ chars before / and make the pattern more readable.
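One thing to watch with the .* in these patterns: it is greedy, so on a line holding several images only the last one is rewritten, even with the g flag. A possible variation, sketched here as an assumption rather than as part of the answer, replaces .* with [^]]* so that the match cannot run past the first closing bracket; the sample line is invented.

```shell
# A line with two Markdown images pointing at /uploads/.
line='![a](/uploads/x.png) and ![b](/uploads/y.png)'

# Original pattern: greedy .* swallows everything up to the last "](".
greedy=$(printf '%s\n' "$line" | sed -E 's;(!\[.*])(\(/uploads/);\1(uploads/;g')

# Variation: [^]]* stops at the first ], so every image is matched separately.
fixed=$(printf '%s\n' "$line" | sed -E 's;(!\[[^]]*])\(/uploads/;\1(uploads/;g')

printf '%s\n' "$greedy"
printf '%s\n' "$fixed"
```

For the single-image input from the question both commands give the same result.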

Pattern regex substitution in Notepad++ [closed]

How do I achieve this kind of regex substitution in Notepad++ & Linux / Unix Korn shell (Plain BSD Linux)?
z1.9z.01.01 Yabdadba do
da.8p.25.7p Foobar
tg.7j.75.2q Whatever
90.6q.88.zx Jane Doe
Note the char. I am not sure what you want to call it.
Substitution #1
o/p should be
Yabdadba do
Foobar
Whatever
Jane Doe
Substitution #2
o/p should be
9z Yabdadba do
8p Foobar
7j Whatever
6q Jane Doe
Substitution #3
o/p should be
z1.9z.01.01
da.8p.25.7p
tg.7j.75.2q
90.6q.88.zx
I tried using ^.* and $ with the regex option, but it won't do anything.

Using the assumption that the parts are fixed and of this form XX.XX.XX.XX
For Substitution # 1
Find (?m)^[^.\s]{2}(?:\.[^.\s]{2}){3}[^\S\r\n]+(?=\S.*)
Replace nothing
(?m) # Multi-line mode
^ # BOL
[^.\s]{2} # Four parts separated by dots
(?: \. [^.\s]{2} ){3}
[^\S\r\n]+ # Whitespace following
(?= \S .* ) # Must be some text here
For Substitution # 2
Find (?m)^[^.\s]{2}\.([^.\s]{2})(?:\.[^.\s]{2}){2}(?=[^\S\r\n]+\S.*)
Replace ' $1 '
(?m) # Multi-line mode
^ # BOL
[^.\s]{2} # Four parts separated by dots
\.
( [^.\s]{2} ) # (1)
(?: \. [^.\s]{2} ){2}
(?= # Whitespace following
[^\S\r\n]+
\S .* # Must be some text here
)
For Substitution # 3
Find (?m)^([^.\s]{2}(?:\.[^.\s]{2}){3})[^\S\r\n]+\S.*
Replace $1
(?m) # Multi-line mode
^ # BOL
( # (1 start), Four parts separated by dots
[^.\s]{2}
(?: \. [^.\s]{2} ){3}
) # (1 end)
[^\S\r\n]+ # Whitespace following
\S .* # Must be some text here

^([a-z0-9]+?[.]([a-z0-9]+?)[.][a-z0-9]+?[.][a-z0-9]+?)[ ]+(.+)$
Capture group 1 contains the dotted strings
Capture group 2 contains the second term of the dotted strings
Capture group 3 contains the names on the right side.
You can try it in an online regex tester.

Since you mentioned Unix shell:
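Substitution #1: cut -f2 yourfile or awk -F'\t' '{print $2}' yourfile
Substitution #2: awk -F"[\t.]" '{print $2, $5}' yourfile
Substitution #3: cut -f1 yourfile or awk '{print $1}' yourfile
cut selects fields from files (tab-separated by default), so your first and last questions amount to selecting the second and the first field. awk is more versatile but can be used for the same task; note that it splits on any whitespace by default, so -F'\t' is needed to keep a name such as "Jane Doe" in one field.
Your second question asks for printing the second and fifth fields (fields separated by either tab or ".").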
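Here is a short sketch of the three substitutions run on two of the sample rows. It assumes the two columns are separated by a single tab, which is what cut -f expects by default; the variable names are illustrative.

```shell
# Two sample rows; \t is the assumed column separator.
sample=$(printf 'z1.9z.01.01\tYabdadba do\n90.6q.88.zx\tJane Doe\n')

# Substitution #1: names only (second tab-separated field).
sub1=$(printf '%s\n' "$sample" | cut -f2)

# Substitution #2: second dotted part plus the name (fields split on tab or dot).
sub2=$(printf '%s\n' "$sample" | awk -F'[\t.]' '{print $2, $5}')

# Substitution #3: dotted strings only (first tab-separated field).
sub3=$(printf '%s\n' "$sample" | cut -f1)

printf '%s\n' "$sub1" "$sub2" "$sub3"
```

If the real file uses spaces instead of a tab between the columns, cut -f and the tab-based field separators will not split it as shown.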

For Notepad++:
Substitution # 1
find = ^.*?\s+(.*?)$
replace = \1
Substitution # 2
find = ^(\w{2})\.(\w{2})\.(\w{2})\.(\w{2})\s+(.*?)$
replace = \2 \5
Substitution # 3
find = ^([a-z0-9.]+).*?$
replace = \1
