How do I make a shell script regex match exactly?

I have a Bash script file that matches a regex.
My regex script file:
if [[ "$image" =~ [0-9]+(\.[0-9]+){3}\-[0-9]+$ ]]; then
I need only inputs of the form 0.0.0.0-0000 to pass.
These are my inputs and results.
pass : 0.0.0.0-0000
pass : 0.0.0.0.0.0-0000 << Unwanted match
no : 0.0.0.0-word
no : 0.0.0.0
As I marked above 0.0.0.0.0.0-0000 gets a match with my regex.
My question is how can I modify my regex to only match the pattern 0.0.0.0-0000?

Assuming you are trying to match an IP-address-like string, I came up with this regex:
^(\d+\.?){4}-\d+
Note the \d+ in the first capturing group (\d+\.?), which matches any run of digits before a literal dot. If the only pattern you expect is 0.0.0.0, you can remove the + to match a single digit.
Explanation:
^ - Anchors the match at the start of the string
(\d+\.?){4} - Matches a number followed by an optional . character, 4 times in a row, matching 0.0.0.0
-\d+ - Matches a - character followed by a sequence of digits, matching -0000
Caveats: \d is PCRE syntax and is not a digit class in the POSIX ERE that Bash's [[ =~ ]] uses, so write [0-9] there. Because the . is optional, this pattern also accepts strings such as 1234-5, and without a trailing $ it accepts trailing text such as 0.0.0.0-0000abc.

This issue is solved, thanks to the comment from The fourth bird.
I was missing the start anchor (^).
To pin down both the start and the end of the match, the pattern has to sit between ^ and $:
if [[ "$image" =~ ^[0-9]+(\.[0-9]+){3}\-[0-9]+$ ]]; (comment by The fourth bird, Jul 11 at 8:43)
Thanks to everyone who replied.
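To see the effect of the two anchors, here is a minimal sketch that runs the anchored pattern against the four inputs from the question. The function name is_version_tag is made up for illustration, and the backslash before the hyphen is dropped because a hyphen outside a bracket expression needs no escaping.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds only when the WHOLE string looks like N.N.N.N-N.
is_version_tag() {
    # Keeping the pattern in a variable avoids quoting surprises inside [[ ]].
    local re='^[0-9]+(\.[0-9]+){3}-[0-9]+$'
    [[ $1 =~ $re ]]
}

# The four inputs listed in the question.
for image in 0.0.0.0-0000 0.0.0.0.0.0-0000 0.0.0.0-word 0.0.0.0; do
    if is_version_tag "$image"; then
        echo "pass : $image"
    else
        echo "no : $image"
    fi
done
```

With ^ in place, the regex can no longer start matching in the middle of 0.0.0.0.0.0-0000, so only the first input passes.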

Related

Regex POSIX - How can i find if the start of a line contains a word from a word that appears later in line

I have a UNIX passwd file and I need to find, using egrep, whether the first 7 characters of the GECOS field appear inside the username. For example, I want to check whether the username (jkennedy) contains the word "kennedy" from the GECOS field.
I was planning to use back-references, but the username comes before the GECOS field, so I don't know how to implement it.
For example the passwd file contains this line:
jkennedy:x:2473:1067:kennedy john:/root:/bin/bash

As per my original comment, the regex below works for me.
It is the POSIX version of the one in my demo: it drops the non-capturing groups and the unneeded capture group around the backreference.
^[^:]*([^:]{7})([^:]*:){4}\1.*$
^ assert position at the start of the line
[^:]* match any character except : any number of times
([^:]{7}) capture exactly seven of any character except :
([^:]*:){4} match the following exactly four times
[^:]*: match any character except : any number of times, followed by : literally
\1 match the backreference; matches what was previously matched by the first capture group
.* match any character (except newline characters) any number of times
$ assert position at the end of the line
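As a quick check of the breakdown above, this sketch feeds two passwd-style lines to grep -E. It assumes a grep whose -E mode accepts backreferences (GNU grep does; POSIX does not define them for ERE). The second line and the function name gecos_in_username are made up for illustration.

```bash
#!/usr/bin/env bash
# Sample line from the question, plus a hypothetical line that should NOT match.
line='jkennedy:x:2473:1067:kennedy john:/root:/bin/bash'
other='jsmithson:x:2474:1067:kennedy john:/root:/bin/bash'

# Hypothetical wrapper: prints the lines whose username contains
# the first seven characters of the GECOS field.
gecos_in_username() {
    grep -E '^[^:]*([^:]{7})([^:]*:){4}\1.*$'
}

printf '%s\n%s\n' "$line" "$other" | gecos_in_username
```

Only the jkennedy line is printed, because the seven captured characters have to reappear at the very start of the fifth field.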

Assuming you do NOT want case sensitivity to foul your matching:
declare -l tmpUsr tmpName   # values assigned to these are lowercased
while IFS=: read -r usr x x x name x
do tmpUsr="$usr"; tmpName="$name"
   # skip empty GECOS fields; the quoted substring is matched literally
   (( ${#name} )) && [[ "$tmpUsr" =~ "${tmpName:0:7}" ]] &&
   printf '%s (%s<%s>)\n' "$usr" "$name" "${tmpName:0:7}"
done </etc/passwd

last year occurrence from string

I have strings like this:
ACB 01900 X1911D 1910 1955-2011 3424 2135 1934 foobar
I'm trying to get the last occurrence of a single year (from 1900 to 2050), so I need to extract only 1934 from that string.
I'm trying with:
grep -P -o '\s(19|20)[0-9]{2}\s(?!\s(19|20)[0-9]{2}\s)'
or
grep -P -o '((19|20)[0-9]{2})(?!\s\1\s)'
But it matches: 1910 and 1934
Here's the Regex101 example:
https://regex101.com/r/UetMl0/3
https://regex101.com/r/UetMl0/4
Plus: how can I extract the year without the surrounding spaces without doing an extra grep to filter them?

Have you ever heard this saying:
Some people, when confronted with a problem, think
“I know, I'll use regular expressions.” Now they have two problems.
Keep it simple - you're interested in finding a number between 2 numbers so just use a numeric comparison, not a regexp:
$ awk -v min=1900 -v max=2050 '{yr=""; for (i=1;i<=NF;i++) if ( ($i ~ /^[0-9]{4}$/) && ($i >= min) && ($i <= max) ) yr=$i; print yr}' file
1934
You didn't say what to do if no date within your range is present so the above outputs a blank line if that happens but is easily tweaked to do anything else.
To change the above script to find the first instead of the last date is trivial (move the print inside the if), to use different start or end dates in your range is trivial (change the min and/or max values), etc., etc. which is a strong indication that this is the right approach. Try changing any of those requirements with a regexp-based solution.
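The same numeric-comparison idea can be sketched in plain Bash, for cases where awk is not at hand. This is an illustrative port rather than the answer's own code, and the function name last_year is made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the last whitespace-separated field that is a
# four-digit number between min and max (blank line if there is none).
last_year() {
    local min=1900 max=2050 yr='' word
    local -a words
    read -ra words <<< "$1"            # split the line on whitespace
    for word in "${words[@]}"; do
        # 10# forces base 10 so a leading zero is not read as octal.
        if [[ $word =~ ^[0-9]{4}$ ]] && (( 10#$word >= min && 10#$word <= max )); then
            yr=$word                   # keep overwriting: the last hit wins
        fi
    done
    printf '%s\n' "$yr"
}

last_year 'ACB 01900 X1911D 1910 1955-2011 3424 2135 1934 foobar'
```

As with the awk version, changing the range only means changing min and max.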

I don't see a way to do this with grep because it doesn't let you output just one of the capture groups, only the whole match.
With perl I'd do something like
perl -lne 'if (/^.*\b(19\d\d|20(?:[0-4]\d|50))\b/) { print $1 }'
Idea: Use ^.* (greedy) to consume as much of the string up front as possible, thus finding the last possible match. Use \b (word boundary) around the matched number to prevent matching 01900 or X1911D. Only print the first capture group ($1).
I tried to implement your requirement of 1900-2050; if that's too complicated, ((?:19|20)\d\d) will do (but also match e.g. 2099).

The regex to do your task using grep -P (PCRE mode; add -o to print only the match) can be as follows:
\b(?:19\d{2}|20[0-4]\d|2050)\b(?!.*\b(?:19\d{2}|20[0-4]\d|2050)\b)
Details:
\b - Word boundary.
(?: - Start of a non-capturing group, needed as a container for alternatives.
19\d{2}| - The first alternative (1900 - 1999).
20[0-4]\d| - The second alternative (2000 - 2049).
2050 - The third alternative, just 2050.
) - End of the non-capturing group.
\b - Word boundary.
(?! - Negative lookahead for:
.* - A sequence of any chars, meaning actually "what follows can occur anywhere further".
\b(?:19\d{2}|20[0-4]\d|2050)\b - The same expression as before.
) - End of the negative lookahead.
The word boundary anchors ensure that you will not match numbers that are parts of longer words, e.g. X1911D.
The negative lookahead ensures that you will match just the last occurrence of the required year.
If your regex engine supports subroutine calls to a numbered group, written (?n) where n is the number of a capturing group (PCRE does), the regex can be a bit simpler:
(\b(?:19\d{2}|20[0-4]\d|2050)\b)(?!.*(?1))
Details:
(\b(?:19\d{2}|20[0-4]\d|2050)\b) - The regex like before, but enclosed within a capturing group (it will be "called" later).
(?!.*(?1)) - Negative lookahead for capturing group No 1, located anywhere further.
This way you avoid writing the same expression again.
For a working example in regex101 see https://regex101.com/r/fvVnZl/1
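For completeness, here is a sketch of how the lookahead pattern might be run from the shell. It assumes a grep built with PCRE support (the -P option); the variable and function names are made up for illustration.

```bash
#!/usr/bin/env bash
# The year alternation is kept in one variable so it is written only once.
year='(?:19\d{2}|20[0-4]\d|2050)'
# A year in range, not followed anywhere later on the line by another one.
pattern='\b'"$year"'\b(?!.*\b'"$year"'\b)'

# Hypothetical wrapper: reads lines on stdin, prints the last year of each.
last_year_grep() {
    grep -Po "$pattern"
}

s='ACB 01900 X1911D 1910 1955-2011 3424 2135 1934 foobar'
last_year_grep <<< "$s"
```

Note that \b treats the hyphen in 1955-2011 as a boundary, so both halves count as years for this pattern; on the sample line the last match is still 1934.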

You may use a PCRE regex without any groups to only return the last occurrence of a pattern you need if you prepend the pattern with ^.*\K, or, in your case, since you expect a whitespace boundary, ^(?:.*\s)?\K:
grep -Po '^(?:.*\s)?\K(?:19\d{2}|20(?:[0-4]\d|50))(?!\S)' file
Details
^ - start of line
(?:.*\s)? - an optional non-capturing group matching 1 or 0 occurrences of
.* - any 0+ chars other than line break chars, as many as possible
\s - a whitespace char
\K - match reset operator discarding the text matched so far
(?:19\d{2}|20(?:[0-4]\d|50)) - 19 and any two digits or 20 followed with either a digit from 0 to 4 and then any digit (00 to 49) or 50.
(?!\S) - a negative lookahead that requires a whitespace char or the end of the string immediately to the right.
See an online demo:
s="ACB 01900 X1911D 1910 1955-2011 3424 2135 1934 foobar"
grep -Po '^(?:.*\s)?\K(?:19\d{2}|20(?:[0-4]\d|50))(?!\S)' <<< "$s"
# => 1934

Looking for regex to match before and after a number

Given the string
170905-CBM-238.pdf
I'm trying to match 170905-CBM and .pdf so that I can replace/remove them and be left with 238.
I've searched and found pieces that work but can't put it all together.
This-> (.*-) will match the first section and
This-> (.[^/.]+$) will match the last section
But I can't figure out how to tie them together so that it matches everything before, including the second dash and everything after, including the period (or the extension) but does not match the numbers between.
help :) and thank you for your kind consideration.

There are several options to achieve what you need in Nintex.
If you use Extract operation, use (?<=^.*-)\d+(?=\.[^.]*$) as Pattern.
Details
(?<=^.*-) - a positive lookbehind requiring, immediately to the left of the current location, the start of string (^), then any 0+ chars other than LF as many as possible up to the last occurrence of - and the subsequent subpatterns
\d+ - 1 or more digits
(?=\.[^.]*$) - a positive lookahead requiring, immediately to the right of the current location, the presence of a . and 0+ chars other than . up to the end of the string.
If you use Replace text operation, use
Pattern: ^.*-([0-9]+)\.[^.]+$
Replacement text: $1
Details
^ - a start of string anchor
.* - any 0+ chars other than LF up to the last occurrence of the subsequent subpatterns...
- - a hyphen
([0-9]+) - Group 1: one or more ASCII digits
\. - a literal .
[^.]+ - 1 or more chars other than .
$ - end of string.
The replacement $1 references the value stored in Group 1.
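The capture-and-keep idea is not specific to Nintex. As a rough Bash sketch, the same pattern can be applied with [[ =~ ]], where BASH_REMATCH[1] plays the role of $1. The function name extract_number is made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the digits between the last hyphen and the extension.
extract_number() {
    local re='^.*-([0-9]+)\.[^.]+$'
    # BASH_REMATCH[1] holds what Group 1 captured; returns non-zero on no match.
    [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

extract_number '170905-CBM-238.pdf'   # prints 238
```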

I don't know Nintex regex, but here is a sed-type regex:
$ echo "170905-CBM-238.pdf" | sed -E 's/^.*-([0-9]*)\.[^.]*$/\1/'
238
Same works in Perl:
$ echo "170905-CBM-238.pdf" | perl -pe 's/^.*-([0-9]*)\.[^.]*$/$1/'
238

PowerShell -replace to get string between two different characters

I am currently using split to get what I need, but I am hoping there is a better way in PowerShell.
Here is the string:
server=ss8.server.com;database=CSSDatabase;uid=WS_CSSDatabase;pwd=abc123-1cda23-123-A7A0-CC54;Max Pool Size=5000
I want to get the server and database values without the server= or database= prefixes.
Here is the method I am currently using:
$databaseserver = (($details.value).split(';')[0]).split('=')[1]
$database = (($details.value).split(';')[1]).split('=')[1]
This outputs to:
ss8.server.com
CSSDatabase
I would like it to be as simple as possible.
Thank you in advance

Replacing approach
You may use the following regex replace:
$s = 'server=ss8.server.com;database=CSSDatabase;uid=WS_CSSDatabase;pwd=abc123-1cda23-123-A7A0-CC54;Max Pool Size=5000'
$dbserver = $s -replace '^server=([^;]+).*', '$1'
$db = $s -replace '^[^;]*;database=([^;]+).*', '$1'
The technique is to match and capture (with (...)) what we need and just match what we need to remove.
Pattern details:
^ - start of the line
server= - a literal substring
([^;]+) - Group 1 (what $1 refers to) matching 1+ chars other than ;
.* - any 0+ chars other than a newline, as many as possible
Pattern 2 is almost the same, the capturing group is shifted a bit to capture another detail, and some more literal values are added to match the right context.
Note: if the values you need to extract may appear anywhere in the string, replace ^ in the first one and ^[^;]*; pattern in the second one with .*?\b (any 0+ chars other than a newline, as few as possible followed with a word boundary).
Matching approach
With a -match, you may do it the following way:
$s -match '^server=(.+?);database=([^;]+)'
The $Matches[1] will contain the server details and $Matches[2] will hold the DB info:
Name Value
---- -----
2 CSSDatabase
1 ss8.server.com
0 server=ss8.server.com;database=CSSDatabase
Pattern details
^ - start of string
server= - literal substring
(.+?) - Group 1: any 1+ non-linebreak chars as few as possible
;database= - literal substring
([^;]+) - Group 2: 1+ chars other than ;
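For readers working in a Unix shell rather than PowerShell, the matching approach carries over to Bash almost unchanged. This is a sketch of an analogue, not PowerShell code: Bash has no lazy quantifier, so Group 1 uses [^;]+ instead of .+?, and BASH_REMATCH stands in for $Matches.

```bash
#!/usr/bin/env bash
s='server=ss8.server.com;database=CSSDatabase;uid=WS_CSSDatabase;pwd=abc123-1cda23-123-A7A0-CC54;Max Pool Size=5000'
re='^server=([^;]+);database=([^;]+)'

if [[ $s =~ $re ]]; then
    databaseserver=${BASH_REMATCH[1]}   # Group 1: the server value
    database=${BASH_REMATCH[2]}         # Group 2: the database value
    printf '%s\n%s\n' "$databaseserver" "$database"
fi
```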

Another solution with a RegEx and named capture groups, similar to Wiktor's Matching Approach.
$s = 'server=ss8.server.com;database=CSSDatabase;uid=WS_CSSDatabase;pwd=abc123-1cda23-123-A7A0-CC54;Max Pool Size=5000'
$RegEx = '^server=(?<databaseserver>[^;]+);database=(?<database>[^;]+)'
if ($s -match $RegEx){
$Matches.databaseserver
$Matches.database
}

Replace last occurrence of character in string [duplicate]

This question already has answers here:
How to replace last occurrence of characters in a string using javascript
(3 answers)
Closed 6 years ago.
I've got the following string :
01/01/2014 blbalbalbalba blabla/blabla
I would like to replace the last slash with a space, and keep the first 2 slashes in the date.
The closest thing I have come up with was this kind of thing :
PS E:\> [regex] $regex = '[a-z]'
PS E:\> $regex.Replace('abcde', 'X', 3)
XXXde
but I don't know how to start from the end of the line. Any help would be greatly appreciated.
Editing my question to clarify :
I just want to replace the last slash character with a space character, therefore :
01/01/2014 blbalbalbalba blabla/blabla
becomes
01/01/2014 blbalbalbalba blabla blabla
Knowing that the length of "blabla" varies from one line to the other and the slash character could be anywhere.
Thanks :)

Using the following string to match:
(.*)[/](.*)
and the following to replace:
$1 $2
Explanation:
(.*) matches anything, any number of times (including zero). By wrapping it in parentheses, we make it available to be used again during the replace sequence, using the placeholder $ followed by the element number (as an example, because this is the first element, $1 will be the placeholder). When we use the relevant placeholder in the replace string, it will put all of the characters matched by this section of the regex into the resulting string. In this situation, the matched text will be 01/01/2014 blbalbalbalba blabla
[/] is used to match the forward slash character.
(.*) again is used to match anything, any number of times, similar to the first instance. In this case, it will match blabla, making it available in the $2 placeholder.
Basically, the three elements work together to find a number of characters, followed by a forward slash, followed by another number of characters. Because the first "match everything" is greedy (that is, it will attempt to match as many characters as possible), it will include all of the other forward slashes as well, up until the last. The reason that it stops short of the last forward slash is that including it would make the regex fail, as the [/] would no longer have anything to match.
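The greedy behaviour described above can be tried out in Bash as well. This is a sketch of the same pattern using [[ =~ ]], not PowerShell code, and the function name replace_last_slash is made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace only the last "/" in the argument with a space.
replace_last_slash() {
    local re='^(.*)/(.*)$'
    if [[ $1 =~ $re ]]; then
        # Group 1 is greedy, so it swallows every slash except the last one.
        printf '%s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    else
        printf '%s\n' "$1"             # no slash at all: print unchanged
    fi
}

replace_last_slash '01/01/2014 blbalbalbalba blabla/blabla'
```

The two slashes in the date end up inside the first group and are printed back untouched.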

You can also use lookahead:
'01/01/2014 blbalbalbalba blabla/blabla' -replace '/(?=[^/]+$)',' '
01/01/2014 blbalbalbalba blabla blabla
'/(?=[^/]+$)' will match a '/' character that comes right before a series of 'not /' characters immediately before EOL, but this is probably less efficient than the direct matches.

'01/01/2014 blbalbalbalba blabla/blabla' -replace '^(\d{2}/\d{2})/(\d{4} .*)','$1 $2'
# outputs this:
# 01/01 2014 blbalbalbalba blabla/blabla

Here's how you can do it without regular expressions:
$string = "01/01/2014 blbalbalbalba blabla/blabla"
$last_index = $string.LastIndexOf('/')
$chars = $string.ToCharArray()
$chars[$last_index] = ' '
$new_string = $chars -join ''
Another way:
$string = "01/01/2014 blbalbalbalba blabla/blabla"
$last_index = $string.LastIndexOf('/')
$new_string = $string.Remove($last_index, 1).Insert($last_index, ' ')

$ is the anchor for end of line.
So
(.*?)([a-z])$
should match what you want, and the thing in () is what you want to replace.
Best regards
