Using Regex to capture data in jenkins pipeline - regex

I want to pull apart a docker image uri inside of my Jenkins pipeline. Example uri = https://test-registry.home.imagehub.com/engineering/images/rhel:latest. I would like to break this down into 4 parts.
https://(test-registry).home.imagehub.com/(engineering/images)/(rhel):(latest)
$1 = test-registry
$2 = engineering/images
$3 = rhel
$4 = latest
Caveat: $2 sometimes has only one directory level.
I believe the proper regex for this is:
/\w+:\/\/(.*)\.[^\/]+/(.*)/(\w+):(.*)/
This should grab everything after the // up to the first . (period) for $1.
The [^\/]+ should then move through the subdomains up to the / divider.
I want the second (.*) to be greedy up to, but not including, the last / divider.
The third part should grab everything after the last / divider and before the : (colon).
The fourth part should grab everything after the colon.
I did import the regex class:
import java.util.regex.Pattern
I have tried a lot of different solutions to make this work in my Jenkins declarative pipeline, such as defining the string inline:
sReg = "https://test-registry.home.imagehub.com/engineering/images/rhel:latest"
pipeline
  agent
  stages
    stage ('regex')
      steps
        script
          def sReg = "https://test-registry.home.imagehub.com/engineering/images/rhel:latest"
          def registry = sReg =~ /\w+:\/\/(.*)\./
here I expected:
registry = test-registry
So I echo the result
echo registry.group(1)
but this gives me:
java.lang.IllegalStateException: No match found
Which makes no sense. I have tried just getting the https with /(\w+):/ but same result.
I tried with a class also but same result, no matches.
I would like to get all four in one shot but I will pull out each part in a separate regex if need be. At this point I would like to figure out how I can at least get a match.
I added:
import java.util.regex.Matcher
Then down in the script section I added:
def registry = Sreg =~ /\/\/([^.]+)[^\/]+\/((?:[^\/]+\/)*[^\/]+)\/(.+):(.+)/
println(registry)
When I run the pipeline I get:
[Pipeline] echo
java.util.regex.Matcher[pattern=//([^.]+)[^/]+/((?:[^/]+/)*[^/]+)/(.+):(.+) region=0,78 lastmatch=]
So instead of extracting the information, it just prints the matcher and its pattern?
I did an echo on Sreg and it shows the uri so it should be getting the uri in my def.

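The =~ operator returns a java.util.regex.Matcher. Calling group(1) on it before find() or matches() has run throws IllegalStateException: No match found. Indexing the matcher runs the match for you, so you can access the groups like this (note that every / inside a slashy string must be escaped, and that ([^.]+) stops at the first period where the greedy (.*) would not):
def sReg = "https://test-registry.home.imagehub.com/engineering/images/rhel:latest"
def registry = sReg =~ /\w+:\/\/([^.]+)\.[^\/]+\/(.*)\/(\w+):(.*)/
println registry[0]    // full match followed by all groups
println registry[0][1] // group 1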
Sources and examples : https://mrhaki.blogspot.com/2009/09/groovy-goodness-matchers-for-regular.html
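For comparison outside Jenkins, here is a minimal Python sketch of the same four-part split. It is not Groovy: Python's re module stands in for java.util.regex, and the name split_image_uri is made up for illustration. It uses ([^.]+) for the first group so the capture stops at the first period.

```python
import re

# Group 1 uses [^.]+ so it stops at the first period instead of the last one.
IMAGE_URI = re.compile(r"\w+://([^.]+)\.[^/]+/(.*)/(\w+):(.*)")

def split_image_uri(uri):
    """Return (registry, path, image, tag), or None if the URI does not match."""
    m = IMAGE_URI.match(uri)
    return m.groups() if m else None

# Two directory levels, then the one-level caveat from the question.
print(split_image_uri("https://test-registry.home.imagehub.com/engineering/images/rhel:latest"))
print(split_image_uri("https://test-registry.home.imagehub.com/engineering/rhel:latest"))
```

The greedy second group backtracks to the last / before the image name, so it handles both one and two directory levels.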

So it turns out my whole issue was my agent statement.
Instead of: agent { node 'slave' }
I had: agent {node 'slave'}
Once I fixed this, the regex started working.
FYI: I also found that println(registry) still prints just the Matcher object, but println(registry[0][1]) now gives me the captured text.

Related

Based on a REGEX pattern set a value using Groovy in Jenkins pipeline

I am trying to set a value based on a regex validation in a Groovy Jenkins pipeline, but it is not working as expected.
If the version looks like 1.1.1, 1.5.9 or 9.9.9 (single digits), it has to go to the managed store; otherwise, for versions like 1.22.9 or 1.99.9, it goes to the tmp store. Below is what I tried:
String artefact_bucket_name
def artefact_version = "1.99.0"
if (artefact_version ==~ /[0-9]{1}\.[0-9]{1}\.[0-9]{1}/) {
artefact_bucket_name = "managed-artefact-store"
}
if (artefact_version ==~ /[0-9]{1}\.[0-9]{1,2}\.[0-9]{1}/) {
artefact_bucket_name = "tmp-artefact-store"
}
echo "Application version to deploy is ${artefact_version} from Artefact store ${artefact_bucket_name}"
It looks like there is a mistake in the second regex: it overrides the result of the first one. For example, when artefact_version = 1.1.1, it matches the first regex and the second one as well, so the result will always be tmp-artefact-store.
I would change the second regex to:
/[0-9]{1}\.[0-9]{2}\.[0-9]{1}/
Notice I changed {1,2} to {2}. This matches only strings of the form \d.\d\d.\d, so a version like 1.1.1 matches only the first regex and a version like 1.99.9 only the second.
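The non-overlapping version of the two checks can be sketched in Python as a quick sanity test. This is an analogue rather than pipeline code: re.fullmatch plays the role of Groovy's ==~ (the whole string must match), and bucket_for is a hypothetical helper name.

```python
import re

MANAGED = re.compile(r"[0-9]\.[0-9]\.[0-9]")     # e.g. 1.1.1, 9.9.9
TMP = re.compile(r"[0-9]\.[0-9]{2}\.[0-9]")      # e.g. 1.22.9, 1.99.9

def bucket_for(version):
    # fullmatch requires the entire string to match, like Groovy's ==~ operator
    if MANAGED.fullmatch(version):
        return "managed-artefact-store"
    if TMP.fullmatch(version):
        return "tmp-artefact-store"
    return None  # neither pattern applies

for v in ["1.1.1", "1.99.0", "10.1.1"]:
    print(v, "->", bucket_for(v))
```

Because the two patterns can no longer match the same string, the order of the checks stops mattering.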

-replace replaces RegEx group with its name

I have a replace statement like this:
$var1 = "http"
$var2 = "1.2.3.4"
$json = $json -replace '(url = ["''])(.*)(:\/\/)(.*)(["''])', "`$1$var1`$3$var2`$5"
It was supposed to leave below line as it is:
url = "http://1.2.3.4"
while it is being changed to
url = "http$31.2.3.4"
As far as I understand `$3 should be replaced with :// just like `$5 has been replaced with ".
Is there any rule that I'm constantly missing?
Edit:
I have checked on multiple computers and here's what I found:
1. The same code works fine on other computers (tested on Windows Server 2016 and Windows 10).
2. The same code works fine on an Azure VM connected via Remote Desktop (Windows Server 2016).
3. On my computer (Windows Server 2019) it fails as described.
4. On the same VM as in point 2, using Remote Desktop from my computer, it fails as well.
Now I really don't have any idea what's happening. Maybe something with locale?
Settings:
Location: US
Language: English (US)
Keyboard layout: PLP
Specific settings:
Number formats: -123,456,789.00
First day of a week: Monday
Time format: HH:mm:ss
Date format: yyyy-mm-dd
I know this seems unrelated, but I have no idea at all.
Edit2:
It seems the replacement does not work as expected when the first character after a group reference (even an escaped one) is a digit (even when it comes from a variable). I still have no idea how to work around this.
To match the closing quote against the opening one exactly, I suggest using a backreference to a nested group.
To insert a variable into the replacement string, enclose its name in curly braces: ${var1}.
The forward slash doesn't need to be escaped.
To insert a capture group (say $1) into the replacement where the following text might interfere, enclose the number in curly braces too: ${1}. Otherwise `$3 followed by 1.2.3.4 is read as $31, a group that doesn't exist, so it is left in the output literally.
## Q:\Test\2019\01\10\SO_54131783.ps1
$json = 'url = "http://localhost"'
$var1 = "http"
$var2 = "1.2.3.4"
$json = $json -replace '(url = (["''])).*?(://).*?(\2)',
"`${1}${var1}`${3}${var2}`${2}"
$json
url = "http://1.2.3.4"
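The same digit ambiguity exists in other regex engines. As a hedged illustration in Python (not PowerShell): \g<3> plays the role of .NET's ${3}, and where .NET leaves the unknown $31 in the output, Python's re module raises an error for the unknown group 31.

```python
import re

line = 'url = "http://localhost"'
var1 = "http"
var2 = "1.2.3.4"
pattern = r"""(url = (["'])).*?(://).*?(\2)"""

# Ambiguous: "\3" directly followed by "1.2.3.4" is read as a reference to group 31.
try:
    re.sub(pattern, r"\1" + var1 + r"\3" + var2 + r"\2", line)
    ambiguous_failed = False
except re.error:
    ambiguous_failed = True

# Unambiguous: \g<3> ends the group number explicitly.
fixed = re.sub(pattern, r"\g<1>" + var1 + r"\g<3>" + var2 + r"\g<2>", line)
print(ambiguous_failed, fixed)
```

Delimiting the group number is the safe default whenever the text after a reference comes from a variable.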

REGEX Pattern for Username inside a longer string

MAC OSX, PowerShell 6.1 Core
I'm struggling with creating the correct REGEX pattern to find a username string in the middle of a url. In short, I'm working in Powershell Core 6.1 and pulling down a webpage and scraping out the "li" elements. I write this to a file so I have a bunch of lines like this:
<LI>Smith, Jimmy
The string I need is the "jimmysmith" part, and every line will have a different username, no longer than eight alpha characters. My current pattern is this:
(<(.|\n)+?>)|( )
and I can use "-replace $pattern" in my code to grab the "Smith, Jimmy" part. I have no idea what I'm doing, and whatever success I had was pure luck.
After using several online regex helpers I'm still stuck on how to get just the string after the third "/", up to but not including the last quote.
Thank you for any assistance you can give me.
You could go super-simple,
expand-user/([^"]+)
Find expand-user, then capture until a quotation.
(?:\/.*){2}\/(?<username>.*)"
(?:\/.*) Matches a literal / followed by any number of characters
{2} do the previous match two times
\/ match another /
(?<username>.*)" match everything up until the next " and put it in the username group.
https://regex101.com/r/0gj7yG/1
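Although, since each line is presumably identical up until the username, you could also use Substring. Note that PowerShell escapes with a backtick rather than a backslash, and that the second argument of Substring is a length, not an end index:
$line = ("<LI>Smith, Jimmy ")
$start = 36
$line = $line.Substring($start, $line.LastIndexOf('"') - $start)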
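To see both patterns at work, here is a small Python sketch. The sample line is hypothetical, since the real HREF is not shown in the question, and Python spells the named group (?P<username>...) where .NET uses (?<username>...).

```python
import re

# Hypothetical input line: the actual URL from the scraped page is unknown.
line = '<LI><A HREF="/people/expand-user/jimmysmith">Smith, Jimmy</A>'

# Super-simple version: anchor on the literal "expand-user/" and stop at the quote.
simple = re.search(r'expand-user/([^"]+)', line)

# Slash-counting version with a named group.
named = re.search(r'(?:/.*){2}/(?P<username>.*)"', line)

print(simple.group(1), named.group("username"))
```

The slash-counting pattern relies on backtracking and depends on how many slashes precede the username, so the literal anchor is the more robust choice if the path is always the same.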
The answer is the one posted by Dave. I saved my scraped details (the lines with "li") to a file by doing:
get-content .\list.txt -ReadCount 1000| foreach-object { $_ -match "<li>"} |out-file .\transform.txt
I then used the method proposed by Dave as follows:
$a = get-content .\transform.txt |select-string -pattern '(?:\/.*){2}\/(?<username>.*)"' | % {"$($_.matches.groups[1])"} |out-file .\final.txt
I had to look up how to pull the named group out, and I used this reference to figure that out: How to get the captured groups from Select-String?

Issue using [Regex]::Match in powershell

I am trying to extract an URL from the Body of a Mail in PowerShell.
I am using following regex: (found on this site)
$regexURL = "#^(https?|ftp)://[^\s/$.?#].[^\s]*$#iS"
then I loop into a Mail folder and for each mail-item:
foreach ($Mail in $subfolder.items) {
$a = [Regex]::Match($Mail.Subject, $regexURL).Groups[1].Value
$b = [Regex]::Match($Mail.Body, $regexURL).Groups[1].Value
}
But even when Mail.Subject or Mail.Body contains a valid URL, $a and $b stay empty.
I am afraid I do not understand how Match() works.
Thanx for any help on that question.
Jerome
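The regex was written for another language (probably PHP), so you need to adapt it to .NET syntax. The leading and trailing # are PHP delimiters and i and S are PHP modifiers; .NET treats all of them as literal pattern text. S is an optimizing modifier in PHP, I think, so we can drop it. I'll drop i as well; note that [regex]::Match is case-sensitive by default (unlike the -match operator), so prefix the pattern with (?i) if you need case-insensitive matching. Try: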
$regexURL = '(https?|ftp)://[^\s/$.?#].[^\s]*$'
Sample:
$regexURL = '(https?|ftp)://[^\s/$.?#].[^\s]*$'
#URL
[regex]::Match("test http://stackoverflow.com/questions/36604481/issue-using-regexmatch-in-powershell/36605171?noredirect=1#comment60807898_36605171", $regexURL).Groups[0].Value
http://stackoverflow.com/questions/36604481/issue-using-regexmatch-in-powershell/36605171?noredirect=1#comment60807898_36605171
#Protocol
[regex]::Match("test http://stackoverflow.com/questions/36604481/issue-using-regexmatch-in-powershell/36605171?noredirect=1#comment60807898_36605171", $regexURL).Groups[1].Value
http
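The effect of the leftover PHP delimiters can be reproduced in any engine that has no delimiter syntax. A minimal Python sketch, offered only as an analogue of the .NET behaviour (re.search stands in for [regex]::Match):

```python
import re

text = ("test http://stackoverflow.com/questions/36604481/"
        "issue-using-regexmatch-in-powershell/36605171"
        "?noredirect=1#comment60807898_36605171")

php_style = r"#^(https?|ftp)://[^\s/$.?#].[^\s]*$#"   # delimiters kept by mistake
plain = r"(https?|ftp)://[^\s/$.?#].[^\s]*$"          # delimiters and modifiers removed

no_match = re.search(php_style, text)   # None: a literal '#' before '^' can never match
m = re.search(plain, text)
print(no_match, m.group(0), m.group(1))
```

Keep in mind that the trailing $ only lets the pattern match a URL at the very end of the input, which may matter for a full mail body.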

Regex help - separating controller and action

I need to separate the following url:
/myapp/public/controller/action
so that $1 is the controller and $2 is the action.
Here is the regex I'm using:
^([a-zA-Z0-9\/\-_]+)\.?([a-zA-Z\-_]+)?$
For some reason it is not separating them, but putting the whole string in $1:
$1 = /myapp/public/controller/action
$2 = '' (empty)
PS: action is optional, as I may have /myapp/public/controller. In that case $2 shall be empty.
[EDIT]
The URL string may have the following formats:
/myapp/public/controller
/myapp/public/controller/action
/myapp/public/controller/action/param1
/myapp/public/controller/action/param1/param2/paramN
$1 shall contain always the controller with full path
$2 will receive the remaining (action, action/param1, action/param1/param2/paramN)
The controller will be always myapp/public/controller, where myapp/public is static and controller is the controller name that needs to go to $1 (the 3rd string).
At the extreme we can call /myapp/public and will be sending empty '' controller that will default to index on the application.
PS: Sometimes things that seem simple are exactly the opposite... Thanks for the questions.
^(.+)\/([^\/]+)$
See it in action
With the new requirements:
^((?:\/[^\/]+){2,3})((?:\/[^\/]+)*)$
See it in action
Explanation:
(?:\/[^\/]+) - matches a forward slash followed by characters, which are not forward slashes (like /myapp, /public, /controller, /action and so on)
{2,3} - the controller consists of the first two or three such sequences. Two in the case when you are using the default index of the application.
* - the remaining such sequences are part of the action
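A quick way to check the second pattern against the URL formats from the question is a small Python sketch. The function name split_route is made up for illustration, and the / needs no escaping in Python.

```python
import re

ROUTE = re.compile(r"^((?:/[^/]+){2,3})((?:/[^/]+)*)$")

def split_route(url):
    """Return (controller_path, remainder), or None if the URL does not match."""
    m = ROUTE.match(url)
    return (m.group(1), m.group(2)) if m else None

for url in ["/myapp/public",
            "/myapp/public/controller",
            "/myapp/public/controller/action/param1"]:
    print(url, "->", split_route(url))
```

Note that the remainder keeps its leading slash (for example /action/param1), so strip it if the application expects a bare action name.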