Regex match zero or one group

I have filenames in the format <pod-name>_<namespace-name>_<container-name>-<dockerid>.log
For example:
pod-name_namespace-name_container-name-7a1d0ed5675bdb365228d43f470fcee20af5c8bea84dd6d886b9bf837a9d358c.log
pod-name_namespace-name-1234567890_container-name-7a1d0ed5675bdb365228d43f470fcee20af5c8bea84dd6d886b9bf837a9d358c.log
These are actually the k8s containers' log files.
The namespace-name may contain a numeric suffix that represents the automation system's run ID (github.run_id, a 10-digit number).
I need to parse the filenames with a regex to extract the pod name, the namespace name without the run ID, the run ID, the container name and the Docker ID.
Here is the regex, based on the default Fluent Bit Kubernetes parser, that I need to change for our use:
(?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)(-(?<run_id>\d{10,}))_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$
https://rubular.com/r/CROBxpHHgX5UZx
The regex above correctly parses filenames whose namespace contains a run ID, but fails on namespaces without one:
pod-name_namespace-name_container-name-7a1d0ed5675bdb365228d43f470fcee20af5c8bea84dd6d886b9bf837a9d358c.log
https://rubular.com/r/6MSQsnuGzrkVJG
In this case, run_id should be an empty string.
How can I fix it so that it matches both cases?

You can use
(?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+?)(-(?<run_id>\d{10,}))?_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$
See the regex demo.
The main point is to make two changes in the (?<namespace_name>[^_]+)(-(?<run_id>\d{10,})) part:
make the [^_]+ pattern lazy by adding a ? after +, so that it matches as few chars other than _ as possible and leaves the -<run id> part to the next group (a greedy [^_]+ would consume it);
make the (-(?<run_id>\d{10,})) part optional by adding a ? quantifier after the group.
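To check the answer's pattern outside Fluent Bit, here is a minimal Python sketch. It assumes Python's re module, which spells named groups (?P<name>...) instead of (?<name>...), so the pattern is rewritten accordingly; parse_log_filename is a hypothetical helper name. When there is no run ID, the optional group does not take part in the match and Python reports it as None, so the sketch maps that to an empty string to get the behaviour the question asks for.

```python
import re
from typing import Optional

# The answer's pattern, with (?<name>...) rewritten as (?P<name>...) for Python.
LOG_NAME = re.compile(
    r"(?P<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)"
    r"_(?P<namespace_name>[^_]+?)"   # lazy, so it stops before an optional -<run id>
    r"(-(?P<run_id>\d{10,}))?"       # optional run id group
    r"_(?P<container_name>.+)"
    r"-(?P<docker_id>[a-z0-9]{64})\.log$"
)


def parse_log_filename(filename: str) -> Optional[dict]:
    """Hypothetical helper: return the named fields, or None if the name doesn't match."""
    m = LOG_NAME.search(filename)
    if m is None:
        return None
    # groupdict(default="") turns a non-participating run_id group into "".
    return m.groupdict(default="")
```

For a namespace without a run ID, run_id comes back as "" and namespace_name keeps the whole namespace; with a run ID, the digits land in run_id and the namespace_name excludes them.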

Related

Regex Group Name prefix multiple options

I'm performing regex extraction to parse logs for our SIEM. I'm working with PCRE2.
In those logs, I have this problem: I have to extract a field that can be preceded by multiple options, and I want to use only one group name.
Let me be clearer with an example.
The SSH connection can appear in our log with this form:
UserType=SSH,
And I know that a simple regex expression to catch this is:
UserType=(?<app>.*?),
But, at the same time, SSH can appear with another "prefix":
ACCESS TYPE:SSH;
that can be captured with:
ACCESS\sTYPE:(?<app>.*?);
Now, because the logical field is the same (SSH protocol) and I want to map it under the group name "app" in every case, is there a way to combine the previous prefixes with OR and use the same group name?
The desired final result is something like:
(UserType=) OR (ACCESS TYPE:) <field_value_here>
You can use
(?:UserType=|ACCESS\sTYPE:)(?<app>[^,;]+)
See the regex demo. Details:
(?:UserType=|ACCESS\sTYPE:) - either UserType= or ACCESS + whitespace + TYPE:
(?<app>[^,;]+) - Group "app": one or more chars other than , and ;.
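A quick way to see the alternation feeding a single named group is the Python sketch below. It assumes Python's re module, which writes the named group as (?P<app>...); PCRE2 also accepts that spelling, so the pattern is otherwise the answer's. extract_app is a hypothetical helper name.

```python
import re

# Either prefix, then one shared named group "app" that stops at , or ;
APP = re.compile(r"(?:UserType=|ACCESS\sTYPE:)(?P<app>[^,;]+)")


def extract_app(line: str):
    """Hypothetical helper: return the "app" value, or None if neither prefix is present."""
    m = APP.search(line)
    return m.group("app") if m else None


print(extract_app("UserType=SSH,"))     # SSH
print(extract_app("ACCESS TYPE:SSH;"))  # SSH
```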

Visual studio code find replace another part of the string used in the find

I am using Visual Studio Code to find a string and replace a different part of it.
The string will always contain "sitemap" (without the quotes), but I want to remove index.html.
Some examples of what i need replaced:
front/index.htmltemplate.xsl
to
front/template.xsl
com/index.htmlwp-sitemap
to
com/wp-sitemap
Some of my attempts in the VS Code regex search box include
sitemap[^"']*index.html
and
sitemap.*?(?=index.html)
but neither is identifying all of the strings that need replacing
This might do the trick for you: (index\.html)(?=.+(sitemap))
Explanation:
() = Group multiple tokens together to create a capture group
(index\.html) = a capture group around the target string to replace (the dot is escaped so it matches a literal .)
(?=.+(sitemap)) = a positive lookahead that requires one or more characters followed by sitemap somewhere after index.html; the inner (sitemap) is a capture group, but it isn't needed for the replacement.
?= marks a "positive lookahead": it checks what follows the main expression without including it in the match. Here it requires sitemap (and any chars before it) to follow, so the match is just index.html, which you then replace with an empty string.
https://regexr.com/6iipm
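The same lookahead can be tried outside the editor with the Python sketch below, assuming Python's re module (the pattern needs no changes; the unused (sitemap) capture group is dropped). strip_index is a hypothetical helper name. Only index.html is consumed, so replacing the match with an empty string removes it and leaves the rest of the line intact.

```python
import re

# Match index.html only if "sitemap" appears later on the same line.
INDEX_BEFORE_SITEMAP = re.compile(r"index\.html(?=.+sitemap)")


def strip_index(line: str) -> str:
    """Hypothetical helper: remove index.html when sitemap follows it."""
    return INDEX_BEFORE_SITEMAP.sub("", line)


print(strip_index("com/index.htmlwp-sitemap"))  # com/wp-sitemap
```

A line that has no sitemap after index.html is left unchanged, which is the condition the question describes.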

regex - get new path string from old path string

I'm trying to run a shell script in Linux and want to turn this:
/path/to/(\w+)/b/c
into
/path/to/(\w+)/b/(\w+)\.txt
(where \w+ should remain the same as given in input).
I keep getting 'No match found'.
You need to use the capturing group and then use that in your substitution.
\/r\/path\/to\/(\w+).*
Test string
/r/path/to/teststring/b/c
Substitution
/path/to/\1/b/\1.txt
Result
/path/to/teststring/b/teststring.txt
I have created a regex101 playground for you here
https://regex101.com/r/R0O3OK/1
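The capture-and-reuse idea can be checked with the Python sketch below. It assumes Python's re module and the answer's test input (with the leading /r); new_path is a hypothetical helper name. Group 1 is referenced twice in the replacement, and because .* consumes the rest of the path, the whole input is replaced.

```python
import re

# Group 1 captures the directory name after /r/path/to/; .* consumes the rest.
OLD_PATH = re.compile(r"^/r/path/to/(\w+).*")


def new_path(old: str) -> str:
    """Hypothetical helper: reuse group 1 twice when building the new path."""
    # In a replacement string the dot is literal, so it needs no escaping.
    return OLD_PATH.sub(r"/path/to/\1/b/\1.txt", old)


print(new_path("/r/path/to/teststring/b/c"))  # /path/to/teststring/b/teststring.txt
```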

Regex pattern for Prometheus exporter

I am trying to create a regex pattern for the configuration file of a Prometheus exporter (the JMX exporter) to export WebLogic JMS queues.
My string is as follows:
(com.bea<ServerRuntime=AC_Server-10-100-40-122, Name=iLoyalJMSModule!AC_JMSServer#AC_Server-10-100-40-122#com.ibsplc.iloyal.eai.EN.retro.outErrorqueue, Type=JMSDestinationRuntime, JMSServerRuntime=AC_JMSServer#AC_Server-10-100-40-122><>MessagesCurrentCount)
And the RegEx is as below
Pattern
com.bea<ServerRuntime=(.+), Name=(.+), Type=(.+), JMSServerRuntime=(.+)<>(MessagesCurrentCount|MessagesPendingCount)
Name to display in Prometheus exporter output
name: "weblogic_jmsserver_$1_$5"
Current Output
weblogic_jmsserver_ac_server_10_100_40_122_messagescurrentcount
Now I would like to add the queue name (outErrorqueue), taken from the Name= string, to my output, so that the final output looks like this:
Required Output
weblogic_jmsserver_ac_server_10_100_40_122_outErrorqueue_messagespendingcount
You could reduce the number of capture groups from 5 to the 2 that you need in the replacement. Instead of using .+, which is greedy and can run past the next comma, you can use either .*? or a negated character class such as [^,]+, which matches any char except a comma.
If the surrounding parenthesis of the example data should not be part of the replacement, you can use:
\(com\.bea<ServerRuntime=([^,]+), Name=[^,]+, Type=[^,]+, JMSServerRuntime=.+?<>(Messages(?:Current|Pending)Count)\)
In the replacement use:
weblogic_jmsserver_$1_outErrorqueue_$2
See a regex demo
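The answer's replacement hardcodes outErrorqueue. As a hedged variation, the Python sketch below captures the queue name instead, assuming it is always the segment after the last dot in Name=. It uses Python's re module, where the replacement references are \1, \2, \3 rather than the exporter's $1, $2, $3; JMS_BEAN and metric_name are hypothetical names.

```python
import re

# Variation on the answer's pattern: the queue name is captured, not hardcoded.
JMS_BEAN = re.compile(
    r"\(com\.bea<ServerRuntime=([^,]+), "   # group 1: server runtime
    r"Name=[^,]*\.([^.,]+), "               # group 2: last dotted segment of Name=
    r"Type=[^,]+, "
    r"JMSServerRuntime=.+?<>"               # lazy, stops at the first <>
    r"(Messages(?:Current|Pending)Count)\)" # group 3: attribute name
)


def metric_name(bean: str) -> str:
    """Hypothetical helper: build the metric name from the three groups."""
    return JMS_BEAN.sub(r"weblogic_jmsserver_\1_\2_\3", bean)
```

For the question's string this yields weblogic_jmsserver_AC_Server-10-100-40-122_outErrorqueue_MessagesCurrentCount. The lowercase, underscore-only output in the question suggests the exporter normalizes the name afterwards; that step is not reproduced here.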

How to use Postgres Regex Replace with a capture group

As the title says, I am trying to reference capture groups in a regex replace in a Postgres query. I have read that regexp_replace does not support regex capture groups. The regex I am using is
r"(?:[\s\(\)\=\)\,])(username)(?:[\s\(\)\=\)\,])?"gm
The above regex almost does what I need, but I need to find out how to only allow a match if the surrounding groups also capture something. There is no situation where "username" should be matched if it just happens to be a substring of a word. By ensuring it's surrounded by one of the above characters, I can be much more confident that it's a username.
An example application of the regex would be something like this in postgres (of course I would be doing an update vs a select):
select *, REGEXP_REPLACE(reqcontent,'(?:[\s\(\)\=\)\,])(username)(?:[\s\(\)\=\)\,])?' ,'NEW-VALUE', 'gm') from table where column like '%username%' limit 100;
If there is any more context that can be provided, please let me know. I have also found similar posts (postgresql regexp_replace: how to replace captured group with evaluated expression (adding an integer value to capture group)), but they talk more about splicing values back in, and I don't think they quite answer my question.
More context and example value(s) for the regex to work against: the text below may look familiar, as these are JQL filters in Jira. We are looking to update our usernames and all their occurrences in the table that contains the filters. Below are a few examples of filters. We originally just did a find and replace, but that doesn't work because some of our usernames are only two characters long and were matched inside non-usernames (e.g. the username je would be replaced inside the word project, which completely malforms the JQL string, resulting in something like proNEW-VALUEct = blah blah).
type = bug AND status not in (Closed, Executed) AND assignee in (test, username)
assignee=username
assignee = username
Definition of Answered:
A regex that will only match 'username' if it's surrounded by one of the special characters
A way to regex-replace that username in a Postgres query.
Capturing groups are used to keep the important bits of information matched with a regex.
Either wrap the parts you want to keep in capturing groups and reference them with placeholders (\1, \2) in the replacement:
REGEXP_REPLACE(reqcontent,'([\s\(\)\=\)\,])username([\s\(\)\=\)\,])?' ,'\1NEW-VALUE\2', 'gm')
Or use lookarounds:
REGEXP_REPLACE(reqcontent,'(?<=[\s\(\)\=\)\,])(username)(?=[\s\(\)\=\)\,])?' ,'NEW-VALUE', 'gm')
Or, in this case, use word boundaries (\y in PostgreSQL) so that username is only replaced as a whole word, not as part of a longer word:
REGEXP_REPLACE(reqcontent,'\yusername\y' ,'NEW-VALUE', 'g')
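To compare the group-based and word-boundary approaches on the question's examples, here is a Python sketch. It assumes Python's re module, where \b plays the role of PostgreSQL's \y (in PostgreSQL, \b means backspace); the character class is simplified to the equivalent [\s()=,]. replace_username and replace_grouped are hypothetical helpers, and re.escape guards usernames containing regex metacharacters.

```python
import re

# Group-based variant from the answer: needs a special char before "username".
GROUP_PATTERN = re.compile(r"([\s()=,])username([\s()=,])?")


def replace_grouped(text: str, new_value: str) -> str:
    """Hypothetical helper: keep the surrounding chars, swap the username."""
    # A non-participating group 2 yields None, so fall back to "".
    return GROUP_PATTERN.sub(lambda m: m.group(1) + new_value + (m.group(2) or ""), text)


def replace_username(text: str, username: str, new_value: str) -> str:
    """Hypothetical helper: replace username only as a whole word (\\b ~ PostgreSQL \\y)."""
    pattern = re.compile(r"\b" + re.escape(username) + r"\b")
    return pattern.sub(lambda m: new_value, text)


print(replace_username("project = je", "je", "X"))               # project = X
print(replace_grouped("username = bob", "NEW-VALUE"))            # username = bob
print(replace_username("username = bob", "username", "NEW-VALUE"))  # NEW-VALUE = bob
```

The word-boundary version leaves je inside project alone and also handles a username at the very start of the string, which the group-based pattern cannot match because it requires a preceding special character.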