regex - get new path string from old path string

I'm trying to run a shell script on Linux and want to turn this:
/path/to/(\w+)/b/c
into
/path/to/(\w+)/b/(\w+)\.txt
(where \w+ should remain the same as given in input).
I keep getting 'No match found'.

You need to use the capturing group and then use that in your substitution.
\/r\/path\/to\/(\w+).*
Test string
/r/path/to/teststring/b/c
Substitution
/path/to/\1/b/\1\.txt
Result
/path/to/teststring/b/teststring.txt
I have created a regex101 playground for you here
https://regex101.com/r/R0O3OK/1
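As a minimal sketch of the same capture-and-reuse idea outside regex101, here is how it could look in Python (an assumption, since the question only mentions a shell script). The helper name rewrite_path is hypothetical, and the pattern follows the path shape from the question rather than the /r/ test string. Note that Python's replacement syntax needs no backslash before the dot.

```python
import re

# Hypothetical helper; the pattern mirrors the path shape given in the question.
PATH_RE = re.compile(r"^/path/to/(\w+)/b/c$")

def rewrite_path(old_path: str) -> str:
    """Turn /path/to/<name>/b/c into /path/to/<name>/b/<name>.txt."""
    # \1 in the replacement re-inserts the text captured by (\w+).
    new_path, count = PATH_RE.subn(r"/path/to/\1/b/\1.txt", old_path)
    if count == 0:
        # Make a non-matching input visible instead of returning it unchanged.
        raise ValueError(f"no match found for {old_path!r}")
    return new_path

print(rewrite_path("/path/to/teststring/b/c"))
# prints /path/to/teststring/b/teststring.txt
```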

Related

Regex match zero or one group

I have filenames in format <pod-name>_<namespace-name>_<container-name>-<dockerid>.log
For example:
pod-name_namespace-name_container-name-7a1d0ed5675bdb365228d43f470fcee20af5c8bea84dd6d886b9bf837a9d358c.log
pod-name_namespace-name-1234567890_container-name-7a1d0ed5675bdb365228d43f470fcee20af5c8bea84dd6d886b9bf837a9d358c.log
These are Kubernetes (k8s) container log files.
The namespace name may carry a numeric suffix that represents the automation system run ID (github.run_id, a 10-digit number).
I need to parse the filenames with a regex to extract the pod name, the namespace name without the run ID, the run ID, the container name and the docker ID.
The regex is based on the default Fluent Bit Kubernetes parser, which I need to adapt for our usage:
(?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)(-(?<run_id>\d{10,}))_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$
https://rubular.com/r/CROBxpHHgX5UZx
The regex above correctly parses filenames whose namespace contains a run ID, but fails to parse a namespace without a run ID:
pod-name_namespace-name_container-name-7a1d0ed5675bdb365228d43f470fcee20af5c8bea84dd6d886b9bf837a9d358c.log
https://rubular.com/r/6MSQsnuGzrkVJG
In this case the run_id should be an empty string.
How can I fix it so that it matches both cases?
You can use
(?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+?)(-(?<run_id>\d{10,}))?_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$
See the regex demo.
The main point is to make two changes in the (?<namespace_name>[^_]+)(-(?<run_id>\d{10,})) part:
Make the [^_]+ pattern lazy by adding a ? after the +, so that it matches as few characters other than _ as possible.
Make the (-(?<run_id>\d{10,})) part optional by adding a ? quantifier after the group.
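To check both cases quickly, here is a sketch of the same pattern in Python. This is an adaptation, not the Fluent Bit configuration itself: Python's re module spells named groups as (?P<name>...), and the two helper groups are made non-capturing here. The function name parse_log_name is hypothetical. An unmatched optional group is None in Python, so groupdict(default="") is used to get the empty run_id the question asks for.

```python
import re

# Same pattern as above, rewritten with Python's (?P<name>...) syntax.
LOG_NAME_RE = re.compile(
    r"(?P<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?"
    r"(?:\.[a-z0-9](?:[-a-z0-9]*[a-z0-9])?)*)"
    r"_(?P<namespace_name>[^_]+?)"   # lazy: leave a trailing run id to the next group
    r"(?:-(?P<run_id>\d{10,}))?"     # optional run id suffix
    r"_(?P<container_name>.+)"
    r"-(?P<docker_id>[a-z0-9]{64})\.log$"
)

def parse_log_name(filename):
    """Return the named parts as a dict, or None if the name does not match."""
    m = LOG_NAME_RE.search(filename)
    if m is None:
        return None
    # default="" turns the unmatched optional run_id into an empty string.
    return m.groupdict(default="")

docker_id = "7a1d0ed5675bdb365228d43f470fcee20af5c8bea84dd6d886b9bf837a9d358c"
for name in (
    f"pod-name_namespace-name_container-name-{docker_id}.log",
    f"pod-name_namespace-name-1234567890_container-name-{docker_id}.log",
):
    parts = parse_log_name(name)
    print(parts["namespace_name"], repr(parts["run_id"]), parts["container_name"])
```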

How to extract file name from URL?

I have file names in a URL and want to strip out the preceding URL and filepath as well as the version that appears after the ?
Sample URL
I'm trying to use a regex to pull out CaptialForecasting_Datasheet.pdf.
The REGEXP_EXTRACT in Google Data Studio seems unique. Tried the suggestion but kept getting "could not parse" error. I was able to strip out the first part of the url with the following. Event Label is where I store URL of downloaded PDF.
The URL:
https://www.dudesolutions.com/Portals/0/Documents/HC_Brochure_Digital.pdf?ver=2018-03-18-110927-033
REGEXP_EXTRACT( Event Label , 'Documents/([^&]+)' )
The result:
HC_Brochure_Digital.pdf?ver=2018-03-18-110927-033
Now I'm trying to work out how to strip everything from the ? onward, where the version data is, so as to extract just Filename.pdf.
You could try:
[^\/]+(?=\?[^\/]*$)
This will match CaptialForecasting_Datasheet.pdf even if there is a question mark in the path. For example, the regex will succeed in both of these cases:
https://www.dudesolutions.com/somepath/CaptialForecasting_Datasheet.pdf?ver
https://www.dudesolutions.com/somepath?/CaptialForecasting_Datasheet.pdf?ver
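A quick way to check this claim is to run the pattern against both URLs. The sketch below uses Python's re module as an assumption (Google Data Studio uses RE2, which does not support lookaheads, so this is for verifying the pattern elsewhere). The helper name file_name_from_url is hypothetical.

```python
import re

# One or more non-slash characters, followed by a "?" and then
# no further "/" up to the end of the string.
FILE_NAME_RE = re.compile(r"[^/]+(?=\?[^/]*$)")

def file_name_from_url(url):
    """Return the file name before the final query string, or None."""
    m = FILE_NAME_RE.search(url)
    return m.group(0) if m else None

for url in (
    "https://www.dudesolutions.com/somepath/CaptialForecasting_Datasheet.pdf?ver",
    "https://www.dudesolutions.com/somepath?/CaptialForecasting_Datasheet.pdf?ver",
):
    print(file_name_from_url(url))
# both lines print CaptialForecasting_Datasheet.pdf
```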
Assuming that the name appears right after the last / and ends with the ?, the regular expression below will leave the name in group 1 where you can get it with \1 or whatever the tool that you are using supports.
.*\/(.*)\?
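It basically says: get everything between the last / and a ? after it, and put it in group 1. Because (.*) is greedy, the capture runs up to the last ? that follows the last /; use ([^?]*) in place of (.*) to stop at the first one.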
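The group-1 approach can be tried out as follows. This is a sketch in Python (an assumption; other tools expose group 1 differently), with hypothetical variable names. It also shows a variant with ([^?]*) in the group, which behaves differently only when more than one ? follows the last /.

```python
import re

GREEDY_RE = re.compile(r".*/(.*)\?")       # pattern from the answer
FIRST_Q_RE = re.compile(r".*/([^?]*)\?")   # variant: group cannot cross a "?"

url = ("https://www.dudesolutions.com/Portals/0/Documents/"
       "HC_Brochure_Digital.pdf?ver=2018-03-18-110927-033")

# With a single "?" in the URL, both patterns capture the same file name.
print(GREEDY_RE.search(url).group(1))    # HC_Brochure_Digital.pdf
print(FIRST_Q_RE.search(url).group(1))   # HC_Brochure_Digital.pdf

# With two "?" after the last "/", the greedy group runs up to the last one.
tricky = "https://example.com/docs/file.pdf?a=1?b=2"
print(GREEDY_RE.search(tricky).group(1))   # file.pdf?a=1
print(FIRST_Q_RE.search(tricky).group(1))  # file.pdf
```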
Another regular expression that only matches the file name that you want but is more complex is:
(?<=\/)[^\/]*(?=\?)
It matches all non-/ characters, [^\/]*, immediately preceded by /, (?<=\/), and immediately followed by ?, (?=\?). The first parenthesized expression is a positive lookbehind, and the second is a positive lookahead.
This REGEXP_EXTRACT formula captures the characters a-zA-Z0-9_. between / and ?
REGEXP_EXTRACT(Event Label, "/([\\w\\.]+)\\?")
Google Data Studio Report to demonstrate.
Please try the following regex
[A-Za-z_]*\.pdf
I have tried it online at https://regexr.com/.
Please note that this only works for .pdf files whose names consist of letters and underscores.
The following regex extracts a file name with the .pdf extension:
(?:[^\/][\d\w\.]+)(?<=(?:\.pdf))
You can add more extensions like this:
(?:[^\/][\d\w\.]+)(?<=(?:\.pdf)|(?:\.jpg))
Demo

Regex processing in systemverilog using svlib

I am a new user of the svlib package (Verilab svlib) in a SystemVerilog environment. I have the following sample text, {'PARAMATER': 'lollg_1', 'SPEC_ID': '1G3HSB_1'}, and I want to use a regex to extract 1G3HSB_1 from it.
To do this, I am using the following code snippet, but I am getting the whole line instead of only that value.
wordsRe = regex_match(words[i], "\'SPEC_ID\': \'(.*?)\'");
$display("This is the output of Regex: %s", wordsRe.getStrContents());
Can anybody tell me what is going wrong?
The output I am getting : {'PARAMATER': 'lollg_1', 'SPEC_ID': '1G3HSB_1'}
And, I want to get: 1G3HSB_1
It seems you need to get the contents of the first capturing group with getMatchString(1). Also, you need to use a greedy quantifier (lazy ones are not POSIX compliant) and a negated bracket expression - [^']* instead of .*?:
wordsRe = regex_match(words[i], "\'SPEC_ID\': \'([^\']*)\'");
$display("This is the output of Regex: %s", wordsRe.getMatchString(1));
See the User Guide details:
getMatchString(m) is always exactly equivalent to calling the range method on the Str object containing the string that was searched:
range(getMatchStart(m), getMatchLength(m))
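The difference between the searched string, the whole match and the first capturing group can be illustrated with a small sketch. It uses Python's re module rather than svlib, so it only mirrors the idea and says nothing about svlib's own API; the variable names are hypothetical. The last step shows the same "slice the searched string by the group's position" relationship that the User Guide describes.

```python
import re

text = "{'PARAMATER': 'lollg_1', 'SPEC_ID': '1G3HSB_1'}"

# Negated class [^']* instead of a lazy .*? keeps the pattern POSIX-friendly.
m = re.search(r"'SPEC_ID': '([^']*)'", text)

whole = m.group(0)    # the whole match: 'SPEC_ID': '1G3HSB_1'
spec_id = m.group(1)  # only the first capturing group

# The group text is just a slice of the searched string.
start, end = m.span(1)
assert text[start:end] == spec_id

print(spec_id)  # prints 1G3HSB_1
```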

Can I capture a label not found in the test string using regex?

Assuming I have some strings of the following type:
session opened by (uid=0)
session opened by scotty
Is it possible to write a regex that will either capture the text "root" if (uid=0) is found in the string, otherwise capture the normal user name (i.e. scotty)?
A regex cannot capture anything that is missing from the input string. If you know the structure of the input text, you can have a regex pattern return the required part. Here is an example that works in the .NET regex flavor:
(?s)(?<=\(uid=0\).*opened by )\w+
Matches Found:
[0][0] = scotty
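Since the pattern itself cannot produce text that is not in the input, one common workaround is to do the substitution in the surrounding code. The sketch below assumes post-processing in Python is acceptable; the function name session_user and the group name user are hypothetical. The regex matches either (uid=0) or a user name, and the code supplies "root" when the user group did not take part in the match.

```python
import re

# Either the literal "(uid=0)" or a captured user name.
SESSION_RE = re.compile(r"session opened by (?:\(uid=0\)|(?P<user>\w+))")

def session_user(line):
    """Return 'root' for (uid=0), the user name otherwise, None if no match."""
    m = SESSION_RE.search(line)
    if m is None:
        return None
    # group("user") is None when the (uid=0) branch matched.
    return m.group("user") or "root"

print(session_user("session opened by (uid=0)"))  # prints root
print(session_user("session opened by scotty"))   # prints scotty
```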

Regex Assistance for a url filepath

Can someone assist in creating a Regex for the following situation:
I have about 2000 records in which I need to do a search/replace on a known item in each record that looks like this:
<li>View Product Information</li>
The FILEPATH and FILE are variable, but the surrounding HTML is always the same. Can someone assist with what kind of Regex I would substitute for the "FILEPATH/FILE" part of the search?
You may match the constant part and use grouping to put it back:
(<li>View Product Information</li>)
Then you should replace the string with $1your_replacement$2, where $1 is the first matching group and $2 the second (if using Python, for instance, you would call Match.group(1) and Match.group(2)).
You would have to escape \ chars if you're using Java instead.
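A sketch of the two-group replacement in Python follows. The exact markup around FILEPATH/FILE is not visible in the question, so the anchor tag with an href attribute used here is purely an assumption, as are the names ITEM_RE and replace_target and the sample paths. The constant text before and after the variable part goes into groups 1 and 2, and a function is used as the replacement so that backslashes in the new value need no escaping.

```python
import re

# ASSUMED markup: <li><a href="FILEPATH/FILE">View Product Information</a></li>
# Group 1 and group 2 hold the constant parts around the variable target.
ITEM_RE = re.compile(
    r'(<li><a href=")[^"]*(">View Product Information</a></li>)'
)

def replace_target(html, new_target):
    """Swap the variable part while putting the constant parts back."""
    return ITEM_RE.sub(
        lambda m: m.group(1) + new_target + m.group(2),
        html,
    )

record = '<li><a href="docs/old/file.pdf">View Product Information</a></li>'
print(replace_target(record, "new/path.pdf"))
# prints <li><a href="new/path.pdf">View Product Information</a></li>
```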