Regular expression to split by extension but keep the extension


Hi all, I would like to split a string that contains a .ps1 extension. I used the following regex:
var regex = Regex.Split(text, ".ps1");
but I need the extension to remain in the first string. Assume my script is c:\Test\test.ps1 -Arg -Arg1. When I split it, I need c:\Test\test.ps1 as the first string and -Arg -Arg1 as the second string. How can I do this?

Use a positive lookbehind (?<=\.ps1):
(?<=\.ps1)\s+
See the regex demo
Details:
(?<=\.ps1) - require a .ps1 to be immediately before the current location
\s+ - 1+ whitespace symbols
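As a rough illustration, here is the same lookbehind split sketched in Python (the question uses C#, so this is a port of the idea rather than the asker's code). The helper name split_script_and_args is made up for the example.

```python
import re

# Split on whitespace that directly follows ".ps1". The lookbehind consumes
# no characters, so the extension stays in the first piece.
SPLIT_AFTER_PS1 = re.compile(r"(?<=\.ps1)\s+")


def split_script_and_args(command):
    """Hypothetical helper: return [script_path, arguments]."""
    # maxsplit=1 is a precaution in case ".ps1" shows up again in the arguments.
    return SPLIT_AFTER_PS1.split(command, maxsplit=1)


parts = split_script_and_args(r"c:\Test\test.ps1 -Arg -Arg1")
print(parts)
```

If the command carries no arguments, the list simply has one element, so callers should check its length before unpacking.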

This will give you the first part in group one and second part in group two
(.+[.]ps1)(.+)
Explanation
(.+[.]ps1) - first group with anything followed by ps1 extension
(.+) - second group with anything after first group
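A minimal Python sketch of the two-group approach, assuming the same sample command as in the question. Note that the second group starts with the separating space, so the sketch strips it; the function name parse_command is hypothetical.

```python
import re

TWO_GROUPS = re.compile(r"(.+[.]ps1)(.+)")


def parse_command(command):
    """Hypothetical helper: return (script, args) or None if nothing matches."""
    m = TWO_GROUPS.match(command)
    if m is None:
        return None
    # Group 2 begins with the whitespace that followed ".ps1", so trim it.
    return m.group(1), m.group(2).strip()


result = parse_command(r"c:\Test\test.ps1 -Arg -Arg1")
print(result)
```

Because the second group is (.+), this pattern needs at least one character after the extension; a bare script path with no arguments does not match.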

Related

Regex to extract text between two character patterns

I have multiple rows of data that look like the following:
dgov-nonprod-adp-personal.groups
dgov-prod-gcp-sensitive.groups
I want to get the text between the last hyphen and before the period so:
personal
sensitive
I have this regex, (?:prod-(.*)-)(.*).groups, but it produces two groups, and in BigQuery I can only extract when there is exactly one group. What would the regex be to extract just the text I want?
Note: after the second hyphen and before the third it will always be prod or nonprod, that's why in my original regex i use prod- since that will be a constant

Assuming the BigQuery function you are using supports a capture group, I would phrase your requirement as:
([^-]+)\.groups$
Demo
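To check the single-group pattern outside BigQuery, here is a small Python sketch run against the two sample rows from the question. It only shows what the capture group picks up and makes no claim about how the BigQuery function behaves.

```python
import re

# One capture group: the run of non-hyphen characters right before ".groups".
LAST_SEGMENT = re.compile(r"([^-]+)\.groups$")

rows = [
    "dgov-nonprod-adp-personal.groups",
    "dgov-prod-gcp-sensitive.groups",
]

extracted = [LAST_SEGMENT.search(row).group(1) for row in rows]
print(extracted)
```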

For the example data, you can make the pattern a bit more specific matching -nonprod or -prod with a single capture group:
-(?:non)?prod-[^-]+-([^-]+)\.groups$
See a regex demo.
If there can be more occurrences of the hyphen:
-(?:non)?prod(?:-[^-]+)*-([^-]+)\.groups$
The pattern matches
-(?:non)?prod Match either -nonprod or -prod
(?:-[^-]+)* Optionally match - followed by 1+ chars other than -
- Match literally
([^-]+) Capture group 1, match 1+ chars other than -
\.groups Match .groups
$ End of string
See another regex demo.
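The more specific pattern can be tried the same way in Python. The third row below, with an extra hyphenated segment, is a made-up example added only to exercise the (?:-[^-]+)* part.

```python
import re

PATTERN = re.compile(r"-(?:non)?prod(?:-[^-]+)*-([^-]+)\.groups$")

rows = [
    "dgov-nonprod-adp-personal.groups",
    "dgov-prod-gcp-sensitive.groups",
    "dgov-prod-gcp-eu-sensitive.groups",  # hypothetical row with one more hyphen
]

results = []
for row in rows:
    m = PATTERN.search(row)
    # Keep None for rows that do not fit the expected shape.
    results.append(m.group(1) if m else None)

print(results)
```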

How to exclude a specific string with REGEX? (Perl)

For example, I have these strings
APPLEJUCE1A
APPLETREE2B
APPLECAKE3C
APPLETEA1B
APPLEWINE3B
APPLEWINE1C
I want all of these strings except those that have TEA or WINE1C in them.
APPLEJUCE1A
APPLETREE2B
APPLECAKE3C
APPLEWINE3B
I've already tried the following, but it didn't work:
^APPLE(?!.*(?:TEA|WINE1C)).*$
Any help is appreciated as I'm also kinda new to this.

If you indeed have multiple strings as you claim, there's no need to jam all of that into one regex pattern.
/^APPLE/ && !/TEA|WINE1C/
If you have a single string, the best approach is probably to split it into lines (split /\n/), but you could also use a single regex match. Group the alternatives inside the lookahead, otherwise WINE1C is only rejected when it comes directly after APPLE:
/^APPLE(?!.*(?:TEA|WINE1C)).*/mg
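The two-condition filter translates almost directly to Python. This is a sketch of the same idea, not the Perl code itself, using the sample strings from the question.

```python
import re

names = [
    "APPLEJUCE1A",
    "APPLETREE2B",
    "APPLECAKE3C",
    "APPLETEA1B",
    "APPLEWINE3B",
    "APPLEWINE1C",
]

# Two simple tests instead of one dense pattern:
# must start with APPLE, must not contain TEA or WINE1C anywhere.
kept = [
    name
    for name in names
    if re.match(r"APPLE", name) and not re.search(r"TEA|WINE1C", name)
]
print(kept)
```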

You can use
^APPLE(?!.*TEA)(?!.*WINE1C).*
See the regex demo.
Details:
^ - start of string
APPLE - a fixed string
(?!.*TEA) - no TEA allowed anywhere to the right of the current location
(?!.*WINE1C) - no WINE1C allowed anywhere to the right of the current location
.* - any zero or more chars other than line break chars as many as possible.
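For the single-string case, the two-lookahead pattern can be applied line by line with Python's multiline flag. A minimal sketch, assuming the sample strings are joined by newlines:

```python
import re

text = "\n".join([
    "APPLEJUCE1A",
    "APPLETREE2B",
    "APPLECAKE3C",
    "APPLETEA1B",
    "APPLEWINE3B",
    "APPLEWINE1C",
])

# re.MULTILINE lets ^ match at the start of every line. The dot does not
# cross newlines, so each lookahead only inspects the current line.
PATTERN = re.compile(r"^APPLE(?!.*TEA)(?!.*WINE1C).*", re.MULTILINE)

matches = PATTERN.findall(text)
print(matches)
```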

If you only want to reject a string that has both of them (a case that is not in the current example data):
^APPLE(?!.*(WINE1C|TEA).*(?!\1)(?:TEA|WINE1C)).*
Explanation
^ Start of string
APPLE match literally
(?! Negative lookahead
.*(WINE1C|TEA) Capture either one of the values in group 1
.* Match 0+ characters
(?!\1)(?:TEA|WINE1C) Match either one of the values as long as it is not the same as previously matched in group 1
) Close the lookahead
.* Match the rest of the line
Regex demo
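A short Python check of the "both values" variant. The last two test strings are invented for illustration, since the question's data has no line containing both TEA and WINE1C.

```python
import re

PATTERN = re.compile(r"^APPLE(?!.*(WINE1C|TEA).*(?!\1)(?:TEA|WINE1C)).*")


def allowed(name):
    """Hypothetical helper: True unless the name holds both TEA and WINE1C."""
    return PATTERN.match(name) is not None


checks = {
    "APPLETEA1B": allowed("APPLETEA1B"),          # only TEA
    "APPLEWINE1C": allowed("APPLEWINE1C"),        # only WINE1C
    "APPLETEAWINE1C": allowed("APPLETEAWINE1C"),  # made-up: both values
    "APPLEWINE1CTEA": allowed("APPLEWINE1CTEA"),  # made-up: both, other order
}
print(checks)
```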

RegEx - double condition to find some string

I'd like to find the word RADU3_ or RADU3- in a line that begins with xlink:href= and ends with .svg
How can I do this?
I've tried the following, but it does not give the result I'm expecting.
(?=\wxlink:href=|\wsvg\b)|\bRADU3_|\bRADU3-
Only the last line in the example is a good result (RADU3_)
ProductionGraphics\GP1RADU3-11_HeatingFurnaceF1.svg
PB:ExpressionText id="RADU3_FUEL GAS _SUM_EX" PBD:LinkUses
xlink:href="C:\ProcBookImport\MaintenanceGraphics\RADU3_AI.svg"
Example...

I'm not sure exactly how you want to use it, but the pattern below finds the string. I put the RADU3 part in a group that matches RADU3 followed by any number of - or _ characters ([_-]*)
(xlink:href=.*)(RADU3[_-]*)(.*\.svg)
Edit: handling multiple occurrences
If a string might contain the pattern several times, add ? after the quantifiers to make them lazy, so that each match stops at the nearest .svg
(RADU3[_-]*?)(.*?\.svg?)
The first pattern could be used in a replace expression like
\1someotherword\3
where \2, the second group, is the part that gets replaced
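Here is a sketch of that replacement in Python, using the three-group pattern and the xlink:href sample line from the question. Python's \g<1> form is used for the group references; "someotherword" is the placeholder from the answer.

```python
import re

PATTERN = re.compile(r"(xlink:href=.*)(RADU3[_-]*)(.*\.svg)")

line = r'xlink:href="C:\ProcBookImport\MaintenanceGraphics\RADU3_AI.svg"'

# Keep groups 1 and 3 and put new text where group 2 (the RADU3 part) was.
result = PATTERN.sub(r"\g<1>someotherword\g<3>", line)
print(result)
```

The closing quote after .svg is not part of the match, so it is left in place.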

If you want to make sure that the string starts with xlink:href= and ends with .svg, you could use anchors to assert the start ^ and the end $ of the string.
Use one capturing group to make sure xlink:href= comes before RADU3 followed by an underscore or a hyphen. You can then match it and, in the replacement, use that capturing group followed by your replacement text.
You could use a positive lookahead to assert that the string ends with .svg
That will match:
^(xlink:href=.*)\bRADU3[_-](?=.*\.svg$)
^ Assert the start of the string
(xlink:href=.*) Capturing group, match up until the last occurrence of ..
\bRADU3[_-] Word boundary to prevent matching part of a larger word. Match RADU3 followed by an underscore or hyphen
(?=.*\.svg$) Positive lookahead to assert the string ends with .svg
See the regex demo
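A Python sketch of the anchored pattern follows. One thing to watch: the sample line in the question ends with a closing double quote after .svg, so with $ placed directly after \.svg the pattern does not match that line as written. The sketch shows both cases; the helper name find_radu3 is hypothetical.

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(r"^(xlink:href=.*)\bRADU3[_-](?=.*\.svg$)")


def find_radu3(line):
    """Hypothetical helper: return the RADU3_ or RADU3- token, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    # Group 1 ends where the token starts; the whole match ends right after it.
    return line[m.end(1):m.end()]


# Assumption: the trailing quote has been trimmed so the line ends in ".svg".
trimmed = r'xlink:href="C:\ProcBookImport\MaintenanceGraphics\RADU3_AI.svg'
quoted = trimmed + '"'

found_trimmed = find_radu3(trimmed)
found_quoted = find_radu3(quoted)
print(found_trimmed, found_quoted)
```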

It sounds like you only want the word (substring) if it is in a specific context?
In your case, you can restart the regex midways if you want to have starting and ending conditions (multiple conditions) for a string, but at the same time only want to use these conditions as "if-statements" and not as part of the result.
The following uses this method, and utilizes restarts (\K) in order to only extract the substring you are looking for.
# The string has to start with "xlink:href="
xlink:href=
# Fetch everything up to our match, and then restart the regex
.*\K
# The strings we are looking for
(RADU3[-_])
# String has to end with ".svg"
(?=(.*\.svg))
If you want the entire string matching our rules you are looking for something like this:
#The string has to start with "xlink:href"
^(xlink:href=).*
# The strings we are looking for
(RADU3[-_])
# String has to end with ".svg"
(\w+\.svg)
#Get everything after .svg too
.*
If you only want the ending " after the .svg, you'd want to modify the last part where I just take everything after .svg
You can play around with what I have come up with at regex101 (no affiliation, just love their site): https://regex101.com/r/g0v07V/3/
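Python's built-in re module does not support \K, so a port of the first commented pattern has to read the token from a capture group instead. The sketch below is that substitution, written in verbose mode to keep the per-line comments; it is an adaptation, not the answer's exact regex.

```python
import re

PATTERN = re.compile(
    r"""
    xlink:href=     # the string has to contain the xlink:href= prefix
    .*              # everything up to our match (no match restart here)
    (RADU3[-_])     # the substring we are looking for, read from group 1
    (?=.*\.svg)     # .svg has to appear somewhere after it
    """,
    re.VERBOSE,
)

line = r'xlink:href="C:\ProcBookImport\MaintenanceGraphics\RADU3_AI.svg"'
m = PATTERN.search(line)
token = m.group(1) if m else None
print(token)
```

Since this pattern has no $ anchor, the closing quote after .svg does not get in the way.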

Capture filename parts: Why doesn't this regexp work?

I'm fairly new to regexp and I'm missing something about capturing groups.
Let's suppose I have a filepath like that
test.orange.john.edn
I want to capture two groups:
test.orange.john (which is the body)
edn (which is the extension)
I used this (and variants of it, taking the $ outside, etc.)
^([a-z]*.)*.([a-z]*$)
But it captures ed only
What did I miss? I do not understand why n is not captured, and the body too...
I found answers on the web to capture the extension but I do not understand the problem there.
Thanks

The ^([a-z]*.)*.([a-z]*$) regex is very inefficient as there are lots of unnecessary backtracking steps here.
The start of string is matched, and then [a-z]*. is matched 0+ times. That means, the engine matches as many [a-z] as possible (i.e. it matches test up to the first dot), and then . matches the dot (but only because . matches any character!). So, this ([a-z]*.)* matches test.orange.john.edn only capturing edn since repeating capturing groups only keep the last captured value.
You already have edn in Group 1 at this step. Now, .([a-z]*$) should allocate a substring for the . (any character) pattern. Backtracking goes back and finds n - now, Group 1 only contains ed.
For your task, you should escape the last . to match a literal dot and perhaps, the best expression is
^(.*)\.(.*)$
See demo
It will match all the string up to the end with the first (.*), and then will backtrack to find the last . symbol (so, Group 1 will have all text from the beginning till the last .), and then capturing the rest of the string into Group 2.
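If a dot does not have to be present (i.e. if a file name has no extension), make the first group lazy and add an optional group. With a greedy first group, (.*) would swallow the whole string and the optional part would never be used:
^(.*?)(?:\.([^.]*))?$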
See another demo
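A quick Python sketch of the first pattern, ^(.*)\.(.*)$, on the file name from the question. The missing-extension case is handled in code here rather than in the regex, and the name README is an invented example.

```python
import re

BODY_AND_EXT = re.compile(r"^(.*)\.(.*)$")


def split_name(filename):
    """Hypothetical helper: return (body, extension); extension may be None."""
    m = BODY_AND_EXT.match(filename)
    if m is None:
        # No dot at all, so there is no extension to report.
        return filename, None
    # The greedy first group backtracks to the last dot in the name.
    return m.group(1), m.group(2)


with_ext = split_name("test.orange.john.edn")
without_ext = split_name("README")
print(with_ext, without_ext)
```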

You can try with:
^([a-z.]+)\.([a-z]+)$
online example

Help with regex - email address matching

I have the following regex, which is supposed to match email addresses:
[a-z0-9!#$%&'*+\\-/=?^_`{|}~][a-z0-9!#$%&'*+\\-/=?^_`{|}~.]{0,63}#[a-z0-9][a-z0-9\\-]*[a-z0-9](\\.[a-z0-9][a-z0-9\\-]*[a-z0-9])+$.
I have the following code in AS3:
var mails:Array = str.toLowerCase().match(pattern);
(pattern is RegExp with the mentioned regular expression).
I retrieve two results, when str is gaga#example.com:
gaga#example.com
.com
Why?

.com was captured by the last part of the regex (\\.[a-z0-9][a-z0-9\\-]*[a-z0-9]).
Regular expressions capture substrings matched by portions of the pattern that are enclosed in () for later use.
For example, the regex 0x([0-9a-fA-F]+) will match a hexadecimal number of the form 0x9F34 and capture the hex portion in a separate group.
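The hexadecimal example is easy to try in Python. This sketch uses a + quantifier inside the group so that it covers every hex digit; the surrounding "value = " text is made up.

```python
import re

m = re.search(r"0x([0-9a-fA-F]+)", "value = 0x9F34")

whole_match = m.group(0)  # everything the pattern matched
hex_digits = m.group(1)   # only the part inside the parentheses
print(whole_match, hex_digits)
```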

I'm not sure about your regex, there is a good tutorial about email validation here.
To me this reads:
[a-z0-9!#$%&'*+\\-/=?^_`{|}~] # a single character from the chosen set
[a-z0-9!#$%&'*+\\-/=?^_`{|}~.]{0,63} # 0 to 63 characters from the same set, with the addition of .
#
[a-z0-9] # a single alphanumeric character
[a-z0-9\\-]* # zero or more alphanumeric characters or -
[a-z0-9] # a single alphanumeric character
(\\.[a-z0-9][a-z0-9\\-]*[a-z0-9])+ # one or more groups made of a dot followed by a label
$ # end of line
. # any character
As to why you get two sub-strings in your array, it's because both match the pattern - see docs

gaga#example.com is the match of the whole regular expression and .com is the last match of the first group ((\\.[a-z0-9][a-z0-9\\-]*[a-z0-9])).

([a-z0-9!#$%&'*+\\-/=?^_`{|}~][a-z0-9!#$%&'*+\\-/=?^_`{|}~.]{0,63}#[a-z0-9\\-]*[a-z0-9]+\\.([a-z0-9\\-]*[a-z0-9]))+$
This seems to work as expected (tested in Regex Tester). Last capturing group removed.

To add to what others have said:
There are two results because it matches both the whole email address, and the last group surrounded by parentheses.
If you don't want a group to be captured you can add ?: to the beginning of the group. Look in the AS documentation for non-capturing groups:
http://www.adobe.com/livedocs/flash/9.0/main/wwhelp/wwhimpl/js/html/wwhelp.htm?href=00000118.html#wp129703
"A noncapturing group is one that is used for grouping only; it is not "collected," and it does not match numbered backreferences. Use (?: and ) to define noncapturing groups, as follows:
var pattern = /(?:com|org|net)/;"
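The effect of ?: can be seen in a small Python sketch. The pattern here is a deliberately simplified stand-in for the one in the question (an assumption made for readability), and the address uses the same # separator as the question's sample.

```python
import re

address = "gaga#example.com"

# Simplified stand-in patterns: identical except for the "?:" in the group.
capturing = re.compile(r"[a-z0-9]+#[a-z0-9]+(\.[a-z0-9]+)+$")
non_capturing = re.compile(r"[a-z0-9]+#[a-z0-9]+(?:\.[a-z0-9]+)+$")

with_group = capturing.search(address)
without_group = non_capturing.search(address)

# Both match the whole address; only the first also reports ".com".
print(with_group.group(0), with_group.groups())
print(without_group.group(0), without_group.groups())
```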
