Get first match in closing part of regex - regex

I need a regex that extracts a substring starting with "[%" and ending with "%]", with any text (or nothing) in between. For example:
Input: dsafsdfadsaffsdadsaffadsaf[%sadsad[%]%%]fdfsadfsad%]fsasdf
Output: [%sadsad[%]
I already wrote the expression \[%(.\n*)*%\], but it matches up to the last %]:
Output: [%sadsad[%]%%]fdfsadfsad%]
Does anyone know how to stop at the first closing match?

Put . and \n inside a capturing or non-capturing group, separated by the alternation operator |, and make the quantifier non-greedy.
\[%(.|\n)*?%\]
OR
You could do it like this:
\[%[\S\s]*?%\]
[\S\s]*? Matches any space or non-space character non-greedily.
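
As a rough illustration of the difference the non-greedy quantifier makes, here is a minimal Python sketch run on the input from the question. The variable names are only illustrative, and Python is an assumption, since the question does not name a language.

```python
import re

text = "dsafsdfadsaffsdadsaffadsaf[%sadsad[%]%%]fdfsadfsad%]fsasdf"

# Greedy: [\s\S]* runs to the end of the string, then backtracks to the LAST "%]".
greedy = re.search(r"\[%[\s\S]*%\]", text).group()

# Non-greedy: [\s\S]*? grows one character at a time and stops at the FIRST "%]".
lazy = re.search(r"\[%[\s\S]*?%\]", text).group()

# The alternation form from the answer, written with a non-capturing group.
alternation = re.search(r"\[%(?:.|\n)*?%\]", text).group()

print(greedy)       # [%sadsad[%]%%]fdfsadfsad%]
print(lazy)         # [%sadsad[%]
print(alternation)  # [%sadsad[%]
```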

\[%[^\]]*%\]
You can try this to get the string up to the first closing %]. See demo.
https://regex101.com/r/gX5qF3/5
NODE EXPLANATION
--------------------------------------------------------------------------------
\[% '['%
--------------------------------------------------------------------------------
[^\]]* any character except: '\]' (0 or more
times (matching the most amount possible))
--------------------------------------------------------------------------------
%\] %']'

Related

Extract the last path-segments of a URI or path using RegEx

I am trying to extract the last section of the following string :
"/subscriptions/5522233222-d762-666e-555a-e6666666666/resourcegroups/rg-sql-Belguim-01/providers/Microsoft.Compute/snapshots/vm-sql-image-v3.3-pre-sysprep-Oct-2021-BG"
I want to capture:
"snapshots/vm-sql-image-v3.3-pre-sysprep-Oct-2021-BG"
I tried below with no luck:
(\w*?\/\w*?)$
How to pull this off using regex?
Use
[^\/]+\/[^\/]+$
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
[^\/]+ any character except: '\/' (1 or more
times (matching the most amount possible))
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
[^\/]+ any character except: '\/' (1 or more
times (matching the most amount possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
Your issues
(\w*?/\w*?)$ is for simple or empty last 2 segments (tested), e.g.
matched hello/world/subscriptions123/snap_shots capturing subscriptions123/snap_shots
matched /1/2// capturing the last 2 empty segments
What was OK:
capture-group
/ to match the last path-separator before end ($)
\w*? intended to match the path-segment of any length
What to improve:
*? is a bit too unrestricted; use the quantifier + for at least one character (instead of * for any number or ? for zero or one)
\w is the word-character class; it does not match hyphens or dots (OK for snapshots, but not for the given last segment)
Quick-fixed
(\w+/[\w\.-]+)$ (tested)
added dot \. and hyphen - to character-set containing \w
Simple but solid
(snapshots/[^\/]+)$ (tested)
second-to-last path segment assumed to be the fixed constant snapshots
[^\/] any character except (^) slash in last segment
Note: the slash doesn't need to be escaped as \/, as it is in Ryszard's answer
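
To compare the patterns discussed above side by side, here is a small Python sketch using the path from the question. Python is an assumption (the question does not name a language), and the rsplit line is an extra non-regex alternative that none of the answers proposed.

```python
import re

path = ("/subscriptions/5522233222-d762-666e-555a-e6666666666/resourcegroups/"
        "rg-sql-Belguim-01/providers/Microsoft.Compute/snapshots/"
        "vm-sql-image-v3.3-pre-sysprep-Oct-2021-BG")

# The original attempt: \w cannot cross the hyphens and dots in the last segment.
original = re.search(r"(\w*?/\w*?)$", path)

# Negated character class: two runs of non-slash characters around one slash.
negated = re.search(r"[^/]+/[^/]+$", path).group()

# Quick fix: widen the character set for the last segment.
quick_fix = re.search(r"(\w+/[\w.-]+)$", path).group(1)

# Non-regex alternative (not from the answers): split off the last two segments.
no_regex = "/".join(path.rsplit("/", 2)[-2:])

print(original)   # None
print(negated)    # snapshots/vm-sql-image-v3.3-pre-sysprep-Oct-2021-BG
```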

How to exclude brackets at the end of the Url

I am new to regex, so any help is really appreciated.
I have an expression to identify a URL :
(http[^'\"]+)
Unfortunately on some URLs, I get additional square brackets at the end
For instance "http://example.com]]"
As the result want to receive "http://example.com"
How do I get rid of those brackets with the help of the regex I wrote above?
What you actually have is called a negated character class, so just add characters that should not be matched. In addition, there's not really a need for a capturing group. That said, you could use
http[^'"\]\[]+
#       ^^^^
Note that this will exclude square brackets anywhere in your possible url not just at the end. See a demo on regex101.com.
Stop the match between a word and nonword character:
(http[^'"]+)\b
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
http 'http'
--------------------------------------------------------------------------------
[^'"]+ any character except: ''', '"' (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
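
The two approaches above can be tried out with a short Python sketch. The language is an assumption, and the second sample string (a URL containing brackets) is hypothetical, added only to show where each pattern cuts the match.

```python
import re

raw = '"http://example.com]]"'

# Negated class extended with [ and ]: the match stops at the first bracket.
no_brackets = re.search(r'''http[^'"\]\[]+''', raw).group()

# Word boundary: the greedy match backtracks to the last word/non-word transition.
boundary = re.search(r'''(http[^'"]+)\b''', raw).group(1)

print(no_brackets)  # http://example.com
print(boundary)     # http://example.com

# Hypothetical input with brackets inside the URL: both patterns cut it short,
# but at different places.
tricky = "http://example.com/a[1]]]"
tricky_no_brackets = re.search(r'''http[^'"\]\[]+''', tricky).group()
tricky_boundary = re.search(r'''(http[^'"]+)\b''', tricky).group(1)
print(tricky_no_brackets)  # http://example.com/a
print(tricky_boundary)     # http://example.com/a[1
```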

REGEX match a String and select up to a char

I am trying to create a regular expression that matches the beginning of a string and then replaces everything after a specific character. For example:
String = 123456789:0:0 => Output = 123456789:2:4
I need a regex that matches "123" at the beginning and then replaces only "0:0" with another string.
Matching "123" is easy: ^123, but I cannot find a way to then go up to the : and replace only the rest of the string.
I would appreciate any help.
You can use a negated character class to match up till the first occurrence of a colon.
In the replacement use capture group 1, followed by the replacement.
^(123[^:]*:)0:0$
^ Start of string
(123[^:]*:) Capture group 1: match 123, then 0+ times any char except : using a negated character class, then match the :
0:0 Match literally
$ End of string
Regex demo
If you want to replace all after matching the first colon, you could use .+
^(123[^:]*:).+
See another regex demo
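
A minimal Python sketch of both patterns from this answer, assuming Python's re.sub as the replace function (the question does not say which language is in use). The second input string is hypothetical and only shows that a string not starting with 123 is left alone.

```python
import re

s = "123456789:0:0"

# Replace only a trailing "0:0"; \g<1> re-inserts the captured prefix.
exact = re.sub(r"^(123[^:]*:)0:0$", r"\g<1>2:4", s)

# Replace everything after the first colon, whatever it is.
rest = re.sub(r"^(123[^:]*:).+", r"\g<1>2:4", s)

# Hypothetical input that does not start with 123: no match, so no change.
unchanged = re.sub(r"^(123[^:]*:)0:0$", r"\g<1>2:4", "923456789:0:0")

print(exact)      # 123456789:2:4
print(rest)       # 123456789:2:4
print(unchanged)  # 923456789:0:0
```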
Without knowing your exact programming context it seems like you're using a version of a regex_replace function. With suitable grouping this is easy to do.
Don't think of what you want to replace. Think about what you want to keep.
^(.*?123.*?:)(0:0)(.*?)$
And as your replacement string use
$12:4$3
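
In a replacement like this, the digit that follows the group reference can be read as part of the group number in some engines. As a sketch, assuming Python, the \g<1> form keeps the group number separate from the literal 2 that follows it. The second input is hypothetical and shows that this pattern does not require 123 to be at the very start.

```python
import re

pattern = re.compile(r"^(.*?123.*?:)(0:0)(.*?)$")

# What each group keeps for the sample input.
groups = pattern.match("123456789:0:0").groups()

# \g<1> is group 1, then the literal text "2:4", then group 3.
result = pattern.sub(r"\g<1>2:4\g<3>", "123456789:0:0")

# Hypothetical input: the leading .*? lets text appear before 123.
loose = pattern.sub(r"\g<1>2:4\g<3>", "x123456:0:0")

print(groups)  # ('123456789:', '0:0', '')
print(result)  # 123456789:2:4
print(loose)   # x123456:2:4
```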
For replacing the "0:0", you can use:
Numbers between 123 and 0:0 => /^(123[0-9]+:)0:0/ replace ${1}2:4
OR
Anything between 123 and 0:0 => /^(123.+:)0:0/ replace ${1}2:4
This RegExp creates a group starting with 123 until it reaches 0:0 and later we use the group in the replacement string. This however depends on the programming language you use:
Example in PHP
$result = preg_replace("/^(123[0-9]+:)0:0/", "\${1}2:4", "123456789:0:0");
Example in JavaScript
let result = "123456789:0:0".replace(/^(123[0-9]+:)0:0/, "$12:4");
With JavaScript:
const str = `123456789:0:0`;
console.log(str.replace(/(?<=^123[^:]*:).*/, `2:4`));
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
(?<= look behind to see if there is:
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
123 '123'
--------------------------------------------------------------------------------
[^:]* any character except: ':' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))

RegEx for removing everything before and after a delimiter

I am trying to remove everything before and after two | delimiters using regex.
An example being:
EM|CX-001|Test Campaign Name
and grabbing everything except CX-001. I cannot use a substring as the number of characters before and after the pipes may change.
I tried using the regex (?<=\|)(.*?)(?=\-), but while this selects CX-001, I need to select everything else but this.
How do I solve this problem?
You can try the following regular expression:
(^[^|]*\|)|(\|[^|]*$)
String input = "EM|CX-001|Test Campaign Name";
System.out.println(
input.replaceAll("(^[^|]*\\|)|(\\|[^|]*$)", "")
); // prints "CX-001"
Explanation of the regular expression:
NODE EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
[^|]* any character except: '|' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
\| '|'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
\| '|'
--------------------------------------------------------------------------------
[^|]* any character except: '|' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of
the string
--------------------------------------------------------------------------------
) end of \2
If you have only 2 pipes in your string, you could either match up to the first pipe or match from the last one until the end of the string:
^.*?\||\|.*$
Explanation
^.*?\| Match from start of string non greedy until the first pipe
| Or
\|.*$ Match from last pipe until end of string
Regex demo
Or you might also use a negated character class [^|]* without the need of capturing groups:
^[^|]*\||\|[^|]*$
Regex demo
Note
In your pattern (?<=\|)(.*?)(?=\-) I think you meant that the last positive lookahead should be (?=\|) instead of the - if you want to select between 2 pipes.
Find: ^[^|]*\|([^|]+).+$
Replace: $1
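
For comparison, here is a short Python sketch of the two regex approaches above (removing both ends, and capturing the middle), plus a plain split. Python is an assumption, and the split line is an extra alternative that is not taken from the answers.

```python
import re

s = "EM|CX-001|Test Campaign Name"

# Remove everything up to the first pipe and from the last pipe onwards.
stripped = re.sub(r"^[^|]*\||\|[^|]*$", "", s)

# Capture the middle field and replace the whole string with it.
captured = re.sub(r"^[^|]*\|([^|]+).+$", r"\1", s)

# Non-regex alternative (not from the answers): take the second field.
by_split = s.split("|")[1]

print(stripped)  # CX-001
print(captured)  # CX-001
print(by_split)  # CX-001
```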

Trying to match what is before /../ but after / with regular expressions

I am trying to match what is before /../ but after / with a regular expressions, but I want it to look back and stop at the first /
I feel like I am close but it just looks at the first slash and then takes everything after it like... input is this:
this/is/a/./path/that/../includes/face/./stuff/../hat
and my regular expression is:
#\/(.*)\.\.\/#
matching /is/a/./path/that/../includes/face/./stuff/../ instead of just that/../ and stuff/../
How should I change my regex to make it work?
.* means "match any number of any character at all[1]". This is not what you want. You want to match any number of non-/ characters, which is written [^/]*.
Any time you are tempted to use .* or .+ in a regex, be very suspicious. Stop and ask yourself whether you really mean "any character at all[1]" or not - most of the time you don't. (And, yes, non-greedy quantifiers can help with this, but character classes are both more efficient for the regex engine to match against and more clear in their communication of your intent to human readers.)
[1] OK, OK... . isn't exactly "any character at all" - it doesn't match newline (\n) by default in most regex flavors - but close enough.
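
To make the point about .* concrete, here is a small Python sketch on the input from the question, assuming Python's re module (the question uses # delimiters, which suggests another language). It compares the greedy .*, the non-greedy .*?, and the negated class [^/]*.

```python
import re

s = "this/is/a/./path/that/../includes/face/./stuff/../hat"

# Greedy .* starts at the first slash and runs to the LAST "../".
greedy = re.search(r"/(.*)\.\./", s).group()

# Non-greedy .*? stops at the nearest "../" but still starts at the first slash,
# so each capture spans several path segments.
lazy = re.findall(r"/(.*?)\.\./", s)

# The negated class cannot cross a slash, so each capture is one segment.
negated = re.findall(r"/([^/]*)/\.\./", s)

print(greedy)   # /is/a/./path/that/../includes/face/./stuff/../
print(lazy)     # ['is/a/./path/that/', 'face/./stuff/']
print(negated)  # ['that', 'stuff']
```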
Change your pattern so that only characters other than / ([^/]) get matched:
#([^/]*)/\.\./#
Alternatively, you can use a lookahead.
#(\w+)(?=/\.\./)#
Explanation
NODE EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
/ '/'
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
/ '/'
--------------------------------------------------------------------------------
) end of look-ahead
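
The lookahead variant can be sketched in Python as follows (the language is an assumption). The second input is hypothetical and shows one difference between \w+ and [^/]+: \w does not match hyphens, so only part of a hyphenated segment is captured.

```python
import re

s = "this/is/a/./path/that/../includes/face/./stuff/../hat"

# The lookahead checks for "/../" without consuming it.
word_segments = re.findall(r"(\w+)(?=/\.\./)", s)
print(word_segments)  # ['that', 'stuff']

# Hypothetical path with a hyphenated segment before "/../".
hyphenated = "a/my-dir/../b"
with_word_class = re.findall(r"(\w+)(?=/\.\./)", hyphenated)
with_negated_class = re.findall(r"([^/]+)(?=/\.\./)", hyphenated)
print(with_word_class)     # ['dir']
print(with_negated_class)  # ['my-dir']
```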
I think you're essentially right, you just need to make the match non-greedy, or change the (.*) to not allow slashes: #/([^/]*)/\.\./#
In your favourite language, do a few splits and some string manipulation, e.g. in Python:
>>> s="this/is/a/./path/that/../includes/face/./stuff/../hat"
>>> a=s.split("/../")[:-1] # the last item is not required.
>>> for item in a:
...     print(item.split("/")[-1])
...
that
stuff
In Python:
>>> import re
>>> test = 'this/is/a/./path/that/../includes/face/./stuff/../hat'
>>> regex = re.compile(r'/\w+?/\.\./')
>>> regex.findall(test)
['/that/../', '/stuff/../']
Or if you just want the text without the slashes:
>>> regex = re.compile(r'/(\w+?)/\.\./')
>>> regex.findall(test)
['that', 'stuff']
([^/]+) will capture all the text between slashes.
([^/]+)*/\.\. matches that/.. and stuff/.. in your string this/is/a/./path/that/../includes/face/./stuff/../hat. It captures that or stuff, and you can change that, obviously, by changing the placement of the capturing parentheses and your program logic.
You didn't state whether you want to capture or just match. The regex here will only capture the last occurrence of the match (stuff), but it is easily changed to return that and then stuff if used in a global match.
NODE EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1 (0 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
[^/]+ any character except: '/' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
)* end of \1 (NOTE: because you're using a
quantifier on this capture, only the LAST
repetition of the captured pattern will be
stored in \1)
--------------------------------------------------------------------------------
/ '/'
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
\. '.'