Match Sequence using RegEx After a Specified Character - regex

The initial string is [image:salmon-v5-09-14-2011.jpg]
I would like to capture the text "salmon-v5-09-14-2011.jpg". I have been testing with GSkinner's RegEx Tool.
The closest I can get to my desired output is using this RegEx:
:([\w+.-]+)
The problem is that this sequence includes the colon and the output becomes
:salmon-v5-09-14-2011.jpg
How can I capture the desired output without the colon? Thanks for the help!

Use a look-behind:
(?<=:)[\w+.-]+
A look-behind (written (?<=someregex)) is a zero-width assertion: it checks that the preceding text matches, but that text is not included in the match.
Also, your regex could probably be simplified to this:
(?<=:)[^\]]+
which simply grabs anything between (but not including) a : and a ]
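As a quick check of both look-behind patterns, here is a minimal sketch in Python (the language choice and the variable names are assumptions for illustration; the original question used an online regex tool):

```python
import re

text = "[image:salmon-v5-09-14-2011.jpg]"

# Look-behind: the colon must precede the match but is not part of it.
strict = re.search(r"(?<=:)[\w+.-]+", text)
# Simplified variant: everything after the colon up to (not including) "]".
loose = re.search(r"(?<=:)[^\]]+", text)

print(strict.group(0))  # salmon-v5-09-14-2011.jpg
print(loose.group(0))   # salmon-v5-09-14-2011.jpg
```

Both searches return the file name without the leading colon, because a look-behind consumes no characters.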

If you are always looking at strings in that format, I would use this pattern:
(?<=\[image:)[^\]]+
This looks behind for [image:, then matches until the closing ]

You have the correct regex; the tool you're using is just highlighting the entire match rather than only your capture group. Hover over the match and see what "group 1" actually is.
If you want a slightly more robust regex you could try :([^\]]+) which will allow for any characters other than ] to appear in the file name portion.
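The difference between the whole match and group 1 can be seen in a short Python sketch (Python is an assumption here; any engine with numbered groups should behave the same way):

```python
import re

text = "[image:salmon-v5-09-14-2011.jpg]"
match = re.search(r":([^\]]+)", text)

# Group 0 is the entire match, including the colon.
print(match.group(0))  # :salmon-v5-09-14-2011.jpg
# Group 1 is only what the parentheses captured.
print(match.group(1))  # salmon-v5-09-14-2011.jpg
```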

Related

Regex: how do I match a character before other capture characters?

I'm trying to match against a list of strings where I want to make sure the character before the match is not an equals sign, without capturing that character. So, for a list (excerpted from pip freeze) like:
ply==3.10
powerline-status===2.6.dev9999-git.b-e52754d5c5c6a82238b43a5687a5c4c647c9ebc1-
psutil==4.0.0
ptyprocess==0.5.1
I want the captured output to look like this:
==3.10
==4.0.0
==0.5.1
I first thought using a negative lookahead (?![^=]) would work, but with a regular expression of (?![^=])==[0-9]+.* it ends up capturing the line I don't want:
==3.10
==2.6.dev9999-git.b-e52754d5c5c6a82238b43a5687a5c4c647c9ebc1-
==4.0.0
==0.5.1
I also tried using a non-capturing group (?:[^=]) with a regex of (?:[^=])==[0-9]+.* but that ends up capturing the first character which I also don't want:
y==3.10
l==4.0.0
s==0.5.1
So the question is this: How can one match but not capture a string before the rest of the regex?
A negative look-behind would be the way to go:
(?<!=)==[0-9.]+
Also, here is the site I like to use:
http://www.rubular.com/
Of course, it sometimes helps if you say which engine/software you are using, so we know what limitations there might be.
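To see the negative look-behind in action, here is a small sketch using Python's re module (an assumption, since the question does not name an engine). The variable names are illustrative only:

```python
import re

packages = """\
ply==3.10
powerline-status===2.6.dev9999-git.b-e52754d5c5c6a82238b43a5687a5c4c647c9ebc1-
psutil==4.0.0
ptyprocess==0.5.1
"""

# (?<!=) rejects any "==" that is immediately preceded by another "=".
versions = re.findall(r"(?<!=)==[0-9.]+", packages)
print(versions)  # ['==3.10', '==4.0.0', '==0.5.1']
```

The "===" line is skipped: the first "==" is followed by "=" rather than a digit, and the second "==" is preceded by "=".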
If you want to remove the version numbers from the text, you could capture a character that is not an equals sign ([^=]) in the first capturing group, followed by a match of == and the version number \d+(?:\.\d+)+. Then in the replacement you would use your capturing group.
Regex
([^=])==\d+(?:\.\d+)+
Replacement
Group 1: $1
Note
You could also use ==[0-9]+.* or ==[0-9.]+ to match the double equals signs and version numbers but that would be a very broad match. The first would also match ====1test and the latter would also match ==..
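A rough sketch of that replacement in Python follows. Note that Python writes the group reference as \1 rather than $1, and the list and variable names are assumptions for illustration:

```python
import re

lines = [
    "ply==3.10",
    "powerline-status===2.6.dev9999-git.b-e52754d5c5c6a82238b43a5687a5c4c647c9ebc1-",
    "psutil==4.0.0",
    "ptyprocess==0.5.1",
]

# Group 1 holds the non-"=" character before "=="; putting it back
# in the replacement removes only "==" and the version number.
pattern = re.compile(r"([^=])==\d+(?:\.\d+)+")
stripped = [pattern.sub(r"\1", line) for line in lines]

for line in stripped:
    print(line)
```

The "===" line is left unchanged, since no "==" in it is both preceded by a non-"=" character and followed by a digit.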
There's another regex construct called a 'lookbehind assertion' (also called positive lookbehind), written (?<=...). Using it in my example above, the expression (?<=[^=])==[0-9]+.* results in the expected output:
==3.10
==4.0.0
==0.5.1
It took me a while to discover this; notably, at the time of this writing, the lookbehind assertion isn't supported in the popular regex tool regexr.
If there are alternatives to using lookbehind to solve this, I'd love to hear them.
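One possible alternative, sketched here in Python under the assumption that the engine supports capturing groups, is to match the preceding character normally and keep only a capturing group. The names below are illustrative:

```python
import re

packages = """\
ply==3.10
powerline-status===2.6.dev9999-git.b-e52754d5c5c6a82238b43a5687a5c4c647c9ebc1-
psutil==4.0.0
ptyprocess==0.5.1
"""

# Positive look-behind, as described above.
with_lookbehind = re.findall(r"(?<=[^=])==[0-9]+.*", packages)

# No look-behind: [^=] is consumed, but only the parenthesised part
# is returned by findall.
with_group = re.findall(r"[^=](==[0-9]+.*)", packages)

print(with_lookbehind)  # ['==3.10', '==4.0.0', '==0.5.1']
print(with_group)       # ['==3.10', '==4.0.0', '==0.5.1']
```

The capture-group form trades a slightly wider overall match for portability to tools without look-behind support.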

Regex: ignore characters that follow

I'd like to know how I can ignore characters that follow a particular pattern in a regex.
I tried positive lookaheads, but they do not work because they preserve those characters for other matches, while I want them to be just... discarded.
For example, a part of my regex is: (?<DoubleQ>\"\".*?\"\")|(?<SingleQ>\".*?\")
in order to match some "key-parts" of this string:
This is a ""sample text"" just for "testing purposes": not to be used anywhere else.
I want to capture the entire ""sample text"", but then I want to "extract" only sample text and the same with testing purposes. That is, I want the group to match to be ""sample text"", but then I want the full match to be sample text. I partially achieved that with the use of the \K option:
(?<DoubleQ>\"\"\K.*?\"\")|(?<SingleQ>\"\K.*?\")
Which ignores the first "" (or ") from the full match but takes it into account when matching the group. How can I ignore the following "" (")?
Note: positive lookahead does not work: it does not ignore characters from the following matches, it just does not include them in the current match.
Thanks a lot.
I hope I understood your question correctly. So you want to match the whole string including the quotes, but replace/extract only the expression without the quotes, right?
You typically can use the regex replace functionality to extract just a part of the match.
This is the regex expression:
""?(.*?)""?
And this is the replace expression:
$1
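Here is a minimal sketch of that match-then-extract approach in Python (an assumption; the question's engine is not stated, and Python writes the replacement as \1 instead of $1):

```python
import re

text = 'This is a ""sample text"" just for "testing purposes": not to be used anywhere else.'

# The quotes are consumed by the match, so they cannot leak into a
# later match; group 1 holds only the text between them.
pattern = re.compile(r'""?(.*?)""?')

extracted = pattern.findall(text)
replaced = pattern.sub(r"\1", text)

print(extracted)  # ['sample text', 'testing purposes']
print(replaced)
```

Because the closing quotes are part of the full match, the next search starts after them, which is the "discard" behaviour a lookahead cannot provide.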

Regex to extract text with slash separated by slashes

I'm trying to find the element definition from the xpath string using a regex.
However, some element definitions include the slash separator itself.
Sample of xpath:
/primary[#classCode='ABC']/subject[#typeCode='123/a'][organizer/code[#codeSystem='12.35.1.1/b']]/component[#typeCode='RET']/text()
I expect the result:
primary[#classCode='ABC']
subject[#typeCode='123/a'][organizer/code[#codeSystem='12.35.1.1/b']]
component[#typeCode='RET']
text()
Trying something simple, like
(?<=/)(.*?)(?=/)
or similar variations is not adequate.
Is there a regex that splits this without further processing of the string?
I don't know what your use case is, but I hope this helps.
Regex: \/.*?[\]\)](?=\/|$)
1. \/.*?[\]\)] matches a / and then, lazily, everything up to a ] or )
2. (?=\/|$) is a positive lookahead requiring that ] or ) to be followed by either / or $ (end of string)
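The pattern can be tried on the sample path with a short Python sketch (Python and the variable names are assumptions). Each match includes its leading slash, so the sketch strips it afterwards:

```python
import re

xpath = (
    "/primary[#classCode='ABC']"
    "/subject[#typeCode='123/a'][organizer/code[#codeSystem='12.35.1.1/b']]"
    "/component[#typeCode='RET']"
    "/text()"
)

# Each match runs from a "/" to a "]" or ")" that is followed by "/" or the end.
matches = re.findall(r"\/.*?[\]\)](?=\/|$)", xpath)
segments = [m[1:] for m in matches]  # drop the leading "/"

for segment in segments:
    print(segment)
```

This relies on every segment ending in ] or ), as in the sample; a path with a plain element name in the middle would not be split there.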
There are better ways to extract XPath segments than regex, depending on the language, but if you still have to use regex, you could try this:
(?<=\/|^)(.*?(?:\[.*?\])*)(?=\/|$)
The lookbehind (?<=\/|^) requires a preceding / or the starting anchor ^
(.*?(?:\[.*?\])*) is used to extract each segment in the path
(?:\[.*?\]) is a non-capturing group to match anything present within [ and ]
The quantifier * is applied to that group since an XPath segment can contain more than one predicate, such as subject[][] in your example.
The lookahead (?=\/|$) requires a following / or the ending anchor $
// Output:
primary[#classCode='ABC']
subject[#typeCode='123/a'][organizer/code[#codeSystem='12.35.1.1/b']]
component[#typeCode='RET']
text()
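As a non-regex alternative, the same split can be sketched with a small scanner that tracks bracket depth. This is a swapped-in technique, not the regex above: Python's built-in re module only accepts fixed-width look-behinds, so a look-behind like (?<=\/|^) is not expected to compile there. The function name split_xpath is hypothetical:

```python
def split_xpath(path: str) -> list[str]:
    """Split on "/" only when outside square brackets (hypothetical helper)."""
    segments: list[str] = []
    current: list[str] = []
    depth = 0  # how many "[" are currently open
    for ch in path:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "/" and depth == 0:
            # Top-level separator: close the current segment, if any.
            if current:
                segments.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        segments.append("".join(current))
    return segments


sample = (
    "/primary[#classCode='ABC']"
    "/subject[#typeCode='123/a'][organizer/code[#codeSystem='12.35.1.1/b']]"
    "/component[#typeCode='RET']"
    "/text()"
)
for segment in split_xpath(sample):
    print(segment)
```

The sketch assumes brackets are balanced and that quoted values never contain [ or ]; handling those cases would need quote tracking as well.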

Trying to figure out how to capture text between slashes regex

I have a regex
/([/<=][^/]*[/=?])$/g
I'm trying to capture text between the last slashes in a file path
/1/2/test/
but this regex matches "/test/" instead of just test. What am I doing wrong?
You need to use lookaround assertions.
(?<=\/)[^\/]*(?=\/[^\/]*$)
or
Use the below regex and then grab the string you want from group index 1.
\/([^\/]*)\/[^\/]*$
The easy way
Match:
1. every character that is not a "/". Capture what was matched here by putting it inside parentheses, which creates a capturing group.
2. followed by "/" and then the end of string $
Code:
([^/]*)/$
Get the text in group(1)
Harder to read, only if you want to avoid groups
Match exactly the same as before, except now we're telling the regex engine not to consume characters when trying to match (2). This is done with a lookahead: (?= ).
Code:
[^/]*(?=/$)
Get what is returned by the match object.
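Both variants can be compared side by side in a brief Python sketch (the language is an assumption; the question's /.../g syntax suggests JavaScript or PHP):

```python
import re

path = "/1/2/test/"

# Easy way: the trailing "/" is consumed, the text comes from group 1.
with_group = re.search(r"([^/]*)/$", path).group(1)

# Lookahead way: the trailing "/" is only asserted, so the whole
# match is already the wanted text.
with_lookahead = re.search(r"[^/]*(?=/$)", path).group(0)

print(with_group)      # test
print(with_lookahead)  # test
```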
The issue with your code is your opening and closing slashes are part of your capture group.
Demo
text: /1/2/test/
regex: /\/([^\/]*?)(?=\/)/g
captures a list of three: "1", "2", "test"
The language you're using affects the results. For instance, JavaScript might not have certain lookarounds, or may actually capture something in a non-capture group. However, the above should work as intended. In PHP, all / match characters must be escaped (according to regex101.com), which is why the cleaner [/] wasn't used.
If you're only after the last match (i.e., test), you don't need the positive lookahead:
/\/([^\/]*?)\/$/
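For reference, the two patterns from this answer can be sketched in Python, where the surrounding / delimiters and the g flag are replaced by re.findall and re.search (an assumed translation, not the answerer's own code):

```python
import re

path = "/1/2/test/"

# Global-match equivalent: every segment that is followed by a "/".
all_segments = re.findall(r"\/([^\/]*?)(?=\/)", path)

# Anchored variant: only the segment before the final "/".
last_segment = re.search(r"\/([^\/]*?)\/$", path).group(1)

print(all_segments)  # ['1', '2', 'test']
print(last_segment)  # test
```

The lookahead matters in the first pattern because each "/" both ends one segment and starts the next; consuming it would skip every other segment.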

Delphi TRegEx zero - length

I want to Match the Content between '(' and ')' of
Path()
Path(C:\...)
with
(?<=^Path\()(.*)(?=\))
In Notepad++ it matches '' (a zero-length match) and 'C:\...'.
But using Delphi XE3:
if TRegEx.IsMatch(pDef, '(?<=^Path\()(.*)(?=\))') then begin
matches only 'C:\...', but I need the empty match as well.
Try this regex:
Path\((.*)\)
This also matches the empty case, as in your example.
Delphi's TRegEx skips all zero-length matches. See QC104562 for details.
Your regex will work with Delphi's TPerlRegEx if you exclude preNotEmpty from the State property.
That said, using lookaround to try to isolate part of the regex match results in inefficient regexes. Much better to use something like Path\(([^)\r\n]*)\) or Path\((.*)\) and retrieve the text matched by the first capturing group to get the actual path. The first regex will correctly match Path(...) when there are additional ) characters on the same line but will not correctly handle paths that contain ) characters.
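The capturing-group approach can be illustrated with a short sketch. It is written in Python rather than Delphi, so it shows only the regex behaviour, not TRegEx itself; the helper name extract_path is hypothetical:

```python
import re
from typing import Optional

PATH_PATTERN = re.compile(r"Path\(([^)\r\n]*)\)")


def extract_path(definition: str) -> Optional[str]:
    """Return the text between the parentheses, or None if there is no match."""
    match = PATH_PATTERN.search(definition)
    return match.group(1) if match else None


# The overall match is never empty, so the empty path shows up as an
# empty group 1 rather than as a zero-length match.
print(repr(extract_path("Path()")))         # ''
print(repr(extract_path(r"Path(C:\...)")))  # 'C:\\...'
```

Since the full match always contains at least "Path()", an engine that skips zero-length matches should not drop the empty case with this pattern.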