Regular Expression Words stuck together - regex

Is there a way to write regular expressions to stop right before a particular word or characters?
For example, I have a text like:
Advisor:HarrisTeamTeamRole
So I want to write a regular expression that makes the advisor name dynamic, but only capture Harris. How do I write a regular expression to stop right before Team?

You could use a lookbehind and lookahead like this:
(?<=Advisor:).*?(?=Team)
This matches only the text between "Advisor:" and the first "Team"; neither delimiter (nor anything after "Team") ends up in the match, in a capture group or otherwise. It requires a regex flavor that supports lookbehind. If yours does not, you'll have to use grouping, which could be as simple as:
Advisor:(.*?)Team
and then just read capture group 1.
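As a minimal sketch of the two approaches above, assuming Python's re module (the question does not say which engine is in use), both patterns can be run against the sample text. The variable names are illustrative only.

```python
import re

text = "Advisor:HarrisTeamTeamRole"

# Lookaround version: the whole match is the name; "Advisor:" and "Team"
# are asserted but not consumed.
lookaround = re.search(r"(?<=Advisor:).*?(?=Team)", text)
name_from_lookaround = lookaround.group(0)

# Capture-group version: the delimiters are consumed, the name is group 1.
grouped = re.search(r"Advisor:(.*?)Team", text)
name_from_group = grouped.group(1)

print(name_from_lookaround, name_from_group)
```

The lazy quantifier .*? is what makes the match stop at the first "Team" rather than the last one.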

Try this regular expression:
:([A-Z][a-z]*)
This one captures only the first word after the colon, as long as the text is in CamelCase. That means the next word doesn't have to be Team; it would work on Advisor:HarrisNetworkSomething as well.
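A short sketch of this CamelCase-based pattern, again assuming Python's re module. The third sample string is a made-up input added to show a limitation: a name with an internal capital letter gets cut short.

```python
import re

pattern = re.compile(r":([A-Z][a-z]*)")

samples = [
    "Advisor:HarrisTeamTeamRole",
    "Advisor:HarrisNetworkSomething",
    "Advisor:McDonaldTeam",  # hypothetical input, not from the question
]

# Group 1 is one capital letter plus the lowercase letters that follow it.
names = [pattern.search(s).group(1) for s in samples]
print(names)
```

The pattern does not depend on the word Team at all, which is its strength, but it stops at the next capital letter whatever that letter belongs to.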

You can use a lazy quantifier and read the match from group 1:
^Advisor:(.*?)Team

Related

Regex if then else confusion

I have a problem with the Regex-If-then-else logic:
I am trying to achieve the following:
If the string contains the substring PubDSK then do the Regex Expression
^[\s\S]{24}(?=.{10}([\s\S]*))0*(.*?)(?=\1)[\s\S]*
If it does NOT contain the substring PubDSK then do a different Regex Expression, namely ^[\s\S]{48}(?=.{10}([\s\S]*))0*(.*?)(?=\1)[\s\S]*
I am using this Regex Expression (?(?=^.*PubDSK.*$)^[\s\S]{24}(?=.{10}([\s\S]*))0*(.*?)(?=\1)[\s\S]*|^[\s\S]{48}(?=.{10}([\s\S]*))0*(.*?)(?=\1)[\s\S]*)
The affirmative case works great: https://regex101.com/r/ab9yOv/
BUT the non-affirmative case doesn't do the trick: https://regex101.com/r/azxGvh/1
I assume it doesn't match, so it cannot do the replacement? How can I tell the regex to do the replacement on the complete string in the ELSE case?
I understand, that this problem can be easily solved with any other programming language, but for this use case I can only use pure regex...
The second \1 backreference refers to the first capturing group of the entire regex, because groups are numbered left to right across the whole pattern, including both branches of the conditional. So it does not refer to the capturing group defined in the else part of the pattern. The second \1 must be replaced with \3, since the first group of the else branch is the third capturing group overall.
Also, note that the (?=\1) and (?=\3) lookaheads make little sense here, as they are followed by the consuming pattern [\s\S]*. Just remove the lookaheads and use consuming patterns instead.
The fixed pattern looks like
(?(?=^.*PubDSK.*$)^[\s\S]{24}(?=.{10}([\s\S]*))0*(.*?)\1[\s\S]*|^[\s\S]{48}(?=.{10}([\s\S]*))0*(.*?)\3[\s\S]*)
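The group-numbering point can be illustrated with a much smaller pattern. This is only a sketch, assuming Python's re module, which does not support the lookahead conditional (?(?=...)...) used in the question, so a plain alternation stands in for the if/else here. The patterns and the input "b-b" are made up for the illustration.

```python
import re

# Groups are numbered left to right across the whole pattern, so the
# group in the second branch is group 2, even though it is the first
# group inside that branch.
wrong = re.compile(r"(a)-\1|(b)-\1")
right = re.compile(r"(a)-\1|(b)-\2")

# In the second branch, \1 points at a group that never took part in
# the match, so in Python the backreference fails.
wrong_match = wrong.fullmatch("b-b")
right_match = right.fullmatch("b-b")

print(wrong_match, right_match.group(2))
```

The same reasoning is why the else branch of the fixed pattern uses \3: its first group is the third opening parenthesis of a capturing group in the overall regex.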

How to exclude the beginning string in regex match

I have a string containing the following variable: "nonce=1ff7de7518b9a52080489ecd7629796d&". How do I get the value between the equals sign and the "&" with a regular expression? I have tried nonce=(.*?).+?(?=&); the ending part excludes the "&", but I could not exclude "nonce=".
Note: trying to match the value between "=" and "&" will not work, as there are many "=" and "&" characters, which will result in more than one match. The unique string is "nonce".
Here is an example: https://regexr.com/48vmd
You can use nonce=([^&]+) to match your intended string and capture it in group 1.
Here nonce= will match literally, and then ([^&]+) will match one or more characters up to the next & and capture them in group 1.
In case your regex flavor supports \K match reset operator, you can use this regex nonce=\K[^&]+ to have your intended text as full match without requiring any group text capture.
If you're using Java, which supports lookbehind, you can use this regex:
(?<=nonce=)[^&]+
If you're looking for the regular expression it would be as simple as nonce=(\w+)&
Demo (assumes RegExp Tester mode of the View Results Tree listener)
An even easier way would be to use the Boundary Extractor, which basically extracts everything between the given "left" and "right" boundaries.
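The capture-group and lookbehind patterns suggested above can be compared side by side. This is a sketch assuming Python's re module; the surrounding a=1 and b=2 parameters are invented to mimic a string with several "=" and "&" characters. The \K variant is left out because the standard re module does not support it.

```python
import re

# Hypothetical query string; only the nonce value comes from the question.
query = "a=1&nonce=1ff7de7518b9a52080489ecd7629796d&b=2"

# Capture group: "nonce=" is part of the match, the value is group 1.
from_group = re.search(r"nonce=([^&]+)", query).group(1)

# Lookbehind: "nonce=" is asserted but not consumed, so the value is
# the whole match.
from_lookbehind = re.search(r"(?<=nonce=)[^&]+", query).group(0)

print(from_group)
print(from_lookbehind)
```

Anchoring on the literal nonce= is what keeps the other "=" and "&" pairs from matching.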

Regex: how do I match a character before other capture characters?

I'm trying to match against a list of strings where I want to make sure the character before the match is not an equals sign, without capturing that character. So, for a list (excerpted from pip freeze) like:
ply==3.10
powerline-status===2.6.dev9999-git.b-e52754d5c5c6a82238b43a5687a5c4c647c9ebc1-
psutil==4.0.0
ptyprocess==0.5.1
I want the captured output to look like this:
==3.10
==4.0.0
==0.5.1
I first thought using a negative lookahead (?![^=]) would work, but with a regular expression of (?![^=])==[0-9]+.* it ends up capturing the line I don't want:
==3.10
==2.6.dev9999-git.b-e52754d5c5c6a82238b43a5687a5c4c647c9ebc1-
==4.0.0
==0.5.1
I also tried using a non-capturing group (?:[^=]) with a regex of (?:[^=])==[0-9]+.* but that ends up capturing the first character which I also don't want:
y==3.10
l==4.0.0
s==0.5.1
So the question is this: How can one match but not capture a string before the rest of the regex?
A negative lookbehind would be the way to go:
(?<!=)==[0-9.]+
Also, here is the site I like to use:
http://www.rubular.com/
Of course, it sometimes helps if you tell us which engine/software you are using, so we know what limitations there might be.
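As a quick check of the negative lookbehind, here is a sketch assuming Python's re module, run over the four pip freeze lines from the question.

```python
import re

freeze = """\
ply==3.10
powerline-status===2.6.dev9999-git.b-e52754d5c5c6a82238b43a5687a5c4c647c9ebc1-
psutil==4.0.0
ptyprocess==0.5.1
"""

# (?<!=) asserts that the character before "==" is not "=", without
# making that character part of the match.
versions = re.findall(r"(?<!=)==[0-9.]+", freeze)
print(versions)
```

The "===" line drops out for two reasons: starting at its first "=", the pattern needs a digit or dot after "==" and finds a third "=" instead; starting at its second "=", the lookbehind rejects the preceding "=".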
If you want to remove the version numbers from the text, you could capture a character that is not an equals sign, ([^=]), in the first capturing group, followed by matching == and the version number with \d+(?:\.\d+)+. Then in the replacement you would use your capturing group.
Regex
([^=])==\d+(?:\.\d+)+
Replacement
Group 1 $1
Note
You could also use ==[0-9]+.* or ==[0-9.]+ to match the double equals signs and version numbers but that would be a very broad match. The first would also match ====1test and the latter would also match ==..
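The replacement approach can be sketched as follows, assuming Python's re module, where the group reference in the replacement is written \1 rather than $1.

```python
import re

freeze = """\
ply==3.10
powerline-status===2.6.dev9999-git.b-e52754d5c5c6a82238b43a5687a5c4c647c9ebc1-
psutil==4.0.0
ptyprocess==0.5.1
"""

# Group 1 holds the character before "=="; putting it back in the
# replacement removes only "==" and the version number.
stripped = re.sub(r"([^=])==\d+(?:\.\d+)+", r"\1", freeze)
names = stripped.splitlines()
print(names)
```

The "===" line is left untouched, because no non-"=" character there is followed by "==" and then a digit.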
There's another regex operator called a 'lookbehind assertion' (also called positive lookbehind), written (?<=...). In my example above, using it in the expression (?<=[^=])==[0-9]+.* results in the expected output:
==3.10
==4.0.0
==0.5.1
It took me a while to discover this; notably, at the time of this writing, the lookbehind assertion isn't supported in the popular regex tool regexr.
If there are alternatives to using lookbehind to solve this, I'd love to hear them.

Why is this regular expression matching so much?

I am trying to use http://www.regexr.com/ to create a regular expression.
Basically I am looking to replace something that matches <Openings>any other tags/text</Openings>
<Openings><opening><item><x>3</x><y>3</y><width>10.5</width><height>13.5</height><type>rectangle</type><clipX>0</clipX><clipY>0</clipY><imgsrc></imgsrc></item></opening></Openings>
I started with ([\<Openings\>])\w+ (http://regexr.com/393mv ) but it seems to be matching too many things. Right now that regular expression should only match <Openings>.
The regex to match the whole Openings element is:
<Openings>.*?<\/Openings>
If you want to capture the contents inside the Openings tag, then try the following:
<Openings>(.*?)<\/Openings>
([\<Openings\>])\w+
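The square brackets mean "match any single character in this set". You should use
(\<Openings\>)\w+
which matches specifically "<Openings>" plus one or more word characters. Note that in your sample "<Openings>" is followed by "<", which is not a word character, so to match the tag alone you would drop the trailing \w+.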
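Both points can be tried out in a small sketch, assuming Python's re module. The xml string is a shortened version of the sample from the question, and "<item>" is just a convenient test input for the character class.

```python
import re

# Shortened version of the sample document from the question.
xml = "<Openings><opening><item><x>3</x><y>3</y></item></opening></Openings>"

# Lazy match between the literal opening and closing tags; group 1 is
# everything in between.
inner = re.search(r"<Openings>(.*?)</Openings>", xml).group(1)

# Inside square brackets the same characters form a set, so each match
# is a single character out of < O p e n i g s >.
set_hits = re.findall(r"[<Openings>]", "<item>")

print(inner)
print(set_hits)
```

This is why the original pattern matched "too many things": any one of those characters followed by word characters was enough.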

regular expressions - my expression

I've got (To) [a-z]+ as a regular expression, and I've got the sentence: To kot dziki pies.
If I run it, I retrieve To kot.
So what can I do to retrieve only the word after "To" (only kot) instead of "To kot"?
^To (\w+) should do the trick (a trailing $ would only match if the string ended right after that word). \w is shorthand for any word character, e.g. a-z in English; other characters in other languages. If you put parens around To as in your example, it creates a capturing group of its own, which means a parenthesized [a-z]+ would be the second group, and To would be in the first group.
I can really recommend using an interactive tool for testing and developing regular expressions, such as Expresso.
Use groups inside the regular expression:
"To ([a-z]+)" and then retrieve group 1's value; it will contain "kot" given your example string.
Or you could use lookbehinds:
(?<=To )[a-z]+
Then the To does not become part of the captured expression.
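The group and lookbehind variants can be compared in a short sketch. The question does not name the language, so Python's re module is assumed here.

```python
import re

sentence = "To kot dziki pies."

# Capture group: "To " is consumed, the word is group 1.
with_group = re.search(r"To ([a-z]+)", sentence)
word_from_group = with_group.group(1)

# Lookbehind: "To " is only asserted, so the word is the whole match.
with_lookbehind = re.search(r"(?<=To )[a-z]+", sentence)
word_from_lookbehind = with_lookbehind.group(0)

print(word_from_group, word_from_lookbehind)
```

With the group version the full match is still "To kot"; only group 1 is narrowed down to the word.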