Capture part or whole using regex with same capturing name - regex

Given the two following strings :
\06086-afde-4e46-8886-#xxx.com\0xxx7ccd-6293-4343-8e50-xxx
\0name.surname#xxx.com\0xxx6293-4343-8e50-e1d5-xxx
I am trying to extract 6086-afde-4e46-8886- (if it is a GUID) or name.surname#xxx.com (if it is not a GUID). The difficulty here is that the captured groups must have the same name.
So far, I have
(?<name>(?:\w{4}-){4}|[a-zA-Z.]{1,}#xxx\.com), but this also captures 7ccd-6293-4343-8e50- or 6293-4343-8e50-e1d5- which I don't want.
I was also thinking about something like \\\0(?<name>(?:\w{4}-){4}|[a-zA-Z.]{1,}#xxx\.com)(?:(?:#xxx\.com)?\\\0),
but then is there a way to avoid repeating the xxx.com part (the real pattern is more complicated than that)? Also, this relies on finding \\0, which I would rather avoid, as I don't know whether it will also appear somewhere else in the string.
Thanks..

The following regular expression matches the GUID part 6086-afde-4e46-8886- and the email name.surname#xxx.com into the same named group, without relying on the start sequence \0:
(?<_name_>[A-Za-z]+\.[A-Za-z]+#xxx\.com|(?:[\w]{4}-){4}(?=#xxx\.com))
This regular expression uses a positive lookahead (?=#xxx\.com) to match the GUID part without consuming #xxx.com.
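As a rough check of the lookahead idea, here is a minimal Python sketch run against the two sample strings from the question. It assumes Python's `re` module, which spells named groups `(?P<name>...)` rather than `(?<name>...)`; the helper `extract_name` is a hypothetical name used only for illustration.

```python
import re

# Same alternation as above, in Python's named-group syntax (?P<name>...).
PATTERN = re.compile(
    r"(?P<name>[A-Za-z]+\.[A-Za-z]+#xxx\.com"  # e-mail style value, kept whole
    r"|(?:\w{4}-){4}(?=#xxx\.com))"             # GUID part, only if #xxx.com follows
)

def extract_name(text):
    """Return the 'name' group of the first match, or None if nothing matches."""
    match = PATTERN.search(text)
    return match.group("name") if match else None

# Raw strings keep the literal backslash-zero sequences from the question.
guid_line = r"\06086-afde-4e46-8886-#xxx.com\0xxx7ccd-6293-4343-8e50-xxx"
mail_line = r"\0name.surname#xxx.com\0xxx6293-4343-8e50-e1d5-xxx"

print(extract_name(guid_line))
print(extract_name(mail_line))
```

The second GUID-like run in each string is skipped because it is not followed by #xxx.com, so the lookahead fails there.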

Try this:
\\0(?<_name_>(?:[\w\-\.]+))#xxx\.com
Add any other allowed characters inside the square brackets.
demo: http://regexhero.net/tester/?id=be0fed5e-1d24-43cc-9db9-812311c17d61

It seems like you're trying to get the first match. If so, try the regex below.
^.*?(?<name>(?:\w{4}-){4}|[a-zA-Z.]{1,}#xxx\.com)
http://regex101.com/r/jC3uR4/5

Related

Regex for value.contains() in Google Refine

I have a column of strings, and I want to use a regex to find commas or pipes in every cell, and then perform an action. I tried this, but it doesn't work (no syntax error, it just matches neither commas nor pipes).
if(value.contains(/(,|\|)/), ...
The funny thing is that the same regex works with the same data in SublimeText. (Yes, I can work it there and then reimport, but I would like to understand what's the difference or what is my mistake).
I'm using Google Refine 2.5.
Since value.match should return captured texts, you need to define a regex with a capture group and check if the result is not null.
Also, pay attention to the regex itself: the string should be matched in its entirety:
Attempts to match the string s in its entirety against the regex pattern p and returns an array of capture groups.
So, add .* before and after the pattern when you are looking for it inside a larger string:
if(value.match(/.*([,|]).*/) != null)
You can use a combination of if and isNonBlank, like:
if(isNonBlank(value.match(/your regex/)), ...
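The whole-string behaviour described above can be illustrated outside Refine. The following is only a sketch in Python, using `re.fullmatch` as a stand-in for a matcher that must consume the entire string; it is not GREL, and the sample cell value is made up.

```python
import re

cell = "apples,pears|plums"  # made-up cell content

# Whole-string matching: the bare delimiter pattern cannot cover a longer cell.
bare = re.fullmatch(r"([,|])", cell)

# Padding with .* on both sides lets the pattern span the whole cell.
padded = re.fullmatch(r".*([,|]).*", cell)

print(bare is None)
print(padded is not None)
```

Because the leading .* is greedy, the capture group ends up holding the last delimiter in the cell, which is fine when the result is only tested for being non-null.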

Is there any upper limit for number of groups used or the length of the regex in Notepad++?

I am new to using regex. I am trying to use the regex find and replace option in Notepad++.
I have used the following regex:
((?:)|(\+)|(-))(\d)((?:)|(\+)|(-))(/)((?:)|(\+)|(-))(\d)((?:)|(\+)|(-))
For the following text:
2/2
+2/+2
-2/-2
2+/2+
2-/2-
But I get full matches only for the first three. For the last two, it gives only partial matches, excluding the last "+" and "-". I am wondering if there is any upper limit on the number of groups that can be used (which I doubt), or on the maximum length of the regex. I am not sure why my regex is failing. If there is anything wrong with my regex, please correct it.
This is not an issue with Notepad++'s regex engine. The problem is that when you have alternations like (?:)|(\+)|(-), the regex engine will attempt to match the different options in the order they are specified. Since you specified an empty group first, it will attempt to match an empty string first, only matching the + or - if it needs to backtrack. This essentially makes the alternation lazy—it will never match any character unless it has to.
vks's answer works perfectly well, but just in case you actually needed those capturing groups separated out, you can do the same thing just by rewriting your alternations like this:
((\+)|(-)|(?:))(\d)((\+)|(-)|(?:))(/)((\+)|(-)|(?:))(\d)((\+)|(-)|(?:))
or even more simply, like this:
((\+)|(-)|)(\d)((\+)|(-)|)(/)((\+)|(-)|)(\d)((\+)|(-)|)
You can use this simple regex to match all cases:
([-+]?)(\d)([-+]?)(/)([-+]?)(\d)([-+]?)
See here: https://www.regex101.com/r/fG5pZ8/19
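The effect of alternation order can be reproduced with a short script. This is a sketch in Python rather than Notepad++, on the assumption that both engines try alternatives left to right; `matched_text` is a hypothetical helper.

```python
import re

# Empty alternative first (the pattern from the question).
EMPTY_FIRST = r"((?:)|(\+)|(-))(\d)((?:)|(\+)|(-))(/)((?:)|(\+)|(-))(\d)((?:)|(\+)|(-))"
# Empty alternative last (the reordered pattern).
EMPTY_LAST = r"((\+)|(-)|)(\d)((\+)|(-)|)(/)((\+)|(-)|)(\d)((\+)|(-)|)"
# Optional character class (the simplified pattern).
OPTIONAL_CLASS = r"([-+]?)(\d)([-+]?)(/)([-+]?)(\d)([-+]?)"

samples = ["2/2", "+2/+2", "-2/-2", "2+/2+", "2-/2-"]

def matched_text(pattern, text):
    """Return the text matched at the start of `text`, or None."""
    m = re.match(pattern, text)
    return m.group(0) if m else None

for s in samples:
    print(s, matched_text(EMPTY_FIRST, s), matched_text(EMPTY_LAST, s),
          matched_text(OPTIONAL_CLASS, s))
```

With the empty alternative first, the final group is satisfied by the empty string, so a trailing sign is left out of the match; nothing after it forces the engine to backtrack.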

Regular Expression - Matching part of a word, with one exception

I need a regular expression that will look up "ship" in any instance, so: ship, spaceships, starship, shipping etc. However, it must not look up "warship". It also needs to be case insensitive. At the moment I've got:
(?!(warship))(?i)ship
...which looks up "ship" but still looks up "warship" thanks to it containing "ship". I've tried:
(?!(warship))^(?i)ship
...which works to an extent but then "starship" doesn't get returned for example. I'm sure the answer is super-simple but I can't see it just now. Your help would be great!
First I wanted to try negative lookbehind:
/(?<!war)ship/
It should match all words except warship, but it captures only the ship part. So it is fine if you just want to test your string against the regexp, but it doesn't work properly if you want to get the whole matched word.
I suggest the search string:
(?i)(\w*ships?)(?<!warship)(?<!warships)
(?i) ... enables case-insensitive search.
(\w*ships?) ... matches, in a capturing group, any string starting with 0 or more word characters, followed by ship and optionally a plural s at the end. Also possible would be (\b\w*ship\w*\b) or (\b[a-z]*ship[a-z]*\b) to find only entire words containing ship anywhere inside.
(?<!warship)(?<!warships) ... two negative lookbehinds checking that the found word is neither warship nor warships.
It appears you may be using the .NET engine or something similarly expressive, so you can use lookbehind.
First you need a regex to match the entire word:
\w*ship\w*
Then you can easily modify it to not match anything where war comes before ship, using negative lookbehind.
\w*(?<!war)ship\w*
Also, there's probably no reason to specify the case insensitivity flag in the regex itself; just apply it to the regex object when you create it.
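A minimal sketch of this approach, assuming Python's `re` module in place of .NET (the fixed-width lookbehind `(?<!war)` works in both), with the case-insensitivity flag applied to the compiled object rather than written into the pattern. The sample sentence is made up.

```python
import re

# Whole-word pattern with a negative lookbehind; the flag is set on the object.
WORD_WITH_SHIP = re.compile(r"\w*(?<!war)ship\w*", re.IGNORECASE)

text = "ship spaceships Starship shipping warship Warships"  # made-up sample

found = WORD_WITH_SHIP.findall(text)
print(found)
```

The lookbehind is evaluated at the position just before "ship", so any candidate where the three preceding characters are "war" (in any case, because of the flag) is rejected.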
I think you want something like this,
(?i)^(?!warship$)(?=.*ship).*
It matches any line that contains ship, except a line that is exactly warship.
OR
(?i)\b\w*?(?<!war)ship\w*?\b

Regex Extract in Google Docs for capturing the end of variable strings

In Google Docs, I have a series of strings like "Something.Here.Search.Term.Chicago", where the last component after "Term." can be anything.
How do I use regex extract to only capture what comes after "Term."?
Note that the length of the string varies before Term so I can't use Left or Right and position since it's always different.
You can use a positive look-behind as well, to avoid having to capture with groups:
/(?<=Term\.).*/
Though depending on the language you are implementing this with, it may not support look-behinds (namely JavaScript).
If you don't want to mess about with capturing groups and you know the component you want is the substring between the last . and the end of the string, you could use
[^.]+$
Here's what worked for me using your sample data:
=REGEXREPLACE(A1; ".*Term.(.*)" ; "$1")
I don't know Google Docs, but normally in regular expressions, you would do
"Something\.Here\.Search\.Term\.(.*)"
The () means capture and remember the pattern within. In this case .* means everything. You can usually access the pattern as $1, etc. in Javascript.
See Examples of Regular Expressions
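For comparison, the lookbehind, last-component and capture-group suggestions can be tried side by side. This is a sketch in Python, not Google Docs; whether the spreadsheet's regex functions accept lookbehind is not something it demonstrates.

```python
import re

value = "Something.Here.Search.Term.Chicago"

# Positive lookbehind: match everything after "Term." without a group.
after_term = re.search(r"(?<=Term\.).*", value).group(0)

# Last component: the run of non-dot characters at the end of the string.
last_component = re.search(r"[^.]+$", value).group(0)

# Capture group: match the prefix and keep only what follows "Term.".
captured = re.search(r"Term\.(.*)", value).group(1)

print(after_term, last_component, captured)
```

The three agree on this sample, but they differ if the part after "Term." itself contains a dot: only the last-component pattern would then return just the final piece.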
What about using a "look-ahead" expression (?=),
then something repeated followed by a word boundary?
Something like this:
(?=Term\\.).*\W

Regular Expression to List accepted words

I need a regular expression to list accepted version numbers, i.e. say I wanted to accept "V1.00" and "V1.01". I've tried "(V1.00)|(V1.01)", which almost works, but if I input "V1.002" (which is likely, due to the weird version numbers I am working with) I still get a match. I need to match the exact strings.
Can anyone help?
The reason you're getting a match on "V1.002" is because it is seeing the substring "V1.00", which is part of your regex. You need to specify that there is nothing more to match. So, you could do this:
^(V1\.00|V1\.01)$
A more compact way of getting the same result would be:
^(V1\.0[01])$
Do this:
^(V1\.00|V1\.01)$
(. needs to be escaped, ^ means the match must start at the beginning of the text, and $ means it must end at the end of the text.)
I would use the '^' and '$' to mark the beginning and end of the string, like this:
^(V1\.00|V1\.01)$
That way the entire string must match the regex.
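The difference the anchors make can be checked quickly. This is a sketch in Python; the question does not say which regex engine is in use, so treat the engine as an assumption.

```python
import re

UNANCHORED = re.compile(r"(V1.00)|(V1.01)")    # pattern from the question
ANCHORED = re.compile(r"^(V1\.00|V1\.01)$")    # anchored, dot escaped
COMPACT = re.compile(r"^(V1\.0[01])$")         # compact equivalent

for version in ["V1.00", "V1.01", "V1.002", "V1x00"]:
    print(
        version,
        bool(UNANCHORED.search(version)),
        bool(ANCHORED.search(version)),
        bool(COMPACT.search(version)),
    )
```

"V1x00" is included to show the second fix: with an unescaped dot, any character is accepted in that position.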