REGEXEXTRACT before a character

I'm trying to write a small regex to extract the text before an optional (.
I have this:
^(.*)[\(.*]|$
But it's not working for some reason. It doesn't seem to make it to the $ if there is no ( present.
Any help would be much appreciated.
Cheers

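Your regex is an alternation with two alternatives. The first, ^(.*)[\(.*], captures any character 0+ times in the capturing group (.*) and then requires one character from the character class [\(.*]. Inside a character class the . and * are literal, so the class matches a single (, . or *. The second alternative, $, can only match the empty string at the end.
If the string contains none of the characters from the character class, the first alternative fails. The only match left is the empty string from $, so nothing is captured.
You could use a negated character class to match 1+ characters that are not a ( from the start of the string: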
^[^(]+
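To check the behaviour quickly, here is a minimal Python sketch of the negated character class. The helper name text_before_paren and the sample strings are made up for illustration; they are not from the question.

```python
import re

# Hypothetical helper: returns the text before the first "(",
# or the whole string when there is no "(" at all.
def text_before_paren(text: str) -> str:
    match = re.search(r"^[^(]+", text)
    return match.group(0) if match else ""

print(text_before_paren("Some title (2019)"))
print(text_before_paren("No parenthesis here"))
```

Note that a space in front of the ( is part of the match, so you may want to trim the result.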

Related

Regex positive lookahead multiple occurrence

I have the sample string below:
abc,com;def,med;ghi,com;jkl,med
I have to extract the strings that come before the keyword ",com" (all occurrences).
The final result I am looking for is something like:
abc,ghi
I have tried the positive lookahead regex below:
[\s\S]*?(?=com)
But this only fetches abc, not ghi.
What modification do I need to make to the regex above?

The character class [\s\S] matches any character, so it also matches the , and ; separators.
What you can do instead is match 1+ non-whitespace characters except , and ; using a negated character class. That way you don't have to make the quantifier non-greedy either.
Then assert the ,com to the right (followed by a word boundary to prevent a partial word match).
Instead of using a lookahead, you might also use a capture group:
([^\s,;]+),com\b
The values you want are then in capture group 1.
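The capture-group variant can be tried out like this. It is only a sketch in Python (the question does not say which tool is used), and the variable names are made up.

```python
import re

sample = "abc,com;def,med;ghi,com;jkl,med"

# With exactly one capture group, findall returns the group contents
# for every match instead of the full matches.
names = re.findall(r"([^\s,;]+),com\b", sample)
result = ",".join(names)
print(result)  # abc,ghi
```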

Regex that matches strings that are all lower case and do not contain specific string

I need a regular expression to ensure that entries in a form 1) are all lower case AND 2) do not contain the string ".net"
I can do either of those separately:
^((?!.net).)*$ gives me strings that do not contain .net.
[a-z] only matches lower-cased inputs. But I have not been able to combine these.
I've tried:
^((?!.net).)(?=[a-z])*$
(^((?!.net).)*$)([a-z])
And a few others.
Can anyone spot my error? Thanks!

Because the dot in your pattern matches any char except a newline, you can replace it with a negated character class that excludes uppercase chars and newlines.
As suggested by @Wiktor Stribiżew, to rule out a string that contains .net you can use a negative lookahead (?!.*\.net), where the .net (note the escaped dot) is preceded by .* to match any character 0+ times.
^(?!.*\.net)[^\nA-Z]+$
^ Start of string
(?!.*\.net) negative lookahead to make sure the string does not contain .net
[^\nA-Z]+ Match 1+ times any character except a newline or a char A-Z
$ End of string
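A small Python sketch to test the combined pattern. The function name is_valid_entry and the test values are assumptions for illustration only.

```python
import re

ENTRY_RE = re.compile(r"^(?!.*\.net)[^\nA-Z]+$")

# Hypothetical helper: True when the entry has no uppercase A-Z
# and does not contain the literal text ".net".
def is_valid_entry(entry: str) -> bool:
    return ENTRY_RE.search(entry) is not None

for entry in ["example.com", "example.net", "Example.com", "dotnet"]:
    print(entry, is_valid_entry(entry))
```

Keep in mind that digits and punctuation are accepted as well; only A-Z and newlines are excluded by the character class.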

Regex to validate cookie string (Key value paired)

So far I tried this regex but no luck.
([^=;]+=[^=;]+(;(?!$)|$))+
Valid Strings:
something=value1;another=value2
something=value1 ; anothe=value2
Invalid Strings:
something=value1 ;;;name=test
some=value=3;key=val
somekey=somevalue;

You might use an optional repeating group to get the matches.
If you don't want to cross newline boundaries, you might add \n or \r\n to the negated character class.
^[^=;\n]+=[^=;\n]+(?:;[^=;\n]+=[^=;\n]+)*$
Explanation
^ Start of string
[^=;\n]+=[^=;\n]+ Match the key and value using a negated character class
(?: Non capture group
;[^=;\n]+=[^=;\n]+ Match a semicolon followed by the same key=value pattern
)* Close group and repeat 0+ times
$ End of string
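To run the pattern against the strings from the question, a sketch in Python could look like this. The name is_valid_cookie_string is made up; the test strings are copied from the question.

```python
import re

COOKIE_RE = re.compile(r"^[^=;\n]+=[^=;\n]+(?:;[^=;\n]+=[^=;\n]+)*$")

# Hypothetical helper: True when the string is one or more
# key=value pairs separated by single semicolons.
def is_valid_cookie_string(value: str) -> bool:
    return COOKIE_RE.search(value) is not None

valid = ["something=value1;another=value2",
         "something=value1 ; anothe=value2"]
invalid = ["something=value1 ;;;name=test",
           "some=value=3;key=val",
           "somekey=somevalue;"]

for s in valid + invalid:
    print(repr(s), is_valid_cookie_string(s))
```

The spaces around the ; in the second valid string are accepted because the negated character class also matches spaces.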

Regex Extract a string between two words containing a particular string

I have the below string
abc-12d-ef-oy-5678-xyz--**--20190120075439322am--**--ghi-66d-ef-oy-8877-sdf--**--sfdfdsgfg--**--20190120075765487am
It is a multi-character delimited string, delimited by '--**--'. I am trying to extract the first and second words that have the -oy- tag in them. This is a column in a table. I am using the regex_extract method, but I am not able to extract a string that contains a given string and ends with a given string.
Here is one pattern that I tried: .*(.*oy.*)--

If the -oy- can not be at the start or at the end, you could use this pattern to match the 2 hyphen delimited strings with -oy-:
[a-z0-9]+(?:-[a-z0-9]+)*-oy(?:-[a-z0-9]+)+
Regex details
[a-z0-9]+ Match 1+ times a-z0-9
(?: Non capturing group
-[a-z0-9]+ Match - and 1+ times a-z0-9
)* Close group and repeat 0+ times
-oy Match literally
(?:-[a-z0-9]+)+ Repeat 1+ times a group which will match - and 1+ times a-z0-9
You can extend the character class, for example to [A-Za-z0-9], to allow other characters you want to match, like uppercase chars.
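For a quick test of the first pattern outside of your table, here is a sketch in Python (an assumption; the question uses regex_extract on a column). The variable names are made up, the input is the string from the question.

```python
import re

text = ("abc-12d-ef-oy-5678-xyz--**--20190120075439322am--**--"
        "ghi-66d-ef-oy-8877-sdf--**--sfdfdsgfg--**--20190120075765487am")

OY_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*-oy(?:-[a-z0-9]+)+")

# The pattern has no capture groups, so findall returns the full matches.
matches = OY_RE.findall(text)
print(matches)  # ['abc-12d-ef-oy-5678-xyz', 'ghi-66d-ef-oy-8877-sdf']
```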
If the matches should be between delimiters, you could use a positive lookbehind and a positive lookahead, each with an alternation (the backslashes are doubled here as in a Java string literal):
(?<=^|--\\*\\*--)[a-z0-9]+(?:-[a-z0-9]+)*-oy(?:-[a-z0-9]+)+(?=--\\*\\*--|$)

You can use this regex, which matches the two strings containing -oy- and captures them in group 1 and group 2.
^.*?(\w+(?:-\w+)*-oy-\w+(?:-\w+)*).*?(\w+(?:-\w+)*-oy-\w+(?:-\w+)*)
It matches two delimiter-separated strings that contain -oy-, using (\w+(?:-\w+)*-oy-\w+(?:-\w+)*) for each of them to capture the text.
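Reading the two groups could look like the sketch below. Python is an assumption here, and the names first and second are made up; the input is the string from the question.

```python
import re

text = ("abc-12d-ef-oy-5678-xyz--**--20190120075439322am--**--"
        "ghi-66d-ef-oy-8877-sdf--**--sfdfdsgfg--**--20190120075765487am")

TWO_OY_RE = re.compile(
    r"^.*?(\w+(?:-\w+)*-oy-\w+(?:-\w+)*).*?(\w+(?:-\w+)*-oy-\w+(?:-\w+)*)"
)

match = TWO_OY_RE.search(text)
# Group 1 is the first -oy- string, group 2 the second one.
first, second = (match.group(1), match.group(2)) if match else (None, None)
print(first)
print(second)
```

If the string has fewer than two -oy- fields there is no match at all, which is why the sketch falls back to None.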

Are you able to select values from capture groups?
(?:--\*\*--|^)(.*?-oy-.*?)(?:--\*\*--|$)
(?: ... ) - Non-capturing group: matches the delimiter, the start of the line, or the end of the line without creating a capture group
.*? - Lazy match, so you only grab the contents of the field
https://regex101.com/r/aUAvcx/1
--- Second stab at this follows ---
This is convoluted. Hopefully you can use lookahead and lookbehind. The last problem I had was that the final record was being "greedy" and sucking up the field before it too, so I had to add an exclusion for your delimiter inside the capture group.
See if this works for you.
(?<=--\*\*--|^)((?:(?:(?!--\*\*--).)*)-oy-(?:(?:(?!--\*\*--).)*))(?=--\*\*--|$)
https://regex101.com/r/aUAvcx/3
The (?: groups are non-capturing, so we don't end up with too many capture groups to work with.
There are three parts to this:
The lookbehind - Make sure the field is framed by the delimiter (or start of line)
The capture group - Grab the contents of the field, making sure a delimiter isn't sucked up into it
The lookahead - Make sure the field is framed by the delimiter (or end of line)
As far as the capture group goes, I check the left and right side of the -oy- to make sure the delimiter isn't there.
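If you want to try the three parts in Python, note that its built-in re module only accepts fixed-width lookbehind, so (?<=--\*\*--|^) cannot be used as written. The sketch below is an adaptation, not the exact pattern above: the ^ alternative is moved outside the lookbehind. The variable names are made up.

```python
import re

text = ("abc-12d-ef-oy-5678-xyz--**--20190120075439322am--**--"
        "ghi-66d-ef-oy-8877-sdf--**--sfdfdsgfg--**--20190120075765487am")

FIELD_RE = re.compile(
    r"(?:^|(?<=--\*\*--))"                         # start of string or right after a delimiter
    r"((?:(?!--\*\*--).)*-oy-(?:(?!--\*\*--).)*)"  # field with -oy-, never crossing a delimiter
    r"(?=--\*\*--|$)"                              # followed by a delimiter or the end
)

# One capture group, so findall returns the captured fields.
fields = FIELD_RE.findall(text)
print(fields)
```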

RegEx: don't capture match, but capture after match

There are a thousand regular expression questions on SO, so I apologize if this is already covered. I did look first.
I have string:
Name Subname 11X22 88X620 AB33(20) YA5619 77,66
I need to capture this string: YA5619
What I am doing is just finding AB33(20) and then capturing up to the first whitespace. But AB33(20) can also be AB-33(20) or AB33(-20) or AB33(-1).
My preg_match regex is: (?<=\bAB\d{2}\(\d{2}\)\s).+?(?=\s)
Why am I getting an error when I change from \d{2} to \d+?
For the final result I was thinking this regex would work, but no:
(?<=\bAB-?\d+\(-?\d+\)\s).+?(?=\s)
Any ideas what I am doing wrong?

With most regex flavors, lookbehind needs to evaluate to a fixed-length sequence, so you can't use variable quantifiers like * or + or even {1,2}.
Instead of using lookaround, you can simply match your marker pattern and then forget it with \K.
AB-?\d+(?:\(-?\d+\))? \K[^ ]+
demo: https://regex101.com/r/8XXngH/1
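The fixed-length restriction is easy to reproduce. The sketch below uses Python's built-in re module as an example of such a flavor (the question itself is about PHP's preg_match); the helper name compiles is made up.

```python
import re

# Hypothetical helper: reports whether the engine accepts the pattern.
def compiles(pattern: str) -> bool:
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

fixed = r"(?<=\bAB\d{2}\(\d{2}\)\s).+?(?=\s)"
variable = r"(?<=\bAB-?\d+\(-?\d+\)\s).+?(?=\s)"

print(compiles(fixed))     # True
print(compiles(variable))  # False, the lookbehind is not fixed-width

line = "Name Subname 11X22 88X620 AB33(20) YA5619 77,66"
print(re.search(fixed, line).group(0))  # YA5619
```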

It depends on the language. In .NET, for example, your pattern matches, because .NET supports variable-length lookbehind.
Another solution might be to use a character class that lists the characters you allow after AB. Then match a whitespace character, reset the match with \K, and match \S+, which matches a non-whitespace character 1+ times.
\bAB[()\d-]+\s\K\S+
Explanation
\bAB Match literally prepended with word boundary to prevent AB being part of a larger match.
[()\d-]+ Match 1+ times any of the listed character in the character class
\s Match a whitespace char (or \s+ to match 1 or more)
\K Reset the starting point of the reported match (forget what was matched so far)
\S+ Match 1+ times a non-whitespace character
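In a flavor without \K you can get the same value from a capture group. Python's built-in re module does not support \K, so the sketch below swaps it for a group around \S+. The helper name value_after_marker is made up, and the variants of the marker are the ones listed in the question.

```python
import re
from typing import Optional

# Same marker as above, but the wanted value is captured in group 1
# instead of using \K.
MARKER_RE = re.compile(r"\bAB[()\d-]+\s(\S+)")

def value_after_marker(line: str) -> Optional[str]:
    match = MARKER_RE.search(line)
    return match.group(1) if match else None

print(value_after_marker("Name Subname 11X22 88X620 AB33(20) YA5619 77,66"))
print(value_after_marker("Name Subname 11X22 88X620 AB-33(20) YA5619 77,66"))
```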
