Regex to match _ or end of string - regex

I'm working with MATLAB's regexp() and I'm trying to find a regular expression that would match only file names containing Cyto but not CytoBlue. My problem is that the file names look either like Texture_Variance_Cyto_4_90 and Texture_Variance_CytoBlue_4_90, or HIST_9BinsHistBin7_Cyto and HIST_9BinsHistBin7_CytoBlue.
If I just try to match Cyto, I also capture all the files containing CytoBlue. If I try to match Cyto_, I miss the file names where Cyto is the last element. I guess I'd need something that says "match either _ or the end of the string". I tried Cyto[_\Z], but that does not work: I again miss all the names that end with Cyto.

Cyto(?=$|_)
This matches Cyto, followed by ("(?=...)") the end of the string ("$") or _. Note that the underscore is not returned as part of the match.
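As a quick check of this pattern, here is a minimal sketch in Python rather than MATLAB (Python's re module is an assumption for illustration; the lookahead syntax is the same). It filters the four file names from the question:

```python
import re

# File names taken from the question.
names = [
    "Texture_Variance_Cyto_4_90",
    "Texture_Variance_CytoBlue_4_90",
    "HIST_9BinsHistBin7_Cyto",
    "HIST_9BinsHistBin7_CytoBlue",
]

# "Cyto" must be followed by "_" or by the end of the string.
# The lookahead only checks; it does not consume the underscore.
pattern = re.compile(r"Cyto(?=$|_)")

cyto_only = [name for name in names if pattern.search(name)]
matched_text = pattern.search(names[0]).group(0)

print(cyto_only)
print(matched_text)
```

The matched text is just Cyto in both cases, because the lookahead is zero-width.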

Use this regex: Cyto(_.*?(?= ))?\b

MATLAB supports positive and negative lookaheads, so this should work:
Cyto(?!Blue)
...meaning "Cyto" not followed by "Blue".
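A small sketch of the negative lookahead, again in Python as a stand-in for MATLAB (an assumption; the helper name and the CytoGreen file name are hypothetical). It also shows how this differs from requiring "_" or end of string: any suffix other than Blue is accepted.

```python
import re

# "Cyto" not immediately followed by "Blue".
pattern = re.compile(r"Cyto(?!Blue)")

def has_cyto_without_blue(name):
    """Return True if the name contains Cyto that is not the start of CytoBlue."""
    return pattern.search(name) is not None

plain = has_cyto_without_blue("HIST_9BinsHistBin7_Cyto")
blue = has_cyto_without_blue("Texture_Variance_CytoBlue_4_90")
# Hypothetical name: a different suffix still passes the negative lookahead.
green = has_cyto_without_blue("Texture_Variance_CytoGreen_4_90")

print(plain, blue, green)
```

If only Cyto as a whole underscore-separated element should match, the version with (?=$|_) is the stricter choice.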

Related

REGEX string extraction between underscore and file extension

I have a series of string like the following one:
abc_8g_1980_312.tif
from which I would like to extract the string '312', i.e. everything between the 3rd underscore and the file extension '.tif'.
I'm testing on this website: https://regex101.com/
inserting this regular expression: (\d{3})(\.tif$)
but I'm not getting what I would like to have.
Any suggestions would be appreciated.
To get the last 3 digits after an underscore with extension .tif you can also use lookarounds asserting _ to the left, and .tif to the right at the end of the string.
(?<=_)\d{3}(?=\.tif$)
Assuming what you want to capture would always be the last underscore-separated term in your file name, you could use:
(?<=_)[^_]+(?=\.)
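Both lookaround patterns above can be tried on the file name from the question. This is a minimal Python sketch (the variable names are only for illustration):

```python
import re

filename = "abc_8g_1980_312.tif"

# Exactly three digits, with "_" to the left and ".tif" at the end to the right.
three_digits = re.search(r"(?<=_)\d{3}(?=\.tif$)", filename)

# Any run of non-underscore characters after "_" that is followed by a dot.
last_term = re.search(r"(?<=_)[^_]+(?=\.)", filename)

digits_text = three_digits.group(0)
term_text = last_term.group(0)

print(digits_text)
print(term_text)
```

The first pattern is tied to three digits and the .tif extension; the second accepts any last term and any extension.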

Regex to MATCH number string (with optional text) in a sentence

I am trying to write a regex that matches only strings like this:
89-72
10-123
109-12
122-311(a)
22-311(a)(1)(d)(4)
These strings are embedded in sentences and sometimes there are 2 potential matches in the sentence like this:
In section 10-123 which references section 122-311(a) there is a phone number 456-234-2222
I do not want to match the phone. Here is my current working regex
\d{2,3}\-\d{2,3}(\([a-zA-Z0-9]\))*
see DEMO
I've been looking on Stack and have not found anything yet. Any help would be appreciated. Will be using this in a google sheet and potentially postgres.
Based on the regex suggested by @Wiktor Stribiżew:
=REGEXEXTRACT(A1,REPT("\b(\d{2,3}-\d{2,3}\b(?:\([A-Za-z0-9]\))*)(?:[^-]|$)(?:.*)",LEN(REGEXREPLACE(REGEXREPLACE(A1,"\b(\d{2,3}-\d{2,3}\b(?:\([A-Za-z0-9]\))*)(?:[^-]|$)", char (9)),"[^"&char(9)&"]",""))))
The formula will return all matches.
String:
A
In 22-311(a)(1)(d)(4) section 10-123 which ... 122-311(a) ... number 456-234-2222
Output:
B C D
22-311(a)(1)(d)(4) 10-123 122-311(a)
Solution
To extract all matches from a string, use this pattern:
=REGEXEXTRACT(A1,
REPT(basic_regex & "(?:.*)",
LEN(REGEXREPLACE(REGEXREPLACE(A1,basic_regex, char (9)),"[^"&char(9)&"]",""))))
The tail of the formula:
LEN(REGEXREPLACE(REGEXREPLACE(A1,basic_regex, char (9)),"[^"&char(9)&"]",""))
simply counts how many times the pattern occurs in the string (3 in this example).
To avoid matching the phone number, you have to require that the match is neither preceded nor followed by \d or -. Google Sheets uses RE2, which does not support lookaround assertions (see the list of supported features), so as far as I can tell the only solution is to match a character before and after the match, or the string boundary:
(?:^|[^-\d])\d{2,3}\-\d{2,3}(\([a-zA-Z0-9]\))*(?:$|[^-\d])
(?:^|[^-\d]) means either the start of a line (^) or a character that is neither - nor a digit (you might want to change that and forbid letters as well). $ is the end of a line. Note that ^ and $ match at line boundaries only when the multiline (m) flag is set; without it they match only at the start and end of the whole string.
As you can see here this finds the correct strings, but with additional spaces around some of the matches.
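One way to drop the surrounding characters without lookarounds is to wrap the wanted part in a capturing group and read only that group. The sketch below uses Python's re module as a stand-in (an assumption: it is not RE2, but the pattern avoids every lookaround feature):

```python
import re

# Boundary characters are consumed by non-capturing groups;
# only the section number itself sits in the capturing group.
pattern = re.compile(
    r"(?:^|[^-\d])(\d{2,3}-\d{2,3}(?:\([a-zA-Z0-9]\))*)(?:$|[^-\d])"
)

sentence = (
    "In section 10-123 which references section 122-311(a) "
    "there is a phone number 456-234-2222"
)
cell = "In 22-311(a)(1)(d)(4) section 10-123 which ... 122-311(a) ... number 456-234-2222"

# With exactly one capturing group, findall returns that group's text.
from_sentence = pattern.findall(sentence)
from_cell = pattern.findall(cell)

print(from_sentence)
print(from_cell)
```

Because the boundary characters are consumed, two candidates separated by a single character (for example "10-123 12-34") share that character and the second one is skipped. Lookarounds avoid this, which is why they are preferable where the engine supports them.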

How to write a regex that starts and ends with a particular string?

I have a string like this:
{"content":(uint32)123", "id":(uint64)111, "test":{"hi":"(uint32)456"}}
I want to get this result:
(uint32)123
(uint64)111
so I wrote a regex like this:
[^(?!\")](\(uint32\)|\(uint64\))(\d)+[^(?!\")$]
but the result is:
:(uint32)123
:(uint64)111,
Here the result also includes the : and the ,
I want the match not to begin with " and not to end with ". How should I change my regex?
(\(uint(?:32|64)\)\d+) works for me. It captures the entire string (uint[32/64])<any number of digits> without bothering about the characters that come before or after.
I tested the following one in Python:
(?<!\")(\(uint32\)|\(uint64\))\d+(?!(\"|\d))
It looks like you were trying to use negative lookahead and negative lookbehind checks, but you made a couple of mistakes:
You put them inside a character class, like this: [^(?!\")]. What that regexp really means is "any character that is not one of the symbols inside the square brackets" (^ stands for "not"). It should instead be (?!\"), which means the symbol after the current position must not be a quote (note: this also succeeds if there is no symbol after the current position).
To check the symbol before the current position, you need a negative lookbehind, which has the syntax (?<!some_regexp). So it would be (?<!\").
You don't need checks for the start or end of the line. If you do, you can put them into separate negative lookahead/lookbehind assertions.
Here is a corrected example without line start/end checks:
(?<!\")(\(uint32\)|\(uint64\))(\d)+(?!\")(?!\d)
Note: you need to add (?!\d) at the end, because otherwise, when the number is followed by a quote, the regex would backtrack and match everything except the last digit.
Here is an example with start/end of line checks:
(?<!^)(?<!\")(\(uint32\)|\(uint64\))(\d)+(?!\")(?!\d)(?!$)
P.S.: depending on the language you are using, you might not need to escape the quote. You need to escape it only when that is required by the string literal syntax of the language, not by the regex syntax.
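The corrected pattern and the role of the trailing (?!\d) can be checked with a short Python sketch. The sample strings are hypothetical, simplified inputs, not the exact string from the question:

```python
import re

# Corrected pattern from the answer: no quote before, no quote or digit after.
strict = re.compile(r'(?<!\")(\(uint32\)|\(uint64\))(\d)+(?!\")(?!\d)')
# Same pattern without the final (?!\d), to show why it is needed.
loose = re.compile(r'(?<!\")(\(uint32\)|\(uint64\))(\d)+(?!\")')

# Hypothetical sample: two bare values and one value wrapped in quotes.
sample = '{"id":(uint64)111, "n":(uint32)7, "test":{"hi":"(uint32)456"}}'
bare_values = [m.group(0) for m in strict.finditer(sample)]

# Hypothetical sample: a value that is followed by a quote but not preceded by one.
trailing_quote = 'x(uint32)456"'
loose_match = loose.search(trailing_quote)
strict_match = strict.search(trailing_quote)

print(bare_values)
print(loose_match.group(0))
print(strict_match)
```

Without (?!\d) the engine backtracks one digit and reports (uint32)45, because the character after 45 is a digit rather than a quote; with it, the quoted-on-the-right value is rejected entirely.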

regex for match inside a word

Say I have following similar texts:
_startOneEnd
_startTwoEnd
_startThreeEnd
I want to match on:
begins with _start
ends with End
and I want to capture the bit in between, e.g. One, Two, Three in the examples above.
Can anyone suggest a regex to capture this?
If each line of input contains only the text similar to your examples, something like this should work:
/^_start(.*)End$/
The ^ anchors the pattern to the start of the string. The $ anchors it to the end of the string. The parentheses capture the middle part.
In C#, you may use this:
(?<=_start).*(?=End)
It isn't clear if the part in the middle may only be the examples given.
If so, use this:
_start((One)|(Two)|(Three))End
If not, and it can be anything, try this:
_start(.*?)End
Note that the match is non-greedy.
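The difference between the anchored pattern and the non-greedy one shows up when several tokens share a line. A minimal Python sketch (the combined sentence is a hypothetical input):

```python
import re

lines = ["_startOneEnd", "_startTwoEnd", "_startThreeEnd"]

# Anchored version: each input must consist of exactly one token.
anchored = re.compile(r"^_start(.*)End$")
captured = [anchored.match(line).group(1) for line in lines]

# Hypothetical input with two tokens in the same string.
text = "_startOneEnd and _startTwoEnd"
lazy = re.findall(r"_start(.*?)End", text)    # stops at the first End
greedy = re.findall(r"_start(.*)End", text)   # runs to the last End

print(captured)
print(lazy)
print(greedy)
```

The lazy quantifier yields one capture per token, while the greedy one spans from the first _start to the last End.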

How to match a string that does not end in a certain substring?

How can I write a regular expression that matches strings that do not end with a certain substring?
In my project, all classes whose names don't end with certain strings such as "controller" and "map" should inherit from a base class. How can I do this using a regular expression?
but using both
public*.class[a-zA-Z]*(?<!controller|map)$
public*.class*.(?<!controller)$
there isn't any match!
Do a search for all filenames matching this:
(?<!controller|map|anythingelse)$
(Remove the |anythingelse if no other keywords, or append other keywords similarly.)
If you can't use negative lookbehinds (the (?<!..) bit), do a search for filenames that do not match this:
(?:controller|map)$
And if that still doesn't work (might not in some IDEs), remove the ?: part and it probably will - that just makes it a non-capturing group, but the difference here is fairly insignificant.
If you're using something where the full string must match, then you can just prefix either of the above with ^.* to do that.
Update:
In response to this:
but using both
public*.class[a-zA-Z]*(?<!controller|map)$
public*.class*.(?<!controller)$
there isn't any match!
Not quite sure what you're attempting with the public/class stuff there, so try this:
public.*class.*(?<!controller|map)$
The . is a regex char that means "anything except newline", and the * means zero or more times.
If this isn't what you're after, edit the question with more details.
Depending on your regex implementation, you might be able to use a lookbehind for this task. This would look like
(?<!SomeText)$
This matches any lines NOT having "SomeText" at their end. If you cannot use that, the expression
^(?!.*SomeText$).*$
matches any line not ending with "SomeText" as well (including empty lines, since .* can match nothing).
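Which of the two forms is usable depends on the engine. As one example, Python's re module accepts only fixed-width lookbehinds, so (?<!controller|map)$ is rejected there, while the lookahead form works as is. A minimal sketch (the class names are hypothetical):

```python
import re

# Hypothetical class names.
names = ["usercontroller", "sitemap", "invoice", "orderservice"]

# Lookahead form: reject any string that ends with controller or map.
not_suffixed = re.compile(r"^(?!.*(?:controller|map)$).*$")
kept = [name for name in names if not_suffixed.match(name)]

# Python's re needs fixed-width lookbehinds, so the alternation fails to compile.
try:
    re.compile(r"(?<!controller|map)$")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# Two separate fixed-width lookbehinds express the same condition.
fixed_width = re.compile(r"(?<!controller)(?<!map)$")
kept_lookbehind = [name for name in names if fixed_width.search(name)]

print(kept)
print(variable_width_ok)
print(kept_lookbehind)
```

Engines that allow alternatives of different lengths inside a lookbehind can use the shorter (?<!controller|map)$ directly.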
You could write a regex with two groups: the first consists of one or more characters before controller or map, and the second contains controller or map and is optional.
^(.+?)(controller|map)?$
The first group has to be lazy (.+?); a greedy .+ would swallow the whole string and leave the second group empty every time. With that you can match your string, and if the regex API you use has a group() method, an empty group(2) means the string does not end with controller or map.
Check if the name does not match [a-zA-Z]*controller or [a-zA-Z]*map.
Finally I did it this way:
public.*class.*[^(controller|map|spec)]$
it worked
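It may be worth checking what the bracket expression in that last pattern actually tests. Inside [...] the parentheses and | are literal characters, so [^(controller|map|spec)] is a character class that only looks at the final character. A minimal Python sketch (the class declarations are hypothetical):

```python
import re

# The pattern from the post above, unchanged.
pattern = re.compile(r"public.*class.*[^(controller|map|spec)]$")

# Hypothetical declarations.
ends_with_keyword = "public class usercontroller"
ends_with_e = "public class Invoice"   # not a keyword, but "e" is in the class
ends_with_d = "public class Bird"      # "d" is not in the class

keyword_matched = pattern.search(ends_with_keyword) is not None
invoice_matched = pattern.search(ends_with_e) is not None
bird_matched = pattern.search(ends_with_d) is not None

print(keyword_matched, invoice_matched, bird_matched)
```

Names ending in controller, map or spec are excluded as intended, but so is any other name whose last character is one of c, o, n, t, r, l, e, m, a, p, s, "(", ")" or "|". A lookbehind or lookahead on the whole suffix avoids those false exclusions.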