Groovy - Extract a string between two different strings [duplicate] - regex

This question already has answers here:
Regex Match all characters between two strings
(16 answers)
Closed 5 years ago.
I have file names in the following format:
India_AP_Dev1.txt
USA_GA_QA2.txt
USA_NY_AWSDev1.txt
AUS_AA_BB_QA4.txt
I want to extract only the environment part from the file name, i.e. Dev1, QA2, AWSDev1, QA4, etc. How can I handle file names like these? I thought about using substring, but the length of the environment part is not constant. Is it possible to do this with a regex?
I'd appreciate your help. TIA

It is definitely possible using lookarounds:
(?<=_)[^._]*(?=\.)
(?<=_) asserts that the match is preceded by _
[^._]* takes all characters except . and _
(?=\.) asserts that the match is followed by .
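The question is about Groovy, but the pattern itself is not Groovy-specific. As a rough sketch, here is the same lookaround pattern applied to the sample file names in Python; the helper name extract_environment is made up for this illustration.

```python
import re

# Pattern from the answer: preceded by "_", followed by ".",
# with no "." or "_" inside the match itself.
ENV_PATTERN = re.compile(r"(?<=_)[^._]*(?=\.)")


def extract_environment(file_name):
    """Return the environment part of a file name, or None if nothing matches.

    Hypothetical helper, named only for this example.
    """
    match = ENV_PATTERN.search(file_name)
    return match.group(0) if match else None


file_names = [
    "India_AP_Dev1.txt",
    "USA_GA_QA2.txt",
    "USA_NY_AWSDev1.txt",
    "AUS_AA_BB_QA4.txt",
]

environments = [extract_environment(name) for name in file_names]
print(environments)
```

Only the last underscore-delimited segment qualifies, because it is the only one followed directly by a dot.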

How to match optional group, if it is already matched by main group? [duplicate]

This question already has an answer here:
regexp match anything before and after a word, if it exists
(1 answer)
Closed 9 months ago.
I have input strings like:
any[sym)bol_text
any[sym)bol_text (any[sym)bol_text) any[sym)bol_text
any[sym)bol_text (this_text)
any[sym)bol_text2 (this_text)Fzcj
And I have regexp:
(?<text>[^\r\n]+)(?:\(this_text\))?
But I can't handle strings that contain the optional (this_text) group. The first group matches it as well, and I don't want that exact text in the output.
Make the main group lazy and anchor the pattern at both ends:
^(?<text>.+?)(?:\(this_text\).*)?$
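So yes, the last group should also consume any text that follows (this_text), and the pattern has to end with $. Without the anchor, the lazy group would stop after a single character.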
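The question does not name a regex flavor, so the following is only a sketch of how the anchored pattern behaves, written in Python. Python spells named groups as (?P<text>...) rather than (?<text>...); the helper name main_text is an assumption made for this example.

```python
import re

# Same pattern as above, with Python's (?P<name>...) named-group syntax.
PATTERN = re.compile(r"^(?P<text>.+?)(?:\(this_text\).*)?$")


def main_text(line):
    """Return the part of the line before an optional "(this_text)..." tail.

    Hypothetical helper, named only for this example.
    """
    match = PATTERN.match(line)
    return match.group("text") if match else None


lines = [
    "any[sym)bol_text",
    "any[sym)bol_text (any[sym)bol_text) any[sym)bol_text",
    "any[sym)bol_text (this_text)",
    "any[sym)bol_text2 (this_text)Fzcj",
]

results = [main_text(line) for line in lines]
for result in results:
    print(repr(result))
```

Note that the captured text keeps the space in front of (this_text); call rstrip() on the result if that space is unwanted.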

Extract all chars between parenthesis [duplicate]

This question already has answers here:
Regular Expression to get a string between parentheses in Javascript
(10 answers)
Closed 2 years ago.
I used
let regExp = /\(([^)]+)\)/;
to extract
(test(()))
from
aaaaa (test(())) bbbb
but I get only this
(test(()
How can I fix my regex?
Don't use a negative character set, since parentheses (both ( and )) may appear inside the match you want. Greedily repeat instead, so that you match as much as possible, until the engine backtracks and finds the first ) from the right:
console.log(
'aaaaa (test(())) bbbb'
.match(/\(.*\)/)[0]
);
Keep in mind that this (and JS regex solutions in general) cannot guarantee balanced parentheses, at least not without additional post-processing/validation.
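To illustrate the caveat about balance, here is a minimal Python sketch that pairs the greedy match with a separate validation pass. The function names are invented for this example, and the depth counter is just one simple way to do the post-processing, not something prescribed by the answer.

```python
import re


def outer_parenthesized(text):
    """Return the span from the first "(" to the last ")" on a line, or None."""
    match = re.search(r"\(.*\)", text)
    return match.group(0) if match else None


def is_balanced(text):
    """Check that every ")" closes an earlier "(" and nothing is left open."""
    depth = 0
    for char in text:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


found = outer_parenthesized("aaaaa (test(())) bbbb")
print(found, is_balanced(found))

# The greedy regex still "succeeds" on unbalanced input,
# which is why the extra check is needed.
unbalanced = outer_parenthesized("a (b)) c")
print(unbalanced, is_balanced(unbalanced))
```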

RegEx for Dutch ING bankstatement [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
Is there anyone who can help me to get the marked pieces out of this file (see image below) with a regular expression? As you can see, it's difficult because the length is not always the same and the part before my goal is sometimes broken down and sometimes not.
Thank you in advance.
Text:
:61:200106D48,66NDDTEREF//00060100142533
/TRCD/01028/
:86:/EREF/SLDD-0705870-5658387529//MARF/11514814-001//CSID/NL59ZZZ390
373820000//CNTP/NL96ABNA0123456789/ABCANL2A/XXXXXXX123///REMI/UST
D//N00814760/
:61:200106D1840,55NDDTEREF//00060100142534
/TRCD/01028/
:86:/EREF/SLDD-0705869-5658387528//MARF/11514814-001//CSID/NL59ZZZ390
373820000//CNTP/NL96ABNA0123456789/ABCANL2A/XXX123XXXX///REMI/UST
D//N00814759/
:61:200106C236,31NTRFEREF//00060100142535
/TRCD/00100/
:86:/EREF/05881000010520//CNTP/NL19INGB0123456789/ABCBNL2A/XX123XXXX//
/REMI/USTD//KLM REF 1000000022/
The length is not always the same but it does not really matter in your case. You can check for a particular pattern at the end of a string.
(?<=\/\/)([\u2022a-zA-Z0-9]+)(?=\/$)
This regex looks for a string of characters consisting of bullets (•), digits, and letters (uppercase and lowercase) that follows two forward slashes (//) and is followed by a slash (/) and the end of the string ($). For multi-line input, enable the multiline flag so that $ matches at the end of each line.
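As a rough check of what this pattern picks up, here is a sketch in Python that runs it over part of the sample statement. The image marking the wanted pieces is not available here, so whether these are exactly the intended matches is an assumption; the variable names are made up for the example.

```python
import re

# Pattern from the answer; re.MULTILINE makes $ match at the end of every line.
TRAILING_REF = re.compile(r"(?<=//)([\u2022a-zA-Z0-9]+)(?=/$)", re.MULTILINE)

statement = """\
:61:200106D48,66NDDTEREF//00060100142533
/TRCD/01028/
:86:/EREF/SLDD-0705870-5658387529//MARF/11514814-001//CSID/NL59ZZZ390
373820000//CNTP/NL96ABNA0123456789/ABCANL2A/XXXXXXX123///REMI/UST
D//N00814760/
:61:200106C236,31NTRFEREF//00060100142535
/TRCD/00100/
:86:/EREF/05881000010520//CNTP/NL19INGB0123456789/ABCBNL2A/XX123XXXX//
/REMI/USTD//KLM REF 1000000022/
"""

references = TRAILING_REF.findall(statement)
print(references)
```

The last line of the sample, ending in KLM REF 1000000022/, is not matched, because the character class does not allow spaces. If values like that are among the wanted pieces, the class would need to be widened.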

Python regex to parse '#####' text in description field [duplicate]

This question already has answers here:
regex to extract mentions in Twitter
(2 answers)
Extracting #mentions from tweets using findall python (Giving incorrect results)
(3 answers)
Closed 3 years ago.
Here's the line I'm trying to parse:
#abc def#gmail.com #ghi j#klm #nop.qrs #tuv
And here's the regex I've gotten so far:
#[A-Za-z]+[^0-9. ]+\b | #[A-Za-z]+[^0-9. ]
My goal is to get ['#abc', '#ghi', '#tuv'], but no matter what I do, I can't get 'j#klm' to not match. Any help is much appreciated.
Try using re.findall with the following regex pattern:
(?:(?<=^)|(?<=\s))#[A-Za-z]+(?=\s|$)
import re

inp = "#abc def#gmail.com #ghi j#klm #nop.qrs #tuv"
matches = re.findall(r'(?:(?<=^)|(?<=\s))#[A-Za-z]+(?=\s|$)', inp)
print(matches)
This prints:
['#abc', '#ghi', '#tuv']
The regex calls for an explanation. The leading lookbehind (?:(?<=^)|(?<=\s)) asserts that what precedes the # symbol is either a space or the start of the string. We can't use a word boundary here because # is not a word character. We use a similar lookahead (?=\s|$) at the end of the pattern to rule out matching things like #nop.qrs. Again, a word boundary alone would not be sufficient.
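To make the point about word boundaries concrete, the sketch below compares a \b-based pattern with a whitespace-based one on the same input. The pattern using (?<!\S) and (?!\S) is a shorter spelling of the same idea ("not preceded or followed by a non-space character") and is offered here as an alternative, not as the pattern from the answer.

```python
import re

inp = "#abc def#gmail.com #ghi j#klm #nop.qrs #tuv"

# A trailing word boundary alone lets through tokens glued to other text.
with_word_boundary = re.findall(r"#[A-Za-z]+\b", inp)

# Whitespace-based checks on both sides: no non-space character may touch the tag.
with_space_checks = re.findall(r"(?<!\S)#[A-Za-z]+(?!\S)", inp)

print(with_word_boundary)
print(with_space_checks)
```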
Alternatively, just add a start-of-line anchor at the beginning:
^#[A-Za-z]+[^0-9. ]+\b | #[A-Za-z]+[^0-9. ]
It should work!

Regexp for string starting with a + and having numbers only [duplicate]

This question already has answers here:
Match exact string
(3 answers)
Closed 4 years ago.
I have the following regex for a string which starts with a + and contains digits only:
PatternArticleNumber = $"^(\\+)[0-9]*";
However, this allows strings like:
+454545454+4545454
This should not be allowed. Only the first character should be a +; all the others should be digits.
Any idea what may be wrong with my regex?
You can probably work around this problem by just adding an end anchor to your regex, i.e. use this:
PatternArticleNumber = $"^(\\+)[0-9]*$";
The problem with your current pattern is that the ending is open. So, the string +454545454+4545454 might appear to be a match. In fact, that entire string is not a match, but the engine might match the first portion, before the second +, and report a match.
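The question uses C#, but the effect of the missing anchor is the same in most regex engines. Here is a small Python sketch of the difference; the variable names are made up for this illustration.

```python
import re

# Without an end anchor the engine is free to stop early.
OPEN_ENDED = re.compile(r"^(\+)[0-9]*")
# With $ the whole string has to fit the pattern.
ANCHORED = re.compile(r"^(\+)[0-9]*$")

candidate = "+454545454+4545454"

open_match = OPEN_ENDED.search(candidate)
anchored_match = ANCHORED.search(candidate)
valid_match = ANCHORED.search("+454545454")

print(open_match.group(0))   # only the part before the second "+"
print(anchored_match)        # no match at all
print(valid_match.group(0))
```

Because of the * quantifier, a lone + with no digits also passes the anchored pattern; switching to [0-9]+ would require at least one digit, if that is what is wanted.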