RegEx for matching characters unless they are contained in a certain string [duplicate]

This question already has answers here:
Regex Pattern to Match, Excluding when... / Except between
(6 answers)
Closed 7 years ago.
Let's say I want to match the letters E, Q and W. However, I don't want them matched if they're found inside a certain string of characters, for instance, HELLO.
LKNSDG8E94GO98SGIOWNGOUH09PIHELLOBN0D98HREINBMUE
       ^          ^          ^           ^     ^
       yes        yes        NO          yes   yes

There's a nifty regex trick you can use for this. Here's some code in JavaScript, but it can be adapted to any language:
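var str = 'LKNSDG8E94GO98SGIOWNGOUH09PIHELLOBN0D98HREINBMUE',
    rgx = /HELLO|([EQW])/g, // match HELLO as a whole, or capture a lone E, Q or W
    match;
while ((match = rgx.exec(str)) != null) {
  // group 1 is only set when the [EQW] branch matched, not when HELLO did
  if (match[1]) output(match[1] + '\n');
}
function output(x) { document.getElementById('out').textContent += x; }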
<pre id='out'></pre>
Basically, you match on HELLO|([EQW]). The engine scans the string from left to right and never reuses characters it has already matched, so when it comes across a HELLO, it matches the whole word, thereby skipping the E inside of it.
Then you can just check the capture group. If there's something in that capture group, we know it's something we want. Otherwise, it must have been part of the HELLO, so we ignore it.
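Since the trick is said to carry over to other languages, here is a minimal Python sketch of the same idea using the standard re module. The function name letters_outside and its parameters are hypothetical, chosen only for illustration.

```python
import re


def letters_outside(text, excluded="HELLO", letters="EQW"):
    """Return each target letter in text that is not inside an occurrence of excluded."""
    # Try the excluded word first; otherwise capture a single target letter.
    pattern = re.compile(re.escape(excluded) + "|([" + re.escape(letters) + "])")
    # group(1) is None when the excluded-word branch matched, so those matches are skipped.
    return [m.group(1) for m in pattern.finditer(text) if m.group(1) is not None]


print(letters_outside("LKNSDG8E94GO98SGIOWNGOUH09PIHELLOBN0D98HREINBMUE"))
# ['E', 'W', 'E', 'E']
```

Alternatives are tried left to right at each position, so the excluded word is listed first. That ordering matters when the excluded word itself starts with a target letter, e.g. excluding EXAMPLE while matching E.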

Related

Regex match exact string pattern not working [duplicate]

This question already has answers here:
Regular expression pipe confusion
(5 answers)
Closed 20 days ago.
I am sure this question has been answered before, but I cannot find exactly what I am looking for on Stack Overflow. Would you be kind enough to help me with my issue?
What is the issue? My regex pattern stops on string #2 as soon as it finds the same pattern as string #1. In other words, it does not know that "deposit.accountNumberXXXX" is not the same as "deposit.accountNumber".
How do I create a pattern that can distinguish between string #1 and string #2?
I have two strings.
deposit.accountNumber
deposit.accountNumberXXXX
const findReplace = (valuesToBeReplaced: string, dictionaryKeyValue: { [x: string]: string }) => {
  const keyValueString = Object.keys(dictionaryKeyValue).join('|');
  const pattern = `${keyValueString}\\b`
  const result = new RegExp(pattern, 'g');
  // iterate through the Keys of the dictionary and return a replacement value
  return valuesToBeReplaced.replace(result, (matched) => dictionaryKeyValue[matched]);
};
I have tried different patterns and could not get it to work.
You need parentheses around ${keyValueString}. Otherwise, the regexp looks like
word1|word2|word3\b
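and the \b word boundary is only applied to the last word, because | has the lowest precedence of any regex operator. Wrapping the keys in a non-capturing group (?:...) makes \b apply to every alternative: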
const pattern = `(?:${keyValueString})\\b`
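Here is a Python sketch of the same fix. It assumes the keys are meant as literal text, so it applies re.escape to each key (something the original code does not do) to make the . in deposit.accountNumber match only a dot. The grouped flag and the sample dictionary are made up for illustration.

```python
import re


def find_replace(text, replacements, grouped=True):
    # Escape each key so "." matches a literal dot instead of any character.
    alternation = "|".join(re.escape(key) for key in replacements)
    if grouped:
        # (?:...) makes the trailing \b apply to every key.
        pattern = "(?:" + alternation + r")\b"
    else:
        # Without the group, \b only belongs to the last key (the original bug).
        pattern = alternation + r"\b"
    return re.sub(pattern, lambda m: replacements[m.group(0)], text)


values = {"deposit.accountNumber": "A", "deposit.accountNumberXXXX": "B"}
print(find_replace("deposit.accountNumberXXXX", values, grouped=False))  # AXXXX
print(find_replace("deposit.accountNumberXXXX", values))                 # B
```

With the group in place, the engine first tries deposit.accountNumber, sees \b fail before the X, backtracks, and tries the longer key instead.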

Extract all chars between parenthesis [duplicate]

This question already has answers here:
Regular Expression to get a string between parentheses in Javascript
(10 answers)
Closed 2 years ago.
I used
let regExp = /\(([^)]+)\)/;
to extract
(test(()))
from
aaaaa (test(())) bbbb
but I get only this
(test(()
How can I fix my regex?
Don't use a negated character class, since parentheses (both ( and )) may appear inside the match you want. Use a greedy .* instead: it first matches as much as possible, then the engine backtracks until it finds the first ) from the right, i.e. the last ) in the string:
console.log(
'aaaaa (test(())) bbbb'
.match(/\(.*\)/)[0]
);
Keep in mind that this (and JS regex solutions in general) cannot guarantee balanced parentheses, at least not without additional post-processing/validation.
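To illustrate the caveat, here is a Python sketch. Python's built-in re module, like JavaScript's, has no recursion, so the greedy pattern behaves the same and spans across two separate groups. The helper first_balanced is hypothetical: it counts nesting depth instead of using a regex.

```python
import re

text = "aaaaa (test(())) bbbb"
# Greedy .* runs to the end of the string, then backtracks to the LAST ")".
print(re.search(r"\(.*\)", text).group(0))             # (test(()))
# With two separate groups, the greedy match spans both of them.
print(re.search(r"\(.*\)", "a (b) c (d) e").group(0))  # (b) c (d)


def first_balanced(s):
    """Return the first balanced (...) group in s, or None (hypothetical helper)."""
    depth = 0
    start = None
    for i, ch in enumerate(s):
        if ch == "(":
            if depth == 0:
                start = i  # remember where the outermost group opens
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                return s[start:i + 1]
    return None  # no group was closed


print(first_balanced(text))             # (test(()))
print(first_balanced("a (b) c (d) e"))  # (b)
```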

Python regex to parse '#####' text in description field [duplicate]

This question already has answers here:
regex to extract mentions in Twitter
(2 answers)
Extracting #mentions from tweets using findall python (Giving incorrect results)
(3 answers)
Closed 3 years ago.
Here's the line I'm trying to parse:
#abc def#gmail.com #ghi j#klm #nop.qrs #tuv
And here's the regex I've gotten so far:
#[A-Za-z]+[^0-9. ]+\b | #[A-Za-z]+[^0-9. ]
My goal is to get ['#abc', '#ghi', '#tuv'], but no matter what I do, I can't get 'j#klm' to not match. Any help is much appreciated.
Try using re.findall with the following regex pattern:
(?:(?<=^)|(?<=\s))#[A-Za-z]+(?=\s|$)
import re
inp = "#abc def#gmail.com #ghi j#klm #nop.qrs #tuv"
matches = re.findall(r'(?:(?<=^)|(?<=\s))#[A-Za-z]+(?=\s|$)', inp)
print(matches)
This prints:
['#abc', '#ghi', '#tuv']
The regex calls for an explanation. The leading lookbehind alternation (?:(?<=^)|(?<=\s)) asserts that what precedes the # symbol is either whitespace or the start of the string. We can't use a word boundary here because # is not a word character. A similar lookahead (?=\s|$) at the end of the pattern rules out matches like #nop from #nop.qrs. Again, a word boundary alone would not be sufficient, because there is one between p and the dot.
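A slightly shorter way to express the same two conditions uses negative lookarounds. This is offered as an alternative, not a correction. (?<!\S) means "not preceded by a non-whitespace character", which also holds at the start of the string, and (?!\S) is the mirror image at the end.

```python
import re

inp = "#abc def#gmail.com #ghi j#klm #nop.qrs #tuv"
# (?<!\S): no non-whitespace character before "#" (also true at the start of the string).
# (?!\S):  no non-whitespace character after the tag (also true at the end of the string).
print(re.findall(r"(?<!\S)#[A-Za-z]+(?!\S)", inp))
# ['#abc', '#ghi', '#tuv']
```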
Just add the start-of-line anchor ^ at the beginning:
^#[A-Za-z]+[^0-9. ]+\b | #[A-Za-z]+[^0-9. ]
It should work!
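A quick way to check this suggestion is to run the pattern exactly as written in Python. Note that the spaces around | are literal characters in the pattern.

```python
import re

inp = "#abc def#gmail.com #ghi j#klm #nop.qrs #tuv"
# The pattern from this answer, copied verbatim (the spaces around "|" are part of it).
anchored = r"^#[A-Za-z]+[^0-9. ]+\b | #[A-Za-z]+[^0-9. ]"
print(re.findall(anchored, inp))
# ['#abc ', ' #ghi', ' #nop', ' #tuv']
```

j#klm is indeed gone. However, the matches carry a leading or trailing space, and #nop is still taken from #nop.qrs because [A-Za-z]+ can give back its last letter to [^0-9. ].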

Regexp for string starting with a + and having numbers only [duplicate]

This question already has answers here:
Match exact string
(3 answers)
Closed 4 years ago.
I have the following regex for a string which starts with a + and contains only digits:
PatternArticleNumber = $"^(\\+)[0-9]*";
However, this allows strings like:
+454545454+4545454
This should not be allowed. Only the 1st character should be a +, others numbers only.
Any idea what may be wrong with my regex?
You can work around this problem by adding an end-of-string anchor to your regex, i.e. use this:
PatternArticleNumber = $"^(\\+)[0-9]*$";
The problem with your current pattern is that its end is not anchored. Nothing forces [0-9]* to reach the end of the input, so for +454545454+4545454 the engine matches the prefix +454545454 and reports success, even though the string as a whole does not fit the pattern.
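The same effect can be reproduced in Python. This is a sketch only; the question itself uses C#. One Python-specific detail: $ also matches just before a trailing newline, so re.fullmatch is the stricter check there.

```python
import re

s = "+454545454+4545454"
open_ended = re.compile(r"^\+[0-9]*")   # only the start is anchored
anchored = re.compile(r"^\+[0-9]*$")    # start and end anchored

print(open_ended.search(s).group(0))    # +454545454  (a prefix is enough to "match")
print(anchored.search(s))               # None
# In Python, $ also matches before a final "\n"; fullmatch requires the whole string.
print(anchored.search("+4545\n") is not None)  # True
print(re.fullmatch(r"\+[0-9]*", "+4545\n"))    # None
```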

Parse formula name and arguments with regex [duplicate]

This question already has answers here:
How to get function parameter names/values dynamically?
(34 answers)
Closed 6 years ago.
The objective of this Regex (\w*)\s*\([(\w*),]*\) is to get a function name and its arguments.
For example, given f1 (11,22,33)
the Regex should capture four elements:
f1
11
22
33
What's wrong with this regex?
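You can do it with split. Here is an example in JavaScript:
var str = 'f1 (11,22,33)';
var ar = str.match(/\((.*?)\)/);
if (ar) {
  // ar[0] includes the parentheses; ar[1] is just the captured argument list
  var result = ar[1].split(','); // ["11", "22", "33"]
}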
Reference: https://stackoverflow.com/a/13953005/1827594
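Here is a Python sketch of this regex-then-split approach that also picks up the function name. The pattern and the name parse_call are illustrative assumptions. Because of [^()]*, nested calls are not handled.

```python
import re

# Group 1: the function name; group 2: everything between the parentheses.
CALL = re.compile(r"(\w+)\s*\(([^()]*)\)")


def parse_call(text):
    m = CALL.search(text)
    if m is None:
        return None
    name, arg_text = m.groups()
    # split() handles any number of arguments; strip() drops surrounding spaces.
    args = [a.strip() for a in arg_text.split(",")] if arg_text.strip() else []
    return [name] + args


print(parse_call("f1 (11,22,33)"))  # ['f1', '11', '22', '33']
```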
Some things are hard for regexes :-)
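As the commenters above are saying, '*' can be too lax. It means zero or more, so foo(,,) also matches. Not so good. On top of that, [(\w*),] is a character class, not a group: it matches any single (, ), *, comma or word character, so [(\w*),]* accepts any run of those characters and never captures the individual arguments.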
(\w+)\s*\((\w+)(?:,\s*(\w+)\s*)*\)
That is closer to what you want I think. Let's break that down.
(\w+) <-- the function name, has to have at least one character
\s* <-- zero or more whitespace characters
\( <-- a literal ( to start the argument list
(\w+) <-- at least one parameter
(?: ... ) <-- a non-capturing group: it groups without saving the match
,\s* <-- a comma with optional whitespace
(\w+) <-- another parameter
\s* <-- followed by optional whitespace
)* <-- the group can repeat zero or more times
\) <-- the closing paren
This is the result from Python:
>>> import re
>>> m = re.match(r'(\w+)\s*\((\w+)(?:,\s*(\w+)\s*)*\)', "foo(a,b,c)")
>>> m.groups()
('foo', 'a', 'c')
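The breakdown above can also live inside the pattern itself. Python's re.VERBOSE flag ignores unescaped whitespace and treats # as the start of a comment, so the same regex can be written with its annotations inline. This is a sketch; the layout is only a style choice, and the name CALL_VERBOSE is illustrative.

```python
import re

CALL_VERBOSE = re.compile(r"""
    (\w+)              # function name, at least one character
    \s*                # optional whitespace
    \(                 # literal opening paren
    (\w+)              # first parameter
    (?:                # non-capturing group for each further parameter:
        ,\s*(\w+)\s*   #   comma, optional space, parameter, optional space
    )*                 # zero or more further parameters
    \)                 # literal closing paren
""", re.VERBOSE)

print(CALL_VERBOSE.match("foo(a,b,c)").groups())  # ('foo', 'a', 'c')
```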
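Note that b is missing: a capturing group inside a repetition only keeps its last iteration, so only c survives. And what about something like this: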
foo(a,b,c
d,e,f)
?? Yeah, it gets hard fast with regexes and you move on to richer parsing tools.
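One such richer tool, when the input happens to be valid Python expression syntax, is Python's own ast module. This sketch assumes Python 3.9+ (for ast.unparse). parse_call_ast is a hypothetical name, and anything that isn't a plain name(args) call is rejected.

```python
import ast


def parse_call_ast(src):
    # Assumes src is a single Python-syntax call expression such as "f1 (11,22,33)".
    node = ast.parse(src.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError("expected a simple call like name(arg, ...)")
    # ast.unparse turns each argument node back into source text.
    return [node.func.id] + [ast.unparse(arg) for arg in node.args]


print(parse_call_ast("f1 (11,22,33)"))            # ['f1', '11', '22', '33']
print(parse_call_ast("foo(a, g(b, c), 'x,y')"))   # ['foo', 'a', 'g(b, c)', "'x,y'"]
```

Nested calls, quoted commas and line breaks inside the parentheses are all handled by the parser. The two-line example above raises SyntaxError, because c and d are not separated by a comma.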