I am using a Chrome extension called Word Replacer II and I'm trying to create a regex find and replace.
Quick backstory: my partner is recovering from an eating disorder and I want to find all mentions of kilojoules and kJs and replace them with a placeholder.
I am entirely new to Regex and after a few hours, I'm not much closer to getting a working expression.
I need it to remove up to 4 digits before the letters "kJs", e.g. 400kJs and 1000kJs. I'd like "400kJs and 1000kJs" to be replaced with "[removed kJs] and [removed kJs]".
The code I have put together so far is:
\s+(a{1,4}<=\d)\s+(?=kJ)
Any help would be much appreciated!
You may use the following approach:
\d{1,4}\s*kJs\b
See the regex demo
If you need to keep kJs, you may wrap the right part of the pattern with a lookahead, \d{1,4}(?=\s*kJs\b).
If you do not want to touch numbers with 5 or more digits, use either of
\b\d{1,4}\s*kJs\b
(?<!\d)\d{1,4}\s*kJs\b
That is, add a word boundary, \b, or a left-hand digit boundary, (?<!\d).
Pattern details
\d{1,4} - one to four digits
\s* - 0+ whitespaces
kJs - the literal string kJs
\b - a word boundary (may not be necessary if there can be no word starting with kJs).
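As a minimal sketch, the digit-boundary variant can be tried outside the extension with Python's re module. The function name hide_kilojoules and the sample sentences are made up for illustration; the replacement text is the one asked for in the question.

```python
import re

# Up to four digits, optional whitespace, then "kJs" as a whole word.
# (?<!\d) keeps the match from starting in the middle of a longer number.
KJ_PATTERN = re.compile(r"(?<!\d)\d{1,4}\s*kJs\b")

def hide_kilojoules(text):
    """Replace every short kJ figure with a neutral placeholder."""
    return KJ_PATTERN.sub("[removed kJs]", text)

print(hide_kilojoules("400kJs and 1000kJs"))
print(hide_kilojoules("12345kJs stays untouched"))
```

Whether Word Replacer II accepts the lookbehind depends on the regex engine it uses, which the question does not state; the \b variant is the safer fallback there.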
I'm currently having issues with a regex that I'm creating. The regex has to extract all the groups of the form number #### between Hello and Regards. At the moment my regex extracts only one group and I need all the groups inside; in this case there are 2, but there may be more.
I'm using the web page https://regex101.com/
Flavor: PCRE (PHP)
Regex: Hello\s.*(number\s*[\d]*)\s.*Regards
Text:
This is my test text number 25120
Hello my name is testing
I'm 20 years old
Please help me with the regex number 1542
I have been trying to create the regex many times this is my number 5152
Regards
I'm still trying my attempt number 5150
Result:
My result is only the group number 5152, but there is another group inside, number 1542.
You may use
(?si)(?:\G(?!\A)|\bHello\b)(?:(?!\bHello\b).)*?\K\bnumber\s*\d+(?=.*?\bRegards\b)
See the regex demo.
Details
(?si) - s - DOTALL modifier making . match any chars, and i makes the pattern case insensitive
(?:\G(?!\A)|\bHello\b) - either the end of the previous match (\G(?!\A)) or (|) a whole word Hello (\bHello\b)
(?:(?!\bHello\b).)*? - any char, 0 or more times but as few as possible, that does not start a whole word Hello char sequence
\K - match reset operator that discards all text matched so far
\bnumber - a whole word number
\s* - 0+ whitespaces
\d+ - 1+ digits
(?=.*?\bRegards\b) - there must be a whole word Regards somewhere after any 0+ chars (as few as possible).
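The pattern above relies on \G and \K, which Python's built-in re module does not support. As a rough sketch of the same idea in Python, a two-step approach can be swapped in: first isolate each Hello ... Regards block, then collect the numbers inside it. This is a different technique from the single PCRE pattern, and the names BLOCK, NUMBER and numbers_between are hypothetical.

```python
import re

# Step 1: a block that starts at "Hello", does not run into another "Hello",
# and ends at the nearest "Regards".
BLOCK = re.compile(r"\bHello\b((?:(?!\bHello\b).)*?)\bRegards\b", re.S | re.I)
# Step 2: every "number <digits>" inside such a block.
NUMBER = re.compile(r"\bnumber\s*\d+", re.I)

def numbers_between(text):
    found = []
    for block in BLOCK.finditer(text):
        found.extend(NUMBER.findall(block.group(1)))
    return found

sample = (
    "This is my test text number 25120\n"
    "Hello my name is testing\n"
    "I'm 20 years old\n"
    "Please help me with the regex number 1542\n"
    "I have been trying to create the regex many times this is my number 5152\n"
    "Regards\n"
    "I'm still trying my attempt number 5150"
)
print(numbers_between(sample))
```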
I am using the .NET flavor of regex.
Suppose I have a string 123456789AB
and I want to match AB (could be any two capital letters) only if the part of the string containing numbers (123456789) has 5 and 8 in it.
So what I came up with was
(?=5)(?=8)([A-Z]{2})
But this is not working.
After some trial and error on RegexStorm
I got to
(?=(.*5))(?=(.*8))[A-Z]{2}
What I am expecting is that it will start matching from the start of the string, as a lookahead does not consume any characters.
But the part "[A-Z]{2}" does not move ahead to match AB in the input string.
My question is: why is that so?
I know replacing it with .*[A-Z]{2} will make it move ahead, but then the match contains the entire string.
What is the solution in this case, other than putting the letter part ([A-Z]{2}) in a separate group and then capturing only that group?
Lookaheads check for a pattern match immediately to the right of the current position in the string. (?=(.*5))(?=(.*8)) matches a location that is immediately followed by any 0 or more chars other than line break chars, as many as possible, and then 5; then, at the same position, another similar check is performed, this time requiring 8 after any zero or more chars, as many as possible. [A-Z]{2} is then tried at that same location, so it cannot match AB: no 5 or 8 follows the position where AB starts.
You may use as many lookbehinds as there are required substrings before the two letters:
(?s)(?<=5.*?)(?<=8.*?)[A-Z]{2}
See the regex demo
Details
(?s) - makes the . match newline characters, too
(?<=5.*?) - a location that is immediately preceded with 5 and then 0 or more chars as few as possible
(?<=8.*?) - a location that is immediately preceded with 8 and then 0 or more chars as few as possible
[A-Z]{2} - two ASCII uppercase letters.
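The lookbehinds above are unbounded in length, which .NET allows but Python's built-in re module does not (it requires fixed-width lookbehinds). As a small Python sketch, the snippet below shows why the lookahead attempt from the question finds nothing, and falls back to the capture-group approach the question mentions. The function name letters_after_5_and_8 is hypothetical, and the fallback assumes the letters directly follow a run of digits.

```python
import re

s = "123456789AB"

# Both lookaheads and [A-Z]{2} are tested at the same position, so the
# letters would have to sit at a spot that still has 5 and 8 to its right.
naive = re.search(r"(?=.*5)(?=.*8)[A-Z]{2}", s)
print(naive)

def letters_after_5_and_8(text):
    # Anchor the checks at the start, consume the digits, capture the letters.
    m = re.search(r"^(?=\d*5)(?=\d*8)\d*([A-Z]{2})", text)
    return m.group(1) if m else None

print(letters_after_5_and_8(s))
```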
An alternative would be to "unfold" what you expect to match using exclusionary character classes and alternation of match order. Not pretty, but pretty fast:
(?<=\b[^58]*?(?:5[^8]*8|8[^5]*5)[^A-Z]*?)[A-Z]{2}
I am facing a regex problem: I need to remove hyphenated repetitions at the beginning of a composed word.
For example:
wo-wo-wo-wonder -> wonder
hi-hi-hi-hi -> hi
wo-wo-wo -> wo
f-f-f-fight -> fight
So, for every word in a text, I want to strip the hyphenated parts that come before the main word (wonder) when they are a partial or total repetition of the main word (wo-wo-wo, but also wonder-wonder-wonder).
At the same time, composed words like bi-linear or pre-trained MUST NOT be replaced, because in this case the hyphenated part (pre) is not part of the main word (train).
I've seen this solution [Python find all occurrences of hyphenated word and replace at position ] and apparently it can be a good solution.
But my problem is quite different because I don't want to impose constraints about the length of hyphenation, and at the same time I want to check that hyphen is part of the main word.
This is the regex I am currently using but, as explained, it doesn't fully solve my problem.
re.sub(r'(?<!\S)(\w{1,3})(?:-\1)*-(\w+)(?!\S)', '\\2', s)
Use
r'(?<!\S)(\w+)(?:-\1)*-(\1)'
or
r'\b(\w+)(?:-\1)*-(\1)'
See the regex demo
Details
(?<!\S) - a whitespace boundary (if you use \b, a word boundary)
(\w+) - Group 1: any one or more word chars
(?:-\1)* - 0 or more repetitions of - and Group 1 value
- - a hyphen
(\1) - Group 2: same value as in Group 1.
Python sample re.sub:
s = re.sub(r'(?<!\S)(\w+)(?:-\1)*-(\1)', r'\2', s)
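A slightly fuller sketch of that call, run against the examples from the question. The names STUTTER and collapse_stutter are made up for illustration.

```python
import re

# Group 1 is the repeated fragment; the final -(\1) is the copy that starts
# the main word, so replacing the match with \2 leaves the word intact.
STUTTER = re.compile(r"(?<!\S)(\w+)(?:-\1)*-(\1)")

def collapse_stutter(text):
    return STUTTER.sub(r"\2", text)

for word in ["wo-wo-wo-wonder", "hi-hi-hi-hi", "wo-wo-wo",
             "f-f-f-fight", "bi-linear", "pre-trained"]:
    print(word, "->", collapse_stutter(word))
```

Because the fragment after the last hyphen must start with the same text as Group 1, bi-linear and pre-trained are left alone.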
Every time I need to use a regex I realize I've forgotten everything about them.
I am trying to match all words that have only lowercase alphanumeric characters AND do not have doubled alphanumeric characters AND are also 10 to 12 characters long.
Now, to figure out if a character is followed by the same character, I would do (.)\1. To see if a word is between 10 and 12 characters long I do {10,12}. To grab only lowercase letters and digits, I do [0-9a-z].
But how do I link them together?
Cheers!
PS: this will be running on a fairly large NLP xml (100mb+), so I would appreciate it if the regex wasn't the slowest alternative.
I think this will do what you want: -
/\b(?:([a-z0-9])(?!\1)){10,12}\b/
Explanation: -
\b               // Word boundary
(?:
  ([a-z0-9])     // Match a lowercase letter or digit
  (?!\1)         // Not followed by the same character again
){10,12}         // 10 to 12 times
\b               // Word boundary
Here's one, although I'm not sure there won't be a better way...
/\b(?:([a-z0-9])(?!\1)){10,12}\b/
Here is my attempt:
(\b(?![0-9a-z]*([0-9a-z])\2)[0-9a-z]{10,12}\b)
(We have to use a lookahead, and some kind of boundary is usually very important for it to function properly. Hence \b).
At the time of writing, another answer has a false positive, matching a part of eoeuaoarounn
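A quick way to check the lookahead-based pattern is to run it over a few hand-made words. This is only a sketch: the sample words (apart from eoeuaoarounn) and the name find_clean_words are invented here, and no claim is made about speed on a 100 MB file.

```python
import re

# The negative lookahead rejects a word as soon as any character in it is
# immediately repeated; the rest checks charset and length (10 to 12).
CLEAN_WORD = re.compile(r"(\b(?![0-9a-z]*([0-9a-z])\2)[0-9a-z]{10,12}\b)")

def find_clean_words(text):
    # group(0) is the whole word; group 2 only exists inside the lookahead.
    return [m.group(0) for m in CLEAN_WORD.finditer(text)]

words = "abcdefghij abcdeefghij eoeuaoarounn short abcdefghijklm a1b2c3d4e5f6"
print(find_clean_words(words))
```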
I am trying to write a regex in PCRE for string detection. The kind of strings I want to detect are abcdef001, zxyabc003: a word whose first 6 characters are a-zA-Z and whose last two or three are digits 0-9. This string could be anywhere in the whole text.
E.g - "User activity from server1, user id abcdef009, time 10.20am".
How do I go about this?
Try this:
/[a-zA-Z]{6}[0-9]{2,3}/
If you want to limit it to whole words, try:
/\b[a-zA-Z]{6}[0-9]{2,3}\b/
\b - word boundary
[a-zA-Z]{6} - six letters
[0-9]{2,3} - either 2 or 3 digits
\b - word boundary
Use regex pattern
/[a-z]{6}\d{2,3}/i
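The question targets PCRE, but the whole-word pattern with \b uses only syntax that Python's re module shares, so it can be sketched there against the sample log line. The name find_user_ids is hypothetical.

```python
import re

# Six ASCII letters followed by two or three digits, as a whole word.
USER_ID = re.compile(r"\b[a-zA-Z]{6}[0-9]{2,3}\b")

def find_user_ids(text):
    return USER_ID.findall(text)

line = "User activity from server1, user id abcdef009, time 10.20am"
print(find_user_ids(line))
```

Without the word boundaries, longer tokens that merely contain six letters followed by digits would also be reported, which is why the \b version is used here.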