Lookbehind conundrum OR maybe groups?

I am sorry I can't formulate a good question:
This regex should find the word 'period' followed by a whitespace and one digit:
period.*(?=\s[0-9]{1})|alternative
If I input the line TEST 2019 to period 3.csv the regex matches period.
If I input the line TEST period 3 2019.csv the regex matches period 3.
My intended match is period 3.
You can see what I mean in this screenshot from regex101:
For now I have solved it with a positive lookbehind like this:
(?<=period\s)[0-9]{1,4}|alternative
This matches the digit after 'period' and I can just add 'period' for my specific purpose. But I don't understand why I get different matches.

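The .* is what causes the different matches: it is greedy, so the match extends from period to the last position in the line that is followed by a whitespace and a digit. In TEST period 3 2019.csv that position is just before " 2019", so the match becomes period 3; in TEST 2019 to period 3.csv the only such position is directly after period. You don't need .* after period, so just remove it from your regex and write it like this: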
period(?=\s[0-9]{1})|alternative
This matches period literally, provided it is followed by a whitespace and a digit (ensured by your positive lookahead). You also don't need to write {1}, since matching once is the default and the quantifier is redundant. And if you don't want period to match as part of a longer word, put a word boundary \b before it and change your regex to this:
\bperiod(?=\s[0-9])|alternative
Demo
Also, your lookbehind (?<=period\s)[0-9]{1,4}|alternative is not suitable for matching the text period: the lookbehind only asserts that period and one whitespace precede the match, so the match itself is just the number.
Check this Demo
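
The difference between the patterns can be checked with a short script. This is a minimal sketch using Python's re module (an assumption, since the question does not name a regex flavor). The pattern called intended is a hypothetical variant, not taken from the answer: it consumes the whitespace and digit instead of only looking ahead at them.

```python
import re

lines = ["TEST 2019 to period 3.csv", "TEST period 3 2019.csv"]

# Patterns from the question and the answer above.
original = re.compile(r"period.*(?=\s[0-9]{1})|alternative")
lookahead = re.compile(r"\bperiod(?=\s[0-9])|alternative")
lookbehind = re.compile(r"(?<=period\s)[0-9]{1,4}|alternative")
# Hypothetical variant: the whitespace and digit become part of the match.
intended = re.compile(r"\bperiod\s[0-9]|alternative")

def first_match(pattern, text):
    """Return the text of the first match, or None if there is none."""
    m = pattern.search(text)
    return m.group() if m else None

for line in lines:
    print(line)
    for name, pattern in [("original", original), ("lookahead", lookahead),
                          ("lookbehind", lookbehind), ("intended", intended)]:
        print(f"  {name:<10} -> {first_match(pattern, line)!r}")
```

With the greedy .* the match runs to the last position that is followed by a whitespace and a digit, which is why the two lines give period and period 3 respectively.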

Related

Notepad++ Regex Find all endline without periods

I'm trying to find all lines without an ending period (dot), but without matching blank (empty) lines. After that I want to add an ending period to each such line.
Example:
The good is whatever stops such things from happening.
Meaning as the Higher Good
It was from this that I drew my fundamental moral conclusions.
I have tried a few regexes, but they find empty lines as well.
Is there a regex for Notepad++ that can achieve that?

Enable Regular Expression match, then search for:
\S(?<!\.)\K\s*$
and replace with:
.$0
Breakdown:
\S Match a non-whitespace character
(?<!\.) It shouldn't be a period
\K Reset match
\s* Match optional whitespace characters
$ End of line
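
Outside Notepad++, the same idea can be sketched without \K, which Python's built-in re module does not support. The snippet below is an illustration under that assumption: it captures the last character of a line that is neither whitespace nor a period, plus any trailing spaces, and puts a period between them. The function name add_missing_periods is made up for this example.

```python
import re

# Last character of the line that is neither whitespace nor a period,
# followed by optional trailing whitespace other than the newline itself.
LINE_END = re.compile(r"([^\s.])([^\S\n]*)$", re.MULTILINE)

def add_missing_periods(text):
    """Append a period to every non-blank line that does not already end with one."""
    return LINE_END.sub(r"\1.\2", text)

sample = (
    "The good is whatever stops such things from happening.\n"
    "Meaning as the Higher Good\n"
    "\n"
    "It was from this that I drew my fundamental moral conclusions.\n"
)
print(add_missing_periods(sample))
```

Blank lines are skipped because the pattern needs at least one non-whitespace character on the line.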

You could use something like this to find the lines you are interested in, then add a capture group to it and append the characters you need.
(?<!\.)\r\n
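This works by using a negative lookbehind (?<!\.) to check that there is no . before \r. Note that it also matches at blank lines, because the character before their line break is not a period either.
There is a group of regex operators (lookarounds) that can be used to accomplish this type of task: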
Look ahead positive (?=)
Look ahead negative (?!)
Look behind positive (?<=)
Look behind negative (?<!)
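
The four lookarounds can be tried on a toy string. This is only an illustrative sketch in Python; the sample string and the patterns are made up for the example and are not part of the Notepad++ answer.

```python
import re

text = "a1 b2 a3"

# Positive lookahead: a letter that is followed by 2.
ahead_pos = re.findall(r"[ab](?=2)", text)     # ['b']
# Negative lookahead: a letter that is not followed by 2.
ahead_neg = re.findall(r"[ab](?!2)", text)     # ['a', 'a']
# Positive lookbehind: a digit that is preceded by a.
behind_pos = re.findall(r"(?<=a)\d", text)     # ['1', '3']
# Negative lookbehind: a digit that is not preceded by a.
behind_neg = re.findall(r"(?<!a)\d", text)     # ['2']

print(ahead_pos, ahead_neg, behind_pos, behind_neg)
```

In every case the lookaround only checks the neighbouring text; it never becomes part of the match.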

Try this short and effective solution too.
Search: \w$
Replace: $0.

Capture number between two whitespaces (RegEx)

I have the following data:
SOMEDATA .test 01/45/12 2.50 THIS IS DATA
and I want to extract the number 2.50 out of this. I have managed to do this with the following RegEx:
(?<=\d{2}\/\d{2}\/\d{2} )\d+.\d+
However that doesn't work for input like this:
SOMEDATA .test 01/45/12 2500 THIS IS DATA
In this case, I want to extract the number 2500.
I can't seem to figure out a regex rule for that. Is there a way to extract something between two spaces ? So extract the text/number after the date until the next whitespace ? All I know is that the date will always have the same format and there will always be a space after the text and then a space after the number I want to extract.
Can someone help me out on this ?

Capture number between two whitespaces
A whitespace is matched with \s, and non-whitespace with \S.
So, what you can use is:
\d{2}\/\d{2}\/\d{2} +(\S+)
                      ^^^
See the regex demo
The 1+ non-whitespace symbols are captured into Group 1.
If - for some reason - you need to only get the value as a whole match, use your lookbehind approach:
(?<=\d{2}\/\d{2}\/\d{2} )\S+
Or - if you are using PCRE - you may leverage the match reset operator \K:
\d{2}\/\d{2}\/\d{2} +\K\S+
^^
See another demo
NOTE: the \K and a capture group approaches allow 1 or more spaces after the date and are thus more flexible.
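
As a rough check of the capture-group and lookbehind variants, here is a small Python sketch. Python is an assumption here, since the question names no language; \K is not available in Python's built-in re module, so only the other two approaches are shown. The / is left unescaped because Python patterns have no delimiters, and the helper name extract is hypothetical.

```python
import re

samples = [
    "SOMEDATA .test 01/45/12 2.50 THIS IS DATA",
    "SOMEDATA .test 01/45/12 2500 THIS IS DATA",
]

# Capture-group approach: one or more spaces after the date, value in group 1.
with_group = re.compile(r"\d{2}/\d{2}/\d{2} +(\S+)")
# Lookbehind approach: the whole match is the value, but exactly one space is required.
with_lookbehind = re.compile(r"(?<=\d{2}/\d{2}/\d{2} )\S+")

def extract(text):
    """Return the value after the date as (capture-group result, lookbehind result)."""
    m1 = with_group.search(text)
    m2 = with_lookbehind.search(text)
    return (m1.group(1) if m1 else None, m2.group() if m2 else None)

for s in samples:
    print(extract(s))
```

With several spaces after the date, only the capture-group variant still finds the value, which is the flexibility the note above refers to.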

I see some people helped you already, but if you would want an alternative working one for some reason, here's what works too :)
.+ \d+\/\d+\/\d+ (\d+[\.\d]*)
So the .+ matches anything plus the first space
then the \d+/\d+/\d+ is the date parsing plus a space
the capturing group is the number, as you can see I made the last part optional, so both floating point values and normal values can be matched. Hope this helped!
Proof: https://regex101.com/r/fY3nJ2/1

Just make the fractional part optional:
(?<=\d{2}\/\d{2}\/\d{2} )\d+(?:\.\d+)?
Demo: https://regex101.com/r/jH3pU7/1
Update following clarifications in comments:
To match anything (but space) surrounded by spaces and prepended by date use:
(?<=\d{2}\/\d{2}\/\d{2} )\S+
Demo: https://regex101.com/r/jH3pU7/3
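
The difference between the two patterns in this answer shows up once the value is not purely numeric. A minimal Python sketch, assuming Python's re module; the N/A input and the helper name find are made up for illustration:

```python
import re

# Number with an optional fractional part, directly after the date and one space.
date_then_number = re.compile(r"(?<=\d{2}/\d{2}/\d{2} )\d+(?:\.\d+)?")
# Any run of non-whitespace characters after the date and one space.
date_then_token = re.compile(r"(?<=\d{2}/\d{2}/\d{2} )\S+")

def find(pattern, text):
    """Return the first match of pattern in text, or None."""
    m = pattern.search(text)
    return m.group() if m else None

for value in ("2.50", "2500", "N/A"):
    line = f"SOMEDATA .test 01/45/12 {value} THIS IS DATA"
    print(value, find(date_then_number, line), find(date_then_token, line))
```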

Rather than capture, you can make your entire match be the target text by using a look behind:
(?<=\d\d(\/\d\d){2} )\S+
This matches the first series of non-whitespace that follows a "date like" part.
Note also the reduction in the length of the "date like" pattern. You may consider using this part of the regex in whatever solution you use.

check if there is a word repeated at least 2 or more times. (Regular Expression)

Using a regular expression, I want to match any line of input that has at least one word repeated two or more times.
Here is how far I got:
/(\b\w+\b).*\1
but it is wrong because the backreference also matches a single character inside another word, not only a whole word.
input: i might be ill
output: < i might be i>ll
<> marks the matched part.
So I tried (\b\w+\b)(\b\w+\b)*\1
but that does not work either.
Can someone help?
Thanks.

This should work:
(\b\w+\b).*\b\1\b
The greedy .* ensures the longest match. If you want the second instance to be a separate word, you have to add the boundaries there as well. This is the same as:
\b(\w+)\b.*\b\1\b
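
A quick way to compare the question's pattern with the bounded one is the following Python sketch (Python is an assumption; the helper name matched and the second test sentence are made up):

```python
import re

loose = re.compile(r"(\b\w+\b).*\1")         # pattern from the question
strict = re.compile(r"\b(\w+)\b.*\b\1\b")    # boundaries around the backreference too

def matched(pattern, line):
    """Return the matched text, or None if the line does not match."""
    m = pattern.search(line)
    return m.group() if m else None

for line in ("i might be ill", "the cat saw the dog"):
    print(repr(line), matched(loose, line), matched(strict, line))
```

The loose pattern accepts the i inside ill as a repeat of the word i, while the strict one only accepts a whole word.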

Positive lookahead is not a must here:
/\b([A-Za-z]+)\b[\s\S]*\b\1\b/g
EXPLANATION
\b([A-Za-z]+)\b # match any word
[\s\S]* # match any character (newline included) zero or more times
\b\1\b # word repeated
REGEX 101 DEMO

To check for repeated words you can use positive lookahead like this.
Regex: (\b[A-Za-z]+\b)(?=.*\b\1\b)
Explanation:
(\b[A-Za-z]+\b) will capture any word.
(?=.*\b\1\b) will lookahead if the word captured by group is present or not. If yes then a match is found.
Note: this will produce repeated results when a word occurs three or more times, because every occurrence that is still followed by another one matches again.
You will have to strip the duplicates in code.
Regex101 Demo
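
Stripping the duplicates could look like this in Python. This is a sketch under the assumption that the order of first appearance should be kept; the function name repeated_words and the sample sentence are hypothetical.

```python
import re

repeated = re.compile(r"(\b[A-Za-z]+\b)(?=.*\b\1\b)")

def repeated_words(line):
    """Return each repeated word once, in order of first appearance."""
    hits = repeated.findall(line)        # may list the same word several times
    return list(dict.fromkeys(hits))     # drop duplicates, keep order

line = "the cat and the dog and the bird"
print(repeated.findall(line))
print(repeated_words(line))
```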

Replace multiple dots in string with different character but same amount

I have a string like the following
"blaa...blup..blaaa...bla."
Every run of more than one dot must be replaced by "_", using as many underscores as there were dots.
The string should result in:
"blaa___blup__blaaa___bla."
Notice that the last dot is not replaced, as it has no other dots "connected" to it.
I tried the following regex approach in PowerShell, but I always get a length of 2 for the match, regardless of whether there were 3 or more dots:
$string -replace '(.)\1+',("_"*'$&'.length)
Any ideas?

You can use the \G anchor to glue a match to the previous.
\.(?=\.)|\G(?!^)\.
\.(?=\.) match a period if another one is ahead.
|\G(?!^)\. or match a period directly where the previous match ended (but not at the start of the string)
Replace with underscore. See demo at regexstorm

You can use the following pattern:
\.(?=\.)|(?<=\.)\.
And replace with _.
The pattern simply looks for either a period that is followed by a period or a period that is preceded by a period:
\.(?=\.) - Matches a period which is followed by a period
| - Or
(?<=\.)\. - Matches a period which is preceded by a period
See the online demo.
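
The two-lookaround pattern also works in flavors without \G. A minimal check in Python (an assumption, since the question uses PowerShell):

```python
import re

s = "blaa...blup..blaaa...bla."

# A dot is replaced if another dot follows it or precedes it.
result = re.sub(r"\.(?=\.)|(?<=\.)\.", "_", s)
print(result)
```

The lookbehind inspects the original string, not the partially replaced one, so the last dot of each run is still seen as preceded by a dot.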

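In your attempt, ("_"*'$&'.length) is evaluated before -replace runs, so '$&' is just a two-character string literal and the replacement is always __. None of the languages and regex flavors I know allow you to evaluate the backreference numeric value "on the fly"; you can only use it in a callback function. See Use a function in Powershell replace.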
However, in this particular case, you can use the following regex:
((?=\.{2})|(?!^)\G)\.
And replace with _.
See the regex demo here.
And the explanation:
((?=\.{2})|(?!^)\G) - a boundary that either matches a location before 2 dots (with (?=\.{2})) or the end of the previous successful match (with (?!^)\G)
\. - a literal dot.
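
For comparison, the callback approach mentioned at the start of this answer can be sketched in Python, where re.sub accepts a function as the replacement. Python is used here only as an illustration, and the function name dots_to_underscores is made up.

```python
import re

def dots_to_underscores(text):
    """Replace each run of two or more dots with the same number of underscores."""
    return re.sub(r"\.{2,}", lambda m: "_" * len(m.group()), text)

print(dots_to_underscores("blaa...blup..blaaa...bla."))
```

Here the length of each match is computed inside the callback, once per match, which is exactly what a plain replacement string cannot do.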

Regular expression to match last number in a string

I need to extract the last number that is inside a string. I'm trying to do this with regex and negative lookaheads, but it's not working. This is the regex that I have:
\d+(?!\d+)
And these are some strings, just to give you an idea, and what the regex should match:
ARRAY[123] matches 123
ARRAY[123].ITEM[4] matches 4
B:1000 matches 1000
B:1000.10 matches 10
And so on. The regex matches the numbers, but all of them. I don't get why the negative lookahead is not working. Any one care to explain?

Your regex \d+(?!\d+) says "match any number if it is not immediately followed by a number", which is incorrect. A number is last if it is not followed (following it anywhere, not just immediately) by any other number.
When translated to regex we have:
(\d+)(?!.*\d)
Rubular Link
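
The examples from the question can be run against both patterns. This is a small Python sketch (Python is an assumption; the answer links to a Ruby tester), and it relies on the input being a single line, since . does not match a newline by default.

```python
import re

samples = {
    "ARRAY[123]": "123",
    "ARRAY[123].ITEM[4]": "4",
    "B:1000": "1000",
    "B:1000.10": "10",
}

original = re.compile(r"\d+(?!\d+)")         # pattern from the question
last_number = re.compile(r"(\d+)(?!.*\d)")   # no digit anywhere after the match

for text, expected in samples.items():
    found = last_number.search(text).group(1)
    print(f"{text:<20} all={original.findall(text)} last={found} expected={expected}")
```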

I took it this way: you need to make sure the match is close enough to the end of the string; close enough in the sense that only non-digits may intervene. What I suggest is the following:
/(\d+)\D*\z/
\z at the end means that that is the end of the string.
\D* before that means that an arbitrary number of non-digits can intervene between the match and the end of the string.
(\d+) is the matching part. It is in parenthesis so that you can pick it up, as was pointed out by Cameron.
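
The same pattern can be tried in Python, with the assumption that \Z replaces \z, since \Z is the end-of-string anchor in Python's re module. The function name last_number is hypothetical.

```python
import re

# Digits, then only non-digits up to the very end of the string.
tail_number = re.compile(r"(\d+)\D*\Z")

def last_number(text):
    """Return the last run of digits in text, or None if there are no digits."""
    m = tail_number.search(text)
    return m.group(1) if m else None

for text in ("ARRAY[123]", "ARRAY[123].ITEM[4]", "B:1000", "B:1000.10"):
    print(text, last_number(text))
```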

You can use
.*(?:\D|^)(\d+)
to get the last number; this is because the matcher will gobble up all the characters with .*, then backtrack to the first non-digit character or the start of the string, then match the final group of digits.
Your negative lookahead isn't working because on the string "1 3", for example, the 1 is matched by the \d+, and then the negative lookahead succeeds because the next character is a space, not a digit. The match ends there, so the 3 is never even looked at for that match.
Note that your example regex doesn't have any groups in it, so I'm not sure how you were extracting the number.
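
Both points can be checked on the "1 3" example with a short Python sketch (Python is an assumption; the function name last_number is made up):

```python
import re

# Greedy prefix, then a non-digit or the start of the string, then the final digits.
greedy_prefix = re.compile(r".*(?:\D|^)(\d+)")

def last_number(text):
    """Return the last run of digits in text via group 1, or None."""
    m = greedy_prefix.search(text)
    return m.group(1) if m else None

# The question's pattern stops at the first number not directly followed by a digit.
first_hit = re.search(r"\d+(?!\d+)", "1 3").group()

print(first_hit, last_number("1 3"))
```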

I still had issues with managing the capture groups
(for example, if using Inline Modifiers (?imsxXU)).
This worked for my purposes -
.(?:\D|^)\d(\D)
