Match range without duplicating - regex

I need to match 1/1 all the way through 4/48: 1/1, 1/2, 1/3 through 1/48, 2/1 through 2/48, 3/1 through 3/48, and 4/1 through 4/48. I'm having a hard time with the regex, as the input must not end with a comma and must not repeat anything that was already entered.
^([1-4]\/([1-9]|[1-4][0-8]|[1-3][0-9]))[?\,]+$

To match numbers from 1 to 48
either 1 to 9 : [1-9]
or 10 to 39 : [1-3][0-9]
or 40 to 48 : 4[0-8]
which gives
[1-9]|[1-3][0-9]|4[0-8]
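A quick way to sanity-check the alternation is to run it against every integer from 0 to 99. This is a minimal sketch in Python (the question does not name a language, so that choice is an assumption); `ONE_TO_48` and `accepted` are illustrative names.

```python
import re

# The alternation from the answer. fullmatch() requires the whole string
# to be consumed, so "49" cannot pass by matching only the "4".
ONE_TO_48 = re.compile(r"[1-9]|[1-3][0-9]|4[0-8]")

# Collect every integer in 0..99 whose decimal form is accepted.
accepted = [n for n in range(0, 100) if ONE_TO_48.fullmatch(str(n))]

# True when exactly the numbers 1..48 are accepted.
print(accepted == list(range(1, 49)))
```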
Update from the comments: the following regex validates the whole comma-separated list:
^(?:(?:^|,)[1-4]\/(?:[1-9]|[1-3][0-9]|4[0-8])(?=,|$))+$
About non-capturing groups and lookaheads:
A non-capturing group (?:..) is the same as a group (..) except that it can't be backreferenced, so it can be preferred to avoid incrementing the backreference numbers.
A lookahead does not consume input, which means that after it matches, the cursor in the input string doesn't move forward. For example, after matching (?:,) the input cursor will be after the , whereas after matching (?=,), which only asserts that a , follows at the cursor position, the cursor will still be before the ,.
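The regex above checks the list format, but it does not by itself reject repeated items. The sketch below, in Python for illustration, assumes duplicates are handled by a separate set comparison; that is one possible approach, not part of the answer. `LIST_RE` and `is_valid` are hypothetical names. The last lines show the cursor difference between (?:,) and (?=,).

```python
import re

# Pattern from the answer: each item is preceded by start-of-string or a
# comma, and must be followed by a comma or end-of-string (lookahead).
LIST_RE = re.compile(r"^(?:(?:^|,)[1-4]\/(?:[1-9]|[1-3][0-9]|4[0-8])(?=,|$))+$")

def is_valid(entry: str) -> bool:
    """True if entry is a well-formed list with no repeated item."""
    if not LIST_RE.fullmatch(entry):
        return False
    items = entry.split(",")
    # Assumption: duplicates are rejected outside the regex.
    return len(items) == len(set(items))

for sample in ["1/1,2/48", "1/1,", "1/1,1/1", "4/49"]:
    print(sample, is_valid(sample))

# Consuming group vs. lookahead: the match end shows where the cursor is.
consumed = re.match(r"1/1(?:,)", "1/1,2/2")
peeked = re.match(r"1/1(?=,)", "1/1,2/2")
print(consumed.end(), peeked.end())  # 4 3
```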

Negate a character group to replace all other characters

I have the following string:
"Thu Dec 31 22:00:00 UYST 2009"
I want to replace everything except for the hours and minutes so I get the following result:
"22:00"
I am using this regex :
(^([0-9][0-9]:[0-9][0-9]))
But it's not matching anything.
This would be my line of actual code :
println("Thu Dec 31 22:00:00 UYST 2009".replace("(^([0-9][0-9]:[0-9][0-9]))".toRegex(),""))
Can someone help me to correct the regex?
The reason the one you have isn't working is that you are asserting that the line starts right before the hours and minutes, which isn't the case. This can be fixed by removing the assertion (^).
If you need the assertion to remain, there is another way. In most languages, you wouldn't be able to use a variable-length positive lookbehind here, but lucky for you, it looks like you can in Kotlin.
A positive lookbehind is basically just telling the pattern "this comes before what I'm looking for". It's denoted by a group beginning with ?<=. In this case, you can use something like (?<=^[\w ]+). This will match all word characters or spaces between the beginning of the line and where the pattern that comes after it is able to match. Prepending it to your expression would look something like (?<=^[\w ]+)([0-9][0-9]:[0-9][0-9]) (note that you will have to escape the \w for it to work inside a string literal).
Side note, Yogesh_D is correct in saying that \d\d:\d\d is the same as your [0-9][0-9]:[0-9][0-9]. Using this, it would look more like (?<=^[\w ]+)\d\d:\d\d.
You may use various solutions, here are two:
val text = """Thu Dec 31 22:00:00 UYST 2009"""
val match = """\b(?:0?[1-9]|1\d|2[0-3]):[0-5]\d\b""".toRegex().find(text)
println(match?.value)
val match2 = """\b(\d{1,2}:\d{2}):\d{2}\b""".toRegex().find(text)
println(match2?.groupValues?.getOrNull(1))
Both return 22:00. See regex #1 demo and regex #2 demo.
The regex complexity should be selected based on how messy the input string is.
Details
\b - a word boundary
(?:0?[1-9]|1\d|2[0-3]) - an optional zero and then a non-zero digit, or 1 and any digit, or 2 and a digit from 0 to 3
: - a : char
[0-5]\d - 0, 1, 2, 3, 4 or 5 and then any one digit
\b - a word boundary.
If there is a match with this regex, you get it as a whole match, so you can access it via match?.value.
If you do not have to worry about any pre-validation when matching, you may simply match 3 colon-separated digit pairs and capture the first two, see the second regex:
\b - a word boundary
(\d{1,2}:\d{2}) - Group 1: one or two digits, : and two digits
:\d{2} - a : and two digits (not captured)
\b - a word boundary.
If there is a match, we need Group 1 value, hence match2?.groupValues?.getOrNull(1) is used.
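For readers outside Kotlin, the same two patterns can be tried as-is in Python. This is only a sketch for comparison; the variable names are illustrative.

```python
import re

text = "Thu Dec 31 22:00:00 UYST 2009"

# Regex #1: the hour and minute ranges are checked by the pattern itself,
# and the whole match is the result.
m1 = re.search(r"\b(?:0?[1-9]|1\d|2[0-3]):[0-5]\d\b", text)
hhmm_validated = m1.group(0) if m1 else None

# Regex #2: match hh:mm:ss loosely and keep only the captured hh:mm.
m2 = re.search(r"\b(\d{1,2}:\d{2}):\d{2}\b", text)
hhmm_captured = m2.group(1) if m2 else None

print(hhmm_validated, hhmm_captured)  # 22:00 22:00
```

Note that the hour alternation in regex #1, as written, has no branch for an hour of 00, so a time such as 00:30 would need an extra alternative.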
I am not sure what language you are using, but why use negation when you can directly match the first digits in the hh:mm format?
Assuming the date string always contains an hh:mm part, this regex should have its first group match the hh:mm.
https://regex101.com/r/aHdehZ/1
The regex to use is (\d\d:\d\d)

Regex substitution for substring from position n to m and remove leading 0

I have the following string
020075307354H 021133360876 981497910079937800ABC CDE FGH THY 0M19780403015001O+2¹qujzh_¢o\piVN¤«²µerNA¥\^?©E|=V_®¢Zu<£;Æ^TV½IÌc¤±·Gl.ÁEÊO·9y¹Bs¾Ë©ºFT¥*ÉA¬=iÚÒ®{æ*»¨;ÄNÕ®Ûòæ¦'Ñ…9>ÙYKè¹t/R{(>ÔÕBã2½7q¹|u…nztf~¦spw_ZX£\¦~Qa²mn¡¨QX«W±¯¯¦¨d£¾}·`B¶M}Qc|AµOÇ~Äd¤·¯HÇaI_¶²ÂÆYC?xÄR²>½HpÃjÁNLifm#ÕEí¾)ZvÇÊzØ)D&¦áÑM¡ç…1F¥Åh9R[9Fä¤Ãå<÷¼T}Ã…©ÎCDNs«E`É?¤eñ/ï´¯Åíÿt
and I want to use 1 Regex substitution to do the following 2 tasks:
Get the substring from position 49 to 58 -> 0079937800
Strip leading zeros from this substring -> 79937800
The desired end result is 79937800.
I figured out that I can extract the substring for task 1 with .{48}(.{10}).+ and a reference to the first group.
For the second task, removing leading zeros, I figured I can use (\b0*([1-9][0-9]*|0)\b), but how can I combine both tasks and get a working substitution string?
You can capture the "marker" that follows the 10-character string in a capture group in a positive lookahead, then match the desired substring with an arbitrary number of leading zeroes, and follow it with another positive lookahead to ensure that it is followed by the marker captured in the first capture group. The desired substring will then be in the second capture group:
^.{48}(?=.{10}(.*))0*(.*?)(?=\1)
Demo: https://regex101.com/r/Q61KYJ/1
Since you commented that the requirement for a substitution is mandated by your software, you can simply add .* at the end of the above regex and substitute the match with the second capture group:
^.{48}(?=.{10}(.*))0*(.*?)(?=\1).*
Demo: https://regex101.com/r/FcRAGB/1
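The substitution can be checked against plain slicing. The sketch below uses Python and a made-up record (48 filler characters, the 10-character field, then some trailing text) instead of the original data; `record`, `pattern` and `expected` are illustrative names.

```python
import re

# Hypothetical record: 48 filler characters, then the 10-character field
# at positions 49-58 (1-based), then trailing data.
record = "x" * 48 + "0079937800" + "ABC CDE FGH"

# Regex from the answer: group 1 (inside the lookahead) remembers what
# follows the field, group 2 is the field without its leading zeros.
pattern = re.compile(r"^.{48}(?=.{10}(.*))0*(.*?)(?=\1).*")
result = pattern.sub(r"\2", record)

# Cross-check with ordinary slicing (0-based, end-exclusive indices).
expected = record[48:58].lstrip("0")

print(result, result == expected)
```

One edge case to keep in mind: if nothing follows the 10-character field, group 1 is empty, the second lookahead succeeds immediately, and group 2 comes out empty.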

How to improve a regex for print range?

I would like to improve a VBA regex for a print range.
Currently I have this:
(\d+(-\d+)*)+(,\d+(-\d+)*)*
But, for an entry 12-25,45,50-53 this is returning the , and - like this:
Match 1: -25
Match 2: ,50-53
Match 3: -53
and is not returning the 45
Ideally I'd like a group returned for each comma delimited entry without any , or - like this:
Match 1: (12-25)
Match 2: (45)
Match 3: (50-53)
The reason 45 is not in a group is that you are repeating the second capturing group. When you are repeating a capturing group, the group contains the value of the last iteration.
So (,\d+(-\d+)*) will capture ,45. Now the whole group is repeated due to the outer * and within that last iteration ,50 is captured by ,\d+ and -53 is captured by -\d+
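The "last iteration wins" behaviour is easy to reproduce. The question is about VBA; the sketch below uses Python's `re` purely for illustration, on the assumption that repeated capturing groups behave the same way here.

```python
import re

# Original pattern from the question, applied to the sample entry.
m = re.search(r"(\d+(-\d+)*)+(,\d+(-\d+)*)*", "12-25,45,50-53")

whole = m.group(0)   # the full match
groups = m.groups()  # each group holds only its last iteration

print(whole)   # 12-25,45,50-53
print(groups)  # ('12-25', '-25', ',50-53', '-53')
```

Group 3 first captures ,45 and is then overwritten by ,50-53, which is why 45 never shows up in any group.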
What you might do is match 1+ digits and use a single optional group for the hyphen and 1+ digits part to get 3 matches.
Use a positive lookahead (?=,|$) to assert what is directly on the right is a comma or the end of the string.
\d+(?:-\d+)?(?=,|$)
Regex demo
If you want 3 groups, you could use:
(\d+(?:-\d+)?),(\d+(?:-\d+)?),(\d+(?:-\d+)?)
Regex demo
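As a rough illustration of the first suggestion (again in Python rather than VBA, and with illustrative names), finding all matches of the single-entry pattern yields one result per comma-delimited entry, which can then be split on the hyphen if numeric bounds are needed:

```python
import re

entry = "12-25,45,50-53"

# One match per entry; the lookahead keeps the comma out of the match.
ranges = re.findall(r"\d+(?:-\d+)?(?=,|$)", entry)
print(ranges)  # ['12-25', '45', '50-53']

# Optional follow-up step (an assumption about what the caller wants):
# turn each entry into a tuple of integers.
pages = [tuple(int(n) for n in r.split("-")) for r in ranges]
print(pages)  # [(12, 25), (45,), (50, 53)]
```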

How would I find values in a file, but only on lines that don't start with #?

I've got a document that looks something like this:
# Document ID 8934
# Last updated 2018-05-06
52 84 12 70 23 2 7 20 1 5
4 2 7 81 32 98 2 0 77 6
(..and so on..)
In other words, it starts off with a few comment lines, then the rest of the document is just a bunch of numbers separated by spaces.
I'm trying to write a regex that gets all digits on all lines that don't start with #, but I can't seem to get it.
I've read over answers such as
Regular Expressions: Is there an AND operator?
Regex: Find a character anywhere in a document but only on lines that begin with a specific word
and pawed through sites such as http://regular-expressions.info, but I still can't get an expression that works (the best I can get is a lengthy version of ^[^#].*).
So how can I match digits (or text, or whatever) in a string, but only on lines that don't start with a certain character?
Your regex ^[^#].* uses a negated character class which matches not a # from the start of the string ^ and after that matches any character zero or more times.
This would for example also match t test
What you might do is use an alternation to match a whole line ^#.*$ that starts with a # or capture in a group one or more digits (\d+)
Your digits are captured in group 1. You could change the (\d+) to for example a character class ([\w+.]+) to match more than only digits.
(?:^#.*$|(\d+))
Details
(?: Non capturing group
^#.*$ Match from the start of the line ^ a # followed by any character zero or more times .* until the end of the line $ (this requires multiline mode, so that ^ and $ match at line boundaries)
| Or
(\d+) capture one or more digits in a group
) Close non capturing group
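A minimal sketch of this approach in Python (the language is an assumption, since the question does not name one): comment lines are matched and discarded as a whole, and only matches where group 1 took part are kept. `doc` is a shortened version of the sample document.

```python
import re

doc = """# Document ID 8934
# Last updated 2018-05-06
52 84 12 70 23 2 7 20 1 5
4 2 7 81 32 98 2 0 77 6"""

# MULTILINE makes ^ and $ match at every line boundary.
pattern = re.compile(r"(?:^#.*$|(\d+))", re.MULTILINE)

# A comment line matches the first alternative and leaves group 1 unset,
# so it is filtered out here.
numbers = [m.group(1) for m in pattern.finditer(doc) if m.group(1) is not None]

print(numbers)
```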
I think a much simpler method would be to first replace the comment lines with "" using this regex:
^#.*
And then you can just match all the numbers with this:
-?\d+ (the -? allows for negative numbers)
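The two steps could look like this in Python (a sketch; the sample text, including the negative number, is made up for the example):

```python
import re

# Hypothetical input with one negative value added for illustration.
doc = """# Document ID 8934
# Last updated 2018-05-06
52 84 -12
4 2 7"""

# Step 1: blank out every line that starts with #.
without_comments = re.sub(r"^#.*", "", doc, flags=re.MULTILINE)

# Step 2: collect the remaining (optionally negative) integers.
values = [int(n) for n in re.findall(r"-?\d+", without_comments)]

print(values)  # [52, 84, -12, 4, 2, 7]
```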

How can I replace this expression in chain regex (notepad++)?

I have this text:
14 two 25 three 12 four 40 five 10
I want to obtain "14 two 14 25 three 14 25 12 four 14 25 12 40 five 14 25 12 40 10"
For example, when I replace (14 two ) with (14 two 14 ), the next search starts after the inserted 14; I can't make it start after two.
Is there another way to do this?
For example, could I use a group that is not included in the match (a group before the match) in the replacement?
Please help me.
This should do the trick for you:
Regex: ((?:\s?\d+\s?)+)((?:[a-zA-Z](?![^a-zA-Z]+\1))+)
Replacement: $1$2 $1
You will need to click the "Replace All" button repeatedly for this to work: it cannot be done in one shot and has to be repeated for as long as a match is found (online PHP example).
Explanation:
\s: Match a single space character
?: the previous expression must be matched 0 or 1 time.
\s?: Match a space character 0 or 1 time.
\d: Match a digit character (the equivalent of [0-9]).
+: The previous expression must be matched at least one time (up to infinity).
\d+: Match as many digit characters as possible (but at least one).
(): Capture group
(?:): Non-capturing group
((?:\s?\d+\s?)+): Match an optional space character followed by one or more digit characters followed by an optional space character. The expression is wrapped in a non-capturing group followed by a plus. That means the regex will try to match as many combinations of spaces and digits as it can (so you can end up with something like '14 25 12 40').
The capture group is meant to keep the value so it can be reused in the replacement. You cannot simply put the plus at the end of the capture group without the non-capturing group inside it, because it would then only remember the last digits captured ('12' instead of the whole '14 25 12' used to build '14 25 12 40').
[a-zA-Z]: Match any English letters in any case (lower, upper).
\1: Reference to what has been captured in the first group.
(?!): Negative lookahead.
[^]: Negated character class, so [^a-zA-Z] means match anything that is not an English letter.
((?:[a-zA-Z](?![^a-zA-Z]+\1))+): The negative lookahead is meant to make sure that we don't always end up matching the first "14 two" in the input text. Without it, we would end up in an infinite loop giving results such as "14 two 14 14 14 14 14 14 25 three 12 four 40 five 10" (the "14" before "25" being repeated until you reach the timeout).
Basically, for every English letter we match, we look ahead to assert that the content of the first capture group (for example "14") is not present in our digit sequence.
For the replacement, $1$2 $1 means put the content of the capture group 1 and 2, add a space and put the content of the capture group 1 once more.
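To cross-check the result outside Notepad++, the target text from the question can also be produced in a single pass with a replacement callback. This is a different technique from the repeated regex replace described above, sketched in Python; `chain_numbers` and `expand` are hypothetical names.

```python
import re

def chain_numbers(text: str) -> str:
    """Replace each number with all numbers seen so far, in order."""
    seen = []  # numbers encountered so far, as strings

    def expand(match: re.Match) -> str:
        seen.append(match.group(0))
        return " ".join(seen)

    return re.sub(r"\d+", expand, text)

source = "14 two 25 three 12 four 40 five 10"
print(chain_numbers(source))
```

The callback keeps the running list in `seen`, so each number is expanded exactly once and no repeated passes are needed.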