We receive a file from a customer from which we need to read some values and save them into our ERP system.
The customer sends us a date in a week format like 201814, which means the 14th week of the year 2018.
The date never appears in the same place in the file, so I think the only way to get it is to search the file contents with a regex.
My regex should probably satisfy the following conditions:
the length of the string is always 6 characters
all characters are numeric
the string always starts with 20
the last two digits have to be between 01 and 53
What would the right regex for this be? There are many other numeric-only values in the file, which is why I need to be so specific.
I know I can express a length condition like {1,6} and that [0-9] matches all digits from zero to nine, but I can't see how to restrict the last two digits to 01 through 53.
Can someone help me with my regex? Thanks a lot!
You may try this:
\b20\d{2}(?:0[1-9]|[1-4]\d|5[0-3])(?!\d)
Explanation:
\b word boundary, so the match cannot start in the middle of a longer number or word
20 literally; the match must start with 20
\d{2} followed by any two digits
(?: non capturing group starts here
0[1-9] means 01 to 09
or
[1-4]\d means 10 to 49
or
5[0-3] means 50-53
) end of non capturing group
(?!\d) negative lookahead to ensure the match is not followed by another digit.
The regex is built so that you do not need to check the length separately: if the string is not exactly 6 digits, the conditions above cannot all be met.
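As a rough illustration of the pattern above, here is a minimal Python sketch. The function name `find_year_weeks` and the sample file content are made up for this example; only the regex comes from the answer.

```python
import re

# Pattern from the answer above: "20", two more year digits, then a week 01-53.
WEEK_PATTERN = re.compile(r"\b20\d{2}(?:0[1-9]|[1-4]\d|5[0-3])(?!\d)")

def find_year_weeks(text):
    """Return every YYYYWW token found in text (hypothetical helper)."""
    return WEEK_PATTERN.findall(text)

# Invented sample line: only 201814 satisfies all four conditions.
sample = "ORDER 4711 QTY 120000 WEEK 201814 REF 2018145 ALT 201854 ID 201800"
print(find_year_weeks(sample))  # ['201814']
```

Here 120000 fails the word boundary, 2018145 fails the lookahead, and 201854 and 201800 fail the week range.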
Use this: (20)\d{2}([1-4][0-9]|[0][1-9]|[5][0-3])
I'm trying to validate a user-entered video timestamp in the format hours:minutes:seconds using a regex. I'm assuming the hours component can be arbitrarily long, so the final format is basically hhhh:mm:ss where there can be any number of h. This is what I have so far
(([0-9]+:)?([0-5][0-9]:))?([0-5][0-9])
where
(
([0-9]+:)? # hh: optionally with an arbitrary number of h
([0-5][0-9]:) # mm: , with mm from 00 to 59
)? # hh:mm optionally
([0-5][0-9]) # ss , with ss from 00 to 59
which I believe is almost there, but doesn't handle cases like 1:31 or just 1. So to account for this if I add the first digit inside the mm and ss blocks as optional,
(([0-9]+:)?([0-5]?[0-9]:))?([0-5]?[0-9])
then firstly the last seconds block starts matching values like 111. Values like 1:1:12 are also matched, which I don't want (it should be 1:01:12). So how can I modify this so that m:ss and s are valid, whereas h:m:ss, m:s and sss are not?
I am new to regular expressions, so apologies in advance if I'm doing something stupid. Any help is appreciated. Thanks.
You can match either 1 or more digits followed by an optional :mm:ss part, or match mm:ss.
To also match 6:12 and not 1:1:12 make only the first digit optional in the second part of the pattern.
^(?:\d+(?::[0-5][0-9]:[0-5][0-9])?|[0-5]?[0-9]:[0-5][0-9])$
^ Start of string
(?: Non capture group
\d+ Match 1+ digits
(?::[0-5][0-9]:[0-5][0-9])? Match an optional :mm:ss part, both in range 00 - 59
| Or
[0-5]?[0-9]:[0-5][0-9] Match m:ss or mm:ss both in range 00-59 where the first m is optional
) Close non capture group
$ End of string
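To see how the pattern above behaves on the cases from the question, here is a small Python sketch. The helper name `is_valid_timestamp` and the list of candidates are chosen for illustration only.

```python
import re

# Pattern from the answer above, unchanged.
TIMESTAMP = re.compile(
    r"^(?:\d+(?::[0-5][0-9]:[0-5][0-9])?|[0-5]?[0-9]:[0-5][0-9])$"
)

def is_valid_timestamp(value):
    """Return True if value matches the timestamp pattern (hypothetical helper)."""
    return TIMESTAMP.match(value) is not None

# Candidates taken from, or modelled on, the cases discussed in the question.
for candidate in ["1", "1:31", "12:05", "1:01:12", "1:1:12", "1:5", "61:00", "111"]:
    print(candidate, is_valid_timestamp(candidate))
```

Note that a bare run of digits such as 111 is accepted by the \d+ branch, even though the question wanted sss to be rejected; if that matters, the first branch would need to require the :mm:ss part once there is more than one or two digits.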
Doesn't adding the positional anchors (^ and $) solve your problem?
^(([0-9]+:)?([0-5][0-9]:))?([0-5][0-9])$
Check here: https://regex101.com/r/fRZf2R/1
I have the following string:
"Thu Dec 31 22:00:00 UYST 2009"
I want to replace everything except for the hours and minutes so I get the following result:
"22:00"
I am using this regex :
(^([0-9][0-9]:[0-9][0-9]))
But it's not matching anything.
This would be my line of actual code :
println("Thu Dec 31 22:00:00 UYST 2009".replace("(^([0-9][0-9]:[0-9][0-9]))".toRegex(),""))
Can someone help me to correct the regex?
The reason the one you have isn't working is that you are asserting that the line starts right before the hours and minutes, which isn't the case. This can be fixed by removing the assertion (^).
If you need the assertion to remain, there is another way. In most languages, you wouldn't be able to use a variable-length positive lookbehind here, but lucky for you, it looks like you can in Kotlin.
A positive lookbehind tells the pattern "this must come before what I'm looking for". It is written as a group beginning with ?<=. In this case you can use something like (?<=^[\w ]+), which requires that only word characters or spaces lie between the beginning of the line and the point where the rest of the pattern matches. Prepending it to your expression gives (?<=^[\w ]+)([0-9][0-9]:[0-9][0-9]) (note that you will have to escape the backslash in \w when writing it in an ordinary string literal).
Side note, Yogesh_D is correct in saying that \d\d:\d\d is the same as your [0-9][0-9]:[0-9][0-9]. Using this, it would look more like (?<=^[\w ]+)\d\d:\d\d.
You may use various solutions, here are two:
val text = """Thu Dec 31 22:00:00 UYST 2009"""
val match = """\b(?:0?[1-9]|1\d|2[0-3]):[0-5]\d\b""".toRegex().find(text)
println(match?.value)
val match2 = """\b(\d{1,2}:\d{2}):\d{2}\b""".toRegex().find(text)
println(match2?.groupValues?.getOrNull(1))
Both print 22:00.
The regex complexity should be selected based on how messy the input string is.
Details
\b - a word boundary
(?:0?[1-9]|1\d|2[0-3]) - an optional zero and then a non-zero digit, or 1 and any digit, or 2 and a digit from 0 to 3 (that is, hours 1 to 23; an hour of 00 is not matched)
: - a : char
[0-5]\d - 0, 1, 2, 3, 4 or 5 and then any one digit
\b - a word boundary.
If there is a match with this regex, you get it as a whole match, so you can access it via match?.value.
If you do not have to worry about any pre-validation when matching, you may simply match 3 colon-separated digit pairs and capture the first two, see the second regex:
\b - a word boundary
(\d{1,2}:\d{2}) - Group 1: one or two digits, : and two digits
:\d{2} - a : and two digits (not captured)
\b - a word boundary.
If there is a match, we need Group 1 value, hence match2?.groupValues?.getOrNull(1) is used.
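For readers not using Kotlin, the same two patterns can be tried in Python. This is only a sketch of the equivalent calls; the function names `extract_hh_mm_strict` and `extract_hh_mm_loose` are invented for this example.

```python
import re

def extract_hh_mm_strict(text):
    """Regex #1: check hour and minute ranges while matching; use the whole match."""
    match = re.search(r"\b(?:0?[1-9]|1\d|2[0-3]):[0-5]\d\b", text)
    return match.group(0) if match else None

def extract_hh_mm_loose(text):
    """Regex #2: match hh:mm:ss loosely and keep only the captured hh:mm part."""
    match = re.search(r"\b(\d{1,2}:\d{2}):\d{2}\b", text)
    return match.group(1) if match else None

text = "Thu Dec 31 22:00:00 UYST 2009"
print(extract_hh_mm_strict(text))  # 22:00
print(extract_hh_mm_loose(text))   # 22:00
```

Both helpers return None when nothing matches, which mirrors the null-safe access (`?.`) used in the Kotlin version.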
I am not sure what language you are using, but why use negation when you can directly match the first hh:mm occurrence?
This assumes that the date string always contains the time in hh:mm format.
With this regex snippet, the first group should match the hh:mm.
https://regex101.com/r/aHdehZ/1
The regex to use is (\d\d:\d\d)
I've got a document that looks something like this:
# Document ID 8934
# Last updated 2018-05-06
52 84 12 70 23 2 7 20 1 5
4 2 7 81 32 98 2 0 77 6
(..and so on..)
In other words, it starts off with a few comment lines, then the rest of the document is just a bunch of numbers separated by spaces.
I'm trying to write a regex that gets all digits on all lines that don't start with #, but I can't seem to get it.
I've read over answers such as
Regular Expressions: Is there an AND operator?
Regex: Find a character anywhere in a document but only on lines that begin with a specific word
and pawed through sites such as http://regular-expressions.info, but I still can't get an expression that works (the best I can get is a lengthy version of ^[^#].*).
So how can I match digits (or text, or whatever) in a string, but only on lines that don't start with a certain character?
Your regex ^[^#].* uses a negated character class: from the start of the string ^ it matches one character that is not a #, and after that any character zero or more times.
This would, for example, also match t test
What you might do instead is use an alternation: either match a whole line that starts with a # using ^#.*$, or capture one or more digits in a group using (\d+).
Your digits are then captured in group 1. You could change (\d+) to a character class such as ([\w+.]+) to match more than only digits.
(?:^#.*$|(\d+))
Details
(?: Non capturing group
^#.*$ Match from the start of the line ^ a # followed by any character zero or more times .* until the end of the line $ (this needs multiline mode so that ^ and $ match at line breaks)
| Or
(\d+) capture one or more digits in a group
) Close non capturing group
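The alternation above can be sketched in Python as follows. The helper name `digits_outside_comments` is made up, and `re.MULTILINE` is what makes ^ and $ work per line; matches that consumed a comment line leave group 1 empty and are skipped.

```python
import re

# Either consume a whole comment line, or capture a run of digits in group 1.
PATTERN = re.compile(r"^#.*$|(\d+)", re.MULTILINE)

def digits_outside_comments(document):
    """Return the digit runs found on lines that do not start with '#'."""
    return [m.group(1) for m in PATTERN.finditer(document) if m.group(1) is not None]

document = """# Document ID 8934
# Last updated 2018-05-06
52 84 12 70 23 2 7 20 1 5
4 2 7 81 32 98 2 0 77 6
"""
print(digits_outside_comments(document))
```

The numbers in the comment lines (8934, 2018, 05, 06) never reach group 1 because the first alternative swallows those lines whole.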
I think a way simpler method would be to replace the lines with "" first with this regex:
^#.*
And then you can just match all the numbers with this:
-?\d+ (-? is for negative)
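A minimal Python sketch of this two-step idea, assuming the whole document is available as one string (the function name `numbers_after_stripping_comments` is hypothetical):

```python
import re

def numbers_after_stripping_comments(document):
    """Blank out comment lines first, then collect every (possibly negative) integer."""
    # Step 1: replace each line starting with '#' by an empty string.
    without_comments = re.sub(r"^#.*", "", document, flags=re.MULTILINE)
    # Step 2: match all remaining numbers, allowing a leading minus sign.
    return [int(n) for n in re.findall(r"-?\d+", without_comments)]

print(numbers_after_stripping_comments("# Document ID 8934\n52 -84 12\n4 2 7\n"))
# [52, -84, 12, 4, 2, 7]
```

Stripping the comments first matters here, because otherwise a date such as 2018-05-06 in a comment would be read as 2018, -5 and -6.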
String = '11111111111110000000000000000000110000000000000011111111111111111111111111111111110011111111111110000011110000011111111111110000000000011111111111111111010001111111111111111111110011111111111111111111111111110111112111121111111111111111111000011000001011111111111101022111101111001111111111110000001000000111111111111111000000000000011111111111111100011111111001011111111100000000000000000000000000000000100111001000000000000000000011000000000000001111111000000000000000000000000000000000001111100000000000000000000011000000000000000000000010000000000333333333'
I want a pattern that takes the 10 characters after the first 100 (positions 100 to 110), and then I want to check whether that 10-character string contains 4 zeros in a row.
How can I do this with only Regex? I have been using substring before.
You could use this:
^.{100}(?=.{0,6}0000)(.{10})
Explanation:
^: matches the start of the string, so that the pattern cannot match elsewhere in the input
.{100}: match 100 characters
(?= ): look ahead. This does not consume any characters, but just verifies something that is still ahead.
.{0,6}: 0 to 6 characters
0000: literally 4 zeroes
(.{10}): 10 characters, this time they are captured and can be referenced back with \1 or $1 depending on the flavour of regex.
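Here is a small Python sketch of that pattern. The helper name `window_with_four_zeros` and both test strings are invented; they are built so that the zeros fall either inside or just outside the 10-character window.

```python
import re

# Pattern from the answer above: skip 100 characters, require "0000" to start
# within the first 7 positions of the next 10, then capture those 10 characters.
PATTERN = re.compile(r"^.{100}(?=.{0,6}0000)(.{10})")

def window_with_four_zeros(s):
    """Return characters 100-109 if they contain '0000', otherwise None."""
    match = PATTERN.match(s)
    return match.group(1) if match else None

inside = "1" * 100 + "1100001111" + "1" * 20   # zeros fully inside the window
straddling = "1" * 100 + "1111111000" + "0111"  # fourth zero lies outside the window

print(window_with_four_zeros(inside))      # 1100001111
print(window_with_four_zeros(straddling))  # None
```

The .{0,6} limit is what keeps a run of zeros that starts at offset 7 or later from counting, since it could not fit inside 10 characters.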
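The above answer works, but its overall match includes the first 100 characters.
To leave the first 100 characters out of the match, we can use a lookbehind anchored to the start of the string:
(?<=^.{100})
To check for the required pattern only in the 10 characters after the first 100, we can use
(?<=^.{100})(?=.{0,6}0000)(.{10})
The ^ inside the lookbehind matters: without it, the lookbehind is satisfied at any position that has at least 100 characters before it, so a later window could match.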
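A short Python sketch of the lookbehind variant follows. It assumes the lookbehind is anchored with ^, because an unanchored (?<=.{100}) is satisfied at every position from 100 onwards; the names and the sample string are made up to show the difference.

```python
import re

# Anchored: the window must start exactly at offset 100.
ANCHORED = re.compile(r"(?<=^.{100})(?=.{0,6}0000)(.{10})")
# Unanchored: any offset with at least 100 characters before it qualifies.
UNANCHORED = re.compile(r"(?<=.{100})(?=.{0,6}0000)(.{10})")

def find_window(pattern, s):
    """Return (start, text) of the first match of pattern in s, or None."""
    match = pattern.search(s)
    return (match.start(), match.group(0)) if match else None

# Characters 100-109 are all ones; the zeros only begin at offset 110.
late_zeros = "1" * 110 + "0000" + "1" * 6

print(find_window(ANCHORED, late_zeros))    # None
print(find_window(UNANCHORED, late_zeros))  # (104, '1111110000')
```

With the lookbehind, the overall match is just the 10-character window, so group 0 can be used directly instead of group 1.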
I have this text:
14 two 25 three 12 four 40 five 10
I want to obtain "14 two 14 25 three 14 25 12 four 14 25 12 40 five 14 25 12 40 10"
For example, when I replace (14 two ) with (14 two 14 ), the next search starts after the inserted 14; I can't make it start after two.
Is there any alternative?
For example, could I use a group that is not included in the match (a group before the match) in the replacement?
Please help.
This should do the trick for you:
Regex: ((?:\s?\d+\s?)+)((?:[a-zA-Z](?![^a-zA-Z]+\1))+)
Replacement: $1$2 $1
You will need to click the "replace all" button repeatedly for this to work: it cannot be done in one shot and has to be repeated for as long as a match is found. (Online PHP example)
Explanation:
\s: Match a single space character
?: the previous expression must be matched 0 or 1 time.
\s?: Match a space character 0 or 1 time.
\d: Match a digit character (the equivalent of [0-9]).
+: The previous expression must be matched at least one time (up to infinite).
\d+: Match as many digit characters as possible (but at least one).
(): Capture group
(?:): Non-capturing group
((?:\s?\d+\s?)+): Match an optional space character followed by one or more digit characters followed by an optional space character. The expression is wrapped in a non-capturing group followed by a plus. That means the regex will try to match as many combinations of space and digit characters as it can (so you can end up with something like '14 25 12 40').
The capture group keeps the value so it can be reused in the replacement. You cannot simply put the plus on the capture group without the inner non-capturing group, because the group would then only remember the last repetition ('12' instead of the whole '14 25 12' used to build '14 25 12 40').
[a-zA-Z]: Match any English letters in any case (lower, upper).
\1: reference to what has been captured in the first group.
(?!): Negative lookahead.
[^]: Negated character class, so [^a-zA-Z] means match anything that is not an English letter.
((?:[a-zA-Z](?![^a-zA-Z]+\1))+): The negative lookahead is meant to make sure that we don't always end up matching the first "14 two" in the input text. Without it, we would end up in an infinite loop giving results as "14 two 14 14 14 14 14 14 25 three 12 four 40 five 10" (the "14" before "25" being repeated until you reach the timeout).
Basically, for every English letter we match, we look ahead to assert that the content of the first capture group (for example "14") is not present in our digit sequence.
For the replacement, $1$2 $1 means put the content of the capture group 1 and 2, add a space and put the content of the capture group 1 once more.
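If the replacement can be done in code rather than in an editor, an alternative that avoids repeated "replace all" passes is a replacement callback. The sketch below is a different technique from the regex-only approach above, not a translation of it: it matches each number once and substitutes the running list of numbers seen so far. The function name `accumulate_numbers` is made up.

```python
import re

def accumulate_numbers(text):
    """Replace each number with all numbers seen so far, in order."""
    seen = []

    def expand(match):
        # Remember this number, then emit the whole running sequence.
        seen.append(match.group(0))
        return " ".join(seen)

    return re.sub(r"\d+", expand, text)

print(accumulate_numbers("14 two 25 three 12 four 40 five 10"))
# 14 two 14 25 three 14 25 12 four 14 25 12 40 five 14 25 12 40 10
```

Because the state lives in the callback, a single pass over the input is enough and there is no risk of re-matching text that was just inserted.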