I keep getting into situations where I end up making two regular expressions to find subtle variations (such as one script for 0-9 and another for 10-99 because of the extra digit).
I usually use [0-9] to find strings with just one digit and then [0-9][0-9] to find strings with multiple digits. Is there a better wildcard for this?
For example, what expression would I use to simultaneously find the strings
6:45 AM and 10:52 PM
You can specify repetition with curly braces. [0-9]{2,5} matches two to five digits. So you could use [0-9]{1,2} to match one or two.
[0-9]{1,2}:[0-9]{2} (AM|PM)
I personally prefer to use \d for digits, thus
\d{1,2}:\d{2} (AM|PM)
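As a minimal sketch of the {1,2} quantifier in use, the Python snippet below applies the same pattern to a made-up sentence (the sample text and the name TIME_RE are assumptions for illustration). The group is written as non-capturing so that findall returns the whole match.

```python
import re

# One or two digits, a colon, exactly two digits, a space, then AM or PM.
TIME_RE = re.compile(r"\d{1,2}:\d{2} (?:AM|PM)")

# Hypothetical sample text containing both times from the question.
text = "Breakfast at 6:45 AM, lights out at 10:52 PM."
matches = TIME_RE.findall(text)
print(matches)  # ['6:45 AM', '10:52 PM']
```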
[0-9] 1 or 2 times followed by : followed by 2 [0-9]:
[0-9]{1,2}:[0-9]{2}\s(AM|PM)
or to be valid time:
(?:[1-9]|1[0-2]):[0-5][0-9]\s(?:AM|PM)
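A small Python check of the stricter idea, assuming hours 1-12 and minutes 00-59 are what "valid" means here. The helper name is_valid_time and the candidate strings are made up for illustration.

```python
import re

# Hours 1-12, minutes 00-59, one whitespace character, then AM or PM.
VALID_TIME_RE = re.compile(r"(?:[1-9]|1[0-2]):[0-5][0-9]\s(?:AM|PM)")


def is_valid_time(value: str) -> bool:
    # fullmatch requires the entire string to match, so no anchors are needed.
    return VALID_TIME_RE.fullmatch(value) is not None


candidates = ["6:45 AM", "10:52 PM", "13:00 PM", "6:75 AM"]
results = {c: is_valid_time(c) for c in candidates}
print(results)
```

Using fullmatch instead of search keeps strings such as "110:52 PM" from being accepted on the strength of a valid substring.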
If you are looking for a time pattern, you'd do something like:
\d{1,2}:\d{1,2} (AM|PM)
Or, for a more specific time regex:
[0-1]?[0-9]:[0-5][0-9] (AM|PM)
Much like the other answers, except the AM/PM is not captured, which should be more efficient
\d{1,2}:\d{1,2}\s(?:AM|PM)
if I have a file containing:
1 ABC
2 123XYZ
3 6:45 AM
4 123DHD
5 ABC
6 10:52 PM
7 CDE
and run the following
$>grep -P '6:45\sAM|10:52\sPM' temp
6:45 AM
10:52 PM
$>.
should do the trick (-P tells grep to use Perl-compatible regular expressions)
EDIT:
Perhaps I misunderstood. The other answers are very good if you are looking to find any time, but you seem to be after specific times; the others would match ANY time in HH:MM format.
Overall, I believe the items you are after are the | pipe character, which is used here to allow alternative phrases, and {n,m}, which matches n to m times ({1,2} would match 1-2 times, etc.).
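For comparison, here is a rough Python equivalent of the grep call, assuming the leading numbers in the file listing are line numbers rather than file content. The list and variable names are illustrative only.

```python
import re

# File contents from the example, assumed to exclude the line numbers.
lines = ["ABC", "123XYZ", "6:45 AM", "123DHD", "ABC", "10:52 PM", "CDE"]

# The | alternation matches either of the two specific times.
SPECIFIC_RE = re.compile(r"6:45\sAM|10:52\sPM")

hits = [line for line in lines if SPECIFIC_RE.search(line)]
print(hits)  # ['6:45 AM', '10:52 PM']
```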
This can check several time formats, e.g. 12:05PM, 3:19AM, 04:25PM.
Note that the hour part [01]?[0-9] accepts 0 through 19, so a value like 23:52PM is rejected.
my $time = "12:52AM";
if ($time =~ /^[01]?[0-9]\:[0-5][0-9](AM|PM)/) {
print "Right Time Dude...";
}
else { print "Wrong time Dude"; }
This is the regex you want.
/^[01]?[0-9]\:[0-5][0-9](AM|PM)/
Having this string as input:
Sat, 6 May 2017 02:08:08 +0000
I did this regEx to get combinations of one or two digits:
[0-9]*:[0-9]*:[0-9]*
Let's say that we have this text:
2020-09-29
2020-09-30
2020-10-01
2020-10-02
2020-10-12
2020-10-16
2020-11-12
2020-11-23
2020-11-15
2020-12-01
2020-12-11
2020-12-30
I want to do something like this:
\d\d\d\d-(NOT10)-(30)
So I want to get all dates of any year, but not of the 10th month, and it is important that the day is 30.
I tried a lot to do this using negative lookahead assertions, but I did not come up with any working regexes.
You can use negative lookaheads:
\d\d\d\d-(?!10)\d\d-30
The part (?!10) ensures that no 10 follows at the point where it is inserted into the regex. Notice that you still need to match the following digits afterwards, thus the \d\d part.
Generally speaking, you cannot (to my knowledge) negate a part that then also matches parts of the string. But with negative lookaheads you can simulate this, as I did above. The generalized idea looks something like:
(?!<special-exclusion-pattern>)<general-inclusion-pattern>
Where the special-exclusion-pattern matches a subset of the general-inclusion-pattern. In the above case the general inclusion pattern is \d\d and the special exclusion pattern is 10.
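A short Python sketch of the exclusion-then-inclusion idea. The date list is a trimmed version of the question's data with one hypothetical entry, 2020-10-30, added so that the lookahead has something to reject.

```python
import re

# 2020-10-30 is not in the original list; it is added here to show the exclusion.
dates = """2020-09-30
2020-10-01
2020-10-30
2020-11-23
2020-12-30"""

# (?!10) rejects month 10 without consuming it; \d{2} then matches the month.
pattern = re.compile(r"\d{4}-(?!10)\d{2}-30")

kept = pattern.findall(dates)
print(kept)  # ['2020-09-30', '2020-12-30']
```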
Try :
/20\d{2}-(?:0[1-9]|1[12])-30/
Explanation :
20\d{2} it will match 20XX
(?:0[1-9]|1[12]) it will match 0X or 11, 12
30 it will match 30
Demo :https://regex101.com/r/O2F1eV/1
It's easiest to simply convert the substring (if present) that matches /^\d{4}-10-30$/ to an empty string, then split the resulting string on one or more newlines.
If your string were
2020-10-16
2020-10-30
2020-11-12
2020-11-23
and was held by the variable str, then in Ruby, for example,
str.sub(/^\d{4}-10-30$/,'')
#=> "2020-10-16\n\n2020-11-12\n2020-11-23\n"
so
str.sub(/^\d{4}-10-30$/,'').split
#=> ["2020-10-16", "2020-11-12", "2020-11-23"]
Whatever language you are using undoubtedly has similar methods.
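As one example of such similar methods, a Python version might look like the sketch below. It assumes the same four-line string; re.MULTILINE is needed because, unlike Ruby, Python's ^ and $ only match at line boundaries when that flag is set.

```python
import re

text = "2020-10-16\n2020-10-30\n2020-11-12\n2020-11-23\n"

# count=1 mirrors Ruby's sub, which replaces only the first occurrence.
cleaned = re.sub(r"^\d{4}-10-30$", "", text, count=1, flags=re.MULTILINE)

# split() with no arguments splits on runs of whitespace, including newlines.
remaining = cleaned.split()
print(remaining)  # ['2020-10-16', '2020-11-12', '2020-11-23']
```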
Regex beginner here. I've been trying to tackle this rule for phone numbers to no avail and would appreciate some advice:
Minimum 6 characters
Maximum 20 characters
Must contain numbers
Can contain these symbols ()+-.
Do not match if all the numbers included are the same (ie. 111111)
I managed to build two of the following pieces but I'm unable to put them together.
Here's what I've got:
(^(\d)(?!\1+$)\d)
([0-9()-+.,]{6,20})
Many thanks in advance!
I'd go about it by first getting a list of all possible phone numbers (thanks @CAustin for the suggested improvements):
lst_phone_numbers = re.findall(r'[0-9+()-]{6,20}', your_text)
And then filtering out the ones that do not comply with rule 5 using whatever programming language you're most comfortable with.
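A possible sketch of that two-step approach in Python. The function names, the sample sentence, and the inclusion of "." in the character class are assumptions for illustration, not part of the answer above.

```python
import re

# Step 1: candidates of 6-20 characters drawn from digits and ( ) + . -
CANDIDATE_RE = re.compile(r"[0-9()+.-]{6,20}")


def has_varied_digits(candidate: str) -> bool:
    # Strip everything that is not a digit, then require two distinct digits.
    digits = re.sub(r"\D", "", candidate)
    return len(set(digits)) > 1


def find_phone_numbers(text: str) -> list:
    # Step 2: drop candidates whose digits are all the same (rule 5).
    return [c for c in CANDIDATE_RE.findall(text) if has_varied_digits(c)]


# Hypothetical input text.
sample = "Call +44(20)7946-0958 or 111111 or 555-5555 today"
numbers = find_phone_numbers(sample)
print(numbers)  # ['+44(20)7946-0958']
```

Requiring at least two distinct digits also covers the "must contain numbers" rule, since a candidate made only of symbols has no digits at all.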
Try this RegEx:
(?:([\d()+-])(?!\1+$)){6,20}
Explained:
1. (?: creates a non-capturing group
2. ([\d()+-]) creates a group to match a digit, parenthesis, +, or -
3. (?!\1+$) rejects the match if the value captured in #2 repeats one or more times until the end of the string
4. {6,20} requires 6-20 matches from the non-capturing group in #1
Try this :
((?:([0-9()+\-])(?!\2{5})){6,20})
Here, the part (?!\2{5}) limits how many times a character may repeat in a row: a character followed by five more copies of itself (such as 222222) is rejected. I used 5 as an example; you can change it as you want.
I need a little help with Regex.
I want the regex to validate the following sentences:
fdsufgdsugfugh PCL 6
dfdagf PCL 11
fdsfds PCL6
fsfs PCL13
kl;klkPCL6
fdsgfdsPCL13
Some characters, then PCL, and then 6 or a greater number.
How can this be done?
I'd go with something like this:
^(.*)(PCL *)([6-9][0-9]*|[1-5][0-9]+)$
Meaning:
(.*) = some chars
(PCL *) = then PCL with optional whitespaces afterwards
([6-9][0-9]*|[1-5][0-9]+) = then 6 or a greater number
This one should suit your needs:
^.*PCL\s*(?:[6-9]|\d{2,})$
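To sanity-check this pattern, a quick Python test against the six sentences from the question might look like this. The rejected examples are made up for illustration.

```python
import re

PCL_RE = re.compile(r"^.*PCL\s*(?:[6-9]|\d{2,})$")

# The six sentences from the question.
samples = [
    "fdsufgdsugfugh PCL 6",
    "dfdagf PCL 11",
    "fdsfds PCL6",
    "fsfs PCL13",
    "kl;klkPCL6",
    "fdsgfdsPCL13",
]

# Hypothetical inputs that should be rejected.
rejected = ["abc PCL 5", "abc PCL", "abc PCL 6x"]

all_accepted = all(PCL_RE.match(s) for s in samples)
none_accepted = not any(PCL_RE.match(s) for s in rejected)
print(all_accepted, none_accepted)  # True True
```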
In bash:
EXPR='^[a-zA-Z]\+ *PCL *\([6-9]\|[0-9]\{2,\}\)'
Translated:
Line begins with at least 1 occurrence of a letter (upper or lower case)
Any amount of spaces, PCL, any amount of spaces
Either a number between 6 and 9, or a number with at least 2 digits
This expression used with something like grep "$EXPR" file.txt will output in stdout the lines that are valid.
This worked well for me. Reads logically too according to the way you described the matching
/[^PCL]+PCL\s*(?:[6-9]|[1-9]\d+)/
I have a text with some lines (200+) in this format:
10684 - The jackpot ? discuss Lev 3 --- ? ---
10755 - Garbage Heap ? discuss Lev 5 --- ? ---
I want to retrieve the first number (10684 or 10755) only if the number after "Lev" is greater than 3.
I'm able to get the first number with this regex: ([0-9]+) - but without the 'level' restriction.
How could this be done?
Thanks in advance.
(\d+) - .*?Lev (?:[4-9]|[1-9]\d+)
The first \d+ matches the leading number, as you have done.
The next .*? is a lazy quantifier, which consumes as few characters as possible; the expression that follows guides it to the right place. (A lazy quantifier is often more efficient here.)
The second group, (?:[4-9]|[1-9]\d+), matches either a single-digit number greater than 3 or a number with two or more digits and no leading zero.
Stack Overflow doesn't show my image properly, so use this link: http://regexr.com?36n5l
Regular expressions don't recognize numbers as numbers (only as strings). You can do this, though:
([0-9]+) - .*Lev (?:[4-9][^0-9]|[1-9][0-9]+)
Basically, we use the alternation operator (|) to accept only a single digit greater than 3 (enforced by checking that the following character is not a digit) or a multi-digit number not beginning with a zero.
In case that level number might be the end of the line, though, you might have to do this:
([0-9]+) - .*Lev (?:[4-9](?:[^0-9]|$)|[1-9][0-9]+)
(I'm assuming whatever regex engine you're using can't handle lookaround assertions. In the future, try to always include what language you're using when you're asking a regex question.)
Ah, I just read your edit that the number is always less than 10. Well, that's much easier then:
([0-9]+) - .*Lev [4-9]
A lookahead is really the best thing because it will leave just the number:
/\d+(?=.*Lev (0*[4-9]|[1-9]\d))/
A bit of Awk trickery:
awk -F '\? +discuss +Lev' '$2>3 { split($1,a,/ */); print a[1] }' file
In bash use this:
var=">3"
perl -lne '/(\d+) - .*Lev (\d+)/; print $1 if $2'"$var"
This is a good solution to be able to pass the condition by parameter.
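The same capture-then-compare idea can be sketched in Python, with the threshold passed as a parameter. The function name ids_above_level is an assumption; the two input lines are the ones from the question.

```python
import re

# Capture the leading number and the number after "Lev" as text.
LINE_RE = re.compile(r"(\d+) - .*Lev (\d+)")


def ids_above_level(lines, threshold=3):
    ids = []
    for line in lines:
        m = LINE_RE.search(line)
        # Convert the level to an int so the comparison is numeric.
        if m and int(m.group(2)) > threshold:
            ids.append(m.group(1))
    return ids


lines = [
    "10684 - The jackpot ? discuss Lev 3 --- ? ---",
    "10755 - Garbage Heap ? discuss Lev 5 --- ? ---",
]
selected = ids_above_level(lines)
print(selected)  # ['10755']
```

Doing the comparison outside the regex avoids encoding "greater than 3" as digit patterns, at the cost of a second step.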
Given a value I want to validate it to check if it is a valid year. My criteria is simple where the value should be an integer with 4 characters. I know this is not the best solution as it will not allow years before 1000 and will allow years such as 5000. This criteria is adequate for my current scenario.
What I came up with is
\d{4}$
While this works it also allows negative values.
How do I ensure that only positive integers are allowed?
Years from 1000 to 2999
^[12][0-9]{3}$
For 1900-2099
^(19|20)\d{2}$
You need to add a start anchor ^ as:
^\d{4}$
Your regex \d{4}$ will match strings that end with 4 digits. So input like -1234 will be accepted.
By adding the start anchor you match only those strings that begin and end with 4 digits, which effectively means they must contain only 4 digits.
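A brief Python illustration of the difference the start anchor makes. The test values are made up.

```python
import re

loose = re.compile(r"\d{4}$")    # only requires the string to END with 4 digits
strict = re.compile(r"^\d{4}$")  # requires the whole string to be 4 digits

values = ["1234", "-1234", "12345", "12a4"]

loose_ok = [v for v in values if loose.search(v)]
strict_ok = [v for v in values if strict.search(v)]

print(loose_ok)   # ['1234', '-1234', '12345']
print(strict_ok)  # ['1234']
```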
The "accepted" answer to this question is both incorrect and myopic.
It is incorrect in that it will match strings like 0001, which is not a valid year.
It is myopic in that it will not match any values above 9999. Have we already forgotten the lessons of Y2K? Instead, use the regular expression:
^[1-9]\d{3,}$
If you need to match years in the past, in addition to years in the future, you could use this regular expression to match any positive integer:
^[1-9]\d*$
Even if you don't expect dates from the past, you may want to use this regular expression anyway, just in case someone invents a time machine and wants to take your software back with them.
Note: This regular expression will match all years, including those before the year 1, since they are typically represented with a BC designation instead of a negative integer. Of course, this convention could change over the next few millennia, so your best option is to match any integer—positive or negative—with the following regular expression:
^-?[1-9]\d*$
This works for 1900 to 2099:
/(?:(?:19|20)[0-9]{2})/
Building on @r92's answer, for years 1970-2019:
(19[789]\d|20[01]\d)
To test a year in a string which contains other words along with the year you can use the following regex: \b\d{4}\b
In theory the 4 digit option is right. But in practice it might be better to have 1900-2099 range.
Additionally, it needs to be a non-capturing group. Many comments and answers propose a capturing group, which is not proper IMHO: it might work for matching, but when extracting matches it will also return the two-digit numbers (19 and 20) alongside the 4-digit numbers because of the parentheses.
This will work for exact matching using non-capturing groups:
(?:19|20)\d{2}
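How the group type affects extraction depends on the language. In Python, for instance, re.findall returns only the captured group when one is present, as this small sketch with a made-up sentence shows.

```python
import re

# Hypothetical sample text.
text = "Released in 1999, remastered in 2021."

# With a capturing group, findall returns just the group contents.
capturing = re.findall(r"(19|20)\d{2}", text)

# With a non-capturing group, findall returns the whole match.
non_capturing = re.findall(r"(?:19|20)\d{2}", text)

print(capturing)      # ['19', '20']
print(non_capturing)  # ['1999', '2021']
```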
Use:
^(19|[2-9][0-9])\d{2}$
for years 1900 - 9999.
No need to worry for 9999 and onwards - A.I. will be doing all programming by then !!! Hehehehe
You can test your regex at https://regex101.com/
Also, more info about non-capturing groups (mentioned in one of the comments above) here: http://www.manifold.net/doc/radian/why_do_non-capture_groups_exist_.htm
You can go with something like [^-]\d{4}$, which prevents a minus sign from appearing directly before your 4 digits (note that it requires some other character before them).
You can also use ^\d{4}$, with ^ to catch the beginning of the string. It depends on your scenario.
/^\d{4}$/
This will check if a string consists of only 4 numbers. In this scenario, to input a year 989, you can give 0989 instead.
You could convert your integer into a string. As the minus sign will not match the digits, you will have no negative years.
I use this regex in Java ^(0[1-9]|1[012])[/](0[1-9]|[12][0-9]|3[01])[/](19|[2-9][0-9])[0-9]{2}$
Works from 1900 to 9999
If you need to match YYYY or YYYYMMDD you can use:
^((?:(?:(?:(?:(?:[1-9]\d)(?:0[48]|[2468][048]|[13579][26])|(?:(?:[2468][048]|[13579][26])00))(?:0?2(?:29)))|(?:(?:[1-9]\d{3})(?:(?:(?:0?[13578]|1[02])(?:31))|(?:(?:0?[13-9]|1[0-2])(?:29|30))|(?:(?:0?[1-9])|(?:1[0-2]))(?:0?[1-9]|1\d|2[0-8])))))|(?:19|20)\d{2})$
You can also use this one.
([0-2][0-9]|3[0-1])\/(0[1-9]|1[0-2])\/(19[789]\d|20[01]\d)
In my case I wanted to match a string which ends with a year (4 digits) like this for example:
Oct 2020
Nov 2020
Dec 2020
Jan 2021
It'll return true with this one:
var sheetName = 'Jan 2021';
var yearRegex = new RegExp("\\b\\d{4}$");
var isMonthSheet = yearRegex.test(sheetName);
Logger.log('isMonthSheet = ' + isMonthSheet);
The code above is used in Apps Script.
Here's the link to test the Regex above: https://regex101.com/r/SzYQLN/1
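For readers working in Python, a comparable check might look like the sketch below. The sheet names are the ones listed above plus one hypothetical non-matching name. A raw string is used because in a plain Python string literal "\b" is a backspace character rather than a word boundary.

```python
import re

# r"..." keeps \b as a regex word boundary instead of a backspace character.
YEAR_END_RE = re.compile(r"\b\d{4}$")

# "Summary" is a made-up name added to show a non-match.
sheet_names = ["Oct 2020", "Nov 2020", "Dec 2020", "Jan 2021", "Summary"]

month_sheets = [name for name in sheet_names if YEAR_END_RE.search(name)]
print(month_sheets)  # ['Oct 2020', 'Nov 2020', 'Dec 2020', 'Jan 2021']
```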
You can try the following to capture valid year from a string:
.*(19\d{2}|20\d{2}).*
Works from 1950 to 2099, where the value is an integer with 4 digits:
^(?:19[5-9]\d|20\d{2})$