The general problem
I am trying to understand how to prevent some pattern from appearing directly before or after a sought-out pattern when writing regexes.
A more specific example
I'm looking for a regex that will match dates in the format YYMMDD ((([0-9]{2})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1]))) inside a long string while ignoring longer numeric sequences
it should be able to match:
text151124moretext
123text151124moretext
text151124
text151124moretext1944
151124
but should ignore:
text15112412moretext
(reason: it has 8 numbers instead of 6)
151324
(reason: it is not a valid date YYMMDD - there is no 13th month)
How can I make sure that a number with more than these 6 digits won't be picked up as a date, using one single regex (meaning that I would rather avoid preprocessing the string)?
I've thought of \D((19|20)([0-9]{2})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1]))\D but doesn't this mean that there has to be some character before and after?
I'm using bash 3.2 (ERE)
thanks!
Try:
#!/usr/bin/env bash
extract_date() {
local string="$1"
# Accept the start/end of the string as well as a non-digit around the 6 digits
local _date=$(echo "$string" | sed -E 's/(^|.*[^0-9])([0-9]{6})([^0-9].*|$)/\2/')
#date -d "$_date" &> /dev/null # for Linux
date -jf '%y%m%d' "$_date" &> /dev/null # for MacOS
if [ $? -eq 0 ]; then
echo "$_date"
else
return 1
fi
}
extract_date text15111224moretext # ignore n_digits > 6
extract_date text151125moretext # take
extract_date text151132 # ignore day 32
extract_date text151324moretext1944 # ignore month 13
extract_date text150931moretext1944 # ignore 31 Sept
extract_date 151126 # take
Output:
151125
151126
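The same idea (pull out a standalone six-digit run, then let a date parser reject impossible dates) can be sketched in Python. This is only an illustration, not part of the bash answer: the name extract_date is reused for convenience, the lookarounds (?<!\d) and (?!\d) are a Python feature that bash ERE lacks, and %y maps two-digit years to a century by Python's own rule.

```python
import re
from datetime import datetime

# Six digits with no digit directly before or after (Python-only lookarounds).
SIX_DIGITS = re.compile(r"(?<!\d)\d{6}(?!\d)")

def extract_date(text):
    """Return the first standalone YYMMDD run that is a real calendar date, else None."""
    for match in SIX_DIGITS.finditer(text):
        candidate = match.group(0)
        try:
            # strptime raises ValueError for month 13, day 32, 31 Sept, ...
            datetime.strptime(candidate, "%y%m%d")
        except ValueError:
            continue
        return candidate
    return None

for sample in ["text15111224moretext", "text151125moretext", "text151132",
               "text151324moretext1944", "text150931moretext1944", "151126"]:
    print(sample, "->", extract_date(sample))
```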
If your tokens are line-separated (i.e. there is only one token per line):
^[\D]*[\d]{6}([\D]*|[\D]+[\d]{1,6})$
Basically, this regex looks for:
Any number of non-digits at the beginning of the string;
Exactly 6 digits
Any number of non-digits until the end OR at least one non-digit and at least one digit (up to 6) to the end of the string
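This regex handles most of your given sample inputs. Note that it allows no digits before the date, so 123text151124moretext is not matched, and it does not check that the six digits form a valid date.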
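One quick way to check a pattern like this is to run it over the samples from the question. A minimal Python sketch (Python's re accepts \D and \d inside brackets; the sample lists are copied from the question, the variable and function names are mine):

```python
import re

# The line-anchored pattern from the answer, unchanged.
LINE_PATTERN = re.compile(r"^[\D]*[\d]{6}([\D]*|[\D]+[\d]{1,6})$")

should_match = ["text151124moretext", "123text151124moretext", "text151124",
                "text151124moretext1944", "151124"]
should_ignore = ["text15112412moretext", "151324"]

def matches(line):
    """True if the whole line fits the pattern."""
    return LINE_PATTERN.search(line) is not None

for line in should_match + should_ignore:
    print(f"{line!r}: {'match' if matches(line) else 'no match'}")
```

Run this way, the two samples that behave differently from the wish list are 123text151124moretext (rejected, since only non-digits may precede the date) and 151324 (accepted, since the month is not validated).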
You could use non-capturing groups to require a non-digit on either side of your date regex. I had success with this expression and your same test data.
(?:\D)([0-9]{2})(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])(?:\D)
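Since \D on each side requires an actual character there, one workaround that needs no lookarounds is to allow either a non-digit or the string boundary on each side. A hedged Python sketch of that idea follows. The alternations (^|[^0-9]) and ([^0-9]|$) are plain ERE, so they should carry over to bash 3.2's =~, but I have only reasoned about them in Python, and find_date is a made-up name.

```python
import re

# (start or non-digit) YY MM DD (non-digit or end); only ERE-style constructs.
DATE = re.compile(
    r"(^|[^0-9])([0-9]{2})(0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])([^0-9]|$)"
)

def find_date(text):
    """Return the first YYMMDD date not adjacent to other digits, else None."""
    m = DATE.search(text)
    if m is None:
        return None
    # Groups 2-4 are year, month, day; groups 1 and 5 are the boundaries.
    return m.group(2) + m.group(3) + m.group(4)

for sample in ["text151124moretext", "123text151124moretext", "text151124",
               "text151124moretext1944", "151124",
               "text15112412moretext", "151324"]:
    print(sample, "->", find_date(sample))
```

The boundary characters are consumed by the match, which is why the date is rebuilt from the inner groups rather than taken from the whole match.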
Related
I have following SQL result entries.
Result
---------
TW - 5657980 Due Date updated : to <strong>2017-08-13 10:21:00</strong> by <strong>System</strong>
TW - 5657980 Priority updated from <strong> Medium</strong> to <strong>Low</strong> by <strong>System</strong>
TW - 5657980 Material added: <strong>1000 : Cash in Bank - Operating (Old)/ QTY:2</strong> by <strong>System</strong>#9243
TW - 5657980 Labor added <strong>Kelsey Franks / 14:00 hours </strong> by <strong>System</strong>#65197
Now I am trying to extract a short description from this result and migrate it to another column in the same table.
Expected result
--------------
Due Date Updated
Priority Updated
Material Added
Labor Added
Ignore the first 13 characters. In most cases it ends with 'updated'; a few end with 'added'. It should be case insensitive.
Is there any way to get the expected result?
Solution with substring() using a regular expression. It skips the first 13 characters, then takes the string up to the first ' updated' or ' added', case-insensitive, with leading blank. Else NULL:
SELECT substring(result, '(?i)^.{13}(.*? (?:updated|added))')
FROM tbl;
The regexp explained:
(?i) .. meta-syntax to switch to case-insensitive matching
^ .. start of string
.{13} .. skip the first 13 characters
() .. capturing parenthesis (captures payload)
.*? .. any number of characters (non-greedy)
(?:) .. non-capturing parenthesis
(?:updated|added) .. 2 branches (string ends in 'updated' or 'added')
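To see what the pattern captures, here is a small Python sketch that applies the same regex to rows shaped like the ones in the question. Python's re is not PostgreSQL's regex engine, so treat this as a way to reason about the pattern, not as proof of what substring() returns; short_description is a hypothetical helper name.

```python
import re

# Same pattern as in the substring() call; (?i) makes it case-insensitive.
SHORT_DESC = re.compile(r"(?i)^.{13}(.*? (?:updated|added))")

def short_description(result):
    """Return the captured payload, or None (mirroring SQL NULL) if nothing matches."""
    m = SHORT_DESC.search(result)
    return m.group(1) if m else None

rows = [
    "TW - 5657980 Due Date updated : to <strong>2017-08-13 10:21:00</strong> by <strong>System</strong>",
    "TW - 5657980 Priority updated from <strong> Medium</strong> to <strong>Low</strong> by <strong>System</strong>",
    "TW - 5657980 Labor added <strong>Kelsey Franks / 14:00 hours </strong> by <strong>System</strong>#65197",
]
for row in rows:
    print(short_description(row))
```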
If we cannot rely on 13 leading characters, as you later commented, we need some other reliable definition instead. Your difficulty seems to lie with hazy requirements more than with the actual implementation.
Say, we are dealing with 1 or more non-digits, followed by 1 or more digits, a space and then the payload as defined above:
SELECT substring(result, '(?i)^\D+\d+ (.*? (?:updated|added))') ...
\d .. class shorthand for digits
\D .. non-digits, the opposite of \d
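The variable-prefix version can be tried the same way. In this Python sketch the second and third inputs are invented to show a different prefix length and a non-matching row; they are not from the question, and the helper name is hypothetical.

```python
import re

# Non-digits, then digits, then one space, then the payload.
PREFIXED = re.compile(r"(?i)^\D+\d+ (.*? (?:updated|added))")

def short_description_any_prefix(result):
    """Return the payload after a '<non-digits><digits> ' prefix, else None."""
    m = PREFIXED.search(result)
    return m.group(1) if m else None

print(short_description_any_prefix("TW - 5657980 Due Date updated : to ..."))
print(short_description_any_prefix("TICKET 42 Material added: ..."))  # invented row
print(short_description_any_prefix("no digits here updated"))         # invented row
```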
I'm trying to write a regex that checks whether a string contains 6 or more characters, including 1 or more special characters [^0-9a-zA-Z\s] and 1 or more of [0-9a-zA-Z].
Spent like 2h and not getting any closer :/
maybe this is of some help:
^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?!.*\s).{6,13}$
Password expression that requires one lower case letter, one upper case letter, one digit, 6-13 length, and no spaces.
Matches:
1agdH*$# | 1agdC*$# | 1agdB*$#
Non-Matches:
wyrn%#*&$# f | mbndkfh782 | BNfhjdhfjd&*)%#$)
This is based on the Regex Lib entry here
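For anyone who wants to poke at it, the lookahead pattern can be exercised in Python like this. It is a sketch using the matches and non-matches listed above; is_valid is a made-up name, and note that this pattern checks for a digit, a lowercase and an uppercase letter, not for a special character.

```python
import re

# Each (?=...) lookahead asserts one requirement without consuming text;
# (?!.*\s) forbids whitespace; .{6,13} enforces the length.
PASSWORD = re.compile(r"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?!.*\s).{6,13}$")

def is_valid(password):
    return PASSWORD.search(password) is not None

for candidate in ["1agdH*$#", "1agdC*$#", "wyrn%#*&$# f",
                  "mbndkfh782", "BNfhjdhfjd&*)%#$)"]:
    print(repr(candidate), is_valid(candidate))
```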
Taking the style of Hasson's answer . . .
grep -P '^(?=.*[^a-zA-Z0-9\s])(?=.*[a-zA-Z0-9])(?!.*\s).{6}'
6 or more chars (regexp not ended with $)
1 or more special char (?=.*[^0-9a-zA-Z\s])
1 or more (?=.*[0-9a-zA-Z])
no whitespace (?!.*\s)
Some test data, NO match:
password
pa5sword
pa5sWord
pa5sWord
password
test
1agdA
1agd
wyrn%#*&$# f
mbndkfh782
t1*$
Some test data, YES match:
pa5*Word
pa5*Word
pa5*Word1
pa5*Wor
1agdA*
1agdA*$
1agdA*$#
1agdA*$#1
1agdA*$#12
1agdA*$#123
1agdA*$#a
1agdA*$#ab
1agdA*$#abc
1agdA*$#abcd
BNfhjdhfjd&*)%#$)
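The same check can be reproduced outside grep. Below is a small Python sketch that runs the pattern over a few entries from each list; Python's re supports the same lookaheads as grep -P here, and accepted is just a name I picked.

```python
import re

# One special character, one alphanumeric, no whitespace, at least 6 characters
# (no trailing $, so longer strings are accepted as well).
PATTERN = re.compile(r"^(?=.*[^a-zA-Z0-9\s])(?=.*[a-zA-Z0-9])(?!.*\s).{6}")

def accepted(candidate):
    return PATTERN.search(candidate) is not None

no_match = ["password", "pa5sWord", "test", "1agdA", "wyrn%#*&$# f",
            "mbndkfh782", "t1*$"]
yes_match = ["pa5*Word", "pa5*Wor", "1agdA*", "1agdA*$#abcd",
             "BNfhjdhfjd&*)%#$)"]

for candidate in no_match + yes_match:
    print(repr(candidate), accepted(candidate))
```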
In powershell, I'm trying to create a E.164 type regex for a number of countries. I explicitly need to have the (+) plus in my number and in most cases multi number country codes.
For some reason: '+421233339135' does not match '/^(\+[4][2][1])?([1-9]\d\d{7})$'
+421 is the country code. The first digit after the country code needs to be between 1 and 9, and the rest can be any digits; the 9 digits after the country code are the DID number.
hope someone can help:-)
For some reason: '+421233339135' does not match '/^(\+[4][2][1])?([1-9]\d\d{7})$'
PowerShell is not Perl: a leading / before the pattern is not expected, so remove it.
The pattern itself could be described simply as ^(\+421)?([1-9]\d{8})$
PS C:\> $phoneNumber = '+421233339135'
PS C:\> $phoneNumber -match '^(\+421)?([1-9]\d{8})$'
True
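The effect of the stray slash is not specific to PowerShell. As a rough cross-check, here is a Python sketch (Python also takes the pattern as a plain string, so a leading / is just a literal character); the constant and function names are mine.

```python
import re

# Simplified pattern from the answer: optional +421, then 9 digits, first one 1-9.
E164_SK = re.compile(r"^(\+421)?([1-9]\d{8})$")
# Original pattern, including the leading slash that must match literally.
WITH_SLASH = re.compile(r"/^(\+[4][2][1])?([1-9]\d\d{7})$")

def is_match(pattern, number):
    return pattern.search(number) is not None

number = "+421233339135"
print(is_match(E164_SK, number))     # simplified pattern
print(is_match(WITH_SLASH, number))  # pattern with the leading slash
```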
Why does the following literal string
1998-${year}
..match against the grep command:
grep "[0-9 ]*-[ 0-9]*" filename.txt ?
What I need is a regex to match any of the following strings containing either a year range or one value of year only.
sdkfmslf 1998-2008
asdassdadsa 1998 - 2008
mkklml mklsmdf 2006
..but NOT this one:
asdsad a s 1998-${year}
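* means "match zero or more", so the pattern can match a lone hyphen. You want + which means "one or more." In plain grep, + is only a quantifier in extended mode, so use -E:
grep -E "[0-9 ]+-[0-9]+" filename.txt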
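To see the difference between the two quantifiers on the problem line, here is a short Python sketch. Python's re treats * and + like extended regex does; matched_text is a helper name I made up.

```python
import re

def matched_text(pattern, line):
    """Return the text the pattern matched in line, or None."""
    m = re.search(pattern, line)
    return m.group(0) if m else None

line = "1998-${year}"
# With *, the part after the hyphen may be empty, so "1998-" is enough.
print(repr(matched_text(r"[0-9 ]*-[ 0-9]*", line)))
# With +, at least one digit must follow the hyphen, so nothing matches.
print(repr(matched_text(r"[0-9 ]+-[0-9]+", line)))
```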
Try [0-9]{4}(\s*-\s*[0-9]{4})?. This will match a 4 digit number, or if it is followed by (optional white space)-(optional whitespace) then that must be followed by another 4 digit number.
Your string "asdsad a s 1998-${year}" would still match, since it has a single 4 digit value in it.
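If the engine supports lookarounds, one possible way around that is to refuse a year that is directly followed by a hyphen. The sketch below is in Python and is my own assumption, not something the answers above propose; plain grep cannot do this (grep -P might), and find_years is a hypothetical name.

```python
import re

# Pattern from the answer above: a year, optionally followed by "- year".
SIMPLE = re.compile(r"[0-9]{4}(\s*-\s*[0-9]{4})?")
# Variant (assumption): the match must not touch other digits, and must not
# be followed by a dangling hyphen such as the one in "1998-${year}".
STRICT = re.compile(r"(?<!\d)\d{4}(?:\s*-\s*\d{4})?(?!\d)(?!\s*-)")

def find_years(pattern, line):
    m = pattern.search(line)
    return m.group(0) if m else None

for line in ["sdkfmslf 1998-2008", "asdassdadsa 1998 - 2008",
             "mkklml mklsmdf 2006", "asdsad a s 1998-${year}"]:
    print(repr(line), find_years(SIMPLE, line), find_years(STRICT, line))
```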
I don't like answering my own question, but none of the above worked. Here is what I found by experimenting. I'm sure there could be more elegant solutions, but here is a working version:
grep "[0-9][0-9][0-9][0-9][ ]*[\-]*[ ]*[0-9]*" filename.txt
I keep getting into situations where I end up making two regular expressions to find subtle changes (such as one script for 0-9 and another for 10-99 because of the extra digit).
I usually use [0-9] to find strings with just one digit and then [0-9][0-9] to find strings with two digits. Is there a better wildcard for this?
For example, what expression would I use to simultaneously find the strings
6:45 AM and 10:52 PM
You can specify repetition with curly braces. [0-9]{2,5} matches two to five digits. So you could use [0-9]{1,2} to match one or two.
[0-9]{1,2}:[0-9]{2} (AM|PM)
I personally prefer to use \d for digits, thus
\d{1,2}:\d{2} (AM|PM)
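As a quick illustration of the {1,2} quantifier covering both cases with one expression, here is a Python sketch (find_times is a made-up helper):

```python
import re

# One or two hour digits, a colon, exactly two minute digits, then AM or PM.
TIME = re.compile(r"\d{1,2}:\d{2} (AM|PM)")

def find_times(text):
    """Return every substring of text that looks like H:MM or HH:MM AM/PM."""
    return [m.group(0) for m in TIME.finditer(text)]

print(find_times("6:45 AM and 10:52 PM"))
```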
[0-9] 1 or 2 times followed by : followed by 2 [0-9]:
[0-9]{1,2}:[0-9]{2}\s(AM|PM)
or, to restrict the hour part to 1-12:
(?:[1-9]|1[0-2]):[0-9]{2}\s(?:AM|PM)
If you are looking for a time pattern, you'd do something like:
\d{1,2}:\d{1,2} (AM|PM)
Or, for a more specific time regex (an optional leading 0 or 1, one more hour digit, and minutes from 00 to 59):
[01]?[0-9]:[0-5][0-9] (AM|PM)
Much like the other answers, except the AM/PM is not captured, which should be more efficient
\d{1,2}:\d{1,2}\s(?:AM|PM)
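I have not measured whether skipping the capture is noticeably faster. A more visible difference, at least with Python's re.findall, is what comes back: with a capturing group you get only the group, with a non-capturing one you get the whole match. A small sketch:

```python
import re

text = "6:45 AM and 10:52 PM"

# Capturing group: findall returns only what the group captured.
captured = re.findall(r"\d{1,2}:\d{1,2}\s(AM|PM)", text)
# Non-capturing group: findall returns the full matched text.
whole = re.findall(r"\d{1,2}:\d{1,2}\s(?:AM|PM)", text)

print(captured)
print(whole)
```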
if I have a file containing:
ABC
123XYZ
6:45 AM
123DHD
ABC
10:52 PM
CDE
and run the following
$>grep -P '6:45\sAM|10:52\sPM' temp
6:45 AM
10:52 PM
This should do the trick (-P enables Perl-compatible regular expressions).
EDIT:
Perhaps I misunderstood. The other answers are very good if I were looking to just find a time, but you seem to be after specific times; the others would match ANY time in HH:MM format.
Overall, I believe the items you are after are the | pipe character, which is used in this case to allow alternative phrases, and {n,m}, which matches n to m times ({1,2} would match 1-2 times, etc.).
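The alternation can be tried without grep too. Here is a Python sketch that filters the same sample lines; grep_lines is a hypothetical stand-in for what grep does line by line.

```python
import re

# Either of two specific times; | separates the alternatives.
SPECIFIC = re.compile(r"6:45\sAM|10:52\sPM")

lines = ["ABC", "123XYZ", "6:45 AM", "123DHD", "ABC", "10:52 PM", "CDE"]

def grep_lines(pattern, lines):
    """Keep only the lines in which the pattern is found."""
    return [line for line in lines if pattern.search(line)]

print(grep_lines(SPECIFIC, lines))
```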
This checks 12-hour style times with or without a leading zero:
e.g. 12:05PM, 3:19AM, 04:25PM (23:52PM is rejected, because a two-digit hour may only start with 0 or 1)
my $time = "12:52AM";
if ($time =~ /^[01]?[0-9]\:[0-5][0-9](AM|PM)/) {
print "Right Time Dude...";
}
else { print "Wrong time Dude"; }
This is the regex you want.
/^[01]?[0-9]\:[0-5][0-9](AM|PM)/
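For comparison, the same pattern behaves like this in Python (a sketch; looks_like_time is a made-up name). Two things worth knowing, as far as I can tell: hours 13 to 19 are still accepted, and because there is no $ at the end, trailing text after AM/PM is accepted too.

```python
import re

# Optional leading 0 or 1, one more hour digit, minutes 00-59, then AM or PM.
TIME = re.compile(r"^[01]?[0-9]:[0-5][0-9](AM|PM)")

def looks_like_time(text):
    return TIME.search(text) is not None

for candidate in ["12:05PM", "3:19AM", "04:25PM", "23:52PM",
                  "19:05PM", "12:52AMxyz"]:
    print(candidate, looks_like_time(candidate))
```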
Having this string as input:
Sat, 6 May 2017 02:08:08 +0000
I did this regEx to get combinations of one or two digits:
[0-9]*:[0-9]*:[0-9]*