Logstash - Change Alphabetic Month to Numeric - regex

I have been trying to develop a regex to convert an alphabetic month (e.g. Sep) to its numeric equivalent (09).

The translate{} filter would work well for this.
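The idea behind a dictionary-based translation can be sketched outside Logstash. This is only an illustration in JavaScript, not Logstash configuration; the MONTHS table and the monthToNumber helper are hypothetical names, and English three-letter abbreviations are assumed.

```javascript
// Hypothetical lookup table; assumes English three-letter month abbreviations.
const MONTHS = {
  Jan: "01", Feb: "02", Mar: "03", Apr: "04", May: "05", Jun: "06",
  Jul: "07", Aug: "08", Sep: "09", Oct: "10", Nov: "11", Dec: "12",
};

// Returns the two-digit month, or the input unchanged if it is not a known abbreviation.
function monthToNumber(abbrev) {
  return Object.prototype.hasOwnProperty.call(MONTHS, abbrev) ? MONTHS[abbrev] : abbrev;
}

console.log(monthToNumber("Sep")); // prints 09
```

Falling back to the unchanged input mirrors what one would usually want from a translation step: unknown values pass through instead of being dropped.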

Using the date filter with the MMM pattern will work for that (by default the parsed value is written to @timestamp):
filter {
  date {
    match => [ "yourfield", "MMM" ]
  }
}
The available patterns are listed in the Joda-Time DateTimeFormat documentation.

Related

MongoDB Select Query Issue With Regular Expression (Starts With and Ends With)

I need a regular expression to validate a String like this:
1678;1678;1678;1678 and 1;0;1;1;0;1
I tried to use this pattern:
db.getCollection('CollectionName').find(
{
"magnitude": /^[1678][1678]$/,
"flag": /^[1][1]$/
}
)
but it doesn't work. I also tried the following two patterns; each works on its own, but I cannot anchor both the start and the end at the same time:
db.getCollection('CollectionName').find(
{
"magnitude": /[1678]$/,
"flag": /[1]$/
}
)
db.getCollection('CollectionName').find(
{
"magnitude": /^[1678]/,
"flag": /^[1]/
}
)
I couldn't find any wildcard character, like * in SQL, to use here.
I am using Robomongo 1.0.0 for queries.
I would appreciate any help.
Thanks in advance.
To match one or more ;-separated values, use a group with a quantifier:
db.getCollection('CollectionName').find(
{
"magnitude": /^1678(;1678)*$/,
"flag": /^[01](;[01])*$/
}
)
(;1678)* matches the string ;1678, zero or more times.
[01] matches either 0 OR 1
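The two patterns can be checked outside MongoDB with plain JavaScript regular expressions, which use the same literal syntax as the shell. This is a quick sketch; the sample strings other than the two from the question are made up for illustration.

```javascript
// The patterns from the answer above.
const magnitude = /^1678(;1678)*$/;
const flag = /^[01](;[01])*$/;

console.log(magnitude.test("1678;1678;1678;1678")); // true
console.log(flag.test("1;0;1;1;0;1"));              // true

// Made-up negative cases: a truncated value and a trailing separator.
console.log(magnitude.test("1678;16")); // false
console.log(flag.test("1;0;"));         // false
```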

Elasticsearch regexp doesn't work

I need to write a regexp in Elasticsearch to filter some data.
The field I filter on is a person's name. The data is not always well formatted (sometimes there is no first name; sometimes the family name is followed by a period, a comma, 'comma + first name' or 'period + first name', and so on).
For example, using "bouchard" I get the following matches:
"bouchard", "bouchard, m.", "bouchard, j.", "bouchard j.p.", "bouchard. j.p."
I also need to exclude names that begin with the same prefix, like "bouchardat".
I tried many regexps and finally found that an exclusion may yield better results:
"query" : { "regexp" : {
"RECORDEDBY" : "bouchard([^a-z].*)"
}}
This doesn't work because it returns "bouchard, m.", "bouchard, j.", "bouchard j.p." but not "bouchard. j.p." and not "bouchard".
I tried some regexps with + and .* but they don't work either
("bouchard([^a-z].*.*)", "bouchard([^a-z]*+.*)").
To make it clear, I want to allow:
bouchard
bouchard, m.
bouchard, j.
bouchard j.p.
bouchard. j.p.
I want to exclude
bouchardat
Any advice is welcome.
In this case, you could use alternation (|) so that anything after the name is only allowed when a separator such as ' ', '.', or ',' follows the word you are looking for:
((bouchard)+?([ .,]+)[ ,.a-zA-Z]*)|(bouchard[^a-zA-Z]?)
The first alternative requires at least one separator ([ .,]+) after the name, and the second alternative covers the bare name, so the regexp matches:
bouchard
bouchard, m.
bouchard, j.
bouchard j.p.
bouchard. j.p.
and it does not match a name where letters follow directly, with no [ .,]+ in between:
bouchardat
Regex101
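A rough way to check the pattern locally is to run it in JavaScript. One assumption to keep in mind: Elasticsearch's regexp query uses Lucene syntax, which as far as I know always matches against the whole term, so the sketch adds explicit ^ and $ anchors to imitate that. The matchesWholeValue helper is a hypothetical name.

```javascript
// Pattern from the answer above, wrapped in anchors to imitate whole-term matching.
const source = "((bouchard)+?([ .,]+)[ ,.a-zA-Z]*)|(bouchard[^a-zA-Z]?)";
const anchored = new RegExp("^(?:" + source + ")$");

function matchesWholeValue(name) {
  return anchored.test(name);
}

const wanted = ["bouchard", "bouchard, m.", "bouchard, j.", "bouchard j.p.", "bouchard. j.p."];
for (const name of wanted) {
  console.log(name, matchesWholeValue(name)); // true for every wanted name
}
console.log("bouchardat", matchesWholeValue("bouchardat")); // false
```

This only exercises the regex itself; how the field is analyzed in the index can still change which terms the query sees.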

Splitting a string based on positions with regex

I need to convert this (date) string "12112014" to "12.11.2014".
What I would like to do is:
take the first 2 characters "12" and add ".",
then take characters 3-4 to get "11" and add ".",
and finally take the last 4 characters (5-8) to get "2014".
I already found out how to get the first 2 characters ( "^\d{2}" ), but I failed to get characters based on a position.
Whatever the programming language, you should extract the digit groups from the string and then join them with a ".".
In Perl, it can be done as:
$_ = '12112014';
s/(\d{2})(\d{2})(\d{4})/$1.$2.$3/;
print "$_";
Since you didn't specify a language, I've picked JavaScript:
var s = '12012011';
// Three capture groups (2, 2 and 4 digits) are re-joined with dots.
var s2 = s.replace(/(\d{2})(\d{2})(\d{4})/, '$1.$2.$3');
console.log(s2); // prints "12.01.2011"
The gist of it is that you use () to specify groups inside your regular expression and then can use the groups in your replace expression.
Same in Java:
String s = "12012011";
String s2 = s.replaceAll("(\\d{2})(\\d{2})(\\d{4})", "$1.$2.$3");
System.out.println(s2);
I don't think you can do that with split alone.
You could expand your expression to:
"(^(\d{2})(\d{2})(\d{4}))"
Then access the groups with the regex API of your language and build the string you want.
Note that, regex learning aside, you could always parse the original string into a strongly typed Date or DateTime variable and output the value using the appropriate locale.
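Since the question asks about cutting the string at fixed positions, here is a small sketch that does it with substring slicing rather than capture groups. The dotDate name is hypothetical, and the input is assumed to be exactly eight digits in day-month-year order.

```javascript
// Cuts "ddMMyyyy" at fixed positions and joins the pieces with dots.
function dotDate(raw) {
  // Guard: only the length and digit-ness are checked, not calendar validity.
  if (!/^\d{8}$/.test(raw)) {
    throw new Error("expected exactly eight digits, got: " + raw);
  }
  return [raw.slice(0, 2), raw.slice(2, 4), raw.slice(4, 8)].join(".");
}

console.log(dotDate("12112014")); // prints 12.11.2014
```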

Regex validation grails date-like format

I'm currently working on a Grails project and I ran into a problem with the matches constraint. My field should only accept a String with a date-like format exactly like this:
10-25-2012 5:00PM
Is this possible with a matches constraint using a regex? I always have a hard time filtering data with regexes because I find them a bit confusing.
If it's a date, why not validate it with a standard date formatter? For example:
static constraints = {
mydate validator: {
try {
Date.parse('MM-dd-yyyy hh:mma', it)
return true
} catch (ParseException e) {
return false
}
}
}
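By the way, Date parsing is lenient here: it accepts some not-quite-valid values and converts them to a corrected value (for example, 15AM becomes 3PM). If you need an exact match on the format, you can compare the re-formatted result with the original value. Note the single h, so that an hour such as 5 is formatted back without a leading zero:
static constraints = {
    mydate validator: {
        try {
            Date date = Date.parse('MM-dd-yyyy h:mma', it)
            // format() is an instance method that Groovy adds to Date
            return date.format('MM-dd-yyyy h:mma') == it
        } catch (ParseException e) {
            return false
        }
    }
}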
Or you can use SimpleDateFormat instead:
// Single h: 5PM is formatted as "5:00PM", matching input without a leading zero.
final static DateFormat DATEFORMAT = new SimpleDateFormat('MM-dd-yyyy h:mma')
static constraints = {
    mydate validator: {
        try {
            Date date = DATEFORMAT.parse(it)
            return DATEFORMAT.format(date) == it
        } catch (ParseException e) {
            return false
        }
    }
}
Is there no Date object you can use? I don't know, but I can help you with the regex:
Constructing a regex is not difficult, and in your case it is straightforward:
^\d{2}-\d{2}-\d{4} \d{1,2}:\d{2}[AP]M$
^ Matches the start of the string
$ Matches the end of the string
\d is a digit
{2} is a quantifier that requires the previous element exactly 2 times
{1,2} allows 1 or 2 repetitions, so the hour in your example (5) matches without a leading zero
[AP] is a character class that matches A or P
This regex only checks the format, not whether the digits represent a valid date or time (e.g. 99-99-0000 35:61PM passes the check).
Read my Blog post What absolutely every Programmer should know about regular expressions for some more brief information.
Try this one: ^(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])-((?:19|20)\d\d) ([0-1]?[0-9]|2[0-4]):([0-5][0-9])(?::([0-5][0-9]))?(AM|PM)$
^ Start of string
(0[1-9]|1[012]) Month
-
(0[1-9]|[12][0-9]|3[01]) Day
-
((?:19|20)\d\d) Year
([0-1]?[0-9]|2[0-4]) HH
:
([0-5][0-9]) MM
(?::([0-5][0-9]))? optional :SS
(AM|PM)
$ End of string
It captures month, day, year, hours, minutes, seconds and AM/PM.
Edit: As Vikas points out it does not check how many days a month can have.
Try this
\b(?:(?:(?:0?[469]|11)\-(?:0?(?:[1-9]|[12][0-9])|30)\-(?:1[7-9][0-9]{2}|200[0-9]|201[0-2]))|(?:(?:0?[13578]|1[02])\-(?:0?(?:[1-9]|[12][0-9])|3[01])\-(?:1[7-9][0-9]{2}|200[0-9]|201[0-2]))|(?:(?:0?2)\-(?:0?(?:[1-9]|1[0-9])|2[0-8])\-(?:1[7-9][0-9]{2}|200[0-9]|201[0-2]))) +(?:(?:(?:0?([0-9])|1[0-2])):(?:0?([0-9])|[1-5][0-9])(?:[AP]M))\b
or
\b(?:(?:(?:0?[469]|11)\-(?:0?(?:[1-9]|[12][0-9])|30)\-(?:1[7-9][0-9]{2}|200[0-9]|201[0-2]))|(?:(?:0?[13578]|1[02])\-(?:0?(?:[1-9]|[12][0-9])|3[01])\-(?:1[7-9][0-9]{2}|200[0-9]|201[0-2]))|(?:(?:0?2)\-(?:0?(?:[1-9]|1[0-9])|2[0-9])\-(?:1[7-9][0-9]{2}|200[0-9]|201[0-2]))) +(?:(?:(?:0?([0-9])|1[0-2])):(?:0?([0-9])|[1-5][0-9])(?:[AP]M))\b
Note: both patterns share a limitation around February, because a regex cannot work out whether a year is a leap year. Some regex flavors support if-else conditionals, but none offer arithmetic operations. That may be added to some flavor in the future, but I don't think it is what regex is meant for.
The first pattern does not match any February date later than the 28th, so it never matches an invalid date, but it rejects February 29 in leap years.
The second is identical except that the limit is 29, so it also accepts February 29 in non-leap years.
I am well aware that this problem cannot be solved entirely with regex.
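One way around the leap-year limitation is to let the regex check only the shape and do the calendar arithmetic in code. The sketch below is in JavaScript rather than Groovy, purely to illustrate the split; FORMAT, isLeapYear, daysInMonth and isValidTimestamp are hypothetical names, and the MM-dd-yyyy h:mmAM/PM layout from the question is assumed.

```javascript
// Shape check only: two-digit month and day, four-digit year, 12-hour time.
const FORMAT = /^(\d{2})-(\d{2})-(\d{4}) (\d{1,2}):(\d{2})(AM|PM)$/;

function isLeapYear(year) {
  return (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
}

function daysInMonth(month, year) {
  if (month === 2) return isLeapYear(year) ? 29 : 28;
  return [4, 6, 9, 11].includes(month) ? 30 : 31;
}

function isValidTimestamp(text) {
  const m = FORMAT.exec(text);
  if (m === null) return false;
  // Groups 1..5 are month, day, year, hour, minute.
  const [month, day, year, hour, minute] = m.slice(1, 6).map(Number);
  return month >= 1 && month <= 12 &&
    day >= 1 && day <= daysInMonth(month, year) &&
    hour >= 1 && hour <= 12 &&
    minute <= 59;
}

console.log(isValidTimestamp("10-25-2012 5:00PM"));  // true
console.log(isValidTimestamp("02-29-2013 11:59AM")); // false, 2013 is not a leap year
```

Keeping the regex simple and moving the range checks into code tends to be easier to read than encoding every month length in the pattern.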

Dynamic regex for date-time formats

Is there an existing solution for creating a regular expression dynamically from a given date-time format pattern? The supported date-time format pattern does not matter (Joda DateTimeFormat, java.text.SimpleDateFormat or others).
As a specific example, for a given date-time format like dd/MM/yyyy hh:mm, it should generate the corresponding regular expression to match the date-times within the specified formats.
I guess you have a limited alphabet that your time formats can be constructed of. That means, "HH" would always be "hours" on the 24-hour clock, "dd" always the day with leading zero, and so on.
Because of the sequential nature of a time format, you could tokenize a format string such as "dd/mm/yyyy HH:nn" into an array ["dd", "/", "mm", "/", "yyyy", " ", "HH", ":", "nn"]. Then build a pattern string from that array by replacing "HH" with "([01][0-9]|2[0-3])" and so on. Prepare these pattern atoms in advance in a lookup table/array. All parts of your array that are not in the lookup table are literals. Escape them according to regex rules and append them to your pattern string.
EDIT: As a side effect for a regex based solution, when you put all regex "atoms" of your lookup table into parens and keep track of their order in a given format string, you would be able to use sub-matches to extract the required components from a match and feed them into a CreateDate function, thus skipping the ParseDate part altogether.
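The tokenize-and-look-up approach described above can be sketched as follows. This is a minimal illustration, not a complete implementation: the ATOMS table, escapeLiteral and formatToRegex are hypothetical names, and the token letters follow the SimpleDateFormat-style pattern from the question (MM for month, mm for minutes) rather than the "mm"/"nn" letters used in the answer.

```javascript
// Hypothetical lookup table; each atom is parenthesised so it becomes a capture group.
const ATOMS = {
  yyyy: "(\\d{4})",
  MM: "(0[1-9]|1[0-2])",
  dd: "(0[1-9]|[12][0-9]|3[01])",
  HH: "([01][0-9]|2[0-3])",
  hh: "(0[1-9]|1[0-2])",
  mm: "([0-5][0-9])",
};

// Escapes regex metacharacters so literals such as "." match themselves.
function escapeLiteral(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function formatToRegex(format) {
  // Try longer tokens first so "yyyy" is consumed as one token.
  const tokens = Object.keys(ATOMS).sort((a, b) => b.length - a.length);
  let source = "";
  let i = 0;
  while (i < format.length) {
    const token = tokens.find((t) => format.startsWith(t, i));
    if (token !== undefined) {
      source += ATOMS[token];
      i += token.length;
    } else {
      // Anything not in the lookup table is treated as a literal.
      source += escapeLiteral(format[i]);
      i += 1;
    }
  }
  return new RegExp("^" + source + "$");
}

const re = formatToRegex("dd/MM/yyyy hh:mm");
console.log(re.test("25/12/2014 09:30")); // true
console.log(re.exec("25/12/2014 09:30").slice(1)); // captured parts: 25, 12, 2014, 09, 30
```

Because every atom is a capture group, the sub-matches come back in the order the tokens appear in the format string, which is what the EDIT above relies on.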
If you are looking for basic date checking, this pattern matches the following data:
\b(0?[1-9]|[12][0-9]|3[01])[- /.](0?[1-9]|1[012])[- /.](19|20)?[0-9]{2}\b
10/07/2008
10.07.2008
1-01/2008
10/07/08
10.07.2008
1-01/08
Code via regexbuddy
SimpleDateFormat already does this with the parse() method.
If you need to parse multiple dates from a single string, start with a regex (even if it matches too leniently), and use parse() on all the potential matches found by the regex.
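The "lenient regex first, strict parse second" idea can be sketched like this. It is written in JavaScript for illustration, so the built-in Date stands in for SimpleDateFormat's parse(); CANDIDATE and extractDates are hypothetical names, and a day-month-year order with a four-digit year is assumed.

```javascript
// Deliberately lenient: any 1-2 digits, separator, 1-2 digits, separator, 4 digits.
const CANDIDATE = /\b(\d{1,2})[- /.](\d{1,2})[- /.](\d{4})\b/g;

function extractDates(text) {
  const found = [];
  for (const [whole, d, m, y] of text.matchAll(CANDIDATE)) {
    const day = Number(d), month = Number(m), year = Number(y);
    // Date rolls impossible values over (Feb 31 becomes March), so compare the round trip.
    const date = new Date(Date.UTC(year, month - 1, day));
    if (date.getUTCFullYear() === year &&
        date.getUTCMonth() === month - 1 &&
        date.getUTCDate() === day) {
      found.push(whole);
    }
  }
  return found;
}

console.log(extractDates("due 10/07/2008, not 31/02/2008 or 45.13.2008"));
// only 10/07/2008 survives the strict check
```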
The JavaScript / jQuery code given below dynamically generates a regex for a date format only, not for date-time (development version, not fully tested yet).
The date format should be built from the characters D, M and Y.
E.g.
DD-MM-YY
DD-MM-YYYY
YYYY-MM-DD
YYYY-DD-MM
MM-DD-YYYY
MM-DD-YY
DD/MM/YY
DD/MM/YYYY
YYYY/MM/DD
YYYY/DD/MM
MM/DD/YYYY
MM/DD/YY
Or any other format made up of the "D", "M" and "Y" characters:
var dateFormat = "DD-MM-YYYY";
var order = []; // format letters in order of first appearance, e.g. ["D", "M", "Y"]
// Number of times each letter occurs, i.e. how many digits that part has.
var count = {"D":dateFormat.split("D").length - 1,"M":dateFormat.split("M").length - 1,"Y":dateFormat.split("Y").length - 1};
var separator = '';
for(var i=0; i<dateFormat.length; i++){
    var ch = dateFormat.charAt(i);
    if(["Y","M","D"].indexOf(ch)<0){
        separator = ch; // any other character is treated as the separator
    }else if(order.indexOf(ch)<0){
        order.push(ch);
    }
}
// Escape the separator so that characters such as "." are matched literally.
var escaped = separator.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
var parts = [];
$(order).each(function(ok,ov){
    parts.push('(\\d{'+count[ov]+'})'); // "\\d" in a string literal yields \d in the pattern
});
var re = new RegExp('^'+parts.join(escaped)+'$');
console.log(re); // for "DD-MM-YYYY": /^(\d{2})-(\d{2})-(\d{4})$/
NOTE: There is no validation of the month or day values, e.g. that the month is in 01-12 or the day is in 01-31.