How to capture a filename with or without an extension

I'm trying to capture and replace a filename like
000035 ZSMS_1.mp3
but also a file like
000035 1OMNA
(I'm basically trying to reorder them so they look like e.g., ZSMS_1(000035).mp3).
I've tried
^(\d+) (.*)(\..*$)?
^(\d+) (.*?)(\..*$)?
What I expect to happen:
000035 ZSMS_1.mp3:
[
{
"groups": [
"000035",
"ZSMS_1",
".mp3"
],
"match": "000035 ZSMS_1.mp3"
}
]
000035 1OMNA:
[
{
"groups": [
"000035",
"1OMNA",
""
],
"match": "000035 1OMNA"
}
]
What happens:
1. ^(\d+) (.*)(\..*$)?
000035 ZSMS_1.mp3:
[
{
"groups": [
"000035",
"ZSMS_1.mp3",
""
],
"match": "000035 ZSMS_1.mp3"
}
]
000035 1OMNA:
[
{
"groups": [
"000035",
"1OMNA",
""
],
"match": "000035 1OMNA"
}
]
2. ^(\d+) (.*?)(\..*$)?
000035 ZSMS_1.mp3:
[
{
"groups": [
"000035",
"",
""
],
"match": "000035 "
}
]
000035 1OMNA:
[
{
"groups": [
"000035",
"",
""
],
"match": "000035 "
}
]

You may use
^(\d+)\h+(.*?)(\.[^.]*)?$
See the regex demo
Details
^ - start of string
(\d+) - Group 1: one or more digits
\h+ - 1+ horizontal whitespaces (for better regex engine cross-compatibility, you may use [^\S\r\n]+ or just [ \t]+ to match a tab or space)
(.*?) - Group 2: zero or more chars other than linebreak chars, as few as possible
(\.[^.]*)? - an optional capturing group #3: a dot and then 0 or more chars other than . as many as possible
$ - end of string.
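The reordering the question asks for can be sketched in Python as follows. This is only an illustration: the function name reorder is made up here, and \h is swapped for [ \t] because Python's re module has no \h shorthand.

```python
import re

# Same pattern as above, with \h replaced by [ \t] because Python's re
# module does not support the \h shorthand.
FILENAME_RE = re.compile(r"^(\d+)[ \t]+(.*?)(\.[^.]*)?$")


def reorder(filename: str) -> str:
    """Turn '000035 ZSMS_1.mp3' into 'ZSMS_1(000035).mp3'."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return filename  # leave names that do not fit the scheme untouched
    # default="" turns a missing extension (group 3) into an empty string
    number, stem, extension = match.groups(default="")
    return f"{stem}({number}){extension}"


print(reorder("000035 ZSMS_1.mp3"))  # ZSMS_1(000035).mp3
print(reorder("000035 1OMNA"))       # 1OMNA(000035)
```

Because group 2 is lazy and group 3 excludes dots, only the last dot-separated part ends up in the extension group, so a name like 000035 a.b.mp3 keeps a.b as its stem.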

You could try the following regex:
^(\d+)\s*(?:(\w+)?)(?:(\.\w+)?)$
Details:
^ - start of line
(\d+) - Group 1: one or more digits
\s* - zero or more whitespace chars separating groups 1 and 2
(?:(\w+)?) - Group 2 (optional): one or more word characters
(?:(\.\w+)?) - Group 3 (optional): a . followed by one or more word characters
$ - end of line

Related

Split log message on space for grok pattern

I am two days new to grok and ELK.
I am struggling to break up log messages on spaces so that the parts appear as separate fields in Logstash.
My input pattern is:
2022-02-11 11:57:49 - app - INFO - function_name=add elapsed_time=0.0296 input_params=6_3
I would like to see separate fields in Logstash/Kibana for function_name, elapsed_time and input_params.
At the moment, I have the following .conf:
input{
file{
path => "/path/to/log/file"
start_position => "beginning"
}
}
filter{
grok{
match => {"message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:log-level} %{(?<function_name>[^.]*)\.(?<elapsed_time>[^.]*)\.(?<input>[^.]*)}"}
}
date {
match => ["timestamp", "ISO8601"]
}
function_name {
match => ["function_name", "DATA"]
}
elapsed_time {
match => ["elapsed_time", "BASE16FLOAT"]
}
input {
match => ["input", "DATA"]
}
}
output{
elasticsearch{
hosts => ["localhost:9200"]
index => "math_apis"
}
stdout{codec => rubydebug}
}
But this only produces the following message in Logstash:
{
"host" => "hostname",
"@timestamp" => 2022-02-11T06:27:49.404Z,
"message" => "2022-02-11 11:57:49 - app - INFO - function_name=add elapsed_time=0.0296 input_params=6_3",
"path" => "path/to/log/file",
"@version" => "1",
"tags" => [
[0] "_grokparsefailure"
]
}

You can use the following pattern:
%{TIMESTAMP_ISO8601:timestamp} - \S+ - %{LOGLEVEL:log_level} - function_name=%{NOTSPACE:function_name} elapsed_time=%{NOTSPACE:elapsed_time} input_params=%{NOTSPACE:input}
Details:
%{TIMESTAMP_ISO8601:timestamp} - timestamp field
- - a literal string
\S+ - any one or more non-whitespace chars
- - a literal string
%{LOGLEVEL:log_level} - LOGLEVEL pattern
- function_name= - a literal string
%{NOTSPACE:function_name} - function_name field of one or more non-whitespace chars
elapsed_time= - space and elapsed_time= string
%{NOTSPACE:elapsed_time} - elapsed_time field of one or more non-whitespace chars
input_params= - literal string
%{NOTSPACE:input} - input field of one or more non-whitespace chars.
See more about Grok patterns here.
Test output:
{
"timestamp": [
[
"2022-02-11 11:57:49"
]
],
"YEAR": [
[
"2022"
]
],
"MONTHNUM": [
[
"02"
]
],
"MONTHDAY": [
[
"11"
]
],
"HOUR": [
[
"11",
null
]
],
"MINUTE": [
[
"57",
null
]
],
"SECOND": [
[
"49"
]
],
"ISO8601_TIMEZONE": [
[
null
]
],
"log_level": [
[
"INFO"
]
],
"function_name": [
[
"add"
]
],
"elapsed_time": [
[
"0.0296"
]
],
"input": [
[
"6_3"
]
]
}
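To experiment with the same field layout outside Logstash, here is a rough Python sketch. The sub-patterns are hand-written approximations of TIMESTAMP_ISO8601, LOGLEVEL and NOTSPACE, not the real grok definitions, so treat it as an illustration only.

```python
import re

# Hand-written stand-ins for the grok patterns used above; these are
# simplified approximations, not the real grok definitions.
LOG_RE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"  # ~ TIMESTAMP_ISO8601
    r" - \S+"                                               # app name, not captured
    r" - (?P<log_level>[A-Z]+)"                             # ~ LOGLEVEL
    r" - function_name=(?P<function_name>\S+)"              # ~ NOTSPACE
    r" elapsed_time=(?P<elapsed_time>\S+)"
    r" input_params=(?P<input>\S+)"
)

line = ("2022-02-11 11:57:49 - app - INFO - "
        "function_name=add elapsed_time=0.0296 input_params=6_3")

match = LOG_RE.search(line)
fields = match.groupdict() if match else {}
print(fields)
```

As in the grok pattern, the literal " - " separators and the key= prefixes are matched as plain text, and only the values are captured into named fields.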

Elasticsearch pattern regex start with

I would like to ask whether there is any documentation that describes how to work with Elasticsearch pattern regexes.
I need to write a Pattern Capture Token Filter that keeps only tokens starting with a specific word. For example, if the input token stream is ("abcefgh", "abc123", "aabbcc", "abc", "abdef"), my tokenizer should return only the tokens abcefgh, abc123 and abc, because those tokens start with "abc".
Can someone help me achieve this use case?
Thanks.

I suggest something like this:
"analysis": {
"analyzer": {
"my_trim_keyword_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase",
"trim",
"generate_tokens",
"eliminate_tokens",
"remove_empty"
]
}
},
"filter": {
"eliminate_tokens": {
"pattern": "^(?!abc)\\w+$",
"type": "pattern_replace",
"replacement": ""
},
"generate_tokens": {
"type": "pattern_capture",
"preserve_original": 1,
"patterns": [
"(([a-z]+)(\\d*))"
]
},
"remove_empty": {
"type": "stop",
"stopwords": [""]
}
}
}
If your tokens are the result of a pattern_capture filter, you need to add the filter called eliminate_tokens in my example after it. Its pattern matches tokens that don't start with abc, and every matching token is replaced with an empty string ("replacement": "").
After this, to remove the empty tokens, I added the remove_empty filter, which is a stop filter whose only stopword is "" (the empty string).
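The effect of the eliminate_tokens and remove_empty steps can be checked outside Elasticsearch with a small Python sketch. It only mimics those two filters on an already tokenized list; the function name keep_abc_tokens is made up for the illustration.

```python
import re

# Same regex as the eliminate_tokens filter: it matches whole tokens that
# do NOT start with "abc".
NOT_ABC = re.compile(r"^(?!abc)\w+$")


def keep_abc_tokens(tokens):
    """Blank out tokens that do not start with 'abc', then drop the blanks."""
    replaced = [NOT_ABC.sub("", token) for token in tokens]  # pattern_replace step
    return [token for token in replaced if token]            # stop-filter step


print(keep_abc_tokens(["abcefgh", "abc123", "aabbcc", "abc", "abdef"]))
# ['abcefgh', 'abc123', 'abc']
```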

How to use regex inside in query using morphia?

MongoDB allows regular expressions of the form /pattern/ inside $in, without using the $regex operator.
http://docs.mongodb.org/manual/reference/operator/query/in/
How can I do this using Morphia?
If I give a field criteria with the in operator and a value of type "java.util.regex.Pattern", the equivalent query generated is
$in:[$regex: 'given pattern'], which won't return the expected results at all.
Expectation: $in :[ /pattern1 here/,/pattern2 here/]
Actual using 'Pattern' object : $in : [$regex:/pattern1 here/,$regex:/pattern 2 here/]

I'm not entirely sure what to make of your code examples, but here's a working Morphia code snippet:
Pattern regexp = Pattern.compile("^" + email + "$", Pattern.CASE_INSENSITIVE);
mongoDatastore.find(EmployeeEntity.class).filter("email", regexp).get();
Note that this is really slow. It can't use an index and will always require a full collection scan, so avoid it at all cost!
Update: I've added a specific code example. The $in is not required to search inside an array. Simply use /^I/ as you would for a string field:
> db.profile.find()
{
"_id": ObjectId("54f3ac3fa63f282f56de64bd"),
"tags": [
"India",
"Australia",
"Indonesia"
]
}
{
"_id": ObjectId("54f3ac4da63f282f56de64be"),
"tags": [
"Island",
"Antigua"
]
}
{
"_id": ObjectId("54f3ac5ca63f282f56de64bf"),
"tags": [
"Spain",
"Mexico"
]
}
{
"_id": ObjectId("54f3ac6da63f282f56de64c0"),
"tags": [
"Israel"
]
}
{
"_id": ObjectId("54f3ad17a63f282f56de64c1"),
"tags": [
"Germany",
"Indonesia"
]
}
{
"_id": ObjectId("54f3ad56a63f282f56de64c2"),
"tags": [
"ireland"
]
}
> db.profile.find({ tags: /^I/ })
{
"_id": ObjectId("54f3ac3fa63f282f56de64bd"),
"tags": [
"India",
"Australia",
"Indonesia"
]
}
{
"_id": ObjectId("54f3ac4da63f282f56de64be"),
"tags": [
"Island",
"Antigua"
]
}
{
"_id": ObjectId("54f3ac6da63f282f56de64c0"),
"tags": [
"Israel"
]
}
{
"_id": ObjectId("54f3ad17a63f282f56de64c1"),
"tags": [
"Germany",
"Indonesia"
]
}
Note: The position in the array makes no difference, but the search is case sensitive. Use /^I/i if this is not desired or Pattern.CASE_INSENSITIVE in Java.
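To make the matching rule concrete, here is a plain-Python stand-in for the query above. It does not talk to MongoDB at all; it only mimics the "any array element matches" behaviour on trimmed copies of the documents, and find_by_tag is a made-up helper name.

```python
import re

# Plain-Python stand-in for db.profile.find({ tags: /^I/ }); the documents
# are trimmed copies of the ones above (the _id fields are left out).
profiles = [
    {"tags": ["India", "Australia", "Indonesia"]},
    {"tags": ["Island", "Antigua"]},
    {"tags": ["Spain", "Mexico"]},
    {"tags": ["Israel"]},
    {"tags": ["Germany", "Indonesia"]},
    {"tags": ["ireland"]},
]


def find_by_tag(docs, pattern, flags=0):
    """Return the docs in which at least one tag matches the pattern."""
    regex = re.compile(pattern, flags)
    return [doc for doc in docs if any(regex.search(tag) for tag in doc["tags"])]


case_sensitive = find_by_tag(profiles, r"^I")
case_insensitive = find_by_tag(profiles, r"^I", re.IGNORECASE)
print(len(case_sensitive), len(case_insensitive))  # 4 5
```

The case-insensitive run additionally picks up the "ireland" document, which is the difference between /^I/ and /^I/i described in the note.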

Single RegEx Filter
use .filter(), .criteria(), or .field()
query.filter("email", Pattern.compile("reg.*exp"));
// or
query.criteria("email").contains("reg.*exp");
// or
query.field("email").contains("reg.*exp");
Morphia converts this into:
find({"email": { $regex: "reg.*exp" } })
Multiple RegEx Filters
query.or(
query.criteria("email").contains("reg.*exp"),
query.criteria("email").contains("reg.*exp.*2"),
query.criteria("email").contains("reg.*exp.*3")
);
Morphia converts this into:
find({"$or" : [
{"email": {"$regex": "reg.*exp"}},
{"email": {"$regex": "reg.*exp.*2"}},
{"email": {"$regex": "reg.*exp.*3"}}
]
})
Unfortunately,
You cannot use $regex operator expressions inside an $in.
MongoDB Manual 3.4
Otherwise, we could do:
Pattern[] patterns = new Pattern[] {
Pattern.compile("reg.*exp"),
Pattern.compile("reg.*exp.*2"),
Pattern.compile("reg.*exp.*3"),
};
query.field().in(patterns);
Hopefully, one day Morphia will support that :)
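Until then, one possible workaround (not something Morphia does for you, just an idea) is to fold the patterns into a single alternation so that one $regex condition covers all of them. A rough Python sketch of building such a query document, using the example patterns from above:

```python
import re

patterns = ["reg.*exp", "reg.*exp.*2", "reg.*exp.*3"]

# Wrap each pattern in a non-capturing group and join the groups with |,
# so that a single $regex condition covers all of them.
combined = "|".join(f"(?:{p})" for p in patterns)

# Query document in MongoDB's JSON-like form; it is only built here,
# not sent to a server.
query = {"email": {"$regex": combined}}
print(query)

# Local sanity check of the combined pattern with Python's re module.
print(bool(re.search(combined, "regular expression")))  # True
```

The same combined string could be handed to a single Pattern.compile(...) in Java and used with the .filter() form shown earlier.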

How to match array of sub string with array of string using mongo?

I have the following collection structure:
{
"_id": ObjectId("54c784d71e14acf9ae833f9f"),
"vms": [
{
"name": "ABC",
"ids": [
"abc.60a980004270457730244662385a4f69",
"abc.60a980004270457730244662385a4f6d"
]
},
{
"name": "PQR",
"ids": [
"abc.6d867d9c7acd60001aed76eb2c70bd53",
"abc.60a980004270457730244662385a4f6d"
]
},
{
"name": "XYZ",
"ids": [
"abc.600605b00237d91016cdc38f376bd31d",
"abc.600605b00237d91016cdc38f376cd32f"
]
}
]
}
I have an array that contains substrings of ids. Here is the array for reference:
myArray = [ "4270457730244662385a4f69","4270457730244662385a4f6d" , "4270457730244662385a4f6b"]
I want to find the vms entries in which no element of myArray occurs as a substring of any id, using Mongo.
Currently I am only able to search for a single element using a regex in Mongo.
For the example above, I want this output:
[
{
"name": "XYZ",
"ids": [
"abc.600605b00237d91016cdc38f376bd31d",
"abc.600605b00237d91016cdc38f376cd32f"
]
}
]
How do I search for substrings in an array using Mongo?

This is possible with a regex. You can test a string against several substrings at once using the "or" (alternation) operator, which is written | in regex syntax. Search for 'Boolean "or"' on Wikipedia for details.
MongoDB query using aggregation:
db.collection_name.aggregate([
  {$unwind: "$vms"},
  {$match: {
    "vms.ids": {$not: /.*(4270457730244662385a4f69|4270457730244662385a4f6d|4270457730244662385a4f6b).*/}}
  }
])
Output will be
{
"_id" : ObjectId("54c784d71e14acf9ae833f9f"),
"vms" : {
"name" : "XYZ",
"ids" : [
"abc.600605b00237d91016cdc38f376bd31d",
"abc.600605b00237d91016cdc38f376cd32f"
]
}
}
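If myArray is not fixed, the alternation can be built from the list instead of being typed by hand. The sketch below shows that step in plain Python and checks the result against the sample data locally; it only mirrors the $unwind/$match logic and does not query MongoDB.

```python
import re

my_array = [
    "4270457730244662385a4f69",
    "4270457730244662385a4f6d",
    "4270457730244662385a4f6b",
]

# The vms entries from the question, i.e. what $unwind would yield.
vms = [
    {"name": "ABC", "ids": ["abc.60a980004270457730244662385a4f69",
                            "abc.60a980004270457730244662385a4f6d"]},
    {"name": "PQR", "ids": ["abc.6d867d9c7acd60001aed76eb2c70bd53",
                            "abc.60a980004270457730244662385a4f6d"]},
    {"name": "XYZ", "ids": ["abc.600605b00237d91016cdc38f376bd31d",
                            "abc.600605b00237d91016cdc38f376cd32f"]},
]

# Escape each substring so it is matched literally, then join with |.
alternation = re.compile("|".join(re.escape(s) for s in my_array))

# Keep only the entries in which no id contains any of the substrings.
unmatched = [vm for vm in vms
             if not any(alternation.search(i) for i in vm["ids"])]
print([vm["name"] for vm in unmatched])  # ['XYZ']
```

Note that search() already finds a substring anywhere in the id, so the leading and trailing .* are not needed in this form.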

To split a json file.. Extracting data between curly braces

I have a JSON file that I want to split into different parts.
The file's content is shown below.
I want to split the content based on the curly brackets {}:
"1010320": {
"abc": [
"1012220",
"hiiiiiiiii."
],
"xyz": "Describe"
},
"1012757": {
"pqr": [
"1013757",
"x"
]
},
"1014220": {
"abc": [
"1018420",
"sooooo"
],
"answer": "4th"
},
"1019660": {
"abc": [
"1031920",
"welcome"
],
"xyz": "Describing&Interpreting"
},
"1034280": {
"abc": [
"1040560",
"Ok..."
],
"nop": "Student Question"
},
The output should be:
1) "abc": [
"1012220",
"hiiiiiiiii."
],
"xyz": "Describe"
2) "pqr": [
"1013757",
"x"
]
3) "abc": [
"1018420",
"sooooo"
],
"answer": "4th"
plz.. help..

I think this will be useful for you:
(?<=\{)\n\s+((?:[\n]+|.*)+?)\n\}
regex demo here : http://regex101.com/r/rS3wI5
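As an alternative to the regex, and a different technique from the one above, the content could be handed to a JSON parser. The sketch below uses a shortened copy of the data and assumes the only things stopping it from being valid JSON are the missing outer braces and the trailing comma.

```python
import json

# Shortened copy of the fragment from the question. It is not valid JSON on
# its own: the outer braces are missing and it ends with a trailing comma.
fragment = '''
"1010320": {
    "abc": ["1012220", "hiiiiiiiii."],
    "xyz": "Describe"
},
"1012757": {
    "pqr": ["1013757", "x"]
},
'''

# Assumption: stripping the trailing comma and adding braces is enough to
# make the fragment parseable.
document = json.loads("{" + fragment.strip().rstrip(",") + "}")

# Each top-level value is one of the parts between curly braces.
for index, (key, part) in enumerate(document.items(), start=1):
    print(f"{index}) {json.dumps(part)}")
```

A parser copes with nested braces and with braces inside string values, which a pattern based on line breaks and indentation cannot do reliably.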
