MongoDB regular expression with indexed field - regex

I was creating my first app using MongoDB.
Created index for a field, and tried a find query with $regex param, launched in a shell
> db.foo.find({A:{$regex:'BLABLA!25500[0-9]'}}).explain()
{
"cursor" : "BtreeCursor A_1 multi",
"nscanned" : 500001,
"nscannedObjects" : 10,
"n" : 10,
"millis" : 956,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"A" : [
[
"",
{
}
],
[
/BLABLA!25500[0-9]/,
/BLABLA!25500[0-9]/
]
]
}
}
It's very strange, because when i'm launching the same query, but with no index in collection, the performance is much better.
> db.foo.find({A:{$regex:'BLABLA!25500[0-9]'}}).explain()
{
"cursor" : "BasicCursor",
"nscanned" : 500002,
"nscannedObjects" : 500002,
"n" : 10,
"millis" : 531,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
Obviously, searching a field with index without regex is working much faster(i.e. searching document with constant field) , but i'm really interested in reason of such behavior.

The reason for the performance differential here is likely that, with the index enabled, your query must traverse the index (load into memory), then load the matching documents to be returned into memory also. Since you are not using the prefix query all values in the index will be scanned and tested against the regular expression. Not very efficient.
When you remove the index you are just doing a table scan and matching the regex there - essentially you simplified things from the first one slightly.
You might be able to make the indexed version quicker if it were a covered index query, it would also likely be faster if this were a compound index and you needed to combine it with the criteria for another field.
When you use a prefix query, it's not that it only uses an index then, but you use the index efficiently, which is key, and hence you see the real performance gains.

Related

writing a logic to check if value exists

working on the data returned by code
Trying to add some logic that if the value exists, show it else put empty
<cfset myStruct = {
"access_token" : "#st.access_token#",
"id": "#res.names[1].metadata.source.id#",
"name" : "#isDefined('res.names') ? res.names[1].displayname : ''#",
"other" : {
"email" : "#res.emailAddresses[1].value#"
}
}>
Open in new window
its not clean and it throws error on line 3 which is ID, so what kind of isDefined or structkeyexists i can write if it exists add it, else put an empty value
You could try Elvis operator
Edit: Unless you really need the values as a string, you do not need to use pounds to output the values
Edit 2: Have updated the example to use the right comment
<cfset myStruct = {
"access_token" : "#st.access_token#" <!--- If you have numeric token and need it to be a string --->
, "id" : res.names[ 1 ].metadata.source.id ?: ""
, "name" : res.names[ 1 ].displayname ?: ""
, "other" : {
"email" : res.emailAddresses[ 1 ].value ?: ""
}
}>

fail2ban-regex doesn't match snort logfile in alert_json format

I try to match a fail2ban-regex with a snort3 logfile in alert_json format.
example alert_json output in log-file:
{ "timestamp" : "21/03/22-12:23:56.370262", "seconds" : 1616412236, "action" : "allow", "class" : "none", "b64_data" : "lVAAFpTzAXEAAAAAoAJyELUuAAACBAW0BAIICikv9agAAAAAAQMDBw==", "dir" : "C2S", "dst_addr" : "6.7.8.9", "dst_ap" : "6.7.8.9:0", "eth_dst" : "00:11:22:33:44:55", "eth_len" : 102, "eth_src" : "11:11:22:33:44:55", "eth_type" : "0x800", "gid" : 1, "icmp_code" : 3, "icmp_id" : 0, "icmp_seq" : 0, "icmp_type" : 3, "iface" : "eth0", "ip_id" : 5814, "ip_len" : 68, "msg" : "ICMP Traffic Detected", "mpls" : 0, "pkt_gen" : "raw", "pkt_len" : 88, "pkt_num" : 2270045, "priority" : 0, "proto" : "ICMP", "rev" : 0, "rule" : "1:10000001:0", "service" : "unknown", "sid" : 10000001, "src_addr" : "1.2.3.4", "src_ap" : "1.2.3.4:0", "tos" : 192, "ttl" : 64, "vlan" : 0 }
my fail2ban-regex which didn't match:
^\{.*\"src_addr\"\ :\ \"<HOST>\".*\}$
i tryed this on regexr.com and it match.
i already found out there is maybe some problem with the timestamp but i didn't figured out which?
can somebody help here?
thanks
It'd probably depend on fail2ban version, for example latest fail2ban >= 0.10.6/0.11.2 does not require timestamp anymore (it would simulate "now"), so it shows to me the IP and current time (as I execute it):
$ fail2ban-regex -v /tmp/log '^\{.*\"src_addr\"\ :\ \"<HOST>\".*\}$'
...
Lines: 1 lines, 0 ignored, 1 matched, 0 missed
To specify own datepattern you have to set it in filter (or supply to fail2ban-regex with -d parameter), so this will work:
# either for timestamp tag:
$ fail2ban-regex -v -d ^\{\s*"timestamp"\s*:\s*"%y/%m/%d-%H:%M:%S\.%f" /tmp/log \"src_addr\"\ :\ \"<HOST>\"
# or for posix seconds (probably better because don't need conversion):
$ fail2ban-regex -v -d '"seconds"\s*:\s*{EPOCH}\s*,\s*' /tmp/log '\"src_addr\"\ :\ \"<HOST>\"'
Note that in fail2ban configs you must escape every % as %% due to python ini-configs substitution rules.
Also note that fail2ban cuts part of message matching date pattern out before it apply pref- or failregex.
Also note that your RE is a bit vulnerable, see https://github.com/fail2ban/fail2ban/issues/2932#issuecomment-777320874 for a better example.

How to query with conditionals in MongoDB

I am new to MongoDB and am learning how to query for multiple things at once with conditionals.
I have a database with a document called 'towns' that contains an id, name, population, date of last census, items it is famous for, and mayor. For example, this is what one of the towns looks like (please keep in mind, this is old, random data, nothing is up to date, it is just an example for me to learn):
{
"_id" : ObjectId("60232b0bbae1e5336c5ebc96"),
"name" : "New York",
"population" : 22200000,
"lastCensus" : ISODate("2016-07-05T00:00:00Z"),
"famousFor" : [
"the MOMA",
"food"
],
"mayor" : {
"name" : "Bill de Blasio",
"party" : "D"
}
I am trying to find all towns with names that contain an e and that are famous for food or beer.
I currently have this query:
db.towns.find({name: {$regex:"e"}}, {$or: [{famousFor:{$regex: 'food'}}, {famousFor:{$regex: 'beer'}}]})
If I split up the name and the $or expression, it works, but together I get errors like:
Error: error: {
"ok" : 0,
"errmsg" : "Unrecognized expression '$regex'",
"code" : 168,
"codeName" : "InvalidPipelineOperator"
Or, if I switch the query to db.towns.find({name:/e/}, {$or: [{famousFor:/food/}, {famousFor:/beer/}]}) I get the error:
Error: error: {
"ok" : 0,
"errmsg" : "FieldPath field names may not start with '$'.",
"code" : 16410,
"codeName" : "Location16410"
What am I doing wrong? Is it how I am structuring the query?
Thanks in advance!
Problem Is the syntax.
find({condition goes here}, {projection goes here})
You need to put all of your conditions within one curly brace.
db.towns.find({name: {$regex:"e"}, $or: [{famousFor:{$regex: 'food'}}, {famousFor:{$regex: 'beer'}}]})

MongoDB case insensitive query on text with parenthesis

I have a very annoying problem with a case insensitive query on mongodb.
I'm using MongoTemplate in a web application and I need to execute case insensitive queries on a collection.
with this code
Query q = new Query();
q.addCriteria(Criteria.where("myField")
.regex(Pattern.compile(fieldValue, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)));
return mongoTemplate.findOne(q,MyClass.class);
I create the following query
{ "myField" : { "$regex" : "field value" , "$options" : "iu"}}
that works perfectly when I have simple text, for example:
caPITella CapitatA
but...but...when there are parenthesis () the query doesn't work.
It doesn't work at all, even the query text is wrote as is wrote in the document...Example:
query 1:
{"myField" : "Ceratonereis (Composetia) costae" } -> 1 result (ok)
query 2:
{ "myField" : {
"$regex" : "Ceratonereis (Composetia) costae" ,
"$options" : "iu"
}} -> no results (not ok)
query 3:
{ "scientificName" : {
"$regex" : "ceratonereis (composetia) costae" ,
"$options" : "iu"
}} -> no results (....)
So...I'm doing something wrong? I forgot some Pattern.SOME to include in the Pattern.compile()? Any solution?
Thanks
------ UPDATE ------
The answer of user3561036 helped me to figure how the query must be built.
So, I have resolved by modifying the query building in
q.addCriteria(Criteria.where("myField")
.regex(Pattern.compile(Pattern.quote(myFieldValue), Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)));
The output query
{ "myField" : { "$regex" : "\\Qhaliclona (rhizoniera) sarai\\E" , "$options" : "iu"}}
works.
If using the $regex operator with a "string" as input then you must quote literals for reserved characters such as ().
Normally that's a single \, but since it's in a string already you do it twice \\:
{ "myField" : {
"$regex" : "Ceratonereis \\(Composetia\\) costae" ,
"$options" : "iu"
}}
It's an old question, but you can use query.replace(/[-[\]{}()*+?.,\\/^$|#\s]/g, "\\$&");
This is working with aggregate and matches :
const order = user_input.replace(/[-[\]{}()*+?.,\\/^$|#\s]/g, "\\$&");
const regex = new RegExp(order, 'i');
const query = await this.databaseModel.aggregate([
{
$match: {
name : regex
}
// ....
Use $strcasecmp.
The aggregation framework was introduced in MongoDB 2.2. You can use the string operator "$strcasecmp" to make a case-insensitive comparison between strings.
It's more recommended and easier than using regex.

SpringMongo Case insensitive search regex

I am trying a case insensitive search in Mongo. Basically I want case insensitive string match I am using regex. Here is my code
Query query = new Query( Criteria.where(propName).regex(value.toString(), "i"));
But the above dosent match my whole string(a string sometime with spaces). It returns values even if its a substring.
Eg: Suppose my collection has 2 values "Bill" and "Bill status',It returns me "bill" even if my search is "bill status". It returns results even if the there is a sub string of the string I am searching for
I tried Query query = new Query( Criteria.where(propName).is(value.toString()));
But the above is case sensitive. Can someone please help on this.
Insensitive search using regex search. input_location could be Delhi,delhi,DELHI, it works for all.
Criteria.where("location").regex( Pattern.compile(input_location, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)));
The regex /^bill$/i will match just against "Bill" in a case-insensitive manner.
Here is an example showing this (in the mongo shell):
> db.foo.insert({name: "Bill"});
> db.foo.insert({name: "Bill status"});
> db.foo.insert({name: "another Bill"});
> db.foo.find()
{ "_id" : ObjectId("5018e182a499db774b92bf25"), "name" : "Bill" }
{ "_id" : ObjectId("5018e191a499db774b92bf26"), "name" : "Bill status" }
{ "_id" : ObjectId("5018e19ba499db774b92bf27"), "name" : "another Bill" }
> db.foo.find({name: /bill/i})
{ "_id" : ObjectId("5018e182a499db774b92bf25"), "name" : "Bill" }
{ "_id" : ObjectId("5018e191a499db774b92bf26"), "name" : "Bill status" }
{ "_id" : ObjectId("5018e19ba499db774b92bf27"), "name" : "another Bill" }
> db.foo.find({name: /^bill$/i})
{ "_id" : ObjectId("5018e182a499db774b92bf25"), "name" : "Bill" }
However, a regex query will not use an index, except if it is left-rooted (ie. of the form /^prefix/) and if the i case-insensitive flag is not used. You may pay a substantial performance penalty over using a query that uses an index. As such, depending on your use case, a better alternative might be to use application logic in some way, for example:
Enforce a case when you insert items into the database (e.g. convert "bill" to "Bill").
Do a search against your known case (e.g. search just against "Bill").