Regex to find string between patterns not containing specific string

OK gurus,
Let's say I have the following string:
{
"event" : "party" ,
"Id" : "store" ,
"timestamp" : "2019-07-07T13:14:26.329Z" ,
"localDateTime" : "2019-07-07T16:14" ,
"orderStateUpdate" : {
"id" : "fj09bA9ywfGS" ,
"orderId" : "2315043" ,
"visitId" : "2315043" ,
"items" :{{
"id" : "fj09bA6K3K8u" ,
"quantity" : 1 ,
"stat" : "ok"
},
{
"id" : "fj09bA6K3K8u2" ,
"quantity" : 2 ,
"stat" : "ok"
}}
,
"items" :{{
"id" : "fj09bA6K3K8u" ,
"quantity" : 1 ,
"stat" : "junk"
},
{
"id" : "fj09bA6K3K8u2" ,
"quantity" : 2 ,
"stat" : "ok"
}}
,
"extraParams" : {"extraparamstuff1":"bugger"},"somethingelse" :"blahblahblah"
}}
The string has two nested arrays, each wrapped in double curly braces. This string specifically contains an error: the LAST closing brace is ALSO doubled. What I am trying to capture with regex is the substring that starts with '}}', ends with '}}', and DOES NOT CONTAIN '{{', like so:
}}
,
"extraParams" : {"extraparamstuff1":"bugger"},"somethingelse" :"blahblahblah"
}}
I am regex-challenged, but I have come up with this:
(?:(\}\})).*(?:\{\{).*(?:\}\s*?\})
which captures:
}}
,
"items" :{{
"id" : "fj09bA6K3K8u" ,
"quantity" : 1 ,
"stat" : "junk"
},
{
"id" : "fj09bA6K3K8u2" ,
"quantity" : 2 ,
"stat" : "ok"
}}
,
"extraParams" : {"extraparamstuff1":"bugger"},"somethingelse" :"blahblahblah"
}}
which is too much. Can someone help me understand how to find this? This is for error-checking inbound data (and yes I need to check for extra opening '{{' as well).

Okay, so I think you need a negative lookahead, since you have to accept single curly braces but not double ones. This is what I've come up with; I'm not sure it will work in every case, though.
}}([^{]|{(?!{))+}}
It basically says: look for two closing curly braces (}}), then one or more (+) of either any character that is not an opening curly brace ([^{]) OR a single opening curly brace that is not followed by another one ({(?!{), via a negative lookahead), and finish with two closing curly braces (}}).
Link to live (updateable) demo: https://regex101.com/r/kwlzco/2
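A quick way to sanity-check the pattern outside regex101 is a small JavaScript sketch. It runs the same expression, with the braces escaped (which does not change its meaning), against a trimmed copy of the payload from the question. The helper name findStrayDoubleClose is hypothetical, and the payload keeps only the lines that matter for the brace layout.

```javascript
// Same pattern as above, with { and } escaped: "}}", then one or more of
// (any char except "{" | a "{" not followed by another "{"), then "}}".
const STRAY_DOUBLE_CLOSE = /\}\}(?:[^{]|\{(?!\{))+\}\}/;

// Hypothetical helper: returns the offending text, or null if none is found.
function findStrayDoubleClose(text) {
  const match = STRAY_DOUBLE_CLOSE.exec(text);
  return match ? match[0] : null;
}

// Trimmed version of the payload from the question.
const payload = [
  '"items" :{{',
  '"id" : "fj09bA6K3K8u" ,',
  '"stat" : "ok"',
  "}}",
  ",",
  '"items" :{{',
  '"id" : "fj09bA6K3K8u2" ,',
  '"stat" : "junk"',
  "}}",
  ",",
  '"extraParams" : {"extraparamstuff1":"bugger"},"somethingelse" :"blahblahblah"',
  "}}",
].join("\n");

// Prints the last four lines: "}}", ",", the extraParams line, "}}".
console.log(findStrayDoubleClose(payload));

// With the final brace fixed to a single "}", nothing is reported.
console.log(findStrayDoubleClose(payload.replace(/\}\}$/, "}"))); // null
```

Because the repeated group can never step over "{{", the "}}" that closes the first items array cannot start a match, so the engine moves on and only the stray trailing block is returned.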


How to query with conditionals in MongoDB

I am new to MongoDB and am learning how to query for multiple things at once with conditionals.
I have a database with a collection called 'towns' whose documents contain an id, name, population, date of last census, items the town is famous for, and mayor. For example, this is what one of the towns looks like (please keep in mind, this is old, random data; nothing is up to date, it is just an example for me to learn):
{
"_id" : ObjectId("60232b0bbae1e5336c5ebc96"),
"name" : "New York",
"population" : 22200000,
"lastCensus" : ISODate("2016-07-05T00:00:00Z"),
"famousFor" : [
"the MOMA",
"food"
],
"mayor" : {
"name" : "Bill de Blasio",
"party" : "D"
}
}
I am trying to find all towns with names that contain an e and that are famous for food or beer.
I currently have this query:
db.towns.find({name: {$regex:"e"}}, {$or: [{famousFor:{$regex: 'food'}}, {famousFor:{$regex: 'beer'}}]})
If I run the name condition and the $or expression separately, each one works, but together I get errors like:
Error: error: {
"ok" : 0,
"errmsg" : "Unrecognized expression '$regex'",
"code" : 168,
"codeName" : "InvalidPipelineOperator"
}
Or, if I switch the query to db.towns.find({name:/e/}, {$or: [{famousFor:/food/}, {famousFor:/beer/}]}) I get the error:
Error: error: {
"ok" : 0,
"errmsg" : "FieldPath field names may not start with '$'.",
"code" : 16410,
"codeName" : "Location16410"
}
What am I doing wrong? Is it how I am structuring the query?
Thanks in advance!
The problem is the syntax:
find({condition goes here}, {projection goes here})
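You need to put all of your conditions inside one filter document, i.e. within one pair of curly braces. Anything passed as the second argument is treated as a projection, which is why $regex and $or produce errors there.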
db.towns.find({name: {$regex:"e"}, $or: [{famousFor:{$regex: 'food'}}, {famousFor:{$regex: 'beer'}}]})
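To make the filter-versus-projection distinction concrete, here is a sketch that builds the corrected filter as a single object. buildTownFilter and matchesTownFilter are hypothetical names; the second function only mirrors the filter's meaning in plain JavaScript (top-level keys ANDed, a regex on an array field matching if any element matches) so it can be checked without a server.

```javascript
// Hypothetical helper: the single filter document for db.towns.find().
// Both top-level keys belong to the same object, so MongoDB ANDs them.
function buildTownFilter() {
  return {
    name: { $regex: "e" },
    $or: [
      { famousFor: { $regex: "food" } },
      { famousFor: { $regex: "beer" } },
    ],
  };
}

// Plain-JS mirror of that filter for offline checks only. On an array field
// such as famousFor, MongoDB's $regex matches if any element matches.
function matchesTownFilter(town) {
  const items = Array.isArray(town.famousFor) ? town.famousFor : [town.famousFor];
  const famousForFoodOrBeer = items.some((item) => /food/.test(item) || /beer/.test(item));
  return /e/.test(town.name) && famousForFoodOrBeer;
}

const newYork = { name: "New York", famousFor: ["the MOMA", "food"] };
console.log(matchesTownFilter(newYork)); // true
```

In the mongo shell the filter would be passed as db.towns.find(buildTownFilter()), with a projection, if you need one, as the optional second argument.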

Ubuntu 16 sed not working with parentheses

Oh, I can't get past this sed regex. The line "entrytimestamp" : ISODate("2020-09-09T16:07:34.526Z") in the first record should also be transformed, but because it has no comma after the closing parenthesis, it is not. I simply want to remove "ISODate(" and the closing parenthesis ")", regardless of whether the field is the last element or not. I have double- and triple-checked the regex, but I am missing something. Does anybody have any idea?
root## cat inar.json
[
{
"_id" : ObjectId("5f58fdc632e4de001621c1ca"),
"USER" : null,
"entrytimestamp" : ISODate("2020-09-09T16:07:34.526Z")
},
{
"_id" : ObjectId("5f590118c205630016dcafb4"),
"entrytimestamp" : ISODate("2020-09-09T16:21:44.346Z"),
"USER" : null
}
]
sed -E "s/(.+\"entrytimestamp\"\s:\s)ISODate\((\"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{1,3}Z\")\)(.+)/\1\2\3/" inar.json
[
{
"_id" : ObjectId("5f58fdc632e4de001621c1ca"),
"USER" : null,
"entrytimestamp" : ISODate("2020-09-09T16:07:34.526Z")
},
{
"_id" : ObjectId("5f590118c205630016dcafb4"),
"entrytimestamp" : "2020-09-09T16:21:44.346Z",
"USER" : null
}
]
You may use this sed:
sed -E 's/("entrytimestamp" *: *)ISODate\(([^)]+)\)/\1\2/' file
[
{
"_id" : ObjectId("5f58fdc632e4de001621c1ca"),
"USER" : null,
"entrytimestamp" : "2020-09-09T16:07:34.526Z"
},
{
"_id" : ObjectId("5f590118c205630016dcafb4"),
"entrytimestamp" : "2020-09-09T16:21:44.346Z",
"USER" : null
}
]
Command Details
("entrytimestamp" *: *): Match the leading "entrytimestamp" : part, with optional spaces around the colon. Capture this part in group #1
ISODate\(: Match ISODate(
([^)]+): Match one or more characters that are not ). Capture this part in group #2
\): Match the closing )
\1\2: Put back-references #1 and #2 in the replacement
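For anyone who wants to experiment with the substitution outside a shell, here is the same pattern as a JavaScript RegExp. This is a sketch, not sed itself: stripIsoDate is a hypothetical name, and since sed processes input one line at a time, the function splits on newlines and replaces only the first match on each line, like s/// without the g flag.

```javascript
// The answer's pattern: group 1 = key plus colon (optional spaces),
// group 2 = everything inside ISODate( ... ) up to the first ")".
const ISODATE_FIELD = /("entrytimestamp" *: *)ISODate\(([^)]+)\)/;

// Hypothetical helper: apply the substitution line by line, first match only,
// the way sed -E 's/.../\1\2/' does.
function stripIsoDate(text) {
  return text
    .split("\n")
    .map((line) => line.replace(ISODATE_FIELD, "$1$2"))
    .join("\n");
}

const sample = [
  '"entrytimestamp" : ISODate("2020-09-09T16:07:34.526Z")',
  '"entrytimestamp" : ISODate("2020-09-09T16:21:44.346Z"),',
].join("\n");

console.log(stripIsoDate(sample));
// "entrytimestamp" : "2020-09-09T16:07:34.526Z"
// "entrytimestamp" : "2020-09-09T16:21:44.346Z",
```

Since the pattern stops at the closing parenthesis, it no longer cares whether a comma follows.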
Your regex does not match the first line because of the final (.+), which requires at least one character. On that line nothing follows the closing ) of ISODate(...), so there is nothing left for (.+) to match and the whole pattern fails.
Use (.*) to match any zero or more characters:
sed -E "s/(.+\"entrytimestamp\"\s:\s)ISODate\((\"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{1,3}Z\")\)(.*)/\1\2\3/" inar.json
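The (.+) versus (.*) difference can be checked the same way. The sketch below uses the question's pattern as JavaScript regex literals (\s, {n} and {n,m} mean the same thing here). The sample lines are indented, which is an assumption about the real file: the leading (.+ needs at least one character before "entrytimestamp", which is presumably why the second record was transformed at all.

```javascript
// The question's pattern, ending in (.+): needs at least one character after ")".
const WITH_PLUS = /(.+"entrytimestamp"\s:\s)ISODate\(("[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{1,3}Z")\)(.+)/;
// The fix from the answer: (.*) also accepts an empty tail.
const WITH_STAR = /(.+"entrytimestamp"\s:\s)ISODate\(("[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{1,3}Z")\)(.*)/;

// Indented like a pretty-printed file (an assumption about the real input).
const lastField = '    "entrytimestamp" : ISODate("2020-09-09T16:07:34.526Z")';
const middleField = '    "entrytimestamp" : ISODate("2020-09-09T16:21:44.346Z"),';

console.log(WITH_PLUS.test(lastField));   // false: nothing left for (.+)
console.log(WITH_PLUS.test(middleField)); // true: (.+) matches the ","
console.log(WITH_STAR.test(lastField));   // true
console.log(lastField.replace(WITH_STAR, "$1$2$3"));
// prints:     "entrytimestamp" : "2020-09-09T16:07:34.526Z"
```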

how to add special characters in mongo $regex

I want to look for "\r" in a string field I have in MongoDB, and I found this, which seems to work well:
db.users.findOne({"username" : {$regex : ".*son.*"}});
The problem is that I want to look for "\r", which I know is there, but I can't find it. So I just did:
db.users.findOne({"username" : {$regex : ".*\r.*"}});
and it doesn't work. How can I fix this?
Example document:
{
"personId" : 1,
"personName" : "john",
"address" : {
"city" : "Rue Neuve 2\\r\\rue Pré-du-Mar \\r ché 1 1003 Lausanne",
"street" : "",
"zipCode" : "",
"streetNumber" : ""
}
}
So my query is:
db.users.findOne({"address.city" : {$regex : ".*\r.*"}});
I also tried:
db.users.findOne({"address.city" : {$regex : ".*\\r.*"}});
Try:
db.users.findOne({"username" : {$regex : ".*\\r.*"}});
I think your issue is that you have your .* backwards at the end. You are looking for a "2." literal followed by any characters as opposed to what you have at the beginning, .*, saying anything before the literal that isn't a carriage return. Try to change this to
db.users.findOne({"username" : {$regex : ".*\\r*."}});
Which says give me "\r" with any non carriage return characters before the literal and any non carriage return characters after the literal.
I found that the way to do it is:
db.users.findOne({"username" : {$regex : ".*\\\\.*"}});
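Assuming the shell output above shows the stored value JSON-escaped, \\r in the document means the field holds a literal backslash followed by r, not a carriage return. That would explain why ".*\r.*" (which looks for a real carriage return) finds nothing. The JavaScript sketch below, using the city value from the example document, shows the difference; the same pattern strings can be passed to $regex in the mongo shell.

```javascript
// "\\r" in this JS literal is a backslash plus "r", mirroring what the shell
// printed for address.city.
const city = "Rue Neuve 2\\r\\rue Pré-du-Mar \\r ché 1 1003 Lausanne";

// Pattern strings exactly as typed inside a JS / mongo-shell string literal.
const carriageReturn = ".*\\r.*";  // regex .*\r.*  : a real CR character
const backslashR = ".*\\\\r.*";    // regex .*\\r.* : a literal "\" then "r"
const anyBackslash = ".*\\\\.*";   // regex .*\\.*  : any literal "\" (the pattern above)

console.log(new RegExp(carriageReturn).test(city));           // false
console.log(new RegExp(backslashR).test(city));               // true
console.log(new RegExp(anyBackslash).test(city));             // true
console.log(new RegExp(carriageReturn).test("line1\rline2")); // true
```

Each level of quoting halves the backslashes: four in the string literal become two in the regex, which match one backslash in the data.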

MongoDB case insensitive query on text with parenthesis

I have a very annoying problem with a case insensitive query on mongodb.
I'm using MongoTemplate in a web application and I need to execute case insensitive queries on a collection.
With this code:
Query q = new Query();
q.addCriteria(Criteria.where("myField")
.regex(Pattern.compile(fieldValue, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)));
return mongoTemplate.findOne(q,MyClass.class);
I create the following query:
{ "myField" : { "$regex" : "field value" , "$options" : "iu"}}
that works perfectly when I have simple text, for example:
caPITella CapitatA
But... but... when there are parentheses () the query doesn't work.
It doesn't work at all, even when the query text is written exactly as it appears in the document. Example:
query 1:
{"myField" : "Ceratonereis (Composetia) costae" } -> 1 result (ok)
query 2:
{ "myField" : {
"$regex" : "Ceratonereis (Composetia) costae" ,
"$options" : "iu"
}} -> no results (not ok)
query 3:
{ "scientificName" : {
"$regex" : "ceratonereis (composetia) costae" ,
"$options" : "iu"
}} -> no results (....)
So... am I doing something wrong? Did I forget to include some Pattern.SOME flag in Pattern.compile()? Any solution?
Thanks
------ UPDATE ------
The answer of user3561036 helped me figure out how the query must be built.
So I resolved it by changing how the query is built to:
q.addCriteria(Criteria.where("myField")
.regex(Pattern.compile(Pattern.quote(myFieldValue), Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)));
The output query
{ "myField" : { "$regex" : "\\Qhaliclona (rhizoniera) sarai\\E" , "$options" : "iu"}}
works.
If you use the $regex operator with a string as input, you must escape reserved characters such as ( and ).
Normally that's a single \, but since the pattern is already inside a string you have to double it, \\:
{ "myField" : {
"$regex" : "Ceratonereis \\(Composetia\\) costae" ,
"$options" : "iu"
}}
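MongoDB's $regex uses PCRE, and JavaScript's RegExp treats unescaped ( and ) the same way in this case (as a capturing group), so the problem can be reproduced offline with a short sketch. The field value is the one from query 1 in the question.

```javascript
const stored = "Ceratonereis (Composetia) costae";

// Unescaped: ( and ) form a capture group, so this regex actually looks for
// "Ceratonereis Composetia costae", without parentheses in the text.
const unescaped = new RegExp("Ceratonereis (Composetia) costae", "i");
// Escaped: "\\(" in the string literal becomes \( in the regex, a literal "(".
const escaped = new RegExp("Ceratonereis \\(Composetia\\) costae", "i");

console.log(unescaped.test(stored)); // false
console.log(escaped.test(stored));   // true
console.log(escaped.test("ceratonereis (composetia) costae")); // true, thanks to "i"
```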
It's an old question, but you can use query.replace(/[-[\]{}()*+?.,\\/^$|#\s]/g, "\\$&");
This works with aggregate and $match:
const order = user_input.replace(/[-[\]{}()*+?.,\\/^$|#\s]/g, "\\$&");
const regex = new RegExp(order, 'i');
const query = await this.databaseModel.aggregate([
{
$match: {
name : regex
}
},
// ....
]);
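Wrapped in a helper, the escaping expression above can be tested on its own; escapeRegex is a hypothetical name. Note that the character class also escapes whitespace (\s), so spaces come out as "\ ", which both JavaScript (without the u flag) and PCRE treat as a literal space.

```javascript
// The escaping expression from the answer above. "$&" re-inserts the matched
// character, so every special character gets a backslash in front of it.
function escapeRegex(userInput) {
  return userInput.replace(/[-[\]{}()*+?.,\\/^$|#\s]/g, "\\$&");
}

const pattern = escapeRegex("Ceratonereis (Composetia) costae");
console.log(pattern); // Ceratonereis\ \(Composetia\)\ costae

const regex = new RegExp(pattern, "i");
console.log(regex.test("ceratonereis (composetia) costae")); // true
```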
Use $strcasecmp.
The aggregation framework was introduced in MongoDB 2.2. You can use its string operator "$strcasecmp" to make a case-insensitive comparison between two strings.
For exact, whole-string matches it is simpler than using a regex, because nothing has to be escaped.
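A sketch of how $strcasecmp could be used inside $match, assuming MongoDB 3.6 or later (needed for $expr). The helper name caseInsensitiveEqualsStage is illustrative. Unlike a regex, this compares whole strings, so the parentheses need no escaping, but it cannot do partial matches.

```javascript
// Hypothetical helper: a $match stage keeping documents whose `field` equals
// `value` ignoring case. $expr lets $match evaluate the aggregation operator
// $strcasecmp, which returns 0 when the two strings are equivalent.
function caseInsensitiveEqualsStage(field, value) {
  return {
    $match: {
      $expr: { $eq: [{ $strcasecmp: [`$${field}`, value] }, 0] },
    },
  };
}

const stage = caseInsensitiveEqualsStage("myField", "ceratonereis (composetia) costae");
console.log(JSON.stringify(stage));
```

In the mongo shell the stage would go into a pipeline, e.g. db.myCollection.aggregate([stage]), where myCollection is a placeholder for your collection name.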

Vi regular expression

I'm looking to perform a find and replace on the following string:
"_id" : { "$oid" : "52853800bb1177ca391c17ff" }, "Ticker" : "A", "Profit Margin" : 0.137, "Institutional Ownership" : 0.847, "EPS growth past 5 years" : 0.158, "Total Debt/Equity" : 0.5600000000000001, "CurrentRatio" : 3, "Return on Assets" : 0.089, "Sector" : "Healthcare", "P/S" : 2.54, "Change from Open" : -0.0148, "Performance (YTD)" : 0.2605, "Performance (Week)" : 0.0031, "Quick Ratio" : 2.3, "Insider Transactions" : -0.1352, "P/B" : 3.63, "EPS growth quarter over quarter" : -0.29, "Payout Ratio" : 0.162, "Performance (Quarter)" : 0.09279999999999999, "Forward P/E" : 16.11, "P/E" : 19.1, "200-Day Simple Moving Average" : 0.1062, "Shares Outstanding" : 339, "Earnings Date" : { "$date" : 1384464600000 }, "52-Week High" : -0.0544, "P/Cash" : 7.45, "Change" : -0.0148, "Analyst Recom" : 1.6, "Volatility (Week)" : 0.0177, "Country" : "USA", "Return on Equity" : 0.182, "50-Day Low" : 0.0728, "Price" : 50.44, "50-Day High" : -0.0544, "Return on Investment" : 0.163, "Shares Float" : 330.21, "Dividend Yield" : 0.0094, "EPS growth test years" : 0.13 }
Specifically, I want to find all characters in quotations and remove any whitespaces found. i.e. "Profit Margin" becomes "ProfitMargin", "Institutional Ownership" becomes "InstitutionalOwnership" etc. I'd like to do this in Vi.
Thanks for the help in advance!
A possible answer:
:%s/\("[^"]*"\)/\=substitute(submatch(1), " ", "", "g")/g
And the way I got it:
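Search for what we want to replace => /".*" (quote symbol + anything, any number of times + quote symbol). Because .* is greedy, this runs from the first quote on the line to the last one.
Do it properly => /"[^"]*" (quote symbol + any number of characters that are not a quote symbol + quote symbol)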
Transform that into a substitution that does nothing => :%s/\("[^"]*"\)/\1/g
Check :help :%s, from there :help sub-replace-special.
Use the special \= replacement (an expression, listed under sub-replace-special), still doing nothing => :%s/\("[^"]*"\)/\=submatch(1)/g
Replace \=submatch(1) with something useful => :%s/\("[^"]*"\)/\=substitute(submatch(1), " ", "", "g")/g (:help substitute).
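The same idea in JavaScript, for readers who want to test it outside Vim (a sketch; squeezeQuotedSpaces is a hypothetical name). It also shows why the first attempt, ".*", was too greedy.

```javascript
// Greedy version from the first step: spans from the first quote to the last.
const greedy = /".*"/;
// Fixed version: a quote, any run of non-quote characters, a quote.
const quoted = /"[^"]*"/g;

// Like \=substitute(submatch(1), " ", "", "g"): drop the spaces inside each
// quoted string and leave everything between the strings alone.
function squeezeQuotedSpaces(line) {
  return line.replace(quoted, (match) => match.replace(/ /g, ""));
}

const sample = '"Profit Margin" : 0.137, "Performance (YTD)" : 0.2605';
console.log(sample.match(greedy)[0]);    // "Profit Margin" : 0.137, "Performance (YTD)"
console.log(squeezeQuotedSpaces(sample)); // "ProfitMargin" : 0.137, "Performance(YTD)" : 0.2605
```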