Ubuntu 16 sed not working with parenthesis - regex

I can't get past this sed regex. The line "entrytimestamp" : ISODate("2020-09-09T16:07:34.526Z") in the first record should also be transformed, but it is not, because it has no comma after the closing parenthesis. I simply want to remove "ISODate(" and the closing parenthesis ")", whether or not it is the last element. I have double- and triple-checked the regex but I am missing something. Does anybody have any idea?
root## cat inar.json
[
{
"_id" : ObjectId("5f58fdc632e4de001621c1ca"),
"USER" : null,
"entrytimestamp" : ISODate("2020-09-09T16:07:34.526Z")
},
{
"_id" : ObjectId("5f590118c205630016dcafb4"),
"entrytimestamp" : ISODate("2020-09-09T16:21:44.346Z"),
"USER" : null
}
]
sed -E "s/(.+\"entrytimestamp\"\s:\s)ISODate\((\"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{1,3}Z\")\)(.+)/\1\2\3/" inar.json
[
{
"_id" : ObjectId("5f58fdc632e4de001621c1ca"),
"USER" : null,
"entrytimestamp" : ISODate("2020-09-09T16:07:34.526Z")
},
{
"_id" : ObjectId("5f590118c205630016dcafb4"),
"entrytimestamp" : "2020-09-09T16:21:44.346Z",
"USER" : null
}
]

You may use this sed:
sed -E 's/("entrytimestamp" *: *)ISODate\(([^)]+)\)/\1\2/' file
[
{
"_id" : ObjectId("5f58fdc632e4de001621c1ca"),
"USER" : null,
"entrytimestamp" : "2020-09-09T16:07:34.526Z"
},
{
"_id" : ObjectId("5f590118c205630016dcafb4"),
"entrytimestamp" : "2020-09-09T16:21:44.346Z",
"USER" : null
}
]
Command Details
("entrytimestamp" *: *): Match starting "entrytimestamp" : part with optional spaces around :. Capture this part in group #1
ISODate\(: Match ISODate(
([^)]+): Match 1+ of any character that is not ). Capture this part in group #2
\): Match closing )
/\1\2: Put back-references #1 and #2 back in substitution
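To check the pattern outside sed, here is a minimal sketch of the same substitution in Python. It assumes Python's re module as a stand-in for sed -E, and the name strip_isodate is made up for illustration. Unlike s/// without the g flag, re.sub replaces every occurrence, which makes no difference for this input.

```python
import re

# Same pattern as the sed command: group 1 is the key plus colon,
# group 2 is everything between "ISODate(" and the next ")".
ISODATE = re.compile(r'("entrytimestamp" *: *)ISODate\(([^)]+)\)')


def strip_isodate(text):
    """Drop the ISODate( ... ) wrapper but keep the quoted timestamp."""
    return ISODATE.sub(r"\1\2", text)


sample = '''"_id" : ObjectId("5f58fdc632e4de001621c1ca"),
"entrytimestamp" : ISODate("2020-09-09T16:07:34.526Z")
"entrytimestamp" : ISODate("2020-09-09T16:21:44.346Z"),'''

print(strip_isodate(sample))
```

Because the pattern stops at the closing parenthesis and says nothing about what follows, it does not matter whether the field is the last one in the record.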

Your regex does not match the first line you intend to match because the final (.+) requires at least one character. On that line the closing ) is the last character, so there is nothing left for (.+) to match and the whole pattern fails.
Use (.*) instead, which matches zero or more characters:
sed -E "s/(.+\"entrytimestamp\"\s:\s)ISODate\((\"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{1,3}Z\")\)(.*)/\1\2\3/" inar.json
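The difference between the two quantifiers can be reproduced with a small Python sketch. It assumes Python's re module behaves like sed -E for this pattern, and that the real file indents its fields (the leading (.+) needs at least one character before "entrytimestamp"). The variable names are made up for illustration.

```python
import re

TIMESTAMP = r'"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{1,3}Z"'
HEAD = r'(.+"entrytimestamp"\s:\s)ISODate\((' + TIMESTAMP + r')\)'

with_plus = re.compile(HEAD + r'(.+)')  # tail needs at least one character
with_star = re.compile(HEAD + r'(.*)')  # tail may be empty

# Assumed indentation: four spaces in front of each field.
last_field = '    "entrytimestamp" : ISODate("2020-09-09T16:07:34.526Z")'
middle_field = last_field + ','

plus_on_last = with_plus.search(last_field)      # nothing after ")": no match
plus_on_middle = with_plus.search(middle_field)  # "," feeds the final (.+)
fixed_last = with_star.sub(r'\1\2\3', last_field)

print(plus_on_last, plus_on_middle is not None, fixed_last)
```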

Related

Logstash Grok Regex: get each line in each block

I need a custom logstash-grok regex pattern
Some sample data:
abc blabla
[BLOCK]
START=1
END=2
[/BLOCK]
more blabla
[BLOCK]
START=3
END=4
[/BLOCK]
Note: each line ends in a newline character.
How do I capture all START and END values?
The desired result is:
{ "BLOCK1": { "START":"1", "END":"2" }, "BLOCK2": { "START":"3", "END":"4" } }
I tried
START \bSTART=(?<start>\d*)
END \bEND=(?<end>\d*)
but the result is the values of only the first block:
{ "start": "1", "end": "2" }
I also tried putting the multiline flag (?m) in front of the grok pattern, but that doesn't work either.
Any help is appreciated.

fail2ban-regex doesn't match snort logfile in alert_json format

I am trying to match a fail2ban failregex against a snort3 log file in alert_json format.
Example alert_json output in the log file:
{ "timestamp" : "21/03/22-12:23:56.370262", "seconds" : 1616412236, "action" : "allow", "class" : "none", "b64_data" : "lVAAFpTzAXEAAAAAoAJyELUuAAACBAW0BAIICikv9agAAAAAAQMDBw==", "dir" : "C2S", "dst_addr" : "6.7.8.9", "dst_ap" : "6.7.8.9:0", "eth_dst" : "00:11:22:33:44:55", "eth_len" : 102, "eth_src" : "11:11:22:33:44:55", "eth_type" : "0x800", "gid" : 1, "icmp_code" : 3, "icmp_id" : 0, "icmp_seq" : 0, "icmp_type" : 3, "iface" : "eth0", "ip_id" : 5814, "ip_len" : 68, "msg" : "ICMP Traffic Detected", "mpls" : 0, "pkt_gen" : "raw", "pkt_len" : 88, "pkt_num" : 2270045, "priority" : 0, "proto" : "ICMP", "rev" : 0, "rule" : "1:10000001:0", "service" : "unknown", "sid" : 10000001, "src_addr" : "1.2.3.4", "src_ap" : "1.2.3.4:0", "tos" : 192, "ttl" : 64, "vlan" : 0 }
My fail2ban regex, which doesn't match:
^\{.*\"src_addr\"\ :\ \"<HOST>\".*\}$
I tried this on regexr.com and it matches there.
I already found out that there may be a problem with the timestamp, but I couldn't figure out what it is.
Can somebody help here?
Thanks
It probably depends on the fail2ban version. For example, recent fail2ban (>= 0.10.6/0.11.2) no longer requires a timestamp (it simulates "now"), so for me it shows the IP and the current time (the time at which I run it):
$ fail2ban-regex -v /tmp/log '^\{.*\"src_addr\"\ :\ \"<HOST>\".*\}$'
...
Lines: 1 lines, 0 ignored, 1 matched, 0 missed
To specify your own datepattern, set it in the filter (or pass it to fail2ban-regex with the -d parameter). Either of these will work:
# either for the timestamp tag:
$ fail2ban-regex -v -d '^\{\s*"timestamp"\s*:\s*"%y/%m/%d-%H:%M:%S\.%f"' /tmp/log '\"src_addr\"\ :\ \"<HOST>\"'
# or for POSIX seconds (probably better, because no conversion is needed):
$ fail2ban-regex -v -d '"seconds"\s*:\s*{EPOCH}\s*,\s*' /tmp/log '\"src_addr\"\ :\ \"<HOST>\"'
Note that in fail2ban configs you must escape every % as %% because of Python's ini-config interpolation rules.
Also note that fail2ban cuts the part of the message that matches the date pattern out before it applies the prefregex or failregex.
Also note that your RE is a bit vulnerable, see https://github.com/fail2ban/fail2ban/issues/2932#issuecomment-777320874 for a better example.
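The %% rule and the date directives can be illustrated with a rough Python sketch. This is not fail2ban's own code: it uses the standard configparser and datetime.strptime only to show the general ini interpolation behaviour and that the directive letters line up with the sample timestamp. The [Definition] snippet below is a simplified, hypothetical filter fragment.

```python
import configparser
from datetime import datetime

# Hypothetical filter fragment: every "%" is doubled so that ini
# interpolation leaves a single "%" behind.
INI = """
[Definition]
datepattern = %%y/%%m/%%d-%%H:%%M:%%S.%%f
"""

parser = configparser.ConfigParser()
parser.read_string(INI)
fmt = parser["Definition"]["datepattern"]  # "%%" has been collapsed to "%"

stamp = "21/03/22-12:23:56.370262"  # "timestamp" value from the sample alert
parsed = datetime.strptime(stamp, fmt)

print(fmt)
print(parsed.isoformat())
```

With a single % in the ini value, configparser would raise an interpolation error when the option is read, which is why the doubling is needed.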

Regex to find string between patterns not containing specific string

Ok gurus,
Let's say I have the following string:
{
"event" : "party" ,
"Id" : "store" ,
"timestamp" : "2019-07-07T13:14:26.329Z" ,
"localDateTime" : "2019-07-07T16:14" ,
"orderStateUpdate" : {
"id" : "fj09bA9ywfGS" ,
"orderId" : "2315043" ,
"visitId" : "2315043" ,
"items" :{{
"id" : "fj09bA6K3K8u" ,
"quantity" : 1 ,
"stat" : "ok"
},
{
"id" : "fj09bA6K3K8u2" ,
"quantity" : 2 ,
"stat" : "ok"
}}
,
"items" :{{
"id" : "fj09bA6K3K8u" ,
"quantity" : 1 ,
"stat" : "junk"
},
{
"id" : "fj09bA6K3K8u2" ,
"quantity" : 2 ,
"stat" : "ok"
}}
,
"extraParams" : {"extraparamstuff1":"bugger"},"somethingelse" :"blahblahblah"
}}
The string has two (nested arrays) wrapped by double curly braces. This string specifically contains an error where the LAST curly brace is ALSO double; what I am trying to capture with regex is the string that starts with '}}' , ends with '}}' and DOES NOT CONTAIN '{{' like so:
}}
,
"extraParams" : {"extraparamstuff1":"bugger"},"conversationLink" :"https://qa.app.package.ai/qa/#/app/dashboard?d=1561248000000&c=fdxkID9IifGv&p=fdxfaFgV1l1Y"
}}
I am Regex-challenged, but have come up with this:
(?:(\}\})).*(?:\{\{).*(?:\}\s*?\})
which captures
}}
,
"items" :{{
"id" : "fj09bA6K3K8u" ,
"quantity" : 1 ,
"itemState" : "LOADED"
},
{
"id" : "fj09bA6K3K8u2" ,
"quantity" : 2 ,
"itemState" : "LOADED2"
}}
,
"extraParams" : {"extraparamstuff1":"bugger"},"conversationLink" :"https://qa.app.package.ai/qa/#/app/dashboard?d=1561248000000&c=fdxkID9IifGv&p=fdxfaFgV1l1Y"
}}
which is too much. Can someone help me understand how to find this? This is for error-checking inbound data (and yes I need to check for extra opening '{{' as well).
Okay, so, I think you need a negative lookahead since you have to accept curly braces, but not doubles... this is what I've come up with, not sure if it will work in every case though.
}}([^{]|{(?!{))+}}
It basically says: look for two closing curlies (}}), then either any non-opening curly character ([^{]) OR a single opening curly character (using negative lookahead) ({(?!{)), repeat that as many times as needed (+), and finish with a double closing curly (}})
Link to live (updateable) demo: https://regex101.com/r/kwlzco/2
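The same idea can be tried offline with a minimal Python sketch. It assumes Python's re module and a shortened, made-up payload instead of the full string from the question. The group is written as non-capturing here, which does not change what is matched.

```python
import re

# "}}", then any run of characters that never forms "{{", then "}}".
TAIL = re.compile(r"\}\}(?:[^{]|\{(?!\{))+\}\}")

# Shortened stand-in for the string in the question.
payload = '''{ "items" :{{ "stat" : "ok" }}
,
"items" :{{ "stat" : "junk" }}
,
"extraParams" : {"k":"v"},"somethingelse" :"blah"
}}'''

match = TAIL.search(payload)
tail = match.group(0) if match else None
print(tail)
```

The first "}}" is rejected as a starting point because a "{{" appears before the next "}}", so the match begins at the second "}}" and runs to the final one. Single braces such as {"k":"v"} are still allowed inside the match.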

how to add special characters in mongo $regex

I want to look for "\r" in a string field I have in mongo, and I found this, which looks like it works well:
db.users.findOne({"username" : {$regex : ".*son.*"}});
The problem is that I want to look for "\r" and I can't find it, even though I know it's there. I just did:
db.users.findOne({"username" : {$regex : ".*\r.*"}});
and it doesn't work. How can I fix this?
example document:
{
"personId" : 1,
"personName" : "john",
"address" : {
"city" : "Rue Neuve 2\\r\\rue Pré-du-Mar \\r ché 1 1003 Lausanne",
"street" : "",
"zipCode" : "",
"streetNumber" : ""
}
}
so my query is:
db.users.findOne({"address.city" : {$regex : ".*\r.*"}});
also tried:
db.users.findOne({"address.city" : {$regex : ".*\\r.*"}});
Try:
db.users.findOne({"username" : {$regex : ".*\\r.*"}});
I think your issue is that you have your .* backwards at the end. You are looking for a "2." literal followed by any characters as opposed to what you have at the beginning, .*, saying anything before the literal that isn't a carriage return. Try to change this to
db.users.findOne({"username" : {$regex : ".*\\r*."}});
Which says give me "\r" with any non carriage return characters before the literal and any non carriage return characters after the literal.
I found that the way to do it is:
db.users.findOne({"username" : {$regex : ".*\\\\.*"}});
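Why the number of backslashes matters can be seen in a small Python sketch. It assumes the stored value really contains a backslash followed by the letter r (two characters), as the example document suggests, rather than a carriage return. Python's re stands in for the regex engine behind $regex here, and the variable names are made up.

```python
import re

# As stored: a literal backslash followed by "r", not a carriage return.
city = "Rue Neuve 2\\r\\rue Pré-du-Mar \\r ché 1 1003 Lausanne"

# Pattern is the carriage-return control character itself.
has_carriage_return = re.search("\r", city) is not None

# Pattern is an escaped backslash followed by "r".
has_backslash_r = re.search(r"\\r", city) is not None

# Pattern is an escaped backslash only; this is what ".*\\\\.*"
# written inside a JavaScript string literal boils down to.
has_any_backslash = re.search(r"\\", city) is not None

print(has_carriage_return, has_backslash_r, has_any_backslash)
```

So the last query above finds the document because it matches any value that contains a backslash, not specifically a backslash followed by r.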

Replace numbers one by one with regex

I have strings such as this:
"Query_string" : [ 1345.6423, 5656.5, 346.324, 880.0 ],
"Query_string" : [ 1345.6423, 5656.5, 346.324, 880.0 ],
"Query_string" : [ 1345.6423, 5656.5, 346.324, 880.0 ],
Random code 124253
String.....
I need to replace the digits on lines that start with "Query_string" with zeros, like so:
"Query_string" : [ 0000.0000, 0000.0, 000.000, 000.0 ],
But other stuff should stay in place, eg:
Random code 124253
I tried this:
(^\"Query\_string\"\s\:\s\[\s)|\d|(\s\]\,)
But it matches all digits, including "Random code 124253"
sed ': loop
s/\("Query_string".*\)[1-9]/\10/
t loop' YourFile
This sed expression replaces all non-zero digits on Query lines with 0:
sed '/^"Query_string"/{s/[1-9]/0/g}' input
Another version:
sed '/^"Query_string"/!b;s/[1-9]/0/g' input
Still another:
sed '/^"Query_string"/s/[1-9]/0/g' input