Parsing String into Custom Object with Powershell and regex

I have a string that I am trying to parse into an array of PSCustomObjects using subexpressions.
The string looks like this:
date=2021-09-13 time=20:05:25 devname="chwitrfg01" devid="FG10E0TB20903187" logid="0000000013" type="traffic" subtype="forward" level="notice" vd="root" eventtime=1631556325 srcip=192.168.10.226 srcname="192.168.10.226" srcport=54809 srcintf="port8" srcintfrole="dmz" dstip=8.8.4.4 dstname="dns.google" dstport=53 dstintf="wan1" dstintfrole="lan" poluuid="01533038-da7b-51eb-b854-8fd38a0deba3" sessionid=1472996904 proto=17 action="accept" policyid=278 policytype="policy" service="DNS" dstcountry="United States" srccountry="Reserved" trandisp="snat" transip=194.56.218.226 transport=54809 duration=180 sentbyte=245 rcvdbyte=144 sentpkt=2 rcvdpkt=1 shapersentname="default_class" shaperdropsentbyte=0 shaperrcvdname="default_class" shaperdroprcvdbyte=0 appcat="unscanned" dstdevtype="Unknown" dstdevcategory="None" masterdstmac="00:00:0c:07:ac:8d" dstmac="00:00:0c:07:ac:8d" dstserver=1
I tried something like the following, but I'm a total noob at regex and have no idea how to solve this. Is there an easy way to add each value to a property of the custom object?
$Pattern = @(
'(?<devname>\devname=w+)'
'(?<srcip>(srcip=?:[0-9]+\.){3}[0-9]+):(?<srcport>srcport=[0-9]+)'
'(?<dstip>(dstip=?:[0-9]+\.){3}[0-9]+):(?<dstport>dstport=[0-9]+)'
) -join '\s+'
$cmd |
ForEach-Object {
if ($_ -match $Pattern) {
$Matches.Remove(0)
[PsCustomObject]@{
srcip = $_.Groups['srcip'].Value
dstip = $_.Groups['dstip'].Value
dstport = $_.Groups['dstport'].Value
srcport = $_.Groups['srcport'].Value
fw = $_.Groups['devname'].Value
}
}
}| Select-Object -First 5
$cmd | Format-Table

The simplest way to do this that I know of is the ConvertFrom-StringData cmdlet. That cmdlet creates a hashtable of name/value pairs out of a set of name=value formatted lines. What you would do is put each pair on its own line to make a multi-line string, then create a new custom object and use that hashtable to define its properties.
$cmd -replace ' (\w+=)',"`n`$1"|
%{new-object psobject -prop (ConvertFrom-StringData $_)}
Or the shorter version in v3+ (thanks to @mklement0):
$cmd -replace ' (\w+=)',"`n`$1"|
%{[pscustomobject] (ConvertFrom-StringData $_)}
When I ran that against the string you provided I got back:
sessionid : 1472996904
action : "accept"
rcvdbyte : 144
vd : "root"
logid : "0000000013"
policyid : 278
duration : 180
proto : 17
dstname : "dns.google"
srcintf : "port8"
eventtime : 1631556325
appcat : "unscanned"
srcip : 192.168.10.226
dstip : 8.8.4.4
trandisp : "snat"
srcname : "192.168.10.226"
srcport : 54809
devid : "FG10E0TB20903187"
dstdevcategory : "None"
level : "notice"
sentbyte : 245
shaperdroprcvdbyte : 0
sentpkt : 2
masterdstmac : "00:00:0c:07:ac:8d"
shaperrcvdname : "default_class"
poluuid : "01533038-da7b-51eb-b854-8fd38a0deba3"
type : "traffic"
srcintfrole : "dmz"
subtype : "forward"
policytype : "policy"
dstport : 53
transip : 194.56.218.226
shapersentname : "default_class"
dstdevtype : "Unknown"
dstserver : 1
dstcountry : "United States"
dstintf : "wan1"
service : "DNS"
srccountry : "Reserved"
shaperdropsentbyte : 0
dstintfrole : "lan"
transport : 54809
date : 2021-09-13
rcvdpkt : 1
dstmac : "00:00:0c:07:ac:8d"
devname : "chwitrfg01"
time : 20:05:25
If you don't want the double quotes around the values, you could strip them out, for example with -replace '"'.
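For comparison, here is a minimal sketch of the same idea in Python rather than PowerShell: split the line into name=value pairs and drop the surrounding quotes along the way. The shortened sample line and the names `pair` and `record` are assumptions made for illustration; this is not the answer's own method.

```python
import re

# Shortened sample of the log line from the question (assumption: same format).
line = ('date=2021-09-13 time=20:05:25 devname="chwitrfg01" '
        'srcip=192.168.10.226 srcport=54809 dstip=8.8.4.4 dstport=53 '
        'dstcountry="United States"')

# A value is either a double-quoted string (which may contain spaces)
# or a bare token that runs up to the next whitespace.
pair = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')

record = {}
for m in pair.finditer(line):
    key, quoted, bare = m.groups()
    # Exactly one of the two value groups participates in each match.
    record[key] = quoted if quoted is not None else bare

print(record['devname'], record['srcip'], record['dstcountry'])
```

Matching the quoted form explicitly keeps values such as "United States" in one piece, which a plain split on spaces would not.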

Related

fail2ban-regex doesn't match snort logfile in alert_json format

I am trying to match a fail2ban regex against a Snort 3 log file in alert_json format.
Example alert_json output in the log file:
{ "timestamp" : "21/03/22-12:23:56.370262", "seconds" : 1616412236, "action" : "allow", "class" : "none", "b64_data" : "lVAAFpTzAXEAAAAAoAJyELUuAAACBAW0BAIICikv9agAAAAAAQMDBw==", "dir" : "C2S", "dst_addr" : "6.7.8.9", "dst_ap" : "6.7.8.9:0", "eth_dst" : "00:11:22:33:44:55", "eth_len" : 102, "eth_src" : "11:11:22:33:44:55", "eth_type" : "0x800", "gid" : 1, "icmp_code" : 3, "icmp_id" : 0, "icmp_seq" : 0, "icmp_type" : 3, "iface" : "eth0", "ip_id" : 5814, "ip_len" : 68, "msg" : "ICMP Traffic Detected", "mpls" : 0, "pkt_gen" : "raw", "pkt_len" : 88, "pkt_num" : 2270045, "priority" : 0, "proto" : "ICMP", "rev" : 0, "rule" : "1:10000001:0", "service" : "unknown", "sid" : 10000001, "src_addr" : "1.2.3.4", "src_ap" : "1.2.3.4:0", "tos" : 192, "ttl" : 64, "vlan" : 0 }
My fail2ban regex, which doesn't match:
^\{.*\"src_addr\"\ :\ \"<HOST>\".*\}$
I tried this on regexr.com and it matches.
I already found out that there may be some problem with the timestamp, but I haven't figured out what it is.
Can somebody help here?
Thanks
It probably depends on the fail2ban version. For example, recent fail2ban (>= 0.10.6/0.11.2) no longer requires a timestamp (it simulates "now"), so for me it shows the IP and the current time (the time at which I execute it):
$ fail2ban-regex -v /tmp/log '^\{.*\"src_addr\"\ :\ \"<HOST>\".*\}$'
...
Lines: 1 lines, 0 ignored, 1 matched, 0 missed
To specify your own datepattern, you have to set it in the filter (or supply it to fail2ban-regex with the -d parameter), so this will work:
# either for the timestamp tag:
$ fail2ban-regex -v -d '^\{\s*"timestamp"\s*:\s*"%y/%m/%d-%H:%M:%S\.%f"' /tmp/log '\"src_addr\"\ :\ \"<HOST>\"'
# or for POSIX seconds (probably better because no conversion is needed):
$ fail2ban-regex -v -d '"seconds"\s*:\s*{EPOCH}\s*,\s*' /tmp/log '\"src_addr\"\ :\ \"<HOST>\"'
Note that in fail2ban configs you must escape every % as %% because of the substitution rules of Python's INI-style configs.
Also note that fail2ban cuts the part of the message that matches the date pattern out before it applies prefregex or failregex.
Also note that your RE is a bit vulnerable; see https://github.com/fail2ban/fail2ban/issues/2932#issuecomment-777320874 for a better example.
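The effect of cutting the date match out of the message can be illustrated with plain Python regexes. This is a simplified emulation, not fail2ban's actual code: `HOST` is a hypothetical IPv4-only stand-in for the <HOST> tag, `date_re` is a hand-written approximation of the timestamp datepattern, and the log line is shortened.

```python
import re

# Shortened version of the alert_json line from the question.
line = ('{ "timestamp" : "21/03/22-12:23:56.370262", "seconds" : 1616412236, '
        '"action" : "allow", "src_addr" : "1.2.3.4", "src_ap" : "1.2.3.4:0" }')

# Hypothetical stand-in for fail2ban's <HOST> tag (IPv4 only, simplified).
HOST = r'(?P<host>\d{1,3}(?:\.\d{1,3}){3})'

# Rough regex equivalent of ^\{\s*"timestamp"\s*:\s*"%y/%m/%d-%H:%M:%S\.%f"
date_re = re.compile(
    r'^\{\s*"timestamp"\s*:\s*"\d{2}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d+"')

# Emulate the cut: remove the part of the message that matched the date.
rest = date_re.sub('', line, count=1)

anchored = re.compile(r'^\{.*"src_addr" : "' + HOST + r'".*\}$')
unanchored = re.compile(r'"src_addr" : "' + HOST + r'"')

anchored_on_full = anchored.search(line)      # matches the untouched line
anchored_on_rest = anchored.search(rest)      # fails: the leading { is gone
unanchored_on_rest = unanchored.search(rest)  # still finds the address

print(anchored_on_full is not None, anchored_on_rest is None,
      unanchored_on_rest.group('host'))
```

Under these assumptions, a failregex anchored on the opening brace stops matching once the date portion (which includes that brace) has been removed, while the unanchored expression keeps working.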

Powershell regex group regex matches but doesn't have my group. What's missing?

I have some code that I am porting from a jenkins script and I need it as a shell command. So I know the regex works - What's blowing my mind is how it can match but then not have my capture group. What I need is just the root level directory names as such:
foo
baz
How can it "match" but then not have my group? BTW: If there is a simpler way to achieve this, I am all ears.
PS E:\SysData\Jenkins\workspace\chb0_chb0mb_example> git diff --name-only origin/master feature/foo | %{ Resolve-Path -Relative $_ } | sls '.\\.*\\.*' | sls '\\.\\(.+?)\\.*|.*' | %{$_.matches}
Groups : {0}
Success : True
Name : 0
Captures : {0}
Index : 0
Length : 48
Value : .\foo\Nuget\deleteme.txt
Groups : {0}
Success : True
Name : 0
Captures : {0}
Index : 0
Length : 55
Value : .\baz\QC_OH_DARKESol\deleteme.txt
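Assuming I have the question right: for one thing, a literal period should be escaped with a backslash (\.), although the pattern happens to work without it, because an unescaped . matches any character, including a period. The input has no backslash at the beginning, so I dropped the leading one from the pattern. Not everyone has the git command, so I'm using a plain string as input. This pattern could be shorter, but it works. I'm also expanding the groups property, which you didn't show.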
'.\foo\Nuget\deleteme.txt' | sls '.\\(.+?)\\.*|.*' | % matches | % groups
Groups : {0, 1}
Success : True
Name : 0
Captures : {0}
Index : 0
Length : 6
Value : .\foo\
Success : True
Name : 1
Captures : {1}
Index : 2
Length : 3
Value : foo
Piping the object from the first sls to the second sls messes something up with the group capture. It seems like a bug, which I have submitted as "piping select-string to itself and the strange effect on matches". The Value property isn't even right here:
'abc' | select-string a | select-string '(b)' | % matches | % groups
Groups : {0}
Success : True
Name : 0
Captures : {0}
Index : 0
Length : 1
Value : a # should be b
Compare with sending a plain string to the second select-string, which gives the right output:
'abc' | select-string a | % line | select-string '(b)' | % matches | % groups
Groups : {0, 1}
Success : True
Name : 0
Captures : {0}
Index : 1
Length : 1
Value : b
Success : True
Name : 1
Captures : {1}
Index : 1
Length : 1
Value : b
js2010's helpful answer points out a potential problem with your approach (.\\ should be \.\\), succinctly demonstrates the unexplained behavior you've experienced (for which they've created a GitHub issue), and suggests a workaround (inserting | % Line).
To solve your problem more directly:
# Inputs are sample paths.
'.\foo\Nuget\deleteme.txt',
'.\bar\QC_OH_DARKESol\deleteme.txt' |
foreach { if ($_ -match '^\.\\([^\\]+)') { $Matches[1] } }
The above yields the following strings:
foo
bar
That is, it extracts the first path component following the literal .\ from each input path. foreach (ForEach-Object) applies -match, the regular-expression matching operator, to each input string, and the results of a successful match are reflected in the automatic $Matches variable. $Matches is a hash table whose 0 entry is the overall match, with entry 1 containing the 1st capture group's value, 2 the 2nd's, and so on; named capture groups (e.g., (?<root>...)), if present, have entries by their name (e.g., root).
An alternative is to use the switch statement with the -Regex option:
switch -Regex (
git diff --name-only origin/master feature/foo | Resolve-Path -Relative
) {
'^\.\\([^\\]+)' { $Matches[1] }
}
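The same extraction, sketched in Python for readers who want to check the pattern outside PowerShell. The sample paths are the ones used above; `root_re` and `roots` are illustrative names, and the named group `root` mirrors the (?<root>...) example mentioned earlier.

```python
import re

# Sample relative paths, as in the answer above.
paths = [r'.\foo\Nuget\deleteme.txt', r'.\bar\QC_OH_DARKESol\deleteme.txt']

# Literal ".\" at the start, then capture everything up to the next backslash.
root_re = re.compile(r'^\.\\(?P<root>[^\\]+)')

roots = []
for p in paths:
    m = root_re.match(p)
    if m:
        # Group 1 and the named group 'root' refer to the same capture.
        roots.append(m.group(1))

print(roots)
```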

Trying to extract only the id3 size using exiftool.

Does anyone know of a way to extract only the ID3 size information using exiftool? The documentation is extensive, but I'm not seeing how to extract only the ID3 info.
======== Test_file.mp3
ExifToolVersion : 10.78
FileName : First-time Bosses.mp3
Directory : .
FileSize : 33 MB
FileModifyDate : 2018:02:08 16:35:02-06:00
FileAccessDate : 2018:02:09 13:40:59-06:00
FileInodeChangeDate : 2018:02:09 13:36:59-06:00
FilePermissions : rw-rw-r--
FileType : MP3
FileTypeExtension : mp3
MIMEType : audio/mpeg
MPEGAudioVersion : 1
AudioLayer : 3
AudioBitrate : 128 kbps
SampleRate : 44100
ChannelMode : Joint Stereo
MSStereo : Off
IntensityStereo : Off
CopyrightFlag : False
OriginalMedia : False
Emphasis : None
ID3Size : 222797
EncodedBy : iTunes 9.1
Title : Test_file
Artist : Test_file
Album : Test_file
Genre : Test_file
PictureFormat : PNG
PictureType : Other
PictureDescription :
Picture : (Binary data 212414 bytes, use -b option to extract)
Duration : 0:36:14 (approx)
I would use grep to find the line, and then awk to keep only the last word:
grep ID3Size Test_file.mp3 | awk 'NF>1{print $NF}'
If that listing is generated by some command foo, you can pipe its output directly instead of naming a source file:
foo | grep ID3Size | awk 'NF>1{print $NF}'
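If the listing is already available as text, the grep-and-awk step can also be sketched in Python. The helper `tag_value` and the abbreviated `listing` are assumptions made for illustration; the sketch only parses "Name : Value" lines and does not call exiftool.

```python
# Abbreviated copy of the listing from the question.
listing = """\
ExifToolVersion : 10.78
FileType : MP3
ID3Size : 222797
EncodedBy : iTunes 9.1
Duration : 0:36:14 (approx)
"""

def tag_value(text, tag):
    """Return the value of the first 'tag : value' line, or None if absent."""
    for raw in text.splitlines():
        # Split on the first colon only, so values containing ':' stay intact.
        name, sep, value = raw.partition(':')
        if sep and name.strip() == tag:
            return value.strip()
    return None

id3_size = tag_value(listing, 'ID3Size')
print(id3_size)
```

Comparing the tag name exactly, instead of searching for a substring, avoids picking up other lines that merely contain the text ID3Size.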

Powershell Regex Query

I'm having some issues pulling the desired values from my source string.
I have 2 possible string formats (that I'm differentiating based on a -match operation and if check):
{u'specialGroup': u'projectWriters', u'role': u'WRITER'
or
, {u'role': u'WRITER', u'userByEmail': u'john@domain.com'
What I desire to return from the regex:
[0]projectWriters
[1]WRITER
and
[0]WRITER
[1]john@domain.com
Basically I need to return all values between start string : u' and end string ' as array values [0] and [1] but cannot figure out the regex pattern.
Trying:
[regex]::match($stuff[1], ": u'([^']+)'").groups
Groups : {: u'WRITER', WRITER}
Success : True
Captures : {: u'WRITER'}
Index : 10
Length : 11
Value : : u'WRITER'
Success : True
Captures : {WRITER}
Index : 14
Length : 6
Value : WRITER
But there is no sign of the john@domain.com value.
A pragmatic approach, assuming that all strings have the same field structure:
$strings = "{u'specialGroup': u'projectWriters', u'role': u'WRITER'}",
", {u'role': u'WRITER', u'userByEmail': u'john@domain.com'"
$strings | ForEach-Object { ($_ -split 'u''|''' -notmatch '[{}:,]')[1,3] }
yields:
projectWriters
WRITER
WRITER
john@domain.com
As for what you tried:
[regex]::match() only ever returns one match, so you need to base your solution on [regex]::matches() - plural! - which returns all matches, and then extract the capture-group values of interest.
$strings | ForEach-Object { [regex]::matches($_, ": u'([^']+)'").Groups[1,3].Value }
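The difference between a single match and all matches can be sketched in Python as well, where re.search plays the role of [regex]::match() and re.findall that of [regex]::matches(). The sample strings are modeled on the ones in the question; the variable names are illustrative.

```python
import re

# Sample strings modeled on the two formats from the question.
strings = [
    "{u'specialGroup': u'projectWriters', u'role': u'WRITER'}",
    ", {u'role': u'WRITER', u'userByEmail': u'john@domain.com'",
]

# Capture whatever sits between ": u'" and the closing quote.
value_re = re.compile(r": u'([^']+)'")

# search() stops at the first match, like [regex]::match().
first_only = [value_re.search(s).group(1) for s in strings]

# findall() returns the capture group of every match, like [regex]::matches().
all_values = [value_re.findall(s) for s in strings]

print(first_only)
print(all_values)
```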

MongoDB case insensitive query on text with parenthesis

I have a very annoying problem with a case insensitive query on mongodb.
I'm using MongoTemplate in a web application and I need to execute case insensitive queries on a collection.
with this code
Query q = new Query();
q.addCriteria(Criteria.where("myField")
.regex(Pattern.compile(fieldValue, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)));
return mongoTemplate.findOne(q,MyClass.class);
I create the following query
{ "myField" : { "$regex" : "field value" , "$options" : "iu"}}
that works perfectly when I have simple text, for example:
caPITella CapitatA
But when the text contains parentheses (), the query doesn't work.
It doesn't work at all, even when the query text is written exactly as it appears in the document. Example:
query 1:
{"myField" : "Ceratonereis (Composetia) costae" } -> 1 result (ok)
query 2:
{ "myField" : {
"$regex" : "Ceratonereis (Composetia) costae" ,
"$options" : "iu"
}} -> no results (not ok)
query 3:
{ "scientificName" : {
"$regex" : "ceratonereis (composetia) costae" ,
"$options" : "iu"
}} -> no results (....)
So, am I doing something wrong? Did I forget some Pattern.SOME flag to include in Pattern.compile()? Is there any solution?
Thanks
------ UPDATE ------
The answer from user3561036 helped me figure out how the query must be built.
I resolved it by changing the query construction to
q.addCriteria(Criteria.where("myField")
.regex(Pattern.compile(Pattern.quote(myFieldValue), Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)));
The output query
{ "myField" : { "$regex" : "\\Qhaliclona (rhizoniera) sarai\\E" , "$options" : "iu"}}
works.
If using the $regex operator with a "string" as input then you must quote literals for reserved characters such as ().
Normally that's a single \, but since it's in a string already you do it twice \\:
{ "myField" : {
"$regex" : "Ceratonereis \\(Composetia\\) costae" ,
"$options" : "iu"
}}
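Why the parentheses matter can be seen with any regex engine. The sketch below uses Python's re module purely as an illustration (it does not talk to MongoDB): unescaped parentheses form a capture group instead of matching literal characters, while re.escape, roughly Python's counterpart to Java's Pattern.quote, makes the whole input literal. The variable names are assumptions.

```python
import re

stored = "Ceratonereis (Composetia) costae"      # value as stored in the document
user_input = "ceratonereis (composetia) COSTAE"  # same text, different case

# Unescaped: "(composetia)" is a group, so the pattern expects the text
# "ceratonereis composetia costae" without literal parentheses.
unescaped = re.search(user_input, stored, re.IGNORECASE)

# Escaped: every metacharacter in the input is treated literally.
escaped = re.search(re.escape(user_input), stored, re.IGNORECASE)

print(unescaped is None, escaped.group(0))
```

Whether the exact string produced by re.escape is suitable for a server-side $regex is not verified here; the point is only that the input has to be quoted before it is used as a pattern.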
It's an old question, but you can escape the regex metacharacters in the input with query.replace(/[-[\]{}()*+?.,\\/^$|#\s]/g, "\\$&");
This works with aggregate and $match:
const order = user_input.replace(/[-[\]{}()*+?.,\\/^$|#\s]/g, "\\$&");
const regex = new RegExp(order, 'i');
const query = await this.databaseModel.aggregate([
{
$match: {
name : regex
}
// ....
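Use $strcasecmp.
The aggregation framework was introduced in MongoDB 2.2. You can use its string operator "$strcasecmp" to make a case-insensitive comparison between strings; it returns 0 when the two strings are equal ignoring case.
Because it compares the strings literally, parentheses and other regex metacharacters need no escaping, which makes it easier than using a regex.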