Reg Exp not matching

I have a regular expression that does not find a match in text that is in my file.
Reg Ex:
^[ \t]*#[ \t]+vtk[ \t]+DataFile[ \t]+Version[ \t]+([^\s]+)[ \t]*\n(.*)\n[ \t]*(ASCII|BINARY)[ \t]*\n[ \t]*DATASET[ \t]+([^ ]+)[ \t]*\n
File text:
# vtk DataFile Version 4.2
ASCII
DATASET
When I cut off the expression to the following it works:
^[ \t]*#[ \t]+vtk[ \t]+DataFile[ \t]+Version[ \t]+([^\s]+)[ \t]*\n(.*)\n[ \t]*
Why is the text not being matched?

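I think (.*)\n is one line too many: in your file the ASCII line comes directly after the version line, so (.*)\n consumes ASCII and the rest of the pattern fails. Also, after DATASET there is no more data to match, but your pattern still requires [ \t]+([^ ]+)[ \t]*\n, none of which is optional.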
Try it like this:
^[ \t]*#[ \t]+vtk[ \t]+DataFile[ \t]+Version[ \t]+([^\s]+)[ \t]*\n[ \t]*(ASCII|BINARY)[ \t]*\n[ \t]*DATASET
Broken into parts, the pattern looks like this:
^
[ \t]*#
[ \t]+vtk
[ \t]+DataFile
[ \t]+Version
[ \t]+([^\s]+) This group will match the 4.2
[ \t]*\n
[ \t]*(ASCII|BINARY)
[ \t]*\n
[ \t]*DATASET
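To see both failures side by side, here is a small check using Python's re module (an assumption: the question does not say which engine is used). In the legacy VTK format the second line is a free-text title and DATASET is followed by a type name, so the original pattern does match a complete header; the sample file simply lacks those two parts. The title and POLYDATA type in the comparison header are made up.

```python
import re

# Sample from the question: no title line, and nothing after DATASET.
text = "# vtk DataFile Version 4.2\nASCII\nDATASET\n"

# A complete legacy header for comparison (title and dataset type are made up).
full = "# vtk DataFile Version 4.2\nmy title\nASCII\nDATASET POLYDATA\n"

original = re.compile(
    r"^[ \t]*#[ \t]+vtk[ \t]+DataFile[ \t]+Version[ \t]+([^\s]+)[ \t]*\n(.*)\n"
    r"[ \t]*(ASCII|BINARY)[ \t]*\n[ \t]*DATASET[ \t]+([^ ]+)[ \t]*\n"
)
suggested = re.compile(
    r"^[ \t]*#[ \t]+vtk[ \t]+DataFile[ \t]+Version[ \t]+([^\s]+)[ \t]*\n"
    r"[ \t]*(ASCII|BINARY)[ \t]*\n[ \t]*DATASET"
)

# (.*)\n eats the ASCII line, so (ASCII|BINARY) is then tried against "DATASET".
print(original.search(text))             # None
print(suggested.search(text).groups())   # ('4.2', 'ASCII')
# With a title line and a dataset type present, the original pattern matches.
print(original.search(full).group(4))    # POLYDATA
```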

Related

How to match all characters between [ ] except ", "

Hello, I am trying to extract each group of data (the items are separated by ", ") from a string like this:
MyString=[XXXXXX:XX XX XX XX, XXXXX:332.83, XXXXX:XXX-XX-XX XX:XX:XX, XXXX:0.0, XXXX:2, XXXX:0, XXXX:-256, counter_tipeee:5, XXXX:136935, XXXX:0, XXXX:XX XXX XXX, XXXX:0.5, XXXXX:true, XXXX:0.509375, XXX:0.0, XXXX:[2022-06-14 06:45:00], 2022-09-17 XXXXX:1]
With this regex, I can match all characters except ,
([^,]*)
https://regex101.com/r/lCN2YK/1
But I want to split only on ", " (a comma followed by a space).
The problem is that if I also exclude spaces with \s, it breaks up the data items that contain spaces. I want to extract all data that is not exactly comma+space.
Another problem with my regex is that it does not exclude the first [ and the last ] of my string. I can't exclude all [ ] characters because some data items contain [ ].
I found this regex to exclude the first and last characters, ^.(.*).$, but I don't know how to combine my two regexes.
https://regex101.com/r/CAsKHE/1
The output that I expect is:
List<String> My_goal= [
XXXXXX:XX XX XX XX
XXXXX:332.83
XXXXX:XXX-XX-XX XX:XX:XX
XXXX:0.0, XXXX:2
....
2022-09-17,XXXXX:1
]
Try this:
(?<=(?<!: *)\[).*?(?=,)|(?<=, *(?=[^ \r\n]))(?:.*?(?=,)|[^,\r\n\[\]]+?(?=\])|[^,\r\n]+\](?= *\]))
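The lookbehinds above are variable-length ((?<!: *) and (?<=, *...)), which engines such as .NET and JavaScript accept but Python's re rejects. If that is a problem, a simpler route (a different technique from the answer, shown only as a sketch) is to take everything between the first [ and the last ], then split on the exact separator ", ". It assumes the value has one outer pair of brackets, and it will also split on any ", " that occurs inside an inner [ ] pair. The sample string is a shortened version of the one in the question.

```python
import re


def split_items(s: str) -> list[str]:
    """Split the bracketed list in s on ", " (comma followed by a space)."""
    # Greedy .* runs from the first "[" to the LAST "]", so inner brackets such as
    # "[2022-06-14 06:45:00]" stay inside the captured text. Assumes one outer pair.
    m = re.search(r"\[(.*)\]", s, flags=re.DOTALL)
    inner = m.group(1) if m else s
    # Only the exact two-character separator ", " splits; lone commas or spaces stay.
    return re.split(r", ", inner)


sample = ("MyString=[XXXXXX:XX XX XX XX, XXXXX:332.83, "
          "XXXX:[2022-06-14 06:45:00], 2022-09-17 XXXXX:1]")
for item in split_items(sample):
    print(item)
```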

Using Regex to delete contents between repeating brackets

I'm trying to remove the contents between brackets that contain a certain modifier ('DeleteMe'), while keeping the contents between brackets that contain other words ('DontDeleteMe').
I thought it was simple, but it proved difficult because of the repeating brackets; see below.
[
aljdsfjfldsa DeleteMe aldsjflajdf
]
[
aldskjfal DontDeleteMe asdlkjflasdj
]
[
aljdsfjfldsa DeleteMe aldsjflajdf
]
[
aldskjfal DontDeleteMe asdlkjflasdj
]
Desired output:
[
aldskjfal DontDeleteMe asdlkjflasdj
]
[
aldskjfal DontDeleteMe asdlkjflasdj
]
I tried the following, but the problem is that the second line gets deleted together with the third line.
(?s)\[.*?'DeleteMe'.*?\]
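You can use a word boundary in combination with a negated character class [^][]*, which matches any character except [ and ], so a match cannot run into the next pair of brackets: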
\[[^][]*\bDontDeleteMe\b[^][]*\]
If the word is DeleteMe, you can match it the same way using word boundaries and replace the match with an empty string:
\[[^][]*\bDeleteMe\b[^][]*\]
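A quick check of the DeleteMe version with Python's re.sub (Python is an assumption; the question does not name the tool). Here [^][] is written [^\[\]], which means the same thing; the trailing \n? is an addition to the answer's pattern so that each removed block takes its line break with it and the result matches the desired layout.

```python
import re

text = (
    "[\naljdsfjfldsa DeleteMe aldsjflajdf\n]\n"
    "[\naldskjfal DontDeleteMe asdlkjflasdj\n]\n"
    "[\naljdsfjfldsa DeleteMe aldsjflajdf\n]\n"
    "[\naldskjfal DontDeleteMe asdlkjflasdj\n]\n"
)

# [^\[\]]* cannot cross a bracket, so each match stays inside one [ ... ] block.
# \bDeleteMe\b does not fire inside "DontDeleteMe": "t" and "D" are both word chars.
# The trailing \n? (not in the answer) also removes the line break after "]".
DELETE_BLOCK = re.compile(r"\[[^\[\]]*\bDeleteMe\b[^\[\]]*\]\n?")

result = DELETE_BLOCK.sub("", text)
print(result)
```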

grok parsing issue

I have an input line that looks like this:
localhost_9999.kafka.server:type=SessionExpireListener,name=ZooKeeperSyncConnectsPerSec.OneMinuteRate
and I can use this pattern to parse it:
%{DATA:kafka_node}:type=%{DATA:kafka_metric_type},name=%{JAVACLASS:kafka_metric_name}
which gives me this:
{
"kafka_node": [
[
"localhost_9999.kafka.server"
]
],
"kafka_metric_type": [
[
"SessionExpireListener"
]
],
"kafka_metric_name": [
[
"ZooKeeperSyncConnectsPerSec.OneMinuteRate"
]
]
}
I want to split the OneMinuteRate into a separate field but can't seem to get it to work. I've tried this:
%{DATA:kafka_node}:type=%{DATA:kafka_metric_type},name=%{WORD:kafka_metric_name}.%{WORD:attr_type}"
but then I get nothing back.
I'm also using https://grokdebug.herokuapp.com/ to test these out...
You can either use your last pattern with an escaped . (note that . matches any char except a newline, while \. matches a literal dot), or use the DATA type for the second-to-last field and GREEDYDATA for the last field:
%{DATA:kafka_node}:type=%{DATA:kafka_metric_type},name=%{DATA:kafka_metric_name}\.%{GREEDYDATA:attr_type}
Since %{DATA:name} translates to (?<name>.*?) and %{GREEDYDATA:name} translates to (?<name>.*), the kafka_metric_name part will match any chars, zero or more times, as few as possible, up to the first ., and the attr_type .* pattern will greedily consume the rest of the line.
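One way to check this outside the grok debugger is to write the expansion out by hand as an ordinary regex. The sketch below assumes the DATA and GREEDYDATA expansions quoted above and uses Python's re, which spells named groups (?P<name>...); it is not how Logstash itself compiles grok patterns.

```python
import re

# Hand expansion of the grok pattern, assuming %{DATA:x} -> (?P<x>.*?)
# and %{GREEDYDATA:x} -> (?P<x>.*).
pattern = re.compile(
    r"(?P<kafka_node>.*?):type=(?P<kafka_metric_type>.*?),"
    r"name=(?P<kafka_metric_name>.*?)\.(?P<attr_type>.*)"
)

line = ("localhost_9999.kafka.server:type=SessionExpireListener,"
        "name=ZooKeeperSyncConnectsPerSec.OneMinuteRate")
fields = pattern.match(line).groupdict()
print(fields)
```

The dots in localhost_9999.kafka.server do not interfere, because the escaped \. only comes into play after name=, and the lazy .*? stops at the first dot it meets there.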

Issue on parsing logs using regex

I have tried separating the wowza logs using regex for data analysis, but I couldn't separate the section below.
I need a SINGLE regex pattern that would satisfy below both log formats.
Format 1:
live wowz://test1.example.com:443/live/_definst_/demo01|wowz://test2.example.com:443/live/_definst_/demo01 test
Format 2:
live demo01 test
I am trying to split the line into its 3 parameters and capture them in the groups app, streamname and id, but streamname should only capture the text after the last /.
This is what I've tried:
(?<stream_name>[^/]+)$ --> Using this pattern I could only separate the "wowz" section of Format 1, not the entire Format 1 line shown above.
Expected output:
{
"app": [
[
"live"
]
],
"streamname": [
[
"demo01"
]
],
"id": [
[
"test"
]
]
}
You can achieve what you specified using the following regex:
^(?<app>\S+) (?:\S*/)?(?<streamname>\S+) (?<id>\S+)$
\S+ matches one or more non-whitespace characters.
(?:\S*/)? optionally consumes the characters of the second parameter up to and including the last /. Because it is a non-capturing group, that part is not captured.
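The same pattern in Python's re, as a sketch run against both log formats from the question; the only change is the named-group spelling (?P<name>...), which Python's re uses instead of (?<name>...).

```python
import re

# Same pattern as above, with Python's (?P<name>...) group syntax.
WOWZA_LINE = re.compile(r"^(?P<app>\S+) (?:\S*/)?(?P<streamname>\S+) (?P<id>\S+)$")

format1 = ("live wowz://test1.example.com:443/live/_definst_/demo01"
           "|wowz://test2.example.com:443/live/_definst_/demo01 test")
format2 = "live demo01 test"

for line in (format1, format2):
    # Greedy \S* backtracks to the LAST "/" in the second field, if it has one;
    # with no "/" at all, the optional group is simply skipped.
    print(WOWZA_LINE.match(line).groupdict())
```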

Floor all numbers in an array using regular expression

I have a 53MB file containing a lot of arrays in the form of:
[ 730762.36433458142, 7043260.1900061285 ]
There are always two numbers in the array, and I want to use a regular expression replace in Notepad++ so that the array becomes:
[ 730762, 7043260 ]
In other words, strip the fractional part (the dot and the digits after it) from each number.
The problem is that there are also decimal numbers outside these arrays, which should stay intact.
Does anyone know which regex I can use?
EDIT:
The solution that @npinti provides makes my editor crash. I used to use an expression with lookbehind and lookahead that finds only the dot and the digits after it and replaces them with an empty string, but I can't find it anymore.
Maybe the provided solutions use too much memory? I don't know.
You could use something like so: \[\s+(\d+)\.\d+,\s+(\d+)\.\d+\s+] and replace it with [ \1, \2 ].
Given this:
[ 730762.36433458142, 7043260.1900061285 ]
[ 123.36433458142, 456.1900061285 ]
[ 456.36433458142, 789.1900061285 ]
123.123,123.456
456.789,456.1010
[ 789.36433458142, 987.1900061285 ]
[ 987.36433458142, 654.1900061285 ]
Yields
[ 730762, 7043260 ]
[ 123, 456 ]
[ 456, 789 ]
123.123,123.456
456.789,456.1010
[ 789, 987 ]
[ 987, 654 ]
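The capture-group replacement can be checked outside Notepad++ with Python's re.sub, where \1 and \2 in the replacement refer to the same groups. This sketch runs the pattern over the sample above; it says nothing about memory use in Notepad++.

```python
import re

text = """[ 730762.36433458142, 7043260.1900061285 ]
[ 123.36433458142, 456.1900061285 ]
[ 456.36433458142, 789.1900061285 ]
123.123,123.456
456.789,456.1010
[ 789.36433458142, 987.1900061285 ]
[ 987.36433458142, 654.1900061285 ]
"""

# Only "[ a.b, c.d ]" shapes match; bare "x.y,z.w" pairs have no "[" and are left alone.
FLOOR_PAIR = re.compile(r"\[\s+(\d+)\.\d+,\s+(\d+)\.\d+\s+]")
result = FLOOR_PAIR.sub(r"[ \1, \2 ]", text)
print(result)
```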
You can use the following to match:
\[( \d+)\.\d+,( \d+)\.\d+ \]
And replace with [\1,\2 ]
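For the lookaround style mentioned in the EDIT, one possible variant (a sketch of my own, not necessarily the expression the asker lost) matches only the dot and the fractional digits and replaces them with nothing. It assumes the exact "[ a.b, c.d ]" spacing of the sample: a fraction is removed only when it is followed by " ]" or by ", <number> ]". The fixed-width lookbehind (?<=\d) is accepted by Python's re; it has not been tried in Notepad++ here.

```python
import re

text = """[ 730762.36433458142, 7043260.1900061285 ]
[ 123.36433458142, 456.1900061285 ]
[ 456.36433458142, 789.1900061285 ]
123.123,123.456
456.789,456.1010
[ 789.36433458142, 987.1900061285 ]
[ 987.36433458142, 654.1900061285 ]
"""

# Match ".digits" right after a digit, but only if what follows is either
# " ]" (second number) or ", <int>.<frac> ]" (first number of a pair).
# Decimals outside the arrays are never followed by " ]", so they survive.
FRACTION_IN_PAIR = re.compile(r"(?<=\d)\.\d+(?=(?:, \d+\.\d+)? \])")
result = FRACTION_IN_PAIR.sub("", text)
print(result)
```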