How to match all characters between [ ] except ", " - regex

Hello, I am trying to extract each group of data (the items are separated by ", ") from a string like this:
MyString=[XXXXXX:XX XX XX XX, XXXXX:332.83, XXXXX:XXX-XX-XX XX:XX:XX, XXXX:0.0, XXXX:2, XXXX:0, XXXX:-256, counter_tipeee:5, XXXX:136935, XXXX:0, XXXX:XX XXX XXX, XXXX:0.5, XXXXX:true, XXXX:0.509375, XXX:0.0, XXXX:[2022-06-14 06:45:00], 2022-09-17 XXXXX:1]
With this regex, I can match all characters except a comma:
([^,]*)
https://regex101.com/r/lCN2YK/1
But what I want to exclude is the separator ", " (comma followed by a space), not every comma.
The problem is that if I remove spaces with \s, it also removes the spaces inside some of the values in my string. I want to extract everything that is not exactly comma+space.
Another problem with my regex is that it does not exclude the first [ and the last ] of my string. I can't exclude all [ ] because some values contain [ ].
I found this regex to exclude the first and last character, ^.(.*).$, but I don't know how to combine the two regexes.
https://regex101.com/r/CAsKHE/1
The output that I expect is
List<String> My_goal= [
XXXXXX:XX XX XX XX
XXXXX:332.83
XXXXX:XXX-XX-XX XX:XX:XX
XXXX:0.0, XXXX:2
....
2022-09-17,XXXXX:1
]

Try this:
(?<=(?<!: *)\[).*?(?=,)|(?<=, *(?=[^ \r\n]))(?:.*?(?=,)|[^,\r\n\[\]]+?(?=\])|[^,\r\n]+\](?= *\]))
See regex demo.
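For comparison, the same result can be sketched without one large pattern by first dropping the outer brackets and then splitting on the separator. This is only an illustration in Python, not part of the answer above: the function name split_items and the shortened sample string are assumptions, and it presumes that items are always separated by a comma followed by at least one space.

```python
import re
from typing import List


def split_items(text: str) -> List[str]:
    """Drop the outer [ ] and split on a comma followed by spaces."""
    # Greedy .* runs from the first "[" to the last "]" in the text,
    # so inner brackets such as [2022-06-14 06:45:00] are kept.
    match = re.search(r"\[(.*)\]", text, flags=re.DOTALL)
    if match is None:
        return []
    # Split only where a comma is followed by one or more spaces.
    return re.split(r", +", match.group(1))


# Shortened version of the string from the question (assumption).
sample = (
    "MyString=[XXXXXX:XX XX XX XX, XXXXX:332.83, "
    "XXXX:[2022-06-14 06:45:00], 2022-09-17 XXXXX:1]"
)
items = split_items(sample)
for item in items:
    print(item)
```

Splitting keeps the pattern short, at the cost of needing two steps instead of one match.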

Related

How to exclude a regex match when I have comma without space after

I have a regex that works in 99% of my cases, but one case does not work.
My input looks like this:
MyString=[XXXXXX:XX XX XX XX, XXXXX:332.83, XXXXX:XXX-XX-XX XX:XX:XX, XXXX:0.0, XXXX:2, XXXX:0, XXXX:-256, XXXXX:5, XXXX:136935, XXXX:0, XXXX:XX XXX XXX, XXXX:0.5, XXXXX:true, XXXX:0.509375, XXX:0.0, [XXXX:2022-06-14 06:45:00], 2022-09-17,XXXXX:1]
This regex matches every key:value item between the first [ and the last ]:
(?<=(?<!: *)\[).*?(?=,)|(?<=, *(?=[^ \r\n]))(?:.*?(?=,)|[^,\r\n\[\]]+?(?=\])|[^,\r\n]+\](?= *\]))
It splits the key:value items wherever a comma is found, but some of my data contains a comma with no space after it. For example, the last item of my example is split at its comma. I want to prevent that split so that 2022-09-17,XXXXX:1 is matched as a whole.
I am looking for a regex that only treats ", " (comma plus space) as a separator, and not a bare ",".
Here is the example showing the split of the last item that I want to prevent:
https://regex101.com/r/8fd7Xv/1
You can require a space, or 1 or more spaces, after the comma at each position where the pattern asserts a comma.
(?<=(?<!: *)\[).*?(?=, )|(?<=, +(?=[^ \r\n]))(?:.*?(?=, )|[^,\r\n\[\]]+?(?=\])|[^,\r\n]+\](?= *\]))
See the updated pattern https://regex101.com/r/Dlx8Xi/1
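The idea of only accepting a comma as a separator when a space follows it can also be tried in Python. Python's built-in re module rejects variable-width lookbehinds such as (?<=, +...), so the sketch below is a different formulation of the same idea, not a port of the pattern above. The names ITEM and match_items are hypothetical, and it assumes the outer brackets were already removed and that each separator is a comma plus exactly one space.

```python
import re
from typing import List

# An item starts at the beginning of the string or right after ", ".
# Inside an item, a comma is allowed only when no space follows it.
ITEM = re.compile(r"(?:^|(?<=, ))(?:[^,]|,(?! ))+")


def match_items(inner: str) -> List[str]:
    """Return every item, keeping commas that have no space after them."""
    return ITEM.findall(inner)


# Tail of the string from the question, without the outer brackets.
inner = "XXXX:0.0, [XXXX:2022-06-14 06:45:00], 2022-09-17,XXXXX:1"
items = match_items(inner)
for item in items:
    print(item)
```

The negative lookahead ,(?! ) is what keeps 2022-09-17,XXXXX:1 together as one item.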

Change type of enclosure brackets with special conditions

What I'm trying to achieve is changing square brackets [] to curly/brace brackets {}.
There are two conditions, some start with [", the others end with "]
There will not be any occurrences where both exist in same string. Haven't run across any yet.
BEFORE:
[Strained breathing]
["Wanna Give My Love"
by The Sons of Rainier]
[Mavrick blows a fart]
["Hallelujah"
by The Sons of Rainer]
[Victor over the phone]
[The Korgi's "Everybody's
Got To Learn Sometime"]
[Lola chuckles]
["It's Good"
by Jack Hammer]
[Uno Hype's "Leave"]
Here's what I would like as the end results
AFTER:
[Strained breathing]
{"Wanna Give My Love"
by The Sons of Rainier playing}
[Mavrick blows a fart]
{"Hallelujah"
by The Sons of Rainer}
[Victor over the phone]
{The Korgi's "Everybody's
Got To Learn Sometime"}
[Lola chuckles]
{"It's Good"
by Jack Hammer}
{Uno Hype's "Leave"}
Here are my attempts:
Find: (?=\[")([\S\s]+?)\]
Replace: \{$1\}
Find: (?=\[[A-Z])([\S\s]+?)\"]
Replace: \{$1\}
Find: \["([A-Z][\S\s]+?)\]
Replace: \{$1\}
So frustrated that my light bulb is still so dim when it comes to regex.
Thanks in advance.
You could use this regex:
\[("[^]]+|[^]]+")\]
which matches a [ followed by either
a " and some number of non-] characters; or
some number of non-] characters followed by a "
and then followed by a ], and replace it with {\1}.
Regex demo on regex101
You can use
\[([^]["]*"[^][]*)]
Explanation
\[ Match [
( Capture group 1
[^]["]* Optionally match any char except ] [ "
" Then match a single "
[^][]* Optionally match any char except ] [
) Close group 1
] Match ]
Regex demo
In the replacement use {\1}
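As a quick check of the second pattern, here is a minimal Python sketch. The brackets inside the character classes are escaped for readability, which should be equivalent to the unescaped form shown above; the names QUOTED_BLOCK and to_braces and the shortened sample text are assumptions made for the illustration.

```python
import re

# A [ ... ] block with no nested brackets that contains at least one
# double quote. Escaped form of \[([^]["]*"[^][]*)]
QUOTED_BLOCK = re.compile(r'\[([^\]\["]*"[^\]\[]*)\]')


def to_braces(text: str) -> str:
    """Swap [ ] for { } around blocks that contain a double quote."""
    return QUOTED_BLOCK.sub(r"{\1}", text)


sample = """[Strained breathing]
["Hallelujah"
by The Sons of Rainer]
[The Korgi's "Everybody's
Got To Learn Sometime"]
[Lola chuckles]
"""
converted = to_braces(sample)
print(converted)
```

Because the negated classes also match line breaks, blocks that span two lines are handled without any extra flag.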

Using Regex to delete contents between repeating brackets

I'm trying to remove the bracketed blocks that contain a certain modifier ('DeleteMe') while keeping the bracketed blocks that contain other words ('DontDeleteMe').
I thought it was simple, but it proved difficult because of the repeating brackets; see below.
[
aljdsfjfldsa DeleteMe aldsjflajdf
]
[
aldskjfal DontDeleteMe asdlkjflasdj
]
[
aljdsfjfldsa DeleteMe aldsjflajdf
]
[
aldskjfal DontDeleteMe asdlkjflasdj
]
Desired output
[
aldskjfal DontDeleteMe asdlkjflasdj
]
[
aldskjfal DontDeleteMe asdlkjflasdj
]
I tried the following but the problem is the second line will be deleted with the third line.
(?s)\[.*?'DeleteMe'.*?\]
You can use a word boundary in combination with a negated character class, [^][]*, which matches any characters other than ] and [:
\[[^][]*\bDontDeleteMe\b[^][]*\]
Regex demo
If the word is DeleteMe, you can match it using word boundaries and replace the match with an empty string.
\[[^][]*\bDeleteMe\b[^][]*\]
Regex demo
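A minimal Python sketch of the removal step follows. The pattern is the DeleteMe one from above with the class brackets escaped; the optional \n? that also drops the line break after a removed block, and the names DELETE_BLOCK and remove_blocks, are assumptions added for this illustration.

```python
import re

# A [ ... ] block without nested brackets containing the whole word DeleteMe.
# \b stops "DontDeleteMe" from matching, because "t" and "D" are both
# word characters and so there is no boundary between them.
DELETE_BLOCK = re.compile(r"\[[^\]\[]*\bDeleteMe\b[^\]\[]*\]\n?")


def remove_blocks(text: str) -> str:
    """Delete every bracketed block that contains the word DeleteMe."""
    return DELETE_BLOCK.sub("", text)


keep = "[\naldskjfal DontDeleteMe asdlkjflasdj\n]\n"
drop = "[\naljdsfjfldsa DeleteMe aldsjflajdf\n]\n"
cleaned = remove_blocks(drop + keep + drop + keep)
print(cleaned)
```

Since the negated class cannot cross a ] or [, a match can never run from one block into the next, which is the problem the lazy .*? attempt had.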

Matching key/value pairs with comments

For a JavaScript application, I'm trying to come up with a regex that will match key/value pairs in a string. It's working pretty well, but there is one last thing that I need to implement and I'm not sure how.
The syntax is very similar to what you'll find in a .env file. So key/value pairs look like KEY=value.
A few rules that I have already implemented:
The key
alphanumeric string.
can't be empty and can't be a number.
may contain an underscore
The value
can be string
may be surrounded by single or double quotes, or none at all.
Now I'm trying to add comments with # in there. It works, except when # is between the quotes. Any idea how to fix that? Thanks!
Here is my code sample:
// This is my regex
const regex = /^\s*(?![0-9_]*\s*=\s*([\W\w\s.]*)\s*$)[A-Z0-9_]+\s*=\s*(.*)?\s*(?<!#.*)/gi;
// Outputs [ "KEY=value " ] --> OK
const str = `KEY=value # Comment`;
console.log(str.match(regex));
// Outputs [ "KEY2=val" ] --> OK
const str2 = `KEY2=val#ue # Comment`;
console.log(str2.match(regex));
// Outputs [ "key3='value3' " ] --> OK
const str3 = `key3='value3' # Comment`;
console.log(str3.match(regex));
// Outputs [ "key_4='val" ] --> NOT OK
// Expecting [ "key_4='val#ue4' " ]
const str4 = `key_4='val#ue4' # Comment`;
console.log(str4.match(regex));
EDIT:
Here is another sample for testing:
# The following are matching
ONE = This is ONE
TWO=This is TWO
THREE="This is 'THREE'"
FOUR = "This is \"FOUR\""
fi_ve = 'This is \'FIVE\''
six='This is "SIX"'
NUMBER7="This is SEVEN" # Comment for SEVEN
number8="This is EIGHT"#Comment for EIGHT
NINE="This is #9"
TEN=This is #10
ELEVEN=
TWELVE=10
THIRTEEN=TRUE
FOURTEEN="true"
FIFTEEN=false
SIXTEEN='FALSE'
# The following are not matching(incl. empty line)
17="Is not valid because the key is a number"
="Is also not valid because the key is missing"
You may use
([A-Za-z_]\w*)[ \t]*=[ \t]*('[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"|[^\r\n#]*)
See the regex demo
([A-Za-z_]\w*) - Group 1: a letter or underscore followed by 0 or more word chars (letters, digits or underscores)
[ \t]*=[ \t]* - a = enclosed with 0 or more spaces or tabs
('[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"|[^\r\n#]*) - Group 2:
'[^'\\]*(?:\\.[^'\\]*)*'| - a '...' like substring that may contain any string escape sequence, or
"[^"\\]*(?:\\.[^"\\]*)*"| - a "..." like substring that may contain any string escape sequence, or
[^\r\n#]* - 0 or more chars other than #, CR and LF
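To see how the three alternatives of Group 2 behave, here is a minimal sketch in Python (the question itself is about JavaScript, where the pattern is written the same way). The names PAIR and parse_pairs are hypothetical. Two choices are assumptions of this sketch and not part of the answer: matching only at the start of each line, and trimming trailing spaces from unquoted values.

```python
import re
from typing import Dict

PAIR = re.compile(
    r"""([A-Za-z_]\w*)[ \t]*=[ \t]*"""
    r"""('[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*"|[^\r\n#]*)"""
)


def parse_pairs(text: str) -> Dict[str, str]:
    """Collect KEY=value pairs; quoted values keep their quotes."""
    pairs = {}
    for line in text.splitlines():
        # Assumption: a pair must begin the line (after indentation),
        # so comment lines that happen to contain "=" are skipped.
        found = PAIR.match(line.lstrip())
        if found:
            key, value = found.groups()
            # Unquoted values may end with spaces before a comment.
            pairs[key] = value.rstrip()
    return pairs


sample = """KEY=value # Comment
key_4='val#ue4' # Comment
NINE="This is #9"
17="Is not valid because the key is a number"
ELEVEN=
"""
pairs = parse_pairs(sample)
print(pairs)
```

A # inside quotes survives because the quoted alternatives are tried before the unquoted one, which is the only alternative that stops at #.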

grok parsing issue

I have an input line that looks like this:
localhost_9999.kafka.server:type=SessionExpireListener,name=ZooKeeperSyncConnectsPerSec.OneMinuteRate
and I can use this pattern to parse it:
%{DATA:kafka_node}:type=%{DATA:kafka_metric_type},name=%{JAVACLASS:kafka_metric_name}
which gives me this:
{
"kafka_node": [
[
"localhost_9999.kafka.server"
]
],
"kafka_metric_type": [
[
"SessionExpireListener"
]
],
"kafka_metric_name": [
[
"ZooKeeperSyncConnectsPerSec.OneMinuteRate"
]
]
}
I want to split the OneMinuteRate into a separate field but can't seem to get it to work. I've tried this:
%{DATA:kafka_node}:type=%{DATA:kafka_metric_type},name=%{WORD:kafka_metric_name}.%{WORD:attr_type}"
but get nothing back then.
I'm also using https://grokdebug.herokuapp.com/ to test these out...
You can either use your last regex with an escaped . (note that a . matches any char but newline and a \. will match a literal dot char), or use DATA type for the last but one field and a GREEDYDATA for the last field:
%{DATA:kafka_node}:type=%{DATA:kafka_metric_type},name=%{DATA:kafka_metric_name}\.%{GREEDYDATA:attr_type}
Since %{DATA:name} translates to (?<name>.*?) and %{GREEDYDATA:name} translates to (?<name>.*), the name part will match any chars, 0 or more occurrences, as few as possible, up to the first ., and attr_type .* pattern will greedily "eat up" the rest of the line up to its end.
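The lazy and greedy behaviour described above can be checked outside Logstash with a plain regex. The sketch below is an approximation in Python, not grok itself: each %{DATA:...} is written as a lazy named group and %{GREEDYDATA:...} as a greedy one, using Python's (?P<name>...) syntax. The variable names LINE, line and fields are assumptions.

```python
import re

# DATA -> .*? (lazy), GREEDYDATA -> .* (greedy), literal dot escaped as \.
LINE = re.compile(
    r"(?P<kafka_node>.*?):type=(?P<kafka_metric_type>.*?),"
    r"name=(?P<kafka_metric_name>.*?)\.(?P<attr_type>.*)"
)

line = (
    "localhost_9999.kafka.server:type=SessionExpireListener,"
    "name=ZooKeeperSyncConnectsPerSec.OneMinuteRate"
)
fields = LINE.match(line).groupdict()
for name, value in fields.items():
    print(name, "=", value)
```

With an unescaped dot in place of \. the lazy name group could stop after a single arbitrary character, which is why the escape matters here.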