Extracting only the first occurrence between square brackets in a line - regex

I want to extract only the value between square brackets in a given line.
From the text
TID: [-1] [] [2019-07-29 10:18:41,876] INFO
I want to extract the first occurrence between square brackets which is -1.
I tried using
(?<Ten ID>((^(?!(TID: )))*((?<=\[).*?(?=\]))))
but it gives
-1, ,2019-07-29 10:18:41,876
as resultant matches.
How to capture only the first occurrence?
You can access the regex editor here.

Regarding
Is there a solution without group capturing?
You may use
/\bTID:\s*\[\K[^\]]+(?=\])/
See the Rubular demo
Details
\bTID: - whole word TID followed with a colon
\s* - 0+ whitespace chars
\[ - a [ char
\K - match reset operator that discards the text matched so far
[^\]]+ - one or more chars other than ]
(?=\]) - a positive lookahead that makes sure there is a ] char immediately to the right of the current location.
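For readers working in Python rather than Ruby: the built-in re module does not support \K, so a rough counterpart has to swap in a different technique. The sketch below uses a fixed-width lookbehind instead, which assumes exactly one space after "TID:" (lookbehinds in re must be fixed-width, so \s* cannot be used there). The names TID_VALUE and first_tid_value are hypothetical.

```python
import re

# Swapped-in technique: a fixed-width lookbehind stands in for \K,
# which Python's built-in re module does not support.
# Assumption: exactly one space between "TID:" and "[".
TID_VALUE = re.compile(r"(?<=\bTID: \[)[^\]]+(?=\])")


def first_tid_value(line):
    """Return the text inside the first [...] right after 'TID: ', or None."""
    m = TID_VALUE.search(line)
    return m.group(0) if m else None


print(first_tid_value("TID: [-1] [] [2019-07-29 10:18:41,876] INFO"))
```

As in the Ruby pattern, the whole match is the value itself, so no capturing group is needed.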

You might capture the first occurrence in the named capturing group using a negated character class:
\ATID: \[(?<Ten ID>[^\[\]]+)\]
\A Start of string
TID: Match literally
\[ Match [
(?<Ten ID> Named capturing group Ten ID
[^\[\]]+ Match not [ or ] using a negated character class
) Close group
\] Match ]
See https://rubular.com/r/4Hc80yrDxGVgvi
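The same named-group idea can be sketched in Python. Python group names must be valid identifiers, so the group "Ten ID" is renamed to the hypothetical ten_id here; the function name is also an assumption made for illustration.

```python
import re

# Python port of the named-group pattern above.
# "Ten ID" is not a valid Python group name, so "ten_id" is used instead.
TID_PATTERN = re.compile(r"\ATID: \[(?P<ten_id>[^\[\]]+)\]")


def first_bracket_value(line):
    """Return the first bracketed value after 'TID: ', or None if absent."""
    m = TID_PATTERN.search(line)
    return m.group("ten_id") if m else None


print(first_bracket_value("TID: [-1] [] [2019-07-29 10:18:41,876] INFO"))
```

Because the pattern is anchored with \A and the character class excludes both brackets, only the first [...] after "TID: " can ever be captured.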

str = "TID:] [-1] [] [2019-07-29 10:18:41,876] INFO"
i1 = str.index('[')
#=> 6
i2 = i1 && str.index(']', i1+1)
#=> 9
i1.nil? || i2.nil? ? nil : str[i1+1..i2-1]
#=> "-1"
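The same regex-free approach can be sketched in Python with str.find, which returns -1 rather than nil when nothing is found. The function name first_bracketed is hypothetical.

```python
def first_bracketed(s):
    """Return the text between the first '[' and the next ']', or None."""
    i1 = s.find("[")
    if i1 == -1:
        return None
    # Start searching after the '[' so a stray ']' earlier in the line is ignored.
    i2 = s.find("]", i1 + 1)
    if i2 == -1:
        return None
    return s[i1 + 1:i2]


print(first_bracketed("TID:] [-1] [] [2019-07-29 10:18:41,876] INFO"))
```

Searching for "]" only after the position of "[" is what lets the sample line, which has a stray "]" before the first "[", still produce the expected value.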

Related

why adding group to my regex changes what it catches

I have the line:
[asos-qa:2021:5]#0 Row[info=[ts=-9223372036854775808] ]: 6, 23 |
I want to get the first word: asos-qa, so I tried this regex: ^\[\S*?(:|]) and it gets me: [asos-qa:.
So in order to get only the word without the other characters I tried to add a group (python syntax): ^\[(?P<app_id>\S*)?(:|]) but for some reason it returns [asos-qa:2021:5].
What am I doing wrong?
Your ^\[(?P<app_id>\S*)?(:|]) regex returns [asos-qa:2021:5] because \S* greedily matches zero or more non-whitespace chars, up to the last available : or ] in the current chunk of non-whitespace chars. The ? you used applies to the whole (?P<app_id>\S*) group pattern and is also greedy, so the regex engine first tries to match the group pattern before trying to skip it.
You need
^\[(?P<app_id>[^]\s:]+)
See the regex demo. Details:
^ - start of string
\[ - a [ char
(?P<app_id>[^]\s:]+) - Group "app_id": any one or more chars other than ], whitespace and :. NOTE: ] does not need to be escaped when it is the first char in the character class.
See the Python demo:
import re
pattern = r"^\[(?P<app_id>[^]\s:]+)"
text = "[asos-qa:2021:5]#0 Row[info=[ts=-9223372036854775808] ]: 6, 23 |"
m = re.search(pattern, text)
if m:
    print(m.group(1))
# => asos-qa
Your pattern uses a greedy \S* which matches as many non-whitespace characters as possible.
You can make it non greedy using \S*? like ^\[(?P<app_id>\S*?)(:|]) which will have the value in capture group 1.
Or you can use a negated character class not matching : assuming the closing ] will be there.
^\[(?P<app_id>[^:]+)
Regex demo | Python demo
Example code
import re
pattern = r"\[(?P<app_id>[^:]+)"
s = "[asos-qa:2021:5]#0 Row[info=[ts=-9223372036854775808] ]: 6, 23 |"
match = re.match(pattern, s)
if match:
    print(match.group("app_id"))
Output
asos-qa
Or matching only word characters with an optional hyphen in between:
^\[(?P<app_id>\w+(?:-\w+)*)[^]\[]*]
Regex demo
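To see the difference between the greedy, lazy, and negated-class variants discussed above side by side, here is a small sketch that runs all three against the sample line. The dictionary keys are labels chosen for illustration only.

```python
import re

text = "[asos-qa:2021:5]#0 Row[info=[ts=-9223372036854775808] ]: 6, 23 |"

# The three variants discussed above; the keys are illustrative labels.
patterns = {
    "greedy": r"^\[(?P<app_id>\S*)?(:|])",
    "lazy": r"^\[(?P<app_id>\S*?)(:|])",
    "negated": r"^\[(?P<app_id>[^]\s:]+)",
}

results = {
    name: re.search(pattern, text).group("app_id")
    for name, pattern in patterns.items()
}

for name, value in results.items():
    print(f"{name}: {value}")
```

The greedy variant captures everything up to the last ] in the first non-whitespace chunk, while the other two stop at the first colon.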

Get the second match by regex

I want to get the second occurrence of the matching pattern (inside the brackets) by using a regex.
Here is the text
[2019-07-29 09:48:11,928] #hr.com [2] [AM] WARN
I want to extract 2 from this text. I tried using
(?<Ten ID>((^)*((?<=\[).+?(?=\]))))
But it matches 2019-07-29 09:48:11,928 , 2 , AM.
How to get only 2 ?
To get a substring between [ and ] (square brackets) excluding the brackets you may use /\[([^\]\[]*)\]/ regex:
\[ - a [ char
([^\]\[]*) - Capturing group 1: any 0+ chars other than [ and ]
\] - a ] char.
To get the second match, you may use
str = '[2019-07-29 09:48:11,928] #hr.com [2] [AM] WARN'
p str[/\[[^\]\[]*\].*?\[([^\]\[]*)\]/m, 1]
See this Ruby demo. Here,
\[[^\]\[]*\] - finds the first [...] substring
.*? - matches any 0+ chars as few as possible
\[([^\]\[]*)\] - finds the second [...] substring and captures the inner contents, returned with the help of the second argument, 1.
To get Nth match, you may also consider using
str = '[2019-07-29 09:48:11,928] #hr.com [2] [AM] WARN'
result = ''
cnt = 0
str.scan(/\[([^\]\[]*)\]/) { |match| result = match[0]; cnt +=1; break if cnt >= 2}
puts result #=> 2
See the Ruby demo
Note that if there are fewer matches than you expect, this solution will return the last matched substring.
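A comparable Nth-match helper can be sketched in Python with re.finditer. Unlike the scan loop above, this version returns None when there are fewer than n matches instead of falling back to the last one; that behavior, and the name nth_bracketed, are choices made for this sketch.

```python
import re
from itertools import islice

BRACKETED = re.compile(r"\[([^\]\[]*)\]")


def nth_bracketed(text, n):
    """Return the contents of the n-th (1-based) [...] substring, or None."""
    matches = BRACKETED.finditer(text)
    # islice skips the first n-1 matches lazily; next() takes the n-th if any.
    m = next(islice(matches, n - 1, None), None)
    return m.group(1) if m else None


line = "[2019-07-29 09:48:11,928] #hr.com [2] [AM] WARN"
print(nth_bracketed(line, 2))
```

Because finditer is lazy, scanning stops as soon as the n-th match is found.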
Another solution that is not generic and only suits this concrete case: extract the first occurrence of an int number inside square brackets:
s = "[2019-07-29 09:48:11,928] #hr.com [2] [AM] WARN"
puts s[/\[(\d+)\]/, 1] # => 2
See the Ruby demo.
To use the regex in Fluentd, use
\[(?<val>\d+)\]
and the value you need is in the val named group. \[ matches [, (?<val>\d+) is a named capturing group matching 1+ digits and \] matches a ].
Fluentular shows:
Copy and paste to fluent.conf or td-agent.conf
type tail
path /var/log/foo/bar.log
pos_file /var/log/td-agent/foo-bar.log.pos
tag foo.bar
format /\[(?<val>\d+)\]/
Records
Key Value
val 2
From extract string between square brackets at second occurrence
/\[[^\]]*\][^[]*\[([^\]]*)\]/
You can use this; the value you need (the contents of the second pair of brackets) is in capture group 1.
If you know that it's always the second match, you can use scan and take the second result:
"[2019-07-29 09:48:11,928] #hr.com [2] [AM] WARN".scan(/\[([^\]]*)\]/)[1].first
# => "2"
def nth_match(str, n)
  str[/(?:[^\[]*\[){#{n}}([^\]]*)\]/, 1]
end
str = "Little [Miss] Muffet [sat] on a [tuffet] eating [pie]."
nth_match(str, 1) #=> "Miss"
nth_match(str, 2) #=> "sat"
nth_match(str, 3) #=> "tuffet"
nth_match(str, 4) #=> "pie"
nth_match(str, 5) #=> nil
We could write the regular expression in free-spacing mode to document it.
/
(?: # begin a non-capture group
[^\[]* # match zero or more characters other than '['
\[ # match '['
){#{n}} # end non-capture group and execute it n times
( # start capture group 1,
[^\]]* # match zero or more characters other than ']'
) # end capture group 1
\] # match ']'
/x # free-spacing regex definition mode
/(?:[^\[]*\[){#{n}}([^\]]*)\]/
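The free-spacing form carries over to Python through the re.VERBOSE flag. The following is a sketch of a Python port of nth_match, not part of the original answer; the repeat count is interpolated with % formatting so the regex braces do not need escaping.

```python
import re


def nth_match(text, n):
    """Return the contents of the n-th (1-based) [...] in text, or None."""
    pattern = re.compile(
        r"""
        (?:            # begin a non-capture group
          [^\[]*       # zero or more characters other than '['
          \[           # a literal '['
        ){%d}          # end the group and repeat it exactly n times
        ([^\]]*)       # capture group 1: characters other than ']'
        \]             # a literal ']'
        """ % n,
        re.VERBOSE,
    )
    m = pattern.search(text)
    return m.group(1) if m else None


s = "Little [Miss] Muffet [sat] on a [tuffet] eating [pie]."
print([nth_match(s, i) for i in range(1, 6)])
```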

Options matching in a command

I'm creating a Discord bot and trying to match some command options, but I have a problem getting the value between the square brackets (if there is one).
I've already tried adding a ? quantifier, but it's not working. I also searched for how to match between two characters but found nothing that helped.
Here is the pattern I've got so far : https://regexr.com/4icgi
and here it is in text : /[+|-](.+)(\[(.+)\])?/g
What I expect it to do, given an option like +user[someRandomPeople],
is to extract the parameter user and the value someRandomPeople; if there are no square brackets, it should extract only the parameter.
You may use
^[+-](.*?)(?:\[(.*?)\])?$
Or, if there should be no square brackets inside the optional [...] substring at the end:
^[+-](.*?)(?:\[([^\][]*)\])?$
Or, if the matches are searched for on different lines:
^[+-](.*?)(?:\[([^\][\r\n]*)\])?$
See the regex demo.
Details
^ - start of string
[+-] - + or - (note that | inside square brackets matches a literal | char)
(.*?) - Group 1: any 0 or more chars other than line break chars as few as possible
(?:\[(.*?)\])? - an optional sequence of
\[ - a [ char
(.*?) - Group 2: any 0 or more chars other than line break chars as few as possible ([^\][]* matches 0 or more chars other than [ and ])
\] - a ] char
$ - end of string.
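As a quick check of the second pattern, here is a minimal Python sketch (the question itself is about a JavaScript bot, so this is only an illustration). Both brackets are escaped inside the character class, which is equivalent to [^\][]; the name parse_option is hypothetical.

```python
import re

# Second pattern from above, with both brackets escaped inside the class.
OPTION = re.compile(r"^[+-](.*?)(?:\[([^\]\[]*)\])?$")


def parse_option(token):
    """Return (parameter, value) for an option token, or None if it is not one.

    value is None when the option has no [...] part.
    """
    m = OPTION.match(token)
    if not m:
        return None
    return m.group(1), m.group(2)


print(parse_option("+user[someRandomPeople]"))
print(parse_option("-verbose"))
```

The lazy (.*?) only grows until the optional bracket part and the end anchor can match, which is why the parameter name does not swallow the bracketed value.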

How to capture a group at the end of the line in JS regex?

I'm trying to capture a text into 3 groups. I have managed to capture 2 groups but am having an issue with the 3rd group.
This is the text :
<13>Apr 5 16:09:47 node2 Services: 2016-04-05 16:09:46,914 INFO [3]
Drivers.KafkaInvoker - KafkaInvoker.SendMessages - After sending
itemsCount=1
I'm using the following regex:
(?=- )(.*?)(?= - )|(?=])(.*?)(?= -)
My 3rd group should be : "After sending itemsCount=1"
any suggestions?
Your original expression is fine, just missing a $:
(?=- )(.*?)(?= - |$)|(?=])(.*?)(?= -)
Demo
and maybe we would slightly modify that to an expression similar to:
(?=-\s+).*?([A-Z].*?)(?=\s+-\s+|$)|(?=]\s+).*?([A-Z].*?)(?=\s+-)
Demo
You have 2 capturing groups. You don't get the match for the third part because the positive lookahead in the first alternation does not consider the end of the string. You might solve that by using an alternation in the lookahead that either matches the " - " separator or asserts the end of the string
(?=[-\]] )(.*?)(?= - |$)
                     ^^
If those matches are ok, you could simplify that pattern by making use of a character class to match either - or ] like [-\]] and omit the alternation and the group as you now have only the matches.
Your pattern then might look like (also capturing the leading hyphen like the first 2 matches)
(?=[-\]] ).*?(?= - |$)
Regex demo
If this is your string and you want to have 3 capturing groups, you might use:
^.*?\[\d+\]([^-]+)-([^-]+)-\s*([^-]+)$
^ Start of string
.*? Match any char except a newline non greedy
\[\d+\] match [ 1+ digits ]
([^-]+)- Capture group 1, match 1+ times not -, then match -
([^-]+)- Capture group 2, match 1+ times not -, then match -
\s* Match 0+ whitespace chars
([^-]+) Capture group 3, match 1+ times not -
$ End of string
Regex demo
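To check what the three groups hold, the pattern can be run against the sample line. This sketch uses Python only for illustration (the question is about JavaScript) and assumes the log entry is on a single line; the name split_log is hypothetical.

```python
import re

LOG_LINE = re.compile(r"^.*?\[\d+\]([^-]+)-([^-]+)-\s*([^-]+)$")


def split_log(line):
    """Return the three hyphen-separated parts after [N], stripped, or None."""
    m = LOG_LINE.match(line)
    if not m:
        return None
    # Groups 1 and 2 keep surrounding spaces, so strip them.
    return tuple(part.strip() for part in m.groups())


line = (
    "<13>Apr 5 16:09:47 node2 Services: 2016-04-05 16:09:46,914 INFO [3] "
    "Drivers.KafkaInvoker - KafkaInvoker.SendMessages - After sending itemsCount=1"
)
print(split_log(line))
```

Note that the hyphens inside the date are consumed by the leading .*? and therefore do not interfere with the [^-]+ groups.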
For example creating the desired object from the comments, you could first get all the matches from match[0] and store those in an array.
After you have all the values, assemble the object using the keys and the values.
var output = {};
var regex = new RegExp(/(?=[-\]] ).*?(?= - |$)/g);
var str = `<13>Apr 5 16:09:47 node2 Services: 2016-04-05 16:09:46,914 INFO [3] Drivers.KafkaInvoker - KafkaInvoker.SendMessages - After sending itemsCount=1`;
var match;
var values = [];
var keys = ['Thread', 'Class', 'Message'];
while ((match = regex.exec(str)) !== null) {
  // This is necessary to avoid infinite loops with zero-width matches
  if (match.index === regex.lastIndex) {
    regex.lastIndex++;
  }
  values.push(match[0]);
}
keys.forEach((key, index) => output[key] = values[index]);
console.log(output);

Trying to work out why this regex is not working? Regex should be less restrictive

The Text :
[prc:tl:plfl]
is matched by:
\[prc:tl:[^]]*plfl\]
However I need to also match:
[prc:tl:plfl,tr]
Basically "plfl" can appear anywhere in the string after "tl:" and before next "]"
So all of the following should match
[prc:tl:plfl,tr]
[prc:tl:tr, plfl]
[prc:tl:tr, plfl,sr]
[prc:tl:plfl,tr, sr, mr]
What is missing from my regex?
Many thanks in advance.
You may match any text other than ] after plfl with a negated character class [^\]] (you are actually already using it in the regex):
\[prc:tl:[^\]]*?plfl[^\]]*\]
See the regex demo
Details
\[prc:tl: - a [prc:tl: substring
[^\]]*? - a negated character class matching any 0+ chars other than ] as few as possible
plfl - a plfl substring
[^\]]* - any 0+ chars other than ] as many as possible
\] - a ] char.
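A short Python sketch can confirm that the relaxed pattern accepts all the samples from the question. The extra negative sample and the helper name has_plfl are assumptions added for illustration.

```python
import re

PLFL = re.compile(r"\[prc:tl:[^\]]*?plfl[^\]]*\]")


def has_plfl(tag):
    """True if tag is a [prc:tl:...] block containing plfl before the closing ]."""
    return PLFL.fullmatch(tag) is not None


samples = [
    "[prc:tl:plfl]",
    "[prc:tl:plfl,tr]",
    "[prc:tl:tr, plfl]",
    "[prc:tl:tr, plfl,sr]",
    "[prc:tl:plfl,tr, sr, mr]",
]
for tag in samples:
    print(tag, has_plfl(tag))

# Hypothetical negative sample: no plfl inside the brackets.
print("[prc:tl:tr, sr]", has_plfl("[prc:tl:tr, sr]"))
```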