I want to get the second occurrence of the matching pattern (inside the brackets) by using a regex.
Here is the text
[2019-07-29 09:48:11,928] #hr.com [2] [AM] WARN
I want to extract 2 from this text. I tried using
(?<Ten ID>((^)*((?<=\[).+?(?=\]))))
But it matches 2019-07-29 09:48:11,928 , 2 , AM.
How do I get only 2?
To get a substring between [ and ] (square brackets) excluding the brackets you may use /\[([^\]\[]*)\]/ regex:
\[ - a [ char
([^\]\[]*) - Capturing group 1: any 0+ chars other than [ and ]
\] - a ] char.
To get the second match, you may use
str = '[2019-07-29 09:48:11,928] #hr.com [2] [AM] WARN'
p str[/\[[^\]\[]*\].*?\[([^\]\[]*)\]/m, 1]
Here,
\[[^\]\[]*\] - finds the first [...] substring
.*? - matches any 0+ chars as few as possible
\[([^\]\[]*)\] - finds the second [...] substring and captures the inner contents, returned with the help of the second argument, 1.
To get the Nth match, you may also consider using
str = '[2019-07-29 09:48:11,928] #hr.com [2] [AM] WARN'
result = ''
cnt = 0
str.scan(/\[([^\]\[]*)\]/) { |match| result = match[0]; cnt +=1; break if cnt >= 2}
puts result #=> 2
Note that if there are fewer matches than you expect, this solution will return the last matched substring.
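If falling back to the last match is not what you want, a variant that returns nil when there are fewer than N matches could look like the sketch below. The helper name nth_bracketed is hypothetical; it simply collects all matches with scan and indexes into the result.

```ruby
# Hypothetical helper (the name is an assumption): returns the text inside the
# nth [...] pair, or nil when the string has fewer than n such pairs.
def nth_bracketed(str, n)
  return nil if n < 1                    # index n - 1 would otherwise wrap around to the end
  captures = str.scan(/\[([^\]\[]*)\]/)  # one single-element array per [...] pair
  captures[n - 1]&.first
end

str = '[2019-07-29 09:48:11,928] #hr.com [2] [AM] WARN'
p nth_bracketed(str, 2) #=> "2"
p nth_bracketed(str, 4) #=> nil
```

The safe navigation operator (&.) needs Ruby 2.3 or later.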
Another solution that is not generic and only suits this concrete case: extract the first occurrence of an int number inside square brackets:
s = "[2019-07-29 09:48:11,928] #hr.com [2] [AM] WARN"
puts s[/\[(\d+)\]/, 1] # => 2
To use the regex in Fluentd, use
\[(?<val>\d+)\]
and the value you need is in the val named group. \[ matches a [, (?<val>\d+) is a named capturing group matching 1+ digits, and \] matches a ].
Fluentular shows:
Copy and paste to fluent.conf or td-agent.conf
type tail
path /var/log/foo/bar.log
pos_file /var/log/td-agent/foo-bar.log.pos
tag foo.bar
format /\[(?<val>\d+)\]/
Records
Key Value
val 2
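Before pasting the expression into a Fluentd config, it can be sanity-checked in plain Ruby. This is only a sketch of how the regex behaves on the sample line; it does not involve Fluentd itself.

```ruby
line = '[2019-07-29 09:48:11,928] #hr.com [2] [AM] WARN'

# The timestamp group is skipped because \d+ must be followed directly by ].
m = line.match(/\[(?<val>\d+)\]/)
puts m[:val] if m #=> 2
```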
From "Extract string between square brackets at second occurrence":
/\[[^\]]*\][^[]*\[([^\]]*)\]/
You can use this; the value you need is in the first (and only) capture group.
If you know that it's always the second match, you can use scan and take the second result:
"[2019-07-29 09:48:11,928] #hr.com [2] [AM] WARN".scan(/\[([^\]]*)\]/)[1].first
# => "2"
def nth_match(str, n)
str[/(?:[^\[]*\[){#{n}}([^\]]*)\]/, 1]
end
str = "Little [Miss] Muffet [sat] on a [tuffet] eating [pie]."
nth_match(str, 1) #=> "Miss"
nth_match(str, 2) #=> "sat"
nth_match(str, 3) #=> "tuffet"
nth_match(str, 4) #=> "pie"
nth_match(str, 5) #=> nil
We could write the regular expression in free-spacing mode to document it.
/
(?: # begin a non-capture group
[^\[]* # match zero or more characters other than '['
\[ # match '['
){#{n}} # end non-capture group and execute it n times
( # start capture group 1,
[^\]]* # match zero or more characters other than ']'
) # end capture group 1
\] # match ']'
/x # free-spacing regex definition mode
/(?:[^\[]*\[){#{n}}([^\]]*)\]/
I am looking for a regex substitution to transform N white spaces at the beginning of a line to N &nbsp; entities. So this text:
list:
- first
should become:
list:
- first
I have tried:
str = "list:\n - first"
str.gsub(/(?<=^) */, " ")
which returns:
list:
- first
which is missing one &nbsp;. How can I improve the substitution to get the desired output?
You could make use of the \G anchor and \K to reset the starting point of the reported match.
To match all leading single spaces:
(?:\R\K|\G)
(?: Non capture group
\R\K Match a newline and clear the match buffer
| Or
\G Assert the position at the end of the previous match
) Close non capture group and match a space
To match only the single leading spaces in the example string:
(?:^.*:\R|\G)\K
In parts, the pattern matches:
(?: Non capture group
^.*:\R Match a line that ends with : and match a newline
| Or
\G Assert the position at the end of the previous match, or at the start of the string
) Close non capture group
\K Forget what is matched so far and match a space
Example
re = /(?:^.*:\R|\G)\K /
str = 'list:
- first'
result = str.gsub(re, ' ')
puts result
Output
list:
- first
I would write
"list:\n - first".gsub(/^ +/) { |s| ' ' * s.size }
#=> "list:\n - first"
See String#*
Use gsub with a callback function:
str = "list:\n - first"
output = str.gsub(/(?<=^|\n)[ ]+/) {|m| m.gsub(" ", " ") }
This prints:
list:
- first
The pattern (?<=^|\n)[ ]+ matches one or more spaces at the start of a line. This match then gets passed to the block, which replaces each space, one at a time, with &nbsp;.
You can use a short /(?:\G|^) / regex with a plain text replacement pattern:
result = text.gsub(/(?:\G|^) /, ' ')
See the regex demo. Details:
(?:\G|^) - start of a line or string or the end of the previous match
- a space.
See a Ruby demo:
str = "list:\n - first"
result = str.gsub(/(?:\G|^) /, ' ')
puts result
# =>
# list:
# - first
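If you need to match any whitespace, replace the literal space with \s. To match only horizontal whitespace, use [ \t] or [[:blank:]]; note that in Ruby \h matches a hexadecimal digit, not horizontal whitespace as it does in PCRE.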
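As a rough sketch of the block-based approach extended to tabs, the helper below replaces each leading space or tab with a replacement string, one for one. The helper name replace_leading_blanks and the "&nbsp;" replacement are assumptions made for illustration.

```ruby
# Hypothetical helper: swap every leading space or tab on each line for
# `replacement`, one replacement per whitespace character.
def replace_leading_blanks(text, replacement)
  # ^ anchors at the start of every line in Ruby; [ \t]+ is the leading run.
  text.gsub(/^[ \t]+/) { |run| replacement * run.size }
end

puts replace_leading_blanks("list:\n  - first", "&nbsp;")
# list:
# &nbsp;&nbsp;- first
```

Spaces that are not at the start of a line (such as the one after the hyphen) are left untouched, because the pattern only matches directly after ^.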
I have the line:
[asos-qa:2021:5]#0 Row[info=[ts=-9223372036854775808] ]: 6, 23 |
I want to get the first word: asos-qa, so I tried this regex: ^\[\S*?(:|]) and it gets me: [asos-qa:.
So in order to get only the word without the other characters I tried to add a group (python syntax): ^\[(?P<app_id>\S*)?(:|]) but for some reason it returns [asos-qa:2021:5].
What am I doing wrong?
Your ^\[(?P<app_id>\S*)?(:|]) regex returns [asos-qa:2021:5] because \S* greedily matches zero or more non-whitespace chars, up to the last available : or ] in the current chunk of non-whitespace chars. The ? you used applies to the whole (?P<app_id>\S*) group and is also greedy, i.e. the regex engine tries to match the group pattern at least once before skipping it.
You need
^\[(?P<app_id>[^]\s:]+)
See the regex demo. Details:
^ - start of string
\[ - a [ char
(?P<app_id>[^]\s:]+) - Group "app_id": any one or more chars other than ], whitespace and :. NOTE: ] does not need to be escaped when it is the first char in the character class.
See the Python demo:
import re
pattern = r"^\[(?P<app_id>[^]\s:]+)"
text = "[asos-qa:2021:5]#0 Row[info=[ts=-9223372036854775808] ]: 6, 23 |"
m = re.search(pattern, text)
if m:
    print(m.group(1))
# => asos-qa
Your pattern uses a greedy \S*, which matches as many non-whitespace characters as possible.
You can make it non greedy using \S*? like ^\[(?P<app_id>\S*?)(:|]) which will have the value in capture group 1.
Or you can use a negated character class not matching : assuming the closing ] will be there.
^\[(?P<app_id>[^:]+)
Example code
import re
pattern = r"\[(?P<app_id>[^:]+)"
s = "[asos-qa:2021:5]#0 Row[info=[ts=-9223372036854775808] ]: 6, 23 |"
match = re.match(pattern, s)
if match:
    print(match.group("app_id"))
Output
asos-qa
Or match only word characters with an optional hyphen in between:
^\[(?P<app_id>\w+(?:-\w+)*)[^]\[]*]
I want to extract only the value between square brackets in a given line.
From the text
TID: [-1] [] [2019-07-29 10:18:41,876] INFO
I want to extract the first occurrence between square brackets which is -1.
I tried using
(?<Ten ID>((^(?!(TID: )))*((?<=\[).*?(?=\]))))
but it gives
-1, ,2019-07-29 10:18:41,876
as resultant matches.
How to capture only the first occurrence?
Regarding
Is there a solution without group capturing?
You may use
/\bTID:\s*\[\K[^\]]+(?=\])/
Details
\bTID: - whole word TID followed with a colon
\s* - 0+ whitespace chars
\[ - a [ char
\K - match reset operator that discards the text matched so far
[^\]]+ - one or more chars other than ]
(?=\]) - a positive lookahead that makes sure there is a ] char immediately to the right of the current location.
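As a quick check of the \K approach in plain Ruby, String#[] with a regex returns only the part matched after \K. The constant name TID_RE is an assumption for this sketch.

```ruby
TID_RE = /\bTID:\s*\[\K[^\]]+(?=\])/

line = 'TID: [-1] [] [2019-07-29 10:18:41,876] INFO'
p line[TID_RE]           #=> "-1"
p 'TID: [] INFO'[TID_RE] #=> nil, because [^\]]+ needs at least one character
```

No capturing group is involved: the text before \K is matched but dropped from the reported result, and the lookahead checks the closing bracket without consuming it.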
You might capture the first occurrence in the named capturing group using a negated character class:
\ATID: \[(?<Ten ID>[^\[\]]+)\]
\A Start of string
TID: Match literally
\[ Match [
(?<Ten ID> Named capturing group Ten ID
[^\[\]]+ Match not [ or ] using a negated character class
) Close group
\] Match ]
See https://rubular.com/r/4Hc80yrDxGVgvi
str = "TID:] [-1] [] [2019-07-29 10:18:41,876] INFO"
i1 = str.index('[')
#=> 6
i2 = i1 && str.index(']', i1+1)
#=> 9
i1.nil? || i2.nil? ? nil : str[i1+1..i2-1]
#=> "-1"
I'm trying to capture a text into 3 groups. I have managed to capture 2 groups, but I'm having an issue with the 3rd group.
This is the text :
<13>Apr 5 16:09:47 node2 Services: 2016-04-05 16:09:46,914 INFO [3]
Drivers.KafkaInvoker - KafkaInvoker.SendMessages - After sending
itemsCount=1
I'm using the following regex:
(?=- )(.*?)(?= - )|(?=])(.*?)(?= -)
My 3rd group should be : "After sending itemsCount=1"
Any suggestions?
Your original expression is fine, just missing a $:
(?=- )(.*?)(?= - |$)|(?=])(.*?)(?= -)
and maybe we would slightly modify that to an expression similar to:
(?=-\s+).*?([A-Z].*?)(?=\s+-\s+|$)|(?=]\s+).*?([A-Z].*?)(?=\s+-)
You have 2 capturing groups. You don't get the match for the third part because the positive lookahead in the first alternative does not consider the end of the string. You can solve that by adding an alternation inside the lookahead that either matches the " - " separator or asserts the end of the string:
(?=[-\]] )(.*?)(?= - |$)
                     ^^
If those matches are ok, you could simplify that pattern by making use of a character class to match either - or ] like [-\]] and omit the alternation and the group as you now have only the matches.
Your pattern then might look like (also capturing the leading hyphen like the first 2 matches)
(?=[-\]] ).*?(?= - |$)
If this is your string and you want to have 3 capturing groups, you might use:
^.*?\[\d+\]([^-]+)-([^-]+)-\s*([^-]+)$
^ Start of string
.*? Match any char except a newline non greedy
\[\d+\] match [ 1+ digits ]
([^-]+)- Capture group 1, match 1+ times not -, then match -
([^-]+)- Capture group 2, match 1+ times not -, then match -
\s* Match 0+ whitespace chars
([^-]+) Capture group 3, match 1+ times not -
$ End of string
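To see what the three groups actually contain, the pattern can be tried in Ruby (a sketch only; the question does not state which language it targets). Groups 1 and 2 keep the spaces around the hyphens, so the example strips them.

```ruby
line = '<13>Apr 5 16:09:47 node2 Services: 2016-04-05 16:09:46,914 INFO [3] ' \
       'Drivers.KafkaInvoker - KafkaInvoker.SendMessages - After sending itemsCount=1'

pattern = /^.*?\[\d+\]([^-]+)-([^-]+)-\s*([^-]+)$/

m = line.match(pattern)
# Groups 1 and 2 include the spaces around the hyphens, so strip them.
parts = m ? m.captures.map(&:strip) : []
p parts
#=> ["Drivers.KafkaInvoker", "KafkaInvoker.SendMessages", "After sending itemsCount=1"]
```

Note that this pattern yields no match at all if the message itself contains a hyphen, since group 3 is [^-]+ and must run to the end of the line.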
For example, to create the desired object from the comments, you could first collect all the matches (match[0]) in an array.
After you have all the values, assemble the object using the keys and the values.
var output = {};
var regex = new RegExp(/(?=[-\]] ).*?(?= - |$)/g);
var str = `<13>Apr 5 16:09:47 node2 Services: 2016-04-05 16:09:46,914 INFO [3] Drivers.KafkaInvoker - KafkaInvoker.SendMessages - After sending itemsCount=1`;
var match;
var values = [];
var keys = ['Thread', 'Class', 'Message'];
while ((match = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (match.index === regex.lastIndex) {
regex.lastIndex++;
}
values.push(match[0]);
}
keys.forEach((key, index) => output[key] = values[index]);
console.log(output);
Here are the patterns:
Red,Green (and so on...)
Red (+5.00),Green (+6.00) (and so on...)
Red (+5.00,+10.00),Green (+6.00,+20.00) (and so on...)
Red (+5.00),Green (and so on...)
Each attribute ("Red", "Green") can have 0, 1, or 2 modifiers (shown as "+5.00,+10.00", etc.).
I need to capture each of the attributes and their modifiers as a single string (i.e. "Red (+5.00,+10.00)", "Green (+6.00,+20.00)").
Help?
Another example (PCRE):
((?:Red|Green)(?:\s\((?:\+\d+\.\d+,?)+\))?)
Explanation:
(...) // a capture group
(?:...) // a non-capturing group
Red|Green // matches Red or Green
(?:...)? // an optional non-capturing group
\s // matches any whitespace character
\( // matches a literal (
(?:...)+ // a non-capturing group that can occur one or more times
\+ // matches a literal +
\d+ // matches one or more digits
\. // matches a literal .
\d+ // matches one or more digits
,? // matches an optional comma
\) //matches a literal )
Update:
Or actually if you just want to extract the data, then
((?:Red|Green)(?:\s\([^)]+\))?)
would be sufficient.
Update 2: As pointed out in your comment, this would match anything in the first part but , and (:
([^,(]+(?:\s\([^)]+\))?)
(does not work, too permissive)
To be more restrictive (allowing only letters, digits and underscores), you can just use \w:
(\w+(?:\s\([^)]+\))?)
Update 3:
I see, the first of my alternatives does not work correctly, but \w works:
$pattern = "#\w+(?:\s\([^)]+\))?#";
$str = "foo (+15.00,-10.00),bar (-10.00,+25),baz,bing,bam (150.00,-5000.00)";
$matches = array();
preg_match_all($pattern, $str, $matches);
print_r($matches);
prints
Array
(
[0] => Array
(
[0] => foo (+15.00,-10.00)
[1] => bar (-10.00,+25)
[2] => baz
[3] => bing
[4] => bam (150.00,-5000.00)
)
)
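The \w-based pattern can also be tried against the input shapes from the question. The following is a Ruby sketch rather than the PHP used in this answer; String#scan plays the role of preg_match_all here.

```ruby
pattern = /\w+(?:\s\([^)]+\))?/

# With no capturing groups in the pattern, scan returns the full matches.
p 'Red (+5.00,+10.00),Green (+6.00,+20.00)'.scan(pattern)
#=> ["Red (+5.00,+10.00)", "Green (+6.00,+20.00)"]
p 'Red (+5.00),Green'.scan(pattern)
#=> ["Red (+5.00)", "Green"]
p 'Red,Green'.scan(pattern)
#=> ["Red", "Green"]
```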
Update 4:
Ok, I got something working, please check whether it always works:
(?=[^-+,.]+)[^(),]+(?:\s?\((?:[-+\d.]+,?)+\))?
With:
$pattern = "#(?=[^-+,.]+)[^(),]+(?:\s?\((?:[-+\d.]+,?)+\))?#";
$str = "5 lb. (+15.00,-10.00),bar (-10.00,+25),baz,bing,bam (150.00,-5000.00)";
preg_match_all gives me
Array
(
[0] => Array
(
[0] => 5 lb. (+15.00,-10.00)
[1] => bar (-10.00,+25)
[2] => baz
[3] => bing
[4] => bam (150.00,-5000.00)
)
)
Maybe there is a simpler regex, I'm not an expert...
PCRE format:
(Red|Green)(\s\((?P<val1>.+?)(,){0,1}(?P<val2>.+?){0,1}\)){0,1}
Match from PHP:
preg_match_all("/(Red|Green)(\s\((?P<val1>.+?)(,){0,1}(?P<val2>.+?){0,1}\)){0,1}/ims", $text, $matches);
Here's my bid:
/
(?:^|,) # Match line beginning or a comma
(?: # parent wrapper to catch multiple "color (+#.##)" patterns
( # grouping pattern for picking off matches
(?:(?:Red|Green),?)+ # match the color prefix
\s\( # space then parenthesis
(?: # wrapper for repeated number groups
(?:\x2B\d+\.\d+) # pattern for the +#.##
,?)+ # end wrapper
\) # closing parenthesis
)+ # end matching pattern
)+ # end parent wrapper
/
Which translates to:
/(?:^|,)(?:((?:(?:Red|Green),?)+\s\((?:(?:\x2B\d+\.\d+),?)+\))+)+/
EDIT
Sorry, it was only catching the last pattern before. This will catch all matches (or should).