Regex to match data between brackets and length of the string - regex

I'm using the following string:
02:05:31,624 TRACE [com.test.enterprise.process.module.AZZADM13] (default task-6) [2019-06-10][02:05:31][5330985][TESTSRV ][AZZADM13 ][process - ENTER ]
Using a regular expression, I would like to match TESTSRV. To do so, I need to match a value that sits between two brackets, consists only of capital letters (A-Z) or spaces, and has a length of 10 (including the brackets) or 8 (excluding them).
Here is my starting expression:
\[([A-Z ]+)\]{10}
This matches the "in between brackets" part, but I can't seem to get the length to work. Any advice appreciated.
In this example, I would expect to match TESTSRV.

I was not sure what the other inputs would look like. My guess is that there may be optional spaces, so we can add optional whitespace groups wherever the expression needs them:
\[[0-9]+\](\s+)?\[(\s+)?(.+?)(\s+)?\](\s+)?\[[A-Z0-9]
If the optional spaces, (\s+)?, are unnecessary, we can simply remove those groups.
Here, we have a left boundary:
\[[0-9]+\]
and a right boundary:
\[[A-Z0-9]
and our desired output is in this capturing group:
(.+?)
along with some optional spaces groups:
(\s+)?
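As a rough check, the boundary-based expression can be run against the sample line with Python's re module. The question's own attempt fails because {10} applies only to the preceding \], so it demands ten closing brackets. The second pattern below is an alternative sketch, not part of the answer above: it assumes the field is always exactly 8 characters of A-Z or space, and puts the quantifier on the character class inside the brackets. The variable names are illustrative.

```python
import re

line = (
    "02:05:31,624 TRACE [com.test.enterprise.process.module.AZZADM13] "
    "(default task-6) [2019-06-10][02:05:31][5330985][TESTSRV ][AZZADM13 ][process - ENTER ]"
)

# Pattern from the answer: a digits-only bracket on the left,
# the start of the next bracket on the right.
boundary_pattern = re.compile(
    r"\[[0-9]+\](\s+)?\[(\s+)?(.+?)(\s+)?\](\s+)?\[[A-Z0-9]"
)

# Alternative sketch (assumption: the field is always 8 characters wide).
# The length quantifier sits on the character class, not on the bracket.
length_pattern = re.compile(r"\[([A-Z ]{8})\]")

boundary_value = boundary_pattern.search(line).group(3)
length_value = length_pattern.search(line).group(1).strip()

print(boundary_value)
print(length_value)
```

Both approaches yield TESTSRV for this line; the fixed-length variant keeps the padding space in the capture, hence the strip().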

Related

Regular expression using non-greedy matching -- confusing result

I thought I understood how the non-greedy modifier works, but am confused by the following result:
Regular Expression: (,\S+?)_sys$
Test String: abc,def,ghi,jkl_sys
Desired result: ,jkl_sys <- last field including comma
Actual result: ,def,ghi,jkl_sys
Use case is that I have a comma separated string whose last field will end in "_sys" (e.g. ,sometext_sys). I want to match only the last field and only if it ends with _sys.
I am using the non-greedy (?) modifier to return the shortest possible match (only the last field including the comma), but it returns all but the first field (i.e. the longest match).
What am I missing?
I used https://regex101.com/ to test, in case you want to see a live example.
You can use
,[^,]+_sys$
The pattern matches:
, Match the last comma
[^,]+ Match 1+ occurrences of any char except ,
_sys Match literally
$ End of string
If the field must not contain whitespace (including newlines):
,[^\s,]+_sys$
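A short Python sketch of why the lazy quantifier did not help: the engine takes the leftmost position where a match can start, and laziness only limits how far the match extends to the right from that position. Since \S also matches commas, the match that starts at the first comma succeeds. The variable names are illustrative.

```python
import re

text = "abc,def,ghi,jkl_sys"

# The original pattern: the match starts at the FIRST comma, and the lazy
# \S+? is forced to grow until "_sys" at the end of the string fits.
lazy_match = re.search(r"(,\S+?)_sys$", text).group(0)

# Excluding commas from the field means only the last comma can start a match.
fixed_match = re.search(r",[^,]+_sys$", text).group(0)

print(lazy_match)
print(fixed_match)
```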
It sounds like you're looking for a string that ends with "_sys", sits at the end of the source string, and is preceded by a comma.
,\s*(\w+_sys)$
I added the \s* to allow for optional whitespace after the comma.
No non-greedy modifiers necessary.
The parens are around \w+_sys so you can capture just that string, without the comma and optional whitespace.
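The capture-group variant could be wrapped as follows in Python; last_sys_field is a hypothetical helper name, and returning None for non-matching input is an assumption about the desired behaviour.

```python
import re

def last_sys_field(text):
    """Return the last comma-separated field if it ends in _sys, else None."""
    match = re.search(r",\s*(\w+_sys)$", text)
    # Group 1 holds the field without the comma and optional whitespace.
    return match.group(1) if match else None

print(last_sys_field("abc,def,ghi,jkl_sys"))
print(last_sys_field("abc,jkl_sys,def"))
```

Note that \w does not match commas, which is why no lazy quantifier is needed here either.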

Regex for replacing letters next to numbers uppercase

I have this regex, (\w+), with the replacement \u$0.
This capitalizes the first letter, for example james1 becomes James1.
But I need a regex that capitalizes the first letter of each word when the word starts with a number, for example:
12james
1azz4ds
1995brandon
666metal
to
12James
1Azz4ds
1995Brandon
666Metal
How do I solve this problem?
Here, we can match the digits followed by a letter, either uppercase or lowercase, and replace it:
[0-9]+([A-Za-z])
We add a start anchor so that only the letters we wish to replace are captured:
^[0-9]+([A-Za-z])
or:
^([0-9]+)([A-Za-z])
and for this expression our replacement would look something like:
$1\u$2
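The \u case-conversion escape in the replacement depends on the tool; Python's re module does not support it in replacement strings. As a sketch of the same idea in Python, a replacement function can upper-case the second group instead. capitalize_after_digits is a hypothetical name.

```python
import re

def capitalize_after_digits(word):
    """Upper-case the first letter that follows the leading digits."""
    return re.sub(
        r"^([0-9]+)([A-Za-z])",
        # Keep the digits (group 1) and upper-case the letter (group 2).
        lambda m: m.group(1) + m.group(2).upper(),
        word,
    )

for word in ["12james", "1azz4ds", "1995brandon", "666metal"]:
    print(capitalize_after_digits(word))
```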
You could match a word boundary \b, match 1+ digits \d+ and then forget what is matched using \K. Then match a single lowercase a-z:
\b\d+\K[a-z]
Replace with:
\u$0
If the digits must not be preceded by a non-whitespace character, you might use this instead of \b:
(?<!\S)\d+\K[a-z]
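\K is available in PCRE-style engines but not in Python's standard re module, so a Python sketch of the two variants has to capture the digits and put them back. The function names below are illustrative. The example also shows where the two boundaries differ: \b accepts digits that follow a hyphen, while (?<!\S) requires the start of the string or whitespace before them.

```python
import re

def _digits_then_upper(match):
    # Re-emit the digits (group 1) and upper-case the letter (group 2).
    return match.group(1) + match.group(2).upper()

def capitalize_at_word_boundary(text):
    return re.sub(r"\b(\d+)([a-z])", _digits_then_upper, text)

def capitalize_after_whitespace(text):
    return re.sub(r"(?<!\S)(\d+)([a-z])", _digits_then_upper, text)

print(capitalize_at_word_boundary("12james id-12james"))
print(capitalize_after_whitespace("12james id-12james"))
```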

Regex in middle of text doesn't match

I have a regex to find URLs in text:
^(?!:\/\/)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}?$
However it fails when it is surrounded by text:
https://regex101.com/r/0vZy6h/1
I can't seem to grasp why it's not working.
Possible reasons why the pattern does not work:
^ and $ anchor the pattern, so it only matches when the URL is the entire string.
(?!:\/\/) is a negative lookahead that fails the match if, immediately to the right of the current location, there is :// substring. But [a-zA-Z0-9-_]+ means there can't be any ://, so, you most probably wanted to fail the match if :// is present to the left of the current location, i.e. you want a negative lookbehind, (?<!:\/\/).
[a-zA-Z]{2,11}? - matches 2 chars only if $ is removed since the {2,11}? is a lazy quantifier and when such a pattern is at the end of the pattern it will always match the minimum char amount, here, 2.
Use
(?<!:\/\/)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}
Add \b word boundaries if you need to match the substrings as whole words.
Note in Python regex there is no need to escape /, you may replace (?<!:\/\/) with (?<!://).
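A minimal Python sketch of the three points above. Two things here are my own choices rather than part of the answer: the hyphen is moved to the end of each character class so it is unambiguously literal, and the repeated group is made non-capturing. The names HOST, anchored, unanchored and find_hosts are hypothetical.

```python
import re

HOST = r"(?:[a-zA-Z0-9_-]+\.)*[a-zA-Z0-9][a-zA-Z0-9_-]+\.[a-zA-Z]{2,11}"

# With ^ and $ the pattern must cover the whole string.
anchored = re.compile(r"^" + HOST + r"$")

# Without anchors, with word boundaries and a lookbehind for "://".
unanchored = re.compile(r"\b(?<!://)" + HOST + r"\b")

def find_hosts(text):
    return [m.group(0) for m in unanchored.finditer(text)]

print(anchored.search("Visit example.com today"))  # no match: surrounding text
print(find_hosts("Visit example.com or sub.domain.org today."))

# A lazy quantifier at the very end of a pattern takes the minimum, here 2 chars.
print(re.search(r"\.[a-zA-Z]{2,11}?", "example.com").group(0))
```

Without the leading \b, the lookbehind alone would only block a match that starts directly after "://"; the engine could still start one character later, so the word boundary matters when hosts preceded by a scheme should be skipped.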
The spaces are not being matched. Try adding space to the character sets checking for leading or trailing text.

Regex - Regular expression for repeat word with prefix

How do I create a regular expression to match subwords that start with the same prefix, for example aaa, where the random word after the prefix has a random length?
aaa[randomword1]aaa[randomword2]
If I use pattern
(aaa\w+)*
it matches (aaa) and [randomword1]aaa[randomword2]. But I want to match these groups: aaa, randomword1, aaa, randomword2.
EDIT: I mean that the string may contain aaa multiple times, and I need to match every subword aaa_randomword_times_n.
I suggest aaa(\w+)aaa(\w+); I hope it helps.
You can use the following regular expression:
\b(aaa|(?<=\[).*?(?=\]))\b
\b...\b -> zero-width word-boundary assertions, so that whole words are matched
aaa -> the specific word to look for
| -> alternation (match either side)
(?<=\[) -> zero-width lookbehind that requires an opening square bracket ([) immediately before the match
.*? -> lazily matches the characters inside the brackets
(?=\]) -> zero-width lookahead that requires a closing square bracket (]) immediately after the match
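A quick Python check of the lookaround pattern, assuming the square brackets in the sample string are literal characters (the question leaves this open). With a single capturing group, re.findall returns the content of that group for every match.

```python
import re

sample = "aaa[randomword1]aaa[randomword2]"

# Either the literal prefix, or whatever sits between [ and ].
pattern = re.compile(r"\b(aaa|(?<=\[).*?(?=\]))\b")

parts = pattern.findall(sample)
print(parts)
```

If the brackets are only placeholders and the words follow the prefix directly, this pattern does not apply, because there is no bracket for the lookarounds to anchor on.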

check if there is a word repeated at least 2 or more times. (Regular Expression)

Using a regular expression, I want to match any line of input that has at least one word repeated two or more times.
Here is how far I got:
/(\b\w+\b).*\1
But it is wrong, because the second occurrence can match a single character inside a longer word rather than a whole word.
input: i might be ill
output: < i might be i>ll
<> marks the matched part.
So I tried (\b\w+\b)(\b\w+\b)*\1
but it does not work at all.
Can someone give help?
Thanks.
This should work:
(\b\w+\b).*\b\1\b
The greedy .* ensures the longest match. If you want the second instance to be a separate word, you have to add the boundaries there as well. So it's the same as
\b(\w+)\b.*\b\1\b
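The difference between the question's attempt and the bounded backreference can be checked in Python with the sample input. This is a small sketch; loose and strict are illustrative names.

```python
import re

loose = re.compile(r"(\b\w+\b).*\1")        # backreference without boundaries
strict = re.compile(r"\b(\w+)\b.*\b\1\b")   # backreference must be a whole word

text = "i might be ill"

print(loose.search(text).group(0))   # the "i" inside "ill" satisfies \1
print(strict.search(text))           # no whole word is repeated

repeated = strict.search("this is a test of this pattern")
print(repeated.group(1))
```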
A positive lookahead is not required here:
/\b([A-Za-z]+)\b[\s\S]*\b\1\b/g
EXPLANATION
\b([A-Za-z]+)\b # match any word
[\s\S]* # match any character (newline included) zero or more times
\b\1\b # word repeated
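The practical effect of [\s\S]* compared with .* shows up when the repetition spans a line break. A small Python sketch follows; in Python the pattern is written without the /.../g delimiters, and the variable names are illustrative.

```python
import re

text = "alpha beta\ngamma alpha"

# "." stops at the newline, so the second "alpha" is out of reach.
same_line = re.search(r"\b([A-Za-z]+)\b.*\b\1\b", text)

# [\s\S] matches any character, newline included.
any_line = re.search(r"\b([A-Za-z]+)\b[\s\S]*\b\1\b", text)

print(same_line)
print(any_line.group(1))
```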
To check for repeated words you can use positive lookahead like this.
Regex: (\b[A-Za-z]+\b)(?=.*\b\1\b)
Explanation:
(\b[A-Za-z]+\b) will capture any word.
(?=.*\b\1\b) will lookahead if the word captured by group is present or not. If yes then a match is found.
Note: This will produce repeated results, because a word that has already been matched is matched again when the regex engine reaches its next occurrence, as long as yet another occurrence follows.
You will have to deduplicate the results in code.
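One way to do that deduplication, sketched in Python: collect all lookahead matches with re.findall and drop repeats while keeping first-seen order. repeated_words is a hypothetical helper name, and dict.fromkeys is just one of several ways to deduplicate.

```python
import re

LOOKAHEAD = re.compile(r"(\b[A-Za-z]+\b)(?=.*\b\1\b)")

def repeated_words(text):
    """Return each word that occurs again later in the text, once."""
    hits = LOOKAHEAD.findall(text)       # may contain the same word several times
    return list(dict.fromkeys(hits))     # drop repeats, keep first-seen order

sentence = "the cat and the dog and the bird"
print(LOOKAHEAD.findall(sentence))
print(repeated_words(sentence))
```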