Regex - Regular expression for repeat word with prefix - regex

How do I create a regular expression to match subwords that start with the same prefix, for example aaa, where the random word after the prefix has an arbitrary length?
aaa[randomword1]aaa[randomword2]
If I use the pattern
(aaa\w+)*
it matches (aaa) and [randomword1]aaa[randomword2]. But I want to match the groups aaa, randomword1, aaa, randomword2.
EDIT: I mean the string may contain aaa multiple times, and I need to match every aaa-plus-randomword subword, however many times it occurs.

I suggest aaa(\w+)aaa(\w+); I hope it helps :) Note that this only covers exactly two occurrences of aaa.

You can use the following regular expression:
\b(aaa|(?<=\[).*?(?=\]))\b
\b...\b -> zero-width word-boundary assertions, so that whole words are matched
aaa -> the specific word to look for
| -> alternation: match either the left or the right alternative
(?<=\[) -> zero-width look-behind assertion that requires an opening square bracket ([) immediately before the match
.*? -> any characters, as few as possible (lazy)
(?=\]) -> zero-width look-ahead assertion that requires a closing square bracket (]) immediately after the match
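To see what the pattern above returns, here is a minimal Python sketch. It assumes the brackets in the sample string are literal characters. The second pattern is a hypothetical variant (not from the answers) for input without brackets such as "aaafooaaabar", and it assumes the random words never contain "aaa" themselves.

```python
import re

text = "aaa[randomword1]aaa[randomword2]"

# Pattern from the answer above, applied to a string with literal brackets.
bracket_pattern = re.compile(r"\b(aaa|(?<=\[).*?(?=\]))\b")
bracket_matches = bracket_pattern.findall(text)
print(bracket_matches)

# Hypothetical variant for input without brackets: capture the prefix, then
# lazily capture word characters up to the next "aaa" or the end of the string.
plain_pattern = re.compile(r"(aaa)(\w+?)(?=aaa|$)")
plain_matches = plain_pattern.findall("aaafooaaabar")
print(plain_matches)
```

findall is used here because it returns every non-overlapping match, which covers the "aaa may appear many times" case from the EDIT.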

Related

Regex to match data between brackets and length of the string

I'm using the following string:
02:05:31,624 TRACE [com.test.enterprise.process.module.AZZADM13] (default task-6) [2019-06-10][02:05:31][5330985][TESTSRV ][AZZADM13 ][process - ENTER ]
Using a regular expression, I would like to match TESTSRV. To do so I need to match a value that is between two brackets, consists only of capital letters (A-Z) or spaces, and has a length of 10 (including the brackets) or 8 (excluding the brackets).
Here is my starting expression:
\[([A-Z ]+)\]{10}
This matches the "in between brackets" part, but I can't seem to get the length to work. Any advice appreciated.
In this example, I would expect to match TESTSRV.
I was not sure what the other inputs would look like; my guess is that there might be optional spaces, which we can allow wherever the expression requires them:
\[[0-9]+\](\s+)?\[(\s+)?(.+?)(\s+)?\](\s+)?\[[A-Z0-9]
If the optional-space groups, (\s+)?, are unnecessary, we can simply remove any of them.
Here, we have a left boundary:
\[[0-9]+\]
and a right boundary:
\[[A-Z0-9]
and our desired output is in this capturing group:
(.+?)
along with some optional-space groups:
(\s+)?
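A small Python sketch of two ways to get at TESTSRV, using the log line from the question. The first pattern is my assumption about what the question's {10} was aiming for: in \[([A-Z ]+)\]{10} the quantifier applies to the closing bracket, so here the length limit is moved onto the character class instead. The second is the boundary-based pattern from the answer above.

```python
import re

line = (
    "02:05:31,624 TRACE [com.test.enterprise.process.module.AZZADM13] "
    "(default task-6) [2019-06-10][02:05:31][5330985][TESTSRV ][AZZADM13 ][process - ENTER ]"
)

# Assumed fix: exactly 8 capital letters or spaces between the brackets.
exact_length = re.search(r"\[([A-Z ]{8})\]", line)
padded_value = exact_length.group(1)
trimmed_value = padded_value.strip()

# Boundary-based pattern from the answer; the value is in capturing group 3.
boundary = re.search(
    r"\[[0-9]+\](\s+)?\[(\s+)?(.+?)(\s+)?\](\s+)?\[[A-Z0-9]", line
)
boundary_value = boundary.group(3)

print(repr(padded_value), trimmed_value, boundary_value)
```

The exact-length version keeps the trailing space inside the brackets, so it is stripped afterwards; the boundary version leaves the space to the optional (\s+)? group.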

How to find words that contain string with a limited size

I need to find all the words in an input text that contain (?i:val) and are no longer than 5 characters.
So far I have: \b([a-zA-Z]*(?i:val)[a-zA-Z]*){1,4}\b
If we take this sample text to look in: In computer science, a value is an expression which cannot be evaluated any further (a normal form). Val is also a match
I get 3 matches (value, evaluated and Val), however evaluated should not match the pattern, as it is too long. What is the right way to get this straight?
Your pattern does not account for the length of the words matched.
Use word boundaries and a lookahead like this:
(?i)\b(?=\w*val)\w{1,5}\b
The regex matches:
\b - a leading word boundary since the next pattern is \w
(?=\w*val) - a lookahead making sure there is a val substring after zero or more word characters
\w{1,5} - matches 1 to 5 word characters
\b - trailing word boundary that stops words of more than 5 characters long from matching
You may use an ASCII JS version of the regex:
/\b(?=[a-z]*val)[a-z]{1,5}\b/i
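As a quick check, here is the lookahead pattern run against the sample sentence from the question, sketched in Python (the variable names are only illustrative).

```python
import re

text = (
    "In computer science, a value is an expression which cannot be "
    "evaluated any further (a normal form). Val is also a match"
)

# Lookahead checks for "val" inside the word; \w{1,5} plus the trailing \b
# caps the word length at 5 characters.
pattern = re.compile(r"(?i)\b(?=\w*val)\w{1,5}\b")
short_val_words = pattern.findall(text)
print(short_val_words)
```

"evaluated" is rejected because after at most five word characters the engine is still in the middle of the word, so the trailing \b cannot match.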
It's important to understand why "evaluated" was matched. Note:
[a-zA-Z]* matches the "e"
(?i:val) matches "val"
[a-zA-Z]* matches "uated"
So there is no repetition here: the whole word was matched in a single iteration of the group, which is why {1,4} does not limit the length.
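One way to convince yourself of this is to drop the {1,4} quantifier and see that the group body alone already matches the whole word. A minimal Python sketch (the names original and single_pass are just for illustration):

```python
import re

# The question's original pattern, unchanged.
original = re.compile(r"\b([a-zA-Z]*(?i:val)[a-zA-Z]*){1,4}\b")
# The body of the group on its own, without the {1,4} quantifier.
single_pass = re.compile(r"[a-zA-Z]*(?i:val)[a-zA-Z]*")

word = "evaluated"
original_match = original.fullmatch(word)
single_match = single_pass.fullmatch(word)

# Both match the full word, so the quantifier never limits the length.
print(original_match.group(0), single_match.group(0))
```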
You can achieve what you want using lookarounds, but I think regex is not the best tool for this task. I highly recommend using other functions, depending on what is available to you.

how to get sub-string using regex if I specify start and end, without start characters?

I have a string like this:
12abcc?p_auth=123ABC&ABC&s
The substring I want starts right after "p_auth=" and ends at the first "&" symbol.
P.S. The '&' symbol and 'p_auth=' must not be included.
I wrote this regex:
(p_auth).+?(?=&)
OK, that works well; it gets this substring:
p_auth=123ABC
But how do I get the string without 'p_auth='?
Use look-arounds:
(?<=p_auth=).*?(?=&)
The look-behind (?<=p_auth=) and the look-ahead (?=&) do not consume characters as they are zero-width assertions. They just check for the substring presence either before or after a certain subpattern.
A couple more words about (?<=p_auth=). It is a positive look-behind: positive because it requires the pattern inside it to appear on the left, immediately before the "main" subpattern. If the look-behind subpattern is found, the result is just "true" and the regex goes on checking the rest of the subpatterns. If not, the match fails at that position and the engine goes on looking for a match at the next index.
Here is some description from regular-expressions.info:
It [the look-behind] tells the regex engine to temporarily step backwards in the string, to check if the text inside the lookbehind can be matched there. (?<!a)b matches a "b" that is not preceded by an "a", using negative lookbehind. It doesn't match cab, but matches the b (and only the b) in bed or debt. (?<=a)b (positive lookbehind) matches the b (and only the b) in cab, but does not match bed or debt.
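Here is a short Python sketch of both points: the look-around pattern applied to the string from the question, and the cab / bed / debt examples from the quote. The helper name has_match is hypothetical.

```python
import re

s = "12abcc?p_auth=123ABC&ABC&s"

# Look-behind and look-ahead are zero-width, so only the value is returned.
token = re.search(r"(?<=p_auth=).*?(?=&)", s).group(0)
print(token)

def has_match(pattern, word):
    """Return True if the pattern matches anywhere in the word."""
    return re.search(pattern, word) is not None

words = ("cab", "bed", "debt")
# Negative look-behind: a "b" that is NOT preceded by an "a".
negative = [has_match(r"(?<!a)b", w) for w in words]
# Positive look-behind: a "b" that IS preceded by an "a".
positive = [has_match(r"(?<=a)b", w) for w in words]
print(negative, positive)
```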
In most cases, you do not really need look-arounds. In this case, you could just use
p_auth=(.*?)&
and get the value of the first capturing group.
The .*? pattern will look for any number of characters other than a newline, but as few as possible that are required to find a match. It is called lazy dot matching, because the ? symbol makes the * quantifier stop before the first symbol that is matched by the subsequent subpattern in the regular expression.
The .*& would match the whole substring up to the last & because the * quantifier is greedy: it consumes as many characters as it can match.
See more at Repetition with Star and Plus regular-expressions.info page.
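The difference between the lazy and the greedy quantifier is easy to see on the string from the question. A minimal Python sketch, with the = included in the literal prefix so that the captured group holds only the value:

```python
import re

s = "12abcc?p_auth=123ABC&ABC&s"

# Lazy: stops at the first "&" after the prefix.
lazy = re.search(r"p_auth=(.*?)&", s).group(1)
# Greedy: runs on to the last "&" in the string.
greedy = re.search(r"p_auth=(.*)&", s).group(1)

print(lazy)
print(greedy)
```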
p_auth=(.+?)(?=&)
Simply use this and grab group 1 (capture 1).

Regex search for characters like "/", "<" and ">"

What should the regex pattern be if my text contains characters like "\ / > <" etc. and I want to find them? I ask because regex seems to treat "/" as part of the search pattern rather than as an individual character.
For example, I want to find Super Kings in the string <span>Super Kings</span>, using VB 2010.
Thanks!
Just try this:
\bYour_Keyword_to_find\b
\b is used in regex to match a word boundary.
[EDIT]
You might be looking for this:
(?<=<span>)([^<>]+?)(?=</span>)
Explanation:
<!--
(?<=<span>)([^<>]+?)(?=</span>)
Options: case insensitive; ^ and $ match at line breaks
Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=<span>)»
Match the characters “<span>” literally «<span>»
Match the regular expression below and capture its match into backreference number 1 «([^<>]+?)»
Match a single character NOT present in the list “<>” «[^<>]+?»
Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=</span>)»
Match the characters “</span>” literally «</span>»
-->
[/EDIT]
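The question is about VB 2010, but the same look-around pattern can be tried quickly in Python. This is only a sketch to show what ends up in backreference 1, not VB code.

```python
import re

html = "<span>Super Kings</span>"

# The look-behind and look-ahead anchor the match between the tags
# without including the tags in the result.
match = re.search(r"(?<=<span>)([^<>]+?)(?=</span>)", html)
inner_text = match.group(1)
print(inner_text)
```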
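In regex flavors where the pattern itself is delimited by / (for example JavaScript regex literals or PHP), you must escape the / as \/. In .NET (VB 2010) the / has no special meaning, so the escape is not required, although it is harmless.
For instance, try: <span>(.*)<\/span>, <span>([^<]*)<\/span> or <span>(.*?)<\/span>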
Read more from:
http://www.regular-expressions.info/characters.html
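A quick Python sketch comparing the three suggested patterns on a hypothetical string with two spans (Python, like .NET, does not need / escaped, but accepts \/ as well):

```python
import re

html = "<span>Super</span> <span>Kings</span>"

# Escaped and unescaped "/" behave identically here.
unescaped = re.findall(r"<span>(.*?)</span>", html)
escaped = re.findall(r"<span>(.*?)<\/span>", html)

# Greedy .* runs to the last closing tag and swallows the markup in between.
greedy = re.findall(r"<span>(.*)</span>", html)

# A negated character class stops at the next "<" without needing laziness.
negated = re.findall(r"<span>([^<]*)</span>", html)

print(unescaped, escaped, greedy, negated)
```

With a single span all three patterns return the same text; the difference only shows up once there is more than one tag on the line.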

Regular Expression to Match any Number of Characters in the Middle of a Defined String

I am trying to come up with a regular expression to match a particular pattern.
If a sample test string is as follows:
/wp-content/themes/sometheme/style.css
The regular expression should:
MATCH /wp-content/themes/ exactly, from the beginning, and should also match /style.css exactly, from the end.
NOT MATCH, when the remainder (between the beginning and end strings in item 1) is rwsarbor
MATCH, when the remainder is anything BUT mythemename
For the dynamic part in the middle, it should match any number of characters, and any character type (not just a-z, 0-9, etc)
For example, it should not match:
/wp-content/themes/mythemename/style.css
It should match
/wp-content/themes/jfdskjh-ekhb234_sf/style.css
/wp-content/themes/another_theme/style.css
/wp-content/themes/any_other-theme/style.css
/wp-content/themes/!##$%^&*()_+{}|:"?</style.css
This one is a little out of my league in terms of complexity, so I am looking to the community for assistance.
Try this:
^/wp-content/themes/(?!mythemename).*/style\.css$
Demo: http://regexr.com?30ote
Hint: this uses a negative look-ahead assertion.
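Here is the negative look-ahead approach checked against the sample paths from the question, sketched in Python. The dot in style.css is escaped in this sketch so that it only matches a literal dot; that is my assumption about the intent.

```python
import re

pattern = re.compile(r"^/wp-content/themes/(?!mythemename).*/style\.css$")

paths = [
    "/wp-content/themes/mythemename/style.css",
    "/wp-content/themes/jfdskjh-ekhb234_sf/style.css",
    "/wp-content/themes/another_theme/style.css",
    "/wp-content/themes/any_other-theme/style.css",
    '/wp-content/themes/!##$%^&*()_+{}|:"?</style.css',
]

# Map each path to whether the pattern accepts it.
results = {path: pattern.search(path) is not None for path in paths}
for path, matched in results.items():
    print(matched, path)
```

Note that (?!mythemename) also rejects any theme whose name merely starts with mythemename; writing (?!mythemename/) would be one way to narrow that, if it matters.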
Just make two regexes out of it, one to match and one to exclude (here done with grep):
echo /wp-content/themes/sametheme/style.css | egrep "^/wp-content/themes/.*/style.css$" | egrep -v "(simetheme|sametheme)"
Instead of rwsarbor and mythemename, I chose names that are easier to test.
A shorter demo would have been fine, btw: /start/middle/end
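The same two-step idea (one regex to include, one to exclude) can be sketched outside the shell as well. A minimal Python version, where the function name keep is hypothetical and the theme names are the ones from the grep example:

```python
import re

include = re.compile(r"^/wp-content/themes/.*/style\.css$")
exclude = re.compile(r"simetheme|sametheme")

def keep(path):
    """Accept a path only if it has the right shape and no excluded theme name."""
    return include.search(path) is not None and exclude.search(path) is None

print(keep("/wp-content/themes/sametheme/style.css"))
print(keep("/wp-content/themes/another_theme/style.css"))
```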
Vim regex:
^\/wp-content\/themes\/\(rwsarbor\|mythemename\)\@!.\{-}\/style\.css$
Important bits:
\(__\|__\) - match one or the other pattern
\@! - match (with zero width) if the preceding atom does not match at this position
.\{-} - like .* but non-greedy, otherwise style.css would get sucked up here
Syntax and modifiers are dependent on the specific regex engine you're going to use.