How to regex the class name out of this?

Imagine I have a big long string, and inside it I have this piece of text:
(BlahUtils.loggerName(MyClass.class.getName())
I want to extract out "MyClass".
If I do:
def matcher1 = test =~ /MyClass/
matcher1[0]
I get it. But then MyClass can be anything and that is what I want to extract out. How do I do that?

You may use
/(?<=loggerName\()\w+(?=\.class\b)/
See the regex demo
Details
(?<=loggerName\() - right before, there must be loggerName( substring
\w+ - 1+ word chars
(?=\.class\b) - right after, there must be a .class as whole word.
See the Groovy demo:
String test = "(BlahUtils.loggerName(MyClass.class.getName())"
def m = (test =~ /(?<=loggerName\()\w+(?=\.class\b)/)
if (m) {
println m.group();
}
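For comparison, the same lookaround pattern can be sketched in Python. This is only an illustration, assuming Python's re module behaves like the Java/Groovy engine for this pattern; the helper name extract_class_name is hypothetical and not part of the original answer.

```python
import re

# Same pattern as the Groovy answer: a run of word characters that is
# preceded by "loggerName(" and followed by ".class" as a whole word.
CLASS_NAME = re.compile(r'(?<=loggerName\()\w+(?=\.class\b)')

def extract_class_name(text):
    """Return the first class name found, or None (hypothetical helper)."""
    m = CLASS_NAME.search(text)
    return m.group() if m else None

test = "(BlahUtils.loggerName(MyClass.class.getName())"
print(extract_class_name(test))
```

Because both lookarounds are zero-width, the match itself is just the class name, so no capturing group is needed.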

Simple no-brainer:
'(BlahUtils.loggerName(MyClass.class.getName())'.eachMatch( /loggerName\(([^\(\)\.]+)/ ){ println it[ 1 ] }
gives MyClass

Related

Do not show match that contains a string

Guys I have the following case:
def AllEnviroments = ['company-test', 'MYTEST-1234', 'company-somethingelse-something']
def EnviromentChoices = (AllEnviroments =~ /company-.*|MYTEST/).findAll().collect().sort()
How can I exclude from this regex any string that contains "something" (or whatever appears in its place), so that only company-test and MYTEST-1234 are printed?
Expected result:
Print company-test and MYTEST-1234
and NOT "company-something"
You can use
def AllEnviroments = ['company-test', 'MYTEST-1234', 'company-somethingelse-something']
def EnviromentChoices = AllEnviroments.findAll { it =~ /company-(?!something).*|MYTEST/ }.sort()
print(EnviromentChoices)
// => [MYTEST-1234, company-test]
Note that .findAll is run directly on the list of strings, and that the regex now contains a negative lookahead, (?!something), so that no string in which something comes directly after company- is matched.
See the Groovy demo.
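The same filtering idea can be sketched in Python. This is a hedged illustration rather than part of the answer; the variable names all_environments and environment_choices are just Python-style renamings of the ones in the question.

```python
import re

all_environments = ['company-test', 'MYTEST-1234', 'company-somethingelse-something']

# Keep an entry if it contains MYTEST, or "company-" that is not
# immediately followed by "something" (negative lookahead).
pattern = re.compile(r'company-(?!something).*|MYTEST')

environment_choices = sorted(e for e in all_environments if pattern.search(e))
print(environment_choices)
```

As in the Groovy version, the sort is case-sensitive, so the uppercase entry comes first.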

Perl regex: look for keyword which are not starting with

Example 1: "hello this is me. KEYWORD: blah"
Example 2: "KEYWORD: apple"
I just want to be able to catch KEYWORD in example 1, not 2 since in 2, it starts with KEYWORD
if ($line =~/KEYWORD:/x) {
# do something
}
The above code catches both examples. How can I change the regex so that it only catches KEYWORD in example 1?
PS Eventually I want example 1 to be KEYWORD: blah
If you are just looking for a keyword, you should be using index and not a regex :
if (index($line, 'KEYWORD') > 0) {
# do something
}
See the documentation: index STR, SUBSTR returns -1 if SUBSTR isn't found in STR; otherwise it returns the index of SUBSTR in STR (starting at 0).
If you want to look for a more complex pattern than a simple keyword, then you should do as @Perl Dog said in his answer.
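The "index greater than zero" test has a close analogue in Python's str.find, which also returns -1 when the substring is absent. A minimal sketch, with keyword_not_at_start as a hypothetical helper name:

```python
def keyword_not_at_start(line, keyword="KEYWORD"):
    # str.find returns -1 when the keyword is absent and 0 when the line
    # starts with it, so "> 0" means "present, but not at the beginning".
    return line.find(keyword) > 0

print(keyword_not_at_start("hello this is me. KEYWORD: blah"))
print(keyword_not_at_start("KEYWORD: apple"))
```

Like the Perl version, this only looks at the first occurrence, so a line that starts with the keyword and repeats it later is still rejected.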
You are looking for a negative lookbehind assertion, i.e. for 'KEYWORD' that is not preceded by a certain string (in your case the start-of-line marker ^):
if ($line =~/(?<!^)KEYWORD:/x) {
# found KEYWORD in '$line', but not at the beginning
print $line, "\n";
}
Output:
hello this is me. KEYWORD: blah
Update: As stated in the comments, the /x modifier isn't necessary in my first regex but can be used to make the pattern more readable. It allows for whitespace (including newlines) and/or comments in the pattern to improve readability. The downside is that every blank/space character in the actual pattern has to be escaped (to distinguish it from the comments) but we don't have these here. The pattern can thus be re-written as follows (the result is the same):
if ($line =~ / (?<!      # huh? (?) ahh, look left (<) for something
                         # NOT (!) appearing on the left.
               ^)        # oh, ok, I got it, there must be no '^' on the left
               KEYWORD:  # but the string 'KEYWORD:' should come then
             /x ) {
    # found KEYWORD in '$line', but not at the beginning
    print $line, "\n";
}
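For readers more familiar with Python, the commented pattern can be approximated with re.VERBOSE, which plays the role of Perl's /x. This is a sketch, not a translation the answer's author provided; the name NOT_AT_START is an assumption for illustration.

```python
import re

NOT_AT_START = re.compile(r"""
    (?<!^)       # negative lookbehind: the match must not begin at the start
    KEYWORD:     # the literal text 'KEYWORD:'
""", re.VERBOSE)

for line in ["hello this is me. KEYWORD: blah", "KEYWORD: apple"]:
    if NOT_AT_START.search(line):
        # found KEYWORD in the line, but not at the beginning
        print(line)
```

Without re.MULTILINE, ^ only refers to the start of the whole string, which matches the single-line inputs used here.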
The answer is actually quite simple!
/.KEYWORD/ # Not at the start of a line
/.KEYWORD/s # Not at the start of the string
By the way, you might want to add \b before KEYWORD to avoid matching NOTTHEKEYWORD.
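The difference between these variants can be checked with a short Python sketch (re.DOTALL stands in for Perl's /s here; the names loose, loose_dotall and strict are illustrative only):

```python
import re

loose = re.compile(r'.KEYWORD')                   # any character except newline, then KEYWORD
loose_dotall = re.compile(r'.KEYWORD', re.DOTALL) # "." may also be a newline
strict = re.compile(r'.\bKEYWORD')                # KEYWORD must also start a word

print(bool(loose.search("KEYWORD: apple")))           # keyword at the very start
print(bool(loose.search("NOTTHEKEYWORD")))            # keyword inside a longer word
print(bool(strict.search("NOTTHEKEYWORD")))           # rejected by the word boundary
print(bool(loose.search("first line\nKEYWORD")))      # start of a later line
print(bool(loose_dotall.search("first line\nKEYWORD")))
```

The last two lines show why the answer distinguishes "start of a line" from "start of the string".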
I think you need to give better, real examples
On the face of it, all you need is
if ( /KEYWORD/ and not /^KEYWORD/ ) {
...
}
Another simple regex
print if /^.+KEYWORD/;
match
hello this is me. KEYWORD: blah

Groovy regex PatternSyntaxException when parsing GString-style variables

Groovy here. I'm being given a String with GString-style variables in it like:
String target = 'How now brown ${animal}. The ${role} has oddly-shaped ${bodyPart}.'
Keep in mind, this is not intended to be used as an actual GString!!! That is, I'm not going to have 3 string variables (animal, role and bodyPart, respectively) that Groovy will be resolving at runtime. Instead, I'm looking to do 2 distinct things to these "target" strings:
1. I want to be able to find all instances of these variable refs ("${*}") in the target string and replace each of them with a ?; and
2. I also need to find all instances of these variable refs and obtain a list (allowing dupes) of their names (which, in the above example, would be [animal,role,bodyPart])
My best attempt thus far:
class TargetStringUtils {
    private static final String VARIABLE_PATTERN = "\${*}"
    // Example input: 'How now brown ${animal}. The ${role} has oddly-shaped ${bodyPart}.'
    // Example desired output: 'How now brown ?. The ? has oddly-shaped ?.'
    static String replaceVarsWithQuestionMarks(String target) {
        target.replaceAll(VARIABLE_PATTERN, '?')
    }
    // Example input: 'How now brown ${animal}. The ${role} has oddly-shaped ${bodyPart}.'
    // Example desired output: [animal,role,bodyPart] } list of strings
    static List<String> collectVariableRefs(String target) {
        target.findAll(VARIABLE_PATTERN)
    }
}
...produces PatternSyntaxException anytime I go to run either method:
Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal repetition near index 0
${*}
^
Any ideas where I'm going awry?
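The issue is that you have not escaped the pattern properly: $, { and } are regex metacharacters, so ${*} is parsed as an anchor followed by an invalid {...} quantifier. In addition, findAll will only collect whole matches, while you need to capture a subpattern inside the {}.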
Use
def target = 'How now brown ${animal}. The ${role} has oddly-shaped ${bodyPart}.'
println target.replaceAll(/\$\{([^{}]*)\}/, '?') // => How now brown ?. The ? has oddly-shaped ?.
def lst = new ArrayList<>();
def m = target =~ /\$\{([^{}]*)\}/
(0..<m.count).each { lst.add(m[it][1]) }
println lst // => [animal, role, bodyPart]
See this Groovy demo
Inside a /\$\{([^{}]*)\}/ slashy string, you can use single backslashes to escape the special regex metacharacters, and the whole regex pattern looks cleaner.
\$ - will match a literal $
\{ - will match a literal {
([^{}]*) - Group 1 capturing any characters other than { and }, 0 or more times
\} - a literal }.
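The same escaped pattern works unchanged in Python's re module, which may help when checking it outside Groovy. The sketch below is an illustration under that assumption; the function names mirror the question's method names but are otherwise hypothetical.

```python
import re

# \$ and \{ \} are escaped literals; group 1 captures the variable name.
VARIABLE_REF = re.compile(r'\$\{([^{}]*)\}')

def replace_vars_with_question_marks(target):
    return VARIABLE_REF.sub('?', target)

def collect_variable_refs(target):
    # With exactly one capturing group, findall returns the group contents.
    return VARIABLE_REF.findall(target)

target = 'How now brown ${animal}. The ${role} has oddly-shaped ${bodyPart}.'
print(replace_vars_with_question_marks(target))
print(collect_variable_refs(target))
```

Duplicates are kept in the returned list, which is what the question asks for.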

Using Regex is there a way to match outside characters in a string and exclude the inside characters?

I know I can exclude outside characters in a string using look-ahead and look-behind, but I'm not sure about characters in the center.
What I want is to get a match of ABCDEF from the string ABC 123 DEF.
Is this possible with a Regex string? If not, can it be accomplished another way?
EDIT
For more clarification, in the example above I can use the regex string /ABC.*?DEF/ to sort of get what I want, but this includes everything matched by .*?. What I want is to match with something like ABC(match whatever, but then throw it out)DEF resulting in one single match of ABCDEF.
As another example, I can do the following (in pseudo-code and regex):
string myStr = "ABC 123 DEF";
string tempMatch = RegexMatch(myStr, "(?<=ABC).*?(?=DEF)"); //Returns " 123 "
string FinalString = myStr.Replace(tempMatch, ""); //Returns "ABCDEF". This is what I want
Again, is there a way to do this with a single regex string?
Since the regex replace feature in most languages does not change the string it operates on (but produces a new one), you can do it as a one-liner in most languages. Firstly, you match everything, capturing the desired parts:
^.*(ABC).*(DEF).*$
(Make sure to use the single-line/"dotall" option if your input contains line breaks!)
And then you replace this with:
$1$2
That will give you ABCDEF in one assignment.
Still, as outlined in the comments and in Mark's answer, the engine does match the stuff in between ABC and DEF. It's only the replacement convenience function that throws it out. But that is supported in pretty much every language, I would say.
Important: this approach will of course only work if your input string contains the desired pattern only once (assuming ABC and DEF are actually variable).
Example implementation in PHP:
$output = preg_replace('/^.*(ABC).*(DEF).*$/s', '$1$2', $input);
Or JavaScript (which had no single-line mode before ES2018 added the s flag, hence [\s\S]):
var output = input.replace(/^[\s\S]*(ABC)[\s\S]*(DEF)[\s\S]*$/, '$1$2');
Or C#:
string output = Regex.Replace(input, @"^.*(ABC).*(DEF).*$", "$1$2", RegexOptions.Singleline);
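A Python version of the same one-liner might look like the sketch below. It assumes, as the answer does, that the pattern occurs only once in the input; keep_outer is a hypothetical helper name.

```python
import re

def keep_outer(text):
    # re.DOTALL lets "." cross line breaks; \1\2 re-emits only the two groups,
    # so everything matched outside and between them is dropped.
    return re.sub(r'^.*(ABC).*(DEF).*$', r'\1\2', text, flags=re.DOTALL)

print(keep_outer("ABC 123 DEF"))
print(keep_outer("xx ABC\n123\nDEF yy"))
```

If the pattern does not match at all, re.sub returns the input unchanged.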
A regular expression can contain multiple capturing groups. Each group must consist of consecutive characters so it's not possible to have a single group that captures what you want, but the groups themselves do not have to be contiguous so you can combine multiple groups to get your desired result.
Regular expression
(ABC).*(DEF)
Captures
ABC
DEF
See it online: rubular
Example C# code
string myStr = "ABC 123 DEF";
Match m = Regex.Match(myStr, "(ABC).*(DEF)");
if (m.Success)
{
string result = m.Groups[1].Value + m.Groups[2].Value; // Gives "ABCDEF"
// ...
}
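The group-concatenation approach reads much the same in Python. A minimal sketch, assuming the same input string as the C# example:

```python
import re

my_str = "ABC 123 DEF"
m = re.search(r'(ABC).*(DEF)', my_str)

# m.groups() holds the two captured pieces; the text matched by ".*"
# between them is part of the overall match but of neither group.
result = ''.join(m.groups()) if m else None
print(result)
```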

Regular Expression to match two characters unless they're within two positions of another character

I'm trying to create a regular expression to match certain characters, unless they appear within two positions of another character.
For example, I would want to match abc or xxabcxx but not tabct or txxabcxt.
Although with something like tabctxxabcxxtabcxt I'd want to match the middle abc and not the other two.
Currently I'm trying this in Java if that changes anything.
Try this:
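// requires java.util.regex.Pattern and java.util.regex.Matcher
String s = "tabctxxabcxxtabcxt";
Pattern p = Pattern.compile("t[^t]*t|(abc)");
Matcher m = p.matcher(s);
while (m.find())
{
    // group 1 is only set when the (abc) alternative matched
    String group1 = m.group(1);
    if (group1 != null)
    {
        System.out.printf("Found '%s' at index %d%n", group1, m.start(1));
    }
}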
output:
Found 'abc' at index 7
t[^t]*t consumes anything that's enclosed in ts, so if the (abc) in the second alternative matches, you know it's the one you want.
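The same "consume what you don't want, capture what you do" trick can be sketched in Python. The function name find_unenclosed is hypothetical; the pattern is the one from the Java answer.

```python
import re

def find_unenclosed(text):
    """Return (index, match) pairs for 'abc' that is not enclosed in t...t."""
    hits = []
    for m in re.finditer(r't[^t]*t|(abc)', text):
        # Group 1 is None whenever the first alternative (t...t) matched.
        if m.group(1) is not None:
            hits.append((m.start(1), m.group(1)))
    return hits

print(find_unenclosed("tabctxxabcxxtabcxt"))
```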
EDITED! It was way wrong before.
Oooh, this one's tougher than I thought. Awesome. Using fairly standard syntax:
[^t]{2,}abc[^t]{2,}
That will catch xxabcxx but not abc, xabc, abcx, xabcx, xxabc, xxabcx, abcxx, or xabcxx. Maybe the best thing to do would be:
if 'abc' in string:
    if 't' in string:
        return regex match [^t]{2,}abc[^t]{2,}
    else:
        return false
else:
    return false
Is that sufficient for your intention?
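The claim about [^t]{2,}abc[^t]{2,} can be checked with a short Python sketch. The second pattern is not from the thread: it is a hypothetical lookaround-based alternative, added here as an assumption about what the question means by "within two positions", namely that no t may appear in the two characters on either side of abc.

```python
import re

# Pattern from the answer: at least two non-"t" characters on each side.
answer_pattern = re.compile(r'[^t]{2,}abc[^t]{2,}')

# Hypothetical alternative (not from the thread): reject "abc" when a "t"
# sits one or two positions before it, or one or two positions after it.
lookaround_pattern = re.compile(r'(?<!t)(?<!t.)abc(?!.?t)')

rejected = ['abc', 'xabc', 'abcx', 'xabcx', 'xxabc', 'xxabcx', 'abcxx', 'xabcxx']

print(bool(answer_pattern.search('xxabcxx')))
print([s for s in rejected if answer_pattern.search(s)])
print([m.start() for m in lookaround_pattern.finditer('tabctxxabcxxtabcxt')])
```

Unlike the answer's pattern, the lookaround variant accepts a bare abc, because lookarounds do not require surrounding characters to exist.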