Groovy Regex matcher OR - either unique groups or duplicate group ids - regex

I want to regex search a string representing a log file, locate multiple potential error messages, and print the results.
Example Input (single string)
...
Unable to parse YAML file: [mapping values are not allowed in this context] at line 1234\r\n
worthless line\r\n
KitchenInventory.cs(285,118): error CS2345: 'ButteryBread' does not contain a definition for 'DropOnFloor'\r\n
another worthless line\r\n
...
Each error type has certain grouping categories that I'm attempting to capture, which may or may not be present in other error types.
My goal is to output error details after doing the regex search, filling in as many values as are present in each error regex:
Error: Unable to parse YAML file
Details: mapping values are not allowed in this context
Line: 1234
Error: CS2345
Details: 'ButteryBread' does not contain a definition for 'DropOnFloor'
Filename: KitchenInventory.cs
Line: 285,118
(Notice that the filename is omitted for the YAML error, because YAML errors do not contain that piece of information.)
I've found that I cannot do a logical OR to create a massive Regex string, because there are duplicate grouping names. I attempted to do each regex search individually, but I receive an exception when checking for the presence of a field:
def pattern = ~patternString
def matcher = pattern.matcher(lines)
while (matcher.find()) {
    if (matcher.group('filename')) {
        println "Filename: " + matcher.group('filename')
    }
    if (matcher.group('error')) {
        println "Error: " + matcher.group('error')
    }
    if (matcher.group('message')) {
        println "Message: " + matcher.group('message')
    }
}
Caught: java.lang.IllegalArgumentException: No group with name <errorCode>
java.lang.IllegalArgumentException: No group with name <errorCode>
Is there a way for me to iterate through Groovy regex group names that COULD be present, without throwing exceptions? Alternatively, am I going about this the wrong way, and is there an easier way for me to achieve my goal?

I was looking for the same thing and was not able to find a "groovier" way to do it.
The only solution I found was to check whether the pattern string contains the group name before asking the matcher for it:
if (stringPattern.contains('message')) {
println "Message: "+matcher.group('message')
}
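The overall approach (one pattern per error type, each declaring only the groups it can capture, and printing only the groups that are present) can be sketched as follows. This is a JavaScript illustration, not Groovy, and the two patterns plus the names errorPatterns, extractErrors and sampleLog are assumptions made up from the two sample log lines in the question.

```javascript
// One pattern per error type. Each pattern declares only the named groups
// that this kind of error can provide (hypothetical patterns for the sample lines).
const errorPatterns = [
  /(?<error>Unable to parse YAML file): \[(?<message>[^\]]*)\] at line (?<line>\d+)/g,
  /(?<filename>[\w.]+)\((?<line>\d+,\d+)\): error (?<error>CS\d+): (?<message>[^\r\n]*)/g,
];

// Returns one object per match, containing only the groups that captured something.
function extractErrors(log) {
  const results = [];
  for (const pattern of errorPatterns) {
    for (const match of log.matchAll(pattern)) {
      const fields = {};
      for (const [name, value] of Object.entries(match.groups)) {
        if (value !== undefined) fields[name] = value;
      }
      results.push(fields);
    }
  }
  return results;
}

// Sample input rebuilt from the question's example lines.
const sampleLog = [
  "Unable to parse YAML file: [mapping values are not allowed in this context] at line 1234",
  "worthless line",
  "KitchenInventory.cs(285,118): error CS2345: 'ButteryBread' does not contain a definition for 'DropOnFloor'",
  "another worthless line",
].join("\r\n");

for (const fields of extractErrors(sampleLog)) {
  for (const [name, value] of Object.entries(fields)) {
    console.log(`${name}: ${value}`);
  }
}
```

Because each pattern only knows its own group names, there is never a lookup for a group that the pattern does not define, which is the situation that raises the exception in the question.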

Related

Regex for finding the name of a method containing a string

I've got a Node module file containing about 100 exported methods, which looks something like this:
exports.methodOne = async user_id => {
// other method contents
};
exports.methodTwo = async user_id => {
// other method contents
fooMethod();
};
exports.methodThree = async user_id => {
// other method contents
fooMethod();
};
Goal: What I'd like to do is figure out how to grab the name of any method which contains a call to fooMethod, and return the correct method names: methodTwo and methodThree. I wrote a regex which gets kinda close:
exports\.(\w+).*(\n.*?){1,}fooMethod
Problem: using my example code from above, though, it would effectively match methodOne and methodThree because it finds the first instance of export and then the first instance of fooMethod and goes on from there. Here's a regex101 example.
I suspect I could make use of lookaheads or lookbehinds, but I have little experience with those parts of regex, so any guidance would be much appreciated!
Edit: It turns out regex is poorly suited for this type of task. @ctcherry advised using a parser, and using that as a springboard, I was able to learn about Abstract Syntax Trees (ASTs) and the recast tool, which lets you traverse the tree after using various tools (acorn and others) to parse your code into tree form.
With these tools in hand, I successfully built a script to parse and traverse my node app's files, and was able to find all methods containing fooMethod as intended.
Regex isn't the best tool to tackle all the parts of this problem. Ideally we could rely on something higher level: a parser.
One way to do this is to let JavaScript parse itself during load and execution. If your node module doesn't include anything that would execute on its own (or at least anything that would conflict with the below), you can put this at the bottom of your module, and then run the module with node mod.js.
console.log(Object.keys(exports).filter(fn => exports[fn].toString().includes("fooMethod(")));
(In the comments below it is revealed that the above isn't possible.)
Another option would be to use a library like https://github.com/acornjs/acorn (there are other options) to write some other JavaScript that parses your original target JavaScript. You would then have a tree structure you could use to perform your matching and eventually return the function names you are after. I'm not an expert in that library, so unfortunately I don't have sample code for you.
This regex matches (only) the method names that contain a call to fooMethod();
(?<=exports\.)\w+(?=[^{]+\{[^}]+fooMethod\(\)[^}]+};)
See live demo.
Assuming that all methods have their body enclosed within { and }, I would build up the final regex like this:
First, find a regex to get the individual methods. This can be done using this regex:
exports\.(\w+)(\s|.)*?\{(\s|.)*?\}
Next, we are interested in those methods that have fooMethod in them before they close. So, look for } or fooMethod.*}, in that order. So, let us name the group searching for fooMethod as FOO and the name of the method calling it as METH. When we iterate the matches, if group FOO is present in a match, we will use the corresponding METH group, else we will reject it.
exports\.(?<METH>\w+)(\s|.)*?\{(\s|.)*?(\}|(?<FOO>fooMethod)(\s|.)*?\})
Explanation:
exports\.(?<METH>\w+): Till the method name (you have already covered this)
(\s|.)*?\{(\s|.)*?: Some code before { and after, non-greedy so that the subsequent group is given preference
(\}|(?<FOO>fooMethod)(\s|.)*?\}): This has 2 parts:
\}: Match the method close delimiter, OR
(?<FOO>fooMethod)(\s|.)*?\}: The call to fooMethod followed by optional code and the method close delimiter.
Here's JavaScript code that demonstrates this:
let p = /exports\.(?<METH>\w+)(\s|.)*?\{(\s|.)*?(\}|(?<FOO>fooMethod)(\s|.)*?\})/g;
let input = `exports.methodOne = async user_id => {
// other method contents
};
exports.methodTwo = async user_id => {
// other method contents
fooMethod();
};
exports.methodThree = async user_id => {
// other method contents
fooMethod();
};`;
let match = p.exec(input);
while (match !== null) {
    // Report the method only when the FOO group took part in the match.
    if (match.groups.FOO !== undefined) console.log(match.groups.METH);
    match = p.exec(input);
}

RegEx not containing specific numbers group GoLang

I am using GoLang RegEx to find a specific number in a message
Invite code for the server ABC
your code is: 4361858022791184384
I am using this RegEx
([0-9]){19}
I want to delete any message which does not contain any invite code.
So that people can only send invite code to that specific place and specific action can be performed. And useless messages get deleted automatically.
I tried to negate it, but it also ignores other numbers.
I want a regex that captures every message which does not contain a number of exactly 19 digits.
FindString returns an empty string on failure, and Find returns nil. So you could test against that:
package main
import "regexp"
const s = `Invite code for the server ABC
your code is: 4361858022791184384`
func main() {
	re := regexp.MustCompile(`\d{19}`)
	// FindString returns "" when nothing matches.
	find := re.FindString(s)
	if find == "" {
		panic(re)
	}
	println(find)
}
https://golang.org/pkg/regexp#Regexp.Find
https://golang.org/pkg/regexp#Regexp.FindString
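One caveat with \d{19} is that it also matches inside a longer run of digits, so a 20-digit number would count as an invite code. The sketch below shows a stricter test written in JavaScript rather than Go. The names inviteCode and shouldDelete are hypothetical. The pattern deliberately avoids lookarounds and uses only ^, $, \D and non-capturing groups, so the same expression should also be usable with Go's regexp package.

```javascript
// A run of exactly 19 digits: preceded by start-of-text or a non-digit,
// and followed by a non-digit or end-of-text.
const inviteCode = /(?:^|\D)\d{19}(?:\D|$)/;

// A message is deleted when it contains no exact 19-digit code.
function shouldDelete(message) {
  return !inviteCode.test(message);
}

console.log(shouldDelete("Invite code for the server ABC\nyour code is: 4361858022791184384")); // false
console.log(shouldDelete("hello everyone 12345")); // true
console.log(shouldDelete("43618580227911843841")); // true (20 digits)
```

Testing for "no match" in code, as the answer suggests, is usually simpler than trying to write a single regex that matches every message lacking a code.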

Extract username from forward slash separated text

I need to extract a username from the log below via regex for a log collector.
Due to the nature of the logs we're getting, it's not possible to define exactly how many forward slashes there are going to be, and I need to select a specific piece of data, as there are multiple occurrences of similarly formatted data.
Required data:
name="performedby" label="Performed By" value="blah.com/blah/blah blah/blah/**USERNAME**"|
<46>Jun 23 10:38:49 10.51.200.76 25113 LOGbinder EX|3.1|success|2016-06-23T10:38:49.0000000-05:00|Add-MailboxPermission Exchange cmdlet issued|name="occurred" label="Occurred" value="6/23/2016 10:38:49 AM"|name="cmdlet" label="Cmdlet" value="Add-MailboxPermission"|name="performedby" label="Performed By" value="blah.com/blah/blah blah/blah/USERNAME"|name="succeeded" label="Succeeded" value="Yes"|name="error" label="Error" value="None"|name="originatingserver label="Originating Server" value="black"|name="objectmodified" label="Object Modified" value="blah/blah/USERNAME"|name="parameters" label="Parameters" value="Name: Identity, Value: [blah]Name: User, Value: [blah/blah]Name AccessRights, Value: [FullAccess]Name: InheritanceType, Value: [All]"|name="properties" label="Modified Properties" value="n/a"|name="additionalinfo" label="Additional Information"
I've tried a few different regex commands but I'm not able to extract the necessary information without exactly stating how many / there will be.
blah\.com[.*\/](.*?)"\|name
Try this :
blah\.com.*\/(.*?)"\|
Check here
If your username format is this :
value="abc.xyz/something/something/..../USERNAME"
then use this :
\..*\/(.*?)"
check here
Possible solution:
value="[a-z\.\/]*\/(.*)"
(The first capture group is the username)
Working example:
https://regex101.com/r/qZ0zC8/2
Maybe like this?
blah.(\w+\/)+\K([\w]+)
It catches the username, but since the username is between ** in your example, I also match those.
Tested in Notepad++.
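Since the log line contains more than one value ending in a username (performedby and objectmodified), one option is to anchor on the field name and then take the last slash-separated segment of its value. The sketch below is in JavaScript and makes assumptions: the regex syntax of the actual log collector is unknown, the names performedBy and extractUsername are made up, and logLine is a shortened excerpt of the sample log.

```javascript
// Anchor on name="performedby", stay inside that field ([^|]*),
// skip any number of "segment/" prefixes, then capture the final segment.
const performedBy = /name="performedby"[^|]*value="(?:[^"\/]*\/)*([^"\/]+)"/;

function extractUsername(line) {
  const match = performedBy.exec(line);
  return match ? match[1] : null;
}

// Shortened excerpt of the sample log line from the question.
const logLine =
  'name="cmdlet" label="Cmdlet" value="Add-MailboxPermission"|' +
  'name="performedby" label="Performed By" value="blah.com/blah/blah blah/blah/USERNAME"|' +
  'name="succeeded" label="Succeeded" value="Yes"|' +
  'name="objectmodified" label="Object Modified" value="blah/blah/USERNAME2"';

console.log(extractUsername(logLine)); // USERNAME
```

The (?:[^"\/]*\/)* part is what removes the need to know how many slashes there are: it consumes every complete segment, however many exist, and leaves only the last one for the capture group.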

Scala REGEX match for MAC address

Good evening Stackoverflow,
I am stuck in a spot where I can't get Scala regex matches to play nice. Here is my code:
private def handle_read(packet: TFTPReadRequestPacket, tftp_io: TFTP): Unit = {
  val MAC_REGEX = "([0-9A-F]{2}[:-]){5}([0-9A-F]{2})".r
  packet.getFilename match {
    case MAC_REGEX(a) => println(s"Client is coming from $a")
  }
}
When the regex is ([0-9A-F]{2}[:-]) and I request the file 70- it is fine and spits out that the client is "coming from 70", but when it is the full regex and I request 70-CD-60-74-24-9C, it throws an exception like this:
[ERROR] [04/28/2015 21:25:27.818] [polydeploy-baremetal-akka.actor.default-dispatcher-4] [akka://polydeploy-baremetal/user/TFTP_Queue] 70-CD-60-74-24-9C (of class java.lang.String)
scala.MatchError: 70-CD-60-74-24-9C (of class java.lang.String)
at com.polydeploy.baremetal.TFTPQueue$.handle_read(TFTPQueue.scala:40)
at com.polydeploy.baremetal.TFTPQueue$.com$polydeploy$baremetal$TFTPQueue$$handle_request(TFTPQueue.scala:33)
at com.polydeploy.baremetal.TFTPQueue$$anonfun$receive$1.applyOrElse(TFTPQueue.scala:14)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at com.polydeploy.baremetal.TFTPQueue$.aroundReceive(TFTPQueue.scala:10)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
at akka.dispatch.Mailbox.run(Mailbox.scala:221)
at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
What I want to accomplish is to have a TFTP request come in for pxelinux.cfg/01-70-CD-60-74-24-9C and pull out the MAC address.
Any and all help is greatly appreciated!
Thanks, Liam.
When the regex is ([0-9A-F]{2}[:-]) and I request for the file 70- it is fine
This is because, in this case, your regex contains a single group.
This worked for me:
val MAC_REGEX = "(([0-9A-F]{2}[:-]){5}([0-9A-F]{2}))".r
"70-CD-60-74-24-9C" match {
  case MAC_REGEX(a, _*) => println(s"Client is coming from $a")
}
// prints "Client is coming from 70-CD-60-74-24-9C"
It works because I wrapped the entire regex with a group. a captures that outer group and _* is a sequence of ignored matches for all the other groups. Apparently Regex's extractor returns a list with an element for each capture group.
I have a feeling there is a better way to do this though...
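The group counting behind this can be seen in any regex engine. The sketch below uses JavaScript rather than Scala, so it only illustrates the capture groups, not Scala's extractor itself. The names twoGroups, macAtEnd and extractMac are made up. The original pattern defines two groups, so a pattern match with a single binder cannot succeed, and the repeated group only keeps its last repetition. Using a non-capturing group for the repetition leaves one group that holds the whole address.

```javascript
// The original pattern: two capture groups, the first one repeated five times.
const twoGroups = /^([0-9A-F]{2}[:-]){5}([0-9A-F]{2})$/;
const groups = "70-CD-60-74-24-9C".match(twoGroups).slice(1);
console.log(groups); // [ '24-', '9C' ]  (a repeated group keeps only its last repetition)

// One capture group around the whole address; the repetition is non-capturing.
// The trailing $ matters for the pxelinux filename: without it, the leftmost
// match would start at the "01-" prefix instead of the MAC address.
const macAtEnd = /((?:[0-9A-F]{2}[:-]){5}[0-9A-F]{2})$/;

function extractMac(filename) {
  const hit = macAtEnd.exec(filename);
  return hit ? hit[1] : null;
}

console.log(extractMac("pxelinux.cfg/01-70-CD-60-74-24-9C")); // 70-CD-60-74-24-9C
```

In Scala, a regex used as a pattern must match the whole input unless it is made unanchored, so the pxelinux.cfg/01- prefix would also need to be accounted for there.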

awk/regex: parsing error logs not always returned error description

I recently asked for help to parse out Java error stacks from a group of log files and got a very nice solution at the link below (using awk).
Pull out Java error stacks from log files
I marked the question answered and after some debugging and studying I found a few potential issues and since they are unrelated to my initial question but rather due to my limited understanding of awk and regular expressions, I thought it might be better to ask a new question.
Here is the solution:
BEGIN { OFS="," }
/[[:space:]]+*<Error / {
    split("",n2v)
    while ( match($0,/[^[:space:]]+="[^"]+/) ) {
        name = value = substr($0,RSTART,RLENGTH)
        sub(/=.*/,"",name)
        sub(/^[^=]+="/,"",value)
        $0 = substr($0,RSTART+RLENGTH)
        n2v[name] = value
        print name value
    }
    code = n2v["ErrorCode"]
    desc[code] = n2v["ErrorDescription"]
    count[code]++
    if (!seen[code,FILENAME]++) {
        fnames[code] = (code in fnames ? fnames[code] ", " : "") FILENAME
    }
}
END {
    print "Count", "ErrorCode", "ErrorDescription", "Files"
    for (code in desc) {
        print count[code], code, desc[code], fnames[code]
    }
}
One issue I am having with it is that not all ErrorDescriptions are being captured. For example, this error description appears in the output of this script:
ErrorDescription="Database Error."
But this error description does not appear in the results (description copied from actual log file):
ErrorDescription="Operation not allowed for reason code "7" on table "SCHEMA.TABLE".. SQLCODE=-668, SQLSTATE=57016, DRIVER=4.13.127"
Nor does this one:
ErrorDescription="Cannot Find Person For Given Order."
It seems that most error descriptions are not being returned by this script but do exist in the log file. I don't see why some error descriptions would appear and some not. Does anyone have any ideas?
EDIT 1:
Here is a sample of the XML I am parsing:
<Errors>
<Error ErrorCode="ERR_0139"
ErrorDescription="Cannot Find Person For Given Order." ErrorMoreInfo="">
...
...
</Error>
</Errors>
The pattern in the script will not match your data:
/[[:space:]]+*<Error / {
Details:
The "+" tells it to match at least one space.
The space after "Error" tells it to match another space - but your data has no space before the "=".
The "<" is unnecessary (but not part of the problem).
This would be a better pattern:
/^[[:space:]]*ErrorDescription[[:space:]]*=[[:space:]]*".*"/
This regex would only match the error description.
ErrorDescription="(.+?)"
It uses a capturing group to remember your error description.
Demo here. (Tested against a combination of your edit and your previous question error log.)
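One thing to watch with the lazy pattern is the description that contains its own double quotes: (.+?) stops at the first inner quote. The sketch below, in JavaScript rather than awk, compares it with a heuristic variant that only accepts a closing quote when it is followed by another attribute or by the end of the tag. The names lazy, untilNextAttribute and describe are made up. The trailing ErrorMoreInfo="" on the second sample is an assumption modelled on the XML excerpt, since the question does not show what follows that description in the log.

```javascript
// The pattern from the answer: stops at the first double quote it finds.
const lazy = /ErrorDescription="(.+?)"/;

// Heuristic variant: the closing quote must be followed by whitespace plus
// another name= attribute, or by the end of the tag.
const untilNextAttribute = /ErrorDescription="(.*?)"(?=\s+\w+=|\s*\/?>)/;

function describe(pattern, text) {
  const match = pattern.exec(text);
  return match ? match[1] : null;
}

const simple =
  '<Error ErrorCode="ERR_0139" ErrorDescription="Cannot Find Person For Given Order." ErrorMoreInfo="">';

// The trailing attribute here is assumed, not taken from the original log.
const embedded =
  'ErrorDescription="Operation not allowed for reason code "7" on table "SCHEMA.TABLE".. ' +
  'SQLCODE=-668, SQLSTATE=57016, DRIVER=4.13.127" ErrorMoreInfo="">';

console.log(describe(lazy, simple));               // Cannot Find Person For Given Order.
console.log(describe(lazy, embedded));             // Operation not allowed for reason code
console.log(describe(untilNextAttribute, embedded));
```

The heuristic can still be fooled by a description that itself contains text shaped like `" name=`, so it is a workaround for unescaped quotes in the log rather than a general XML attribute parser.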