Regex for grabbing certain keywords and combining them

I have the following text and would like to grab the keywords "ATTEMPT TO ACCESS DATABASE" and "was denied" and combine the two. However, there is user and path name information between these two keywords, which makes it difficult for me to capture what I want, as I'm not yet a regex expert. XD
<182>Mar 27 09:38:55 4.3.2.1 [5439570:00311-46004] 03/11/2015 14:13:05
ATTEMPT TO ACCESS DATABASE mail/abc.nsf by USER was denied
Is there a single regex that can fulfill my requirement? I would greatly appreciate any help!

Have a look at the following regex:
(ATTEMPT\s+TO\s+ACCESS\s+DATABASE\s+)(\S+)\s+by\s+([\w.-]+)\s+(was\s+denied)
Output is:
MATCH 1
1. [71-98] `ATTEMPT TO ACCESS DATABASE `
2. [98-110] `mail/abc.nsf`
3. [114-118] `USER`
4. [119-129] `was denied`
So, you can combine Groups 1 and 4 to get ATTEMPT TO ACCESS DATABASE was denied (in any letter case, provided the i option is enabled); Group 2 will contain the path name, and Group 3 will hold the user name.
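A minimal sketch of combining the groups, assuming Python's re module (the question does not name a language). The function name summarize is made up for illustration, and re.IGNORECASE stands in for the i option:

```python
import re

# Sample text from the question.
LOG = (
    "<182>Mar 27 09:38:55 4.3.2.1 [5439570:00311-46004] 03/11/2015 14:13:05\n"
    "ATTEMPT TO ACCESS DATABASE mail/abc.nsf by USER was denied"
)

# Same pattern as above; re.IGNORECASE plays the role of the "i" option.
PATTERN = re.compile(
    r"(ATTEMPT\s+TO\s+ACCESS\s+DATABASE\s+)(\S+)\s+by\s+([\w.-]+)\s+(was\s+denied)",
    re.IGNORECASE,
)


def summarize(text):
    """Return (combined message, path, user), or None if nothing matches."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # Group 1 already ends with whitespace, so plain concatenation reads naturally.
    return m.group(1) + m.group(4), m.group(2), m.group(3)


message, path, user = summarize(LOG)
print(message)  # ATTEMPT TO ACCESS DATABASE was denied
print(path, user)
```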

Related

Queries on "Get Regex matches" in Robocorp

I have a form in MS Word which the user fills and emails me. I have to open the form and capture all the details entered by the user and use the same to submit a form in my portal.
I am trying to create a robot using Robocorp to automate this process. Using "Get All texts" - RPA Word library, I am logging the contents from the Word document in Robocorp and then trying to get the required data using Regex but need some help on extracting the data using Regex.
Please find the raw text logged in Robocorp below,
Source Text
Query 1:
Need to extract Manager name:
In Regex101, I am getting the name returned as expected upon using, [^Manager\n].*
In Robocorp, when I use 'Get regex matches' with [^Manager\n].*, I am getting all the content of the text file.
Please help me with the regex to use in Robocorp to extract the Manager name.
Query 2:
I need to extract the answers provided by the user for the questions in the above form. (Note: The answers change with every form submitted)
I tried the below,
E.g., I first pulled one question from the above form using - Get Regex matches (?s)(Lunch).*?(No).
I got the below value returned in robocorp,
['Lunch account required? \x07☒ Yes\r☐ No']
Now again from this value returned (using this as string), I tried to get the answer selected by the user using,
Get Regex matches (?<=☒)\s\w+
But I am getting the error "TypeError: expected string or bytes-like object".
I am not sure whether the above flow is right, or whether I can get the answers selected by the user for all questions in a different way.
Sorry if my questions are simple. I am totally new to using Regex and in my learning phase.
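On the TypeError in Query 2: that message is what Python's re module raises when it is handed something other than a string, and the value printed above is a list holding one string. A minimal sketch in plain Python, assuming the Robocorp keyword is backed by Python's re module and that the first result was passed on as a list; the variable names are made up for illustration:

```python
import re

# Value returned by the first step, copied from the log above: a list, not a string.
first_step = ['Lunch account required? \x07☒ Yes\r☐ No']

error = None
try:
    re.findall(r"(?<=☒)\s\w+", first_step)   # passing the whole list
except TypeError as exc:
    error = str(exc)                          # "expected string or bytes-like object..."

# Passing the first element of the list (a string) works.
answers = [a.strip() for a in re.findall(r"(?<=☒)\s\w+", first_step[0])]
print(error)
print(answers)
```

If that assumption holds, taking the first item of the returned list before the second match should avoid the error.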

Lookup On Table Using Wildcards:

I have this requirement, where my lookup table is:
Fi   Sortorder
003  Transit
002  Serial
001  Transit,Serial
0**  Transit,Account
***  Account
I want to look up the Fi field from the source in this table. The problem is how to search with the wildcards: in the source I will always get 3 digits in the Fi column, so looking up an exact 3-digit value is fine, but how do I look up on the last 2 digits and extract the Sortorder column?
If I use wildcards, there is a possibility that the wrong sort order will be picked, so I have to look up with more precise information. Also, there are hundreds of combinations and entries; this is just an example. I don't need a full solution with syntax and all, I am just looking for an approach. I will soon post the solution I am trying right now; if you think of anything, kindly let me know.

If you need to do a lookup on fewer characters, just use a substring, and create one lookup for each key length you need. E.g., if you need to match on the last 2 digits (T-SQL):
use select right(Fi, 2) from yourtable as the lookup SQL override
do a substring on your port in an expression transformation to get the last 2 characters from the Fi port
use it for the lookup condition
Just make sure to set the order and the Lookup policy on multiple match according to your needs.
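The "one lookup per key length" idea can be sketched outside the ETL tool as well. The following Python is only an illustration under stated assumptions: it uses the sample table from the question, treats the trailing * characters as wildcards (so the fixed part is a prefix rather than the last 2 characters), and tries the most specific lookup first. The names build_lookups and find_sort_order are hypothetical.

```python
# Sample rows from the question: Fi pattern -> Sortorder.
ROWS = {
    "003": "Transit",
    "002": "Serial",
    "001": "Transit,Serial",
    "0**": "Transit,Account",
    "***": "Account",
}


def build_lookups(rows):
    """Build one dict per fixed-prefix length (3, 1 and 0 for the sample rows)."""
    by_length = {}
    for pattern, sort_order in rows.items():
        fixed = pattern.rstrip("*")          # drop the trailing wildcards
        by_length.setdefault(len(fixed), {})[fixed] = sort_order
    return by_length


def find_sort_order(fi, by_length):
    """Try the longest (most precise) key first, then fall back to shorter ones."""
    for length in sorted(by_length, reverse=True):
        hit = by_length[length].get(fi[:length])
        if hit is not None:
            return hit
    return None


LOOKUPS = build_lookups(ROWS)
for fi in ("003", "045", "745"):
    print(fi, find_sort_order(fi, LOOKUPS))
```

Ordering the lookups from most to least specific is what keeps a wildcard row from overriding an exact match.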

How to have multiple regex for same element

I have a text box validated by a regular expression like the one below:
^AB[a-zA-Z0-9]{20}$
which basically allows the characters AB followed by 20 letters or digits. For example, let's say the validation error for not matching this regex is Some Test Error.
I have a scenario where the user enters AB1234 and tabs out of the text box, and the error Some Test Error shows. However, I have a requirement not to show the same error message Some Test Error if the user is trying to follow the format but is not satisfying the entire regex.
Scenario 1: User enters CD12345675438976524381
I need to show Some Test Error
Scenario 2: User enters AB12345
I need to show Different Test Error, because the user tried to enter a value starting with AB
How can I achieve this? Is there a way of specifying multiple regexes?

I am not sure which language you are using, but I suppose you could change the regex once the user has seen the message. While the user is still entering the string, don't check the length unless the user types a 21st character or something that does not belong to [a-zA-Z0-9].
I hope I made myself understood; the point is that you change the regex over time.

I think you can, for example, use multiple regexes and check the input:
if the input is valid, everything is OK,
if the input is invalid, check: a) whether it starts with AB (regex: ^AB) or b) whether it has a valid length (regex: ^(?!AB)[a-zA-Z0-9]{22}$), and show the proper message,
if it is totally invalid, show another message.
OR you can use one long regex, like:
^(AB[a-zA-Z0-9]{20})$|^(AB[a-zA-Z0-9]{0,19}|AB[a-zA-Z0-9]{21,})$|^((?!AB)[a-zA-Z0-9]{22})$
which captures each type of input in a separate group,
and then check which group was captured to determine the level of correctness:
if group 1 exists - valid string,
if group 2 - starts with AB but improper length,
if group 3 - proper length, invalid beginning.
I'm sure there are other solutions too.
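The group-based check can be sketched as follows, assuming Python (the question does not say which language is in use). The function name classify and the returned labels are made up for illustration. The third branch here uses a negative lookahead, (?!AB), so that a 22-character input such as AC... still counts as "wrong beginning".

```python
import re

# Three alternatives, one capture group each:
#   1: fully valid            2: starts with AB, wrong length
#   3: right length (22), but does not start with AB
CHECK = re.compile(
    r"^(AB[a-zA-Z0-9]{20})$"
    r"|^(AB[a-zA-Z0-9]{0,19}|AB[a-zA-Z0-9]{21,})$"
    r"|^((?!AB)[a-zA-Z0-9]{22})$"
)


def classify(value):
    """Map an input string to a label based on which group matched."""
    m = CHECK.match(value)
    if m is None:
        return "invalid"
    if m.group(1) is not None:
        return "valid"
    if m.group(2) is not None:
        return "starts with AB, wrong length"
    return "right length, wrong beginning"


for sample in ("AB" + "1" * 20, "AB12345", "CD12345675438976524381", "hello!"):
    print(sample, "->", classify(sample))
```

Each label can then be mapped to whichever error message the form should show.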

Using RegEx to Find a Block of Text

I'm attempting to block a long string of unnecessary text that's on every page of a document.
Ex: "36075 This is another page and this is the date March 4 2013"
I know this must be very simple, but I'm hoping there is a way to block text verbatim. Is the only way to block this text to use a lot of \d, \s, \w+ and so on, or is there a way to say, "match 36075 This is another page and this is the date March 4 2013"?
This would be SO HELPFUL to know. Thank you for helping!

From what you wrote, I assume you need to get the leading number from the string. To do that, just use the pattern ^\d+, which from this input:
36075 This is another page and this is the date March 4 2013
will return this:
36075
In the future, when asking such questions, please provide an example string and the expected output, as well as what you have tried.
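A minimal sketch of both readings of the question, in Python (an assumption; the question does not say which tool or language is used). The first part applies ^\d+ as described above. The second uses re.escape to build a pattern that matches the whole sentence verbatim, which is what the question literally asks for. The page variable is made-up sample content.

```python
import re

LINE = "36075 This is another page and this is the date March 4 2013"

# Reading 1: grab only the leading number.
leading = re.match(r"^\d+", LINE).group()

# Reading 2: match the sentence verbatim. re.escape neutralizes any characters
# that would otherwise have a special meaning in a regex.
verbatim = re.compile(re.escape(LINE))
page = "Some body text. " + LINE + " More body text."  # hypothetical page content
cleaned = verbatim.sub("", page)

print(leading)
print(cleaned)
```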

I realized the issue I was having. I didn't need to use RegEx. The program I was using has the functionality to match specific words or groups of words and pronounce them differently. What I discovered is that it will not match the words unless the word groups are input exactly the way the program typically reads them.
Ergo --> The channel saw
the end of the British hold over
Would have to be listed as one group for, "The channel saw" and a second group for "the end of the British hold over"
In addition, there were some numbers --> 11960_30_o_ho_
and if the program naturally read 119 and then 60_3 and then _o_ho_ then three strings would need to be input for each section.
A few frustrating hours later, problem solved :) Thank you for your assistance.

Regex for extracting a limited amount of folders (top 3 levels)

Consider the following file path:
\\fileserver\share\documents\department\my_project\a_sub_folder\myfile.doc
I need to extract the text "\documents\department\my_project" with a regular expression. Details:
Exclude "fileserver" and "share"
Limit to 3 "logical" top level folders after, thereby excluding "\a_sub_folder"
Don't include file name ("myfile.doc")
Using the following regex..:
^.*share(?P<folders>\\.+)\\.+
..I get this in my "folders" group:
\documents\department\my_project\a_sub_folder
The part that nags me is how to get rid of "a_sub_folder". I've tried adding repetition operators to the folders-group with no effect:
^.*share(?P<folders>\\.+){1,3}\\.+
^.*share(?P<folders>\\.+){1,3}?\\.+
The first one of the two above doesn't change the output, while the second one returns an empty group "folders"
I have a feeling that my regex is fundamentally wrong, but I'm unable to see why. Can anyone please shed some light on this?
thanks :)
/Geir

How about:
^.*share(?P<folders>(?:\\[^\\]+){1,3})
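A quick way to try this pattern, sketched in Python (the question does not name a language; Python is assumed here because it accepts the (?P<name>...) syntax). The earlier attempts fail because . also matches backslashes, so \\.+ swallows several folders at once, whereas [^\\]+ stops at the next separator. The second pattern, with a trailing lookahead, is an optional variation that is not part of the answer: it keeps the file name out of the group when fewer than three folders exist.

```python
import re

PATH = r"\\fileserver\share\documents\department\my_project\a_sub_folder\myfile.doc"

# Pattern from the answer: up to three "\name" segments after "share".
TOP3 = re.compile(r"^.*share(?P<folders>(?:\\[^\\]+){1,3})")

# Variation (assumption, not from the answer): require another "\" after the
# group, so the last segment captured can never be the file name.
TOP3_NO_FILE = re.compile(r"^.*share(?P<folders>(?:\\[^\\]+){1,3})(?=\\)")


def top_folders(path, pattern=TOP3):
    """Return the captured folder prefix, or None when the path does not match."""
    m = pattern.match(path)
    return m.group("folders") if m else None


print(top_folders(PATH))
print(top_folders(r"\\fileserver\share\documents\myfile.doc", TOP3_NO_FILE))
```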
