REGEX help to capture certain values from string - regex

I am hoping someone can assist with the REGEX I am trying to do. I just want to be able to capture the first group of characters immediately after either "Job" or "Job -".
EXAMPLE:
Job PXDFUH34 RE443 JRA99
Job - W0WEIN12SD UIS90 TYPSOS48
I want to only capture PXDFUH34 and W0WEIN12SD in this example.
UPDATE
I was able to use this to capture what I needed.
\s(\w+)\s
However, I ran into a special character (#) that this regex doesn't like. How do I account for # now?
EXAMPLE:
Job R#DFUH34 RE143 JRU89
Job - W0WEIN12SD# UIS10 TTPSOS45

Try this regex:
Job\b[\s-]*(\S+)
It means:
Look for Job followed by a word boundary \b, to avoid matching text like Jobless,
then [\s-]* for any run of spaces and hyphens, as many as it can find,
then a capture group ()
around the first run of non-whitespace characters \S+.
Hope it helps.
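As a quick check, the pattern above can be run against the sample lines from the question. This is a minimal sketch in Python; the question does not say which regex flavour is in use, so Python is an assumption, and the helper name first_token_after_job is made up for illustration.

```python
import re

# Pattern from the answer above: "Job", a word boundary, any run of
# spaces/hyphens, then the first run of non-whitespace characters.
JOB_PATTERN = re.compile(r"Job\b[\s-]*(\S+)")

def first_token_after_job(line):
    """Return the first token after 'Job' or 'Job -', or None if absent."""
    match = JOB_PATTERN.search(line)
    return match.group(1) if match else None

samples = [
    "Job PXDFUH34 RE443 JRA99",
    "Job - W0WEIN12SD UIS90 TYPSOS48",
    "Job R#DFUH34 RE143 JRU89",
    "Job - W0WEIN12SD# UIS10 TTPSOS45",
]
results = [first_token_after_job(s) for s in samples]
print(results)
```

Because \S+ accepts any non-whitespace character, the # needs no special handling.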

Try this regex
^Job\s\-?\s?\K[^\s]*\b
Based on @alanmoore's comments, here is an alternative that uses a capture group instead of \K:
^Job\s\-?\s?([^\s]*)\b

Related

How to delete all other characters except match case using Regex

arn:aws:iam::aws:policy/AmazonEC2FullAccess
arn:aws:iam::aws:policy/IAMFullAccess
arn:aws:iam::s:policy/CloudWatchAgentServerPolicy
arn:aws:iam::aws:policy/AdministratorAccess
arn:aws:iam::aws:policy/aws-service-role/AWSSupportServiceRolePolicy
arn:aws:iam::aws:policy/aws-service-role/AWSTrustedAdvisorServiceRolePolicy
arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
arn:aws:iam::aws:policy/aws-service-role/AmazonElasticFileSystemServiceRolePolicy
arn:aws:iam::aws:policy/IAMAccessAnalyzerFullAccess
arn:aws:iam::aws:policy/aws-service-role/AWSBackupServiceLinkedRolePolicyForBackup
Here I need only the policy names, which are at the end.
I need only the letters after the /.
This is the regex I am using: (?<=/).*
The output of this regex is:
1) arn:aws:iam::aws:policy/AdministratorAccess
2) arn:aws:iam::aws:policy/aws-service-role/AWSSupportServiceRolePolicy
As you can see, in 1) it matches correctly, but in 2) I need the letters after the last occurrence of /,
and I need to delete everything except the match.
Could someone suggest how to achieve this?
Note: I am aware that I can get the AWS policy names using boto3, but I am curious about the above use case.
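You can just use grep with a regex and write the result to another file, then remove the original if you want.
Something like this (lookbehind needs PCRE, so this uses GNU grep's -P rather than -E, and [^/]+$ keeps only the part after the last /):
grep -aoP '(?<=/)[^/]+$' 'logs.log' >result.log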
You can use the lookbehind assertion, and then match any char except a / or newline till the end of the string [^/\r\n]+$
(?<=/)[^/\r\n]+$
See a regex demo
If you use PCRE, you can also make use of \K to forget what is matched so far.
.*\/\K[^/\r\n]+$
See another regex demo.
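For comparison, here is a small Python sketch of the lookbehind version. Python's built-in re module does not support \K, so only the first pattern is shown; the shortened sample list and the variable names are just for illustration.

```python
import re

# Lookbehind pattern from the answer above. re.MULTILINE lets $ match
# at the end of every line rather than only at the end of the text.
LAST_SEGMENT = re.compile(r"(?<=/)[^/\r\n]+$", re.MULTILINE)

arns = """arn:aws:iam::aws:policy/AdministratorAccess
arn:aws:iam::aws:policy/aws-service-role/AWSSupportServiceRolePolicy
arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"""

# Each match is the text after the last "/" on its line.
policy_names = LAST_SEGMENT.findall(arns)
print("\n".join(policy_names))
```

Since findall returns only the matches, everything else on each line is dropped, which is the "delete everything except the match" part of the question.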

RegEx Replace - Remove Non-Matched Values

Firstly, apologies; I'm fairly new to the world of RegEx.
Secondly (more of an FYI), I'm using an application that only has RegEx Replace functionality, therefore I'm potentially going to be limited on what can/can't be achieved.
The Challenge
I have a free text field (labelled Description) that primarily contains "useless" text. However, some records will contain either one or multiple IDs that are useful and I would like to extract said IDs.
Every ID will have the same three-letter prefix (APP) followed by a five digit numeric value (e.g. 12911).
For example, I have the following string in my Description Field;
APP00001Was APP00002TEST APP00003Blah blah APP00004 Apple APP11112OrANGE APP
THE JOURNEY
I've managed to very crudely put together an expression that is close to what I need (although, I actually need the reverse);
/!?APP\d{1,5}/g
Result;
THE STRUGGLE
However, on the Replace, I'm only able to retain the non-matched values;
Was TEST Blah blah Apple OrANGE APP
THE ENDGAME
I would like the output to be;
APP00001 APP00002 APP00003 APP00004 APP11112
Apologies once again if this is somewhat of a 'noddy' question; but any help would be much appreciated and all ideas welcome.
Many thanks in advance.
You could use an alternation | to capture either the pattern starting with a word boundary in group 1 or match 1+ word chars followed by optional whitespace chars.
What you capture in group 1 can be used as the replacement. Text matched by the second alternative is not captured, so it is dropped from the result.
Using !? matches an optional exclamation mark. You could prepend that to the pattern, but it is not part of the example data.
\b(APP\d{1,5})\w*|\w+\s*
See a regex demo
In the replacement, use capture group 1, typically written as $1 or \1.
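To see the replace-only approach end to end, here is a minimal Python sketch. The asker's application is unknown, so Python stands in for it as an assumption. A function is used as the replacement so that an unmatched group 1 becomes an empty string explicitly; the name keep_ids is hypothetical.

```python
import re

# Group 1 captures an ID; the second alternative consumes any other word
# plus trailing whitespace without capturing it.
PATTERN = re.compile(r"\b(APP\d{1,5})\w*|\w+\s*")

description = (
    "APP00001Was APP00002TEST APP00003Blah blah "
    "APP00004 Apple APP11112OrANGE APP"
)

def keep_ids(text):
    """Replace every match with group 1 (or nothing), then trim the ends."""
    return PATTERN.sub(lambda m: m.group(1) or "", text).strip()

ids_only = keep_ids(description)
print(ids_only)
```

The final strip() removes the one space left at the end once the trailing bare APP has been deleted; a tool that only offers Replace would leave that space in place.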

Extracting String using regex

I am using an HTA application I wrote for our help desk to take notes.
I've been using regex (as best I can) so that I can CTRL+A our ticket pop-up and click Parse in my app to fill out the information.
I need to find "TICKET - T00000000.0000 - Account Security (Company Name...)" and grab only the "Account Security" section, or, more generally, whatever is between the second - and the (.
Any suggestions would be grand.
Here is an example of what I've tried and am currently using:
try {
    $(".problem_description", context).val(clipdata.match(/TICKET -.+[)]/)[0]);
}
catch (e) {
}
Update
I have tried a few of the suggestions here but the results still seem to give me the entire string or error out in my script.
Here's the regex using positive lookbehind:
(?<=TICKET\ -\ T\d{8}\.\d{4}\ -\ ).*\)
Here's regex101 explanation: https://regex101.com/r/6BN16e/1
The pattern effectively says: match anything after "TICKET - T(8 digits).(4 digits) - ", up to and including the last closing parenthesis. You can of course tweak it to your specification.
Here's a tutorial on lookahead and lookbehind that may be helpful: https://www.regular-expressions.info/lookaround.html
Use a capture group. In a regex you can use parentheses to mark a capture group. So if you define a pattern where a portion of it marks the text you want to extract, you can wrap that portion in parentheses. The object returned by the match function in most languages is an object that lets you access the values of individual capture groups.
Try this regex I quickly made up: /[^-]*-[^-]*- ([^(]*)/
Full example: var matches = "TICKET - T00000000.0000 - Account Security (Company Name...)".match(/[^-]*-[^-]*- ([^(]*)/)
Your value will be in matches[1].
It says: start from the beginning, look for anything not a dash, then a dash, then anything not a dash, then another dash, then a space, then capture anything not a left-parenthesis into a capture group.
This one will leave an extra space at the end of the captured group value. Also, it will truncate your value if your value contains a left parenthesis.
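The same capture-group idea can be sketched in Python (the original is JavaScript; Python is used here only for illustration, and ticket_category is a made-up name). Stripping the captured text deals with the trailing space mentioned above.

```python
import re

# Same pattern as above: skip to the second dash, then capture everything
# up to (but not including) the first left parenthesis.
TICKET_PATTERN = re.compile(r"[^-]*-[^-]*- ([^(]*)")

def ticket_category(text):
    """Return the text between the second '-' and the '(', or None."""
    match = TICKET_PATTERN.search(text)
    return match.group(1).strip() if match else None

clip = "TICKET - T00000000.0000 - Account Security (Company Name...)"
category = ticket_category(clip)
print(category)
```

Checking for a failed match before reading group 1 avoids the kind of error that the empty catch block in the question would otherwise hide.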

Regex - Skip characters to match

I'm having an issue with Regex.
I'm trying to match T0000001 (2, 3 and so on).
However, some of the lines it searches has what I can describe as positioners. These are shown as a question mark, followed by 2 digits, such as ?21.
These positioners describe a new position if the document were to be printed off the website.
Example:
T123?214567
T?211234567
I need to disregard ?21 and match T1234567.
From what I can see, this is not possible.
I have looked everywhere and tried numerous attempts.
All we have to work from is the linked image. The creators can't even confirm which flavour of regex it is; they believe it's Python, but I'm unsure.
Update
Unfortunately, none of the patterns below have worked so far. I tested each one in the live system (rather than only in a regex tester, in case it behaves differently), but they still didn't work.
There is no replace feature, and as mentioned before I'm not sure if it is Python. Appreciate your help.
Do two regex operations
First do the regex replace to replace the positioners with an empty string.
(\?[0-9]{2})
Then do the regex match
T[0-9]{7}
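A minimal sketch of this two-step approach, assuming the engine really is Python and that a replace step is available (the question leaves both in doubt). The names POSITIONER, TICKET_ID and find_id are illustrative.

```python
import re

POSITIONER = re.compile(r"\?[0-9]{2}")   # step 1: the ?NN positioners
TICKET_ID = re.compile(r"T[0-9]{7}")     # step 2: the ID itself

def find_id(line):
    """Strip positioners, then look for T followed by seven digits."""
    cleaned = POSITIONER.sub("", line)
    match = TICKET_ID.search(cleaned)
    return match.group(0) if match else None

ids = [find_id(line) for line in ["T123?214567", "T?211234567", "T1234567"]]
print(ids)
```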
If there's only one occurrence of the 'positioners' in each match, something like this should work: (T.*?)\?\d{2}(.*)
This can be tested here: https://regex101.com/r/XhQXkh/2
Basically, match two capture groups before and after the '?21' sequence. You'll need to concatenate these two matches.
First, match the ?21 and replace it with a distinctive character such as #:
\?21
Then try this regex to find what you want; the target string is captured in group 1 (\1):
(T(?:\d{7}|[\#\d]{8}))\s
Finally, replace # with ?21 or anything else you like.
A Python script might look like this:
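import re

ss = """T123?214567
T?211234567
T1234567
T1234434?21
T5435433"""

# Step 1: swap every ?21 positioner for a placeholder character.
rexpre = re.compile(r'\?21')
# Step 2: match T plus 7 digits, or T plus 8 characters that are digits or #.
# The trailing \s means the last line (no whitespace after it) is not matched.
regx = re.compile(r'(T(?:\d{7}|[\#\d]{8}))\s')

for m in regx.findall(rexpre.sub('#', ss)):
    print(m)
print()
# Same matches, with the placeholder turned back into ?21.
for m in regx.findall(rexpre.sub('#', ss)):
    print(re.sub('#', r'?21', m))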
Output is
T123#4567
T#1234567
T1234567
T1234434#
T123?214567
T?211234567
T1234567
T1234434?21
If using a replace functionality is an option for you then this might be an approach to match T0000001 or T123?214567:
Capture a T followed by zero or more digits before the optional part in group 1 (T\d*)
Make the question mark followed by 2 digits part optional (?:\?\d{2})?
Capture one or more digits after in group 2 (\d+).
Then in the replacement you could use group 1 followed by group 2: \1\2.
Using word boundaries \b (Or use assertions for the start and the end of the line ^ $) this could look like:
\b(T\d*)(?:\?\d{2})?(\d+)\b
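The replacement described above could be tried out like this in Python. This is a sketch only, since the actual tool and its replacement syntax are unknown; strip_positioner is a hypothetical helper name.

```python
import re

# Group 1: T plus any leading digits; optional ?NN positioner (not captured);
# group 2: the remaining digits.
PATTERN = re.compile(r"\b(T\d*)(?:\?\d{2})?(\d+)\b")

def strip_positioner(text):
    """Rebuild each ID from group 1 + group 2, dropping the positioner."""
    return PATTERN.sub(r"\1\2", text)

cleaned = [strip_positioner(s) for s in ["T123?214567", "T?211234567", "T0000001"]]
print(cleaned)
```

One limitation worth knowing: because group 2 needs at least one digit after the positioner, an ID such as T1234434?21, with the positioner at the very end, is left unchanged by this pattern.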
Is the below what you want?
Use RegExReplace with multiline tag (m) and enable replace all occurrences!
Pattern = (T\d*)\?\d{2}(\d*)
replace = $1$2

RegEx: capture entire group content

I am writing a parser for some Oracle commands, like
LOAD DATA
INFILE /DD/DATEN
TRUNCATE
PRESERVE BLANKS
INTO TABLE aaa.bbb
( some parameters... )
I already created a regex to match the entire command. I am now looking for a way to capture the name of the input file ("/DD/DATEN" for instance here).
My problem is that using the following regex will only return the last character of the first group ("N").
^\s*LOAD DATA\s*INFILE\s*(\w|\\|/)+\s*$
Any ideas?
Many thanks in advance
EDIT: following @HamZa's question, here is the entire regex to parse the Oracle LOAD DATA INFILE command (simplified, though):
^\s*LOAD DATA\s*INFILE\s*((?:\w|\\|/)+)\s*((?:TRUNCATE|PRESERVE BLANKS)\s*){0,2}\s*INTO TABLE\s*((?:\w|\.)+)\s*\(\s*((\w+)\s*POSITION\s*\(\s*\d+\s*\:\s*\d+\s*\)\s*((DATE\s*\(\s*(\d+)\s*\)\s*\"YYYY-MM-DD\")|(INTEGER EXTERNAL)|(CHAR\s*\(\s*(\d+)\s*\)))\s*\,{0,1}\s*)+\)\s*$
Let's point out the wrongdoer in your regex (\w|\\|/)+. What happens here ?
You're matching either a word character or a backslash/forward slash and putting it in group 1 (\w|\\|/), and after that you're telling the regex engine to do this one or more times with +. Each repetition overwrites the group, so only the last character survives. What you actually want is to repeat those characters first and then capture the whole run. So you might use a non-capturing group (?:) inside the capturing one: ((?:\w|\\|/)+).
You might notice that you could just use a character class after all ([\w\\/]+). Hence, your regex could look like
^\s*LOAD DATA\s*INFILE\s*([\w\\/]+)\s*$
On a side note: that end anchor $ will cause your regex to fail if you're not using multiline mode. Or is it that you intentionally didn't post the full regex :) ?
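The difference between repeating a group and grouping a repetition is easy to see side by side. The sketch below uses Python, which is an assumption because the question does not name its regex engine, and a cut-down two-line command so that the trailing $ can match.

```python
import re

command = "LOAD DATA\nINFILE /DD/DATEN\n"

# Original: the group is repeated, so it keeps only its last iteration.
repeated_group = re.compile(r"^\s*LOAD DATA\s*INFILE\s*(\w|\\|/)+\s*$")
# Fixed: the repetition sits inside the group, so the whole path is captured.
char_class = re.compile(r"^\s*LOAD DATA\s*INFILE\s*([\w\\/]+)\s*$")

last_char = repeated_group.match(command).group(1)
full_path = char_class.match(command).group(1)
print(repr(last_char), repr(full_path))
```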
Not tested but...
^\s*LOAD DATA\s*INFILE\s*(\S+)\s*$