How to delete all characters except the match using regex

arn:aws:iam::aws:policy/AmazonEC2FullAccess
arn:aws:iam::aws:policy/IAMFullAccess
arn:aws:iam::s:policy/CloudWatchAgentServerPolicy
arn:aws:iam::aws:policy/AdministratorAccess
arn:aws:iam::aws:policy/aws-service-role/AWSSupportServiceRolePolicy
arn:aws:iam::aws:policy/aws-service-role/AWSTrustedAdvisorServiceRolePolicy
arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
arn:aws:iam::aws:policy/aws-service-role/AmazonElasticFileSystemServiceRolePolicy
arn:aws:iam::aws:policy/IAMAccessAnalyzerFullAccess
arn:aws:iam::aws:policy/aws-service-role/AWSBackupServiceLinkedRolePolicyForBackup
Here I need only the policy names, which are at the end.
I need only the characters after the last /.
This is the regex I am using: (?<=/).*
The output of this regex is this:
arn:aws:iam::aws:policy/AdministratorAccess
arn:aws:iam::aws:policy/aws-service-role/AWSSupportServiceRolePolicy
As you can see, for the first line it greps correctly, but for the second I need only the characters after the last occurrence of /.
I also need to delete everything except the match.
Any suggestions on how to achieve this are welcome.
Note: I am aware that I can get the AWS policy names using boto3, but I am curious about the above use case.

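You can just use grep with a regex and write the result to another file, then remove the original if you want.
Something like this (extended regex, -E, has no lookbehind, but with -o the pattern [^/]+$ already prints only the text after the last /):
grep -Eao '[^/]+$' 'logs.log' > result.log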

You can use the lookbehind assertion and then match any character except / or a line break until the end of the string, [^/\r\n]+$:
(?<=/)[^/\r\n]+$
If you use PCRE, you can also make use of \K to forget what is matched so far.
.*\/\K[^/\r\n]+$
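Python's built-in re module has no \K, so the lookbehind pattern is the one that carries over directly. A minimal sketch, assuming the ARNs are held in one newline-separated string (three sample lines taken from the question); the re.sub line is a swapped-in alternative for the "delete everything except the match" part, stripping everything up to the last / on each line:

```python
import re

arns = """arn:aws:iam::aws:policy/AmazonEC2FullAccess
arn:aws:iam::aws:policy/aws-service-role/AWSSupportServiceRolePolicy
arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"""

# Lookbehind for "/", then characters other than "/" or a line break up to the line end.
# re.M makes $ match at the end of every line, not only at the end of the string.
LAST_SEGMENT_RE = re.compile(r"(?<=/)[^/\r\n]+$", re.M)
names = LAST_SEGMENT_RE.findall(arns)
print(names)

# Alternative: delete everything up to and including the last "/" on each line.
# The greedy .* runs to the last "/" because . never matches a newline.
stripped = re.sub(r"^.*/", "", arns, flags=re.M)
print(stripped)
```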

Related

Regex for internal URL

I'm trying to create a regex that matches internal URLs (the ones that don't include the domain or http) that I can find in a file like this one:
category/subcategory/sub-subcategory/item-1
For that I'm using:
/\w+\/.+\/[\w\-]+/
But some URLs are like this:
category/subcategory
I need a regular expression that also catches those. Do I have to create a different one, or is it possible to create one that matches both examples? It is for a Bash script, but if you have an idea, it does not matter if it is for another engine.
Thank you!!
Update: I forgot the context. Each line of the file is like this:
"11","category/subcategory/sub-subcategory/item-1","index.php?option=com_trombinoscopeextended&Itemid=125&lang=es&view=trombinoscope","251","0","0000-00-00","","","","","","","0"
Or like this:
"4","category/subcategory","index.php?option=com_trombinoscopeextended&Itemid=121&lang=es","0","1","0000-00-00","","","","","","","0"
I need to extract those paths from each line.
Thanks.
You may use
/\w+(\/[\w-]+)+/
Details
\w+ - 1+ word chars
(\/[\w-]+)+ - 1 or more consecutive sequences of
\/ - a / char
[\w-]+ - 1+ word or - chars.
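To check the pattern against the two sample lines from the update, a quick Python sketch (the question targets Bash, so treat this only as a way to see what the pattern matches; PATH_RE, lines and paths are illustrative names):

```python
import re

# The answer's pattern without the JavaScript-style /.../ delimiters.
PATH_RE = re.compile(r"\w+(\/[\w-]+)+")

lines = [
    '"11","category/subcategory/sub-subcategory/item-1","index.php?option=com_trombinoscopeextended&Itemid=125&lang=es&view=trombinoscope","251","0","0000-00-00","","","","","","","0"',
    '"4","category/subcategory","index.php?option=com_trombinoscopeextended&Itemid=121&lang=es","0","1","0000-00-00","","","","","","","0"',
]

# group(0) is the whole match; findall() would return only the last repetition of the group.
paths = [PATH_RE.search(line).group(0) for line in lines]
print(paths)
```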
A hint: you might read your string with a CSV parser in your preferred language, and then return only the fields that match the ^\w+(\/[\w-]+)+$ pattern (here, ^ matches the start of the string and $ matches the end of the string).
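The CSV-parser hint could look roughly like this in Python, assuming the file content is available as a string; extract_paths is a hypothetical helper name, and fullmatch() stands in for the ^...$ anchors:

```python
import csv
import io
import re

# Same pattern as the hint; fullmatch() plays the role of the ^...$ anchors.
PATH_RE = re.compile(r"\w+(/[\w-]+)+")

def extract_paths(text):
    """Return every CSV field that consists entirely of a slash-separated path."""
    paths = []
    for row in csv.reader(io.StringIO(text)):
        paths.extend(field for field in row if PATH_RE.fullmatch(field))
    return paths

sample = (
    '"11","category/subcategory/sub-subcategory/item-1","index.php?option=com_trombinoscopeextended&Itemid=125&lang=es&view=trombinoscope","251","0","0000-00-00","","","","","","","0"\n'
    '"4","category/subcategory","index.php?option=com_trombinoscopeextended&Itemid=121&lang=es","0","1","0000-00-00","","","","","","","0"\n'
)
print(extract_paths(sample))
```

Because the CSV reader removes the quotes, fields like "11" or "0000-00-00" are rejected simply because they contain no /.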
That is pretty specific. I came up with this one after some testing. We have subdomains we need to check for as well.
(?!https?:)/?[^/][^/].*|(https?:)?//([^.]*\.)?yourdomain\.com(/.*)?
Someone can probably make it better, but this works for me.
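A rough way to try this pattern in Python, assuming each URL is checked on its own; yourdomain.com is the answer's placeholder, example.com is a hypothetical external URL, and is_internal is an illustrative name:

```python
import re

# The pattern from the answer; yourdomain\.com is the answer's own placeholder.
INTERNAL_RE = re.compile(
    r"(?!https?:)/?[^/][^/].*|(https?:)?//([^.]*\.)?yourdomain\.com(/.*)?"
)

def is_internal(url):
    # fullmatch() requires the whole string to match one of the two alternatives.
    return INTERNAL_RE.fullmatch(url) is not None

for url in [
    "category/subcategory",
    "/category/subcategory/item-1",
    "https://sub.yourdomain.com/page",
    "http://example.com/page",
]:
    print(url, is_internal(url))
```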

Regex - Skip characters to match

I'm having an issue with Regex.
I'm trying to match T0000001 (2, 3 and so on).
However, some of the lines it searches have what I can only describe as positioners. These are shown as a question mark followed by 2 digits, such as ?21.
These positioners describe a new position if the document were to be printed off the website.
Example:
T123?214567
T?211234567
I need to disregard ?21 and match T1234567.
From what I can see, this is not possible.
I have looked everywhere and tried numerous attempts.
All we have to work from is the linked image. The creators can't even confirm which regex flavour it is; they believe it's Python, but I'm unsure.
Update
Unfortunately, none of the answers below have worked so far. I tested each pattern in the live system rather than only in a regex tester (thinking it may behave differently), but it still didn't work.
There is no replace feature, and as mentioned before I'm not sure if it is Python. Appreciate your help.
Do two regex operations
First do the regex replace to replace the positioners with an empty string.
(\?[0-9]{2})
Then do the regex match
T[0-9]{7}
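Where the two steps can be run in code (the question says the tool itself has no replace feature, so this assumes something like Python is available), the sequence might look like this; find_ids is a hypothetical name:

```python
import re

POSITIONER_RE = re.compile(r"\?[0-9]{2}")  # step 1: "?" followed by two digits
ID_RE = re.compile(r"T[0-9]{7}")           # step 2: "T" followed by seven digits

def find_ids(text):
    # Remove the positioners first, then match on the cleaned text.
    cleaned = POSITIONER_RE.sub("", text)
    return ID_RE.findall(cleaned)

print(find_ids("T123?214567\nT?211234567"))
```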
If there's only one occurrence of the 'positioners' in each match, something like this should work: (T.*?)\?\d{2}(.*)
This can be tested here: https://regex101.com/r/XhQXkh/2
Basically, capture the text before and after the '?21' sequence in two groups. You'll need to concatenate the two captured groups.
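A small Python sketch of the concatenation step, assuming one positioner per line; join_groups is an illustrative name, and returning a line unchanged when it has no positioner is an assumption of this sketch:

```python
import re

# Group 1: "T" and anything up to the positioner; group 2: the rest of the line.
SPLIT_RE = re.compile(r"(T.*?)\?\d{2}(.*)")

def join_groups(line):
    m = SPLIT_RE.search(line)
    # Assumption: a line without a positioner is returned unchanged.
    return m.group(1) + m.group(2) if m else line

for line in ["T123?214567", "T?211234567", "T1234567"]:
    print(join_groups(line))
```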
First, match ?21 and replace it with a distinctive character such as #:
\?21
Then you may try this regex to find what you want:
(T(?:\d{7}|[\#\d]{8}))\s
The target string is captured in group 1 (or \1).
Finally, replace # with ?21 or whatever you like.
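A Python script might look like this:
import re
ss = """T123?214567
T?211234567
T1234567
T1234434?21
T5435433"""
# Step 1: replace every ?21 positioner with a placeholder character.
rexpre = re.compile(r'\?21')
# Step 2: T plus 7 digits, or 8 characters mixing digits and the placeholder.
# The trailing \s means the last line, which has no whitespace after it (T5435433), is not matched.
regx = re.compile(r'(T(?:\d{7}|[\#\d]{8}))\s')
for m in regx.findall(rexpre.sub('#', ss)):
    print(m)
print()
# Step 3: put the original positioner back into each match.
for m in regx.findall(rexpre.sub('#', ss)):
    print(re.sub('#', r'?21', m))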
Output:
T123#4567
T#1234567
T1234567
T1234434#

T123?214567
T?211234567
T1234567
T1234434?21
If using a replace function is an option for you, this might be an approach to match T0000001 or T123?214567:
Capture a T followed by zero or more digits before the optional part in group 1 (T\d*)
Make the question mark followed by 2 digits part optional (?:\?\d{2})?
Capture one or more digits after in group 2 (\d+).
Then, in the replacement, you could use group 1 followed by group 2: \1\2.
Using word boundaries \b (or anchors for the start and end of the line, ^ and $), this could look like:
\b(T\d*)(?:\?\d{2})?(\d+)\b
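In Python, the replacement described above could be sketched as follows (the sample text comes from the question; ID_RE and result are illustrative names):

```python
import re

# \1 = "T" plus any digits before the optional positioner, \2 = the digits after it.
ID_RE = re.compile(r"\b(T\d*)(?:\?\d{2})?(\d+)\b")

text = "T123?214567\nT?211234567\nT0000001"
result = ID_RE.sub(r"\1\2", text)
print(result)
```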
Is the below what you want?
Use RegExReplace with the multiline flag (m) and enable replacing all occurrences:
Pattern = (T\d*)\?\d{2}(\d*)
replace = $1$2

REGEX help to capture certain values from string

I am hoping someone can assist with the regex I am trying to write. I just want to capture the first group of characters immediately after either "Job" or "Job -".
EXAMPLE:
Job PXDFUH34 RE443 JRA99
Job - W0WEIN12SD UIS90 TYPSOS48
I want to only capture PXDFUH34 and W0WEIN12SD in this example.
UPDATE
I was able to use this to capture what I needed.
\s(\w+)\s
However, I ran into a special character (#) that this regex doesn't like. How do I account for # now?
EXAMPLE:
Job R#DFUH34 RE143 JRU89
Job - W0WEIN12SD# UIS10 TTPSOS45
Try this regex:
Job\b[\s-]*(\S+)
It means:
Look for Job followed by a word boundary \b, to avoid matching text like Jobless,
then [\s-]* matches as many spaces and hyphens as it can find,
and then a group () captures
the first run of non-space characters, \S+ (which includes #).
Hope it helps.
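To see the pattern handle both the original examples and the ones containing #, a quick Python check (sample strings from the question; JOB_RE and tokens are illustrative names):

```python
import re

# "Job", a word boundary, any run of spaces/hyphens, then the first non-space token.
JOB_RE = re.compile(r"Job\b[\s-]*(\S+)")

samples = [
    "Job PXDFUH34 RE443 JRA99",
    "Job - W0WEIN12SD UIS90 TYPSOS48",
    "Job R#DFUH34 RE143 JRU89",
    "Job - W0WEIN12SD# UIS10 TTPSOS45",
]
tokens = [JOB_RE.search(s).group(1) for s in samples]
print(tokens)
```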
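Try this regex:
^Job\s\-?\s?\K[^\s]*
Based on @alanmoore's comments, this is the alternative without \K, capturing the value in group 1:
^Job\s\-?\s?([^\s]*)
Neither pattern ends with \b, so a trailing # (as in W0WEIN12SD#) stays in the match.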
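Python's re module does not support \K, so only the group-based alternative applies there. A sketch of it, written without the trailing \b so that a trailing # stays in the capture, and with re.M so ^ matches at every line start (the multi-line sample string and JOB_LINE_RE are assumptions for illustration):

```python
import re

# Group-based variant (no \K in Python's re), without a trailing \b so a trailing "#"
# stays in the capture. re.M lets ^ match at the start of every line.
JOB_LINE_RE = re.compile(r"^Job\s-?\s?(\S*)", re.M)

text = "Job R#DFUH34 RE143 JRU89\nJob - W0WEIN12SD# UIS10 TTPSOS45"
print(JOB_LINE_RE.findall(text))
```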

Notepad++ and delimiters: automatically replace ``string'' by \command{string}

Within Notepad++, I want to replace many instances of the type ``string'' by \command{string} where string can be any string of characters. I am fairly close to what I want to achieve with:
Find: (?<=``)(.*?)(?='')
Replace: \\command{\1}
There is still a problem. With the regex above, instead of \command{string} I get ``\command{string}'', and I am not sure why the `` and '' are not removed.
It is because you are using lookaround assertions. Lookaround (zero-width) assertions only assert that a position can be matched; they do not "consume" any characters in the string, so the `` and '' are never part of the match and are not replaced. You can use the regular expression below instead.
Find: ``([^']+)''
Replace: \\command{\1}
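You need to match the delimiters and capture only the text between them, then use that group in the replacement. Notepad++ does support lookahead and lookbehind (that is why the question's pattern matched at all), but you don't need them for this specific case anyway: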
``([^']+)'' -> \\command{\1}
Because [^']+ cannot run past a quote, this will not match across two commands (as a greedy longest match would) in something like:
run ``ls -l'' or ``ls -a''
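The difference between the lookaround and the consuming versions can be reproduced with Python's re.sub; Notepad++ uses a different regex engine, so this only illustrates the behaviour and is not a Notepad++ test:

```python
import re

text = "run ``ls -l'' or ``ls -a''"

# Lookaround version from the question: the delimiters are asserted, not consumed,
# so they survive the replacement.
lookaround = re.sub(r"(?<=``)(.*?)(?='')", r"\\command{\1}", text)

# Consuming version: the delimiters are part of the match, so they are replaced too.
consuming = re.sub(r"``([^']+)''", r"\\command{\1}", text)

print(lookaround)
print(consuming)
```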

regex limiting wildcards for url folders

I'd like to set up a regular expression that matches certain patterns for a URL:
http://www.domain.com/folder1/folder2/anything/anything/index.html
This matches, and gets the job done:
/^http:\/\/www\.domain\.com\/folder1\/folder2\/.*\/.*\/index\.html([\?#].*)?$/.test(location.href)
I'm unsure how to limit the wildcards to one folder each. So how can I prevent the following from matching:
http://www.domain.com/folder1/folder2/folder3/folder4/folder5/index.html
(note: folder 5+ is what I want to prevent)
Thanks!
Try this regular expression:
/^http:\/\/www\.domain\.com\/(?:\w+\/){1,3}index\.html([\?#].*)?$/
Change the number 3 to the maximum folder depth you want to allow (for folder1/folder2/anything/anything, that is 4).
. matches any character (except a newline).
[^/] matches any character except /.
Since the / character marks the beginning and end of regex literals, you may have to escape it like this: [^\/].
So, replacing .* by [^\/]* will do what you want:
/^http:\/\/www\.domain\.com\/folder1\/folder2\/[^\/]*\/[^\/]*\/index\.html([\?#].*)?$/.test(location.href)
/^http:\/\/www\.domain\.com\/folder1\/folder2\/[^/]*\/[^/]*\/index\.html([\?#].*)?$/
Inside [], the / does not need escaping in a JavaScript regex literal (ES5 and later allow it), so both forms work.
EDIT: Acknowledging tom's comment, using + instead of * (so each folder name must be non-empty):
/^http:\/\/www\.domain\.com\/folder1\/folder2\/[^/]+\/[^/]+\/index\.html([\?#].*)?$/
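A quick Python comparison of the .* wildcard and the [^/]+ fix, with the JavaScript /.../ delimiters and slash escapes removed (DOT_STAR, ONE_FOLDER and the variable names are illustrative):

```python
import re

# The question's pattern, with .* as the folder wildcard.
DOT_STAR = re.compile(r"^http://www\.domain\.com/folder1/folder2/.*/.*/index\.html([?#].*)?$")
# The answer's fix: [^/]+ matches exactly one non-empty folder name.
ONE_FOLDER = re.compile(r"^http://www\.domain\.com/folder1/folder2/[^/]+/[^/]+/index\.html([?#].*)?$")

ok = "http://www.domain.com/folder1/folder2/anything/anything/index.html"
too_deep = "http://www.domain.com/folder1/folder2/folder3/folder4/folder5/index.html"

for name, rx in [("dot-star", DOT_STAR), ("one-folder", ONE_FOLDER)]:
    print(name, bool(rx.match(ok)), bool(rx.match(too_deep)))
```

The .* version accepts the five-folder URL because .* happily spans folder3/folder4; [^/]+ cannot cross a /, so the extra folder makes the match fail.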
/^http:\/\/www\.domain\.com\/\([^/]*\/\)\{2\}/
And you can change 2 to whatever number of directories you want to match.
You may use:
/^http:\/\/www\.domain\.com\/folder1\/folder2\/(\w*\/){2}index\.html([\?#].*)?$/.test(location.href)