REGEX: select everything to the left until the first specified delimiter


I'm using ColdFusion functions to query an Active Directory Database, return Membership information for a user, then REGEX functions to search the output for specified groups. I made "|" a delimiter.
Anyway, here's some example output:
CN=Group One,OU=Distribution Lists,DC=Domain,DC=org|CN=Group Two,OU=Distribution Lists,DC=Domain,DC=org|CN=Group Three,OU=Distribution Lists,DC=Domain,DC=org|CN=Group Four,OU=Distribution Lists,DC=Domain,DC=org|CN=Group Five,OU=Distribution Lists,DC=Domain,DC=org
What I would like to capture is this:
CN=Group Three,OU=Distribution Lists,DC=Domain,DC=org
Here is what I've tried so far:
^|CN=(.*Group? Three)
Here's a link to the example: http://rubular.com/r/DIGZOPwTag
What's my problem?
Well, this doesn't work all that great... It goes to the left, but it goes too far! How do I stop it at the first occurrence of |CN= to the left?
Thank you in advance for your time. It is appreciated.
!!Clarification!!
Better Example Output:
CN=Pay Band 50,OU=Distribution Lists,DC=Domain,DC=org|CN=Human Resources,OU=Distribution Lists,DC=Domain,DC=org|CN=SiteA Staff,OU=Distribution Lists,DC=Domain,DC=org|CN=SiteB Additional Staff,OU=Distribution Lists,DC=Domain,DC=org|CN=Executives,OU=Distribution Lists,DC=Domain,DC=org
Desired matches:
I'm looking for specific groups:
Site Name w/Possible Spaces Staff
Site Name w/Possible Spaces Additional Staff
It would be awesome to return: "Site Alpha Staff", "Site Beta Additional Staff". It would also be acceptable to include the "CN=" prefix because I could use it to do queries later.
"Staff" and "Additional Staff" will always be part of the group(s) I want to match.
What I've tried, again
^|CN=[^|CN=]*? Staff|Additional? Staff
This new example is not quite right, as it doesn't grab all of "Site Beta". "Site Beta" could be the name of any building, for example.
example link: http://rubular.com/r/vq5JcrvaBR

It is not really clear what you want to extract: only the "Group Three" CN value, or all CN values.
You can extract every CN value with this regex:
CN=([^,]*)
This regex starts capturing after each occurrence of "CN=" and continues until the first comma (,).
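As a rough illustration of the pattern above, here is a minimal Python sketch run against the "Better Example Output" from the question. Python is an assumption (the question uses ColdFusion), and the second pattern, which keeps only the groups ending in " Staff", is my own variation rather than something the answer proposed.

```python
import re

membership = (
    "CN=Pay Band 50,OU=Distribution Lists,DC=Domain,DC=org|"
    "CN=Human Resources,OU=Distribution Lists,DC=Domain,DC=org|"
    "CN=SiteA Staff,OU=Distribution Lists,DC=Domain,DC=org|"
    "CN=SiteB Additional Staff,OU=Distribution Lists,DC=Domain,DC=org|"
    "CN=Executives,OU=Distribution Lists,DC=Domain,DC=org"
)

# Every CN value: capture everything after "CN=" up to the next comma.
all_groups = re.findall(r"CN=([^,]*)", membership)

# Hypothetical variation: only CN values that end in " Staff".
# [^,|]* cannot cross a comma or a pipe, so the match stays inside one CN.
staff_groups = re.findall(r"CN=([^,|]* Staff),", membership)

print(all_groups)
print(staff_groups)
```

Because "Additional Staff" also ends in " Staff", one pattern covers both kinds of group the question asks for.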

A regex that fits your requirements is
^.*?\|
It lazily matches everything from the start of the string up to and including the first | character.
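The lazy pattern stops at the first | counted from the start of the whole string. To keep a match from running "too far to the left" of a particular group, one option (my suggestion, not part of the answer) is a negated character class, which cannot cross a delimiter. A minimal Python sketch, with Python itself being an assumption:

```python
import re

sample = (
    "CN=Group One,OU=Distribution Lists,DC=Domain,DC=org|"
    "CN=Group Two,OU=Distribution Lists,DC=Domain,DC=org|"
    "CN=Group Three,OU=Distribution Lists,DC=Domain,DC=org|"
    "CN=Group Four,OU=Distribution Lists,DC=Domain,DC=org|"
    "CN=Group Five,OU=Distribution Lists,DC=Domain,DC=org"
)

# ^.*?\| : lazily consume from the start of the string up to the first "|".
first_entry = re.match(r"^(.*?)\|", sample).group(1)

# [^|]* can never step over a "|", so the match is confined to the one
# entry that contains "Group Three".
group_three = re.search(r"[^|]*Group Three[^|]*", sample).group(0)

print(first_entry)
print(group_three)
```

Note that an unescaped | in the original attempts means alternation, which is why ^|CN=... did not behave like a literal delimiter.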

Related

regex group matching based on first entry

As part of a regex match, I am trying to select "development" or "product" depending on whether the first entry is dd-develop or dd.
The code given below always matches development, whether the first entry is "dd-develop" or just "dd".
I want to pick the second or third word based on the first value.
Any ideas?
Regex: (?(?=) (?:development) | (?:product))
Text: dd-develop development product.

From the looks of it, you're trying to decide whether to capture "development" or "product" based on the first word. This regex does that:
(?:dd-develop .*(development).*)|(?:dd .*(product).*)
If your string starts with dd-develop, it captures "development". If it starts with dd, it captures "product". To reverse this, just switch the words in the capture group.
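A small Python sketch of this idea follows. Python and the helper name pick are assumptions for illustration. Since each alternative has its own capture group, exactly one of the two groups is set on a successful match.

```python
import re

# Each branch has its own capture group; only the branch that matched
# leaves its group populated.
PATTERN = re.compile(r"(?:dd-develop .*(development).*)|(?:dd .*(product).*)")

def pick(text):
    """Return the word selected by the first entry, or None if nothing matches."""
    m = PATTERN.match(text)  # match() anchors at the start of the string
    if m is None:
        return None
    return m.group(1) or m.group(2)

print(pick("dd-develop development product"))
print(pick("dd development product"))
```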

How to extract all IMDb ID's from string

I have a block of text that I want to search for IMDb links; if one is found, I want to extract the IMDb ID.
Here is an example string:
http://www.imdb.com/Title/tt2618986
http://www.google.com/tt2618986
https://www.imdb.com/Title/tt2618986
http://www.imdb.com/title/tt1979376/?ref_=nv_sr_1?ref_=nv_sr_1
I want to extract only the numeric ID (2618986 or 1979376 here) from lines 1, 3 and 4.
Here is the regex I am currently using, without any luck:
(?:http|https)://(?:.*\.|.*)imdb.com/(?:t|T)itle(?:\?|/)(..\d+)(.+)?
https://regex101.com/r/ERtoRz/1

If you are interested in extracting only the numeric ID, i.e. 2618986, none of the comments quite nail it, since they match tt2618986. Building on @The fourth bird's answer, you need to split tt2618986 into two parts, tt and 2618986. So instead of a single ([a-zA-Z0-9]+), use [a-zA-Z]+([0-9]+).
^https?://www\.imdb\.com/[Tt]itle[?/][a-zA-Z]+([0-9]+)
You can then extract the 2618986 part by calling group 1.
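For instance, a minimal Python sketch (Python is an assumption; the question does not name a language) that applies the pattern to the four sample lines, using the MULTILINE flag so that ^ matches at the start of every line:

```python
import re

IMDB_ID = re.compile(
    r"^https?://www\.imdb\.com/[Tt]itle[?/][a-zA-Z]+([0-9]+)",
    re.MULTILINE,  # let ^ match at the beginning of each line
)

text = """http://www.imdb.com/Title/tt2618986
http://www.google.com/tt2618986
https://www.imdb.com/Title/tt2618986
http://www.imdb.com/title/tt1979376/?ref_=nv_sr_1?ref_=nv_sr_1"""

# With a single capture group, findall returns just the group 1 values.
ids = IMDB_ID.findall(text)
print(ids)
```

The google.com line is skipped because the host part of the pattern does not match.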

This expression might also extract the desired digits, provided the case-insensitive flag is enabled so that /Title/ matches as well as /title/:
^(?:https?://)(?:www\.)?imdb\.com/title/[a-z]+([0-9]+).*$
If you wish to explore, simplify, or modify the expression, regex101.com explains each part of it in its top-right panel.

How do I use regex to return text following specific prefixes?

I'm using an application called Firemon which uses regex to pull text out of various fields. I'm unsure what specific version of regex it uses, I can't find a reference to this in the documentation.
My raw text will always be in the following format:
CM: 12345
APP: App Name
BZU: Dept Name
REQ: First Last
JST: Text text text text.
CM will always be an integer, JST will be sentence that may span multiple lines, and the other fields will be strings that consist of 1-2 words - and there's always a return after each section.
The application, Firemon, has me create a regex entry for each field. Something simple that looks for each prefix and then a return should work, because I return after each value. I've tried several variations, such as "BZU:\s*(.*)", but can't seem to find something that works.
EDIT: To be clear I'm trying to get the value after each prefix. Firemon has a section for each field. "APP" for example is a field. I need a regex example to find "APP:" and return the text after it. So something as simple as regex that identifies "APP:", and grabs everything after the : and before the return would probably work.

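You can use \w+:\s*(.*)
The prefix, the colon, and any whitespace after it are matched outside the capture group, so group 1 contains only the text after the prefix. A lookahead such as (?=\w+ ) would not do this, because a lookahead consumes nothing and the prefix would still end up in the group.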

I am a little late to the game, but maybe this is still an issue.
In the more recent versions of FireMon, sample regexes are provided. For instance:
jst:\s*([^;]*?)\s*;
will match on:
jst:anything in here;
and result in
anything in here
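For the newline-terminated format in the question (no trailing ;), a per-field pattern can be sketched in Python as below. This is only an illustration: the regex flavour FireMon uses is unknown, the helper names field and justification are hypothetical, the second JST line is invented sample data, and the sketch assumes JST is always the last field.

```python
import re

raw = """CM: 12345
APP: App Name
BZU: Dept Name
REQ: First Last
JST: Text text text text.
More text on a second line."""

def field(prefix, text):
    """Return the rest of the line after 'PREFIX:', or None if absent."""
    # [ \t]* instead of \s* so an empty field cannot swallow the next line.
    m = re.search(rf"^{re.escape(prefix)}:[ \t]*(.*)$", text, re.MULTILINE)
    return m.group(1) if m else None

def justification(text):
    """Return everything after 'JST:', which may span several lines."""
    m = re.search(r"^JST:[ \t]*(.*)", text, re.MULTILINE | re.DOTALL)
    return m.group(1).strip() if m else None

print(field("APP", raw))
print(justification(raw))
```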

Ultraedit, regular expression help, extracting 2 values, comma separated

I have this file where I only want to extract the email address and first name from our client list.
So a sample from the file:
a#abc.com,www.abc.com,2011-11-15 00:00:00,8.8.8.8,John,Doe,209 Park Rd,See,FL,33870,,,
b#abc.com,cde.com,2011-11-07 00:00:00,4.4.4.4,Erickson,Crast,136 Kua St # 1367,Pearl,HI,96782,,8084568190,
I would like to get back
a#abc.com,John
b#abc.com,Erickson
So basically email address and First Name
I know I can do this in powershell but maybe a find and replace in ultraedit will be faster
Note: some fields are not provided, so they show up as ",," (those fields were left empty when the user signed up), but the number of commas in each line is always the same: 12.

So basically the fields are separated by ",". Ignoring the format of the individual values (the email address, timestamp, etc. each have a format that could also be checked), let's just extract the values of the first and fifth fields.
I'd suggest a replace operation where you search for
^([^,]*),[^,]*,[^,]*,[^,]*,([^,]*),.*$
and replace it with
\1 # \2
Options: "Regular Expressions: Unix".
(I inserted the # only as a separator; a single space would be sufficient. Use \1,\2 instead to get the comma-separated output shown in the question.)
Result:
a#abc.com # John
b#abc.com # Erickson
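The same search-and-replace can be tried outside UltraEdit. Here is a minimal Python sketch (Python is an assumption; the question mentions PowerShell as the alternative) using the two sample lines as they appear in the question and a plain comma as the separator:

```python
import re

lines = [
    "a#abc.com,www.abc.com,2011-11-15 00:00:00,8.8.8.8,John,Doe,209 Park Rd,See,FL,33870,,,",
    "b#abc.com,cde.com,2011-11-07 00:00:00,4.4.4.4,Erickson,Crast,136 Kua St # 1367,Pearl,HI,96782,,8084568190,",
]

# Group 1 = first field, then three fields are skipped, group 2 = fifth field.
PATTERN = re.compile(r"^([^,]*),[^,]*,[^,]*,[^,]*,([^,]*),.*$")

result = [PATTERN.sub(r"\1,\2", line) for line in lines]
print(result)
```

For data where a field may itself contain a quoted comma, a CSV parser would be safer than a regex.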

RegEx pattern to handle URL with dates

I moved to a new website and it mangled my URLs. Blog posts are now accessible from multiple URLs, and I would like to redirect one pattern to the other.
I am trying to redirect the first case to the second case:
~/blogs/johndoe/john-doe/2014/03/14/test-article1 =>
~/blogs/john-doe/2014/03/14/test-article1
~/blogs/jimjones/jim-jones/2014/03/14/test-articleb =>
~/blogs/jim-jones/2014/03/14/test-articleb
How do I create a pattern smart enough to slice out the first "johndoe" and "jimjones"? I am using this for IIS rewrite but I think any RegEx should work. Thanks for any help.

This works:
^~/blogs/\w+/(\w+)-(\w+)/(\d{4})/(\d\d)/(\d\d)/([\w-]+)$
It just discards the non-dash name. It doesn't know whether it's equal to the dash name or not. It also assumes that the date numbers are valid: 9899/45/33 would be matched.
Capture groups:
First name
Last name
Year
Month
Day
Article name
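To see how the six capture groups can be reassembled into the target URL, here is a minimal Python sketch. Python and the helper name canonical are assumptions; in IIS the rebuild would be done with back-references in the rewrite action instead.

```python
import re

PATTERN = re.compile(
    r"^~/blogs/\w+/(\w+)-(\w+)/(\d{4})/(\d\d)/(\d\d)/([\w-]+)$"
)

def canonical(url):
    """Drop the duplicated non-dash name; leave other URLs unchanged."""
    m = PATTERN.match(url)
    if m is None:
        return url
    first, last, year, month, day, article = m.groups()
    return f"~/blogs/{first}-{last}/{year}/{month}/{day}/{article}"

print(canonical("~/blogs/johndoe/john-doe/2014/03/14/test-article1"))
print(canonical("~/blogs/jimjones/jim-jones/2014/03/14/test-articleb"))
```

A URL that is already in the short form does not match, because \w+ cannot match the hyphen in john-doe, so it is returned unchanged.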

I don't know about IIS rewrites, but this should work:
^~/blogs/[a-z]+/ -> ~/blogs/
The regular expression matches the start of the string, followed by ~/blogs/, followed by a run of lowercase letters and a slash. Replacing that match with ~/blogs/ drops the first name segment.

I don't use IIS, but this should be at least close.
Pattern:
^blogs/\w+/(\w+/)
Action:
blogs/{R:1}
