How to extract all IMDb IDs from a string - regex

I have a block of text in which I want to search for IMDb links; if one is found, I want to extract the IMDb ID.
Here is an example string:
http://www.imdb.com/Title/tt2618986
http://www.google.com/tt2618986
https://www.imdb.com/Title/tt2618986
http://www.imdb.com/title/tt1979376/?ref_=nv_sr_1?ref_=nv_sr_1
I want to extract only the numeric ID (e.g. 2618986) from lines 1, 3 and 4.
Here is the regex I am currently using, without any luck:
(?:http|https)://(?:.*\.|.*)imdb.com/(?:t|T)itle(?:\?|/)(..\d+)(.+)?
https://regex101.com/r/ERtoRz/1

If you are only interested in extracting the numeric ID, i.e. 2618986, none of the comments quite nail it, since they match tt2618986. Building on The fourth bird's answer, you need to split tt2618986 into two parts: tt and 2618986. So instead of a single ([a-zA-Z0-9]+), use [a-zA-Z]+([0-9]+).
^https?://www\.imdb\.com/[Tt]itle[?/][a-zA-Z]+([0-9]+)
You can then extract the 2618986 part by calling group 1.
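As a minimal sketch of how group 1 could be pulled out in practice, here is the same expression run in Python over the example lines from the question. The choice of Python, the names IMDB_ID and text, and the use of re.MULTILINE (so that ^ matches at the start of every line in the block) are assumptions made for illustration.

```python
import re

# The expression from the answer above; re.MULTILINE lets ^ match at each line start.
IMDB_ID = re.compile(
    r"^https?://www\.imdb\.com/[Tt]itle[?/][a-zA-Z]+([0-9]+)",
    re.MULTILINE,
)

text = """http://www.imdb.com/Title/tt2618986
http://www.google.com/tt2618986
https://www.imdb.com/Title/tt2618986
http://www.imdb.com/title/tt1979376/?ref_=nv_sr_1?ref_=nv_sr_1"""

# With a single capturing group, findall returns just the captured digits.
ids = IMDB_ID.findall(text)
print(ids)  # ['2618986', '2618986', '1979376']
```

The google.com line is skipped because the host is matched literally, and the fourth line yields its own ID, 1979376.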

This expression might simply extract those desired digits (use it with the case-insensitive i flag so that /Title/ is matched as well as /title/):
^(?:https?://)(?:www\.)?imdb\.com/title/[a-z]+([0-9]+).*$
If you wish to explore/simplify/modify the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link how it would match against some sample inputs.

Related

Is there a regex pattern to extract values in between quote marks when there may be a quote mark inside the quoted text?

I am trying to create a regex pattern to extract an element from the below structures:
Video 'https://www.linkedin.com/in/arjun-sofat-2990939b' rendered
Video 'Pennythornes's Bar - NW9 8LU' rendered
I am looking to extract the elements for the unique name.
So the general structure would be
Video 'EXTRACTHERE' rendered
For the two examples given above, the regex should extract:
https://www.linkedin.com/in/arjun-sofat-2990939b
Pennythornes's Bar - NW9 8LU
Many thanks :)
Please try this regex:
/Video ['"](.*)['"] rendered/g
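To see why this handles the apostrophe inside the name, here is a small sketch in Python (the language and the names PATTERN, log and names are assumptions for illustration). The greedy (.*) runs as far as it can and then backs off to the last quote that is directly followed by " rendered", so an inner quote stays inside the capture.

```python
import re

# Same pattern as above, written as a Python raw string.
PATTERN = re.compile(r"""Video ['"](.*)['"] rendered""")

log = """Video 'https://www.linkedin.com/in/arjun-sofat-2990939b' rendered
Video 'Pennythornes's Bar - NW9 8LU' rendered"""

# . does not match a newline by default, so each line is matched separately.
names = PATTERN.findall(log)
print(names)
```

Note that if one line contained two "Video '...' rendered" entries, the greedy group would span both; a lazy (.*?) would be the safer choice in that case.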

How to combine multiple RegEx commands for Notepad++ using capture groups and alternations?

I am converting exported SQL views as files to a different syntax using a separate specialized conversion tool. This tool can't handle certain commands and formatting so I'm using Notepad++ with RegEx to alter the files ahead of time.
So far I am getting the results that I want, but it takes three separate Find/Replace actions. I'd like to reduce these three RegEx actions down to one if possible.
Find: (.*)(CREATE VIEW.*\nGO)(.*)
Replace: \2
Find: (CREATE VIEW )(.*)(\r\nAS)
Replace: \1"\2"\3
Find: (oldschema1\.|\[oldschema1\]\.|\[|\]|TOP \(100\) PERCENT|oldschema2\.)|(^GO$)|(\A^(.*?))
Replace: (?1)(?2\;)(?3SET SCHEMA schemaname\; \n\n\1)
I'm using Notepad++ 7.7.1 64-bit, Find/Replace with Regular Expression search mode - ". matches newline" check on.
You'll see in my code that I'm already using capture groups with alternation. I thought I could combine the first two RegEx steps as additional capture groups to Step 3 but it doesn't work out, possibly because they are nested.
I tried referencing the nested groups by incrementing the referencing number accordingly, but it doesn't work (blanks out the result).
Here is an example SQL view file. It's not a working view because I added "oldschema2" so the RegEx would have something to find for one of the replacements, but it's representative as an example here.
garbage
text
beforehand
CREATE VIEW [oldschema1].[viewname]
AS
SELECT DISTINCT
TOP (100) PERCENT oldschema1.TABLENAME.FIELD1, oldschema1.TABLENAME.FIELD2
FROM oldschema1.TABLENAME
WHERE (oldschema1.TABLENAME.FIELD3 = N'Z003') AND oldschema2.TABLENAME.FIELD2 = 1
ORDER BY oldschema1.TABLENAME.FIELD1
GO
garbage
text
after
Here are some additional details of what I'm trying to achieve with each pass.
Notepad++ RegEx Step 1 - isolate view block from CREATE VIEW to GO
Find:
(.*)(CREATE VIEW.*\nGO)(.*)
Replace:
\2
Step 2 - put quotes around view name
Find:
(CREATE VIEW )(.*)(\r\nAS)
Replace:
\1"\2"\3
Step 3 - remove/replace various texts and insert a line at the beginning of the file
Find:
(oldschema1\.|\[oldschema1\]\.|\[|\]|TOP \(100\) PERCENT|oldschema2\.)|(^GO$)|(\A^(.*?))
Replace:
(?1)(?2\;)(?3SET SCHEMA schemaname\; \n\n\1)
The expected output from the above example would be:
SET SCHEMA schemaname;
CREATE VIEW "viewname"
AS
SELECT DISTINCT
TABLENAME.FIELD1, TABLENAME.FIELD2
FROM TABLENAME
WHERE (TABLENAME.FIELD3 = N'Z003') AND TABLENAME.FIELD2 = 1
ORDER BY TABLENAME.FIELD1
;
which I achieve with the above three steps, but I'd like to do it in one Find/Replace if possible.
I'm pretty new to RegEx, and StackOverflow for that matter. Your help is greatly appreciated.
Step 1
I'm not so sure about it, but I'm guessing that maybe we would want an expression similar to:
[\s\S]*?(CREATE VIEW[\s\S]*GO\s*)[\s\S]*
to be replaced with $1, where our desired data is in this capturing group:
(CREATE VIEW[\s\S]*GO\s*)
and we can even remove \s*:
(CREATE VIEW[\s\S]*GO)
and just try:
[\s\S]*?(CREATE VIEW[\s\S]*GO)[\s\S]*
with an m flag.
In the right panel of this demo, the expression is further explained, if you might be interested.
Step 2
We can likely try:
(CREATE VIEW)(.*)
and replace with:
SET SCHEMA schemaname;\n\n$1 "viewname"
Step 3
This step would probably be done with an expression similar to:
TOP \(100\) PERCENT |oldschema1\.
being replaced with an empty string.
Step 4
\s*GO can be replaced with \n; or just ; and that should likely give the desired output, though I'm not sure.
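If pre-processing the files outside Notepad++ is an option, the passes can also be chained in a short script. The following is only a sketch in Python, not a single Notepad++ Find/Replace: the function name convert_view is hypothetical, the schema names are taken from the example view in the question, and it assumes one CREATE VIEW ... GO block per file with \n line endings.

```python
import re

def convert_view(sql: str) -> str:
    """Hypothetical helper that applies the passes from the question in sequence."""
    # Isolate the block from CREATE VIEW through the first line that is exactly GO.
    m = re.search(r"CREATE VIEW.*?^GO$", sql, flags=re.DOTALL | re.MULTILINE)
    if m is None:
        raise ValueError("no CREATE VIEW ... GO block found")
    block = m.group(0)
    # Remove schema prefixes, TOP (100) PERCENT, and any remaining brackets.
    block = re.sub(
        r"\[oldschema1\]\.|oldschema1\.|oldschema2\.|TOP \(100\) PERCENT ?|[\[\]]",
        "",
        block,
    )
    # Put double quotes around the view name (the block starts with CREATE VIEW).
    block = re.sub(r"^(CREATE VIEW )(\S+)", r'\1"\2"', block)
    # Turn the terminating GO line into a semicolon.
    block = re.sub(r"^GO$", ";", block, flags=re.MULTILINE)
    # Insert the SET SCHEMA line at the top.
    return "SET SCHEMA schemaname;\n\n" + block

example = """garbage
text
beforehand
CREATE VIEW [oldschema1].[viewname]
AS
SELECT DISTINCT
TOP (100) PERCENT oldschema1.TABLENAME.FIELD1, oldschema1.TABLENAME.FIELD2
FROM oldschema1.TABLENAME
WHERE (oldschema1.TABLENAME.FIELD3 = N'Z003') AND oldschema2.TABLENAME.FIELD2 = 1
ORDER BY oldschema1.TABLENAME.FIELD1
GO
garbage
text
after"""

print(convert_view(example))
```

Doing the schema and bracket removal before quoting the view name keeps the quoting pattern simple, since by then the name is a single bare word.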

How do I use regex to return text following specific prefixes?

I'm using an application called Firemon which uses regex to pull text out of various fields. I'm unsure what specific version of regex it uses, I can't find a reference to this in the documentation.
My raw text will always be in the following format:
CM: 12345
APP: App Name
BZU: Dept Name
REQ: First Last
JST: Text text text text.
CM will always be an integer, JST will be a sentence that may span multiple lines, and the other fields will be strings that consist of 1-2 words - and there's always a return after each section.
The application, Firemon, has me create a regex entry for each field. Something simple that looks for each prefix and then a return should work, because I return after each value. I've tried several variations, such as "BZU:\s*(.*)", but can't seem to find something that works.
EDIT: To be clear I'm trying to get the value after each prefix. Firemon has a section for each field. "APP" for example is a field. I need a regex example to find "APP:" and return the text after it. So something as simple as regex that identifies "APP:", and grabs everything after the : and before the return would probably work.
You can use (?=\w+ )(.*)
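The positive lookahead makes the match start where a word followed by a space begins, so for values of two or more words the prefix and the space after it are left out and each match gives you the text after the prefix. A single-word value such as the CM number contains no such space and will not match.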
I am a little late to the game, but maybe this is still an issue.
In the more recent versions of FireMon, sample regexes are provided. For instance:
jst:\s*([^;]*?)\s*;
will match on:
jst:anything in here;
and result in
anything in here
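For the per-field layout in the question, one regex per prefix along the following lines may be a starting point. This is a sketch in Python's re syntax only; FireMon's regex flavor is not documented in this thread, so flags such as multiline mode may have to be set differently there. The helper names single_line_field and trailing_field are hypothetical.

```python
import re

sample = (
    "CM: 12345\n"
    "APP: App Name\n"
    "BZU: Dept Name\n"
    "REQ: First Last\n"
    "JST: Text text text text.\n"
    "More justification text.\n"
)

def single_line_field(prefix: str, text: str):
    """Return the text after 'PREFIX:' up to the end of that line, or None."""
    m = re.search(rf"^{re.escape(prefix)}:[ \t]*(.*)$", text, flags=re.MULTILINE)
    return m.group(1).strip() if m else None

def trailing_field(prefix: str, text: str):
    """Return everything after 'PREFIX:' to the end of the text (multi-line JST)."""
    m = re.search(
        rf"^{re.escape(prefix)}:[ \t]*(.*)", text, flags=re.MULTILINE | re.DOTALL
    )
    return m.group(1).strip() if m else None

print(single_line_field("CM", sample))   # 12345
print(single_line_field("APP", sample))  # App Name
print(trailing_field("JST", sample))
```

The single-line fields rely on . not matching a newline; the JST field switches that off with re.DOTALL because it is the last section and may wrap onto further lines.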

kimonolabs >Text before comma

I'm trying to scrape a piece of text from a website using Kimonolabs. The text is successfully scraped using the advanced setting:
div > div > ul > li.location > span.value
The text being scraped using this CSS selector is:
Cityname, streetname 1
However, I wish to delete everything after the comma so that only this remains:
Cityname
I wish to do this with regex, but I'm totally ignorant about it. What I do know is that it has to consist of 3 blocks when using Kimonolabs: https://help.kimonolabs.com/hc/en-us/articles/203043464-Manually-input-regular-expressions
Can anybody help me setting up the correct regex? All I got so far is the following, but it's not the correct markup for Kimonolabs (it doesn't allow for it in the dashboard):
^(.+?),
See the docs you referred to:
The regular expression pattern in kimono is defined in three parts. It's important that any custom regular expression you produce retains the three part notation, with the surrounding ( ) for each part. The first part refers to the pattern to the left of the desired content. The middle part refers to the pattern that the desired content must match and the third part refers to the pattern to the right of the desired content.
So, you seem to need:
/^()([^,]+)()/
Or, /(^)([^,]+)(,)/ (it should be equivalent), and the 2nd capture group (the middle part) should capture the Cityname.
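A quick way to check both three-part patterns outside Kimono is to run them in Python and read the second capture group. This is only a sketch: Python's re module stands in for whatever engine Kimono uses, and the helper name middle_part is hypothetical.

```python
import re

scraped = "Cityname, streetname 1"

def middle_part(pattern: str, text: str):
    """Return capture group 2 (the 'desired content' part), or None if no match."""
    m = re.search(pattern, text)
    return m.group(2) if m else None

# Both three-part variants from the answer above.
for pattern in (r"^()([^,]+)()", r"(^)([^,]+)(,)"):
    print(pattern, "->", middle_part(pattern, scraped))
```

The two variants differ only for text without a comma: the first still returns the whole string, while the second requires the comma and returns no match.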

REGEX: select everything to the left until the first specified delimiter

I'm using ColdFusion functions to query an Active Directory Database, return Membership information for a user, then REGEX functions to search the output for specified groups. I made "|" a delimiter.
Anyway, here's some example output:
CN=Group One,OU=Distribution Lists,DC=Domain,DC=org|CN=Group Two,OU=Distribution Lists,DC=Domain,DC=org|CN=Group Three,OU=Distribution Lists,DC=Domain,DC=org|CN=Group Four,OU=Distribution Lists,DC=Domain,DC=org|CN=Group Five,OU=Distribution Lists,DC=Domain,DC=org
What I would like to capture is this:
CN=Group Three,OU=Distribution Lists,DC=Domain,DC=org
Here is what I've tried so far:
^|CN=(.*Group? Three)
Here's a link to the example: http://rubular.com/r/DIGZOPwTag
What's my problem?
Well, this doesn't work all that great... It goes to the left, but it goes too far! How do I stop it at the first occurrence of |CN= to the left?
Thank you in advance for your time. It is appreciated.
!!Clarification!!
Better Example Output:
CN=Pay Band 50,OU=Distribution Lists,DC=Domain,DC=org|CN=Human Resources,OU=Distribution Lists,DC=Domain,DC=org|CN=SiteA Staff,OU=Distribution Lists,DC=Domain,DC=org|CN=SiteB Additional Staff,OU=Distribution Lists,DC=Domain,DC=org|CN=Executives,OU=Distribution Lists,DC=Domain,DC=org
Desired matches:
I'm looking for specific groups:
Site Name w/Possible Spaces Staff
Site Name w/Possible Spaces Additional Staff
It would be awesome to return: "SiteAlpha Staff", "Site Beta Additional Staff". It would also be acceptable to include the "CN=" prefix because I could use it to do queries later.
"Staff" and "Additional Staff" will always be part of the group(s) I want to match.
What I've tried, again
^|CN=[^|CN=]*? Staff|Additional? Staff
This new example is not quite perfect as it doesn't grab all of "Site Beta". "Site Beta" could be the name of any building, for example.
example link: http://rubular.com/r/vq5JcrvaBR
It is not really clear what you want to extract, if only the "Group Three" CN value or all CN values.
You can extract every CN value with this regex:
CN=([^,]*)
This regex begins extracting after each "CN=" occurrence and continues until the first comma (,).
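One way to get from there to the Staff groups is to capture every CN value first and then filter by suffix. The sketch below uses Python rather than ColdFusion purely for illustration; the variable names are assumptions, and the input is the "Better Example Output" from the question.

```python
import re

membership = (
    "CN=Pay Band 50,OU=Distribution Lists,DC=Domain,DC=org|"
    "CN=Human Resources,OU=Distribution Lists,DC=Domain,DC=org|"
    "CN=SiteA Staff,OU=Distribution Lists,DC=Domain,DC=org|"
    "CN=SiteB Additional Staff,OU=Distribution Lists,DC=Domain,DC=org|"
    "CN=Executives,OU=Distribution Lists,DC=Domain,DC=org"
)

# Capture everything after each CN= up to the next comma.
all_groups = re.findall(r"CN=([^,]*)", membership)

# Keep only names ending in " Staff"; this also covers "... Additional Staff".
staff_groups = [name for name in all_groups if name.endswith(" Staff")]

print(all_groups)
print(staff_groups)  # ['SiteA Staff', 'SiteB Additional Staff']
```

Splitting the work this way avoids the character-class problem in ^|CN=[^|CN=]*?, where [^|CN=] excludes the individual characters |, C, N and = rather than the sequence "|CN=".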
A RegEx to fit your demands is
^.*?\|