RegEx include to next blank space - regex

I'm trying to write a regexp that will find the letters "AD" followed by 4 number digits. In front of AD there should be a blank space.
Example: AD1239
My code: \bAD[0-9]{4}
The next part I don't know how to do. If there is an attached hyphen followed by characters... I want them to be included until the next empty space.
Example: asdf AD3213-4332 asd
The above should output AD3213-4332
Any help is appreciated, Thanks

You can use this regex:
\bAD[0-9]{4}(?:-\S+)?
Here (?:-\S+)? is a non capturing group that will match an optional group that is a hyphen followed by 1+ non-space characters.
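As a quick check, here is a minimal Python sketch of that pattern using the standard re module. The sample strings come from the question; the variable names are only illustrative.

```python
import re

# Pattern from the answer: "AD", four digits, then an optional
# hyphen followed by one or more non-whitespace characters.
pattern = re.compile(r"\bAD[0-9]{4}(?:-\S+)?")

samples = ["asdf AD3213-4332 asd", "AD1239"]

# findall returns the full match because the only group is non-capturing.
matches = [pattern.findall(s) for s in samples]
print(matches)
```

Note that \b only requires a word boundary, so "AD" directly after a hyphen or other punctuation would also match. If whitespace or the start of the string is strictly required in front of "AD", replacing \b with (?<!\S) is one alternative.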

Related

How to find regex for multiple conditions

I am trying to find regex which would find below matches. I would replace these with blank. I am able to create regex for few of these conditions individually, but I am not able to figure out how to create one regex for all of these
Strings:
song1 artist (SiteWithMp3Keyword.com).mp3
02.song2 | siteWithdownloadKeyword.in 320 Kbps
song3 [SitewithDjKeyword.in] 128kbps.mp3
Output
song1 artist.mp3
song2
song3.mp3
Criteria for match:
Case Insensitive
Find Strings with particular keyword and remove whole word, even if inside any braces
Find kbps keyword and remove it along with any number before it (128/320)
if string ends in .mp3, keep it as it is.
Remove junk characters (like | ) and replace _ with space.
Remove number if present at start of string, like 001_ 02. etc.
Trim whitespaces before and after remaining string
Example Regex for 2.
\S+(mp3|dj|download)\S+
https://regex101.com/r/nxp4d3/1
Try this regex ....
Find:^[0-9. ]*(song\d+ (\w+ )?).*?(\.mp3 ?)?$
Replace with:$1$3
P.S. If this code doesn't solve your problem, please share a sample of your real data so that someone can better understand it.
Thanks...
For the example data, you might use:
^\h*(?:\d+\W*)?(\w+(?:\h+\w+)*).*?(\.mp3)?\h*$
The pattern matches:
^ Start of string
\h* Match optional leading spaces
(?:\d+\W*)? Optionally match 1+ digits followed by optional non-word characters
(\w+(?:\h+\w+)*) Capture group 1, match word characters optionally repeated with a space in between
.*? Match any character except a newline, as few as possible
(\.mp3)? Optionally capture .mp3 in group 2
\h* Match optional trailing spaces
$ End of string
Replace with capture group 1 and group 2
$1$2
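A minimal Python sketch of this replacement, run against the three example strings from the question. One assumption to note: Python's re module does not support \h, so [ \t] is substituted for it here, and the replacement is written as \1\2 instead of $1$2.

```python
import re

# Assumption: [ \t] stands in for \h, which Python's re does not support.
pattern = re.compile(
    r"^[ \t]*(?:\d+\W*)?(\w+(?:[ \t]+\w+)*).*?(\.mp3)?[ \t]*$"
)

names = [
    "song1 artist (SiteWithMp3Keyword.com).mp3",
    "02.song2 | siteWithdownloadKeyword.in 320 Kbps",
    "song3 [SitewithDjKeyword.in] 128kbps.mp3",
]

# Keep group 1 (the leading words) and group 2 (the optional ".mp3").
# In Python 3.5+ an unmatched group is replaced by an empty string.
cleaned = [pattern.sub(r"\1\2", name) for name in names]
print(cleaned)
```

This only covers names shaped like the examples; criteria such as replacing _ with a space would need a separate step.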

Match certain string on second line of text with regex

I'm new to regex, and would appreciate some guidance/help.
Currently, I'm looking to write an expression, that derives a certain part of text from the 2nd line of the provided text.
Here is the text:
123 anywhere Avenue
Winnipeg, Manitoba R3E 0L7
Canada
Pharmacy Manager: person person
Pharmacy Licence Holder/Owner: 123456 Manitoba Ltd.
My goal is to derive the 'Manitoba' string from the second line, however I'd like to make it dynamic rather than writing an expression to always fetch Manitoba as a static. I used the below code to target the second line:
(.*)(?=(\n.*){3}$)
(It matches 3 lines up from the last line, thus targeting the desired line)
I noticed that, within the dataset, the Province (Manitoba) is always between two spaces.
Is there any addition I can make to the code, so that the expression only targets the second line, then matches the first string in-between spaces?
Perhaps using a lazy expression with a positive lookaround?
If I target all matches in between spaces, it would take both 'Manitoba' and 'R3E 0L7', which I don't want.
I want it to only match the first piece of text in between spaces on the second line.
Any help is much appreciated :-)
Thanks.
One option could be to match the first line, then capture the second word on the second line in capturing group 1.
Then match the rest of the second line and assert that what follows is exactly 3 more lines up to the end of the string.
^.*\r?\n\S+[^\S\r\n]+(\S+).*(?=(?:\r?\n.*){3}$)
In parts:
^ Start of string
.*\r?\n Match the whole first line and a newline
\S+ Match 1+ non whitespace char (the first "word")
[^\S\r\n]+ Match 1+ times a whitespace char except newlines
(\S+) Capture group 1, match 1+ times a non whitespace char (the second "word")
.* Match the rest of the line
(?= Positive lookahead, assert what follows on the right is
(?:\r?\n.*){3}$ Match 3 times a newline followed by 0+ times any except a newline and assert the end of the string
) Close lookahead
You could also turn the lookahead into a match instead
^.*\r?\n\S+[^\S\r\n]+(\S+).*(?:\r?\n.*){3}$
Regex demo
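For illustration, the lookahead variant could be tried in Python roughly like this. The text is the sample from the question; the variable names are made up for this sketch.

```python
import re

text = (
    "123 anywhere Avenue\n"
    "Winnipeg, Manitoba R3E 0L7\n"
    "Canada\n"
    "Pharmacy Manager: person person\n"
    "Pharmacy Licence Holder/Owner: 123456 Manitoba Ltd."
)

# First line, then the first "word" of line two, horizontal whitespace,
# and the second "word" captured in group 1. The lookahead requires
# exactly three more lines before the end of the string.
pattern = re.compile(r"^.*\r?\n\S+[^\S\r\n]+(\S+).*(?=(?:\r?\n.*){3}$)")

match = pattern.search(text)
province = match.group(1) if match else None
print(province)
```

Because the pattern counts lines from both ends, it only matches blocks of exactly five lines, as in the sample.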

Regex match the characters with same character in the given string

I am working on validating the pan card numbers. I need to check that the first character and the fifth character should be same while validating the pan card. Whatever the first character in the below string the same should be matched with the fifth character. Can anyone help me in applying the above condition?
Regex I have tried : [A-Za-z]{4}\d{4}[A-Za-z]{1}
Here is my pan card example: ABCDA9999K
If you want to match the full example string where the first A should match up with the fifth A, the pattern has to match 5 letters, [A-Za-z]{5}, instead of 4, [A-Za-z]{4}.
You could use a capturing group with a backreference ([A-Za-z])[A-Za-z]{3}\1 to account for the first 5 chars.
You might add word boundaries \b to the start and end to prevent a partial match or add anchors to assert the start ^ and the end $ of the string.
This part of the pattern {1} can be omitted.
([A-Za-z])[A-Za-z]{3}\1\d{4}[A-Za-z]
Regex demo
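A small Python sketch of the backreference approach, assuming the whole string must be a PAN number. The helper name is_valid_pan is hypothetical, and fullmatch is used in place of explicit ^ and $ anchors.

```python
import re

# Group 1 captures the first letter; \1 requires the fifth character
# to be that same letter (the comparison is case-sensitive).
PAN_PATTERN = re.compile(r"([A-Za-z])[A-Za-z]{3}\1\d{4}[A-Za-z]")


def is_valid_pan(value: str) -> bool:
    """Return True if value has the PAN shape and chars 1 and 5 are equal."""
    return PAN_PATTERN.fullmatch(value) is not None


print(is_valid_pan("ABCDA9999K"))
print(is_valid_pan("ABCDE9999K"))
```

Since the backreference compares the captured text exactly, "aBCDA9999K" is rejected; the re.IGNORECASE flag would relax that if mixed case should be accepted.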

Regex to match an unlimited repeating pattern between two strings

I have a dataset with repeating pattern in the middle:
YM10a15b5c27
and
YM1b5c17
How can I get what is between "YM" and the last two numbers?
I'm using this, but the last group ends up with only one digit, and the other digit stays in the middle group, which it should not:
/([A-Z]+)([0-9a-z]+)([0-9]+)/
Capture exactly two characters in the last group:
/([A-Z]+)([0-9a-z]+)([0-9]{2})/
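A short Python sketch of the {2} version on the two sample strings from the question (variable names are illustrative). The middle group is greedy, so it first grabs everything and then gives back exactly two digits for the last group.

```python
import re

pattern = re.compile(r"([A-Z]+)([0-9a-z]+)([0-9]{2})")

samples = ["YM10a15b5c27", "YM1b5c17"]

# groups() returns (prefix, middle, last two digits) for each sample.
results = [pattern.search(s).groups() for s in samples]
print(results)
```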
You should use:
/^(?:([a-z]+))([0-9a-z]+)(?=\1)/
^ matches the start of the string. This is really important, because if your code is aaaa1234aaaa, then without the ^ the match could also start at the aaaa at the end.
(?:([a-z]+)) is a non-capturing group wrapped around capture group 1, which matches one or more letters from 'a' to 'z'.
(?=\1) is a lookahead: the match only succeeds if it is followed by the same text that group 1 captured at the start.
All you have to do is extract the middle part with group(2).
An example is shown here.
Solution
If you want to match these strings as whole words, use \b(([a-z])\2)([0-9a-z]+)(\1)\b. If you need to match them as separate strings, use ^(([a-z])\2)([0-9a-z]+)(\1)$.
Explanation
\b - a word boundary (or if ^ is used, start of string)
(([a-z])\2) - Group 1: any lowercase ASCII letter, exactly two occurrences (aa, bb, etc.)
([0-9a-z]+) - Group 3: 1 or more digits or lowercase ASCII letters
(\1) - Group 4: the same text as stored in Group 1
\b - a word boundary (or if $ is used, end of string).
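Both variants can be sketched in Python as follows. The sample strings are invented for this illustration and are not from the question.

```python
import re

# Whole-string variant: a doubled letter, a middle part, the same doubled letter.
anchored = re.compile(r"^(([a-z])\2)([0-9a-z]+)(\1)$")
# Whole-word variant: the same pattern between word boundaries.
as_word = re.compile(r"\b(([a-z])\2)([0-9a-z]+)(\1)\b")

hit = anchored.search("aa1234aa")
middle = hit.group(3) if hit else None

miss = anchored.search("aa1234bb")  # different letters at the end

word_hit = as_word.search("id: bb77xbb end")
word = word_hit.group(0) if word_hit else None
print(middle, miss, word)
```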

Regex to find Upper case character at beginning of each word in a field

I created a function that will compare a field against a regex and return 0 if it doesn't match the pattern and 1 if it does. I've already created the class so I could create a UDF for the pattern matching.
function(expression,rexex) //If it matches it
I have been researching regex in SQL server for a bit this weekend and am at a bit of a crossroad.
I basically need to have the following pattern with 1 passing and 0 failing. Basically I want the first letter of every word to be capitalized:
the dog is bad - 0
The Dog Is Bad - 1
I'm ashamed to say that it's taken me all day just to figure out how to identify the first letter of each word and see if it's capital.
Here is what I have so far.
[\p{Lu}\p{Lt}]
Any help or nudge in the right direction would be appreciated.
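Start of string (^), followed by one or more groups ((...)+) of a capital letter ([A-Z]), zero or more word characters (\w*), and then one or more whitespace characters or the end ((\s+|$)). The closing $ makes sure the whole string is consumed; without it, a string such as "The dog" would still match on "The ".
/^([A-Z]\w*(\s+|$))+$/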
This assumes letters only, and only one space per word:
^((?:\b[A-Z][a-z]*\b) {0,1})+$
Free spaced:
^ //Start of line
( //(Capture)
(?: //(Non-capture)
\b // Followed by word boundary
[A-Z] // Followed by a capital letter
[a-z]* // Followed by zero or more lowercase letters
\b // Followed by word boundary
) {0,1} // Followed by either no space, or one space
)+ // One or more times
$ //End of line
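To show the 1/0 behaviour the question asks for, here is a minimal Python sketch around this pattern. The function name matches_title_case is hypothetical, and Python stands in for the SQL Server UDF, whose implementation is not shown in the question.

```python
import re

# Each word: a capital ASCII letter, then lowercase letters only,
# optionally followed by a single space; repeated to the end of the string.
TITLE_PATTERN = re.compile(r"^((?:\b[A-Z][a-z]*\b) {0,1})+$")


def matches_title_case(text: str) -> int:
    """Return 1 if every word starts with a capital letter, else 0."""
    return 1 if TITLE_PATTERN.search(text) else 0


print(matches_title_case("The Dog Is Bad"))
print(matches_title_case("the dog is bad"))
```

As stated above, this assumes letters only, so digits, punctuation, or double spaces make the string fail.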
You can use a Negative Lookahead (?!) to validate the line/sentence:
/(?!.*?\b[a-z].*?\b)^.*?$/gm
This will not pass on any string or line which has a word that begins with a lowercase letter.
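A small Python sketch of the lookahead approach, with re.MULTILINE standing in for the m flag (findall already returns every match, so there is no equivalent of g). The three-line input is made up from the question's examples.

```python
import re

# The negative lookahead rejects a line if any word in it starts with
# a lowercase ASCII letter; otherwise the whole line is matched.
pattern = re.compile(r"(?!.*?\b[a-z].*?\b)^.*?$", re.MULTILINE)

text = "The Dog Is Bad\nthe dog is bad\nThe dog Is Bad"

passing_lines = pattern.findall(text)
print(passing_lines)
```

Unlike the anchored pattern, this one tolerates digits and punctuation, but a lowercase letter after an apostrophe (as in "Dog's") also sits at a word boundary and makes the line fail.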
As it seems you want to be Unicode compatible, I'd do:
(?:^|\s+)(\p{Lu}\p{Ll}*)