Regex match only the third occurrence of a pattern - regex

The example text is as follows:
01MAR2015 01MAR2015 Example Example
02MAR2015 Example Example Example
03MAR2015 Example Example $2.45
I want to select all the text from the third date (second row) all the way to the dollar amount. I don't know how to skip the first two dates. Thanks for any help.
Expected output:
02MAR2015 Example Example Example
03MAR2015 Example Example $2.45
What I have for now:
([0-9]{2}[A-Z]{3}[0-9]{4}) # to match the date
((\d)*\.(\d){2}) # to match the dollar amount
(?<=([0-9]{2}[A-Z]{3}[0-9]{4}){2})\1.*((\d)*\.(\d){2}) # my attempt

You seem to need to match the text starting at the second line. In AHK, you may use PCRE compatible patterns.
Use
(?<=\n)[0-9]{2}[A-Z]{3}[0-9]{4}[\w\W]*
See the regex demo.
Details
(?<=\n) - matching will start after a newline
[0-9]{2} - 2 digits
[A-Z]{3} - 3 uppercase letters
[0-9]{4} - 4 digits
[\w\W]* - any 0+ chars as many as possible.
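For a quick check outside AHK, the same pattern can be tried in Python, whose `re` module also accepts this lookbehind and `[\w\W]`. This is only a sketch: the variable names are illustrative, and the sample text is copied from the question.

```python
import re

# Sample text from the question, one row per line.
text = (
    "01MAR2015 01MAR2015 Example Example\n"
    "02MAR2015 Example Example Example\n"
    "03MAR2015 Example Example $2.45"
)

# Start at the first date that directly follows a newline,
# then take every remaining character, newlines included.
pattern = re.compile(r"(?<=\n)[0-9]{2}[A-Z]{3}[0-9]{4}[\w\W]*")

match = pattern.search(text)
selected = match.group(0) if match else None
print(selected)
```

Note that this relies on the third date being the first date at the start of a line after the first row, which is what the sample data looks like.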

Related

regex matching consecutive characters from start and end

I'm trying to match a string that contains three consecutive identical characters at the beginning of the line and six of the same character at the end.
for example
CCC i love regex CCCCCC
the C's would be highlighted from search
I have found a way to get the first three and the last six using these two regexes, but I'm struggling to combine them
^([0-9]|[aA-zZ])\1\1 and ([0-9]|[aA-zZ])\1\1\1\1\1$
appreciate any help
If you want just one regular expression to "highlight" only the 1st three characters and last six, maybe use:
(?:^([0-9A-Za-z])\1\1(?=.*\1{6}$)|([0-9A-Za-z])\2{5}(?<=^\2{3}.*)$)
See an online demo
(?: - Open non-capture group to allow for alternations;
^([0-9A-Za-z])\1\1(?=.*\1{6}$) - Start-line anchor with a 1st capture group followed by two backreferences to that same group. This is followed by a positive lookahead to assert that the very last 6 characters are the same;
| - Or;
([0-9A-Za-z])\2{5}(?<=^\2{3}.*)$ - The alternative is to match a 2nd capture group with 5 backreferences to the same followed by a positive lookbehind (zero-width) to check that the first three characters are the same.
Now, if you don't want to be too strict about "highlighting" the other parts, just use capture groups:
^(([0-9A-Za-z])\2\2).*(\2{6})$
See an online demo, where you can now refer to both capture groups 1 and 3.
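The capture-group version uses only features that Python's `re` module supports, so it can be sketched there. The helper name `find_ends` is hypothetical; it returns the spans of groups 1 and 3 so the caller can highlight them.

```python
import re

# Group 1: the leading triple, group 2: the repeated character,
# group 3: the trailing run of six of that same character.
pattern = re.compile(r"^(([0-9A-Za-z])\2\2).*(\2{6})$")

def find_ends(line):
    """Return (span of first three, span of last six), or None if no match."""
    m = pattern.match(line)
    if m is None:
        return None
    return m.span(1), m.span(3)

print(find_ends("CCC i love regex CCCCCC"))
```

A line whose trailing run uses a different character than the leading triple is rejected, because group 3 is built from a backreference to group 2.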

Why is my regex only giving me ONE group back?

I'm currently having issues with a regex that I'm creating. The regex has to extract all the groups of the form number #### between Hello and Regards. At the moment my regex only extracts one group, and I need all the groups inside. In this case there are 2, but there may be more.
I'm using the web page https://regex101.com/
Flavor: PCRE (PHP)
Regex: Hello\s.*(number\s*[\d]*)\s.*Regards
Text:
This is my test text number 25120
Hello my name is testing
I'm 20 years old
Please help me with the regex number 1542
I have been trying to create the regex many times this is my number 5152
Regards
I'm still trying my attempt number 5150
Result:
My result is only the group number 5152, but there is another group inside, number 1542.
You may use
(?si)(?:\G(?!\A)|\bHello\b)(?:(?!\bHello\b).)*?\K\bnumber\s*\d+(?=.*?\bRegards\b)
See the regex demo.
Details
(?si) - s - DOTALL modifier making . match any chars, and i makes the pattern case insensitive
(?:\G(?!\A)|\bHello\b) - either the end of the previous match (\G(?!\A)) or (|) a whole word Hello (\bHello\b)
(?:(?!\bHello\b).)*? - any char, 0 or more times but as few as possible, that does not start a whole word Hello char sequence
\K - match reset operator that discards all text matched so far
\bnumber - a whole word number
\s* - 0+ whitespaces
\d+ - 1+ digits
(?=.*?\bRegards\b) - there must be a whole word Regards somewhere after any 0+ chars (as few as possible).
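The pattern above depends on `\G` and `\K`, which are PCRE features and are not available in every engine. For comparison, here is a sketch of a different, two-step technique in Python: first isolate each Hello ... Regards block, then collect the numbers inside it. The names `BLOCK`, `NUMBER` and `numbers_between` are made up for this illustration, and the sketch does not reproduce every nuance of the single-pattern solution (for example, it does not stop at a second Hello inside a block).

```python
import re

text = """This is my test text number 25120
Hello my name is testing
I'm 20 years old
Please help me with the regex number 1542
I have been trying to create the regex many times this is my number 5152
Regards
I'm still trying my attempt number 5150"""

# Step 1: the shortest stretch of text from a whole word Hello to a whole word Regards.
BLOCK = re.compile(r"\bHello\b(.*?)\bRegards\b", re.DOTALL | re.IGNORECASE)
# Step 2: every "number ####" occurrence inside such a stretch.
NUMBER = re.compile(r"\bnumber\s*\d+", re.IGNORECASE)

def numbers_between(text):
    found = []
    for block in BLOCK.finditer(text):
        found.extend(NUMBER.findall(block.group(1)))
    return found

print(numbers_between(text))
```

The numbers before Hello and after Regards are skipped because they never fall inside a captured block.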

Regex Giftcard number pattern

I am trying to come up with a regex for a giftcard number pattern in an application. I have this so far and it works fine:
(?:5049\d{12}|6219\d{12}) = 5049123456789012
What I need to account for, though, is numbers that are separated by dashes or spaces like so:
5049-1234-5678-9012
5049 1234 5678 9012
Can I chain these patterns together or do I need to make separate for each type?
The simplest regex could be:
(?:(5049|6219)([ -]?\d{4}){3})
Explanation:
(5049|6219) - Will check for the '5049' or '6219' start
(x){3} - Will repeat the (x) 3 times
[ -]? - Will look for " " or "-", ? accepts it once or 0 times
\d{4} - Will look for a digit 4 times
A more detailed explanation and example can be found here: https://regex101.com/r/A46GJp/1/
Use (?:5049|6219)(?:[ -]?\d{4}){3}
First, match one of the two leads. Then match 3 groups of 4 digits each, each group optionally preceded by space or dash.
See regex101 for a demo, which also explains the pattern in more detail.
The above regex will also match if separators are mixed, e.g. 5049 1234-5678 9012. If you don't want that, use
(?:5049|6219)([ -]?)\d{4}(?:\1\d{4}){2} regex101
This captures the first separator, if any, and specifies that the following 2 groups must use that same separator.
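The backreference version can be checked with a short Python sketch. The helper name `is_card` is hypothetical, and `fullmatch` is used here on the assumption that the whole string should be a card number.

```python
import re

# Group 1 captures the first separator (space, dash, or nothing);
# \1 then forces the next two separators to be the same.
CARD = re.compile(r"(?:5049|6219)([ -]?)\d{4}(?:\1\d{4}){2}")

def is_card(value):
    return CARD.fullmatch(value) is not None

for sample in ("5049123456789012", "5049-1234-5678-9012",
               "5049 1234 5678 9012", "5049 1234-5678 9012"):
    print(sample, is_card(sample))
```

When no separator is present, group 1 captures an empty string, so the backreference also matches nothing and the undelimited 16-digit form still passes.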
Try this :
(?:(504|621)9(\d{12}|(\-\d{4}){3}|(\s\d{4}){3}))
https://regex101.com/r/SyjaT5/6

Regex for Chilean RUT/RUN with PCRE

I'm having issues validating the Chilean RUT/RUN with a regular expression in PCRE. I have the following regular expression but sadly can't make it work:
\b[0-9|.]{1,10}\-[K|k|0-9]
I need help to see what is wrong with the code. The application I need to use only uses PCRE.
Thank you.
You may use
^(\d{1,3}(?:\.\d{1,3}){2}-[\dkK])$
to match and capture (that is not usually necessary, but your app requires a capturing group to extract its contents) a whole string that matches the pattern. See the regex demo.
To match shorter strings that match this pattern inside a larger string, you may remove ^ and $ (see demo) or use \b word boundaries instead (see this demo).
Details:
^ - start of string
\d{1,3} - 1 to 3 digits
(?:\.\d{1,3}){2} - 2 sequences of a literal . and 1 to 3 digits
- - a hyphen
[\dkK] - a digit, k or K.
$ - end of string.
As they sometimes omit the dots, I used this one:
^(\d{1,2}(?:[\.]?\d{3}){2}-[\dkK])$
Details:
^ - start of string
\d{1,2} - 1 or 2 digits
(?:[.]?\d{3}){2} - 2 sequences of an optional '.' and 3 digits
- - a hyphen
[\dkK] - a digit, k or K
$ - end of string
1234567-k OK
12345678-k OK
1.234.567-k OK
12.345.678-k OK
known issue:
12.345678-k and 12345.678-k still OK and I do not like this :(
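One possible way around the mixed-dot issue, offered as an untested-in-PCRE sketch rather than a definitive fix, is to capture the first optional dot and require the second separator to be the same via a backreference. The pattern and the helper name `is_rut` below are assumptions for illustration, written in Python; the backreference syntax is the same in PCRE.

```python
import re

# Group 2 captures the first separator ("." or nothing);
# \2 forces the second separator to match it.
RUT = re.compile(r"^(\d{1,2}(\.?)\d{3}\2\d{3}-[\dkK])$")

def is_rut(value):
    return RUT.match(value) is not None

for sample in ("1234567-k", "12345678-k", "1.234.567-k",
               "12.345.678-k", "12.345678-k", "12345.678-k"):
    print(sample, is_rut(sample))
```

With this variant the first four samples are accepted and the two mixed forms are rejected. It only checks the shape of the string, not the check digit.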
You need to change it to ^(\d{1,3}(?:\.\d{3}){2}-[\dkK])$ to capture only 2 sequences of 3 digits after the first sequence of 1-3 digits.
Please consider being more specific when building the regex, since it matched wrong numbers, such as 17.87.335-2. Also, the included one didn't match formats without the dots or the hyphens.
Please consider using the following format: \b(\d{1,3}(?:(.?)\d{3}){2}(-?)[\dkK])\b
Modified prior version to try the other formats: https://regex101.com/r/2Us0j6/9

Using Notepad++ Regex to format phone numbers

I'm trying to format phone numbers in a large CSV directory. I will need to re-format this periodically as it changes so this is not a one-off solution. I have used Notepad++'s regex replace feature successfully in the past and would like to use this tool if possible. However, I'm open to better/faster methods including scripting like PowerShell, which I am familiar with.
Sample of number formats in the database:
XXX-XXXX
XXXXXXX
XXXXXXXXXX
1XXXXXXXXXX
(XXX) XXX-XXXX
1(XXX) XXX-XXXX
(1XXX) XXX-XXXX
XXX-XXX-XXXX
That last one is what I want all phone numbers to look like in the final output. For the one that is lacking the area code, I would add a default value. For the ones with extra country codes, I would need to truncate it.
Here are some of the regex searches I've used:
FIND: 1-(\d{3})-(\d{3})-(\d{4})
REPLACE: \1-\2-\3
This works!
FIND: 1\((\d{3})\)\s(\d{3})-(\d{4})
REPLACE: \1-\2-\3
This works!
FIND: (\d{11})
REPLACE: ???
This finds the correct string, but I don't know how to format the output.
FIND: (\d{3})-(\d{4})
REPLACE: XXX-\1-\2 (here the XXX is my standard area code that I will add)
This finds the correct substring in XXX-XXX-XXXX as well as XXX-XXXX and zip codes with +4 appended (XXXXX-XXXX). Need to just find the XXX-XXXX without anything preceding it and just from phone numbers. Because this is a CSV file, the actual character before each field is a comma.
My problem is twofold. 1) I don't know how to break up a found string into the parts I need for the replace. I need to convert blocks of digits (7, 10 and 11 digits) and format them to fit the pattern XXX-XXX-XXXX. 2) I don't know how to select just the string I'm searching for (i.e. only XXX-XXXX)
Provided you have a sample list of numbers like
Current            Expected
---------------------------------
123-1234           XXX-123-1234
1234567            XXX-123-4567
1234567890         123-456-7890
10123456789        012-345-6789
(123) 456-1234     123-456-1234
1(123) 123-1234    123-123-1234
1-123-123-1234     123-123-1234
(1999) 999-1234    999-999-1234
123-123-1234       123-123-1234
You may use
Find What: ^(?:1-?)?(?|\(1?(\d{3})\)|(\d{3}))[-\s]?(\d{3})[-\s]?(\d{4})$|^(\d{3})[-\s]?(\d{4})$
Replace With: (?1$1-$2-$3:XXX-$4-$5)
Details:
^ - start of line
(?:1-?)? - optional sequence of 1 and an optional -
(?|\(1?(\d{3})\)|(\d{3})) - a branch reset group (syntax is (?|...), all groups inside alternative branches receive same IDs) matching either:
\(1?(\d{3})\) - ( + an optional 1 + Group 1 capturing 3 digits + )
| - or
(\d{3}) - Group 1 (still! because of a branch reset group) capturing 3 digits
[-\s]? - an optional - or whitespace
(\d{3}) - Group 2 capturing 3 digits
[-\s]? - an optional - or whitespace
(\d{4}) - Group 3 capturing 4 digits
$ - end of line
| - OR
^ - start of line
(\d{3}) - Group 4 capturing 3 digits
[-\s]? - an optional - or whitespace
(\d{4}) - Group 5 capturing 4 digits
$ - end of line
The replacement pattern:
(?1 - If Group 1 matched, then use
$1-$2-$3 - Backreference to Group 1, 2 and 3 with hyphens in between
: - or else
XXX-$4-$5 - XXX (or whatever the default area code is), and Group 4 and 5 separated with a hyphen.
) - end of the if-then block.
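If the same logic is ever moved into a script, note that branch reset groups and conditional replacements are not available in Python's `re` module. The sketch below is therefore an adaptation, not the Notepad++ pattern itself: the two area-code alternatives get separate groups, and a small function (the hypothetical `reformat_phone`) plays the role of the conditional replacement.

```python
import re

# Groups: 1 = area code in parentheses, 2 = bare area code,
# 3 and 4 = the 3+4 digits of a full number,
# 5 and 6 = the 3+4 digits of a number with no area code.
PHONE = re.compile(
    r"^(?:1-?)?(?:\(1?(\d{3})\)|(\d{3}))[-\s]?(\d{3})[-\s]?(\d{4})$"
    r"|^(\d{3})[-\s]?(\d{4})$"
)

def reformat_phone(number, default_area="XXX"):
    m = PHONE.match(number)
    if m is None:
        return number  # leave unrecognized values untouched
    area_paren, area_plain, prefix, line, short_prefix, short_line = m.groups()
    if short_prefix is not None:
        return f"{default_area}-{short_prefix}-{short_line}"
    return f"{area_paren or area_plain}-{prefix}-{line}"

print(reformat_phone("1(123) 123-1234"))
```

The function expects one phone field at a time, so a CSV would need to be split into fields before calling it.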
I'm not familiar with PowerShell, but yes, it would be a good idea to make a small script to do this for you.
For the Notepad++ approach, though, I'd try running the replace twice:
FIND: (?:^|,)(\d{3})[ -]?(\d{4})(?:,|$)
REPLACE: XXX-\1-\2 where the XXX is your input area code
FIND: \(?1?\(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})
REPLACE: \1-\2-\3
I don't think the order matters. Try it out in a test file first.
I'm not sure what you mean by your second question. Are the regexes selecting numbers from the wrong column in the CSV? (If so, that's another reason why a script would be better.)
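As a rough idea of what such a script could look like, here is a minimal Python sketch that takes a different route from the regexes above: it strips everything except digits and then decides by length. The function name `normalize_phone` and the `default_area` parameter are made up for the example, and it assumes it is only ever called on the phone column, so zip codes never reach it.

```python
import re

def normalize_phone(raw, default_area="XXX"):
    digits = re.sub(r"\D", "", raw)       # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]               # drop the leading country code
    if len(digits) == 7:
        return f"{default_area}-{digits[:3]}-{digits[3:]}"
    if len(digits) == 10:
        return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
    return raw                            # unexpected length: leave as-is

print(normalize_phone("(1999) 999-1234"))
```

Reading the CSV with a proper parser and applying the function to the phone field only would also sidestep the problem of matching in the wrong column.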