Regex to extract Credit Card information

I have some data from which I only want to extract the card details, i.e. card number, expiry month/year and CVV.
I tried many patterns, but none of them worked for both formats: if one gets matched, the other won't.
Test Data:
4400634848591837Cvv: 362Expm: 04Expy: 20
4400634848591837:04 20 362
4400634848591837|04/24 362
4400634848591837 0420 362
Regex:
(\d{16})[\/\s:|]*?(\d\d)[\/\s|]*?(\d{2,4})[\/\s|-]*?(\d{3})
This matches the rest of them, but I haven't figured out how to match the first line. I have tried positive and negative lookahead and lookbehind, but they never worked for me, so any help would be great.
Demo: Here

The part after the 16 digits for the first line has a different format, and the order of the values is also different.
You can use an alternation | with 3 additional capture groups to get the values for the Cvv format.
Note that you don't have to make the character class [\/\s|-]*? non-greedy with ?, because the characters it matches cannot overlap with the digits that follow.
\b(\d{16})(?:[\/\s:|]*(\d\d)[\/\s|]*(\d{2,4})[\/\s|-]*(\d{3})|Cvv:\s*(\d{3})Expm:\s*(\d\d)Expy:\s*(\d\d))\b
\b A word boundary to prevent a partial match
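(\d{16}) Capture the 16-digit card number in group 1
(?:[\/\s:|]*(\d\d)[\/\s|]*(\d{2,4})[\/\s|-]*(\d{3}) Non-capturing group whose first alternative is the pattern for the last 3 lines, capturing month, year and CVV in groups 2, 3 and 4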
| Or
Cvv:\s*(\d{3})Expm:\s*(\d\d)Expy:\s*(\d\d)) The pattern for the first line: match the literal labels and capture CVV, month and year in groups 5, 6 and 7, then close the non-capturing group
\b A word boundary
Regex demo
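A minimal sketch of applying the combined pattern, assuming Python's built-in re module (the question does not say which language is used). The helper name extract_card is hypothetical. Because the two alternatives fill different groups (2 to 4 for the delimited lines, 5 to 7 in CVV, month, year order for the Cvv: line), the helper checks which branch matched and returns the values in one fixed order:

```python
import re

# Combined pattern from the answer above (split over two raw strings).
CARD_RE = re.compile(
    r"\b(\d{16})(?:[\/\s:|]*(\d\d)[\/\s|]*(\d{2,4})[\/\s|-]*(\d{3})"
    r"|Cvv:\s*(\d{3})Expm:\s*(\d\d)Expy:\s*(\d\d))\b"
)

def extract_card(line):
    """Return (number, month, year, cvv), or None if the line does not match."""
    m = CARD_RE.search(line)
    if m is None:
        return None
    number = m.group(1)
    if m.group(2) is not None:
        # Delimited formats: month, year, CVV are in groups 2, 3, 4.
        return number, m.group(2), m.group(3), m.group(4)
    # Labelled format: CVV, month, year are in groups 5, 6, 7.
    return number, m.group(6), m.group(7), m.group(5)

test_lines = [
    "4400634848591837Cvv: 362Expm: 04Expy: 20",
    "4400634848591837:04 20 362",
    "4400634848591837|04/24 362",
    "4400634848591837 0420 362",
]
for line in test_lines:
    print(extract_card(line))
```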

Related

Negate a character group to replace all other characters

I have the following string:
"Thu Dec 31 22:00:00 UYST 2009"
I want to replace everything except for the hours and minutes so I get the following result:
"22:00"
I am using this regex :
(^([0-9][0-9]:[0-9][0-9]))
But it's not matching anything.
This would be my line of actual code :
println("Thu Dec 31 22:00:00 UYST 2009".replace("(^([0-9][0-9]:[0-9][0-9]))".toRegex(),""))
Can someone help me to correct the regex?
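Yours isn't working because the ^ asserts that the line starts right before the hours and minutes, which isn't the case. Removing the assertion (^) fixes the match. Note, however, that replace() removes whatever the pattern matches, so with replace() you would delete 22:00 rather than keep it; to keep it, read the match with find() instead.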
If you need the assertion to remain, there is another way. In most languages, you wouldn't be able to use a variable-length positive lookbehind here, but lucky for you, it looks like you can in Kotlin.
A positive lookbehind tells the pattern "this comes before what I'm looking for". It is written as a group beginning with ?<=. Here you can use something like (?<=^[\w ]+), which requires that only word characters or spaces appear between the beginning of the line and the point where the rest of the pattern matches. Putting it in front of your expression gives (?<=^[\w ]+)([0-9][0-9]:[0-9][0-9]) (note that in a regular Kotlin string literal you have to escape the backslash as \\w, otherwise the compiler rejects it; in a raw """...""" string you don't).
Side note: Yogesh_D is correct in saying that \d\d:\d\d is the same as your [0-9][0-9]:[0-9][0-9]. Using this, it would look more like (?<=^[\w ]+)\d\d:\d\d.
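A rough Python equivalent (an assumption: Python's built-in re module rather than Kotlin). Python's re only supports fixed-width lookbehind, so (?<=^[\w ]+) is rejected there and the sketch swaps in a capture group anchored at ^. It also shows why the replace() call in the question cannot work as written: replacing deletes what the pattern matches, so the time has to be kept through a backreference or read from a match:

```python
import re

text = "Thu Dec 31 22:00:00 UYST 2009"

# Replacing the hh:mm match with "" deletes the time instead of keeping it.
removed = re.sub(r"\d\d:\d\d", "", text)

# Python's re refuses variable-width lookbehind such as (?<=^[\w ]+).
try:
    re.compile(r"(?<=^[\w ]+)\d\d:\d\d")
    variable_lookbehind_ok = True
except re.error:
    variable_lookbehind_ok = False

# Workaround: consume the leading words/spaces and capture hh:mm in group 1.
m = re.match(r"^[\w ]+(\d\d:\d\d)", text)
hh_mm = m.group(1) if m else None

# Replacement-based version of what the question attempts:
# match the whole string and substitute only the captured group back in.
kept = re.sub(r"^.*?(\d\d:\d\d).*$", r"\1", text)

print(removed)  # Thu Dec 31 :00 UYST 2009
print(hh_mm)    # 22:00
print(kept)     # 22:00
```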
You may use various solutions; here are two:
val text = """Thu Dec 31 22:00:00 UYST 2009"""
val match = """\b(?:0?[1-9]|1\d|2[0-3]):[0-5]\d\b""".toRegex().find(text)
println(match?.value)
val match2 = """\b(\d{1,2}:\d{2}):\d{2}\b""".toRegex().find(text)
println(match2?.groupValues?.getOrNull(1))
Both return 22:00. See regex #1 demo and regex #2 demo.
The regex complexity should be selected based on how messy the input string is.
Details
\b - a word boundary
(?:0?[1-9]|1\d|2[0-3]) - an optional zero and then a non-zero digit, or 1 and any digit, or 2 and a digit from 0 to 3
: - a : char
[0-5]\d - 0, 1, 2, 3, 4 or 5 and then any one digit
\b - a word boundary.
If there is a match with this regex, you get it as a whole match, so you can access it via match?.value.
If you do not have to worry about any pre-validation when matching, you may simply match 3 colon-separated digit pairs and capture the first two; see the second regex:
\b - a word boundary
(\d{1,2}:\d{2}) - Group 1: one or two digits, : and two digits
:\d{2} - a : and two digits (not captured)
\b - a word boundary.
If there is a match, we need Group 1 value, hence match2?.groupValues?.getOrNull(1) is used.
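To illustrate the trade-off between the two regexes, here is a sketch in Python's re (an assumption; both patterns use only \b, \d, groups and alternation, which should behave the same way there as in Kotlin's Regex for ASCII input). The helpers strict_time and loose_time and the invalid timestamp 25:61:00 are made up for the example:

```python
import re

STRICT = re.compile(r"\b(?:0?[1-9]|1\d|2[0-3]):[0-5]\d\b")  # regex #1
LOOSE = re.compile(r"\b(\d{1,2}:\d{2}):\d{2}\b")            # regex #2

def strict_time(text):
    # Regex #1: the whole match is the validated hh:mm.
    m = STRICT.search(text)
    return m.group(0) if m else None

def loose_time(text):
    # Regex #2: hh:mm is in group 1, with no range checks.
    m = LOOSE.search(text)
    return m.group(1) if m else None

for sample in ["Thu Dec 31 22:00:00 UYST 2009", "Thu Dec 31 25:61:00 UYST 2009"]:
    print(strict_time(sample), loose_time(sample))
# 22:00 22:00
# None 25:61
```

The stricter pattern rejects the out-of-range time, while the simpler one returns it unchanged, which is only acceptable if the input is known to be well formed.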
I am not sure what language you are using, but why use negation when you can directly match the first digits in hh:mm format?
This assumes that the date string always contains a time in hh:mm format.
This regex snippet should have the first group match the hh:mm.
https://regex101.com/r/aHdehZ/1
The regex to use is (\d\d:\d\d)

I need to extract all words prior to 4th Space in a line

Good Day
I need to extract all words prior to the 4th space in a line.
Sample Data
Article Number Crt.DI No. Date
6ZZ 999 123 S 000000093 19.01.2021
Article description Replace DI No. Date
I have written an expression to extract what is between Date and Article, and the result is this:
(?<=Date)(.|\n)*(?=Article)
6RU 999 123 S 000000093 19.01.2021
However, I need to retrieve all the characters before the 4th space:
6ZZ 999 123 S
This is a material number, and it can be 13 or 14 characters long before the 4th space.
Appreciate your support.
Sample Data
Article Number Crt.DI No. Date
6RU 999 123 S 000000093 19.01.2021
Article description Replace DI No. Date
(Please note: there are newlines in between; these are three consecutive lines, each followed by a line break.)
Regards,
Manjesh
You can use a capture group, and use \s to match a whitespace character or a newline.
The capture group approach is more flexible if you want to match more than one whitespace character or newline after Date and your regex flavor does not support a quantifier in a lookbehind assertion.
\bDate\s+(\S+(?:\s+\S+){3})[\s\S]*?\bArticle\b
See a regex demo.
Or using lookarounds to get a match only.
(?<=\bDate\s)\S+(?:\s+\S+){3}(?=[\s\S]*?\bArticle\b)
The pattern matches:
(?<=\bDate\s) Positive lookbehind to assert that Date followed by a single whitespace char (which can also be a newline) is to the left
\S+ Match 1 or more non-whitespace chars
(?:\s+\S+){3} Repeat 3 times: 1 or more whitespace chars followed by 1 or more non-whitespace chars
(?= Positive lookahead to assert that what is to the right is
[\s\S]*? Any characters, including newlines, as few as possible
\bArticle\b Match the word Article
) Close the lookahead
See another regex demo.
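A short sketch running both patterns on the sample data from the question, assuming Python's built-in re module. Python accepts the lookbehind (?<=\bDate\s) because it has a fixed width; the variable names are illustrative only:

```python
import re

sample = (
    "Article Number Crt.DI No. Date\n"
    "6ZZ 999 123 S 000000093 19.01.2021\n"
    "Article description Replace DI No. Date"
)

# Capture-group variant: the material number ends up in group 1.
m = re.search(r"\bDate\s+(\S+(?:\s+\S+){3})[\s\S]*?\bArticle\b", sample)
material = m.group(1) if m else None

# Lookaround variant: the whole match is the material number.
m2 = re.search(r"(?<=\bDate\s)\S+(?:\s+\S+){3}(?=[\s\S]*?\bArticle\b)", sample)
material2 = m2.group(0) if m2 else None

print(material)   # 6ZZ 999 123 S
print(material2)  # 6ZZ 999 123 S
```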

Regex to get the word after specific match words

I am trying to pull the dollar amount from some invoices. I need the match to be on the word directly after the word "TOTAL". Also, the word total may sometimes appear with a colon after it (i.e. Total:). An example text sample is shown below:
4 Discover Credit Purchase - c REF#: 02353R TOTAL: 40.00 AID: 1523Q1Q TC: mzQm 40.00 CHANGE 0.00 TOTAL NUMBER OF ITEMS SOLD = 0 12/23/17 Ql:38piii 414 9 76 1G6 THANK YOU FOR SHOPPING KR08ER Now Hiring - Apply Today!
In the case of the sample above, the match should be "40.00".
The Regex statement that I wrote:
(?<=total)([^\n\r]*)
pulls EVERYTHING after the word "total". I only want the very next word.
This (unlike the other answers so far) matches only the total amount (i.e. without needing to examine groups):
((?<=\bTOTAL\b )|(?<=\bTOTAL\b: ))[\d.]+
See the live demo, which matches input both with and without the colon after TOTAL.
Two lookbehinds (which don't consume input) are needed because in many regex flavors a lookbehind must have a fixed length, so a single lookbehind with an optional colon is not allowed. The optional colon is handled by an alternation (a regex OR via ...|...) of 2 lookbehinds, one with and one without the colon.
If TOTAL can be in any case, add (?i) (the ignore case flag) to the start of the regex.
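A sketch of this lookbehind approach, assuming Python's built-in re module, which also requires each lookbehind to have a fixed width. The helper total_amount and the second sample string "Total 12.50 due" are hypothetical; the last part checks that a single lookbehind with an optional colon is indeed rejected:

```python
import re

# Two fixed-width lookbehinds joined by |, one without and one with the colon;
# (?i) makes TOTAL match in any case.
TOTAL_RE = re.compile(r"(?i)((?<=\bTOTAL\b )|(?<=\bTOTAL\b: ))[\d.]+")

def total_amount(text):
    m = TOTAL_RE.search(text)
    return m.group(0) if m else None  # the whole match is the amount

receipt = "4 Discover Credit Purchase - c REF#: 02353R TOTAL: 40.00 AID: 1523Q1Q"
print(total_amount(receipt))            # 40.00
print(total_amount("Total 12.50 due"))  # 12.50

# A single lookbehind with an optional colon is variable-width,
# which Python's re refuses to compile.
try:
    re.compile(r"(?<=\bTOTAL\b:? )[\d.]+")
    variable_width_rejected = False
except re.error:
    variable_width_rejected = True
```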
What you could do is match total, followed by an optional colon :? and zero or more whitespace characters \s*, and then capture in a group one or more digits followed by an optional part that matches a dot and one or more digits.
To match both upper- and lowercase variants of total, you could make the match case-insensitive, for example by adding the inline modifier (?i) or by using a case-insensitive flag.
\btotal:?\s*(\d+(?:\.\d+)?)
The value 40.00 will be in group 1.
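The same idea as a Python sketch (assuming the built-in re module, with re.IGNORECASE in place of (?i)). The helper name amount_after_total is hypothetical; the samples show that an integer amount is accepted and that TOTAL NUMBER OF ITEMS, which has no number right after it, is skipped:

```python
import re

# Pattern from the answer; the amount is captured in group 1.
TOTAL_RE = re.compile(r"\btotal:?\s*(\d+(?:\.\d+)?)", re.IGNORECASE)

def amount_after_total(text):
    m = TOTAL_RE.search(text)
    return m.group(1) if m else None

receipt = ("4 Discover Credit Purchase - c REF#: 02353R TOTAL: 40.00 AID: 1523Q1Q "
           "TC: mzQm 40.00 CHANGE 0.00 TOTAL NUMBER OF ITEMS SOLD = 0")

print(amount_after_total(receipt))                           # 40.00
print(amount_after_total("total:7"))                         # 7
print(amount_after_total("TOTAL NUMBER OF ITEMS SOLD = 0"))  # None
```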
Explanations are in the regex pattern.
using System;
using System.Text.RegularExpressions;
string str = "4 Discover Credit Purchase - c REF#: 02353R TOTAL: 40.00 AID: 1523Q1Q";
string pattern = @"(?ix) # 'i' means case-insensitive search
\b # Word boundary
total # 'TOTAL' or 'total' or any other combination of cases
:? # Matches colon if it exists
\s+ # One or more spaces
(\d+\.\d+) # Sought number saved into group
\s # One space";
// The number is in the first group: Groups[1]
Console.WriteLine(Regex.Match(str, pattern).Groups[1].Value);
You can use the regex below to get the amount after TOTAL:
\bTOTAL\b:?\s*([\d.]+)
It will capture the amount in first group.
Link : https://regex101.com/r/tzze8J/1/
Try this pattern: TOTAL:? ?(\d+\.\d+)[^\d]? (the dot is escaped so that it only matches a literal decimal point).
Demo

Regexp: find out if a value repeats several times

I have strings:
TH 8H 5C QS TC
9S 4S JS KS JS
I want the second one to be picked up by the regexp. Please help me construct the necessary expression.
What I have tried so far is S{5}, but of course it looks for five S characters in a row.
Could I avoid specifying which character I am looking for? I need 5 repetitions of any character. Could it be something like .{5}?
Thanks in advance!
If you have standalone strings, use
^\wS(?: \wS){4}$
See the regex demo
If these strings appear inside a larger text, replace the ^ and $ anchors with word boundaries \b:
\b\wS(?: \wS){4}\b
See another demo
Note that \w matches any alphanumeric or underscore character. If there can be any non-whitespace character, use \S instead:
\b\SS(?: \SS){4}\b
One more demo
\SS matches a non-whitespace character followed by an S, and (?: \SS){4} matches 4 more space-prefixed sequences of the same shape (so there are five 2-character sequences, each ending in S).
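The question also asks whether the repeated character can be left unspecified. The answer's pattern fixes the suit to S; one way to accept any repeated suit is a backreference, which is a swapped-in technique and not part of the answer above. A Python sketch (assuming the built-in re module) showing both:

```python
import re

hands = ["TH 8H 5C QS TC", "9S 4S JS KS JS"]

# Pattern from the answer: five cards that all end in S.
ALL_SPADES = re.compile(r"^\wS(?: \wS){4}$")

# Swapped-in variant: capture the suit of the first card in group 1
# and require the same suit via the backreference \1 on the other four.
SAME_SUIT = re.compile(r"^\w(\w)(?: \w\1){4}$")

print([h for h in hands if ALL_SPADES.match(h)])  # ['9S 4S JS KS JS']
print([h for h in hands if SAME_SUIT.match(h)])   # ['9S 4S JS KS JS']
print(bool(SAME_SUIT.match("2H 3H 4H 5H 6H")))    # True
```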

Limit number of character of capturing group

Let's say I have this text: "AAAA1 AAA11 AA111AA A1111 AAAAA AAAA1111".
I want to find all occurrences matching these 3 criteria:
- capital letters 1 to 4 times
- digits 1 to 4 times
- at most 5 characters in total
so the matches would be :
{"AAAA1", "AAA11", "AA111", "A1111", "AAAA1"}
I tried
([A-Z]{1,4}[0-9]{1,4}){5}
but I knew it would fail, since it looks for my group five times in a row.
Is there a way to limit result of the groups to 5 characters?
Thanks
You can limit the character count with a lookahead, while checking the pattern with the matching part.
If you can split the input by whitespace you can use:
^(?=.{2,5}$)[A-Z]{1,4}[0-9]{1,4}$
See demo here.
If you cannot split by whitespace, you can use a capturing group, for example (?:^| )(?=.{2,5}(?=$| ))([A-Z]{1,4}[0-9]{1,4})(?=$| ), or a lookbehind or \K to do the split, depending on your regex flavor (see demo).
PREVIOUS ANSWER (wrongly matches A1A1A; updated after @a_guest's remark):
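A Python sketch of both variants (assuming the built-in re module; the names are illustrative). Note that these patterns treat each space-separated word as a whole, so AA111AA and AAAA1111 are rejected rather than truncated to AA111 and AAAA1 as in the question's expected list:

```python
import re

text = "AAAA1 AAA11 AA111AA A1111 AAAAA AAAA1111"

# Variant 1: split on whitespace, then test each token on its own.
# The lookahead (?=.{2,5}$) caps the token length at 5 characters.
TOKEN_RE = re.compile(r"^(?=.{2,5}$)[A-Z]{1,4}[0-9]{1,4}$")
by_split = [tok for tok in text.split() if TOKEN_RE.match(tok)]

# Variant 2: match inside the full string; findall returns group 1 of each match.
IN_TEXT_RE = re.compile(r"(?:^| )(?=.{2,5}(?=$| ))([A-Z]{1,4}[0-9]{1,4})(?=$| )")
in_text = IN_TEXT_RE.findall(text)

print(by_split)                       # ['AAAA1', 'AAA11', 'A1111']
print(in_text)                        # ['AAAA1', 'AAA11', 'A1111']
print(bool(TOKEN_RE.match("A1A1A")))  # False
```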
You can use a lookahead to check for your pattern, while limiting the character count with the matching part of the regex:
(?=[A-Z]{1,4}[0-9]{1,4}).{2,5}
See demo here.
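The problem with the previous pattern can be reproduced in a couple of lines (Python's built-in re module assumed): the lookahead only checks that the text starts with letters followed by digits, and .{2,5} then consumes any 2 to 5 characters, whatever they are:

```python
import re

# Previous pattern: the lookahead validates only a prefix ("A1" here),
# then .{2,5} matches up to 5 arbitrary characters.
OLD_RE = re.compile(r"(?=[A-Z]{1,4}[0-9]{1,4}).{2,5}")

m = OLD_RE.match("A1A1A")
old_result = m.group(0) if m else None
print(old_result)  # A1A1A
```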