Regex for masking data - regex

I am trying to implement regex for a JSON Response on sensitive data.
JSON response comes with AccountNumber and AccountName.
Masking details are as below.
accountNumber Before: 7835673653678365
accountNumber Masked: 783567365367****
accountName Before : chris hemsworth
accountName Masked : chri* *********
I am able to match the above with just [0-9]{12} and (?![0-9]{12}), but when I replace using this, the whole match is replaced with only a single *, so my regex is not producing the correct output.
How can I produce output as above from regex?

If all you want is to mask every character except the first N, I don't think you really need a complicated regex. To skip the first N characters and replace every character thereafter with *, you can write a generic regex like this,
(?<=.{N}).
where N can be any number like 1,2,3 etc. and replace the match with *
The way this regex works is, it selects every character which has at least N characters before it and hence once it selects a character, all following characters also get selected.
For example, in your AccountNumber case, N = 12, hence your regex becomes,
(?<=.{12}).
Regex Demo for AccountNumber masking
Java code,
String s = "7835673653678365";
System.out.println(s.replaceAll("(?<=.{12}).", "*"));
Prints,
783567365367****
And for AccountName case, N = 4, hence your regex becomes,
(?<=.{4}).
Regex Demo for AccountName masking
Java code,
String s = "chris hemsworth";
System.out.println(s.replaceAll("(?<=.{4}).", "*"));
Prints,
chri***********
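The two cases above differ only in N, so the pattern can be built from a parameter. The following is a minimal sketch of that idea; the class name MaskAfterDemo and the method maskAfter are hypothetical names chosen for illustration, and the sketch assumes keep is not negative.

```java
class MaskAfterDemo {
    // Replaces every character after the first `keep` characters with '*'.
    // `maskAfter` and `keep` are hypothetical names for this sketch; keep must be >= 0.
    static String maskAfter(String value, int keep) {
        // (?<=.{keep}). matches any character that has at least `keep` characters before it
        return value.replaceAll("(?<=.{" + keep + "}).", "*");
    }

    public static void main(String[] args) {
        System.out.println(maskAfter("7835673653678365", 12)); // 783567365367****
        System.out.println(maskAfter("chris hemsworth", 4));   // chri***********
    }
}
```

Inputs shorter than N come back unchanged, because no character has N characters before it.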

If you match [0-9]{12} and replace that directly with a single asterisk, you are left with accountNumber Before: *8365
There is no programming language listed, but one option to replace the digits at the end is to use a positive lookbehind to assert what is on the left are 12 digits followed by a positive lookahead to assert what is on the right are 0+ digits followed by the end of the string.
Then in the replacement use *
If the JSON value is exactly chris hemsworth or 7835673653678365, you can omit the positive lookaheads (?=\d*$) and (?=[\w ]*$), which assert the end of the string in the following 2 expressions.
Use the versions with the positive lookahead if the data to match is at the end of the string and the string contains more data so you don't replace more matches than you would expect.
(?<=[0-9]{12})(?=\d*$)\d
In Java:
(?<=[0-9]{12})(?=\\d*$)\\d
(?<=[0-9]{12}) Positive lookbehind, assert what is on the left are 12 digits
(?=\d*$) Positive lookahead, assert what is on the right are 0+ digits and assert the end of the string
\d Match a single digit
Regex demo
Result:
783567365367****
For the account name you might do that with 4 word characters \w, but this will also replace the whitespace with an asterisk, because the final [\w ] matches the space as well.
(?<=[\w ]{4})(?=[\w ]*$)[\w ]
In Java
(?<=[\\w ]{4})(?=[\\w ]*$)[\\w ]
Regex demo
Result
chri***********
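As a rough illustration of the anchored versions, the sketch below wraps both patterns in Java methods. The class and method names are hypothetical. The name variant consumes \w instead of [\w ], which is an assumption on my part and not part of the answer; it seems to leave the space unmasked, matching the output asked for in the question.

```java
class AnchoredMaskDemo {
    // Masks each digit that follows 12 digits, but only when nothing except
    // digits remains up to the end of the string.
    static String maskAccountNumber(String s) {
        return s.replaceAll("(?<=[0-9]{12})(?=\\d*$)\\d", "*");
    }

    // Variant (an assumption, not from the answer): consuming \w rather than
    // [\w ] skips the space, so it stays visible in the result.
    static String maskAccountName(String s) {
        return s.replaceAll("(?<=[\\w ]{4})(?=[\\w ]*$)\\w", "*");
    }

    public static void main(String[] args) {
        System.out.println(maskAccountNumber("7835673653678365"));   // 783567365367****
        System.out.println(maskAccountNumber("7835673653678365 x")); // unchanged: digits are not at the end
        System.out.println(maskAccountName("chris hemsworth"));      // chri* *********
    }
}
```

The second call shows what the lookahead buys you: when other data follows the number, nothing is replaced.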

Related

Regular Expression: Find a specific group within other groups in VB.Net

I need to write a regular expression that has to replace everything except for a single group.
E.g
IN                                    OUT
OK THT PHP This is it 06222021        This is it
NO MTM PYT Get this content 111111    Get this content
I wrote the following Regular Expression: (\w{0,2}\s\w{0,3}\s\w{0,3}\s)(.*?)(\s\d{6}(\s|))
This RegEx creates 4 groups, using the first entry as an example the groups are:
OK THT PHP
This is it
06222021
Space character
I need a way to:
Replace Group 1,2,4 with String.Empty
OR
Get Group 3, ONLY
You don't need 4 groups, you can use a single group 1 to be in the replacement and match 6-8 digits for the last part instead of only 6.
Note that this \w{0,2} will also match an empty string, you can use \w{1,2} if there has to be at least a single word char.
^\w{0,2}\s\w{0,3}\s\w{0,3}\s(.*?)\s\d{6,8}\s?$
^ Start of string
\w{0,2}\s\w{0,3}\s\w{0,3}\s Match 3 times word characters with a quantifier and a whitespace in between
(.*?) Capture group 1, match any char as few times as possible
\s\d{6,8} Match a whitespace char and 6-8 digits
\s? Match an optional whitespace char
$ End of string
Regex demo
Example code
Dim s As String = "OK THT PHP This is it 06222021"
Dim result As String = Regex.Replace(s, "^\w{0,2}\s\w{0,3}\s\w{0,3}\s(.*?)\s\d{6,8}\s?$", "$1")
Console.WriteLine(result)
Output
This is it
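For comparison, the same single-group pattern can be used to read the group directly instead of replacing. This is a sketch in Java rather than VB.NET; the class name ExtractMiddleDemo and the method extract are hypothetical, and returning null for non-matching lines is just one possible convention.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractMiddleDemo {
    // Same pattern as in the answer, with backslashes doubled for a Java string literal.
    private static final Pattern LINE = Pattern.compile(
            "^\\w{0,2}\\s\\w{0,3}\\s\\w{0,3}\\s(.*?)\\s\\d{6,8}\\s?$");

    // Returns capture group 1, or null when the line does not have the expected shape.
    static String extract(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("OK THT PHP This is it 06222021"));     // This is it
        System.out.println(extract("NO MTM PYT Get this content 111111")); // Get this content
    }
}
```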
My approach does not work with groups and does not use a Replace operation. The match itself yields the desired result.
It uses look-around expressions. To find a pattern between two other patterns, you can use the general form
(?<=prefix)find(?=suffix)
This will only return find as match, excluding prefix and suffix.
If we insert your expressions, we get
(?<=\w{0,2}\s\w{0,3}\s\w{0,3}\s).*?(?=\s\d{6}\s?)
where I simplified (\s|) as \s?. We can also drop it completely, since we don't care about trailing spaces.
(?<=\w{0,2}\s\w{0,3}\s\w{0,3}\s).*?(?=\s\d{6})
Note that this works also if we have more than 6 digits because regex stops searching after it has found 6 digits and doesn't care about what follows.
This also gives a match if other things precede our pattern like in 123 OK THT PHP This is it 06222021. We can exclude such results by specifying that the search must start at the beginning of the string with ^.
If the exact length of the words and numbers does not matter, we simply write
(?<=^\w+\s\w+\s\w+\s).*?(?=\s\d+)
If the find part can contain numbers, we must specify that we want to match until the end of the line with $ (and include a possible space again).
(?<=^\w+\s\w+\s\w+\s).*?(?=\s\d+\s?$)
Finally, we use a quantifier for the 3 occurrences of word-space:
(?<=^(\w+\s){3}).*?(?=\s\d+\s?$)
This is compact and will only return This is it or Get this content.
string result = Regex.Match(input, @"(?<=^(\w+\s){3}).*?(?=\s\d+\s?$)").Value;
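The look-around form also ports to other engines, with one caveat: Java's lookbehind generally needs a bounded maximum length, so the unbounded (\w+\s){3} prefix may be rejected there. The sketch below therefore keeps the bounded quantifiers from the question inside the lookbehind and anchors them with ^. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundExtractDemo {
    // Bounded quantifiers are kept inside the lookbehind on purpose:
    // the prefix can be at most 11 characters long.
    private static final Pattern MIDDLE = Pattern.compile(
            "(?<=^\\w{0,2}\\s\\w{0,3}\\s\\w{0,3}\\s).*?(?=\\s\\d{6,8}\\s?$)");

    // Returns the text between prefix and suffix, or null when nothing matches.
    static String extract(String line) {
        Matcher m = MIDDLE.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("OK THT PHP This is it 06222021"));     // This is it
        System.out.println(extract("123 OK THT PHP This is it 06222021")); // null
    }
}
```

The second call returns null because the ^ inside the lookbehind requires the three-word prefix to start at the beginning of the string.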

php check ncr with negative lookbehind and greedy doesn't work

I want to find an erroneous NCR that is missing its &# prefix and repair it. The code point is a 4- or 5-digit decimal number. I wrote this PHP code:
function repl0($m) {
return '&#'.$m[0];
}
$s = "This is a good 23200; sample ship";
echo "input1= ".htmlentities($s)."<br>";
$out1=preg_replace_callback('/(?<!#)(\d{4,5};)/','repl0',$s);
echo 'output1 = '.htmlentities($out1).'<br>';
The output is:
input1= This is a good 23200; sample ship
output1 = This is a good 2ಀ sample ship
The match only happens once according to the output message.
What I want is to match '23200;' instead of '3200;'.
The default should be greedy mode, so I thought it would capture the 5-digit number instead of the 4-digit one.
Do I misunderstand 'greedy' here? How can I get what I want?
The (?<!#)(\d{4,5};) pattern matches like this:
(?<!#) - matches a location that is not immediately preceded with #
(\d{4,5};) - then tries to match and consume four or five digits and a ; char immediately after these digits.
So, if you have a #32000; string input, 3 cannot be the starting character of a match, as it is preceded by #, but 2 can, since it is not preceded by a # and there are four digits followed by a ; for the pattern to match.
What you need here is to curb the match on the left by adding a digit to the lookbehind,
(?<![#\d])(\d{4,5};)
With this trick, you ensure that the match cannot be immediately preceded with either # or a digit.
You say you finally used (?<!#)(?<!\d)\d{4,5};, and this pattern is functionally equivalent to the pattern above since the lookbehinds, as all lookarounds, "stand their ground", i.e. the regex index does not move when the lookaround patterns are matched. So, the check for a digit or a # char occurs at the same location in the string.
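To see the difference between the two lookbehinds, here is a small sketch that applies both to the same inputs. It is written in Java rather than PHP, the class and method names are hypothetical, and the inline "&#$1" replacement stands in for the callback.

```java
class NcrRepairDemo {
    // Original lookbehind: only rules out a '#' directly before the match.
    static String naive(String s) {
        return s.replaceAll("(?<!#)(\\d{4,5};)", "&#$1");
    }

    // Fixed lookbehind: rules out both '#' and a digit directly before the match,
    // so a match cannot start in the middle of a number.
    static String repair(String s) {
        return s.replaceAll("(?<![#\\d])(\\d{4,5};)", "&#$1");
    }

    public static void main(String[] args) {
        System.out.println(repair("This is a good 23200; sample ship")); // This is a good &#23200; sample ship
        System.out.println(naive("ok &#23200; ok"));  // ok &#2&#3200; ok  (a valid NCR gets damaged)
        System.out.println(repair("ok &#23200; ok")); // ok &#23200; ok    (left alone)
    }
}
```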

Extract a specific string from within a string using regular expressions

I need a regular expression to match a string within a longer string.
Specifically I need to not match any leading zeros or the last 2 digits for the string.
For example, my input might be the following:
00009666666605
00010444444404
00007Z22222205
00033213433104
00009000G00005
And I would like to match
96666666
104444444
7Z222222
332134331
9000G000
For further information, the last 2 digits are always numbers and describe the starting point of the valid reference, after the leading zeros.
I thought I'd cracked it with something like
(?<=0000).{8}|((?<=000).{9})+? but that doesn't work as expected.
It sure takes a lot of steps, but this should do the trick:
(?<=^000)[^0].{8}|(?<=^0000).{8}
(?<= 'start lookbehind
^000 'for the beginning of the string then three zeroes
) 'end lookbehind
[^0] 'match a non-zero
.{8} 'match the remaining 8 chars
| ' OR
(?<= 'start lookbehind
^0000 'for the beginning of the string then four zeroes
) 'end lookbehind
.{8} 'match the remaining 8 chars
That said, in .NET, it will be quicker to do:
dim trimmed = line.TrimStart("0"c)
dim numberString = trimmed.Substring(0,trimmed.Length-2)
if the format of these strings is always the same.
I would use:
^0*(.*).{2}$
And access your matches via $1
Regex Storm demo
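Both ideas from the answers above, the single capture group and the plain trim-then-cut approach, can be checked against the sample data. The sketch below does this in Java; the class and method names are hypothetical, and viaStringOps assumes at least two characters remain after the leading zeros.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReferenceExtractDemo {
    private static final Pattern REF = Pattern.compile("^0*(.*).{2}$");

    // Group 1 is everything between the leading zeros and the last two characters.
    static String viaRegex(String line) {
        Matcher m = REF.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    // Plain string operations, mirroring the TrimStart/Substring idea.
    // Assumes at least two characters remain after the leading zeros.
    static String viaStringOps(String line) {
        int start = 0;
        while (start < line.length() && line.charAt(start) == '0') {
            start++;
        }
        return line.substring(start, line.length() - 2);
    }

    public static void main(String[] args) {
        System.out.println(viaRegex("00009666666605"));     // 96666666
        System.out.println(viaStringOps("00010444444404")); // 104444444
    }
}
```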

Regex not returning all matches

I have the following regex (my actual regex is actually a lot more complex but I pinned down my problem to this): \s(?<number>123|456)\s
And the following test data:
" 123 456 "
As expected/wanted result I would have the regex match in 2 matches one with "number" being "123" and the second with number being "456". However, I'm only getting 1 match with "number" being "123".
I did notice that adding another space between "123" and "456" in the test data does give 2 matches...
Why don't I get the result I want? How to get it right?
Your pattern contains consuming \s patterns that match a whitespace char before and after a number, and the input contains consecutive numbers separated by a single whitespace. The whitespace consumed at the end of the first match is no longer available to start the second one. If there were two spaces between the numbers, it would work.
Use whitespace boundaries based on lookarounds:
(?<!\S)(?<number>123|456)(?!\S)
See the regex demo
The (?<!\S) is a negative lookbehind that will fail the match if there is a non-whitespace char immediately to the left of the current location, and (?!\S) is a negative lookahead that will fail the match if there is a non-whitespace char immediately to the right of the current location.
(?<!\S) is the same as (?<=^|\s) and (?!\S) is the same as (?=$|\s), but more efficient.
Note that in many situations you might even go with 1 lookahead and use
\s(?<number>123|456)(?!\S)
It will ensure the consecutive whitespace separated matches are found.
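The effect of consuming versus non-consuming boundaries can be reproduced with the test data from the question. The sketch below is in Java, which supports the same (?<number>...) named-group syntax; the class name and the numbers helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceBoundaryDemo {
    // Collects the "number" group of every match of `regex` in `input`.
    static List<String> numbers(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group("number"));
        }
        return found;
    }

    public static void main(String[] args) {
        String input = " 123 456 ";
        System.out.println(numbers("\\s(?<number>123|456)\\s", input));         // [123]
        System.out.println(numbers("(?<!\\S)(?<number>123|456)(?!\\S)", input)); // [123, 456]
        System.out.println(numbers("\\s(?<number>123|456)(?!\\S)", input));      // [123, 456]
    }
}
```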

RegEx No more than 2 identical consecutive characters and a-Z and 0-9

Edit: Thanks for the advice to make my question clearer :)
The Match is looking for 3 consecutive characters:
Regex Match =AaA653219
Regex Match = AA5556219
The code is ASP.NET 4.0. Here is the whole function:
public ValidationResult ApplyValidationRules()
{
    ValidationResult result = new ValidationResult();
    // Require 8-20 characters with at least one digit and one letter
    Regex regEx = new Regex(@"^(?=.*\d)(?=.*[a-zA-Z]).{8,20}$");
    bool valid = regEx.IsMatch(_Password);
    if (!valid)
        result.Errors.Add("Passwords must be 8-20 characters in length, contain at least one alpha character and one numeric character");
    return result;
}
I've tried for over 3 hours to make this work, referencing the below with no luck =/
How can I find repeated characters with a regex in Java?
.net Regex for more than 2 consecutive letters
I have started with this for 8-20 characters a-Z 0-9 :
^(?=.*\d)(?=.*[a-zA-Z]).{8,20}$
As Regex regEx = new Regex(@"^(?=.*\d)(?=.*[a-zA-Z]).{8,20}$");
I've tried adding variations of the below with no luck:
/(.)\1{9,}/
.*([0-9A-Za-z])\\1+.*
((\\w)\\2+)+".
Any help would be much appreciated!
http://regexr.com?34vo9
The regular expression:
^(?=.{8,20}$)(([a-z0-9])\2?(?!\2))+$
The first lookahead ((?=.{8,20}$)) checks the length of your string. The second portion does your double character and validity checking by:
(
([a-z0-9]) Matching a character and storing it in a back reference.
\2? Optionally match one more EXACT COPY of that character.
(?!\2) Make sure the upcoming character is NOT the same character.
)+ Do this ad nauseam.
$ End of string.
Okay. I see you've added some additional requirements. My basic formula still works, but we have to give you more of a step-by-step approach. So:
^...$
Your whole regular expression will be dropped into start and end characters, for obvious reasons.
(?=.{n,m}$)
Length checking. Put this at the beginning of your regular expression with n as your minimum length and m as your maximum length.
(?=(?:[^REQ]*[REQ]){n,m})
Required characters. Place this at the beginning of your regular expression with REQ as your required character to require n to m of your character. You may drop the (?: ..){n,m} to require just one of that character.
(?:([VALID])\1?(?!\1))+
The rest of your expression. Replace VALID with your valid Characters. So, your Password Regex is:
^(?=.{8,20}$)(?=[^A-Za-z]*[A-Za-z])(?=[^0-9]*[0-9])(?:([\w\d*?!:;])\1?(?!\1))+$
'Splained:
^
(?=.{8,20}$) 8 to 20 characters
(?=[^A-Za-z]*[A-Za-z]) At least one Alpha
(?=[^0-9]*[0-9]) At least one Numeric
(?:([\w\d*?!:;])\1?(?!\1))+ Valid Characters, not repeated thrice.
$
http://regexr.com?34vol Here's the new one in action.
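To try the combined pattern outside an online tester, it can be dropped into a small validator. The sketch below uses Java rather than C#; the class name PasswordRuleDemo and the method isValid are hypothetical, and the sample passwords are the two strings from the question plus a few made-up ones.

```java
import java.util.regex.Pattern;

class PasswordRuleDemo {
    // Length 8-20, at least one letter, at least one digit,
    // and no character repeated three times in a row.
    private static final Pattern RULE = Pattern.compile(
            "^(?=.{8,20}$)(?=[^A-Za-z]*[A-Za-z])(?=[^0-9]*[0-9])"
            + "(?:([\\w\\d*?!:;])\\1?(?!\\1))+$");

    static boolean isValid(String password) {
        return RULE.matcher(password).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("AaA653219")); // true: A, a, A differ by case
        System.out.println(isValid("AA5556219")); // false: 555 is three in a row
        System.out.println(isValid("aabbcc11"));  // true: pairs are allowed
        System.out.println(isValid("abcdefgh"));  // false: no digit
    }
}
```

The backreference comparison is case-sensitive here, which is why AaA passes while 555 does not.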
Tightened up the matching criteria as they were too broad; for example, "not A-Za-z" matches a lot more than is intended. The previous regex was matching on the string "ThiIsNot". For the most part, passwords are only going to contain alphanumeric and punctuation characters, so I limited the scope, which made all matches more accurate. Used character classes for human readability. Added an exclusion list, and differentiated upper and lower case letters.
^(?=.{8,20}$)(?!(?:.*[01IiLlOo]))(?=(?:[\[[:digit:]\]\[[:punct:]\]]*[\[[:alpha:]\]]){2})(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:upper:]\]]*[\[[:lower:]\]]){1})(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:lower:]\]]*[\[[:upper:]\]]){1})(?=(?:[\[[:alpha:]\]\[[:punct:]\]]*[\[[:digit:]\]]){1})(?=(?:[\[[:alnum:]\]]*[\[[:punct:]\]]){1})(?:([\[[:alnum:]\]\[[:punct:]\]])\1?(?!\1))+$
The breakdown:
^(?=.{8,20}$) - Positive lookahead that the string is between 8 and 20 chars
(?!(?:.*[01IiLlOo])) - Negative lookahead for any blacklisted chars
(?=(?:[\[[:digit:]\]\[[:punct:]\]]*[\[[:alpha:]\]]){2}) - Verify that at least 2 alpha chars exist
(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:upper:]\]]*[\[[:lower:]\]]){1}) - Verify that at least 1 lowercase alpha exists
(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:lower:]\]]*[\[[:upper:]\]]){1}) - Verify that at least 1 uppercase alpha exists
(?=(?:[\[[:alpha:]\]\[[:punct:]\]]*[\[[:digit:]\]]){1}) - Verify that at least 1 digit exists
(?=(?:[\[[:alnum:]\]]*[\[[:punct:]\]]){1}) - Verify that at least 1 special/punctuation char exists
(?:([\[[:alnum:]\]\[[:punct:]\]])\1?(?!\1))+$ - Verify that no char is repeated more than twice in a row