Trying to obfuscate an email to this format:
a***#******m
Meaning I need a regex to match everything except first and last character, as well as #.
I can use [^#] for everything but how do I ignore the last and first characters in a String? Anchors seem like the way to go but what is the exact syntax?
How about using a lookahead:
(?!^|.$)[^#\s]
See this demo at regex101
I also added white space to the characters that won't be replaced.
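As a rough sketch of how this pattern might be applied in JavaScript with a global replace (the helper name maskLookahead and the sample inputs are assumptions for illustration, not part of the answer):

```javascript
// Hypothetical helper: replaces every character the pattern matches with "*".
// The first and last characters, "#", and whitespace are left untouched.
const maskLookahead = (s) => s.replace(/(?!^|.$)[^#\s]/g, "*");

console.log(maskLookahead("test#test.com")); // t***#*******m
console.log(maskLookahead("a b#c d"));       // a *#* d
```

The second call shows the effect of adding \s to the character class: the spaces survive the replacement.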
If the tool or language you use supports lookarounds, you can use:
(?<!^)[^#](?!$)
Demo: https://regex101.com/r/5Tbaq7/1
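A minimal JavaScript sketch of the same idea, assuming an engine with lookbehind support (ES2018 or later); the helper name maskLookbehind is hypothetical:

```javascript
// Hypothetical helper: masks every non-# character that is neither
// the first nor the last character of the string.
const maskLookbehind = (s) => s.replace(/(?<!^)[^#](?!$)/g, "*");

console.log(maskLookbehind("test#test.com")); // t***#*******m
console.log(maskLookbehind("a b#c d"));       // a**#**d
```

Unlike a class such as [^#\s], [^#] also matches whitespace, so the spaces in the second input are masked as well.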
No language is tagged, but if you are using a programming language and want to make sure that the email address contains a # sign and that the first and last characters stay visible, you can use capture groups and replace the contents of the groups you want to hide with *:
^(\S)([^#\s]*)(#)([^#\s]*)(\S)$
^ Start of string
(\S) Capture group 1, match a non whitespace char
([^#\s]*) Capture group 2, match 0+ times not an # or a whitespace char
(#) Capture group 3, Match #
([^#\s]*) Capture group 4, match 0+ times not an # or a whitespace char
(\S) Capture group 5, match a non whitespace char
$ End of string
Regex demo
For example, using JavaScript:
let pattern = /^(\S)([^#\s]*)(#)([^#\s]*)(\S)$/;
[
  "test#test.com",
  "te st#te st.com",
  "test#test#test.com",
  "te#nl",
  "t#t",
  "test#",
  "#tes",
  "test"
].forEach(str => {
  let replaced = str.replace(pattern, function(_, g1, g2, g3, g4, g5) {
    return g1 + g2.replace(/./g, "*") + g3 + g4.replace(/./g, "*") + g5;
  });
  console.log(replaced);
});
Related
I need to retrieve an email ID from an email address.
(i.e. this-is-the-best.email#gmail.com => this-is-the-best.email)
The regex that I used is (.*)#.* .
Now I need to truncate the string to N characters.
(i.e. N=7 => this-is; N=30 => this-is-the-best.email)
How would I add this to an existing regex?
Any other recommendations?
What about: ([^#]{1,7}).+?
this-is-the-best-email#gmail.com
short#hotmail.co.uk
Becomes:
this-is
short
I think that this is what you are looking for:
((.{1,7}).*)#.+
The first capturing group contains the full id and the second group contains up to 7 chars.
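A short JavaScript sketch of reading both groups from this pattern; the variable names are illustrative only:

```javascript
// Group 1 holds the full ID, group 2 holds at most its first 7 characters.
const idPattern = /((.{1,7}).*)#.+/;

const longMatch = "this-is-the-best.email#gmail.com".match(idPattern);
console.log(longMatch[1]); // this-is-the-best.email
console.log(longMatch[2]); // this-is

const shortMatch = "short#hotmail.co.uk".match(idPattern);
console.log(shortMatch[2]); // short
```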
In your pattern (.*)#.* you don't need the trailing .* as it is optional. The dot can also match any character, including spaces and the # itself, so the pattern can match much more than just an email-like address.
The thing of interest is the non whitespace chars excluding an # char before actually matching the #, and in that case you can use a capture group matching 7 non whitespace chars.
([^\s#]{7})[^\s#]*#
The pattern matches:
([^\s#]{7}) Capture group 1, match 7 non whitespace chars excluding #
[^\s#]* Optionally match any non whitespace char excluding #
# Match literally
Regex demo
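Note that {7} requires at least 7 such characters before the #, so an ID like short would not match. A possible variation, sketched here in JavaScript, builds the pattern for an arbitrary N and uses {1,N} instead. The helper truncateId and the {1,N} quantifier are assumptions of this sketch, not part of the answer above, and n is assumed to be a positive integer:

```javascript
// Hypothetical helper: returns up to n leading characters of the ID,
// or null when the input contains no "#" preceded by an ID.
function truncateId(email, n) {
  // {1,n} (an assumed variation on {7}) lets IDs shorter than n match too.
  const re = new RegExp(`([^\\s#]{1,${n}})[^\\s#]*#`);
  const m = email.match(re);
  return m ? m[1] : null;
}

console.log(truncateId("this-is-the-best.email#gmail.com", 7));  // this-is
console.log(truncateId("this-is-the-best.email#gmail.com", 30)); // this-is-the-best.email
console.log(truncateId("short#hotmail.co.uk", 7));               // short
```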
I'm struggling with the following combination of characters that I'm trying to parse:
I have two types of text:
1. AF-B-W23F4-USLAMC-X99-JLK
2. LS-V-A23DF-SDLL--X22-LSM
I want to get the last two groups of characters, which are separated by a single dash.
From the 1. X99-JLK and from the 2. X22-LSM
I accomplished the 2. with the following regex '--(.*-.*)'
How can I parse the 1. sample and is there any option to parse it at one time with something like OR operator?
Thanks for any help!
The pattern --(.*-.*) that you tried matches the second example because it contains -- and it matches the first occurrence.
Then it matches until the end of the string and backtracks to find another hyphen.
As .* can match any character (also -) and there are no anchors or boundaries set, this is a very broad match.
If there have to be 2 dashes, you can match the first one, and use a capture group for the part with the second one using a negated character class [^-]
The character class can also match a newline. If you don't want to match a newline you can use [^-\r\n] or also not matching spaces [^-\s] (as there are none in the example data)
-([^-]+-[^-]+)$
Explanation
- Match -
( Capture group 1
[^-]+-[^-]+ Match the second dash between chars other than -
) Close group 1
$ End of string
See a regex demo
For example, using JavaScript:
const regex = /-([^-]+-[^-]+)$/;
[
  "AF-B-W23F4-USLAMC-X99-JLK",
  "LS-V-A23DF-SDLL--X22-LSM"
].forEach(s => {
  const m = s.match(regex);
  if (m) {
    console.log(m[1]);
  }
});
You can try lookahead to match the last pair before the new line. JavaScript example:
const str = `
AF-B-W23F4-USLAMC-X99-JLK
LS-V-A23DF-SDLL--X22-LSM
`;
const re = /[^-]*-[^-]*(?=\n)/g;
console.log(str.match(re));
Hi I'm a regex noob and I'd like to make a regex in order to extract the penultimate string from the URL if the word "xxxx" is contained or the last string if the word "xxxx" is not contained.
For example, I could have 2 scenarios:
www.hello.com/aaaa/1adf0023efae456
www.hello.com/aaaa/1adf0023efae456/xxxx
In both cases I want to extract the string 1adf0023efae456.
I've tried something like (?=(\w*xxxx\w*)\/.*\/(.*?)\/|[^\/]+$) but doesn't work properly.
You can match the digits and assert that what follows is either /xxxx or the end of the string.
\d+(?=/xxxx|$)
Regex demo
If there should be a / before matching the digits, you could use a capturing group and get the value from group 1
/(\d+)(?=/xxxx|$)
/ Match /
(\d+) Capture group 1, match 1+ digits
(?=/xxxx|$) Positive lookahead, assert what is on the right is either /xxxx or the end of the string
Regex demo
Edit
If there could possibly also be alphanumeric characters instead of digits, you could use a character class [a-z0-9]+ with an optional non capturing group.
/([a-z0-9]+)(?:/xxxx)?$
Regex demo
To match any char except a whitespace char or a forward slash, use [^\s/]+
Using lookarounds, you could assert a / on the left, match 1+ alphanumerics and assert what is at the right is either /xxxx or the end of the string which did not end with /xxxx
(?<=/)[a-z0-9]+(?=/xxxx$|$(?<!/xxxx))
Regex demo
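A minimal JavaScript sketch trying the last two patterns against the URLs from the question. The forward slashes are escaped because they sit inside regex literals, the lookaround version assumes lookbehind support (ES2018 or later), and the variable names are illustrative:

```javascript
const urls = [
  "www.hello.com/aaaa/1adf0023efae456",
  "www.hello.com/aaaa/1adf0023efae456/xxxx"
];

// Capture-group version: the ID is in group 1.
const withGroup = /\/([a-z0-9]+)(?:\/xxxx)?$/;
// Lookaround version: the ID is the whole match.
const withLookarounds = /(?<=\/)[a-z0-9]+(?=\/xxxx$|$(?<!\/xxxx))/;

const fromGroup = urls.map(u => u.match(withGroup)[1]);
const fromLookarounds = urls.map(u => u.match(withLookarounds)[0]);

console.log(fromGroup);       // [ '1adf0023efae456', '1adf0023efae456' ]
console.log(fromLookarounds); // [ '1adf0023efae456', '1adf0023efae456' ]
```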
You could avoid Regex:
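using System.Linq; // required for Select

string[] strings =
{
    "www.hello.com/aaaa/1adf0023efae456",
    "www.hello.com/aaaa/1adf0023efae456/xxxx"
};
// Split each URL on '/', then take the last segment,
// or the one before it when the last segment is "xxxx".
var x = strings.Select(s => s.Split('/'))
    .Select(arr => new { upper = arr.GetUpperBound(0), arr })
    .Select(z => z.arr[z.upper] == "xxxx" ? z.arr[z.upper - 1] : z.arr[z.upper]);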
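The same split-based idea could be sketched in JavaScript roughly as follows; extractId is a hypothetical helper name, and the sketch assumes the URL contains at least one / before the ID:

```javascript
// Hypothetical helper: split on "/" and pick the last segment,
// or the one before it when the last segment is "xxxx".
function extractId(url) {
  const parts = url.split("/");
  const last = parts[parts.length - 1];
  return last === "xxxx" ? parts[parts.length - 2] : last;
}

console.log(extractId("www.hello.com/aaaa/1adf0023efae456"));      // 1adf0023efae456
console.log(extractId("www.hello.com/aaaa/1adf0023efae456/xxxx")); // 1adf0023efae456
```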
So I need to match the following:
1.2.
3.4.5.
5.6.7.10
((\d+)\.(\d+)\.((\d+)\.)*) will do fine for the very first line, but the problem is: there could be many lines: could be one or more than one.
\n will only appear if there are more than one lines.
In string version, I get it like this: "1.2.\n3.4.5.\n1.2."
So my issue is: if there is only one line, \n need not be at the end, but if there is more than one line, \n needs to be at the end of each line except the very last.
Here is the pattern I suggest:
^\d+(?:\.\d+)*\.?(?:\n\d+(?:\.\d+)*\.?)*$
Demo
Here is a brief explanation of the pattern:
^ from the start of the string
\d+ match a number
(?:\.\d+)* followed by dot, and another number, zero or more times
\.? followed by an optional trailing dot
(?:\n followed by a newline
\d+(?:\.\d+)*\.?)* and another number sequence, zero or more times
$ end of the string
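A quick JavaScript sketch of using this pattern to validate the whole input; the variable name wholeInput and the test strings other than the one from the question are assumptions:

```javascript
// No m flag: ^ and $ anchor the entire string, so every line must fit.
const wholeInput = /^\d+(?:\.\d+)*\.?(?:\n\d+(?:\.\d+)*\.?)*$/;

console.log(wholeInput.test("1.2.\n3.4.5.\n1.2.")); // true
console.log(wholeInput.test("1.2."));               // true
console.log(wholeInput.test("1.2.\n"));             // false, trailing newline
```

The last case shows the behavior asked for: a newline is accepted only when another line follows it.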
You might check if there is a newline at the end using a positive lookahead (?=.*\n):
(?=.*\n)(\d+)\.(\d+)\.((\d+)\.)*
See a regex demo
Edit
You could use an alternation to either match when on the next line there is the same pattern following, or match the pattern when not followed by a newline.
^(?:\d+\.\d+\.(?:\d+\.)*(?=.*\n\d+\.\d+\.)|\d+\.\d+\.(?:\d+\.)*(?!.*\n))
Regex demo
^ Start of string
(?: Non capturing group
\d+\.\d+\. Match 1+ digits and a dot, twice
(?:\d+\.)* Repeat 0+ times matching 1+ digits and a dot
(?=.*\n\d+\.\d+\.) Positive lookahead, assert that a newline follows and the next line starts with the pattern
| Or
\d+\.\d+\. Match 1+ digits and a dot, twice
(?:\d+\.)* Repeat 0+ times matching 1+ digits and a dot
(?!.*\n) Negative lookahead, assert that no newline follows
) Close non capturing group
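A rough JavaScript sketch of running this alternation over the sample string from the question. It assumes the g and m flags are set so that ^ can match at the start of every line; the answer itself does not state which flags to use:

```javascript
// Assumption: g (all matches) and m (^ matches at each line start).
const perLine = /^(?:\d+\.\d+\.(?:\d+\.)*(?=.*\n\d+\.\d+\.)|\d+\.\d+\.(?:\d+\.)*(?!.*\n))/gm;

const lines = "1.2.\n3.4.5.\n1.2.".match(perLine);
console.log(lines); // [ '1.2.', '3.4.5.', '1.2.' ]
```

The first two lines match through the first alternative (a newline and another sequence follow), and the last line matches through the second alternative (no newline follows).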
(\d+\.*)+\n* will match the text you provided. If you need to make sure the final line also ends with a . then (\d+\.)+\n* will work.
Most programming languages offer the m flag, which is the multiline modifier. Enabling it lets $ match at the end of each line as well as at the end of the string.
The solution below only appends the $ to your current regex and sets the m flag. This may vary depending on your programming language.
var text = "1.2.\n3.4.5.\n1.2.\n12.34.56.78.123.\nthis 1.2. shouldn't hit",
    regex = /((\d+)\.(\d+)\.((\d+)\.)*)$/gm,
    match;
while (match = regex.exec(text)) {
  console.log(match);
}
You could simplify the regex to /(\d+\.){2,}$/gm, then split the full match based on the dot character to get all the different numbers. I've given a JavaScript example below, but getting a substring and splitting a string are pretty basic operations in most languages.
var text = "1.2.\n3.4.5.\n1.2.\n12.34.56.78.123.\nthis 1.2. shouldn't hit",
    regex = /(\d+\.){2,}$/gm;
/* Slice is used to drop the dot at the end, otherwise resulting in
 * an empty string on split.
 *
 * "1.2.3.".split(".") //=> ["1", "2", "3", ""]
 * "1.2.3.".slice(0, -1) //=> "1.2.3"
 * "1.2.3".split(".") //=> ["1", "2", "3"]
 */
console.log(
  text.match(regex)
    .map(match => match.slice(0, -1).split("."))
);
For more info about regex flags/modifiers have a look at: Regular Expression Reference: Mode Modifiers
Suppose I have an email address, 'abcdef#gmail.com'. I want to replace all the characters between 'a' and 'f' so the result would look like 'a****f#gmail.com'.
Trying to do this with a regex and replace
str.replace(/^(.*?)#/gi, '*');
But the results look like this
*gmail.com
Is there a way to do what I need?
This is not an answer to your actual question, but I'd like to challenge you that your idea is not a good one. It's best not to show how long an email address is by replacing the internal letters with the same number of *s. It's better to use a fixed number of *s.
You seem to be using JavaScript, which did not support lookbehind assertions before ES2018, and capturing in this case may be simpler to understand too, so I'd do this to replace with a constant number of *s:
str.replace(/^(.).*(.#)/, '$1***$2')
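A small JavaScript sketch of that replacement on a few inputs; the helper name maskFixed and the two shorter addresses are assumptions for illustration:

```javascript
// Hypothetical helper: keeps the first character and the character before "#",
// and puts a fixed "***" between them regardless of the real length.
const maskFixed = (s) => s.replace(/^(.).*(.#)/, "$1***$2");

console.log(maskFixed("abcdef#gmail.com")); // a***f#gmail.com
console.log(maskFixed("ab#x.com"));         // a***b#x.com
console.log(maskFixed("a#x.com"));          // a#x.com (no match, unchanged)
```

The pattern needs at least two characters before the #, so a one-character ID is returned as-is.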
I'd use a replace with a callback, where the user middle part can be also replaced with *s:
var email = "abcdef#gmail.com";
document.write(email.replace(/^(.)(.*)(.#[^#]*)$/, function(m, g1, g2, g3) {
  return g1 + g2.replace(/./g, "*") + g3;
}));
Here is how the "outer" /^(.)(.*)(.#[^#]*)$/ regex works:
^ - matches start of a string
(.) - Group 1: any first character
(.*) - Group 2: any characters up to the character before the last #
(.#[^#]*) - Group 3: one character before the last #, then # and then any 0+ characters other than # up to...
$ - end of string
The .replace(/./g, "*") will just replace any character with *. And it will be done only on the Group 2.
The regex you suggested in the comment should also work.
/(?!^).(?=[^#]+#)/g matches any character but a newline that is not the first character ((?!^)) and that has 1+ characters other than # after it and a #.
var re = /(?!^).(?=[^#]+#)/g;
document.body.innerHTML = "fake#gmail.com".replace(re, "*");