Regex to replace characters between strings

Suppose I have an email address, 'abcdef#gmail.com'. I want to replace all the characters between 'a' and 'f' so that the result looks like 'a****f#gmail.com'.
I am trying to do this with a regex and replace:
str.replace(/^(.*?)#/gi, '*');
But the result looks like this:
*gmail.com
Is there a way to do what I need?

This is not an answer to your actual question, but I'd like to challenge the idea itself. It's best not to reveal how long an email address is by replacing the internal letters with the same number of *s; a fixed number of *s is better.
You seem to be using JavaScript, which historically had no lookbehind assertions (they were only added in ES2018), and capturing groups may be simpler to understand here anyway, so I'd do this to replace with a constant number of *s:
str.replace(/^(.).*(.#)/, '$1***$2')
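As a rough sketch of how the fixed-width version behaves (the function name maskFixed is made up for illustration, and # stands in for the address separator exactly as in the question):

```javascript
// Hypothetical helper: keep the first character and the last character
// before '#', and put a fixed run of three asterisks between them.
function maskFixed(email) {
  return email.replace(/^(.).*(.#)/, '$1***$2');
}

console.log(maskFixed('abcdef#gmail.com')); // a***f#gmail.com
console.log(maskFixed('ab#gmail.com'));     // a***b#gmail.com
```

Note that an address with a single character before the #, such as 'a#gmail.com', does not match this pattern and comes back unchanged, so that case may need separate handling.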

I'd use a replace with a callback, so that every character in the middle of the user part is replaced with a *:
var email = "abcdef#gmail.com";
document.write(email.replace(/^(.)(.*)(.#[^#]*)$/, function(m, g1, g2, g3) {
    return g1 + g2.replace(/./g, "*") + g3;
}));
Here is how the "outer" /^(.)(.*)(.#[^#]*)$/ regex works:
^ - matches start of a string
(.) - Group 1: any first character
(.*) - Group 2: any characters up to the character before the last #
(.#[^#]*) - Group 3: one character before the last #, then # and then any 0+ characters other than # up to...
$ - end of string
The inner .replace(/./g, "*") replaces every character with *, and it is applied only to Group 2.
The regex you suggested in the comment should also work.
/(?!^).(?=[^#]+#)/g matches any character other than a newline that is not at the start of the string ((?!^)) and that is followed by one or more characters other than # and then a #.
var re = /(?!^).(?=[^#]+#)/g;
document.body.innerHTML = "fake#gmail.com".replace(re, "*");
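A minimal side-by-side sketch of the two approaches above. The function names are hypothetical, and console.log is used instead of writing to the page so it can be run outside a browser:

```javascript
// Approach 1: capture first char, middle, and "last char + # + domain",
// then star out only the middle group inside a callback.
function maskWithCallback(email) {
  return email.replace(/^(.)(.*)(.#[^#]*)$/, (m, g1, g2, g3) =>
    g1 + g2.replace(/./g, '*') + g3);
}

// Approach 2: match each char that is not at the start of the string and
// that still has at least one non-# char between it and the #.
function maskWithLookahead(email) {
  return email.replace(/(?!^).(?=[^#]+#)/g, '*');
}

console.log(maskWithCallback('abcdef#gmail.com'));  // a****f#gmail.com
console.log(maskWithLookahead('abcdef#gmail.com')); // a****f#gmail.com
console.log(maskWithLookahead('fake#gmail.com'));   // f**e#gmail.com
```

Both keep the number of hidden characters visible, which is what the question asked for; the callback version additionally requires the whole string to have exactly one # to match at all.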

Related

Regex to replace all non numbers but allow a '+' prefix

I want to delete all invalid characters from a string that should represent a phone number. Only digits and a '+' prefix are allowed.
I tried in Kotlin with
"+1234abc567+".replace("[^+0-9]".toRegex(), "")
It works nearly perfectly, but it does not remove the last '+'.
How can I modify the regex to allow only the first '+'?

You could do a regex replacement on the following pattern:
(?<=.)\+|[^0-9+]+
Sample script:
String input = "+1234abc567+";
String output = input.replaceAll("(?<=.)\\+|[^0-9+]+", "");
System.out.println(input); // +1234abc567+
System.out.println(output); // +1234567
Here is an explanation of the regex pattern:
(?<=.)\+ match a literal + which is NOT first (i.e. preceded by >= 1 character)
| OR
[^0-9+]+ match one or more non digit characters, excluding +
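The same pattern should carry over to JavaScript, assuming an engine with lookbehind support (ES2018 or later). A small sketch, with cleanPhone as a made-up name; the g flag is needed because a JavaScript replace with a regex otherwise stops after the first match:

```javascript
// Hypothetical helper: drop every '+' that has a character before it,
// and every run of characters that are neither digits nor '+'.
function cleanPhone(input) {
  return input.replace(/(?<=.)\+|[^0-9+]+/g, '');
}

console.log(cleanPhone('+1234abc567+')); // +1234567
console.log(cleanPhone('12+34'));        // 1234
```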

You can use
^(\+)|\D+
Replace with the backreference to the first group, $1. See the regex demo.
Details:
^(\+) - a + at the start of string captured into Group 1
| - or
\D+ - one or more non-digit chars.
NOTE: a raw string literal delimited with """ allows the use of a single backslash to form regex escapes, such as \D, \d, etc. Using this type of string literal greatly simplifies regex definitions inside code.
See the Kotlin demo:
val s = "+1234abc567+"
val regex = """^(\+)|\D+""".toRegex()
println(s.replace(regex, "$1"))
// => +1234567
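For comparison, a sketch of the capture-group variant in JavaScript (cleanPhoneWithGroup is a hypothetical name). It relies on the fact that, in a JavaScript replacement string, $1 expands to an empty string when group 1 did not take part in the match:

```javascript
// Hypothetical helper: a '+' at the very start is captured and put back via $1;
// any other run of non-digits is matched by \D+ and replaced by an empty $1.
function cleanPhoneWithGroup(input) {
  return input.replace(/^(\+)|\D+/g, '$1');
}

console.log(cleanPhoneWithGroup('+1234abc567+')); // +1234567
console.log(cleanPhoneWithGroup('++49 (0) 30'));  // +49030
```

This version needs no lookbehind, so it may be the safer choice on older regex engines.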

Kotlin / Regex - Replace a group of pattern with a repeating character

I would like to mask the email passed to the maskEmail function. My problem is that the asterisk * is not repeated when I replace groups 2 and 4 of my pattern.
Here is my code:
fun maskEmail(email: String): String {
    return email.replace(Regex("(\\w)(\\w*)\\.(\\w)(\\w*)(#.*\\..*)$"), "$1*.$3*$5")
}
Here is the input:
tom.cat#email.com
cutie.pie#email.com
captain.america#email.com
Here is the current output of that code:
t*.c*#email.com
c*.p*#email.com
c*.a*#email.com
Expected output:
t**.c**#email.com
c****.p**#email.com
c******.a******#email.com
Edit:
I know this could be done easily with a for loop, but I need it to be done with a regex. Thank you in advance.

For your problem, you need to match each character in the email address that is not the first character in a word and occurs before the #. You can do that with a negative lookbehind for a word boundary and a positive lookahead for the # symbol:
(?<!\b)\w(?=.*?#)
The matched characters can then be replaced with *.
Note we use a lazy quantifier (?) on the .* to improve efficiency.
Demo on regex101
Note also, as pointed out by @CarySwoveland, that you can replace (?<!\b) with \B, i.e.
\B\w(?=.*?#)
Demo on regex101
As pointed out by @Thefourthbird, this can be made more efficient still by replacing the .*? with [^\r\n#]*, i.e.
\B\w(?=[^\r\n#]*#)
Demo on regex101
Or, if you're only matching single strings, just [^#]*:
\B\w(?=[^#]*#)
Demo on regex101
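To see the final pattern in action outside regex101, here is a short JavaScript sketch (maskWords is a hypothetical name; the question itself is about Kotlin, where the same pattern can be passed to Regex):

```javascript
// Hypothetical helper: star out every word character that is not the first
// character of a word (\B) and that still has a '#' somewhere after it.
function maskWords(email) {
  return email.replace(/\B\w(?=[^#]*#)/g, '*');
}

console.log(maskWords('tom.cat#email.com'));         // t**.c**#email.com
console.log(maskWords('cutie.pie#email.com'));       // c****.p**#email.com
console.log(maskWords('captain.america#email.com')); // c******.a******#email.com
```

Because only \w characters are matched, separators such as . or - in the part before the # are left as they are.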

I suggest keeping any char at the start of the string and any char that directly follows a dot, and replacing with * every other char that has a # somewhere after it:
((?:\.|^).)?.(?=.*#)
Replace with $1*. See the regex demo. This will handle emails that happen to contain chars other than just word (letter/digit/underscore) and . chars.
Details
((?:\.|^).)? - an optional capturing group matching a dot or start of string position and then any char other than a line break char
. - any char other than a line break char...
(?=.*#) - if followed with any 0 or more chars other than line break chars as many as possible and then #.
Kotlin code (with a raw string literal used to define the regex pattern so as not to have to double escape the backslash):
fun maskEmail(email: String): String {
    return email.replace(Regex("""((?:\.|^).)?.(?=.*#)"""), "$1*")
}
See a Kotlin test online:
val emails = arrayOf<String>("captain.am-e-r-ica#email.com","my-cutie.pie+here#email.com","tom.cat#email.com","cutie.pie#email.com","captain.america#email.com")
for (email in emails) {
    val masked = maskEmail(email)
    println("${email}: ${masked}")
}
Output:
captain.am-e-r-ica#email.com: c******.a*********#email.com
my-cutie.pie+here#email.com: m*******.p*******#email.com
tom.cat#email.com: t**.c**#email.com
cutie.pie#email.com: c****.p**#email.com
captain.america#email.com: c******.a******#email.com

Regex for ignoring first and last character

Trying to obfuscate an email to this format:
a***#******m
Meaning I need a regex to match everything except first and last character, as well as #.
I can use [^#] for everything but how do I ignore the last and first characters in a String? Anchors seem like the way to go but what is the exact syntax?

How about using a lookahead:
(?!^|.$)[^#\s]
See this demo at regex101
I also added white space to the characters that won't be replaced.

If the tool or language you use supports lookarounds, you can use:
(?<!^)[^#](?!$)
Demo: https://regex101.com/r/5Tbaq7/1
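A quick JavaScript sketch comparing the two lookaround patterns suggested so far. The function names are invented for the example, and the lookbehind version assumes an ES2018-capable engine:

```javascript
// Pattern 1: skip the first position and the last character via a lookahead,
// and never touch '#' or whitespace.
function obfuscateLookahead(email) {
  return email.replace(/(?!^|.$)[^#\s]/g, '*');
}

// Pattern 2: the same idea with a lookbehind for the start and a lookahead
// for the end; whitespace is not excluded here.
function obfuscateLookaround(email) {
  return email.replace(/(?<!^)[^#](?!$)/g, '*');
}

console.log(obfuscateLookahead('abcd#example.com'));  // a***#**********m
console.log(obfuscateLookaround('abcd#example.com')); // a***#**********m
console.log(obfuscateLookahead('ab cd#x.yz'));        // a* **#***z
console.log(obfuscateLookaround('ab cd#x.yz'));       // a****#***z
```

The two only differ on input that contains whitespace, which a real email address normally should not.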

There is no language tagged, but if you are using a programming language and want to make sure that the email address contains an # sign and that the first and last characters stay visible, you might use capturing groups and, in the replacement, turn every character of the groups you want to hide into an *:
^(\S)([^#\s]*)(#)([^#\s]*)(\S)$
^ Start of string
(\S) Capture group 1, match a non whitespace char
([^#\s]*) Capture group 2, match 0+ times not an # or a whitespace char
(#) Capture group 3, Match #
([^#\s]*) Capture group 4, match 0+ times not an # or a whitespace char
(\S) Capture group 5, match a non whitespace char
$ End of string
Regex demo
For example, using JavaScript:
let pattern = /^(\S)([^#\s]*)(#)([^#\s]*)(\S)$/;
[
    "test#test.com",
    "te st#te st.com",
    "test#test#test.com",
    "te#nl",
    "t#t",
    "test#",
    "#tes",
    "test"
].forEach(str => {
    let replaced = str.replace(pattern, function(_, g1, g2, g3, g4, g5) {
        return g1 + g2.replace(/./g, "*") + g3 + g4.replace(/./g, "*") + g5;
    });
    console.log(replaced);
});

Python String Dissection

Here is the problem:
Replace input string with the following: The first and last characters, separated by the count of distinct characters between the two.
Any non-alphabetic character in the input string should appear in the output string in its original relative location.
Here is the code I have thus far:
word = input("Please enter a word: ")
first_character = word[0]
last_character = word[-1]
unique_characters = (list(set(word[1:-1])))
unique_count = str(len(unique_characters))
print(first_character[0],unique_count,last_character[0])
For the second part, I have thought about using regex, however I have not been able to wrap my head around regex as it is not something I ever use.

You can use
import re
pat = r"\b([^\W\d_])([^\W\d_]*)([^\W\d_])\b"
s = "Testers"
print(re.sub(pat, (lambda m: "{0}{1}{2}".format(m.group(1), len(''.join(set(m.group(2)))), m.group(3))), s))
See the IDEONE demo.
The regex breakdown:
\b - word boundary (use ^ if you test an individual string)
([^\W\d_]) - Group 1 capturing any ASCII letter (use re.U flag if you need to match Unicode, too)
([^\W\d_]*) - Group 2 capturing zero or more letters
([^\W\d_]) - Group 3 capturing a letter at...
\b - the trailing word boundary (replace with $ if you handle individual strings)
In the replacement pattern, the len(''.join(set(m.group(2)))) is counting the number of unique letter occurrences (see this SO post).
With the * quantifier, a 2-letter word like Ts becomes T0s. If you need to leave 2-letter words unchanged (Ts > Ts), replace * with the + quantifier in the second group.
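The same idea can be sketched in JavaScript, as a port of the Python answer rather than part of it. The name dissect is hypothetical, and a Set is used to count the distinct letters of the middle group:

```javascript
// Hypothetical port: for every run of letters, keep the first and last letter
// and put the number of distinct letters between them in the middle.
// [^\W\d_] matches ASCII letters only, as in the Python pattern.
function dissect(text) {
  return text.replace(/\b([^\W\d_])([^\W\d_]*)([^\W\d_])\b/g,
    (match, first, middle, last) => first + new Set(middle).size + last);
}

console.log(dissect('Testers'));       // T4s
console.log(dissect('Hello, world!')); // H2o, w3d!
console.log(dissect('Ts'));            // T0s
```

Non-alphabetic characters such as the comma, space and exclamation mark are never matched, so they stay in their original positions.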

Regular Expression remove leading blank and dash character

Given a string like String a="- = - - What is your name?";
How to remove the leading equal, dash, space characters, to get the clean text,
"What is your name?"

If you want to remove the leading non-alphabetic characters, you can match:
^[^a-zA-Z]+
and replace it with '' (empty string).
Explanation:
first ^ - Anchor to match at the beginning.
[] - char class
second ^ - negation in a char class
+ - One or more of the previous match
So the regex matches one or more non-alphabetic characters at the beginning of the string.
In your case it will get rid of all the leading spaces, hyphens and equals signs. In short, everything before the first letter.
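A minimal JavaScript sketch of this replacement (stripLeadingNonLetters is a made-up name):

```javascript
// Hypothetical helper: remove everything before the first ASCII letter.
function stripLeadingNonLetters(text) {
  return text.replace(/^[^a-zA-Z]+/, '');
}

console.log(stripLeadingNonLetters('- = - - What is your name?')); // What is your name?
console.log(stripLeadingNonLetters('123 go'));                     // go
```

As the second call shows, leading digits are removed as well, since the character class excludes only letters; if digits should be kept, they would need to be added to the class.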

$a=~s/- = - - //;

In Javascript you could do it like this
var a = "- = - - What is your name?";
a = a.replace(/^([-=\s]*)([a-zA-Z0-9])/gm,"$2");

Java:
String replaced = a.replaceFirst("^[-= ]*", "");

Assuming Java, try this regex:
/^\W*(.*)$/
Retrieve your string from capture group 1.
\W* matches all leading non-word characters
(.*) then matches all characters to the end, beginning with the first word character
^ and $ are the boundaries. You could even do without $ in this case.
Tip: try the excellent Java regex tutorial for reference.

In Python:
>>> "- = - - What is your name?".lstrip("-= ")
'What is your name?'
To remove any kind of whitespace, use .lstrip("-= \t\r\n").

In Javascript, I needed to do this and did it using the following regex:
^[\s\-]+
and replace it with '' (empty string) like this:
yourStringValue.replace(/^[\s\-]+/, '');
