Regular Expression remove leading blank and dash character - regex

Given a string like String a = "- = - - What is your name?";
how can I remove the leading equals signs, dashes, and spaces to get the clean text
"What is your name?"

If you want to remove the leading non-letters you can match:
^[^a-zA-Z]+
and replace it with '' (empty string).
Explanation:
first ^ - anchor to match at the beginning
[] - char class
second ^ - negation in a char class
+ - One or more of the previous match
So the regex matches one or more non-letter characters at the beginning of the string.
In your case it will get rid of all the leading spaces, hyphens, and equals signs. In short, everything before the first letter.
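As a minimal Python sketch of the pattern explained above (the function name strip_leading_non_letters is made up for illustration):

```python
import re

def strip_leading_non_letters(text: str) -> str:
    # ^ anchors at the start of the string;
    # [^a-zA-Z]+ consumes one or more characters that are not ASCII letters.
    return re.sub(r"^[^a-zA-Z]+", "", text)

print(strip_leading_non_letters("- = - - What is your name?"))
```

Note that this also strips leading digits, since they are not letters.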

$a=~s/- = - - //;

In JavaScript you could do it like this:
var a = "- = - - What is your name?";
a = a.replace(/^([-=\s]*)([a-zA-Z0-9])/gm,"$2");

Java:
String replaced = a.replaceFirst("^[-= ]*", "");

Assuming Java, try this regex:
/^\W*(.*)$/
and retrieve your string from captured group 1.
\W* matches all leading non-word characters
(.*) then matches all characters to the end, beginning with the first word character
^ and $ are the boundaries; you could even do without $ in this case.
Tip: try the excellent Java regex tutorial for reference.
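The capture-group idea can be sketched in Python as follows. This is only an illustration, and text_from_first_word_char is a hypothetical helper name. Keep in mind that \W treats digits and underscores as word characters, so they are kept:

```python
import re

def text_from_first_word_char(text: str) -> str:
    # \W* skips leading non-word characters; group 1 captures the rest of the line.
    match = re.match(r"^\W*(.*)$", text)
    # Fall back to the input unchanged if the pattern does not match
    # (for example, multi-line input, since . does not cross newlines).
    return match.group(1) if match else text

print(text_from_first_word_char("- = - - What is your name?"))
```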

In Python:
>>> "- = - - What is your name?".lstrip("-= ")
'What is your name?'
To remove any kind of whitespace, use .lstrip("-= \t\r\n").

In Javascript, I needed to do this and did it using the following regex:
^[\s\-]+
and replace it with '' (empty string) like this:
yourStringValue.replace(/^[\s\-]+/, '');

Related

Regex to replace all non numbers but allow a '+' prefix

I want to delete all invalid characters from a string which should represent a phone number. Only a '+' prefix and digits are allowed.
I tried in Kotlin with
"+1234abc567+".replace("[^+0-9]".toRegex(), "")
It works nearly perfectly, but it does not remove the trailing '+'.
How can I modify the regex to only allow the first '+'?
You could do a regex replacement on the following pattern:
(?<=.)\+|[^0-9+]+
Sample script:
String input = "+1234abc567+";
String output = input.replaceAll("(?<=.)\\+|[^0-9+]+", "");
System.out.println(input); // +1234abc567+
System.out.println(output); // +1234567
Here is an explanation of the regex pattern:
(?<=.)\+ match a literal + which is NOT first (i.e. preceded by >= 1 character)
| OR
[^0-9+]+ match one or more non digit characters, excluding +
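For comparison, here is a small Python sketch of the same lookbehind pattern. Python's re module accepts (?<=.) because it is fixed-width. The function name keep_leading_plus_and_digits is an assumption for illustration:

```python
import re

def keep_leading_plus_and_digits(raw: str) -> str:
    # (?<=.)\+   removes any + that has at least one character before it
    # [^0-9+]+   removes runs of characters that are neither digits nor +
    return re.sub(r"(?<=.)\+|[^0-9+]+", "", raw)

print(keep_leading_plus_and_digits("+1234abc567+"))
```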
You can use
^(\+)|\D+
Replace with the backreference to the first group, $1. See the regex demo.
Details:
^(\+) - a + at the start of string captured into Group 1
| - or
\D+ - one or more non-digit chars.
NOTE: a raw string literal delimited with """ allows the use of a single backslash to form regex escapes, such as \D, \d, etc. Using this type of string literals greatly simplifies regex definitions inside code.
See the Kotlin demo:
val s = "+1234abc567+"
val regex = """^(\+)|\D+""".toRegex()
println(s.replace(regex, "$1"))
// => +1234567
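A rough Python equivalent of this capture-group approach is sketched below. It uses a replacement function rather than a backreference so that the unmatched-group case is handled explicitly; normalize_phone is a hypothetical name:

```python
import re

def normalize_phone(raw: str) -> str:
    # ^(\+) captures a + at the very start; \D+ matches any other run of non-digits.
    # When \D+ matched, group 1 is None, so the match is replaced with "".
    return re.sub(r"^(\+)|\D+", lambda m: m.group(1) or "", raw)

print(normalize_phone("+1234abc567+"))
```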

Kotlin / Regex - Replace a group of pattern with a repeating character

I would like to mask the email passed to the maskEmail function. I'm currently facing a problem where the asterisk * is not repeated when I replace groups 2 and 4 of my pattern.
Here is my code:
fun maskEmail(email: String): String {
return email.replace(Regex("(\\w)(\\w*)\\.(\\w)(\\w*)(#.*\\..*)$"), "$1*.$3*$5")
}
Here is the input:
tom.cat#email.com
cutie.pie#email.com
captain.america#email.com
Here is the current output of that code:
t*.c*#email.com
c*.p*#email.com
c*.a*#email.com
Expected output:
t**.c**#email.com
c****.p**#email.com
c******.a******#email.com
Edit:
I know this could be done easily with a for loop, but I need this to be done with a regex. Thank you in advance.
For your problem, you need to match each character in the email address that is not the first character in a word and occurs before the #. You can do that with a negative lookbehind for a word boundary and a positive lookahead for the # symbol:
(?<!\b)\w(?=.*?#)
The matched characters can then be replaced with *.
Note we use a lazy quantifier (?) on the .* to improve efficiency.
Demo on regex101
Note also, as pointed out by @CarySwoveland, you can replace (?<!\b) with \B, i.e.
\B\w(?=.*?#)
Demo on regex101
As pointed out by @Thefourthbird, this can be made more efficient still by replacing the .*? with [^\r\n#]*, i.e.
\B\w(?=[^\r\n#]*#)
Demo on regex101
Or, if you're only matching single strings, just [^#]*:
\B\w(?=[^#]*#)
Demo on regex101
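To see the last pattern in action, here is a minimal Python sketch. It keeps the # separator exactly as the sample addresses appear in the question; mask_email_words is an illustrative name, not part of any library:

```python
import re

def mask_email_words(email: str) -> str:
    # \B\w         a word character that is not at the start of a word
    # (?=[^#]*#)   only if a # still follows, i.e. we are in the local part
    return re.sub(r"\B\w(?=[^#]*#)", "*", email)

for sample in ("tom.cat#email.com", "cutie.pie#email.com", "captain.america#email.com"):
    print(mask_email_words(sample))
```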
I suggest keeping any char at the start of string and a combination of a dot + any char, and replace any other chars with * that are followed with any amount of characters other than # before a #:
((?:\.|^).)?.(?=.*#)
Replace with $1*. See the regex demo. This will handle emails that happen to contain chars other than just word (letter/digit/underscore) and . chars.
Details
((?:\.|^).)? - an optional capturing group matching a dot or start of string position and then any char other than a line break char
. - any char other than a line break char...
(?=.*#) - if followed with any 0 or more chars other than line break chars as many as possible and then #.
Kotlin code (with a raw string literal used to define the regex pattern so as not to have to double escape the backslash):
fun maskEmail(email: String): String {
return email.replace(Regex("""((?:\.|^).)?.(?=.*#)"""), "$1*")
}
See a Kotlin test online:
val emails = arrayOf<String>("captain.am-e-r-ica#email.com","my-cutie.pie+here#email.com","tom.cat#email.com","cutie.pie#email.com","captain.america#email.com")
for(email in emails) {
val masked = maskEmail(email)
println("${email}: ${masked}")
}
Output:
captain.am-e-r-ica#email.com: c******.a*********#email.com
my-cutie.pie+here#email.com: m*******.p*******#email.com
tom.cat#email.com: t**.c**#email.com
cutie.pie#email.com: c****.p**#email.com
captain.america#email.com: c******.a******#email.com

Trying to cut a Spotify link with regex using dart

I am trying to cut a Spotify playlist link into getting all the chars between / & ?, however I am getting nowhere with regex.
The link: https://open.spotify.com/playlist/37i9dQZF1DX60OAKjsWlA2?si=2NBcsO0bQS-CQclS1rNoCA
What I want: 37i9dQZF1DX60OAKjsWlA2
My code so far looks the following, but I am getting nothing out of it:
RegExp regExp = new RegExp(
"\w*\?",
caseSensitive: false,
multiLine: false,
);
When I print with
print("stringMatch : " +
regExp
.stringMatch(
"https://open.spotify.com/playlist/7x1ebdezDivH4mXAhUdR2S?si=TxHdzuvnTzuoCD5TFR4z_g")
.toString());
It just prints an empty String. Where am I going wrong?
You need to match 1+ word chars, or chars other than /, up to a question mark excluding it.
Note that you need to double-escape backslashes in a regular string literal, or use single ones in a raw string literal.
In your current case, you may use
r"\w+(?=\?)"
See the regex demo
Or,
r"[^?/]+(?=\?)"
See this regex demo. Here, [^?/]+ matches 1+ chars other than ? and /.
A non-regex way is to split on ?, get the first item, then get the chunk of chars after the last /:
String s = "https://open.spotify.com/playlist/7x1ebdezDivH4mXAhUdR2S?si=TxHdzuvnTzuoCD5TFR4z_g";
String t=s.split("?")[0];
print(t.substring(t.lastIndexOf("/")+1));
Output: 7x1ebdezDivH4mXAhUdR2S
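Both ideas, the lookahead regex and the non-regex split, can be sketched in Python like this. The function names are hypothetical, and the second variant swaps the manual split for the standard library's urlsplit:

```python
import re
from urllib.parse import urlsplit

def playlist_id_regex(url):
    # One or more characters other than ? and /, immediately followed by a ?.
    match = re.search(r"[^?/]+(?=\?)", url)
    return match.group(0) if match else None

def playlist_id_urlsplit(url):
    # Take the path without the query string, then the last path segment.
    return urlsplit(url).path.rsplit("/", 1)[-1]

url = "https://open.spotify.com/playlist/37i9dQZF1DX60OAKjsWlA2?si=2NBcsO0bQS-CQclS1rNoCA"
print(playlist_id_regex(url))
print(playlist_id_urlsplit(url))
```

The regex variant returns None when the link has no query string, while the urlsplit variant still returns the last path segment.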

Regex match contingent non word in string

How would I create a regex which matches all contingent non words "[a].[b]" in a string? I don't care about spaces or newline or any invisible character I haven't heard about, as long as it only matches the contingent string. Anything else is invalid.
[a].[b] // valid for "[a].[b]"
[a].[b] // valid for "[a].[b]"
[a].[b] // valid for "[a].[b]"
\n[a].[b] // valid for "[a].[b]"
[a].[b]$[c] // invalid because of "$" (or any other character) and everything after
[c]$[a].[b] // invalid because of "$" (or any other character) and everything before
[c].[a].[b] // invalid because of "[c]."
The problem I'm having is if I try
[\ \n\r]
it matches the space before " [a].[b]" which is not what I want, I want spaces to be ignored because I don't want to replace anything besides "[a].[b]". But of course only when it is a contingent string, "somethinganythingbutspaceandnewline[a].[b]" I don't want to replace.
Thank you.
If I understand you right, you want the [a].[b] string with possible leading and trailing whitespace. If that's the case, I suggest the \A\s*\[a\]\.\[b\]\s*\Z pattern, e.g. (C# code)
string pattern = @"\A\s*\[a\]\.\[b\]\s*\Z";
string source = "\n[a].[b] \t ";
if (Regex.IsMatch(source, pattern))
Console.Write("Match");
else
Console.Write("Not Match");
Pattern:
\A - beginning of the text
\s* - zero or more leading whitespaces
\[a\]\.\[b\] - string to find (please note the escaping)
\s* - zero or more trailing whitespaces
\Z - end of the text
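A comparable check in Python might look like the sketch below. It uses re.fullmatch instead of \A and \Z anchors, which is a swapped-in technique rather than a direct port of the C# code; is_only_a_b is a made-up name:

```python
import re

def is_only_a_b(text: str) -> bool:
    # fullmatch requires the whole string to match:
    # optional whitespace, the literal [a].[b], optional whitespace.
    return re.fullmatch(r"\s*\[a\]\.\[b\]\s*", text) is not None

print(is_only_a_b("\n[a].[b] \t "))
print(is_only_a_b("[a].[b]$[c]"))
```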
Edit: As far as I can see, the core of the match is a constant, [a].[b], so I doubt you really need the matched text itself, which is always "[a].[b]". If you do, you can try a lookahead and lookbehind construction (C# code):
string pattern = @"(?<=\A\s*)\[a\]\.\[b\](?=\s*\Z)";
string source = "\n[a].[b] \t ";
var match = Regex.Match(source, pattern);
if (match.Success)
Console.Write($"Matched: '{match.Value}'");
Now
(?<=\A\s*) - zero or more leading spaces should be matched but not included into the match
(?=\s*\Z) - zero or more trailing spaces should be matched but not included into the match
Edit 2: in case you have several [a].[b] separated by white spaces (see comments below)
string pattern = @"(?<=\A|\s+)\[a\]\.\[b\](?=\s+|\Z)";
string source = "[a].[b] [a].[b][a].[b] [a].[b] \t ";
string result = string.Join(", ", Regex
.Matches(source, pattern)
.OfType<Match>()
.Select(match => match.Value));
Console.Write(result);
Outcome (2 matches):
[a].[b], [a].[b]
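Python's re module does not allow variable-width lookbehinds such as (?<=\A|\s+), so a Python sketch of the same idea has to be phrased differently. One option, shown here as an assumption rather than a port of the C# pattern, is to require "no non-whitespace character" on either side; find_standalone is a hypothetical name:

```python
import re

def find_standalone(text: str) -> list:
    # (?<!\S)  the match is preceded by whitespace or the start of the text
    # (?!\S)   the match is followed by whitespace or the end of the text
    return re.findall(r"(?<!\S)\[a\]\.\[b\](?!\S)", text)

source = "[a].[b] [a].[b][a].[b] [a].[b] \t "
print(", ".join(find_standalone(source)))
```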

Regex to replace characters between strings

Suppose I have an email address, 'abcdef#gmail.com'. I want to replace all the characters between 'a' and 'f' so the result would look like 'a****f#gmail.com'.
Trying to do this with a regex and replace
str.replace(/^(.*?)#/gi, '*');
But the results look like this
*gmail.com
Is there a way to do what I need?
This is not an answer to your actual question, but I'd like to challenge you that your idea is not a good one. It's best not to show how long an email address is by replacing the internal letters with the same number of *s. It's better to use a fixed number of *s.
You seem to be using JavaScript, which historically lacked lookbehind assertions (they were added in ES2018), and capturing may be simpler to understand here anyway, so I'd do this to replace with a constant number of *s:
str.replace(/^(.).*(.#)/, '$1***$2')
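The same fixed-width masking can be sketched in Python, again with the # separator as it appears in the sample address. The name mask_fixed is only for illustration:

```python
import re

def mask_fixed(email: str) -> str:
    # Keep the first character and the character before the last #,
    # and replace everything in between with exactly three asterisks.
    return re.sub(r"^(.).*(.#)", r"\1***\2", email)

print(mask_fixed("abcdef#gmail.com"))
```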
I'd use a replace with a callback, where the user middle part can be also replaced with *s:
var email = "abcdef#gmail.com";
document.write(email.replace(/^(.)(.*)(.#[^#]*)$/, function(m, g1, g2, g3) {
return g1 + g2.replace(/./g, "*") + g3;
}));
Here is how the "outer" /^(.)(.*)(.#[^#]*)$/ regex works:
^ - matches start of a string
(.) - Group 1: any first character
(.*) - Group 2: any characters up to the character before the last #
(.#[^#]*) - Group 3: one character before the last #, then # and then any 0+ characters other than # up to...
$ - end of string
The .replace(/./g, "*") will just replace any character with *. And it will be done only on the Group 2.
The regex you suggested in the comment should also work.
/(?!^).(?=[^#]+#)/g matches any character but a newline that is not the first character ((?!^)) and that has 1+ characters other than # after it and a #.
var re = /(?!^).(?=[^#]+#)/g;
document.body.innerHTML = "fake#gmail.com".replace(re, "*");
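Both variants from this answer, the callback replacement and the lookahead-only regex, translate to Python roughly as follows. This is a sketch under the assumption that the address contains a single # separator, as in the samples; the function names are invented for the example:

```python
import re

def mask_middle_callback(email: str) -> str:
    # Group 1: first character; group 2: the middle part to hide;
    # group 3: the character before the last #, the #, and the rest.
    def repl(m):
        return m.group(1) + "*" * len(m.group(2)) + m.group(3)
    return re.sub(r"^(.)(.*)(.#[^#]*)$", repl, email)

def mask_middle_lookahead(email: str) -> str:
    # Any character that is not at the start and has at least one
    # non-# character between it and a following #.
    return re.sub(r"(?!^).(?=[^#]+#)", "*", email)

print(mask_middle_callback("abcdef#gmail.com"))
print(mask_middle_lookahead("fake#gmail.com"))
```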