How can I build a regular expression that will match a string of any length containing any characters but which must contain 21 commas?
/^([^,]*,){21}[^,]*$/
That is:
^ Start of string
( Start of group
[^,]* Any character except comma, zero or more times
, A comma
){21} End and repeat the group 21 times
[^,]* Any character except comma, zero or more times again
$ End of string
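A quick way to sanity-check the pattern is to run it against strings with a known number of commas. This is only a sketch in JavaScript; the helper `withCommas` is a made-up name for illustration, not part of the question.

```javascript
// Pattern from the answer above: exactly 21 commas, anything else around them.
const exactly21 = /^([^,]*,){21}[^,]*$/;

// Hypothetical helper: builds "f0,f1,...,fn", which contains n commas.
function withCommas(n) {
  return Array.from({ length: n + 1 }, (_, i) => "f" + i).join(",");
}

console.log(exactly21.test(withCommas(20))); // false
console.log(exactly21.test(withCommas(21))); // true
console.log(exactly21.test(withCommas(22))); // false
console.log(exactly21.test(",".repeat(21))); // true (empty fields are allowed)
```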
If you're using a regex variety that supports the Possessive quantifier (e.g. Java), you can do:
^(?:[^,]*+,){21}[^,]*+$
A possessive quantifier can perform better than a greedy one, because it never backtracks into what it has already matched.
Explanation:
(?x) # enables comments, so this whole block can be used in a regex.
^ # start of string
(?: # start non-capturing group
[^,]*+ # as many non-commas as possible, but none required
, # a comma
) # end non-capturing group
{21} # 21 of previous entity (i.e. the group)
[^,]*+ # as many non-commas as possible, but none required
$ # end of string
Exactly 21 commas:
^([^,]*,){21}[^,]*$
At least 21 commas:
^([^,]*,){21}.*$
It might be faster and easier to understand to iterate through the string, count the commas found, and then compare the count to 21.
^(?:[^,]*)(?:,[^,]*){21}$
if exactly 21:
/^[^,]*(,[^,]*){21}$/
if at least 21:
/(,[^,]*){21}/
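To see the difference between the two patterns, here is a small JavaScript sketch. The sample strings are invented for illustration.

```javascript
const exactly = /^[^,]*(,[^,]*){21}$/; // anchored: the whole string has 21 commas
const atLeast = /(,[^,]*){21}/;        // unanchored: 21 commas appear somewhere

// Sample inputs with 20, 21 and 25 commas.
const s20 = "x" + ",x".repeat(20);
const s21 = "x" + ",x".repeat(21);
const s25 = "x" + ",x".repeat(25);

console.log(exactly.test(s20), atLeast.test(s20)); // false false
console.log(exactly.test(s21), atLeast.test(s21)); // true true
console.log(exactly.test(s25), atLeast.test(s25)); // false true
```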
However, I would suggest not using a regex for such a simple task, because it's slow.
What language? There's probably a simpler method.
For example...
In CFML, you can just see if ListLen(MyString) is 22
In Java, you can compare MyString.split(",", -1).length to 22 (the -1 limit keeps trailing empty strings, which split drops by default)
etc...
var valid = ((" " + input + " ").split(",").length == 22);
or...
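var valid = 21 == (function(input){
  // count the commas one character at a time
  var ret = 0;
  for (var i=0; i<input.length; i++)
    if (input.charAt(i) == ",")
      ret++;
  return ret;
})(input); // pass the string in, otherwise input is undefined inside the function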
Will perform better than...
var valid = (/^([^,]*,){21}[^,]*$/).test(input);
.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,.*,
Without using a gem, I just want to write a simple regex formula to remove the first character from strings if it's a 1, and, if there are more than 10 total characters in the string. I never expect more than 11 characters, 11 should be the max. But in the case there are 10 characters and the string begins with "1", I don't want to remove it.
str = "19097147835"
str&.remove(/\D/).sub(/^1\d{10}$/, "\1").to_i
Returns 0
I'm looking for it to return "9097147835"
You could use your pattern, but add a capture group around the 10 digits to use the group in the replacement.
\A1(\d{10})\z
For example
str = "19097147835"
puts str.gsub(/\D/, '').sub(/\A1(\d{10})\z/, '\1').to_i
Output
9097147835
Another option is to remove all the non-digits and then match only the 10 digits that follow the leading 1:
\A1\K\d{10}\z
\A Start of string
1\K Match 1 and forget what is matched so far
\d{10} Match 10 digits
\z End of string
str = "19097147835"
str.gsub(/\D/, '').match(/\A1\K\d{10}\z/) do |match|
puts match[0].to_i
end
Output
9097147835
You can use
str.gsub(/\D/, '').sub(/\A1(?=\d{10})/, '').to_i
The regex matches
\A - start of string
1 - a 1
(?=\d{10}) - immediately to the right of the current location, there must be 10 digits.
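The same lookahead idea can be sketched in JavaScript for comparison. JavaScript has no \A, so ^ (without the m flag) is used instead. The function name `stripLeadingOne` is made up, and the extra $ inside the lookahead is my own assumption to limit the match to exactly 11 digits.

```javascript
// Hypothetical helper: drop a leading 1 only when 10 more digits follow it.
function stripLeadingOne(input) {
  const digits = input.replace(/\D/g, ""); // keep digits only
  // (?=\d{10}$) is a lookahead: it checks for 10 digits up to the end
  // of the string without consuming them.
  return digits.replace(/^1(?=\d{10}$)/, "");
}

console.log(stripLeadingOne("19097147835"));       // "9097147835"
console.log(stripLeadingOne("1909714783"));        // "1909714783" (only 10 digits, kept)
console.log(stripLeadingOne("+1 (909) 714-7835")); // "9097147835"
```

Note that this returns a string; convert with Number(...) if an integer is needed.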
Non regex example:
str = str[1..] if (str.start_with?("1") and str.size > 10)
Regexes are powerful, but not easy to maintain.
I am trying to understand how regex works. I understand it little by little. However, I don't understand this one completely. It's basically a regex for fully qualified domain names but a requirement is that the ending can't be .arpa.
(?=^.{4,253}$)(^([a-zA-Z0-9]{1,63}\.)+[a-zA-Z]{2,63}[^.arpa]$)
https://regex101.com/r/hU6tP0/3
This doesn't match google.uk. If I change it to:
(?=^.{4,253}$)(^([a-zA-Z0-9]{1,63}\.)+[a-zA-Z]{1,63}[^.arpa]$)
It works again.
But this works as well
(?=^.{4,253}$)(^([a-zA-Z0-9]{1,63}\.)+[a-zA-Z]{2,63}$)
Here is my thought process for
(?=^.{4,253}$)(^([a-zA-Z0-9]{1,63}\.)+[a-zA-Z]{2,63}[^.arpa]$)
I see it as this
(?=
Is a positive look ahead (Can someone explain to me what this actually means?) As I understand it now, it just means that the string needs to match the regex.
^.{4,253}$)
Match all characters but it needs to be between 4 and 253 characters long.
(^([a-zA-Z0-9]{1,63}\.)
Start a capture group and make another capture group within. This capture group says that every non special character can be written 1 to 63 times or till the . is written.
+
The previous capture group can be repeated indefinitely, but it should always end with a .. This way the next capture group is started.
[a-zA-Z]{2,63}
Then as many times as you want you can write a to z with upper, but it needs to be between 2 and 63.
[^.arpa]$)
The last characters can't be .arpa.
Can someone tell me where I am going wrong?
This doesn't do what you think it does:
[^.arpa]
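All that says is 'ends with a single character that isn't one of ., a, r or p' - it's a negated character class. It also consumes one extra character, which is why google.uk fails with {2,63}: the pattern then needs at least three characters after the last dot.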
You might be thinking of a negative lookahead assertion:
(?!\.arpa)$
But if you're trying to compound multiple criteria in a regex, I'd suggest you're probably using the wrong tool for the job. It ends up complicated and hard to debug, thanks to greedy/non-greedy matching, etc.
Positive and negative lookaheads assert that the text following the current position does, or does not, match a sub-pattern, without consuming any characters. But that can have some unexpected outcomes if you're matching variable widths, because the regex engine will backtrack until it finds something that matches.
A simpler example:
([\w.]+)(?!arpa)$
Applied to:
www.test.arpa
Will it match? What's in the group?
... it will match, because [\w\.]+ will consume all of it, and then the lookahead won't "see" anything.
If you use:
([\w]+)\.(?!arpa)
Instead though - you'll capture www, but you won't match test (with e.g. the g flag), because the dot after www isn't followed by arpa, but the dot after test is.
https://regex101.com/r/hU6tP0/5
It really does get complicated using negative assertions in a pattern as a result. I'd suggest simply not doing so, and applying two separate tests. It's hard for you to figure out, and it's hard for a future maintenance programmer too!
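As a rough JavaScript sketch of both points: first the pitfall, then the "two separate tests" approach. The function name `isAllowedDomain` and the exact split into checks are my own assumptions, not something from the question.

```javascript
// Pitfall: the greedy class consumes the whole string, so the lookahead
// at the very end has nothing left to reject.
const naive = /([\w.]+)(?!arpa)$/;
console.log(naive.test("www.test.arpa")); // true, although it ends in .arpa

// Alternative: keep the shape check simple and test the suffix separately.
const shape = /^([a-zA-Z0-9]{1,63}\.)+[a-zA-Z]{2,63}$/;

function isAllowedDomain(name) {
  if (name.length < 4 || name.length > 253) return false; // length rule
  if (!shape.test(name)) return false;                    // label.label.tld shape
  return !name.toLowerCase().endsWith(".arpa");           // suffix rule
}

console.log(isAllowedDomain("google.uk"));      // true
console.log(isAllowedDomain("domain.arpa"));    // false
console.log(isAllowedDomain("domain.arpanet")); // true
```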
This is an analysis of your regex:
(?=^.{4,253}$) # force min length: 4 chars, max length: 253 chars
( # Capturing Group 1 (CG1) - not needed
^ # Match start of the string
( # CG2 (can be a non capturing group '(?:...)')
[a-zA-Z0-9]{1,63} # any sequence of letters and numbers with length between 1 and 63
\. # a literal dot
)+ # CLOSE CG2
[a-zA-Z]{1,63} # any letter sequence with length between 1 to 63
[^.arpa] # a negated char class: any char that is not a "literal" '.','a','r','p' (last 'a' is redundant)
$ # end of the string
) # CLOSE CG1
To prevent the string from ending in .arpa, you need a negative lookahead (?!...), so modify it like this:
(?=^.{4,253}$)(?!.*\.arpa$)(^([a-zA-Z0-9]{1,63}\.)+[a-zA-Z]{2,63}$)
Update:
I've reworked the regex to simplify it (I've also incorporated Sobrique's suggestion, adding one important detail):
/^(?=.{4,253}$)(?:[a-z0-9]{1,63}[.])+(?!arpa$)[a-z]{2,63}$/i
Legend
/ # js regex delimiter
^ # start of the string
(?=.{4,253}$) # force min length: 4 chars, max length: 253 chars
(?: # Non capturing group 1 (NCG1)
[a-z0-9]{1,63} # any letter or digit in a sequence with length from 1 to 63 chars
[.] # a literal dot '.' (more readable than \.)
)+ # CLOSE NCG1 - repeat its content one or more time
(?!arpa$) # require that the text after the last literal dot '.' is not exactly 'arpa' (I've added '$' to Sobrique's suggestion; without it the lookahead would also reject '.arpanet')
[a-z]{2,63} # a sequence of letters with length from 2 to 63
$ # end of the string
/i # close the regex delimiter and add the case-insensitive flag, so [a-z] also matches [A-Z]
var re = /^(?=.{4,253}$)(?:[a-z0-9]{1,63}[.])+(?!arpa$)[a-z]{2,63}$/i;
var tests = ['google.uk','domain.arpa','domain.arpa2','another.domain.arpa.net','domain.arpanet'];
var out = document.getElementById("r");
var t;
// pop() takes the test strings from the end of the array, so they print in reverse order
while ((t = tests.pop())) {
  out.innerHTML += '"' + t + '"<br/>';
  out.innerHTML += 'Valid domain? ' + (re.test(t) ? '<span style="color:green">YES</span>' : '<span style="color:red">NO</span>') + '<br/><br/>';
}
<div id="r"></div>
I need to validate the number of letters in a text box, not counting whitespace. Can anyone suggest a regular expression?
This is a shot in the dark, but assuming you want to allow only letters and whitespace in your textbox, and the total number of letters must not exceed 10, then use
^\s*(?:[a-z]\s*){0,10}$
Explanation:
^ # Start of string
\s* # Optional whitespace
(?: # Start of non-capturing group:
[a-z] # Match a single letter (don't forget the /i option if so desired)
\s* # Optional whitespace
){0,10} # between 0 and 10 times
$ # End of string
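For instance, in JavaScript the pattern could be used like this (a sketch only; the i flag is added so that upper-case letters count too, and the sample inputs are invented):

```javascript
// At most 10 letters, with any amount of whitespace between or around them.
const atMostTenLetters = /^\s*(?:[a-z]\s*){0,10}$/i;

console.log(atMostTenLetters.test("hello world"));  // true  (10 letters)
console.log(atMostTenLetters.test("hello worlds")); // false (11 letters)
console.log(atMostTenLetters.test("abc 123"));      // false (digits are not allowed)
console.log(atMostTenLetters.test(""));             // true  (zero letters)
```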
Simple and sweet answer :) . (assuming s is the string).
System.out.println(s.split("\\s+").length);
This will print the number of words (tokens that contain digits are counted as words too).
Here is a quick and dirty way of getting the total number of letters (it doesn't use a regex):
int counter = 0;
// count only the characters between 'a' and 'z' after lower-casing
for (char c : s.toLowerCase().toCharArray()) {
    if (c >= 'a' && c <= 'z') counter++;
}
System.out.println(counter);
var str = 'mlsqkf qsq merzo';
var n = str.replace(/[^a-z]/gi,'').length;
I've read many Q&As in StackOverflow and I'm still having a hard time getting RegEX.
I have string 12_13_12.
How can I replace the last occurrence of 12 with aa?
Final result should be 12_13_aa.
I would really like a good explanation of how you did it.
You can use this replace:
var str = '12-44-12-1564';
str = str.replace(/12(?![\s\S]*12)/, 'aa');
console.log(str);
explanations:
(?! # open a negative lookahead (means not followed by)
[\s\S]* # all characters including newlines (space+not space)
# zero or more times
12
) # close the lookahead
In other words the pattern means: 12 not followed by another 12 until the end of the string.
newString = oldString.substring(0, oldString.lastIndexOf("_") + 1) + 'aa';
Use String.replace with a negative lookahead, which makes sure no other 12 appears later in the input:
repl = "12_13_12".replace(/12(?!.*?12)/, 'aa');
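EDIT: To use a variable in the regex, build the pattern with the RegExp constructor (this assumes ToBeReplaced contains no regex metacharacters):
var re = new RegExp(ToBeReplaced + '(?!.*?' + ToBeReplaced + ')');
repl = str.replace(re, 'aa');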
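If the text to replace comes from a variable, it may contain characters that are special in a regex. A possible sketch that handles this is below; `escapeRegExp` and `replaceLast` are names I made up for illustration, and the approach assumes the occurrences don't overlap.

```javascript
// Escape regex metacharacters so the target is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace the last occurrence of target: match it only where no other
// occurrence follows anywhere later in the string.
function replaceLast(str, target, replacement) {
  const t = escapeRegExp(target);
  const re = new RegExp(t + "(?![\\s\\S]*" + t + ")");
  // A function is used so "$" in the replacement is not treated specially.
  return str.replace(re, () => replacement);
}

console.log(replaceLast("12_13_12", "12", "aa")); // "12_13_aa"
console.log(replaceLast("a.b.c", ".", "-"));      // "a.b-c"
console.log(replaceLast("abc", "x", "y"));        // "abc" (nothing to replace)
```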
I'm looking for a rather specific regex and I almost have it but not quite.
I want a regex that will require at least 5 characters, where at least one of those characters is either a numeric value or a non-alphanumeric character.
This is what I have so far:
^(?=.*[\d]|[!##$%\^*()_\-+=\[{\]};:|\./])(?=.*[a-z]).{5,20}$
So the problem is the "or" part. It will allow non-alphanumeric values, but still requires at least one numeric value. You can see that I have the or operator "|" between my required numerics and the non-alphanumerics, but that doesn't seem to work.
Any suggestions would be great.
Try:
^(?=.*(\d|\W)).{5,20}$
A short explanation:
^ # match the beginning of the input
(?= # start positive look ahead
.* # match any character except line breaks and repeat it zero or more times
( # start capture group 1
\d # match a digit: [0-9]
| # OR
\W # match a non-word character: [^\w]
) # end capture group 1
) # end positive look ahead
.{5,20} # match any character except line breaks and repeat it between 5 and 20 times
$ # match the end of the input
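A few test inputs make the behaviour concrete. This is just a JavaScript sketch with invented sample strings; note that \W does not include the underscore, since _ counts as a word character.

```javascript
const re = /^(?=.*(\d|\W)).{5,20}$/;

console.log(re.test("hello"));  // false (no digit or non-word character)
console.log(re.test("hell0"));  // true  (contains a digit)
console.log(re.test("hello!")); // true  (contains a non-word character)
console.log(re.test("hello_")); // false (underscore is a word character)
console.log(re.test("a1"));     // false (shorter than 5)
```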
Perhaps this may work for you:
^.*[\d\W]+.*$
And use some code like this to check string size:
if(str.len >= 5 && str.len <= 20 && regex.ismatch(str, "^.*[\d\W]+.*$")) { ... }
Is it really necessary to stuff everything in a giant regex? Just use program logic (5 ≤ length(s) ≤ 20) ∧ (/[[:digit:]]/ ∨ /[^[:alpha:]]/). Far more readable syntactically and semantically, I think.
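In JavaScript, which has no POSIX classes like [[:digit:]], that logic might look roughly like this. The function name `isAcceptable` is hypothetical, and "letter" here is assumed to mean ASCII a-z only.

```javascript
// Length check plus two tiny regexes instead of one large pattern.
function isAcceptable(s) {
  const lengthOk = s.length >= 5 && s.length <= 20;
  const hasDigit = /\d/.test(s);          // stands in for [[:digit:]]
  const hasNonLetter = /[^a-z]/i.test(s); // stands in for [^[:alpha:]]
  return lengthOk && (hasDigit || hasNonLetter);
}

console.log(isAcceptable("hello"));  // false
console.log(isAcceptable("hell0"));  // true
console.log(isAcceptable("hello_")); // true (underscore is not a letter)
console.log(isAcceptable("ab1"));    // false (too short)
```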
Pretty simple solution, once S.Mark got me on the right track, just needed to merge my numeric and non-alphanumeric pieces as one.
Here's the final regex for anyone that's interested:
^(?=.*[\d!##$%\^*()_\-+=\[{\]};:|\./])(?=.*[a-z]).{5,20}$
This will allow any password between 5 and 20 characters and requires at least one letter and one numeric and/or one non-alphanumeric character.
How about like this?
^.*?[\d!##$%\^*()_\-+=\[{\]};:|\./].*$
For the length check (5 to 20), please use a normal strlen function.