Verify whether a word has a letter repeated in any position - regex

I'd like to know if there is a way to test whether a word has a letter repeated in any position.
I'm currently using this regex to test it, but it doesn't work: the test should fail when the word contains more than two 's', yet it still returns true.
/s{0,2}/.test('süuaãpérbrôséê'); //expected true
/s{0,2}/.test('ssüuaãpérbrôéê'); //expected true
/s{0,2}/.test('süuaãpérbrôéê'); //expected true
/s{0,2}/.test('süuaãpérbrôséês'); //expected fail
Thanks.

/s{2,}/
or generally for any character:
/(.)\1/

/(\w)\1/ finds two identical word characters next to each other.
This will find and replace the duplicates:
s/(\w)\1/$1/
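A quick check of what these backreference patterns do, as a minimal JavaScript sketch (the sample words are made up for illustration, and the g flag on the replacement is an addition). Note that /(.)\1/ and /(\w)\1/ only detect a character that is immediately followed by itself, not a letter repeated at arbitrary positions.

```javascript
// The backreference \1 must match the same text that group 1 captured,
// so this pattern only finds *adjacent* duplicates.
const adjacentRepeat = /(.)\1/;

console.log(adjacentRepeat.test('press')); // true: "ss" is adjacent
console.log(adjacentRepeat.test('paper')); // false: the two "p" are not adjacent

// Collapse each adjacent pair of identical word characters into one.
const collapsed = 'press'.replace(/(\w)\1/g, '$1');
console.log(collapsed); // "pres"
```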

The only way I found to solve this problem is PHP's preg_match_all, which lets me count how many times the character occurs.
$s = 'süuaãpérbrôséê';
preg_match_all('/s/i', $s, $m);
echo count($m[0]); //outputs 2
My initial idea was to pass a regex to preg_match and verify that the match occurs a given number of times, but I don't think that's possible. So I'll create a method that receives the word and the regex I need to match and returns the number of matches.
Thanks.
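The same counting idea can be sketched in JavaScript, the language used in the question. The helper names countOccurrences and occursAtMost are hypothetical, and unlike the PHP pattern with the i flag this version is case-sensitive.

```javascript
// Hypothetical helper: count how many times `letter` occurs in `word`.
// Splitting on the letter yields one more piece than there are occurrences.
function countOccurrences(word, letter) {
  if (letter.length === 0) return 0; // avoid splitting on the empty string
  return word.split(letter).length - 1;
}

// Hypothetical helper: true when `letter` occurs at most `max` times.
function occursAtMost(word, letter, max) {
  return countOccurrences(word, letter) <= max;
}

console.log(occursAtMost('süuaãpérbrôséê', 's', 2));  // true  (2 occurrences)
console.log(occursAtMost('ssüuaãpérbrôéê', 's', 2));  // true  (2 occurrences)
console.log(occursAtMost('süuaãpérbrôéê', 's', 2));   // true  (1 occurrence)
console.log(occursAtMost('süuaãpérbrôséês', 's', 2)); // false (3 occurrences)
```

Counting outside the regex keeps the limit (here 2) as an ordinary parameter instead of baking it into the pattern.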

Using lookahead you can achieve something like that:
^(?=.*(\w)(.*\1){1}.*$)((?!\1).)*\1(((?!\1).)*\1){1}((?!\1).)*$
Where {1} is the number of occurrences minus 1, so to check for three occurrences it would look like:
^(?=.*(\w)(.*\1){2}.*$)((?!\1).)*\1(((?!\1).)*\1){2}((?!\1).)*$
And for two or three:
^(?=.*(\w)(.*\1){1,2}.*$)((?!\1).)*\1(((?!\1).)*\1){1,2}((?!\1).)*$
etc.
The lookaheads with backreferences can be very powerful :)

Related

Return dash followed by a single character

This works as expected:
([^\u0000-\u007F])+-हा([^\u0000-\u007F])+
Returns:
ब-हाणपूर
ब-हाणी
बनियन-हाफ
But I am looking for 1 character followed by dash. The expected output is:
ब-हाणपूर
ब-हाणी
I tried to replace + sign with character count like this...
([^\u0000-\u007F]){1}-हा([^\u0000-\u007F])+
But it returned the same 3 results. How do I return the first 2?
You need anchors:
^([^\u0000-\u007F])-हा([^\u0000-\u007F])+$
You asked 'What if I need 5 characters to the left of dash?'
The regex portion [^\u0000-\u007F] as written matches a single character that meets that criterion. If you want more or fewer than one, use a regex quantifier to say how many you want.
In this case, if you want 5, you would use:
^([^\u0000-\u007F]{5})-हा([^\u0000-\u007F])+$
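To see the effect of the anchors and the quantifier, here is a small JavaScript sketch run against the three words from the question. The helper patternFor is hypothetical. One assumption worth stating: without the u flag the count is in UTF-16 code units, and combining vowel signs count as characters of their own, which is why बनियन counts as 5.

```javascript
// Hypothetical helper: build the anchored pattern for exactly `n`
// non-ASCII characters before the dash.
function patternFor(n) {
  return new RegExp('^[^\\u0000-\\u007F]{' + n + '}-हा[^\\u0000-\\u007F]+$');
}

const words = ['ब-हाणपूर', 'ब-हाणी', 'बनियन-हाफ'];

// Exactly one character before the dash.
const onePrefix = words.filter((w) => patternFor(1).test(w));
// Exactly five characters before the dash.
const fivePrefix = words.filter((w) => patternFor(5).test(w));

console.log(onePrefix);  // the two words starting with ब-
console.log(fivePrefix); // only बनियन-हाफ
```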
Probably like this:
^([^\u0000-\u007F]){1}-हा([^\u0000-\u007F])+
^([^\u0000-\u007F]{1})-हा([^\u0000-\u007F]+)
(\b[^\u0000-\u007F]{1})-हा([^\u0000-\u007F]+)

regex, period allowed, not comma

Hi, I'm looking for a regex for the following.
Valid:
20000
20.000
If a comma is used, the match should not include the comma or what comes after it.
Not valid:
20.000,12
Right now I'm using:
([0-9]+([.][0-9]+)*)+?
But this one also matches the last 2 digits after the comma.
You could use
^\d+(?:\.\d+)*
# start of line, 1+ digits, then zero or more groups like .1234
See a demo on regex101.com.
If you add a ^ to the beginning of the regex, only the part at the start of the string will match:
^([0-9]+([.][0-9]+)*)+?
But I think
^\d+(\.\d+)*
is the better solution for matching numbers.
If you want to match floating point numbers then use (^\d*\.?\d+)
Nothing in the question suggests that the outcome needs to be a valid number. It therefore looks like the expression only needs to accept digits and periods, in which case the regex is quite straightforward and the following should work:
^([0-9\.]+)$
Here are my tests to demonstrate the outcome
20000 - pass
20.000 - pass
20.000,12 - fail
1.000.000 - pass
23000,000 - fail
For info, I used the php code below for my test:
$testdata = array('20000', '20.000', '20.000,12', '1.000.000', '23000,000');
$pattern = "/^([0-9\.]+)$/";
foreach ($testdata as $k => $v) {
    $result = preg_match($pattern, $v) ? 'pass' : 'fail';
    echo $v." - ".$result."<br />";
}
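The answers above read the question in two ways: either match only the part before the comma, or reject the whole input when a comma is present. A minimal JavaScript sketch of both readings follows; leadingNumber and the two constant names are hypothetical, and the patterns are the ^\d+(?:\.\d+)* form suggested above, with and without an end anchor.

```javascript
// Prefix match: no end anchor, so matching stops before the comma.
const prefixPattern = /^\d+(?:\.\d+)*/;
// Whole-string match: the added $ rejects any input containing a comma.
const wholePattern = /^\d+(?:\.\d+)*$/;

// Hypothetical helper: return the leading number, or null if there is none.
function leadingNumber(input) {
  const m = input.match(prefixPattern);
  return m ? m[0] : null;
}

for (const s of ['20000', '20.000', '20.000,12', '1.000.000', '23000,000']) {
  console.log(s, '->', leadingNumber(s), wholePattern.test(s));
}
```

For 20.000,12 the prefix reading yields 20.000, while the whole-string reading reports false.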

Regex to check that a text contains 3 or more English letters

I want to validate that a text contains 3 or more [aA-zZ] characters, which need not be consecutive.
/^(?![_\-\s0-9])(?!.*?[_\-\s]$)(?=.*[aA-zZ]{3,})[_\-\sa-zA-Z0-9]+$/.test("aaa123") => return true;
/^(?![_\-\s0-9])(?!.*?[_\-\s]$)(?=.*[aA-zZ]{3,})[_\-\sa-zA-Z0-9]+$/.test("a1b2c3") => return false;
Can anybody help me?
How about replacing and counting?
var hasFourPlusChars = function(str) {
return str.replace(/[^a-zA-Z]+/g, '').length > 3;
};
console.log(hasFourPlusChars('testing1234'));
console.log(hasFourPlusChars('a1b2c3d4e5'));
You need to group .* and [a-zA-Z] in order to allow optional arbitrary characters between English letters:
^(?![_\-\s0-9])(?!.*?[_\-\s]$)(?=(?:.*[a-zA-Z]){3,})[_\-\sa-zA-Z0-9]+$
The added parts are the non-capturing group opener (?: in front of .* and its closing ) in front of {3,}.
Demo:
var re = /^(?![_\-\s0-9])(?!.*?[_\-\s]$)(?=(?:.*[a-zA-Z]){3,})[_\-\sa-zA-Z0-9]+$/;
console.log(re.test("aaa123"));
console.log(re.test("a1b2c3"));
By the way, [aA-zZ] is not a correct range definition: the A-z part also covers the characters that sit between Z and a in ASCII ([, \, ], ^, _ and `). Use [a-zA-Z] instead.
Correction of the regex
Your repeat condition should include the ".*". I did not check if your regex is correct for what you want to achieve, but this correction works for the following strings:
$testStrings = ["aaa123", "a1b2c3", "a1b23d"];
foreach ($testStrings as $s)
    // The {3,} repeats the group (?:.*[a-zA-Z]) inside the lookahead
    var_dump(preg_match('/^(?![_\-\s0-9])(?!.*?[_\-\s]$)(?=(?:.*[a-zA-Z]){3,})[_\-\sa-zA-Z0-9]+$/', $s));
Other implementations
As the language seems to be JavaScript, here is an optimised implementation for what you want to achieve:
"a24be4Z".match(/[a-zA-Z]/g).length>=3
We get the list of all matches and check if there are at least 3.
That is not the fastest way, as the array of matches needs to be created.
/(?:.*?[a-zA-Z]){3}/.test("a24be4Z")
is faster. The lazy ".*?" keeps the "test" method from consuming all characters up to the end of the string before trying other combinations.
As expected, the first suggestion (counting the number of matches) is the slowest.
Check https://jsperf.com/check-if-there-are-3-ascii-characters .
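One caveat with the counting approach: String.prototype.match with the g flag returns null when nothing matches, so reading .length directly throws for a string without letters. A null-safe sketch follows; the helper name hasAtLeastLetters and its min parameter are hypothetical.

```javascript
// Hypothetical helper: true when `str` contains at least `min` ASCII letters.
function hasAtLeastLetters(str, min) {
  // match() with the g flag returns null when nothing matches,
  // so fall back to an empty array before reading .length.
  const letters = str.match(/[a-zA-Z]/g) || [];
  return letters.length >= min;
}

console.log(hasAtLeastLetters('a24be4Z', 3)); // true  (a, b, e, Z)
console.log(hasAtLeastLetters('a1b2c3', 3));  // true  (a, b, c)
console.log(hasAtLeastLetters('12345', 3));   // false (no letters, no exception)
```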

Regex match any character 5 or more times

I have this regex pattern:
^[.]{5,}$
I want it to return true if the tested string has 5 or more characters.
I.e., it should only return false if the string contains 4 or fewer characters.
At the moment it seems to return true regardless of the number of characters and I can't see why.
You want
^.{5,}$
But really - just use the built-in string length function of the language of your choice
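The reason the original pattern misbehaves is that inside a character class the period is a literal period, so [.]{5,} only matches five or more periods. A short JavaScript sketch of the difference, together with the plain length check (the constant names and sample strings are chosen for illustration):

```javascript
const literalDots = /^[.]{5,}$/; // inside [...], . is a literal period
const anyChars = /^.{5,}$/;      // outside a class, . is any character except a line terminator

console.log(literalDots.test('hello')); // false
console.log(literalDots.test('.....')); // true
console.log(anyChars.test('hello'));    // true
console.log(anyChars.test('hi'));       // false

// The length check needs no regex at all.
console.log('hello'.length >= 5);       // true
```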
Try this regex:
.{5,}
I think you don't need the ^ and $. Try just:
.{5,}

How to use a REGEX pattern to remove a specific word "THE" only if at beginning of text string?

I have a text input field for titles of various things. To help minimize false negatives in search results (internal search is not the best), I need a regex pattern that looks at the first four characters of the input string and removes the word "the" (and the space after it), but only if it is at the beginning.
For example, if we are talking about the names of bands and someone enters The Rolling Stones, what I need is for the entry to say only Rolling Stones.
Can a regex be used to automatically strip these 4 characters?
Applying the regex
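^(?:\s*the\s+)?(.*)$
will match any string and capture it in backreference no. 1, unless it starts with the word "the" (optionally preceded by whitespace and followed by at least one whitespace character), in which case backreference no. 1 will contain whatever follows. Requiring whitespace after "the" keeps titles such as "They Might Be Giants" intact.
You need to set the case-insensitive option in your regex engine for this to work.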
You can use the ^ anchor to match a pattern at the beginning of a line; however, for this purpose a regex can be considered overkill.
Many languages support string manipulation, which is a more suitable choice here. Here is an example in Python:
>>> def func(n):
...     # Drop the leading "The " (4 characters) only when the title starts with it
...     n = n[4:] if n[0:4] == "The " else n
...     return n
...
>>> func("The Rolling Stones")
'Rolling Stones'
>>> func("They Might Be Giants")
'They Might Be Giants'
As you don't specify the language, here is a solution in Perl:
use feature 'say';
my $str = "The Rolling Stones";
$str =~ s/^the //i;
say $str; # Rolling Stones
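For comparison, the same substitution as a minimal JavaScript sketch. The function name stripLeadingThe is hypothetical; the pattern requires at least one whitespace character after "the", so words that merely begin with those letters are left alone.

```javascript
// Hypothetical helper: remove a leading "the" plus the whitespace after it.
// ^ anchors the match to the start of the string; the i flag ignores case.
function stripLeadingThe(title) {
  return title.replace(/^the\s+/i, '');
}

console.log(stripLeadingThe('The Rolling Stones'));   // "Rolling Stones"
console.log(stripLeadingThe('They Might Be Giants')); // "They Might Be Giants"
console.log(stripLeadingThe('the who'));              // "who"
```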