Regex: find text containing more than 3 English letters

I want to validate that a text contains more than 3 [aA-zZ] characters, which need not be contiguous.
/^(?![_\-\s0-9])(?!.*?[_\-\s]$)(?=.*[aA-zZ]{3,})[_\-\sa-zA-Z0-9]+$/.test("aaa123") // returns true
/^(?![_\-\s0-9])(?!.*?[_\-\s]$)(?=.*[aA-zZ]{3,})[_\-\sa-zA-Z0-9]+$/.test("a1b2c3") // returns false
Can anybody help me?

How about replacing and counting?
var hasFourPlusChars = function(str) {
    // Strip everything that is not an ASCII letter, then count what is left.
    return str.replace(/[^a-zA-Z]+/g, '').length > 3;
};
console.log(hasFourPlusChars('testing1234'));
console.log(hasFourPlusChars('a1b2c3d4e5'));

You need to group .* and [a-zA-Z] in order to allow optional arbitrary characters between English letters:
^(?![_\-\s0-9])(?!.*?[_\-\s]$)(?=(?:.*[a-zA-Z]){3,})[_\-\sa-zA-Z0-9]+$
The additions are the non-capturing group opener (?: before .* and its closing ) after [a-zA-Z].
Demo:
var re = /^(?![_\-\s0-9])(?!.*?[_\-\s]$)(?=(?:.*[aA-zZ]){3,})[_\-\sa-zA-Z0-9]+$/;
console.log(re.test("aaa123"));
console.log(re.test("a1b2c3"));
By the way, [aA-zZ] is not a correct range definition: the A-z range also covers the characters between Z and a ([, \, ], ^, _ and `). Use [a-zA-Z] instead.
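To see why the range matters, here is a small JavaScript check. It is a sketch added for illustration; the variable names and test strings are my own and not part of the original answer.

```javascript
// A-z spans code points 65..122, which includes [ \ ] ^ _ and ` between Z and a.
const loose = /^[aA-zZ]+$/;
const strict = /^[a-zA-Z]+$/;

console.log(loose.test("_^"));      // true: both characters fall inside A-z
console.log(strict.test("_^"));     // false
console.log(strict.test("abcXYZ")); // true
```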

Correction of the regex
Your repeat condition should include the ".*". I did not check if your regex is correct for what you want to achieve, but this correction works for the following strings:
$testStrings = ["aaa123", "a1b2c3", "a1b23d"];
foreach ($testStrings as $s) {
    // The {3,} quantifier applies to the group (?:.*[a-zA-Z]) inside the lookahead.
    var_dump(preg_match('/^(?![_\-\s0-9])(?!.*?[_\-\s]$)(?=(?:.*[a-zA-Z]){3,})[_\-\sa-zA-Z0-9]+$/', $s));
}
Other implementations
As the language seems to be JavaScript, here are simpler implementations of what you want to achieve:
("a24be4Z".match(/[a-zA-Z]/g) || []).length >= 3
We get the list of all matches and check whether there are at least 3. The || [] guards against match returning null when the string contains no letters.
That is not the fastest way, as the array of results needs to be created.
/(?:.*?[a-zA-Z]){3}/.test("a24be4Z")
is faster. The lazy ".*?" keeps the "test" method from consuming all characters up to the end of the string before trying other combinations.
As expected, the first suggestion (counting the number of matches) is the slowest.
Check https://jsperf.com/check-if-there-are-3-ascii-characters .
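If only the letter count matters (ignoring the other rules in the question's pattern), the repeated-group idea can be wrapped in a helper that takes the required count as a parameter. This is a minimal sketch; the name hasAtLeastLetters is hypothetical, and it uses [^a-zA-Z]* instead of .*? so that each repetition can only skip non-letters.

```javascript
// Hypothetical helper: true when `str` contains at least `n` ASCII letters.
function hasAtLeastLetters(str, n) {
  // Each repetition skips any non-letters, then consumes exactly one letter.
  const re = new RegExp("(?:[^a-zA-Z]*[a-zA-Z]){" + n + "}");
  return re.test(str);
}

console.log(hasAtLeastLetters("aaa123", 3)); // true
console.log(hasAtLeastLetters("a1b2c3", 3)); // true
console.log(hasAtLeastLetters("a1b2", 3));   // false
```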

Related

Match same number of repetitions as previous group

I'm trying to match strings that are repeated the same number of times, like
abc123
abcabc123123
abcabcabc123123123
etc.
That is, I want the second group (123) to be matched the same number of times as the first group (abc). Something like
(abc)+(123){COUNT THE PREVIOUS GROUP MATCHED}
This is using the Rust regex crate https://docs.rs/regex/1.4.2/regex/
Edit: As I feared, and as pointed out by answers and comments, this is not possible to represent in a regex, at least not without some sort of recursion, which the Rust regex crate doesn't support for the time being. In this case, as I know the input length is limited, I just generated a rule like
(abc123)|(abcabc123123)|(abcabcabc123123123)
Horribly ugly, but got the job done, as this wasn't "serious" code, just a fun exercise.
As others have commented, I don't think it's possible to accomplish this in a single regex. If you can't guarantee the strings are well-formed then you'd have to validate them with the regex, capture each group, and then compare the group lengths to verify they are of equal repetitions. However, if it's guaranteed all strings will be well-formed then you don't even need to use regex to implement this check:
fn matching_reps(string: &str, group1: &str, group2: &str) -> bool {
    // Assumes well-formed input: group2 must occur, otherwise unwrap() panics.
    let group2_start = string.find(group2).unwrap();
    // Everything before group2_start consists of repetitions of group1.
    let group1_reps = group2_start / group1.len();
    // Everything from group2_start onward consists of repetitions of group2.
    let group2_reps = (string.len() - group2_start) / group2.len();
    group1_reps == group2_reps
}
fn main() {
    assert_eq!(matching_reps("abc123", "abc", "123"), true);
    assert_eq!(matching_reps("abcabc123", "abc", "123"), false);
    assert_eq!(matching_reps("abcabc123123", "abc", "123"), true);
    assert_eq!(matching_reps("abcabc123123123", "abc", "123"), false);
}
Pure regular expressions are not able to represent that.
There may be some way to define back references, but I am not familiar with regexp syntax in Rust, and this would technically be a way to represent something more than a pure regular expression.
There is, however, a simple way to compute it:
1. Use a regexp to make sure your string has the form ^((abc)*)((123)*)$.
2. If your string matches, take the two captured substrings and compare their lengths.
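The two steps above can be sketched as follows. This is written in JavaScript rather than Rust purely for illustration; the helper names are hypothetical, the pattern uses + instead of * on the assumption that at least one repetition is required (as in the question's examples), and it assumes the two groups do not overlap in content.

```javascript
// Escapes regex metacharacters so arbitrary group text can be embedded in a pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: checks that `s` is a run of `a` followed by an equally long run of `b`.
function sameRepetitions(s, a, b) {
  // Step 1: validate the overall shape and capture both runs.
  const re = new RegExp("^((?:" + escapeRegExp(a) + ")+)((?:" + escapeRegExp(b) + ")+)$");
  const m = re.exec(s);
  if (m === null) return false;
  // Step 2: compare the repetition counts derived from the captured lengths.
  return m[1].length / a.length === m[2].length / b.length;
}

console.log(sameRepetitions("abcabc123123", "abc", "123")); // true
console.log(sameRepetitions("abcabc123", "abc", "123"));    // false
```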
Building a pattern dynamically is also an option. Matching one, two or three nested abc and 123 is possible with
abc(?:abc(?:abc(?:)?123)?123)?123
The innermost (?:)? is redundant since it matches no text; each (?:...)? makes the nested pattern inside it optional.
Rust snippet:
let a = "abc"; // Prefix
let b = "123"; // Suffix
let level = 3; // Recursion (repetition) level
let mut result = "".to_string();
for _n in 0..level {
    result = format!("{}(?:{})?{}", a, result, b);
}
println!("{}", result);
// abc(?:abc(?:abc(?:)?123)?123)?123
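To check what the generated pattern accepts, here is the same loop ported to JavaScript and tried against a few inputs. This is an illustrative sketch: the function name is hypothetical, and the ^ and $ anchors are my addition so that the whole string has to match.

```javascript
// Builds the nested pattern for up to `level` repetitions (same loop as the Rust snippet).
function buildNestedPattern(a, b, level) {
  let result = "";
  for (let n = 0; n < level; n++) {
    result = a + "(?:" + result + ")?" + b;
  }
  return result;
}

const pattern = buildNestedPattern("abc", "123", 3);
console.log(pattern); // abc(?:abc(?:abc(?:)?123)?123)?123

// Anchors are an assumption here: they force the whole string to match.
const nested = new RegExp("^" + pattern + "$");
console.log(nested.test("abcabc123123")); // true
console.log(nested.test("abcabc123"));    // false
```

Inputs with more repetitions than the chosen level are rejected, so the level has to be at least the largest count expected.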
Many regex libraries implement an extension, dating back to the early Unix tools, called a backreference: it matches literally the text that a group has already captured. (The Rust regex crate does not support backreferences, so this applies to other engines.)
For example, say you have a number that must be equal to another (e.g. the score of a soccer game, where you are interested only in draws between the two teams). You can use the following regexp:
([0-9][0-9]*)-\1
Fed with "123-123" it will match, but "123-12" will not, as the text after the hyphen is not the same string as what was matched in the first group. Once the first group is matched, \1 behaves like the literal sequence of characters that the group captured.
But note what this means for your sample: \1 repeats the captured text, it does not count repetitions. If you try:
([0-9][0-9]*)\1
to match 123123, a backtracking engine first lets the group take all six digits, fails on \1, and backs off until the group holds 123 and \1 matches the second 123.
This also means that you can use:
\+\(([0-9][0-9]*)\)\1(-\1)*
and this will match phone numbers in the form
+(358)358-358-358
or
+(1)1-1-1-1-1-1-1
(the number between the parentheses is captured as a sample, and the backreference then builds a sequence of that number separated by dashes.)
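Both ideas can be tried in any engine with backreferences. The sketch below uses JavaScript for illustration, with the literal parentheses escaped and the capturing group around the digits only; the anchors, the [0-9]+ shorthand and the variable names are my own choices.

```javascript
// Draw scores: the text after the hyphen must repeat the captured number exactly.
const draw = /^([0-9]+)-\1$/;
console.log(draw.test("123-123")); // true
console.log(draw.test("123-12"));  // false

// Phone-style pattern: literal parentheses around the captured number, then repeats of it.
const phone = /^\+\(([0-9]+)\)\1(-\1)*$/;
console.log(phone.test("+(358)358-358-358")); // true
console.log(phone.test("+(358)358-359"));     // false
```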

Regex to upper case not surrounded by single quotes

hello 'this' is my'str'ing
If I have a string like this, I'd like to make everything upper case except the parts surrounded by single quotes.
hello 'this' is my'str'ing => HELLO 'this' IS MY'str'ING
Is there an easy way to achieve this in Node, perhaps using a regex?
You can use the following regular expression:
'[^']+'|(\w)
Here is a live example:
var subject = "hello 'this' is my'str'ing";
var regex = /'[^']+'|(\w)/g;
var replaced = subject.replace(regex, function (m, group1) {
    // group1 is undefined when the quoted alternative matched: leave that text untouched.
    if (!group1) {
        return m;
    }
    return m.toUpperCase();
});
console.log(replaced);
Credit of this answer goes to zx81. For more information see the original answer of zx81.
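A variation on the same alternation trick, offered as a sketch rather than as part of the credited answer: match whole unquoted runs instead of single word characters, so the callback runs once per segment. The function name upperOutsideQuotes is hypothetical.

```javascript
// Hypothetical helper: uppercases everything outside '...' segments.
function upperOutsideQuotes(text) {
  // First alternative: a quoted segment (kept as is). Second: a run without quotes.
  return text.replace(/'[^']*'|[^']+/g, function (m) {
    return m[0] === "'" ? m : m.toUpperCase();
  });
}

console.log(upperOutsideQuotes("hello 'this' is my'str'ing"));
// HELLO 'this' IS MY'str'ING
```

An unmatched trailing quote matches neither alternative and is left in place unchanged.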
Since JavaScript engines before ES2018 don't support lookbehinds, we have to use \B, which matches at any position that is not a word boundary.
In this case, \B' makes sure that the ' isn't directly to the right of a \w character ([a-zA-Z0-9_]). Likewise, '\B makes sure that the ' isn't directly to the left of one.
(?:(.*?)(?=\B'.*?'\B)(?:(\B'.*?'\B))|(.*?)$)
Use a callback function that checks whether the length of capture 1 or 3 is > 0 and, if it is, returns that capture in upper case.
The sample uses \U and \L just to uppercase and lowercase the related matches. Your callback need not ever affect $2's case, so "Adam" can stay "Adam", etc.
Unrelated, but a note to anyone who might be trying to do this in reverse: it's much easier to do the reverse of this:
(\B'.+?'\B)

Verify if a word have a letter repeated in any position

I'd like to know if there is a way to test whether a word has a letter repeated in any position.
I'm currently using this regex to test it, but it does not work, because if I add more than 2 's' the test still returns true.
/s{0,2}/.test('süuaãpérbrôséê'); //expected true
/s{0,2}/.test('ssüuaãpérbrôéê'); //expected true
/s{0,2}/.test('süuaãpérbrôéê'); //expected true
/s{0,2}/.test('süuaãpérbrôséês'); //expected fail
Thanks.
/s{2,}/
or generally for any character:
/(.)\1/
/(\w)\1/ finds two alphanumeric characters next to each other
This will find and replace the duplicates:
s/(\w)\1/$1/
The only way I found to solve this problem is PHP's preg_match_all, which lets me count how many times the character is repeated.
$s = 'süuaãpérbrôséê';
preg_match_all('/s/i', $s, $m);
echo count($m[0]); //outputs 2
My initial idea was to pass a regex to preg_match and verify that the match occurs a given number of times, but I think that is not possible, so I'll create a method that receives the word and the regex I need to match and returns the number of matches.
Thanks.
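Since the question's tests are written in JavaScript, the same counting approach might look like this there. It is a minimal sketch: the helper names are hypothetical, and the lowercasing mirrors the /i flag in the PHP call.

```javascript
// Hypothetical helper: counts case-insensitive occurrences of a single character.
function countChar(word, ch) {
  return word.toLowerCase().split(ch.toLowerCase()).length - 1;
}

// Hypothetical helper: true when `ch` occurs at most twice in `word`.
function atMostTwice(word, ch) {
  return countChar(word, ch) <= 2;
}

console.log(countChar("süuaãpérbrôséê", "s"));    // 2
console.log(atMostTwice("süuaãpérbrôséê", "s"));  // true
console.log(atMostTwice("süuaãpérbrôséês", "s")); // false
```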
Using a lookahead, you can achieve something like that:
^(?=.*(\w)(.*\1){1}.*$)((?!\1).)*\1(((?!\1).)*\1){1}((?!\1).)*$
Where {1} is the number of occurrences minus 1, so to test for three occurrences it would look like:
^(?=.*(\w)(.*\1){2}.*$)((?!\1).)*\1(((?!\1).)*\1){2}((?!\1).)*$
And for two or three:
^(?=.*(\w)(.*\1){1,2}.*$)((?!\1).)*\1(((?!\1).)*\1){1,2}((?!\1).)*$
etc.
The lookaheads with backreferences can be very powerful :)

Regex help - match words besides MD5 hashes

I can't figure out a regex that will grab every word besides MD5 hashes. I'm using [a-zA-Z0-9]+ to match every word. How do I augment that so that it ignores something like [a-fA-F0-9]{32}, which I think would match the MD5 hashes?
8e85d8b3be426bc8d370facdb0ad3ad0
string
stringString
63994b32affec18c2a428cdfcb0e2823
stringSTRINGSTING333
34563994b32dddddddaffec18c2a
stringSTRINGSTINGsrting
Thanks for any help. :)
This kind of thing is usually done with a negative lookahead:
/\b(?![0-9a-fA-F]{32}\b)[A-Za-z0-9]+\b/
At the beginning of each word, (?![0-9a-fA-F]{32}\b) tries to match exactly 32 hexadecimal digits followed by a word boundary. If it succeeds, the match attempt at that word fails and the word is skipped.
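Applied to the sample from the question with the global flag, the lookahead pattern can be used like this. This is a sketch in JavaScript; the variable names are my own.

```javascript
const text = [
  "8e85d8b3be426bc8d370facdb0ad3ad0",
  "string",
  "stringString",
  "63994b32affec18c2a428cdfcb0e2823",
  "stringSTRINGSTING333",
  "34563994b32dddddddaffec18c2a",
  "stringSTRINGSTINGsrting"
].join("\n");

// The lookahead rejects any word made of exactly 32 hex digits; everything else is kept.
const nonHashWords = text.match(/\b(?![0-9a-fA-F]{32}\b)[A-Za-z0-9]+\b/g) || [];
console.log(nonHashWords);
```

The 28-character hex string is kept because it is not exactly 32 digits long.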
The following works fine for me:
/^[a-f0-9]{8}(-)[a-f0-9]{4}(-)[a-f0-9]{4}(-)[a-f0-9]{4}(-)[a-f0-9]{12}$/i
As already said, just grab all words that do not look like MD5 hashes
(first, you have to split the string):
var s = "8e85d8b3be426bc8d370facdb0ad3ad0\nstring\nstringString\n63994b32affec18c2a428cdfcb0e2823\nstringSTRINGSTING333\n34563994b32dddddddaffec18c2a\nstringSTRINGSTINGsrting";
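var words = [];
var words_all = s.split(/\s+/);
// Index loop instead of for...in, which is meant for object keys rather than arrays.
for (var i = 0; i < words_all.length; i++) {
    var word = words_all[i];
    if (!/^[a-fA-F0-9]{32}$/.test(word)) { words.push(word); }
}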
// words = ["string", "stringString", "stringSTRINGSTING333", "34563994b32dddddddaffec18c2a", "stringSTRINGSTINGsrting"]
(assuming, according to your original code, you want to use JavaScript)

Capturing a repeated group

I am attempting to parse a string like the following using a .NET regular expression:
H3Y5NC8E-TGA5B6SB-2NVAQ4E0
and return the following using Split:
H3Y5NC8E
TGA5B6SB
2NVAQ4E0
I validate each character against a specific character set (note that the letters 'I', 'O', 'U' & 'W' are absent), so using string.Split is not an option. The number of characters in each group can vary and the number of groups can also vary. I am using the following expression:
([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8}-?){3}
This will match exactly 3 groups of 8 characters each. Any more or less will fail the match.
This works insofar as it correctly matches the input. However, when I use the Split method to extract each character group, I just get the final group. RegexBuddy complains that I have repeated the capturing group itself and that I should put a capture group around the repeated group. However, none of my attempts to do this achieve the desired result. I have been trying expressions like this:
(([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})-?){4}
But this does not work.
Since I generate the regex in code, I could just expand it out by the number of groups, but I was hoping for a more elegant solution.
Please note that the character set does not include the entire alphabet. It is part of a product activation system. As such, any characters that can be accidentally interpreted as numbers or other characters are removed. e.g. The letters 'I', 'O', 'U' & 'W' are not in the character set.
The hyphens are optional since a user does not need to type them in, but they can be there if the user has done a copy & paste.
BTW, in .NET you can replace the [ABCDEFGHJKLMNPQRSTVXYZ0123456789] character class with a more readable character class subtraction (use 0-9 instead of \d if only ASCII digits should be accepted).
[A-Z\d-[IOUW]]
If you just want to match 3 groups like that, why don't you use this pattern 3 times in your regex and just use the captured subgroups 1, 2 and 3 to form the new string?
([A-Z\d-[IOUW]]{8})-?([A-Z\d-[IOUW]]{8})-?([A-Z\d-[IOUW]]{8})
In PHP I would return (I don't know .NET)
return "$1 $2 $3";
I have discovered the answer I was after. Here is my working code:
// Requires: using System; using System.Text.RegularExpressions;
static void Main(string[] args)
{
    string pattern = @"^\s*((?<group>[ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})-?){3}\s*$";
    string input = "H3Y5NC8E-TGA5B6SB-2NVAQ4E0";
    Regex re = new Regex(pattern);
    Match m = re.Match(input);
    if (m.Success)
    {
        // Captures holds every value the repeated named group matched, not just the last one.
        foreach (Capture c in m.Groups["group"].Captures)
            Console.WriteLine(c.Value);
    }
}
After reviewing your question and the answers given, I came up with this:
RegexOptions options = RegexOptions.None;
Regex regex = new Regex(@"([ABCDEFGHJKLMNPQRSTVXYZ0123456789]{8})", options);
string input = @"H3Y5NC8E-TGA5B6SB-2NVAQ4E0";
MatchCollection matches = regex.Matches(input);
for (int i = 0; i != matches.Count; ++i)
{
    // Each match is one 8-character block.
    string match = matches[i].Value;
}
Since the "-" is optional, you don't need to include it. I am not sure what you were using the {4} at the end for. This will find the matches based on what you want; then, using the MatchCollection, you can access each match to rebuild the string.
Why use Regex? If the groups are always split by a -, can't you use Split()?
Sorry if this isn't what you intended, but if your string always has the hyphen separating the groups, couldn't you use the String.Split() method instead of a regex?
Dim stringArray As String() = someString.Split("-"c)
What are the defining characteristics of a valid block? We'd need to know that in order to really be helpful.
My generic suggestion: validate the charset in a first step, then split and parse in a separate method based on what you expect. If this is in a web site/app, you can use the ASP Regex validation on the front end and then break it up on the back end.
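The validate-then-split idea could be sketched like this. It is written in JavaScript for illustration only (the question itself is about .NET, where the Captures collection is also available); the function name parseKey and the constant CHARSET are hypothetical, and the block count of 3 and length of 8 are taken from the question's pattern.

```javascript
// Allowed characters, copied from the question (no I, O, U or W).
const CHARSET = "ABCDEFGHJKLMNPQRSTVXYZ0123456789";

// Hypothetical helper: returns the 8-character blocks, or null when the key is invalid.
function parseKey(input) {
  const block = "[" + CHARSET + "]{8}";
  // Step 1: validate the whole string, allowing optional hyphens between blocks.
  const whole = new RegExp("^\\s*(?:" + block + "-?){3}\\s*$");
  if (!whole.test(input)) return null;
  // Step 2: extract each block with a global match.
  return input.match(new RegExp(block, "g"));
}

console.log(parseKey("H3Y5NC8E-TGA5B6SB-2NVAQ4E0")); // [ 'H3Y5NC8E', 'TGA5B6SB', '2NVAQ4E0' ]
console.log(parseKey("H3Y5NC8I-TGA5B6SB-2NVAQ4E0")); // null ('I' is not allowed)
```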
If you just check the value of the group with Groups(i).Value, you will only get the last capture. However, if you want to enumerate every time that group was captured, use Groups(2).Captures(i).Value, as shown below.
system.text.RegularExpressions.Regex.Match("H3Y5NC8E-TGA5B6SB-2NVAQ4E0","(([ABCDEFGHJKLMNPQRSTVXYZ0123456789]+)-?)*").Groups(2).Captures(i).Value
Mike,
You can use character set of your choice inside character group. All you need is to add "+" modifier to capture all groups. See my previous answer, just change [A-Z0-9] to whatever you need (i.e. [ABCDEFGHJKLMNPQRSTVXYZ0123456789])
You can use this pattern:
Regex.Split("H3Y5NC8E-TGA5B6SB-2NVAQ4E0", "([ABCDEFGHJKLMNPQRSTVXYZ0123456789]+)-?")
But you will need to filter out empty strings from resulting array.
Citation from MSDN:
If multiple matches are adjacent to one another, an empty string is inserted into the array.