What does + mean for [^SPEAK]+ in RegEx? [duplicate] - regex

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
I'm finding that learning RegEx is nearly impossible, since every tool or helper site offers the worst explanations and doesn't cover everything. I'm attempting to work through the puzzles on regexcrossword.com and they don't offer substantial help and/or insight. This is causing confusion.
Here is what I understand:
[^SPEAK] = not S, P, E, A, K
+ = matches one or more of the previous characters
Therefore [^SPEAK]+ = nothing
I don't understand how this is supposed to work. What am I missing?

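[^SPEAK] matches any single character that isn't one of those 5 letters. + means to match the preceding pattern at least 1 time, and here the preceding pattern is the whole character class, not the letters inside it. So [^SPEAK]+ matches a run of one or more consecutive characters, none of which is in that set.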
For example, if the input is 123ABCDEFG, it will match 123, BCD and FG.
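This can be checked quickly in JavaScript. The following is only a minimal sketch (the variable names are illustrative); the g flag asks for every match rather than just the first one:

```javascript
// [^SPEAK]+ matches each maximal run of characters that are not S, P, E, A or K.
const input = "123ABCDEFG";
const runs = input.match(/[^SPEAK]+/g);
console.log(runs); // [ '123', 'BCD', 'FG' ]
```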

[^SPEAK] means take any character apart from S, P, E, A, K, and + means one or more times.
So it will accept any run of one or more characters, none of which is in the list [SPEAK]. The class is case-sensitive, so lowercase s and a are allowed.
ssssssss is a valid string, aaaa is a valid string, SA is an invalid string.
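Those three strings can be checked as whole strings by anchoring the pattern with ^ and $. This is a sketch under that assumption (the anchors and the names noSpeakLetters and verdicts are not part of the original pattern):

```javascript
// Anchored with ^ and $ so the whole string must consist of allowed characters.
// The class is case-sensitive: lowercase s and a are not excluded.
const noSpeakLetters = /^[^SPEAK]+$/;
const verdicts = ["ssssssss", "aaaa", "SA"].map((s) => noSpeakLetters.test(s));
console.log(verdicts); // [ true, true, false ]
```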

Related

Regex capture groups of N and the remainder? [duplicate]

This question already has answers here:
Split large string in n-size chunks in JavaScript
(23 answers)
Closed 2 years ago.
Supposing, I have a string ASDFZXCVQW, is it possible to capture this into groups of N, and then the remaining characters would be in the final group.
For example, if N were 4, then we could have: ASDF, ZXCV, and QW. Notice how the QW is everything that is left over.
I know how to capture the groups of N with .{N}, and then manually get the leftover through string indexing, but is it possible to do this in a single regular expression?
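var data = 'ASDFZXCVQW';
// {1,4} is greedy: it takes 4 characters where possible and fewer for the remainder.
// \D matches non-digits only; use . instead to allow any character except a newline.
var result = data.match(/\D{1,4}/g);
console.log(result); // ["ASDF", "ZXCV", "QW"]
This should help!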
That really depends on the language in use.
In general, it will be a concatenation of 0 or more 4-character groups followed by 0 to 3 single characters.
Here is a possible formal definition for an alphanumeric string: ([a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9])*[a-zA-Z0-9]{0,3}. Different languages might express this differently and possibly in a more compact way, for example ([a-zA-Z0-9]{4})*[a-zA-Z0-9]{0,3} where bounded repetition is supported.
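For an arbitrary N, one option in JavaScript is to build the pattern at run time. The helper name chunk below is hypothetical, and the sketch assumes that any character (including newlines) may appear in the input:

```javascript
// Hypothetical helper: split str into pieces of n characters, with the
// shorter remainder (if any) as the last piece.
function chunk(str, n) {
  // [\s\S] matches any character including newlines; {1,n} is greedy,
  // so each match takes n characters until fewer than n remain.
  const pattern = new RegExp(`[\\s\\S]{1,${n}}`, "g");
  return str.match(pattern) || []; // match() returns null when nothing matches
}

console.log(chunk("ASDFZXCVQW", 4)); // [ 'ASDF', 'ZXCV', 'QW' ]
```

The greedy {1,n} quantifier is what makes a single expression enough: no separate step is needed for the leftover characters.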

Regex.Matches problem in visual basic 2010 [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I have this string
copiaElementos = "c'8 d'8 a8"
And when I do Regex.Matches(copiaElementos, "8.").Count() it returns 2
why is that? I don't understand, can anyone please give me a hand?
Thank you, best regards
That is because the . matches exactly one character, so "8." means an 8 followed by any character, and there are exactly two of those (a space counts as a character too). The last 8 is not matched because it has no character after it.
If you want to count the 8s in the string, use Regex.Matches(copiaElementos, "8").Count(). Remember that every character in a pattern, even a space, has its own meaning in regex.
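The question uses VB.NET, but these two patterns behave the same way in JavaScript, so the counts can be reproduced with this small sketch (the names eightThenAny and eightOnly are illustrative):

```javascript
const copiaElementos = "c'8 d'8 a8";
// "8." needs a character after each 8; the final 8 has none, so it is skipped.
const eightThenAny = copiaElementos.match(/8./g);
// "8" alone matches every 8 in the string.
const eightOnly = copiaElementos.match(/8/g);
console.log(eightThenAny.length, eightOnly.length); // 2 3
```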

Regex to match only if symbol is found at most once in fixed length pattern [duplicate]

This question already has an answer here:
Restricting character length in a regular expression
(1 answer)
Closed 3 years ago.
I need help with a regex that should match a fixed length pattern.
For example, the following regex allows for at most 1 ( and 1 ) in the matched pattern:
([^)(]*\(?[^)(]*\)?[^)(]*)
However, I cannot (and do not want to) use this solution because of the *: the text I have to scan through is very large, and using * seems to really hurt performance.
I thus want to impose a match length limit, e.g. using {10,100} for example.
In other words, the regex should only match if
there are between 0 and 1 sets of parentheses inside the string
the total length of the match is bounded, i.e. not unlimited (no *!)
This seems to be a solution to my problem, however I do not get it to work and I have trouble understanding it.
I tried to use the accepted answer and created this:
^(?=[^()]{5,10}$)[^()]*(?:[()][^()]*){0,2}$
which does not seem to really work as expected: https://regex101.com/r/XUiJZz/1
Also, please do not mark this question as a duplicate of another question if the answers to that question make use of the Kleene star operator; it won't help me.
Edit:
I know this is a possible solution, but I'm wondering if there is a better way to do it:
([^)(]{0,100}\(?[^)(]{0,100}\)?[^)(]{0,100})
I thus want to impose a match length limit, e.g. using {10,100}
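You may want to add anchors and a lookahead assertion to your regex:
^(?=.{10,100}$)[^)(]*(?:\(?[^)(]*\))?[^)(]*$
(?=.{10,100}$) is a lookahead condition asserting that the length of the string must be between 10 and 100. The $ inside the lookahead is what enforces the upper bound; without it, the lookahead only checks that at least 10 characters are present.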
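A minimal sketch of how such a length-limited pattern behaves in JavaScript, written with a $ inside the lookahead so that the upper bound of 100 is enforced as well as the lower bound of 10. The test strings and the name bounded are made up for illustration:

```javascript
// The lookahead checks the total length (10 to 100) before the rest of the
// pattern checks that at most one pair of parentheses is present.
const bounded = /^(?=.{10,100}$)[^)(]*(?:\(?[^)(]*\))?[^)(]*$/;

const tooShort = bounded.test("short (x)");                // 9 characters
const accepted = bounded.test("long enough (ok)");         // 16 characters, one pair
const twoPairs = bounded.test("two (a) groups (b) here");  // second pair is rejected
const tooLong = bounded.test("x".repeat(101));             // 101 characters

console.log(tooShort, accepted, twoPairs, tooLong); // false true false false
```

Because a lookahead consumes nothing, the length check and the parenthesis check both run from the start of the string and are independent of each other.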

Confusion in JavaScript RegExp ?= Quantifier [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
What is the difference between
(?=.\d)(?=.[a-z])(?=.[A-Z])
and
(.\d)(.[a-z])(.[A-Z])
When I test the string a2A only the first RegExp returns true. Can anyone explain this for me?
The difference is the lookahead operator on each term of the regex. A lookahead (LA) matches the sub-regex it guards as usual but consumes no input, so the next portion of the regex starts matching from the same position.
This means that the first regex should not match (contrary to your tests; which engine did you use?): given any initial matching position, the second character would have to be a digit, a lowercase letter, and an uppercase letter, all at the same time.
Observe that this will not happen if the . ('any char') is quantified:
(?=.*\d)(?=.*[a-z])(?=.*[A-Z])
Each LA term may skip an arbitrary amount of material before matching its character class, and this amount may differ between the subexpressions.
The second alternative (with or without quantification) will not match a2A: its groups consume input, so it requires a digit, then a lowercase letter, then an uppercase letter, in that order, which the test string a2A does not provide.
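The three variants can be compared directly on the test string. This is only an illustrative sketch; the constant names are made up, and the quantified forms are the ones discussed in this answer:

```javascript
// Lookaheads without a quantifier: the character right after the start
// position would have to be a digit, a lowercase and an uppercase letter at once.
const fixedLookaheads = /(?=.\d)(?=.[a-z])(?=.[A-Z])/;
// Quantified lookaheads: each one scans ahead independently from the same position.
const quantifiedLookaheads = /(?=.*\d)(?=.*[a-z])(?=.*[A-Z])/;
// Plain groups consume input, so the order digit, lowercase, uppercase is required.
const consumingGroups = /(.*\d)(.*[a-z])(.*[A-Z])/;

console.log(fixedLookaheads.test("a2A"));      // false
console.log(quantifiedLookaheads.test("a2A")); // true
console.log(consumingGroups.test("a2A"));      // false
console.log(consumingGroups.test("1aB"));      // true
```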

Regex, Difference between ^[a-zA-Z]+$ vs [a-zA-Z]* [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 8 years ago.
I'm very new to programming and I've been told to avoid regex for now, but I find it extremely helpful.
When writing a program to check if a string only contains letters, I found on Stack Overflow that both ^[a-zA-Z]+$ and [a-zA-Z]* yield the same results. I understand how [a-zA-Z] works and I understand how [A-z] is different from both of those as well, but I do not understand +$ vs ^[]* or why they yield the same result, and I'm having trouble finding anything to explain it.
Here's the example I used it in:
String student = input.next();
while (!student.matches("[a-zA-Z]*")) {
System.out.print("Invalid input. Enter name: ");
student = input.next();
}
This is my first question here so sorry if this kind of question is frowned upon.
As you know,
[a-zA-Z]
Matches a single upper or lower-case letter.
[a-zA-Z]*
matches zero or more upper- or lower-case letters in a row.
^[a-zA-Z]+$
matches a string that consists of one or more upper- or lower-case letters from start (^) to end ($). In other words, the only thing in your string is upper- or lower-case letters, and there is at least one of them.
^ and $ play more of a role when you're dealing with streams of data, using regular expressions to sift out stuff you want while ignoring the stuff you don't. That last pattern could be used to find a stream consisting of only upper and lower-case letters.
* is zero or more, + is one or more.
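However, there is a larger difference, which is the ^ and $. In the first example, it is saying that the string MUST contain only [a-zA-Z], so the string 123abc123 is not valid.
In the 2nd example, where ^ and $ are omitted, 123abc123 is valid when the pattern is searched for anywhere in the input (for example with Matcher.find()). Java's String.matches, as used in the question, always requires the whole string to match, so the anchors are implicit there. That is why both patterns give the same results in the question's code; they differ only on the empty string, which * accepts and + rejects.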
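The effect of the anchors can be sketched in JavaScript, where test() searches anywhere in the input. The helper matchesWhole is hypothetical; it wraps a pattern in ^(?:...)$ to imitate a whole-string match of the kind Java's String.matches performs:

```javascript
const anchored = /^[a-zA-Z]+$/;
const unanchored = /[a-zA-Z]*/;

// Hypothetical helper: require the pattern to cover the entire string.
function matchesWhole(pattern, s) {
  return new RegExp(`^(?:${pattern.source})$`).test(s);
}

console.log(anchored.test("123abc123"));            // false
console.log(unanchored.test("123abc123"));          // true (an empty match at index 0 is enough)
console.log(matchesWhole(unanchored, "123abc123")); // false
console.log(matchesWhole(unanchored, ""));          // true: * accepts the empty string
console.log(anchored.test(""));                     // false: + needs at least one letter
```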