Regex capture groups of N and the remainder? [duplicate]

This question already has answers here:
Split large string in n-size chunks in JavaScript
(23 answers)
Closed 2 years ago.
Suppose I have the string ASDFZXCVQW. Is it possible to capture it into groups of N characters, with the remaining characters in a final group?
For example, if N were 4, then we could have: ASDF, ZXCV, and QW. Notice how the QW is everything that is left over.
I know how to capture the groups of N with .{N}, and then manually get the leftover through string indexing, but is it possible to do this in a single regular expression?

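var data = 'ASDFZXCVQW'
// .{1,4} greedily takes up to 4 characters, so the last match holds the remainder
// (the dot is used instead of \D so that strings containing digits also work)
var result = data.match(/.{1,4}/g)
console.log(result)
This should help.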

That really depends on the language in use.
In general, it will be a concatenation of 0 or more 4-character groups followed by 0-3 single characters.
Here is a possible formal definition for an alphanumeric string: ([a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9])*[a-zA-Z0-9]?[a-zA-Z0-9]?[a-zA-Z0-9]?. Different languages might express this differently and possibly in a more compact way, for example with counted quantifiers such as {4} and {0,3}.
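As a minimal JavaScript sketch of the idea above (full groups followed by a shorter remainder), the counted quantifier .{1,N} can be built dynamically. The helper name chunk is hypothetical, and the s flag (so that the dot also matches newlines) assumes an ES2018-capable engine.

```javascript
// Hypothetical helper: split `text` into groups of `n` characters,
// with whatever is left over in the final group.
function chunk(text, n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError('n must be a positive integer');
  }
  // Greedy {1,n} takes n characters while it can, then the shorter remainder.
  const pattern = new RegExp(`.{1,${n}}`, 'gs');
  // match() returns null when nothing matches (e.g. the empty string).
  return text.match(pattern) || [];
}

console.log(chunk('ASDFZXCVQW', 4)); // [ 'ASDF', 'ZXCV', 'QW' ]
```

Because the quantifier is greedy, only the last group can be shorter than N, which is the "groups of N plus remainder" behaviour asked for.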

Related

What does + mean for [^SPEAK]+ in RegEx? [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
I'm finding that learning RegEx is nearly impossible, since every tool or helper site offers the worst explanations and doesn't cover everything. I'm attempting to work through the puzzles on regexcrossword.com and they don't offer substantial help and/or insight. This is causing confusion.
Here is what I understand:
[^SPEAK] = not S, P, E, A, K
+ = matches one or more of the previous characters
Therefore [^SPEAK]+ = nothing
I don't understand how this is supposed to work. What am I missing?
[^SPEAK] matches any single character that isn't one of those 5 letters. + means to match the preceding pattern at least 1 time. So [^SPEAK]+ matches a sequence of one or more characters, none of which is in that set.
For example, if the input is 123ABCDEFG, it will match 123, BCD and FG.
[^SPEAK] means any character other than S, P, E, A or K, and + means one or more times.
So it accepts any sequence of one or more characters, none of which is in the list [SPEAK].
ssssssss is a valid string, aaaa is a valid string (matching is case-sensitive), and SA is an invalid string.
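The examples from both answers can be checked with a short JavaScript sketch. The anchored variant with ^ and $ is an assumption about what "valid string" means here, namely that the whole string must consist of allowed characters.

```javascript
// Global search: every maximal run of characters that are not S, P, E, A or K.
const runs = '123ABCDEFG'.match(/[^SPEAK]+/g);
console.log(runs); // [ '123', 'BCD', 'FG' ]

// Anchored check: the entire string must avoid the five letters.
const wholeString = /^[^SPEAK]+$/;
console.log(wholeString.test('ssssssss')); // true (lowercase s is not S)
console.log(wholeString.test('aaaa'));     // true
console.log(wholeString.test('SA'));       // false
```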

Find the shortest match that matches a specific condition using regex [duplicate]

This question already has answers here:
Regular expressions: Ensuring b doesn't come between a and c
(4 answers)
Closed 4 years ago.
I'd like to find the shortest match that contains all three of these strings, in any order, including any characters between them.
strings are: "ACT", "AGT" and "CGT".
Sample input: "ACTACGTTTAGTAACTCGTCT"
I tried the regex below, but it returns the first match it finds, which is "ACTACGTTTAGTAACTCGT"
/(ACT.*AGT.*CGT)|(ACT.*CGT.*AGT)|(AGT.*ACT.*CGT)|(AGT.*CGT.*ACT)|(CGT.*ACT.*AGT)|(CGT.*AGT.*ACT)/g
Output has to be "AGTACTCGT"
A single regex match is always one contiguous piece of the input, so you can't get separate bits of a string returned already concatenated in one go.
See here: Regular expression to skip character in capture group
You can first match each bit, using parentheses to group them, and then put them together in a separate step.
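One way to follow this two-step advice is sketched below in JavaScript. It is not a single-regex solution: the shortest window is found with plain indexOf scanning (a swapped-in technique, since the greedy .* in the question's pattern always stretches to the last possible occurrence), and a regex then picks out the three pieces to join. The helper name shortestWindow is hypothetical.

```javascript
// Hypothetical helper: shortest substring of `input` containing every token.
function shortestWindow(input, tokens) {
  let best = null;
  for (let start = 0; start < input.length; start++) {
    // A shortest window must begin with one of the tokens.
    if (!tokens.some((t) => input.startsWith(t, start))) continue;
    let end = start;
    let complete = true;
    for (const t of tokens) {
      const at = input.indexOf(t, start); // first occurrence at or after start
      if (at === -1) { complete = false; break; }
      end = Math.max(end, at + t.length);
    }
    if (complete && (best === null || end - start < best.length)) {
      best = input.slice(start, end);
    }
  }
  return best; // null when some token never occurs
}

const tokens = ['ACT', 'AGT', 'CGT'];
const windowText = shortestWindow('ACTACGTTTAGTAACTCGTCT', tokens);
console.log(windowText); // AGTAACTCGT

// Separate step: match each bit, then put the bits together.
const joined = windowText.match(/ACT|AGT|CGT/g).join('');
console.log(joined); // AGTACTCGT
```

Note that the shortest contiguous window in the sample input is AGTAACTCGT; the output AGTACTCGT given in the question only appears after the pieces are joined without the character between them.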

regex string length validation [duplicate]

This question already has answers here:
How to find the length of a string in R
(6 answers)
Closed 8 years ago.
In R, I need to remove strings that exceed a length of 7 characters from a column in a data frame.
My code is,
memos.to <- as.data.frame(apply(memos.to,2,function(x)gsub('/^[a-zA-Z0-9]{7,}$/', NA ,x)))
and it doesn't seem to work. What's wrong here?
The easiest way is to just check the string length.
I don't know R, but assuming its regex engine supports the usual modern syntax, one of the following should match strings longer than 7 characters, as far as the regex itself is concerned:
/.{8,}/ using Dot-all modifier as external flag
or
/(?s).{8,}/
or
/[\S\s]{8,}/ if Dot-all not available
if you are only considering [a-zA-Z0-9] chars
/^[a-zA-Z0-9]{8,}$/
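To illustrate that the length check and the {8,} patterns agree, here is a small sketch in JavaScript rather than R (an assumption made only to keep the examples on this page in one language). The sample values are made up, and null stands in for R's NA.

```javascript
// Made-up sample column; anything longer than 7 characters should be blanked.
const values = ['abc', 'abcdefg', 'abcdefgh', 'line1\nline2'];

// Option 1: plain length check.
const byLength = values.map((v) => (v.length > 7 ? null : v));

// Option 2: regex; [\S\s] matches any character, including newlines.
const tooLong = /^[\S\s]{8,}$/;
const byRegex = values.map((v) => (tooLong.test(v) ? null : v));

console.log(byLength); // [ 'abc', 'abcdefg', null, null ]
console.log(byRegex);  // [ 'abc', 'abcdefg', null, null ]
```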

need a regexp to match positive integers separated by commas [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Regular expression to only allow whole numbers and commas in a string
I need a regexp to match a sequence of POSITIVE integers separated by comma. Space is also allowed.
For example
706101, 700102, 700295 should match, but 0, 1, 2, 3 should not.
I tried to use /^\s*(\d+(\s*,\s*\d+)*)?\s*$/ but it seems to accept zeros as well.
Replace \d+ with [1-9]\d* and it will work. For example:
/^\s*[1-9]\d*(?:\s*,\s*[1-9]\d*)*$/
This regex rejects the empty string (while the one in the original post accepts it), but I assume that is actually the intention. If not, wrap the whole list of numbers in an optional group.
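A quick JavaScript check of this pattern against the examples from the question; the extra inputs beyond those two are made up for illustration.

```javascript
// Each number must start with 1-9, so a standalone 0 is rejected.
const positiveList = /^\s*[1-9]\d*(?:\s*,\s*[1-9]\d*)*$/;

const samples = ['706101, 700102, 700295', '0, 1, 2, 3', '7', '12,0', '1, 2,', ''];
for (const s of samples) {
  console.log(JSON.stringify(s), positiveList.test(s));
}
// Only the first and third samples are accepted.
```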
Something like this would work. The main change is basically switching from \d ([0-9]) to [1-9].
This regex also allows numbers with leading zeros, such as 0001.
/^(?:0*[1-9]\d*\s*(?:,|$)\s*)+$/gm
As you have not specified language, the flags may change. This is PCRE.
Demo+explanation: http://regex101.com/r/kN2tW0
Try:
^[1-9][0-9]*( *, *[1-9][0-9]*)*$
Try this
^([1-9]\d*[\s,]*)+$
This regex will reject any string that contains a standalone '0' anywhere in it.

Regex for range 1-1000 [duplicate]

This question already has answers here:
Regular expression where part of string must be number between 0-100
(7 answers)
Closed 1 year ago.
I need help creating a simple regex for a whole number range of 1-1000, with no special characters.
The two I have both seem to break or allow characters or not the full range:
^\d(\d)?(\d)?$
^[0-9]{1,3}$
Try this:
^([1-9][0-9]{0,2}|1000)$
[1-9][0-9]{0,2} matches any number between 1–999
1000 matches 1000
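The pattern can be verified exhaustively over the intended range with a short JavaScript sketch; the list of inputs expected to fail is made up for illustration.

```javascript
const inRange = /^([1-9][0-9]{0,2}|1000)$/;

// Every integer from 1 to 1000 should be accepted.
let allAccepted = true;
for (let n = 1; n <= 1000; n++) {
  if (!inRange.test(String(n))) allAccepted = false;
}
console.log(allAccepted); // true

// None of these should be accepted.
const wronglyAccepted = ['0', '1001', '01', '12a', '', ' 5'].filter((s) => inRange.test(s));
console.log(wronglyAccepted); // []
```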
Use ^(.*[^0-9]|)(1000|[1-9]\d{0,2})([^0-9].*|)$ which will match 1000 or a non-zero digit followed by up to two further digits. It will also allow other characters on either end of the number.