Regex to match a string containing a specific count of each letter, in any order

I am trying to work out a regex that matches a string containing a specific count of each letter, where the letters are not in a specific order.
For example, these should all be TRUE:
AAABBBCCCDDD
BBBAAADDDCCC
CCCAAABBBDDD
So far, I have A{3}B{3}C{3}D{3}, which matches the first line, but the other lines would need a different order.
Is there a good solution for this?

You can match and capture a letter, then backreference that captured character. Repeat the whole thing as many times as needed, which looks to be 4 here:
(?:([A-Z])\1{2}){4}
https://regex101.com/r/vrQVgD/1
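A quick way to check the pattern outside regex101 is a sketch with Python's `re` module (an assumption; the thread does not name a language). `fullmatch()` anchors the pattern to the whole string, and `RUNS_OF_THREE` is just an illustrative name.

```python
import re

# Pattern from the answer: capture one letter, require it twice more via \1,
# and repeat that unit 4 times. fullmatch() anchors it to the whole string.
RUNS_OF_THREE = re.compile(r"(?:([A-Z])\1{2}){4}")

for s in ["AAABBBCCCDDD", "BBBAAADDDCCC", "CCCAAABBBDDD", "AAABBCCCDDDD"]:
    # the first three print True, the last prints False
    print(s, bool(RUNS_OF_THREE.fullmatch(s)))
```

Note that this version also accepts a string such as AAAAAABBBCCC, where A forms two runs; the next pattern in the answer addresses that case.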
If the same character must not appear as a run more than once, I don't think this can be done in such a DRY manner; you'll need separate capture groups:
([A-Z])\1{2}(?!\1)([A-Z])\2{2}(?!\1|\2)([A-Z])\3{2}(?!\1|\2|\3)([A-Z])\4{2}
https://regex101.com/r/vrQVgD/2
This is essentially four variations of the following building block, chained together:
(?!\1|\2|\3)([A-Z])\4{2}
The negative lookahead (?!\1|\2|\3) checks that the next character is not one of the letters already captured by the previous groups.
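A minimal Python sketch of the distinct-letters pattern, again assuming Python's `re` as the regex engine. The helper `letters_in_order` is hypothetical; it only shows that the four capture groups hold the run letters when the whole string matches.

```python
import re

# Pattern from the answer: four captured runs of three, where each negative
# lookahead rejects a letter already used by an earlier group.
DISTINCT_RUNS = re.compile(
    r"([A-Z])\1{2}(?!\1)([A-Z])\2{2}(?!\1|\2)([A-Z])\3{2}(?!\1|\2|\3)([A-Z])\4{2}"
)

def letters_in_order(s):
    """Return the four run letters if s is four distinct runs of three, else None."""
    m = DISTINCT_RUNS.fullmatch(s)
    return m.groups() if m else None

print(letters_in_order("CCCAAABBBDDD"))  # ('C', 'A', 'B', 'D')
print(letters_in_order("AAAAAABBBCCC"))  # None: A appears as two runs
```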

Related

Regular Expression: Two words in any order but with a string between?

I want to use positive lookaheads so that the regex picks up two words from two different sets, in any order, but with a string of length 1 to 20 between them that is always in the middle.
It is already case-insensitive and allows any number of characters (including 0) before the first word found and after the second word found. I am unsure whether it is more correct to terminate with $.
Without the any-order matching, I have this so far:
(?i:.*(new|launch|releas)+.{1,20}(product1|product2)+.*)
I have attempted to add any-order matching with the following, but it only picks up the first word:
(?i:.*(?=new|launch|releas)+.{1,20}(?=product1|product2)+.*)
I thought this might be because of the +.{1,20} in the middle, but I am unsure how it could work if I added it to both sets instead: for instance, that could cause a problem if the first word is at the very start of the source text being parsed, so that there is no character before it.
I have seen examples where \b is used with lookaheads, but that also seems like it may cause a problem, as I want it to match both when the first word is at the start of the source text and when it is not.
How should I edit my regex?
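To make the problem concrete, here is a sketch using Python's `re` (an assumption; the question does not name its engine). `ORIGINAL` is the asker's pattern, which only matches keyword-then-product. `ANY_ORDER` is one possible alternative that is not from the thread: it simply spells out both orders as alternatives, each with the 1 to 20 character gap in the middle.

```python
import re

# The asker's current pattern: keyword first, then 1-20 characters, then a product.
ORIGINAL = re.compile(r"(?i:.*(new|launch|releas)+.{1,20}(product1|product2)+.*)")

# One possible any-order approach (not from the thread): list both orders as
# alternatives; (?i) at the very start makes the whole pattern case-insensitive.
ANY_ORDER = re.compile(
    r"(?i)(?:new|launch|releas).{1,20}(?:product1|product2)"
    r"|(?:product1|product2).{1,20}(?:new|launch|releas)"
)

for text in ["We will launch the new product1", "Product1 is launching soon"]:
    print(text, bool(ORIGINAL.search(text)), bool(ANY_ORDER.search(text)))
```

Because `search()` looks for a match anywhere in the text, this sketch needs neither the leading and trailing `.*` nor a closing `$`.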

VSCode chaining regex transforms in a snippet

I'm trying to transform a filename automatically in VSCode in two ways.
Let's say I have test-file-name.md, I want to end up with Test File Name in my document.
Right now I can do both parts of the transform separately, but I'm struggling to find how to combine them.
To remove all the - and replace them with a space I do this: ${TM_FILENAME_BASE/[-]/ /g}
And for capitalizing the first letter of each word I do: ${TM_FILENAME_BASE/(\b[a-z])/${1:/upcase}/g} (the \b is probably useless in this case, I guess)
I've tried multiple ways of writing them together, but I can't find a way to apply them one after the other.
Try this:
"body": "${TM_FILENAME_BASE/([^-]+)(-*)/${1:/capitalize}${2:+ }/g}"
Because of the g flag, it will find all the occurrences and apply the transform of the two capture groups to each of them. In your test case, (test-)(file-)(name), that would be three times. It should work for any number of hyphenated words.
([^-]+) matches everything up to a hyphen.
${1:/capitalize} capitalizes capture group 1.
${2:+ } means: if the 2nd capture group, (-*), matched a hyphen, insert a space. I added this because after the last word there is no hyphen, so the 2nd capture group will be empty and no extra space is added at the end.
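The snippet only runs inside VS Code, so here is a rough Python emulation of the same transform, useful for reasoning about what each piece does. The function name `snippet_transform` is hypothetical, and treating `/capitalize` as "upper-case the first character, leave the rest unchanged" is an assumption about VS Code's behavior.

```python
import re

def snippet_transform(filename_base):
    """Emulate ${TM_FILENAME_BASE/([^-]+)(-*)/${1:/capitalize}${2:+ }/g} in Python."""
    def replace(m):
        word, hyphens = m.group(1), m.group(2)
        # /capitalize (assumed): upper-case the first character only
        capitalized = word[:1].upper() + word[1:]
        # ${2:+ }: insert a space only when group 2 matched at least one hyphen
        return capitalized + (" " if hyphens else "")
    # re.sub replaces every match, like the snippet's g flag
    return re.sub(r"([^-]+)(-*)", replace, filename_base)

print(snippet_transform("test-file-name"))  # Test File Name
```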

Putting a group within a group [123[a-u]]

I'm having a lot more difficulty than I anticipated in creating a simple regex to match any specific characters, including a range of characters from the alphabet.
I've been playing with regex101 for a while now, but every combination seems to result in no matches.
Example expression:
[\n\r\t\s\(\)-]
Preferred expression:
[[a-z][a-Z]\n\r\t\s\(\)-]
Example input:
(123) 241()-127()()() abc ((((((((
Ideally the expression will capture every character except the digits.
I know I could always manually input "abcdefgh".... but there has to be an easier way. I also know there are easier ways to capture numbers only, but there are some special characters and letters which I may eventually need to include as well.
With regex you can match a range of characters, as in your example above: [a-z] matches any single letter between a and z. To match more than one character, add a + to it, or, to match an exact number of characters, use {n}, where n is the number of characters you want. So [a-z]+ matches one or more letters, and [a-z]{4} matches exactly four consecutive letters between a and z.
You can also use partial intervals. For example, [a-j] matches any character from a to j, so [a-j]{2} on the string a6b7cd matches only cd. You can also use several intervals within the same character class, like this: [a-j4-6]{4}. This regex matches ab44 but not ab47.
I overlooked a pretty small character. The term I was looking for was apparently "alternation".
[\r\t\n]|[a-z], with the missing element being the | character. This matches a character from either the first class or the second.
At least, that's my conclusion from testing this specific example.
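A sketch of the single-class approach in Python's `re` (assumed engine). Since each alternative in [\r\t\n]|[a-z] matches exactly one character, the same effect can be had by putting all ranges and characters in one class. Note also that a-Z from the question is not a valid range, because Z comes before a in ASCII; writing a-zA-Z covers both cases.

```python
import re

text = "(123) 241()-127()()() abc (((((((("

# One character class holding several ranges plus single characters:
# letters, whitespace (\s already covers \n, \r and \t), parentheses, and a
# literal '-' (placed last so it is not read as a range).
NON_DIGIT_CHARS = re.compile(r"[a-zA-Z\s()-]")
kept = "".join(NON_DIGIT_CHARS.findall(text))
print(repr(kept))

# The range examples from the answers above
print(re.findall(r"[a-j]{2}", "a6b7cd"))          # ['cd']
print(bool(re.fullmatch(r"[a-j4-6]{4}", "ab44")))  # True
print(bool(re.fullmatch(r"[a-j4-6]{4}", "ab47")))  # False
```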

Regex to match recurring character groups

I'm trying to write a regex that matches groups of exactly three characters that recur within the text at least once.
What I came up with is this simple regex: (.{3}).*\g1, using the g (global) and s (dot also matches newline) flags. However, it is clearly faulty, as it only finds some of the groups I'm hoping to capture. Any idea how I can improve it? Here is a link to an example input: https://regex101.com/r/Cuiva1/2
Edit: As requested in the comments, here's the full list of groups I was hoping to capture: GLT,VIW,IWK,KTL,GLT,LTK,LIS,KTX,TXK,XDL,KTL
If your input is always multiple triplets of uppercase characters and you're only looking for ones that repeat, then you need something more complex to avoid backtracking into a previous triplet:
/(?>[^A-Z]*+([A-Z]{3}))(?=(?:[^A-Z]*+[A-Z]{3})*?\1)|(?>[^A-Z]*+[A-Z]{3})/g
The matches in capture group 1 will hold what you want. If your strings are not that well formatted (i.e. they may contain strings of any length between the repeating patterns), then you can use a simpler pattern, but you'll get inconsistent results and miss some matches.
Re-reading your desired output, you're not going to achieve this with a regex: VIW and IWK overlap, which won't work in a single preg_match_all(). Just use string functions.
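Following the "just use string functions" suggestion, here is a minimal Python sketch (the thread's code is PHP; Python is swapped in here). It assumes a "group" means three consecutive uppercase ASCII letters and that overlapping windows count; the function name `recurring_triplets` is hypothetical.

```python
from collections import Counter

def recurring_triplets(text):
    """Return each 3-letter uppercase window that occurs at least twice (overlaps allowed)."""
    # Slide a 3-character window one position at a time, so VIW and IWK both count
    windows = [text[i:i + 3] for i in range(len(text) - 2)]
    # Keep only windows made of three uppercase ASCII letters
    windows = [w for w in windows if w.isascii() and w.isalpha() and w.isupper()]
    counts = Counter(windows)
    # dict.fromkeys de-duplicates while keeping first-occurrence order
    return [w for w in dict.fromkeys(windows) if counts[w] >= 2]

print(recurring_triplets("VIWKQVIWK"))  # ['VIW', 'IWK']
```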

Regular Expression pattern explanation

I don't know what these regular expressions are doing:
(?>[^\,]*\,){3}([^\,]*)[\']?
(?>[^\,]*\,){4}([^\,]*)[\']?
Could anyone explain them to me in more detail?
There is an awesome site, http://regex101.com, for these needs! It explains regular expressions and lets you test and debug them.
Yours match 4 (5 for the second one) comma-separated values and return the last one as a single capturing group:
(?>...) are atomic groups. Once they have matched, the engine never backtracks into them.
[^\,] matches any character except a comma.
[^\,]*\, means any number (even zero) of non-comma characters, followed by a single comma.
(?>[^\,]*\,){3} means: do the above 3 times.
([^\,]*)[\']? means one more word without commas, captured as a group, optionally followed by an apostrophe.
For example, in 1,,333,4,5 the first one will match 1,,333,4 and return 4 as the captured group. The second one will match 1,,333,4,5 and return 5 as the group.
Edit: Even more description.
Regular expressions have groups. These are parts of a regular expression that can have quantifiers, i.e. how many times to repeat them ({3}), and some options. Also, after the regular expression has matched, we can find out what each capturing group matched.
Atomic groups, in short, take as much as they can going forward and never backtrack. They are also non-capturing, so their match can't be retrieved as described above. They are used here only for performance reasons.
So, we need to capture the 4th word from the comma-separated values. We will do it like this:
We will take 3 times ({3}) an atomic group ((?>...)):
Which takes a word: any number (*) of any non-comma characters ([^\,])
[^...] means any character except the listed ones.
And a comma (\,) that separates that word from the next one
Now, our wanted word starts. We open a group ((...))
That will take a word as described above: [^\,]*
Then there is possibly an apostrophe; take it if there is one ([\']?)
? means 0 or 1 of the preceding element, here a single apostrophe.
So, it starts at the first word in the first atomic group, takes all of it, then takes a comma. This is repeated 2 more times, which consumes the first 3 words with their commas.
After that, one capturing (non-atomic) group takes the 4th word.
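To try the explanation on the 1,,333,4,5 example, here is a sketch in Python's `re` (an assumed engine). The atomic group (?>...) is swapped for a plain non-capturing group (?:...), since the answer says it is only there for performance and Python's `re` accepts (?>...) only from 3.11 on; the unnecessary backslash before the comma and apostrophe is also dropped. The names `FOURTH` and `FIFTH` are illustrative.

```python
import re

# The answer's patterns with (?>...) replaced by (?:...); results are the same here.
FOURTH = re.compile(r"(?:[^,]*,){3}([^,]*)'?")
FIFTH = re.compile(r"(?:[^,]*,){4}([^,]*)'?")

for pattern in (FOURTH, FIFTH):
    m = pattern.search("1,,333,4,5")
    print(repr(m.group(0)), repr(m.group(1)))
# '1,,333,4' '4'
# '1,,333,4,5' '5'
```

One side effect worth noticing: because ([^,]*) is greedy and an apostrophe is not a comma, a trailing apostrophe is already swallowed by the group (a,b,c,d' captures d'), so the final '? never matches anything by itself.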