Regular expression to get the letters with digits on the left

I want to extract the letters that have a digit immediately to their left.
For example, from 3F 4G XY the output should be F and G.
I tried to do this with the regular expression (?=\d)[A-Z], but it failed.
Look at the following code:
s = "1A 2B 3C IJK X1 Y2 Z3"
s.match(/(?=\d)[A-Z]/g) // null
s.match(/[A-Z](?=\d)/g) // [ 'X', 'Y', 'Z' ]
I was surprised that the first pattern found no match while the second one worked correctly.
In my view, the two expressions are the same, just with left and right swapped.
Did I get it wrong?
That is, how do I get the output ['A', 'B', 'C']?

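A lookahead such as (?=\d) checks the text to the right of the current position without consuming it, so (?=\d)[A-Z] requires the same character to be both a digit and a letter, which can never match. To require a digit on the left, you need a lookbehind:
(?<=\d)[A-Z]
The lookbehind makes the regex engine step back into the already processed text and check that the character before the letter is a digit, without including that digit in the match.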
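A minimal JavaScript sketch of the difference, using the string from the question. It assumes an engine with lookbehind support (ES2018 or later, e.g. a recent Node.js). The third variant, a capture group that consumes the digit, is a swapped-in alternative for engines without lookbehind and is not part of the answer above.

```javascript
const s = "1A 2B 3C IJK X1 Y2 Z3";

// Lookahead: asserts the NEXT character is a digit, then [A-Z] must match
// that same character as a letter, which is impossible, so nothing matches.
const viaLookahead = s.match(/(?=\d)[A-Z]/g);

// Lookbehind: asserts the PREVIOUS character is a digit, then matches the letter.
const viaLookbehind = s.match(/(?<=\d)[A-Z]/g);

// Fallback without lookbehind: consume the digit and capture only the letter.
const viaCapture = [...s.matchAll(/\d([A-Z])/g)].map((m) => m[1]);

console.log(viaLookahead);  // null
console.log(viaLookbehind); // [ 'A', 'B', 'C' ]
console.log(viaCapture);    // [ 'A', 'B', 'C' ]
```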


Regex exact number of exact characters in random position

For example, I have a string:
R-G-B-G
I want to know whether this string contains at least 2 'G', 1 'R' and 1 'B'.
The string length can vary from 1 to 11 symbols, and the letters can appear in any position, separated by '-'.
3 'R' -> R-R-G-R-R-B -> match
2 'B' -> ... -> not match
1 'R' 1 'G' 1 'B' -> ... -> match
Edit:
Moving forward, I need to check some more cases.
My strings consist of sockets, links and colors, e.g.:
R R-R B-R-R has 6 sockets, one 2-link, one 3-link, 5 R, 1 B and 0 G colors.
The user sends me a number of sockets, the length of a link, and the desired colors in that link, and I must tell whether the string has all of that or not. I've built expressions to match each individual case, but I can't figure out how to put them together.
For example: 6 sockets, a link of size 2, 2 R colors.
All of that combined must return true for R R-R B-R-R. On the other hand, the string R R R B-R-R must return false, because there the R-R pair is part of the 3-link B-R-R.
Or should I just run 3 separate expressions, feeding the output of each one into the next?
You can chain look-aheads (?=...) after the start ^ for each required letter.
For example, for at least two 'R', two 'G' and one 'B', in a string that is 1 to 11 characters long:
^(?=(?:.*?R){2})(?=(?:.*?G){2})(?=(?:.*?B){1}).{1,11}$
Test here on regex101
Note that the lazy quantifiers *? used in those lookaheads are just a speed optimization.
It won't matter much in this case, since it's such a short string.
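A minimal JavaScript sketch of the chained-lookahead approach. `buildPattern` is a hypothetical helper, not part of the answer, that generates one lookahead per required letter; it assumes the keys are single letters with no regex metacharacters. It covers only the minimum-count check, not the socket and link rules from the edit.

```javascript
// The answer's regex: at least two R, two G and one B, total length 1 to 11.
const answer = /^(?=(?:.*?R){2})(?=(?:.*?G){2})(?=(?:.*?B){1}).{1,11}$/;

// Hypothetical helper: builds the same kind of pattern from a map of minimum counts.
// Assumes every key is a single letter with no regex metacharacters.
function buildPattern(minCounts, maxLen = 11) {
  const lookaheads = Object.entries(minCounts)
    .map(([letter, n]) => `(?=(?:.*?${letter}){${n}})`)
    .join("");
  return new RegExp(`^${lookaheads}.{1,${maxLen}}$`);
}

// The question's first requirement: at least 2 G, 1 R and 1 B.
const question = buildPattern({ G: 2, R: 1, B: 1 });

console.log(question.test("R-G-B-G"));     // true
console.log(question.test("R-R-G-R-R-B")); // false (only one G)
console.log(answer.test("R-G-B-G"));       // false (only one R)
console.log(answer.test("R-R-G-G-B"));     // true
```

Because each lookahead is anchored at ^ and consumes nothing, the conditions are checked independently, so the letters may appear in any order.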

Regex to validate a password in order to create a DFA

I need to create a regex to validate a password and then create a DFA with it.
The sets are:
a = {a,...,z}
A = {A,...,Z}
d = {0,...,9}
The criteria are:
Must begin with a letter (doesn't matter if upper or lower case).
Must contain at least 1 upper case letter.
Must contain at least 1 lower case letter.
Must contain at least 1 number.
So far, I've come up with the following regex:
(aa*(AA*a*dd*|dd*a*AA*)|AA*(aa*A*dd*|dd*A*aa*))(a|A|d)*
Is it correct?
The inner parts of your expression are slightly wrong. For example, with AA*a*dd* you are trying to check the case where the string begins with 'a' and 'A' occurs before 'd', but this does not match "aAaAd". Here is a correct version:
(aa*(d(a|d)*A|A(a|A)*d)|AA*(d(A|d)*a|a(A|a)*d))(a|A|d)*
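To check this claim, here is a JavaScript sketch that translates both expressions by mapping the abstract symbols to character classes (a to [a-z], A to [A-Z], d to [0-9]; this mapping is an assumption based on the question's sets) and anchoring them with ^ and $. Concrete digits stand in for the abstract 'd', so "aAaAd" becomes "aAaA1".

```javascript
// Mapping assumed: a -> [a-z], A -> [A-Z], d -> [0-9]; (x|y)* -> [xy]*.
const original = /^([a-z][a-z]*([A-Z][A-Z]*[a-z]*[0-9][0-9]*|[0-9][0-9]*[a-z]*[A-Z][A-Z]*)|[A-Z][A-Z]*([a-z][a-z]*[A-Z]*[0-9][0-9]*|[0-9][0-9]*[A-Z]*[a-z][a-z]*))[a-zA-Z0-9]*$/;
const corrected = /^([a-z][a-z]*([0-9][a-z0-9]*[A-Z]|[A-Z][a-zA-Z]*[0-9])|[A-Z][A-Z]*([0-9][A-Z0-9]*[a-z]|[a-z][a-zA-Z]*[0-9]))[a-zA-Z0-9]*$/;

for (const pw of ["aA1", "aAaA1", "Ab3", "1aA", "abc"]) {
  console.log(pw, original.test(pw), corrected.test(pw));
}
// aA1 true true
// aAaA1 false true   <- the case the original misses
// Ab3 true true
// 1aA false false    <- does not begin with a letter
// abc false false    <- no upper case letter, no digit
```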
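You should build the DFA as an exercise. It should have 7 states, including the start and the accepting state (plus a dead state if you want every transition to be defined). Think about establishing one state for each combination of character classes that you still have left to match.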
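One possible DFA, sketched in JavaScript under the same assumed mapping of a, A and d to lower case letters, upper case letters and digits. The state is the set of classes seen so far, stored as a bitmask: the reachable states are the start state plus the six non-empty combinations that begin with a letter, which gives the 7 states mentioned above, and the early `return false` plays the role of the dead state. The names `classOf`, `BIT` and `accepts` are hypothetical.

```javascript
// Map one character to its symbol class from the question: "a", "A" or "d".
function classOf(ch) {
  if (ch >= "a" && ch <= "z") return "a";
  if (ch >= "A" && ch <= "Z") return "A";
  if (ch >= "0" && ch <= "9") return "d";
  return null; // outside the alphabet
}

// Each state is the set of classes seen so far, stored as a bitmask.
// 0 is the start state; ALL (= a|A|d) is the only accepting state.
const BIT = { a: 1, A: 2, d: 4 };
const ALL = BIT.a | BIT.A | BIT.d;

function accepts(password) {
  let state = 0;
  for (const ch of password) {
    const cls = classOf(ch);
    if (cls === null) return false;               // symbol not in the alphabet
    if (state === 0 && cls === "d") return false; // dead state: must begin with a letter
    state |= BIT[cls];                            // transition: remember this class
  }
  return state === ALL;
}

console.log(accepts("aAaA1")); // true
console.log(accepts("1aA"));   // false
```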

R - split string before two last digits in each column cell

I have a csv with usernames in a column, followed by each user's feedback rating, out of 100.
E.g. James89
I hope to find a way to split the name and the rating, e.g. by inserting a comma before the two last digits using regex. Is this possible? And/or is there a better way to do this?
df1 = data.frame(Product = c(rep("ARCH78"), rep("AUSFUNGUY91"), rep("AddiesAndXans96"), rep("AfroBro79")))
The code above is a tiny excerpt of the data I'm dealing with. I hope to get this output:
ARCH 78
AUSFUNGUY 91
AddiesAndXans 96
AfroBro 79
I've tried this code (inspired by this answer):
df1$P2 <- gsub("(.*?)(..)", "\\1", df1$Product)
It seems to be working, but there's something wrong with the output:
ARCH78 AR
AUSFUNGUY91 AUUNY
AddiesAndXans96 AdesdXs
AfroBro79 AfBr9
As for the following:
I hope to find a way to split the name and the rating, e.g. by inserting a comma before the two last digits using regex.
You can achieve it with a single gsub call:
df1 = data.frame(Product = c(rep("ARCH78"), rep("AUSFUNGUY91"), rep("AddiesAndXans96"), rep("AfroBro79")))
gsub("(\\d{2})$",",\\1",df1$Product)
## => [1] "ARCH,78" "AUSFUNGUY,91" "AddiesAndXans,96" "AfroBro,79"
See IDEONE demo
You can further adjust the replacement ",\\1", which contains the backreference \1 to the captured last 2 digits.
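The same replacement in JavaScript, for comparison, plus a variant that captures the name and the rating into separate fields. Like the R answer, this sketch assumes every rating has exactly two digits, as in the sample data; a rating of 100 would be split as "...1,00".

```javascript
const products = ["ARCH78", "AUSFUNGUY91", "AddiesAndXans96", "AfroBro79"];

// Same replacement as the R gsub call; JavaScript writes the backreference \1 as $1.
const withComma = products.map((p) => p.replace(/(\d{2})$/, ",$1"));

// Alternative: capture the name and the rating separately.
const rows = products.map((p) => {
  const m = p.match(/^(.*?)(\d{2})$/);
  return m ? { name: m[1], rating: Number(m[2]) } : null; // null if no trailing 2 digits
});

console.log(withComma.join(" "));       // ARCH,78 AUSFUNGUY,91 AddiesAndXans,96 AfroBro,79
console.log(rows[0].name, rows[0].rating); // ARCH 78
```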

Make two strings correspond using regex

Is there a way to make one string correspond to another? For example, I have the CIGAR string from a SAM file: 77S22M2S
The corresponding sequence is: CCCCGGGGTGGACTTCTCGGGTGCCAAGGAACTCCAGTCACGCCAATAACTCGTATGCCGTCTTCTGCTTGAAAAAAAAAACAGAACTCCATTAACGCAAA
Is there a way to extract only the letters that correspond to 22M? That is, I do not want the first 77 letters of the sequence (77S), I want to keep and print the next 22 letters (22M), and I do not want the last 2 letters (2S).
I think you mean something like this:
^[CGTA]{77}([CGTA]{22})[CGTA]{2}
Replace the whole match with \1, i.e. the 22 captured letters.
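The regex above is tied to this particular CIGAR string. A more general JavaScript sketch parses the CIGAR operations and slices the sequence; `extractMatched` is a hypothetical helper. It follows the SAM convention that M, I, S, = and X consume read bases while D, N, H and P do not, and it keeps only the M segments. The answer's fixed regex is included for comparison.

```javascript
// Hypothetical helper: concatenates the read bases covered by M operations.
// Per the SAM spec, M, I, S, = and X consume read bases; D, N, H and P do not.
function extractMatched(cigar, seq) {
  const consumesRead = new Set(["M", "I", "S", "=", "X"]);
  let pos = 0;   // current position in the read sequence
  let kept = ""; // bases from M operations
  for (const [, len, op] of cigar.matchAll(/(\d+)([MIDNSHP=X])/g)) {
    const n = Number(len);
    if (op === "M") kept += seq.slice(pos, pos + n);
    if (consumesRead.has(op)) pos += n;
  }
  return kept;
}

// The answer's fixed regex, with JavaScript's $1 in place of \1.
const fixedRegex = (seq) => seq.replace(/^[CGTA]{77}([CGTA]{22})[CGTA]{2}/, "$1");

const seq = "CCCCGGGGTGGACTTCTCGGGTGCCAAGGAACTCCAGTCACGCCAATAACTCGTATGCCGTCTTCTGCTTGAAAAAAAAAACAGAACTCCATTAACGCAAA";
console.log(extractMatched("77S22M2S", seq));
console.log(fixedRegex(seq));
```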

Using gregexpr to get position in a string

I want to extract the positions of a certain expression in a character string (of length 22588).
This is the pattern I'm looking for:
\n,null,[null,null,12.27,800.54]\n,
\n,null,[null,null,12.58,670.84]\n,
\n,null,[null,null,13.45,750.25]\n,
And so on.
Here is an example:
test = "some other stuff \n,null,[null,null,12.27,800.54]\n, other stuff a lot of characters \n,null,[null,null,12.58,670.84]\n, and again \n,null,[null,null,13.45,750.25]\n,"
Now I want to get the positions of the expressions that have this pattern:
\n,null,[null,null,<decimal number>,<decimal number>]\n,
This is what I tried:
mypattern = "\\\\n,null,\\[\null,null,[:alnum:]\\]\\\\\n,"
gg = gregexpr(mypattern,datalines)
Unfortunately, this does not work. The middle part always contains these coordinates, so I need a wildcard for them, and I also guess R has a problem reading the metacharacters.
Thanks in advance!
You can try this pattern:
"\\\n,null,\\[null,null,\\d+\\.\\d+\\,\\d+\\.\\d+\\]\\\n"
or this one, if the number of digits before and after each "." is always the same:
"\\\n,null,\\[null,null,\\d{2}\\.\\d{2}\\,\\d{3}\\.\\d{2}\\]\\\n"
With your example:
gregexpr("\\\n,null,\\[null,null,\\d+\\.\\d+\\,\\d+\\.\\d+\\]\\\n",test)
gregexpr("\\\n,null,\\[null,null,\\d{2}\\.\\d{2}\\,\\d{3}\\.\\d{2}\\]\\\n",test)
#[[1]]
#[1] 18 84 129
#attr(,"match.length")
#[1] 32 32 32
#attr(,"useBytes")
#[1] TRUE
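For comparison, a JavaScript sketch of the same search with `String.prototype.matchAll`. The escaping is simpler because a regex literal needs no extra string-level backslashes. Note that JavaScript reports 0-based indices, whereas R's gregexpr reports 1-based positions.

```javascript
const text = "some other stuff \n,null,[null,null,12.27,800.54]\n, other stuff a lot of characters \n,null,[null,null,12.58,670.84]\n, and again \n,null,[null,null,13.45,750.25]\n,";

// [ ] and . are escaped because they are regex metacharacters; \n matches a newline.
const pattern = /\n,null,\[null,null,\d+\.\d+,\d+\.\d+\]\n/g;

// matchAll yields every match together with its start index.
const found = [...text.matchAll(pattern)].map((m) => ({
  index: m.index,      // 0-based start position
  length: m[0].length, // length of the matched text
}));
console.log(found);
```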