Is there a way to make a string correspond to another string? For example, I have the CIGAR string from a SAM file as follows: 77S22M2S
The corresponding sequence is : CCCCGGGGTGGACTTCTCGGGTGCCAAGGAACTCCAGTCACGCCAATAACTCGTATGCCGTCTTCTGCTTGAAAAAAAAAACAGAACTCCATTAACGCAAA
Is there a way I can extract only those letters that correspond to 22M? That is, I do not want the first 77 letters in the sequence (77S), I want to keep and print out the next 22 letters (22M), and I do not want the last 2 letters (2S).
I think you mean something like this:
^[CGTA]{77}([CGTA]{22})[CGTA]{2}
With substitution \1.
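The regex above hard-codes the counts 77, 22 and 2. As a more general alternative, the CIGAR string itself can drive the extraction. Here is a minimal Python sketch, assuming that only the operations M, I, S, = and X advance the position in the read sequence; the function name extract_matched is hypothetical and not part of any library. The demo uses a synthetic read rather than the sequence from the question.

```python
import re

# CIGAR operations that consume bases of the read sequence.
CONSUMES_QUERY = set("MIS=X")

def extract_matched(cigar, seq):
    """Return the parts of seq covered by M operations, joined together."""
    pos = 0
    kept = []
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op == "M":
            kept.append(seq[pos:pos + length])
        if op in CONSUMES_QUERY:
            pos += length
    return "".join(kept)

# Synthetic read: 77 soft-clipped bases, 22 matched bases, 2 soft-clipped bases.
demo = "A" * 77 + "C" * 22 + "G" * 2
print(extract_matched("77S22M2S", demo))  # prints 22 C characters
```

Calling extract_matched("77S22M2S", sequence) with the real read would return its 22 aligned letters in the same way.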
I have a small issue, I am trying to get specific characters from a long string using regex but I am having trouble.
Workflow
Prometheus --> Grafana --> Variable (using regex)
I can't use anything other than regular expressions to achieve this result.
I am currently using this expression to grab the long string from some json output:
.*channel_id="(.*?)".*
FROM THIS
{account_id="XXXXXXX-xxxx-xxxx-xxxx-xxxxxxxxxx",account_name="testalpha",channel_id="s0022110430col0901241usa",channel_abbr="s0022109430col}
This returns a string that's ALWAYS 24 characters long:
s0022110430col0901241usa
PROBLEM:
I need to grab the 3-letter codes 'col' and 'usa', as they are the two teams that are playing. Ideally I would be able to pipe the result of the first regex into a second one to get these values. The position is key: the first value will ALWAYS be characters 12-14 and the second value is the last 3 characters. If possible, I would like to output these values in uppercase with the string "vs" in between, to create a string such as:
COL vs USA
or
ARG vs BRA
I am open to any and every suggestion anyone may have
Thank you!
PS - The uppercase thing is 'nice to have' BUT not needed
I'm still learning RegEx, so this is all I could come up with:
For the col (first team):
(?<=(channel_id=".{11}))\w{3}
For the usa (second team):
(?<=(channel_id=".{21}))\w{3}
Can you define the channel_id?
It begins with 's', followed by a run of digits. If everything apart from the two team codes is always digits, you can use this regex:
channel_id=".[0-9]+([a-z]+)[0-9]+([a-z]+)
You will get 2 groups, one with "col" and the other with "usa".
Edit:
Or, if you know that the string always has the same length, you can use something like:
channel_id=".{11}([a-z]+).{7}([a-z]+)
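To check the fixed-position pattern outside Grafana, here is a small Python sketch. It assumes the layout described in the question (11 characters, a 3-letter team code, 7 characters, a 3-letter team code); the function name teams is hypothetical, and the uppercase step is done in Python, not by the regex itself.

```python
import re

# Fixed layout assumed from the question:
# 11 characters, team 1 (3 letters), 7 characters, team 2 (3 letters).
PATTERN = re.compile(r'channel_id=".{11}([a-z]{3}).{7}([a-z]{3})"')

def teams(label_text):
    """Return 'AAA vs BBB' for the channel_id in label_text, or None."""
    m = PATTERN.search(label_text)
    if m is None:
        return None
    return f"{m.group(1).upper()} vs {m.group(2).upper()}"

sample = 'account_name="testalpha",channel_id="s0022110430col0901241usa"'
print(teams(sample))  # COL vs USA
```

Using {3} instead of + for the team codes makes the pattern fail cleanly if the string ever deviates from the 24-character layout.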
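I have some words. How can I write a regular expression that matches only the words containing exactly one digit with all other characters letters, or vice versa (exactly one letter with all other characters digits)?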
YV932X6R
V5R67HD1
5R3XPD61
57342D61
CRHXPDV2
12345678
CDHKPQRV
I tried the following pattern, but it's not quite what I want:
^(?=.*[0-9])(?=.*[a-zA-Z])([a-zA-Z0-9]+)$
Output
YV932X6R
V5R67HD1
5R3XPD61
57342D61
CRHXPDV2
Expected Output
CRHXPDV2
OR
57342D61
If you don't care about the length, then you may use the following patterns for just one number or just one letter:
^[A-Za-z]*[0-9][A-Za-z]*$
^[0-9]*[A-Za-z][0-9]*$
If you also have a length requirement of 8 characters, you could enforce that via a positive lookahead. For example, the pattern for one digit and the rest letters would become:
^(?=.{8}$)[A-Za-z]*[0-9][A-Za-z]*$
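As a quick check, the two length-restricted patterns can be run against the word list from the question. This is a minimal Python sketch; the names ONE_DIGIT and ONE_LETTER are only illustrative.

```python
import re

# Exactly one digit, all other characters letters, 8 characters in total.
ONE_DIGIT = re.compile(r"^(?=.{8}$)[A-Za-z]*[0-9][A-Za-z]*$")
# Exactly one letter, all other characters digits, 8 characters in total.
ONE_LETTER = re.compile(r"^(?=.{8}$)[0-9]*[A-Za-z][0-9]*$")

words = ["YV932X6R", "V5R67HD1", "5R3XPD61", "57342D61",
         "CRHXPDV2", "12345678", "CDHKPQRV"]

one_digit = [w for w in words if ONE_DIGIT.match(w)]
one_letter = [w for w in words if ONE_LETTER.match(w)]
print(one_digit)   # ['CRHXPDV2']
print(one_letter)  # ['57342D61']
```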
If I have a series of strings in Python 3.x that I'm iterating over, how do I check whether each one has the right format: 1 letter followed by 12 digits? I want a boolean result so I can use it in an if statement. Thanks
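In JavaScript syntax:
/^[a-zA-Z]\d{12}$/.test(string)
[a-zA-Z] matches any single letter, upper or lower case
\d{12} matches 12 digits
^ and $ anchor the pattern so that the whole string has to match, not just part of it
and the test() function returns true if a match is found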
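Since the question asks about Python 3, here is a minimal sketch of the same check using the standard re module. re.fullmatch requires the whole string to match, so no explicit anchors are needed; the helper names is_valid and all_valid are hypothetical. [0-9] is used instead of \d because \d also matches non-ASCII digits in Python 3 string patterns.

```python
import re

# One ASCII letter followed by exactly 12 ASCII digits.
CODE = re.compile(r"[A-Za-z][0-9]{12}")

def is_valid(s):
    """True if the whole string is 1 letter followed by 12 digits."""
    return CODE.fullmatch(s) is not None

def all_valid(strings):
    """True if every string in the iterable has the right format."""
    return all(is_valid(s) for s in strings)

if is_valid("A123456789012"):
    print("format ok")
```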
I have a filename like this:
0296005_PH3843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS.
I needed to break down the name into groups which are separated by a underscore. Which I did like this:
(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
So far so good.
Now I need to extract characters from within one of the groups. For example, in group 2 I need the first 3 and 8 digits (keep in mind they could be letters too).
So I tried something like this:
(.*?)_([38]{2})(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
It didn't work, but if I do this:
(.*?)_([PH]{2})(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
It will pull the PH into a group, but not the 38. So I'm lost at this point.
Any help would be great
Try the regex below to match any 3 letters/digits followed by one digit:
(.*?)_([A-Z0-9]{3}[0-9]{1})(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)
Try the regex below to match any 3 letters/digits followed by one letter/digit:
(.*?)_([A-Z0-9]{3}[A-Z0-9]{1})(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)
It will match any 3 letters/digits followed by 1 letter/digit.
If your first two letters are constant, like "PH", then try the one below:
(.*?)_([PH]+[0-9A-Z]{2})(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)
I am assuming that you are trying to match a group 2 that starts with digits. If that is the case, then you have to change the source string to something like:
0296005_383843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS.
It works, check it out at https://regex101.com/r/zem3vt/1
Using [^_]* performs much better in your case than .*? since it doesn't backtrack. So changing your original regex from:
(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
to:
([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
reduces the number of steps from 114 to 42 for your given string.
The best method might be to actually split your string on _ and then test the second element to see if it contains 38. Since you haven't specified a language, I can't show how to do this in your language, but most languages provide a contains or indexOf method that can be used to determine whether or not a substring exists in a string.
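As an illustration of the split approach, here is a short Python sketch. Python is only an assumption here, since the question does not name a language; in Python the substring test is the in operator.

```python
name = "0296005_PH3843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS."

# Split on the underscore and look at the second element.
parts = name.split("_")
second = parts[1]

print(second)          # PH3843C5
print("38" in second)  # True
print(second[2:4])     # 38 (characters 3 and 4 of the second element)
```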
Using regex alone, however, this can be accomplished using the following regular expression.
Ensuring 38 exists in the second part:
([^_]*)_([^_]*38[^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
Capturing the 38 in the second part:
([^_]*)_([^_]*)(38)([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
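To see which groups the capturing pattern produces, here is a minimal Python sketch (the language is an assumption, and the helper name split_second_part is hypothetical). It applies the pattern above unchanged and returns only the three groups that make up the second part of the filename.

```python
import re

PATTERN = re.compile(
    r"([^_]*)_([^_]*)(38)([^_]*)_([^_]*)_([^_]*)_([^_]*)"
    r"_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)"
)

def split_second_part(filename):
    """Return (text before 38, '38', text after 38) for the second part, or None."""
    m = PATTERN.match(filename)
    if m is None:
        return None
    return m.group(2), m.group(3), m.group(4)

name = "0296005_PH3843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS."
print(split_second_part(name))  # expected for this name: ('PH', '38', '43C5')
```

If the second part does not contain 38 at all, the pattern does not match and the helper returns None.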
I quickly found a working multi-line regular expression for my needs, but I am having trouble converting it to work on a single line.
So, consider this input with regex /^[2-9]\d{1}(?:\s){0}/gm applied:
4126-54D429-001,
5149-A42102-002,
9251-Z48910-003
...
However, when I put the input on one line, I only get the first two digits of the first entry in the output:
4126-54D429-001, 5149-A42102-002, 9251-Z48910-003 ...
How can this regexp be written to capture the first two digits of every entry in:
4126-54D429-001, 5149-A42102-002, 9251-Z48910-003 ... ?
This should work.
REGEXP
\b\d{2}(?=\d{2})
INPUT
4126-54D429-001, 5149-A42102-002, 9251-Z48910-003, 7851-Z48910-003
OUTPUT
41
51
92
78
The comma is not essential
This will capture the first two digits of each in groups:
(\d{2})[^,]*
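Both suggested patterns can be tried on the single-line input with a few lines of Python. This is only a sketch for checking the patterns; the language is an assumption, as the question does not name one.

```python
import re

line = "4126-54D429-001, 5149-A42102-002, 9251-Z48910-003, 7851-Z48910-003"

# Lookahead version: two digits at a word boundary, followed by two more digits.
print(re.findall(r"\b\d{2}(?=\d{2})", line))  # ['41', '51', '92', '78']

# Capturing version: findall returns only the captured group of each match.
print(re.findall(r"(\d{2})[^,]*", line))      # ['41', '51', '92', '78']
```

Note that the lookahead version relies on the middle and last segments of each entry not starting with four digits in a row, which holds for the sample input.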