Part of as string from a string using regular expressions - regex

I have a string of 5 characters out of which the first two characters should be in some list and next three should be in some other list.
How could i validate them with regular expressions?
Example:
List for First two characters {VBNET, CSNET, HTML)}
List for next three characters {BEGINNER, EXPERT, MEDIUM}
My Strings are going to be: VBBEG, CSBEG, etc.
My regular expression should find that the input string first two characters could be either VB, CS, HT and the rest should also be like that.

Would the following expression work for you in a more general case (so that you don't have hardcoded values): (^..)(.*$)
- returns the first two letters in the first group, and the remaining letters in the second group.

something like this:
^(VB|CS|HT)(BEG|EXP|MED)$

This recipe works for me:
^(VB|CS|HT)(BEG|EXP|MED)$

I guess (VB|CS|HT)(BEG|EXP|MED) should do it.

If your strings are as well-defined as this, you don't even need regex - simple string slicing would work.
For example, in Python we might say:
mystring = "HTEXP"
prefix = mystring[0:2]
suffix = mystring[2:5]
if (prefix in ['HT','CS','VB']) AND (suffix in ['BEG','MED','EXP']):
pass # valid!
else:
pass # not valid. :(
Don't use regex where elementary string operations will do.

Related

Find group of strings starting and ending by a character using regular expression

I have a string, and I want to extract, using regular expressions, groups of characters that are between the character : and the other character /.
typically, here is a string example I'm getting:
'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh'
and so, I want to retrieved, 45.72643,4.91203 and also hereanotherdata
As they are both between characters : and /.
I tried with this syntax in a easier string where there is only 1 time the pattern,
[tt]=regexp(str,':(\w.*)/','match')
tt = ':45.72643,4.91203/'
but it works only if the pattern happens once. If I use it in string containing multiples times the pattern, I get all the string between the first : and the last /.
How can I mention that the pattern will occur multiple time, and how can I retrieve it?
Use lookaround and a lazy quantifier:
regexp(str, '(?<=:).+?(?=/)', 'match')
Example (Matlab R2016b):
>> str = 'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh';
>> result = regexp(str, '(?<=:).+?(?=/)', 'match')
result =
1×2 cell array
'45.72643,4.91203' 'hereanotherdata'
In most languages this is hard to do with a single regexp. Ultimately you'll only ever get back the one string, and you want to get back multiple strings.
I've never used Matlab, so it may be possible in that language, but based on other languages, this is how I'd approach it...
I can't give you the exact code, but a search indicates that in Matlab there is a function called strsplit, example...
C = strsplit(data,':')
That should will break your original string up into an array of strings, using the ":" as the break point. You can then ignore the first array index (as it contains text before a ":"), loop the rest of the array and regexp to extract everything that comes before a "/".
So for instance...
'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh'
Breaks down into an array with parts...
1 - 'abcd'
2 - '45.72643,4.91203/Rou'
3 - 'hereanotherdata/defgh'
Then Ignore 1, and extract everything before the "/" in 2 and 3.
As John Mawer and Adriaan mentioned, strsplit is a good place to start with. You can use it for both ':' and '/', but then you will not be able to determine where each of them started. If you do it with strsplit twice, you can know where the ':' starts :
A='abcd:45.72643,4.91203/Rou:hereanotherdata/defgh';
B=cellfun(#(x) strsplit(x,'/'),strsplit(A,':'),'uniformoutput',0);
Now B has cells that start with ':', and has two cells in each cell that contain '/' also. You can extract it with checking where B has more than one cell, and take the first of each of them:
C=cellfun(#(x) x{1},B(cellfun('length',B)>1),'uniformoutput',0)
C =
1×2 cell array
'45.72643,4.91203' 'hereanotherdata'
Starting in 16b you can use extractBetween:
>> str = 'abcd:45.72643,4.91203/Rou:hereanotherdata/defgh';
>> result = extractBetween(str,':','/')
result =
2×1 cell array
{'45.72643,4.91203'}
{'hereanotherdata' }
If all your text elements have the same number of delimiters this can be vectorized too.

Match return substring between two substrings using regexp

I have a list of records that are character vectors. Here's an example:
'1mil_0,1_1_1_lb200_ks_drivers_sorted.csv'
'1mil_0_1_lb100_ks_drivers_sorted.csv'
'1mil_1_1_lb2_100_100_ks_drivers_sorted.csv'
'1mil_1_1_lb100_ks_drivers_sorted.csv'
From these names I would like to extract whatever's between the two substrings 1mil_ and _ks_drivers_sorted.csv.
So in this case the output would be:
0,1_1_1_lb200
0_1_lb100
1_1_lb2_100_100
1_1_lb100
I'm using MATLAB so I thought to use regexp to do this, but I can't understand what kind of regular expression would be correct.
Or are there some other ways to do this without using regexp?
Let the data be:
x = {'1mil_0,1_1_1_lb200_ks_drivers_sorted.csv'
'1mil_0_1_lb100_ks_drivers_sorted.csv'
'1mil_1_1_lb2_100_100_ks_drivers_sorted.csv'
'1mil_1_1_lb100_ks_drivers_sorted.csv'};
You can use lookbehind and lookahead to find the two limiting substrings, and match everything in between:
result = cellfun(#(c) regexp(c, '(?<=1mil_).*(?=_ks_drivers_sorted\.csv)', 'match'), x);
Or, since the regular expression only produces one match, the following simpler alternative can be used (thanks #excaza for noticing):
result = regexp(x, '(?<=1mil_).*(?=_ks_drivers_sorted\.csv)', 'match', 'once');
In your example, either of the above gives
result =
4×1 cell array
'0,1_1_1_lb200'
'0_1_lb100'
'1_1_lb2_100_100'
'1_1_lb100'
For me the easy way to do this is just use espace or nothing to replace what you don't need in your string, and the rest is what you need.
If is a list, you can use a loop to do this.
Exemple to replace "1mil_" with "" and "_ks_drivers_sorted.csv" with ""
newChr = strrep(chr,'1mil_','')
newChr = strrep(chr,'_ks_drivers_sorted.csv','')

Regular Expression: Match Hex/Key Strings

I have tried creating a regular expression myself to do this, but honestly my mind is so boggled with it right now that I must ask for help... This may be helpful for people in the future as well.
I have the following input templates:
06-6A-BF-05-AF-84-DF-A4-23-7C-BE-B4-6C-95-D7
JK1T-XTSRV-2HC4D-RP4S7-ZMKRG
I need to pick out strings like these two from an input string. An input string may look like this:
JK1T-XTSRV-2HC4D-RP4S7-ZMKRG
FDGF-A1S0M-5M8XJ-T08WC-BCZSJ
C6-6C-1C-17-B7-EE-BE-EA-E3-7C-EF-23-6C-12-F1
asdf234 ,f C6-324_EE
In this case, the following would be returned:
JK1T-XTSRV-2HC4D-RP4S7-ZMKRG, FDGF-A1S0M-5M8XJ-T08WC-BCZSJ, C6-6C-1C-17-B7-EE-BE-EA-E3-7C-EF-23-6C-12-F1
Thus, the regular expression would need to have the following restrictions to match a string:
15 two character (numbers or letters) pairs separated by -
5 four character (numbers or letters) pairs separated by -
What regular expression will match these?
You should use two regular expressions:
(\w{2}-){14}\w{2}
\w{4}-(\w{5}-){3}\w{5}
The second type is actually one four char and four five char.
Test 1:
http://fiddle.re/h3ve6
Test 2:
http://fiddle.re/3a5e6

How can I match all strings unless it contains a certain string?

So I want to match every string in this list, except the ones that contain the product SKU, which is /s7892632 <---- random string of numbers. I've been trying to do this for quite some time and have been unsuccessful. Any insight would be greatly appreciated.
/account/login?returnurl=/account/forgotpassword
/account/login?returnurl=/account/orders
/account/orders
/account/updateaddress
/account/updateemail
/account/updaterewardscard
/brands/havaianas
/careers
/Category List
/checkout
/checkout/addresses
/checkout/addresses/delivery
/checkout/addresses/deliverymethod
/checkout/affilinetbasket
/checkout/anonymous
/checkout/confirmation
/checkout/express
/checkout/login
/checkout/login?returnurl=/checkout/addresses
/checkout/null
/checkout/payment
/checkout/paypal
/checkout/quickshop/
/checkout/verify
/click-and-collect
/click-and-collect/click-and-collect-overview
/corporate/about-matalan
/corporate/careers
/corporate/cookies
/corporate/history
/customer-services/accessibility
/customer-services/contact
/customer-services/customer-services-home
/customer-services/delivery
/customer-services/faq
/customer-services/fitting-room
/customer-services/here-to-help
/customer-services/size-guides
/delivery
/events/mothers-day
/events/mothers-day/s2516241/tassle-detail-slouch-bag
/events/mothers-day/s2518752/waxed-jacket
/events/mothers-day/s2519237/fabric-buckle-tote-bag
/events/mothers-day/s2521182/heart-print-nightie
/events/mothers-day/s2521184/heart-print-dressing-gown
/events/mothers-day/s2521185/heart-print-pyjama-set
/events/mothers-day/s2521679/structured-tote-bag
/events/mothers-day/s2522143/chiffon-print-dress
/events/mothers-day/s2522347/butterfly-enamel-bowl-32cm-x-8cm
/events/mothers-day/s2526013/animal-print-jersey-blazer
/events/mothers-day/s2527624/croc-tote-bag
/events/mothers-day/s2529731/shift-dress
/events/mothers-day?page=1&size=120&cols=4&sort=&id=/events/mothers-day&priceRange[min]=2&priceRange[max]=59
/events/mothers-day?page=2&size=120&cols=4&sort=&id=/events/mothers-day&priceRange[min]=2&priceRange[max]=59
/events/mothers-day?page=2&size=36&cols=4&sort=&id=/events/mothers-day&priceRange[min]=2&priceRange[max]=59
/events/mothers-day?page=3&size=36&cols=4&sort=&id=/events/mothers-day&priceRange[min]=2&priceRange[max]=59
The following should work:
^(?!.*/s\d{7}/).*
Example: http://regexr.com?343nf
This assumes you have each string as a separate element in a list. If this is actually matching one big string with multiple lines you can use the same regex, but you may need to enable global and multiline options depending on the tool you are using (and make sure dotall/singleline is disabled).
Try this:
boolean noSku = !line.matches(".*/s\\d{5,}.*");
This uses {5,} which allows for any number of digits in the SKU greater than 4 (giving you flexibility with matching). You can change the number to whatever suits.
this matches lines that don't have the code....
^((?!s\d{7}).)*$

Regex: How to match a string that is not only numbers

Is it possible to write a regular expression that matches all strings that does not only contain numbers? If we have these strings:
abc
a4c
4bc
ab4
123
It should match the four first, but not the last one. I have tried fiddling around in RegexBuddy with lookaheads and stuff, but I can't seem to figure it out.
(?!^\d+$)^.+$
This says lookahead for lines that do not contain all digits and match the entire line.
Unless I am missing something, I think the most concise regex is...
/\D/
...or in other words, is there a not-digit in the string?
jjnguy had it correct (if slightly redundant) in an earlier revision.
.*?[^0-9].*
#Chad, your regex,
\b.*[a-zA-Z]+.*\b
should probably allow for non letters (eg, punctuation) even though Svish's examples didn't include one. Svish's primary requirement was: not all be digits.
\b.*[^0-9]+.*\b
Then, you don't need the + in there since all you need is to guarantee 1 non-digit is in there (more might be in there as covered by the .* on the ends).
\b.*[^0-9].*\b
Next, you can do away with the \b on either end since these are unnecessary constraints (invoking reference to alphanum and _).
.*[^0-9].*
Finally, note that this last regex shows that the problem can be solved with just the basics, those basics which have existed for decades (eg, no need for the look-ahead feature). In English, the question was logically equivalent to simply asking that 1 counter-example character be found within a string.
We can test this regex in a browser by copying the following into the location bar, replacing the string "6576576i7567" with whatever you want to test.
javascript:alert(new String("6576576i7567").match(".*[^0-9].*"));
/^\d*[a-z][a-z\d]*$/
Or, case insensitive version:
/^\d*[a-z][a-z\d]*$/i
May be a digit at the beginning, then at least one letter, then letters or digits
Try this:
/^.*\D+.*$/
It returns true if there is any simbol, that is not a number. Works fine with all languages.
Since you said "match", not just validate, the following regex will match correctly
\b.*[a-zA-Z]+.*\b
Passing Tests:
abc
a4c
4bc
ab4
1b1
11b
b11
Failing Tests:
123
if you are trying to match worlds that have at least one letter but they are formed by numbers and letters (or just letters), this is what I have used:
(\d*[a-zA-Z]+\d*)+
If we want to restrict valid characters so that string can be made from a limited set of characters, try this:
(?!^\d+$)^[a-zA-Z0-9_-]{3,}$
or
(?!^\d+$)^[\w-]{3,}$
/\w+/:
Matches any letter, number or underscore. any word character
.*[^0-9]{1,}.*
Works fine for us.
We want to use the used answer, but it's not working within YANG model.
And the one I provided here is easy to understand and it's clear:
start and end could be any chars, but, but there must be at least one NON NUMERICAL characters, which is greatest.
I am using /^[0-9]*$/gm in my JavaScript code to see if string is only numbers. If yes then it should fail otherwise it will return the string.
Below is working code snippet with test cases:
function isValidURL(string) {
var res = string.match(/^[0-9]*$/gm);
if (res == null)
return string;
else
return "fail";
};
var testCase1 = "abc";
console.log(isValidURL(testCase1)); // abc
var testCase2 = "a4c";
console.log(isValidURL(testCase2)); // a4c
var testCase3 = "4bc";
console.log(isValidURL(testCase3)); // 4bc
var testCase4 = "ab4";
console.log(isValidURL(testCase4)); // ab4
var testCase5 = "123"; // fail here
console.log(isValidURL(testCase5));
I had to do something similar in MySQL and the following whilst over simplified seems to have worked for me:
where fieldname regexp ^[a-zA-Z0-9]+$
and fieldname NOT REGEXP ^[0-9]+$
This shows all fields that are alphabetical and alphanumeric but any fields that are just numeric are hidden. This seems to work.
example:
name1 - Displayed
name - Displayed
name2 - Displayed
name3 - Displayed
name4 - Displayed
n4ame - Displayed
324234234 - Not Displayed