MATLAB 2012 regular expression - regex

I have a set of strings that I'd like to parse in MATLAB 2012 that all have the following format:
string-int-int-int-int-string
I'd like to pluck out the third integer (the rest are 'don't cares'), but I haven't used MATLAB in ages and need to refresh on regular expressions. I tried using the regular expression '(.*)-(.*)-(.*)-\d-(.*)' but no dice. I did check out the MATLAB regexp page, but wasn't able to figure out how to apply that information to this case.
Anyone know how I might get the desired result? If so, could you explain what the expression you're using is doing to get that result so that others might be able to apply the answer to their unique situation?
Thanks in advance!

str = 'XyzStr-1-2-1000-56789-ILoveStackExchange.txt';
[tok] = regexp(str, '^.+?-.+?-.+?-(\d+?)-.+?-.+?', 'tokens');
tok{:}
ans =
'1000'
Update
Explanation, upon request.
^ - "Anchor", or match beginning of string.
.+? - Wildcard match, one or more, non-greedy.
- - Literal dash/hyphen.
(\d+?) - Digits match, one or more, non-greedy, captured into a token.
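For readers outside MATLAB, here is a minimal sketch of the same token extraction. It uses Python's `re` module as a stand-in for `regexp(..., 'tokens')` and keeps the pattern unchanged. The function name `third_int` is hypothetical. The lazy `\d+?` still captures all of 1000 because the hyphen that must follow it forces the engine to keep extending the digit run.

```python
import re
from typing import Optional

# The MATLAB pattern, unchanged: three lazy wildcard fields, then the
# captured digits, then two more lazy fields.
PATTERN = re.compile(r"^.+?-.+?-.+?-(\d+?)-.+?-.+?")


def third_int(s: str) -> Optional[str]:
    """Return the captured digits (the third integer), or None if no match."""
    m = PATTERN.search(s)
    return m.group(1) if m else None


print(third_int("XyzStr-1-2-1000-56789-ILoveStackExchange.txt"))  # 1000
```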

^.*?-.*?-.*?-(\d+)-.*?-.*?$
OR
^(?:[^-]*?-){3}(\d+)(?:.*?)$
Group 1 now contains the required data.
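The second pattern can be checked the same way. The following is a minimal Python sketch of `^(?:[^-]*?-){3}(\d+)`, with Python's `re` standing in for whichever engine you use. It drops the trailing `(?:.*?)$`, which only consumes the rest of the string and does not change group 1. Because `[^-]` cannot cross a hyphen, each repetition consumes exactly one field. Group 1 is therefore always the fourth field, which is the third integer, as long as that field starts with digits.

```python
import re

# Exactly three "field-" prefixes (a field cannot contain '-'), then the digits.
THIRD_INT = re.compile(r"^(?:[^-]*?-){3}(\d+)")

samples = [
    "XyzStr-1-2-1000-56789-ILoveStackExchange.txt",
    "abc-10-20-30-40-def",
]
for s in samples:
    m = THIRD_INT.match(s)
    print(m.group(1) if m else None)
# prints 1000, then 30
```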


Regular expression to match question mark except repeated or commented (--)

I would like to build a regular expression in C# that matches a question mark unless it is repeated or commented out.
For example, given the input below
--???
??
asdlfkj --?
asldfjl -?
aslfldkf --?
aslfkvlv --??
?
-?
dklsafdlafjd = ?
I want to match the question marks marked below (between * characters):
--???
??
asdlfkj --?
asldfjl -*?*
aslfldkf --?
aslfkvlv --??
*?*
-*?*
dklsafdlafjd = *?*
I'm developing an SQL binding method that takes two parameters.
The first is the SQL text, for example:
select * from atable where id = ?
The SQL can contain comments, so I want to ignore those.
The second is an array of parameters, bound to the placeholders in order.
Does anyone have good idea for it?
If you can negate this regex it should work for you:
(\?{2,}|(?<=--)\?)
I don't know what language you're working in, but you should be able to filter by line. Apply this regex as a predicate and either negate it or use an exclude function.
I'll leave those implementation details up to you.
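Filtering by line can be sketched as follows. This is a minimal Python sketch, with Python's `re` standing in for the C# the question asks about. It rests on two assumptions: `--` starts a comment that runs to the end of the line, and `?` never appears inside string literals. The helper name `find_placeholders` is hypothetical. On each line it drops everything from the first `--` onward. It then looks for a `?` with no `?` on either side, using a lookbehind and a lookahead.

```python
import re
from typing import List

# A lone '?': no '?' immediately before it and none immediately after it.
LONE_QMARK = re.compile(r"(?<!\?)\?(?!\?)")


def find_placeholders(sql: str) -> List[int]:
    """Return offsets (into sql) of lone '?' characters outside '--' comments."""
    offsets = []
    line_start = 0
    for line in sql.splitlines(keepends=True):
        # Assumption: '--' begins a comment that runs to the end of the line.
        code = line.split("--", 1)[0]
        offsets.extend(line_start + m.start() for m in LONE_QMARK.finditer(code))
        line_start += len(line)
    return offsets


sample = ("--???\n??\nasdlfkj --?\nasldfjl -?\naslfldkf --?\n"
          "aslfkvlv --??\n?\n-?\ndklsafdlafjd = ?\n")
for off in find_placeholders(sample):
    line_no = sample.count("\n", 0, off) + 1
    print(line_no, repr(sample.splitlines()[line_no - 1]))
# prints lines 4, 7, 8 and 9, the ones marked with * in the question
```

When binding, the number of offsets should equal the length of the parameter array. The parameters would then be substituted in offset order.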

regex for repeating values

I am trying to find the correct regex (for use with Java and JavaScript) to validate an array of day-of-week and 24-hour time formats. I figured out the time format but am struggling to come up with the full solution.
The regex needs to validate strings made up of one or more entries of the following form, separated by commas:
{two-character day} HH:MM-HH:MM
Three examples of valid strings would be:
M 5:30-7:00
M 5:30-7:00, T 5:30-7:00, W 18:00-19:30
F 12:00-14:30, Sa 6:45-8:15, Su 6:45-8:15
This should validate the whole list, with each time in 24-hour format:
/^((M|T|W|Th|Fr?|Sa|Su) ([01]?[0-9]|2[0-3]):[0-5][0-9]-([01]?[0-9]|2[0-3]):[0-5][0-9](, )?)+$/
Credit for the time bit goes to mkyong: http://www.mkyong.com/regular-expressions/how-to-validate-time-in-24-hours-format-with-regular-expression/
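You can try this (it matches a single entry, and because it is not anchored it will also find entries inside a longer list):
[A-Za-z]{1,2}[ ]\d+:\d+-\d+:\d+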
You could try this: ([MTWFS][ouehra]?) ([01]?[0-9]|2[0-3]):([0-5][0-9])-([01]?[0-9]|2[0-3]):([0-5][0-9])
I'd go with this:
^((M|T(u|h)?|W|F|S(a|u)) ([01]?\d|2[0-3]):[0-5]\d-([01]?\d|2[0-3]):[0-5]\d(, )?)+$
This should do the trick:
^(M|Tu|W|Th|F|Sa|Su) \d{1,2}:\d{2}-\d{1,2}:\d{2}(, (M|Tu|W|Th|F|Sa|Su) \d{1,2}:\d{2}-\d{1,2}:\d{2})*$
Note that your example uses T, which is ambiguous. You might want to require Tu or Th, as my regex does.
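One way to keep the `entry(, entry)*` shape readable is to build it from a single entry pattern. The minimal Python sketch below uses Python's `re` as a stand-in for Java/JavaScript; the syntax used here means the same in all three. It combines this answer's structure and Tu/Th day list with the hour and minute ranges from the first answer. The names `DAY`, `TIME`, `ENTRY` and `SCHEDULE` are illustrative. As recommended above, it rejects a bare `T`, so the question's second example fails until `T` becomes `Tu` or `Th`.

```python
import re

DAY = r"(?:M|Tu|W|Th|F|Sa|Su)"
TIME = r"(?:[01]?[0-9]|2[0-3]):[0-5][0-9]"  # 0:00 through 23:59
ENTRY = DAY + " " + TIME + "-" + TIME
# One entry, then zero or more ", entry": a trailing comma is rejected.
SCHEDULE = re.compile("^" + ENTRY + "(?:, " + ENTRY + ")*$")

tests = [
    "M 5:30-7:00",
    "M 5:30-7:00, Tu 5:30-7:00, W 18:00-19:30",
    "F 12:00-14:30, Sa 6:45-8:15, Su 6:45-8:15",
    "M 5:30-7:00,",   # trailing comma
    "M 24:00-25:00",  # hours out of range
    "T 5:30-7:00",    # ambiguous T
]
for t in tests:
    print(bool(SCHEDULE.match(t)), t)
```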
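This validates the whole line. The T in the short day-of-week list is debatable (Tuesday or Thursday?).
^((?:[MTWFS]|Tu|Th|Sa|Su)\s(?:[0-9]{1,2}:[0-9]{2})-(?:[0-9]{1,2}:[0-9]{2})(?:,\s)?)+$
The (?:) are non-capturing groups, so the only capturing group is the outer one, and each repetition of it covers one set (plus its trailing comma and space), for example:
M 5:30-7:00
T 5:30-7:00
W 18:00-19:30
Java and JavaScript keep only the last repetition of a repeated group (here W 18:00-19:30), so to get every set into an array, run a separate global search for a single entry once the line has validated.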
Added ^ and $ for line boundaries and an explicit time-time match because some regular expression parsers may not work with the previous way that I had it.
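Collecting every set is easiest in two steps. First validate the whole line, then iterate over the entries with a global search. The following is a minimal Python sketch in which `re.finditer` plays the role of Java's repeated `Matcher.find()` or JavaScript's `matchAll` with the `g` flag. The entry pattern reuses this answer's day list and loose `[0-9]{1,2}:[0-9]{2}` times, and the names `ENTRY` and `LINE` are illustrative.

```python
import re

ENTRY = r"(?:[MTWFS]|Tu|Th|Sa|Su)\s[0-9]{1,2}:[0-9]{2}-[0-9]{1,2}:[0-9]{2}"
LINE = re.compile(r"^(" + ENTRY + r"(?:,\s)?)+$")

line = "M 5:30-7:00, T 5:30-7:00, W 18:00-19:30"

m = LINE.match(line)
# The repeated group 1 holds only its final repetition.
print(m.group(1))  # W 18:00-19:30

# A global search for one entry returns every set.
entries = [e.group(0) for e in re.finditer(ENTRY, line)]
print(entries)  # ['M 5:30-7:00', 'T 5:30-7:00', 'W 18:00-19:30']
```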

regular expression match domain

I need a regular expression that matches the domain in each of the following cases:
http://www.cnn.com/fred = www.cnn.com
cnn.com = cnn.com
www.cnn.com:8080 = www.cnn.com
I have the following regular expression (using pcre):
([^/]+://)?([^:/]+)
The above works fine for cases 2 and 3. However, for case 1 the match still includes the http:// prefix. Is there a regular expression option I can use to skip the http part?
Many thanks in advance.
This one should suit your needs:
^(?:(?:f|ht)tps?://)?([^/:]+)
The first group will contain what you're looking for.
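Here is a quick check of this pattern against the three cases from the question. It is a minimal Python sketch; Python's `re` treats this pattern the same way PCRE does. The scheme sits in a non-capturing group, so only the host lands in group 1. The helper name `host_of` is hypothetical.

```python
import re

HOST = re.compile(r"^(?:(?:f|ht)tps?://)?([^/:]+)")


def host_of(url: str) -> str:
    """Return group 1 (the host), or an empty string if nothing matches."""
    m = HOST.match(url)
    return m.group(1) if m else ""


for url in ["http://www.cnn.com/fred", "cnn.com", "www.cnn.com:8080"]:
    print(url, "->", host_of(url))
```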
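This looks like the closest I could get to what I want. It's not perfect, but it seems to get the job done (note that www? requires at least "ww" in the input, so it does not match case 2, cnn.com):
www?([^/:]+)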

Why is it selecting this file?

I have the following statement:
Directory.GetFiles(filePath, "A*.pdf")
.Where(file => Regex.IsMatch(Path.GetFileName(file), "[Aa][i-lI-L].*"))
.Skip((pageNum - 1) * pageSize)
.Take(pageSize)
.Select(path => new FileInfo(path))
.ToArray()
My problem is that the above statement also finds the file "Adali.pdf", which it should not, but I cannot figure out why.
The above statement should only select files starting with a, and where the second letter is in the range i-l.
Because it matches Adali by taking the 3rd and 4th characters (al):
Adali
  --
Try using ^ in your regex, which anchors the match to the start of the string (regex cheatsheet):
Regex.IsMatch(..., "^[Aa][i-lI-L].*")
Also, I doubt you need the .* at all.
PS: As a side note, this question is not very well written. You should try debugging this code yourself, and in particular check your regex against your cases without LINQ. I'm sure the problem has nothing to do with LINQ (the tag on your question); it is about regular expressions (which you didn't mention in the tags at all).
You are not anchoring the string, so the regex can match the al in Adali.pdf.
Change the regex to ^[Aa][i-lI-L].* or, if you only need to test for a match, just ^[Aa][i-lI-L].
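You can reproduce the effect of anchoring outside .NET. In this minimal Python sketch, `re.search` stands in for the C# call: like `Regex.IsMatch`, it can match anywhere in the input. The file names other than Adali.pdf are made up for illustration.

```python
import re

names = ["Adali.pdf", "Ajax.pdf", "alpha.pdf", "Abc.pdf"]

unanchored = re.compile(r"[Aa][i-lI-L].*")  # can match mid-name, e.g. "al" in Adali
anchored = re.compile(r"^[Aa][i-lI-L]")     # must match the first two characters

for name in names:
    print(name, bool(unanchored.search(name)), bool(anchored.search(name)))
```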
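You should do this (anchored, with the dot before pdf escaped):
var f = Directory.GetFiles(tb_Path.Text, "A*.pdf").Where(file => Regex.IsMatch(Path.GetFileName(file), @"^[Aa][i-lI-L].*\.pdf$")).ToArray();
Without the ^ anchor, Regex.IsMatch accepts a match anywhere in the name, which is how Adali gets through.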

Regular expression quantifier questions

I'm trying to find a regular expression that matches this kind of URL:
http://sub.domain.com/selector/F/13/K/100546/sampletext/654654/K/sampletext_sampletext.html
but does not match this one:
http://sub.domain.com/selector/F/13/K/10546/sampletext/5987/K/sample/K/101/sample_text.html
In other words, it should match only if /K/ appears at least once and at most twice (something like the quantifier {1,2}).
So far I have the following regexp:
http://sub\.domain\.com/selector/F/[0-9]{1,2}/[a-z0-9_-]+/
Now I need help adding a condition like:
Match this if in the text appears the /K/ from 1 to 2 times at most.
Thanks in advance.
Best Regards.
Josema
Do you need to do this all in one line?
The approach I would take is to do a regex for /K/ and then count the number of matches I got.
I think Boost is a C++ library, right? In C# I would do it like this:
using System.Text.RegularExpressions;

string url = "http://sub.domain.com/selector/F/13/K/100546/sampletext/654654/K/sampletext_sampletext.html";
int kCount = Regex.Matches(url, "/K/").Count;
if (kCount >= 1 && kCount <= 2)
{
    // good url found
}
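The counting approach ports directly. The following is a minimal Python sketch that checks both bounds from the question, with Python standing in for C#; `k_count_ok` is a hypothetical name. `re.findall` counts non-overlapping occurrences, as `Regex.Matches` does, so a path like /K/K/ counts once.

```python
import re


def k_count_ok(url: str) -> bool:
    """True when '/K/' occurs at least once and at most twice."""
    n = len(re.findall(r"/K/", url))
    return 1 <= n <= 2


good = "http://sub.domain.com/selector/F/13/K/100546/sampletext/654654/K/sampletext_sampletext.html"
bad = "http://sub.domain.com/selector/F/13/K/10546/sampletext/5987/K/sample/K/101/sample_text.html"
print(k_count_ok(good), k_count_ok(bad))  # True False
```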
UPDATE
This regex matches everything up to the first two K's and then allows only the filename.html after that:
^http://sub\.domain\.com/selector/F/[\d]+/[a-zA-Z]+/[\d]+/[a-zA-Z]+/[\d]+/K/[a-zA-Z_]+\.html$
This RE will match anything after /F/[0-9]{1,2} that contains one or two /K/ segments, so it would also match a URL such as http://sub.domain.com/selector/F/13/K/100546/stuff/21515/stuff/sampletext/654654/K/stuff/sampletext_sampletext.html. Here it is:
^http://sub\.domain\.com/selector/F/[0-9]{1,2}(?:/K(?=/)(?:(?!/K/)/[a-z0-9_.-]+)*){1,2}$
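The following is a minimal Python sketch for checking the single-expression version, with the pattern copied unchanged (Python supports the same lookahead syntax). Each outer repetition consumes one `/K` segment plus the segments after it. The `(?!/K/)` lookahead stops a repetition from swallowing the next `/K/`, so `{1,2}` caps the count.

```python
import re

ONE_OR_TWO_K = re.compile(
    r"^http://sub\.domain\.com/selector/F/[0-9]{1,2}"
    r"(?:/K(?=/)(?:(?!/K/)/[a-z0-9_.-]+)*){1,2}$"
)

urls = {
    "good": "http://sub.domain.com/selector/F/13/K/100546/sampletext/654654/K/sampletext_sampletext.html",
    "three_k": "http://sub.domain.com/selector/F/13/K/10546/sampletext/5987/K/sample/K/101/sample_text.html",
    "extra_segments": "http://sub.domain.com/selector/F/13/K/100546/stuff/21515/stuff/sampletext/654654/K/stuff/sampletext_sampletext.html",
}
for name, url in urls.items():
    print(name, bool(ONE_OR_TWO_K.match(url)))
```

The pattern has two restrictions. The first /K must come directly after /F/<number>, and the other segments must be lowercase, since the character class is case-sensitive.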