In the below example I want to match one always and also match two if two follows one.
one two three four
one three four five
Is this possible with regex?
For my real-life problem I need to match DOWN (Partial), but (Partial) is not always in the string.
Match one, then two optionally:
one( two)?
See live demo.
For your real life example:
DOWN( \(Partial\))?
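As a quick check, here is a minimal sketch using Python's re module. The helper name match_down and the sample status strings are made up for illustration:

```python
import re

# "DOWN" is required; the group " (Partial)" is optional because of the trailing "?".
PATTERN = re.compile(r"DOWN( \(Partial\))?")

def match_down(text):
    """Return the matched text, or None if "DOWN" does not occur."""
    m = PATTERN.search(text)
    return m.group(0) if m else None

print(match_down("status: DOWN (Partial) since 10:00"))
print(match_down("status: DOWN since 10:00"))
```

When the optional part is absent, group 1 is None, which is an easy way to tell the two cases apart.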
Related
For example, I have a string 111352_01_2_SAMPLE_TEXT_SAMPLE. I need to match first, second, third number and remaining text.
Currently I have this:
First number: ^[^_]+(?=_) (Everything until 1. underscore)
Second number: (?<=_)[^_]*(?=_) (Everything between 1. and 2. underscore)
Remaining text: (?:.*?_){3}(.*)\s* (Text after third occurrence of underscore)
Is there a more "readable" way of building the expression, since the logic for the first three matches is quite similar?
And what's the best way of writing expression for matching everything
Since you tagged regex-group, I think a more straightforward way of retrieving these three substrings could be:
^(.*?)_(.*?)_.*?_(.*)$
See the demo
Maybe you are looking for a single regular expression that is applicable to whichever element of the string you want. In that case you could use:
^(?:.*?_){0}([^\n_]+)
This is a zero-indexed way of retrieving elements delimited by an underscore: change the zero to 1, 2, 3, etc. to select a later element. However, I do not see the benefit over a regular split() function.
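To compare the two approaches, here is a small Python sketch. The function name nth_field is hypothetical; the index is substituted into the quantifier of the pattern above, and the result is printed next to what split() gives:

```python
import re

def nth_field(text, n):
    """Return the n-th (zero-based) underscore-delimited field using the indexed regex."""
    # {{{n}}} renders as {n}: skip n "something_" prefixes, then capture the next field.
    m = re.match(rf"^(?:.*?_){{{n}}}([^\n_]+)", text)
    return m.group(1) if m else None

s = "111352_01_2_SAMPLE_TEXT_SAMPLE"
for i in range(4):
    # Regex result next to the plain split() result for the same index.
    print(i, nth_field(s, i), s.split("_")[i])
```

For this input both columns agree, which supports the point that split() is the simpler tool here.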
Just use
^(\d+)_(\d+)_(\d+)_(.+)
See a demo on regex101.com.
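A minimal Python sketch of the same pattern, assuming the three leading fields are always digits (the helper name split_record is only illustrative):

```python
import re

# Three numeric fields separated by underscores, then everything that remains.
PATTERN = re.compile(r"^(\d+)_(\d+)_(\d+)_(.+)")

def split_record(text):
    """Return (first, second, third, rest) or None if the text does not fit."""
    m = PATTERN.match(text)
    return m.groups() if m else None

print(split_record("111352_01_2_SAMPLE_TEXT_SAMPLE"))
```

The last group keeps its underscores, so the remaining text comes back as a single piece.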
I'm trying to create a regex expression that will create a match if a string has at least 2 words out of N. For example, take the words ('one', 'two', 'three', 'four'). This regex should return a match for all these cases:
one two three four
twothreeone
two plus two is four
It should not return a match for:
one
three plus three is three
I have tried something like this: '/^(?=.*one)(?=.*two)(?=.*three)(?=.*four).+/', but this will only match if all of the words ('one', 'two', 'three', 'four') are contained in the string.
Apologies for stealing someone's comment, but it does appear to work!
In Perl/PCRE you can use a reference to a subpattern in a capture group with (?n) where n is the number of the capture group. So: (one|two|three|four).*(?!\1)(?1). In the worst case, you don't have to type everything twice when you know the shortcuts ctrl+c and ctrl+v – Casimir et Hippolyte 4 hours ago
% pcretest
PCRE version 8.35 2014-04-04
re> #(one|two|three|four).*(?!\1)(?1)#
data> one one one
No match
data> one two one
0: one two
1: one
data> one four
0: one four
1: one
data> four four
No match
data> ^D
%
Indeed, in PCRE, a popular library used by nginx (it is the only dependency of the whole nginx port in OpenBSD ports!) and lots of other software, you can use something like (?1) (or (?-1)) to refer to a previous subpattern, so you don't have to copy-paste it several times. The negative lookahead is just standard fare.
Here's the docs on the features at stake — you may want to look into the pcrepattern and pcresyntax manual pages, sections as below:
http://www.pcre.org/original/doc/html/pcresyntax.html#SEC19
http://mdoc.su/f/pcresyntax.3#LOOKAHEAD_AND_LOOKBEHIND_ASSERTIONS
(?!...) negative look ahead
http://www.pcre.org/original/doc/html/pcresyntax.html#SEC21
http://mdoc.su/f/pcresyntax.3#SUBROUTINE_REFERENCES_(POSSIBLY%09RECURSIVE)
(?n) call subpattern by absolute number
http://www.pcre.org/original/doc/html/pcrepattern.html#SEC24
http://mdoc.su/f/pcrepattern.3#SUBPATTERNS_AS_SUBROUTINES
etc.
In general, the http://www.pcre.org/original/pcre.txt and http://www.pcre.org/pcre2.txt pages include complete documentation, and are helpful in searching up that syntax you've seen somewhere.
(one|two|three|four).*(?!\1)(?-1)
Explanation:
Capture one of the words in group 1
Match any number of characters
Assert that what follows is not the same word that group 1 captured
Match one of the words again by re-running the pattern of the previous group ((?-1), a relative subpattern call)
This means that when you change the word list, you only have to edit one capture group, assuming you're using PCRE (with, say, PHP).
Check out the demo
Search for two copies of a target word, but capture the first and apply a negative lookahead on the second word using a back reference to the first group to assert that a different word appeared in the second group - making (at least) 2 in total.
(one|two|three|four).*(?!\1)(one|two|three|four)
See live demo.
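Python's built-in re module has no (?1) subroutine call, so one way to avoid typing the word list twice there is to build the pattern from a list. This is only a sketch; the function names build_pattern and has_two_words are made up for illustration:

```python
import re

def build_pattern(words):
    """Build '(w1|w2|...).*(?!\\1)(?:w1|w2|...)' from a word list."""
    alt = "|".join(re.escape(w) for w in words)
    # Group 1 captures the first word; the lookahead rejects the same word again.
    return re.compile(rf"({alt}).*(?!\1)(?:{alt})")

PATTERN = build_pattern(["one", "two", "three", "four"])

def has_two_words(text):
    """True if at least two different words from the list occur in text."""
    return PATTERN.search(text) is not None

for s in ["one two three four", "twothreeone", "two plus two is four",
          "one", "three plus three is three"]:
    print(repr(s), has_two_words(s))
```

The loop runs the five sample strings from the question through the pattern.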
For example, let's take the sequence
"aaaaaa".
I want regex to match all subsequences, including repeating characters. Meaning the total count of subsequences should be 5, instead of 3.
Clarification:
Let's number our characters. Our sequence will look something like
"a1a2a3a4a5a6"
All subsequences are:
"a1a2", "a2a3", "a3a4", "a4a5", "a5a6"
Can I do that in regex? I am currently programming in Java and I know it is possible to develop an algorithm there, but I would like to avoid that for now.
You can use the following regex:
(?=((a)\2))
See demo
The technique of capturing the overlapping substrings inside a positive lookahead is described here.
The difference is that you need to use 2 capturing groups: one is a "functional", technical, inner group to make sure we match two identical consecutive symbols, and the outer group (ID#1) that we can use to extract the values we need.
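The question is about Java, but the behaviour is easy to check with a short Python sketch (the function name overlapping_pairs is hypothetical). Because the whole pattern is a zero-width lookahead, the scan advances one character at a time and the pair is read from group 1; the same idea should carry over to java.util.regex by calling find() in a loop and reading group(1):

```python
import re

def overlapping_pairs(text):
    """Return (start, pair) for every overlapping pair of consecutive 'a's."""
    # The match itself is empty; the pair lives in capture group 1.
    return [(m.start(), m.group(1)) for m in re.finditer(r"(?=((a)\2))", text)]

pairs = overlapping_pairs("aaaaaa")
print(len(pairs))
print(pairs)
```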
I have 2 strings which are 2 records
string1 = "abc/BS-QANTAS\\/DS-12JUL15\\dfd"
string2 = "/DS-10JUN15\\/BS-AIRFRANCE\\dfdsfsdf"
BS is booking airline
DS is Date
I want to use a single regex and extract the booking source & date. Please let me know if it is feasible.
I have tried lookaheads and still couldn't achieve it.
The target language is Splunk and not Javascript.
Whatever the language may be, please post; I'll give it a try in Splunk.
You mentioned that you've tried lookahead, what about lookbehind?
(?<=BS-|DS-)(\w+)
Tested at Regex101
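A short Python sketch of the lookbehind version, using the two sample records from the question (values_after_prefix is an illustrative name). Python accepts this lookbehind because both alternatives have the same width:

```python
import re

string1 = "abc/BS-QANTAS\\/DS-12JUL15\\dfd"
string2 = "/DS-10JUN15\\/BS-AIRFRANCE\\dfdsfsdf"

def values_after_prefix(text):
    """Return every run of word characters that directly follows 'BS-' or 'DS-'."""
    return re.findall(r"(?<=BS-|DS-)(\w+)", text)

print(values_after_prefix(string1))
print(values_after_prefix(string2))
```

The values come back in the order they appear in the record, so this form alone does not say which one is the airline and which one is the date.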
Here's a more scalable (and more readable, IMO) alternative to miroxlav's answer:
(?:\/BS-(?P<source>\w+)|\/DS-(?P<date>\w+)|[^\/\v]+)+
I'm assuming the fields you're interested in always start with a slash. That allows me to use [^/]+ to safely consume the junk between/around them.
demo
This is effectively three regexes in one, wrapped in a group to give each one a chance to match in turn, and applied multiple times. If the first alternative matches, you're looking at a "source airline" field, and the name is captured in the group named "source". If the second alternative matches, you're looking at the date, which is captured in the "date" group.
But, because the fields aren't in a predetermined order, the regex has to match the whole string to be sure of matching both fields (in fact, I should have used start and end anchors--^ and $--to enforce that; I've added them below). The third alternative, [^/]+, allows it to consume the parts that the first two can't, thus making an overall match possible. Here's the updated regex:
^(?:\/BS-(?P<source>\w+)|\/DS-(?P<date>\w+)|[^\/\v]+)+$
...and the updated demo. As noted in the comment, the \v is there only because I'm combining your two examples into one multiline string and doing two matches. You shouldn't need it in real life.
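For a check outside regex101, here is a sketch in Python, whose re module uses the same (?P<name>...) syntax. It applies the anchored pattern to one record at a time, so the \v is left out; the function name extract is hypothetical:

```python
import re

# Anchored pattern from above without \v, since each record is matched on its own.
PATTERN = re.compile(r"^(?:/BS-(?P<source>\w+)|/DS-(?P<date>\w+)|[^/]+)+$")

def extract(record):
    """Return (source, date) for one record, or None if it does not match."""
    m = PATTERN.match(record)
    return (m.group("source"), m.group("date")) if m else None

print(extract("abc/BS-QANTAS\\/DS-12JUL15\\dfd"))
print(extract("/DS-10JUN15\\/BS-AIRFRANCE\\dfdsfsdf"))
```

Both records give the airline first and the date second, regardless of the order of the fields in the input.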
This gives you both strings filled either in match groups airline1+date1 or in airline2+date2:
((BS-(?<airline1>\w+).*DS-(?<date1>[\w]+))|(DS-(?<date2>[\w]+).*BS-(?<airline2>\w+)))
>> view at regex101.com
Since there are only 2 groups, I used simple permutation.
This regex will take the last of the occurrences if there is more than one. If you need the earliest one (using lookbehind), let me know.
I am trying to write a regular expression that matches 3 or more different vowels in a row.
I understand how to write a regular expression that finds 3 identical vowels.
/([aeuioy])\\1{2,}/
But, about 3 different...
any thoughts..
Please help me to solve this problem!
Actually no thoughts in my head.
Look for 3 consecutive vowels. Capture the first in a group. After the first, check that the next character is not #1 again with a negative lookahead. Passing that test, capture the next character. Then use two negative lookaheads, one to check that the following character is not #1 and the other that it is not #2.
The latter step can be OR'ed into a single lookahead.
(?=[aeouiy]{3})(.)(?!\1)(.)(?!\1|\2).
You don't need any test for the last character. The first Lookahead ensures it's one of aeouiy; the third, negative, lookahead ensures it's not character #1 or character #2.
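A small Python sketch of this pattern (the helper name first_vowel_triple and the sample words are chosen for illustration):

```python
import re

# Lookahead: three vowels in a row. Then each character must differ from the earlier ones.
PATTERN = re.compile(r"(?=[aeouiy]{3})(.)(?!\1)(.)(?!\1|\2).")

def first_vowel_triple(word):
    """Return the first run of three different consecutive vowels, or None."""
    m = PATTERN.search(word)
    return m.group(0) if m else None

for w in ["beautiful", "queue", "book"]:
    print(w, first_vowel_triple(w))
```

"queue" is a useful negative case: it has four vowels in a row, but every window of three repeats a letter.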
Not that it's necessarily the most practical option, but this is the only one so far that is an "actual" regular expression:
(iea|oea|uea|yea|eia|oia|uia|yia|eoa|ioa|uoa|yoa|eua|iua|oua|yua|eya|iya|oya|uya|iae|oae|uae|yae|aie|oie|uie|yie|aoe|ioe|uoe|yoe|aue|iue|oue|yue|aye|iye|oye|uye|eai|oai|uai|yai|aei|oei|uei|yei|aoi|eoi|uoi|yoi|aui|eui|oui|yui|ayi|eyi|oyi|uyi|eao|iao|uao|yao|aeo|ieo|ueo|yeo|aio|eio|uio|yio|auo|euo|iuo|yuo|ayo|eyo|iyo|uyo|eau|iau|oau|yau|aeu|ieu|oeu|yeu|aiu|eiu|oiu|yiu|aou|eou|iou|you|ayu|eyu|iyu|oyu|eay|iay|oay|uay|aey|iey|oey|uey|aiy|eiy|oiy|uiy|aoy|eoy|ioy|uoy|auy|euy|iuy|ouy)
You can use this lookahead based regex:
^(?:[^aeiou]*([aeiou])(?!.*?\1)){3}
RegEx Demo
Update: In case OP is looking for at least three consecutive different vowels in each line then use this simpler version of above regex:
(?:([aeiou])(?!.{1,2}\1)){3}
New RegEx Demo
What you want is any permutation of three different vowels, so you can use this:
(?=[aeouiy]{3}$)(?!.*(.).*\1).*$
(?!.*(.).*\1) ensures that no character is repeated.