regex - multiple occurrences within match


Is it possible to find multiple matching groups within the full match using ONLY regex?
Given the text below
{1234} Lorem ipsum dolor sit amet, consectetur adipiscing elit. ** Sed
iaculis nisi et dapibus consectetur. Vestibulum ** feugiat sapien, sed
sagittis magna. Phasellus euismod tempor augue, ** eget dictum mi
sagittis sit amet. Quisque sit amet diam vel magna imperdiet pulvinar
vel ac lectus. {4321} Lorem ipsum....
I'm trying to capture every occurrence of ** between the two numbers in braces.
I came up with the following:
\{\d+\}.+?(\*\*)+.+\{\d.+\}
https://regex101.com/r/s746be/2
As you can see, it only captures the first ** because of the lazy quantifier (the question mark), or the last one if I remove the question mark.

Why not break it down into a few simple steps instead of using one big, inefficient regex?
You can try doing something like this:
1) Grab the text between {1234} and {4321} using a simple regex like this (if the text spans several lines, enable the dot-all flag so that . also matches newlines):
/\{\d+\}(.*?)\{\d+\}/
2) Extract the matched text between these two delimiters
3) Run a second global regex search on this matched inner text using a simple regex pattern like so:
/\*\*/g
Hope this helps
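The three steps above can be sketched in Python roughly as follows. This is only an illustration: the sample text is a shortened version of the one in the question, and the variable names are made up. re.DOTALL is used because the text between the two delimiters spans several lines.

```python
import re

# Shortened sample text from the question (assumption: three ** markers).
text = (
    "{1234} Lorem ipsum dolor sit amet. ** Sed\n"
    "iaculis nisi et dapibus. Vestibulum ** feugiat sapien, sed\n"
    "sagittis magna. Phasellus euismod, ** eget dictum mi\n"
    "vel ac lectus. {4321} Lorem ipsum...."
)

# Steps 1 and 2: capture the text between the two {digits} delimiters.
# re.DOTALL lets "." match newlines so the lazy group can span lines.
outer = re.search(r"\{\d+\}(.*?)\{\d+\}", text, re.DOTALL)
inner = outer.group(1) if outer else ""

# Step 3: find every ** inside the captured text.
star_positions = [m.start() for m in re.finditer(r"\*\*", inner)]
star_count = len(star_positions)
print(star_count)
```

Splitting the work this way sidesteps the fact that a repeated capture group only keeps its last iteration.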

Related

Regex that matches multiple new lines until finding pattern

I am not very familiar with regex and I am having trouble creating one that solves my problem.
I want to create a regex that finds the following example: (What the regex should match is in bold)
Action type: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam sodales tincidunt ipsum ut ullamcorper
Phasellus rhoncus quam id eros volutpat, ac sodales magna tincidunt Phasellus rhoncus quam id eros volutpat, ac sodales magna
Phasellus rhoncus quam id eros volutpat, ac sodales magna tincidunt
Number Name Degree
11111111 LOREM IPSUM COMPUTER ENGINEERING
31837183 DOLOR IPSUM COMPUTER ENGINEERING
Total: 2
Action type: Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Number Name Degree
128172818211 SIT AMET IPSUM COMPUTER ENGINEERING
12183781 CONSECTETUR ELIT COMPUTER ENGINEERING
128172818212 ETIAM SODALES COMPUTER ENGINEERING
128172818213 IPSUM UT COMPUTER ENGINEERING
128172818215 SODALES MAGNA COMPUTER ENGINEERING
Total: 5
So far I have a regex that successfully matches the data lines and the first line of the action type, but not the lines that follow it. I would like to match everything that comes after Action type: up to the line that contains Number, Name and Degree.
The regex I am currently using is (Action type: .+?\n|[0-9]{8,12} .+?\n). A preview of the current execution on regex101.com is attached.
As you can see, it works well for the second example, but not for the first one.
Is it possible to adapt my current regex to handle these multi-line blocks?

Try:
^Action type:.*?(?=^Number Name Degree)|^\d{8,12}[^\n]+
Regex demo.
^Action type:.*?(?=^Number Name Degree) - this matches all text beginning with Action type: until ^Number Name Degree is found.
^\d{8,12}[^\n]+ - this matches all lines beginning with 8-12 digits.
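Note: the expression needs the (?s) modifier so that . also matches newlines, and multiline mode (m) so that ^ matches at the start of each line.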
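For illustration, here is a rough Python sketch of the same expression. The sample text is a shortened version of the one in the question, and re.S and re.M stand in for the (?s) and multiline modifiers; treat it as a sketch rather than a drop-in solution.

```python
import re

# Shortened sample based on the first block in the question.
text = (
    "Action type: Lorem ipsum dolor sit amet\n"
    "Phasellus rhoncus quam id eros volutpat\n"
    "Number Name Degree\n"
    "11111111 LOREM IPSUM COMPUTER ENGINEERING\n"
    "31837183 DOLOR IPSUM COMPUTER ENGINEERING\n"
    "Total: 2\n"
)

# re.S: "." matches newlines; re.M: "^" matches at every line start.
pattern = re.compile(
    r"^Action type:.*?(?=^Number Name Degree)|^\d{8,12}[^\n]+",
    re.S | re.M,
)

matches = pattern.findall(text)
for m in matches:
    print(repr(m))
```

The first match is the whole multi-line Action type block; the remaining matches are the lines that start with 8 to 12 digits.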

Ignore a substring in RegEx pattern

I want to ignore a certain substring inside a match, rather than exclude matches that contain it.
For example
I have the text:
Lorem ipsum dolor sit amet, consectetur adipiscing eliti qwer-
ty egeet qwewerty lectus. Proinera risus massa, placerat in q-
werty sed, tincidunt in nunci auspendisse vel dolor qwerty qw-
erty, molestie nisl sit amet, qwerty ligula curabitur ipsum,
euismod at augue at, dapibus feugiat qweerty
I need to find every qwerty, even if it contains -\n.
My current solution is to add (?:-\n)? after every character:
/q(?:-\n)?w(?:-\n)?e(?:-\n)?r(?:-\n)?t(?:-\n)?y/gm
But it looks bulky (even for this example, which contains only 6 characters), and it is too hard to modify the regex later. Is there some magic to make the regex shorter?

No, regex is not good at this kind of match. The easiest way would be to remove the -\n sequences first and then search for qwerty.
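A minimal Python sketch of that approach, assuming that a hyphen directly followed by a newline always marks a wrapped word (a real hyphen at the end of a line would be joined as well). The sample is shortened from the question.

```python
import re

# Shortened sample: two wrapped "qwerty" words and one near miss.
text = (
    "adipiscing eliti qwer-\n"
    "ty egeet qwewerty lectus. placerat in q-\n"
    "werty sed, tincidunt"
)

# Join words that were split by a hyphen at the end of a line.
joined = text.replace("-\n", "")

# A plain search is now enough.
hits = re.findall(r"qwerty", joined)
print(len(hits))
```

Note that match positions then refer to the joined text, not the original one.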

Remove one iteration from every instance of a pattern with a RegEx?

Let's say I have the following text:
Lorem ipsum dolor sit amet, consectetur aaBaaBaaB adipiscing elit.
aaBaaB
aaB Ut in risus quis elit posuere faucibus sed vitae metus. aaBaaBaaBaaB
Fusce nec tortor in dolor aaBaaBaaB porttitor viverra. aaB
I'm trying to figure out how to perform a regular expression search and replace on this in such a way that the output is:
Lorem ipsum dolor sit amet, consectetur aaBaaB adipiscing elit.
aaB
Ut in risus quis elit posuere faucibus sed vitae metus. aaBaaBaaB
Fusce nec tortor in dolor aaBaaB porttitor viverra.
That is, to remove one "aaB" from each pattern of it. Is this actually possible, and if so, how would it be done? Specifically, I intend to do this in Sublime Text 2 as a RegEx search/replace in a file.

You can use a positive lookahead:
(?=(?<w>[a-z]{2}[A-Z]{1})\s)\k<w>
You just need to make sure you have case-sensitive matching on.
example: http://regex101.com/r/sK8bG1

Use either the leading or the trailing whitespace to remove the first or the last substring. Either of these works:
(\s+)(aaB) with $1 in the Replace field
or
(aaB)(\s+) with $2 in the Replace field
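As a rough illustration of the trailing-whitespace variant outside Sublime Text, here is a Python sketch. Python writes the backreference as \2 instead of $2, and the sample text is a shortened, made-up version of the one in the question.

```python
import re

# Shortened sample: two runs of "aaB", each followed by whitespace.
text = "consectetur aaBaaBaaB adipiscing elit.\naaBaaB\n"

# (aaB)(\s+) only matches the last "aaB" of a run, because only that one
# is followed by whitespace; replacing with group 2 keeps the whitespace.
result = re.sub(r"(aaB)(\s+)", r"\2", text)
print(repr(result))
```

A run at the very end of the text with no whitespace after it would be left untouched by this variant.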

regex: match until the first ] character is found

I'm trying to match until the first occurrence of ] is found, but I can't seem to make it work. I'd appreciate it if someone could help me figure this out.
The string I'm matching against:
[plugin:tabs][tab title="test"]Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam sit amet nisl nisl. Ut interdum libero vitae quam ultricies et lacinia elit aliquet. Praesent tincidunt, sem tempus feugiat feugiat, turpis tellus scelerisque erat, sit amet feugiat neque arcu ac lectus. Sed at mi et elit interdum scelerisque vitae eu felis.[/tab][/plugin]
What it should match:
[plugin:tabs]
What it keeps matching:
[plugin:tabs][tab title="test"]
The regex:
(\[plugin:(?<identifier>[^\s]+)(?<parameters>.*?)\])
EDIT:
What it should also match:
[plugin:tabs test="test"]

You just need to add ? like so (lazy match, will match as few characters as possible):
(\[plugin:(?<identifier>[^\s]+?)(?<parameters>.*?)\])
                              ^
Although the (?<parameters>.*?) part is unnecessary then.
So your final Regex would look like this:
(\[plugin:(?<identifier>[^\s]+?)\])
Edit: See @stema's answer.

Try this here
(\[plugin:(?<identifier>[^\]\s]+)(?<parameters>.*?)\])
See it here on Regexr
In addition to whitespace characters, this also excludes the ] character from the first named group.
If you don't need the first capturing group, you can make it a non-capturing group by adding ?: right after the opening parenthesis.
(?:\[plugin:(?<identifier>[^\]\s]+)(?<parameters>.*?)\])
To keep the space in between from being captured by the second group, just match optional whitespace between the two groups:
(?:\[plugin:(?<identifier>[^\]\s]+)\s*(?<parameters>.*?)\])
See it here on Regexr
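A short Python sketch of the last pattern, for illustration only. Python spells named groups as (?P<name>...), so the group syntax is adapted; the outer non-capturing group is dropped because it changes nothing here. The two inputs are the strings from the question (the first one shortened).

```python
import re

# Same idea as the pattern above, with Python's (?P<name>...) syntax.
pattern = re.compile(
    r"\[plugin:(?P<identifier>[^\]\s]+)\s*(?P<parameters>.*?)\]"
)

first = pattern.search('[plugin:tabs][tab title="test"]Lorem ipsum[/tab][/plugin]')
second = pattern.search('[plugin:tabs test="test"]')

print(first.group(0), first.group("identifier"), repr(first.group("parameters")))
print(second.group(0), second.group("identifier"), second.group("parameters"))
```

Excluding ] from the identifier's character class is what stops the match at the first closing bracket.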

With any language that supports lookbehinds, that will be your easiest solution.
/(^(?<!])*)/

Regexp to find strings enclosed by ** (double-star)

I'm trying to find a way to make an array of matched patterns out of a string.
I'll explain myself with an example.
From a string like
Lorem ipsum dolor **sit** amet, consectetur adipiscing elit.
Nulla elementum euismod mi. Morbi eu eros eget augue vestibulum semper.
Curabitur sapien purus, **semper** in consequat eu, gravida vitae purus.
I need to apply a regexp to extract the words sit and semper
and I really don't know how to manage it.

I would think a regex such as \*{2}(.*?)\*{2} would take care of it, and using regular expressions in Objective-C (assuming you're on an Apple platform) you'd want to look at the NSRegularExpression iOS or Mac documentation.
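The question is about Objective-C, but the pattern itself can be tried out quickly in Python. This is just a sketch to show what the capture group returns; the sample is the string from the question.

```python
import re

text = (
    "Lorem ipsum dolor **sit** amet, consectetur adipiscing elit.\n"
    "Nulla elementum euismod mi. Morbi eu eros eget augue vestibulum semper.\n"
    "Curabitur sapien purus, **semper** in consequat eu, gravida vitae purus."
)

# The lazy group captures the text between each pair of double stars.
words = re.findall(r"\*{2}(.*?)\*{2}", text)
print(words)
```

With a single capture group, findall returns only the captured words, not the surrounding stars.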

You can do it like this:
\*{2}([^*]+)\*{2}
