Groovy regexp linebreak within parentheses


I need to delete a paragraph enclosed in parentheses, like the one below, without touching the rest of the text:
(Text to delete Lorem ipsum dolor sit amet, consectetur linebreak->
in voluptate velit esse cillum. Excepteur sint proident, mollit anim id est laborum.)
Text that shouldn't be touched Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation llamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehend in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat upidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
For now I have /\(.*\)[\n]*/ to match a paragraph, but with the line breaks it obviously doesn't work. I was thinking of something along the lines of /\(.*[\n]*\)[\n]*/, but that didn't work either. Searching here turned up (?<=\()(.*?)(?=\)), but it's for Python, so it won't work, and the other links are about parentheses within parentheses, which is a different problem from mine.
I use \n only to simplify the (\r|\n|\r\n) line-break handling.
So is there a way to do this, or is Groovy's regex support not capable of it?

You could use something like /(?s)\(.+?\)/. The (?s) flag (DOTALL mode) makes the period also match line feeds.
The expression looks for an opening round bracket and stops at the first closing bracket.
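As a hedged sketch, here is the same idea in JavaScript. JavaScript does not accept a leading (?s), so the s (dotAll) flag plays its role; in Groovy the (?s) pattern above goes straight to java.util.regex. The function name removeParenthesized is hypothetical, and the trailing \s* (which also deletes the line break after the closing parenthesis) is an assumption about what "without touching the rest" should mean.

```javascript
// Sketch: delete the first parenthesized block, even if it spans several lines.
// JavaScript has no leading (?s) modifier, so the `s` (dotAll) flag is used
// to let "." match line breaks as well.
function removeParenthesized(text) {
  // \(   a literal opening parenthesis
  // .+?  lazily consume anything, newlines included (because of the s flag)
  // \)   stop at the first closing parenthesis
  // \s*  assumption: also drop the line break(s) right after the block
  return text.replace(/\(.+?\)\s*/s, "");
}

const input =
  "(Text to delete Lorem ipsum dolor sit amet, consectetur\n" +
  "in voluptate velit esse cillum. Excepteur sint proident.)\n" +
  "Text that shouldn't be touched Lorem ipsum dolor sit amet.";

console.log(removeParenthesized(input));
// Text that shouldn't be touched Lorem ipsum dolor sit amet.
```

Because .+? is lazy, a second parenthesized block later in the text is left alone; add the g flag if every such block should go.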

Related

2 regex work correctly independently but they don't when combined

I would like to split up a string based on punctuation while keeping the punctuation, and, if part of the string is wrapped in curly braces, delete the braces but keep the words.
My current regex works ALMOST perfectly: it does not capture a punctuation mark if it is the last character in the string. Thank you for your help.
// const re = /([.!\"'/$:\d]+)/g;  // punctuation split on its own: works
// const re = /{(.*?)}/g           // brace removal on its own: works
const re = /([.!\"'/$:\d]+)| {(.*?)}/g;
// split() keeps captured groups; groups that did not take part come back as undefined
const sentences = sentence.split(re).filter(Boolean);
console.log(sentences);
Input:
const sentence = `Lorem ipsum {dolor sit} amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et {dolore magna aliqua}. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat! Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum?`
Actual Output:
[
'Lorem ipsum',
'dolor sit',
' amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et',
'dolore magna aliqua',
'.',
'Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat',
'!',
'Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur',
'.',
'Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum?'
]
Desired Output:
[
'Lorem ipsum',
'dolor sit',
' amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et',
'dolore magna aliqua',
'.',
'Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat',
'!',
'Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur',
'.',
'Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum',
'?'
]

You may use
/\s*{([^{}]*)}|([`!@#$%^&*()_+=[\]{};':"\\|.<>\/?~-])/
Or, a bit more compact:
/\s*{([^{}]*)}|(?!,)([!-\/:-@[-`{-~])/
Both [`!@#$%^&*()_+=[\]{};':"\\|.<>\/?~-] and (?!,)([!-\/:-@[-`{-~]) match all ASCII punctuation and symbols other than the comma, so unlike your [.!\"'/$:\d]+ class they also match the final ?. The other difference is the \s* part: in your pattern the space before { was obligatory, whereas \s* matches zero or more whitespace chars.
JS demo:
var sentence = "Lorem ipsum {dolor sit} amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et {dolore magna aliqua}. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat! Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum?";
var delimiter = /\s*{([^{}]*)}|([`!@#$%^&*()_+=[\]{};':"\\|.<>\/?~-])/;
var sentences = sentence.split(delimiter).filter(Boolean);
console.log(sentences);
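To see why the final ? was lost, a quick comparison on a short, made-up input (the sample string and variable names are hypothetical) may help: the question's class [.!\"'/$:\d] contains no ?, so the last question mark stays glued to its sentence, while the answer's wider class splits it off.

```javascript
// The question's pattern vs. the answer's first pattern, on a made-up input.
const original = /([.!\"'/$:\d]+)| {(.*?)}/g;
const fixed = /\s*{([^{}]*)}|([`!@#$%^&*()_+=[\]{};':"\\|.<>\/?~-])/;

const sample = "Hi {there}. Ok?";
// split() inserts captured groups into the result; unused groups show up as
// undefined and empty strings appear between adjacent delimiters, so
// filter(Boolean) removes both.
const before = sample.split(original).filter(Boolean);
const after = sample.split(fixed).filter(Boolean);

console.log(JSON.stringify(before)); // ["Hi","there","."," Ok?"]
console.log(JSON.stringify(after));  // ["Hi","there","."," Ok","?"]
```

Note that the pieces keep their leading space (" Ok"); add .map((s) => s.trim()) if that matters.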

How to determine the number of repeats in a regular expression?

Let's say I want to replace any number of repeats of string 1 with an equal number of repeats of string 2, using regular expressions. For example, string 1 = "apple", string 2 = "orange".
I imagine something like this:
s/apple{2,}/orange{N}/
but I don't know how to specify the N to match the number of repeats of apple. Is that even possible?
Note: as pointed out by xhienne, I am looking for repeats, i.e. at least two consecutive occurrences of string 1.
Sample input:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. apple Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. appleappleapple Excepteur sint occaecat cupidatat non proident, appleappleappleapple sunt in culpa qui officia deserunt mollit anim id est laborum.
Sample output:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. apple Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. orangeorangeorange Excepteur sint occaecat cupidatat non proident, orangeorangeorangeorange sunt in culpa qui officia deserunt mollit anim id est laborum.

A possible solution is to use a regex flavor that supports the \G operator:
(?:\G(?!\A)|(?=(?:apple){2}))apple
Details
(?:\G(?!\A)|(?=(?:apple){2})) - a non-capturing group that matches either of the two alternatives:
\G(?!\A) - the end of the previous successful match (the (?!\A) lookahead excludes the start-of-string position, which \G also matches)
| - or
(?=(?:apple){2}) - a location in the string that is followed by two occurrences of the apple substring
apple - an apple substring.
Note that the regex does not need to count anything: it just finds a place where the string repeats twice, and from there it replaces every consecutive, adjoining match.

Since this problem initially arose while you were using vim (which doesn't support the \G operator used by Wiktor Stribiżew in his answer), here is an answer specifically for vim:
:s/\(apple\)\{2,\}/\= substitute(submatch(0), "apple", "orange", "g")/g
(Of course, this is not a pure regex solution, since it uses a vim function to perform a sub-substitution on the matched text.)
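JavaScript has no \G either, so a hedged sketch there mirrors the vim approach: match each whole run with {2,} and rebuild it in a replacement callback, deriving the count from the run's length. The helper name replaceRepeats and the sample input are hypothetical.

```javascript
// Replace each run of two or more `from` strings with the same number of `to`
// strings. JavaScript has no \G, so, as in the vim answer, the count is
// recovered inside a replacement callback: run length / word length.
function replaceRepeats(text, from, to) {
  // Escape regex metacharacters so `from` is matched literally.
  const escaped = from.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // {2,}: only genuine repeats (two or more adjacent copies) are touched.
  const runs = new RegExp(`(?:${escaped}){2,}`, "g");
  return text.replace(runs, (run) => to.repeat(run.length / from.length));
}

const input = "x apple y appleappleapple z appleappleappleapple.";
console.log(replaceRepeats(input, "apple", "orange"));
// x apple y orangeorangeorange z orangeorangeorangeorange.
```

A single apple is left untouched, matching the "at least two occurrences" requirement in the question.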

RegEx to extract text and number in outlook vba

I have outgoing emails that look like this:
Dear XYZ,
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
CASE ID: 123654
Best Regards,
XYZ
The text could be one or two paragraphs. I want to write two regexes: one should return the paragraph text and the other the CASE ID number. The result should look like this:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
123654
I have managed to create a regex that gets the case ID using (CASE ID\s*[:]+\s*(\w*)\s*), but I haven't been able to extract the paragraph. Any help will be much appreciated.

You can (and arguably should) use a single regex instead, one that returns both values as capture groups.
In most other languages it would look like this (with the g and s flags, so that . also matches newlines):
(.+?)CASE ID: (\d+)
VBScript's RegExp has no s flag, so there you need something like this, where [^\$]* stands in for "any character" because a negated class also matches line breaks:
(.*?[^\$]*)CASE ID: (\d+)
Then read the capture groups through SubMatches:
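' VBScript; in Outlook VBA use Debug.Print instead of WScript.Echo and add a
' reference to "Microsoft VBScript Regular Expressions 5.5" for New RegExp
Dim RegEx : Set RegEx = New RegExp
RegEx.Pattern = "(.*?[^\$]*)CASE ID: (\d+)"   ' group 1 = text, group 2 = case ID
RegEx.Global = True
RegEx.MultiLine = True
Dim strTemp : strTemp = "Lorem ipsum " & VbCrLf & "Cannot be translated to english " & VbCrLf & "CASE ID: 153"
Dim matches : Set matches = RegEx.Execute(strTemp)
WScript.Echo matches(0).SubMatches(0)   ' the text before "CASE ID:"
WScript.Echo matches(0).SubMatches(1)   ' the number, here 153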
The catch is that this only works if the exact string "CASE ID: " appears in the message. If it differs, e.g. there is a newline instead of a space after the ":", the pattern will not match.
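For comparison, a hedged JavaScript sketch of the "any other language" variant with the s flag. It assumes the greeting is always the first line (skipped with ^[^\n]*\n) and uses \s* around the label so a line break after "CASE ID:" still matches, which addresses the limitation just described. The variable names and sample email are illustrative only.

```javascript
// Sketch: one regex, two capture groups (paragraph text, CASE ID number).
// Assumption: the greeting ("Dear XYZ,") is the first line and is skipped.
const email = [
  "Dear XYZ,",
  "Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
  "Ut enim ad minim veniam, quis nostrud exercitation.",
  "CASE ID: 123654",
  "Best Regards,",
  "XYZ",
].join("\n");

// ^[^\n]*\n       skip the greeting line
// (.+?)           lazily capture the body; the s flag lets "." cross newlines
// \s*CASE ID:\s*  tolerate whitespace or a line break around the label
// (\d+)           capture the case number
const m = email.match(/^[^\n]*\n(.+?)\s*CASE ID:\s*(\d+)/s);
const body = m ? m[1] : null;
const caseId = m ? m[2] : null;

console.log(body);
console.log(caseId); // 123654
```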

Split [n] phrases into paragraphs using regex with Notepad++

I'm trying to split a text of n phrases into paragraphs using regular expressions in Notepad++ (i.e., after a certain number of phrases, start a new paragraph).
I have come up with the following regex (in this case, every 3 phrases -> new paragraph):
(([\S\s]*?)(\.)){3}
So far so good. However, how do I refer to the matched phrases now? $1 and $2 only refer to the individual parenthesized groups.
Example text:
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.
Desired result (using a count of 2):
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo consequat.
Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.

How about:
Find what: ((?:[^.]+\.){2})
Replace with: $1\n

Find using this pattern:
((.*?\.){2})
Breaking it down a bit:
The inner parentheses provide the group that {2} applies to.
The outer parentheses provide the delimiters for the replace pattern. Since they are "top-level", they are what the replace pattern \1 attaches to.
Note that the outer parentheses have to enclose the {2}. I'm not good at thinking through how the regex engine will handle everything, but fortunately Notepad++ offers instant confirmation: just press "Find" to watch it jump through the matches.
The replace pattern is \1 followed by a carriage return and a line feed, so the whole string looks like this:
\1\r\n
If you want an optional space, add \s?, probably like this (untested):
((.*?\.\s?){2})
If the issue is inserting a space with the results, just add a space (or two, if you're old-school like me) to the replace pattern:
\1 \r\n

Finding n sentences that end with a period is quite easy. For instance, for two sentences:
(?:.*?\.){2}
To turn them into a paragraph (insert a new line), replace with
$0\r\n\r\n
This inserts two carriage return + line feed pairs, which is the Windows way of marking a new line. In Unix files \n\n would be enough. If you only want one line break, just use $0\r\n.
To make it an HTML paragraph instead, use the same search and replace with
<p>$0</p>
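The Notepad++ recipes above can be sketched in JavaScript as well, assuming (as they do) that a sentence is any run of non-period characters ending in "."; abbreviations such as "e.g." would be miscounted. The helper name paragraphize is hypothetical, and it uses the Unix \n\n break rather than \r\n\r\n.

```javascript
// Sketch: insert a blank line after every `n` sentences.
// Assumption: a "sentence" is any run of non-period characters ending in ".".
function paragraphize(text, n) {
  // (?:[^.]+\.){n} matches n sentences; [^.] also matches line breaks.
  // \s* swallows the space that follows the last sentence of the group.
  const group = new RegExp(`(?:[^.]+\\.){${n}}\\s*`, "g");
  // Replace the trailing whitespace with a blank line, then tidy the end.
  return text.replace(group, (m) => m.trimEnd() + "\n\n").trimEnd();
}

console.log(paragraphize("A b. C d. E f. G h.", 2));
// A b. C d.
//
// E f. G h.
```

A leftover sentence that does not complete a group of n stays where it is, at the end of the text.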

Regular expression to find R code in Sweave expression

I have some Sweave expressions embedded in text in some .Rnw files. The paragraph below contains two Sweave expressions. What regular expression can I use to find the R code in each expression? It should be able to find mean(mtcars$mpg) and/or summary(lm(mpg ~ hp + drat, mtcars)).
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. \Sexpr{mean(mtcars$mpg)}. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat \Sexpr{summary(lm(mpg ~ hp + drat, mtcars))} non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

The regex would be (?<=\\Sexpr{).+?(?=})
The (?<=\\Sexpr{) part is a positive lookbehind.
(?=}) is a positive lookahead.
.+? lazily matches everything between those two lookarounds.
Read more about lookarounds here: http://www.regular-expressions.info/lookaround.html
For example, in R (since you tagged the question R):
txt <- 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. \\Sexpr{mean(mtcars$mpg)}. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat \\Sexpr{summary(lm(mpg ~ hp + drat, mtcars))} non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.'
regmatches(txt, gregexpr('(?<=\\Sexpr{).+?(?=})', txt, perl=TRUE))
## [[1]]
## [1] "mean(mtcars$mpg)" "summary(lm(mpg ~ hp + drat, mtcars))"
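The same lookbehind/lookahead pair also works in JavaScript (ES2018 and later). This is a hedged sketch with a shortened, illustrative version of the sample paragraph.

```javascript
// Extract the R code inside each \Sexpr{...} with the answer's lookarounds.
const txt =
  "dolore magna aliqua. \\Sexpr{mean(mtcars$mpg)}. Ut enim ad minim " +
  "cupidatat \\Sexpr{summary(lm(mpg ~ hp + drat, mtcars))} non proident.";

// (?<=\\Sexpr\{)  position preceded by the literal text \Sexpr{
// .+?             as few characters as possible...
// (?=\})          ...up to the first closing brace
const codes = txt.match(/(?<=\\Sexpr\{).+?(?=\})/g);
console.log(JSON.stringify(codes));
// ["mean(mtcars$mpg)","summary(lm(mpg ~ hp + drat, mtcars))"]
```

Because the match stops at the first }, R code that itself contains braces (for example a function body) would be cut short.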
