Regular expression to select text across multiple lines


I am trying to write a regular expression to select text between two curly braces in a Java file. Text between braces may be spread across multiple lines.
Eg.,
{
// line1 ;
// line 2 ;
// line 3 ;
}
I need to select all the lines in between braces.
I tried \{[.]*\} but it doesn't select multiple lines.
Any suggestions would be appreciated.
Thanks.

Generally, this is not a good idea as you need to take nested brackets, etc. into account. You might be better off using a parser instead.
That said, you might get by with the following construct:
^\{
(.+?)
^\}
This assumes that each opening and closing brace stands alone on its own line. If you want to allow leading whitespace as well, change it to:
^\s*\{
(.+?)
^\s*\}
See a demo on regex101.com (and mind the modifiers it relies on: DOTALL and MULTILINE).
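As a rough sketch of how the two modifiers work together, here is the whitespace-tolerant pattern in Python. Python is my choice for illustration only; in Java the same flags are Pattern.DOTALL and Pattern.MULTILINE. The sample text is the one from the question.

```python
import re

# Sample text from the question: braces alone on their own lines.
source = "{\n// line1 ;\n// line 2 ;\n// line 3 ;\n}\n"

# MULTILINE lets ^ match at the start of every line;
# DOTALL lets . match newlines, so (.+?) can span several lines.
block = re.compile(r"^\s*\{(.+?)^\s*\}", re.MULTILINE | re.DOTALL)

match = block.search(source)
body_lines = match.group(1).strip().splitlines() if match else []
print(body_lines)
```

Like the regex itself, this stops at the first closing brace that starts a line, so nested blocks will still trip it up.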

You need to add line breaks in the regular expression... Try this:
\{(.*\r\n)*\}
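Note that \r\n only matches Windows (CRLF) line endings. A small Python sketch, assuming you also want Unix (LF) files to work, makes the carriage return optional with \r?. That tweak is my adjustment and not part of the pattern above.

```python
import re

# The pattern as posted: every line inside the braces must end with CRLF.
crlf_only = re.compile(r"\{(.*\r\n)*\}")

# Adjusted variant (an assumption about what you need): \r?\n accepts CRLF and LF.
either = re.compile(r"\{(?:.*\r?\n)*\}")

crlf_text = "{\r\n// line1 ;\r\n// line 2 ;\r\n}"
lf_text = "{\n// line1 ;\n// line 2 ;\n}"

crlf_match = either.search(crlf_text)
lf_match = either.search(lf_text)
posted_on_lf = crlf_only.search(lf_text)  # no match: there is no \r in lf_text
```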

Related

Regex: Trying to extract all values (separated by new lines) within an XML tag

I have a project that demands extracting data from XML files (values inside the <Number>... </Number> tag), however, in my regular expression, I haven't been able to extract lines that had multiple data separated by a newline, see the below example:
As you can see above, I couldn't replicate the multiple lines detection by my regular expression.

If you are doing this from a script, your first choice should be an XML parser. Almost every language has one, and it will be far more reliable than a regex. However, if you just want to search for the strings inside Notepad++, you can use \s to match the newline after each value:
<Number>(\d+\s)+<\/Number>
https://regex101.com/r/MwvBxz/1
I'm not sure I fully understand what you are trying to do so if this doesn't do it then let me know what you are going for.
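For the parser route, a minimal sketch with Python's standard xml.etree.ElementTree could look like this. The question's sample file isn't shown, so the Records and Record wrapper elements and the values are made up; only the Type and Number tags come from the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: Records/Record and the values are assumptions.
xml_text = """<Records>
  <Record>
    <Type>1</Type>
    <Number>111
222
333</Number>
  </Record>
</Records>"""

root = ET.fromstring(xml_text)

# str.split() with no arguments splits on any whitespace, including newlines.
numbers = [value
           for node in root.iter("Number")
           for value in (node.text or "").split()]
print(numbers)
```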

You can use this find+replace combo to keep only what is between the <Number> tags and drop everything around it (the dot has to be allowed to match newlines for this to span lines):
Find:
.*?<Number>(.*?)<\/Number>.*
Replace:
$1

I finally found a regular expression that works. I'll leave it here in case anyone needs it:
<Type>\d</Type>\n<Number>(\d+\n)+(\d+</Number>)
Explanation:
\d: Shortcut for a digit, same as [0-9]
\n: Newline.
+: Match the previous element one or more times.
Have a good day, everybody.

After giving it some more thought I decided to write a second answer.
You can make use of lookarounds:
(?<=<Number>)[\d\s]+(?=<\/Number>)
https://regex101.com/r/FiaTKD/1
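Here is roughly how the lookaround version behaves in Python. The sample string is invented, since the question's data isn't shown. The lookbehind and lookahead only assert that the tags are there, so the match itself contains just digits and whitespace.

```python
import re

# Hypothetical sample; the original example from the question is not shown.
xml_text = "<Type>1</Type>\n<Number>111\n222\n333</Number>"

# (?<=...) and (?=...) check for the tags without including them in the match.
blocks = re.findall(r"(?<=<Number>)[\d\s]+(?=</Number>)", xml_text)

# Each block is one tag's content; split() breaks it into the individual values.
values = [block.split() for block in blocks]
print(values)
```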

gvim regexp for nested parentheses

I have a file containing abc(de+fgh(2a+2b))+xyz(). I want to write a regexp (preferably for Vim) to get a pattern like de+fgh(2a+2b) + xyz() .
I tried a gvim regexp, but when matching the parentheses the greedy version matches all of abc(de+fgh(2a+2b))+xyz(), and the non-greedy version stops at the first closing parenthesis, marked here with quotes: abc(de+fgh(2a+2b')')+xyz(). How can I make it stop at the second one instead: abc(de+fgh(2a+2b)')'+xyz()?
Regards
keerthan

I wouldn't do this with a regex. Assuming your cursor is at the beginning of the line, just type:
%di(v0p
You will get the desired output.
In English: take the text inside the first (...) group and concatenate it with whatever comes after that group.
You can use :normal or a macro to apply the operation to multiple lines.

Watch out! A regex cannot count arbitrary levels of nested brackets. If you want to do this in general, you will probably need to write a parser.
That said, if you only need this to work for a specific case:
http://regexr.com/3fne0
In Vim this is:
%s/[^()]*(\([^()]*([^()]*)\))\(.*\)/\1\2/g
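If you do need the general case, the "parser" can be as small as a depth counter. This Python sketch is my own illustration (the function name is made up) and only handles round parentheses:

```python
def unwrap_first_group(text: str) -> str:
    """Return the contents of the first balanced (...) group plus whatever follows it."""
    start = text.find("(")
    if start == -1:
        return text  # nothing to unwrap
    depth = 0
    for index in range(start, len(text)):
        if text[index] == "(":
            depth += 1
        elif text[index] == ")":
            depth -= 1
            if depth == 0:
                # text[start + 1:index] is the group body; text[index + 1:] is the tail.
                return text[start + 1:index] + text[index + 1:]
    raise ValueError("unbalanced parentheses")


result = unwrap_first_group("abc(de+fgh(2a+2b))+xyz()")
print(result)
```

Unlike the substitution above, this keeps working when the first group is nested more than one level deep.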

Regular Expression for removing indentation

I have a requirement to remove indentation from a numbered paragraph. I currently do this with a couple of regular expressions and some code, but would like to accomplish it with one or more regular expressions. The paragraph looks like this:
1. THE FIRST LINE OF THE PARAGRAPH
ANOTHER LINE IN THE PARAGRAPH
AN INDENTED LINE WITHIN THE PARAGRAPH
This needs to be transformed to retain the indentation within the paragraph, but remove the indentation of the entire paragraph as measured by the indentation of the first line.
THE FIRST LINE OF THE PARAGRAPH
ANOTHER LINE IN THE PARAGRAPH
AN INDENTED LINE WITHIN THE PARAGRAPH
The following regex accomplishes the task by replacing matches with empty strings. (note that there are no tabs expected in this content, just spaces):
(\A *\d+\. *|^ {0,5})
But it requires that the indention length of 5 characters be set explicitly. I would like a generic way of doing this that would work with any indentation length. Any ideas for how one or more regular expressions (applied cumulatively) could accomplish this?
I am using the .NET regular expression engine with multiline mode turned on.

As others have indicated, regex alone probably isn't the right tool for the job.
The major problem is that in order to strip the correct number of spaces from all the following lines, you somehow need to store how wide the first indentation was. I'm not sure this is doable with a regex engine alone.
If you want a regex-based approach just to have a quick one-liner, then I think you can hack together something like the following (I'm not familiar with .NET, so I'll just give you a Python solution):
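import re

# `paragraph` holds the whole numbered paragraph as one string.
dedented = re.sub(
    r"^([\d\. ]+)(.*)$",
    # Remove as many leading spaces as the first line's prefix is wide.
    lambda m: re.sub("^" + " " * len(m.group(1)), "", m.group(2),
                     flags=re.MULTILINE),
    paragraph,
    flags=re.MULTILINE | re.DOTALL,
)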
The idea is to have the outer regex isolate the indentation of the first line, while the inner regex takes care of removing the correct amount from subsequent lines.
For this to work, the indentation must consist exclusively of spaces (i.e. no tabs); otherwise you'll have to make some assumptions about how many spaces a tab is worth.
That said, you would probably be better off implementing a custom parser to do the job. It would surely be cleaner and probably more efficient too.
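A line-by-line version along those lines might look like the sketch below. The function name is made up, and the spacing in the sample is a guess, because the exact indentation from the question isn't preserved. The point is that the prefix width lives in an ordinary variable instead of inside the regex.

```python
import re


def dedent_numbered(paragraph: str) -> str:
    """Strip the first line's prefix width from every line of the paragraph."""
    lines = paragraph.split("\n")
    # Prefix = leading spaces, the number, the dot, and the spaces after it.
    prefix = re.match(r" *\d+\. *", lines[0])
    width = prefix.end() if prefix else 0
    result = [lines[0][width:]]
    for line in lines[1:]:
        # Remove at most `width` leading spaces; never eat non-space characters.
        leading = len(line) - len(line.lstrip(" "))
        result.append(line[min(width, leading):])
    return "\n".join(result)


# Hypothetical spacing: a 5-character prefix on the first line.
sample = "  1. THE FIRST LINE\n     ANOTHER LINE\n        AN INDENTED LINE"
dedented_sample = dedent_numbered(sample)
print(dedented_sample)
```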

I am not sure how you thought it would work, but your regex matches everything under the sun due to the right side of the |.
Try this:
^((?:\d+\.)? +)
Use something like http://www.regexr.com/ to test it out.

Vim regular expression to remove block of code for all the lines

In my code I want to remove a block of code that starts with a bracket and ends with a bracket. For example if I have this line
ENDPROGRAM { fprintf(sdfsdfsdfsd....) }
and after running the regex I want it to end up as just
ENDPROGRAM
I want to only delete code inside the bracket and the brackets themselves. I tried this command
:%s/\{[a-zA-Z0-0]*\}//g
but Vim says the pattern was not found. Any suggestions?
ENDPROGRAM is just an example; I also have DIV, MULT, and so on.

Since you're using Vim, an alternative is to record a keyboard macro for this into a register, say register z.
Start recording with qz.
Search forward for ENDPROGRAM: /ENDPROGRAM[enter]
Scan forward for opening brace: f{
Delete to matching brace: d%
Finish recording: q
Now run the macro with @z, and then repeat with @@. Hold down your @ key to repeat rapidly.
For one-off jobs not involving tens of thousands of changes in numerous files, this kind of interactive approach works well. You visually confirm that the right thing is done in every place. The thing is that even if you fully automate it with regexes, you will still have to look at every change to confirm that the right thing was done before committing the code.
The first mistake in your regex is that the material between braces must only be letters and digits. (I'm assuming the 0-0 is a typo for 0-9). Note that you have other things between the braces such as spaces and parentheses. You want {.*}: an open brace, followed by zero or more characters, followed by a closing brace. If it so happens that you have variants, like ENDPROGRAM { abc } { def }, this regex will eat them too. The regex matches from the first open brace to the last closing one. Note also that the regex {[^}]*} will not work if the block contains nested interior braces; it stops at the first closing brace, not the last one, and so ENDPROGRAM { { x } } will turn to ENDPROGRAM }.
The second mistake is that you are running this on all lines using the % address. You only want to run this on lines that contain ENDPROGRAM, in other words:
:g/ENDPROGRAM/s/ {.*}//
"For all lines that contain a match for ENDPROGRAM, find a space followed by some bracketed text, and replace it with nothing." Or else:
:%s/ENDPROGRAM {.*}/ENDPROGRAM/
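To see the difference between the greedy and the negated-class patterns outside Vim, here is a quick Python check. Python's re syntax is not Vim's, so I escape the braces; the inputs are the examples mentioned above.

```python
import re

# Greedy .* runs from the first opening brace to the last closing brace on the line.
greedy = re.sub(r" \{.*\}", "", "ENDPROGRAM { abc } { def }")

# [^}]* stops at the first closing brace, which breaks on nested braces.
nested = re.sub(r"\{[^}]*\}", "", "ENDPROGRAM { { x } }")

print(repr(greedy))
print(repr(nested))
```

The second call leaves a stray closing brace behind, which is exactly the failure described for nested blocks.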

THIS looks like a job for: (dum da dum duuuuum!)
TEXT OBJECTS!
Place the cursor anywhere within the braces. Type daB.
WOOOOOOOAAAH what just happened?!
aB is something called a "text object" in Vim. You could also have typed da{ or da} in this situation. A text object is a thing that Vim's operators can act on. d is one such operator. I'm sure you know others: c, y, etc.
Visual mode also works on text objects. Instead of daB, try vaB. This will select all the text in the braces, plus the braces themselves. Most text objects also have an "inner" variant, for example ciB would delete everything inside the braces, and enter insert mode, leaving the braces intact.
There are text objects to work with HTML/XML tags, objects for working with quoted strings, objects for sentences and paragraphs, objects for words and WORDS, and more. See the full list at :help text-objects.

When something is broken, start simple and work up to what you need. Do not worry about the :s command at first; instead, focus on getting the pattern (or regular expression) right.
Search for \{ and you will get an error message. Oops, that should be just {.
Add the character class: {[a-zA-Z0-0]*. Darn, that is not right, because you left out the space.
Next try: {[a-zA-Z0-0 ]*. Now we are getting somewhere, but we also want to match the parentheses and the dots: {[a-zA-Z0-0 ().]*.
Add the closing brace, realize that you really meant 0-9 instead of 0-0, and you are done: {[a-zA-Z0-9 ().]*}.
At this point, you can take advantage of the fact that :s uses the current search pattern by default, so all you need is :%s///.

Regex to insert text BEFORE a line containing a match?

I have a bunch of artists that are named in this fashion:
Killers, The
Treatment, The
Virginmarys, The
I need them to look like
The Killers
The Treatment
The Virginmarys
I'm able to match the lines with , The ((^|\n)(.*, The) is what I've used) but the more advanced syntax is eluding me. I can use regex on the replacement syntax as well (it's for a TextPipe filter so it might as well be for Notepad++ or any other Regex text editor).

You should be able to use the following:
Find: (\S+),\s\S*
Replace: The $1
Or capture the article as well and swap the two parts:
Find: (\S+),\s+(\S+)
Replace: $2 $1
Depending on your editor, you may be better off using \1, \2, and so on for capture groups.

Since you need to specifically capture the title before the comma, do so:
(^|\n)(.*), The
And replace it putting the "the" in the right place:
\1The \2
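The same capture-and-swap idea in Python, as a sketch using the artist names from the question. I use multiline mode here, so ^ and $ anchor on every line and the (^|\n) group isn't needed.

```python
import re

artists = "Killers, The\nTreatment, The\nVirginmarys, The"

# MULTILINE makes ^ and $ match at the start and end of each line.
fixed = re.sub(r"^(.*), The$", r"The \1", artists, flags=re.MULTILINE)
print(fixed)
```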

Regular expressions define matches but not substitutions.
How you can perform substitutions is highly dependent on the application.
Most editors that provide regular expression support work on a line-by-line basis.
Some of them will allow substitutions such as
s/^(.*Banana)/INSERTED LINE\n\1/
which inserts the given line before each matching line. Note that other editors may not allow newlines in the replacement at all. In Vim, you can input newlines into the command prompt using Ctrl+K Return Return. YMMV.
In Java, you would just first print the insertion text, then print the matching line.
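The print-first approach looks much the same in any language. A minimal Python sketch, where the function name and the sample lines are made up for illustration:

```python
import re


def insert_before_matches(lines, pattern, inserted):
    """Yield `inserted` ahead of every line that matches `pattern`."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield inserted  # emit the extra line first
        yield line          # then the original line, unchanged


result = list(insert_before_matches(
    ["Apple", "Banana split", "Cherry"], r"Banana", "INSERTED LINE"))
print(result)
```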
