Getting strings inside matching brackets has been asked a lot here, but I haven't had any luck applying the answers to the problem at hand: I'm trying to replace a red text label in a LaTeX file, \red{any text}, with just any text. However, the problem is that any text may span multiple lines and may also contain closing brackets, e.g. \red{some \ref{reference} text...}, and the result should be some \ref{reference} text...
The perl one-liner
perl -0777 -i.bak -pe 's/\\red{([^}]*)}/\1/igs' /path/to/file.tex
or with python
from pyparsing import *
sample = "\\red{some \\ref{stuff} text}"
scanner = originalTextFor(nestedExpr('\\red{','}'))
for match in scanner.searchString(sample):
    print(match[0])
gives the wrong result \red{some \ref{stuff}. I know this can theoretically be done by counting brackets, but I'm trying to find a more elegant/clean approach.
With Perl, you can match nested structures with balanced brackets by using a recursive pattern. Use the following regex:
's/\\red({((?>[^{}]+|(?1))*)})/\2/ig'
It will match:
\\red - a \red substring
({((?>[^{}]+|(?1))*)}) - Group 1 (technical, we will need to recurse it) capturing:
{ - an open {
((?>[^{}]+|(?1))*) - Group 2 capturing zero or more repetitions of either 1+ chars other than { and } (with [^{}]+) or the whole Group 1 pattern (with the (?1) subroutine call)
} - a close }
The match is replaced with the \2 backreference, Group 2 contents.
You do not need s modifier, since there is no dot in the pattern.
See an online text and a regex demo.
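Python's standard re module does not support the (?1) recursion used above, so for comparison here is a minimal sketch of the brace-counting approach the question mentions. The function name strip_red and the macro parameter are hypothetical, and the sketch assumes braces are never escaped (no \{ or \}) inside the argument.

```python
def strip_red(text: str, macro: str = "\\red{") -> str:
    """Replace every \\red{...} with its contents, honouring nested braces.

    Assumption: braces inside the argument are balanced and never escaped.
    """
    out = []
    i = 0
    while i < len(text):
        if text.startswith(macro, i):
            depth = 1                      # we are inside the opening brace of \red{
            start = j = i + len(macro)
            while j < len(text) and depth > 0:
                if text[j] == "{":
                    depth += 1
                elif text[j] == "}":
                    depth -= 1
                j += 1
            if depth == 0:
                # text[start:j-1] is the argument without its closing brace;
                # recurse so a \red{...} nested inside is unwrapped as well.
                out.append(strip_red(text[start:j - 1], macro))
                i = j
                continue
        # Not a (balanced) \red{...}: copy the character unchanged.
        out.append(text[i])
        i += 1
    return "".join(out)


print(strip_red("\\red{some \\ref{reference} text...}"))
```

An unbalanced \red{ is left untouched rather than guessed at, which seems the safer choice when editing a file in place.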
I'm trying to find and replace some function calls in a Python program. The idea is to add a boolean parameter to each call found in the project.
I looked for solutions on the internet 'cause I don't know regex science at all... It seems like a basic exercise for regex guys but still.
In my case I have this call in a lot of files :
myFunction("test")
My goal is to find this call and replace it with:
myFunction("test", false)
Could you help me write the regex ?
Try this command:
sed -re 's/(myFunction)[[:space:]]*\([[:space:]]*("test")[[:space:]]*\)/\1(\2, false)/' SOURCE_FILENAME
If you prefer to replace the existing source file with an updated one, then write -i SOURCE_FILENAME instead of SOURCE_FILENAME.
This works by defining a pattern to match the function call you would like to update:
myFunction (obviously) matches the text myFunction;
[[:space:]] matches any whitespace character, mainly spaces and tabs.
[[:space:]]* matches zero or more whitespace characters.
\( and \) match literal parentheses in your program text;
( and ) are regex metacharacters that match nothing, but ("test") matches "test" and captures the matched text for later use.
Note that this pattern captures two things using ( and ). The ("test") is the second of these.
Now let us examine the overall structure of the Sed command 's/.../.../'. The s means "substitute," so 's/.../.../' is Sed's substitution command.
Between the first and second slashes comes the pattern we have just discussed. Between the second and third slashes comes the replacement text Sed uses to replace the matched part of any line of your program text that matches the pattern. Within the replacement text, the \1 and \2 are backreferences that place the text earlier captured using ( and ).
So, there it is. Not only have I helped you to write the regex but have shown you how the regex works so that, next time, you can write your own.
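The same substitution can be sketched in Python, the language of the program being edited. This is only an illustration of the pattern explained above, with \s standing in for [[:space:]]; the helper name add_flag is hypothetical.

```python
import re

# Group 1 is the function name, group 2 the quoted argument.
CALL = re.compile(r'(myFunction)\s*\(\s*("test")\s*\)')


def add_flag(source: str) -> str:
    """Rewrite myFunction("test") as myFunction("test", false)."""
    # \1 and \2 in the replacement insert the text captured by the two groups.
    return CALL.sub(r'\1(\2, false)', source)


print(add_flag('result = myFunction( "test" )'))
```

As with the Sed version, any whitespace around the parentheses is normalised away in the output.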
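Refer to this:
import re
# Replace all white-space characters with the digit "9":
# (a raw string keeps \s from being treated as a string escape,
# and txt avoids shadowing the built-in str)
txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x)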
You could use this regex to match and capture:
(myFunction\("test")(\))
then use the regex below to replace
$1, false$2
I want to use Notepad++ regex to find all strings that do not match a pattern.
Sample Input Text:
{~Newline~}{~Indent,4~}{~Colour,Blue~}To be or not to be,{~Newline~}{~Indent,6~}
{~Colour,Green~}that {~StartItalic~}is{~EndItalic~} the question.{~EndDocument~}
The parts between {~ and ~} are markdown codes. Everything else is plaintext. I want to find all strings which do not have the structure of the markdown, and insert the code {~Plain~} in front of them. The result would look like this:
{~Newline~}{~Indent,4~}{~Colour,Blue~}{~Plain~}To be or not to be,{~Newline~}{~Indent,6~}{~Colour,Green~}{~Plain~}that {~StartItalic~}{~Plain~}is{~EndItalic~}{~Plain~} the question.{~EndDocument~}
The markdown syntax is open-ended, so I can't just use a list of possible codes to not process.
I could insert {~Plain~} after every ~}, then delete every {~Plain~} that's followed by {~, but that seems incredibly clunky.
I hope this works with the current version of Notepad++ (don't have it right now).
Matching with:
~}((?:[^{]|(?:{[^~]))+){~
and then replacing by
~}{~Plain~}$1{~
might work. The first group should capture everything between closing ~} and the next {~. It will also match { and } in the text, as long as they are not part of an opening tag {~.
EDIT Additional explanation, so you can modify it better:
~} end of previous tag
( start of the "interesting" group that contains text
(?: non-capturing group for +
[^{] everything except opening braces
| OR
(?:
{ opening brace followed by ...
[^~] ... some character which is not `~`
)
)+ end of non-capturing group for +, repeated 1 or more times
) end of the "interesting" group
{~ start of the next tag
Here is an interactive example: regex101 example
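To try the pattern outside Notepad++, here is a small Python sketch using the same expression (with the braces escaped, and \1 in place of $1). The function name mark_plain is hypothetical, and the sample is the question's input joined onto one line.

```python
import re

# Text between a closing "~}" and the next opening "{~":
# any run of non-"{" characters, or a "{" that is not followed by "~".
TEXT_BETWEEN_TAGS = re.compile(r"~\}((?:[^{]|\{[^~])+)\{~")


def mark_plain(text: str) -> str:
    """Insert {~Plain~} in front of each plaintext run that sits between two tags."""
    return TEXT_BETWEEN_TAGS.sub(r"~}{~Plain~}\1{~", text)


sample = ("{~Newline~}{~Indent,4~}{~Colour,Blue~}To be or not to be,"
          "{~Newline~}{~Indent,6~}{~Colour,Green~}that {~StartItalic~}is"
          "{~EndItalic~} the question.{~EndDocument~}")
print(mark_plain(sample))
```

Because the pattern requires a tag on both sides, plaintext before the first tag or after the last one is not marked.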
You need to use a negative lookahead. This regex matches every ~} that is not immediately followed by {~ or by the end of the line, so you can just replace the matches with ~}{~Plain~}:
~}(?!{~|$)
If you don't want to match the space in {~Indent,6~} {~Colour,Green~}, just use this:
~}(?!{~|$| )
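The lookahead variant can be checked the same way in Python. This is a sketch under the assumption that the input looks like the question's sample; the function name mark_plain_lookahead is hypothetical, and re.MULTILINE is used so that $ means end of line, as it does in an editor.

```python
import re

# "~}" that is NOT followed by an opening "{~" or by the end of the line.
AFTER_TAG = re.compile(r"~\}(?!\{~|$)", re.MULTILINE)


def mark_plain_lookahead(text: str) -> str:
    """Append {~Plain~} after every tag that is directly followed by plaintext."""
    return AFTER_TAG.sub("~}{~Plain~}", text)


print(mark_plain_lookahead("{~Colour,Green~}that {~StartItalic~}is{~EndItalic~}"))
```

Since the lookahead consumes nothing, the replacement does not need any backreference.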
Using REGEX (in PowerShell) I would like to find a pattern in a text file that is over two lines and replace it with new text and preserve the whitespace. Example text:
ObjectType=Page
ObjectID=70000
My match string is
RunObjectType=Page;\s+RunObjectID=70000
The result I want is
ObjectType=Page
ObjectID=88888
The problem is my replacement string
RunObjectType=Page;`n+RunObjectID=88888
returns
ObjectType=Page
ObjectID=88888
And I need it to keep the original spacing. To complicate matters the amount of spacing may change.
Suggestions?
Leverage a capturing group and a backreference to that group in the replacement pattern:
$s -replace 'RunObjectType=Page;(\s+)RunObjectID=70000', 'RunObjectType=Page;$1RunObjectID=88888'
See the regex demo
With the (\s+), you capture all the whitespaces into the Group 1 buffer and then, using $1 backreference, the value is inserted into the result.
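For readers who want to test the idea outside PowerShell, here is a rough Python equivalent. The function name renumber and its new_id parameter are hypothetical; a function is used as the replacement so that the group reference cannot run into the digits of the new ID.

```python
import re

# Group 1: the first line; group 2: whatever whitespace (including the
# line break) separates it from the second line.
PATTERN = re.compile(r"(RunObjectType=Page;)(\s+)RunObjectID=70000")


def renumber(text: str, new_id: str = "88888") -> str:
    """Swap the object ID while re-inserting the original whitespace unchanged."""
    return PATTERN.sub(
        lambda m: m.group(1) + m.group(2) + "RunObjectID=" + new_id,
        text,
    )


print(repr(renumber("RunObjectType=Page;\r\n        RunObjectID=70000")))
```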
I want to change some strings:
space+cows --> space + cows
stupid+rabbit --> stupid + rabbit
(put spaces around the `+`)
In Sublime Text 2, I tried to use these:
Find: \w+\+\w+
Replace: \w+ \+ \w+
The finding regex matched everything well, but obviously, my strings were replaced with literally
w+ + w+.
One more example:
Strings:
bool *foo --> bool* foo
int *bar --> int* bar
Pattern:
Find: (bool|int) *(foo|bar)
Replace: (bool|int)* (foo|bar)
Result:
(bool|int)* (foo|bar)
(bool|int)* (foo|bar)
Needless to say I wanted to keep the actual bool, int, foo and bar as they were before.
I also cannot use only \* to match the strings because it would select other stuff that I don't want to replace; I need some context around the actual \* to select the correct strings. In the same way, I cannot use patterns like \*[^ ] because the not-space character after the asterisk would be obliterated after replacement.
I fixed my problem by using Sublime Text's multiline edition but I am still wondering: is it possible to use a regex in such a way that you can replace strings containing "group of characters" without wiping the actual contents of the "group of characters"?
Yes, this is possible. The reason your replacements don't work is (as you've noticed) that your replacement text is just literal text; whatever you put in the box is what replaces what was matched, as you would expect.
What you need to do is use a regex capture for this. This makes the regular expression processor (in this case Sublime Text) not only match the text but also store it for use in the replacement. You do that by wrapping the parts of the match you want to save in parentheses. Each set of parentheses is a capture group.
For your example, your regex becomes:
(\w+)\+(\w+)
The value of the match inside each set of parentheses is saved into its own numbered group, starting at one. A replacement like the following expands to the contents of the first group, followed by the plus sign with spaces around it, followed by the contents of the second group:
\1 + \2
You can use each number multiple times, if you want:
\1 and again \1 and also \2
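The same capture-and-backreference idea works in most regex engines. As a minimal sketch, here are both examples from the question in Python, which also uses \1 and \2 in the replacement; the function names are hypothetical.

```python
import re


def space_plus(text: str) -> str:
    """Turn word+word into word + word, keeping both words."""
    return re.sub(r"(\w+)\+(\w+)", r"\1 + \2", text)


def move_star(text: str) -> str:
    """Turn 'bool *foo' into 'bool* foo' (same for int and bar)."""
    return re.sub(r"(bool|int) \*(foo|bar)", r"\1* \2", text)


print(space_plus("stupid+rabbit"))
print(move_star("int *bar"))
```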
Regex to turn "stupid+rabbit" to "stupid + rabbit"
Find: (\w+)\+(\w+)
Replace: $1 + $2
Regex to turn "bool *foo" or "int *bar" into "bool* foo" or "int* bar"
Find: (bool|int) \*(foo|bar)
Replace: $1* $2
() - forms groups which can be later used. $1 is the first group and $2 is the second group.
I have a large file with content inside every bracket. This is not at the beginning of the line.
1. Atmos-phere (7800)
2. Atmospheric composition (90100)
3.Air quality (10110)
4. Atmospheric chemistry and composition (889s120)
5.Atmospheric particulates (10678130)
I need to do the following
Replace the entire content, get rid of line numbers
1.Atmosphere (10000) to plain Atmosphere
Delete the line numbers as well
1.Atmosphere (10000) to plain Atmosphere
make it a hyperlink
1.Atmosphere (10000) to plain linky study
[I added/Edit] Extract the words into a new file, where we get a simple list of key words. Can you also please explain the numbers in replace the \1\2, and escape on some characters
Each set of key words is a new line
Atmospheric
Atmospheric composition
Air quality
Each set is a on one line separated by one space and commas
Atmospheric, Atmospheric composition, Air quality
I tried a find with a regex like \(*\). It finds the brackets, but I don't know how to replace this, where to put the replacement, or what variable holds the replacement value.
Here is my expression for Notepad++: ([0-9(). ]*)(.*)(\s\()(.*)
You need to split your search into groups:
([0-9. ]*) numbers, spaces and dots combination in 0 or more times
(.*) everything till next expression
(\s\() space and opening parenthesis
(.*) everything else
In replace box - for practicing if you place
\1\2\3\4 this do nothing :) just print all groups from above from 1.1 to 1.4
\2 this way you get only 1.2 group
new_thing\2new_thing adds your text before and after group
<a href=blah.com/\2.html>linky study</a> so now your text is added - spaces between words can be problematic when creating link - so another expression need to be made to replace all spaces in link to i.e. _
If you need to add a backslash as text (or another special character used by regex), it must be escaped, so you put \\ for a backslash or \$ for a dollar sign
Want more tune - <a href=blah.com/\2.html>\2</a> add again 1.2 group - or use whichever you want
On the screenshot you can see how I use it (I had found and replaced one line)
OK, and then we have case 4.2 with a comma at the end, so simply add a comma after the extracted section:
change the replacement from \2 to \2,
Now you need join it so simplest way is to Edit->Line Operations->Join Lines
but if you want to be real pro switch to Extended mode (just above Regular expression mode in Replace window) and Find \r\n and replace with space.
Removing line endings can differ in some cases but this is another story - for now I assume that you using windows since Notepad++ is windows tool and line endings are in windows style :)
The following regex should do the job: \d+\.\s*(.*?)\s*\(.*?\).
And the replacement: <a href=example.com\\\1.htm>\1</a>.
Explanation:
\d+ : Match a digit 1 or more times.
\. : Match a dot.
\s* : Match spaces 0 or more times.
(.*?) : Group and match everything until ( found.
\s* : Match spaces 0 or more times.
\(.*?\) : Match the parentheses and what's between them.
The replacement part is simple since \1 is referring to the matching group.
Online demo.
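For the keyword list asked about in the question, the same pattern can be applied line by line in a script. The following Python sketch is an illustration only: the function name keywords is hypothetical, and anchoring the pattern to the whole line with ^ and $ is an added assumption.

```python
import re

# Leading number and dot, then the words (group 1), then the bracketed part.
LINE = re.compile(r"^\d+\.\s*(.*?)\s*\(.*?\)\s*$")


def keywords(text: str) -> list[str]:
    """Return the words of every line that looks like '1. Some words (123)'."""
    found = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            found.append(match.group(1))
    return found


sample = "1. Atmospheric (7800)\n2. Atmospheric composition (90100)\n3.Air quality (10110)"
print("\n".join(keywords(sample)))   # one set of key words per line
print(", ".join(keywords(sample)))   # all on one line, comma separated
```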
Try replacing ^\d+\.(.*) \(\w+\)$ with <a href=blah.com\\\1.htm>linky study</a>.
The ^\d+\. matches the leading number and dot, so they are dropped from the result. The (.*) collects the words. Then there is a single space. The \(\w+\)$ matches the final number in brackets.
Update for the added Q4.
Regular expressions capture whatever is matched between round brackets ( and ). Brackets that are to be found in the text being searched must be escaped as \( and \). In the replacement expression, \1, \2, etc. are replaced by the text of the corresponding capture group. So a search expression such as Z(\d+)X([aeiou]+)Y might match Z29XeieiY, and the replacement expression P\2Q\1R would then insert PeieiQ29R. In the search at the top of this answer there is one capture: the (.*) captures, or collects, the words, and then the \1 inserts the captured words into the replacement text.
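The Z29XeieiY example can be reproduced in any engine that uses \1 and \2 in replacements. A minimal Python sketch (the function name swap_groups is hypothetical):

```python
import re


def swap_groups(text: str) -> str:
    """Apply the search Z(\\d+)X([aeiou]+)Y with the replacement P\\2Q\\1R."""
    # Group 1 is the run of digits, group 2 the run of vowels.
    return re.sub(r"Z(\d+)X([aeiou]+)Y", r"P\2Q\1R", text)


print(swap_groups("Z29XeieiY"))
```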