Regex to insert text BEFORE a line containing a match?

I have a bunch of artists that are named in this fashion:
Killers, The
Treatment, The
Virginmarys, The
I need them to look like
The Killers
The Treatment
The Virginmarys
I'm able to match the lines with , The ((^|\n)(.*, The) is what I've used) but the more advanced syntax is eluding me. I can use regex on the replacement syntax as well (it's for a TextPipe filter so it might as well be for Notepad++ or any other Regex text editor).

You should be able to use the following:
Find: (\S+),\s\S*
Replace: The $1
Or capture the "The" as well and swap the two groups:
Find: (\S+),\s+(\S+)
Replace: $2 $1
Depending on your editor, you may be better off using \1, \2, and so on for capture groups.

Since you need to specifically capture the title before the comma, do so:
(^|\n)(.*), The
And replace it putting the "the" in the right place:
\1The \2
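For anyone who ends up scripting this instead of using an editor, here is a minimal Python sketch of the same idea. The function name `move_article` and the extra "Muse" line are made up for illustration, and it assumes each artist sits on its own line and ends in exactly ", The".

```python
import re

def move_article(text: str) -> str:
    # With re.MULTILINE, ^ and $ match at the start and end of every line,
    # so no (^|\n) group is needed and \1 is the part before ", The".
    return re.sub(r"^(.*), The$", r"The \1", text, flags=re.MULTILINE)

artists = "Killers, The\nTreatment, The\nVirginmarys, The\nMuse"
print(move_article(artists))
```

Because the capture is `.*` rather than `\S+`, multi-word names should be moved as a whole, and lines without ", The" are left alone.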

Regular expressions define matches but not substitutions.
How you can perform substitutions is highly dependent on the application.
Most editors that provide regular expression support work on a line-by-line basis.
Some of them will allow substitutions such as
s/^(.*Banana)/INSERTED LINE\n\1/
which would then insert the specific pattern before each match. Note that others may not allow newlines in the substitution pattern at all. In VIM, you can input newlines into the command prompt using Ctrl+K Return Return. YMMV.
In Java, you would just first print the insertion text, then print the matching line.
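The same "insert, then emit the matching line" idea can be sketched in Python with a single substitution. The helper name `insert_before_matches` and its parameters are hypothetical, and the sketch assumes the text uses plain "\n" line endings.

```python
import re

def insert_before_matches(text: str, needle: str, inserted: str) -> str:
    # Match every whole line that contains the needle (taken literally).
    pattern = re.compile(r"^(.*" + re.escape(needle) + r".*)$", re.MULTILINE)
    # A function replacement avoids any escaping issues in the inserted text.
    return pattern.sub(lambda m: inserted + "\n" + m.group(1), text)

print(insert_before_matches("Apple\nBig Banana\nCherry", "Banana", "INSERTED LINE"))
```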

Related

Replace wrong xml-comments with Regex

I am dealing with a bunch of xml files that contain one-line-comments like this: // Some comment.
I am pretty sure that xml comments look like this: <!-- Some comment -->
I would like to use a regular expression in the Atom editor to find and replace all wrong comment syntax.
According to this question, the comment can be found with (?<=\s)//([^\n\r]*) and replaced with something like <!--$1-->. There must be an error somewhere, since clicking the replace button leaves the comment as is instead of replacing it. Actually, I can't even replace it with a simple character.
The find and replace works with a different regex in the "Find" field:
Find: name.*
Replace: baloon
Is there anything I can write in the "Find" and "Replace" field to achieve this transformation?
Atom editor search and replace currently does not support lookbehind constructs, like (?<=\s). To "imitate" it, you may use a capturing group with an alternation between start of string, ^, and a whitespace, \s.
So, you may use
Find What: (^|\s)//([^\n\r]+)
Replace With: $1<!--$2-->
See the regex demo. NOTE: \s may match newlines, so you may want to use (^|[^\S\r\n])//([^\n\r]+) to avoid matching across line breaks.
If you do not need to check for a whitespace, just remove that first capturing group and use a mere:
Find What: //([^\n\r]+)
Replace With: <!--$1-->
See another regex demo.
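If the files are processed outside Atom, the capture-group workaround carries over to Python more or less unchanged. This is only a sketch: `fix_comments` is a made-up name, and it assumes that any // preceded by a space, a tab or the start of a line really is a comment.

```python
import re

# Group 1 stands in for the lookbehind: start of line or a non-newline whitespace.
COMMENT = re.compile(r"(^|[^\S\r\n])//([^\n\r]+)", re.MULTILINE)

def fix_comments(text: str) -> str:
    # Put group 1 back so the leading whitespace is not lost.
    return COMMENT.sub(r"\1<!--\2-->", text)

print(fix_comments("<a/> // Some comment."))
```

A URL such as http://example.com is left alone here because its // follows a colon, not whitespace.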

Regex in search & replace: avoid fixed length of lookaround

In a long corpus of text, I want to make some corrections in certain
environments. However, I am encountering problems when using regex with text
editors. I switched to gedit to have an editor which supports regex in
search & replace.
Crucially, I only want to make changes if the line starts with a certain
pattern (\nm or \mb). The problem is that the element that I want to
replace (o' -> o'o) is not at a fixed length from the beginning of the line
and I can't include the regex in the lookbehind (the lookbehind fails).
Is there any way to include what I am looking for in a simple text editor
regex? Or is this already a step where I have to learn how to script in, for
example, Python?
This is what the regex looks like so far.
(?<=\\(nm|mb)).*o'(?=(q|w|r|t|z|p|s|d|f|g|h|j|k|l|x|c|v|b|n|m|a|i|u|e))
Of course, I can't apply .* in the replace without losing its content.
Put a capture group around .* and a back-reference in the replacement. Note that (nm|mb) inside the lookbehind is already group 1, so the new group is group 2.
Find: (?<=\\(nm|mb))(.*)o'(?=(q|w|r|t|z|p|s|d|f|g|h|j|k|l|x|c|v|b|n|m|a|i|u|e))
Replace: \2o'o
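Since the question asks whether this is the point to start scripting, here is a rough Python sketch of a two-step approach: first check the line prefix, then substitute within that line. The names `LINE_PREFIX`, `TARGET` and `fix_lines` are made up, and it assumes the markers are a literal backslash followed by nm or mb at the very start of a line.

```python
import re

# A line qualifies only if it starts with a literal \nm or \mb.
LINE_PREFIX = re.compile(r"\\(?:nm|mb)")
# o' followed by one of the letters listed in the question's lookahead.
TARGET = re.compile(r"o'(?=[qwrtzpsdfghjklxcvbnmaiue])")

def fix_lines(text: str) -> str:
    fixed = []
    for line in text.split("\n"):
        if LINE_PREFIX.match(line):
            line = TARGET.sub("o'o", line)
        fixed.append(line)
    return "\n".join(fixed)

print(fix_lines("\\nm to'ka so'mi\n\\tx to'ka"))
```

Splitting the check from the substitution sidesteps the variable-length lookbehind altogether, and it changes every o' on a qualifying line, whereas a single greedy .* in a find/replace only reaches the last one per pass.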

How to replace all lines based on previous lines in Notepad++?

I have an XML code:
<Line1>Matched_text Other_text</Line1>
<Line2>Text_to_replace</Line2>
How can I tell Notepad++ to find Matched_text and replace Text_to_replace with Replaced_text? There are several similar blocks of code, each with exactly the same Matched_text but different Other_text and Text_to_replace. I want to replace them all at once.
My idea is to put
Matched_text*<Line2>*</Line2>
in the Find field, and
Matched_text*<Line2>Replaced_text</Line2>
in the Replace field. I know that \1 in regex might be useful, but I don't know where to start.
The actual code is:
<Name>Matched_text, Other_text</Name>
<IsBillable>false</IsBillable>
<Color>-Text_to_replace</Color>
The regex you're looking for is something like the following.
Find: (Matched_text[\w,\s<>\/]*<Color>-).*(</Color>)
Replace: \1Replaced_text\2
Broken down:
`()` tells the regex engine what you want to keep (for use as \1, \2, etc.); these are called capture groups in regex land.
`Matched_text[\w,\s<>\/]*` matches your anchor `Matched_text` and everything after it up to the next part of the expression.
`<Color>-).*(</Color>)` selects everything between <Color>- and </Color> for replacement.
If you have any questions about the expression, I highly recommend looking at a regex cheatsheet.
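To see how the expression behaves across several blocks, it can be tried in Python. This is only a sketch: the function name `replace_color` and the sample values (A, 111, Different, 222) are invented, and it assumes every block has the same three-line shape as in the question.

```python
import re

# Same expression as above; the character class deliberately excludes "-",
# so the first group cannot run past the "<Color>-" that follows the anchor.
BLOCK = re.compile(r"(Matched_text[\w,\s<>/]*<Color>-).*(</Color>)")

def replace_color(xml: str, new_value: str) -> str:
    return BLOCK.sub(lambda m: m.group(1) + new_value + m.group(2), xml)

sample = (
    "<Name>Matched_text, A</Name>\n"
    "<IsBillable>false</IsBillable>\n"
    "<Color>-111</Color>\n"
    "<Name>Different, B</Name>\n"
    "<IsBillable>true</IsBillable>\n"
    "<Color>-222</Color>"
)
print(replace_color(sample, "Replaced_text"))
```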

Select word with regex when previous words are specific (and sometimes variable)

I am trying to highlight (or find) any word that is preceded by another word, being define, and another specific word to be highlighted (as), when define is present, etc. Basically, I need to find words that are found because of other regex searches, but only targeting each word independently.
For example, having the following string:
define MyFile as File
In that case, define is searched using the regex statement \b-?define\b. I also need to find MyFile if it is preceded directly by define. Plus, as needs to be found as well only if it is preceded directly by a word, in this case MyFile, which is preceded by define, and this goes on and on.
How can this be done? I have messed around quite a bit to find how to highlight MyFile correctly, without any success. As for the specific recursive search of as and File, I am clueless.
Keep in mind that all the regex expressions must be separate, since I will use this as a Sublime Text custom syntax highlight match finder.
define\s([\w]+)\sas\s([\w]+)$
This regex captures the word after define and the word after as, each separated by a single whitespace character.
Check this regex: https://regex101.com/r/aQ0yO0/2
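A quick way to check what the two groups capture, and where they sit in the line, is a few lines of Python. The helper name `parse_define` is made up, and this only illustrates the capture groups; it does not show how Sublime Text consumes them.

```python
import re

DEFINE = re.compile(r"define\s(\w+)\sas\s(\w+)$")

def parse_define(line: str):
    # Returns (name, type) plus the character span of each group, or None.
    m = DEFINE.search(line)
    if m is None:
        return None
    return m.groups(), m.span(1), m.span(2)

print(parse_define("define MyFile as File"))
```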
Without knowing what the data looks like, this is a naive way of doing it, but it's pretty intuitive. However, it doesn't use regex. The other examples are good ways to use regex.
seq = "word1 defined as blah blahh blahhh word2 defined as hello helloo"
words_of_interest = []
list_of_words = seq.split(" ")
for i, word in enumerate(list_of_words):
    # collect the word that directly precedes each "defined"
    if word == "defined" and i > 0:
        words_of_interest.append(list_of_words[i - 1])
print(words_of_interest)
# ['word1', 'word2']
The regular expression is always going to encompass the "define" as well. The trick is to use capture groups and refer to them afterwards. The specific way how to do this depends on the "flavor" of your regex.
As I'm not familiar with Sublime's regex, I'm just going to present an example in sed:
$ sed -e 's/define \([A-Za-z]*\)/include \1/g' <<< "define MyFile as File"
include MyFile as File
This example replaces all "define"s with "include"s - and adds whatever was captured by what's inside the group (the regex [A-Za-z]* in this case). Not too useful, but hopefully explanatory :)
The capture group is denoted by the escaped brackets, and (in sed) referenced by the escaped number (representing the index) of the group.
I believe it's capture groups as a concept that you're looking for, rather than any specific regex.
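For comparison, the sed command above translates to Python roughly as follows. The function name `define_to_include` is made up; note that Python uses unescaped parentheses for groups but still refers to them as \1 in the replacement.

```python
import re

def define_to_include(line: str) -> str:
    # ([A-Za-z]*) is group 1; \1 puts it back after the new keyword.
    return re.sub(r"define ([A-Za-z]*)", r"include \1", line)

print(define_to_include("define MyFile as File"))
```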

Remove text appearing after numbers in Notepad++ using regular expressions

I have a large text file which contains many timestamps. The timestamps look like this: 2013/11/14 06:52:38AM. I need to remove the last two characters (am/pm/AM/PM) from each of these. The problem is that a simple find and replace of "AM" may remove text from other parts of the file (which contains a lot of other text).
I have done a find using the regular expression (:\d\d[ap]m), which in the above example would track down the last bit of the timestamp: :38AM. I now need to replace this with :38, but I don't know how this is done (allowing for any combination of two digits after the colon).
Any help would be much appreciated.
EDIT: What I needed was to replace (:\d\d)[ap]m with \1
Make (:\d\d[ap]m) into (:\d\d)[ap]m and use $1 not \1
Go to Search > Replace menu (shortcut CTRL+H) and do the following:
Find what:
[0-9]{2}\K[AP]M
Replace:
[leave empty]
Select radio button "Regular Expression"
Then press Replace All
You can test it at regex101.
Note: [0-9] is generally a safer choice than \d, which can also match non-ASCII digits in some regex engines, and using \K instead of a capture group with $1 is often considered cleaner. It hardly matters in your case, but it is good to know :)
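If the file is processed with a script instead, the capture-group version is the one that ports to Python, since the standard re module does not support \K as far as I know. A minimal sketch, with `strip_meridiem` as a made-up name and case-insensitive matching assumed so that am/pm/AM/PM are all covered:

```python
import re

# Keep ":" plus two digits (group 1) and drop the am/pm that follows.
STAMP_SUFFIX = re.compile(r"(:\d\d)[ap]m", re.IGNORECASE)

def strip_meridiem(text: str) -> str:
    return STAMP_SUFFIX.sub(r"\1", text)

print(strip_meridiem("2013/11/14 06:52:38AM AM radio"))
```

Because the pattern requires a colon and two digits directly before the suffix, a standalone "AM" elsewhere in the text is not touched.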