Notepad++ Regex - Finding and replacing multiple different criteria simultaneously

I've just started to get to grips with regex in Notepad++, and I've tasked myself with formatting a chunk of JSON data into something human readable, as well as something that can be read into an algorithm a colleague of mine wrote. I've found a few regular expressions that do this perfectly, but to get my desired result I have to run four separate Find/Replace steps. Is there a way to create one single find/replace expression that handles all of the tasks below?
Currently I have Notepad++ doing the following:
Deleting all quotation marks by finding " and replacing it with nothing
Deleting all commas by finding , and replacing it with nothing
Changing all underscored numbers that are followed by a colon to the number 0 (the reason behind this is particular to the project) by finding _[0-9]*: and replacing with _0
Finally, putting every occurrence of a particular expression onto its own line by finding the start of the particular string I'm after and adding \n
I know that's convoluted, but fortunately it does the job. Is there any way of consolidating all that into a single command, or does that all have to be done step by step?
Thanks guys :)

Notepad++ lets you combine individual search-and-replace operations into a macro, which you can also save.
Hit the record button in the toolbar (or Macro > Start Recording).
Perform the regex replacements in the required order.
Hit the stop button in the toolbar (or Macro > Stop Recording).
Hit the play button to perform all the required replacement operations again.
Save the macro via Macro > Save Current Recorded Macro.
As for the first two replacements, you could merge them into one by finding the following expression and replacing it with nothing: (?:"|,)
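The recorded macro simply replays each Find/Replace in order. As a rough illustration outside Notepad++, here is a minimal Python sketch of the same sequence, with the quote and comma steps merged via (?:"|,). The `line_start` parameter is a hypothetical stand-in for "the particular string" in step four, which the question does not name.

```python
import re

# Steps applied in order, the way a recorded macro replays them.
STEPS = [
    (r'(?:"|,)', ''),     # steps 1 and 2 merged: delete quotes and commas
    (r'_[0-9]*:', '_0'),  # step 3: as written, this also consumes the colon
]


def run_steps(text, line_start=None):
    """Apply each regex replacement in sequence.

    line_start is hypothetical: the question does not say which string
    is moved onto its own line in step 4.
    """
    for pattern, replacement in STEPS:
        text = re.sub(pattern, replacement, text)
    if line_start is not None:
        # Step 4: start a new line before every occurrence of line_start.
        text = text.replace(line_start, '\n' + line_start)
    return text


if __name__ == "__main__":
    print(run_steps('{"id_12":"a","name_3":"b"}', line_start='name'))
```

Note that, exactly as in the question, replacing _[0-9]*: with _0 drops the colon; if the colon should survive, the replacement would need to be _0: instead.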

Related

Search in VSCode for the multiline contents of a set of XML tags, using a regular expression

I am using VSCode to do a global search of XML files. Within those files there are multiple instances of these XML tags: <translated></translated>. I need to find every hyphen (-) that appears anywhere between those tags, where the contents of the tags can span multiple lines.
<translated>
Content is here
Could be on multiple lines
The meeting could take 3-4 hours
</translated>
In the above example, the phrase "3-4 hours" has a hyphen in it. I need a regex that works for VSCode which finds all incidences of hyphens which happen to be within a set of these XML tags.
Option 1 (using VS Code)
This only matches one dash at a time, not all dashes: because the search is limited to the inside of one set of tags, each pass can only find one dash per set. I was going to delete this answer, but if it's the only answer given it may be better than nothing. The workaround is to refresh the search (button above the search box) and click Replace All over and over. If there are lots of dashes this would be annoying, but it's better than no answer.
I have been fiddling with Visual Studio Code and the following seems to work.
(<translated>(.|\n)*?)(-)((.|\n)*?<\/translated>)
If you want to replace the dash, for example, you can add groups 1 and 4 back around the new text:
$1 <yourTextHere> $4
Example (the before/after screenshots are not reproduced): after the replacement, only the 3-4 in the first section of the file(s) is affected, and the 3 to 4 is not changed.
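To illustrate why Option 1 needs repeated passes, here is a minimal Python sketch of the same pattern and the "$1 to $4" style replacement. Python does not need the \/ escape, and the loop that repeats until nothing changes stands in for refreshing the search and clicking Replace All again; the function name and the " to " default are illustrative assumptions.

```python
import re

# Option 1 pattern in Python syntax. Group 1 runs from <translated> to the
# dash, group 3 is the dash, group 4 runs to </translated>.
BLOCK_DASH = re.compile(r'(<translated>(.|\n)*?)(-)((.|\n)*?</translated>)')


def replace_dashes(text, new_text=' to '):
    """Repeat the replacement until nothing changes; return (text, passes).

    Each match extends to the block's closing tag, so one pass fixes at most
    one dash per <translated> block. new_text must not contain '-', or the
    loop would never settle.
    """
    passes = 0
    while True:
        # A function replacement inserts new_text literally (no escapes).
        updated = BLOCK_DASH.sub(lambda m: m.group(1) + new_text + m.group(4), text)
        if updated == text:
            return text, passes
        text, passes = updated, passes + 1
```

One thing the sketch makes easy to check: if a <translated> block contains no dash, the lazy first group can run past its closing tag to the next dash in the file, so a dash between two blocks may also be matched.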
Option 2 / Update (using Brackets.io)
While I'm unsure what causes VSCode to fail to match across files, the following regex works with Brackets (see Brackets.io) across multiple files:
-(?=[^<]*?<\/translated>)
You have to put all your files in a folder and open that folder, then search the project (Find > Find in Files). The screenshot showed the matches found across all files: in the lower panel, for the selected file t2 copy.txt, it matches first on line 6 and then on line 16, and (correctly) does not match on line 10 because that line is not inside a translated tag set.
The reason -(?=[^<]*?<\/translated>) doesn't work in VSCode is that it does not EXPLICITLY contain a newline \n. Even though [^<] matches newlines, a \n has to be actually written into the regex to trigger VSCode's multiline search. Why is this?
See https://github.com/microsoft/vscode/issues/75265, which uses a similar regex. The issue makes for interesting reading ;>} In short, it is primarily for performance reasons.
So simply using this
-(?=[^<]*?\n*<\/translated>)
works in vscode!
-(?=[^<]*?\n<\/translated>) would work for you too unless you have single line blocks like:
<translated>Con-tent is he-re</translated>
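Outside VSCode the explicit \n is not needed. As a minimal Python sketch of the lookahead idea: [^<] already matches newlines, and because the lookahead consumes nothing, a single finditer pass finds every dash, including those in single-line blocks like the one above. The sample text and function name are illustrative.

```python
import re

# A dash counts only if </translated> follows it before any other '<'.
# Caveat: a dash followed by a nested tag inside the block is not found.
DASH_IN_TRANSLATED = re.compile(r'-(?=[^<]*?</translated>)')


def translated_dash_offsets(text):
    """Return the offsets of the dashes inside <translated> blocks."""
    return [m.start() for m in DASH_IN_TRANSLATED.finditer(text)]


sample = (
    "<translated>\n"
    "The meeting could take 3-4 hours\n"
    "</translated>\n"
    "Outside: 5-6\n"
    "<translated>Con-tent is he-re</translated>\n"
)
```

The 3-4 dash and both dashes in the single-line block are found; the 5-6 dash is skipped because the next '<' after it opens a tag rather than closing a translated block.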

Visual Studio Code - Removing Lines Containing criteria

This probably isn't a VS Code-specific question but it's my tool of choice.
I have a log file with a lot of lines containing the following:
Company.Environment.Security.RightsBased.Policies.RightsUserAuthorizationPolicy
Those are debug-level log records that clutter the file I'm trying to process. I'm looking to remove the lines with that content.
I've looked into regex, but unlike removing a blank line, where the whole content is in the search criteria (making find/replace easy), here I think I need to match from line break to line break based on some criteria between the two...
What are your thoughts on how criteria like that would work?
If the criterion is a particular string and you don't want to have to remember regexes, there are a few handy keyboard shortcuts that can help you out. I'm going to assume you're on a Mac.
Cmd-F to open find.
Paste your string.
Opt-Enter to select all of the instances of the string on the page.
Cmd-L to broaden the selection to the entire line of each instance on the page.
Delete/Backspace to remove those lines.
I think you should be able to just search for ^.*CONTENT.*$\n, where CONTENT is the text you showed us with its dots escaped. That is, search on the following pattern:
^.*Company\.Environment\.Security\.RightsBased\.Policies\.RightsUserAuthorizationPolicy.*$\n
Then replace it with an empty string.
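For a scripted version of the same idea, here is a minimal Python sketch that builds ^.*CONTENT.*$\n from the literal string with re.escape (so each dot matches only a dot) and enables re.MULTILINE so ^ and $ apply per line. Like the editor pattern, it relies on the trailing newline, so a matching last line without one would be left in place.

```python
import re

NOISY = "Company.Environment.Security.RightsBased.Policies.RightsUserAuthorizationPolicy"

# ^.*CONTENT.*$\n with CONTENT escaped; '.' does not cross newlines by
# default, so each match stays within a single line plus its line break.
NOISY_LINE = re.compile(r'^.*' + re.escape(NOISY) + r'.*$\n', re.MULTILINE)


def drop_matching_lines(log_text):
    """Remove every line that contains NOISY."""
    return NOISY_LINE.sub('', log_text)
```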
I have already upvoted james's answer, but I also found another easy, feature-rich extension for VS Code. Here it is.
It has very easy options for applying filters.
For the specific case in the question, I am attaching a screenshot that shows how to use it. I am posting this for others who come here looking for the same issue (as I did).

If duplicate within brackets, delete one of the lines

Hi, I have a long list of items (~6k) in this format:
'Entry': ['Entry'],
What I want is this: if the first quoted words match, e.g.:
'ACT': ['KOSOV'],
'ACT': ['STIG'],
I want only one of the entries to remain. It doesn't matter which one (the first, the second, or any other); I just need one of them left.
If possible, I would like to do this in Sublime Text or Notepad++ using regex; if there is no way, then do whatever you think is best to solve this.
Update: the AWK command did the job indeed, thank you.
You can't solve this using just regular expressions. You either need to remember all entries you've seen so far while scanning the text (would require writing a small utility program, probably), or you could sort the entries and then remove any repeated entries.
If you have a sorted file, then you can solve it using a regular expression, such as this one:
^(([^:]+):.+\n)(?:\2.+\n)+
Replace with \1.
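Both routes from the answer can be sketched in Python: sorting so equal keys become adjacent and then applying the same regex, or remembering the keys seen so far. This is a minimal sketch; it assumes the key is everything before the first colon on each line, and the function names are illustrative.

```python
import re

# The answer's pattern. Python needs re.MULTILINE so that ^ matches at the
# start of every line, not just at the start of the text.
DUP_RUN = re.compile(r"^(([^:]+):.+\n)(?:\2.+\n)+", re.MULTILINE)


def dedupe_sorted(text):
    """Sort the lines, then collapse each run sharing a key to its first line."""
    lines = sorted(text.splitlines())
    return DUP_RUN.sub(r"\1", "\n".join(lines) + "\n")


def dedupe_keep_first(text):
    """The other route: remember keys already seen, keeping original order."""
    seen = set()
    kept = []
    for line in text.splitlines(keepends=True):
        key = line.split(":", 1)[0]
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return "".join(kept)
```

The sorted version changes the line order and keeps whichever duplicate sorts first; the seen-set version keeps the first occurrence in the original order, which is closer to what a one-line AWK filter typically does.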

How do I join two regular expressions into one in Notepad++?

I've been searching a lot in the web and in here but I can't find a solution to this.
I have to make two replacements in all registry paths saved in a text file as follows:
replace every asterisk with [#42]
replace all single backslashes with two.
I already have two expressions that do this correctly:
1st case:
Find: (\*) - Replace: \[#42\]
2nd case:
Find: ([^\\])(\\)([^\\]) - Replace: $1$2\\$3
Now all I want is to join them into a single expression so that I can run it in one pass.
I'm using Notepad++ 6.5.1 in Windows 7 (64 bits).
Example line I want this to work on (I've included the backslashes, but I don't know if they will display correctly in the HTML):
HKLM\SOFTWARE\Classes\*\shellex\ContextMenuHandlers\
I already tried separating them with a pipe, as I do in JScript (WSH), but it doesn't work here. I also tried a lot of other things, but none worked.
Any help?
Thanks!
Edit: I have put all the backslashes in correctly, but the page's HTML seems to be "eating" some of them!
Edit2: Someone re-edited my text to add backticks so the backslashes aren't removed, which made the expressions wrong again. But I got it and fixed it. ;-)
Sorry, but this was my first post here. :)
As everyone else has already mentioned, this is not possible.
But, you can achieve what you want in Notepad++ by using a Macro.
Go to "Macro" > "Start Recording", apply those two regular expression search-and-replaces, press "Stop Recording", then choose "Save Current Recorded Macro", give it a name, assign a shortcut, and you are done. You can now reuse the same replacements whenever you want with one shortcut.
Since your replacement strings are totally different and use data that does not come from any capture (e.g. [#42]), you can't.
Keep in mind that replacement strings are only masks and cannot contain any conditional content.
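In a general-purpose language the "one expression" idea is straightforward, because the replacement can be a function that checks which alternative matched. This is a minimal Python sketch, not something Notepad++'s replacement field does; it also swaps the original ([^\\])(\\)([^\\]) for lookarounds so the neighbouring characters are not consumed, which is my assumption rather than the asker's expression.

```python
import re

# Alternative 1: an asterisk. Alternative 2: a backslash with no backslash
# directly before or after it (lookarounds check without consuming).
PATTERN = re.compile(r'(\*)|(?<!\\)(\\)(?!\\)')


def escape_registry_path(path):
    """Replace '*' with [#42] and double single backslashes in one pass."""
    def repl(m):
        if m.group(1):       # the asterisk alternative matched
            return '[#42]'
        return '\\\\'        # two literal backslash characters
    return PATTERN.sub(repl, path)
```

Unlike the original two-step version, the lookaround form also doubles a backslash at the very start or end of the text, since it does not need a character on each side.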

Regex select XML Element (containing hyphen) and inside content

I'm working with an enterprise CMS, and in order to properly create our weekly-updated dropdown menu without republishing our entire site, I have an XML document being generated that contains a number of useful XML elements. However, when pulling in a link with the CMS, the generated XML also outputs the link's contents (the entire HTML for the page). Needless to say, with roughly 50 items, the XML file is too big for use on the web (as it stands I think it's over 600KB). The element is <page-content>filler here</page-content>.
What I'm trying to do is use TextWrangler to find and replace all <page-content> tags as well as their contents.
I've tried a few different regexes, but I can't seem to match the closing tag, so the match just runs on.
Here's what I've tried:
(<page-content>)(.*?)
The above will match up until the next starting <page-content> tag, which is not what I want.
(<page-content>)(.*?)(<\/page-content>)
(<page-content>)(.*?)(<\/page\-content>)
The two patterns above find no matches, even though the one below finds the 7 matches it should.
(<content>)(.*?)(<\/content>)
I don't know if there's a special way to deal with hyphens (I'm inexperienced in regular expressions), but if anyone could help me out, it would be greatly appreciated.
Thanks!
EDIT: Before you tell me that regex isn't meant to parse HTML, I know that, but there seems to be no other way for me to easily find and replace this. There are too many occurrences to manually delete them and save the file again every week.
It seems the problem is that your . is not matching newlines that exist between your open and close tags.
An easy solution is to add the s flag so that your . also matches newlines. TextWrangler appears to support inline modifiers such as (?s), so you could do it like this:
(<page-content>)(?s)(.*?)(<\/page-content>)
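The same fix can be sketched in Python. One difference: recent Python versions reject a global (?s) placed in the middle of a pattern, so here it goes at the very start (passing re.DOTALL would be equivalent). The lazy .*? still stops at the first closing tag, so each element is removed separately.

```python
import re

# (?s) at the start makes '.' match newlines too, like the s flag above.
PAGE_CONTENT = re.compile(r'(?s)(<page-content>)(.*?)(</page-content>)')


def strip_page_content(xml_text):
    """Delete every <page-content> element together with its contents."""
    return PAGE_CONTENT.sub('', xml_text)
```

Without the flag, the pattern finds nothing whenever the element's contents span several lines, which matches the symptom described in the question.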