Delete/Replace multiple instances of two consecutive patterns using RegEx in Notepad++ - regex

Following is sample text.
Header
This is test, and
mid line adsf
bag a lot of many things
shoes>
shoes/
This line should not be affected.
Neither this line.
This is test, and
mid line
bag a lot of many things
shoes>
shoes/
Footer
Need to replace/remove lines where text starts with This is test and ends with
shoes>
shoes/
using Notepad++ 7.5
Newbie to RegEx, following were my trials, failed to select multiple instance.
(This is test).*(shoes/)
^This is test.*shoes/$
Expected output is
Header
This line should not be affected.
Neither this line.
Footer

^This is test((.|\n|\r)*?)shoes/$ should match; it takes into account the newlines. Edit: even shorter and better is ^This is test(?s:.*?)shoes/$ as suggested in the comments by #Wiktor.

Related

Regex to select two spaces between words (or two spaces before a letter)

I'm cleaning up some ancient HTML help files and there are a lot of double-spaces that I'd like to clean up (replace each double-space with single).
Sample: <li><b>% Successful</b>. The percentage of jobs that returned a confirmation.</li>
I want to find double-spaces only before the start of sentences or between words (so between the setting label or between 'percentage' and 'of', not spaces in isolation or before the XML tags).
I tried a simple search for two spaces, but that also brings up the tab/space mixture that the creator used for formatting the indents, so I'm getting five useless results for every relevant one.
Is there a single regex that would help with both use cases, or is it better to use two different ones for each format? I'm fine either way, just am still pretty new to regexes and not sure where to start on this one.
Spaces between periods and text (three here for a better visual):
<li><b>Submit Time</b>. The time the job was scheduled.</li>
Spaces between words (three here as well):
<li><b>End Time</b>. The date/time when the job was completed or canceled.</li>
I'm looking for multiple spaces either between words or at the start of sentences (generally after the setting name period).

How to ignore specific charactor and new line using regex

I am trying to validate a csv file using Apache-NiFi.
My CSV file has some defects.
id,name,address
1,sachith,{"Lane":"ABC.RTG.EED","No":"12"}
2,nalaka,{"Lane":"DEF",
"No":"23"}
3,muha,{"Lane":"GRF.FFF","No":"%$&%*^%"}
Here in second row,its been divided into two lines and third row has some special characters.
I want to ignore both the lines. For that I use \{("\w+":"\w+",)*[^%&*#]*\}, but this is not capturing row split error and new line.
I also used \{("\w+":"\w+",)*[^%&*#]*\}$, but it doesnt even get the right answer.
This is you might looking for: ^[0-9]+,[a-z]+,\{("\w+":"[\w\.]+","\w+":"[a-zA-Z0-9]+")\}$

Notepad++ - Selecting or Highlighting multiple sections of repeated text IN 1 LINE

I have a text file in Notepad++ that contains about 66,000 words all in 1 line, and it is a set of 200 "lines" of output that are all unique and placed in 1 line in the basic JSON form {output:[{output1},{output2},...}]}.
There is a set of characters matching the RegEx expression "id":.........,"kind":"track" that occurs about 285 times in total, and I am trying to either single them out, or copy all of them at once.
Basically, without some super complicated RegEx terms, I am stuck because I can't figure out how to highlight all of them at once, and also the Remove Unbookmarked Lines feature does not apply because this is all in one line. I have only managed to be able to Mark every single occurrence.
So does this require a large number of steps to get the file into multiple lines and work from there, or is there something else I am missing?
Edit: I have come up with a set of Macro schemes that make the process of doing this manually work much faster. It's another alternative but still takes a few steps and quite some time.
Edit 2: I intended there to be an answer for actually just highlighting the different sections all at once, but I guess that it not possible. The answer here turns out to be more useful in my case, allowing me to have a list of IDs without everything else.
You seem to already have a regex which matches single instances of your pattern, so assuming it works and that we must use Notepad++ for this:
Replace .*?("id":.........,"kind":"track").*?(?="id".........,"kind":"track"|$) with \1.
If this textfile is valid JSON, this opens you up to other, non-notepad++ options, like using Python with the json module.
Edited to remove unnecessary steps

Vim: How to apply external command only to lines matching pattern

Two of my favorite Vim features are the ability to apply standard operators to lines matching a regex, and the ability to filter a selection or range of lines through an external command. But can these two ideas be combined?
For example, I have a text file that I use as a lab notebook, with notes from different dates separated by a line of dashes. I can do something like delete all the dash-lines with something like :% g/^-/d. But let's say I wanted to resize all the actual text lines, without touching those dash lines.
For a single paragraph, this would be something like {!}fmt. But how can this be applied to all the non-dash paragraphs? When I try what seems the logical thing, and just chain these two together with :% v/^-/!fmt, that doesn't work. (In fact, it seems to crash Vim...)
Is there a way to connect these two ideas, and only pass lines (not) matching a pattern into an external command like fmt?
Consider how the :global command works.
:global (and :v) make two passes through the buffer,
first marking each line that matches,
then executing the given command on the marked lines.
Thus if you can come up with a command – be it an Ex command or a command-line tool – and an associated range that can be applied to each matching line (and range), you have a winner.
For example, assuming that your text is soft-wrapped and your paragraphs are simply lines that don't begin with minus, here's how to reformat the paragraphs:
:v/^-/.!fmt -72
Here we used the range . "current line" and thus filtered every matching line through fmt. More complicated ranges work, too. For instance, if your text were hard-wrapped and paragraphs were defined as "from a line beginning with minus, up until the next blank line" you could instead use this:
:g/^-/.,'}!fmt -72
Help topics:
:h multi-repeat
:h :range!
:h :range
One way to do it may be applying the command to the lines matching the pattern 'not containing only dashes'
The solution I would try the is something like (not tested):
:g/\v^(-+)#!/normal V!fmt
EDIT I was doing some experiments and I think a recurvie macro should work for you
first of all set nowrapscan:
set nowrapscan
To prevent the recursive macro executing more than you want.
Then you make a search:
/\v^(-+)#!
Test if pressing n and p works with your pattern and tune it up if needed
After that, start recording the macro
qqn:.!awk '{print $2}'^M$
In this case I use awk as an example .! means filter current line with an external program
Then to make the macro recursive just append the string '#q' to the register #q
let #q .= '#q'
And move to the beggining of the buffer to apply the recursive macro and make the modifications:
gg#q
Then you are done. Hope this helps

PowerShell isolating parts of strings

I have no experience with regular expressions and would love some help and suggestions on a possible solution to deleting parts of file names contained in a csv file.
Problem:
A list of exported file names contains a random unique identifier that I need isolated. The unique identifier has no predictable pattern, however the aspects which need removing do. Each file name ends with one of the following variations:
V, -V, or %20V followed by a random number sequence with possible spaces, additional "-","" and ending with .PDF
examples:
GTD-LVOE-43-0021 V10 0.PDF
GTD-LVOE-43-0021-V34-2.PDF
GTD-LVOE-43-0021_V02_9.PDF
GTD-LVOE-43-0021 V49.9.PDF
Solution:
My plan was to write a script to select of the first occurrence of a V from the end of the string and then delete it and everything to the right of it. Then the file names can be cleaned up by deleting any "-" or "_" and white space that occurs at the end of a string.
Question:
How can I do this with a regular expression and is my line of thinking even close to the right approach to solving this?
REGEX: [\s\-_]V.*?\.PDF
Might do the trick. You'd still need to replace away any leading - and _, but it should get you down the path, hopefully.
This would read as follows..
start with a whitespace, - OR _ followed by a V. Then take everything until you get to the first .PDF