Replace multiple sentences between 2 expressions in multiple files Notepad ++ - regex

I have 58K files where I need to find this expression
()">A Random sentence.</A></P>
and I need to replace "A Random sentence." with nothing.
I was trying on Notepad++ something like
Find What: ()">[[:alnum:][:punct:][:space:]]</A></P>
Replace: <empty>
I'm not even getting any results from the search...
Waiting for some feedback.

Try to find
(\(\)">).*(<\/A><\/P>)
and replace it with
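$1$2
The idea is to capture the left and right delimiters by wrapping them in parentheses (), and to leave the text between them out of the replacement.
The ".*" matches any run of characters in between. It is greedy, so if one line can contain several such links, use ".*?" instead.
In the replacement, $1 and $2 refer to the captured parts.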
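To check the capture-group idea outside the editor, here is a minimal Python sketch. It assumes the lazy `.*?` form so that two links on one line are handled separately. The sample HTML line and the name `strip_link_text` are made up for illustration, and Python writes the backreferences as `\1` and `\2` rather than `$1` and `$2`.

```python
import re

# Two capture groups keep the delimiters; the lazy .*? between them is dropped.
LINK_TEXT = re.compile(r'(\(\)">).*?(</A></P>)')

def strip_link_text(text: str) -> str:
    """Remove whatever sits between ()"> and </A></P>, keeping both delimiters."""
    return LINK_TEXT.sub(r"\1\2", text)

# Hypothetical input line, not taken from the asker's files.
sample = '<P><A HREF="javascript:show()">A Random sentence.</A></P>'
print(strip_link_text(sample))
```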

You also can try :
(?<=\(\)">)[a-z \.-]+(?=</A></P>)
Inside [a-z \.-], list every character that may appear in the text you want to match.
Also, literal parentheses must be escaped with \ in Notepad++.

This should work for you:
Find: (?<=\(\)">)A Random sentence.(?=<\/A><\/P>)
Replace: <empty>
If A Random sentence. is not the actual sentence you can replace the find with:
(?<=\(\)">).*?(?=<\/A><\/P>)
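With 58K files, a small script may be easier than the editor. The following is only a sketch under assumptions: the files sit under one folder, they match `*.html`, and the function name `clean_tree` is invented here. It works on raw bytes so it does not have to guess each file's encoding or touch line endings.

```python
import re
from pathlib import Path

# Bytes pattern, so no assumption is made about each file's encoding.
# Lookbehind/lookahead keep ()"> and </A></P>; only the text between is removed.
LINK_TEXT = re.compile(rb'(?<=\(\)">).*?(?=</A></P>)')

def clean_tree(root: Path, glob: str = "*.html") -> int:
    """Rewrite matching files under root in place; return the number of files changed."""
    changed = 0
    for path in root.rglob(glob):
        if not path.is_file():
            continue
        data = path.read_bytes()
        cleaned = LINK_TEXT.sub(b"", data)
        if cleaned != data:  # skip files that have nothing to remove
            path.write_bytes(cleaned)
            changed += 1
    return changed
```

Since the files are rewritten in place, trying `clean_tree(Path("..."))` on a backup copy of the folder first is the cautious route.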

Related

Notepad ++ regex. Finding and replacing with wildcard, but without allowing any spaces?

I have something like this in txt
[[asdfg]] [[abcd|qwerty]]
in a row, but I want it to look like that
[[asdfg]] [[qwerty]]
Searching with a wildcard ( [[.*\| ) matches the whole line up to the "|". Not allowing a space inside the match should work, but I don't know how to do that.
Edit 1
It's from a Wikipedia dump, so the first part is the word in its basic form and the second is how it fits into the sentence. Something like [[I]] [[be|was]] [[at]] [[the]] [[doctor]], and I want to change it into normal sentences:
[[I]] [[was]] [[at]] [[the]] [[doctor]]
Edit 2
I found somewhat of a solution. I just put every word in a new line, did the first regex and then deleted newlines. That did kinda mess up my spacing though...
Try this regex:
\[\[\w+\|(\w+)\]\]
Replace with:
[[$1]]
Make sure you choose Regular expression at the bottom before you click Replace All in Notepad++.
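The same pattern can be tried in Python to see why it avoids the greedy-wildcard problem: `\w` matches neither spaces nor brackets, so a match cannot run across two links. This is just a sketch; the function name `unpipe` is made up.

```python
import re

# Same pattern as above: [[word|word]] with the second word captured.
PIPED_LINK = re.compile(r"\[\[\w+\|(\w+)\]\]")

def unpipe(text: str) -> str:
    """Rewrite [[base|shown]] as [[shown]]; plain [[word]] links are untouched."""
    return PIPED_LINK.sub(r"[[\1]]", text)

print(unpipe("[[I]] [[be|was]] [[at]] [[the]] [[doctor]]"))
# prints: [[I]] [[was]] [[at]] [[the]] [[doctor]]
```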
You can do it all in one run like so
\[{2}(?:(?!\]{2}).)+?\|([^\]]+)
This needs to be replaced by
[[$1
See a demo on regex101.com.
Broken down this says:
\[{2} # match [[
(?:(?!\]{2}).)+? # do not overrun ]]
\| # |
([^\]]+) # capture anything not ] into group 1
The replacement only has to put back the opening brackets followed by the content of group 1 ($1); the closing ]] is never matched, so it stays in place.
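A quick Python check of this pattern, as a sketch: the multi-word link `[[New York|the city]]` is a made-up example (not from the question) to show a case a `\w+`-only pattern would skip, and `unpipe_any` is a hypothetical name.

```python
import re

# \[{2}            opening [[
# (?:(?!\]{2}).)+? any characters, lazily, never stepping onto a closing ]]
# \|               the pipe
# ([^\]]+)         group 1: the displayed text, up to the first ]
TEMPERED = re.compile(r"\[{2}(?:(?!\]{2}).)+?\|([^\]]+)")

def unpipe_any(text: str) -> str:
    """Rewrite [[base|shown]] as [[shown]], also when either part contains spaces."""
    # The closing ]] is not part of the match, so only [[ and group 1 are written back.
    return TEMPERED.sub(r"[[\1", text)

print(unpipe_any("[[asdfg]] [[abcd|qwerty]]"))
print(unpipe_any("[[New York|the city]] [[be|is]] big"))
```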

Remove columns from CSV

I don't know anything about Notepad++ Regex.
This is the data I have in my CSV:
6454345|User1-2ds3|62562012032|324|148|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|0|0|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|1534|51564|411b0fdf54fe29745897288c6ad699f7be30f389
How can I use a Regex to remove the 5th and 6th column? The numbers in the 5th and 6th column are variable in length.
Another problem is that the User column can also contain a |, which makes it even worse.
I could use a macro to fix this, but the file is a few million lines long.
This is the final result I want to achieve:
6454345|User1-2ds3|62562012032|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|411b0fdf54fe29745897288c6ad699f7be30f389
I am open for suggestions on how to do this with another program, command line utility, either Linux or Windows.
Match \|[^|]+\|[^|]+(\|[^|]+$)
Replace $1
Basically, anchor to the end of the line and remove columns [-2] and [-3], keeping the last one (I assume columns can't be empty. Replace + with * if they can)
If you need finer control than that, I'd recommend writing a Java or Python script to manually parse and rewrite the file for you.
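A script along those lines could look like the sketch below. It splits from the right, so extra | characters in the user field do not matter. It assumes the last three fields never contain a |; the function names are invented for illustration.

```python
from typing import Iterable, Iterator

def drop_two_before_last(line: str, sep: str = "|") -> str:
    """Drop the two fields just before the last one, counting from the right."""
    stripped = line.rstrip("\r\n")
    parts = stripped.rsplit(sep, 3)  # [everything else, col A, col B, last field]
    if len(parts) < 4:
        return stripped  # too few columns: leave the line unchanged
    head, _dropped_a, _dropped_b, last = parts
    return head + sep + last

def filter_lines(lines: Iterable[str]) -> Iterator[str]:
    """Lazily transform lines, so a file with millions of rows is never held in memory."""
    for line in lines:
        yield drop_two_before_last(line) + "\n"
```

To run it over a file, open the input and output files and pass the input handle through, e.g. `fout.writelines(filter_lines(fin))`.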
I've captured three named groups. With a replace utility such as sed or Vim, you can replace the remove group with nothing. Or you can use a programming language to concatenate keep_before and keep_after for the desired result.
^(?<keep_before>(?:[^|]+\|){3})(?<remove>(?:[^|]+\|){2})(?<keep_after>.*)$
Depending on your environment, you may have to drop the group names and use \1 etc. instead. Note that this pattern counts columns from the start of the line, so it assumes the user field itself contains no |.
Demo
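In Python the named groups are spelled `(?P<name>...)`. The sketch below concatenates keep_before and keep_after as described; `drop_remove_group` is a made-up name. Because the pattern counts three columns from the left, it handles the first two sample rows but not the third, where the user field contains extra | characters.

```python
import re

# Same three groups as above, using Python's (?P<name>...) syntax.
COLUMNS = re.compile(
    r"^(?P<keep_before>(?:[^|]+\|){3})"  # first three columns, each with its trailing |
    r"(?P<remove>(?:[^|]+\|){2})"        # the next two columns, to be dropped
    r"(?P<keep_after>.*)$"               # whatever is left
)

def drop_remove_group(line: str) -> str:
    """Return keep_before + keep_after, or the line unchanged if it does not match."""
    m = COLUMNS.match(line)
    if m is None:
        return line
    return m.group("keep_before") + m.group("keep_after")

print(drop_remove_group(
    "6454345|User1-2ds3|62562012032|324|148|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1"
))
```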
In Notepad++, press Ctrl+H, then enter the following in the dialog:
Find what: \|\d+\|\d+(\|[0-9a-z]+)$
Replace with: $1
Search mode: Regular expression
Click Replace All and you're done.
Regex explanation:
\|\d+ : matches the first column to remove: a | followed by digits
\|\d+ : matches the second column to remove: a | followed by digits
(\|[0-9a-z]+) : matches and captures the last column, i.e. the | and the hash after the second number
$ : anchors the match to the end of the line
Replacement:
$1 : replaces the whole match with the captured group, i.e. whatever (\|[0-9a-z]+) matched

Regex to match words starting with a dot but exclude some

I need to build a regex that matches all words starting with . but lets whitelisted words such as .well-known (or similar) through.
So far I have built one that does the complete opposite: it captures ONLY .well-known. I tried to find a regex symbol for inverting a match, but I don't think one exists.
location ~ /^(\.well-known) {
deny all;
}
thx
Here's an expression that works:
^(\.(?!well-known|other-forbidden-words|another-forbidden-word).+)$
Simply change the other words to white-listed words you want, and add more if need be.
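Here is a small Python sketch of the negative lookahead with only well-known on the whitelist. It tests bare names rather than full request paths, which is an assumption made for the example, and `is_blocked` is a hypothetical helper.

```python
import re

# A leading dot, then anything, unless the dot is directly followed by "well-known".
DOT_NAME = re.compile(r"^(\.(?!well-known).+)$")

def is_blocked(name: str) -> bool:
    """True for dot-names such as .git, False for whitelisted or ordinary names."""
    return DOT_NAME.match(name) is not None

for name in (".git", ".htaccess", ".well-known", "index.html"):
    print(name, is_blocked(name))
```

The lookahead only checks what follows the dot and consumes nothing, which is why it can veto a match without a separate "invert" operator.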
Hm,
I think I found my answer thanks to:
Invert match with regexp
this is the test that works now:
https://regex101.com/r/mS2wC6/2
and solution is this:
^(\.(?!well-known))
so ok, I think I got this grouping thing now..
hth, k
You don't have to list the words in an OR (|) condition.
You can simply do
$pattern = "/\.+(?:[a-zA-Z]|-)/";
which matches one or more dots followed by a letter or a hyphen.

Remove everything except digits with Notepad++

I wish to remove everything except Digits in my notepad ++ with regular expression.
Can anyone help me with the pattern to use? I'd like to go
from
416385-creativelive-photo-week-2014-hd-full-day-5.html
416668-creativelive-photo-week-2014-hd-full-day-4.html
421733-creativelive-photo-week-2014-day-2.html
to
416385
416668
421733
According to this sentence:
I wish to remove everything except Digits in my notepad ++ with regular expression.
do:
Find what: \D+
Replace with: (nothing)
This gives 416385201454166682014442173320142, but I'm pretty sure that's not what you want.
Another proposal is to keep also line break:
Find what: [^\d\r\n]+
Replace with: (nothing)
It'll give:
41638520145
41666820144
42173320142
Finally, according to your example, I guess you want:
Find what: ^(\d+).*$
Replace with: $1
Note: leave ". matches newline" unchecked.
It'll give:
416385
416668
421733
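The three variants can be compared side by side in Python. This is a sketch using the sample lines from the question; `re.MULTILINE` plays the role of the editor's per-line `^` and `$`, and `.` does not match newlines by default.

```python
import re

text = (
    "416385-creativelive-photo-week-2014-hd-full-day-5.html\n"
    "416668-creativelive-photo-week-2014-hd-full-day-4.html\n"
    "421733-creativelive-photo-week-2014-day-2.html"
)

# 1) Remove every non-digit, line breaks included: one long run of digits.
all_digits = re.sub(r"\D+", "", text)

# 2) Remove non-digits but keep line breaks: every digit of each line survives.
digits_per_line = re.sub(r"[^\d\r\n]+", "", text)

# 3) Keep only the leading number of each line.
leading_ids = re.sub(r"^(\d+).*$", r"\1", text, flags=re.MULTILINE)

print(all_digits)
print(digits_per_line)
print(leading_ids)
```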

multi-line xml regex in sublime

I have a large logfile (+100 000 lines) in XML like so:
<container>
<request:getApples xml="...">
...
</request:getApples>
<request:getOranges xml="...">
...
</request:getOranges>
</container>
...
I want to extract the :getXXXX part to
getApples
getOranges
by doing a regex find & replace in Sublime Text 2.
Something like
Find: [^(request:)]*(.*) xml
Replace: $1\n
Any regex masters that can assist?
Correcting mart1n's answer and actually using ST2 and your sample input, I came up with the following:
First, Ctrl+A to select all. Then, Ctrl+H,
Search: .*?(get\w+) .*
Replace: $1
Replace All
Then,
Search: ^(?!get).*$
Replace: nothing
Replace All
Finally,
Search: ^\n
Replace: nothing
Replace All
And you're left with:
getApples
getOranges
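For anyone who wants to replay the three passes outside Sublime Text, here is a rough Python equivalent. It is a sketch, not a reproduction of the editor's behaviour: the second pass uses a negative lookahead, `^(?!get).*$`, which blanks only lines that do not begin with the literal text get, and the name `three_pass` is invented.

```python
import re

sample = """<container>
<request:getApples xml="...">
...
</request:getApples>
<request:getOranges xml="...">
...
</request:getOranges>
</container>"""

def three_pass(text: str) -> str:
    # Pass 1: reduce each opening request line to its getXXX name.
    text = re.sub(r".*?(get\w+) .*", r"\1", text)
    # Pass 2: blank every line that does not start with "get".
    text = re.sub(r"^(?!get).*$", "", text, flags=re.MULTILINE)
    # Pass 3: drop the empty lines left behind.
    return re.sub(r"^\n", "", text, flags=re.MULTILINE)

print(three_pass(sample))
```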
I'm not familiar with Sublime Text, but you can do it in two parts:
Find .*?\(get\w+\) .* and replace with \1. Now those get* strings are on separate lines with nothing else. All that remains is to remove the cruft.
So, many ways to do this. Easy one: find ^[^g][^e][^t].*$ and replace with nothing (an empty string).
Now you have your file that contains just the string you want and some empty lines, which (I hope) Sublime can get rid of with some delete-empty-lines function.
You can quickly throw all of the above in a macro and execute at will for any input following the same format ;-)
If you're willing to take the problem out of sublime text, you can use the dotall flag along with lazy matching to extract only the getXXX parts.
Replacing
.*?(get\w*) .*?
with
$1\n
should get you most of the way, leaving only a few easily removable closing tags at the end of the file that I can't figure out at present.
You can check this solution here.
Maybe someone could take this and figure out a way to remove the extra closing tags.
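If leaving the editor is an option, a different technique sidesteps the leftover closing tags: collect the matches directly instead of replacing around them. The Python sketch below assumes every opening tag looks like `<request:getXXX ` followed by attributes, as in the sample; `request_names` is a made-up name.

```python
import re

# Only opening tags match: "</request:" has a slash after "<", and closing
# tags have ">" rather than whitespace after the name.
OPEN_TAG = re.compile(r"<request:(get\w+)\s")

def request_names(xml_text: str) -> list:
    """Return the getXXX names of all opening request tags, in document order."""
    return OPEN_TAG.findall(xml_text)

log = """<container>
<request:getApples xml="...">
...
</request:getApples>
<request:getOranges xml="...">
...
</request:getOranges>
</container>"""

print("\n".join(request_names(log)))
```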
Try this
Find what: :(\w+)>|.\s?
Replace with: $1
If it doesn't work as intended, let me know.