Remove all text except a certain string in Notepad++ - regex

I'm extracting case numbers from a wall of text. How do I filter out all the useless text using the replace function in Notepad++ with the help of RegEx? The parts I want to keep are made up of letters, digits, and a hyphen (SPP-1803-2045227).
I would like to turn this...
(SPP-1803-2045227)Useless text goes here. 2019-05-18 *
(SPP-1915-1802667)More useless text. 2019-01-14 *
(SPP-1904-1012523)And some more. 2019-02-03 *
...into this:
SPP-1803-2045227
SPP-1915-1802667
SPP-1904-1012523
I've been playing around with RegEx and also found something in another thread on here before, which wasn't the solution but came very close. Unfortunately I can't find it anymore. It looked something like this:
^(?!S\w+).*\r?\n?
Any help is appreciated.

you could try something like this.
find: .*\((\w{3}-\d{4}-\d{7})\).*
replace with: \1
The above Regular Expression matches the whole line with your letters and digits between an extra pair of parentheses.
When you replace with \1 you keep only the match between parentheses.
Remember to select Regular Expression Search mode.

Related

RegEx substract text from inside

I have an example string:
*DataFromAdHoc(cbgv)
I would like to extract by RegEx:
DataFromAdHoc
So far I have figured something like that:
^[^#][^\(]+
But Unfortunately without positive result. Do you have maybe any idea why it's not working?
The regex you tried ^[^#][^\(]+ would match:
From the beginning of the string, it should not be a # ^[^#]
Then match until you encounter a parenthesis (I think you don't have to escape the parenthesis in a character class) [^\(]+
So this would match *DataFromAdHoc, including the *, because it is not a #.
What you could do, it capture this part [^\(]+ in a group like ([^(]+)
Then your regex would look like:
^[^#]([^(]+)
And the DataFromAdHoc would be in group 1.
Use ^\*(\w+)\(\w+\)$
It just gets everything between the * and the stuff in brackets.
Your answer may depend on which language you're running your regex in, please include that in your question.

Regular Expressions - Select the Second Match

I have a txt file with <i> and </i> between words that I would like to remove using Editpad
For example, I'd like to keep when it's like this:
<i>Phrases and words.</i>
And I'd like to remove the </i> and <i> tags inside the phrase, when it's like this:
<i>Phrases</i>and<i> words.</i>
<i>Phrases</i>and <i>words.</i>
I was trying to do that using regex, but I couldn't do it.
As the tag is followed by space or a word character I could find when the line has the double tag with
/ <i>|<\/i> /
but this way I can't just press replace for nothing, I have to edit line by line I search.
There's anyway to accomplish that?
* Edited *
Another example of lines found on the subtitle text
<i>- find me on the chamber.</i>
- What? <i>Go. Go, go, go!</i>
Rule number one: you can't parse html with regex.
That being said, if you know each line follows a certain pattern, you can usually hack something together to work. ;)
If I've understood correctly, it looks like you can simply remove all <i> and </i> that aren't either at the beginning or end of the lines. In that case, one method you could try is the following regex:
(?<=.)\<\/?i\>(?=.)
This will match the tags, with a lookahead and behind to make sure that we aren't at the end/start of a line (by checking if another character exists in front/behind. (Note that typically matched characters in a lookahead/behind won't be replaced when you search/replace.)
Disclaimer: this works on regex101, but notepad++ may have some differences to the pcre regex style.
update to work with Editpad
EDIT: since this question is actually wanting to know how to do this in Editpad, below is a modified alternative:
Try searching for the regex: (.)\<\/?i\>(.). This will match (and capture) exactly one character before and after the <i> tags.
When replacing, use backreferences to replace the entire match with the two captured characters - a replacement string of \1\2 should work.

replace text with regular expression keeping structure match on sublime text

i been trying a few options, but i can't figure out.
this is the text i'm looking for:
php[whatever_is_in_between]myfunction
and i want to change it to this
=[whatever_is_in_between]myfunction
where [whatever_is_in_between] = \n or \n\t or \n\t\t or \nbarspace or \nbarespace\t and so on
so i found this regexp match the search text:
php[\n]*[\t]*[ ]*myfunction\(
this is the "replace with" text:
=[\n]*[\t]*[ ]*myfunction\(
but the regexp does not work on the replacement, it replace it as text.
can anybody help me with this?
thanks
I think the problem is that you are not using a capturing group ( ). Among other things, a capturing group allows you to take input from the the read text and then inject it into your replacement text.
I'd use a search pattern like this:
php\[([^\]]*)\]([\w\W]*)
It looks complicated, but I've set up a sample on Regex 101 that you can check out. The replacement text should look something like this:
[\1]\2
Please note that how you insert a capturing group will depend on what programming language you're using. The above should work for php.
I hope that helps,
--Jonathan

multi-line xml regex in sublime

I have a large logfile (+100 000 lines) in XML like so:
<container>
<request:getApples xml="...">
...
</request:getApples>
<request:getOranges xml="...">
...
</request:getOranges>
</container>
...
I want to extract the :getXXXX part to
getApples
getOranges
by doing a regex find & replace in Sublime Text 2.
Something like
Find: [^(request:)]*(.*) xml
Replace: $1\n
Any regex masters that can assist?
Correcting mart1n's answer and actually using ST2 and your sample input, I came up with the following:
First, CtrlA to select all. Then, CtrlH,
Search: .*?(get\w+) .*
Replace: $1
Replace All
Then,
Search: ^[^get].*$
Replace: nothing
Replace All
Finally,
Search: ^\n
Replace: nothing
Replace All
And you're left with:
getApples
getOranges
Not familiar with Sublime Text but you can do in two parts:
Find .*?\(get\w+\) .* and replace with \1. Now those get* strings are on separate lines with nothing else. All that remains is to remove the cruft.
So, many ways to do this. Easy one: find ^[^g][^e][^t].*$ and replace with nothing (an empty string).
Now you have your file that contains just the string you want and some empty lines, which (I hope) Sublime can get rid of with some delete-empty-lines function.
You can quickly throw all of the above in a macro and execute at will for any input following the same format ;-)
If you're willing to take the problem out of sublime text, you can use the dotall flag along with lazy matching to extract only the getXXX parts.
Replacing
.*?(get\w*) .*?
with
$1\n
should get you most of the way, only leaving a bit of easily removeable closing tags at the end of the file that I can't figure out at present.
You can check this solution here.
Maybe someone could take this and figure out a way to remove the extra closing tags.
Try this
Find what: :(\w+)>|.\s?
Replace with: $1
And if didn't work as intended, then let me know?

Numbers & escape characters for regex replacement in Notepad++

I'm trying to match a pattern using RegEx in notepad++, but not having much luck. I'm able to match part but not all of it.
I need to search for this line:
<size value="Large" pax="13074"/>
And replace it with this:
<size value="Very_large" pax="41450" cargo="Largest" cargovolume="3227"/>
Essentially I need to find all patterns matching pax="n"/> and replace them with pax="n" cargo="Largest" cargovolume="0"/> while retaining the initial value of n.
So, ideas anyone?
Press Ctrl + F, move to tab Replace, in Find what do: pax="(\d+)" and in Replace with put this: pax="\1" cargo="Largest" cargovolume="0"
Remember to mark regex. That should retain the number and replace the content.
UPDATE: Hint about saving text for replacement.
Whenever you use regex to do text replacement, wrap the content you want to save in parenthesis and then you can access them using \i where i is the order of appearance of the parenthesis starting at 1.
Hope it helps!