Text replace challenge (regex) - regex

I can't solve a problem, and perhaps what I want is impossible to achieve.
GOAL: use only the replace function to remove all text except the email address.
I have a text containing an email address: Start text some other text 2828 text my.address#mail.com some additional text.
Regular expression to select email: [a-zA-Z0-9\-\._]+#[\w\d\-\._]+\.\w{2,12}
The regular expression works perfectly for finding an email address, but it does not work for removing all the text around the email.
The screenshot below shows the result I got when I applied the replace function in the text editor:
I searched for the regexp .*([a-zA-Z0-9\-\._]+#[\w\d\-\._]+\.\w{2,12}).* and replaced it with $1. Sadly, this gives me a broken email address.
I used an email as an example; I get the same result for any other data type, such as URLs, IPs, phone numbers, names, cities, zips, etc.
Can anyone offer a solution to this problem?
Thanks a lot.
PS: I am not interested in using the math() function, because this function isn't available in most text editors.

I think you should make the first part non-greedy (.*?). Otherwise the greedy .* consumes as much as it can, right up to the character before the #, and gives back only a single character to satisfy the character class [a-zA-Z0-9\-\._]+.
With the non-greedy version, the group captures my.address#mail.com instead of s#mail.com:
.*?([a-zA-Z0-9\-\._]+#[\w\d\-\._]+\.\w{2,12}).*
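To see the difference outside a text editor, here is a minimal sketch using Python's re module. Python is only my choice for illustration (the question is about editor replace dialogs), the variable names are made up, and Python writes the backreference as \1 rather than $1.

```python
import re

# Email pattern from the question (it uses # where a real address has @).
EMAIL = r"[a-zA-Z0-9\-\._]+#[\w\d\-\._]+\.\w{2,12}"
text = ("Start text some other text 2828 text "
        "my.address#mail.com some additional text.")

# Greedy prefix: .* backtracks only far enough to leave one character
# for the first character class in the group.
greedy = re.sub(r".*(" + EMAIL + r").*", r"\1", text)

# Lazy prefix: .*? stops at the first position where the email pattern matches.
lazy = re.sub(r".*?(" + EMAIL + r").*", r"\1", text)

print(greedy)  # s#mail.com
print(lazy)    # my.address#mail.com
```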

I would do it like this:
Find: (.*?)[a-zA-Z0-9\-\._]+#[\w\d\-\._]+\.\w{2,12}\s?(.*?)
Replace: $1$2
Input: Start text some other text 2828 text my.address#mail.com some additional text
Output: Start text some other text 2828 text some additional text

Related

Notepad++ Regex Replace selecting all text. Works in RegExr

I'm trying to replace all spaces in a log file with commas (to convert it to CSV format). However, some log entries have spaces that I don't want replaced. These entries are bounded by quotation marks. I looked at a couple of examples and came up with the following code, which seems to work in RegExr.com and regex101.com.
[\s](?=(?:"[^"]*"|[^"])*$)
However, when I do a find/replace with that expression, it runs correctly until it hits the first quotation with a space and then selects the entire contents of the file.
Sample log file entry:
date=2020-08-24 time=07:35:15 idseq=216296511061885345 itime="2020-08-24 07:35:15" euid=3 epid=4107 dsteuid=3 dstepid=101 type="utm" subtype="webfilter" level="notice" action="passthrough" msg="URL belongs to an allowed category in policy"
Desired result:
date=2020-08-24,time=07:35:15,idseq=216296511061885345,itime="2020-08-24 07:35:15",euid=3,epid=4107,dsteuid=3,dstepid=101,type="utm",subtype="webfilter",level="notice",action="passthrough",msg="URL belongs to an allowed category in policy"
EDIT: After more testing, it appears that with a single line, the replace works. However, if you have more than one line, it replaces all lines with the replace character (in my case, the comma).
Ctrl+H
Find what: "[^"\r\n]+"(*SKIP)(*FAIL)|\h+
Replace with: ,
CHECK Wrap around
CHECK Regular expression
Replace all
Explanation:
"[^"\r\n]+" # everything between quotes
(*SKIP)(*FAIL) # skip and fail the match
| # OR
\h+ # 1 or more horizontal spaces
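Python's built-in re module does not support the (*SKIP)(*FAIL) verbs, so the following is only a sketch of the same idea with a different technique: match quoted strings first and put them back unchanged, and replace only the whitespace runs that fall outside them. The function name and the shortened sample line are assumptions for illustration.

```python
import re

# Either a quoted string (kept as-is) or a run of spaces/tabs (replaced).
TOKEN = re.compile(r'"[^"\r\n]+"|[ \t]+')

def spaces_to_commas(text):
    """Replace spaces and tabs outside double quotes with a single comma."""
    def repl(match):
        piece = match.group(0)
        # Quoted strings are returned untouched; whitespace becomes a comma.
        return piece if piece.startswith('"') else ","
    return TOKEN.sub(repl, text)

line = 'euid=3 itime="2020-08-24 07:35:15" msg="URL belongs to an allowed category"'
print(spaces_to_commas(line))
```

Because the pattern never matches \r or \n, line breaks are left alone, which avoids the multi-line problem described in the question.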
While lengthy, if you have a known list of field names, you can simply use them as replacement keys.
The first field (date) is left out of the list, as it shouldn't be prefixed with a comma.
Match the leading space and the trailing = around each label to be more certain (though this does not guarantee it will not find substrings in the msg field).
's/ (time|idseq|itime|euid|epid|dsteuid|dstepid|type|subtype|level|action|msg)=/,$1=/g'
Example in Python:
>>> import re
>>> source = '''date=2020-08-24 time=07:35:15 idseq=216296511061885345 itime="2020-08-24 07:35:15" euid=3 epid=4107 dsteuid=3 dstepid=101 type="utm" subtype="webfilter" level="notice" action="passthrough" msg="URL belongs to an allowed category in policy"'''
>>> regex = ''' (time|idseq|itime|euid|epid|dsteuid|dstepid|type|subtype|level|action|msg)='''
>>> print(re.sub(regex, r",\1=", source)) # raw string so \1 stays a backreference instead of becoming the character \x01
date=2020-08-24,time=07:35:15,idseq=216296511061885345,itime="2020-08-24 07:35:15",euid=3,epid=4107,dsteuid=3,dstepid=101,type="utm",subtype="webfilter",level="notice",action="passthrough",msg="URL belongs to an allowed category in policy"
You may find some values contain \" or similar, which can break even quite careful regular expressions!
Also note that for a CSV you may wish to remove the field names from each row entirely and use them as the header instead.

vim: my regexp to select some words doesnt work

Sentences = lines that may contain anything (including HTML tags). I have a lot of sentences like the ones below. They sit in a huge text where I don't want to remove all tags (I want all other lines to remain untouched):
<h2 id="aa">sentence</h2>
<h2 id="xx">Another sentence</h2>
And sometimes only:
<h2 id="aa">A sentence without a link</h2>
The first thing I find strange: I'm trying to match any character and capture it in a group. I've tried all of these:
\(.\)\+ -> selects the whole line
\([.]\)\+ -> selects only the "." character
\([\.]\)\+ -> selects only the "." character
\([\.]\)\+ -> still selects only the "." character (what the?)
From the documentation, I thought I could use the expression \([\.]\+\) to match a group of any characters and fill a register, but it doesn't work. The only "close" expression that works is \(.\)\+, but if I output the register, it holds only the last character matched.
Because of this problem, I can't do what I want, which is to convert all the sentences above into this output:
---sentence
---Another sentence
---A sentence without a link
I've tried something like :%s/^<h2 id=\(\[.\]\+\)<a\([.]\)\+>\(.\)\+<\/a><\/h2>$/--->\3/ but it didn't work properly, and it didn't cover sentences that have no <a /> tag inside.
How would you do this?
Simply use the regex below:
>([^<>]+)<
Demo: https://regex101.com/r/mS2oB5/2
For full text:
>([^<>\n]+)<
Demo: https://regex101.com/r/mS2oB5/3
In Vim's command mode, type:
%s/<[^>]*>//g
Explanation:
1. \([\.]\)\+ still selects only the "." character because characters inside [] are treated as literal characters; they lose their special regex meaning.
2. My regex <[^>]*> is a simple way to remove all HTML tags. It will have some problems, but I will leave those to you.
3. <[^>]*> has an alternative form, <.*?>, which relies on non-greedy matching (in Vim's own syntax this is written <.\{-}>).
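If the goal is the ---sentence output from the question rather than stripping every tag in the file, the same <[^>]*> idea can be limited to the <h2> lines. Below is a rough sketch in Python rather than Vim; the convert name and the href value in the sample are assumptions for illustration only.

```python
import re

# A whole line consisting of one <h2 id="..."> element.
H2_LINE = re.compile(r'^<h2 id="[^"]*">(.*)</h2>$')
# Any single tag, as in the answer above.
TAG = re.compile(r"<[^>]*>")

def convert(line):
    """Turn an <h2> line into '---text'; return any other line unchanged."""
    match = H2_LINE.match(line)
    if match is None:
        return line
    # Strip tags (for example a link) that remain inside the heading.
    return "---" + TAG.sub("", match.group(1))

print(convert('<h2 id="aa"><a href="#x">sentence</a></h2>'))
print(convert('<h2 id="aa">A sentence without a link</h2>'))
print(convert('<p>some <b>other</b> line</p>'))
```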

How to perform negative search (or replace) in common text editors

Is there any way to replace all the words/lines that don't match my search query in text editors like Notepad++ or Sublime Text?
For example, I have a document with a few URL links in it. Can I do something that leaves only the URL links in my document? If I had to remove the URL links, I could search for them using a regex and replace them with an empty string. But can I do the same thing for the content that doesn't match the regex?
Example:
this is line which I want to remove and can also have special characters in it link % $ [] (0) and here is url: https://google.com one more line with some random garbagee and https://www.example.com
For above text, output should be:
https://google.com
https://www.example.com
In Sublime Text, you can search, hit "Find All", then copy and paste all the matches at once into a new document. This isn't exactly "negative search", but it does accomplish your goal.
With Notepad++ you can do it in two passes. The first pass isolates the wanted text. The second pass removes the unwanted pieces.
Firstly do a regular expression search for \b(https://[^\s]+)(\s|$) and replace with \r\n\1\r\n. This is a very crude and easy to fool URL detector, but it works on the examples you give. The search string looks for "https://" preceded by a word boundary (ie \b). That is followed by some non-whitespace characters that are considered to be part of the wanted text. The last part of the search text looks for either a whitespace character or the end of line. The wanted part is retained in a capture for the replace text.
Second, do a regular expression search for ^https:// using the "Mark" tab in the find window. Select "Bookmark lines" then click on "Mark all". (You might like to click on "Clear all marks" before clicking on "Mark all".) Finally use menu => Search => Bookmark => Remove unmarked lines.
(Checked in Notepad++ version 6.6.9)
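Outside an editor, "keep only the matches" is usually easier to express as collecting the matches than as deleting everything else. A minimal sketch in Python, assuming the same crude https:// detector as above and a sample text split into lines for the illustration:

```python
import re

text = """this is line which I want to remove
and here is url: https://google.com
one more line with some random garbagee and https://www.example.com"""

# "https://" at a word boundary, followed by any run of non-whitespace.
urls = re.findall(r"\bhttps://\S+", text)

# One URL per line, everything else discarded.
print("\n".join(urls))
```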
In SynWrite app:
call dialog "Search/ Extract strings"
enter Regex for URL, do "Find"
now press button to copy found URLs to new tab

replace text with regular expression keeping structure match on sublime text

I've been trying a few options, but I can't figure it out.
This is the text I'm looking for:
php[whatever_is_in_between]myfunction
and I want to change it to this:
=[whatever_is_in_between]myfunction
where [whatever_is_in_between] = \n or \n\t or \n\t\t or \n plus a bare space or \n plus a bare space and \t, and so on.
I found that this regexp matches the search text:
php[\n]*[\t]*[ ]*myfunction\(
This is the "replace with" text:
=[\n]*[\t]*[ ]*myfunction\(
But the regexp is not interpreted in the replacement; it is inserted as literal text.
Can anybody help me with this?
Thanks
I think the problem is that you are not using a capturing group ( ). Among other things, a capturing group allows you to take input from the matched text and then inject it into your replacement text.
I'd use a search pattern like this:
php\[([^\]]*)\]([\w\W]*)
It looks complicated, but I've set up a sample on Regex 101 that you can check out. The replacement text should look something like this:
[\1]\2
Please note that how you insert a capturing group will depend on what programming language you're using. The above should work for php.
I hope that helps,
--Jonathan
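As a sketch of the capturing-group idea applied to the whitespace case from the question (newlines, tabs, and spaces between php and myfunction), here is how it might look in Python. This is an illustration under my own assumptions: the sample string is invented, and Python writes the backreference as \1, while editors commonly accept $1 or \1 in the replace field.

```python
import re

source = "<?php\n\t\tmyfunction('a');"

# (\s*) captures whatever whitespace sits between "php" and "myfunction(".
pattern = re.compile(r"php(\s*)myfunction\(")

# \1 re-inserts the captured whitespace, so the original layout is preserved.
result = pattern.sub(r"=\1myfunction(", source)

print(repr(result))
```

The replacement side is plain text plus backreferences, which is why a pattern such as [\n]* is inserted literally when it is typed into the replace field.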

Replace text with regular expressions in a text editor

I need to edit lines in a text file.
The text file contains more than 100 lines of data in the format below.
Cosmos Rh Us (Paperback) $10.99 Shipped:
The Good Earth (Paperback) $6.66 Shipped:
BEST OF D.H. LAWRENCE (Paperback) $7.89 Shipped:
...
These are excerpts from the online book shop I use to buy books.
I have this data in a text editor. How do I edit it [Find/Replace] so that the data becomes like this:
$10.99
$6.66
$7.89
or better, without the dollar sign, since that will make it easier to total.
I use Notepad++ as my text editor.
Search for (don't forget to enable regular expressions in the replace box!)
^.*\$(\d+\.\d+).*$
and replace all with
\1
You could simply match full lines and capture all numbers after the $ sign:
Find what: ^[^$]*\$(\d+\.\d+).*$
Replace with: $1
Make sure that you don't check the ". matches newline" option. And note that this will behave unexpectedly if you have multiple $ signs in a line.
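For comparison, here is a small Python sketch of the same find/replace, followed by the total the question mentions. It reuses the pattern above with Python's \1 backreference; the variable names are only illustrative, and Decimal is my choice to avoid floating-point rounding when adding prices.

```python
import re
from decimal import Decimal

lines = [
    "Cosmos Rh Us (Paperback) $10.99 Shipped:",
    "The Good Earth (Paperback) $6.66 Shipped:",
    "BEST OF D.H. LAWRENCE (Paperback) $7.89 Shipped:",
]

# Match the full line and capture the number after the first $ sign.
PRICE = re.compile(r"^[^$]*\$(\d+\.\d+).*$")

# Replace each line with just the captured amount.
prices = [PRICE.sub(r"\1", line) for line in lines]
print(prices)  # ['10.99', '6.66', '7.89']

total = sum(Decimal(p) for p in prices)
print(total)  # 25.54
```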
You might need to update to Notepad++ 6. Before that some regex features were not working properly.
Find:
((?<=\$)[\d\.]+)
Replace With:
\1 or $1 (whichever Notepad++ uses)
First, replace everything matched by this regex with nothing:
[a-zA-Z0-9].*\)
Then replace everything matched by this second regex with nothing:
[a-zA-Z]+\: