Need help with regex Email in Notepad++ - regex

I have a list with one contact per line, and I need to replace each whole line with just its email:
Name, Surname, Address, Email, Phone
=>
Email
I know how to find the email, but I need something like a find-and-replace that replaces everything except the email with "".

This worked for me using Notepad++ to remove everything except for the email addresses:
Ctrl + H to bring up Find/Replace dialog box.
Change to the Replace tab.
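Find what: ^.*?(\<[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z][A-Za-z][A-Za-z]?[A-Za-z]?\>).*$
(The leading .*? is lazy on purpose: a greedy .* backtracks only as far as the last word start before the @, so john.smith@example.com would be cut down to smith@example.com.)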
Replace with: \1
You need to select [Regular Expression] at the bottom of the Find/Replace dialog box.
Then click [Replace All]
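To check the idea outside Notepad++, here is a minimal Python sketch of the same whole-line replacement. It rests on a few assumptions: `\b` stands in for `\<` and `\>` (Python's `re` module does not support those anchors), the prefix is a lazy `.*?` so that a dotted local part is captured whole, and the sample contacts are made up.

```python
import re

# Capture the email, match the rest of the line, keep only group 1.
# Assumptions: \b replaces Notepad++'s \< and \> anchors, and the leading
# .*? is lazy so a dotted local part (john.smith) is captured whole.
EMAIL_LINE = re.compile(
    r"^.*?(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b).*$",
    re.MULTILINE,
)


def keep_only_email(text: str) -> str:
    """Replace every line that contains an email with just that email."""
    return EMAIL_LINE.sub(r"\1", text)


# Hypothetical contacts in the "Name, Surname, Address, Email, Phone" layout.
sample = (
    "John, Smith, 1 Main St, john.smith@example.com, 555-0100\n"
    "Jane, Doe, 2 Oak Ave, jane@example.org, 555-0101"
)
print(keep_only_email(sample))
```

Lines that contain no email are left untouched, so they would still need to be removed separately.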

Assuming your email regular expression is well-written and won't match anything that isn't an email...
Find what (the parentheses are significant):
^.*(your email regex here).*$
Replace with:
\1

I don't think you can replace "everything except" a regex match in Notepad++. I usually use macros for that kind of problem.
Another method is to import the data into Excel as a CSV, select the column with the email addresses, and copy-paste it into Notepad++. That's another trick I often use.
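The column-based approach can also be sketched without Excel. The following is only an illustration using Python's standard `csv` module; the function name `column_values`, the sample rows, and the assumption that the email is always the fourth comma-separated field are all hypothetical.

```python
import csv
import io
from typing import List


def column_values(csv_text: str, column_index: int) -> List[str]:
    """Return one column from comma-separated text, skipping short rows."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return [row[column_index].strip() for row in reader if len(row) > column_index]


# Hypothetical contacts in the "Name, Surname, Address, Email, Phone" layout.
sample = (
    "John, Smith, 1 Main St, john.smith@example.com, 555-0100\n"
    "Jane, Doe, 2 Oak Ave, jane@example.org, 555-0101\n"
)
# Email is the fourth field, so its zero-based index is 3.
print("\n".join(column_values(sample, 3)))
```

This only works if no field contains an unquoted comma (for example inside the address); otherwise the column positions shift.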

Related

Regex to find a specific anchor tag that have href with a specific domain and nofollow

I have a string that contains HTML. I want a regex that finds the anchor tags whose href has a specific domain name and that also have nofollow.
I found the following, which works on the domain name but does not include the nofollow condition:
(<a\s*(?!.\brel=)[^>])(href="https?://)((?stackoverflow)[^"]+)"([^>]*)>
let's say the domain name I want is stackoverflow
Example:
- "click here " this would match
- "<a href="stackoverflow.com"> would not match since it has no nofollow
- "<a href="google.com" rel = "nofollow"> would not match
It's a bit hard to match an HTML tag with a specific condition, but the following regex should do it:
select regexp_match(str, '<a((?:\s+(([^\/=''"<>\s]+)(=((''[^'']*'')|("[^"]*")|([^\s<>''"=`]+)))?)))* href=((''(https?:\/\/)?stackoverflow\.com[^'']*'')|("(https?:\/\/)?stackoverflow\.com[^"]*"))((?: (([^\/=''"<>\s]+)(=((''[^'']*'')|("[^"]*")|([^\s<>''"=`]+)))?)))*\s+rel=("nofollow"|''nofollow'')((?: (([^\/=''"<>\s]+)(=((''[^'']*'')|("[^"]*")|([^\s<>''"=`]+)))?)))*\/?>') from tes;
It's really hard to read, but most of the regex is there only to match attributes. The important part for you is stackoverflow\.com, which appears twice (once for an href in single quotes and once for double quotes). Replace it with whatever domain you need, and don't forget to escape it properly.
Some notes
I don't know which regexp function you want to use, but the pattern should work with whichever one you need. Another thing is that your example click here won't be matched, because you have spaces between the attribute name and the = sign (I don't know if this is valid HTML or not). It will work with this click here . If you need to match tags that might include spaces around the = signs, just leave a comment and I'll try to edit the regex.
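As a shorter alternative to the attribute-by-attribute pattern, here is a sketch in Python that uses two lookaheads, one for the href and one for rel, so the attribute order and spaces around = do not matter. This is a different technique from the regex above, not a translation of it. `DOMAIN`, `HREF`, `REL`, and `nofollow_anchors` are names invented for the example, and the pattern assumes quoted attribute values with no `>` inside them.

```python
import re
from typing import List

DOMAIN = "stackoverflow.com"  # assumption: the domain to search for

# href whose value is the domain, with an optional scheme and "www."
HREF = (
    r"\bhref\s*=\s*[\"'](?:https?://)?(?:www\.)?"
    + re.escape(DOMAIN)
    + r"(?:[/?#:][^\"']*)?[\"']"
)
# rel whose quoted value contains the token nofollow
REL = r"\brel\s*=\s*[\"'][^\"']*\bnofollow\b[^\"']*[\"']"

# Both lookaheads must succeed somewhere inside the same opening <a ...> tag.
NOFOLLOW_ANCHOR = re.compile(
    rf"<a\b(?=[^>]*{HREF})(?=[^>]*{REL})[^>]*>", re.IGNORECASE
)


def nofollow_anchors(html: str) -> List[str]:
    """Return the opening <a> tags that link to DOMAIN and carry nofollow."""
    return NOFOLLOW_ANCHOR.findall(html)


html = (
    '<a href="https://stackoverflow.com/q/1" rel="nofollow">one</a> '
    '<a href="stackoverflow.com">two</a> '
    '<a href="google.com" rel = "nofollow">three</a> '
    "<a rel = 'nofollow' href = 'http://stackoverflow.com'>four</a>"
)
for tag in nofollow_anchors(html):
    print(tag)
```

For anything beyond a quick search, a real HTML parser is usually the safer choice, since a regex cannot reliably handle comments or unusual quoting.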

How to replace a pattern in notepad++

I have the code of a SQL procedure. We are migrating it to a different schema, and I need to replace the schema of all the dimension tables.
Example:
Old schemas: DBO.ABC_DIM, DBO.XYZ_DIM
After replace: MART.ABC_DIM, MART.XYZ_DIM
Could anyone let me know how we can do this using regex replace?
Thanks
Sky
You must use:
in the "Find what" field:
(DBO)\.
and in the "Replace with" field:
MART\.
Don't forget to place the cursor at the beginning of the file. Otherwise the replacements begin after the current cursor position.
EDITED:
If there are other DBO tables that must stay unchanged, you can use this instead:
Find field:
\b(DBO\.)(.+?)_DIM\b
Replace field:
MART\.$2_DIM
Something like:
DBO.ABC_DIM, DBO.XYZ_DIM,
DBO.ABC_DTL, DBO.ABC_2_BCD
become:
MART.ABC_DIM, MART.XYZ_DIM,
DBO.ABC_DTL, DBO.ABC_2_BCD
LAST EDIT:
The above fails with:
DBO.ABC_DIM, DBO.XYZ_DIM,
DBO.ABC_DTL, DBO.ABC_2_BCD, DBO.ABC_DIM, DBO.XYZ_DIM,
DBO.ABC_DTL, DBO.ABC_2_BCD,
DBO.ABC_DIM, DBO.XYZ_DIM,
Because in the second row the lazy (.+?) runs on until the first _DIM, so the match is DBO.ABC_DTL, DBO.ABC_2_BCD, DBO.ABC_DIM
and DBO.ABC_DTL becomes MART.ABC_DTL.
So the right solution is:
Find field:
(DBO\.)(.[^\.]+?)_DIM
Replace field:
MART\.$2_DIM
see matching results here: http://refiddle.com/refiddles/596b348175622d74ff020000
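For scripted migrations, the same replacement can be sketched in Python. Note that this uses a slightly stricter pattern than the one above (an assumption on my part, not the answer's regex): only word characters are allowed between DBO. and _DIM, so a match can never run across a comma or a line break. `move_dim_tables` is a hypothetical helper name.

```python
import re

# Stricter variant (an assumption, not the pattern from the answer):
# only word characters are allowed between "DBO." and "_DIM".
DIM_TABLE = re.compile(r"\bDBO\.(\w+_DIM)\b")


def move_dim_tables(sql: str, new_schema: str = "MART") -> str:
    """Point every DBO.<name>_DIM reference at new_schema instead."""
    return DIM_TABLE.sub(lambda m: f"{new_schema}.{m.group(1)}", sql)


# The input that broke the first attempt in the answer.
sample = (
    "DBO.ABC_DIM, DBO.XYZ_DIM,\n"
    "DBO.ABC_DTL, DBO.ABC_2_BCD, DBO.ABC_DIM, DBO.XYZ_DIM,"
)
print(move_dim_tables(sample))
```

The match is case-sensitive; if the procedure mixes `dbo.` and `DBO.`, passing `re.IGNORECASE` to `re.compile` would be one way to cover both.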
If you open that file in Vim, press Esc and then type
:%s/DBO/MART/g
and press Enter.
:%s (colon, percent, s) substitutes on every line of the file
/DBO finds DBO
/MART replaces it with MART
/g replaces every occurrence on a line, not just the first
Once you have verified that all the DBOs are replaced with MART, save the changes with Esc and :wq

extract email address from Notepad++ using regex

I am trying to extract email addresses from notepad++ using RegEx.
I tried like this
Find and Replace
Find: (\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b)
Replace : .\1
I am losing the email addresses instead of the text. I need to remove all the text and keep only the email addresses in the file. How can I do that?
Abilash Perumandla
hi Gunpreet, kindly share your thoughts to Abi#TEKperfekt.com
Pratap Aneel
15d
Pratap Aneel
please share your thoughts to Pratap.kumar#rsrit.com
naveen kumar
15d
naveen kumar
You need to match and capture the email with a (...) subpattern (so, you do that right), but you need to just match everything else (and that part is missing).
Use
Find what: (\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b)|.
Replace with: $1
Then, you might want to use Edit -> Blank Operations -> Remove Unnecessary Blank and EOL menu option.
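The "capture the email or match any other character" trick can be tried out in Python as well. This is a minimal sketch under a few assumptions: the pattern uses @ as the separator, the sample text is invented, and the blank-line cleanup is done in code instead of through the Notepad++ menu.

```python
import re
from typing import List

# Alternation: either capture an email in group 1, or consume one other character.
EMAIL_OR_CHAR = re.compile(
    r"(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b)|."
)


def extract_emails(text: str) -> List[str]:
    """Drop everything except emails, then discard the lines left empty."""
    # Group 1 is None whenever the "." branch matched, so fall back to "".
    stripped = EMAIL_OR_CHAR.sub(lambda m: m.group(1) or "", text)
    return [line for line in stripped.splitlines() if line]


# Hypothetical input in the spirit of the question.
sample = (
    "hi there, kindly write to alice@example.com\n"
    "Bob Stone\n"
    "please reply to bob.stone@example.org today"
)
print(extract_emails(sample))
```

Because `.` does not match a line break, the line structure survives; two emails on the same line would, however, end up glued together, so one email per line is assumed.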

Notepad++: Multiple line search & multiple line replace

I'm having a battle with a regex. (MOBI creation)
I have two files: one with XML, the other an HTML table of contents.
The important parts of the XML:
<navPoint id="_NeedsHTMLid" playOrder="40">
<navLabel><text>Needs anchor text from link.)</text></navLabel>
...
The HTML TOC, of course, looks like:
schema.org Article Mark-up
======
Hours and hours... worked with Textpad forever. Saw remarks here, now I'm using NotePad++... some of the regex results are different (NOT that I had it working anyway.) #_[\b(\w\b] was returning the ID: now? Not so much!
Does anyone know how to yank both the ID and the anchor text out of these? I'd be so grateful.
You can use this to get the id and the anchor text at the same time:
_(\w+)\b|([a-zA-Z\s.]+[)]+)
#_[\b(\w\b] is not a valid regex. Try _([^"]+)\b.
Edited: try [^"] in place of \w.
If you want to match the ids and the text, go to Search > Find menu (shortcut CTRL+F) and do the following:
Find what:
id="([a-zA-Z0-9\-\:\_\.]+)"|<text>(.+?)<\/text>
Select radio button "Regular Expression"
Then press Find All in Current Document
You can test it with your example at regex101.
Here's a StackOverflow post about valid id names.
I didn't provide a Search and Replace solution, since you didn't mention anything about a replacement.
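To see what the two alternatives of that pattern capture, here is a small Python sketch run against the XML fragment from the question. The function name `ids_and_texts` is made up for the example, and the escaped slash in `<\/text>` is dropped because Python does not need it.

```python
import re
from typing import List, Tuple

# Group 1 captures an id attribute value, group 2 the content of <text>...</text>.
ID_OR_TEXT = re.compile(r'id="([a-zA-Z0-9\-:_.]+)"|<text>(.+?)</text>')


def ids_and_texts(xml: str) -> Tuple[List[str], List[str]]:
    """Collect id values and <text> contents in document order."""
    ids: List[str] = []
    texts: List[str] = []
    for match in ID_OR_TEXT.finditer(xml):
        if match.group(1) is not None:
            ids.append(match.group(1))
        else:
            texts.append(match.group(2))
    return ids, texts


xml = (
    '<navPoint id="_NeedsHTMLid" playOrder="40">\n'
    "<navLabel><text>Needs anchor text from link.)</text></navLabel>"
)
print(ids_and_texts(xml))
```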

Find/Replace regex to remove html tags

Using find and replace, what regex would remove the tags surrounding something like this:
<option value="863">Viticulture and Enology</option>
Note: the option value changes to different numbers, but using a regular expression to remove numbers is acceptable
I am still trying to learn but I can't get it to work.
I'm not using it to parse HTML. I have data from one of our company websites that we need in Excel, but our designer deleted the original data file and we need it back. I have a list of the options and need to remove the HTML tags using Notepad++ find and replace.
This works for me in Notepad++ 5.8.6 (UNICODE):
search : <option value="\d+">(.*?)</option>
replace : $1
Be sure to select "Regular expression" and ". matches newline"
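The same search can be reproduced in Python to pull the labels straight into a list. This is only a sketch: `option_labels` is a hypothetical helper, the second sample option is invented, and `re.DOTALL` plays the role of the ". matches newline" checkbox.

```python
import re
from typing import List

# re.DOTALL mirrors the ". matches newline" option in the Find/Replace dialog.
OPTION = re.compile(r'<option value="\d+">(.*?)</option>', re.DOTALL)


def option_labels(html: str) -> List[str]:
    """Return the text between each <option value="N"> and </option>."""
    return [label.strip() for label in OPTION.findall(html)]


sample = (
    '<option value="863">Viticulture and Enology</option>\n'
    '<option value="864">Plant Sciences</option>'
)
print("\n".join(option_labels(sample)))
```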
I did it using the following regular expression:
Find what: <.*?>|</.*?>
and
Replace with: \r\n (this inserts a new line)
With this regular expression (<.*?>|</.*?>) we can easily pull out the values between HTML tags, like below:
I have this input:
<option value="123">1</option><option value="1234">2</option><option value="1235">3</option><option value="1236">4</option><option value="1237">5</option>
I need to find the values between the options, like 1,2,3,4,5,
and got the output below:
This works perfectly for me:
Select "Regular Expression" in "Find" Mode.
Enter [<].*?> in "Find What" field and leave the "Replace With" field empty.
Note that you need to have version 5.9 of Notepad++ for the ? operator to work.
as found here:
digoCOdigo - strip html tags in notepad++
Something like this would work (as long as you know the format of the HTML won't change):
<option value="(\d+)">(.+)</option>
val s = "<option value=\"863\">Viticulture and Enology</option>"
s.replaceAll("(<option value=\"[0-9]+\">)([^<]+)</option>", "$2")
res1: java.lang.String = Viticulture and Enology
(Tested in the Scala REPL, hence the res1:)
With sed, you would use a little different syntax:
echo '<option value="863">Viticulture and Enology</option>'|sed -re 's|(<option value="[0-9]+">)([^<]+)</option>|\2|'
For Notepad++ I don't know the details, but "[0-9]+" should mean 'at least one digit' and "[^<]+" 'anything but an opening less-than sign, one or more times'. Escaping and backreferences may differ.
Regexes are problematic: if the markup spans multiple lines or is hidden inside a comment, a regex will not handle it correctly.
However, a lot of HTML is generated in a regex-friendly way, always fitting on one line and never commented out. Or you use the regex in throwaway code and can check your input beforehand.