REGEX, searching for emails with a certain defined length, prefix & domain? (attempted but failed)

Hi, I'm doing some infosec research and searching through text bins.
I'm using a text editor to search files, and I want to find email addresses that meet certain conditions. The text is comma-separated.
Say, for example, I know the email is 20 characters long, the domain is gmail.com, and it starts with t.
[tT](.{9})#gmail.com
If it were correct, it should pick up, for example, tqwertyuio#gmail.com and tzxcvb1234#gmail.com. Right?
I'm using EmEditor, which I think uses the Boost regex engine. This regex isn't working: it also returns any longer string that contains a match.
I've tried to use anchors, but they aren't working either. Perhaps it's this engine. I would have thought I would use:
^[tT](.{9})#gmail.com$
But it's not working. Any help? Thanks so much; I really just want to learn why I can't do this.

I believe you are looking for 20-character email addresses that are NOT surrounded by letters or digits. In that case, you can search for:
(?<!\w)[tT](.{9})#gmail\.com(?!\w)
where \w matches a letter, a digit, or an underscore, (?<!\w) is a negative lookbehind, and (?!\w) is a negative lookahead. The dot in gmail\.com is escaped so that it matches only a literal period.
In the Find dialog box, you can enter this regular expression, and make sure you select the "Regular Expressions" option.
You might also want to try the Filter toolbar with the same regular expression.
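For anyone who wants to check the idea outside the editor, here is a minimal Python sketch of the same lookaround approach. It is not EmEditor's Boost engine, and two details are assumptions rather than part of the answer: the nine middle characters are restricted to non-comma, non-space characters so a match cannot run across fields, and the sample string is made up. The sketch also escapes the dot in the domain, and it keeps the # separator exactly as written in the question. Anchors such as ^ and $ match only at the start and end of a line (or of the whole text), which is why they fail when the address sits between commas.

```python
import re

# Lookarounds reject matches glued to other word characters.
# [^,\s]{9} (an assumption) keeps the nine middle characters inside one field.
PATTERN = re.compile(r"(?<!\w)[tT][^,\s]{9}#gmail\.com(?!\w)")


def find_emails(text: str) -> list[str]:
    """Return every 20-character address that is not part of a longer word."""
    return PATTERN.findall(text)


# Made-up sample: the second entry is 21 characters, the last is too short.
sample = (
    "tqwertyuio#gmail.com,xtzxcvb1234#gmail.com,"
    "Tzxcvb1234#gmail.com,tshort#gmail.com"
)

for email in find_emails(sample):
    print(email, len(email))
```

The lookbehind is what drops the 21-character entry: its t is preceded by another word character, so the match is refused even though the rest of the pattern fits.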

Related

REGEX in MS Word 2016: Exclude a simple String from Search

So I read a lot about Negation in Regex but can't solve my problem in MS Word 2016.
How do I exclude a String, Word, Number(s) from being found?
Example:
<[A-Z]{2}[A-Z0-9]{9;11}> to search a String like XY123BBT22223
But how do I exclude a specific one, such as SEDWS12WW04?
Well, it depends on what you need to achieve, or is this just a matter of curiosity? Word's built-in Advanced Find with wildcards is not the same as RegEx; for real RegEx you need VBA.
Depending on your need, without using VBA, you could make use of space and return characters - something like this will work for the strings provided: [ ^13][A-Z]{2}[0-9]{1,}[A-Z]{1,}[0-9]{1,}[ ^13] (assuming you use normal carriage returns and spaces in your document)
Anyway, this is a good article on wildcard searches in MS Word: https://wordmvp.com/FAQs/General/UsingWildcards.htm
EDIT:
In light of your further comments you will probably want to look at section 8 of the linked article which explains grouping. For my proposed search you can use this to your advantage by creating 3 groups in your 'find' and only modifying the middle group, if indeed you do intend to modify. Using groups the search would look something like:
([ ^13])([A-Z]{2}[0-9]{1,}[A-Z]{1,}[0-9]{1,})([ ^13])
and the replace might look like this:
\1 SOMETHING \3
Note also: compared to a RegEx solution my suggestion is kinda lame, mainly because compared to RegEx, MS Word's find and replace (good as it is, and really it is) is kinda lame... it's hacky but it might work for you (although you might need to do a few searches).
BUT... if it really is REGEX that you want, well you can get access to this via VBA: How to Use/Enable (RegExp object) Regular Expression using VBA (MACRO) in word
And... then you will be able to use proper RegEx for find and replace, well almost - I'm under the impression that the VBA RegEx still has some quirks...
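To see what the three-group find and replace above does in a full regex engine, here is a rough Python analogue. It is only a sketch: Word's [ ^13] is approximated as a space or a carriage return, {1,} is written as +, and the function name replace_ids is hypothetical.

```python
import re

# Group 1 and group 3 capture the delimiter (space or carriage return);
# group 2 captures the ID: two letters, digits, letters, digits.
ID_WITH_DELIMITERS = re.compile(r"([ \r])([A-Z]{2}[0-9]+[A-Z]+[0-9]+)([ \r])")


def replace_ids(text: str, replacement: str) -> str:
    """Replace only the middle group and keep both delimiters unchanged."""
    return ID_WITH_DELIMITERS.sub(
        lambda m: m.group(1) + replacement + m.group(3), text
    )


print(replace_ids("code XY123BBT22223 here", "SOMETHING"))
print(replace_ids("code SEDWS12WW04 here", "SOMETHING"))
```

The second call leaves the text untouched: SEDWS12WW04 has a letter where the pattern expects the first run of digits, which is how this search skips that ID without any explicit negation.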
As already noted by others, this is not possible in Microsoft Word's flavor of regular expressions.
Instead, you should use standard regular expressions. It is actually possible to use standard regular expressions in MS Word if you use a special tool that integrates into Microsoft Word called Multiple Find & Replace (see http://www.translatortools.net/products/transtoolsplus/word-multiplefindreplace). This tool opens as a pane to the right of the document window and works just like the Advanced Find & Replace dialog. However, in addition to Word's existing search functionality, it can use the standard regular expressions syntax to search and replace any text within a Word document.
In your particular case, I would use this:
\b[A-Z]{2}[A-Z0-9]{9,11}\b(?<!\bSEDWS12WW04)
To explain, this searches for a word boundary + ID + word boundary, and then it looks back to make sure that the preceding string does not match [word boundary + excluded ID]. In a similar vein, you can do something like
(?<!\bSEDWS12WW04|\bSEDWS12WW05|\bSEDWS12WW05)
to exclude several IDs.
Multiple Find & Replace is quite powerful: you can add any number of expressions (either using regular expressions or using Word's standard search syntax) to a list and then search the document for all of them, replace everything, display all matches in a list and replace only specific matches, and a few more things.
I created this tool for translators and editors, but it is great for any advanced search/replace operations in Word, and I am sure you will find it very useful.
Best regards, Stanislav
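The exclusion idea can also be tried in Python. This sketch uses a negative lookahead at the start of the ID instead of the lookbehind shown above, because Python's re module only accepts fixed-width lookbehinds and a lookahead lets the excluded IDs differ in length. The function name build_id_pattern and the sample text are hypothetical.

```python
import re


def build_id_pattern(excluded: list[str]) -> re.Pattern:
    """Match two letters plus 9-11 letters/digits, except the excluded IDs.

    Assumes at least one excluded ID is given.
    """
    blocked = "|".join(re.escape(item) for item in excluded)
    # (?!(?:A|B)\b) refuses the match when an excluded ID starts here.
    return re.compile(rf"\b(?!(?:{blocked})\b)[A-Z]{{2}}[A-Z0-9]{{9,11}}\b")


pattern = build_id_pattern(["SEDWS12WW04", "SEDWS12WW05"])
text = "XY123BBT22223 SEDWS12WW04 AB123456789 SEDWS12WW05"
print(pattern.findall(text))
```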

regex to find domain without those instances being part of subdomain.domain

I'm new to regex. I need to find instances of example.com in an .SQL file in Notepad++ without those instances being part of subdomain.example.com.
From this answer, I've tried using ^((?!subdomain))\.example\.com$, but this does not work.
I tested this in Notepad++ and at https://regex101.com/r/kS1nQ4/1 but it doesn't work.
Help appreciated.
Simple
^example\.com$
with g,m,i switches will work for you.
https://regex101.com/r/sJ5fE9/1
If the matching should be done somewhere in the middle of the string you can use negative look behind to check that there is no dot before:
(?<!\.)example\.com
https://regex101.com/r/sJ5fE9/2
Without access to example text, it's a bit hard to guess what you really need, but the regular expression
(^|\s)example\.com\>
will find example.com where it is preceded by nothing or by whitespace, and followed by a word boundary. (You could still get a false match on example.com.pk because the period is a word boundary. Provide better examples in your question if you want better answers.)
If you specifically want to use a lookaround: the negative lookahead you used (as the name implies) specifies what the regex should not match at this point. So (?!subdomain\.)example always matches trivially, because "example" is not "subdomain."; the negative lookahead can never fail there.
You might be better served by a lookbehind:
(?<!subdomain\.)example\.com
Demo: https://regex101.com/r/kS1nQ4/3
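A short Python sketch can make the difference between the two lookbehinds in this thread concrete. The SQL line is an invented sample, and the constant names are hypothetical.

```python
import re

# Rejects only the one named subdomain.
NOT_NAMED_SUBDOMAIN = re.compile(r"(?<!subdomain\.)example\.com")
# Rejects any occurrence with a dot directly before it, i.e. any subdomain.
NOT_ANY_SUBDOMAIN = re.compile(r"(?<!\.)example\.com")

# Invented sample containing the bare domain and two subdomains.
sql = (
    "INSERT INTO sites VALUES ('example.com'), "
    "('subdomain.example.com'), ('www.example.com');"
)

print(len(NOT_NAMED_SUBDOMAIN.findall(sql)))
print(len(NOT_ANY_SUBDOMAIN.findall(sql)))
```

The first pattern still accepts www.example.com, so which one fits depends on whether other subdomains should count as hits.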
Here's a solution that takes the protocols/prefixes into account:
/^(https?:\/\/)?(www\.)?example\.com$/

Regular expression: find abc.com except xyz.abc.com or #abc.com

In Eclipse I want to find a string, and using the normal search returns hundreds of irrelevant results. So I'm trying to use regular expressions, but so far they haven't given me the right results.
This is what I need: find "abc.com", but not "xyz.abc.com" or "#abc.com". To make it clear, it should return www.abc.com.
I've tried the following regex but I'm not sure if this is how it should be:
[^#xyz\.]abc.com
Using a negative lookbehind should suit your needs:
(?<!xyz[.]|#)abc[.]com
Every "abc.com" that is not preceded by "xyz." nor by "#".
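If you want to test this outside Eclipse, note that Python's re module rejects a single lookbehind whose alternatives have different widths ("xyz." is four characters, "#" is one). A minimal sketch, assuming it is acceptable to stack two separate lookbehinds instead; the helper find_abc and the sample string are hypothetical.

```python
import re

# Two stacked lookbehinds: both must hold at the position before "abc.com".
ABC_PATTERN = re.compile(r"(?<!xyz\.)(?<!#)abc\.com")


def find_abc(text: str) -> list[int]:
    """Return the start offset of every accepted "abc.com"."""
    return [m.start() for m in ABC_PATTERN.finditer(text)]


sample = "www.abc.com xyz.abc.com user#abc.com abc.com"
for offset in find_abc(sample):
    # Show a little context before each hit.
    print(offset, sample[max(0, offset - 4):offset + 7])
```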

Having trouble creating a regex for a list of zip codes

I need to test whether a list of zip codes in a textarea has only 5-digit zip codes. Under normal circumstances the list would look like this:
56228, 56243, 55324, 55325, 55329, 55355, 55389
I need to find out if there is anything but the above pattern in the textarea. There can be any number of individual zip codes, but I need to make sure there isn't anything else. (I think I'm going to need to be able to highlight illegal matches in the textarea also, but I'll cross that bridge when I get to it).
I started with this regex:
^\d{5},?\s?$+
I'm very new to building regular expressions, but as I understand it, the above should match any set of 5 digits, and commas and whitespace after the five digits may or may not be there.
Online regex testers (I've tried several) aren't finding any matches, whether I have a legitimate list of zip codes or a list with "illegal" characters.
What am I missing here?
This one should suit your needs:
^([, ]*\d{5})+[, ]*$
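The pattern in the question fails because ^ and $ wrap a single five-digit group, so a list of several codes can never match as a whole. The answer's pattern fixes that, but because its separators are optional it also accepts two codes run together, such as 5622856243. The Python sketch below is one possible stricter take, under the assumption that codes must be separated by a comma; it also lists the offending tokens, which could later feed the highlighting. The function names are hypothetical.

```python
import re

# Stricter variant (an assumption, not the answer's pattern): codes must be
# separated by a comma, with optional whitespace around it.
STRICT_ZIP_LIST = re.compile(r"\d{5}(?:\s*,\s*\d{5})*")


def is_valid_zip_list(text: str) -> bool:
    """True when the whole text is a comma-separated list of 5-digit codes."""
    return STRICT_ZIP_LIST.fullmatch(text.strip()) is not None


def invalid_tokens(text: str) -> list[str]:
    """Return the tokens that are not exactly five digits."""
    tokens = [t for t in re.split(r"[,\s]+", text) if t]
    return [t for t in tokens if not re.fullmatch(r"\d{5}", t)]


print(is_valid_zip_list("56228, 56243, 55324"))
print(invalid_tokens("56228, 5624a, 123456"))
```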

Find Acronym with Regular Expression Dreamweaver

I have a 2000-page website that contains over 500 acronyms. What regular expression could I use to find all the acronyms in the text only? I'm using Dreamweaver. Some examples would be AFD, GTDC, IJQW and so on. These are 2 or more capitals that might be bounded or surrounded by other characters, for example (DFT) or l'WQF. Any ideas?
If Dreamweaver has search via grep capability, you could just search for any string of letters with all capitals, including whatever necessary punctuation you need, e.g. [A-Z'-]{3,}. The 3 is the minimum number of characters in the acronym... you can change that as needed.
This would probably be better done via shell script, though, just for speed's sake. Let us know what OS you're using and someone else can leave a comment as to how to script that, as I probably don't know.
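As one possible starting point for a script, here is a small Python sketch. It uses a different pattern from the one suggested above (an assumption on my part): two or more capitals that are not glued to other letters, so (DFT) and l'WQF are found but the capital in an ordinary word is not. It does not strip HTML tags, so "text only" would need an extra step, and the sample sentence is invented.

```python
import re
from collections import Counter

# Two or more capitals with no letter directly before or after them.
ACRONYM = re.compile(r"(?<![A-Za-z])[A-Z]{2,}(?![A-Za-z])")


def count_acronyms(text: str) -> Counter:
    """Count how often each acronym-like token appears in the text."""
    return Counter(ACRONYM.findall(text))


sample = "The AFD and (DFT) teams met l'WQF; AFD approved. Hello World."
for acronym, count in sorted(count_acronyms(sample).items()):
    print(acronym, count)
```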