Notepad++ replace with reg expression? - regex

I have a big list with links and other data in it. I want to filter out everything else and end up with a list of just the links.
Example of the current list:
32,2012-01-04 06:44:44,http://link.com/link
33,2012-01-04 06:44:45,http://link.com/link,{Text|textext|text},http://link.com/link|http://link.com/link|http://link.com/link

Notepad++ offers find and replace with regular expressions. Open the Replace dialog with Ctrl+H and set the search mode to "Regular expression".
If you're actually asking for a regular expression to do this, you can use something like this to match URLs:
\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))
which I found here.
Additionally you can test out changes to your regex easily at http://gskinner.com/RegExr/

Using the input you provided, here's a pattern you can use on http://www.regexr.com/.
Make sure the global (/g) flag is on, and that the input ends with a newline so the last link has a delimiter to match.
Expression:
.*?(http.*?)[,|\n]
Input:
32,2012-01-04 06:44:44,http://link.com/link1
33,2012-01-04 06:44:45,http://link.com/link2,{Text|textext|text},http://link.com/link3|http://link.com/link4|http://link.com/link5
Substitution:
$1\n
Output:
http://link.com/link1
http://link.com/link2
http://link.com/link3
http://link.com/link4
http://link.com/link5
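Outside the editor, the same extraction can be scripted. The following is a minimal Python sketch, not taken from the answers above. The pattern `https?://[^\s,|]+` and the name `extract_links` are assumptions chosen for this sample, where commas, pipes, and whitespace are the only delimiters around the links.

```python
import re

# Assumed pattern: a link starts at http:// or https:// and runs until
# the next comma, pipe, or whitespace character.
LINK_RE = re.compile(r"https?://[^\s,|]+")


def extract_links(text):
    """Return every link found in text, in order of appearance."""
    return LINK_RE.findall(text)


sample = (
    "32,2012-01-04 06:44:44,http://link.com/link1\n"
    "33,2012-01-04 06:44:45,http://link.com/link2,{Text|textext|text},"
    "http://link.com/link3|http://link.com/link4|http://link.com/link5"
)

# Print one link per line, like the Output listing above.
print("\n".join(extract_links(sample)))
```

Because this collects matches instead of substituting, it does not depend on a trailing newline after the last link.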

Related

How to extract the version info using regex in Groovy

I have the string <tr><td>Version:</td><td>6.3.13.2</td></tr> and I want to extract 6.3.13.2 from it. How can I do this with a Groovy regex?
For the example presented, you can use:
/[\d.]+/
Regex Demo and Explanation
For the example provided you could use a simple regular expression like this:
/[\d.]+/
It would be better to use
(?:<td>)([\d.]+)(?:<\/td>)
and take the capture group, which holds only the version number, to use however you need.
Learn about regular expressions here
REGEX Library, tester, documentation, cheat sheet
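The question asks about Groovy, but the capture-group idea can be checked quickly in any engine. Here is a rough Python sketch of it, using a slightly shortened form of the pattern (the non-capturing wrappers around the tags are dropped, which is an assumption that does not change what is captured here).

```python
import re

row = "<tr><td>Version:</td><td>6.3.13.2</td></tr>"

# Capture a run of digits and dots that fills an entire <td> cell.
# The "Version:" cell is skipped because it contains other characters.
match = re.search(r"<td>([\d.]+)</td>", row)
version = match.group(1) if match else None

print(version)  # 6.3.13.2
```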

Notepad++ Wildcard Find/Replace

I'm using Notepad++ and need to update a file in which earlier parts of each line vary, so I think wildcards may help here. From the research I've done so far, it isn't clear what syntax to use.
Here's an example of the original string:
"EEID","SUPLIFE","Voluntary Life Insurance","500000.00","500000.00",0,276,10.62.0,0,0,"20151112","","A","","","","",""
I'd like to find a way to add wildcards in the places noted below as WILDCARD:
"EEID","SUPLIFE","Voluntary Life Insurance","WILDCARD","WILDCARD",WILDCARD,WILDCARD,WILDCARD,WILDCARD,WILDCARD,WILDCARD,"20151112","","A","","","","",""
The final output would then look like the following after the find/replace with wildcards to add VLIFE:
"EEID","SUPLIFE","Voluntary Life Insurance","500000.00","500000.00",0,276,10.62.0,0,0,"20151112","","A","VLIFE","","","",""
Thanks,
Brandon
Tested in Notepad++ and appears to work:
("EEID","SUPLIFE","Voluntary Life Insurance",([^,]+,){8}"","A",)("")(.*)
and replace pattern:
\1"VLIFE"\4
Regex101 example
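For anyone who wants to run the same replacement outside Notepad++, here is a small Python sketch. The pattern and replacement are copied from the answer; only the variable names are made up for the example. Group 1 is everything up to and including "A", group 3 is the empty field being filled, and group 4 is the rest of the line.

```python
import re

# Pattern and replacement as given in the answer.
PATTERN = re.compile(
    r'("EEID","SUPLIFE","Voluntary Life Insurance",([^,]+,){8}"","A",)("")(.*)'
)
REPLACEMENT = r'\1"VLIFE"\4'

line = (
    '"EEID","SUPLIFE","Voluntary Life Insurance","500000.00","500000.00",'
    '0,276,10.62.0,0,0,"20151112","","A","","","","",""'
)

# ([^,]+,){8} plays the role of the eight "wildcard" fields.
updated = PATTERN.sub(REPLACEMENT, line)
print(updated)
```

Note that `[^,]+` needs at least one character per field, so a line with a completely empty (unquoted) field among those eight would not match.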

trying to Exclude Strings from my Regex Search

I'm using the following expression to filter Oracle Java vulnerabilities from a list. This works just fine:
^(?!.*Oracle Java.*).*$
I'm having a tough time adding another string to exclude.
I got the expression from an earlier question here:
Regular Expression to exclude set of Keywords
I've tried all the examples from this link but the answer Tim gave was the only one that worked for me.
Does anyone know how I could add another string to this?
^(?!.*Oracle Java.*).*$
You can use regex alternation inside the lookahead:
^(?!.*(Oracle Java|excluded1|excluded2).*).*$
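As a quick way to try the alternation, here is a Python sketch. The second excluded string ("Adobe Flash") and the three sample lines are hypothetical, invented only to show the filter; substitute your own strings.

```python
import re

# "Adobe Flash" is a hypothetical second exclusion added for illustration.
KEEP = re.compile(r"^(?!.*(Oracle Java|Adobe Flash)).*$")

# Hypothetical input lines, not real scan output.
findings = [
    "Oracle Java SE Critical Patch Update",
    "Adobe Flash Player Multiple Vulnerabilities",
    "OpenSSL Heartbeat Information Disclosure",
]

# A line is kept only if none of the alternatives appears anywhere in it.
kept = [f for f in findings if KEEP.match(f)]
print(kept)  # ['OpenSSL Heartbeat Information Disclosure']
```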

Regular expression to extract part of a file path using the logstash grok filter

I am new to regular expressions but I think people here may give me valuable inputs. I am using the logstash grok filter in which I can supply only regular expressions.
I have a string like this
/app/webpf04/sns882A/snsdomain/logs/access.log
I want to use a regular expression to get the sns882A part from the string, which is the substring after the third "/", how can I do that?
I am restricted to regex as grok only accepts regex. Is it possible to use regex for this?
Yes, you can use a regular expression to get what you want via grok:
/[^/]+/[^/]+/(?<field1>[^/]+)/
For your regex:
/\w*\/\w*\/(\w*)\/
You can also test it at:
http://www.regextester.com/
Searching for "regex tester" will turn up other tools with different interfaces.
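If you want to check the named-group pattern outside logstash, a Python sketch like the following should do. One assumption to be aware of: Python's `re` module spells named groups `(?P<name>...)`, so the pattern is adjusted from the `(?<field1>...)` form used for grok. The name `field1` is the one from the answer.

```python
import re

path = "/app/webpf04/sns882A/snsdomain/logs/access.log"

# Same idea as the grok pattern: skip two path segments, name the third.
# Python requires (?P<field1>...) instead of (?<field1>...).
m = re.match(r"/[^/]+/[^/]+/(?P<field1>[^/]+)/", path)

print(m.group("field1"))  # sns882A
```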
If you are indeed using Perl then you should use the File::Spec module like this
use strict;
use warnings;
use File::Spec;
my $path = '/app/webpf04/sns882A/snsdomain/logs/access.log';
my @path = File::Spec->splitdir($path);
print $path[3], "\n";
output
sns882A
This is how I would do it in Perl:
my ($name) = ($fullname =~ m{^(?:/.*?){2}/(.*?)/});
EDIT:
If your framework does not support Perl-style non-capturing groups (?:xyz), this regex should work instead:
^/.*?/.*?/(.*?)/
If you are concerned about performance of .*?, this works as well:
^/[^/]+/[^/]+/([^/]+)/
One more note: all of the regexes above match the string /app/webpf04/sns882A/.
But the matched string is not the same as the first capture group, which is sns882A in all three cases.
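To make the difference between the whole match and the first group concrete, here is a short Python sketch that runs the three patterns from this answer against the sample path. The variable names are made up for the example.

```python
import re

path = "/app/webpf04/sns882A/snsdomain/logs/access.log"

# The three patterns from the answer, unchanged.
patterns = [
    r"^(?:/.*?){2}/(.*?)/",
    r"^/.*?/.*?/(.*?)/",
    r"^/[^/]+/[^/]+/([^/]+)/",
]

results = []
for p in patterns:
    m = re.search(p, path)
    # group(0) is the whole matched string, group(1) the captured segment.
    results.append((m.group(0), m.group(1)))

for whole, first_group in results:
    print(whole, first_group)  # /app/webpf04/sns882A/ sns882A
```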
Same answer, with a small bug fix. If you don't specify ^ at the start, the regex can also match later in the string (try longer inputs with more / segments). To fix this, add ^ at the start; ^ means the start of the input line. Group 1 is then your answer.
^/[^/]+/[^/]+/([^/]+)/
If you are working with URI paths, use the one below (it handles paths as well as URIs).
^.*?/[^/]+/[^/]+/([^/]+)/
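The effect of the ^ anchor shows up when the engine looks for every match rather than just the first. Here is a small Python sketch of that; the longer path `/a/b/c/d/e/f/g/` is a hypothetical input invented for the demonstration.

```python
import re

# Hypothetical longer path, used only to show the effect of the ^ anchor.
long_path = "/a/b/c/d/e/f/g/"

# Without ^ the pattern can match again further along the string.
unanchored = re.findall(r"/[^/]+/[^/]+/([^/]+)/", long_path)

# With ^ only a match starting at the beginning of the input is allowed.
anchored = re.findall(r"^/[^/]+/[^/]+/([^/]+)/", long_path)

print(unanchored)  # ['c', 'g']
print(anchored)    # ['c']
```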

Notepad++ I'm looking for a regexp to select all occurrences of 'href="' that do not match 'href="javascript'

This is about the code editor Notepad++.
I'm looking for a regular expression that will solve the following problem:
I have a set of html files. I need to find all links in them that are not links to javascript functions. If I search for the string 'href="' I get 342 results and if I search for 'href="javascript' I get 301 results. I'd like to get at the 41 elements that are only in the first set. That is all links that are not to javascript function calls.
I'd be grateful if anyone more familiar with regular expressions than I currently am could help me out on this one.
This will match URLs that don't start with "j", which will probably work for you.
href="[^j]
I don't know what type of RegExp engine is used in Notepad++ but the extended regular expression would look like:
href="(?:(?!javascript).)
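The two approaches differ on links that merely start with "j". Here is a Python sketch comparing them; the HTML fragment is hypothetical, and the lookahead is written as `href="(?!javascript)`, a slightly shorter variant of the expression above (an assumption of this example, not the answer's exact pattern).

```python
import re

# Hypothetical HTML fragment for illustration.
html = (
    '<a href="javascript:openMenu()">Menu</a>\n'
    '<a href="page.html">Page</a>\n'
    '<a href="jobs.html">Jobs</a>\n'
)

# Character-class approach: rejects anything whose first letter is "j".
not_j = re.findall(r'href="[^j]', html)

# Negative lookahead: rejects only links that begin with "javascript".
not_javascript = re.findall(r'href="(?!javascript)[^"]*', html)

print(not_j)           # ['href="p']
print(not_javascript)  # ['href="page.html', 'href="jobs.html']
```

The character-class version misses jobs.html, so the lookahead form is the safer choice if any ordinary link targets begin with "j".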
PowerGrep w/ RegexBuddy - I use notepad++ and PowerGrep