exclude an asset regex - regex-greedy

I need to write a regex that will capture everything in a directory except one asset.
Ex.
I want to exclude /test1/test2/new.jpg and capture everything else in /test1/test2/
I tried a negative lookahead, but it doesn't seem to work:
/test1/test2/^(?!new.jpg).*

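You don't need the ^ in the middle of the pattern. ^ asserts the start of the string, so it can never match right after /test1/test2/: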
/test1/test2/(?!new.jpg).*
See this: https://regex101.com/r/uHgBZc/1
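A quick way to check the corrected pattern is to run it over a few paths. The sketch below uses Python's re module; every sample path other than new.jpg is made up for illustration. It also tries a slightly stricter variant, which is an assumption and not part of the answer above: it escapes the dot and anchors the end of the file name, so that a name such as new.jpg.bak is not excluded by accident.

```python
import re

# Pattern from the answer: the lookahead sits right after the directory prefix.
loose = re.compile(r"/test1/test2/(?!new.jpg).*")
# Stricter variant (assumption): escape the dot and require the name to end there.
strict = re.compile(r"/test1/test2/(?!new\.jpg$).*")

paths = [
    "/test1/test2/new.jpg",      # the asset to exclude
    "/test1/test2/old.jpg",      # hypothetical sibling asset
    "/test1/test2/new.jpg.bak",  # hypothetical: only starts like the excluded name
]

loose_hits = [p for p in paths if loose.fullmatch(p)]
strict_hits = [p for p in paths if strict.fullmatch(p)]

print(loose_hits)   # old.jpg only
print(strict_hits)  # old.jpg and new.jpg.bak
```

Which variant is right depends on whether files that merely start with new.jpg should also be excluded.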

Related

Regex expression for matching folder content

This may be an obvious regex, but I need an expression to filter the following folders and files.
The regex should match the content of the folder
SUNSOLAR_Demo_0/My Documents/CUSTOMER/sale_docs
and exclude the rest.
Eg.:
Include
SUNSOLAR_Demo_0/My Documents/CUSTOMER/sale_docs/invoices/emails/sample.pdf
Exclude:
SUNSOLAR_Demo_0/My Documents/CUSTOMER/unseal/mytest.doc
SUNSOLAR_Demo_0/My Documents/PROVIDERS/orders/invoices.xls
I was trying something like this, but no luck:
"^SUNSOLAR_Demo_0/My Documents/(?!CUSTOMER/sale_docs).*$"
Thanks
Your pattern ^SUNSOLAR_Demo_0/My Documents/(?!CUSTOMER/sale_docs).*$ matches until Documents/ and then asserts that what is directly to the right is not CUSTOMER/sale_docs
But CUSTOMER/sale_docs is actually part of the string that you want to match.
There is no need for a lookaround here.
You can match the whole part of the folder that you want to match, followed by an optional part that starts with / and the rest of the line.
^SUNSOLAR_Demo_0/My Documents/CUSTOMER/sale_docs(?:/.*)?$
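As a rough check of the pattern above, the following Python sketch runs it against the paths from the question. The last sample (sale_docs_old) is a hypothetical path added to show that the optional (?:/.*)? group keeps similarly named sibling folders out.

```python
import re

# Match the folder itself, optionally followed by "/" and anything after it.
folder = re.compile(r"^SUNSOLAR_Demo_0/My Documents/CUSTOMER/sale_docs(?:/.*)?$")

samples = [
    "SUNSOLAR_Demo_0/My Documents/CUSTOMER/sale_docs/invoices/emails/sample.pdf",
    "SUNSOLAR_Demo_0/My Documents/CUSTOMER/sale_docs",
    "SUNSOLAR_Demo_0/My Documents/CUSTOMER/unseal/mytest.doc",
    "SUNSOLAR_Demo_0/My Documents/PROVIDERS/orders/invoices.xls",
    "SUNSOLAR_Demo_0/My Documents/CUSTOMER/sale_docs_old/a.pdf",  # hypothetical
]

# Map each path to True (included) or False (excluded).
results = {path: bool(folder.match(path)) for path in samples}

for path, included in results.items():
    print("include" if included else "exclude", path)
```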

RegEx subtract text from inside

I have an example string:
*DataFromAdHoc(cbgv)
I would like to extract by RegEx:
DataFromAdHoc
So far I have figured something like that:
^[^#][^\(]+
Unfortunately, it doesn't give the result I want. Do you have any idea why it's not working?
The regex you tried, ^[^#][^\(]+, works like this:
^[^#] matches one character at the beginning of the string that is not a #.
[^\(]+ then matches until it encounters an opening parenthesis (I think you don't have to escape the parenthesis in a character class).
So this matches *DataFromAdHoc, including the *, because * is not a #.
What you could do is capture the [^\(]+ part in a group, like ([^(]+)
Then your regex would look like:
^[^#]([^(]+)
And the DataFromAdHoc would be in group 1.
Use ^\*(\w+)\(\w+\)$
It just gets everything between the * and the stuff in brackets.
Your answer may depend on which language you're running your regex in, please include that in your question.
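Since the question does not name a language, here is a minimal sketch in Python (an assumption on my part) that applies both suggested patterns to the example string and reads the result from group 1.

```python
import re

text = "*DataFromAdHoc(cbgv)"

# First answer: skip one non-# character, then capture up to the first "(".
m1 = re.match(r"^[^#]([^(]+)", text)
# Second answer: require a literal "*", a word, and a parenthesised word.
m2 = re.match(r"^\*(\w+)\(\w+\)$", text)

name1 = m1.group(1)
name2 = m2.group(1)
print(name1, name2)
```

The second pattern is stricter: it only matches when the string starts with * and the parentheses contain word characters only, whereas the first accepts any leading character other than #.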

Regex to match all urls, excluding .css, .js resources

I'm looking for a regular expression to exclude the URLs from an extension I don't like.
For example resources ending with: .css, .js, .font, .png, .jpg etc. should be excluded.
However, I can put all resources to the same folder and try to exclude URLs to this folder, like:
.*\/(?!content\/media)\/.*
But that doesn't work! How can I improve this regex to match my criteria?
e.g.
Match:
http://www.myapp.com/xyzOranotherContextRoot/rest/user/get/123?some=par#/other
No match:
http://www.myapp.com/xyzOranotherContextRoot/content/media/css/main.css?7892843
The correct solution is:
^((?!\/content\/media\/).)*$
see: https://regex101.com/r/bD0iD9/4
Inspired by the question "Regular expression to match a line that doesn't contain a word?"
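The pattern above checks the lookahead before every single character, so it rejects the string as soon as /content/media/ appears anywhere in it. A small Python sketch using the two URLs from the question (the slashes need no escaping in Python):

```python
import re

# At each position, assert that "/content/media/" does not start here,
# then consume one character; repeat until the end of the string.
no_media = re.compile(r"^((?!/content/media/).)*$")

urls = [
    "http://www.myapp.com/xyzOranotherContextRoot/rest/user/get/123?some=par#/other",
    "http://www.myapp.com/xyzOranotherContextRoot/content/media/css/main.css?7892843",
]

kept = [u for u in urls if no_media.match(u)]
print(kept)
```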
Two things:
First, the ?! negative lookahead doesn't remove any characters from the input. Add [^\/]+ before the trailing slash. Right now it is trying to match two consecutive slashes. For example:
.*\/(?!content\/media)[^\/]+\/.*
(edit) Second, the .*s at the beginning and end match too much. Try tightening those up, or adding more detail to content\/media. As it stands, content/media can be swallowed by one of the .*s and never be checked against the lookahead.
Suggestions:
Use your original idea - test against the extensions: ^.*\.(?!css|js|font|png|jpeg)[a-z0-9]+$ (with case insensitive).
Instead of doing everything in one regular expression, use a regex that will pull out any URL (e.g., https?:\/\/\S+, perhaps?) and then test each one you find with String.indexOf: if(candidateURL.indexOf('content/media')==-1) { /*do something with the OK URL */ }
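The two-step idea from the last suggestion could look roughly like this in Python, where the in operator plays the role of String.indexOf. The surrounding text variable is a made-up input that embeds the two URLs from the question.

```python
import re

# Hypothetical input text containing the two URLs from the question.
text = (
    "See http://www.myapp.com/xyzOranotherContextRoot/rest/user/get/123?some=par#/other "
    "and http://www.myapp.com/xyzOranotherContextRoot/content/media/css/main.css?7892843"
)

# Step 1: pull out anything that looks like a URL.
candidates = re.findall(r"https?://\S+", text)

# Step 2: keep only the URLs that do not point into the media folder.
ok_urls = [u for u in candidates if "content/media" not in u]
print(ok_urls)
```

Splitting the work this way keeps the regex simple and moves the exclusion rule into plain string logic, which is usually easier to read and change.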

Delphi XE2 Regex: Quantifier does not work inside positive lookbehind?

I have a complete HTML document string from a web page containing this BASE tag:
<BASE href="http://whatreallyhappened.com/">
In Delphi XE2, I use this regular expression with the whole HTML document as subject to get the URL from the BASE tag between the double quotes:
BaseURL := TRegEx.Match(HTMLDocStr, '(?<=<base(\s)href=").*(?=")', [roIgnoreCase]).Value;
This works, but only if there is only ONE space character in the subject between BASE and href.
I tried to add a quantifier to the space part in the regex (\s), but it did not work.
So how can I make this regex match the URL even if there are several spaces between BASE and href?
You're making this far too complicated by using lookaround. If you want to extract only part of the regex match, simply add a capturing group. Then you can use the text matched by the capturing group instead of the overall match. In most cases you'll also get much better performance this way.
To find the base tag in a file and extract its URL you can use the regex <base[^>]+href=["']([^"']*)["']. Call TRegex.Match() to get a TMatch. This has a Groups property that you can use to retrieve group 1 if a match was found.
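To illustrate the capturing-group approach, here is a minimal sketch in Python rather than Delphi (an assumption for the sake of a runnable example; Python's re module likewise only accepts fixed-width lookbehind). The html string is a made-up document with several spaces between BASE and href.

```python
import re

# Hypothetical document with three spaces between BASE and href.
html = '<html><head><BASE   href="http://whatreallyhappened.com/"></head></html>'

# Same regex as suggested above; group 1 holds the URL between the quotes.
base_re = re.compile(r"""<base[^>]+href=["']([^"']*)["']""", re.IGNORECASE)

m = base_re.search(html)
base_url = m.group(1) if m else None
print(base_url)
```

Because the whitespace is matched by the ordinary part of the pattern and not inside a lookbehind, any number of spaces or other attributes before href is accepted.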
With lookaround
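You can move the quantified whitespace out of the lookbehind, as in the patterns below. Note that the overall match then also includes the whitespace and the href=" prefix: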
(?<=<BASE)\s+href=".*(?=")
(?<=<BASE)\s{0,30}href=".*(?=")
Without lookaround
By the way, if you just want the content of href, there is no need for lookaround. You can simply use:
<BASE\s+href="(.*?)"
EDIT: after reading your comments I figured out a workaround (ugly but could work). You can try using something like this:
((?<=<BASE\shref=")|(?<=<BASE\s\shref=")|(?<=<BASE\s\s\shref=")).*(?=")
Notice \s in the first alternative, \s\s in the second, and \s\s\s in the third.
I know this is horrible, but if none of the above works, you can try that.

regular expression to parse short urls

I've a list of possible urls on my site like
1 http://dev.site.com/People/
2 http://dev.site.com/People
3 http://dev.site.com/Groups/
4 http://dev.site.com/Groups
5 http://dev.site.com/
6 http://dev.site.com/[extraword]
I want to be able to match all the urls like 6 and redirect them to
http://dev.site.com/?Shorturl=extraword
but I don't want to redirect the first 5 urls
I tried something like
((.*)(?!People|Groups))\r
but something is wrong.
any help?
thanks
You should put the check that it isn't People or Groups at the start:
(?!People|Groups)(.*)
At the moment you're checking that the regular expression isn't followed by People or Groups.
Depending on which language/framework you're using, you might also need to use ^ and $ to make sure you're matching the whole string:
^(?!People|Groups)(.*)$
You should also think about whether you want to match urls that begin with People, eg. http://dev.site.com/People2/. So this might be better:
^(?!(?:People|Groups)(?:/|$))(.*)$
The lookahead now rejects People or Groups only when the name is followed by a slash or the end of the url, so People2 is still matched.
You might want to make sure you don't match an empty string, so use .+ instead of .*:
^(?!(?:People|Groups)(?:/|$))(.+)$
And if you want a word without any slashes:
^(?!(?:People|Groups)(?:/|$))([^/]+)$
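Here is a small sketch of how the last pattern could drive the redirect, written in Python for illustration. It assumes the pattern is applied to the path without the host and leading slash (as the anchored patterns above imply); redirect_target is a hypothetical helper name, and People2 is the extra case discussed above.

```python
import re

# Reject exactly "People" or "Groups" (with or without a trailing slash),
# then capture a single path segment that contains no slashes.
short_url = re.compile(r"^(?!(?:People|Groups)(?:/|$))([^/]+)$")

def redirect_target(path):
    """Return the redirect URL for a short-URL path, or None to leave it alone."""
    m = short_url.match(path)
    if m is None:
        return None
    return "http://dev.site.com/?Shorturl=" + m.group(1)

paths = ["People/", "People", "Groups/", "Groups", "", "extraword", "People2"]
targets = {p: redirect_target(p) for p in paths}

for p, t in targets.items():
    print(repr(p), "->", t)
```

Only extraword and People2 produce a redirect; the first five cases from the question return None.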
In your regex, the (.*) subpattern consumes the entire string, which then causes the negative lookahead to succeed.
You need a negative lookahead to exclude People|Groups, and then you need to capture the extra word (and the word needs to have some stuff in it, otherwise we want the match to fail). The crucial piece here is that the negative lookahead does not consume any of the string, so you are able to capture the extra word for subsequent use in the redirect URL you are trying to build.
Here's a solution in Perl, but the approach should work for you in C#:
use warnings;
use strict;
while (<DATA>) {
    print "URL=$1 EXTRA_WORD=$2\n"
        if /^(.*)\/(?!People|Groups)(\w+)\/?$/;
}
__DATA__
http://dev.site.com/People/
http://dev.site.com/People
http://dev.site.com/Groups/
http://dev.site.com/Groups
http://dev.site.com/
http://dev.site.com/extraword1
http://dev.site.com/extraword2/
Output:
URL=http://dev.site.com EXTRA_WORD=extraword1
URL=http://dev.site.com EXTRA_WORD=extraword2