.hgignore a folder except some subfolders - regex

I want to ignore a folder but keep some of its subfolders.
I tried regexp matching like this:
syntax: regexp
^site/customer/\b(?!.*/data/.*).*
Unfortunately this doesn't work.
I read in this answer that Python only supports fixed-width negative lookbehinds.
Is my desired ignoring impossible?

Python regex is cool
Python does support negative lookaheads of arbitrary length, such as (?!.*foo). What it doesn't support is arbitrary-length lookbehinds, such as (?<!foo.*). A lookbehind needs to be fixed-width, e.g. (?<!foo..).
Which means it's definitely possible to solve your problem.
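A quick way to check this in Python itself (a minimal sketch; the patterns and sample strings are made up for illustration):

```python
import re

# A negative lookahead may contain a variable-length subpattern.
no_foo_ahead = re.compile(r"^(?!.*foo).*$")

# A variable-length lookbehind is rejected when the pattern is compiled.
try:
    re.compile(r"(?<!foo.*)bar")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# A fixed-width lookbehind compiles fine.
fixed = re.compile(r"(?<!foo..)bar")

print(lookbehind_ok)
```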
The problem
You've got the following regex: /customer/(?!.*/data/.*).*.
Let's take an input example /customer/data/name. It matches for a reason.
/customer/data/name
^^^^^^^^^^ -> /customer/ match !
          ^ (?!.*/data/.*) Let's check if there is no /data/ ahead
          The problem is here, we've already matched "/"
          so the regex only finds "data/name" instead of "/data/name"
          ^^^^^^^^^ .* match !
Fixing your regex
Basically we just need to remove that one forward slash. We also add an anchor ^ to make sure the match starts at the beginning of the string, and use \b to make sure we match exactly customer: ^/customer\b(?!.*/data/).*.
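To see the difference between the two patterns, here is a small Python sketch. The sample paths other than /customer/data/name are hypothetical.

```python
import re

# Original pattern: the "/" after "customer" is consumed before the lookahead runs.
BROKEN = re.compile(r"/customer/(?!.*/data/.*).*")
# Fixed pattern: the lookahead starts right after "customer", so it can see "/data/".
FIXED = re.compile(r"^/customer\b(?!.*/data/).*")

def is_ignored(path: str) -> bool:
    """Return True when the fixed pattern matches, i.e. the path would be ignored."""
    return FIXED.match(path) is not None

for path in ["/customer/data/name", "/customer/logs/app.log", "/customers/x"]:
    print(path, is_ignored(path))
```

A path that matches is ignored; a path containing /data/ no longer matches and is therefore kept.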
Online demo

Related

Match specific pattern that does not contain other pattern in one expression

I'm looking for a regex to use in nginx location matching, that would match a specified end pattern not being anywhere preceded by a specified other pattern.
Like, I have files:
webgl-0.4.0-alpha.1-gzip-dev/streaming-wasm-gzip-dev.wasm.framework.unityweb
webgl-0.4.0-alpha.1-gzip-dev/streaming-wasm-gzip-dev.data.unityweb
webgl-0.4.0-alpha.1-gzip/streaming-wasm-gzip.wasm.framework.unityweb
webgl-0.4.0-alpha.1-gzip/streaming-wasm-gzip.data.unityweb
I want to match all \.unityweb except those that are anywhere preceded by dev. Basically, I need to match the last two lines. I cannot hardcode it, as the files/directories might be named arbitrarily.
The usual ((?!dev\/).)*$ doesn't suffice, because it still gets the ends. (?<!dev) also cannot be added anywhere as it will only match directly before.
I am out of clues and also out of regex fu!
The solution does not have to be strictly regex, might be nginx based too.
It might have been asked before, but I cannot seem to know the correct keywords to find it.
Try
^(?!.*?dev\/.*).+\.unityweb$
See the demo here
Description:
^ From the start of the line
(?! _______ ) Negative Lookahead
.*?dev\/ Match any character any amount of times, until you reach dev followed by a slash
.* Match any characters any amount of times
Negative lookahead closes
.+ Match any character, one or more times
\.unityweb - until you reach .unityweb
$ End of the line
Use the full match for what you need
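As a sanity check, the pattern can be run against the four file names from the question. This is a sketch using Python's re module rather than nginx, whose location matching uses PCRE; for this pattern the two should behave the same.

```python
import re

# The pattern from the answer: reject any path containing "dev/", require ".unityweb" at the end.
pattern = re.compile(r"^(?!.*?dev\/.*).+\.unityweb$")

files = [
    "webgl-0.4.0-alpha.1-gzip-dev/streaming-wasm-gzip-dev.wasm.framework.unityweb",
    "webgl-0.4.0-alpha.1-gzip-dev/streaming-wasm-gzip-dev.data.unityweb",
    "webgl-0.4.0-alpha.1-gzip/streaming-wasm-gzip.wasm.framework.unityweb",
    "webgl-0.4.0-alpha.1-gzip/streaming-wasm-gzip.data.unityweb",
]

matched = [f for f in files if pattern.match(f)]
for f in matched:
    print(f)
```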
EDIT
Just realised that you also state a contradiction in your question, as you say you don't want to match anything preceded by dev/ but you also want to match the first two examples you gave.
That can be done by changing the negative lookahead to a positive lookahead:
^(?=.*?dev\/.*).+\.unityweb$
See the demo here
You can use this
^(?!.*dev.*\.unityweb)(?=.*\.unityweb).*$
Demo
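The two suggested patterns are not quite equivalent: the first rejects only dev followed by a slash, the second rejects dev anywhere before .unityweb. A short Python sketch shows the difference; the file name build/devtools.unityweb is a hypothetical example, not from the question.

```python
import re

# Rejects paths containing "dev/".
slash_pattern = re.compile(r"^(?!.*?dev\/.*).+\.unityweb$")
# Rejects paths containing "dev" anywhere before ".unityweb".
anywhere_pattern = re.compile(r"^(?!.*dev.*\.unityweb)(?=.*\.unityweb).*$")

dev_dir = "webgl-0.4.0-alpha.1-gzip-dev/streaming-wasm-gzip-dev.data.unityweb"
plain = "webgl-0.4.0-alpha.1-gzip/streaming-wasm-gzip.data.unityweb"
dev_in_name = "build/devtools.unityweb"  # hypothetical: "dev" not followed by "/"

for name in (dev_dir, plain, dev_in_name):
    print(name, bool(slash_pattern.match(name)), bool(anywhere_pattern.match(name)))
```

Which one is right depends on whether dev inside a file name should also exclude the file.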

Regular expression to exclude tag groups or match only (.*) in between tags

I am struggling with this regex for a while now.
I need to match the text which is in between the <ns3:OutputData> data</ns3:OutputData>.
Note: after ns there could be 1 or 2 digits
Note: the data is in one line just as in the example
Note: the ... preceding and ending is just to mention there are more tags nested
My regex so far: (ns\d\d?:OutputData>)\b(.*)(\/\1)
Sample text:
...<ns3:OutputData>foo bar</ns3:OutputData>...
I have tried (?:(ns\d\d?:OutputData>)\b)(.*)(?:(\/\1)) in an attempt to exclude group 1 and 3.
I want to exclude the tags which are matched, as in the images:
start
end
Any help is much appreciated.
EDIT
There might be some regex interpretation issue with Grep Console for IntelliJ, in which I intend to use the regex.
Here is the latest image with the best match so far...
Your regex is almost there. All you need to do is to make the inside-matcher non-greedy. I.e. instead of (.*) you can write (.*?).
Another, xml-specific alternative is the negated character-class: ([^<]*).
So, this is the regex: (ns\d\d?:OutputData>)\b(.*?)(\/\1) You can experiment with it here.
Update
To make sure that the only group is the one that matches the text, then you have to make it work without backreferences: (?:ns\d\d?:OutputData>)\b(.*?)<
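The effect of the non-greedy quantifier is easiest to see on a line with two elements. This is a sketch in Python's re module (Grep Console runs on Java's regex engine, which may differ in details), and the sample line is made up.

```python
import re

# Hypothetical sample: two OutputData elements on one line.
line = "<ns3:OutputData>foo</ns3:OutputData><ns4:OutputData>bar</ns4:OutputData>"

lazy = re.compile(r"(?:ns\d\d?:OutputData>)\b(.*?)<")
greedy = re.compile(r"(?:ns\d\d?:OutputData>)\b(.*)<")

# The \b keeps the closing tags from starting a match, because ">" followed
# by "<" (or by the end of the line) is not a word boundary.
lazy_groups = lazy.findall(line)
greedy_groups = greedy.findall(line)

print(lazy_groups)
print(greedy_groups)
```

The lazy version yields one group per element, while the greedy one runs on to the last < in the line.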
Update 2
It's possible to match only the required parts, using lookbehind. Check the regex here:
(?<=ns\d:OutputData>)\b([^<]*)|(?<=ns\d\d:OutputData>)\b([^<]*)
Explanation:
The two alternatives are almost identical. The only difference is the number of digits. This is important because some flavors support only fixed-length lookbehinds.
Checking alternative one, we put the starting tag into one lookbehind (?<=...) so it won't be included into the full match.
Then we match every character that is not < greedily: [^<]*. This stops matching at the first <, which is where the closing tag begins.
Essentially, you need a look behind and a look ahead with a back reference to match just the content, but variable length look behinds are not allowed. Fortunately, you have only 2 variations, so an alternation deals with that:
(?<=<(ns\d:OutputData>)).*?(?=<\/\1)|(?<=<(ns\d\d:OutputData>)).*?(?=<\/\2)
The entire match is the target content between the tags, which may contain anything (including left angle brackets etc).
Note also the reluctant quantifier .*?, so the match stops at the next matching end tag, rather than greedy .* that would match all the way to the last matching end tag.
See live demo.
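A minimal Python sketch of this alternation; Python's re also requires fixed-width lookbehinds, so the same trick applies. The sample text is made up.

```python
import re

# One alternative per tag width, so each lookbehind has a fixed length.
pattern = re.compile(
    r"(?<=<(ns\d:OutputData>)).*?(?=</\1)"
    r"|(?<=<(ns\d\d:OutputData>)).*?(?=</\2)"
)

# Hypothetical sample with a 1-digit and a 2-digit namespace prefix.
text = "...<ns3:OutputData>foo bar</ns3:OutputData>...<ns12:OutputData>baz</ns12:OutputData>..."

# group(0) is the whole match, which is only the content between the tags.
contents = [m.group(0) for m in pattern.finditer(text)]
print(contents)
```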
This was the answer in my case:
(?<=(ns\d:OutputData)>)(.*?)(?=<\/\1)
The answer is based on @WiktorStribiżew's 3 given solutions (in comments).
The last one worked and I have made a slight modification of it.
Thanks all for the effort and especially @WiktorStribiżew!
EDIT
Ok, yes @Bohemian it does not match 2-digits, I forgot to update:
(?<=(ns\d{0,2}:OutputData)>)(.*?)(?=<\/\1)

Regex to match any .config file with a few exceptions

I'm trying to get a regex working to use in an .hgignore file that will ignore various copies of .config files made during debugging.
The regex should match any path ending in .config as long as the path does not start with _config, config, or packages and as long as the file name (the characters immediately following the last slash) is not app, web, packages, or repositories (or web.release, web.debug).
The closest I seem to get is
^(?!(_config|[Cc]onfig|packages)).*\/(?!([Aa]pp|[Ww]eb|packages|repositories)\.).*config$
This will properly ignore Data/app.config, and seems to work with all other cases, but it will incorrectly match Libraries/Data/app.config. When I check this out at http://regex101.com/ it shows me that the .*\/ group is only matching through Libraries/, not Libraries/Data/ as I expected.
I tried changing it to
^(?!(_config|[Cc]onfig|packages))(.*\/)*(?!([Aa]pp|[Ww]eb|packages|repositories)\.).*config$
But then the group (.*\/)* seems to match the whole path for any .config file.
If I change the last negative lookahead to a matching group like so
^(?!(_config|[Cc]onfig|packages))(.*\/)(([Aa]pp|[Ww]eb|packages|repositories)\.).*config$
Then the (.*\/) matches Libraries/Data/, which is what I want and expected, but it appears the negative lookahead changes the matching behavior of (.*\/).
I'm not sure where to go from here? The conditions I'm trying to match or not match don't seem that complicated, but I'm not the most experienced with regexes. Maybe there is a simpler way to achieve the same thing in .hgignore?
These are examples of paths that should match and be ignored:
Web/smtp.config
Libraries/Data/connectionStrings.config
These are examples of paths that should NOT match and not be ignored
_config/staging/smtp.config
Web/web.config
Web/web.release.config
Web/Views/web.config
Libraries/Data/app.config
Libraries/Data/packages.config
Data/app.config
packages/MiniProfiler.EF6.3.0.11/lib/net40/MiniProfiler.EntityFramework6.dll.config
packages/repositories.config
You were really close. Try this regex on regex101:
^(?!_?config|packages).*\/(?!(app|web|packages|repositories)\.)[^\/]*config$
I simplified it a little, but the main change was to specify no slashes in the match before the "config".
Note: I used a case-insensitive flag to simplify the regex itself.
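Since Mercurial's regexp syntax is Python's, the pattern can be checked against the sample paths from the question with a short script. This is a sketch; re.IGNORECASE stands in for the case-insensitive flag mentioned above.

```python
import re

pattern = re.compile(
    r"^(?!_?config|packages).*\/(?!(app|web|packages|repositories)\.)[^\/]*config$",
    re.IGNORECASE,
)

should_match = [
    "Web/smtp.config",
    "Libraries/Data/connectionStrings.config",
]
should_not_match = [
    "_config/staging/smtp.config",
    "Web/web.config",
    "Web/web.release.config",
    "Web/Views/web.config",
    "Libraries/Data/app.config",
    "Libraries/Data/packages.config",
    "Data/app.config",
    "packages/MiniProfiler.EF6.3.0.11/lib/net40/MiniProfiler.EntityFramework6.dll.config",
    "packages/repositories.config",
]

for path in should_match + should_not_match:
    print(path, bool(pattern.match(path)))
```

The [^\/]* is what stops the engine from backtracking .*\/ to an earlier slash and letting the file-name part span a directory boundary.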

regex negative lookbehind - pcre

I'm trying to write a rule to match on a top level domain followed by five digits. My problem arises because my existing pcre is matching on what I have described but much later in the URL than where I want it to. I want it to match on the first occurrence of a TLD, not anywhere else. The easy way to check for this is to match on the TLD when it has not been preceded at some point by the "/" character. I tried using negative-lookbehind but that doesn't work because that only looks back one single character.
e.g.: How it is currently working
domain.net/stuff/stuff=www.google.com/12345
matches .com/12345 even though I do not want this match because it is not the first TLD in the URL
e.g.: How I want it to work
domain.net/12345/stuff=www.google.com/12345
matches on .net/12345 and ignores the later match on .com/12345
My current expression
(\.[a-z]{2,4})/\d{5}
EDIT: rewrote it so perhaps the problem is clearer in case anyone in the future has this same issue.
You're pretty close :)
You just need to be sure that before matching what you're looking for (i.e: (\.[a-z]{2,4})/\d{5}), you haven't met any / since the beginning of the line.
I would suggest simply prepending ^[^\/]* to your current regex (the \. also moves out of the capturing group).
Thus, the resulting regex would be:
^[^\/]*\.([a-z]{2,4})/\d{5}
How does it work?
^ asserts that this is the beginning of the tested String
[^\/]* accepts any sequence of characters that doesn't contain /
\.([a-z]{2,4})/\d{5} is the pattern you want to match (a . followed by 2 to 4 lowercase characters, then a / and 5 digits).
Here is a permalink to a working example on regex101.
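The same check as a small Python sketch, using the two URLs from the question. Python's re is not PCRE, but this pattern uses nothing that differs between the two.

```python
import re

# Anchored at the start; [^\/]* cannot cross a "/", so only the first TLD can match.
pattern = re.compile(r"^[^\/]*\.([a-z]{2,4})/\d{5}")

unwanted = "domain.net/stuff/stuff=www.google.com/12345"
wanted = "domain.net/12345/stuff=www.google.com/12345"

unwanted_match = pattern.search(unwanted)
wanted_match = pattern.search(wanted)

print(unwanted_match)
print(wanted_match.group(0), wanted_match.group(1))
```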
Cheers!
You can use this regex:
'|^(\w+://)?([\w-]+\.)+\w+/\d{5}|'
Online Demo: http://regex101.com/

Adding "/index.html" to paths in Vim

I'm trying to append "/index.html" to some folder paths in a list like this:
path/one/
/another/index.html
other/file/index.html
path/number/two
this/is/the/third/path/
path/five
sixth/path/goes/here/
Obviously the text only needs to be added where it does not exist yet. I could achieve some good results with (vim command):
:%s/^\([^.]*\)$/\1\/index.html/
The only problem is that after running this command, some lines like the 1st, 5th and 7th in the previous example end up with duplicated slashes. That's easy to solve too, all I have to do is search for duplicates and replace them with a single slash.
But the question is:
Isn't there a better way to achieve the correct result at once?
I'm a Vim beginner, and not a regex master either. Any tips are really appreciated!
Thanks!
So very close :)
Just add an optional slash to the end of the regex:
\/\?
Then you need to change the rest of the pattern to a non-greedy match so that it ignores a trailing slash. The syntax for a non-greedy match in vim (replacing the *) is:
\{-}
So we end up with:
:%s/^\([^\.]\{-}\)\/\?$/\1\/index.html/
(Doesn't hurt to be safe and escape the period.)
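The logic of that substitution can be checked outside Vim with a rough Python analogue. This is only a sketch: the syntax differs (Vim's \{-} corresponds to *? in Python), and the function name add_index is made up.

```python
import re

def add_index(line: str) -> str:
    """Append /index.html to a dot-free path, absorbing one trailing slash."""
    # ([^.]*?) is the non-greedy body, /? the optional trailing slash.
    return re.sub(r"^([^.]*?)/?$", r"\1/index.html", line)

lines = [
    "path/one/",
    "/another/index.html",
    "other/file/index.html",
    "path/number/two",
    "this/is/the/third/path/",
    "path/five",
    "sixth/path/goes/here/",
]

result = [add_index(line) for line in lines]
print("\n".join(result))
```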
Vim's regex supports matching a bit of text foo only if it does or doesn't precede or follow some other text bar, without matching bar itself, and this is exactly the sort of thing you're looking for. Here you want to match the end of line with an optional /, but only if the line doesn't already end in index.html, and then replace it with /index.html. A quick look at Vim's help tells me \@<! is exactly what to use. It tells Vim that the preceding atom must not match just before this position, and it is not included in what's matched. With a little experimentation, I get
:%s;/\?\(index\.html\)\@<!$;/index.html;
I use ; to delimit the parts of the :s command so that I don't have to escape any / in the regex or replacement expression. In this particular situation, it's not a big deal though.
The / is optional, and we say so with \?.
We need to group index.html together because otherwise our special \@<! would only affect the l.
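For comparison, the negative lookbehind approach can be approximated in Python, where the construct is written (?<!...). This is a sketch, not a translation of Vim's engine; the function name add_index is made up. count=1 matters here because otherwise Python would also replace the empty match left at the end of the line after a trailing slash has been replaced.

```python
import re

def add_index(line: str) -> str:
    """Replace an optional trailing slash with /index.html unless the line already ends in index.html."""
    return re.sub(r"/?(?<!index\.html)$", "/index.html", line, count=1)

lines = [
    "path/one/",
    "/another/index.html",
    "other/file/index.html",
    "path/number/two",
    "this/is/the/third/path/",
    "path/five",
    "sixth/path/goes/here/",
]

result = [add_index(line) for line in lines]
print("\n".join(result))
```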