Regex to match any .config file with a few exceptions - regex

I'm trying to get a regex working to use in an .hgignore file that will ignore various copies of .config files made during debugging.
The regex should match any path ending in .config as long as the path does not start with _config, config, or packages and as long as the file name (the characters immediately following the last slash) is not app, web, packages, or repositories (or web.release, web.debug).
The closest I seem to get is
^(?!(_config|[Cc]onfig|packages)).*\/(?!([Aa]pp|[Ww]eb|packages|repositories)\.).*config$
This correctly leaves Data/app.config unmatched, and seems to work with all other cases, but it incorrectly matches Libraries/Data/app.config. When I check this at http://regex101.com/, it shows me that the .*\/ group only matches through Libraries/, not Libraries/Data/ as I expected.
I tried changing it to
^(?!(_config|[Cc]onfig|packages))(.*\/)*(?!([Aa]pp|[Ww]eb|packages|repositories)\.).*config$
But then the group (.*\/)* seems to match the whole path for any .config file.
If I change the last negative lookahead to a matching group like so
^(?!(_config|[Cc]onfig|packages))(.*\/)(([Aa]pp|[Ww]eb|packages|repositories)\.).*config$
Then the (.*\/) matches Libraries/Data/, which is what I want and expected, but it appears the negative lookahead changes the matching behavior of (.*\/).
I'm not sure where to go from here? The conditions I'm trying to match or not match don't seem that complicated, but I'm not the most experienced with regexes. Maybe there is a simpler way to achieve the same thing in .hgignore?
These are examples of paths that should match and be ignored:
Web/smtp.config
Libraries/Data/connectionStrings.config
These are examples of paths that should NOT match and should not be ignored:
_config/staging/smtp.config
Web/web.config
Web/web.release.config
Web/Views/web.config
Libraries/Data/app.config
Libraries/Data/packages.config
Data/app.config
packages/MiniProfiler.EF6.3.0.11/lib/net40/MiniProfiler.EntityFramework6.dll.config
packages/repositories.config

You were really close. Try this regex on regex101:
^(?!_?config|packages).*\/(?!(app|web|packages|repositories)\.)[^\/]*config$
I simplified it a little, but the main change was to specify no slashes in the match before the "config".
Note: I used a case-insensitive flag to simplify the regex itself.
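If you want to check this outside regex101, here is a minimal Python sketch that runs the regex above against the paths from the question. The helper name is_ignored is made up for illustration, and re.search with re.IGNORECASE only approximates how Mercurial applies an .hgignore pattern, so treat it as a sanity check rather than a guarantee.

```python
import re

# Regex from the answer above; IGNORECASE stands in for the case-insensitive flag.
PATTERN = re.compile(
    r"^(?!_?config|packages).*\/(?!(app|web|packages|repositories)\.)[^\/]*config$",
    re.IGNORECASE,
)

# Paths copied from the question.
should_match = [
    "Web/smtp.config",
    "Libraries/Data/connectionStrings.config",
]
should_not_match = [
    "_config/staging/smtp.config",
    "Web/web.config",
    "Web/web.release.config",
    "Web/Views/web.config",
    "Libraries/Data/app.config",
    "Libraries/Data/packages.config",
    "Data/app.config",
    "packages/MiniProfiler.EF6.3.0.11/lib/net40/MiniProfiler.EntityFramework6.dll.config",
    "packages/repositories.config",
]


def is_ignored(path):
    """Hypothetical helper: True if the pattern matches the path."""
    return PATTERN.search(path) is not None


for path in should_match + should_not_match:
    print(f"{path}: {'ignored' if is_ignored(path) else 'kept'}")
```

The [^\/]* part is what stops the match from sliding back to an earlier slash: once the lookahead sits right after the last slash, nothing after it may contain another slash.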

Related

vscode Find and replace across files: backreference is repeated if more than one match is on the same line

I am searching across files in vscode using the following regex expression
(?<=[a-zA-Z])_([a-zA-Z]) with $1
to replace Some_Random_Text with SomeRandomText
but VS Code repeats the backreference value when more than one match is on the same line (in my case the S is repeated).
That sure looks like a bug to me. It does not happen in the Find in the current file widget. You should file an issue on this - probably due to the Search across files handling the lookbehind (which should be no problem since it is fixed-width).
In the meantime, you can easily remove the lookbehind part and use:
find: ([a-zA-Z])_([a-zA-Z])
replace: $1$2
which works as it should.
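To see what the two-group version does, here is a small Python sketch. Python's re module is not the engine VS Code uses, so this only illustrates the pattern itself, and strip_underscores is a name invented for the example.

```python
import re


def strip_underscores(text):
    """Hypothetical helper: drop "_" between two letters, keeping both letters."""
    return re.sub(r"([a-zA-Z])_([a-zA-Z])", r"\1\2", text)


print(strip_underscores("Some_Random_Text"))
print(strip_underscores("a_b_c"))
```

One caveat with this form: the letter before the underscore is consumed by the match, so in a string like a_b_c a single pass only removes the first underscore. Running the replace a second time cleans up what is left.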

How to exclude list of folders from Mercurial/TortoiseHG's .hgignore file?

Ok. I need to ignore a list of files from the version control, except for files in three certain folders (let's call them Folder1, Folder2 and Folder3). I can list all folders I need to ignore as a plain list, but I consider this as not an elegant way, so I wrote the following regex:
.*/(Bin|bin)/(?!Folder1/|Folder2/|Folder3/).*
My thoughts were as follows, from left to right:
.* - Any number of any characters.
/ - Slash symbol, which separates folders from one another.
(Bin|bin) - Folder with "Bin" or "bin" name.
/ - Slash symbol, which separates folders from one another.
(?!Folder1/|Folder2/|Folder3/) - Folder name is not "Folder1/" and is not "Folder2/" and is not "Folder3/". This part was the most complicated, but I googled it somehow. I don't understand why it should work, but it works during the tests.
.* - Any number of any characters.
This expression works perfectly when I test it at regex101.com with a couple of text strings, representing paths to files, but nothing works when I put it in my .hgignore file, as follows:
syntax: regexp
.*/(Bin|bin)/(?!Folder1/|Folder2/|Folder3/).*
For some reason it ignores all files and sub-folders in all "Bin" and "bin" folders. How can I accomplish my task?
P.S. As far as I know, Mercurial/TortoiseHG uses Python/Perl regular expressions.
Many thanks in advance.
To adjust the question a bit to make it clearer (at least to me), we have any number of /bin/somename/... and .../bin/anothername/... names that should be ignored, along with three sets of names, .../bin/folder1/..., .../bin/2folder/..., and .../Bin/third/..., that should not be ignored.
Hence, we want a regular expression that (without anchoring) will match the names-to-be-ignored but not the ones-to-be-kept. (Furthermore, glob matching won't work, since it's not as powerful: we'll either match too little or too much, and Mercurial lacks the "override with later un-ignore" feature of Git.)
The shortest regular expression for this should be:
/[Bb]in/(?!(folder1|2folder|third)/)
(The part of this regex that actually matches a string like /bin/somename/... is only the /bin/ part, but Mercurial does not look at what matched, only whether something matched.)
The thing is, your example regular expression should work, it's just a longer variant of this same thing with not-required but harmless (except for performance) .* added at the front and back. So if yours isn't working, the above probably won't work either. A sample repository, with some dummy files, that one could clone and experiment with, would help diagnose the issue.
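As a rough check of the regular expression above, here is a Python sketch using an unanchored re.search. The sample paths and the is_ignored helper are made up for illustration, and this only tests whole file paths; it does not reproduce everything Mercurial does when it walks a working directory.

```python
import re

# Unanchored pattern from above: /bin/ or /Bin/ not followed by a kept folder.
PATTERN = re.compile(r"/[Bb]in/(?!(folder1|2folder|third)/)")


def is_ignored(path):
    """Hypothetical helper: True if the pattern is found anywhere in the path."""
    return PATTERN.search(path) is not None


# Hypothetical file paths, not taken from a real repository.
samples = [
    "proj/bin/somename/a.dll",
    "proj/Bin/anothername/b.pdb",
    "proj/bin/folder1/keep.txt",
    "proj/bin/2folder/keep.txt",
    "proj/Bin/third/keep.txt",
    "proj/src/main.c",
]

for path in samples:
    print(f"{path}: {'ignored' if is_ignored(path) else 'kept'}")
```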
Original (wrong) answer (to something that's not the question)
The shortest regular expression for the desired case is:
/[Bb]in/Folder[123]/
However, if the directory / folder names do not actually meet this kind of pattern, we need:
/[Bb]in/(somedir|another|third)/
Explanation
First, a side note: the default syntax is regexp, so the initial syntax: regexp line is unnecessary. As a result, it's possible that your .hgignore file is not in proper UTF-8 format: see Mercurial gives "invalid pattern" error for simple GLOB syntax. (But that would produce different behavior, so that's probably not the problem. It's just worth mentioning in any answer about .hgignore files malfunctioning.)
Next, it's worth noting a few items:
Mercurial tracks only files, not directories / folders. So the real question is whether any given file name matches the pattern(s) listed in .hgignore. If they do match, and the file is currently untracked, the file will not be automatically added with a sweeping "add everything" operation, and Mercurial will not gripe that the file is untracked.
If some file is already tracked, the fact that its name matches an ignore pattern is irrelevant. If the file a/b/c.ext is not tracked and does match a pattern, hg add a/b/c.ext will add it anyway, while hg add a/b will en-masse add everything in a/b but won't add c.ext because it matches the pattern. So it's important to know whether the file is already tracked, and consider what you explicitly list to hg add. See also How to check which files are being ignored because of .hgignore?, for instance.
Glob patterns are much easier to write correctly than regular expressions. Unless you're doing this for learning or teaching purposes, or glob is just not powerful enough, stick with the glob patterns. (In very old versions of Mercurial, glob matching was noticeably slower than regexp matching, but that's been fixed for a long time.)
Mercurial's regexp ignore entries are not automatically anchored: if you want anchored behavior, use ^ at the front, and $ at the end, as desired. Here, you don't want anchored behavior, so you can eliminate the leading and trailing .*. (Mercurial refers to this as rooted rather than anchored, and it's important to note that some patterns are anchored, but .hgignore ones are not.)
Python/Perl regexp (?!...) syntax is the negation syntax: (?!...) matches if the parenthesized expression doesn't match the string. This is part of the problem.
We need not worry about capturing groups (see capturing group in regex) as Mercurial does nothing with the groups that come out of the regular expression. It only cares if we match.
Path names are really slash-separated components. The leading components are the various directories (folders) above the file name, and the final component is the file name. (That is, try not to think of the first parts as folders: it's not that it's wrong, it's that it's less general than "components", since the last part is also a component.)
What we want, in this case, is to match, and therefore "ignore", names that have one component that matches either bin or Bin followed immediately by another component that matches Folder1, Folder2, or Folder3 that is followed by a component-separator (so that we haven't stopped at /bin/Folder1, for instance, which is a file named Folder1 in directory /bin).
The strings bin and Bin both end with a common trailing part of in, so this is recognizable as (B|b)in, but single-character alternation is more easily expressed as a character class: [Bb], which eliminates the need for parentheses and vertical-bars.
The same holds for the names Folder1, Folder2, and Folder3, except that their common string leads rather than trails, so we can use Folder[123].
Suppose we had anchored matches. That is, suppose Mercurial demanded that we match the whole file name, which might be, say, /foo/hello/bin/Folder2/bar/world.ext. Then we'd need .*/[Bb]in/Folder[123]/.*, because we'd need to match any number of characters to skip over /foo/hello before matching /bin/Folder2/, and again skip over any number of characters to match bar/world.ext, in order to match the whole string. But since we don't have anchored matches, we'll find the pattern /bin/Folder2/ within the whole string, and hence ignore this file, using the simpler pattern without the leading and trailing .*.
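The anchored versus unanchored difference can be sketched in Python, using re.fullmatch as a stand-in for "must match the whole name" and re.search for the unanchored behavior. This is only an illustration of the regex semantics, not of Mercurial's internals.

```python
import re

name = "/foo/hello/bin/Folder2/bar/world.ext"

short = r"/[Bb]in/Folder[123]/"        # pattern without leading/trailing .*
padded = r".*/[Bb]in/Folder[123]/.*"   # pattern padded to cover the whole name

# Unanchored: the short pattern is found somewhere inside the name.
unanchored_short = re.search(short, name) is not None

# Anchored: the short pattern cannot cover the whole name, the padded one can.
anchored_short = re.fullmatch(short, name) is not None
anchored_padded = re.fullmatch(padded, name) is not None

print(unanchored_short, anchored_short, anchored_padded)
```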

Match anything.c but not anything.in.c

I'm trying to write a regex that matches a.c, hello.c, etc.c, but not a.in.c, hello.in.c, etc.in.c.
Here's my regex: https://regex101.com/r/jC8nB0/21
It says that the negative lookahead won't match what I specified, that is, .in.c. I didn't know where to teach it to match .c. I tried both inside the parenthesis and outside.
I've read Regex: match everything but specific pattern but it talks about matching everything except something, but I need to match a rule except other rule.
This worked for me.
.*(?<!(\.in))\.c
https://www.regular-expressions.info/lookaround.html
*Edited due to good information from zzxyz
This is actually a bit complicated given unknown input. The following isn't perfect, but it avoids .cpp files, and deals with strings that don't contain filenames, or longer strings that do.
\b\S+(?<!\.in)\.c\b
https://regex101.com/r/jC8nB0/286
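For anyone who wants to try the second pattern locally, here is a short Python sketch. The names matches_c_file and the sample sentence are made up for illustration.

```python
import re

# Second pattern from above: a run of non-space characters ending in ".c",
# where the ".c" is not immediately preceded by ".in".
PATTERN = re.compile(r"\b\S+(?<!\.in)\.c\b")


def matches_c_file(text):
    """Hypothetical helper: True if the text contains a matching file name."""
    return PATTERN.search(text) is not None


for name in ["a.c", "hello.c", "etc.c", "a.in.c", "hello.in.c", "etc.in.c", "main.cpp"]:
    print(f"{name}: {matches_c_file(name)}")

# In a longer string, findall picks out only the plain .c names.
print(PATTERN.findall("see foo.c and bar.in.c"))
```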

.hgignore a folder except some subfolders

I want to ignore a folder but preserve some of its folders.
I tried regexp matching like this:
syntax: regexp
^site/customer/\b(?!.*/data/.*).*
Unfortunately this doesn't work.
I read in this answer that python only does fixed-width negative lookups.
Is my desired ignoring impossible?
Python regex is cool
Python does support arbitrary-length negative lookaheads such as (?!.*foo). But it doesn't support arbitrary-length lookbehinds such as (?<!foo.*). A lookbehind needs to be fixed-width, for example (?<!foo..).
Which means it's definitely possible to solve your problem.
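You can confirm the lookahead/lookbehind difference with Python's standard re module. This is a minimal sketch; the compiles helper is a name invented for the example.

```python
import re


def compiles(pattern):
    """Hypothetical helper: True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(compiles(r"(?!.*foo)bar"))    # variable-length negative lookahead
print(compiles(r"(?<!foo..)bar"))   # fixed-width negative lookbehind
print(compiles(r"(?<!foo.*)bar"))   # variable-length negative lookbehind
```

The first two compile; the last one is rejected by the re module because the lookbehind is not fixed-width.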
The problem
You've got the following regex: /customer/(?!.*/data/.*).*.
Let's take an input example /customer/data/name. It matches for a reason.
/customer/data/name
^^^^^^^^^^ -> /customer/ match !
          ^ (?!.*/data/.*) Let's check if there is no /data/ ahead
            The problem is here, we've already matched "/"
            so the regex only finds "data/name" instead of "/data/name"
          ^^^^^^^^^ .* match !
Fixing your regex
Basically we just need to remove that one forward slash. We also add an anchor ^ to make sure the match starts at the beginning of the string, and use \b to make sure we match exactly customer: ^/customer\b(?!.*/data/).*.
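Here is a small Python sketch comparing the two regexes on the example input. The second sample path, /customer/other/name, is made up to show a path that should still be matched.

```python
import re

original = re.compile(r"/customer/(?!.*/data/.*).*")
fixed = re.compile(r"^/customer\b(?!.*/data/).*")

keep = "/customer/data/name"     # should NOT be matched (we want to keep it)
ignore = "/customer/other/name"  # hypothetical path that should be matched

# The original consumes the slash before "data", so the lookahead misses it.
print(original.search(keep) is not None)

# The fixed version leaves that slash for the lookahead to see.
print(fixed.search(keep) is not None)
print(fixed.search(ignore) is not None)
```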

Mercurial/.hgignore - How do I ignore everything but the contents of a folder?

I have a NetBeans project and the Mercurial repository is in the project root. I would like it to ignore everything except the contents of the "src" and "test" folders, and .hgignore itself.
I'm not familiar with regular expressions and can't come up with one that will do that.
The ones I tried:
(?!src/.*)
(?!test/.*)
(?!^.hgignore)
(?!src/.|test/.|.hgignore)
These seem to ignore everything, I can't figure out why.
Any advice would be great.
This seems to work:
syntax: regexp
^(?!src|test|\.hgignore)
This is basically your last attempt, but:
It's rooted at the beginning of the string with ^, and
It doesn't require a trailing slash for the directory names.
The second point is important since, as the manual says:
For example, say we have an untracked file, file.c, at a/b/file.c inside our repository. Mercurial will ignore file.c if any pattern in .hgignore matches a/b/file.c, a/b or a.
So your pattern must not match src.
^(?!src\b|test\b|\.hgignore$).*$
should work. It matches any string that does not start with the word src or test, or consists entirely of .hgignore.
It uses a word boundary anchor \b so that names like testtube.txt or srcontrol.txt aren't mistaken for test or src (and so are still ignored). However, it will "misfire" on files like src.txt, where there is a word boundary other than before the slash.
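The behavior described above, including the src.txt misfire, can be checked with a short Python sketch. The sample paths and the is_ignored helper are made up for illustration, and re.search is only an approximation of how Mercurial applies the pattern.

```python
import re

PATTERN = re.compile(r"^(?!src\b|test\b|\.hgignore$).*$")


def is_ignored(path):
    """Hypothetical helper: True if the ignore pattern matches the path."""
    return PATTERN.search(path) is not None


# Hypothetical paths for a NetBeans-style project.
samples = [
    "src",
    "src/Main.java",
    "test/MainTest.java",
    ".hgignore",
    "build/classes/Main.class",
    "nbproject/project.xml",
    "testtube.txt",
    "src.txt",
]

for path in samples:
    print(f"{path}: {'ignored' if is_ignored(path) else 'kept'}")
```

Note that src.txt comes out as kept: the \b after src is satisfied by the dot, which is the misfire mentioned above.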