Need regular expression that avoids substring

I would like a regular expression that matches an image file in a string (a URL) but excludes a specific domain or directory.
For example:
"myImages/small/myImage.png"
"myImages/xxxx/myImage.png"
"myImages/large/myImage.png"
I would like a regexp that matches all of these except the 'large' one.
Many thanks in advance!

You want a negative lookahead assertion:
myImages\/(?!large\/).+\.(?:png|jpg|gif|jpeg|svg)$
The above will match any path that ends with one of those file extensions, but only if the text "large/" does not immediately follow "myImages/".
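To check the lookahead quickly, you can run it against the three example paths from the question. Here is a minimal sketch using Python's `re` module. Python does not need the `/` to be escaped, so the slashes appear unescaped:

```python
import re

# Negative lookahead: after "myImages/", refuse to continue if the next text is "large/".
IMAGE_RE = re.compile(r"myImages/(?!large/).+\.(?:png|jpg|gif|jpeg|svg)$")

paths = [
    "myImages/small/myImage.png",
    "myImages/xxxx/myImage.png",
    "myImages/large/myImage.png",
]

# re.search scans the whole string, so "myImages/" may appear after a domain as well.
matching = [p for p in paths if IMAGE_RE.search(p)]
print(matching)
```

Only the `small` and `xxxx` paths survive. The lookahead consumes no characters, so `.+` still starts right after `myImages/`.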
It's not very clear what your needs are, what output you want and what you can and cannot anchor against. If you edit your question to be more clear, you can get more-targeted information.

Related

Regex: Find value between two sections

I'm trying to solve the following problem:
I want to get the address data between the values "Kunde: XXXXX" and "Artikel:".
I want to keep the newlines so that I can use it exactly as shown.
Can you give me a hint on how to write the right regex?
Many thanks in advance.
(?s)(?<=Kunde:).*?(?=Artikel:)
Based on the image you sent, this is the correct way of doing it.
(?s) may not work in your regex flavor. It turns on "dot-all" mode, in which . also matches newline characters; how to turn on DOTALL differs between languages.
I used a lookbehind (?<=) and a lookahead (?=) so that "Kunde:" and "Artikel:" are not included in the match.
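Here is a small Python sketch of the same pattern. The sample text is hypothetical, because the real input was only shown as an image. In Python, the `re.DOTALL` flag has the same effect as an inline `(?s)`:

```python
import re

# Hypothetical sample text; the real input in the question was only shown as an image.
text = """Kunde: 12345
Max Mustermann
Musterstrasse 1
12345 Musterstadt
Artikel: 987"""

# re.DOTALL lets "." cross newlines; the lookbehind and lookahead
# keep "Kunde:" and "Artikel:" out of the match itself.
ADDRESS_RE = re.compile(r"(?<=Kunde:).*?(?=Artikel:)", re.DOTALL)

m = ADDRESS_RE.search(text)
address = m.group(0) if m else None
print(repr(address))
```

The match starts immediately after "Kunde:", so the customer number on that line is included too. If you only want the address lines, you can drop the first line of the result afterwards.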

RegEx to match and select specific URLs

I'm working with a website that has these URLs:
https://flyheight.com/videos/ybb347
https://flyheight.com/videos/yb24os
https://flyheight.com/public/images/videos/793f77362f321e62c32659c3ab00952d.png
https://flyheight.com/videos/5o6t98/#disqus_thread
I need a regex that selects only these URLs:
https://flyheight.com/videos/yb24os
https://flyheight.com/videos/ybb347
This is what I have so far: ^(?!images$).*(flyheight.com/videos/).*
PCRE: ^https?:\/\/flyheight\.com\/videos\/[a-z0-9]{6}$
https://regex101.com/r/vM31MK/1
Maybe this will also work in your language (without the escaped slashes):
^https?://flyheight\.com/videos/[a-z0-9]{6}$
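Here is a quick check of the anchored pattern against the four URLs from the question, written as a minimal Python sketch. It assumes, as in the sample, that every video ID is exactly six lowercase letters or digits:

```python
import re

# Anchored pattern from the answer above; "/" needs no escaping in a Python pattern.
VIDEO_RE = re.compile(r"^https?://flyheight\.com/videos/[a-z0-9]{6}$")

urls = [
    "https://flyheight.com/videos/ybb347",
    "https://flyheight.com/videos/yb24os",
    "https://flyheight.com/public/images/videos/793f77362f321e62c32659c3ab00952d.png",
    "https://flyheight.com/videos/5o6t98/#disqus_thread",
]

selected = [u for u in urls if VIDEO_RE.search(u)]
print(selected)
```

The image URL fails because "/public/images" sits between the domain and "/videos/". The Disqus URL fails because the `$` anchor does not allow anything after the six-character ID.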
I'm not too sure if this is what you were looking for, but you could use the following:
^(?!images$).*(flyheight.com/videos/)([^/]+)$
The idea is that it matches the first part that you had, then one or more characters that are not a slash ([^/]+).
If some of your URLs may or may not end with a / (for example, https://flyheight.com/videos/yb24os or https://flyheight.com/videos/yb24os/), you can try the following:
^(?!images$).*(flyheight.com/videos/)([^/]+)/?$
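Here is a minimal Python sketch of the optional-slash variant that prints the captured ID for each URL. Two side notes on this pattern. First, `(?!images$)` only rejects a string that is exactly "images", so it has no effect on these URLs. Second, the end anchor `/?$` is what rejects the Disqus URL.

```python
import re

# Variant from this answer: the ID is "one or more non-slash characters",
# optionally followed by a single trailing slash.
VIDEO_RE = re.compile(r"^(?!images$).*(flyheight.com/videos/)([^/]+)/?$")

urls = [
    "https://flyheight.com/videos/yb24os",
    "https://flyheight.com/videos/yb24os/",
    "https://flyheight.com/public/images/videos/793f77362f321e62c32659c3ab00952d.png",
    "https://flyheight.com/videos/5o6t98/#disqus_thread",
]

ids = {}
for url in urls:
    m = VIDEO_RE.search(url)
    # group(2) is the video ID captured by ([^/]+); None when the URL is rejected
    ids[url] = m.group(2) if m else None
print(ids)
```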
Here are my results on RegExr.
This simple expression might work, since all of your desired outputs start with a y:
\/(y.*)
However, if you wish to add more boundaries, you can. For instance, this would strengthen the left boundary:
flyheight.com\/videos\/(y.*)
Or you could use a character class, similar to this:
flyheight.com\/videos\/([a-z0-9]+)
You can also add a quantifier to the desired output, similar to this expression:
flyheight.com\/videos\/([a-z0-9]{6})
You can keep adding boundaries as needed so that the desired URLs are captured and the others fail.
You might want to test and modify the expression for your desired regex engine in an online regex tester:
^(.*)(flyheight.com\/videos\/)([a-z0-9]{6})$
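Here is a small Python sketch that runs each of the progressively stricter patterns above against the four sample URLs with an unanchored `re.search`. It shows that the two middle patterns still accept the Disqus URL. Only the version with `^...$` rejects it. `\/(y.*)` selects the right URLs here only because both desired IDs happen to start with "y".

```python
import re

urls = [
    "https://flyheight.com/videos/ybb347",
    "https://flyheight.com/videos/yb24os",
    "https://flyheight.com/public/images/videos/793f77362f321e62c32659c3ab00952d.png",
    "https://flyheight.com/videos/5o6t98/#disqus_thread",
]

# Progressively stricter patterns from the answer above (Python accepts "\/" as a literal "/").
patterns = {
    "slash-y": r"\/(y.*)",
    "class": r"flyheight.com\/videos\/([a-z0-9]+)",
    "six chars": r"flyheight.com\/videos\/([a-z0-9]{6})",
    "anchored": r"^(.*)(flyheight.com\/videos\/)([a-z0-9]{6})$",
}

results = {}
for name, pattern in patterns.items():
    # re.search finds a match anywhere unless the pattern itself is anchored.
    results[name] = [u for u in urls if re.search(pattern, u)]
    print(name, len(results[name]))
```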

Regex Expression to Match URL and Exclude Other

I'm trying to write a regex that matches anything of the form (.*)/feed/, with the exception of (.*)/author/feed/.
Currently, I have (.*)/feed/(.*), which works well to identify any URL containing /feed/ to redirect. However, I want to exclude those that have /author/(.*)/feed/.
For example, match http://www.site.com/ANYTHING/feed/ but exclude site.com/author/ANYTHING/feed/.
I should clarify that I'm not terribly familiar with regular expressions, but this is for use within the Redirection plugin for WordPress, which advertises "Full regular expression support."
Any help would be greatly appreciated. Thank you in advance.
Depending on the language, you may be able to use a negative look-behind assertion:
(.*)(?<!/author)/feed
The assertion (?<!/author) ensures that /author does not immediately precede /feed, without including /author in the match.
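The lookbehind only checks the text directly in front of "/feed". This means it rejects site.com/author/feed/ but still accepts the question's site.com/author/ANYTHING/feed/. The Python sketch below compares it with an alternative negative lookahead, `^(?!.*/author/).*/feed/`, which is not part of the original answer. That alternative rejects any path containing "/author/". Whether the Redirection plugin's regex engine accepts both kinds of lookaround is an assumption you would need to verify.

```python
import re

# Lookbehind from the answer: "/feed" must not be immediately preceded by "/author".
LOOKBEHIND_RE = re.compile(r"(.*)(?<!/author)/feed")
# Alternative (not from the answer): reject any path that contains "/author/" anywhere.
LOOKAHEAD_RE = re.compile(r"^(?!.*/author/).*/feed/")

samples = [
    "http://www.site.com/ANYTHING/feed/",
    "site.com/author/feed/",
    "site.com/author/ANYTHING/feed/",
]

results = {}
for s in samples:
    results[s] = (bool(LOOKBEHIND_RE.search(s)), bool(LOOKAHEAD_RE.search(s)))
    print(s, results[s])
```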

Extract text between two given strings

Hopefully someone can help me out. I've been all over Google.
I'm doing zone OCR on some documents and want to extract some text with a regex. It always looks like this:
"Til: Name Name Name org.nr 12323123".
I want to extract the name part. It can be 1-4 names, but "Til:" always comes before it and "org.nr" always comes after it.
Anyone?
If you can't use capturing groups (check your documentation), you can try this:
(?<=Til:).*?(?=org\.nr)
This solution uses lookbehind and lookahead assertions, which are not supported by every regex flavour. If they are supported, this regex returns only the part you want. The text inside the assertions is not consumed; the engine only checks that those patterns are present.
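Here is a minimal Python sketch of the lookaround version applied to the sample line from the question. The match still contains the spaces around the names, so it is stripped afterwards:

```python
import re

line = "Til: Name Name Name org.nr 12323123"

# Lookbehind/lookahead: only the text between "Til:" and "org.nr" is matched.
NAME_RE = re.compile(r"(?<=Til:).*?(?=org\.nr)")

m = NAME_RE.search(line)
names = m.group(0).strip() if m else None  # remove the surrounding spaces
print(names)
```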
Use the pattern:
Til:(.*)org\.nr
Then take the first capturing group (index 1 in most APIs; index 0 is the whole match) to get the content between the parentheses.
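In Python, the whole match is `group(0)` and the text inside the parentheses is `group(1)`, as this minimal sketch shows. Because `(.*)` is greedy, a line with several "org.nr" occurrences would capture up to the last one. `(.*?)` would stop at the first.

```python
import re

line = "Til: Name Name Name org.nr 12323123"

# Capturing group: the whole "Til: ... org.nr" span is matched,
# and group(1) holds only the part inside the parentheses.
m = re.search(r"Til:(.*)org\.nr", line)
whole = m.group(0) if m else None
names = m.group(1).strip() if m else None
print(whole)
print(names)
```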

Regex for all files except .hg_keep

I use empty .hg_keep files to keep some (otherwise empty) folders in Mercurial.
The problem is that I can't find a working regex that ignores everything except the .hg_keep files.
Let's say we have this file structure:
a/b/c2/.hg_keep
a/b/c/d/.hg_keep
a/b/c/d/file1
a/b/c/d2/.hg_keep
a/b/.hg_keep
a/b/file2
a/b/file1
a/.hg_keep
a/file2
a/file1
I want to keep only the .hg_keep files under a/b/.
With the help of http://gskinner.com/RegExr/, I created the following .hgignore:
syntax: regexp
.*b.*/(?!.*\.hg_keep)
But Mercurial ignores all .hg_keep files in subfolders of b:
# hg status
? .hgignore
? a/.hg_keep
? a/b/.hg_keep
? a/file1
? a/file
# hg status -i
I a/b/c/d/.hg_keep
I a/b/c/d/file1
I a/b/c/d2/.hg_keep
I a/b/c2/.hg_keep
I a/b/file1
I a/b/file2
I know that I can hg add all the .hg_keep files, but is there a solution with a regular expression (or glob)?
Regexp negation might work for this. If you want to ignore everything except the a/b/.hg_keep file, you can probably use:
^(?!a/b/\.hg_keep$).*$
The parts of this regexp that matter are:
^ anchor the match to the beginning of the file path
(?! ... ) negative lookahead: the match fails if the expression between '!' and ')' matches at this point
a/b/\.hg_keep$ the full path of the file you want to keep; the $ inside the lookahead makes sure only that exact path is excluded
.*$ consume the rest of the file path, up to its end
The regular expression
^a/b/\.hg_keep$
would match only the file called a/b/.hg_keep.
Its negation
^(?!a/b/\.hg_keep$).*$
will match everything else.
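A lookahead does not consume characters, so `^(?!a/b/\.hg_keep)$` on its own can only ever match an empty string. It needs something like `.*` after it, as in `^(?!a/b/\.hg_keep$).*$`. The Python sketch below compares both forms on the question's file list, treating "pattern matches" as "ignored". It only exercises the regex itself and does not reproduce how Mercurial applies .hgignore patterns.

```python
import re

paths = [
    "a/b/c2/.hg_keep", "a/b/c/d/.hg_keep", "a/b/c/d/file1", "a/b/c/d2/.hg_keep",
    "a/b/.hg_keep", "a/b/file2", "a/b/file1", "a/.hg_keep", "a/file2", "a/file1",
]

# Lookahead followed directly by "$": nothing is consumed, so only "" can match.
AS_WRITTEN = re.compile(r"^(?!a/b/\.hg_keep)$")
# Lookahead followed by ".*$": every path except exactly a/b/.hg_keep matches.
NEGATED = re.compile(r"^(?!a/b/\.hg_keep$).*$")

as_written = [p for p in paths if AS_WRITTEN.search(p)]
not_ignored = [p for p in paths if not NEGATED.search(p)]
print(as_written)
print(not_ignored)
```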
I'm not quite sure in what context you are using the regex, but this should do it. It matches all lines ending in .hg_keep:
^.*\.hg_keep$
EDIT: And here is a regex that matches lines not matching the above expression:
^(?:(?!.*\.hg_keep).)*$
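Here is a small Python sketch showing how the two expressions from this answer split a few of the question's paths. The second one is a "tempered" pattern: each character is consumed only if ".hg_keep" does not occur anywhere from that position on. It therefore rejects any path that contains ".hg_keep", not just paths ending in it.

```python
import re

paths = ["a/b/.hg_keep", "a/b/c/d/.hg_keep", "a/b/file1", "a/file2"]

KEEP_RE = re.compile(r"^.*\.hg_keep$")
# Tempered token: consume a character only while ".hg_keep" does not lie ahead.
NOT_KEEP_RE = re.compile(r"^(?:(?!.*\.hg_keep).)*$")

keep = [p for p in paths if KEEP_RE.match(p)]
others = [p for p in paths if NOT_KEEP_RE.match(p)]
print(keep)
print(others)
```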
Try (?!.*/\.hg_keep$).
I was looking for something similar to this.
I found an answer, but it's not what we want to hear:
Limitations
There is no straightforward way to ignore all but a set of files. Attempting to use an inverted regex match will fail when combined with other patterns. This is an intentional limitation, as alternate formats were all considered far too likely to confuse users to be worth the additional flexibility.
Ref: https://www.mercurial-scm.org/wiki/.hgignore