Mongo DB escape front slash - regex

If the collection (Area) has a document like the one below
{
'path': '/city/area/street/house'
}
then how do we run a LIKE-style query here (that is, how do I escape the forward slash)?
db.getCollection('Area').find({ "path":/.city/area/street/house./})
This does not work.

/.city\/area\/street\/house./
A backslash (\) escapes the / inside a regex literal.

It worked with the $regex operator and a string pattern, as below, where slashes do not need escaping:
db.getCollection('Area').find({"path":{'$regex':'city/area/street/house'}})
This does not work:
db.getCollection('Area').find({ "path":/.city\/area\/street\/house./})
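A minimal Python sketch of the string-pattern approach above. It only builds the filter document and checks it locally with Python's re module; the sample documents and variable names are assumptions for illustration, and no MongoDB connection is made. MongoDB evaluates $regex with a PCRE-style engine, so the local check is an approximation.

```python
import re

# Hypothetical sample documents standing in for the Area collection.
docs = [
    {"path": "/city/area/street/house"},
    {"path": "/town/area/street/house"},
]

# A string pattern needs no slash escaping; re.escape only guards against
# other regex metacharacters that a real path might contain.
fragment = "city/area/street/house"
query = {"path": {"$regex": re.escape(fragment)}}

# Emulate the unanchored $regex match locally.
pattern = re.compile(query["path"]["$regex"])
matches = [d["path"] for d in docs if pattern.search(d["path"])]
print(matches)
```

The same filter dictionary could be handed to a driver's find call; that step is left out here because it needs a running server.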

Related

Django url regex hit end of url: *./something

I am getting 404s on different URLs that end with the same string, and instead of creating multiple redirects I would like to catch them all on that last string. It always appears at the same position; the pattern goes like so:
/some-of-my-urls/the-same-string
No trailing slash there. I tried something like this:
url(r'^[a-zA-Z0-9_]+/the-same-string', redirect_func),
url(r'^./the-same-string', redirect_func),
But that doesn't work. This is probably obvious to somebody with more regex knowledge; I am not very advanced. Any ideas?
You may use a negated character class [^/] to match any char but / and quantify it with a + quantifier that matches 1 or more repetitions:
r'^[^/]+/the-same-string'
See the regex demo.
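A quick way to check the negated character class is to run it against a few paths with Python's re module. The sample paths below are assumptions; note that Django matches URL patterns against the path without its leading slash. The original character class is included for comparison, since it has no hyphen in it.

```python
import re

# Pattern from the answer: one slash-free segment, then the fixed tail.
fixed = re.compile(r'^[^/]+/the-same-string')
# Original attempt, for comparison: the class does not include "-".
original = re.compile(r'^[a-zA-Z0-9_]+/the-same-string')

# Hypothetical request paths, written without the leading slash.
paths = [
    "some-of-my-urls/the-same-string",
    "a/b/the-same-string",
    "the-same-string",
]

fixed_hits = [p for p in paths if fixed.search(p)]
original_hits = [p for p in paths if original.search(p)]
print(fixed_hits)
print(original_hits)
```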

Extract all subfolders of a path in Elasticsearch

I want to extract all direct subfolders in a path field in elasticsearch.
For example I want all subfolders of this path: /path/to/file
These URLs should match
/path/to/file/subfolderA
/path/to/file/subfolder-b
/path/to/file/subfolder_c
These URLs should not match
/path/to/file/subfolderA/folderc
/path/to/file/subfolder-b/folderd/folderE
I tried this regexp query but it's not working. The part with the / is the problem: when I replace the / with a letter, the query works. I tried to escape the / with a \ but that doesn't work either.
POST index_name/_search
{
"query": {
"regexp":{
"path_parent": "(/path/to/file/.*)&~(.*/.*)"
}
}
}
You may use negated character classes:
"path_parent": "/path/to/file/[^/]*"
^^^^^
Since Elasticsearch regexp patterns are anchored by default, this pattern matches all paths that start with /path/to/file/ and then have zero or more characters other than / up to the end of the string.
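The anchored behaviour can be approximated outside Elasticsearch with Python's re.fullmatch. This is only a sketch: Elasticsearch uses Lucene's regex syntax rather than Python's, but this particular pattern uses constructs that both share. The path list is taken from the question.

```python
import re

# Same pattern as in the answer; fullmatch emulates the implicit anchoring.
pattern = re.compile(r"/path/to/file/[^/]*")

paths = [
    "/path/to/file/subfolderA",
    "/path/to/file/subfolder-b",
    "/path/to/file/subfolder_c",
    "/path/to/file/subfolderA/folderc",
    "/path/to/file/subfolder-b/folderd/folderE",
]

# Keep only the direct children of /path/to/file/.
direct = [p for p in paths if pattern.fullmatch(p)]
print(direct)
```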

Regex to get a filename from a url

I am trying to write a regex to get the filename from a url if it exists.
This is what I have so far:
(?:[^/][\d\w\.]+)+$
So from the url http://www.foo.com/bar/baz/filename.jpg, I should match filename.jpg
Unfortunately, I match anything after the last /.
How can I tighten it up so it only grabs it if it looks like a filename?
The examples above fail to get the file name "file-1.name.zip" from this URL:
"http://sub.domain.com/sub/sub/handler?file=data/file-1.name.zip&v=1"
So I created my own regex version:
[^/\\&\?]+\.\w{3,4}(?=([\?&].*$|$))
Explanation:
[^/\\&\?]+ # file name - group of chars without URL delimiters
\.\w{3,4} # file extension - 3 or 4 word chars
(?=([\?&].*$|$)) # positive lookahead to ensure that the file name is at the end of the string or is followed by query string parameters, which need to be ignored
This one works well for me.
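As a sanity check, the regex above can be run against both URLs mentioned in this thread. The sketch below uses Python's re module, which supports every construct in the pattern; the variable names are assumptions.

```python
import re

# Regex from the answer above, unchanged apart from Python raw-string quoting.
FILENAME_RE = re.compile(r'[^/\\&\?]+\.\w{3,4}(?=([\?&].*$|$))')

urls = [
    "http://www.foo.com/bar/baz/filename.jpg",
    "http://sub.domain.com/sub/sub/handler?file=data/file-1.name.zip&v=1",
]

names = []
for url in urls:
    match = FILENAME_RE.search(url)
    # group(0) is the whole match, i.e. the file name with its extension.
    names.append(match.group(0) if match else None)
print(names)
```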
(\w+)(\.\w+)+(?!.*(\w+)(\.\w+)+)
(?:.+\/)(.+)
Select all up to the last forward slash (/), capture everything after this forward slash. Use subpattern $1.
Non-PCRE
(?:[^/][\d\w\.]+)$(?<=\.\w{3,4})
PCRE
(?:[^/][\d\w\.]+)$(?<=(?:\.jpg)|(?:\.pdf)|(?:\.gif)|(?:\.jpeg)|(more_extension))
Since you are testing with regexpal.com, which is based on JavaScript (which doesn't support lookbehind), try this instead:
(?=\w+\.\w{3,4}$).+
I'm using this:
(?<=\/)[^\/\?#]+(?=[^\/]*$)
Explanation:
(?<=): positive lookbehind, asserting that the match is preceded by this expression without including it in the match.
(?<=/): positive lookbehind for the literal forward slash "/", meaning I'm looking for an expression that is preceded by a forward slash but does not include it.
[^/\?#]+: one or more characters that are not "/", "?" or "#", which strips the search params and hash.
(?=[^/]*$): positive lookahead for any run of non-slash characters followed by the end of the line. This ensures that the segment after the last forward slash is selected.
Example usage:
const urlFileNameRegEx = /(?<=\/)[^\/\?#]+(?=[^\/]*$)/;
const testCases = [
"https://developer.mozilla.org/en-US/docs/Web/API/MutationObserverInit#yo",
"https://developer.mozilla.org/static/fonts/locales/ZillaSlab-Regular.subset.bbc33fb47cf6.woff2",
"https://developer.mozilla.org/static/build/styles/locale-en-US.520ecdcaef8c.css?is-nice=true"
];
testCases.forEach(testStr => console.log(`The file of ${testStr} is ${urlFileNameRegEx.exec(testStr)[0]}`))
This might work as well:
(\w+\.)+\w+$
You know what your delimiters look like, so you don't need a regex. Just split the string. Since you didn't mention a language, here's an implementation in Perl:
use strict;
use warnings;
my $url = "http://www.foo.com/bar/baz/filename.jpg";
my @url_parts = split /\//, $url;
my $filename = $url_parts[-1];
if(index($filename,".") > 0 )
{
print "It appears as though we have a filename of $filename.\n";
}
else
{
print "It seems as though the end of the URL ($filename) is not a filename.\n";
}
Of course, if you need to worry about specific filename extensions (png,jpg,html,etc), then adjust appropriately.
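The same split-instead-of-regex idea can be sketched in Python with the standard library. Here urlsplit is swapped in for the plain split on "/", so that the host name, query string and fragment are never mistaken for the last path segment; the function name is an assumption.

```python
import posixpath
from urllib.parse import urlsplit


def filename_from_url(url):
    """Return the last path segment if it looks like a filename, else None."""
    # urlsplit separates the path from the host, query string and fragment.
    last_segment = posixpath.basename(urlsplit(url).path)
    # Same heuristic as the Perl check: a dot that is not the first character.
    return last_segment if last_segment.find(".") > 0 else None


print(filename_from_url("http://www.foo.com/bar/baz/filename.jpg"))
print(filename_from_url("http://www.foo.com/bar/baz/"))
```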
> echo "http://www.foo.com/bar/baz/filename.jpg" | sed 's/.*\/\([^\/]*\..*\)$/\1/g'
filename.jpg
Assuming that you will be using JavaScript:
var fn=window.location.href.match(/([^/])+/g);
fn = fn[fn.length-1]; // get the last element of the array
alert(fn.substring(0,fn.indexOf('.'))); // alerts the filename without its extension
Here is a regex you may use:
\/([\w.][\w.-]*)(?<!\/\.)(?<!\/\.\.)(?:\?.*)?$
The names "." and ".." are not treated as file names.
You can play with this regex here: https://regex101.com/r/QaAK06/1/
In case you are using the JavaScript URL object, you can use the pathname combined with the following RegExp:
.*\/(.[^(\/)]+)
Benefit:
It matches anything at the end of the path, but excludes a possible trailing slash (as long as there aren't two trailing slashes)!
Try this one instead:
(?:[^/]*+)$(?<=\..*)
This worked for me. Whether or not the name contains a '.', it takes the suffix of the URL:
\/(\w+)[\.|\w]+$

How do i map this url

I am using Google App Engine and can't map this URL: "user/test@example.com"
application = webapp.WSGIApplication( [('/user/(\w+)',UsersSubPath)],debug=True)
I don't know why this expression doesn't work. Any ideas?
You'll have to widen the scope of your regex. \w only matches [A-Za-z0-9_], which excludes the special characters @ and . in this URL. For this example you could use:
'/user/([A-Za-z0-9@.]*)'
or
'/user/(\S*)'
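The difference between the three patterns can be checked with Python's re module. This sketch assumes the route pattern has to match the whole path, which re.fullmatch emulates, and it uses an address-like sample path chosen for illustration.

```python
import re

# Hypothetical request path with characters outside \w in the last segment.
path = "/user/test@example.com"

narrow = re.fullmatch(r'/user/(\w+)', path)            # \w is [A-Za-z0-9_]
explicit = re.fullmatch(r'/user/([A-Za-z0-9@.]*)', path)
any_non_space = re.fullmatch(r'/user/(\S*)', path)

print(narrow)                    # the narrow pattern does not match
print(explicit.group(1))         # captured segment from the explicit class
print(any_non_space.group(1))    # captured segment from \S*
```

Of the two wider patterns, the explicit class is stricter: \S* also accepts slashes and any other non-space character.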

match url that doesn't contain asp, aspx, css, htm, html, jpg

Q-1. Match a URL that doesn't contain asp, aspx, css, htm, html, jpg.
Q-2. Match a URL that doesn't end with asp, aspx, css, htm, html, jpg.
You want to use the 'matches count' function, and make it match 0.
E.g. (matches all characters, then a dot, then anything that isn't aspx or css):
^.*\.((aspx) | (css)){0}.*$
Edit: added the ^ (start of line) and $ (end of line) anchors.
Q-1. This is better done using a normal string search, but if you insist on regex: (.(?!asp|aspx|css|htm|html|jpg))*.
Q-2. This is better done using a normal string search, but if you insist on regex: .*(?<!asp|css|htm|jpg)(?<!aspx|html)$.
If your regular expression implementation does allow lookaround assertions, try these:
(?:(?!aspx?|css|html?|jpg).)*
.*$(?<!aspx?|css|html?|jpg)
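A minimal Python sketch of both questions using lookahead assertions. Python's re module only accepts fixed-width lookbehind, so the "does not end with" case is written here with a negative lookahead instead of the lookbehind shown above; the anchors and the sample URLs are assumptions added for illustration.

```python
import re

EXT = r"aspx?|css|html?|jpg"

# Q-1: no position in the string may start one of the listed tokens.
contains_none = re.compile(r"^(?:(?!" + EXT + r").)*$")
# Q-2: the string as a whole must not end with one of the listed tokens.
not_ending = re.compile(r"^(?!.*(?:" + EXT + r")$).*$")

# Hypothetical sample URLs.
urls = ["/products/list", "/site/main.css?v=2", "/index.html", "/photo.jpg"]

q1 = [u for u in urls if contains_none.match(u)]
q2 = [u for u in urls if not_ending.match(u)]
print(q1)
print(q2)
```

As in the patterns above, the tokens are matched anywhere in the text rather than only after a dot, so a URL such as /cssgrid would also be rejected by the first pattern.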