Firebase redirect using Regex - regex

My goal is to redirect any URL that does not start with a specific symbol ("#") to a different website.
I am using Firebase Hosting and have already tried the regex option for redirects to achieve this. I followed the Firebase documentation on redirects, but because I am new to regular expressions, I assume the mistake is in my regex.
My Goal:
mydomain.com/anyNotStartingWith# => otherdomain.com/anyNotStartingWith#
mydomain.com/#any => mydomain.com/#any
My Code:
{
"hosting": {
...
"redirects": [
{
"regex": "/^[^#]:params*",
"destination": "otherdomain.com/:params",
"type": 301
}
],
...
}
}

You can use
"regex": "/(?P<params>[^/#].*)"
The point is that you need a capturing group that will match and capture the part you want to use in the destination. So, in this case
/ - matches /
(?P<params>[^/#].*) - Named capturing group params (you can refer to the group from the destination using :params):
[^/#] - any char other than / and #
.* - any zero or more chars other than line break chars, as many as possible
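As a rough way to see what this pattern captures, here is a minimal Python sketch. Python's re module happens to accept the same (?P<name>...) syntax; the redirect_target helper, the https:// prefix on the destination, and the use of fullmatch (assuming the whole path has to match) are illustrative assumptions, not Firebase behaviour taken from the documentation.

```python
import re
from typing import Optional

# Pattern from the answer: "/" followed by a named group whose first
# character is neither "/" nor "#".
PATTERN = re.compile(r"/(?P<params>[^/#].*)")


def redirect_target(path: str, template: str = "https://otherdomain.com/:params") -> Optional[str]:
    """Return the redirect URL for `path`, or None when the rule does not apply.

    Hypothetical helper: it only imitates the ":params" substitution locally.
    """
    match = PATTERN.fullmatch(path)  # assumption: the whole path must match
    if match is None:
        return None
    return template.replace(":params", match.group("params"))


for path in ("/about/team", "/#any", "/"):
    print(path, "->", redirect_target(path))
```

Paths whose first character after the slash is # or / produce no match, so they would stay on the original site.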
To avoid matching files with .js, you can use
/(?P<params>[^/#].*(?:[^.].{2}$|.[^j].$|.{2}[^s]$))$
See this RE2 regex demo
See more about how to negate patterns at Regex: match everything but specific pattern.
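To check how the .js-excluding pattern behaves, it can be tried locally. This is only a sketch using Python's re module as a stand-in for RE2; is_redirected is a hypothetical helper name. Note one side effect of the pattern as written: it needs at least four characters after the leading slash, so very short paths do not match either.

```python
import re

# Pattern from the answer, unchanged. The alternation inspects the last three
# characters and rejects the path only when they are exactly ".js".
NOT_JS = re.compile(r"/(?P<params>[^/#].*(?:[^.].{2}$|.[^j].$|.{2}[^s]$))$")


def is_redirected(path: str) -> bool:
    """True when the pattern matches, i.e. the redirect would apply."""
    return NOT_JS.match(path) is not None


for path in ("/about", "/main.css", "/app.js", "/#x", "/ab"):
    print(path, is_redirected(path))
```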

Related

Exclude phrase ending with matched pattern using regex

I have written a rule to catch all web domains ending with .watch or .video, and I want to exclude 2 domains:
(?!\.videolecture|fb\.watch)\b(\.video|\.watch)
The first exclusion .videolecture works fine. But I can't exclude fb.watch.
I'm really sorry, but I couldn't find any similar questions on Stack Overflow.
You can use
\b(\.video(?!lecture\b)|(?<!\bfb)\.watch)
See the regex demo. Details:
\b - a word boundary
( - start of a capturing group:
\.video(?!lecture\b) - .video that is not immediately followed by lecture as a whole word
| - or
(?<!\bfb)\.watch - .watch that is not immediately preceded with fb as a whole word
) - end of the group.
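The lookahead and lookbehind can be tried out with a short Python sketch. Python's re module supports both constructs; the is_flagged helper and the sample domains are made up for illustration.

```python
import re

# Pattern from the answer: ".video" not followed by "lecture" as a whole word,
# or ".watch" not preceded by "fb" as a whole word.
TLD_RULE = re.compile(r"\b(\.video(?!lecture\b)|(?<!\bfb)\.watch)")


def is_flagged(domain: str) -> bool:
    """True when the rule finds .video or .watch outside the excluded cases."""
    return TLD_RULE.search(domain) is not None


for domain in ("example.video", "example.videolecture", "fb.watch", "notfb.watch"):
    print(domain, is_flagged(domain))
```

Because of the \b inside the lookbehind, only fb as a whole label is excluded; a name that merely ends in fb, such as notfb.watch, is still caught.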
An exclude variable could be set using a map. To exclude second-level domains, videolecture or fb, and top-level domains, watch or video:
map $host $exclude {
~\b(?:videolecture|fb)\.(?:watch|video)$ 1;
}
Return 403 if $exclude is set:
server {
if ($exclude) {
return 403;
}
}
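To preview which hosts the map would flag, the same regex can be run in Python. This sketch assumes nginx's ~ match behaves like an unanchored, case-sensitive search and that an unmatched host leaves the variable empty; exclude_flag is a hypothetical helper.

```python
import re

# Regex from the map block: second-level label videolecture or fb,
# followed by the top-level domain watch or video at the end of the host.
EXCLUDE = re.compile(r"\b(?:videolecture|fb)\.(?:watch|video)$")


def exclude_flag(host: str) -> str:
    """Imitate the map: "1" when the host matches, empty string otherwise."""
    return "1" if EXCLUDE.search(host) else ""


for host in ("fb.watch", "www.videolecture.video", "example.watch", "notfb.watch"):
    print(host, repr(exclude_flag(host)))
```

Note that this regex pairs either label with either top-level domain, so fb.video and videolecture.watch are flagged as well.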

Sonatype NXRM - Asset Name Matcher [duplicate]

I have a link like http://drive.google.com and I want to match "google" out of the link.
I have:
query: {
bool : {
must: {
match: { text: 'google'}
}
}
}
But this only matches if the whole text is 'google' (case insensitive, so it also matches Google or GooGlE etc). How do I match for the 'google' inside of another string?
The point is that the ElasticSearch regex you are using requires a full string match:
Lucene’s patterns are always anchored. The pattern provided must match the entire string.
Thus, to match any character (but a newline), you can use .* pattern:
match: { text: '.*google.*'}
                ^^      ^^
In ES6+, use regexp instead of match:
"query": {
"regexp": { "text": ".*google.*"}
}
One more variation is for cases when your string can have newlines: match: { text: '(.|\n)*google(.|\n)*'}. This awful (.|\n)* is a must in ElasticSearch because this regex flavor does not allow any [\s\S] workarounds, nor any DOTALL/Singleline flags. "The Lucene regular expression engine is not Perl-compatible but supports a smaller range of operators."
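The effect of the implicit anchoring can be imitated in Python with re.fullmatch. This is only an analogy: Python's regex flavor differs from Lucene's, and anchored_match is a hypothetical helper.

```python
import re


def anchored_match(pattern: str, text: str) -> bool:
    """Imitate an always-anchored engine: the pattern must cover the whole string."""
    return re.fullmatch(pattern, text) is not None


url = "drive.google.com"
print(anchored_match("google", url))       # bare substring pattern
print(anchored_match(".*google.*", url))   # padded with .* on both sides
```

The bare pattern only succeeds when the entire value is google, which is why the .* padding on both sides is needed.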
However, if you do not plan to match any complicated patterns and need no word boundary checking, regex search for a mere substring is better performed with a mere wildcard search:
{
"query": {
"wildcard": {
"text": {
"value": "*google*",
"boost": 1.0,
"rewrite": "constant_score"
}
}
}
}
See Wildcard search for more details.
NOTE: The wildcard pattern also needs to match the whole input string, thus
google* finds all strings starting with google
*google* finds all strings containing google
*google finds all strings ending with google
Also, bear in mind the only pair of special characters in wildcard patterns:
?, which matches any single character
*, which can match zero or more characters, including an empty one
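The three pattern shapes listed above can be checked with Python's fnmatch module, which uses the same ? and * semantics. It is only an approximation of Elasticsearch wildcards, since fnmatch additionally understands [...] character sets; wildcard_match is a hypothetical wrapper.

```python
from fnmatch import fnmatchcase


def wildcard_match(value: str, pattern: str) -> bool:
    """Case-sensitive wildcard test; the pattern must cover the whole value."""
    return fnmatchcase(value, pattern)


value = "drive.google.com"
for pattern in ("google*", "*google*", "*google"):
    print(pattern, wildcard_match(value, pattern))
```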
Use a wildcard query:
'{"query":{ "wildcard": { "text.keyword" : "*google*" }}}'
For both partial and full-text matching, the following worked:
"query" : {
"query_string" : {
"query" : "*searchText*",
"fields" : [
"fieldName"
]
}
}
I can't find a breaking change disabling regular expressions in match, but match: { text: '.*google.*'} does not work on any of my Elasticsearch 6.2 clusters. Perhaps it is configurable?
Regexp works:
"query": {
"regexp": { "text": ".*google.*"}
}
For partial matching you can either use prefix or match_phrase_prefix.
For a more generic solution you can look into using a different analyzer or defining your own. I am assuming you are using the standard analyzer which would split http://drive.google.com into the tokens "http" and "drive.google.com". This is why the search for just google isn't working because it is trying to compare it to the full "drive.google.com".
If instead you indexed your documents using the simple analyzer, it would split the URL into "http", "drive", "google", and "com". This will allow you to match any one of those terms on its own.
Using the Node.js client:
tag_name is the field name, and value is the incoming search value.
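A rough Python approximation of what the simple analyzer does (break the text at every non-letter character and lowercase the pieces) shows where the google token comes from. This is a sketch of the idea, not Elasticsearch's actual implementation, and simple_analyzer_tokens is a hypothetical name.

```python
import re
from typing import List


def simple_analyzer_tokens(text: str) -> List[str]:
    """Split on anything that is not a letter, then lowercase each token."""
    # [^\W\d_]+ matches runs of letters only (word characters minus digits and "_").
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]


print(simple_analyzer_tokens("http://drive.google.com"))
```

With google indexed as its own token, a plain match query for google can find the document.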
const { body } = await elasticWrapper.client.search({
index: ElasticIndexs.Tags,
body: {
query: {
wildcard: {
tag_name: {
value: `*${value}*`,
boost: 1.0,
rewrite: 'constant_score',
},
},
},
},
});
You're looking for a wildcard search. According to the official documentation, it can be done as follows:
query_string: {
query: `*${keyword}*`,
fields: ["fieldOne", "fieldTwo"],
},
Wildcard searches can be run on individual terms, using ? to replace a single character, and * to replace zero or more characters: qu?ck bro*
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-string-wildcard
Be careful, though:
Be aware that wildcard queries can use an enormous amount of memory and perform very badly — just think how many terms need to be queried to match the query string "a* b* c*".
Allowing a wildcard at the beginning of a word (eg "*ing") is particularly heavy, because all terms in the index need to be examined, just in case they match. Leading wildcards can be disabled by setting allow_leading_wildcard to false.

Regular expressions in firebase.json rewrites

When using Firebase Hosting with Cloud Functions, is it possible to use regular expressions to match incoming requests and, based on the match, route to a specific function while sharing the same endpoint? For example, I am trying this:
{
"hosting": {
"rewrites": [
{
"source": "/^([0-9a-f]{2}[:-]){15}([0-9a-f]{2})$",
"function": "getFingerprint"
},
{
"source": "/*",
"function": "callNew"
}
]
}
}
I would like to match urls like:
http://test.firebaseapp.com/b4:e8:b4:ec:4a:36:76:4b:04:4a:83:c9:47:d4:c8:70
If the request matches the regular expression, the function getFingerprint should be used; otherwise, as an attempt at a catch-all, I am using /*.
The only pattern that works at the moment is /*; I can't find a way to make this one work:
^([0-9a-f]{2}[:-]){15}([0-9a-f]{2})$
So I am wondering whether it is possible to use a regex in firebase.json to configure custom rewrites that share an endpoint (/ in this case), or whether it is better to use a single resource and split the URL path to retrieve the segments as parameters.
From the documentation on Firebase Hosting rewrite rules:
A source specifying a glob pattern
Glob patterns are a simpler pattern syntax than regular expressions; for example, I don't think they support the ^ and $ anchors that you use.
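The alternative mentioned in the question, sending every request to a single entry point and deciding there, could look roughly like the following. It is sketched in Python purely to show the dispatch logic; choose_handler is hypothetical, and the returned strings only reuse the function names from the question.

```python
import re

# Fingerprint pattern from the question; the anchors are dropped because
# fullmatch already requires the whole candidate to match.
FINGERPRINT = re.compile(r"([0-9a-f]{2}[:-]){15}([0-9a-f]{2})")


def choose_handler(path: str) -> str:
    """Return the name of the function that should serve `path`."""
    candidate = path.lstrip("/")
    if FINGERPRINT.fullmatch(candidate):
        return "getFingerprint"
    return "callNew"  # catch-all


print(choose_handler("/b4:e8:b4:ec:4a:36:76:4b:04:4a:83:c9:47:d4:c8:70"))
print(choose_handler("/anything-else"))
```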

Multi Taxonomy URL rewrite not working

I am trying to rewrite WP URL and here is the URL:
http://example.com/?job_listing_region=california&job_listing_category=wordpress
I want to change it as http://example.com/california/wordpress
I tried this:
add_rewrite_rule('([^/]*)/([^/]*)/?','job_listing_region=$matches[1]&job_listing_category=$matches[2]','top');
But it's not working. Sorry, I am not good at regex; it might be something small, but I am not able to find a solution. Thanks in advance.
Code
See regex in use here
Regex
\??\w+=([^&]+)&?
Replacement
$1/
Results
Input
http://example.com/?job_listing_region=california&job_listing_category=alcohol-abuse-programs
Output
http://example.com/california/alcohol-abuse-programs/
Explanation
Regex
\?? Match between zero and one of the ? character literally
\w+= Match any word character one or more times, followed by the = character literally (\w can be replaced with [a-zA-Z0-9_] if preferred/doesn't work in your regex flavour)
([^&]+) Capture into capture group 1 any character except the & character literally one or more times
&? Match between zero and one & character literally
Replacement
$1/ Matches the same text as most recently matched by the 1st capturing group, followed by a / literally
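The same substitution can be reproduced in Python, where the replacement group is written \1 instead of $1. This is a minimal sketch; prettify is a hypothetical helper name.

```python
import re

# Regex from the answer: optional "?", a key, "=", the captured value, optional "&".
QUERY_PAIR = re.compile(r"\??\w+=([^&]+)&?")


def prettify(url: str) -> str:
    """Replace each key=value pair with "value/"."""
    return QUERY_PAIR.sub(r"\1/", url)


print(prettify(
    "http://example.com/?job_listing_region=california"
    "&job_listing_category=alcohol-abuse-programs"
))
```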
Using http://example.com/{job_listing_region}/{job_listing_category}/ is too broad - it would affect every single URL on your website, such as /wp-admin.
I'd recommend using http://example.com/jobs/{job_listing_region}/{job_listing_category}/ as your URL structure, in which case the rewrite rule would be set as follows:
add_rewrite_rule('^jobs/([^/]*)/([^/]*)/?','index.php?page_id=1234&job_listing_region=$matches[1]&job_listing_category=$matches[2]','top');
page_id should be set to the page ID of the page/post you'd like to route this to.
It's important to note that the rewrite might not be available until you view/save the Settings -> Permalinks page in the back end.
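To sanity-check which paths such a rule would capture before registering it, the regex can be tried outside WordPress. The Python sketch below only imitates the matching step, under the assumptions that the rule is applied to the request path without its leading slash and that the second capture is meant for job_listing_category (as in the URL from the question); resolve is a hypothetical helper and 1234 is the placeholder page ID from the answer.

```python
import re
from typing import Optional

# Same regex as in the add_rewrite_rule call.
RULE = re.compile(r"^jobs/([^/]*)/([^/]*)/?")


def resolve(request_path: str, page_id: int = 1234) -> Optional[str]:
    """Return the internal query string for a pretty URL, or None if the rule does not apply."""
    match = RULE.match(request_path.lstrip("/"))
    if match is None:
        return None
    region, category = match.groups()
    return (
        f"index.php?page_id={page_id}"
        f"&job_listing_region={region}&job_listing_category={category}"
    )


print(resolve("/jobs/california/wordpress/"))
print(resolve("/wp-admin"))
```

Because of the jobs/ prefix, unrelated paths such as /wp-admin are left alone.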
Thanks for the answers above; they helped me finally get to a solution.
When passing URL variables to WordPress, we need to register the variables in functions.php and then, instead of using PHP's $_GET, use WordPress query vars to read the values.
As suggested by @athms above, I changed the URL structure.
Now "wordpress" is a WordPress page in which the query variables are captured.
So my URL is http://example.com/wordpress/?job_listing_region=california
In functions.php I registered these variables:
function custom_query_vars_filter($vars) {
$vars[] = 'job_listing_region';
return $vars;
}
add_filter( 'query_vars', 'custom_query_vars_filter' );
function custom_rewrite_tag1() {
add_rewrite_tag('%job_listing_region%', '([^&]+)');
}
add_action('init', 'custom_rewrite_tag1', 10, 0);
Rewrite Rule in functions.php:
function custom_rewrite_rule3() {
add_rewrite_rule('^wordpress/([^/]*)/?','index.php?page_id=35349&job_listing_region=$matches[1]','top');
}
add_action('init', 'custom_rewrite_rule3', 10, 0);
Here the page ID is the ID of the page I created, i.e. "wordpress".
In the page template for "wordpress", I captured the region using:
$region = get_query_var('job_listing_region');
Now you can pass this variable to your query.
So now you can start using this pretty URL:
http://example.com/wordpress/california
The last URL segment, california, is passed as a query variable and can be used in our template.

Extract all subfolders of a path in Elasticsearch

I want to extract all direct subfolders in a path field in elasticsearch.
For example I want all subfolders of this path: /path/to/file
These URLs should match
/path/to/file/subfolderA
/path/to/file/subfolder-b
/path/to/file/subfolder_c
These URLs should not match
/path/to/file/subfolderA/folderc
/path/to/file/subfolder-b/folderd/folderE
I tried this regexp query, but it's not working. The part with the / is the problem: when I replace the / with a letter, the query works. I tried escaping the / with a \, but that doesn't work either.
POST index_name/_search
{
"query": {
"regexp":{
"path_parent": "(/path/to/file/.*)&~(.*/.*)"
}
}
}
You may use negated character classes:
"path_parent": "/path/to/file/[^/]*"
                              ^^^^^
Since ElasticSearch patterns are anchored by default this pattern will match all paths starting with /path/to/file/ and then having 0+ chars other than / followed with end of string.
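The pattern can be checked against the sample paths from the question with Python's re.fullmatch, which imitates the implicit anchoring. This is a sketch only; is_direct_subfolder is a hypothetical helper.

```python
import re

# Negated character class: after the parent path, only non-slash characters may follow.
DIRECT_CHILD = re.compile(r"/path/to/file/[^/]*")


def is_direct_subfolder(path: str) -> bool:
    """True when `path` is exactly one level below /path/to/file/."""
    return DIRECT_CHILD.fullmatch(path) is not None


for path in (
    "/path/to/file/subfolderA",
    "/path/to/file/subfolder-b",
    "/path/to/file/subfolderA/folderc",
):
    print(path, is_direct_subfolder(path))
```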