Denodo Dive Split function URL

I'm trying the below code to select the last part of the URL:
select 'http://www.XX.com/download/apple-Selection-products/beauty-soap-ICs' , field_1[0].string
from
(
select SPLIT('([^\/]+$)', 'http://www.XX.com/download/apple-Selection-products/beauty-soap-ICs')field_1
)
However, the result is not what I expect. For the input
http://www.XX.com/download/apple-Selection-products/beauty-soap-ICs
the result should be:
beauty-soap-ICs
but I'm getting the wrong result.
Any help will be appreciated. The URL may or may not end with a /.

You can use the REGEXP function here:
SELECT REGEXP('http://www.XX.com/download/apple-Selection-products/beauty-soap-ICs', '.*/([^/]+)/?$', '$1') AS result
Details:
.* - any zero or more chars other than line break chars, as many as possible
/ - a / char
([^/]+) - Group 1 ($1 refers to this group value): one or more chars other than /
/? - an optional / char
$ - end of string.
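
For readers who want to try the pattern outside Denodo, here is a minimal Python sketch of the same replacement. The function name last_path_segment is hypothetical, and Python's re module writes the group reference as \1 rather than $1. It only illustrates the regex, not Denodo's own behaviour.

```python
import re

def last_path_segment(url: str) -> str:
    """Return the last non-empty path segment, ignoring one trailing slash."""
    # Same pattern as in the REGEXP call above: greedy .* runs to the last
    # slash that is still followed by at least one non-slash character.
    return re.sub(r'.*/([^/]+)/?$', r'\1', url)

url = 'http://www.XX.com/download/apple-Selection-products/beauty-soap-ICs'
print(last_path_segment(url))        # beauty-soap-ICs
print(last_path_segment(url + '/'))  # beauty-soap-ICs
```

The optional /? before $ is what lets the same call handle URLs with and without a trailing slash.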

Related

Regex, substitute part of a string always at the end

I am trying to substitute a string so a part of this url always goes to the end
google.com/to_the_end/faa/
google.com/to_the_end/faa/fee/
google.com/to_the_end/faa/fee/fii
Using this
(google\.com)\/(to_the_end)\/([a-zA-Z0-9._-]+)
$1/$3/$2
It works for the first example, but I need something more versatile: no matter how many folders there are, to_the_end should always be moved so that it becomes the last folder in the URL string.
Desired output
google.com/faa/to_the_end
google.com/faa/fee/to_the_end/
google.com/faa/fee/fii/to_the_end/

You can use
(google\.com)\/(to_the_end)\/(.*[^\/])\/?$
presumably with the same $1/$3/$2 replacement as in the question.
Details:
(google\.com) - Group 1: google.com
\/ - a / char
(to_the_end) - Group 2: to_the_end
\/ - a / char
(.*[^\/]) - Group 3: any zero or more chars other than line break chars as many as possible and then a char other than a / char
\/? - an optional / char
$ - end of string.
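
A small Python sketch of how this pattern might be applied, assuming the replacement from the question ($1/$3/$2, written \1/\3/\2 in Python) is kept. The name move_to_end is hypothetical.

```python
import re

# Group 3 swallows every folder after to_the_end, minus one trailing slash.
PATTERN = re.compile(r'(google\.com)/(to_the_end)/(.*[^/])/?$')

def move_to_end(url: str) -> str:
    """Move the to_the_end folder behind all folders that follow it."""
    return PATTERN.sub(r'\1/\3/\2', url)

for url in ('google.com/to_the_end/faa/',
            'google.com/to_the_end/faa/fee/',
            'google.com/to_the_end/faa/fee/fii'):
    print(move_to_end(url))
```

With this pattern and replacement the trailing slash, if present, is not copied to the output. Append a / in the replacement if it is needed.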

Extract text up to the n-th character in a string, but return the whole string if the character isn't present

I was looking at this question and the accepted answer gives this as a solution for the case when there are fewer than n characters in the string:
^(([^>]*>){4}|.*)
However, a fiddle I put together shows that this regex simply returns the entire string every time.
This code:
SELECT
SUBSTRING(a FROM '^(([^>]*>){4}|.*)'),
a,
LENGTH(SUBSTRING(a FROM '^(([^>]*>){4}|.*)')),
LENGTH(a),
LENGTH(SUBSTRING(a FROM '^(([^>]*>){4}|.*)')) = LENGTH(a)
FROM s
WHERE LENGTH(SUBSTRING(a FROM '^(([^>]*>){4}|.*)')) = LENGTH(a) IS false;
after several runs returns no records - meaning that the regex is doing nothing.
Question:
I would like a regex which returns up to the fourth > character (not including it) OR the entire string if the string only contains 3 or fewer > characters. RTRIM() can always be used to trim the final > if not including it is too tricky - having an answer which gives both possibilities would help me to deepen my understanding of regexes!
This is not a duplicate - it's certainly related, but I'd like to correct the error in the original answer - and provide a correct answer of my own.

You can use
REGEXP_REPLACE(a, '^((?:[^>]*>){4}).*', '\1')
Details:
^ - start of string
((?:[^>]*>){4}) - Group 1 (\1): four sequences of any chars other than > and then a > char
.* - the rest of the line.
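
The same replacement can be tried quickly in Python. This is only a sketch: the function name keep_first_four is hypothetical, and Python's re engine is not identical to the database's, although this particular pattern behaves the same way in both for single-line input.

```python
import re

def keep_first_four(s: str) -> str:
    """Keep everything up to and including the fourth '>'.

    Strings with fewer than four '>' do not match and come back unchanged.
    """
    return re.sub(r'^((?:[^>]*>){4}).*', r'\1', s)

print(keep_first_four('afsad>adfsaf>asfasf>afasdX>asdffs>asfdf>'))
print(keep_first_four('adfd>adafs>afadsf>'))
```

Because the pattern requires exactly four > characters, the "return the whole string" case falls out of the replace call finding nothing to replace.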
Here is a test:
CREATE TABLE s
(
a TEXT
);
INSERT INTO s VALUES
('afsad>adfsaf>asfasf>afasdX>asdffs>asfdf>'),
('23433>433453>4>4559>455>3433>'),
('adfd>adafs>afadsf>');
SELECT REGEXP_REPLACE(a, '^((?:[^>]*>){4}).*', '\1') as Output FROM s;
Output:
afsad>adfsaf>asfasf>afasdX>
23433>433453>4>4559>
adfd>adafs>afadsf>

You can match zero to three repetitions of "any chars other than >, then a >", followed by any remaining chars other than >:
^(?:[^>]*>){0,3}[^>]*
^ - start of string
(?:[^>]*>){0,3} - repeat 0 to 3 times: any chars other than >, then a > char
[^>]* - zero or more chars other than >
If there should be at least one >, the quantifier can be {1,3}.
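
As a rough illustration, this pattern can be used as a plain match instead of a replacement. The Python sketch below assumes a hypothetical helper name, up_to_fourth_gt. It returns the text before the fourth >, or the whole string when there are three or fewer.

```python
import re

PREFIX = re.compile(r'^(?:[^>]*>){0,3}[^>]*')

def up_to_fourth_gt(s: str) -> str:
    """Return the text before the fourth '>', or all of s if there is none."""
    # Every part of the pattern may match the empty string, so match()
    # always succeeds and never returns None.
    return PREFIX.match(s).group(0)

print(up_to_fourth_gt('afsad>adfsaf>asfasf>afasdX>asdffs>asfdf>'))
print(up_to_fourth_gt('adfd>adafs>afadsf>'))
```

Unlike the replacement-based variant, this one stops just before the fourth >, so no trimming is needed afterwards.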

How to delete duplicate numbers in Notepad++?

I've been trying to use ^(.*?)$\s+?^(?=.*^\1$) but it doesn't work.
I have this scenario:
9993990487 - 9993990487
9993990553 - 9993990553
9993990554 - 9993990559
9993990570 - 9993990570
9993990593 - 9993990596
9993990594 - 9993990594
I want to delete the "duplicate" entries and expect the following:
9993990487
9993990553
9993990554 - 9993990559
9993990570
9993990593 - 9993990596
9993990594
I would really appreciate some help, since it's 20k+ numbers I have to filter. Another program might also do, but Notepad++ is the only one I have available on this PC.
Thanks,
Josue

You may use
^(\d+)\h+-\h+\1$
Replace with $1.
Details
^ - start of a line
(\d+) - Group 1: one or more digits
\h+-\h+ - a - char enclosed with 1+ horizontal whitespaces
\1 - an inline backreference to Group 1 value
$ - end of a line.
The replacement is a $1 placeholder that replaces the match with the Group 1 value.
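
If a scripting language is available after all, the same idea can be sketched in Python. Two assumptions apply: Python's re module has no \h, so [ \t]+ stands in for horizontal whitespace, and the name collapse_duplicate_pairs is hypothetical.

```python
import re

# [ \t]+ replaces \h+, which Python's re does not support.
# re.MULTILINE makes ^ and $ match at every line boundary.
DUPLICATE_PAIR = re.compile(r'^(\d+)[ \t]+-[ \t]+\1$', re.MULTILINE)

def collapse_duplicate_pairs(text: str) -> str:
    """Replace each 'N - N' line with 'N'; leave genuine ranges untouched."""
    return DUPLICATE_PAIR.sub(r'\1', text)

sample = ("9993990487 - 9993990487\n"
          "9993990554 - 9993990559\n"
          "9993990594 - 9993990594")
print(collapse_duplicate_pairs(sample))
```

The trailing $ matters here: without it, a range whose second number merely starts with the first number would also be collapsed.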

Regex for multidimensional input string name to get the last number between square brackets

Can someone help me with my regex?
I want to always match the last integer in square brackets in each string.
product[attribute][1][0][value] - In this case [0]
product[attribute][9871][56][value] - In this case [56]
My attempt:
/\[[0-9,-]+\]/g
The goal is to increment the input name on clone: product[attribute][{attribute_id}][{clone_index}][value].

You may use
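var s = "product[attribute][1][0][value]";
// Greedy .* pushes (\d+) to the last bracketed integer; the callback increments it.
console.log(s.replace(/(.*\[)(\d+)(?=\])/, function ($0, $1, $2) {
  return $1 + (Number($2) + 1);
}));
// => product[attribute][1][1][value]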
The regex matches
(.*\[) - Group 1: any 0+ chars other than line break chars as many as possible and then [
(\d+) - Group 2: one or more digits
(?=]) - a ] char must appear immediately to the right of the current location.
Incrementing is done inside the callback method.
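
For comparison, the same callback approach can be sketched in Python with re.sub and a function as the replacement. The helper name increment_last_index is hypothetical. The pattern is the one from the answer, with the ] in the lookahead escaped.

```python
import re

LAST_INDEX = re.compile(r'(.*\[)(\d+)(?=\])')

def increment_last_index(name: str) -> str:
    """Add 1 to the last purely numeric [index] in a form field name."""
    def bump(match: re.Match) -> str:
        # Group 1 is everything up to the opening bracket, group 2 the digits.
        return match.group(1) + str(int(match.group(2)) + 1)
    return LAST_INDEX.sub(bump, name, count=1)

print(increment_last_index('product[attribute][1][0][value]'))
print(increment_last_index('product[attribute][9871][56][value]'))
```

Because the digits are converted to a number before incrementing, an index such as 9 becomes 10, not "91".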

How to match the whole expression only, even when there are sub parts that match?

I'm trying to write an input validation pattern that allows wildcard characters. The input field is 9 characters max and should follow these rules:
* + 1-8 characters
1-8 characters + *
* + 1-7 characters + *
I've written this regex using the regex documentation and testing it on one of the regex testers.
\*{1}[0-9]{1,7}\*{1}|[0-9]{1,8}\*{1}|\*{1}[0-9]{1,8}|[0-9]{9}
It matches all these correctly
123456789
*1*
*12*
*123*
*1234*
*12345*
*123456*
*1234567*
1234567*
123456*
12345*
1234*
123*
12*
1*
*1
*12
*123
*1234
*12345
*123456
*1234567
*12345678
But it also matches when I don't want it to. For example, it finds 2 matches in *123456789*: the first match is *12345678 and the second one is 9*.
In this case I don't want any matches at all: either the whole string matches one of the patterns or it doesn't. How can I do that?

Use anchors that make sure the regex always matches the entire string:
^(\*[0-9]{1,7}\*|[0-9]{1,8}\*|\*[0-9]{1,8}|[0-9]{9})$
Note the parentheses to make sure that the alternation is contained within the group:
^
(
\*[0-9]{1,7}\*
|
[0-9]{1,8}\*
|
\*[0-9]{1,8}
|
[0-9]{9}
)
$
Also, {1} is always superfluous - one match per token is the default.
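
The effect of the anchors can be checked with a short Python sketch. The function name is_valid_wildcard is hypothetical. The pattern is the anchored one shown above.

```python
import re

WILDCARD = re.compile(r'^(\*[0-9]{1,7}\*|[0-9]{1,8}\*|\*[0-9]{1,8}|[0-9]{9})$')

def is_valid_wildcard(value: str) -> bool:
    """True only if the entire input matches one of the allowed shapes."""
    # Note: in Python, $ also matches just before a single trailing newline;
    # calling re.fullmatch on the un-anchored group would avoid that.
    return WILDCARD.match(value) is not None

for value in ('*1*', '1234567*', '*12345678', '123456789', '*123456789*'):
    print(value, is_valid_wildcard(value))
```

With the anchors in place, *123456789* is rejected outright instead of being split into the two partial matches described in the question.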

You could use start and end string anchors:
http://www.regular-expressions.info/anchors.html
So, your regex would be something like this (note first and last symbol):
^(\*{1}[0-9]{1,7}\*{1}|[0-9]{1,8}\*{1}|\*{1}[0-9]{1,8}|[0-9]{9})$
