Last word in a sentence: in SQL (regular expressions possible?)

I need this done in Oracle SQL (10gR2), but to put it plainly, any good, efficient algorithm is fine.
Given a line (or sentence) of English text containing one or more words, how would you find the last word?
Here is what I have tried in SQL, but I would like to see a more efficient way of doing this.
select reverse(substr(reverse(&p_word_in)
, 0
, instr(reverse(&p_word_in), ' ')
)
)
from dual;
The idea was to reverse the string, find the first space, take the substring up to it, and reverse the result. Is this efficient? Is a regular expression available? I am on Oracle 10g R2, but I don't mind seeing an attempt in another programming language, and I won't mind writing a PL/SQL function if need be.
Update:
Jeffery Kemp has given a wonderful answer. This works perfectly.
Answer
SELECT SUBSTR(&sentence, INSTR(&sentence,' ',-1) + 1)
FROM dual

I reckon it's simpler with INSTR/SUBSTR:
WITH q AS (SELECT 'abc def ghi' AS sentence FROM DUAL)
SELECT SUBSTR(sentence, INSTR(sentence,' ',-1) + 1)
FROM q;
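Since the question welcomes other languages, here is a minimal Python sketch of the same backward-search idea. It is not Oracle code, and the function name last_word is made up for illustration; str.rfind plays the role of INSTR(sentence, ' ', -1).

```python
def last_word(sentence: str) -> str:
    """Return the text after the last space (hypothetical helper)."""
    # rfind returns -1 when there is no space, so +1 gives index 0 and the
    # whole string is returned, much like INSTR returning 0 in the SQL version.
    return sentence[sentence.rfind(" ") + 1:]


print(last_word("abc def ghi"))
```

A sentence that ends in a space yields an empty result, so trim the input first if that can happen.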

Not sure how it is performance-wise, but this should do it:
select regexp_substr(&p_word_in, '\S+$') from dual;

I'm not sure if you can use a regex in Oracle, but wouldn't
(\w+)\W*$
work?

This regex matches the last word on a line:
\w+$
And RegexBuddy gives this code for use in Oracle:
DECLARE
match VARCHAR2(255);
BEGIN
match := REGEXP_SUBSTR(subject, '[[:alnum:]_]+$', 1, 1, 'c');
END;

This leaves the punctuation but gets the final word:
with datam as (
SELECT 'abc asdb.' A FROM DUAL UNION
select 'ipso factum' a from dual union
select 'ipso factum' a from dual union
SELECT 'ipso factum2' A FROM DUAL UNION
SELECT 'ipso factum!' A FROM DUAL UNION
SELECT 'ipso factum !' A FROM DUAL UNION
SELECT 'ipso factum/**//*/?.?' A FROM DUAL UNION
SELECT 'ipso factum ...??!?!**' A FROM DUAL UNION
select 'ipso factum ..d.../.>' a from dual
)
SELECT a,
--REGEXP_SUBSTR(A, '[[:alnum:]_]+$', 1, 1, 'c') , /** these are the other examples*/
--REGEXP_SUBSTR(A, '\S+$') , /** these are the other examples*/
regexp_substr(a, '[a-zA-Z]+[^a-zA-Z]*$')
from datam

This works too, even for non-English words:
SELECT REGEXP_SUBSTR ('San maria Calle Cáceres Numéro 25 principal izquierda, España', '[^ .]+$') FROM DUAL;
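To compare how the patterns suggested in these answers treat trailing punctuation, here is a small Python sketch. Python's re module is not Oracle's regex engine, so treat it as an illustration of the patterns only; the helper name last_word_by is hypothetical.

```python
import re


def last_word_by(pattern: str, text: str):
    """Apply one of the suggested patterns; return None when nothing matches."""
    m = re.search(pattern, text)
    if m is None:
        return None
    # Use the first capture group when the pattern defines one,
    # otherwise fall back to the whole match.
    return m.group(1) if m.groups() else m.group(0)


for pattern in (r"\S+$", r"(\w+)\W*$", r"[^ .]+$"):
    print(pattern, "->", last_word_by(pattern, "ipso factum!"))
```

The first pattern keeps the punctuation, the second drops it through the capture group, and the third finds nothing at all when the text ends in a dot.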

Related

Regular expression to get last part of a string before a character

I have this string in a single column of a single row in an Oracle table:
(test-1#gmail.com-1234567)
(testAAAcccc#gmail.com-7654321)
..
The above is one single big string.
I need a regular expression to extract every occurrence (there could be one or more; two in the example above) of the 7-digit numbers, so the results should be:
1234567
7654321
I have tried various regular expressions and Oracle functions, but I am not able to get both occurrences.
Could you please help me?
If you specifically need a regular expression:
select regexp_substr('(test-1#gmail.com-1234567)', '\d{7}' ) from dual
To find all occurrences:
select *
from t,
lateral(select level occurence_number, regexp_substr(str, '(\d{7})',1,level ) digits7
from dual
connect by level<=regexp_count(str, '(\d{7})' )
);
Or a test query with test data (you can run it as-is to see how it works):
with t(str) as (
select '(test-1#gmail.com-1234567)' from dual union all
select '(testAAAcccc#gmail.com-7654321)' from dual union all
select '7654321 1234567 2345678' from dual
)
select *
from t,
lateral(select level occurence_number, regexp_substr(str, '(\d{7})',1,level ) digits7
from dual
connect by level<=regexp_count(str, '(\d{7})' )
);
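For comparison, the "all occurrences" idea can be sketched in Python, where re.findall returns every non-overlapping match in one call. This is an illustration rather than Oracle code; the function name seven_digit_runs is hypothetical, and like the \d{7} pattern above it would also pick seven digits out of a longer run.

```python
import re
from typing import List


def seven_digit_runs(text: str) -> List[str]:
    """Return every non-overlapping run of exactly seven matched digits."""
    return re.findall(r"\d{7}", text)


big_string = "(test-1#gmail.com-1234567)\n(testAAAcccc#gmail.com-7654321)"
print(seven_digit_runs(big_string))
```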
If you only want to extract the numbers, you can do that with:
Select SUBSTR(column, INSTR(column,'-', -1) + 1)
from dual;
This fetches everything in the column after the last dash (-).
If there is a closing parenthesis, you can replace it with a space and then TRIM:
Select TRIM(replace(SUBSTR(column, INSTR(column,'-', -1) + 1),')', ' '))
from dual;

Url regex oracle

I want to do something like this:
This is the link I want to replace, keeping only the "textIwantToKeep" part:
http://mylink/aaa-bbb/textIwantToKeep
And I want this:
http://mySecondLink/ccc-ddd/textIwantToKeep
I want to use a regular expression with Oracle SQL Developer. My idea was to count the slashes (4) and split off the part before the 4th slash, but it doesn't work.
Thank you for your help.
REGEXP_SUBSTR might be one option; \w+$ returns the last word (i.e. the one "anchored" to the end of the string):
SQL> with test (link) as
2 (select 'http://mylink/aaa-bbb/textIwantToKeep' from dual union all
3 select 'http://mySecondLink/ccc-ddd/textIwantToKeep' from dual
4 )
5 select link,
6 regexp_substr(link, '\w+$') result
7 from test;
LINK RESULT
------------------------------------------- --------------------
http://mylink/aaa-bbb/textIwantToKeep textIwantToKeep
http://mySecondLink/ccc-ddd/textIwantToKeep textIwantToKeep
SQL>
There could be other alternatives, but here is something that came to me quickly:
WITH main_table AS (
SELECT 'http://mylink/aaa-bbb/textIwantToKeep' AS original_string FROM dual
)
,
second_table AS (
SELECT 'http://mySecondLink/ccc-ddd/' AS my_second_link FROM dual
)
SELECT
second_table.my_second_link
|| regexp_substr(main_table.original_string, '[^/]+', 1, 4) AS final_string
FROM
main_table,
second_table;
Let me know if that works.
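The "keep the last path segment, swap the rest" step can also be sketched outside the database. The Python below is only an illustration; swap_base is a hypothetical helper name, and it assumes the part to keep is everything after the last slash.

```python
def swap_base(url: str, new_base: str) -> str:
    """Keep the last path segment of url and attach it to new_base."""
    # rsplit with maxsplit=1 splits once, at the last slash.
    last_segment = url.rsplit("/", 1)[-1]
    # Normalise new_base so exactly one slash separates the two parts.
    return new_base.rstrip("/") + "/" + last_segment


print(swap_base("http://mylink/aaa-bbb/textIwantToKeep",
                "http://mySecondLink/ccc-ddd/"))
```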

Oracle regexp: can't seem to get it right

I have some values like
CR-123456
ECR-12345
BCY-499494
134-ABC
ECW-ECR1233
CR-123344
I want to match all lines that do not start with ECR. The regex I came up with is ^((?!ECR)\w+), which seems to do what I want.
I then want to replace the prefix of the matched values with ECR, but I am stuck because the following doesn't seem to work:
select regexp_replace('CR-123344','^((?!ECR)\w+)','ECR') from dual
Any ideas where I have gone wrong?
I want the result to be
ECR-123456
ECR-12345
ECR-499494
ECR-ABC
ECR-ECR1233
ECR-123344
You don't absolutely need to use regex here, you can just use Oracle's base string functions.
SELECT
'ECR-' || SUBSTR(col,
INSTR(col, '-') + 1,
LENGTH(col) - INSTR(col, '-')) AS new_col
FROM yourTable
WHERE col NOT LIKE 'ECR-%'
The advantage of this approach is that it might run faster than a regex. The disadvantage is that the code is a bit less tidy, but what matters most is that you understand how it works.
I would use substring and instr to replace everything before the dash, but here is your answer using regexp:
WITH aset
AS (SELECT 'CR-123456' a
FROM DUAL
UNION ALL
SELECT 'BCY-12345' a
FROM DUAL
UNION ALL
SELECT 'ECR-499494' a
FROM DUAL
UNION ALL
SELECT '134-ABC' a
FROM DUAL
UNION ALL
SELECT 'ECW-ECR1233' a
FROM DUAL
UNION ALL
SELECT 'CR-123344'
FROM DUAL)
SELECT a, regexp_replace(a, '^([^-]*)','ECR') b
FROM aset;
Results in
A,B
CR-123456,ECR-123456
BCY-12345,ECR-12345
ECR-499494,ECR-499494
134-ABC,ECR-ABC
ECW-ECR1233,ECR-ECR1233
CR-123344,ECR-123344
Looks like you are replacing characters before the '-' with ECR. Do you need to check if it does not match 'ECR' at all?
Because this will give you what you want, will it not?
select regexp_replace('CR-123344','(.*)-','ECR-')
from dual;
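The answers in this thread all reduce to "replace whatever precedes the first dash with ECR". As far as I know, Oracle's POSIX-style regular expressions have no lookahead, which would explain why (?!ECR) does not behave as it does in other engines. A minimal Python sketch of the plain string approach follows; force_ecr_prefix is a hypothetical name, and leaving dash-less values untouched is my assumption.

```python
def force_ecr_prefix(value: str) -> str:
    """Replace everything before the first dash with 'ECR'."""
    head, dash, tail = value.partition("-")
    if not dash:
        # Assumption: values without a dash are left unchanged.
        return value
    return "ECR-" + tail


for v in ("CR-123456", "ECR-12345", "BCY-499494", "134-ABC", "ECW-ECR1233"):
    print(v, "->", force_ecr_prefix(v))
```

Values that already start with ECR- come out unchanged, so no separate "does not start with ECR" test is needed.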

How to check if a string matches multiple conditions in Oracle using regular expressions?

After struggling with regular expressions, I've come up with the pattern ^(ABC_)\w*(_USER[0-9]*)\w*(_MOD_)\w*, which matches strings of this kind:
The string starts with ABC_, contains _USER followed by any number of digits, and contains _MOD_ after that.
Examples of matching strings:
ABC_sssss_USER0000000000_sssss_MOD_sssss
ABC_SCssB_USER0332_MOD_REG_SP
I tested it in this tool:
http://www.regexpal.com/
but I can't get it to work in Oracle SQL.
Here is my testing code:
SELECT
OBJECT_NAME,
REGEXP_INSTR(OBJECT_NAME, '^(ABC_)\w*(_USER[0-9]*)\w*(_MOD_)\w*') AS IS_MATCH
FROM
(
SELECT 'ABC_SCssB_USER0332_MOD_REG_SP' OBJECT_NAME FROM DUAL UNION
SELECT 'ABC_SCssB_USER0332_REG_SP' FROM DUAL UNION
SELECT 'SCssB_USER0332_MOD_REG_SP' FROM DUAL UNION
SELECT 'ABC_SCssB_MOD_REG_SP' FROM DUAL
)
Result:
ABC_SCssB_MOD_REG_SP 0
ABC_SCssB_USER0332_MOD_REG_SP 0
ABC_SCssB_USER0332_REG_SP 0
SCssB_USER0332_MOD_REG_SP 0
Expected Result:
ABC_SCssB_MOD_REG_SP 0
ABC_SCssB_USER0332_MOD_REG_SP 1
ABC_SCssB_USER0332_REG_SP 0
SCssB_USER0332_MOD_REG_SP 0
How can I achieve that in Oracle?
If regular expressions are not mandated you could do this, assuming you need 1 or more digits after '_USER':
select
object_name,
case when translate(OBJECT_NAME, '#0123456789', ' ##########')
like 'ABC\_%\_USER#%\_MOD\_%' escape '\'
then 1
else 0
end as is_match
from
(
select 'ABC_SCssB_USER0332_MOD_REG_SP' object_name from dual union
select 'ABC_SCssB_USER0332_REG_SP' from dual union
select 'SCssB_USER0332_MOD_REG_SP' from dual union
select 'ABC_SCssB_MOD_REG_SP' from dual
);
This runs a bit quicker than the regexp version for me (on 12.1.0.1.0) - about 75% of the time taken by the regexp version.
If there can be 0 or more digits after '_USER' then this will do:
select
object_name,
case when OBJECT_NAME like 'ABC\_%\_USER%\_MOD\_%' escape '\'
then 1
else 0
end as is_match
from
(
select 'ABC_SCssB_USER0332_MOD_REG_SP' object_name from dual union
select 'ABC_SCssB_USER0332_REG_SP' from dual union
select 'SCssB_USER0332_MOD_REG_SP' from dual union
select 'ABC_SCssB_MOD_REG_SP' from dual
);
Ok, so it turns out it will work if you change \w* to .*. It's still not clear what causes \w to fail, though.
I once encountered non-Latin ranges in character classes (like [A-z] but for Cyrillic, [А-я]) not working properly because of NLS_SORT settings. Perhaps something similar is affecting \w?
@simsim, please post your exact database version and NLS settings, so that we could try to get to the root of the problem and make this question more useful to others.
EDIT:
The reason turns out to be much simpler: database version 10.1 is the culprit. Regexp support was only just added in 10g, and \w is simply not supported in that version. My instance is 10.2, and the "perl-influenced extensions" were only added in 10.2; see this table for a full list of things that were added, and this link to see what's available in 10.1. Be aware that you also don't have support for non-greedy quantifiers (.*?, .+?) or similar character classes like \d.
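One way to avoid depending on \w at all is to spell the word class out as an explicit bracket expression. The Python sketch below shows the question's pattern rewritten that way; it only illustrates the pattern logic, the names WORD, PATTERN and is_match are hypothetical, and in Oracle an explicit range such as A-Z may still be affected by NLS_SORT as mentioned above.

```python
import re

# Explicit stand-in for \w, so the pattern does not rely on that shorthand.
WORD = "[A-Za-z0-9_]"
PATTERN = re.compile(rf"^ABC_{WORD}*_USER[0-9]*{WORD}*_MOD_{WORD}*")


def is_match(name: str) -> int:
    """Return 1 when name fits the ABC_..._USERnnn..._MOD_... shape, else 0."""
    return 1 if PATTERN.search(name) else 0


for name in ("ABC_SCssB_MOD_REG_SP", "ABC_SCssB_USER0332_MOD_REG_SP",
             "ABC_SCssB_USER0332_REG_SP", "SCssB_USER0332_MOD_REG_SP"):
    print(name, is_match(name))
```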

PL/SQL Oracle regular expression doesn't work for zero occurrences

I have a problem matching a regular expression in Oracle PL/SQL.
To be more specific, the regex doesn't seem to match zero occurrences.
For example, I have something like:
select * from dual where regexp_like('', '[[:alpha:]]*');
and this doesn't work. But if I put a space in the statement:
select * from dual where regexp_like(' ', '[[:alpha:]]*');
it works.
I want the first example to work, so that the user doesn't have to enter a space.
Any help is appreciated, and thank you for your time.
For better or worse, empty strings in Oracle are treated as NULL:
SQL> select * from dual where '' like '%';
DUMMY
-----
Take that into account when querying with Oracle:
SQL> SELECT *
2 FROM dual
3 WHERE regexp_like('', '[[:alpha:]]*')
4 OR '' IS NULL;
DUMMY
-----
X
Does Oracle still treat empty strings as NULLs? And if so, does regexp_like with a NULL input source string return UNKNOWN? Both of these would be semi-reasonable, and a reason why your test does not work as you expected.
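The reasoning in these answers can be modelled in a few lines of Python. This is only a toy model under stated assumptions: None stands in for NULL/UNKNOWN, the function names are hypothetical, [A-Za-z]* stands in for [[:alpha:]]*, and the rule that a NULL source makes regexp_like yield UNKNOWN is taken from the answers above rather than verified here.

```python
import re
from typing import Optional


def to_oracle(value: str) -> Optional[str]:
    """Model Oracle treating the empty string as NULL (None here)."""
    return None if value == "" else value


def regexp_like(source: Optional[str], pattern: str) -> Optional[bool]:
    """Assumed behaviour: a NULL source gives UNKNOWN (None), not FALSE."""
    if source is None:
        return None
    return re.search(pattern, source) is not None


def row_selected(condition: Optional[bool]) -> bool:
    """A WHERE clause keeps a row only when its condition is TRUE."""
    return condition is True


pattern = "[A-Za-z]*"
print(row_selected(regexp_like(to_oracle(""), pattern)))
print(row_selected(regexp_like(to_oracle(" "), pattern)))
```

In this model the empty input is dropped because UNKNOWN is not TRUE, while the single space passes because the pattern matches zero characters. That mirrors why adding OR '' IS NULL brings the row back.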