REGEXP and operator .+ - regex

I am trying to use MySQL REGEXP to find rows where both green and 2012 occur in the column.
I am using .+ in the regexp.
This works:
select 'green 2012-01' REGEXP 'green.+2012';
returns 1
But if I place '2012' first, it returns 0:
select 'green 2012-01' REGEXP '2012.+green';
returns 0
I am using MySQL software version 5.1.43 - MySQL Community Server (GPL).

Regular expressions are order dependent: '2012.+green' only matches when 2012 comes before green. What you'll need to do is put an | (or) operator between the two orderings so that either one matches.
select 'green 2012-01' REGEXP '(green.*2012)|(2012.*green)'

As an alternative to REGEXP, though potentially less efficient, you could simply use LOCATE twice.
SELECT * FROM `table` WHERE LOCATE('2012', `column`) AND LOCATE('green', `column`);
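The order dependence is easy to reproduce outside MySQL. Below is a minimal sketch that uses Python's re module as a stand-in for MySQL's REGEXP (an assumption: the two engines differ in many details, though not for patterns this simple). The helper name contains_both is hypothetical.

```python
import re

text = "green 2012-01"

# A pattern is matched left to right, so the order of the two words matters.
green_first = re.search(r"green.+2012", text) is not None
year_first = re.search(r"2012.+green", text) is not None

# Alternation covers both orderings in a single pattern.
either_order = re.compile(r"(green.*2012)|(2012.*green)")

def contains_both(value):
    # Plain substring checks, analogous to calling LOCATE twice.
    return "green" in value and "2012" in value

print(green_first, year_first)
print(bool(either_order.search(text)), bool(either_order.search("2012-01 green")))
print(contains_both(text))
```

The substring version avoids regular expressions entirely, which is usually the simpler choice when both search terms are fixed strings.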

Related

regexp_replace() - matches but does not replace at end of line

I'm trying to use regexp_replace() to append "/" to all the values of a column that do not end with "/".
I can get the correct values by using this statement (the pattern was tested with a PCRE checker):
SELECT * FROM `table` WHERE `column` REGEXP("(?<=[^\/])$");
And the non-matching ones with:
SELECT * FROM `table` WHERE `column` REGEXP("(?<![^\/])$");
But when the statement is:
UPDATE `table` SET `column` = REGEXP_REPLACE(`column`, "(?<=[^\/])$", "/");
Then, there is no change, whatever value I put into the third parameter:
Query OK, 0 rows affected (0.00 sec)
Rows matched: 1031 Changed: 0 Warnings: 0
You could do this easily without regex:
UPDATE `table` SET `column` = CONCAT(`column`, '/')
WHERE RIGHT(`column`, 1) <> '/';
I am trying to understand why it does not work.
As I rationalize the problem, you are asking REGEXP_REPLACE to do two things:
Discover that something is missing, and
Point to a location in the string.
Your regexp says that it is missing, but I question whether it points to a specific substring (even an empty one) for replacing. It's easy to point to a found substring (or substrings). It is hard to point to a missing substring. And such a 'pointer' is needed to do the replacement.
Hence, Michal's approach (even if some regexp were needed) is the "right" way to solve the problem.
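The non-regex approach boils down to a conditional append: test the last character, and add "/" only where it is missing. A minimal Python sketch of that logic follows, as an illustration rather than the SQL itself; the function name ensure_trailing_slash and the sample rows are hypothetical.

```python
def ensure_trailing_slash(value):
    # Append "/" only when the value does not already end with one,
    # mirroring the WHERE RIGHT(column, 1) <> '/' filter.
    return value if value.endswith("/") else value + "/"

rows = ["a/b", "a/b/", "c"]
updated = [ensure_trailing_slash(v) for v in rows]
print(updated)
```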

Postgres: regexp_replace & trim

I need to remove '.0' at the end of the string, but I have some issues.
In PG 8.4 I had this expression and it worked fine.
select regexp_replace('10.1.2.3.0', '(\\.0)+$', '');
and the result was
'10.1.2.3' - good result.
But after PG was updated to version 9.x, the result is
'10.1.2.3.0' - the input string unchanged, which is not OK.
I also tried the trim function.
In this case it is OK:
select trim('.0' from '10.1.2.3.0');
result is '10.1.2.3' - ok
But when the code ends in 10, I get an unexpected result:
select trim('.0' from '10.1.2.3.10.0');
or
select trim('.0' from '10.1.2.3.10');
result is 10.1.2.3.1 - the 0 is trimmed from 10
Can somebody suggest a solution and explain what is wrong with the trim function and what changed in regexp_replace in the latest versions?
I would suggest doing something like this:
select (case when col like '%.0' then left(col, length(col) - 2)
else col
end)
This way you don't need to worry about regular expression parsing. Note that left() was added in Postgres 9.1; on older versions, substr(col, 1, length(col) - 2) does the same thing.
As for the regular expression version, both of these work for me (on recent versions of Postgres):
select regexp_replace('10.1.2.3.0', '(\.0)+$', '');
select regexp_replace('10.1.2.3.0', '([.]0)+$', '');
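I suspect the problem with the earlier version is how the string literal parses the backslash escape character: since Postgres 9.1, standard_conforming_strings defaults to on, so backslashes in ordinary string literals are taken literally. You can use square brackets instead of a backslash, and the pattern should work in any version.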
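The trim surprise comes from '.0' being treated as a set of characters rather than as a substring. Python's str.strip behaves the same way, so it can serve as a rough stand-in here (an assumption: this is Python, not Postgres, and the function name strip_trailing_dot_zero is hypothetical). The sketch also shows the bracket form of the pattern, which needs no backslash.

```python
import re

def strip_trailing_dot_zero(code):
    # [.] matches a literal dot without needing a backslash escape.
    return re.sub(r"([.]0)+$", "", code)

# str.strip, like trim('.0' from ...), treats '.0' as a set of characters
# to remove from both ends, not as the substring ".0".
print("10.1.2.3.10".strip(".0"))
print(strip_trailing_dot_zero("10.1.2.3.10"))
print(strip_trailing_dot_zero("10.1.2.3.0"))
```

The anchored pattern only removes whole ".0" groups at the end, so a trailing ".10" is left alone.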

Regular Expression: Replace values according to a translation table

How can I replace a list of values like
married
single
non
married
couple
with a list like this, using a regular expression?
Status 2
Status 1
non
Status 2
couple
I know I can match each group with something like this
/(married|single)/gm
and that I can address the matched group by $1, $2, ... . But how can I address and/or if-else the group value in the replacement part to actually translate the values?
Edit
Let's say I have the values to replace in a MariaDB column marital in myTable. Then I can do something like
SELECT
marital,
REGEXP_REPLACE(REGEXP_REPLACE(marital,
"married", "Status 2")
, "single", "Status 1")
FROM myTable
to get the desired result. But is there a way to do this with just one REGEXP_REPLACE?
Thanks for your help!
You cannot do it with a single REGEXP_REPLACE because MariaDB doesn't support the required features in the third parameter.
You may do it using PHP with arrays: http://php.net/manual/en/function.preg-replace.php
or with callback: http://php.net/manual/en/function.preg-replace-callback.php
You may do it using Perl: How to replace a set of search/replace pairs?
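For comparison, here is a minimal sketch of the callback idea in Python, where re.sub accepts a function as the replacement. The translation dictionary is taken from the example values in the question; the names translation, pattern and translate are hypothetical, and the \b word boundaries are an added assumption (the nested REGEXP_REPLACE in the question would also replace "married" inside a longer word).

```python
import re

# Translation table built from the example values in the question.
translation = {"married": "Status 2", "single": "Status 1"}

# Build one alternation from the table keys; \b keeps it to whole words.
pattern = re.compile(r"\b(" + "|".join(map(re.escape, translation)) + r")\b")

def translate(value):
    # The callback looks up each matched word in the table.
    return pattern.sub(lambda m: translation[m.group(1)], value)

values = ["married", "single", "non", "married", "couple"]
print([translate(v) for v in values])
```

The lookup happens in the callback, which is exactly the step a plain replacement string cannot express.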

regular expression clob field

I have a question related to a regular expression in Oracle 10.
Assuming I have a value like 123456;12345;454545 stored in a CLOB field, is there a way, via a regular expression, to filter only on the second pattern (12345), knowing that the value can be more than 5 digits but always occurs after the first semicolon and always has a trailing semicolon at the end?
Thanks a lot for your support in that matter,
Have a nice day,
This query should give you your desired output.
SELECT REGEXP_REPLACE(REGEXP_SUBSTR('123456;12345;454545;45634',';[0-9]+;'),';')
FROM dual;
You can extract any element using this query: just change 2 to another value, which should be less than or equal to the number of elements in the string.
with tab(value) as
(select '123456;12345;454545' from dual)
select regexp_substr(value, '[^;]+', 1, 2) from tab;
It can be done easily with one call:
select regexp_replace('123456;12345;454545','^[0-9]+;([0-9]+);.*$','\1')
from dual;
Perhaps the regular expression can be modified to look nicer or to fit your business logic, but the idea, I think, is clear.
select regexp_replace(regexp_substr(Col_name,';\d+;'),';','') from your_table;
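The two main ideas in these answers (take the second run of non-semicolon characters, or capture the second field with a group) can be tried out quickly in Python. This is only a sketch with Python's re module standing in for Oracle's regexp functions, which is an assumption; the variable names are hypothetical.

```python
import re

value = "123456;12345;454545"

# Second run of non-semicolon characters, like regexp_substr(value, '[^;]+', 1, 2).
second_by_occurrence = re.findall(r"[^;]+", value)[1]

# Capture-group approach, like the single regexp_replace call with \1.
second_by_group = re.sub(r"^[0-9]+;([0-9]+);.*$", r"\1", value)

print(second_by_occurrence, second_by_group)
```

The occurrence-based form generalizes to any position by changing the index, while the capture-group form is tied to the second field.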

How to compare Unicode characters in SQL server?

Hi, I am trying to find all rows in my database (SQL Server) that contain the character é in their text by executing the following queries.
SELECT COUNT(*) FROM t_question WHERE patindex(N'%[\xE9]%',question) > 0;
SELECT COUNT(*) FROM t_question WHERE patindex(N'%[\u00E9]%',question) > 0;
But I found two problems: (a) the two queries return different numbers of rows, and (b) they return rows that do not contain the specified character.
Is the way I am constructing the regular expression and comparing the Unicode character correct?
EDIT:
The question column is stored using datatype nvarchar.
The following query gives the correct result though.
SELECT COUNT(*) FROM t_question WHERE question LIKE N'%é%';
Why not use SELECT COUNT(*) FROM t_question WHERE question LIKE N'%é%'?
NB: LIKE and PATINDEX do not accept regular expressions.
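To illustrate that point: because the pattern is not a regular expression, an escape such as \xE9 is not interpreted, and the brackets simply hold a set of literal characters. The Python sketch below emulates that reading of '%[\xE9]%'. The function name matches_literal_set is hypothetical, and the case-insensitive comparison is an assumption standing in for a case-insensitive collation.

```python
def matches_literal_set(text, set_body, case_insensitive=True):
    # Treat every character between the brackets as a literal set member.
    chars = set(set_body)
    if case_insensitive:
        # Assumption: emulate a case-insensitive collation.
        chars = {c.lower() for c in chars}
        text = text.lower()
    return any(c in chars for c in text)

set_body = r"\xE9"  # four literal characters: backslash, x, E, 9

for sample in ["Elephant", "axis", "99.9", "résumé"]:
    print(sample, matches_literal_set(sample, set_body))
```

Under this reading, a string that actually contains é is not matched unless it also happens to contain one of the four literal characters.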
In the SQL Server pattern syntax [\xE9] means match any single character within the specified set. i.e. match \, x, E or 9. So any of the following strings would match that pattern.
"Elephant"
"axis"
"99.9"