Raw query:
select firstfield, secondfield, phone_number, thirdfield
from table
having CONCAT(firstfield, ' ', secondfield, ' ', thirdfield, ' ', fourthfield) regexp 'value'
and CONCAT(firstfield, ' ', secondfield, ' ', thirdfield, ' ', fourthfield) regexp 'value2'
and CONCAT(firstfield, ' ', secondfield, ' ', thirdfield, ' ', fourthfield) regexp 'value3'
and CONCAT(firstfield, ' ', secondfield, ' ', thirdfield, ' ', fourthfield) regexp 'value4'
QueryBuilder:
$qb->select(
'firstfield',
'secondfield',
'thirdfield',
'fourthfield',
)->from(Table, 'u');
$queryHaving = "CONCAT(firstfield, ' ', secondfield, ' ', thirdfield, ' ', fourthfield) regexp 'value'";
$qb->andHaving($queryHaving);
$queryHaving = "CONCAT(firstfield, ' ', secondfield, ' ', thirdfield, ' ', fourthfield) regexp 'value2'";
$qb->andHaving($queryHaving);
Problem:
How can I use CONCAT together with REGEXP here without it being treated as a function? I tried the literal() function, but it throws an error saying the value cannot be assigned into.
The query seems to work for me in MySQL with either of these two forms:
select *
from test
having concat(field1, field2) regexp '^[FB].*' and
concat(field1, field2) regexp 'o$';
select *
from test
where concat(field1, field2) regexp '^[FB].*' and
concat(field1, field2) regexp 'o$';
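To try the WHERE form without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. SQLite defines no regexp() function by default, so one is registered from Python; the || operator replaces CONCAT(), and the rows are made-up sample data.

```python
import re
import sqlite3

def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X), so the pattern comes first
    return value is not None and re.search(pattern, value) is not None

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE test (field1 TEXT, field2 TEXT)")
# Hypothetical sample rows
conn.executemany("INSERT INTO test VALUES (?, ?)",
                 [("Fo", "to"), ("Ba", "zo"), ("Zo", "ro"), ("Fo", "ob")])

# Both REGEXP conditions must hold, just like the chained AND in the MySQL query
rows = conn.execute(
    "SELECT field1 || field2 FROM test "
    "WHERE (field1 || field2) REGEXP '^[FB].*' "
    "AND (field1 || field2) REGEXP 'o$'"
).fetchall()
print(sorted(r[0] for r in rows))  # ['Bazo', 'Foto']
```

Python's re module may differ from MySQL's REGEXP in details such as case sensitivity, so treat this only as a way to check the AND logic.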
I suspect the problem could be CHAR columns.
For example, a CHAR(5) column would hold FOO<space><space> where a VARCHAR(5) column would hold FOO. Concatenating two such columns would then give something like FOO<space><space>BAR<space><space>, and the regex would fail.
However, on SQLFiddle this doesn't seem to be the case: no padding spaces are added. (MySQL strips trailing spaces from CHAR values on retrieval unless the PAD_CHAR_TO_FULL_LENGTH SQL mode is enabled.)
Anyway, it may be worth checking in your app: are you using CHAR or VARCHAR columns? Could you try wrapping the columns in TRIM(), like this:
select *,concat(trim(field1), trim(field2))
from test
having concat(trim(field1), trim(field2)) regexp '^[FB].*' and
concat(trim(field1), trim(field2)) regexp 'o$';
select *,concat(trim(field1), trim(field2))
from test
where concat(trim(field1), trim(field2)) regexp '^[FB].*' and
concat(trim(field1), trim(field2)) regexp 'o$';
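To see why padding would break the 'o$' check and why trimming fixes it, here is a rough Python simulation. The CHAR(5) padding is imitated with str.ljust, the values are hypothetical, and re.search stands in for REGEXP.

```python
import re

def matches_all(text, patterns):
    # Stand-in for chained "... REGEXP p1 AND ... REGEXP p2" conditions
    return all(re.search(p, text) for p in patterns)

patterns = [r"^[FB].*", r"o$"]

# Hypothetical CHAR(5) values, padded to full width with trailing spaces
field1 = "Fo".ljust(5)
field2 = "to".ljust(5)

padded = field1 + field2                          # 'Fo   to   '
trimmed = field1.strip(" ") + field2.strip(" ")   # 'Foto', like CONCAT(TRIM(...), TRIM(...))

print(matches_all(padded, patterns))   # False: 'o$' fails because the string ends in spaces
print(matches_all(trimmed, patterns))  # True
```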
This is the list I am getting:
['', '', ' NRGD\n ', '\n MicroSectors U.S. Big Oil Index -3X Inverse Leveraged ETN\n ', ' $102.24\n ', ' 5012.00%\n \n2070.00', '\n ']
I want to "clean it up" and return:
['NRGD', 'MicroSectors U.S. Big Oil Index -3X Inverse Leveraged ETN', '$102.24', '5012.00%', '2070.00']
Basically, I want to remove every item that is only spaces or \n, and for the items with actual text, strip the spaces and \n so that only the text remains.
We can use a list comprehension here:
inp = ['', '', ' NRGD\n ', '\n MicroSectors U.S. Big Oil Index -3X Inverse Leveraged ETN\n ', ' $102.24\n ', ' 5012.00%\n \n2070.00', '\n ']
output = [x.strip() for x in inp if x.strip()]
print(output)
This prints:
['NRGD', 'MicroSectors U.S. Big Oil Index -3X Inverse Leveraged ETN',
'$102.24', '5012.00%\n \n2070.00']
The comprehension keeps every list element that is not an empty string after stripping leading and trailing whitespace, and stores the stripped version. Note that strip() only touches the ends of a string, so the newlines inside ' 5012.00%\n \n2070.00' remain.
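If the embedded newlines separate distinct values (an assumption, but it matches the desired output in the question), a minimal variant splits each element on line breaks before stripping:

```python
inp = ['', '', ' NRGD\n ', '\n MicroSectors U.S. Big Oil Index -3X Inverse Leveraged ETN\n ',
       ' $102.24\n ', ' 5012.00%\n \n2070.00', '\n ']

# Split every element on line breaks, then keep only the non-blank pieces, stripped
output = [part.strip() for x in inp for part in x.splitlines() if part.strip()]
print(output)
# ['NRGD', 'MicroSectors U.S. Big Oil Index -3X Inverse Leveraged ETN', '$102.24', '5012.00%', '2070.00']
```

Spaces inside a value (as in the ETN name) are kept, because the split happens only on line breaks.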
I'm not sure what's wrong with my regular expression or why it's chopping off the first character. The regex correctly identifies what I want to split on, but why is the first character missing from each element of the array?
>>> f = "value: http://ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com:user-services-http/ssoeproxy/logout value: http://ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com:user-services-http-two/ssoeproxy/logout value: user-services-http #458930 value: user-services-http-two #458930"
>>> re.split(r'[a-z0-9]([-a-z0-9]*[a-z0-9])?', f)
['', 'alue', ': ', 'ttp', '://', 'c2-xxx-xxx-xxx-xxx', '.', 'ompute-1', '.', 'mazonaws', '.', 'om', ':', 'ser-services-http', '/', 'soeproxy', '/', 'ogout', ' ', 'alue', ': ', 'ttp', '://', 'c2-xxx-xxx-xxx-xxx', '.', 'ompute-1', '.', 'mazonaws', '.', 'om', ':', 'ser-services-http-two', '/', 'soeproxy', '/', 'ogout', ' ', 'alue', ': ', 'ser-services-http', ' #', '58930', ' ', 'alue', ': ', 'ser-services-http-two', ' #', '58930', '']
The explanation is that when the pattern contains capturing groups, re.split() still splits on the whole match, but it also inserts the text captured by each group into the result list. Your only group captures everything except the first character, because [a-z0-9] sits outside the parentheses. Move the parentheses so they enclose the whole pattern and you're good to go.
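A small sketch on a shortened, made-up input shows the difference. In the fixed pattern the inner group is made non-capturing with (?:...), otherwise re.split() would return both groups for every match.

```python
import re

s = "value: user-services-http #458930"

# Original pattern: the group captures everything except the first character
before = re.split(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?", s)

# Parentheses moved to wrap the whole token; the inner group is non-capturing
after = re.split(r"([a-z0-9](?:[-a-z0-9]*[a-z0-9])?)", s)

print(before)  # ['', 'alue', ': ', 'ser-services-http', ' #', '58930', '']
print(after)   # ['', 'value', ': ', 'user-services-http', ' #', '458930', '']
```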
I have a column in an Oracle table whose values have leading and trailing spaces. I would like to replace the leading spaces with 'P' and the trailing spaces with 'T', using only an inline query.
If you want to replace each leading/trailing space with an equal number of Ps/Ts, you can use:
SELECT REPLACE( REGEXP_SUBSTR( your_column, '^ +' ), ' ', 'P' )
|| TRIM( BOTH FROM your_column )
|| REPLACE( REGEXP_SUBSTR( your_column, ' +$' ), ' ', 'T' )
FROM your_table
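The same per-space logic can be sketched in Python for quick experimentation; re.match and re.search stand in for REGEXP_SUBSTR, str.strip(" ") for TRIM(BOTH FROM ...), and mark_each_space is a hypothetical helper name.

```python
import re

def mark_each_space(value: str) -> str:
    """Replace each leading space with P and each trailing space with T (hypothetical helper)."""
    lead = re.match(r"^ +", value)     # like REGEXP_SUBSTR(your_column, '^ +')
    trail = re.search(r" +$", value)   # like REGEXP_SUBSTR(your_column, ' +$')
    prefix = lead.group().replace(" ", "P") if lead else ""
    suffix = trail.group().replace(" ", "T") if trail else ""
    # strip(" ") removes only spaces, like TRIM(BOTH FROM your_column)
    return prefix + value.strip(" ") + suffix

print(mark_each_space("  hello world  "))  # PPhello worldTT
```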
If you want to replace each run of leading/trailing spaces with a single P/T, then:
SELECT REGEXP_REPLACE(
REGEXP_REPLACE( your_column, '(.*?) +$', '\1T' ),
'^ +(.*)',
'P\1'
)
FROM your_table
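For comparison, the nested REGEXP_REPLACE calls translate almost directly to Python's re.sub, which accepts the same patterns and \1 back-references here. This is a sketch with a hypothetical helper name, not Oracle itself:

```python
import re

def mark_space_runs(value: str) -> str:
    # Inner call: a trailing run of spaces becomes a single T ('(.*?) +$' -> '\1T')
    inner = re.sub(r"(.*?) +$", r"\1T", value)
    # Outer call: a leading run of spaces becomes a single P ('^ +(.*)' -> 'P\1')
    return re.sub(r"^ +(.*)", r"P\1", inner)

print(mark_space_runs("  hello world  "))  # Phello worldT
```

The lazy (.*?) matters: a greedy (.*) would swallow all but the last trailing space into the group.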
You didn't specify whether the number of leading and trailing spaces is constant; if it is, something like this could be used:
select replace(substr('  hello world  ',1, instr('  hello world  ',' ',1,2) ),' ','P')||
trim('  hello world  ')||
replace(substr('  hello world  ', instr('  hello world  ',' ',-1,2), length('  hello world  ') ),' ','T')
from dual;
Note that the number 2 in each of the INSTR calls is the constant number of leading/trailing spaces, so change it to suit your needs.
I use the following regex to split sentences into words:
"('?\w[\w']*(?:-\w+)*'?)"
For example:
import re
re.split(r"('?\w[\w']*(?:-\w+)*'?)", "'cos I like ice-cream!")
gives:
['', "'cos", ' ', 'I', ' ', 'like', ' ', 'ice-cream', '!']
However, formatting tags sometimes appear in my text and my regex obviously can't process them as I would like:
re.split(r"('?\w[\w']*(?:-\w+)*'?)", "'cos I <i>like</i> ice-cream!")
gives:
['', "'cos", ' ', 'I', ' <', 'i', '>', 'like', '</', 'i', '> ', 'ice-cream', '!']
while I would like:
['', "'cos", ' ', 'I', ' <i>', 'like', '</i> ', 'ice-cream', '!']
How would you go about solving this?
You could use a word boundary regex, specifying exclusions of matches using negative lookbehind and lookahead assertions:
^|(?<!['<\/-])\b(?![>-])
Unfortunately, before Python 3.7 re.split() could not split on zero-width matches, so on those versions you have to use a workaround:
import re
a = re.sub(r"^|(?<!['<\/-])\b(?![>-])", "|", "'cos I <i>like</i> ice-cream!").split('|')
print(a)
# ['', "'cos", ' ', 'I', ' <i>', 'like', '</i> ', 'ice-cream', '!']
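On Python 3.7 and later, re.split() accepts patterns that match the empty string, so the "|" substitution (which would also break on text that already contains "|") should not be needed. A minimal sketch:

```python
import re

# Split directly on the zero-width positions (requires Python 3.7+)
pattern = r"^|(?<!['<\/-])\b(?![>-])"
parts = re.split(pattern, "'cos I <i>like</i> ice-cream!")
print(parts)
# ['', "'cos", ' ', 'I', ' <i>', 'like', '</i> ', 'ice-cream', '!']
```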
# I added a negative lookahead (?!>) to your pattern so that a match cannot end right before a closing >
import re
print(re.split(r"('?\w[\w']*(?:-\w+)*'?(?!>))", "'cos I <i>like</i> ice-cream!"))
[Output]
['', "'cos", ' ', 'I', ' <i>', 'like', '</i> ', 'ice-cream', '!']
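One caveat worth checking: the lookahead only rejects a match that ends right before >, so with a multi-letter tag name the engine backtracks and matches just the first letter of the tag. A small Python sketch comparing both answers on a hypothetical <em> input (the word-boundary version needs Python 3.7+ to split on zero-width matches):

```python
import re

text = "'cos I <em>like</em> ice-cream!"

# Lookahead answer: "em" is split because backtracking leaves "e" not followed by ">"
lookahead_parts = re.split(r"('?\w[\w']*(?:-\w+)*'?(?!>))", text)
# Word-boundary answer: the lookbehind/lookahead exclusions keep the tags intact
boundary_parts = re.split(r"^|(?<!['<\/-])\b(?![>-])", text)

print(lookahead_parts)
# ['', "'cos", ' ', 'I', ' <', 'e', 'm>', 'like', '</', 'e', 'm> ', 'ice-cream', '!']
print(boundary_parts)
# ['', "'cos", ' ', 'I', ' <em>', 'like', '</em> ', 'ice-cream', '!']
```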
I need to clean up a string column that contains both spaces and tabs, inside the strings as well as at their beginning and end (it's a mess!). I want to keep just one space between words. Say we have the following string, which includes every possible situation:
mystring = ' one two three four '
2 whitespaces before 'one'
1 whitespace between 'one' and 'two'
4 whitespaces between 'two' and 'three'
2 tabs after 'three'
1 tab after 'four'
Here is how I do it:
I delete leading and trailing spaces
I delete leading and trailing tabs
I replace runs of two or more spaces, and runs of tabs, with a single space
WITH
t1 AS (SELECT' one two three four '::TEXT AS mystring),
t2 AS (SELECT TRIM(both ' ' from mystring) AS mystring FROM t1),
t3 AS (SELECT TRIM(both '\t' from mystring) AS mystring FROM t2)
SELECT regexp_replace(mystring, '(( ){2,}|\t+)', ' ', 'g') FROM t3 ;
I eventually get the following string, which looks nice, but it still has a trailing space:
'one two three four '
Any idea how to do this more simply and fix this last issue?
Many thanks!
SELECT trim(regexp_replace(col_name, '\s+', ' ', 'g')) as col_name FROM table_name;
Or, in the case of an UPDATE:
UPDATE table_name SET col_name = trim(regexp_replace(col_name, '\s+', ' ', 'g'));
The regexp_replace flags (such as 'g', which replaces every match instead of only the first) are described in the pattern-matching section of the PostgreSQL documentation.
SELECT trim(regexp_replace(mystring, '\s+', ' ', 'g')) as mystring FROM t1;
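The collapse-then-trim idea is easy to check outside the database. This Python sketch mirrors trim(regexp_replace(mystring, '\s+', ' ', 'g')); the input string is a reconstruction of the question's example (two leading spaces, four spaces between "two" and "three", two tabs after "three", one tab after "four"):

```python
import re

mystring = "  one two    three\t\tfour\t"

# Collapse every run of whitespace (spaces and tabs) to one space, then trim the ends
cleaned = re.sub(r"\s+", " ", mystring).strip(" ")
print(repr(cleaned))  # 'one two three four'
```

In plain Python, " ".join(mystring.split()) gives the same result without a regex.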
Posting an answer in case folks don't look at comments.
Use '\s+'
Not '\\s+'
Worked for me.
It didn't work for me with trim and regexp_replace, so I came up with another solution:
SELECT trim(
array_to_string(
regexp_split_to_array(' test with many spaces for this test ', E'\\s+')
, ' ')
) as mystring;
First, regexp_split_to_array splits the string on every run of whitespace, leaving empty strings ("blanks") at the beginning and the end:
-- regexp_split_to_array output:
-- {"",test,with,many,spaces,for,this,test,""}
Then array_to_string joins the elements with single spaces (the commas in the array display become spaces):
-- array_to_string output ('_' instead of spaces for viewing):
-- _test_with_many_spaces_for_this_test_
Finally, trim removes the leading and trailing space:
-- trim output ( '_' instead of spaces for viewing ):
-- test_with_many_spaces_for_this_test
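The same split, join, and trim pipeline can be sketched in Python as a rough analogue: re.split plays the role of regexp_split_to_array and str.join that of array_to_string. The input string is a made-up example:

```python
import re

s = "  test with   many spaces for this test  "

parts = re.split(r"\s+", s)    # empty strings at both ends, like the array's "" elements
joined = " ".join(parts)       # single spaces everywhere, plus one at each end
result = joined.strip(" ")     # drop the head and tail space, like trim
print(parts)   # ['', 'test', 'with', 'many', 'spaces', 'for', 'this', 'test', '']
print(repr(result))  # 'test with many spaces for this test'
```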