get integers from string - regex

I have data in the database like this:
61/10#61/12,0/12,10/16,0/21,0/12#61/33,0/28#0/34,0/23#0/28
Parts like 10/16 (without a #) are invalid and should not be used in the calculation.
All other parts have the format min_hr + "/" + min_hrv + "#" + max_hr + "/" + max_hrv.
The task is to get the average value using this pseudo-formula: [ sum(all(min_hrv)) + sum(all(max_hrv)) ] / [ count(all(min_hrv)) + count(all(max_hrv)) ]. For the example string the result is ((10 + 12 + 28 + 23) + (12 + 33 + 34 + 28)) / 8 = 180 / 8 = 22.5 (22 with integer division).
What I tried:
SELECT regexp_replace(
'61/10#61/12,0/12,10/16,0/21,0/12#61/33,0/28#0/34,0/23#0/28',
',\d+/\d+,', ',',
'g'
);
This is meant to remove the invalid data, but 10/16 is still in the string. The result is:
regexp_replace
--------------------------------------------------
61/10#61/12,10/16,0/12#61/33,0/28#0/34,0/23#0/28
If the string were cleaned properly, my plan is to split it into an array, something like this for max (not a full solution, since it leaves an empty string; I have no solution for min):
SELECT
regexp_split_to_array(
regexp_replace(
'61/10#61/12,0/12,0/12#61/33,0/28#0/34,0/23#0/28',
',\d+/\d+,', ',',
'g'
)
,',?\d+/\d+#\d+/'
);
result is:
regexp_split_to_array
-----------------------
{"",12,33,34,28}
and then calculate the data, something like this:
SELECT ((
SELECT sum(tmin.unnest)
FROM
(SELECT unnest('{10,12,28,23}'::int[])) as tmin
)
+
(
SELECT sum(tmax.unnest)
FROM
(SELECT unnest('{12,33,34,28}'::int[])) as tmax
))
/
(SELECT array_length('{12,33,34,28}'::int[], 1) * 2)
Maybe someone knows a simpler and more correct way to solve this?

Use regexp_matches():
select (regexp_matches(
'61/10#61/12,0/12,0/12#61/33,0/28#0/34,0/23#0/28',
'\d+#\d+/(\d+)',
'g'))[1]
regexp_matches
----------------
12
33
34
28
(4 rows)
The whole calculation may look like this:
with my_data(str) as (
values
('61/10#61/12,0/12,10/16,0/21,0/12#61/33,0/28#0/34,0/23#0/28')
),
min_max as (
select
(regexp_matches(str, '(\d+)#\d+', 'g'))[1] as min_hrv,
(regexp_matches(str, '\d+#\d+/(\d+)', 'g'))[1] as max_hrv
from my_data
)
select avg(min_hrv::int+ max_hrv::int) / 2 as result
from min_max;
result
---------------------
22.5000000000000000
(1 row)
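As a cross-check outside the database, here is a minimal Python sketch of the same extraction. It uses Python's re module rather than PostgreSQL, so it illustrates the idea and is not the answer's code; the helper name hrv_average is made up for the example. It also shows why the cleanup attempt in the question left 10/16 behind: matches cannot overlap, and the comma that ends ,0/12, is consumed by the first match, so ,10/16, no longer has a leading comma to match.

```python
import re

DATA = "61/10#61/12,0/12,10/16,0/21,0/12#61/33,0/28#0/34,0/23#0/28"

# The question's cleanup attempt: each match consumes its trailing comma,
# so the token right after a removed one can never match.
cleaned = re.sub(r",\d+/\d+,", ",", DATA)


def hrv_average(text: str) -> float:
    """Average all min_hrv and max_hrv values of the valid (with '#') parts."""
    # Invalid parts contain no '#', so they simply never match.
    pairs = re.findall(r"\d+/(\d+)#\d+/(\d+)", text)
    values = [int(v) for pair in pairs for v in pair]
    return sum(values) / len(values)


print(cleaned)            # 10/16 is still present
print(hrv_average(DATA))  # average over the eight valid values
```

Extracting only the parts that contain a # makes the cleanup step unnecessary, which is the same design choice the query above makes.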

The pattern should match a #, then a run of digits and a / char, and capture the digits that follow. With regexp_matches, you extract only part of the match by wrapping that part in a pair of parentheses (a capturing group).
The solution is
regexp_matches(your_col, '#\d+/(\d+)', 'g')
Note that g stands for global, meaning that all occurrences found in the string will be returned.
Pattern details
# - a # char
\d+ - 1 or more (+) digits
/ - a / char
(\d+) - Capturing group 1: 1 or more digits
You may extract specific bits from your data if you use a single pair of parentheses in different parts of the '(\d+)/(\d+)#(\d+)/(\d+)' regex. To extract min_hr, you'd use '(\d+)/\d+#\d+/\d+'.
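To illustrate moving the single pair of parentheses around, the following Python sketch builds the four patterns and applies each one with re.findall. It uses Python rather than PostgreSQL, and the names FIELDS and extract are invented for the example.

```python
import re

DATA = "61/10#61/12,0/12,10/16,0/21,0/12#61/33,0/28#0/34,0/23#0/28"
FIELDS = ("min_hr", "min_hrv", "max_hr", "max_hrv")


def extract(field: str, text: str) -> list:
    """Return every value of one field by capturing only that position."""
    parts = [r"\d+", r"\d+", r"\d+", r"\d+"]
    parts[FIELDS.index(field)] = r"(\d+)"  # the single capturing group
    pattern = "{}/{}#{}/{}".format(*parts)
    # With exactly one group, findall returns that group's text per match.
    return [int(m) for m in re.findall(pattern, text)]


for name in FIELDS:
    print(name, extract(name, DATA))
```

Because every pattern still requires the full a/b#c/d shape, parts without a # are skipped no matter which field is captured.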


Regular expression for equations, variable number of inside parenthesis

I'm trying to write Regex for the case where I have series of equations, for example:
a = 2 / (1 + exp(-2*n)) - 1
a = 2 / (1 + e) - 1
a = 2 / (3*(1 + exp(-2*n))) - 1
In any case I need to capture content of the outer parenthesis, so 1 + exp(-2*n), 1+e and 3*(1 + exp(-2*n)) respectively.
I can write expression that will catch one of them, like:
\(([\w\W]*?\))\) will perfectly catch 1 + exp(-2*n)
\(([\w\W]*?)\) will catch 1+e
\(([\w\W]*?\))\)\) will catch 3*(1 + exp(-2*n))
But it seems silly to use three lines of code for something so simple. How can I combine them? Please note that I will be processing the text line by line (in a loop) anyway, so you don't need to worry about a greedy operator running on into the next line.
Edit:
Un-nested brackets are also allowed: a = 2 / (1 + exp(-2*n)) - (2-5)
The commented code below does not use regular expressions, but does parse char arrays in MATLAB and output the terms which contain top-level brackets.
So in your 3 question examples with a single set of nested brackets, it returns the outermost bracketed term.
In the example from your comment where there are two or more (possibly nested) terms within brackets at the "top level", it returns both terms.
The logic is as follows (see the code comments for more details):
1. Find the left (opening) and right (closing) brackets.
2. Generate the "nest level" according to how many unclosed brackets there are at each point in the equation char array.
3. Find the indices where the nesting level changes. We're interested in opening brackets where the nest level increases to 1 and closing brackets where it decreases from 1.
4. Extract the terms from these indices.
e = { 'a = 2 / (1 + exp(-2*n)) - 1'
'a = 2 / (1 + e) - 1'
'a = 2 / (3*(1 + exp(-2*n))) - 1'
'a = 2 / (1 + exp(-2*n)) - (2-5)' };
str = cell(size(e)); % preallocate output
for ii = 1:numel(e)
str{ii} = parseBrackets_(e{ii});
end
function str = parseBrackets_( equation )
bracketL = ( equation == '(' ); % logical mask of opening brackets
bracketR = ( equation == ')' ); % logical mask of closing brackets
str = {}; % initialise empty output
if nnz(bracketL) ~= nnz(bracketR)
% Validate the input: both masks always have the same length, so compare counts
warning( 'Could not match bracket pairs, count mismatch!' )
return
end
nL = cumsum( bracketL ); % cumulative open bracket count
nR = cumsum( bracketR ); % cumulative close bracket count
nestLevel = nL - nR; % nest level is number of open brackets not closed
nestLevelChanged = diff(nestLevel); % Get the change points in nest level
% get the points where the nest level changed to/from 1
level1L = find( nestLevel == 1 & [true,nestLevelChanged==1] ) + 1;
level1R = find( nestLevel == 1 & [nestLevelChanged==-1,true] );
% Compile cell array of terms within nest level 1 brackets
str = arrayfun( @(x) equation(level1L(x):level1R(x)), 1:numel(level1L), 'uni', 0 );
end
Outputs:
str =
{'1 + exp(-2*n)'}
{'1 + e'}
{'3*(1 + exp(-2*n))'}
{'1 + exp(-2*n)'} {'2-5'}
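For comparison, the same nest-level idea can be sketched in Python as a single pass with a depth counter instead of cumulative sums. This is an illustration, not a port of the MATLAB function: the name top_level_terms is hypothetical, and raising an error on unbalanced input (rather than warning and returning an empty result) is an assumption of this sketch.

```python
def top_level_terms(equation: str) -> list:
    """Return the contents of each outermost pair of brackets."""
    terms = []
    depth = 0  # number of brackets currently open
    start = 0  # index just after the bracket that opened level 1
    for i, ch in enumerate(equation):
        if ch == "(":
            depth += 1
            if depth == 1:  # nest level increased to 1
                start = i + 1
        elif ch == ")":
            if depth == 0:
                raise ValueError("unmatched closing bracket")
            depth -= 1
            if depth == 0:  # nest level decreased from 1
                terms.append(equation[start:i])
    if depth != 0:
        raise ValueError("unmatched opening bracket")
    return terms


for eq in ("a = 2 / (1 + exp(-2*n)) - 1",
           "a = 2 / (3*(1 + exp(-2*n))) - 1",
           "a = 2 / (1 + exp(-2*n)) - (2-5)"):
    print(top_level_terms(eq))
```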

Regular expression - Remove special characters except single white space

From Stack Overflow, I got a standard regular expression
to eliminate -
a) special characters
b) digits
c) runs of 2 or more spaces (replaced by a single space)
to include -
d) - (hyphen)
e) ' (single quote)
SELECT ID, REGEXP_REPLACE(REGEXP_REPLACE(forenames, '[^A-Za-z-]', ' '),'\s{2,}',' ') , REGEXP_REPLACE(REGEXP_REPLACE(surname, '[^A-Za-z-]', ' '),'\s{2,}',' ') , forenames, surname from table1;
How can I get the same result with a single function call instead of two?
Also, including ' (single quote) with \' does not work in regexp_replace.
Thanks.
Oracle Setup:
CREATE TABLE test_data ( id, value ) AS
SELECT 1, '123a45b£$- ''c45d#{e''' FROM DUAL
Query:
SELECT ID,
REGEXP_REPLACE(
value,
'[^a-zA-Z'' -]| +( )',
'\1'
)
FROM test_data
Output:
ID | REGEXP_REPLACE(VALUE,'[^A-ZA-Z''-]|+()','\1')
-: | :--------------------------------------------
1 | ab- 'cde'
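The single-pass pattern can be tried outside Oracle with a small Python sketch. It assumes Python's re module treats this pattern the same way for these inputs, and the name clean_name is invented. The first alternative deletes any character that is not a letter, quote, space or hyphen; the second matches a run of spaces and puts back only the last one through the \1 back reference.

```python
import re

# Same pattern as the Oracle query above.
PATTERN = re.compile(r"[^a-zA-Z' -]| +( )")


def clean_name(value: str) -> str:
    # When the first alternative matches, group 1 is unset and is
    # substituted as an empty string (Python 3.5+), deleting the character.
    return PATTERN.sub(r"\1", value)


print(clean_name("123a45b£$- 'c45d#{e'"))
print(repr(clean_name("a   b")))  # run of spaces collapsed
print(repr(clean_name("a 1 b")))  # spaces around a removed digit
```

One thing worth testing against real data: in this Python version, two spaces that become neighbours only because the character between them was removed (as in 'a 1 b') are not collapsed in the same pass.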

PL SQL regular expression substring

I have a long string.
message := 'I loooove my pet animal';
This string is 23 chars long. If message is longer than 15 chars, I need to find the position where I can break it into 2 strings. For example, in this case,
message1 := 'I loooove my'
message2 := 'pet animal'
Essentially it should find the end of the last whole word that fits within the first 15 chars and break the original string in two at that point.
Please give me ideas on how I can do this.
Thank you.
Here is a general solution - with possibly more than one input string, and with inputs of any length. The only assumption is that no single word may be more than 15 characters, and that everything between two spaces is considered a word. If a "word" can be more than 15 characters, the solution can be adapted, but the requirement itself would need to state what the desired result is in such a case.
I make up two input strings in a CTE (at the top) - that is not part of the solution, it is just for testing and illustration. I also wrote this in plain SQL - there is no need for PL/SQL code for this type of problem. Set processing (instead of one row at a time) should result in much better execution.
The approach is to identify the location of all spaces (I also append and prepend a space to each string, so I don't have to handle the first and last substring as special cases); then a recursive subquery decides where each "maximal" substring should begin and end; after that, outputting the substrings is trivial. I used a recursive query, which should work in Oracle 11.1 (or 11.2 with the syntax I used, with column names in CTE declarations - it can be changed easily to work in 11.1). In Oracle 12, it would be easier to rewrite the same idea using MATCH_RECOGNIZE.
with
inputs ( id, str ) as (
select 101, 'I loooove my pet animal' from dual union all
select 102, '1992 was a great year for - make something up here as needed' from dual
),
positions ( id, pos ) as (
select id, instr(' ' || str || ' ', ' ', 1, level)
from inputs
connect by level <= length(str) - length(replace(str, ' ')) + 2
and prior id = id
and prior sys_guid() is not null
),
r ( id, str, line_number, pos_from, pos_to ) as (
select id, ' ' || str || ' ', 0, null, 1
from inputs
union all
select r.id, r.str, r.line_number + 1, r.pos_to,
( select max(pos)
from positions
where id = r.id and pos - r.pos_to between 1 and 16
)
from r
where pos_to is not null
)
select id, line_number, substr(str, pos_from + 1, pos_to - pos_from - 1) as line_text
from r
where line_number > 0 and pos_to is not null
order by id, line_number
;
Output:
ID LINE_NUMBER LINE_TEXT
---- ----------- ---------------
101 1 I loooove my
101 2 pet animal
102 1 1992 was a
102 2 great year for
102 3 - make
102 4 something up
102 5 here as needed
7 rows selected.
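The greedy rule the query implements (keep adding whole words while the line stays within 15 characters) can be sketched procedurally in Python. The function name wrap_words is hypothetical. Unlike the SQL, this sketch splits on any whitespace and collapses repeated spaces, and it raises an error for a word longer than the limit, a case the answer leaves undefined.

```python
def wrap_words(text: str, width: int = 15) -> list:
    """Greedily pack whole words into lines of at most `width` characters."""
    lines = []
    current = ""
    for word in text.split():
        if len(word) > width:
            raise ValueError(f"word longer than {width} characters: {word!r}")
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= width:
            current = candidate    # the word still fits on this line
        else:
            lines.append(current)  # close the line, start a new one
            current = word
    if current:
        lines.append(current)
    return lines


print(wrap_words("I loooove my pet animal"))
print(wrap_words("1992 was a great year for - make something up here as needed"))
```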
An alternative: first reverse the string.
SELECT REVERSE(strField) FROM DUAL;
Then calculate the length, i = length(strField).
Then find the first space at or after the middle of the reversed string:
j := INSTR(REVERSE(strField), ' ', i / 2);
Finally split at position i - j (it may need +/- 1, so test it).
DEMO
WITH parameter (id, strField) as (
select 101, 'I loooove my pet animal' from dual union all
select 102, '1992 was a great year for - make something up here as needed' from dual union all
select 103, 'You are Supercalifragilisticexpialidocious' from dual
), prepare (id, rev, len, middle) as (
SELECT id, reverse(strField), length(strField), length(strField) / 2
FROM parameter
)
SELECT p.*, l.*,
SUBSTR(strField, 1, len - INSTR(rev, ' ', middle)) as first,
SUBSTR(strField, len - INSTR(rev, ' ', middle) + 2, len) as second
FROM parameter p
JOIN prepare l
ON p.id = l.id

how to get out string oracle regex

I have the following string and am trying to extract 1111111 and 33333333333 without the | character:
SELECT regexp_substr('7|1111111|2222222|33333333333|0||20140515|||false|0|0|0|0|0|','*[|]*[|][0-9]*')FROM dual
Using REGEXP_REPLACE may be a bit simpler;
SELECT REGEXP_REPLACE('7|1111111|2222222|33333333333|0||20140515|||false|0|0|0|0|0|',
'^([^|]*[|]){1}([^|]*).*$', '\2') FROM dual;
> 1111111
SELECT REGEXP_REPLACE('7|1111111|2222222|33333333333|0||20140515|||false|0|0|0|0|0|',
'^([^|]*[|]){3}([^|]*).*$', '\2') FROM dual;
> 33333333333
You can choose column by choosing how many pipes to skip in the {1} part.
A short explanation of the regexp:
([^|]*[|]){3} -- Matches 3 groups of {optional non-pipe characters}{pipe}
([^|]*) -- Matches the next field (the one we want)
.* -- Matches the rest of the string
What we want is the second parenthesized group, that is, we replace the whole string by the back reference \2.
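The "skip n pipes, then capture" idea can be sketched in Python as follows. The function name field_after is made up, and the sketch uses a non-capturing group for the skipped fields, so the wanted field is group 1 rather than \2.

```python
import re

ROW = "7|1111111|2222222|33333333333|0||20140515|||false|0|0|0|0|0|"


def field_after(skip: int, text: str) -> str:
    """Return the field that follows `skip` pipe-terminated fields."""
    # Same shape as the Oracle pattern; `skip` fills in the {n} quantifier.
    match = re.match(r"^(?:[^|]*[|]){%d}([^|]*)" % skip, text)
    if match is None:
        raise ValueError(f"fewer than {skip} pipes in input")
    return match.group(1)


print(field_after(1, ROW))
print(field_after(3, ROW))
```

Because the capture is [^|]* rather than digits only, an empty or non-numeric field is returned as it is.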
Because the "|" separators are always present, it is simpler to extract fields with plain substring functions than with regular expressions.
Just find the positions of the corresponding separators in the source string and extract the content between them:
with test_data as (
select
'7|1111111|2222222|33333333333|0||20140515|||false|0|0|0|0|0|ABC' as s,
8 as field_number -- test 1, 3, 8, 10 and 16
from dual
)
select
field_number,
substr(
s,
decode( field_number,
1,1,
instr(s,'|',1,field_number - 1) + 1
),
(
decode( instr(s,'|',1,field_number),
0, length(s)+ 1,
instr(s,'|',1,field_number)
)
-
decode( field_number,
1, 1,
instr(s,'|',1,field_number - 1) + 1
)
)
) as field_value
from
test_data
This variant works with empty fields, non-numeric fields and so on.
A possible simplification is to append additional separators to the start and the end of the string:
with test_data as (
select
(
'|' ||
'7|1111111|2222222|33333333333|0||20140515|||false|0|0|0|0|0|ABC' ||
'|'
) as s, -- additional separators appended before and after original string
10 as field_number -- test 1, 3, 8, 10 and 16
from dual
)
select
field_number,
substr(
s,
instr(s, '|', 1, field_number) + 1,
(
instr(s, '|', 1, field_number + 1)
-
(instr(s, '|', 1, field_number) + 1)
)
) as field_value
from
test_data
;
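The padded-separator variant translates almost directly into Python, with str.find playing the role of instr. This is a sketch under that assumption; nth_field is an invented name, and raising IndexError for a missing field is a choice made here, not something the query does.

```python
ROW = "7|1111111|2222222|33333333333|0||20140515|||false|0|0|0|0|0|ABC"


def nth_field(text: str, field_number: int, sep: str = "|") -> str:
    """Return the 1-based field by locating the separators around it."""
    padded = sep + text + sep          # same trick as the second query
    start = -1
    for _ in range(field_number):      # walk to the field's opening separator
        start = padded.find(sep, start + 1)
        if start == -1:
            raise IndexError(f"no field {field_number}")
    end = padded.find(sep, start + 1)  # the field's closing separator
    if end == -1:
        raise IndexError(f"no field {field_number}")
    return padded[start + 1:end]


for n in (1, 3, 8, 10, 16):
    print(n, repr(nth_field(ROW, n)))
```

In everyday Python, text.split("|")[field_number - 1] gives the same result; the explicit search is kept here only to mirror the instr-based query.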

Processing a Comma Separated List Before Shunting-Yard

So I'm processing some math from XML strings using the Shunting-Yard algorithm. The trick is that I want to allow the generation of random values by using comma separated lists. For example...
( ( 3 + 4 ) * 12 ) * ( 2, 3, 4, 5 ) )
I've already got a basic Shunting-Yard processor working. But I want to pre-process the string to randomly pick one of the values from the list before processing the expression. Such that I might end up with:
( ( 3 + 4 ) * 12 ) * 4 )
The Shunting-Yard setup is already pretty complicated, as far as my understanding is concerned, so I'm hesitant to try to alter it to handle this. Handling that with error checking sounds like a nightmare. As such, I'm assuming it would make sense to look for that pattern beforehand? I was considering using a regular expression, but I'm not one of "those" people... though I wish that I was... and while I've found some examples, I'm not sure how I might modify them to check for the parenthesis first? I'm also not confident that this would be the best solution.
As a side note, if the solution is regex, it should be able to match strings (just characters, no symbols) in the comma list as well, as I'll be processing for specific strings for values in my Shunting-Yard implementation.
Thanks for your thoughts in advance.
This is easily solved using two regexes. The first regex, applied to the overall text, matches each parenthesized list of comma separated values. The second regex, applied to each of the previously matched lists, matches each of the values in the list. Here is a PHP script with a function that, given an input text having multiple lists, replaces each list with one of its values randomly chosen:
<?php // test.php 20110425_0900
function substitute_random_value($text) {
$re = '/
# Match parenthesized list of comma separated words.
\( # Opening delimiter.
\s* # Optional whitespace.
\w+ # required first value.
(?: # Group for additional values.
\s* , \s* # Values separated by a comma, ws
\w+ # Next value.
)+ # One or more additional values.
\s* # Optional whitespace.
\) # Closing delimiter.
/x';
// Match each parenthesized list and replace with one of the values.
$text = preg_replace_callback($re, '_srv_callback', $text);
return $text;
}
function _srv_callback($matches_paren) {
// Grab all word options in parenthesized list into $matches.
$count = preg_match_all('/\w+/', $matches_paren[0], $matches);
// Randomly pick one of the matches and return it.
return $matches[0][rand(0, $count - 1)];
}
// Read input text
$data_in = file_get_contents('testdata.txt');
// Process text multiple times to verify random replacements.
$data_out = "Run 1:\n". substitute_random_value($data_in);
$data_out .= "Run 2:\n". substitute_random_value($data_in);
$data_out .= "Run 3:\n". substitute_random_value($data_in);
// Write output text
file_put_contents('testdata_out.txt', $data_out);
?>
The substitute_random_value() function calls the PHP preg_replace_callback() function, which matches and replaces each list with one of the values in the list. It calls the _srv_callback() function which randomly picks out one of the values and returns it as the replacement value.
Given this input test data (testdata.txt):
( ( 3 + 4 ) * 12 ) * ( 2, 3, 4, 5 ) )
( ( 3 + 4 ) * 12 ) * ( 12, 13) )
( ( 3 + 4 ) * 12 ) * ( 22, 23, 24) )
( ( 3 + 4 ) * 12 ) * ( 32, 33, 34, 35 ) )
Here is the output from one example run of the script:
Run 1:
( ( 3 + 4 ) * 12 ) * 5 )
( ( 3 + 4 ) * 12 ) * 13 )
( ( 3 + 4 ) * 12 ) * 22 )
( ( 3 + 4 ) * 12 ) * 35 )
Run 2:
( ( 3 + 4 ) * 12 ) * 3 )
( ( 3 + 4 ) * 12 ) * 12 )
( ( 3 + 4 ) * 12 ) * 22 )
( ( 3 + 4 ) * 12 ) * 33 )
Run 3:
( ( 3 + 4 ) * 12 ) * 3 )
( ( 3 + 4 ) * 12 ) * 12 )
( ( 3 + 4 ) * 12 ) * 23 )
( ( 3 + 4 ) * 12 ) * 32 )
Note that this solution uses \w+ to match values consisting of "word" characters, i.e. [A-Za-z0-9_]. This can be easily changed if this does not meet your requirements.
Edit: Here is a Javascript version of the substitute_random_value() function:
function substitute_random_value(text) {
// Replace each parenthesized list with one of the values.
return text.replace(/\(\s*\w+(?:\s*,\s*\w+)+\s*\)/g,
function (m0) {
// Capture all word values in parenthesized list into values.
var values = m0.match(/\w+/g);
// Randomly pick one of the matches and return it.
return values[Math.floor(Math.random() * values.length)];
});
}
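For completeness, the same two-regex approach can be sketched in Python. The optional rng parameter is an addition of this sketch (not in the PHP or JavaScript versions); it makes the random choice reproducible in tests.

```python
import random
import re

# First regex: a parenthesized list of two or more comma separated words.
LIST_RE = re.compile(r"\(\s*\w+(?:\s*,\s*\w+)+\s*\)")
# Second regex: each word inside a matched list.
VALUE_RE = re.compile(r"\w+")


def substitute_random_value(text, rng=None):
    """Replace each parenthesized comma list with one randomly chosen value."""
    rng = rng or random.Random()

    def pick(match):
        # Collect the values of this list and choose one of them.
        return rng.choice(VALUE_RE.findall(match.group(0)))

    return LIST_RE.sub(pick, text)


print(substitute_random_value("( ( 3 + 4 ) * 12 ) * ( 2, 3, 4, 5 ) )"))
```

The parenthesized sum ( 3 + 4 ) is left alone because the list pattern requires at least one comma.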