Convert equation from string in postgresql - regex

I am trying to write a query that takes in a string, where an equation in the form
x^3 + 0.0046x^2 - 0.159x +1.713
is expected. The equation is used to calculate new values in the output table from a list of existing values. Hence I will need to convert whatever the input equation string is into an equation that postgresql can process, e.g.
power(data.value,3) + 0.0046 * power(data.value,2) - 0.159 * data.value + 1.713
A few comforting constraints in this task are:
The equation will always be in the form of a polynomial, e.g. sum(A_n * x^n)
The user will always use 'x' to represent the variable in the input equation
I have been pushing my queries into a string and executing it at the end, e.g.
_query TEXT;
SELECT 'select * from ' INTO _query;
SELECT _query || 'product.getlength( ' || min || ',' || max || ')' INTO _query;
RETURN QUERY EXECUTE _query;
Hence I know I only need to somehow:
Replace each 'x' with 'data.value'
Find all the places in the equation string where a number immediately precedes an 'x', and add a '*'
Find all exponentiation operations (x^n) in the equation string and convert them to power(x,n)
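The three rewriting steps can be tried outside the database, for example in application code before the string is handed to EXECUTE. The following is a minimal Python sketch under stated assumptions, not the approach taken in the answer below. The function name polynomial_to_sql is hypothetical. The column name data.value is taken from the example. Only non-negative integer powers are handled. The character whitelist is an added assumption meant to keep arbitrary text out of the dynamic query.

```python
import re


def polynomial_to_sql(equation: str, column: str = "data.value") -> str:
    """Rewrite e.g. 'x^3 + 0.0046x^2' into a SQL expression (hypothetical helper)."""
    # Assumption: only digits, '.', 'x', '^', '+', '-' and whitespace are legal.
    if not re.fullmatch(r"[0-9.x^+\-\s]+", equation):
        raise ValueError(f"unexpected characters in {equation!r}")
    # Normalise spacing so every operator reads ' + ' or ' - '.
    s = re.sub(r"\s*([+-])\s*", r" \1 ", equation).strip()
    # x^n -> power(x,n)
    s = re.sub(r"x\^(\d+)", r"power(x,\1)", s)
    # A digit directly followed by x or power( gets an explicit '*'.
    s = re.sub(r"(\d)(?=x|power\()", r"\1 * ", s)
    # Every standalone x becomes the column reference.
    return re.sub(r"\bx\b", column, s)


print(polynomial_to_sql("x^3 + 0.0046x^2 - 0.159x +1.713"))
# power(data.value,3) + 0.0046 * power(data.value,2) - 0.159 * data.value + 1.713
```

The order matters. Powers are rewritten before the multiplication signs are inserted, so that "0.0046x^2" becomes "0.0046 * power(x,2)" rather than "0.0046 * x^2".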
This may very well be something very trivial for a lot of people, unfortunately postgresql is not my best skill and I have already spent way more time than I can afford to get this working. Any type of help is highly appreciated, cheers.

Your 9am-noon time frame is over, but here goes.
Every term of the polynomial has 4 elements:
Addition/subtraction modifier
Multiplier
Parameter, always x in your case
Power
The problem is that these elements are not always present. The first term has no addition element, although it could have a subtraction sign - which is then typically connected to the multiplier. Multipliers are only given when not equal to 1. The parameter is not present in the last term and neither is a power in the last two terms.
With optional capture groups in regular expression parsing you can sort out this mess and PostgreSQL has the handy regexp_matches() function for this:
SELECT * FROM
regexp_matches('x^3 + 0.0046x^2 - 0.159x +1.713',
'\s*([+-]?)\s*([0-9.]*)(x?)\^?([0-9]*)', 'g') AS r (terms);
The regular expression says this:
\s* Read 0 or more spaces.
([+-]?) Capture 0 or 1 plus or minus sign.
\s* Read 0 or more spaces.
([0-9.]*) Capture a number consisting of digits and a decimal point, if present.
(x?) Capture the parameter x. This is necessary to differentiate between the last two terms, see query below.
\^? Read the power symbol, if present. It must be escaped because an unescaped ^ is a constraint (the start-of-string anchor).
([0-9]*) Capture an integer number, if present.
The g modifier repeats this process for every matching pattern in the string.
On your string this yields, in the form of string arrays:
| terms |
|-----------------|
| {'','',x,3} |
| {+,0.0046,x,2} |
| {-,0.159,x,''} |
| {+,1.713,'',''} |
| {'','','',''} |
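(The last line, with all empty strings, appears because every part of the pattern is optional, so the pattern also matches the empty string at the end of the input. It does no harm in the query below: with neither a sign nor an x, the first CASE yields NULL, and sum() ignores NULL terms.)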
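To see what the pattern captures without a database, Python's re.finditer() can run the same regular expression. The helper name parse_terms is hypothetical. Python can likewise produce an all-empty match at the end of the string, so this sketch filters such matches out explicitly.

```python
import re
from typing import List, Tuple

# Same pattern as in the regexp_matches() call above.
TERM_RE = re.compile(r"\s*([+-]?)\s*([0-9.]*)(x?)\^?([0-9]*)")


def parse_terms(polynomial: str) -> List[Tuple[str, ...]]:
    """Return (sign, multiplier, parameter, power) for every term."""
    terms = []
    for match in TERM_RE.finditer(polynomial):
        groups = match.groups()
        # All groups are optional, so an all-empty match is possible; skip it.
        if any(groups):
            terms.append(groups)
    return terms


for term in parse_terms("x^3 + 0.0046x^2 - 0.159x +1.713"):
    print(term)
```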
With this result, you can piece your query together:
SELECT id, sum(term)
FROM (
SELECT id,
CASE WHEN terms[1] = '-' THEN -1
WHEN terms[1] = '+' THEN 1
WHEN terms[3] = 'x' THEN 1 -- If no x then NULL
END *
CASE terms[2] WHEN '' THEN 1. ELSE terms[2]::float
END *
value ^ CASE WHEN terms[3] = '' THEN 0 -- If no x then 0 (x^0)
WHEN terms[4] = '' THEN 1 -- If no power then 1 (x^1)
ELSE terms[4]::int
END AS term
FROM data
JOIN regexp_matches('x^3 + 0.0046x^2 - 0.159x +1.713',
'\s*([+-]?)\s*([0-9.]*)(x?)\^?([0-9]*)', 'g') AS r (terms) ON true
) sub
GROUP BY id
ORDER BY id;
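The three CASE expressions translate directly into ordinary code. The Python sketch below applies the same rules to a single value; the function name evaluate_polynomial is hypothetical. The sign is -1, +1, or 1 for a leading x term. Otherwise it is NULL, which becomes None here and is skipped, just as sum() skips NULL. An empty multiplier means 1. No x means power 0, and an x without a power means power 1.

```python
import re

TERM_RE = re.compile(r"\s*([+-]?)\s*([0-9.]*)(x?)\^?([0-9]*)")


def evaluate_polynomial(polynomial: str, value: float) -> float:
    total = 0.0
    for sign_s, mult_s, param, power_s in TERM_RE.findall(polynomial):
        # First CASE: -1, +1, or 1 for a leading x term; otherwise NULL.
        if sign_s == "-":
            sign = -1
        elif sign_s == "+" or param == "x":
            sign = 1
        else:
            continue  # NULL term: sum() in SQL ignores it as well
        # Second CASE: a missing multiplier means 1.
        multiplier = 1.0 if mult_s == "" else float(mult_s)
        # Third CASE: no x means x^0, x without a power means x^1.
        if param == "":
            power = 0
        elif power_s == "":
            power = 1
        else:
            power = int(power_s)
        total += sign * multiplier * value ** power
    return total


print(evaluate_polynomial("x^3 + 0.0046x^2 - 0.159x +1.713", 2.0))
```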
This assumes you have an id column to group by. If all you have is a value, you can still do it, but you should then wrap the above query in a function to which you pass the polynomial and the value. The power is assumed to be integral, but you can easily turn it into a real number: add a dot to the character class of the last capture group, making it ([0-9.]*), and use a ::float cast instead of ::int in the CASE expression. You can also support negative powers by adding another capture group to the regular expression and another CASE expression to the query, the same as for the multiplier term; I leave this for your next weekend hackfest.
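For a real-valued power, the only changes are the extra dot in the last capture group and a float conversion in place of the integer one. Here is a Python sketch of just that change. The function name is hypothetical, and negative powers are still not handled.

```python
import re

# The last group now also accepts '.', so x^0.5 keeps its fractional power.
TERM_RE = re.compile(r"\s*([+-]?)\s*([0-9.]*)(x?)\^?([0-9.]*)")


def evaluate_real_powers(polynomial: str, value: float) -> float:
    total = 0.0
    for sign_s, mult_s, param, power_s in TERM_RE.findall(polynomial):
        if sign_s == "-":
            sign = -1
        elif sign_s == "+" or param == "x":
            sign = 1
        else:
            continue  # NULL term, skipped like in sum()
        multiplier = 1.0 if mult_s == "" else float(mult_s)
        if param == "":
            power = 0.0
        elif power_s == "":
            power = 1.0
        else:
            power = float(power_s)  # was int(...) for integral powers
        total += sign * multiplier * value ** power
    return total


print(evaluate_real_powers("x^0.5 + 1", 4.0))
```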
This query will also handle "odd" polynomials such as -4.3x^3+ 101.2 + 0.0046x^6 - 0.952x^7 +4x, as long as the pattern described above is maintained.

Related

Extract numbers from a field in PostgreSQL

I have a table with a column po_number of type varchar in Postgres 8.4. It stores alphanumeric values with some special characters. I want to ignore the characters [/alpha/?/$/encoding/.] and check whether the column contains a number. If it does, the value needs to be cast to a number; otherwise it should be NULL, because my output field po_number_new is a numeric field.
Below is the example:
SQL Fiddle.
I tried this statement:
select
(case when regexp_replace(po_number,'[^\w],.-+\?/','') then po_number::numeric
else null
end) as po_number_new from test
But I got an error for explicit cast:
Simply:
SELECT NULLIF(regexp_replace(po_number, '\D','','g'), '')::numeric AS result
FROM tbl;
\D being the class shorthand for "not a digit".
And you need the 4th parameter 'g' (for "globally") to replace all occurrences.
Details in the manual.
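For comparison, here is the same logic in Python: delete every non-digit, then treat an empty result as NULL. This is only a sketch. The function name digits_or_none is hypothetical, and int stands in for numeric.

```python
import re
from typing import Optional


def digits_or_none(po_number: Optional[str]) -> Optional[int]:
    if po_number is None:  # NULL in, NULL out
        return None
    # re.sub replaces every occurrence, like the 'g' flag of regexp_replace.
    digits = re.sub(r"\D", "", po_number)
    # An empty string becomes None, like NULLIF(..., '')::numeric.
    return int(digits) if digits else None


print(digits_or_none("PO-123/45"), digits_or_none("abc$?"))
```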
For a known, limited set of characters to replace, plain string manipulation functions like replace() or translate() are substantially cheaper. Regular expressions are just more versatile, and we want to eliminate everything but digits in this case. Related:
Regex remove all occurrences of multiple characters in a string
PostgreSQL SELECT only alpha characters on a row
Is there a regexp_replace equivalent for postgresql 7.4?
But why Postgres 8.4? Consider upgrading to a modern version.
Consider pitfalls for outdated versions:
Order varchar string as numeric
WARNING: nonstandard use of escape in a string literal
I think you want something like this:
select (case when regexp_replace(po_number, '[^\w],.-+\?/', '') ~ '^[0-9]+$'
then regexp_replace(po_number, '[^\w],.-+\?/', '')::numeric
end) as po_number_new
from test;
That is, you need to do the conversion on the string after replacement.
Note: This assumes that the "number" is just a string of digits.
The logic I would use to determine if the po_number field contains numeric digits is that its length should decrease when attempting to remove numeric digits.
If so, all non-digit characters ([^\d]) should be removed from the po_number column. Otherwise, NULL should be returned.
select case when char_length(regexp_replace(po_number, '\d', '', 'g')) < char_length(po_number)
then regexp_replace(po_number, '[^0-9]', '', 'g')
else null
end as po_number_new
from test
If you want to extract floating-point numbers, try this:
SELECT NULLIF(regexp_replace(po_number, '[^\.\d]','','g'), '')::numeric AS result FROM tbl;
It's the same as Erwin Brandstetter's answer but with a different expression:
[^...] - match any character except the listed ones; put the excluded characters in place of ...
\. - point character (also you can change it to , char)
\d - digit character
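Here is a Python sketch of the '[^\.\d]' variant, with Decimal standing in for numeric; the function name is hypothetical. Every period survives the filter. Input containing more than one period therefore fails the conversion, just as the ::numeric cast raises an error.

```python
import re
from decimal import Decimal
from typing import Optional


def decimal_or_none(po_number: str) -> Optional[Decimal]:
    # Keep only digits and '.', as in regexp_replace(po_number, '[^\.\d]', '', 'g').
    kept = re.sub(r"[^.\d]", "", po_number)
    # NULLIF(..., '')::numeric; Decimal('1.2.3') raises decimal.InvalidOperation.
    return Decimal(kept) if kept else None


print(decimal_or_none("Price: 12.50 USD"))
```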
Since version 12 (released 2 years and 4 months before the time of writing, but after the last edit that I can see on the accepted answer), you can use a GENERATED column to do this quite easily. The value is computed once, when the row is written, rather than every time you SELECT the new po_number.
Furthermore, you can use the TRANSLATE function to extract your digits, which is less expensive than the REGEXP_REPLACE solution proposed by @ErwinBrandstetter!
I would do this as follows. Table s uses TRANSLATE. Table t, used for comparison below, is the equivalent table using REGEXP_REPLACE; its definition is not reproduced here.
CREATE TABLE s
(
num TEXT,
new_num INTEGER GENERATED ALWAYS AS
(NULLIF(TRANSLATE(num, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ. ', ''), '')::INTEGER) STORED
);
You can add to the 'ABCDEFG... string in the TRANSLATE function as appropriate - I have decimal point (.) and a space ( ) at the end - you may wish to have more characters there depending on your input!
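TRANSLATE with an empty third argument simply deletes the listed characters. Python's str.translate() with a deletion table from str.maketrans() does the same, which makes the limitation easy to see: only the listed characters are removed, so anything else (here, a lowercase letter) still reaches the cast. This is a sketch, and the function name is hypothetical.

```python
from typing import Optional

# Characters deleted by the generated column above: A-Z, '.' and ' '.
DELETE_TABLE = str.maketrans("", "", "ABCDEFGHIJKLMNOPQRSTUVWXYZ. ")


def translate_to_int(num: Optional[str]) -> Optional[int]:
    if num is None:
        return None
    stripped = num.translate(DELETE_TABLE)      # TRANSLATE(num, '...', '')
    return int(stripped) if stripped else None  # NULLIF(..., '')::INTEGER


print(translate_to_int("AB 12.3"))
```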
And checking:
INSERT INTO s VALUES ('2'), (''), (NULL), (' ');
INSERT INTO t VALUES ('2'), (''), (NULL), (' ');
SELECT * FROM s;
SELECT * FROM t;
Result (same for both):
 num  | new_num
------+---------
 2    | 2
 ''   | NULL
 NULL | NULL
 ' '  | NULL
I wanted to check how efficient my solution was, so I ran the following test, inserting 10,000 records into both tables s and t:
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
INSERT INTO t
with symbols(characters) as
(
VALUES ('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')
)
select string_agg(substr(characters, (random() * length(characters) + 1) :: INTEGER, 1), '')
from symbols
join generate_series(1,10) as word(chr_idx) on 1 = 1 -- word length
join generate_series(1,10000) as words(idx) on 1 = 1 -- # of words
group by idx;
The differences weren't that huge, but the regex solution was consistently slower by about 25%, even when I changed the order of the tables receiving the INSERTs.
However, where the TRANSLATE solution really shines is when doing a "raw" SELECT as follows:
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT
NULLIF(TRANSLATE(num, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ. ', ''), '')::INTEGER
FROM s;
and the same for the REGEXP_REPLACE solution.
The differences were very marked: TRANSLATE took approximately 25% of the time of the REGEXP_REPLACE version. Finally, in the interests of fairness, I also ran this for both tables:
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT
num, new_num
FROM t;
Both extremely quick and identical!

SQL pattern matching using regular expression

Can we use regular expressions in SQL Server? I'm using SQL Server 2012 and 2014, and there is a requirement to match the input in my stored procedure and return the matching rows.
I can't use LIKE in this situation, since LIKE only returns matching words. With a regex I could match a whole range of characters, such as spaces, hyphens and numbers.
Here is my SP
--Suppose XYZ P is my Search Condition
Declare @Condition varchar(50) = 'XYZ P'
CREATE PROCEDURE [dbo].[usp_MATCHNAME]
@Condition varchar(25)
as
Begin
select * from tblPerson
where UPPER(Name) like UPPER(@Condition) + '%'
-- It should return both XYZ P and xyzp
End
Here my SP returns all rows where Name matches XYZ P, but how can I also retrieve rows where Name is XYZP or XYZ-P?
And if the search condition contains an alphanumeric value such as
--Suppose XYZ 1 is my Search Condition
Declare @Condition varchar(50) = 'XYZ 1'
then my search result should also return values without the space, such as XYZ1, xyz1 and Xyz -1.
I don't want to use SUBSTRING to find the space, split on it, and then match.
Note: my input condition, @Condition, may contain spaces, no spaces, or hyphens (-) when the stored procedure is executed.
Use the REPLACE function.
It replaces each space with %, so the pattern also matches names that have no space, or some other character such as a hyphen, in that position:
SELECT *
FROM tblPerson
WHERE UPPER(Name) LIKE REPLACE(UPPER(@Condition), ' ', '%') + '%'
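To check which names the rewritten pattern accepts, here is a small Python emulation. It is a sketch with assumptions. The hypothetical like_to_regex handles only the % and _ wildcards: no ESCAPE clause and no bracket ranges, both of which SQL Server's LIKE also supports. Case-insensitivity comes from the UPPER() calls, as in the query.

```python
import re
from typing import List


def like_to_regex(pattern: str) -> str:
    # % -> any run of characters, _ -> exactly one character, the rest literal.
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def match_names(condition: str, names: List[str]) -> List[str]:
    # UPPER(Name) LIKE REPLACE(UPPER(condition), ' ', '%') + '%'
    like_pattern = condition.upper().replace(" ", "%") + "%"
    regex = re.compile(like_to_regex(like_pattern), re.DOTALL)
    return [name for name in names if regex.fullmatch(name.upper())]


print(match_names("XYZ 1", ["XYZ P", "XYZ1", "xyz1", "Xyz -1"]))
```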

Regular expression for validating input string in R

I am trying to write a regular expression in R to validate user input and run program accordingly.
3 types of queries are expected, all are character vectors.
query1 = "Oct4[Title/Abstract] AND BCR-ABL1[Title/Abstract]
AND stem cells[Title] AND (2000[PDAT] :2015[PDAT])"
query2 <-c("26527521","26711930","26314551")
The following code works. But the challenge is restricting special characters in both the cases
all(grepl("[A-Za-z]+",query,perl=TRUE)) validates False for query 2
or as @sebkopf suggested
all(grepl("^[0-9 ,]+$", query)) # evaluates to TRUE only for query 2
However, query 1 also takes a year as input, which means numeric input should be accepted for query 1. To add complexity, spaces, commas, periods, hyphens, [] and () are allowed in query1. The format for query2 should be ONLY numbers, separated by commas or spaces. Anything else should throw an error.
How to incorporate both these conditions as part of R regular expression ? So that, the following if conditions are validated accordingly to run respective codes ?
if (all(grepl("regex for query 1 & 2", query, perl = TRUE))) {
  # run code 1
} else {
  print("these characters are not allowed # ! & % # * ~ `_ = +")
}
if (all(grepl("regex for query 3", query, perl = TRUE))) {
  # run code 2
} else {
  print("these characters are not allowed # ! & % # * ~ `_ = + [] () - .")
}
In your current regexps you are just looking for an occurrence of the pattern ("[A-Za-z]+") anywhere in the query. If you want to allow only certain character patterns, you need to make sure the pattern matches across the whole query, using "^...$".
With regular expressions there's always multiple ways of doing anything but to provide an example for matching a query without specific special characters (but everything else allowed), you could use the following (here wrapped in all to account for your query3 being a vector):
all(grepl("^[^#!&%#*~`_=+]+$", query)) # evaluates to TRUE for your query1, 2 & 3
For instead doing the positive match to only catch queries that are numbers plus space and comma:
all(grepl("^[0-9 ,]+$", query)) # evaluates to TRUE only for query3
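The same two whole-string checks, sketched in Python, use re.fullmatch() in place of the ^...$ anchors. The helper names are hypothetical. The character lists come from the answer, except that the duplicate # in the first class is dropped because it is redundant.

```python
import re
from typing import List

FORBIDDEN_FREE = re.compile(r"[^#!&%*~`_=+]+")  # none of the listed characters
NUMBERS_ONLY = re.compile(r"[0-9 ,]+")          # digits, spaces and commas only


def is_text_query(query: List[str]) -> bool:
    # all(grepl("^[^#!&%#*~`_=+]+$", query))
    return all(FORBIDDEN_FREE.fullmatch(q) for q in query)


def is_id_query(query: List[str]) -> bool:
    # all(grepl("^[0-9 ,]+$", query))
    return all(NUMBERS_ONLY.fullmatch(q) for q in query)


print(is_id_query(["26527521", "26711930", "26314551"]))
```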

Modify backreference using function in postgresql

I want to add (int) value to a backreference.
For this I created a function and pass the appropriate backreference. The backreference if returned without any modification works fine, however when I try to modify or use any other functions on the backreference that was passed it assumes \3 as the argument value and not the backreference value itself.
For example:
CREATE OR REPLACE FUNCTION add10(text) returns text as $$
DECLARE
t int;
BEGIN
t := to_number($1, '999999') + 10;
return trim(to_char(t, '999999'), ' ');
END;
$$ LANGUAGE plpgsql;
then:
select regexp_replace('890808', '80(\d+)', add10('\1'), 'g');
should give result as
test
-------
89018
(1 row)
However it gives --
test
-------
89011
(1 row)
It takes the value of $1 as 1 (the backreference number) instead of the value 8.
Any idea why this happens?
Problem: order of evaluation
My guess (and only a guess, given that the question isn't super clear) is that you're confused by the order of evaluation of arguments within function calls, and are trying to call a function on a backref value, but order of evaluation means that it's called on the backref string before regexp evaluation.
Assuming that add10 and t are the same thing, then:
select regexp_replace('890808', '80(\d+)', add10('\1'), 'g');
is evaluated by first calling add10('\1'). That will in turn run:
select to_number('\1', '999999') + 10 into t;
Since select to_number('\1', '999999') produces the value 1, you'll get 11 in t. You then convert that back to a string (via a rather odd approach; why not just cast it?).
So you've replaced '\1' with '11', so your regexp_replace call looks like:
select regexp_replace('890808', '80(\d+)', '11', 'g');
... from which you can see where your unexpected result came from.
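The mechanism can be reproduced in Python, whose re.sub() accepts either a ready-made replacement string or a callback. Passing add10('\\1') hands re.sub the already computed string '11', exactly as in the PostgreSQL call. Passing a callback defers the arithmetic until each match is known, which is what the question expected. PostgreSQL's regexp_replace() only takes a replacement string, so it has no such callback form. The add10 below is an assumption: it only approximates to_number() by discarding non-digits.

```python
import re


def add10(text: str) -> str:
    # Rough stand-in for to_number(text, '999999') + 10: non-digits are ignored,
    # so the literal two-character string '\1' becomes the number 1.
    return str(int(re.sub(r"\D", "", text)) + 10)


source = "890808"

# Eager: add10 runs once, on the literal text '\1', before any matching.
eager = re.sub(r"80(\d+)", add10("\\1"), source)

# Deferred: the callback runs per match and receives the captured digits.
deferred = re.sub(r"80(\d+)", lambda m: add10(m.group(1)), source)

print(eager, deferred)  # 89011 89018
```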
Solution: Split the value, modify it, then reassemble it
I don't think your desired result makes any sense, so I can't really figure out how to produce it. You seem to be trying to retain all digits before "80", discard "80", convert all digits after the "80" to a number and add 10, then substitute it back in. Which is pretty WTFy, why?
Regular expressions are one way to split numbers up, but the best way is usually integer division and remainder (modulo):
craig=> SELECT 890808 / 10000, 890808 % 10000;
?column? | ?column?
----------+----------
89 | 808
(1 row)
If you must use regexps (say, if it's mixed alphanumeric or if your criteria are not easily expressed by place values), you probably want to use regexp_split_to_array.

Escape function for regular expression or LIKE patterns

To forgo reading the entire problem, my basic question is:
Is there a function in PostgreSQL to escape regular expression characters in a string?
I've probed the documentation but was unable to find such a function.
Here is the full problem:
In a PostgreSQL database, I have a column with unique names in it. I also have a process which periodically inserts names into this field, and, to prevent duplicates, if it needs to enter a name that already exists, it appends a space and parentheses with a count to the end.
i.e. Name, Name (1), Name (2), Name (3), etc.
As it stands, I use the following code to find the next number to add in the series (written in plpgsql):
var_name_id := 1;
SELECT CAST(substring(a.name from E'\\((\\d+)\\)$') AS int)
INTO var_last_name_id
FROM my_table.names a
WHERE a.name LIKE var_name || ' (%)'
ORDER BY CAST(substring(a.name from E'\\((\\d+)\\)$') AS int) DESC
LIMIT 1;
IF var_last_name_id IS NOT NULL THEN
var_name_id = var_last_name_id + 1;
END IF;
var_new_name := var_name || ' (' || var_name_id || ')';
(var_name contains the name I'm trying to insert.)
This works for now, but the problem lies in the WHERE statement:
WHERE a.name LIKE var_name || ' (%)'
This check doesn't verify that the % in question is a number, and it doesn't account for multiple parentheses, as in something like "Name ((1))", and if either case existed a cast exception would be thrown.
The WHERE statement really needs to be something more like:
WHERE a.r1_name ~* var_name || E' \\(\\d+\\)'
But var_name could contain regular expression characters, which leads to the question above: Is there a function in PostgreSQL that escapes regular expression characters in a string, so I could do something like:
WHERE a.r1_name ~* regex_escape(var_name) || E' \\(\\d+\\)'
Any suggestions are much appreciated, including a possible reworking of my duplicate name solution.
To address the question at the top:
This assumes standard_conforming_strings = on, which is the default since Postgres 9.1.
Regular expression escape function
Let's start with a complete list of characters with special meaning in regular expression patterns:
!$()*+.:<=>?[\]^{|}-
Wrapped in a bracket expression most of them lose their special meaning - with a few exceptions:
- needs to be first or last or it signifies a range of characters.
] and \ have to be escaped with \ (in the replacement, too).
After adding capturing parentheses for the back reference below we get this regexp pattern:
([!$()*+.:<=>?[\\\]^{|}-])
Using it, this function escapes all special characters with a backslash (\) - thereby removing the special meaning:
CREATE OR REPLACE FUNCTION f_regexp_escape(text)
RETURNS text
LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE AS
$func$
SELECT regexp_replace($1, '([!$()*+.:<=>?[\\\]^{|}-])', '\\\1', 'g')
$func$;
The PARALLEL SAFE marking (correct, because the function is parallel safe) allows parallelism for queries using it in Postgres 10 or later.
Demo
SELECT f_regexp_escape('test(1) > Foo*');
Returns:
test\(1\) \> Foo\*
And while:
SELECT 'test(1) > Foo*' ~ 'test(1) > Foo*';
returns FALSE, which may come as a surprise to naive users,
SELECT 'test(1) > Foo*' ~ f_regexp_escape('test(1) > Foo*');
returns TRUE, as it should.
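The same bracket-expression pattern can be cross-checked with Python's re module. As in PostgreSQL's regular expressions, a backslash before a punctuation character makes that character literal there. This is only a sketch, and regexp_escape is a hypothetical name. In Python itself, the built-in re.escape() already does this job.

```python
import re

# Bracket expression from the answer; '[' is backslash-escaped as well,
# which is harmless inside a Python character class.
SPECIAL = re.compile(r"([!$()*+.:<=>?\[\\\]^{|}-])")


def regexp_escape(text: str) -> str:
    # Prefix each special character with a backslash, like f_regexp_escape().
    return SPECIAL.sub(r"\\\1", text)


escaped = regexp_escape("test(1) > Foo*")
print(escaped)                                        # test\(1\) \> Foo\*
print(re.search("test(1) > Foo*", "test(1) > Foo*"))  # None
print(bool(re.search(escaped, "test(1) > Foo*")))     # True
```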
LIKE escape function
For completeness, here is the counterpart for LIKE patterns, where only three characters are special:
\%_
The manual:
The default escape character is the backslash but a different one can be selected by using the ESCAPE clause.
This function assumes the default:
CREATE OR REPLACE FUNCTION f_like_escape(text)
RETURNS text
LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE AS
$func$
SELECT replace(replace(replace($1
, '\', '\\') -- must come 1st
, '%', '\%')
, '_', '\_');
$func$;
We could use the more elegant regexp_replace() here, too, but for the few characters, a cascade of replace() functions is faster.
Again, PARALLEL SAFE in Postgres 10 or later.
Demo
SELECT f_like_escape('20% \ 50% low_prices');
Returns:
20\% \\ 50\% low\_prices
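The "must come 1st" comment is easy to verify. Below is a Python sketch of the same replace() cascade, plus the wrong order for contrast; both function names are hypothetical. If the backslash is doubled last, the backslashes just added for % and _ get doubled too, and the escapes break.

```python
def like_escape(text: str) -> str:
    # Backslash first, then % and _, as in f_like_escape().
    return text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def like_escape_wrong_order(text: str) -> str:
    return text.replace("%", "\\%").replace("_", "\\_").replace("\\", "\\\\")


print(like_escape("20% \\ 50% low_prices"))              # 20\% \\ 50\% low\_prices
print(like_escape_wrong_order("20% \\ 50% low_prices"))  # 20\\% \\ 50\\% low\\_prices
```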
How about trying something like this, substituting var_name for my hard-coded 'John Bernard':
create table my_table(name text primary key);
insert into my_table(name) values ('John Bernard'),
('John Bernard (1)'),
('John Bernard (2)'),
('John Bernard (3)');
select max(regexp_replace(substring(name, 13), ' |\(|\)', '', 'g')::integer+1)
from my_table
where substring(name, 1, 12)='John Bernard'
and substring(name, 13)~'^ \([1-9][0-9]*\)$';
max
-----
4
(1 row)
One caveat: I am assuming single-user access to the database while this process is running (and so are you in your approach). If that is not the case, the max(n)+1 approach will not be a good one.
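Combining this with the escaping idea from the question, here is a Python sketch of the "next number" lookup. It accepts only a single parenthesised positive integer after the escaped base name. next_suffix is a hypothetical name. Returning 1 when no numbered copy exists mirrors the var_name_id := 1 default in the question. The same single-user caveat applies.

```python
import re
from typing import Iterable


def next_suffix(names: Iterable[str], base: str) -> int:
    # re.escape() keeps characters such as '.' or '(' in the base name literal.
    pattern = re.compile(re.escape(base) + r" \(([1-9][0-9]*)\)")
    numbers = []
    for name in names:
        match = pattern.fullmatch(name)
        if match:
            numbers.append(int(match.group(1)))
    return max(numbers) + 1 if numbers else 1


print(next_suffix(["John Bernard", "John Bernard (1)", "John Bernard (3)"], "John Bernard"))
```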
Are you at liberty to change the schema? I think the problem would go away if you could use a composite primary key:
name text not null,
number integer not null,
primary key (name, number)
It then becomes the duty of the display layer to display Fred #0 as "Fred", Fred #1 as "Fred (1)", &c.
If you like, you can create a view for this duty. Here's the data:
=> select * from foo;
name | number
--------+--------
Fred | 0
Fred | 1
Barney | 0
Betty | 0
Betty | 1
Betty | 2
(6 rows)
The view:
create or replace view foo_view as
select *,
case
when number = 0 then
name
else
name || ' (' || number || ')'
end as name_and_number
from foo;
And the result:
=> select * from foo_view;
name | number | name_and_number
--------+--------+-----------------
Fred | 0 | Fred
Fred | 1 | Fred (1)
Barney | 0 | Barney
Betty | 0 | Betty
Betty | 1 | Betty (1)
Betty | 2 | Betty (2)
(6 rows)
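With the composite key, the view's CASE expression is the whole display rule, and finding the next number needs no string parsing. Here is a Python sketch. display_name and next_number are hypothetical names. Computing the next number as max(number) + 1 is an assumption, and it carries the same concurrency caveat as the previous answer.

```python
from typing import List, Tuple

Row = Tuple[str, int]  # (name, number), as in table foo


def display_name(name: str, number: int) -> str:
    # CASE WHEN number = 0 THEN name ELSE name || ' (' || number || ')' END
    return name if number == 0 else f"{name} ({number})"


def next_number(rows: List[Row], name: str) -> int:
    used = [number for row_name, number in rows if row_name == name]
    return max(used) + 1 if used else 0


rows = [("Fred", 0), ("Fred", 1), ("Barney", 0), ("Betty", 0), ("Betty", 1), ("Betty", 2)]
for name, number in rows:
    print(display_name(name, number))
```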