Oracle 18c - Alternative to REGEXP_REPLACE

After migrating to Oracle 18c Enterprise Edition, a function-based index fails to create. Here is my index DDL:
CREATE INDEX my_index ON my_table
(UPPER( REGEXP_REPLACE ("DEPT_NUM",'[^[:alnum:]]',NULL,1,0)))
TABLESPACE my_tbspace
PCTFREE 10
INITRANS 2
MAXTRANS 255
STORAGE (
INITIAL 64K
MINEXTENTS 1
MAXEXTENTS UNLIMITED
PCTINCREASE 0
BUFFER_POOL DEFAULT
);
I get the following error:
ORA-01743: only pure functions can be indexed
01743. 00000 - "only pure functions can be indexed"
*Cause: The indexed function uses SYSDATE or the user environment.
*Action: PL/SQL functions must be pure (RNDS, RNPS, WNDS, WNPS). SQL
expressions must not use SYSDATE, USER, USERENV(), or anything
else dependent on the session state. NLS-dependent functions
are OK.
Is this a known bug in 18c? If this function-based index is no longer supported, what is another way to write this function?

The issue is that regexp_replace is not deterministic. The problem shows up when you change NLS settings:
alter session set nls_language = english;
with rws as (
select 'STÜFF' v
from dual
)
select regexp_replace ( v, '[A-Z]+', '#' )
from rws;
REGEXP_REPLACE(V,'[A-Z]+','#')
#Ü#
alter session set nls_language = german;
with rws as (
select 'STÜFF' v
from dual
)
select regexp_replace ( v, '[A-Z]+', '#' )
from rws;
REGEXP_REPLACE(V,'[A-Z]+','#')
#
In English, U-umlaut sorts at the end of the alphabet, outside the range [A-Z], so the first statement doesn't replace it. In German it sorts directly after U, inside the range, so the second statement does.
In Oracle Database 12.1 and earlier, regexp_replace was incorrectly marked as deterministic; 12.2 fixed this by marking it non-deterministic.
Consider carefully whether any workarounds manage diacritics correctly.
MOS note 2592779.1 discusses this further.
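A quick way to check whether an expression is NLS-sensitive is simply to run it under each language you use, as in the example above. A sketch for the index expression in question ([:alnum:] is a POSIX character class, so here the results should match):
alter session set nls_language = english;
select upper(regexp_replace('STÜFF*1', '[^[:alnum:]]', null, 1, 0)) from dual;
alter session set nls_language = german;
select upper(regexp_replace('STÜFF*1', '[^[:alnum:]]', null, 1, 0)) from dual;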

Most likely the REGEXP_REPLACE causes the problem; see "Find out if a string contains only ASCII characters". You can bypass the limitation with a user-defined function (thanks to Bob Jarvis):
CREATE OR REPLACE FUNCTION KEEP_ALNUM(strIn IN VARCHAR2)
RETURN VARCHAR2
DETERMINISTIC
AS
BEGIN
RETURN UPPER(REGEXP_REPLACE(strIn, '[^[:alnum:]]', NULL, 1, 0));
END KEEP_ALNUM;
/
CREATE INDEX DEPTS_1 ON DEPTS(KEEP_ALNUM(DEPT_NUM));
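Note that for the index to be usable, queries must apply the same wrapper function in their predicates; the raw UPPER(REGEXP_REPLACE(...)) expression no longer matches the index expression. A sketch (same hypothetical DEPTS table as above):
SELECT *
FROM DEPTS
WHERE KEEP_ALNUM(DEPT_NUM) = 'A12';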
Just ensure the function has the keyword DETERMINISTIC. Oracle takes this on trust: you can even define a plainly non-deterministic function like the one below and still create a function-based index on it.
CREATE OR REPLACE FUNCTION SillyValue RETURN VARCHAR2 DETERMINISTIC
AS
BEGIN
RETURN DBMS_RANDOM.STRING('p', 20);
END;
/

There are a couple of workarounds.
First one is a hack.
As you may know, when you create an FBI, Oracle creates a hidden virtual column and builds the index on it.
Moreover, you can even specify the name of that column instead of the FBI expression and Oracle will still use the index.
set lines 70 pages 70
column column_name format a15
column data_type format a15
drop table my_table;
create table my_table(dept_num, dept_descr) as select rownum||'*', 'dummy' from dual connect by level <= 1e6;
create index my_index
on my_table(upper(regexp_replace(dept_num, '[^[:alnum:]]', null, 1, 0)));
select column_name, data_type from user_tab_cols where table_name = 'MY_TABLE';
explain plan for
select * from my_table where upper(regexp_replace(dept_num, '[^[:alnum:]]', null, 1, 0)) = '666';
select * from table(dbms_xplan.display(format => 'BASIC'));
explain plan for
select * from my_table where SYS_NC00003$ = '666';
select * from table(dbms_xplan.display(format => 'BASIC'));
Output
Table dropped.
Table created.
Index created.
COLUMN_NAME DATA_TYPE
--------------- ---------------
DEPT_NUM VARCHAR2
DEPT_DESCR CHAR
SYS_NC00003$ VARCHAR2
3 rows selected.
Explain complete.
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------
Plan hash value: 2234884270
--------------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| MY_TABLE |
| 2 | INDEX RANGE SCAN | MY_INDEX |
--------------------------------------------------------
9 rows selected.
Explain complete.
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------
Plan hash value: 2234884270
--------------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| MY_TABLE |
| 2 | INDEX RANGE SCAN | MY_INDEX |
--------------------------------------------------------
9 rows selected.
So to mimic an FBI you can create the hidden column yourself and build an index on top of it.
That can be done in Oracle 11g using dbms_stats.create_extended_stats.
drop index my_index;
begin
for i in (select dbms_stats.create_extended_stats
(user, 'my_table', '(upper(regexp_replace("DEPT_NUM", ''[^[:alnum:]]'', null, 1, 0)))') as col_name
from dual)
loop
execute immediate(utl_lms.format_message('alter table %s rename column "%s" to my_hidden_col','my_table', i.col_name));
end loop;
end;
/
select column_name, data_type from user_tab_cols where table_name = 'MY_TABLE';
create index my_index on my_table(my_hidden_col);
explain plan for
select * from my_table where upper(regexp_replace(dept_num, '[^[:alnum:]]', null, 1, 0)) = '666';
select * from table(dbms_xplan.display(format => 'BASIC'));
explain plan for
select * from my_table where MY_HIDDEN_COL = '666';
select * from table(dbms_xplan.display(format => 'BASIC'));
Output
Index dropped.
PL/SQL procedure successfully completed.
COLUMN_NAME DATA_TYPE
--------------- ---------------
DEPT_NUM VARCHAR2
DEPT_DESCR CHAR
MY_HIDDEN_COL VARCHAR2
3 rows selected.
Index created.
Explain complete.
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------
Plan hash value: 2234884270
--------------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| MY_TABLE |
| 2 | INDEX RANGE SCAN | MY_INDEX |
--------------------------------------------------------
9 rows selected.
Explain complete.
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------
Plan hash value: 2234884270
--------------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| MY_TABLE |
| 2 | INDEX RANGE SCAN | MY_INDEX |
--------------------------------------------------------
9 rows selected.
Starting with Oracle 12c, invisible columns are documented, so it becomes even more straightforward.
alter table my_table add (my_hidden_col invisible as
(upper(regexp_replace(dept_num, '[^[:alnum:]]', null, 1, 0))) virtual);
create index my_index on my_table(my_hidden_col);
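As a quick sanity check (assuming the same my_table as above), both the original expression and the column name should produce the same index range scan as before:
explain plan for
select * from my_table where upper(regexp_replace(dept_num, '[^[:alnum:]]', null, 1, 0)) = '666';
select * from table(dbms_xplan.display(format => 'BASIC'));
explain plan for
select * from my_table where my_hidden_col = '666';
select * from table(dbms_xplan.display(format => 'BASIC'));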
Another approach is to implement the same logic without a regex:
create index my_index on my_table(
  upper(translate(dept_num,
    '_'||translate(dept_num, '_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789', '_'),
    '_')));
But in this case you have to make sure that every regex expression in your predicates is replaced with the new one.
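A quick equivalence check of the two expressions on a sample value (a sketch; both columns should come back as AB12C):
select upper(regexp_replace('ab-12#c', '[^[:alnum:]]', null, 1, 0)) as rx,
       upper(translate('ab-12#c',
                       '_'||translate('ab-12#c', '_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789', '_'),
                       '_')) as tr
from dual;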

The workaround I found easiest was to create the index using NLS_UPPER instead of UPPER:
CREATE INDEX my_index ON my_table
(REGEXP_REPLACE(NLS_UPPER("DEPT_NUM"),'[^[:alnum:]]',NULL,1,0))
TABLESPACE my_tbspace
PCTFREE 10
INITRANS 2
MAXTRANS 255
STORAGE (
INITIAL 64K
MINEXTENTS 1
MAXEXTENTS UNLIMITED
PCTINCREASE 0
BUFFER_POOL DEFAULT
);

Related

Add condition to an existing column

I have the following columns in my dataset:
| ... | Start Date | Start Time | End Date | End Time | Production Start Date | ... |
|-----|------------|------------|----------|----------|-----------------------|-----|
| ... | 01022020   | 180000     | 02022020 | 190000   | 01022020              | ... |
Sometimes the Start Date + Start Time etc. values are blank but the Production Start Date values are always populated.
When the Start Date is empty (NULL), for example, I want my dataset (or ideally, graph) to read the Production Start Date.
How can I achieve this in Power BI?
I know I can make a conditional column, then within that, determine which column to read data from but is there any way to add a condition to the existing Start Date column? I couldn't see such an option in the context menu or subsequent ribbon options.
Is my only option to create a custom conditional column instead?
As @Andrey Nikolov mentioned in the comments, the only ways you can achieve this are to:
1. Create a calculated DAX column (see the sketch below).
2. Create a custom column in query mode (M).
3. Edit the original source table.
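For option 1, a minimal sketch of such a calculated column (the table name 'Data' is assumed; the column names are from the question):
Start Date (Combined) =
IF (
    ISBLANK ( 'Data'[Start Date] ),
    'Data'[Production Start Date],
    'Data'[Start Date]
)
You can then plot the graph against this column instead of the raw Start Date.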

PowerBI DAX – subquery to the same table

I have a table, let's call it Products with columns:
Id
ProductId
Version
some other columns…
The Id column is the primary key, and ProductId groups rows. Now I want to get, for each ProductId, the row with the highest Version.
I.e., from the data set:
Id  | ProductId | Version | ...
100 | 1         | 0       | ...
101 | 2         | 0       | ...
102 | 2         | 1       | ...
103 | 2         | 2       | ...
I need to get:
Id  | ProductId | Version | ...
100 | 1         | 0       | ...
103 | 2         | 2       | ...
In SQL I would write:
SELECT Id, ProductId, Version, OtherColumns
FROM Products p1
WHERE NOT EXISTS
  (SELECT 1
   FROM Products p2
   WHERE p2.ProductId = p1.ProductId
     AND p2.Version > p1.Version)
But I have no idea how to express this in DAX. Is this approach with subqueries inapplicable in PowerBI?
Another approach is to first construct a virtual table of product_ids and their latest versions, and then use this table to filter the original table:
EVALUATE
VAR Latest_Product_Versions =
ADDCOLUMNS(
VALUES('Product'[Product_Id]),
"Latest Version", CALCULATE(MAX('Product'[Version])))
RETURN
CALCULATETABLE(
'Product',
TREATAS(Latest_Product_Versions, 'Product'[Product_Id], 'Product'[Version]))
The benefit of this approach is an optimal query execution plan.
You can use SUMMARIZECOLUMNS to group by ProductId and take the MAX of Version.
Then use ADDCOLUMNS to add the corresponding Id value(s), using a filter on the Products table for the matching ProductId and Version. I've used CONCATENATEX here so that, if multiple Id values share the same ProductId / max-Version combination, all of them are returned as a comma-separated list.
EVALUATE
ADDCOLUMNS (
SUMMARIZECOLUMNS (
Products[ProductId],
"#Max Version",
MAX ( Products[Version] )
),
"#Max Version Id",
CONCATENATEX (
FILTER (
Products,
Products[Version] = [#Max Version] && Products[ProductId] = EARLIER ( Products[ProductId] )
),
Products[Id],
","
)
)
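Against the sample data in the question, this should return:
ProductId | #Max Version | #Max Version Id
1         | 0            | 100
2         | 2            | 103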

Choose DAX measure based on slicer value

Is it possible to dynamically pick the appropriate DAX measure, defined in a table, based on a slicer value?
Source table:
+----------------+------------+
| col1 | col2 |
+----------------+------------+
| selectedvalue1 | [measure1] |
| selectedvalue2 | [measure2] |
| selectedvalue3 | [measure3] |
+----------------+------------+
The values of col1 go into a slicer. I can retrieve the selected value with:
SlicerValue = SELECTEDVALUE(tab[col1])
I could hard code:
MyVariable = SWITCH(TRUE(),
SlicerValue = "selectedvalue1" , [measure1],
SlicerValue = "selectedvalue2" , [measure2],
SlicerValue = "selectedvalue3" , [measure3],
BLANK()
)
But I do not want to hard-code the SelectedValue-to-measure mapping in a DAX measure. I want to have it defined in the source table.
I need something like this:
MyMeasure = GETMEASURE(tab[col2])
Of course assuming that such a function exists and that only one value of col2 has been filtered.
@NickKrasnov mentioned calculation groups elsewhere. To automate the generation of your hard-coded lookup table, you could use DMVs against your pbix.
You might do something like the query below to get output formatted so it can be pasted into a large SWITCH:
SELECT
'"' + [Name] + '", [' + [Name] + '],'
FROM $SYSTEM.TMSCHEMA_MEASURES
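The generated lines can then be pasted into a single measure, along these lines (a sketch using the names from the question):
MyMeasure =
SWITCH (
    SELECTEDVALUE ( tab[col1] ),
    "selectedvalue1", [measure1],
    "selectedvalue2", [measure2],
    "selectedvalue3", [measure3],
    BLANK ()
)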

Postgres - How to Join Tables without duplicates

I'm working on a project that locally used SQLite. Now, when moving to Postgres (on Heroku), my query reports the error "r.social must appear in the GROUP BY clause or be used in an aggregate function".
The original query is:
SELECT DISTINCT c.name, r.social, c.description, p.price
FROM cryptomodels_coin c
LEFT JOIN cryptomodels_coinprice p
ON p.coin_id = c.name
LEFT JOIN cryptomodels_CoinRating r
ON r.coin_id = c.name
GROUP BY c.name
This works fine locally, with one unique row returned for each coin.
When I ran it in the Postgres environment, it threw the aggregate-function error mentioned above. I managed to resolve this by adding all selected columns to the GROUP BY clause, as seen below:
SELECT DISTINCT c.name, r.social, c.description, p.price
FROM cryptomodels_coin c
LEFT JOIN cryptomodels_coinprice p
ON p.coin_id = c.name
LEFT JOIN cryptomodels_CoinRating r
ON r.coin_id = c.name
GROUP BY c.name, r.social, c.description, p.price
The issue is that I now have duplicate rows for each coin.
I've done a fair bit of reading and tried numerous solutions; some throw errors and others still result in duplicate rows. I'm really not sure how to proceed. Thank you for any assistance.
EDIT for additional information:
Each coin has numerous prices and numerous ratings; the cryptomodels_coin table is referenced by the other tables, which use its name as "coin_id". So, with three coins for example:
Coin table:
| Name |
--------
| 0X   |
| XSV  |
| BTC  |
Price table:
| Coin_id | Price |
-------------------
| 0X      | 43.2  |
| XSV     | 20.0  |
| BTC     | 99999 |
Rating table:
| Coin_id | Social |
--------------------
| 0X      | 20,000 |
| XSV     | 12,000 |
| BTC     | 5,0000 |
EDIT 2:
CREATE TABLE "cryptomodels_coin" (
"name" varchar(200) NOT NULL PRIMARY KEY,
"description" text NOT NULL);
CREATE TABLE "cryptomodels_coinprice" (
"id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"price" real NULL,
"coin_id" varchar(200) NOT NULL REFERENCES "cryptomodels_coin" ("name") );
CREATE TABLE "cryptomodels_coinrating" (
"id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"social" text NULL, "coin_id" varchar(200) NOT NULL REFERENCES "cryptomodels_coin" ("name"));
Added SQLFiddle:
http://sqlfiddle.com/#!15/9fcff/1
Thanks!
I guess something like this would eliminate duplicates as you wish:
SELECT c.name AS name,
r.social AS social,
c.description AS description,
SUM(p.price) AS price
FROM cryptomodels_coin c
LEFT JOIN cryptomodels_coinprice p ON p.coin_id = c.name
LEFT JOIN cryptomodels_CoinRating r ON r.coin_id = c.name
GROUP BY c.name,r.social,c.description
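Note that if a coin has several ratings, grouping by r.social still returns one row per distinct rating, and joining two one-to-many tables at once can inflate SUM(p.price). A sketch that aggregates each child table before joining avoids both issues (names taken from the question's schema):
SELECT c.name,
       r.social,
       c.description,
       p.price
FROM cryptomodels_coin c
LEFT JOIN (
    SELECT coin_id, SUM(price) AS price
    FROM cryptomodels_coinprice
    GROUP BY coin_id
) p ON p.coin_id = c.name
LEFT JOIN (
    SELECT coin_id, MAX(social) AS social
    FROM cryptomodels_coinrating
    GROUP BY coin_id
) r ON r.coin_id = c.name;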

SELECT LAST_INSERT_ID() returns 0 after using prepared statement

I am using MySQL and a prepared statement to insert a BLOB record (a JPEG image). After executing the prepared statement, I issue SELECT LAST_INSERT_ID() and it returns 0.
In my code I put a breakpoint after the execute command, and in a MySQL command-line (monitor) window I issue SELECT LAST_INSERT_ID(); it returns zero.
In the same MySQL window, I issue an SQL statement to select all the IDs; the last (only) inserted ID is 1 (one).
I am using:
Server version: 5.1.46-community
MySQL Community Server (GPL)
Visual Studio 2008, version 9.
MySQL Connector C++ 1.0.5
My table description:
mysql> describe picture_image_data;
+---------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+------------------+------+-----+---------+----------------+
| ID_Image_Data | int(10) unsigned | NO | PRI | NULL | auto_increment |
| Image_Data | mediumblob | YES | | NULL | |
+---------------+------------------+------+-----+---------+----------------+
2 rows in set (0.19 sec)
Results using MySQL monitor:
mysql> select ID_Image_Data
-> from picture_image_data;
+---------------+
| ID_Image_Data |
+---------------+
| 1 |
+---------------+
1 row in set (0.00 sec)
mysql> SELECT LAST_INSERT_ID();
+------------------+
| LAST_INSERT_ID() |
+------------------+
| 0 |
+------------------+
1 row in set (0.00 sec)
The Prepared statement:
INSERT INTO Picture_Image_Data (ID_Image_Data, Image_Data)
VALUES (?, ?);
The results above show that the ID_Image_Data field is one, but the LAST_INSERT_ID is zero. The table is created before this statement is executed, so this is the only record in the table.
Edit:
I am setting the ID field to zero and the next field to a C++ std::istream *. According to the MySQL Manual Page for LAST_INSERT_ID():
The value of LAST_INSERT_ID() is not changed if you set the AUTO_INCREMENT column of a row to a non-“magic” value (that is, a value that is not NULL and not 0).
LAST_INSERT_ID() should return 1, since the ID value in the prepared statement is 0.
Do I need to make a prepared statement for fetching LAST_INSERT_ID()?
(Searching the web suggested using a custom MySQL API, but that used PHP, and further comments said it needs another connection.)
LAST_INSERT_ID() only works for an auto-generated primary key, i.e. a value generated by an AUTO_INCREMENT field. In your case, it looks like you are supplying the ID explicitly, so the last insert ID is not set.
This is explicit:
mysql> insert into test (id, name) VALUES (5, 'test 2');
Query OK, 1 row affected (0.00 sec)
mysql> SELECT LAST_INSERT_ID();
+------------------+
| LAST_INSERT_ID() |
+------------------+
| 0 |
+------------------+
1 row in set (0.00 sec)
This is implicit:
mysql> insert into test (name) values ('test');
Query OK, 1 row affected (0.00 sec)
mysql> SELECT LAST_INSERT_ID();
+------------------+
| LAST_INSERT_ID() |
+------------------+
| 3 |
+------------------+
1 row in set (0.00 sec)
Looks like the issue is with the MySQL C++ Connector:
LAST_INSERT_ID() returns 0 when the ID field is NULL (explicit insert).
LAST_INSERT_ID() returns 0 when the ID field is not specified (implicit insert).
I tried inserting the BLOB (JPEG image) from the command line (monitor), and it works:
mysql> describe picture_image_data;
+---------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+------------------+------+-----+---------+----------------+
| ID_Image_Data | int(10) unsigned | NO | PRI | NULL | auto_increment |
| Image_Data | mediumblob | YES | | NULL | |
+---------------+------------------+------+-----+---------+----------------+
2 rows in set (0.03 sec)
mysql> PREPARE blob_stmt
-> FROM 'INSERT INTO picture_image_data (ID_Image_Data, Image_Data) VALUES (?, LOAD_FILE(?))';
Query OK, 0 rows affected (0.02 sec)
Statement prepared
mysql> SET @id = 0;
Query OK, 0 rows affected (0.00 sec)
mysql> SET @c = 'KY_hot_brown_picture.jpg';
Query OK, 0 rows affected (0.00 sec)
mysql> EXECUTE blob_stmt USING @id, @c;
Query OK, 1 row affected (0.15 sec)
mysql> SELECT LAST_INSERT_ID();
+------------------+
| LAST_INSERT_ID() |
+------------------+
| 2 |
+------------------+
1 row in set (0.00 sec)
For now, the workaround is to use SQL statements to insert the BLOB with a prepared statement rather than the MySQL C++ Connector API.
You cannot expect issuing SELECT LAST_INSERT_ID() in the mysql command-line client to give you the last inserted value that was generated in your C++ code.
LAST_INSERT_ID() gives you the last auto-generated ID on the given connection, and your C++ program and the mysql client obviously use two different connections to the MySQL server.
(LAST_INSERT_ID() should also be issued right after the INSERT; don't run other SQL statements in between the INSERT and the SELECT LAST_INSERT_ID().)
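In other words, the reliable pattern is an implicit insert followed immediately by the lookup, both on the same connection (reusing the file from the example above):
INSERT INTO picture_image_data (Image_Data)
VALUES (LOAD_FILE('KY_hot_brown_picture.jpg'));
SELECT LAST_INSERT_ID(); -- same connection, immediately after the INSERT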