Oracle Apex - REST data source - nested JSON array - sync two tables - where to write SQL

This question is a follow-up to another SO question.
Summary: I have an API returning a nested JSON array. Data is being extracted via APEX REST Data Sources. The Row Selector in the Data Profile is set to "." (to select the "root node").
The lines array has been manually added to the Data Profile as a column (LINES), with the data type set to JSON Document and lines used as the selector.
SAMPLE JSON RESPONSE FROM API
[ {
"order_number": "so1223",
"order_date": "2022-07-01",
"full_name": "Carny Coulter",
"email": "ccoulter2#ovh.net",
"credit_card": "3545556133694494",
"city": "Myhiya",
"state": "CA",
"zip_code": "12345",
"lines": [
{
"product": "Beans - Fava, Canned",
"quantity": 1,
"price": 1.99
},
{
"product": "Edible Flower - Mixed",
"quantity": 1,
"price": 1.50
}
]
},
{
"order_number": "so2244",
"order_date": "2022-12-28",
"full_name": "Liam Shawcross",
"email": "lshawcross5#exblog.jp",
"credit_card": "6331104669953298",
"city": "Humaitá",
"state": "NY",
"zip_code": "98670",
"lines": [
{
"order_id": 5,
"product": "Beans - Green",
"quantity": 2,
"price": 4.33
},
{
"order_id": 1,
"product": "Grapefruit - Pink",
"quantity": 5,
"price": 5.00
}
]
}
]
The order attributes have been synchronized to a local table (table name: SOTEST_LOCAL).
The table has the correct data, and the LINES column contains the JSON array.
I then created an ORDER_LINES child table to extract the JSON from the LINES column of the SOTEST_LOCAL table. (Sorry for the table names; I should've named them ORDERS_LOCAL and ORDER_LINES_LOCAL.)
CREATE TABLE "SOTEST_ORDER_LINES_LOCAL"
( "LINE_ID" NUMBER,
"ORDER_ID" NUMBER,
"LINE_NUMBER" NUMBER,
"PRODUCT" VARCHAR2(200) COLLATE "USING_NLS_COMP",
"QUANTITY" NUMBER,
"PRICE" NUMBER,
CONSTRAINT "SOTEST_ORDER_LINES_LOCAL_PK" PRIMARY KEY ("LINE_ID")
USING INDEX ENABLE
) DEFAULT COLLATION "USING_NLS_COMP"
/
ALTER TABLE "SOTEST_ORDER_LINES_LOCAL" ADD CONSTRAINT "SOTEST_ORDER_LINES_LOCAL_FK" FOREIGN KEY ("ORDER_ID")
REFERENCES "SOTEST_LOCAL" ("ORDER_ID") ON DELETE CASCADE ENABLE
/
QuickSQL version:
SOTEST_ORDER_LINES_LOCAL
LINE_ID /pk
ORDER_ID /fk SOTEST_LOCAL references ORDER_ID
LINE_NUMBER
PRODUCT
QUANTITY
PRICE
So per Carsten's answer in the previous question, I can write SQL to extract the JSON array from the LINES column in the SOTEST_LOCAL table to the child table SOTEST_ORDER_LINES_LOCAL.
My question has two parts.
Where exactly do I write the SQL? Would I write it in SQL Workshop, in SQL Commands?
The REST Data Source is synchronized on a schedule that makes a request every hour. So would I need to write a function that runs every time new data is merged?

There are multiple options for this:
Create a trigger on the local synchronization table
You could create a trigger on your ORDERS table which runs AFTER INSERT, UPDATE or DELETE and maintains the LINES table. The nice thing about this approach is that the maintenance of the child table is independent of APEX or the REST synchronization; it would also work if you just inserted rows with plain SQL*Plus.
Here's some pseudo-code showing what the trigger could look like.
create or replace trigger tr_maintain_lines
after insert or update or delete on SOTEST_LOCAL
for each row
begin
if inserting then
insert into SOTEST_ORDER_LINES_LOCAL ( order_id, line_id, line_number, product, quantity, price)
( select :new.order_id,
seq_lines.nextval,
j.line#,
j.product,
j.quantity,
j.price
from json_table(
:new.lines,
'$[*]' columns (
line# for ordinality,
product varchar2(255) path '$.product',
quantity number path '$.quantity',
price number path '$.price' ) ) j );
elsif deleting then
delete SOTEST_ORDER_LINES_LOCAL
where order_id = :old.order_id;
elsif updating then
--
-- handle the update case here.
-- I would simply delete and re-insert the LINES rows.
null;
end if;
end;
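Note that the insert above references seq_lines.nextval; if that sequence does not exist yet, create it first (a one-line sketch; an identity column on LINE_ID would work just as well):
create sequence seq_lines;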
Handle child table maintenance in APEX itself.
You could turn off the schedule of your REST Source synchronization and have it run only when called with APEX_REST_SOURCE_SYNC.SYNCHRONIZE_DATA (https://docs.oracle.com/en/database/oracle/apex/22.1/aeapi/SYNCHRONIZE_DATA-Procedure.html#GUID-660DE4D1-4BAF-405A-A871-6B8C201969C9).
Then create an APEX Automation which runs on your desired schedule and has two Actions: one would be the REST Source Synchronization, the other would call PL/SQL code to maintain the child tables.
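For the second action, a minimal PL/SQL sketch (assuming the table and column names from the DDL above and a sequence named seq_lines; the simplest approach is to rebuild the child table from the freshly synchronized parent rows):
begin
    -- wipe and rebuild the child table from the synchronized parent rows
    delete from SOTEST_ORDER_LINES_LOCAL;

    insert into SOTEST_ORDER_LINES_LOCAL
        ( line_id, order_id, line_number, product, quantity, price )
    select seq_lines.nextval,
           o.order_id,
           j.line#,
           j.product,
           j.quantity,
           j.price
    from   SOTEST_LOCAL o,
           json_table( o.lines, '$[*]'
               columns ( line#    for ordinality,
                         product  varchar2(200) path '$.product',
                         quantity number        path '$.quantity',
                         price    number        path '$.price' ) ) j;
end;
A delete-and-reinsert of the whole child table is crude but keeps the logic simple; an incremental MERGE is also possible if the tables get large.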
Have a look at this blog posting, which talks a bit about more complex synchronization scenarios (although it doesn't exactly fit this one): https://blogs.oracle.com/apex/post/synchronize-parent-child-rest-sources
I hope this helps.

Related

Oracle Apex 22.21 - REST data source - trigger to convert string to_number and add into a column

This question is a follow-up to another SO question.
I have an app with a REST Data Source taking in JSON responses from an API. There are two tables ORDERS and LINES. The ORDERS table contains a column LINES which is a JSON Document type.
I created a trigger on my ORDERS table which runs AFTER INSERT, UPDATE or DELETE on the ORDERS table, and which maintains the LINES table (it extracts the JSON array from the LINES column of the ORDERS table and inserts each array element as a row into the LINES table).
SAMPLE JSON RESPONSE FROM API
[ {
"order_number": "so1223",
"order_date": "2022-07-01",
"full_name": "Carny Coulter",
"email": "ccoulter2#ovh.net",
"credit_card": "3545556133694494",
"city": "Myhiya",
"state": "CA",
"zip_code": "12345",
"lines": [
{
"product": "Beans - Fava, Canned",
"quantity": 1,
"price": $1.99
},
{
"product": "Edible Flower - Mixed",
"quantity": 1,
"price": $1.50
}
]
},
{
"order_number": "so2244",
"order_date": "2022-12-28",
"full_name": "Liam Shawcross",
"email": "lshawcross5#exblog.jp",
"credit_card": "6331104669953298",
"city": "Humaitá",
"state": "NY",
"zip_code": "98670",
"lines": [
{
"order_id": 5,
"product": "Beans - Green",
"quantity": 2,
"price": $4.33
},
{
"order_id": 1,
"product": "Grapefruit - Pink",
"quantity": 5,
"price": $5.00
}
]
}
]
create or replace trigger "TR_MAINTAIN_LINES"
AFTER
insert or update on "ORDERS"
for each row
begin
if inserting or updating then
if updating then
delete LINES
where order_number = :old.order_number;
end if;
insert into LINES ( order_number, product, quantity, price, item_image)
( select :new.order_number,
j.product,
j.quantity,
j.price,
j.item_image
from json_table(
:new.lines,
'$[*]' columns (
product varchar2(4000) path '$.product',
quantity number path '$.quantity',
price varchar2(4000) path '$.price',
item_image varchar2(4000) path '$.item_image' ) ) j );
end if;
end;
So this works as expected right now. However, the PRICE column is a varchar type due to the '$' in the JSON response (e.g. $1.99). I'm able to change this to a number type in a view by using TO_NUMBER(price,'$999,999,999.99').
How can I do this within the trigger?
Have the trigger, after insert or update, convert the PRICE value (varchar) into another column TOTAL_PRICE (number).
I also want to add another column UNIT_PRICE which takes TOTAL_PRICE/QUANTITY.
Are these possible within the given trigger or would I have to create a separate trigger?
For column TOTAL_PRICE
Yes, this can be done in the trigger. Add the column to the LINES table and modify the insert statement:
...
insert into LINES ( order_number, product, quantity, price, total_price, item_image)
( select :new.order_number,
j.product,
j.quantity,
j.price,
TO_NUMBER(j.price,'$999,999,999.99'),
j.item_image
from json_table(
:new.lines,
'$[*]' columns (
product varchar2(4000) path '$.product',
quantity number path '$.quantity',
price varchar2(4000) path '$.price',
item_image varchar2(4000) path '$.item_image' ) ) j );
...
For column UNIT_PRICE
This can be done in the trigger as well, but it makes more sense to add a "virtual column" here, since the unit price is always TOTAL_PRICE/QUANTITY. A virtual column cannot be updated or inserted, only selected, and its value is always up to date:
alter table lines add unit_price number generated always as ( total_price / quantity ) virtual;
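A quick check (a sketch, assuming the LINES columns used above):
-- unit_price is computed as total_price / quantity on every read; it cannot be inserted or updated
select order_number, product, quantity, total_price, unit_price
from   lines;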

Oracle Apex - how to calculate the sum of prices matching order id on one table and insert to another table

This question is a follow-up to another SO question.
This is for an e-commerce platform to show sales reports to the end user. I have a table ORDERS and a table ORDER_ITEMS. The data is received through a REST API as a JSON response. The JSON is synced through an APEX REST Data Source and added to the local tables via a PL/SQL trigger.
Sample JSON response
[
{
"order_id": "ZCHI8-N9Zm-VJ1o",
"order_number": 89653,
"order_date": "2023-01-01",
"store_id": 1,
"full_name": "Amalee Planque",
"email": "aplanque0#shinystat.com",
"city": "Houston",
"state": "Texas",
"zip_code": "77040",
"credit_card": "5048378382993155",
"order_items": [
{
"line_number": 1,
"product_id": 4,
"quantity": 1,
"price": 3919.8
},
{
"line_number": 2,
"product_id": 6,
"quantity": 1,
"price": 3089.36
},
{
"line_number": 3,
"product_id": 1,
"quantity": 1,
"price": 3474.4
}
]
},
{
"order_id": "XZnQx-PwYR-zWy2",
"order_number": 37946,
"order_date": "2022-01-29",
"store_id": 2,
"full_name": "Marillin Gadie",
"email": "mgadie1#comsenz.com",
"city": "Cleveland",
"state": "Ohio",
"zip_code": "44191",
"credit_card": "5108757233070957",
"order_items": [
{
"line_number": 1,
"product_id": 5,
"quantity": 1,
"price": 3184.37
}
]
}
]
Trigger to insert the order_items to the ORDER_ITEMS table
create or replace trigger "TR_MAINTAIN_LINES"
AFTER
insert or update or delete on "ORDERS_LOCAL"
for each row
begin
if inserting or updating then
if updating then
delete ORDER_ITEMS_LOCAL
where order_id = :old.order_id;
end if;
insert into ORDER_ITEMS_LOCAL ( order_id, line_id, line_number, product_id, quantity, price)
( select :new.order_id,
seq_line_id.nextval,
j.line_number,
j.product_id,
j.quantity,
j.price
from json_table(
:new.order_items,
'$[*]' columns (
line_id for ordinality,
line_number number path '$.line_number',
product_id number path '$.product_id',
quantity number path '$.quantity',
price number path '$.price' ) ) j );
elsif deleting then
delete ORDER_ITEMS_LOCAL
where order_id = :old.order_id;
end if;
end;
The ORDERS table contains all the order fields (order_id, order_number, etc.) It also contains order_items as a JSON array. (which gets extracted into the ORDER_ITEMS table by the trigger)
The ORDER_ITEMS table contains all the order item fields (line_number, quantity, price). It also contains the order_id to reference which order the line item is referring to.
I need to add an ORDER_TAX column to the ORDERS table, which:
grabs the order items from the ORDER_ITEMS table that have the same order_id,
adds up all the price columns to get the total,
multiplies the total by 0.15 to simulate the estimated sales tax, and
inserts that 'sales tax' into the ORDER_TAX column of the ORDERS table.
I've created the ORDER_TAX column. I think I need to create a new trigger for this but I'm not quite sure how to code it. I also need to add an ORDER_TOTAL column to the ORDERS table, but I think I can figure that out once someone helps me with this initial question.
-----------UPDATE--------------
Per Koen's comment, it seems I need to actually create a view for this instead of a trigger.
The SQL query below returns the expected results
select ORDER_ID, SUM(PRICE) * 0.15 AS ORDER_TAX
from ORDER_ITEMS_LOCAL
GROUP BY ORDER_ID
How do I create a view so it inserts the value into the ORDERS table, ORDER_TAX column?
Per Koen's comment, I've been able to figure this out.
Go to Object Browser > Create View and use the following query:
select
ORDERS_LOCAL.*,
A2.TAX,
A2.SUBTOTAL,
A2.TOTAL
from ORDERS_LOCAL
inner join (
select
ORDER_ID,
cast(sum(price) *0.15 as decimal(10,2)) as tax,
sum(price) as subtotal,
cast(sum(price) *1.15 as decimal(10,2)) as total
from ORDER_ITEMS_LOCAL
group by ORDER_ID
) A2
ON ORDERS_LOCAL.ORDER_ID=A2.ORDER_ID;
I then created a Master Detail report with the view as the source.
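For example, to sanity-check a single order against the new view (a sketch; the view name depends on what you entered in the wizard, I'm calling it ORDERS_TOTALS_V here):
select order_id, subtotal, tax, total
from   ORDERS_TOTALS_V
where  order_id = 'ZCHI8-N9Zm-VJ1o';
The same SELECT can also be wrapped in a "create or replace view ORDERS_TOTALS_V as ..." statement in SQL Workshop > SQL Commands if you prefer DDL over the Object Browser wizard.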

How to query AWS DynamoDB using multiple Indexes?

I have an AWS DynamoDb cart table with the following item structure -
{
"cart_id": "5e4d0f9f-f08c-45ae-986a-f1b5ac7b7c13",
"user_id": 1234,
"type": "OTHER",
"currency": "INR",
"created_date": 132432423,
"expiry": 132432425,
"total_amount": 90000,
"total_quantity": 2,
"items": [
{
"amount": 90000,
"category": "Laptops",
"name": "Apple MacBook Pro",
"quantity": 1
}
]
}
-
{
"cart_id": "12340f9f-f08c-45ae-986a-f1b5ac7b1234",
"user_id": 1234,
"type": "SPECIAL",
"currency": "INR",
"created_date": 132432423,
"expiry": 132432425,
"total_amount": 1000,
"total_quantity": 2,
"items": [
{
"amount": 1000,
"category": "Special",
"name": "Special Item",
"quantity": 1
}
]
}
The table will have cart_id as Primary key,
user_id as an Index or GSI,
type as an Index or GSI.
I want to be able to query the cart table,
to find the items which have user_id = 1234 AND type != "SPECIAL".
I don't know if this would translate into the following for the query -
--key-condition-expression "user_id = 1234 AND type != 'SPECIAL'"
I understand that an AWS DynamoDB table cannot be queried using multiple indexes at the same time.
I came across the following question, which has a similar use case, and the answer recommends creating a composite key:
Querying with multiple local Secondary Index Dynamodb
Does it mean that while putting a new item in the table,
I will need to maintain another column like user_id_type,
with its value as 1234SPECIAL and create an Index / GSI for user_id_type ?
Sample item structure -
{
"cart_id": "5e4d0f9f-f08c-45ae-986a-f1b5ac7b7c13",
"user_id": 1234,
"type": "OTHER",
"user_id_type" : "1234OTHER",
"currency": "INR",
"created_date": 132432423,
"expiry": 132432425,
"total_amount": 90000,
"total_quantity": 2,
"items": [
{
"amount": 90000,
"category": "Laptops",
"name": "Apple MacBook Pro",
"quantity": 1
}
]
}
References -
1. Querying with multiple local Secondary Index Dynamodb
2. Is there a way to query multiple hash keys in DynamoDB?
Your assumption is correct. You could add a delimiter between the two values (field1_field2), or hash them if either of them is too big (hashOfField1_hashOfField2).
That means spending some more processing power on your side, however, as DynamoDB does not natively support it.
Composite key in DynamoDB with more than 2 columns?
Dynamodb: query using more than two attributes
Additional info on your use case
A KeyConditionExpression only supports equality on the hash key (plus a limited set of comparators on the range key), so the not-equal condition cannot go there.
You can put it in the FilterExpression instead.
Why is there no not equal comparison in DynamoDB queries?
Does it mean that while putting a new item in the table,
I will need to maintain another column like user_id_type,
with its value as 1234SPECIAL and create an Index / GSI for user_id_type?
The answer is: it depends on how many columns (DynamoDB is schema-less; by a column I mean a data field) you need, and whether you are happy with two round trips to the DB.
your query:
user_id = 1234 AND type != "SPECIAL"
1 - If you need all the information in the cart but are happy with two round trips:
Solution: Create a GSI with user_id (HASH) and type (RANGE), then add cart_id (the base table hash key) as a projection.
Explanation: you need one query on the index to get the cart_id(s) for the given user_id, with the type condition moved into a filter expression (a key condition cannot contain !=, as noted earlier) - see the boto3 sketch after this list:
--key-condition-expression "user_id = :user_id" --filter-expression "#type <> :type"
then you use the cart_id(s) from the result and make another query against the base table.
2 - If you do not need all of the cart information:
Solution: create a GSI with user_id as HASH and type as RANGE, and add the extra columns you need to the projection.
Explanation: the projection is the set of additional columns you want to have in your index. So add the extra columns that are most likely to be needed in the query result, to avoid the extra round trip to the base table.
Note: adding too many extra columns can double your costs, as any update on the base table results in updates to the projected fields of the GSI.
3 - If you want just one round trip and you need all the data:
Then you need to manage it yourself, and your suggestion (maintaining a combined user_id_type attribute with its own GSI) can be applied.
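A minimal boto3 sketch of options 1/2 (the table name, index name and attribute names are assumptions); note that the not-equal check has to go into a FilterExpression, as discussed above:
import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("cart")             # table name is an assumption

resp = table.query(
    IndexName="user_id_type_index",                          # assumed GSI with user_id as HASH key
    KeyConditionExpression=Key("user_id").eq(1234),
    FilterExpression=Attr("type").ne("SPECIAL"),             # != is only allowed in the filter
)
items = resp["Items"]
Keep in mind that the filter runs after the key condition, so the SPECIAL items that get filtered out still consume read capacity.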
One possible answer is to create a single GSI with user_id as the partition key and type as the sort key. The not-equal comparison still cannot go into the KeyConditionExpression, so it has to be a FilterExpression (and type is a reserved word, so it needs an expression attribute name):
{
TableName: "...",
IndexName: "UserIdAndTypeIndex",
KeyConditionExpression: "user_id = :user_id",
FilterExpression: "#type <> :type",
ExpressionAttributeNames: {
"#type": "type"
},
ExpressionAttributeValues: {
":user_id": 1234,
":type": "SPECIAL"
}
}
You can build a GraphQL schema with AWS AppSync from your DynamoDB table and then query it in your app with GraphQL.

Is it possible in BigQuery to assign more than 10 thousand parameters to an `IN` query?

I defined following schema in BigQuery
[
{
"mode": "REQUIRED",
"name": "customer_id",
"type": "STRING"
},
{
"mode": "REPEATED",
"name": "segments",
"type": "RECORD",
"fields": [
{
"mode": "REQUIRED",
"name": "segment_id",
"type": "STRING"
}
]
}
]
I'm trying to insert a new segment_id for specific customer ids, something like this:
#standardSQL
UPDATE `sample-project.customer_segments.segments`
SET segments = ARRAY(
SELECT segment FROM UNNEST(segments) AS segment
UNION ALL
SELECT STRUCT('NEW_SEGMENT')
)
WHERE customer_id IN ('0000000000', '0000000001', '0000000002')
Is it possible to assign more than 10 thousand customer_ids to an IN query in BigQuery?
Is it possible to assign more than 10 thousand customer_ids to an IN query in BigQuery?
Assuming (based on the example in your question) that the length of a customer_id is around 10 chars, plus three chars for the apostrophes and the comma, you will end up with roughly an extra 130 KB, which is within the 256 KB limit (see more in Quotas & Limits).
So you should be fine with 10K, and you can easily work out the upper bound: at roughly 13 bytes per id, 256 KB allows on the order of 19K ids (minus whatever the rest of the statement takes), so the limit will be around 19K.
Just to clarify, I meant the limitations below (mostly the first one):
Maximum unresolved query length — 256 KB
Maximum resolved query length — 12 MB
When working with a long list of possible values, it's a good idea to use a query parameter instead of inlining the entire list into the query, assuming you are working with the command line client or API. For example,
#standardSQL
UPDATE `sample-project.customer_segments.segments`
SET segments = ARRAY(
SELECT segment FROM UNNEST(segments) AS segment
UNION ALL
SELECT STRUCT('NEW_SEGMENT')
)
WHERE customer_id IN UNNEST(@customer_ids)
Here you would create a query parameter of type ARRAY<STRING> containing the customer IDs.
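For example, with the Python client library (a sketch; it reuses the query and project/dataset names from above):
from google.cloud import bigquery

client = bigquery.Client()

sql = """
UPDATE `sample-project.customer_segments.segments`
SET segments = ARRAY(
  SELECT segment FROM UNNEST(segments) AS segment
  UNION ALL
  SELECT STRUCT('NEW_SEGMENT')
)
WHERE customer_id IN UNNEST(@customer_ids)
"""

customer_ids = ["0000000000", "0000000001", "0000000002"]   # could be tens of thousands of ids

job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ArrayQueryParameter("customer_ids", "STRING", customer_ids)
    ]
)
client.query(sql, job_config=job_config).result()           # wait for the DML to finish
Because the ids are sent as a bound parameter rather than inlined, they do not count toward the query text length limits quoted above.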

Handling missing and new fields in tableSchema of BigQuery in Google Cloud Dataflow

Here is the situation:
My BigQuery TableSchema is as follows:
{
"name": "Id",
"type": "INTEGER",
"mode": "nullable"
},
{
"name": "Address",
"type": "RECORD",
"mode": "repeated",
"fields":[
{
"name": "Street",
"type": "STRING",
"mode": "nullable"
},
{
"name": "City",
"type": "STRING",
"mode": "nullable"
}
]
}
I am reading data from a Google Cloud Storage bucket and writing it into BigQuery using a Cloud Function.
I have defined TableSchema in my cloud function as:
table_schema = bigquery.TableSchema()
Id_schema = bigquery.TableFieldSchema()
Id_schema.name = 'Id'
Id_schema.type = 'INTEGER'
Id_schema.mode = 'nullable'
table_schema.fields.append(Id_schema)
Address_schema = bigquery.TableFieldSchema()
Address_schema.name = 'Address'
Address_schema.type = 'RECORD'
Address_schema.mode = 'repeated'
Street_schema = bigquery.TableFieldSchema()
Street_schema.name = 'Street'
Street_schema.type = 'STRING'
Street_schema.mode = 'nullable'
Address_schema.fields.append(Street_schema)
City_schema = bigquery.TableFieldSchema()
City_schema.name = 'City'
City_schema.type = 'STRING'
City_schema.mode = 'nullable'
Address_schema.fields.append(City_schema)
table_schema.fields.append(Address_schema)
My data file looks like this: (each row is json)
{"Id": 1, "Address": {"Street":"MG Road","City":"Pune"}}
{"Id": 2, "Address": {"City":"Mumbai"}}
{"Id": 3, "Address": {"Street":"XYZ Road"}}
{"Id": 4}
{"Id": 5, "PhoneNumber": 12345678, "Address": {"Street":"ABCD Road", "City":"Bangalore"}}
Question:
How can I handle when the incoming data has some missing keys?
e.g.,
On row #2 of the data "Street" is missing
On row #3 of the data "City" is missing
On row #4 of the data "Address" is missing
On row #5 of the data "PhoneNumber" shows up..
Question 1: How to handle WriteToBigQuery if data is missing (e.g., rows #2, #3, #4)?
Question 2: How to handle if a new field shows up in the data?
e.g.,
On row #5 "PhoneNumber" shows up..
How can I add a new column in BigQuery table on the fly?
(Do I have to define the BigQuery table schema exhaustively enough at first in order to accommodate such newly added fields?)
Question 3: How can I iterate through each row (while reading data file) of the incoming data file and determine which fields to parse?
One option for you: instead of struggling with schema changes, I would recommend writing your data into a table with just one field, line, of type STRING, and applying the schema logic on the fly at query time.
The example below is for BigQuery Standard SQL and shows how to apply a schema on the fly against a table with the whole row in one field.
#standardSQL
WITH t AS (
SELECT '{"Id": 1, "Address": {"Street":"MG Road","City":"Pune"}}' line UNION ALL
SELECT '{"Id": 2, "Address": {"City":"Mumbai"}}' UNION ALL
SELECT '{"Id": 3, "Address": {"Street":"XYZ Road"}}' UNION ALL
SELECT '{"Id": 4} ' UNION ALL
SELECT '{"Id": 5, "PhoneNumber": 12345678, "Address": {"Street":"ABCD Road", "City":"Bangalore"}}'
)
SELECT
JSON_EXTRACT_SCALAR(line, '$.Id') id,
JSON_EXTRACT_SCALAR(line, '$.PhoneNumber') PhoneNumber,
JSON_EXTRACT_SCALAR(line, '$.Address.Street') Street,
JSON_EXTRACT_SCALAR(line, '$.Address.City') City
FROM t
with the result as below:
Row id PhoneNumber Street City
1 1 null MG Road Pune
2 2 null null Mumbai
3 3 null XYZ Road null
4 4 null null null
5 5 12345678 ABCD Road Bangalore
I think this approach answers/addresses all four of your questions.
Question: How can I handle when the incoming data has some missing keys?
Question 1: How to handle WriteToBigQuery if data is missing (e.g., rows #2, #3, #4)?
Question 2: How to handle if a new field shows up in the data?
I recommend decoding the JSON string to some data structure, for example a custom Contact class, where you can access and manipulate member variables and define which members are optional and which are required. Using a custom class gives you a level of abstraction so that downstream transforms in the pipeline don't need to worry about how to manipulate JSON. A downstream transform can be implemented to build a TableRow from a Contact object and also adhere to the BigQuery table schema. This design follows general abstraction and separation of concerns principles and is able to handle all scenarios of missing or additional fields.
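A minimal Python sketch of that idea (the class names, field choices and helper names here are assumptions, not part of your pipeline):
import json
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Address:
    street: Optional[str] = None          # optional: may be missing in the input
    city: Optional[str] = None

@dataclass
class Contact:
    id: int                               # required field
    addresses: List[Address]              # REPEATED record in the BigQuery schema
    extras: dict                          # unexpected keys (e.g. PhoneNumber) are kept here

def parse_contact(line: str) -> Contact:
    raw = json.loads(line)
    addr = raw.pop("Address", None)
    return Contact(
        id=raw.pop("Id"),
        addresses=[Address(street=addr.get("Street"), city=addr.get("City"))] if addr else [],
        extras=raw,                       # leftover keys such as PhoneNumber
    )

def to_table_row(contact: Contact) -> dict:
    # shape matching the table schema above; unknown fields are simply dropped here
    return {
        "Id": contact.id,
        "Address": [{"Street": a.street, "City": a.city} for a in contact.addresses],
    }
A downstream transform (for example a beam.Map that applies to_table_row(parse_contact(line))) can then hand these dictionaries to WriteToBigQuery.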
Question 3: How can I iterate through each row (while reading data file) of the incoming data file and determine which fields to parse?
Dataflow's execution of the pipeline does this automatically. If the pipeline reads from Google Cloud Storage (using TextIO for example), then Dataflow will process each line of the file as an individual element (individual JSON string). Determining which fields to parse is a detail of the business logic and can be defined in a transform which parses the JSON string.