Oracle Apex 22.21 - REST data source - nested JSON array - discovery - oracle-apex

I need to get an APEX REST Data Source to parse my JSON, which contains a nested array. I've read that nested JSON arrays are not supported, but there must be a way.
I have a REST API that returns data as JSON as shown below. In APEX, I've created a REST Data Source following the tutorial on this Oracle blog link.
However, Auto-Discovery does not 'discover' the nested array. It only returns the root-level data.
[
  {
    "order_number": "so1223",
    "order_date": "2022-07-01",
    "full_name": "Carny Coulter",
    "email": "ccoulter2#ovh.net",
    "credit_card": "3545556133694494",
    "city": "Myhiya",
    "state": "CA",
    "zip_code": "12345",
    "lines": [
      {
        "product": "Beans - Fava, Canned",
        "quantity": 1,
        "price": 1.99
      },
      {
        "product": "Edible Flower - Mixed",
        "quantity": 1,
        "price": 1.50
      }
    ]
  },
  {
    "order_number": "so2244",
    "order_date": "2022-12-28",
    "full_name": "Liam Shawcross",
    "email": "lshawcross5#exblog.jp",
    "credit_card": "6331104669953298",
    "city": "Humaitá",
    "state": "NY",
    "zip_code": "98670",
    "lines": [
      {
        "order_id": 5,
        "product": "Beans - Green",
        "quantity": 2,
        "price": 4.33
      },
      {
        "order_id": 1,
        "product": "Grapefruit - Pink",
        "quantity": 5,
        "price": 5.00
      }
    ]
  }
]
So in the JSON above, it only 'discovers' the attributes order_number through zip_code. The 'lines' array, with attributes order_id, product, quantity, and price, does not get 'discovered'.
I found this SO question in which Carsten explains how to create the REST Data Source manually. I've tried changing the Row Selector to "." (a dot) and leaving it blank. That still returns only the root-level data.
Changing the Row Selector to 'lines' returns only one element from each 'lines' array.
So in the JSON example above, it would only 'discover':
{
  "product": "Beans - Fava, Canned",
  "quantity": 1,
  "price": 1.99
}
{
  "order_id": 5,
  "product": "Beans - Green",
  "quantity": 2,
  "price": 4.33
}
and not the complete arrays.
This is how the Data Profile is set up when creating the Data Source manually.
There's another SO question describing a similar situation, so I followed some of its steps, such as setting the data type for 'lines' to JSON Document. I feel I've tried almost every selector and data type, but obviously not enough.
The docs are not very helpful on this subject, and it's been difficult finding anything on Google, Oracle blogs, or SO.
My end goal is to have the two tables below automatically synchronizing from the API.
orders
  id            pk
  order_number  num
  order_date    date
  full_name     vc(200)
  email         vc(200)
  credit_card   num
  city          vc(200)
  state         vc(200)
  zip_code      num

lines
  order_id      fk -> orders
  product       vc(200)
  quantity      num
  price         num

view orders_view (orders joined with lines)

As you're correctly stating, REST Data Sources do not support nested arrays - a REST Source can only "extract" one flat table from the JSON response. In your example, the JSON as such is an array ("orders"). The Row Selector in the Data Profile would thus be "." (to select the "root node").
That gives you all the order attributes, but discovery will skip the lines array. However, you can manually add a column to the Data Profile, of the JSON Document data type, with lines as the selector.
As a result, you'd still get a flat table from the REST Data Source, but that table contains a LINES column holding the "JSON fragment" with the order line items. You could then synchronize the REST Source to a local table ("REST Synchronization") and use some custom code to extract the JSON fragments into an ORDER_LINES child table.
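For that last step, a minimal sketch of what the custom code could look like, assuming (hypothetically) that the REST Source synchronizes into a table called ORDERS_LOCAL and the child table is ORDER_LINES - adjust the names, keys and JSON paths to your actual schema:

insert into order_lines (order_number, product, quantity, price)
select o.order_number,
       j.product,
       j.quantity,
       j.price
from   orders_local o,
       json_table(
         o.lines, '$[*]'   -- LINES is the JSON Document column added to the Data Profile
         columns (
           product  varchar2(200) path '$.product',
           quantity number        path '$.quantity',
           price    number        path '$.price'
         )
       ) j;

A statement like this could run from a scheduled job, or from whatever custom code you hook in after each synchronization.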
Does that help?

Related

Group By with Django's queryset

I have a model in Django, and this is what it looks like with fewer fields -
I want to group the rows by buy_price_per_unit and, at the same time, know the total units on sale for each buy_price_per_unit.
So in our case only two distinct buy_price_per_unit values are available (9 and 10). Hence the query should return only two rows, like this -
The one last condition I have to meet is that the query result should be in descending order of buy_price_per_unit.
This is what I have tried so far -
orders = Orders.objects.values('id', 'buy_price_per_unit')\
    .annotate(units=Sum("units"))\
    .order_by("-buy_price_per_unit")
The response for the query above was -
[
    {
        "id": 13,
        "buy_price_per_unit": 10,
        "units": 1
    },
    {
        "id": 12,
        "buy_price_per_unit": 9,
        "units": 10
    },
    {
        "id": 14,
        "buy_price_per_unit": 9,
        "units": 2
    },
    {
        "id": 15,
        "buy_price_per_unit": 9,
        "units": 1
    }
]
The problem with this response is that, even for the same price, multiple records are returned.
This is happening because you have id in .values(); based on the underlying query, it groups on both id and buy_price_per_unit.
So simply remove id from .values():
orders = Orders.objects.values('buy_price_per_unit')\
    .annotate(units=Sum("units"))\
    .order_by("-buy_price_per_unit")
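With the sample response above, the grouped result should then collapse to one row per price (1 unit at 10, and 10 + 2 + 1 = 13 units at 9), roughly:

[
    {
        "buy_price_per_unit": 10,
        "units": 1
    },
    {
        "buy_price_per_unit": 9,
        "units": 13
    }
]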

How to extract more than label text items in a single annotation using Google NLP

I have created a dataset using Google NLP entity extraction and uploaded my input data (train, test, and validation JSONL files) in the NLP format to a Google Storage bucket.
Sample Annotation:
{
  "annotations": [{
    "text_extraction": {
      "text_segment": {
        "end_offset": 10,
        "start_offset": 0
      }
    },
    "display_name": "Name"
  }],
  "text_snippet": {
    "content": "JJ's Pizza\n "
  }
}
{
  "annotations": [{
    "text_extraction": {
      "text_segment": {
        "end_offset": 9,
        "start_offset": 0
      }
    },
    "display_name": "City"
  }],
  "text_snippet": {
    "content": "San Francisco\n "
  }
}
Here is the input text from which I want to predict the labels "Name", "City" and "State":
Best J J's Pizza in San Francisco, CA
The result is shown in the following screenshot.
I expect the predicted results to be the following:
Name : JJ's Pizza
City : San Francisco
State: CA
According to the sample annotation you provided, you're setting the whole text_snippet to be a Name (or whatever field you want to extract).
This can confuse the model into understanding that the entire text is that entity.
It would be better to have training data similar to the examples in the documentation: there, a large chunk of text is provided and the entities to be extracted are annotated within it.
As an example, let's say that from these text snippets I tell the model that the italic part is an entity named a, while the bold part is an entity called b:
JJ Pizza
LL Burritos
Kebab MM
Shushi NN
San Francisco
NY
Washington
Los Angeles
Then, when the model reads Best JJ Pizza, it thinks the whole thing is a single entity (we trained the model with this assumption), and it will just choose the one it matches best (in this case, it would likely say it's an a entity).
However, if I provide the following text samples (also annotated with the italic part as entity a and the bold part as entity b):
The best pizza place in San Francisco is JJ Pizza.
For a luxurious experience, do not forget to visit LL Burritos when you're around NY.
I once visited Kebab MM, but there are better options in Washington.
You can find Shushi NN in Los Angeles
You can see how you're training the model to find the entities within a piece of text, and it will try to extract them according to the context.
The important part about training the model is providing training data as similar to real-life data as possible.
In the example you provided, if the data in your real-life scenario is going to be in the format <ADJECTIVE> <NAME> <CITY>, then your training data should have that same format:
{
  "annotations": [{
    "text_extraction": {
      "text_segment": {
        "end_offset": 16,
        "start_offset": 6
      }
    },
    "display_name": "Name"
  },
  {
    "text_extraction": {
      "text_segment": {
        "end_offset": 33,
        "start_offset": 20
      }
    },
    "display_name": "City"
  }],
  "text_snippet": {
    "content": "Worst JJ's Pizza in San Francisco\n "
  }
}
Note that the point of a Natural Language ML model is to process natural language. If your inputs are going to look as similar/simple/short as that, then it might not be worth going the ML route. A simple regex should be enough. Without the natural language part, it is going to be hard to properly train a model. More details in the beginners guide.
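To illustrate that last point, a minimal regex sketch (the pattern and group names are purely illustrative, assuming inputs shaped like "<adjective> <Name> in <City>, <ST>"):

import re

# Hypothetical pattern for short inputs like "Best JJ's Pizza in San Francisco, CA".
pattern = re.compile(r"^\w+\s+(?P<name>.+?)\s+in\s+(?P<city>.+?),\s*(?P<state>[A-Z]{2})$")

match = pattern.match("Best JJ's Pizza in San Francisco, CA")
if match:
    print(match.group("name"))   # JJ's Pizza
    print(match.group("city"))   # San Francisco
    print(match.group("state"))  # CA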

AWS DynamoDB Golang issue with inserting items into table

I've been following Miguel C's tutorial on setting up a DynamoDB table in Golang, but modified my JSON to look like the example below instead of using movies. I changed the Movie struct into a Fruit struct (so there is no extra info), and in my schema I defined the partition key as "Name" and the sort key as "Price". But when I run my code it says
"ValidationException: One of the required keys was not given a value"
despite the input printing as
map[name:{
S: "bananas"
} price:{
N: "0.25"
}]
which clearly shows that the string "bananas" and the number 0.25 both have values.
My JSON looks like this:
[
    {
        "name": "bananas",
        "price": 0.25
    },
    {
        "name": "apples",
        "price": 0.50
    }
]
It was a capitalization issue: changing "name" to "Name" (to match the key schema) made it work.
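A minimal sketch of that fix, assuming the aws-sdk-go dynamodbattribute marshaller is used (struct and tag names here are illustrative): the marshalled attribute names must match the table's key schema exactly, so pin them with dynamodbav tags.

package main

import (
    "fmt"

    "github.com/aws/aws-sdk-go/service/dynamodb/dynamodbattribute"
)

// Fruit marshals to the attribute names "Name" and "Price", matching the
// table's key schema. Lower-case json-only tags would produce "name"/"price"
// and trigger the ValidationException above.
type Fruit struct {
    Name  string  `json:"name" dynamodbav:"Name"`
    Price float64 `json:"price" dynamodbav:"Price"`
}

func main() {
    item, err := dynamodbattribute.MarshalMap(Fruit{Name: "bananas", Price: 0.25})
    if err != nil {
        panic(err)
    }
    fmt.Println(item) // map keys are "Name" and "Price"
}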

PowerBI Custom Visual - Table data binding

Also asked this on the PowerBI forum.
I am trying to change the sampleBarChart PowerBI visual to use a "table" data binding instead of the current "categorical" one. The first goal is to build a simple table visual with inputs "X", "Y" and "Value".
Both data bindings are described on the official wiki. This is all I could find:
I cannot find any example visuals which use it and are based on the new API.
From the image above, a table object has "rows", "columns", "totals" and "identities". So it looks like rows and columns are my x/y indexes, and totals are my values?
This is what I tried. (Naming is slightly off as most of it came from existing barchart code)
Data roles:
{
    "displayName": "Category1 Data",
    "name": "category1",
    "kind": 0
},
{
    "displayName": "Category2 Data",
    "name": "category2",
    "kind": 0
},
{
    "displayName": "Measure Data",
    "name": "measure",
    "kind": 1
}
Data view mapping:
"table": {
"rows": {"for": {"in": "category1"}},
"columns": {"for": {"in": "category2"}},
"totals": {"select": [{"bind": {"to": "measure"}}]}
}
Data Point class:
interface BarChartDataPoint {
    value: number;
    category1: number;
    category2: number;
    color: string;
};
Relevant parts of my visualTransform():
...
let category1 = categorical.rows;
let category2 = categorical.columns;
let dataValue = categorical.totals;
...
for (let i = 1, len = category1.length; i <= len; i++) {
for (let j = 1, jlen = category2.length; j <= jlen; j++) {
{
barChartDataPoints.push({
category1: i,
category2: j,
value: dataValue[i,j],
color: "#555555"//for now
});
}
...
Test data looks like this:
__1_2_3_
1|4 4 3
2|4 5 5
3|3 6 7 (total = 41)
The code above fills barChartDataPoints with just six data points:
(1; 1; 41),
(1; 2; undefined),
(2; 1; 41),
(2; 2; undefined),
(3; 1; 41),
(3; 2; undefined).
Accessing zero indices results in nulls.
Q: Is totals not the right measure to access value at (x;y)? What am I doing wrong?
Any help or direction is very appreciated.
User @RichardL shared this link on the PowerBI forum, which helped quite a lot.
"Totals" is not the right measure to access value at (x;y).
It turns out Columns contain column names, and Rows contain value arrays which correspond to those columns.
From the link above, this is what the table structure looks like:
{
    "columns": [
        {"displayName": "Year"},
        {"displayName": "Country"},
        {"displayName": "Cost"}
    ],
    "rows": [
        [2014, "Japan", 25],
        [2015, "Japan", 30],
        [2016, "Japan", 18],
        [2015, "North America", 14],
        [2016, "North America", 30],
        [2016, "China", 100]
    ]
}
You can also view the data exactly as your visual receives it by placing
window.alert(JSON.stringify(options.dataViews))
in your update() method, or by writing it into the HTML contents of your visual.
This was very helpful, but it shows up a few fundamental problems with the data management of PowerBI for a custom visual. There is no documentation, and the process from roles to mapping to visualTransform is horrendous, because it takes so much effort to rebuild the data into a format that can be used consistently with D3.
Commenting on user5226582's example: for me, columns is presented in a form where I have to look up the Roles property to understand the order of the data presented in the rows arrays. displayName offers no certainty. For example, if a user uses the same field in two different dataRoles then it all gets crazily awry.
I think the safest approach is to build a new array inside visualTransform using the well-known field names (the "name" property in dataRoles), then iterate columns, interrogating the Roles property to establish an index into the rows array items, and use that index to populate the new array reliably. D3 then gobbles that up.
I know that's crazy, but at least it means reasonably consistent data, and it allows for the user selecting the same data field more than once or choosing count instead of column value.
All in all, I think this area needs a lot of attention before custom Visuals can really take off.
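A rough sketch of that index-mapping approach (API shapes assumed from the powerbi-visuals-api typings; options comes from update(options), and the role names are the ones defined above):

const table = options.dataViews[0].table;
const roleIndex: { [role: string]: number } = {};
table.columns.forEach((col, i) => {
    // each column's roles property maps data role names to true
    for (const role in col.roles) {
        roleIndex[role] = i;
    }
});
const barChartDataPoints = table.rows.map(row => ({
    category1: row[roleIndex["category1"]],
    category2: row[roleIndex["category2"]],
    value: row[roleIndex["measure"]],
    color: "#555555"
}));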

kairosdb aggregate group by

I have one year of 15-minute interval data in my KairosDB. I need to do the following things sequentially:
- filter the data using a tag
- group the filtered data using a few tags. I am not specifying tag values because I want the data grouped by tag values automatically at runtime.
- once grouped on those tags, aggregate (sum) the 15-minute interval data into monthly values.
I wrote this query to run from a Python script, based on information available on the KairosDB Google Code forum, but the aggregated values seem incorrect and the output looks skewed. I want to understand where I am going wrong. I am doing this in Python. Here is my JSON query:
agg_query = {
    "start_absolute": 1412136000000,
    "end_absolute": 1446264000000,
    "metrics": [
        {
            "tags": {
                "insert_date": ["11/17/2015"]
            },
            "name": "gb_demo",
            "group_by": [
                {
                    "name": "time",
                    "range_size": {
                        "value": "1",
                        "unit": "months"
                    },
                    "group_count": "12"
                },
                {
                    "name": "tag",
                    "tags": ["usage_kind", "building_snapshot_id", "usage_point_id", "interval"]
                }
            ],
            "aggregators": [
                {
                    "name": "sum",
                    "sampling": {
                        "value": 1,
                        "unit": "months"
                    }
                }
            ]
        }
    ]
}
For reference: Data is something like this:
[[1441065600000,53488],[1441066500000,43400],[1441067400000,44936],[1441068300000,48736],[1441069200000,51472],[1441070100000,43904],[1441071000000,42368],[1441071900000,41400],[1441072800000,28936],[1441073700000,34896],[1441074600000,29216],[1441075500000,26040],[1441076400000,24224],[1441077300000,27296],[1441078200000,37288],[1441079100000,30184],[1441080000000,27824],[1441080900000,27960],[1441081800000,28056],[1441082700000,29264],[1441083600000,33272],[1441084500000,33312],[1441085400000,29360],[1441086300000,28400],[1441087200000,28168],[1441088100000,28944],[1443657600000,42112],[1443658500000,36712],[1443659400000,38440],[1443660300000,38824],[1443661200000,43440],[1443662100000,42632],[1443663000000,42984],[1443663900000,42952],[1443664800000,36112],[1443665700000,33680],[1443666600000,33376],[1443667500000,28616],[1443668400000,31688],[1443669300000,30872],[1443670200000,28200],[1443671100000,27792],[1443672000000,27464],[1443672900000,27240],[1443673800000,27760],[1443674700000,27232],[1443675600000,27824],[1443676500000,27264],[1443677400000,27328],[1443678300000,27576],[1443679200000,27136],[1443680100000,26856]]
This is a snapshot of some data from September and October 2015. When I run the query with a start timestamp in September, it sums the September data correctly, but not the October data.
I believe your group-by time will create groups by calendar month (January to December), but your sum aggregator will sum values by a running month starting with your start date... which seems a bit odd. Could that be the cause of what you see?
What is the data like? What is the aggregated result like?
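If calendar-month totals are the goal, it may also be worth experimenting with the aggregator alignment options (align_sampling / align_start_time) so that the sum ranges line up with the group-by ranges rather than a running month. A hedged sketch of just the aggregators part (verify these options against your KairosDB version):

"aggregators": [
    {
        "name": "sum",
        "align_sampling": true,
        "align_start_time": true,
        "sampling": {
            "value": 1,
            "unit": "months"
        }
    }
]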