How to pass dynamic values in SQL in a Cube.js schema?

In some places I want status = 2, in others status < 3. Instead of writing separate schemas, how can I reuse one schema by passing a dynamic value into the status condition in the SQL?
cube(`OrderFacts`, {
  sql: `SELECT * FROM orders WHERE status > 3`, // <--- I want to pass dynamic values to the condition

  measures: {
    count: {
      type: `count`
    }
  },

  dimensions: {
    date: {
      sql: `date`,
      type: `time`
    }
  }
});

Please refer to the Unsafe Value section of the Context Variables documentation:
cube(`Orders`, {
  sql: `SELECT * FROM orders WHERE status > ${SECURITY_CONTEXT.status.unsafeValue()}`,
});
Generally speaking, it's best not to do this, though; instead, I'd recommend using segments to achieve the same functionality (assuming there aren't a lot of possible values for status).
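For reference, here is a minimal sketch of the segments approach (the segment names are illustrative, not from the question): the status filter moves out of the base sql and into named segments that individual queries can opt into.
cube(`OrderFacts`, {
  sql: `SELECT * FROM orders`,

  segments: {
    // Illustrative names: one segment per status condition you need
    statusEquals2: {
      sql: `${CUBE}.status = 2`
    },
    statusBelow3: {
      sql: `${CUBE}.status < 3`
    }
  },

  measures: {
    count: {
      type: `count`
    }
  },

  dimensions: {
    date: {
      sql: `date`,
      type: `time`
    }
  }
});
A query then selects a segment by name (e.g. OrderFacts.statusBelow3) in its segments list, so one schema serves every status condition.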

Related

How to query DynamoDB GSI with compound conditions

I have a DynamoDB table called 'frank' with a single GSI. The partition key is called PK, the sort key is called SK, the GSI partition key is called GSI1_PK and the GSI sort key is called GSI1_SK. I have a single 'data' map storing the actual data.
Populated with some test data, it looks like this (table screenshot omitted):
The GSI partition key and sort key map directly to the attributes with the same names within the table.
I can run a PartiQL query to grab the results that are shown in the image. Here's the PartiQL code:
select PK, SK, GSI1_PK, GSI1_SK, data from "frank"."GSI1"
where
  ("GSI1_PK" = 'tesla')
  and
  (
    ("GSI1_SK" >= 'A_VISITOR#2021-06-01-00-00-00-000' and "GSI1_SK" <= 'A_VISITOR#2021-06-20-23-59-59-999')
    or
    ("GSI1_SK" >= 'B_INTERACTION#2021-06-01-00-00-00-000' and "GSI1_SK" <= 'B_INTERACTION#2021-06-20-23-59-59-999')
  )
Note how the PartiQL code references "GSI1_SK" multiple times. The PartiQL query works and returns the data shown in the image. All great so far.
However, I now want to move this into a Lambda function. How do I structure an AWS.DynamoDB.DocumentClient query to do exactly what this PartiQL query is doing?
I can get this to work in my Lambda function:
const visitorStart="A_VISITOR#2021-06-01-00-00-00-000";
const visitorEnd="A_VISITOR#2021-06-20-23-59-59-999";
var params = {
  TableName: "frank",
  IndexName: "GSI1",
  KeyConditionExpression: "#GSI1_PK = :tmn AND #GSI1_SK BETWEEN :visitorStart AND :visitorEnd",
  ExpressionAttributeNames: { "#GSI1_PK": "GSI1_PK", "#GSI1_SK": "GSI1_SK" },
  ExpressionAttributeValues: {
    ":tmn": lowerCaseTeamName,
    ":visitorStart": visitorStart,
    ":visitorEnd": visitorEnd
  }
};
const data = await documentClient.query(params).promise();
console.log(data);
But as soon as I try a more complex compound condition I get this error:
ValidationException: Invalid operator used in KeyConditionExpression: OR
Here is the more complex attempt:
const visitorStart="A_VISITOR#2021-06-01-00-00-00-000";
const visitorEnd="A_VISITOR#2021-06-20-23-59-59-999";
const interactionStart="B_INTERACTION#2021-06-01-00-00-00-000";
const interactionEnd="B_INTERACTION#2021-06-20-23-59-59-999";
var params = {
  TableName: "frank",
  IndexName: "GSI1",
  KeyConditionExpression: "#GSI1_PK = :tmn AND (#GSI1_SK BETWEEN :visitorStart AND :visitorEnd OR #GSI1_SK BETWEEN :interactionStart AND :interactionEnd)",
  ExpressionAttributeNames: { "#GSI1_PK": "GSI1_PK", "#GSI1_SK": "GSI1_SK" },
  ExpressionAttributeValues: {
    ":tmn": lowerCaseTeamName,
    ":visitorStart": visitorStart,
    ":visitorEnd": visitorEnd,
    ":interactionStart": interactionStart,
    ":interactionEnd": interactionEnd
  }
};
const data = await documentClient.query(params).promise();
console.log(data);
The docs say that KeyConditionExpressions don't support 'OR'. So, how do I replicate my more complex PartiQL query in Lambda using AWS.DynamoDB.DocumentClient?
If you look at the documentation of PartiQL for DynamoDB, they do warn you that PartiQL has no scruples about using a full table scan to get you your data: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ql-reference.select.html#ql-reference.select.syntax
To ensure that a SELECT statement does not result in a full table scan, the WHERE clause condition must specify a partition key. Use the equality or IN operator.
In those cases PartiQL would run a scan and use a FilterExpression to filter out the data.
Of course, in your example you provided a partition key, so I'd assume that PartiQL would run a query with the partition key and a FilterExpression to apply the rest of the condition.
You could replicate it that way, and depending on the size of your partitions this might work just fine. However, if the partition grows beyond 1 MB and most of the data is filtered out, you'll have to deal with pagination even though a page may return no data.
Because of that, I'd suggest simply splitting it up: run each OR branch as a separate query and merge the data on the client.
Unfortunately, DynamoDB does not support multiple boolean operations in the KeyConditionExpression. The PartiQL query you are executing is probably performing a full table scan to return the results.
If you want to replicate the PartiQL query using the DocumentClient, you could use the scan operation. If you want to avoid using scan, you could perform two separate query operations and join the results in your application code, as sketched below.
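Here is a minimal sketch of that split-and-merge approach (AWS SDK v2, reusing the table, index, and key values from the question; the function names are illustrative). Each OR branch runs as its own Query, pages through LastEvaluatedKey, and the results are merged client-side:
const AWS = require("aws-sdk");
const documentClient = new AWS.DynamoDB.DocumentClient();

// Run one Query per sort-key range, following LastEvaluatedKey so a
// branch larger than 1 MB is still read completely.
async function queryRange(pk, start, end) {
  const items = [];
  let ExclusiveStartKey;
  do {
    const data = await documentClient.query({
      TableName: "frank",
      IndexName: "GSI1",
      KeyConditionExpression: "#pk = :pk AND #sk BETWEEN :start AND :end",
      ExpressionAttributeNames: { "#pk": "GSI1_PK", "#sk": "GSI1_SK" },
      ExpressionAttributeValues: { ":pk": pk, ":start": start, ":end": end },
      ExclusiveStartKey
    }).promise();
    items.push(...data.Items);
    ExclusiveStartKey = data.LastEvaluatedKey; // undefined once the range is exhausted
  } while (ExclusiveStartKey);
  return items;
}

// Each OR branch becomes its own query; run both in parallel and merge.
async function queryVisitorsAndInteractions(lowerCaseTeamName) {
  const [visitors, interactions] = await Promise.all([
    queryRange(lowerCaseTeamName, "A_VISITOR#2021-06-01-00-00-00-000", "A_VISITOR#2021-06-20-23-59-59-999"),
    queryRange(lowerCaseTeamName, "B_INTERACTION#2021-06-01-00-00-00-000", "B_INTERACTION#2021-06-20-23-59-59-999")
  ]);
  return visitors.concat(interactions);
}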

How to write a AWS AppSync response mapping template for an RDS data source

I have been following this guide for querying an Aurora Serverless database through an AppSync schema. Now I want to run a couple of queries at the same time with a request mapping like:
{
  "version": "2018-05-29",
  "statements": [
    "SELECT * FROM MyTable WHERE category='$ctx.args.category'",
    "SELECT COUNT(*) FROM MyTable WHERE category='$ctx.args.category'"
  ]
}
So, how do I handle multiple SELECTs in the response mapping? The page has a few examples, but none has two SELECTs:
$utils.toJson($utils.rds.toJsonObject($ctx.result)[0]) ## For first item results
$utils.toJson($utils.rds.toJsonObject($ctx.result)[0][0]) ## For first item of first query
$utils.toJson($utils.rds.toJsonObject($ctx.result)[1][0]) ## For first item of second query
$utils.toJson($utils.rds.toJsonObject($ctx.result)??????) ## ?? For first & second item results
I expect the response type to be something like the following, but it's not strict as long as I can get the values.
type MyResponse {
  MyResponseItemList: [MyResponseItem]
  Count: Int
}
type MyResponseItem {
  Id: ID!
  Name: String
  ...
}
Doing two selects will not work with AppSync.
I suggest you either break apart the two SQL queries into two different GraphQL query operations or combine the two SQL queries into one.
I faced the same issue and got this working as follows.
Instead of having Count as a direct Int type result, I converted that into another type called PaginationResult.
type MyResponse {
  MyResponseItemList: [MyResponseItem]
  Count: PaginationResult
}
type PaginationResult {
  Count: Int
}
type MyResponseItem {
  ...
}
Response Velocity Template
#set($resMap = {
  "MyResponseItemList": $utils.rds.toJsonObject($ctx.result)[0],
  "Count": $utils.rds.toJsonObject($ctx.result)[1][0]
})
$util.toJson($resMap)
FWIW, I just got a UNION ALL AppSync/RDS request resolver query with two SELECTs working:
{
  "version": "2018-05-29",
  "statements": ["SELECT patientIDa, patientIDb, distance FROM Distances WHERE patientIDa='$ctx.args.patientID' UNION ALL SELECT patientIDb, patientIDa, distance FROM Distances WHERE patientIDb='$ctx.args.patientID'"]
}
Not sure if this will help the OP but it may.
Note: in my case (maybe because I'm on Windows) the ENTIRE ["SELECT...] statement needs to be on one line (no CR/LF), or else GraphQL errors with "non-escaped character..." (tested using GraphiQL).

How to store the result of a query in the current table without changing the table schema?

I have a structure
{
  id: "123",
  scans: [{
    "scanid": "123",
    "status": "sleep"
  }]
},
{
  id: "123",
  scans: [{
    "scanid": "123",
    "status": "sleep"
  }]
}
Query to remove duplicates:
SELECT *
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY id) row_number
  FROM table1
)
WHERE row_number = 1
I specified the destination table as table1.
Here I have made scans a repeated record, with scanid as a string and status as a string. But when I run a query (the deduplication query above) and overwrite the existing table, the table schema changes: it becomes scans_scanid (STRING) and scans_status (STRING). The scans record schema is flattened now. Please suggest where I am going wrong.
It is known that NEST() is not compatible with unflattened results output and is mostly used for intermediate results in subqueries.
Try the workaround below.
Note: I use INTEGER for id and scanid. If they should be STRING, you need to:
a. change the output schema section, and
b. remove the parseInt() call in t = {scanid:parseInt(x[0]), status:x[1]}
SELECT id, scans.scanid, scans.status
FROM JS(
  ( // input table
    SELECT id, NEST(CONCAT(STRING(scanid), ',', STRING(status))) AS scans
    FROM (
      SELECT id, scans.scanid, scans.status
      FROM (
        SELECT id, scans.scanid, scans.status,
          ROW_NUMBER() OVER (PARTITION BY id) AS dup
        FROM table1
      ) WHERE dup = 1
    ) GROUP BY id
  ),
  id, scans, // input columns
  "[{'name': 'id', 'type': 'INTEGER'}, // output schema
    {'name': 'scans', 'type': 'RECORD',
     'mode': 'REPEATED',
     'fields': [
       {'name': 'scanid', 'type': 'INTEGER'},
       {'name': 'status', 'type': 'STRING'}
     ]
    }
  ]",
  "function(row, emit) { // function
    var c = [];
    for (var i = 0; i < row.scans.length; i++) {
      var x = row.scans[i].toString().split(',');
      var t = {scanid: parseInt(x[0]), status: x[1]};
      c.push(t);
    }
    emit({id: row.id, scans: c});
  }"
)
Here I use BigQuery User-Defined Functions. They are extremely powerful, yet they still have some limits and limitations to be aware of. Also bear in mind that they are strong candidates for being qualified as expensive high-compute queries:
Complex queries can consume extraordinarily large computing resources relative to the number of bytes processed. Typically, such queries contain a very large number of JOIN or CROSS JOIN clauses or complex user-defined functions.
1) If you run the query in the web UI, the result is automatically flattened, which is why you see the schema changed. You need to run your query and write to a destination table; the web UI also has options for doing this.
2) If you don't run your query in the web UI but still see the schema changed, you should make explicit selects so the schema is retained for you, e.g.:
select 'foo' as scans.scanid
This creates a record-like output for you, but it won't be a repeated record; for that, please read further.
3) For some use cases you may need to use the NEST(expr) function, which:
Aggregates all values in the current aggregation scope into a repeated field. For example, the query "SELECT x, NEST(y) FROM ... GROUP BY x" returns one output record for each distinct x value, and contains a repeated field for all y values paired with x in the query input. The NEST function requires a GROUP BY clause.
BigQuery automatically flattens query results, so if you use the NEST function on the top level query, the results won't contain repeated fields. Use the NEST function when using a subselect that produces intermediate results for immediate use by the same query.

Doctrine DQL returns multiple types of entities

I have three entities: HandsetSubscription, Handset and Subscription.
The yaml of HandsetSubscription is:
App\SoBundle\Entity\HandsetSubscription:
  type: entity
  table: handset_subscription
  manyToOne:
    handset:
      targetEntity: Handset
    subscription:
      targetEntity: Subscription
  id:
    id:
      type: integer
      generator: { strategy: AUTO }
      options: { unsigned: true }
  fields:
    amount:
      type: integer
      nullable: false
      options: { default: 0, unsigned: true }
    discount:
      type: integer
      nullable: false
      options: { default: 0, unsigned: true }
The query:
SELECT hs, s, h
FROM \App\SoBundle\Entity\HandsetSubscription hs
JOIN \App\SoBundle\Entity\Subscription s WITH s.id = hs.subscription
  AND s.mins = 150
  AND s.mb = 250
  AND s.sms = 150
JOIN \App\SoBundle\Entity\Handset h WITH h.id = hs.handset
These are the class names of the entries retrieved:
App\SoBundle\Entity\HandsetSubscription
Proxies\__CG__\App\SoBundle\Entity\Subscription
Proxies\__CG__\App\SoBundle\Entity\Handset
App\SoBundle\Entity\HandsetSubscription
Proxies\__CG__\App\SoBundle\Entity\Handset
App\SoBundle\Entity\HandsetSubscription
Proxies\__CG__\App\SoBundle\Entity\Handset
…
I would expect to get only HandsetSubscription entities back. Why am I getting proxies of Subscription and Handset too?
By adding fetch: EAGER to the handset and subscription mappings and removing handset and subscription from the SELECT clause of the query, I would get only HandsetSubscription entities, but I would like to do this through fetch joins, as stated in the manual (http://doctrine-orm.readthedocs.org/en/latest/reference/dql-doctrine-query-language.html#joins).
UPDATE
Quote from the link posted above:
Fetch join of the address:
<?php
$query = $em->createQuery("SELECT u, a FROM User u JOIN u.address a WHERE a.city = 'Berlin'");
$users = $query->getResult();
When Doctrine hydrates a query with fetch-join it returns the class in the FROM clause on the root level of the result array. In the previous example an array of User instances is returned and the address of each user is fetched and hydrated into the User#address variable. If you access the address Doctrine does not need to lazy load the association with another query.
Big thanks go to veonik from the #doctrine IRC channel for solving this.
Instead of joining with the fully qualified names of the entities, you should join with the association. So the query becomes:
SELECT hs, s, h
FROM \App\SoBundle\Entity\HandsetSubscription hs
JOIN hs.subscription s WITH s.id = hs.subscription
  AND s.mins = 150
  AND s.mb = 250
  AND s.sms = 150
JOIN hs.handset h WITH h.id = hs.handset

How to use MapReduce when extracting a group of document IDs by some criteria from CouchDB

I'm in my first week of CouchDB experimentation and trying to stop thinking in SQL. I have a collection of documents (5000 event files) that all have some ID value that will be common to groups of documents. So there might be 10 that all have TheID: 'foobar'.
(In case someone asks - TheID is not an auto-increment value from a relational database - it is a unique id assigned by a partner company of ours. I cannot redesign my source data to identify itself some other way; I have to use this TheID field to recognise groups of documents.)
I want to query my list of documents:
{ _id: 'document1', Message: { TheID: 'foobar' } }
{ _id: 'document2', Message: { TheID: 'xyz' } }
{ _id: 'document3', Message: { TheID: 'xyz' } }
{ _id: 'document4', Message: { TheID: 'foobar' } }
{ _id: 'document5', Message: { TheID: 'wibble' } }
{ _id: 'document6', Message: { TheID: 'foobar' } }
I want the results:
'foobar': [ 'document1', 'document4', 'document6' ]
'xyz': [ 'document2', 'document3' ]
'wibble': [ 'document5' ]
The aim is to represent groups of documents on our UI grouped by TheID, so the user can see all documents for a specific TheID together, and select that TheID to drill into the data querying just by that TheID value. Yes, the string id of each document is useful - in our case, the _id value of each document is the source event identifier, so it is a unique and useful value that the user is going to want to see in the list on screen.
In SQL one might order by or group by the TheID field and iterate the result set appropriately. I doubt this thinking is any use at all with a CouchDB query.
I know that I can use a map function to extract the TheID value for each document, for example:
function (doc) {
  emit(doc.Message.TheID, 1);
}
or perhaps
function (doc) {
  emit(doc._id, doc.Message.TheID);
}
I'm not sure exactly what I should emit as the key and value. Even if this is useful, I'm getting the feeling that I should not use a reduce function to try to 'reduce' the large map output (1 result row per document in the database) to what I want (3 results each with a list of document id's).
http://guide.couchdb.org/draft/views.html says "A common mistake new CouchDB users make is attempting to construct complex aggregate values with a reduce function. Full reductions should result in a scalar value, like 5, and not, for instance, a JSON hash with a set of unique keys and the count of each."
I thought I might be able to use reduce to scan the results of the map and somehow collect all results that have a common TheID value into a single result object. What I see when reading the reduce documentation is that it will be given arrays of keys and values that contain fairly unpredictable collections, driven by the structure of the btree underlying the map results. It won't be given arrays guaranteed to contain all similar TheID values that I could scan for. This approach seems completely broken.
So, is a map/reduce pair the right thing to do here? Should I look at using a 'show' or 'list' instead? I'm intending to build a mustache based HTML template engine around the results, so 'list' seems the wrong way to go.
Thanks in advance for any guidance.
EDIT: I have done some local dev and come up with what I think is a broken solution. Hopefully this will show you the direction I'm trying to go in. See a public cloud-based CouchDB I created at https://neek.iriscouch.com/_utils/database.html?test/_design/test/_view/collectByTheID
This is public. If you would like to play, please copy it to a new view, don't pollute this one in case others come in and want to see the original.
map function:
function(doc) {
  emit(doc.Message.TheID, doc._id);
}
reduce function:
function(keys, values, rereduce) {
  if (!rereduce) {
    return values;
  } else {
    var ret = [];
    values.forEach(function (ar) {
      // Note: concat returns a new array rather than modifying ret,
      // so as written this line discards its result.
      ret.concat(ar);
    });
    return ret;
  }
}
Results:
"foobar" ["document6", "document4", "document1"]
"wibble" ["document5"]
"xyz" ["document3", "document2"]
The reduce function first leaves the array of values alone, and on the second pass concatenates them together. However, when I run this on my large 5000+ document database, it comes up with some TheID values that have empty document id arrays. I believe this suffers from the problem I mentioned before: the arrays of values passed to reduce are built depending on the structure of the btree underlying the map results, and are not guaranteed to contain a complete set of values for a given key.
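(A side note on the snippet above: Array.prototype.concat returns a new array, so ret.concat(ar) discards its result, which is enough by itself to produce empty arrays on the rereduce pass. With that fixed, the reduce would look like the sketch below. Even so, CouchDB discourages reduce functions that accumulate large arrays, so the group_level approach in the answer that follows is the better fit.)
function (keys, values, rereduce) {
  if (!rereduce) {
    // First pass: values is already the flat list of document ids
    // emitted by the map function for this key.
    return values;
  }
  // Rereduce pass: values is a list of arrays returned by earlier
  // reduce calls; concat builds a new array each time, so reassign it.
  var ret = [];
  values.forEach(function (ar) {
    ret = ret.concat(ar);
  });
  return ret;
}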
Make use of the group_level feature:
Map:
emit([doc.Message.TheID, doc._id], null)
Reduce:
You must include a reduce to use group_level; it can be empty, as below, or something else, e.g. _count:
function(keys, values) {
  return null;
}
A query with group_level=1 would return:
/_design/d/_view/v?group_level=1
[
{key: ["foobar"], value: null},
{key: ["xyz"], value: null},
{key: ["wibble"], value: null}
]
You would use this query to populate the top level in your grouping UI. When the user expands a category, you would do another query with group_level 2 and start and end keys:
/_design/d/_view/v?group_level=2&startkey=["foobar"]&endkey=["foobar",{}]
[
{key: ["foobar", "document6"], value: null},
{key: ["foobar", "document4"], value: null},
{key: ["foobar", "document1"], value: null}
]
This doesn't produce the output exactly as you requested; however, I think you'll find it flexible enough.
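As a usage sketch, the two queries above could drive the grouping UI from Node 18+ (or any environment with a global fetch); the database URL and view path here are illustrative, so substitute your own design document and view names:
const base = "http://localhost:5984/test/_design/d/_view/v";

// group_level=1 collapses the [TheID, _id] keys to their first element,
// yielding one row per distinct TheID for the top level of the UI.
async function listTheIds() {
  const res = await fetch(`${base}?group_level=1`);
  const { rows } = await res.json();
  return rows.map(r => r.key[0]); // e.g. ["foobar", "wibble", "xyz"]
}

// group_level=2 with a start/end key range lists the document ids
// grouped under a single TheID, for the drill-down view.
async function docsFor(theId) {
  const params = new URLSearchParams({
    group_level: "2",
    startkey: JSON.stringify([theId]),
    endkey: JSON.stringify([theId, {}])
  });
  const res = await fetch(`${base}?${params}`);
  const { rows } = await res.json();
  return rows.map(r => r.key[1]); // e.g. ["document6", "document4", "document1"]
}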