AWS Glue - finding schema reference for table

Using AWS CDK v2 to create schemas and tables, I'm having trouble linking the schemaReference:
const schema = new glue.CfnSchema(this, "User", {
  compatibility: "NONE",
  dataFormat: "JSON",
  name: "user",
  schemaDefinition: JSON.stringify(userSchema),
});
new glue.CfnTable(this, "UserTable", {
  catalogId: this.account,
  databaseName: "my_db",
  tableInput: {
    name: "users",
    tableType: "EXTERNAL_TABLE",
    storageDescriptor: {
      location: "my_db.public.users",
      schemaReference: schema,
    },
    parameters: {
      classification: "postgresql",
      typeOfData: "table",
      connectionName: "rds_conn",
    },
  },
});
I'd expect schemaReference to accept the Cfn output in some way, but I can only get this working by hard-coding a schemaReference object with a schemaVersionId that I look up in the console.
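For illustration, the hard-coded workaround looks roughly like this (the version id below is a placeholder, not a real value):
storageDescriptor: {
  location: "my_db.public.users",
  schemaReference: {
    // Placeholder: in practice this UUID is copied from the Glue console
    schemaVersionId: "00000000-0000-0000-0000-000000000000",
  },
},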

My solution was to lock the schema version in its definition, then reference the schema by name. Example:
new glue.CfnSchema(this, "User", {
  name: "user",
  // ...
  checkpointVersion: {
    versionNumber: 1,
  },
});
new glue.CfnTable(this, "UserTable", {
  // ...
  tableInput: {
    // ...
    storageDescriptor: {
      // ...
      schemaReference: {
        schemaId: {
          registryName: "default-registry",
          schemaName: "user",
        },
        schemaVersionNumber: 1,
      },
    },
  },
});
Though verbose, it has the advantage of being portable across stacks.

Related

DynamoDB: return the modified document (old or new) when using TransactWriteItems

Is there a way of making TransactWriteItems return the documents it updated?
const transactionParams = {
  ReturnConsumedCapacity: "INDEXES",
  TransactItems: [
    {
      Delete: {
        TableName: reactionTableName,
        Key: { PK: "SOME_PK_", SK: "SOME_SK_" },
        ReturnValues: "ALL_OLD",
      },
    },
    {
      Update: {
        TableName: reviewTableName,
        Key: { PK: "SOME_PK", SK: "SOME_SK" },
        ReturnValues: "ALL_OLD",
      },
    },
  ],
};
try {
  const result = await docClient.transactWrite(transactionParams).promise();
} catch (error) {
  context.done(error, null);
}
For example, in the above code, can I get the documents that were touched (before or after the update)?
No, the TransactWriteItems API does not provide the ability to return the values of a modified item. However, you could obtain those values using DynamoDB Streams; otherwise you would need to fall back to the singleton UpdateItem/DeleteItem APIs, which support ReturnValues but are not ACID-compliant together.
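As a sketch of the non-transactional fallback (the update expression is hypothetical, reusing the table and key names from the question), the singleton update call does accept ReturnValues:
const result = await docClient
  .update({
    TableName: reviewTableName,
    Key: { PK: "SOME_PK", SK: "SOME_SK" },
    UpdateExpression: "SET #s = :s", // hypothetical attribute update
    ExpressionAttributeNames: { "#s": "status" },
    ExpressionAttributeValues: { ":s": "done" },
    ReturnValues: "ALL_OLD", // the item as it was before the update
  })
  .promise();
console.log(result.Attributes); // the old document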

Can I validate the length of something by middy?

import middy from "@middy/core";
import jsonBodyParser from "@middy/http-json-body-parser";
import validator from "@middy/validator";
import httpErrorHandler from "@middy/http-error-handler";
import { APIGatewayProxyHandlerV2 } from "aws-lambda";

const baseHandler: APIGatewayProxyHandlerV2 = async (event) => {
  return service.create(event.body);
};

const inputSchema = {
  type: "object",
  properties: {
    body: {
      type: "object",
      properties: {
        year: { type: "number" },
        questionId: { type: "string" },
        propSeq: { type: "number" },
        questionTitle: { type: "string" },
        propContent: { type: "string" },
        isTrue: { type: "boolean" },
        chapter: { type: "number" }
      },
      required: ["year", "questionId", "propSeq", "questionTitle", "propContent", "isTrue", "chapter"],
    },
  },
};

export const handler = middy(baseHandler)
  .use(jsonBodyParser())
  .use(validator({ inputSchema }))
  .use(httpErrorHandler());
I'm writing AWS Lambda code with the Serverless Framework.
I wanted a request-body validator like express-validator, so I found Middy.
But it looks impossible to validate the length of something.
I want to force the length of year to 4 digits,
for example 2023 (valid) but 23 (invalid):
properties: {
  year: { type: "number", length: 4 }
}
As you can guess, the length property is not understood.
I don't want to add code to the baseHandler function to validate the length.
Thank you in advance.
Middy uses JSON Schema, so you can use anything that is compatible with JSON Schema there. You could validate length, but then you'd need to switch the type from number to string, as length constraints are not supported for number (rightfully so, in my opinion). If you want to keep it as a number, then range-based validation is probably your best bet: https://json-schema.org/understanding-json-schema/reference/numeric.html#range
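For example, a range-based constraint that accepts only four-digit years could look like this (the bounds are my reading of "length 4" for a number):
properties: {
  // number: constrain the range rather than the length
  year: { type: "number", minimum: 1000, maximum: 9999 },
  // string alternative: minLength/maxLength are supported
  // year: { type: "string", minLength: 4, maxLength: 4 },
}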

Cubejs Access Other Cubes By Variable

Given the following Cube, I can use a function to create dynamic measures.
cube(`Creatives`, {
  sql: `SELECT * FROM public.creatives`,
  joins: {
    Events: {
      relationship: `hasMany`,
      sql: `${Events}.creative_id = ${CUBE}.id`,
    },
  },
  measures: {
    ...['impression'].reduce((all, event) => {
      return {
        ...all,
        [`Total_${event}_events`]: {
          type: `count`,
          title: `Total ${event} events`,
          filters: [
            {
              sql: `${Events.type} = '${event}'`,
            },
          ],
        },
      };
    }, {}),
  },
  dimensions: {
    ...
  },
});
But when I try to move the reducer to a function, similar to the examples, I get:
ReferenceError: Events is not defined
which obviously is because there's no Events variable within the scope of my function, whereas previously Cube's interpolation of the string was replacing it for me.
How can I get access to other cubes in functions like the one below, i.e. get (or pass in) Events to createTotalEventsMeasure()?
const createTotalEventsMeasure = (event) => ({
  [`Total_${event}_events`]: {
    type: `count`,
    title: `Total ${event} events`,
    filters: [
      {
        sql: (CUBE) => `${Events.type} = '${event}'`,
      },
    ],
  },
});
cube(`Creatives`, {
  sql: `SELECT * FROM public.creatives`,
  joins: {
    Events: {
      relationship: `hasMany`,
      sql: `${Events}.creative_id = ${CUBE}.id`,
    },
  },
  measures: {
    ...['impression'].reduce(
      (all, event) => ({
        ...all,
        ...createTotalEventsMeasure(event),
      }),
      {}
    ),
  },
  dimensions: {
    ...
  },
});

How do I check if all resources in my CDK stack have certain properties?

I'm fairly new to the AWS CDK. I just found out about the @aws-cdk/assert module, which is a good reason for me to get more into test-driven development. My main difficulty right now is that I don't entirely understand how to test whether all resources of a certain type pass a test; I'm only able to test whether any resource matches.
Right now I have expectCDK(stack).to(countResources('AWS::S3::Bucket', 2)) to check that I produce the expected number of buckets, followed by two separate tests to check that they are both private and encrypted.
If I use the following code, it will pass because it simply looks for any resource that matches (one out of two):
expectCDK(stack).to(haveResource('AWS::S3::Bucket', {
  "AccessControl": "Private",
  "BucketEncryption": {
    "ServerSideEncryptionConfiguration": [
      {
        "ServerSideEncryptionByDefault": {
          "SSEAlgorithm": "AES256"
        }
      }
    ]
  },
  "VersioningConfiguration": {
    "Status": "Enabled"
  }
}))
Right now it's just two test buckets, but I want to make "least privilege principle" checks for IAM roles later. Given that solutions can have a lot of different roles, I don't want to skip any of them.
Is there a clever way to test whether all my buckets are private and encrypted? I wouldn't mind testing the synthesized template directly, but I feel like expectCDK is a bit closer to the source.
I was able to accomplish this with a little bit of complexity. Each check is phrased as a double negative: the first assertion fails if any bucket omits the property, and the second fails if any bucket sets it to the wrong value:
test("no s3 buckets should be public", () => {
expect(stack).not.toHaveResourceLike("AWS::S3::Bucket", {
PublicAccessBlockConfiguration: ABSENT,
});
expect(stack).not.toHaveResourceLike("AWS::S3::Bucket", {
PublicAccessBlockConfiguration: notMatching(
exactValue({
BlockPublicAcls: true,
BlockPublicPolicy: true,
IgnorePublicAcls: true,
RestrictPublicBuckets: true,
})
),
});
});
test("all s3 buckets should be s3_managed encrypted", () => {
expect(stack).not.toHaveResourceLike("AWS::S3::Bucket", {
BucketEncryption: ABSENT,
});
expect(stack).not.toHaveResourceLike("AWS::S3::Bucket", {
BucketEncryption: notMatching(
exactValue({
ServerSideEncryptionConfiguration: [
{
ServerSideEncryptionByDefault: {
SSEAlgorithm: "AES256",
},
},
],
})
),
});
});
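As a side note, the newer assertions module in aws-cdk-lib appears to express this directly; a minimal sketch, assuming a CDK v2 setup:
import { Template } from "aws-cdk-lib/assertions";

const template = Template.fromStack(stack);
// Fails unless every AWS::S3::Bucket in the template has these properties
template.allResourcesProperties("AWS::S3::Bucket", {
  BucketEncryption: {
    ServerSideEncryptionConfiguration: [
      { ServerSideEncryptionByDefault: { SSEAlgorithm: "AES256" } },
    ],
  },
});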

How to aggregate data from another cube in cubejs?

I have the following cubes (I'm only showing the data necessary to reproduce the problem):
SentMessages:
cube(`SentMessages`, {
  sql: `SELECT * FROM messages_sent`,
  dimensions: {
    campaignId: {
      sql: `campaign_id`,
      type: `number`
    },
    phone: {
      sql: `phone_number`,
      type: `number`
    }
  }
});
Campaigns:
cube(`Campaign`, {
  sql: `SELECT * FROM campaign`,
  joins: {
    SentMessages: {
      sql: `${Campaign}.id = ${SentMessages}.campaign_id`,
      relationship: `hasMany`
    }
  },
  measures: {
    messageSentCount: {
      sql: `${SentMessages}.phone`,
      type: `count`
    }
  },
  dimensions: {
    name: {
      sql: `name`,
      type: `string`
    },
  }
});
The query being sent looks like this:
"query": {
"dimensions": ["Campaign.name"],
"timeDimensions": [
{
"dimension": "Campaign.createdOn",
"granularity": "day"
}
],
"measures": [
"Campaign.messageSentCount"
],
"filters": []
},
"authInfo": {
"iat": 1578961890,
"exp": 1579048290
},
"requestId": "da7bf907-90de-4ba0-80f8-1a802dd442f6"
For some reason this results in the following error:
Error: 'Campaign.messageSentCount' references cubes that lead to row multiplication. Please rewrite it using sub query.
I've searched quite a bit on this error and can't find anything. Can someone please help or provide some insight into the problem? It would be really nice if the framework could show the generated SQL, just for troubleshooting purposes.
Campaign has many SentMessages, so if the two cubes are joined to calculate Campaign.messageSentCount, the calculation results can be affected by the join. There's a simple check that ensures no hasMany cubes are referenced inside an aggregation function; this sanity check is required to avoid situations that lead to incorrect calculation results. For example, if ReceivedMessages were also added as a join to Campaign, then selecting ReceivedMessages and SentMessages simultaneously would produce one row per sent/received message pair, so Campaign.messageSentCount would count each sent message multiple times.
To avoid this sanity-check error, you're expected to substitute a sub query, as follows:
SentMessages:
cube(`SentMessages`, {
  sql: `SELECT * FROM messages_sent`,
  measures: {
    count: {
      type: `count`
    }
  },
  dimensions: {
    campaignId: {
      sql: `campaign_id`,
      type: `number`
    },
    phone: {
      sql: `phone_number`,
      type: `number`
    }
  }
});
Campaigns:
cube(`Campaign`, {
  sql: `SELECT * FROM campaign`,
  joins: {
    SentMessages: {
      sql: `${Campaign}.id = ${SentMessages}.campaign_id`,
      relationship: `hasMany`
    }
  },
  measures: {
    totalMessageSendCount: {
      sql: `${messageSentCount}`,
      type: `sum`
    }
  },
  dimensions: {
    messageSentCount: {
      sql: `${SentMessages.count}`,
      type: `number`,
      subQuery: true
    },
    name: {
      sql: `name`,
      type: `string`
    },
  }
});
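With this schema, a query can then aggregate through the sub query dimension, along these lines (a sketch using the member names defined above):
"query": {
  "dimensions": ["Campaign.name"],
  "measures": ["Campaign.totalMessageSendCount"]
}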
For cases where Campaign.messageSentCount doesn't make sense as a dimension, the schema can be simplified and SentMessages.count can be used directly.
I figured part of this out on my own (at least the solution part); I figured I'd post in case anyone else was having difficulty.
It appears that this definition is problematic (and unnecessary):
messageSentCount: {
  sql: `${SentMessages}.phone`,
  type: `count`
}
I believe the correct way to do this is to add a measure to the cube you want the COUNT to be applied to. In this query I want a count of SentMessages.phone (as shown above), so the following should be added to the SentMessages cube:
count: {
  sql: `phone`,
  type: `count`
},
Then the query works simply as follows:
"query": {
"dimensions": [
"Campaign.name"
],
"timeDimensions": [
{
"dimension": "SentMessages.createdOn",
"granularity": "day"
}
],
"measures": [
"SentMessages.count"
],
"filters": []
},
"authInfo": {
"iat": 1578964732,
"exp": 1579051132
},
"requestId": "c84b4596-2ee8-48e7-8e0a-974eb284dde3"
And it works as expected. I still don't understand the row-multiplication error and why this measure doesn't work if placed on the Campaign cube, so I will wait to accept this answer, as I found this experimentally and am still unclear about the underlying problem.