How can I create an Athena data source in AWS CDK that is a JDBC connection to a MySQL database using the AthenaJdbcConnector?
I believe I can use aws-sam's CfnApplication to create the AthenaJdbcConnector Lambda, but how can I connect it to Athena?
I notice a lot of Glue support in CDK which would transfer to Athena (data catalog), and there are several CfnDataSource types in other modules such as QuickSight, but I'm not seeing anything under Athena in CDK.
See the references below.
References:
https://docs.aws.amazon.com/athena/latest/ug/athena-prebuilt-data-connectors-jdbc.html
https://github.com/awslabs/aws-athena-query-federation/tree/master/athena-jdbc
https://serverlessrepo.aws.amazon.com/applications/us-east-1/292517598671/AthenaJdbcConnector
I have been playing with the same issue. Here is what I did to create the Lambda for federated queries (Typescript):
const vpc = ec2.Vpc.fromLookup(this, 'my-project-vpc', {
  vpcId: props.vpcId
});

const cluster = new rds.ServerlessCluster(this, 'AuroraCluster', {
  engine: rds.DatabaseClusterEngine.AURORA_POSTGRESQL,
  parameterGroup: rds.ParameterGroup.fromParameterGroupName(this, 'ParameterGroup', 'default.aurora-postgresql10'),
  defaultDatabaseName: 'MyDB',
  vpc,
  vpcSubnets: {
    onePerAz: true
  },
  scaling: { autoPause: cdk.Duration.seconds(0) } // Optional. If not set, the instance will pause after 5 minutes
});
const password = cluster.secret!.secretValueFromJson('password').toString();
const spillBucket = new Bucket(this, 'AthenaFederatedSpill');

const lambdaApp = new CfnApplication(this, 'MyDB', {
  location: {
    applicationId: 'arn:aws:serverlessrepo:us-east-1:292517598671:applications/AthenaJdbcConnector',
    semanticVersion: '2021.42.1'
  },
  parameters: {
    DefaultConnectionString: `postgres://jdbc:postgresql://${cluster.clusterEndpoint.hostname}/MyDB?user=postgres&password=${password}`,
    LambdaFunctionName: 'crossref_federation',
    SecretNamePrefix: `${cluster.secret?.secretName}`,
    SecurityGroupIds: cluster.connections.securityGroups.map(sg => sg.securityGroupId).join(','),
    SpillBucket: spillBucket.bucketName,
    SubnetIds: vpc.privateSubnets[0].subnetId
  }
});
This creates the Lambda with a default connection string, just as if you had used the AWS Console wizard in Athena to connect to a data source. Unfortunately it is NOT possible to add an Athena-catalog-specific connection string via CDK. It should be set as an environment variable on the Lambda, and I found no way to do that: the application template simply doesn't allow it, so this is a post-processing step done by hand. I would sure like to hear from anybody who has a solution for that!
Also notice that I add the user/password in the JDBC URL directly. I wanted to use Secrets Manager, but because the Lambda is deployed in a VPC, it simply refuses to connect to Secrets Manager. I think this might be solvable by adding a VPC interface endpoint for Secrets Manager. Again, I would like to hear from anybody who has tried that.
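For reference, the DefaultConnectionString above follows the connector's `postgres://jdbc:postgresql://<host>/<db>?user=...&password=...` shape. A small helper to assemble it from parts (the function name and parameters are my own, purely for illustration):

```typescript
// Sketch: build an AthenaJdbcConnector-style connection string.
// Note: values are interpolated verbatim; passwords containing
// characters like '&' or '=' would need URL-escaping.
function buildJdbcConnectionString(
  host: string,
  database: string,
  user: string,
  password: string
): string {
  // The engine name appears twice: once as the catalog prefix
  // ("postgres://") and once inside the JDBC URL itself.
  return `postgres://jdbc:postgresql://${host}/${database}?user=${user}&password=${password}`;
}
```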
Related
I have a VPC with two ISOLATED subnets, one for my RDS Serverless cluster, and one for my Lambda functions.
But my Lambda functions all timeout when they're calling my RDS.
My question is; is this VPC + isolated subnets a working structure for API Gateway -> Lambda -> RDS, or am I trying something impossible?
Lambda:
import * as AWS from 'aws-sdk';

const rdsDataService = new AWS.RDSDataService();
const query = `SELECT * FROM information_schema.tables;`;

export const handler = async (event) => {
  const params = {
    secretArn: 'secret arn',
    resourceArn: 'rds arn',
    sql: query,
    database: 'db name'
  };
  const res = await rdsDataService.executeStatement(params).promise();
  return {
    statusCode: 200,
    body: {
      message: 'ok',
      result: res
    }
  };
};
My RDS and Lambda share a security group which I've opened to ALL traffic (I know this isn't ideal), and my Lambda has a role with admin rights (also not ideal), but it still times out.
You are using the Aurora Serverless Data API. The API does not exist inside your VPC. You have chosen isolated subnets, which have no access to anything outside your VPC. You will either need to switch to private subnets, or add a VPC endpoint for the RDS Data API to your VPC.
It is important to call out that RDS API != RDS Data API; the two are different. You use the RDS API for standard RDS instances; for something like Aurora Serverless, you use the RDS Data API.
For anyone running across this in the future, there is now some helpful documentation describing how to create an Amazon VPC endpoint to allow access to the RDS Data API.
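The service name for that endpoint follows a fixed pattern, `com.amazonaws.<region>.rds-data` in the standard AWS partition. A trivial helper to derive it (my own, for illustration; GovCloud and China partitions use different prefixes):

```typescript
// Derive the RDS Data API interface-endpoint service name for a region.
// Assumes the standard "aws" partition.
function rdsDataServiceName(region: string): string {
  return `com.amazonaws.${region}.rds-data`;
}
```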
If you're using Terraform to create the VPC endpoint, here's a snippet that essentially replicates the instructions from the tutorial above:
resource "aws_vpc_endpoint" "rds-data" {
  vpc_id              = <your-vpc-id-here>
  service_name        = "com.amazonaws.<your-region-here>.rds-data"
  vpc_endpoint_type   = "Interface"
  private_dns_enabled = true

  security_group_ids = [
    <your-security-group-ids-here>
  ]

  subnet_ids = [
    <your-subnet-ids-here>
  ]
}
I have developed an application in Java, using an Amazon RDS PostgreSQL database for data management, and hosted it on Elastic Beanstalk. Someone suggested I use AWS CloudFormation, so I wrote the infrastructure as code in JSON format, including the Amazon RDS instance, but I have some doubts.
When I use CloudFormation, it will automatically create a new DB instance for my application, but my Java code refers to a different DB instance name. How will the application communicate with the new instance?
Please help me clarify these doubts.
Thanks in advance...
You can expose the DB URL in the Outputs section of the CloudFormation template so that you get the required URL:
CFN outputs
The endpoint URL for your AWS::RDS::DBInstance is available through its return values:
Endpoint.Address - The connection endpoint for the database. For example: mystack-mydb-1apw1j4phylrk.cg034hpkmmjt.us-east-2.rds.amazonaws.com
Endpoint.Port - The port number on which the database accepts connections. For example: 3306
To get Endpoint.Address out of your stack, you have to add an Outputs section to your template. An example would be:
"Outputs": {
  "DBEndpoint": {
    "Description": "Endpoint for my RDS Instance",
    "Value": {
      "Fn::GetAtt": [ "MyDB", "Endpoint.Address" ]
    }
  }
}
Then, using the AWS SDK for Java, you can query the Outputs of your CloudFormation stack and use the endpoint in your Java application.
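In the same spirit (shown here in TypeScript rather than Java), extracting a named output from a DescribeStacks response might look like this. The interface below only models the `OutputKey`/`OutputValue` pairs the CloudFormation API returns; the function name is my own:

```typescript
// Minimal shape of one entry in a DescribeStacks "Outputs" list.
interface StackOutput {
  OutputKey?: string;
  OutputValue?: string;
}

// Find a named output (e.g. "DBEndpoint") in a stack's output list.
// Returns undefined when the key is absent.
function getStackOutput(outputs: StackOutput[], key: string): string | undefined {
  return outputs.find((o) => o.OutputKey === key)?.OutputValue;
}
```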
I've set up an Aurora PostgreSQL compatible database. I can connect to the database via the public address but I'm not able to connect via a Lambda function which is placed in the same VPC.
This is just a test environment and the security settings are weak. In the network settings I tried to use "no VPC" and I tried my default VPC where the database and the lambda are placed. But this doesn't make a difference.
This is my nodejs code to create a simple Select statement:
const params = {
  awsSecretStoreArn: '{mySecurityARN}',
  dbClusterOrInstanceArn: 'myDB-ARN',
  sqlStatements: 'SELECT * FROM userrole',
  database: 'test',
  schema: 'user'
};

const aurora = new AWS.RDSDataService();
let userrightData = await aurora.executeSql(params).promise();
When I start my test in the AWS GUI I get following errors:
"errorType": "UnknownEndpoint",
"errorMessage": "Inaccessible host: `rds-data.eu-central-1.amazonaws.com'. This service may not be available in the `eu-central-1' region.",
"trace": [
"UnknownEndpoint: Inaccessible host: `rds-data.eu-central-1.amazonaws.com'. This service may not be available in the `eu-central-1' region.",
I've already checked Amazon's tutorial but I can't find a step I didn't try.
The error message "This service may not be available in the `eu-central-1' region." was absolutely right, because at the time an Aurora Serverless database was not available in eu-central-1.
On top of that, I had configured a regular Aurora PostgreSQL cluster, not an Aurora Serverless one.
AWS.RDSDataService(), which I wanted to use to connect to the database, only works with an Aurora Serverless instance. It's described here: https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/RDSDataService.html.
That is why this error message appeared.
I have a CloudFormation template that creates my RDS cluster using aurora serverless. I want the cluster to be created with the data API enabled.
The option exists on the web console:
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/data-api.html
But I can't find it on the CloudFormation documentation.
How can I turn this option on from the template?
Set the EnableHttpEndpoint property to true, e.g.:
AWSTemplateFormatVersion: '2010-09-09'
Description: Aurora PostgreSQL Serverless Cluster
Resources:
ServerlessWithDataAPI:
Type: AWS::RDS::DBCluster
Properties:
Engine: aurora-postgresql
EngineMode: serverless
EnableHttpEndpoint: true
ScalingConfiguration:
...
You can enable the Data API from CloudFormation by creating a custom-resource-backed Lambda and enabling the endpoint through any of the available SDKs.
I use boto3 (Python), so the Lambda would contain code similar to the following:
import boto3

client = boto3.client('rds')

# Enable the Data API (HTTP endpoint) on the target cluster.
response = client.modify_db_cluster(
    DBClusterIdentifier='string',  # your cluster identifier
    EnableHttpEndpoint=True
)
Obviously, you need to handle the different custom resource request types (Create, Update, Delete) and return success or failure from the Lambda. But to answer your question, this is the best possible way to set up the Data API via CloudFormation for now, IMHO.
For more information about the function (Boto3):
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/rds.html#RDS.Client.modify_db_cluster
Enabling the Data API is currently only possible in the web console. This feature is still in beta, so things like CloudFormation support and availability outside of us-east-1 are still pending, and the Data API should be used in production with caution as it may still change.
I have been reading various articles/docs and watching some videos on this topic. My issue is that they all conflict in one way or another.
My goal is to use winston to send all console.logs/error messages from my ec2 server to Cloudwatch so that no logs are ever logged on the ec2 terminal itself.
Points of confusion:
If I use winston-aws-cloudwatch or winston-cloudwatch, do I still need to setup an IAM user on AWS or will these auto generate logs within Cloudwatch?
If I set up CloudWatch as per the AWS documentation, will that automatically stream any would-be console.logs from the EC2 server to CloudWatch, or will it do both? If the former, then I don't need Winston?
Can I send logs from my local development server to Cloudwatch (just for testing purposes, as soon as it is clear it works, then I would test on staging and finally move it to production) or must it come from an EC2 instance?
I assume the AWS Cloudwatch key is the same as the AWS key I use for the rest of my account?
Present code:
var winston = require('winston'),
  CloudWatchTransport = require('winston-aws-cloudwatch');

const logger = new winston.Logger({
  transports: [
    new (winston.transports.Console)({
      timestamp: true,
      colorize: true
    })
  ]
});

const cloudwatchConfig = {
  logGroupName: 'groupName',
  logStreamName: 'streamName',
  createLogGroup: false,
  createLogStream: true,
  awsConfig: {
    aws_access_key_id: process.env.AWS_KEY_I_USE_FOR_AWS,
    aws_secret_access_key: process.env.AWS_SECRET_KEY_I_USE_FOR_AWS,
    region: process.env.REGION_CLOUDWATCH_IS_IN
  },
  formatLog: function (item) {
    return item.level + ': ' + item.message + ' ' + JSON.stringify(item.meta);
  }
};

logger.level = 3;
if (process.env.NODE_ENV === 'development') logger.add(CloudWatchTransport, cloudwatchConfig);

logger.stream = {
  write: function (message, encoding) {
    logger.info(message);
  }
};

logger.error('Test log');
Yes
Depends on the transports you configure. If you configure only CloudWatch, then logs will only end up there. Currently your code has two transports, the normal Console one and the CloudWatchTransport, so with your current code: both.
As long as you specify your keys as you would normally do with any AWS service (S3, DB, ...) you can push logs from your local/dev device to CloudWatch.
Depends on whether your IAM user has the privileges or not. But yes, it is possible.
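Note that whichever credentials you use need CloudWatch Logs permissions. A minimal IAM policy for this kind of transport (allowing log group/stream creation and writing events; scope the Resource down for production) would be something like:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:DescribeLogStreams",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    }
  ]
}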