SSL connection from AWS Lambda to AWS Redshift

I am trying to connect to an AWS Redshift database from a Lambda function using C#, .NET Core 2.0, and Npgsql. I am having difficulty with SSL.
I have created two non-publicly-accessible Redshift databases in a dedicated VPC. The Lambda executes in the same VPC. The two databases are identical in every way except that one has the "force SSL" parameter set to true.
Using the following code snippet, I can access the non-SSL database just fine:
using (var conn = new NpgsqlConnection("Host=x; Port=5439; Username=x; Password=x; Database=xxx"))
{
    Console.WriteLine("Redshift: pre-Open!");
    conn.Open();
    Console.WriteLine("Redshift: post-Open!");
    ...
}
When I access the SSL database with the same connection string, I get the standard "no pg_hba.conf entry ... SSL off" error message - expected; I've seen it before ...
When I append "SSL Mode=Require;Server Compatibility Mode=Redshift;Trust Server Certificate=true" to the connection string,
the conn.Open() call hangs, and the second write statement never shows up in CloudWatch.
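For reference, a minimal sketch of the SSL variant (host and credentials are placeholders, as above; the keywords are standard Npgsql connection-string options):

var cs = "Host=x; Port=5439; Username=x; Password=x; Database=xxx; " +
         "SSL Mode=Require; Server Compatibility Mode=Redshift; Trust Server Certificate=true";
using (var conn = new NpgsqlConnection(cs))
{
    conn.Open(); // hangs here in the Lambda; works on EC2
}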
And yet ... this connection string works when accessing the same database through a REST API built as a C#/.NET Core 2.0 Web API (same runtime environment), running on an EC2 instance behind a load balancer.
A Python Lambda connecting to the SSL database in the same environment - subnets, security groups, Lambda triggers, Lambda parameters, ... - works just fine.
The csproj references Amazon.Lambda.Core 1.0.0, Amazon.Lambda.Serialization.Json 1.1.0, and
Npgsql.EntityFrameworkCore.PostgreSQL 2.0.1.
I'd try Wireshark, maybe, in another environment - but running as a Lambda, I'm not sure how best to debug. I've tried many permutations and combinations, and I wouldn't put it past myself to be missing something blindingly obvious,
but I absolutely do not see why it hangs. Thank you.
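One way to debug without packet capture, since console writes already reach CloudWatch: Npgsql can emit its internal trace logs to the console. A minimal sketch, assuming the Npgsql 3.2.x line that Npgsql.EntityFrameworkCore.PostgreSQL 2.0.1 pulls in, to be run before the first connection is created:

using Npgsql.Logging;

// Route Npgsql's internal trace output to stdout so it lands in CloudWatch.
// Must be set before any NpgsqlConnection is constructed.
NpgsqlLogManager.Provider = new ConsoleLoggingProvider(NpgsqlLogLevel.Trace, true, true);

The trace should show how far the SSL handshake gets before the hang.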

Related

Can't reach local DynamoDB from Lambda in AWS SAM

I have a simple AWS SAM setup with a Go lambda function. It's using https://github.com/aws/aws-sdk-go-v2 for dealing with DynamoDB; the endpoint is set via environment variables from the template (here: https://github.com/abtercms/abtercms2/blob/main/template.yaml).
My problem is that I can't get my lambda to return anything from DynamoDB; as a matter of fact, I don't think it can reach it at all.
{
  "level": "error",
  "status": 500,
  "error": "failed to fetch item (status 404), err: operation error DynamoDB: GetItem, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , request send failed, Post \"http://127.0.0.1:8000/\": dial tcp 127.0.0.1:8000: connect: connection refused",
  "path": "GET /websites/abc",
  "time": 1658243299,
  "message": "..."
}
I'm able to reach the local DynamoDB just fine from the host machine.
Results were the same both for running DynamoDB in docker compose and via the .jar file, so the problem is not there as far as I can tell.
Also feel free to check out the whole project, I'm just running make sam-local and make curl-get-website-abc above. The code is here: https://github.com/abtercms/abtercms2.
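A note on what the error suggests (an assumption, not from the original post): sam local runs the function in its own Docker container, so 127.0.0.1 inside the Lambda is the function's container, not the host where DynamoDB Local listens. A hedged Go sketch of pointing the client at the host instead - host.docker.internal works on Docker Desktop; on Linux you would typically run both containers on one network (sam local start-api --docker-network ...) and use the DynamoDB container's name:

package main

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
)

// newLocalClient returns a DynamoDB client aimed at DynamoDB Local as seen
// from inside the Lambda container. The URL is an assumption for Docker
// Desktop; adjust it to your docker network setup.
func newLocalClient(ctx context.Context) (*dynamodb.Client, error) {
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		return nil, err
	}
	return dynamodb.NewFromConfig(cfg, func(o *dynamodb.Options) {
		o.EndpointResolver = dynamodb.EndpointResolverFromURL("http://host.docker.internal:8000")
	}), nil
}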

Connection to AWS MemoryDB cluster sometimes fails

We have an application that is using AWS MemoryDB for Redis. We have set up a cluster with one shard and two nodes: one node (named 0001-001) is the primary read/write, while the other (named 0001-002) is a read replica.
After deploying the application, connecting to MemoryDB sometimes fails when we use the cluster endpoint connection string. If we restart the application a few times, it suddenly starts working; whether it succeeds appears random. The error we get is the following:
Endpoint Unspecified/ourapp-memorydb-cluster-0001-001.ourapp-memorydb-cluster.xxxxx.memorydb.eu-west-1.amazonaws.com:6379 serving hashslot 6024 is not reachable at this point of time. Please check connectTimeout value. If it is low, try increasing it to give the ConnectionMultiplexer a chance to recover from the network disconnect. IOCP: (Busy=0,Free=1000,Min=2,Max=1000), WORKER: (Busy=0,Free=32767,Min=2,Max=32767), Local-CPU: n/a
If we connect directly to the primary read/write node, we get no such errors.
If we connect directly to the read replica, it always fails - it even gets the error above, complaining about the "0001-001" node.
We use .NET 6
We use Microsoft.Extensions.Caching.StackExchangeRedis 6.0.4 which depends on StackExchange.Redis 2.2.4
The application is hosted in AWS ECS
StackExchangeRedisCache is added to the service collection in a startup file:
services.AddStackExchangeRedisCache(o =>
{
    o.InstanceName = redisConfiguration.Instance;
    o.ConfigurationOptions = ToRedisConfigurationOptions(redisConfiguration);
});
...where ToRedisConfigurationOptions returns a basic ConfigurationOptions object:
new ConfigurationOptions()
{
    EndPoints =
    {
        { "clustercfg.ourapp-memorydb-cluster.xxxxx.memorydb.eu-west-1.amazonaws.com", 6379 } // Cluster endpoint
    },
    User = "username",
    Password = "password",
    Ssl = true,
    AbortOnConnectFail = false,
    ConnectTimeout = 60000
};
We tried multiple shards with multiple nodes, and connecting to the cluster still sometimes fails. We even tried updating the StackExchange.Redis dependency to 2.5.43, but no luck.
We could "solve" it by directly connecting to the primary node, but if a failover occurs and 0001-002 becomes the primary node we would have to manually change our connection string, which is not acceptable in a production environment.
Any help or advice is appreciated, thanks!

"Kafka Timed out waiting for a node assignment." on MSK

Specs:
The serverless Amazon MSK that's in preview.
t2.xlarge EC2 instance with Amazon Linux 2
Installed Kafka from https://dlcdn.apache.org/kafka/3.0.0/kafka_2.13-3.0.0.tgz
openjdk version "11.0.13" 2021-10-19 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.13+8-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.13+8-LTS, mixed mode, sharing)
Gradle 7.3.3
https://github.com/aws/aws-msk-iam-auth, successfully built.
I also tried adding IAM authentication information, as recommended by the Amazon MSK Library for AWS Identity and Access Management. It says to add the following in config/client.properties:
# Sets up TLS for encryption and SASL for authN.
security.protocol = SASL_SSL
# Identifies the SASL mechanism to use.
sasl.mechanism = AWS_MSK_IAM
# Binds SASL client implementation.
# sasl.jaas.config = software.amazon.msk.auth.iam.IAMLoginModule required;
# Encapsulates constructing a SigV4 signature based on extracted credentials.
# The SASL client bound by "sasl.jaas.config" invokes this class.
sasl.client.callback.handler.class = software.amazon.msk.auth.iam.IAMClientCallbackHandler
# Binds SASL client implementation. Uses the specified profile name to look for credentials.
sasl.jaas.config = software.amazon.msk.auth.iam.IAMLoginModule required awsProfileName="kafka-client";
And kafka-client is the IAM role attached to the EC2 instance as an instance profile.
Networking: I used VPC Reachability Analyzer to confirm that the security groups are configured correctly and the EC2 instance I'm using as a Producer can reach the serverless MSK cluster.
What I'm trying to do: create a topic.
How I'm trying: bin/kafka-topics.sh --create --partitions 1 --replication-factor 1 --topic quickstart-events --bootstrap-server boot-zclcyva3.c2.kafka-serverless.us-east-2.amazonaws.com:9098
Result:
Error while executing topic command : Timed out waiting for a node assignment. Call: createTopics
[2022-01-17 01:46:59,753] ERROR org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: createTopics
(kafka.admin.TopicCommand$)
I also tried the plaintext port, 9092. (9098 is the IAM-authentication port in MSK, and serverless MSK uses IAM authentication by default.)
All the other posts I found on SO about this node assignment error didn't include MSK. I tried suggestions like uncommenting the listener setting in server.properties, but that didn't change anything.
Installing kcat for troubleshooting didn't work for me either, since there's no out-of-the-box package for yum (the package manager Amazon Linux 2 uses), and building it from source failed for me at the libcurl check.
The Question: Any other tips on solving this "node assignment" error?
The documentation has been updated recently; I was able to follow it end to end without any issue (the IAM policy is now correct):
https://docs.aws.amazon.com/msk/latest/developerguide/serverless-getting-started.html
The properties file you created is not used automatically; your command needs to include --command-config client.properties, where the contents of that file are documented on the IAM page of the MSK docs linked above.
Extract...
ssl.truststore.location=<PATH_TO_TRUST_STORE_FILE>
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
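With that file in place, the create-topic command from the question becomes (path to the properties file assumed):

bin/kafka-topics.sh --create --partitions 1 --replication-factor 1 --topic quickstart-events --bootstrap-server boot-zclcyva3.c2.kafka-serverless.us-east-2.amazonaws.com:9098 --command-config config/client.properties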
Alternatively, if the plaintext port didn't work either, then you have other networking issues.
Beyond these steps, I suggest reaching out to MSK support and asking them to update the "Create a Topic" page to no longer use Zookeeper, keeping in mind that Kafka 3.0 is not (yet) supported.

How to reconnect if AWS RDS recovery happens

How I have written the code:
createPool is used at the start of the app,
then for every request I am using getConnection.
I am using AWS RDS, and it went into sudden recovery mode. My DB URL was unchanged, but the instance IP must have changed, as the instance was recreated in another AZ.
In such a scenario I am supposed to reinitialize my DB connection so that the new instance's DNS is picked up.
The issue is that in this scenario I did not receive any timeout or connection error. So how do I capture this type of error?
Kindly guide if possible. Thanks.
It is unclear from your description what exactly you have built, but it sounds like you've created a connection pool.
If you open a connection to the db, then each time you call getConnection you should validate that the connection is still active - obviously, if the db fails over, the existing connection will have been closed, and you will either need to create a new connection or re-open the existing one.
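A minimal sketch of that validation, assuming the node-mysql style pool the question's createPool/getConnection naming suggests (the ping-then-retry shape is the point, not the exact API):

const mysql = require("mysql");

const pool = mysql.createPool({ host: "db-url", user: "u", password: "p", database: "d" });

// Validate a pooled connection before handing it out; a connection that
// predates the failover will fail the ping.
function getLiveConnection(cb) {
  pool.getConnection((err, conn) => {
    if (err) return cb(err);
    conn.ping(pingErr => {
      if (pingErr) {
        conn.destroy();               // drop the stale socket instead of releasing it
        return getLiveConnection(cb); // retry; the pool dials a fresh connection
      }
      cb(null, conn);
    });
  });
}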

AWS Lambda + Tinkerpop/Gremlin + TitanDB on EC2 + AWS DynamoDB in cloud

I am trying to execute following flow:
user hits AWS Gateway (REST),
it triggers AWS Lambda,
that uses Tinkerpop/Gremlin connects to
TitanDB on EC2, that uses
AWS DynamoDB in cloud (not on EC2) as backend.
Right now I have managed to create a fully working TitanDB instance on EC2 that stores data in DynamoDB in the cloud.
I am also able to connect from AWS Lambda to EC2 through Tinkerpop/Gremlin BUT only this way:
Cluster.build()
    .addContactPoint("10.x.x.x") // IP of EC2
    .create()
    .connect()
    .submit("here I type my query as string and it will work");
And this works; however, I strongly prefer to use the "Criteria API" (GremlinPipeline) instead of plain Gremlin strings.
In other words, I need ORM or something like that.
I know, that Tinkerpop includes it.
I have realized, that what I need is object of class Graph.
This is what I have tried:
Graph graph = TitanFactory
    .build()
    .set("storage.hostname", "10.x.x.x")
    .set("storage.backend", "com.amazon.titan.diskstorage.dynamodb.DynamoDBStoreManager")
    .set("storage.dynamodb.client.credentials.class-name", "com.amazonaws.auth.DefaultAWSCredentialsProviderChain")
    .set("storage.dynamodb.client.credentials.constructor-args", "")
    .set("storage.dynamodb.client.endpoint", "https://dynamodb.ap-southeast-2.amazonaws.com")
    .open();
However, it throws "Could not find implementation class: com.amazon.titan.diskstorage.dynamodb.DynamoDBStoreManager".
Of course, the computer is correct, as IntelliJ IDEA cannot find it either.
My dependencies:
//
// aws
compile 'com.amazonaws:aws-lambda-java-core:+'
compile 'com.amazonaws:aws-lambda-java-events:+'
compile 'com.amazonaws:aws-lambda-java-log4j:+'
compile 'com.amazonaws:aws-java-sdk-dynamodb:1.10.5.1'
compile 'com.amazonaws:aws-java-sdk-ec2:+'
//
// database
// titan 1.0.0 is compatible with gremlin 3.0.2-incubating, but not yet with 3.2.0
compile 'com.thinkaurelius.titan:titan-core:1.0.0'
compile 'org.apache.tinkerpop:gremlin-core:3.0.2-incubating'
compile 'org.apache.tinkerpop:gremlin-driver:3.0.2-incubating'
What is my goal: have fully working Graph object
What is my problem: I don't have DynamoDBStoreManager class, and I do not know what dependency I have to add.
My additional question is: why does connecting through the Cluster class require only an IP and work, while TitanFactory requires properties like the ones I used for gremlin-server on EC2?
I do not want to create second server, I just want to connect as client to it and take Graph object.
EDIT:
After adding the resolver, it builds; in the output I get many repetitions of:
13689 [TitanID(0)(4)[0]] WARN com.thinkaurelius.titan.diskstorage.idmanagement.ConsistentKeyIDAuthority - Temporary storage exception while acquiring id block - retrying in PT2.4S: com.thinkaurelius.titan.diskstorage.TemporaryBackendException: Wrote claim for id block [1, 51) in PT0.342S => too slow, threshold is: PT0.3S
and execution hangs on the open() method, so it does not allow me to execute any queries.
For the DynamoDBStoreManager class, you would need this dependency:
compile 'com.amazonaws:dynamodb-titan100-storage-backend:1.0.0'
Then for the DynamoDBLocal issue, try adding this resolver:
resolvers += "AWS DynamoDB Local Release Repository" at "http://dynamodb-local.s3-website-us-west-2.amazonaws.com/release"
I'm not entirely clear on what this means -- "Criteria API" instead of plain Gremlin language. I'm guessing that you mean that you want to interact with the graph using Java rather than passing Gremlin as a string over to a running Titan/Gremlin Server? If this is the case, then you don't need to start a Titan/Gremlin Server at all (step 4 above). Write an AWS Lambda program (step 2-3 above) that creates a direct Titan client connection via TitanFactory, where all of the Titan configuration properties are for your DynamoDB instance (step 5 above).
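A hedged Java sketch of that direct-client approach, reusing the question's own configuration placeholders (with Titan 1.0/TinkerPop 3.0 you then query through a GraphTraversalSource instead of Gremlin strings):

import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.TitanGraph;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;

public class DirectClient {
    public static void main(String[] args) throws Exception {
        // Direct client connection: no Gremlin Server involved.
        TitanGraph graph = TitanFactory.build()
                .set("storage.backend", "com.amazon.titan.diskstorage.dynamodb.DynamoDBStoreManager")
                .set("storage.dynamodb.client.endpoint", "https://dynamodb.ap-southeast-2.amazonaws.com")
                .open();

        // Java-level traversal instead of a Gremlin string.
        GraphTraversalSource g = graph.traversal();
        System.out.println("vertices: " + g.V().count().next());

        graph.close();
    }
}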