Secrets Manager rotation timeout

Alright, so after 3 days of trying to get this working I finally give up.
I have:
a private VPC with a subnet that contains an RDS MySQL instance
a Lambda rotator function based on the AWS Python template for single user MySQL rotation
a VPC interface endpoint for Secrets Manager with Private DNS enabled.
I have security groups, but for debugging, I've allowed all traffic for all security groups and Network ACLs.
The Lambda rotator function has Secrets Manager permissions on all resources, CloudWatch logging permissions, and the relevant EC2 permissions to execute in a VPC.
What does work:
logging to CloudWatch works. I have turned on DEBUG mode
Secrets Manager is invoking the lambda function successfully
a few requests to Secrets Manager appear to work
What doesn't work:
after a few requests, any subsequent requests start timing out
After some time, it manages to send a few more requests. Could this have something to do with Python, networking timeouts and lambda connections being held or dropped due to timeouts?
I can see it does a DescribeSecret request.
Then I can see a GetSecretValue request for an AWSCURRENT stage.
Then I can see a GetSecretValue request for an AWSPENDING stage.
This one returns:
"Secrets Manager can't find the specified secret value for VersionId: xxxxxxxx"
Then I can see a GetRandomPassword request.
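That sequence lines up with the createSecret step of the AWS rotation template I'm using (linked further below). A rough, hedged paraphrase of what I understand it to do, simplified and not the exact template code:

# Rough paraphrase of the template's create_secret step (simplified sketch, not the exact code)
def create_secret(service_client, arn, token):
    # Make sure a current secret exists
    service_client.get_secret_value(SecretId=arn, VersionStage="AWSCURRENT")
    try:
        # On the first rotation attempt there is no AWSPENDING version yet,
        # so this is the call that logs "can't find the specified secret value"
        service_client.get_secret_value(SecretId=arn, VersionId=token, VersionStage="AWSPENDING")
    except service_client.exceptions.ResourceNotFoundException:
        # Generate a new password and stage it as AWSPENDING
        # (the real template stores the full secret JSON, simplified here)
        passwd = service_client.get_random_password(ExcludeCharacters=':/@"\'\\')
        service_client.put_secret_value(SecretId=arn, ClientRequestToken=token,
                                        SecretString=passwd['RandomPassword'],
                                        VersionStages=['AWSPENDING'])

So the AWSPENDING "can't find the specified secret value" error appears to be expected on the first pass of a rotation; the template catches it and stages a new version.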
After the GetRandomPassword call, I see the following in the logs:
Resetting dropped connection: secretsmanager.ap-southeast-2.amazonaws.com
The lambda function now times out.
From this point on, it can't even successfully do a DescribeSecret without the lambda timing out. After maybe 10-15 minutes it starts working again up to the GetRandomPassword part and then drops the connection again.
I don't think it's a security group, ACL or endpoint config issue, because it would either work or not work, not sometimes work.
I also don't think I'm stressing out the API that much - a few requests in a period of a few seconds and then nothing for many minutes should be fine for AWS.
I may have found a little clue in what happens after GetSecretValue is called:
[DEBUG] 2022-04-09T10:49:20.073Z 34585068-3f21-4471-9035-f9368a3094dd Response headers: {'x-amzn-RequestId': 'f443766f-921c-4772-997f-b150643c4909', 'Content-Type': 'application/x-amz-json-1.1', 'Content-Length': '156', 'Date': 'Sat, 09 Apr 2022 10:49:19 GMT', 'Connection': 'close'}
Looks like the response header contains Connection: close, but that's coming back FROM Secrets Manager.
When I look at other people's logs, I can see the headers the boto3 client sends usually contain Connection: keep-alive, yet none of my logs contain that header.
I did a bit of an experiment by injecting that header.
session = boto3.session.Session()
session.events.register('before-call.secrets-manager.*', inject_header)
...
def inject_header(params, **kwargs):
    params['headers']['Connection'] = 'keep-alive'
However, even if I send that header to the Secrets Manager API it makes no difference.
There's got to be something else going on, I just don't understand the intermittent nature of it!
For reference, here is the Lambda role policy. As you can see, for debugging and troubleshooting I've left the Secrets Manager statements wide open.
{
    "Statement": [
        {
            "Action": [
                "secretsmanager:DescribeSecret",
                "secretsmanager:GetSecretValue",
                "secretsmanager:PutSecretValue",
                "secretsmanager:UpdateSecretVersionStage"
            ],
            "Effect": "Allow",
            "Resource": "*"
        },
        {
            "Action": [
                "secretsmanager:GetRandomPassword"
            ],
            "Effect": "Allow",
            "Resource": "*"
        },
        {
            "Action": [
                "ec2:CreateNetworkInterface",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DeleteNetworkInterface",
                "ec2:AssignPrivateIpAddresses",
                "ec2:UnassignPrivateIpAddresses"
            ],
            "Effect": "Allow",
            "Resource": "*"
        },
        {
            "Action": "logs:CreateLogGroup",
            "Effect": "Allow",
            "Resource": "arn:aws:logs:ap-southeast-2:xxxxxxxxxxxx:*"
        },
        {
            "Action": [
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:logs:ap-southeast-2:xxxxxxxxxxxx:log-group:/aws/lambda/rcf-apse2-dev-onsite-rds-secret-rotator-function:*"
            ]
        }
    ],
    "Version": "2012-10-17"
}
The Python code is as per the following template:
https://github.com/aws-samples/aws-secrets-manager-rotation-lambdas/blob/master/SecretsManagerRDSMySQLRotationSingleUser/lambda_function.py
The secret value is being stored in this format as required:
{
    "dbClusterIdentifier": "rcf-apse2-dev-onsite",
    "engine": "mysql",
    "host": "rcf-apse2-dev-onsite.cluster-xxxxxxxxx.ap-southeast-2.rds.amazonaws.com",
    "password": "xxxxxxx",
    "username": "xxxxx"
}

I think I finally understand what's going on!
The default timeout of boto3 is 60 seconds, but the Lambda execution timeout was only set to 30 seconds, which is why the boto3 retry logic never had a chance to kick in and the Lambda function would keep timing out.
Obviously a crude way to fix this is to increase the Lambda timeout, but a better solution in my opinion is to add the following to the Python code and adjust the timeouts as you see fit.
from botocore.config import Config
...
config = Config(
    connect_timeout=2,
    read_timeout=2,
    retries={
        'max_attempts': 10,
        'mode': 'standard'
    }
)
service_client = boto3.client('secretsmanager', config=config, endpoint_url=os.environ['SECRETS_MANAGER_ENDPOINT'])
I'm still not sure why the connection resets happen in the first place, but I suspect it's because AWS doesn't want to hold on to open connections for too long, as they cost memory and resources.
Oh the joys of AWS!

Related

New AWS Lambda URLs - has anyone got the 'secure' version with the AWS_IAM working?

I have a simple function that returns an item of text.
When I set auth to NONE it works fine.
When I set auth to AWS_IAM and create the resource-based policy within the permissions section of AWS Lambda, I set the following:
"Version": "2012-10-17",
"Id": "default",
"Statement": [
{
"Sid": "sid8",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::xxxxxxxxxx:user/xxxxxxxxxx"
},
"Action": "lambda:InvokeFunctionUrl",
"Resource": "arn:aws:lambda:eu-west-1:xxxxxxxxx:function:simpleFunction",
"Condition": {
"StringEquals": {
"lambda:FunctionUrlAuthType": "AWS_IAM"
}
}
}
]
}
On this I get a forbidden error.
Every demo / example on the internet uses NONE for auth.
I have also tried adding the lambda:InvokeFunctionUrl to the IAM policy of the user for the specified resource but still getting a forbidden error.
Am I missing something or does this aspect of the new function not work?
The problem is that when you are using AWS_IAM auth you're required to sign your requests with SigV4. Essentially, this is identical to using API Gateway with the IAM auth type.
There are multiple ways of signing requests; you can even use botocore functionality to do so. The easiest would be to use awscurl or Postman. Also check this doc, which confirms the requirement: https://docs.aws.amazon.com/lambda/latest/dg/urls-invocation.html
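If you want to do the signing in code, here is a minimal sketch using botocore's signer; the function URL, region, and use of the requests package are placeholders/assumptions, not taken from the question:

# Sign a GET to a Lambda Function URL with SigV4 (sketch; URL and region are placeholders)
import boto3
import requests
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

url = "https://xxxxxxxx.lambda-url.eu-west-1.on.aws/"  # hypothetical function URL
session = boto3.Session()

request = AWSRequest(method="GET", url=url)
# Function URLs are signed against the "lambda" service
SigV4Auth(session.get_credentials(), "lambda", "eu-west-1").add_auth(request)

response = requests.get(url, headers=dict(request.headers))
print(response.status_code, response.text)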

boto3 giving 403s: Do we need to modify anything to have boto3 S3 work across S3 regions?

I had lambdas from one region (us-west-2), receiving 403's for S3 operations (HeadObject, PutObject, CopyObject) against objects in a bucket from a different region (ca-central-1). The policy simulator assured me that the operations should work under my policy, but clearly there was something else at play. This policy is attached to a role, and I have a trust relationship between the lambda and that role.
One attempt I made at solving the problem was to specify the region name by appending it to the bucket name.
i.e., changing:
head_object(Bucket="foo", ...)
to the (slightly) more qualified naming:
head_object(Bucket="foo.us-west-2", Key="bar")
Interestingly, this would change the 403 to a 404.
I stumbled upon this workaround (?) through guesswork, based on the required structure of the host header and the "working with buckets" intro docs. But it's a stretch.
I can't find a reference in the docs where the various accepted forms of bucket names are listed (e.g. from the simple name, to a fully qualified ARN). Is the list of supported formats for specifying bucket and key names readily available?
Appending .<region> to the bucket name will allow HeadObject to work differently, but PutObject and CopyObject fail with NoSuchBucket if I try the same trick. Perhaps each S3 API call has a different syntax to specify source and destination regions?
I'm including the policy attached to my lambda's role. Maybe there's something specific to it that hinders cross-region operations, as was suggested in the comments? My source and destination buckets do not have any bucket policy attached. The lambda, and the two buckets are owned by the same account.
The lambda has a role with the following policy attached to it:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowS3Ops",
            "Effect": "Allow",
            "Action": [
                "s3:DeleteObjectTagging",
                "s3:DeleteObjectVersion",
                "s3:GetObjectVersionTagging",
                "s3:DeleteObjectVersionTagging",
                "s3:GetObjectVersionTorrent",
                "s3:PutObject",
                "s3:GetObjectAcl",
                "s3:GetObject",
                "s3:GetObjectTorrent",
                "s3:AbortMultipartUpload",
                "s3:GetObjectVersionAcl",
                "s3:GetObjectTagging",
                "s3:GetObjectVersionForReplication",
                "s3:DeleteObject",
                "s3:GetObjectVersion"
            ],
            "Resource": [
                "arn:aws:s3:::a-specific-bucket-1/*",
                "arn:aws:s3:::a-specific-bucket-2/*",
                "arn:aws:s3:::*/*",
                "arn:aws:logs:*:*:*"
            ]
        },
        {
            "Sid": "AllowLogging",
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Sid": "AllowPassingRoleToECSTaskRoles",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "*"
        },
        {
            "Sid": "AllowStartingECSTasks",
            "Effect": "Allow",
            "Action": "ecs:RunTask",
            "Resource": "*"
        },
        {
            "Sid": "AllowCreatingLogGroups",
            "Effect": "Allow",
            "Action": "logs:CreateLogGroup",
            "Resource": "arn:aws:logs:*:*:*"
        }
    ]
}
Note: I've used both wildcards and specific bucket names in the list of resources. I used to only have the specific names, and then I threw in the wildcards for testing.
Note: This is very related to this question on S3 403s. Even though the accepted answer seems to claim it has to do with policy adjustment, I think it's just a matter of resource naming qualification.
There's a multilevel answer to this.
Documentation such as the Amazon S3 API reference is useful, but there is variation in how region names are specified between the different language libraries.
In Boto3 (python), for instance, bucket names can always be specified in their short form, regardless of the region they are in.
The fact that doing client.head_object(Bucket=short_name, Key="foo") returned a 403, but that client.head_object(Bucket=short_name + ".us-west-2", Key="foo") returned a 404 is somewhat of a red herring. Boto3 performs bad validation, in my opinion. Adding the region suffix will cause boto3 to parse the parameters differently -- part of the bucket name will end up in the request Path:
# short form (my-bucket) - 403 forbidden
Starting new HTTPS connection (1): my-bucket.s3.ca-central-1.amazonaws.com
[INFO] Starting new HTTPS connection (1): my-bucket.s3.ca-central-1.amazonaws.com
[DEBUG] "HEAD /foo HTTP/1.1" 403 0
# short form + region ("my-bucket.us-west-2") -- 404 not found
# the bucket name has moved to the request Path (wrong!)
Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
[DEBUG] "HEAD /my-bucket.us-west-2/hosts HTTP/1.1" 404 0
I've discovered one root problem with the policy. The original resource block was:
"Resource": [
"arn:aws:s3:::a-specific-bucket-1/*",
"arn:aws:s3:::a-specific-bucket-2/*",
"arn:aws:s3:::*/*",
"arn:aws:logs:*:*:*"
]
Adding :::* fixes the cross-region issue, i.e. I've added one line to the previous block to obtain this:
"Resource": [
"arn:aws:s3:::a-specific-bucket-1/*",
"arn:aws:s3:::a-specific-bucket-2/*",
"arn:aws:s3:::*/*",
"arn:aws:s3:::*" <--- *this line*
"arn:aws:logs:*:*:*"
]
This modification has allowed these cross bucket requests to go through successfully. I was playing around a bit more with the policy simulator afterwards, and noticed that the added line was also necessary to support HeadBucket or ListBucket operations.
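If you want to check that kind of thing without redeploying, the policy simulator is also available via the API; a small sketch (the policy and bucket name below are illustrative):

# Ask the IAM policy simulator whether a candidate policy allows ListBucket (sketch)
import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:ListBucket", "s3:GetObject"],
        "Resource": ["arn:aws:s3:::*", "arn:aws:s3:::*/*"]
    }]
}

iam = boto3.client("iam")
result = iam.simulate_custom_policy(
    PolicyInputList=[json.dumps(policy)],
    ActionNames=["s3:ListBucket"],
    ResourceArns=["arn:aws:s3:::my-bucket"]  # placeholder bucket
)
print(result["EvaluationResults"][0]["EvalDecision"])  # "allowed" / "implicitDeny"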
Also, given that the first two lines in that resource block are redundant after adding the wildcard entry, they can be omitted without any effect, to produce the final version:
"Resource": [
"arn:aws:s3:::*/*",
"arn:aws:s3:::*"
"arn:aws:logs:*:*:*"
]
Note: I haven't checked whether :::* includes :::*/*. It could very well be that the :::* makes the :::*/* redundant. My suspicion is that */* is interpreted to mean anything within the bucket, but not the bucket itself.
Note: I think I may also have jumped too quickly to the (wrong) conclusion that this was a cross-region problem, because of the status code change. I initially did some testing against a-specific-bucket-1 and a-specific-bucket-2, which was working fine (because they were hardcoded in the policy), and it so happened that the first new bucket (different than those two) I got errors on happened to be in a different region. A third bucket in the same region might have also given me 403s.

Amazon Kinesis: Caught exception while sync'ing Kinesis shards and leases

I am trying to make Snowplow work on AWS. When I try to run the Stream Enrich service on an instance, I get this exception:
[main] INFO com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker - Syncing Kinesis shard info
[main] ERROR com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncTask - Caught exception while sync'ing Kinesis shards and leases
[cw-metrics-publisher] WARN com.amazonaws.services.kinesis.metrics.impl.CWPublisherRunnable - Could not publish 4 datums to CloudWatch
I don't think the error is due to CloudWatch:
Caught exception while sync'ing Kinesis shards and leases
As mentioned in the comments above, this error will crop up when you're lacking permissions to the AWS resources required by the Kinesis Client Library (KCL). This can be DynamoDB, CloudWatch, or Kinesis. For the Stream Enrich component of Snowplow, you'll need the following permissions:
Read permission to input kinesis stream (collector good)
Write permission to output kinesis streams (enrich good & enrich bad)
List permission to kinesis streams
Read/write/create permission to DynamoDB state table (table name is the “appName” value in your stream enrich application.conf)
PutMetricData to Cloudwatch
A templated version of an IAM policy that meets these needs is as follows:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kinesis:DescribeStream",
                "kinesis:GetShardIterator",
                "kinesis:GetRecords",
                "kinesis:ListShards"
            ],
            "Resource": [
                "${collector_stream_out_good}"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kinesis:ListStreams"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "kinesis:DescribeStream",
                "kinesis:PutRecord",
                "kinesis:PutRecords"
            ],
            "Resource": [
                "${enricher_stream_out_good}",
                "${enricher_stream_out_bad}"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:CreateTable",
                "dynamodb:DescribeTable",
                "dynamodb:Scan",
                "dynamodb:GetItem",
                "dynamodb:PutItem",
                "dynamodb:UpdateItem",
                "dynamodb:DeleteItem"
            ],
            "Resource": [
                "${enricher_state_table}"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricData"
            ],
            "Resource": "*"
        }
    ]
}
I've written up a blog post that covers the required IAM permissions for Stream Enrich and other Snowplow components, since the exact required permissions were sparse/non-existent in the Snowplow documentation.
Hope that helps!
So I had this problem when setting up Snowplow. I'm using terraform to automate the infrastructure and got this error after a destroy and re-apply. Here's what I learned.
You give the enricher DynamoDB privileges so it can create a table. If that table already exists before the enricher tries to create it (in my case because Terraform did not destroy it), the enricher cannot create a table with the same name, and it also seemingly won't link to the existing table.
My solution was to delete the existing DynamoDB table via the AWS console, terminate my enricher, and start up a new one. The error no longer appeared and my enricher worked as intended.
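If you'd rather script that cleanup than click through the console, a minimal sketch (the region and table name are assumptions; the table name is whatever appName is set to in your Stream Enrich config):

# Delete the stale KCL lease table so the enricher can recreate it (sketch)
import boto3

TABLE_NAME = "snowplow-enrich"  # placeholder: must match the appName in your config
dynamodb = boto3.client("dynamodb", region_name="eu-west-1")  # assumed region

dynamodb.delete_table(TableName=TABLE_NAME)
dynamodb.get_waiter("table_not_exists").wait(TableName=TABLE_NAME)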
I faced this issue today. For me, the issue was that I changed the Kinesis stream names without changing the appName in the enrich configuration.
Once I changed the appName to a new name and deployed an update to Snowplow Enrich, I was able to get rid of the error.

Authorization error while connecting to AWS IoT after attaching the device object to connection client

I am trying out the ShadowSample example that comes as part of Java AWS IoT Device SDK - https://github.com/aws/aws-iot-device-sdk-java/blob/master/aws-iot-device-sdk-java-samples/src/main/java/com/amazonaws/services/iot/client/sample/shadow/ShadowSample.java
I am able to run it successfully and it works properly. However, it only works when I have the following policy attached to the certificate (and the certificate, in turn, to the device):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "iot:Subscribe",
                "iot:Receive",
                "iot:Publish",
                "iot:Connect"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
I want a generic yet strict policy (especially for the publish and receive actions) rather than granting rights to all resources, i.e. "*". So I updated the policy to the forms below, and none of them work:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "iot:Subscribe",
                "iot:Connect"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "iot:Publish"
            ],
            "Resource": [
                "arn:aws:iot:us-west-2:64867635xxxx:topic/$aws/things/${iot:Connection.Thing.ThingName}/shadow/update",
                "arn:aws:iot:us-west-2:64867635xxxx:topic/$aws/things/${iot:Connection.Thing.ThingName}/shadow/get",
                "arn:aws:iot:us-west-2:64867635xxxx:topic/$aws/things/${iot:Connection.Thing.ThingName}/shadow/delete"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "iot:Receive"
            ],
            "Resource": [
                "arn:aws:iot:us-west-2:64867635xxxx:topic/$aws/things/${iot:Connection.Thing.ThingName}/shadow/update/accepted",
                "arn:aws:iot:us-west-2:64867635xxxx:topic/$aws/things/${iot:Connection.Thing.ThingName}/shadow/update/rejected",
                "arn:aws:iot:us-west-2:64867635xxxx:topic/$aws/things/${iot:Connection.Thing.ThingName}/shadow/update/delta",
                "arn:aws:iot:us-west-2:64867635xxxx:topic/$aws/things/${iot:Connection.Thing.ThingName}/shadow/update/documents",
                "arn:aws:iot:us-west-2:64867635xxxx:topic/$aws/things/${iot:Connection.Thing.ThingName}/shadow/get/accepted",
                "arn:aws:iot:us-west-2:64867635xxxx:topic/$aws/things/${iot:Connection.Thing.ThingName}/shadow/get/rejected",
                "arn:aws:iot:us-west-2:64867635xxxx:topic/$aws/things/${iot:Connection.Thing.ThingName}/shadow/delete/accepted",
                "arn:aws:iot:us-west-2:64867635xxxx:topic/$aws/things/${iot:Connection.Thing.ThingName}/shadow/delete/rejected"
            ]
        }
    ]
}
and
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "iot:Subscribe",
                "iot:Connect"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "iot:Publish",
                "iot:Receive"
            ],
            "Resource": [
                "arn:aws:iot:us-west-2:64867635xxxx:topic/$aws/things/${iot:Connection.Thing.ThingName}/shadow/update/*",
                "arn:aws:iot:us-west-2:64867635xxxx:topic/$aws/things/${iot:Connection.Thing.ThingName}/shadow/get/*",
                "arn:aws:iot:us-west-2:64867635xxxx:topic/$aws/things/${iot:Connection.Thing.ThingName}/shadow/delete/*"
            ]
        }
    ]
}
Note: for security purposes, I have put xxxx in place of the account number. The real policies contain the proper values.
I even replaced the * in the topics with #, but the same result continues. The result is:
Oct 25, 2017 9:32:43 AM com.amazonaws.services.iot.client.mqtt.AwsIotMqttConnectionListener onFailure
WARNING: Connect request failure
Unable to connect to server (32103) - java.net.ConnectException: Connection timed out: connect
at org.eclipse.paho.client.mqttv3.internal.TCPNetworkModule.start(TCPNetworkModule.java:94)
at org.eclipse.paho.client.mqttv3.internal.SSLNetworkModule.start(SSLNetworkModule.java:103)
at org.eclipse.paho.client.mqttv3.internal.ClientComms$ConnectBG.run(ClientComms.java:701)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection timed out: connect
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.eclipse.paho.client.mqttv3.internal.TCPNetworkModule.start(TCPNetworkModule.java:80)
... 9 more
Oct 25, 2017 9:32:43 AM com.amazonaws.services.iot.client.core.AwsIotConnection onConnectionFailure
INFO: Connection temporarily lost
In CloudWatch, under the AWS IoT-specific logs, I find:
2017-10-25 04:10:22.633 TRACEID:1db8b391-83c7-faf3-5aeb-c0d106787afe PRINCIPALID:e7ac813cdcbecffebecef9647a166882f93f5a2aa214cb6bbd9a1e41f7832f76 [ERROR] EVENT:MQTT Client Connect MESSAGE:Connect Status: AUTHORIZATION_ERROR
What should a generic but strict policy look like for supporting the publish and receive actions?
Note that the aforementioned policies work fine when I try the pub-sub example (also provided in the same SDK). They do not work for the shadow example because there is an additional step of attaching the device object to the connection client.
Another small query: how can I restrict topic subscriptions for the subscribe action in a generic way, given that policy variables like thing type and thing name do not work for subscribe (but do work for receive and publish)?
Is your MQTT Client ID the same as your Thing Name?
I recently had a similar issue while trying to create a policy that uses the Thing attributes. It wasn't picking up the fact that I had attached my Cognito Identity to the Thing as I would have expected it to do. My policy did not work until I changed my MQTT Client ID and then the thing attributes were pulled in correctly. I was hoping to have a Thing per customer, but as MQTT Client ID must be unique and it's also what attaches the connection to a Thing, I'll need to create a Thing per customer per device.
I believe the reasoning behind attaching an identity principal to a Thing and using the MQTT Client ID to attach the connection to the Thing is that you can actually attach the same identity principal to multiple Things, and on the backend it wouldn't know which exact Thing your connection should map to. This allows you to reuse the same identity across multiple devices for the same customer, which is more in keeping with how Cognito works. It also prevents a certificate or Cognito identity from attaching to a Thing that it doesn't have permissions for.
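To make that concrete, this is roughly what the connection setup looks like when the client ID equals the thing name, so that ${iot:Connection.Thing.ThingName} can resolve. A sketch in Python rather than the Java SDK from the question, with placeholder thing name, endpoint, and file paths:

# Connect with client ID == thing name so thing policy variables resolve (sketch)
from AWSIoTPythonSDK.MQTTLib import AWSIoTMQTTClient

THING_NAME = "myThing"  # placeholder; must match the registered thing name

client = AWSIoTMQTTClient(THING_NAME)  # the MQTT client ID is the thing name
client.configureEndpoint("xxxxxxxx-ats.iot.us-west-2.amazonaws.com", 8883)  # placeholder endpoint
client.configureCredentials("root-ca.pem", "private.key", "certificate.pem")  # placeholder paths
client.connect()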
Let me know if this helps.

AWS IoT MQTT over Websocket with STS temporary credentials

I am having issues initiating a connection to AWS IoT using STS temporary credentials while keeping things secure.
I have already successfully connected embedded devices using certificates with policies.
But when I come to try connecting via the browser, using a pre-signed URL, I have hit a stumbling block.
Below is a code snippet from a Lambda function which first authenticates the request (not shown), and then builds the url using STS credentials via assumeRole.
Using my generated URL with the Paho JavaScript client, I have been successful up to the point of receiving a "101 Switching Protocols" response in the browser, but the connection is terminated instead of switching to WebSockets.
Any help or guidance anyone out there can provide me with would be much appreciated.
const iot = new AWS.Iot();
const sts = new AWS.STS({region: 'eu-west-1'});

const params = {
    DurationSeconds: 3600,
    ExternalId: displayId,
    Policy: JSON.stringify(
        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Action": [
                        "iot:*"
                    ],
                    "Resource": [
                        "*"
                    ]
                },
                /*{
                    "Effect": "Allow",
                    "Action": [
                        "iot:Connect"
                    ],
                    "Resource": [
                        "arn:aws:iot:eu-west-1:ACCID:client/" + display._id
                    ]
                },
                {
                    "Effect": "Allow",
                    "Action": [
                        "iot:Receive"
                    ],
                    "Resource": [
                        "*"
                    ]
                }*/
            ]
        }
    ),
    RoleArn: "arn:aws:iam::ACCID:role/iot_websocket_url_role",
    RoleSessionName: displayId + '-' + Date.now()
};

sts.assumeRole(params, function(err, stsData) {
    if (err) {
        fail(err, db);
        return;
    }

    console.log(stsData);

    const AWS_IOT_ENDPOINT_HOST = 'REDACTED.iot.eu-west-1.amazonaws.com';

    var url = v4.createPresignedURL(
        'GET',
        AWS_IOT_ENDPOINT_HOST,
        '/mqtt',
        'iotdata',
        crypto.createHash('sha256').update('', 'utf8').digest('hex'),
        {
            key: stsData.Credentials.AccessKeyId,
            secret: stsData.Credentials.SecretAccessKey,
            protocol: 'wss',
            expires: 3600,
            region: 'eu-west-1'
        }
    );

    url += '&X-Amz-Security-Token=' + encodeURIComponent(stsData.Credentials.SessionToken);

    console.log(url);
    context.succeed({url: url});
});
Edit: If it helps, I just checked inside the "Frames" window in Chrome debugger, after selecting the request which returns a 101 code. It shows a single frame: "Binary Frame (Opcode 2, mask)".
Does this Opcode refer to MQTT control code 2 AKA "CONNACK"? I am not an expert at MQTT (yet!).
I realised my mistake by reading the docs on STS.
If you pass a policy to this operation, the temporary security credentials that are returned by the operation have the permissions that are allowed by both the access policy of the role that is being assumed, and the policy that you pass.
The role referenced by RoleArn must also allow the actions that you are requesting via STS assumeRole.
I.e., the role could allow iot:*, and then when you assume the role you can narrow the permissions down to, for instance, iot:Connect on specific resources.
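To illustrate, a minimal Python/boto3 sketch of the same idea (the role ARN and actions are illustrative): the effective permissions end up being the intersection of the role's own policy and the inline session policy passed to assumeRole:

# Effective permissions = role policy AND the scoped-down session policy (sketch)
import json
import boto3

sts = boto3.client("sts", region_name="eu-west-1")
resp = sts.assume_role(
    RoleArn="arn:aws:iam::ACCID:role/iot_websocket_url_role",  # this role must itself allow the iot actions
    RoleSessionName="display-session",
    DurationSeconds=3600,
    Policy=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["iot:Connect", "iot:Subscribe", "iot:Receive"],
            "Resource": "*"
        }]
    })
)
creds = resp["Credentials"]  # AccessKeyId / SecretAccessKey / SessionToken used to sign the URL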