Dynamic-DynamoDB not scaling down - amazon-web-services

I've been testing Dynamic-DynamoDB on a single table, and it does not want to scale down the provisioning. Can anybody see what I've done wrong?
Here is the log from one cycle. I let it run overnight... This message sequence goes on and on.
2014-03-31 12:58:51,617 - dynamic-dynamodb - DEBUG - myTestTable - Currently provisioned read units: 25
2014-03-31 12:58:51,683 - dynamic-dynamodb - DEBUG - myTestTable - Currently provisioned read units: 25
2014-03-31 12:58:51,683 - dynamic-dynamodb - INFO - myTestTable - Consumed read units: 0%
2014-03-31 12:58:51,702 - dynamic-dynamodb - INFO - myTestTable - Read throttle count: 0
2014-03-31 12:58:51,719 - dynamic-dynamodb - DEBUG - myTestTable - Currently provisioned write units: 100
2014-03-31 12:58:51,779 - dynamic-dynamodb - DEBUG - myTestTable - Currently provisioned write units: 100
2014-03-31 12:58:51,779 - dynamic-dynamodb - INFO - myTestTable - Consumed write units: 0%
2014-03-31 12:58:51,806 - dynamic-dynamodb - INFO - myTestTable - Write throttle count: 0
2014-03-31 12:58:51,806 - dynamic-dynamodb - INFO - myTestTable - No need to change provisioning
And, here is the configuration for the table:
[table: myTestTable]
reads-upper-threshold: 90
reads-lower-threshold: 30
increase-reads-with: 50
decrease-reads-with: 50
increase-reads-unit: percent
decrease-reads-unit: percent
min-provisioned-reads: 5
max-provisioned-reads: 25
writes-upper-threshold: 90
writes-lower-threshold: 30
increase-writes-with: 50
decrease-writes-with: 50
increase-writes-unit: percent
decrease-writes-unit: percent
min-provisioned-writes: 5
max-provisioned-writes: 100
#maintenance-windows: 22:00-23:59,00:00-06:00
sns-message-types: scale-up, scale-down
allow-scaling-down-reads-on-0-percent: true
allow-scaling-down-writes-on-0-percent: true
#always-decrease-rw-together: true
The only thing I have not tried yet is setting the maintenance window times. I assume that when they are not set, it will do updates at any time.
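For reference, the percent-based decrease this config asks for works out as follows (a sketch with assumed integer rounding; Dynamic-DynamoDB's exact rounding may differ):

```python
def decreased_units(current_units, decrease_percent, min_units):
    # Scale down by the configured percentage, never below the minimum.
    # Integer truncation is an assumption; the tool's rounding may differ.
    proposed = int(current_units * (100 - decrease_percent) / 100)
    return max(proposed, min_units)

# With the config above: 25 read units -> 12, 100 write units -> 50
print(decreased_units(25, 50, 5), decreased_units(100, 50, 5))  # -> 12 50
```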
Update: I found an attempted scale-up event that happened overnight during a test. Obviously I'm already at the max, so it didn't do anything, as I would expect. I just don't understand why the scale-down is not working.
2014-03-30 23:27:16,789 - dynamic-dynamodb - INFO - myTestTable - Consumed read units: 0%
2014-03-30 23:27:16,808 - dynamic-dynamodb - INFO - myTestTable - Read throttle count: 0
2014-03-30 23:27:16,827 - dynamic-dynamodb - DEBUG - myTestTable - Currently provisioned write units: 100
2014-03-30 23:27:16,880 - dynamic-dynamodb - DEBUG - myTestTable - Currently provisioned write units: 100
2014-03-30 23:27:16,880 - dynamic-dynamodb - INFO - myTestTable - Consumed write units: 117%
2014-03-30 23:27:16,901 - dynamic-dynamodb - INFO - myTestTable - Write throttle count: 0
2014-03-30 23:27:16,902 - dynamic-dynamodb - INFO - myTestTable - Reached provisioned writes max limit: 100
2014-03-30 23:27:16,902 - dynamic-dynamodb - INFO - myTestTable - No need to change provisioning
2014-03-30 23:27:17,104 - dynamic-dynamodb - DEBUG - Sleeping 300 seconds until next check
2014-03-30 23:32:17,277 - dynamic-dynamodb - DEBUG - myTestTable - Currently provisioned read units: 25
2014-03-30 23:32:17,353 - dynamic-dynamodb - DEBUG - myTestTable - Currently provisioned read units: 25
2014-03-30 23:32:17,354 - dynamic-dynamodb - INFO - myTestTable - Consumed read units: 0%
2014-03-30 23:32:17,375 - dynamic-dynamodb - INFO - myTestTable - Read throttle count: 0
2014-03-30 23:32:17,433 - dynamic-dynamodb - DEBUG - myTestTable - Currently provisioned write units: 100
2014-03-30 23:32:17,481 - dynamic-dynamodb - DEBUG - myTestTable - Currently provisioned write units: 100
2014-03-30 23:32:17,481 - dynamic-dynamodb - INFO - myTestTable - Consumed write units: 151%
2014-03-30 23:32:17,501 - dynamic-dynamodb - INFO - myTestTable - Write throttle count: 0
2014-03-30 23:32:17,501 - dynamic-dynamodb - INFO - myTestTable - Reached provisioned writes max limit: 100
2014-03-30 23:32:17,502 - dynamic-dynamodb - INFO - myTestTable - No need to change provisioning
2014-03-30 23:32:17,695 - dynamic-dynamodb - DEBUG - Sleeping 300 seconds until next check

This seems to be a bug in that version of Dynamic-DynamoDB:
https://github.com/sebdah/dynamic-dynamodb/issues/142
I will stop posting these here and use GitHub if I have any more issues. Sebdah answers very quickly.

In this case it may have been a bug, but if this still happens to anyone: there is a limit on how many times you can scale down a DynamoDB table per 24-hour period. Once you reach that limit, DynamoDB will not let you scale down any further and returns an error if you request another decrease.
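A hedged sketch of guarding against that limit, using the NumberOfDecreasesToday counter that DynamoDB's DescribeTable call returns (the daily limit of 4 reflects the limit at the time; check the current service quotas before relying on it):

```python
def can_scale_down(number_of_decreases_today, daily_limit=4):
    # number_of_decreases_today comes from a DescribeTable response:
    #   response["Table"]["ProvisionedThroughput"]["NumberOfDecreasesToday"]
    # daily_limit=4 was the DynamoDB limit at the time this was written;
    # verify against the current service quotas.
    return number_of_decreases_today < daily_limit

print(can_scale_down(3), can_scale_down(4))  # -> True False
```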

Related

Flask application: Using gunicorn on ECS Fargate, How to

Info:
I created a Flask app; the last command in my Dockerfile is CMD gunicorn -b 0.0.0.0:5000 --access-logfile - "app:create_app()"
I build, tag, and upload the image to ECR.
I used this Docker image to create an ECS Fargate service with the following configs (posting only the parts relevant to the question):
ECSTaskDefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    Cpu: "256"
    Memory: "1024"
    RequiresCompatibilities:
      - FARGATE
    ContainerDefinitions:
      - Name: contained_above
        .
        .
        .
ECSService:
  Type: AWS::ECS::Service
  DependsOn: ListenerRule
  Properties:
    Cluster: !Sub "${EnvName}-ECScluster"
    DesiredCount: 1
    LaunchType: FARGATE
    DeploymentConfiguration:
      MaximumPercent: 200
      MinimumHealthyPercent: 50
    NetworkConfiguration:
      AwsvpcConfiguration:
        AssignPublicIp: ENABLED
        Subnets:
          - Fn::ImportValue: !Sub "${EnvName}-PUBLIC-SUBNET-1"
          - Fn::ImportValue: !Sub "${EnvName}-PUBLIC-SUBNET-2"
        SecurityGroups:
          - Fn::ImportValue: !Sub "${EnvName}-CONTAINER-SECURITY-GROUP"
    ServiceName: !Sub "${EnvName}-ECS-SERVICE"
    TaskDefinition: !Ref ECSTaskDefinition
    LoadBalancers:
      - ContainerName: contained_above
        ContainerPort: 5000
        TargetGroupArn: !Ref TargetGroup
(App is working normally)
Question
Now my question is: what should the number of workers be in the gunicorn command (the last command in my Dockerfile)?
The gunicorn design docs state: Generally we recommend (2 x $num_cores) + 1 as the number of workers to start off with.
So what is the number of cores on Fargate? Does it actually make sense to combine gunicorn with Fargate as in the process above? Is there 'compatibility' between load balancers and gunicorn workers? What is the connection between the DesiredCount of the ECS Service and the gunicorn -w workers value? Am I missing or misunderstanding something?
Possible solution(?)
One way that I could call it is the following:
CMD gunicorn -b 0.0.0.0:5000 -w $(( 2 * `cat /proc/cpuinfo | grep 'core id' | wc -l` + 1 )) --access-logfile - "app:create_app()"
But I am not sure if that would be a good solution.
Any insights? Thanks
EDIT: I'm using a configuration file for gunicorn to use when starting:
gunicorn.conf.py
import multiprocessing
bind = "0.0.0.0:8080"
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "uvicorn.workers.UvicornH11Worker"
keepalive = 0
You can tell gunicorn which config file to use with the --config flag.
Sadly I can't find the source anymore, but I've read that 4-12 workers should be enough to handle hundreds if not thousands of simultaneous requests, depending on your application structure, worker class, and payload size.
Do take this with a grain of salt though, since I can't find the source anymore, but it was in an accepted SO answer from a well-reputed person, if I remember correctly.
The official gunicorn docs state something in the 2-4 x $(NUM_CORES) range.
Another option would be as gunicorn docs state at another point:
Generally we recommend (2 x $num_cores) + 1 as the number of workers
to start off with. While not overly scientific, the formula is based
on the assumption that for a given core, one worker will be reading or
writing from the socket while the other worker is processing a
request.
Obviously, your particular hardware and application are going to
affect the optimal number of workers. Our recommendation is to start
with the above guess and tune using TTIN and TTOU signals while the
application is under load.
So far I've been running well holding true to the 4-12 worker recommendation. My company runs several APIs which connect to other external APIs, resulting in mostly 1-2 second request times, with the longest taking up to a whole minute (a lot of external API calls here).
Another colleague I talked to mentioned they are using 1 worker per 5 simultaneous requests they expect, with similar APIs to ours. That works fine for them as well.
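One caveat with cpu_count()-based formulas on Fargate: inside a task, multiprocessing.cpu_count() may report the underlying host's cores rather than the task's CPU allocation. A sketch that derives the worker count from the task definition's CPU units instead (the floor of 2 and the unit conversion are assumptions, not gunicorn or AWS recommendations):

```python
def fargate_workers(cpu_units, floor=2):
    # Fargate task-definition CPU is in units: 1024 units == 1 vCPU.
    # Apply the (2 x cores) + 1 rule of thumb to the task's allocation,
    # with an assumed floor of 2 so one worker can accept connections
    # while another is busy.
    vcpus = cpu_units / 1024
    return max(floor, int(2 * vcpus) + 1)

# Cpu: "256" in the task definition above -> 0.25 vCPU
print(fargate_workers(256))  # -> 2
```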

Provisioning Issues in DataFusion

When DataFusion runs a data pipeline, the pipeline gets stuck in the provisioning state and then stops.
As a result, the Dataproc cluster cannot be created.
Dataproc's settings are as follows:
- Master
- Number of masters : 1
- Master Cores : 2
- Master Memory(GB) : 4
- Master Disk Size(GB) : 1000
- Worker
- Number of Workers : 2
- Worker Cores : 4
- Worker Memory(GB) : 16
- Worker Disk Size(GB) : 1500
In the data pipeline, the driver and executor are as follows:
- Executor
- CPU : 2
- Memory : 4
- Driver
- CPU : 2
- Memory : 4
If I actually look at Dataproc in the Google Cloud Console, the cluster starts provisioning and then disappears. Please share your opinion on how to solve this problem.
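Not a diagnosis, but one quick sanity check with the numbers above is whether the requested driver and executor containers even fit the worker capacity (this sketch ignores YARN/Dataproc memory overhead, so it is optimistic):

```python
def fits(worker_count, worker_cores, worker_mem_gb, containers):
    # containers: list of (cores, memory_gb) requests, e.g. driver + executors.
    # Ignores YARN overhead and per-node packing, so a True here is optimistic.
    need_cores = sum(c for c, _ in containers)
    need_mem = sum(m for _, m in containers)
    return (need_cores <= worker_count * worker_cores
            and need_mem <= worker_count * worker_mem_gb)

# Settings above: 2 workers x (4 cores, 16 GB); driver (2, 4) + one executor (2, 4)
print(fits(2, 4, 16, [(2, 4), (2, 4)]))  # -> True
```

Since the requests fit comfortably here, sizing alone does not obviously explain the failure.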

Why doesn't Dask dashboard update when I run some code?

I'm trying to recreate the behaviour of the Dask dashboard as illustrated in this YouTube video: https://www.youtube.com/watch?time_continue=1086&v=N_GqzcuGLCY. I can see my dashboard, but it doesn't update when I run a computation.
I'm running everything on my local machine (Kubuntu 18.04).
I used anaconda to set up my environment, including
python 2.7.14
dask 0.17.4
dask-core 0.17.4
bokeh 1.0.4
tornado 4.5.1
I set up my scheduler from the command line
dask-scheduler
distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO - Scheduler at: tcp://192.168.1.204:8786
distributed.scheduler - INFO - bokeh at: :8787
distributed.scheduler - INFO - Local Directory: /tmp/scheduler-bYQe2p
distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Register tcp://127.0.0.1:35007
distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:35007
...and a worker too.
dask-worker localhost:8786
distributed.nanny - INFO - Start Nanny at: 'tcp://127.0.0.1:36345'
distributed.worker - INFO - Start worker at: tcp://127.0.0.1:44033
distributed.worker - INFO - Listening to: tcp://127.0.0.1:44033
distributed.worker - INFO - bokeh at: 127.0.0.1:8789
distributed.worker - INFO - nanny at: 127.0.0.1:36345
distributed.worker - INFO - Waiting to connect to: tcp://localhost:8786
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - Threads: 4
distributed.worker - INFO - Memory: 16.70 GB
distributed.worker - INFO - Local Directory: /home/fergal/orbital/repos/projects/safegraph/dask/dask-worker-space/worker-QjJ1ke
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - Registered to: tcp://localhost:8786
distributed.worker - INFO - -------------------------------------------------
Then my code, borrowed from the video, is
from dask.distributed import Client
import dask.array as da
client = Client(processes=False)
print(client)
x = da.random.random((10000, 10000, 10), chunks=(1000,1000,5))
y = da.random.random((10000, 10000, 10), chunks=(1000,1000,5))
z = (da.arcsin(x) + da.arcsin(y)).sum(axis=(1,2))
z.visualize('eg.svg')
z.compute()
The code runs and produces a graph via graphviz. The bokeh server is accessible at 127.0.0.1:8787/status and displays a big blue block at the top right, as in the first few seconds of the video. But when I run my code, the webpage doesn't update to show a running computation, nor does it show any results when the computation is finished. I would expect to see something like what appears around 1:20 in the video.
I'm undoubtedly neglecting to set something up properly, but I can't find any clues in either the documentation or on Stack Overflow. So what am I doing wrong?
I found a solution: update dask to 1.1.5, then shut down and restart the dask-scheduler (and dask-worker). I'm guessing my problem was that the version of dask from the default conda channel was out of date; I downloaded the newer version from conda-forge.
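One more thing worth checking (an observation about dask.distributed behaviour, not a confirmed diagnosis here): Client(processes=False) starts its own in-process cluster rather than connecting to the dask-scheduler launched on the command line, so that scheduler's dashboard at :8787 never sees the work. A minimal sketch of choosing the target explicitly:

```python
def client_target(scheduler_addr=None):
    # Return the argument to pass to dask.distributed.Client:
    # an explicit address when a scheduler is already running, else None
    # (None makes Client start its own local cluster, with its own dashboard).
    return scheduler_addr if scheduler_addr else None

# Address taken from the dask-scheduler log output above:
addr = client_target("tcp://192.168.1.204:8786")
# from dask.distributed import Client
# client = Client(addr)  # work now shows up on that scheduler's dashboard
```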

Slow upload speed in aws deploy push command

I am trying to use AWS CodeDeploy via the aws deploy push --debug command. The file to be uploaded is around 250 KB, but the upload doesn't finish. The following log is displayed:
2017-10-27 11:11:40,601 - MainThread - botocore.auth - DEBUG - CanonicalRequest:
PUT
/frontend-deployer/business-services-0.0.1-SNAPSHOT-classes.jar
partNumber=39&uploadId=.olvaJkxreDZf1ObaHCMtHmkQ5DFE.uZ9Om0sxZB08YG3tqRWBxmGLTFWSYQaj9mHl26LPJk..Stv_vPB5NMaV.zAqsYX6fZz_S3.uN5J4FlxHZFXoeTkMiBSYQB2C.g
content-md5:EDXgvJ8Tt5tHYZ6Nkh7epg==
host:s3.us-east-2.amazonaws.com
x-amz-content-sha256:UNSIGNED-PAYLOAD
x-amz-date:20171027T081140Z
content-md5;host;x-amz-content-sha256;x-amz-date
UNSIGNED-PAYLOAD
...
2017-10-27 11:12:12,035 - MainThread - botocore.endpoint - DEBUG - Sending http request: <PreparedRequest [PUT]>
2017-10-27 11:12:12,035 - MainThread - botocore.awsrequest - DEBUG - Waiting for 100 Continue response.
2017-10-27 11:12:12,189 - MainThread - botocore.awsrequest - DEBUG - 100 Continue response seen, now sending request body.
Even though the file is fairly small (250 KB), the upload doesn't finish.
On the other hand, an upload via the aws s3 cp command takes about 1 second.
How can I increase the upload speed of the aws deploy push command?

Elastic Beanstalk CloudWatch Log streaming stops working – How to debug

My Elastic Beanstalk environment stops streaming node.js events to CloudWatch Logs. Streaming works fine for a few minutes on a new instance; after a few minutes, no more logs show up in CloudWatch.
I set up AWS Elastic Beanstalk to stream logs to CloudWatch under Configuration > Software Configuration > CloudWatch Logs > Log Streaming (true). As a test, I deactivated log streaming and reactivated it. Taking a look at CloudWatch:
The last eb-activity log is about 10 minutes old
The error log is not available (on either of the instances)
nginx/access.log is a few seconds old
nodejs.log is about an hour old (shortly after relaunching the instance)
Every health check writes a log entry into nodejs.log every few seconds, though.
I did not find any logs on the EC2 instance regarding log streaming.
Has anyone had similar issues?
How do I make Elastic Beanstalk stream node.js logs to CloudWatch Logs?
--- EDIT
[ec2-user@ip-###-##-##-## log]$ cat /var/log/awslogs.log
2017-03-07 11:01:05,928 - cwlogs.push.stream - INFO - 31861 - Thread-1 - Detected file rotation, notifying reader
2017-03-07 11:01:05,928 - cwlogs.push.stream - INFO - 31861 - Thread-1 - Reader is still alive.
2017-03-07 11:01:05,928 - cwlogs.push.stream - WARNING - 31861 - Thread-1 - No file is found with given path '/var/log/httpd/error.log*'.
2017-03-07 11:01:05,928 - cwlogs.push.stream - WARNING - 31861 - Thread-1 - No file is found with given path '/var/log/httpd/access.log*'.
2017-03-07 11:01:06,052 - cwlogs.push.reader - INFO - 31861 - Thread-8 - No data is left. Reader is leaving.
2017-03-07 11:01:10,929 - cwlogs.push.stream - INFO - 31861 - Thread-1 - Removing dead reader [2177a5cce5ed29525de329bfdc292ff1, /var/log/nginx/access.log]
2017-03-07 11:01:10,929 - cwlogs.push.stream - INFO - 31861 - Thread-1 - Starting reader for [92257964a10edeb586f084f4f2ba35de, /var/log/nginx/access.log]
2017-03-07 11:01:10,930 - cwlogs.push.reader - INFO - 31861 - Thread-11 - Start reading file from 0.
2017-03-07 11:01:10,930 - cwlogs.push.stream - WARNING - 31861 - Thread-1 - No file is found with given path '/var/log/httpd/error.log*'.
2017-03-07 11:01:10,930 - cwlogs.push.stream - WARNING - 31861 - Thread-1 - No file is found with given path '/var/log/httpd/access.log*'.
2017-03-07 11:01:15,931 - cwlogs.push.stream - WARNING - 31861 - Thread-1 - No file is found with given path '/var/log/httpd/error.log*'.
2017-03-07 11:01:15,931 - cwlogs.push.stream - WARNING - 31861 - Thread-1 - No file is found with given path '/var/log/httpd/access.log*'.
2017-03-07 11:01:16,788 - cwlogs.push.publisher - INFO - 31861 - Thread-7 - Log group: /aws/elasticbeanstalk/production/var/log/nginx/access.log, log stream: i-0bd24767864801e2c, queue size: 0, Publish batch: {'skipped_events_count': 0, 'first_event': {'timestamp': 1488884470930, 'start_position': 0L, 'end_position': 114L}, 'fallback_events_count': 0, 'last_event': {'timestamp': 1488884472931, 'start_position': 341L, 'end_position': 454L}, 'source_id': '92257964a10edeb586f084f4f2ba35de', 'num_of_events': 4, 'batch_size_in_bytes': 554}
2017-03-07 11:01:20,932 - cwlogs.push.stream - WARNING - 31861 - Thread-1 - No file is found with given path '/var/log/httpd/error.log*'.
2017-03-07 11:01:20,932 - cwlogs.push.stream - WARNING - 31861 - Thread-1 - No file is found with given path '/var/log/httpd/access.log*'.
2017-03-07 11:01:25,933 - cwlogs.push.stream - WARNING - 31861 - Thread-1 - No file is found with given path '/var/log/httpd/error.log*'.
2017-03-07 11:01:25,933 - cwlogs.push.stream - WARNING - 31861 - Thread-1 - No file is found with given path '/var/log/httpd/access.log*'.
2017-03-07 11:01:27,881 - cwlogs.push.publisher - INFO - 31861 - Thread-7 - Log group: /aws/elasticbeanstalk/production/var/log/nginx/access.log, log stream: i-0bd24767864801e2c, queue size: 0, Publish batch: {'skipped_events_count': 0, 'first_event': {'timestamp': 1488884481933, 'start_position': 454L, 'end_position': 568L}, 'fallback_events_count': 0, 'last_event': {'timestamp': 1488884482934, 'start_position': 568L, 'end_position': 681L}, 'source_id': '92257964a10edeb586f084f4f2ba35de', 'num_of_events': 2, 'batch_size_in_bytes': 277}
When Andrew (@andrew-ferk) and I activated log streaming, it created all the log groups and streams in CloudWatch with the current logs. After we deployed again, we noticed the logs stopped. This is because AWS hashes the first line of the log file: if it has seen that hash before, it treats the file as already processed.
If you are using npm start, the first lines will be your application's name and version.
You can add CMD date && npm start to your Dockerfile to produce a different first line each time, or run npm in silent mode (as long as your first output is unique).
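The first-line hashing described above can be sketched like this (the agent's actual fingerprinting scheme is an assumption; this just illustrates why identical first lines collide):

```python
import hashlib

def first_line_fingerprint(lines, fingerprint_lines=1):
    # Key a log file by a hash of its first line(s): two files that start
    # identically look like the same, already-processed file.
    head = "".join(lines[:fingerprint_lines])
    return hashlib.md5(head.encode()).hexdigest()

# Two deploys whose logs both start with the app name and version collide:
a = first_line_fingerprint(["myapp@1.0.0 start\n", "listening on 5000\n"])
b = first_line_fingerprint(["myapp@1.0.0 start\n", "different output\n"])
print(a == b)  # -> True
```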
Also, according to the AWS docs (AWS-Docs), you should add a policy like the following to your Elastic Beanstalk instance profile before enabling the feature:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:GetLogEvents",
        "logs:PutLogEvents",
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams",
        "logs:PutRetentionPolicy"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
The following FAQs might be helpful:
CloudWatch Logs Agent FAQs
Why can’t I push log data to CloudWatch Logs with the awslogs agent?
Some things to check if you are streaming custom log files:
eb ssh into the instance and look at /var/log/awslogs.log. If that doesn't even mention "Loading additional configs from (your awslogs config file)", make sure you are installing your config file correctly and restarting the awslogs service after installing it (presumably using .ebextensions). See "Custom Log File Streaming" in Using Elastic Beanstalk with Amazon CloudWatch Logs, and the commands section in logs-streamtocloudwatch-linux.config for how to restart the awslogs service.
The CloudWatch Logs Agent is stateful. If the first few lines of your log file are blank or never change, you may need to set file_fingerprint_lines. See CloudWatch Logs Agent Reference.