I'm running Spark on YARN, but my application keeps getting an OutOfMemory exception while trying to load a large RDD, even though I have dynamic allocation set to true:
```
...
.set("spark.shuffle.service.enabled", "true")
.set("spark.dynamicAllocation.enabled", "true")
.set("spark.default.parallelism", String.valueOf(cpu * 3))
```
To fix this, I had to specify the executor memory explicitly:
```
...
.set("spark.driver.memory", "5g")
.set("spark.executor.memory", "30g")
.set("spark.shuffle.service.enabled", "true")
.set("spark.dynamicAllocation.enabled", "true")
.set("spark.default.parallelism", String.valueOf(cpu * 3))
```
But isn't the whole point of dynamic allocation to acquire the required resources from YARN?
From the documentation on YARN dynamic resource allocation:
In Spark, dynamic resource allocation is performed on the granularity of the executor
This means it will spin up more executors, but it does not mean it will change how much memory each individual executor has. If you are loading more data into memory than a single executor can handle, spinning up more of them won't do any good.
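As an illustration (not your exact code), here is a minimal sketch of combining dynamic allocation with an explicit per-executor memory budget; the 30g value and the executor bounds are assumptions you would tune for your workload and cluster:
```
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class DynamicAllocationExample {
    public static void main(String[] args) {
        // Dynamic allocation controls HOW MANY executors YARN grants;
        // spark.executor.memory controls how LARGE each executor is.
        SparkConf conf = new SparkConf()
                .setAppName("dynamic-allocation-example")
                .set("spark.executor.memory", "30g")                 // per-executor size (assumed value)
                .set("spark.driver.memory", "5g")
                .set("spark.shuffle.service.enabled", "true")        // external shuffle service, needed for dynamic allocation on YARN
                .set("spark.dynamicAllocation.enabled", "true")
                .set("spark.dynamicAllocation.minExecutors", "1")    // assumed bounds on the executor count
                .set("spark.dynamicAllocation.maxExecutors", "20");

        JavaSparkContext sc = new JavaSparkContext(conf);
        sc.stop();
    }
}
```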
Elasticsearch is throwing the exception below:
\"root_cause\":[{\"type\":\"circuit_breaking_exception\",\"reason\":\"[parent] Data too large, data for [<http_request>] would be [16351863432/15.2gb], which is larger than the limit of [16254631936/15.1gb], real usage: [16351862128/15.2gb], new bytes reserved: [1304/1.2kb]\",\"bytes_wanted\":16351863432,\"bytes_limit\":16254631936}],\"type\":\"circuit_breaking_exception\",\"reason\":\"[parent] Data too large, data for [<http_request>] would be [16351863432/15.2gb], which is larger than the limit of [16254631936/15.1gb], real usage: [16351862128/15.2gb], new bytes reserved: [1304/1.2kb]\",\"bytes_wanted\":16351863432,\"bytes_limit\":16254631936},\"status\":429}"}
I increased indices.breaker.request.limit to 50%, but I am still getting the same error:
```
PUT /_cluster/settings
{
  "persistent" : {
    "indices.breaker.request.limit" : "50"
  }
}
```
Instead of increasing the limit for your parent circuit breaker, whose default is 70%, you have decreased it to 50%. Please increase it to a higher value and check again.
Refer to the parent circuit breaker docs for more info. From the same doc:
indices.breaker.total.limit (Dynamic) Starting limit for overall parent breaker. Defaults to 70% of JVM heap if indices.breaker.total.use_real_memory is false. If indices.breaker.total.use_real_memory is true, defaults to 95% of the JVM heap.
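For illustration, here is a minimal sketch of applying that change as a persistent dynamic cluster setting over the REST API from Java; the 85% value and the localhost:9200 endpoint are assumptions, so adjust them for your cluster:
```
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RaiseParentBreakerLimit {
    public static void main(String[] args) throws Exception {
        // Raise the parent circuit breaker limit as a persistent dynamic cluster setting.
        // The 85% value and the localhost:9200 endpoint are assumptions; tune them for your cluster.
        String body = "{ \"persistent\": { \"indices.breaker.total.limit\": \"85%\" } }";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/_cluster/settings"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```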
I recently started to work with AWS Data Migration Service (DMS) and am running into some issues.
I'm currently attempting to migrate a 10 GB Oracle DB to AWS RDS Postgres. It works, but the memory requirements seem crazy; it feels like it loads the entire DB into memory. I started with dms.r4.large (15.5 GB), but it could not allocate memory after approx. 98%. It runs smoothly with dms.r4.xlarge (30.5 GB).
As you can see in the screenshot (freeable memory, minimum), the instance constantly runs "full" before all memory gets released when the task finishes (or crashes).
Is there any setting to change this, and why does it behave like this? It makes the whole task unnecessarily expensive.
As confirmed by AWS, this was indeed a bug in the latest engine version (3.1.3). The following additional insights were provided by AWS to estimate the actual memory requirements:
Full LOB mode (using single row insert + update, commit rate):
Memory: (# of LOB columns in a table) x (number of tables loaded in parallel, default is 8) x (LOB chunk size) x (commit rate during full load) = 2 * 8 * 64(k) * 10000k
Note: You may consider reducing the "commit rate during full load" value, because memory is allocated roughly according to the above method.
Limited LOB mode (using array):
Memory: (# of LOB columns in a table) x (number of tables loaded in parallel, default is 8) x maxLobSize x bulkArraySize = 2 * 8 * 4096(k) * 1000
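To make that arithmetic concrete, here is a small sketch that plugs the example numbers from the formulas above into code; the column count, chunk size, commit rate, and array size are the illustrative values quoted by AWS, not universal defaults:
```
public class DmsMemoryEstimate {
    public static void main(String[] args) {
        // Example values from the formulas above (assumptions for a specific table layout).
        long lobColumns = 2;          // # of LOB columns in a table
        long tablesInParallel = 8;    // number of tables loaded in parallel (default 8)
        long lobChunkSizeKb = 64;     // Full LOB mode chunk size, in KB
        long commitRate = 10_000;     // commit rate during full load

        long maxLobSizeKb = 4_096;    // Limited LOB mode max LOB size, in KB
        long bulkArraySize = 1_000;   // rows per array insert

        long fullLobKb = lobColumns * tablesInParallel * lobChunkSizeKb * commitRate;
        long limitedLobKb = lobColumns * tablesInParallel * maxLobSizeKb * bulkArraySize;

        System.out.printf("Full LOB mode estimate:    ~%.1f GB%n", fullLobKb / (1024.0 * 1024.0));
        System.out.printf("Limited LOB mode estimate: ~%.1f GB%n", limitedLobKb / (1024.0 * 1024.0));
    }
}
```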
I am using AWS Redis for a project and ran into an Out of Memory (OOM) issue. In investigating the issue, I discovered a couple parameters that affect the amount of usable memory, but the math doesn't seem to work out for my case. Am I missing any variables?
I'm using:
3 shards, 3 nodes per shard
cache.t2.micro instance type
default.redis4.0.cluster.on cache parameter group
The ElastiCache website says cache.t2.micro has 0.555 GiB = 0.555 * 2^30 B = 595,926,712 B memory.
default.redis4.0.cluster.on parameter group has maxmemory = 581,959,680 (just under the instance memory) and reserved-memory-percent = 25%. 581,959,680 B * 0.75 = 436,469,760 B available.
Now, looking at the BytesUsedForCache metric in CloudWatch when I ran out of memory, I see nodes around 457M, 437M, 397M, 393M bytes. It shouldn't be possible for a node to be above the 436M bytes calculated above!
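For reference, here is the same arithmetic as a small sketch (the instance and parameter-group values are the ones quoted above):
```
public class ElastiCacheMemoryCheck {
    public static void main(String[] args) {
        // Values quoted above for cache.t2.micro with default.redis4.0.cluster.on.
        double instanceGiB = 0.555;
        long instanceBytes = (long) (instanceGiB * (1L << 30));   // ~595,926,712 B

        long maxmemory = 581_959_680L;          // parameter group maxmemory
        double reservedMemoryPercent = 0.25;    // reserved-memory-percent = 25%

        long expectedUsable = (long) (maxmemory * (1 - reservedMemoryPercent));
        System.out.println("Instance memory:        " + instanceBytes + " B");
        System.out.println("Expected usable memory: " + expectedUsable + " B");  // ~436,469,760 B
    }
}
```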
What am I missing? Is there something else that determines how much memory is usable?
I remember reading this somewhere but I cannot find it right now: I believe BytesUsedForCache is the sum of RAM and swap used by Redis to store data/buffers.
ElastiCache's docs suggest that swap should not go higher than 300 MB.
I would suggest checking the swap metric at that time.
I am running an Aerospike cluster in Google Cloud. Following the recommendation in this post, I updated to the latest version (3.11.1.1) and re-created all servers. In fact, this change caused my 5 servers to operate at a much lower CPU load (it was around 75% before, now it is at about 20%, as shown in the graph below):
Because of this low load, I decided to reduce the cluster size to 4 servers. When I did this, my application started to receive the following error:
All batch queues are full
I found this discussion about the topic, which recommends changing the parameters batch-index-threads and batch-max-unused-buffers with the command:
asadm -e "asinfo -v 'set-config:context=service;batch-index-threads=NEW_VALUE'"
I tried many combinations of values for batch-index-threads (2, 4, 8, 16) and none of them solved the problem, and I also changed the batch-max-unused-buffers parameter. Nothing solves my problem; I keep receiving the All batch queues are full error.
Here is the relevant part of my aerospike.conf:
```
service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    paxos-recovery-policy auto-reset-master
    pidfile /var/run/aerospike/asd.pid
    service-threads 32
    transaction-queues 32
    transaction-threads-per-queue 4
    batch-index-threads 40
    proto-fd-max 15000
    batch-max-requests 30000
    replication-fire-and-forget true
}
```
I use 300GB SSD disks on these servers.
A quick note which may or may not pertain to you:
A common mistake we have seen in the past is that developers decide to use 'batch get' as a general purpose 'get' for single and multiple record requests. The single record get will perform better for single record requests.
It's possible that you are being constrained by the network between the clients and servers. Reducing from 5 to 4 nodes reduced the aggregate pipe. In addition, removing a node starts cluster migrations, which add additional network load.
I would look at the batch-max-buffer-per-queue config parameter.
Maximum number of 128KB response buffers allowed in each batch index queue. If all batch index queues are full, new batch requests are rejected.
In conjunction with raising this value from the default of 255, you will also want to raise batch-max-unused-buffers to at least batch-index-threads x batch-max-buffer-per-queue + 1. If you do not, new buffers will be created and destroyed constantly, because the number of free (unused) buffers is smaller than the number you are using: the moment a batch response is served, the system strives to trim the buffers down to the max-unused number. You will see this reflected in the batch_index_created_buffers metric constantly rising.
Be aware that you need to have enough DRAM for this. For example, if you raise batch-max-buffer-per-queue to 320, you will consume
40 (`batch-index-threads`) x 320 (`batch-max-buffer-per-queue`) x 128K = 1600MB
For the sake of performance, batch-max-unused-buffers should then be set to 13000, which gives a maximum memory consumption of 1625MB (1.59GB) per node.
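Here is a quick sketch of that sizing arithmetic, using the 40 batch-index-threads from the config above and the 320 / 13000 values proposed in this answer (illustrative numbers, not defaults):
```
public class BatchBufferSizing {
    public static void main(String[] args) {
        // Values from the answer above: 40 batch-index-threads (from aerospike.conf),
        // a proposed batch-max-buffer-per-queue of 320, and 128 KB per response buffer.
        int batchIndexThreads = 40;
        int maxBuffersPerQueue = 320;
        int bufferKb = 128;

        long inUseKb = (long) batchIndexThreads * maxBuffersPerQueue * bufferKb;
        System.out.println("Buffers in use:          " + inUseKb / 1024 + " MB");     // 1600 MB

        // batch-max-unused-buffers should be at least threads * buffers-per-queue + 1.
        int minUnusedBuffers = batchIndexThreads * maxBuffersPerQueue + 1;            // 12801
        int proposedUnusedBuffers = 13_000;
        long unusedCapKb = (long) proposedUnusedBuffers * bufferKb;
        System.out.println("Minimum unused target:   " + minUnusedBuffers + " buffers");
        System.out.println("Proposed cap of 13000:   " + unusedCapKb / 1024 + " MB"); // 1625 MB
    }
}
```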
I am trying to determine how many nodes I need for my EMR cluster. As part of best practices, the recommendation is:
(Total Mappers needed for your job + Time taken to process) / (per instance capacity + desired time) as outlined here: http://www.slideshare.net/AmazonWebServices/amazon-elastic-mapreduce-deep-dive-and-best-practices-bdt404-aws-reinvent-2013, page 89.
The question is how to determine how many parallel mappers each instance type will support, since AWS doesn't publish this: https://aws.amazon.com/emr/pricing/
Sorry if I missed something obvious.
Wayne
To determine the number of parallel mappers, you will need to check the EMR documentation called Task Configuration, where EMR has a predefined set of configurations for every instance type that determines the number of mappers/reducers.
http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hadoop-task-config.html
For example: let's say you have 5 m1.xlarge core nodes. According to the default mapred-site.xml configuration values for that instance type from the EMR docs, we have:
mapreduce.map.memory.mb = 768
yarn.nodemanager.resource.memory-mb = 12288
yarn.scheduler.maximum-allocation-mb = 12288 (same as above)
You can simply divide the latter by the former to get the maximum number of mappers supported by one m1.xlarge node: 12288 / 768 = 16.
So, for the 5-node cluster, a maximum of 16 * 5 = 80 mappers can run in parallel (for a map-only job). The same reasoning gives the maximum number of parallel reducers (30). You can do similar math for a combination of mappers and reducers.
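As a tiny sketch of that calculation (using the m1.xlarge defaults quoted above):
```
public class EmrMapperCapacity {
    public static void main(String[] args) {
        // Default EMR values for m1.xlarge, quoted above.
        int nodeManagerMemoryMb = 12288;   // yarn.nodemanager.resource.memory-mb
        int mapMemoryMb = 768;             // mapreduce.map.memory.mb
        int coreNodes = 5;

        int mappersPerNode = nodeManagerMemoryMb / mapMemoryMb;   // 16
        int clusterMappers = mappersPerNode * coreNodes;          // 80
        System.out.println("Mappers per node:                " + mappersPerNode);
        System.out.println("Parallel mappers (map-only job): " + clusterMappers);
    }
}
```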
So, if you want to run more mappers in parallel, you can either resize the cluster or reduce mapreduce.map.memory.mb (and its heap setting mapreduce.map.java.opts) on every node and restart the NodeManager for the change to take effect.
To understand what the above mapred-site.xml properties mean and why you need to do these calculations, you can refer to:
https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
Note: the above calculations and statements hold if EMR stays in its default configuration, using the YARN capacity scheduler with the DefaultResourceCalculator. If, for example, you configure your capacity scheduler to use the DominantResourceCalculator, it will consider vCPUs + memory on every node (not just memory) to decide the number of parallel mappers.