"msg": "The following modules failed to execute: setup\n setup: Unhandled exception while executing module: The requested object does not exist.
Playbook run took 0 days, 0 hours, 0 minutes, 1 seconds
Related
The error is below one :
22/09/14 10:09:30 INFO ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 in stage 0.0 (TID 7, ip-172-34-112-69.us-west-2.compute.internal, executor 1): com.amazonaws.services.glue.util.FatalException: Unable to parse file: *****_20220901.csv
I am using terraform to create my architecture.
Adding the following rows on the resource ecs_task_definitions
“runtimePlatform”: {
“operatingSystemFamily”: “LINUX”,
“cpuArchitecture”: “ARM64”
},
The task does not run. This is the error
STOPPED (CannotPullContainerError: inspect image has been retried 1 time(s): failed to resolve ref "********.dkr.ecr.eu-central-1.amazonaws.com/ecr-repo-prod:latest": **********.dkr.ecr.eu-central-1.amazonaws.com/ecr-repo-prod:latest: not found)
Do you know where is the problem?
We are runing a spark job which runs close to 30 scripts one by one. it usually takes 14-15h to run, but this time it failed in 13h. Below is the details:
Command:spark-submit --executor-memory=80g --executor-cores=5 --conf spark.sql.shuffle.partitions=800 run.py
Setup: Running spark jobs via jenkins on AWS EMR with 16 spot nodes
Error: Since the YARN log is huge (270Mb+), below are some extracts from it:
[2022-07-25 04:50:08.646]Container exited with a non-zero exit code 1. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : Last 4096 bytes of stderr : ermediates/master/email/_temporary/0/_temporary/attempt_202207250435265404741257029168752_0641_m_000599_168147 s3://memberanalytics-data-out-prod/pipelined_intermediates/master/email/_temporary/0/task_202207250435265404741257029168752_0641_m_000599 using algorithm version 1 22/07/25 04:37:05 INFO FileOutputCommitter: Saved output of task 'attempt_202207250435265404741257029168752_0641_m_000599_168147' to s3://memberanalytics-data-out-prod/pipelined_intermediates/master/email/_temporary/0/task_202207250435265404741257029168752_0641_m_000599 22/07/25 04:37:05 INFO SparkHadoopMapRedUtil: attempt_202207250435265404741257029168752_0641_m_000599_168147: Committed 22/07/25 04:37:05 INFO Executor: Finished task 599.0 in stage 641.0 (TID 168147). 9341 bytes result sent to driver 22/07/25 04:49:36 ERROR YarnCoarseGrainedExecutorBackend: Executor self-exiting due to : Driver ip-10-13-52-109.bjw2k.asg:45383 disassociated! Shutting down. 22/07/25 04:49:36 INFO MemoryStore: MemoryStore cleared 22/07/25 04:49:36 INFO BlockManager: BlockManager stopped 22/07/25 04:50:06 WARN ShutdownHookManager: ShutdownHook '$anon$2' timeout, java.util.concurrent.TimeoutException java.util.concurrent.TimeoutException at java.util.concurrent.FutureTask.get(FutureTask.java:205) at org.apache.hadoop.util.ShutdownHookManager.executeShutdown(ShutdownHookManager.java:124) at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:95) 22/07/25 04:50:06 ERROR Utils: Uncaught exception in thread shutdown-hook-0 java.lang.InterruptedException
I am getting following error when trying run CI tool Jenkins which is hosted on AWS:
Running TestSuite
/var/lib/jenkins/workspace/losDummyData/chromedriver: error while
loading shared libraries: libnss3.so: cannot open shared object file:
No such file or directory
Jenkins is using github as source code ,
Here is the code for creating driver instance.
System.setProperty("webdriver.chrome.driver","chromedriver");
driver = new ChromeDriver();
Running TestSuite Starting ChromeDriver (v2.10.267518) on port 32528
Only local connections are allowed. [0.179][WARNING]: PAC support
disabled because there is no system implementation Tests run: 3,
Failures: 1, Errors: 0, Skipped: 2, Time elapsed: 61.019 sec <<<
FAILURE! - in TestSuite login(closeDummyApplication.ViaEmail) Time
elapsed: 60.931 sec <<< FAILURE!
org.openqa.selenium.WebDriverException: unknown error: Chrome failed
to start: exited abnormally (Driver info:
chromedriver=2.10.267518,platform=Linux 3.13.0-74-generic x86_64)
(WARNING: The server did not provide any stacktrace information)
I have a spark application that was running fine in standalone mode, I'm now trying to get the same application to run on an AWS EMR Cluster but currently it's failing.
The message is one I've not seen before and implies that the workers are not receiving jobs and are being shut down.
**16/11/30 14:45:00 INFO ExecutorAllocationManager: Removing executor 3 because it has been idle for 60 seconds (new desired total will be 7)
16/11/30 14:45:00 INFO YarnClientSchedulerBackend: Requesting to kill executor(s) 2
16/11/30 14:45:00 INFO ExecutorAllocationManager: Removing executor 2 because it has been idle for 60 seconds (new desired total will be 6)
16/11/30 14:45:00 INFO YarnClientSchedulerBackend: Requesting to kill executor(s) 4
16/11/30 14:45:00 INFO ExecutorAllocationManager: Removing executor 4 because it has been idle for 60 seconds (new desired total will be 5)
16/11/30 14:45:01 INFO YarnClientSchedulerBackend: Requesting to kill executor(s) 7
16/11/30 14:45:01 INFO ExecutorAllocationManager: Removing executor 7 because it has been idle for 60 seconds (new desired total will be 4)**
The DAG shows the workers initialised, then a collect (one that is relatively small) and then shortly after they all fail.
Dynamic allocation was enabled so there was a thought that perhaps the driver wasn't sending them any tasks and so they timed out - to prove the theory I spun up another cluster without dynamic allocation and the same thing happened.
The master is set to yarn.
Any help is massively appreciated, thanks.
16/11/30 14:49:16 INFO BlockManagerMaster: Removal of executor 21 requested
16/11/30 14:49:16 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asked to remove non-existent executor 21
16/11/30 14:49:16 INFO BlockManagerMasterEndpoint: Trying to remove executor 21 from BlockManagerMaster.
16/11/30 14:49:24 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1480517110174_0001_01_000049 on host: ip-10-138-114-125.ec2.internal. Exit status: 1. Diagnostics: Exception from container-launch.
Container id: container_1480517110174_0001_01_000049
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
My step is quite simple - spark-submit --deploy-mode client --master yarn --class Run app.jar