Keep running Dataproc Master node - google-cloud-platform

Is it possible to keep the master machine running in Dataproc? Every time I run a job, after a while (~1 hour) I see that the master node has stopped. It is not a serious issue since I can easily start it again, but I would like to know if there is a way to keep it awake.
One possibility that occurs to me is to run a scheduled job on the master machine, but I want to know if there is a more official way to achieve this.

Dataproc does not stop any cluster nodes (including the master) when they are idle.
You should check whether some automation or a user on your end is stopping the node.
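One way to check is the Compute Engine admin audit logs, which record every instance stop together with the principal that issued it. A minimal sketch, assuming gcloud is authenticated against the project that owns the cluster (the filter uses the standard audit-log field paths):

```sh
# List recent instance-stop operations and who issued them
gcloud logging read \
  'resource.type="gce_instance" AND protoPayload.methodName="v1.compute.instances.stop"' \
  --limit=10 \
  --format='table(timestamp, protoPayload.authenticationInfo.principalEmail, protoPayload.resourceName)'
```

If the principalEmail column shows a service account, some automation on your end is stopping the master; if it shows a user account, it was done manually.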

Related

Does a GCP scheduled VM lose its schedule when the VM is manipulated (stopped/started) manually?

I previously scheduled my VM (e2-micro in europe-central2-a) successfully. It has a schedule assigned to start and stop at certain hours, in order to save machine time at night. It worked until I stopped the instance manually from the CLI for some work.
Since then, the schedule has not been forcing the instance to start and stop, and I have to do it manually.
The machine itself is working fine, no issues, but it doesn't resume the previously set schedule after I stop/start it manually (for some manual work over an SSH connection). Is there a specific procedure to get it back (maybe I missed some instructions for this case)?
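For reference, this is how I check whether the schedule (which Compute Engine attaches as a resource policy) is still on the instance, and how I would try to re-attach it; the instance and policy names below are placeholders:

```sh
# Show which resource policies (schedules) are attached to the instance
gcloud compute instances describe my-vm --zone=europe-central2-a \
  --format="value(resourcePolicies)"

# Re-attach the instance schedule if it is no longer listed
gcloud compute instances add-resource-policies my-vm \
  --zone=europe-central2-a --resource-policies=my-schedule
```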

Can I configure Google DataFlow to keep nodes up when I drain a pipeline

I am deploying a pipeline to Google Cloud Dataflow using Apache Beam. When I want to deploy a change to the pipeline, I drain the running pipeline and redeploy it. I would like to make this faster. It appears from the logs that on each deploy Dataflow brings up new worker nodes from scratch: I see Linux boot messages going by.
Is it possible to drain the pipeline without tearing down the worker nodes so the next deployment can reuse them?
Rewriting Inigo's answer here:
Answering the original question: no, there's no way to do that; updating should be the way to go. I was not aware it was marked as experimental (we should probably change that), but the update approach has not changed in the 3 years I have been using Dataflow. As for the special cases where update does not work: even if the feature you propose existed, the workers would still need the new code, so there is not really much to save, and update should work in most of the other cases.
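For completeness: updating in place means relaunching the pipeline with the same job name plus the update flag, instead of draining first. A minimal sketch for a Beam Python pipeline (project, region, and job name are placeholders):

```sh
# Relaunch the pipeline as an in-place update of the running job
python my_pipeline.py \
  --runner=DataflowRunner \
  --project=my-project \
  --region=us-central1 \
  --job_name=my-running-job \
  --update
```

Dataflow checks the new job graph for compatibility with the running one and rejects the update if they are incompatible, rather than silently breaking the job.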

AWS Sagemaker: Jupyter Notebook kernel keeps dying

I get disconnected every now and then when running a piece of code in a Jupyter notebook on SageMaker. I usually just restart my notebook and run all the cells again. However, I want to know if there is a way to reconnect to my instance without losing my progress. At the moment it shows "No Kernel" in the bottom bar, but my file seems active in the kernel sessions tab. Can I recover my notebook's variables and contents? Also, is there a way to prevent future kernel disconnections?
Note that I reverted to tornado 5.1.1, which seems to decrease the number of disconnections, but it still happens every now and then.
Often, disconnections are caused by inactivity while a long-running job executes with no user input. If it's pre-processing that takes a long time, you could increase the instance size of the processing job so that it executes faster, or increase the instance count. If you're using EMR, since December 2021 you can run an EMR Spark query directly on the EMR cluster from a SageMaker Studio notebook:
https://aws.amazon.com/about-aws/whats-new/2021/12/amazon-sagemaker-studio-data-notebook-integration-emr/
There's a blog post at https://aws.amazon.com/blogs/machine-learning/build-amazon-sagemaker-notebooks-backed-by-spark-in-amazon-emr/ which is helpful in getting you up and running.
For me, a quick solution was to open a terminal instead, save the notebook file as a Python script, and run it from the terminal within SageMaker.
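Concretely (the notebook name is a placeholder), the conversion and a disconnect-proof run look something like this:

```sh
# Convert the notebook to a plain Python script
jupyter nbconvert --to script my_notebook.ipynb

# Run it detached from the terminal, logging output to a file,
# so a lost browser connection does not kill the job
nohup python my_notebook.py > my_notebook.log 2>&1 &
```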

Amazon EMR: Only start new scheduled job if previous job has finished

I have an AWS EMR cluster job that runs every 2 hours. I have set up a CloudWatch schedule to trigger it every two hours.
But sometimes the next job (which starts 2 hours after the previous one) begins while the previous one is still running, since a run can take more than 2 hours to complete depending on the data to be processed.
I need some configuration that prevents the next job from starting while the previous job is still running.
I looked but couldn't find any such setting. Does anyone know how to do this?
Add them as EMR steps. EMR steps run sequentially by default (unless you change the step concurrency setting).
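As a sketch (the cluster ID, step name, and script location are placeholders), submitting the work as a step instead of an independent job looks like:

```sh
# Submit the recurring job as a step on the existing cluster
aws emr add-steps \
  --cluster-id j-XXXXXXXXXXXXX \
  --steps 'Type=Spark,Name=TwoHourlyJob,ActionOnFailure=CONTINUE,Args=[s3://my-bucket/my-job.py]'
```

With the default step concurrency of 1, a newly submitted step waits in PENDING until the previous step finishes, which gives exactly the "only start when the previous run is done" behavior.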

How can I modify the Load Balancing behavior Jenkins uses to control slaves?

We use Jenkins for our CI build system. We also use 'concurrent builds' so that Jenkins builds each change independently. This means we often have 5 or 6 builds of the same job running simultaneously. To accommodate this, we have 4 slaves, each with 12 executors.
The problem is that Jenkins doesn't really 'load balance' among its slaves. It tries to build a job on the same slave it previously built on (presumably to reduce the time spent syncing from source control). This is a problem because Jenkins will run all 6 instances of our build on the same slave (or, more likely, spread across 2 slaves). One build machine gets bogged down and runs very slowly while the rest sit idle.
How do I configure the load balancing behavior of Jenkins, and how it controls its slaves?
We were facing a similar issue, so I've put together a plugin that changes the load balancer in Jenkins to select the node that currently has the least load: https://plugins.jenkins.io/leastload/
Any feedback is appreciated.
If you do not find a plugin that does it automatically, here's an idea of what you can do:
1. Install the Node Label Parameter plugin.
2. Add a SLAVE parameter to your jobs.
3. Restrict the jobs to run on ${SLAVE}.
4. Add a trigger job that will do the following:
   - Analyze load distribution via a System Groovy Script and decide on which node to start the next build (see the sketch below).
   - Dispatch the build to that node with the Parameterized Trigger plugin by assigning the appropriate value to the SLAVE parameter.
In order to analyze load distribution you need to install the Groovy plugin and familiarize yourself with the Jenkins Main Module API. Here are some useful initial pointers.
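For example, here is a minimal sketch of such a script (run as a "system Groovy script" via the Groovy plugin, so it has access to the Jenkins object model) that prints each node's executor load and picks the least busy one:

```groovy
import jenkins.model.Jenkins

def leastBusy = null
def lowestLoad = Integer.MAX_VALUE

// Walk every attached node and count its busy executors
Jenkins.instance.nodes.each { node ->
    def computer = node.toComputer()
    if (computer == null || computer.offline) return  // skip detached/offline nodes
    def busy = computer.countBusy()
    println "${node.nodeName}: ${busy}/${computer.numExecutors} executors busy"
    if (busy < lowestLoad) {
        lowestLoad = busy
        leastBusy = node
    }
}

println "Least loaded node: ${leastBusy?.nodeName}"
```

The trigger job can then pass leastBusy.nodeName as the SLAVE parameter to the downstream build.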
If your build machines cannot comfortably handle more than 1 build, why configure them with 12 executors? If that is indeed the case, you should reduce the number of executors to 1. My Jenkins has 30 slaves, each with 1 executor.
You may also use the Throttle Concurrent Builds plugin to restrict how many instances of a job can run in parallel on the same node.
I have two labels: one for small tasks and one for big tasks. I have one executor for the big tasks and four for the small tasks. This does balance things a little.