Is it possible to stagger builds in Hudson/Jenkins?

I have Jenkins set up to build XBMC images for different platforms. My system takes around 6 hours to build each image, so I prefer to run them in parallel, usually 2 or 3 at a time. The problem is that if they have to download updates to modules (like the Linux kernel or something), the 2 or 3 builds running in parallel will download at the same time and corrupt the download, because they point to the same folder.
Is it possible in Jenkins/Hudson to specify an offset? (I know you can schedule builds, as well as use a trigger that builds after completion of another project.) Something like:
Build 1: immediately
Build 2: start 20 minutes after build 1
Build 3: start 20 minutes after build 2
I tried looking for a plugin and searched Google, but no luck. I also know that I could use the cron-like scheduling capabilities in Jenkins, but my build trigger is set up to poll the Git repo for changes; I'm not just blindly scheduling builds.

One way to do it is to choose the "Quiet Period" option under "Advanced".
Set it to 1200 seconds for Job 2, and 2400 seconds for Job 3.
That means Job 1 will be queued immediately when a change is noticed in git, Job 2 will go into the queue with a 20 minute delay, and Job 3 with a 40 minute delay.
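If you would rather script this than click through each job's Advanced page, the quiet period can also be set by editing the job's config.xml, for example with the python-jenkins library. This is only a rough sketch: it assumes a freestyle job stores the setting in a quietPeriod element, and the URL, credentials and job names are placeholders.

import xml.etree.ElementTree as ET
import jenkins  # pip install python-jenkins

server = jenkins.Jenkins("http://localhost:8080", username="admin", password="api-token")

# Stagger jobs 2 and 3 by 20 and 40 minutes respectively.
for job_name, delay_seconds in [("xbmc-platform-2", 1200), ("xbmc-platform-3", 2400)]:
    config = ET.fromstring(server.get_job_config(job_name))
    quiet = config.find("quietPeriod")
    if quiet is None:
        quiet = ET.SubElement(config, "quietPeriod")  # assumed element name, check your config.xml
    quiet.text = str(delay_seconds)
    server.reconfig_job(job_name, ET.tostring(config, encoding="unicode"))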

Another way to do this would be to make the job some sort of a build flow (whether with the build flow plugin or by saying that the last task of job A is to run job B). If you can turn the download into its own job, then you can define the "download" job as single-threaded, and the rest as multithreaded.
Doing this serializes only what needs to be serialized. Doing an "every twenty minutes" thing will waste time when it takes fifteen minutes to download, and will fail (possibly in a hard-to-debug way) when there's a slowdown and it takes twenty-five minutes to download.

Related

How does Bazel record cache hits?

Hi all Bazel users,
I am trying to understand the logging of cache hits and the action execution environment in Bazel.
Firstly, I have noticed that there is an Action_cache.proto file which provides information about cache hits. Can I assume that it is the local cache hit count?
Secondly, in the ActionResult.java class there is a boolean locallyExecuted which returns true if all spawns of the action were executed locally. My understanding was that 1 action = 1 spawn, so does this mean 1 action = x spawns, where some spawns could be executed locally and others remotely? That would be confusing, since the runners list in SpawnStats.java is based on each spawn, but when it is printed in the build summary (in the console) we can see that the number of runners equals the number of actions.
Lastly, in SpawnMetrics.java we can find the different execution kinds namely REMOTE, LOCAL, WORKER, OTHER. I was just wondering if WORKER belongs to local execution?
Thanks for your help!
The calculation for checking the cache-hit rate is mentioned here:
INFO: 7 processes: 3 remote cache hit, 4 linux-sandbox.
This means that out of 7 attempted actions, 3 got a remote cache hit and 4 actions did not have cache hits and were executed locally using linux-sandbox strategy. Local cache hits are not included in this summary. If you are getting 0 processes (or a number lower than expected), run bazel clean followed by your build/test command.
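As a quick illustration of that arithmetic, here is a small Python sketch that extracts the counts from such a summary line; the exact wording of the line can vary between Bazel versions, so treat the pattern as an approximation.

import re

summary = "INFO: 7 processes: 3 remote cache hit, 4 linux-sandbox."

total = int(re.search(r"(\d+) processes", summary).group(1))
hit_match = re.search(r"(\d+) remote cache hit", summary)
hits = int(hit_match.group(1)) if hit_match else 0

print(f"remote cache hit rate: {hits}/{total} = {hits / total:.0%}")  # 3/7 = 43%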

AWS Batch jobs stuck in PENDING when they `dependsOn`

I have an issue chaining AWS Batch jobs.
There are 3 compute environments (CE_A, CE_B, CE_C), each with one associated job queue (JQ_A, JQ_B, JQ_C).
There are 6 Job definitions (JD_1, JD_2, ..., JD_6).
Let <jqce>-<jd>-<name> be a Job launched on job queue (or compute environment) <jqce> and with job definition <jd>. Example: A-1-a, C-6-z.
I want to execute sequentially about 20 jobs (launched with different environment variables): A-1-a, A-1-b, B-2-c, A-3-d, A-3-e, A-3-f, ...
For each job I specify the dependency on previous job with:
params.dependsOn = [{ "jobId": "xxxxx-xxxx-xxxx-xxxxxx"}] in Batch.submitJob(params).
The first two jobs, A-1-a and A-1-b, execute successfully after waiting a few minutes for resource allocation.
The third job, B-2-c, also executes successfully after some minutes of waiting for the compute environment CE_B to come up.
Meanwhile, the compute environment CE_A is turned off, since no new job has arrived for it.
HERE IS THE PROBLEM:
I expect at this point that CE_B goes down and CE_A comes back up, but CE_A is not coming up.
Job A-3-d is never executed; 16 hours later it is still in PENDING status.
The dependsOn is fine; its dependency finished a long time ago.
Without dependsOn, the batch runs fine with the same environment variables and config.
QUESTIONS
Did you face similar problems with AWS Batch and dependsOn?
Is it possible to chain batches from different Job Queues?
Is it possible to chain batches from different Compute Environments?
Does the params.dependsOn = [{ "jobId": "xxx-xxx-xxx-xxx" }] seem OK to you? It seems I do not have to set the type attribute (see array jobs).
Does the params.dependsOn = [{ "jobId": "xxx-xxx-xxx-xxx" }] seem OK to you? It seems I do not have to set the type attribute (see array jobs).
Yes, type is only required when the dependency is defined as an array job. And the jobId you're providing is the one that was returned when you submitted that specific job?
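For reference, submitting and chaining the jobs with boto3 (the Python SDK) looks roughly like this; the queue, definition and job names are placeholders taken from the question, and the jobId from the first response is what goes into dependsOn.

import boto3

batch = boto3.client("batch")

first = batch.submit_job(
    jobName="A-1-a",
    jobQueue="JQ_A",
    jobDefinition="JD_1",
)

# The dependent job stays PENDING until the first job reaches SUCCEEDED.
second = batch.submit_job(
    jobName="B-2-c",
    jobQueue="JQ_B",                          # a different queue is allowed
    jobDefinition="JD_2",
    dependsOn=[{"jobId": first["jobId"]}],    # "type" only needed for array jobs
)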
Is it possible to chain batches from different Job Queues?
Is it possible to chain batches from different Compute Environments?
You should be able to do it but I've never done that.
Meanwhile, the compute environment CE_A is turned off, since no new job has arrived for it.
So CE_A was running already and had already run A-1-a and A-1-b?
As I recall, AWS checks certain statuses only every 10 minutes, and people have run into cases where the system seems stuck.
You could set CE_A to always keep a minimum of 1 vCPU so it doesn't disappear and have to be provisioned again.
Can you simplify things for testing purposes? Shorter actions, fewer queues, etc.
Consider checking the AWS forum on Batch. Not much activity there but worth an additional set of eyes.
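If you want to keep CE_A warm as suggested above, the minimum vCPU count can also be raised through the API; here is a rough boto3 sketch (the environment name comes from the question, and minvCpus=1 is simply the smallest non-zero value).

import boto3

batch = boto3.client("batch")

# Prevent the managed compute environment from scaling down to zero instances.
batch.update_compute_environment(
    computeEnvironment="CE_A",
    computeResources={"minvCpus": 1},
)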

Robot Framework: Set Timeout in Robot framework

I have created a framework in which I have used Set Browser Implicit Wait 30.
I have 50 suites containing a total of 700 test cases. Some of the test cases (about 200) have steps that check whether an element is present or not present. My objective is to avoid waiting the full 30 seconds for these checks. I tried Wait Until Element Is Visible ${locator} timeout=10, expecting to wait only 10 seconds for the element, but it still waits for 30 seconds.
Question: Can somebody suggest the right approach to deal with such scenarios in my framework? If I accept the 30-second wait, these test cases take much longer to complete; I am currently trying to save about 20 * 200 seconds. Please advise.
The simplest solution is to change the implicit wait right before checking that an element does not exist, and then change it back afterwards. You can do this with the keyword Set Selenium Implicit Wait.
For example, your keyword might look something like this:
*** Keywords ***
verify element is not on page
    [Arguments]    ${locator}
    ${old_wait}=    Set selenium implicit wait    10
    run keyword and continue on failure
    ...    page should not contain element    ${locator}
    set selenium implicit wait    ${old_wait}
You can simply add timeout=${time} next to the keyword you want to execute (e.g., Wait Until Page Contains Element    ${locator}    timeout=50).
The problem you're running into deals with the issue of implicit waits vs. explicit waits. Searching the internet will provide a lot of good explanations of why mixing them is not recommended, but I think Jim Evans (creator of the IE WebDriver) explained it nicely in this Stack Overflow answer.
Improving the performance of your test run is typically done by utilizing one or both of these:
Shorten the duration of each individual test
Run tests in parallel.
Shortening the duration of a test typically means being in complete control of the application under test, so the script knows the moment the application has successfully loaded. This means having a low implicit wait, or none at all, and working exclusively with fluent waits (waiting for a condition to occur). This will result in your tests running at the speed your application allows.
This may mean investing time in understanding the application you test at a technical level. By using a custom locator you can still use all the regular SeleniumLibrary keywords and have a centralized waiting function.
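One lightweight way to centralize waiting (not a custom locator, but a related idea) is a small Python keyword library that performs an explicit wait through SeleniumLibrary's driver. This is only a sketch: the file name, keyword name, and readiness condition are placeholders to adapt to your application.

# CustomWaits.py -- hypothetical Robot Framework keyword library.
from robot.libraries.BuiltIn import BuiltIn
from selenium.webdriver.support.ui import WebDriverWait


def wait_until_app_is_ready(timeout=10):
    """Explicit wait for an application-specific readiness signal.

    document.readyState is only an example condition; replace it with
    whatever signal your application actually exposes.
    """
    driver = BuiltIn().get_library_instance("SeleniumLibrary").driver
    WebDriverWait(driver, int(timeout)).until(
        lambda d: d.execute_script('return document.readyState === "complete"')
    )

In a suite you would import it with Library    CustomWaits.py and call Wait Until App Is Ready    timeout=5 wherever the test needs to synchronize with the page.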
Running tests in parallel starts with having tests that run standalone and have no dependencies on other tests. In Robot Framework this means having test suite files that can run independently of each other. Most of us use Pabot to run our suites in parallel and merge the log files afterwards.
Running several browser tests in parallel means running more than one browser at the same time. If you test in Chrome, this can be done on a single host, though it's not always recommended. If you run IE, you need multiple boxes/sessions, and at that point you need a Selenium Grid type of solution to distribute the execution load across multiple machines.

MRTG ThreshProgI runs only one time, not every 5 minutes

I have set up MRTG, rrdtool and routers2.cgi; the setup is working fine and I am happy as a beginner :)
I have set 'ThreshDir:', 'ThreshMinI' and 'ThreshProgI' in my MRTG configs. On the first run, my script in 'ThreshProgI' runs without any issue, but it does not run on the following 5-minute runs.
I see that in the 'ThreshDir:' location a file is generated on the first MRTG run. If I remove that file, my script in 'ThreshProgI' runs again on the next MRTG run.
So far what I notice is that after the file in 'ThreshDir:' is generated, 'ThreshProgI' stops working in my setup. What could be the reason for this, and how can I make 'ThreshProgI' run every 5 minutes (whenever 'ThreshMinI' fails)?
This is by design.
MRTG only runs the threshold program on the FIRST time the threshold is broken, and not on the following runs, until it recovers. The last status is held in the ThreshDir in order to manage this.
There is another definition for a threshold program to run on recovery.
The only way to trick MRTG into running the threshold program on every run, regardless of the previous status, is to delete the status history file in the ThreshDir on each pass (as you are doing).
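If you do want the alert on every poll, one option is to point ThreshProgI at a small wrapper that runs your real script and then deletes the status file itself. This is only a sketch: the ThreshDir path, the real script path, and the assumption that the status file name contains the target name all need to be checked against your own setup (the arguments MRTG passes to the threshold program are described in the MRTG reference documentation).

#!/usr/bin/env python3
# thresh_wrapper.py -- hypothetical ThreshProgI wrapper.
import glob
import os
import subprocess
import sys

THRESH_DIR = "/var/lib/mrtg/thresh"          # must match your ThreshDir: setting
REAL_ALERT = "/usr/local/bin/send_alert.sh"  # your existing alert script

# MRTG passes the target name as the first argument; forward everything.
target = sys.argv[1] if len(sys.argv) > 1 else ""
subprocess.run([REAL_ALERT, *sys.argv[1:]], check=False)

# Forget the breach so the next MRTG poll triggers the alert again.
for status_file in glob.glob(os.path.join(THRESH_DIR, f"*{target}*")):
    os.remove(status_file)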

Will Cron start a new job if the current job is not complete? [duplicate]

Possible Duplicate:
How to prevent the cron job execution, if it is already running
I have a cron job that may take 2 mins or may take 5 hours to complete.
I need to make sure that this job is always executed.
My question is:
Will it start after the previous one is done, or will they both run at the same time and mess up the database if I set it to execute every minute?
They'll run at the same time. The standard practice around this is to create a file lock (commonly referred to as a flock), and if the lock isn't available, don't run.
The advantage of this over Zdenek's approach is that it doesn't require a database, and when the process ends the flock is automatically released. This includes cases where the process is killed, the server is rebooted, etc.
While you don't mention what your cron job is written in, flock is standard in most languages. I'd suggest googling for locks in the language you're using, as this is the more robust solution rather than relying on arbitrary timeouts. Here are some examples from SO, with a minimal sketch after the list:
Shell script
Python
PHP
Perl
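A minimal Python sketch of the pattern (the lock file path and the job body are placeholders):

#!/usr/bin/env python3
import fcntl
import sys

LOCK_PATH = "/tmp/my_cron_job.lock"

with open(LOCK_PATH, "w") as lock_file:
    try:
        # Non-blocking exclusive lock: raises immediately if another
        # instance of this job already holds it.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        sys.exit(0)  # previous run still in progress, do nothing

    print("long-running job goes here")  # placeholder for the real work

    # The lock is released automatically when the process exits,
    # even if it is killed.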
They will both run at the same time.
I would suggest recording it in a database: record when the script started and whether the task finished successfully.
The script should also update its record in the DB periodically, something like every 5 minutes: "I am still running."
When a new job starts, it should check whether all previous tasks are finished or whether the last update was more than ten minutes (?) ago; if so, run, otherwise exit.
This way you know whether the previous script is still running; if it died, a job that is not finished and hasn't been updated for 10 minutes gets marked as finished.
I should mention that this solution might be better than flock when you have multiple servers and the task can be triggered on any of them.
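A rough sketch of that heartbeat idea, using sqlite3 so it stays self-contained; the table layout, file path, and 10-minute staleness window are illustrative choices (in the multi-server case a shared database server would replace sqlite).

import sqlite3
import time

STALE_AFTER = 10 * 60  # seconds without a heartbeat before assuming the run died

db = sqlite3.connect("/var/lib/myjob/state.db")  # placeholder path
db.execute("CREATE TABLE IF NOT EXISTS job_run ("
           "id INTEGER PRIMARY KEY, started REAL, heartbeat REAL, "
           "finished INTEGER DEFAULT 0)")

# Bail out if the most recent run is unfinished and still heartbeating.
row = db.execute("SELECT heartbeat, finished FROM job_run "
                 "ORDER BY id DESC LIMIT 1").fetchone()
if row and not row[1] and time.time() - row[0] < STALE_AFTER:
    raise SystemExit(0)  # previous run still alive, exit quietly

run_id = db.execute("INSERT INTO job_run (started, heartbeat) VALUES (?, ?)",
                    (time.time(), time.time())).lastrowid
db.commit()

# ... long-running work goes here; refresh the heartbeat every few minutes:
db.execute("UPDATE job_run SET heartbeat = ? WHERE id = ?", (time.time(), run_id))
db.commit()

db.execute("UPDATE job_run SET finished = 1 WHERE id = ?", (run_id,))
db.commit()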