Clarification on Hudson/Jenkins Slave operation - build

Fairly simple question really, yet I can't find a clear answer on Google for it. In Hudson or Jenkins, when you set up slave nodes to build, does the build system parcel out parts of jobs to the slaves, or does it send out full builds? For example:
Say I have two projects to build, an i386 version and an x86_64 version. If I use slave nodes in my system, will Jenkins farm the x86_64 version out to a slave while the master builds the i386 version? Or will it farm out pieces to the slave (say, compiling the Linux kernel on the slave while the master builds another piece)?
Thanks!

Short answer: a complete job is done by only one slave.
When using slaves in a Hudson/Jenkins instance, you will be able to send jobs to any node.
You will be able to build many jobs in parallel, on multiple slaves, but each job instance is executed completely by a single slave; there is no load-balancing of parts of a job across nodes.
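For what it's worth, Hudson-era setups did this with freestyle jobs restricted to a node label; in current Jenkins the same one-job-per-node pinning can be expressed in a declarative Jenkinsfile. A minimal sketch, with a placeholder label and build command:

pipeline {
    // Every run of this job executes entirely on one node carrying this label.
    agent { label 'x86_64' }
    stages {
        stage('Build') {
            steps {
                sh 'make all'   // the whole build happens on the selected node
            }
        }
    }
}

A second job with agent { label 'i386' } would then build the other variant, and the two jobs can run in parallel on different nodes.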

How do I make Drone OSS build anything?

I have built Drone OSS using the instructions at https://github.com/harness/drone/blob/master/BUILDING_OSS and successfully connected it to GitHub; it is triggering builds and I can log in to the UI.
However, nothing happens after a build is triggered: the pipeline is stuck at Loading... and no step executes.
Now, this is not my first rodeo with Drone; I have the enterprise edition running just fine, with runners connected and builds completing. So I am relatively certain I don't have setup issues.
It is my understanding that the OSS edition does not support runners, and sure enough, when runners try to connect to it they get a 404 on the API endpoints they are trying to reach.
So the question then is: how does one actually build anything with Drone OSS? What pipeline syntax / config must one use?
I am at a loss.
Solution: Start server with DRONE_AGENTS_DISABLED=true
Source: https://community.harness.io/t/drone-community-not-running-builds/11024/5
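In other words, with agents disabled the server executes docker pipelines itself, so no runner is needed. A hedged sketch of the setup, assuming the server is launched from a shell (the binary name and the remaining variables are placeholders for whatever your OSS build and existing configuration use):

export DRONE_AGENTS_DISABLED=true   # the key setting: run builds in the server process
export DRONE_SERVER_HOST=drone.example.com
export DRONE_SERVER_PROTO=https
# ...GitHub client id/secret, RPC secret, etc. stay as you already have them
./drone-server                      # placeholder for your self-built server binary

The pipeline syntax itself is then the ordinary docker pipeline, e.g. a minimal .drone.yml along these lines (image and commands are illustrative):

kind: pipeline
type: docker
name: default

steps:
- name: build
  image: alpine
  commands:
  - echo "hello from drone oss"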

AWS Step Functions vs Luigi for orchestration

My team had a monolithic service for a small-scale project, but as part of a re-architecture and scaling effort we are planning to move to Amazon AWS cloud services, and for orchestration we are evaluating whether to run Luigi as a container task or to use AWS Step Functions instead. I don't have any experience with either of them, especially Luigi.
Can anyone point out issues they have seen with Luigi, or ways in which it can prove better than AWS Step Functions, if at all? Any other suggestions for the same are welcome.
Thanks in advance.
I don't know how AWS does orchestration, but if you are planning to scale to at least thousands of jobs at any point, I would not recommend investing in Luigi. Luigi is extremely useful for small to medium(ish) projects. It provides a fantastic interface for defining jobs and ensuring job completion through atomic filesystem actions. However, the problem with Luigi is the framework for running jobs. Luigi requires constant communication with workers for them to run, which in my own experience destroyed network bandwidth when I tried to scale.
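To make that concrete, here is a minimal sketch of what a Luigi task looks like (the class, paths, and parameters are made up for illustration); the atomic write-then-rename on the output target is what gives the completion guarantee mentioned above:

import datetime
import luigi

class BuildReport(luigi.Task):
    date = luigi.DateParameter()

    def output(self):
        # The task counts as complete once this target exists on disk.
        return luigi.LocalTarget(f"reports/{self.date:%Y-%m-%d}.txt")

    def run(self):
        # open("w") on a LocalTarget writes to a temp file and renames it on
        # close, so a crashed run never leaves a half-written "completed" file.
        with self.output().open("w") as out:
            out.write("report contents\n")

if __name__ == "__main__":
    luigi.build([BuildReport(date=datetime.date.today())],
                workers=2, local_scheduler=True)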
For my research, I generate a network of 10,000 tasks in a light-to-medium workflow, using my university's cluster computing grid, which runs SLURM. None of my tasks take that long to complete, maybe 5 minutes max each. I have tried the following three methods to use Luigi efficiently.
SciLuigi's SLURM task to submit jobs to SLURM from a central Luigi worker (not using the central scheduler). This method works well if your jobs will be accepted quickly and run. However, it uses an unreasonable amount of resources on the scheduling node, as each worker is a new process. Further, it destroys any priority you would have in the system. A better method would be to first allocate many workers and then have them continually work on jobs.
The second method I attempted was just that. I started the Luigi central scheduler on my home server (because otherwise I could not monitor the state of the work, just as in the above workflow) and started up workers on the SLURM cluster that all had the same configuration, so each of them could run any part of the experiment. The problem was that, even with 500 Mbps internet, past ~50 workers Luigi would stop functioning, and so would my internet connection to the server. So I began running jobs with only 50 workers, which drastically slowed my workflow. In addition, each worker had to register each job with the central scheduler (another huge pain point), which could take hours with only 50 workers.
To mitigate this startup time, I decided to partition the root-task subtrees by their parameters and submit each partition to SLURM. Now the startup time is reasonably low, but I lost the ability for any worker to run any job, which is still pretty important. Also, I can still only work with ~50 workers. When I completed the subtrees, I ran one last job to finish the experiment.
In conclusion, Luigi is great for small to medium-small workflows, but once you start hitting 1,000+ tasks and workers, the framework quickly fails to keep up. I hope that my experiences provide some insight into the framework.
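For reference, the central-scheduler setup from the second method above amounts to something like the following (hostnames, module and task names are placeholders, and exact flags depend on the Luigi version):

# on the home server: start the central scheduler
luigid --port 8082 --background --logdir /var/log/luigi

# on each SLURM allocation: identically configured workers pointed at it
luigi --module experiment RootTask \
      --scheduler-host home-server.example.org --scheduler-port 8082 \
      --workers 4

Every worker registering every job with that one scheduler over the WAN is exactly where the bandwidth and startup-time problems described above show up.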

Unexplained changes to Ignite cluster membership

I am running a 12-node JVM Ignite cluster. Each JVM runs on its own VMware node. I am using ZooKeeper to keep these Ignite nodes in sync, using TCP discovery. I have been seeing a lot of node failures in the ZooKeeper logs.
Although the Java processes are running, I don't know why some Ignite nodes leave the cluster with "node failed" kinds of errors. VMware uses vMotion to do something they call "migration". I am assuming that is some kind of filesystem sync process between VMware nodes.
I am also seeing pretty frequent "dumping pending object" and "Failed to wait for partition map exchange" messages in the JVM logs for Ignite.
My env setup is as follows:
Apache Ignite 1.9.0
RHEL 7.2 (Maipo) runs on each of the 12 nodes
Oracle JDK 1.8
Zookeeper 3.4.9
Please let me know your thoughts.
TIA
There are generally two possible reasons:
Memory issues. For example, if a node goes into a long GC pause, it can become unresponsive and therefore be removed from the topology. For more details, read here: https://apacheignite.readme.io/docs/jvm-and-system-tuning
Network connectivity issues. Check whether the network between your VMs is stable. You may also want to try increasing the failure detection timeout: https://apacheignite.readme.io/docs/cluster-config#failure-detection-timeout
VM migrations sometimes involve suspending the VM. If the VM is suspended, it won't have a clean way to communicate with the rest of the cluster and will appear to be down.
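If you want to try the timeout route, it is a single setting on the node configuration. A sketch in Java (the 30-second value is just an example; the default is 10 seconds):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class NodeStartup {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Give nodes more time to respond before they are dropped from the
        // topology, so short GC pauses or vMotion stalls don't look like failures.
        cfg.setFailureDetectionTimeout(30_000);

        Ignite ignite = Ignition.start(cfg);
    }
}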

How to use Hudson when building for multiple platforms

Right now we are building a number of C++ apps for the Win32 platform. We will soon be porting to Linux, and then maybe more (32 and 64 bits for both).
What is the standard practice: do you use multiple Hudson servers, each building on its own platform, or does the Hudson service create VMs and do the builds?
It is not clear to me what the best practical way to do this is.
Ideally I just want one box with a bunch of VMs running Hudson, which kicks off builds as needed.
Is there a resource someone can point me to for this?
We use Hudson to manage C/C++ (GNU C, GNU C++, Watcom C) builds for multiple OSes. For us, software is built for Linux, Linux x64, QNX 4, and QNX 6. The way we have it set up is:
1 x VM for the Hudson server, running Windows
4 x VMs, one for each slave type, so I have 4 Hudson slaves - one each for QNX 4, QNX 6, Linux 32-bit and Linux 64-bit. All of them run on the same server, just as different VMs, and we have faced no problems. We build about 100 projects, divided almost equally between the 4 system types.
You should not require any additional hardware. There is a Hudson plugin that works with VMware VMs, to start them up and shut them down as required.
I hope that helps.
I've never used Hudson for C++, but for what you are planning to do, it might make sense to look at the VMware plugin and see if it will do what you want. I would recommend having only a single Hudson master if possible. What you most likely want to do is set up a VMware machine image with a Hudson slave process for each target environment, then spawn builds in those slaves.
I played with Hudson in a multi-platform scenario a bit more than a year ago. I had one Hudson server (which was ridiculously easy to set up) on some machine and separate build slaves for each of the platforms. I remember that for a while one of the build clients was in a VirtualBox VM on the machine that hosted the Hudson server. (I think I had the server on a VM for a while, too.) I cannot remember there being any problem in principle with this setup.
However, if you want several virtual build machines building on the same physical machine, I think you'd need a very powerful machine for that. C++ compilation takes quite an amount of resources and, IIRC, when Hudson starts a build, it starts it on all platforms at the same time.
Note that there need not be any relation between the server that's running Hudson and the slave machines that are building your software. Due to the magic of Java, you can connect the disparate slave machines to the master using JNLP. (one example) So, whether they are physical or virtual machines, you can have one running Windows and another Linux; one 32-bit, another 64-bit; etc. -- whatever your apps require. As long as they all have the JRE installed, they can connect to the Hudson master and report the status of the builds.
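As a sketch of that JNLP connection (the host and node names below are placeholders), each slave only needs a JRE and a command along these lines, pointed at the agent descriptor the master publishes for it:

java -jar slave.jar -jnlpUrl http://hudson-master:8080/computer/linux64-slave/slave-agent.jnlp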

Increasing Jenkins process priority?

When my C++ program's build script is called from a Jenkins job, the build takes far more time. Instead of being at 100%, CPU usage is only around 16%.
Of course I don't want Jenkins to fully occupy my computer, rendering it unusable while doing a build, but making it faster would be very useful.
I have installed Jenkins via brew on macOS.
Does anyone know how to change the priority of the Jenkins process so it's allowed to use more CPU while building?
Following one of the comments' suggestions, I decided to increase the heap size of the JVM in the homebrew.mxcl.jenkins.plist file:
<string>-Xmx2048m</string>
And then call:
brew services stop jenkins
brew services start jenkins
The behaviour was the same, so I decided to restart the machine and try again, and now it is working as expected. I'm not sure if this was a general glitch or if it was related to the Java heap size parameter.
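For anyone hunting for where that flag goes: it belongs inside the ProgramArguments array of homebrew.mxcl.jenkins.plist, roughly as in the sketch below (the java and jenkins.war paths are placeholders that depend on the Homebrew install):

<key>ProgramArguments</key>
<array>
  <string>/usr/bin/java</string>
  <string>-Xmx2048m</string>
  <string>-jar</string>
  <string>/usr/local/opt/jenkins/libexec/jenkins.war</string>
  <string>--httpListenAddress=127.0.0.1</string>
  <string>--httpPort=8080</string>
</array>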