I am using Jenkins on a small test server that has only two cores.
I noticed that Jenkins seems to bind each build to a core, i.e., if a job is launched it uses only one core.
My tests are essentially matrix operations, and I need Jenkins to use several cores per build.
Is there a plug-in to enable this feature?
For anyone running into the same issue: Jenkins does not restrict builds to a single core; it is multicore by default. My problem was that I was running an R application, and it turns out that R's base linear algebra library does not multithread. Installing a parallel BLAS library (https://csantill.github.io/RPerformanceWBLAS/) enabled multithreading.
I'm a bit confused about how new MapReduce2 applications should be developed to work with YARN, and what happens to the old ones.
I currently have MapReduce1 applications which basically consist of:
Drivers which configure the jobs to be submitted to the cluster (previously to the JobTracker, now to the ResourceManager); see the sketch below.
Mappers + Reducers
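For concreteness, here is a rough sketch of the kind of application I mean, using the old org.apache.hadoop.mapred API (a standard word count; the class names are placeholders for my real code):

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCount {

    // Mapper: emits (word, 1) for every token in a line of input.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> out, Reporter reporter)
                throws IOException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                out.collect(word, ONE);
            }
        }
    }

    // Reducer: sums the counts collected for each word.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> out, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            out.collect(key, new IntWritable(sum));
        }
    }

    // Driver: configures the job and submits it (to the JobTracker in MRv1).
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
```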
On one side, I see that applications coded for MapReduce1 are compatible with MapReduce2/YARN, with a few caveats, just by recompiling with the new CDH5 libraries (I work with the Cloudera distribution).
But on the other side, I see information about writing YARN applications in a different way than MapReduce ones (using YarnClient, ApplicationMaster, etc.):
http://hadoop.apache.org/docs/r2.7.0/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html
But to me, YARN is just the architecture and the way the cluster manages your MR app.
My questions are:
Do YARN applications include MapReduce applications?
Should I write my code like a YARN application, forgetting drivers and creating YARN clients, ApplicationMasters and so on?
Can I still develop the client classes with drivers + job settings?
Are MapReduce1 jobs (recompiled with MR2 libraries) managed by YARN in the same way as YARN applications?
What are the differences between MapReduce1 applications and YARN applications regarding the way YARN manages them internally?
Thanks in advance
Hadoop v1
The JobTracker is responsible for resource management (managing the slave nodes). Its major functions involve:
tracking resource consumption/availability
job life-cycle management: scheduling individual tasks of the job, tracking progress, and providing fault tolerance for tasks.
Issues with Hadoop v1
The JobTracker is responsible for all spawned MR applications, so it is a single point of failure: if the JobTracker goes down, all applications in the cluster are killed. Moreover, when the cluster has a large number of applications, the JobTracker becomes a performance bottleneck. Hadoop v2 was released to address these issues of scalability and job management.
Hadoop v2
The fundamental idea of YARN is to split the two major responsibilities of the Job-Tracker—that is, resource management and job scheduling/monitoring—into separate daemons: a global ResourceManager and a per-application ApplicationMaster (AM). The ResourceManager and per-node slave, the NodeManager (NM), form the new, and generic, operating system for managing applications in a distributed manner.
To interact with the new resource management and scheduling layer, a Hadoop YARN MapReduce application (MRv2) was developed. MRv2 changes nothing about the MapReduce programming API.
Application programmers will see no difference between MRv1 and MRv2: MRv2 is fully backward compatible, so an MR application (.jar) can be run on both frameworks without any change in code.
Hadoop 2.x already contains the code for the MR client and ApplicationMaster; the programmer just needs to focus on the MapReduce application itself, as in the sketch below.
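As a rough illustration of that point, here is a minimal new-API (org.apache.hadoop.mapreduce) driver. TokenizerMapper and IntSumReducer are placeholders for your own mapper and reducer classes; nothing in the driver refers to YARN at all:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(WordCountDriver.class);
        // TokenizerMapper and IntSumReducer stand in for your own classes.
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Whether this job is executed by the classic JobTracker or by a YARN ApplicationMaster is decided by the cluster's mapreduce.framework.name setting (typically in mapred-site.xml), not by the code, so the same jar runs on both.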
MapReduce was previously integrated into Hadoop Core and was the only API for interacting with data in HDFS. In Hadoop v2 it runs as a separate application, and Hadoop v2 allows other application programming frameworks (e.g., MPI) to process HDFS data.
Refer to the Apache documentation page on YARN architecture and these related SE posts:
Hadoop gen1 vs Hadoop gen2
Do YARN applications include MapReduce applications?
YARN supports MapReduce applications. Unlike Hadoop 1.x, it also runs Spark jobs.
Should I write my code like a YARN application, forgetting drivers and creating YARN clients, ApplicationMasters and so on?
Yes, in the sense that you can forget about all these application components (YarnClient, ApplicationMaster, and so on) and just write your application; the MapReduce framework provides them. Have a look at sample code.
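And if you do at some point need a native (non-MapReduce) YARN application, the client side looks roughly like the sketch below, modeled on the WritingYarnApplications guide linked in the question. The application name and container resources are made-up values, and the ApplicationMaster launch details (command, local resources, environment) are omitted:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.*;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class MyYarnClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new YarnConfiguration();
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        // Ask the ResourceManager for a new application id.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext =
                app.getApplicationSubmissionContext();
        appContext.setApplicationName("my-yarn-app"); // arbitrary name

        // Describe the container that will run the ApplicationMaster.
        // Filling in the AM command line, local resources and environment
        // is omitted here; see the WritingYarnApplications guide.
        ContainerLaunchContext amContainer =
                Records.newRecord(ContainerLaunchContext.class);
        appContext.setAMContainerSpec(amContainer);
        appContext.setResource(Resource.newInstance(1024, 1)); // 1 GB, 1 vcore

        ApplicationId appId = yarnClient.submitApplication(appContext);
        System.out.println("Submitted application " + appId);
    }
}
```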
Can I still develop the client classes with drivers + job settings? Are MapReduce1 jobs (recompiled with MR2 libraries) managed by YARN in the same way as YARN applications?
Yes, you can. But have a look at this compatibility article.
What are the differences between MapReduce1 applications and YARN applications regarding the way YARN manages them internally?
Refer to this SE post:
What additional benefit does Yarn bring to the existing map reduce?
YARN is just a cluster manager.
First, the applications have to be developed for YARN (if not already implemented). Here are a few of the applications that are supported on YARN. If you want a new application to run on YARN, this is the guide.
Then the same MR/Spark/Hama programs can be run on YARN.
We are building several projects with TeamCity. In addition to an agent on the main server, which runs on Linux, we also have three additional agents on separate boxes: one on Linux, one on Mac, and one on Windows.
If all agents are idle, the first available agent, in the order they are listed, is chosen for the build. This means that when the load on TeamCity is small, the same agent is always used. We have had situations where a project was built successfully by the same Linux agent for more than 50 builds; then, when it finally ran on the Windows agent, a test failed due to code that had been committed fairly early in that run of 50 builds.
Since many of our tests may be affected by the environment, we are looking for ways to spread the builds on the agents, automatically. Is there any way of setting up a round robin agent selection policy? Or any other way to spread the builds on the agents?
You can have a schedule trigger that runs a build on all agents.
Or configure a build for each platform (Linux, Windows, Mac, ...) that will run on a specific agent, selected via specific agent requirements.
The answer to my question, at least for TeamCity 8 and earlier, is NO.
See JetBrains own TeamCity Developer forum: https://devnet.jetbrains.com/message/5533629
I need information regarding distributed builds with Jenkins. The distribution I need is not the normal Jenkins distributed build (master/slave config), where Jenkins acts like a load balancer so that a job gets executed on an available node.
For C++ projects, there are tools like distcc, netcc, etc. that distribute the build across several machines on the network so that compilation is fast. Are there similar tools or approaches we can use to reduce the build time?
thanks in advance
Jenkins is not a compiler - it is merely a coordinator for software build activities.
There is nothing stopping you from using distcc or similar in a build script that Jenkins starts, and the compile nodes do not need to be aware of the fact that Jenkins started the build.
If you have a distributed compiler and can make use of it from your command prompt, it can be called from a Jenkins job as well.
We use Jenkins for our CI build system. We also use 'concurrent builds' so that Jenkins will build each change independently. This means we often have 5 or 6 builds of the same job running simultaneously. To accommodate this, we have 4 slaves each with 12 executors.
The problem is that Jenkins doesn't really 'load balance' among its slaves. It tries to build a job on the same slave that it previously built on (presumably to reduce the time syncing from source control). This is a problem because Jenkins will build all 6 instances of our build on the same slave (or more likely between 2 slaves). One build machine gets bogged down and runs very slowly while the rest of them sit idle.
How do I configure the load balancing behavior of Jenkins, and how it controls its slaves?
We were facing a similar issue. So I've put together a plugin that changes the Load Balancer in Jenkins to select a node that currently has the least load - https://plugins.jenkins.io/leastload/
Any feedback is appreciated.
If you do not find a plugin that does it automatically, here's an idea of what you can do:
Install Node Label Parameter plugin
Add SLAVE parameter to your jobs
Restrict jobs to run on ${SLAVE}
Add a trigger job that will do the following:
Analyze load distribution via a System Groovy Script and decide on which node to start the next build.
Dispatch the build on that node with the Parameterized Trigger plugin by assigning the appropriate value to the SLAVE parameter.
In order to analyze load distribution you need to install the Groovy plugin and familiarize yourself with the Jenkins Main Module API. Here are some useful initial pointers, and a rough sketch below.
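As a starting point, a System Groovy Script along these lines (written in plain Java-style syntax, which Groovy accepts) would pick the online node with the fewest busy executors; SLAVE echoes the job parameter from the steps above:

```groovy
// Runs inside Jenkins as a System Groovy Script, so it has direct
// access to the Jenkins object model.
import hudson.model.Computer;
import jenkins.model.Jenkins;

Computer best = null;
int lowestLoad = Integer.MAX_VALUE;
for (Computer c : Jenkins.getInstance().getComputers()) {
    // Skip offline nodes and nodes with no executors configured.
    if (c.isOffline() || c.countExecutors() == 0) continue;
    int busy = c.countBusy(); // executors currently occupied by builds
    if (busy < lowestLoad) {
        lowestLoad = busy;
        best = c;
    }
}
// Feed this name into the SLAVE parameter via the Parameterized Trigger plugin.
println("Least loaded node: " + (best != null ? best.getName() : "none"));
```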
If your build machines cannot comfortably handle more than 1 build, why configure them with 12 executors? If that is indeed the case, you should reduce the number of executors to 1. My Jenkins has 30 slaves, each with 1 executor.
You may also use the Throttle Concurrent Builds plugin to restrict how many instances of a job can run in parallel on the same node.
I have two labels -- one for small tasks and one for big tasks. I have one executor for the big task and 4 for the small tasks. This does balance things a little.
I am a Qt/C++ developer. I would like to set up a continuous integration environment whereby, after committing the source code, a build process is triggered that builds the code for the 3 platforms I'm using:
Linux
OS X
Win32
If possible, how do I set up such an environment? Any hints or links are welcome.
I've read around about Jenkins, but I can't find any good tutorial for it.
I also suggest Jenkins for several reasons:
It will run on all of the platforms you listed.
It can be configured to start a build when the repository is updated (hint: configure the Job to "Poll SCM" and you won't have to muck with your SCM tool to get it to tell Jenkins to start building).
It provides good support (mostly through plugins) for unit testing. [Your project is doing unit testing, right?]
The price is right
A bigger issue you are going to have is that, AFAIK, Qt doesn't really do cross-compiling for other platforms well. Using Jenkins (and the appropriate plugins), you should be able to solve this.
One method that comes quickly to mind is to have an instance of Jenkins on each platform. Each instance is responsible for building the version for its own platform. At the end of the build, the created artifacts are all put into a common, shared location.
Jenkins supports this feature via plugins for all major source control systems. If you seriously considering using Jenkins (and I would highly recommend it), consider buying John Ferguson Smart's Jenkins: The Definitive Guide.
Two solutions coming to my mind:
BuildBot
BuildBot is a highly customizable continuous integration system written in Python. The master component offers a nice web-based GUI to monitor and trigger builds; slave components are put on the target machines (usually virtual machines, but one could be the Mac laptop of one of the developers). The docs are good enough to build up a basic system; customization can be a little tricky (at least it was for me). Using the commit/push hooks provided by VC systems, you can easily activate the master and trigger builds across the slaves. It also supports incremental builds (a must if your project is big).
CDash
Developed by the authors of CMake, CDash is a web application that collects builds coming from across the network. It's not exactly what you asked for, but I think it's worth a try. It is very powerful if you have a team of developers who can continuously submit build results from their machines to the server (and if you use CMake it's almost transparent). You cannot trigger builds from the server as Buildbot does, but you could set up a bunch of VMs with a cron job that checks for changes, performs the build when there are any, and sends the results to CDash.
Sure, it's possible. Most version control systems are able to execute a custom script on the server side. Some of them (Git, for example) also have hooks to achieve the same locally; have a look at Git's post-commit hook.
All you need is to create a script that will trigger cross-platform builds.
Most version control systems allow post-commit hooks so you can kick off events like builds. Alternatively, build systems can be configured to regularly poll a source control repository and manage their own build scheduling (this is how we use Jenkins).
Something to bear in mind is how long a complete build across platforms will take, and the typical number of check-ins in that interval. You might find batching check-ins a better way of doing continuous integration builds if you have a fair-sized team or limited build-server resources. Otherwise your build system could quickly end up trying to play catch-up.
As for whether it is possible to build on all target platforms, that depends on your tool chain.