Getting started with MapReduce version 2

Good morning,
I haven't managed to find a MapReduce example for YARN (i.e. the second version of MapReduce). The one that is always presented is WordCount, which is exactly the same code as the one presented for the first version of MapReduce.
Even "Hadoop: The Definitive Guide" doesn't have code for YARN!
Can you provide me with code that shows the difference between writing MapReduce code in the previous version and in the newest version?
In fact, I was trying to write a branch-and-bound code on MR1, but then I saw that YARN can make things easier thanks to BranchReduce.
Any help is appreciated,
Thanks in advance

You can compile a program written for MRv1 against YARN (MRv2) without modifying a single line of the source code. It is completely source-code compatible.
Here is the YARN example: http://wiki.apache.org/hadoop/WordCount
Here is the MapReduce 1 example: https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html#Example%3A+WordCount+v1.0
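Some obvious differences to note at the API level:
- The new API uses abstract classes instead of interfaces (for example, Mapper and Reducer).
- The package is different: the new API lives in org.apache.hadoop.mapreduce, the old one in org.apache.hadoop.mapred.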

Related

How do you write a system test for your runtime?

Hi, I am developing a runtime using Substrate FRAME and I would like to know how I can write a system test for my runtime.
The main purpose of a system test is to ensure that the final build fulfils all of the required specifications, and also that nothing is compromised by a runtime upgrade.
The idea for me is something similar to point no. 2 mentioned in this thread.
Any documentation regarding this type of test would be very helpful.
Update:
I ended up using py-substrate-interface to write test scenarios. Now I can automatically deploy nodes to form a network (thanks to Python) and run my custom system test scenarios. It is a very useful tool for developing runtimes in Substrate.
There is an overview on the DevHub.
And there are examples throughout Substrate that include tests.rs and mock.rs files to use as a reference.
If you have not already, check out the Create a Pallet tutorial; the recipes also have some tasty examples to look at.

GNU Parallel host sticky jobs

I am writing a parallel build farm to build C++ cross-platform applications against various platforms / environments. Every time new code is pushed to a git repo, I build and test the latest code against all the platforms.
I've set up GNU Parallel to correctly distribute the jobs among several hosts using the --sshlogin option.
I transfer files, collect output and results. It's all working more than fine and I love the tool.
As the build time is sometimes quite long for some platforms, I would like the build to be as incremental as possible.
My only issue is that the build is only incremental if the scheduler sends the job to the same machine, so that it can reuse the artefacts of the previous build on that specific host.
Say I have 3 hosts: I have a 1 in 3 chance for the build to be incremental. If a host hasn't built this platform in a while, the build might take a long time.
Is it possible to gain control over the host a specific input source will run on, and only fall back to the other hosts if that host is busy?
Ideally, I would love to see a tag system where I tag input sources with a name and tag several hosts with a name, creating pools of jobs and pools of machines specialized in that type of build.
But a very simple implementation, where the input sources are distributed in the same order as the sshlogins are defined, could be a simple and quick fix in my situation.
I tried to find the source code to implement it myself, but I only see documentation generation when I browse the code on Savannah.
Any ideas?
Thanks,
M
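As a rough illustration of the tag/pool idea with fallback, here is a minimal Python sketch of the host-selection logic only. It is not GNU Parallel code (GNU Parallel is written in Perl and, per the answer below, has no such option); the functions `pick_host` and `positional_pools`, the host names and the platform names are all hypothetical. A job first tries the free hosts in the pool tagged for it and only falls back to any other free host when the whole pool is busy; `positional_pools` covers the "simple & quick fix" of pinning input i to the i-th sshlogin.

```python
from typing import Dict, List, Optional, Set


def pick_host(
    job_tag: str,
    pools: Dict[str, List[str]],
    all_hosts: List[str],
    busy: Set[str],
) -> Optional[str]:
    """Return a free host for a job, preferring the pool tagged job_tag.

    Hypothetical scheduler logic, not part of GNU Parallel.
    """
    # First choice: a free host from the job's pool, which is assumed to
    # still hold the artefacts of the previous build for this platform.
    for host in pools.get(job_tag, []):
        if host not in busy:
            return host
    # Fallback: any free host, so the job never waits on a busy pool.
    for host in all_hosts:
        if host not in busy:
            return host
    # Every host is busy: the caller has to wait and try again.
    return None


def positional_pools(inputs: List[str], sshlogins: List[str]) -> Dict[str, List[str]]:
    """Pin input i to sshlogin i (wrapping around), in definition order."""
    return {inp: [sshlogins[i % len(sshlogins)]] for i, inp in enumerate(inputs)}


if __name__ == "__main__":
    hosts = ["host1", "host2", "host3"]  # hypothetical sshlogins
    pools = positional_pools(["linux", "windows", "android"], hosts)  # hypothetical platforms
    print(pick_host("windows", pools, hosts, busy=set()))      # host2
    print(pick_host("windows", pools, hosts, busy={"host2"}))  # host1
```

With real tags, a pool can simply list several hosts (for example, two machines dedicated to one platform); the fallback loop is what keeps an overloaded pool from stalling the whole build.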
There is currently no support for prioritizing a given argument to a given sshlogin. The source code is at https://savannah.gnu.org/git/?group=parallel
Feel free to join the mailing list and discuss the idea: https://lists.gnu.org/mailman/listinfo/parallel
The only prioritization in the code is this: when a job has failed on an sshlogin, GNU Parallel prefers to retry that job on another sshlogin. Maybe that could be extended?
If a job were marked as having failed -1 times on a given sshlogin, then GNU Parallel ought to prefer running the job on that sshlogin.
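The -1 trick can be modelled in a few lines. The following Python sketch is a hypothetical model of the proposed rule, not an excerpt of GNU Parallel's Perl source: `Job`, `mark_failed`, `prefer` and `choose_sshlogin` are invented names. Each job keeps a per-sshlogin failure count (a missing entry counts as 0), and the scheduler picks the free sshlogin with the lowest count, so an sshlogin pre-marked with -1 wins over every untried one.

```python
from typing import Dict, List, Optional, Set


class Job:
    """Hypothetical job record with a failure count per sshlogin."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.failed: Dict[str, int] = {}  # missing sshlogin means 0 failures

    def mark_failed(self, sshlogin: str) -> None:
        # Behaviour described in the answer: failures push the job elsewhere.
        self.failed[sshlogin] = self.failed.get(sshlogin, 0) + 1

    def prefer(self, sshlogin: str) -> None:
        # Proposed extension: -1 ranks this sshlogin ahead of untried ones (0).
        # One later failure brings it back to 0, cancelling the preference.
        self.failed[sshlogin] = -1


def choose_sshlogin(job: Job, sshlogins: List[str], busy: Set[str]) -> Optional[str]:
    """Pick the free sshlogin on which the job has failed the fewest times.

    min() returns the first of several equal minima, so ties follow the
    order in which the sshlogins were given.
    """
    free = [s for s in sshlogins if s not in busy]
    if not free:
        return None
    return min(free, key=lambda s: job.failed.get(s, 0))


if __name__ == "__main__":
    logins = ["host1", "host2", "host3"]  # hypothetical sshlogins
    job = Job("android")                   # hypothetical platform build
    job.prefer("host3")
    print(choose_sshlogin(job, logins, busy=set()))      # host3
    print(choose_sshlogin(job, logins, busy={"host3"}))  # host1
```

One side effect of this design: a single failure on the preferred sshlogin brings its count from -1 back to 0, so the job then treats it like any untried host, which seems a reasonable fallback for a flaky build machine.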
I've been trying to discuss this idea on the mailing list as you suggested, but have not had any response in more than 10 days... I guess you must be busy with other things at the moment. So I went ahead and forked the source code to make the necessary changes and get my solution working.
I pushed it there a week ago:
http://michakfromparis.github.io/gnu-parallel-sticky/
The source code is available on GitHub here:
https://github.com/michaKFromParis/gnu-parallel-sticky
It wasn't exactly easy without any guidance, as the source code has a lot of history, so I tried to keep the changes surgical to ease merging your future releases.
I've been using it in production for more than a week now and it works perfectly in my configuration.
It is also compatible with older formats and should be a drop-in replacement for the usual parallel uses, with extra features on the side.
I would love to get feedback from other users, though, as it might not be completely dry.
Thanks for sharing the original source code.
Best Regards,
M

Test Session for pallet

Is there a way of testing pallet crates? I am trying to build an Elasticsearch crate, but each time I want to test whether something works I need to start a machine and wait for everything to install, etc. A way to just see what commands would be sent to the machine would be a useful start and would provide a lot of insight.
I'm currently looking into this topic as well. If I'm right, pallet translates config actions from Clojure to Bash, right?
So the Clojure part should be testable in the normal Clojure way. I found these options valid:
http://blog.jayfields.com/2010/08/clojuretest-introduction.html
https://github.com/marick/Midje
and a discussion comparing these options: How do Midje and Speclj compare?
The translation to Bash itself should be well tested in the pallet project, so all units are covered.
But I still expect many integration-test problems to remain, and at this point I don't have any good idea for those yet.
Okay, your comment made me find an example of testing in pallet (I would be interested in your experience ;-)
https://github.com/pallet/pallet/tree/develop/test/pallet/crate

Rendering using OpenSG python bindings

Hey!! I'm looking for Python bindings for OpenSG 1.8, but I haven't been able to find them. I have read something about pyOpenSG. Is it still available? I am working on Linux (Ubuntu). If anyone could direct me to it, I would be grateful.
The homepage and source code are on Google Code: pyOpenSG Project
As one of the creators of pyOpenSG, I can tell you that it is definitely still alive and kicking. We use it in production software all the time. It has become so stable for us, though, that we don't often update the code base. The Python binding generator that we use (Py++) just keeps everything working between revisions.

Good/full Boost Spirit examples using version 2 syntax

Almost all of the examples I've looked at so far from http://boost-spirit.com/repository/applications/show_contents.php use the old syntax. I've read and re-read the actual documentation at http://www.boost.org/doc/libs/1_42_0/libs/spirit/doc/html/index.html and the examples therein. I know Joel is starting a compiler series on the blog (http://boost-spirit.com/home/), but that hasn't gotten into full swing yet. Are there any other resources with worked examples that use some of the more sophisticated/involved features in the context of fully working applications?
Well, there is always the examples directory in Boost SVN, $BOOST_ROOT/libs/spirit/example, which contains a couple of more sophisticated things to look at. The tests directory next to it also contains a huge number of small tests scrutinizing each and every technique we know of.
In addition, Joel and I will give a presentation at BoostCon next week about the progress we have made on the compiler you mentioned. All of the material will be available right after the talk, and all the related code is already in the examples directory in Boost SVN (trunk). We will probably start writing about this effort on the Spirit website after the conference.
I know this is not as much as we have for Spirit.Classic in the application repository, but we really hope to get there over time... Everything depends on what gets contributed by the people using Spirit!