Hadoop : how to start my first project - mapreduce

I'm starting to work with Hadoop, but I don't know where or how to begin. I'm on OS X and followed a tutorial to install Hadoop; the installation works, but now I don't know what to do next.
Is there an IDE I should install (maybe Eclipse)? I found some code samples, but none of them work, and I don't know what I need to add to my project, etc.
Can you give me some information or point me to a complete tutorial?

If you want to learn the Hadoop framework, I recommend starting by installing the Cloudera QuickStart virtual machine on your OS X system, provided your system meets all the prerequisites:
http://www.cloudera.com/downloads/quickstart_vms/5-8.html
Cloudera QuickStart virtual machines include everything you need to try Hadoop, MapReduce, Hive, Pig, Impala, etc., as well as the Eclipse IDE.
This is a good fit if you are interested in pursuing a career as a Hadoop developer; if you are more interested in Hadoop systems administration, follow Alvaro's recommendation instead.
The Intro to Hadoop and MapReduce course on Udacity is also a good start for beginners:
https://www.udacity.com/course/intro-to-hadoop-and-mapreduce--ud617
Hadoop: The Definitive Guide by Tom White is a comprehensive book to refer to: http://shop.oreilly.com/product/0636920033448.do

I would recommend installing the Cloudera pseudo-distributed setup on a virtual machine running the latest Ubuntu LTS. That way you don't mess up your laptop, and the environment is closer to what you would use in production. Have you looked at vagrantup.com?
Once it is installed, you can either work directly in Java or choose a framework such as mrjob (Python) to run custom programs.
Best,
Alvaro.
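
Before writing a real job in Java or mrjob, it can help to see the shape of a MapReduce program. The following is only a minimal sketch in plain Python that imitates the map, shuffle, and reduce phases in a single process for a word count. It does not use Hadoop or the mrjob API, and the function names (mapper, shuffle, reducer, word_count) are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key, as the framework would."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in-process on a small collection of lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(pairs)
    return dict(reducer(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    sample = ["hello hadoop", "hello mapreduce"]
    print(word_count(sample))
```

On a real cluster the framework handles the shuffle and distributes the mapper and reducer across machines; here everything runs locally, so the sketch is useful only for understanding the model.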

Related

Deploying a scikit learn pipeline on IBM DSX

How do I deploy a trained scikit-learn pipeline on IBM Data Science Experience (DSX)? Can I do that from a Jupyter notebook?

Deployment of scikit-learn pipelines will be available in Watson Machine Learning. Today it is in closed beta and only supports deployment of Spark ML pipelines, but we are adding support for scikit-learn before summer.
More information here: https://console.ng.bluemix.net/docs/#services/PredictiveModeling/index.html

Yesterday I was at IBM DevConnect Hyderabad 2017, where I used Python Jupyter notebooks on DSX for the first time; they are available by default as the standard way of using Python on the IBM DSX platform.
By default, all notebooks come with 20+ common data science libraries preinstalled, such as scikit-learn, numpy, matplotlib, and pandas. You can go to the notebook info and check the environment to get the full list of installed modules and their versions.
You can try some of the samples on GitHub to get started: github.com/IBMDevConnect17/DSX_HandsON
There is a lot of learning material, organized into 4 courses, here: https://www.ibm.com/developerworks/library/ba-1611data-science-fundamentals-learning-path-bdu-trs/index.html
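
Besides the notebook info panel, installed versions can also be checked from a notebook cell. This is a minimal sketch that assumes a Python 3.8+ kernel (for the standard-library importlib.metadata module), which may not match the kernel your DSX notebook provides; the helper name installed_versions is hypothetical.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return {package name: installed version, or None if it is not installed}."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    wanted = ["scikit-learn", "numpy", "matplotlib", "pandas"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```

Note that the lookup uses distribution names as published on PyPI (for example scikit-learn), not import names (sklearn).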

Here is a tutorial on deploying scikit-learn pipelines to Watson Machine Learning (WML): https://datascience.ibm.com/exchange/public/entry/view/acba02c8efecc5218b1d65ba9b8a5bbb
WML supports deployment of scikit-learn v0.17; more information here: https://console.bluemix.net/docs/services/PredictiveModeling/pm_service_supported_frameworks.html#supported-machine-learning-frameworks

You can install packages with the !pip command and the --user option.
For example:
!pip install --user --upgrade scikit-learn

Step by Step install of wso2 EMM for Ubuntu

I've visited the WSO2 website, and the install instructions are very disjointed: there is a lot of jumping around between pages. I found the following blog post, which seems to streamline the instructions, but it doesn't seem complete (and the version it installs is out of date): https://maxmalm.se/blog/2014-06-17-installing-wso2-enterprise-mobility-manager-110
Has anyone seen step-by-step instructions for completely setting up WSO2 EMM on a freshly installed Ubuntu 14.04 virtual machine that has only the OS on it and none of the prerequisites installed yet? The blog post above seems to cover a lot of the necessary apt-get install steps but doesn't mention a database (yet the WSO2 site has a whole section on installing and using a database).
Thank you.

To try out WSO2 EMM, you only need JDK 7 or 8 [1] installed, at minimum, to start the server. WSO2 products are built to run with an out-of-the-box (OOB) database, which is H2. So to get started and play around, I suggest you install Java and then start the pack.
[1] https://docs.wso2.com/display/EMM201/Installing+on+Linux+or+OS+X

To get started, all you need is a JDK installed on your machine and the Java-related environment variables, such as PATH and JAVA_HOME, set. You may need to install the JDK version that matches your particular version of EMM.
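
A quick way to confirm that the Java-related variables are in place before starting the server is sketched below. It uses only the Python standard library; the helper name check_java_env and the keys of the returned dictionary are hypothetical, and the check says nothing about which JDK version a given EMM release needs.

```python
import os
import shutil
from typing import Dict, Mapping, Optional


def check_java_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Optional[str]]:
    """Report JAVA_HOME and the java executable found on PATH (None if missing)."""
    env = os.environ if env is None else env
    # Treat an empty JAVA_HOME the same as an unset one.
    java_home = env.get("JAVA_HOME") or None
    # Search only the PATH from the given environment mapping.
    java_on_path = shutil.which("java", path=env.get("PATH", ""))
    return {"JAVA_HOME": java_home, "java_on_path": java_on_path}


if __name__ == "__main__":
    for key, value in check_java_env().items():
        print(f"{key}: {value or 'not set'}")
```

If either value is reported as not set, fix the environment first and then start the pack.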

Rethinkdb chef solo cookbook

Is there a RethinkDB chef-solo cookbook that can install the latest RethinkDB on Ubuntu 14.04 / AWS?
I tried a couple of options, but neither helped:
https://github.com/vFense/rethinkdb-chef - how to install latest version?
https://github.com/sprij/rethinkdb-cookbook.git - source compilation takes hours
I would appreciate any help regarding this.
Thanks

Try the cookbook that is available from the community repository first:
https://supermarket.chef.io/cookbooks/rethinkdb
It claims to be integration tested on Ubuntu. If it doesn't work under chef-solo, then I'd advise you to switch to local mode chef client instead.
https://www.chef.io/blog/2013/10/31/chef-client-z-from-zero-to-chef-in-8-5-seconds/
PS: Also check out Berkshelf for managing cookbook dependencies. It's a standard tool in the ChefDK.

I updated rethinkdb-chef to work with the latest version of RethinkDB and removed the network portion of the .kitchen.yml file. I validated that it works on CentOS 6 and Ubuntu 14.04.
I still need to write tests and documentation. As per Mark's answer, try the community-supported version first. I created this cookbook so that I can customize it for my needs with vFense.

Migrate a python software easily to other hosts (running on ubuntu)

I have built software that runs on Ubuntu 14.04 in a Python environment with a lot of dependencies (databases, Flask, etc.).
I am trying to figure out the best way to get an easily portable environment for migration, for example if I want to install the software on another computer or on AWS.
I looked at some solutions, and for now Vagrant, Puppet, and Chef look interesting, but I am a bit confused.
Furthermore, the software requires high performance, and running it on a VM would be far too slow.
What are your thoughts? Basically, I am looking for an easy way to install software with all its dependencies and to migrate the environment to whatever host I need.
Thanks a lot

Installing open source web-based software?

I'm new to working with web servers, and despite my tedious Googling, I think I am missing some of the most general (obvious?) points about how to install an open source web-based program.
I have a dedicated server running CentOS 6 with 32GB of RAM, etc. I used an SSH client to install the prerequisites of PandoraFMS. Everything installed fine.
Now what? Do I just upload all the open-source files onto the web server? That's the part I don't understand about the general process of installing an open source program from build files: do I just upload it all to my server, or am I missing something?

You use Yum from the command line. Here is a link to the documentation: http://www.centos.org/docs/5/html/yum/sn-managing-packages.html. If you really want a Linux box that is easy to use, I recommend Ubuntu. Good luck.
