How to use Hadoop Map/Reduce with a Node.js server? - c++

I have a web application with a Node.js server and an HTML client.
The server is integrated with many C++ algorithms. To reduce server load and improve performance, I want to distribute my algorithms in parallel from the server.
I'm a newbie to Hadoop and its Map/Reduce programming model.
Questions:
Should I use clustering for this architecture?
Is this what MapReduce does?

You are mixing up two things:
Clustering, as in data analysis ("cluster analysis", but that is hard to pronounce)
Clustering, as in load balancing (this would be easier to pronounce and more precise, but not as cool as "clustering")
Make sure to distinguish these two.
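If what you are after is the MapReduce side rather than plain load balancing, note that Hadoop Streaming lets any executable that reads lines from stdin and writes tab-separated key/value pairs to stdout act as a mapper or reducer, so existing C++ algorithms can often be reused as-is. A minimal sketch (the record format and process_record function are made up for illustration):

// streaming_mapper.cpp - minimal Hadoop Streaming mapper sketch.
// Hadoop Streaming feeds input records on stdin, one per line, and expects
// tab-separated "key<TAB>value" pairs on stdout; process_record() stands in
// for one of the existing C++ algorithms.
#include <iostream>
#include <string>

static double process_record(const std::string& line) {
    return static_cast<double>(line.size());   // dummy computation
}

int main() {
    std::string line;
    while (std::getline(std::cin, line)) {
        // Use everything before the first comma as the key (an assumption
        // about the input format); Hadoop groups by key before the reduce step.
        std::cout << line.substr(0, line.find(',')) << '\t'
                  << process_record(line) << '\n';
    }
    return 0;
}

The Node.js server would then submit such a job (for example by shelling out to the hadoop streaming command with -mapper/-reducer pointing at the compiled binaries) instead of running the algorithms in-process.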

Related

Django deployment with Java and C++

I have created a Django app that contains C++ for some of the views as well as a Java library. How would I deploy this app? What kind of hosting service allows for multiple languages? I have looked at EC2, GAE, and several platforms (like Heroku), but I can't seem to find a definitive solution.
I have never deployed anything to the web so a simple explanation would be much appreciated.
PaaS stuff is probably not your best bet. If you want the scalability and associated buzzwords (muh 99.9999999999% availability because my servers are hosted in a parallel dimension without electrical storms, power outages, hurricanes, earthquakes, or nuclear holocausts) that come with hosting your application on a huge web company's platform, check out IaaS (Infrastructure as a Service) systems like Google's Compute Engine or AWS. With these you just get a virtual server (or servers) running your Linux distro of choice, and you can install and run whatever you please on them without being constrained to a specific platform like App Engine or Heroku (where you basically have to write your app specifically to run on that platform). If you plan on consuming a ton of bandwidth/resources from the get-go, you will almost certainly get a better deal using dedicated servers from a smaller company.
Interested in what specifically you are executing C++ for in a Django view. Image/video processing?
Well. Deployment is not really something where a simple explanation helps much.
First, I would check what the requirements on the operating system are (compilers, dependencies, …). That might narrow the options quickly.
I guess that with a setup containing C++ and Java artifacts, the usual PaaS offerings (GAE, Heroku, …) will not be sufficient, because they define the stack. And a mixture of Python/C++/Java is rather uncommon, I'd say.
Choosing an IaaS offering (EC2, …) may be an option. There you can run your whole self-defined stack and have the possibility of easier scaling.
Hosting the application on your own server(s) is also always possible. Check your data protection regulations to find out if it's not even a requirement.
There are a lot of ways to get the Django application to run. The Django documentation has some information about deployment. If you have certain special requirements, uwsgi may be a good application server.
You may also want a web server in front of the application. Possibilities range from using uwsgi's built-in HTTP server to putting e.g. Nginx in front of uwsgi.
All in all, every component of the whole "deployment" has hundreds of bells and whistles, and it's not easy to give advice without knowing the specific requirements and properties of the system itself. You'll also probably need a database you have to deploy.
But before deploying it to the web, it's also important to have a solid build process to assemble all the parts, and not only on the development machine. With three languages involved, this should be the first step to solve. If it deploys easily and automagically in a development environment, moving it to a server is easier.

SQL API for *NIX C++

I am currently writing a client-server app for the iOS platform. The client is written in Obj-C, and the server uses C++ on OS X 10.9. Since I intend to run the server software on an Ubuntu dedicated server, I am trying my best to keep the server-side code portable.
To store data about users and user-game relations I intend to use an SQL database (most likely MySQL, or possibly PostgreSQL, since I'm familiar with those). I know that it is possible to read from/write to the database through a file descriptor, just like I do in my TCP module, but I wish to use a higher-level SQL communications API to make the programming process quicker.
Can anyone recommend me a good open source/free SQL API for *NIX C++? Any help would be appreciated. Thanks in advance!
You have several options here:
Use the native database SDK. They are usually distributed along with the database installation or as separate downloads/packages. The upside is that you can get maximum speed out of it; the downside is that you'll be limited by your initial choice - no switching afterwards without rewriting part of the application.
Use a C++ ORM (example: ODB). This gives you DB independence along with some tasty features, at the cost of slightly reduced speed.
unixODBC supports both MySQL and PostgreSQL. Take a look at it.
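For a rough feel of the unixODBC route, here is a minimal connection-and-query sketch; the DSN name, credentials, table and the omitted error handling are all placeholders (build with g++ odbc_sketch.cpp -lodbc):

// odbc_sketch.cpp - minimal unixODBC usage sketch.
// The DSN "gamedb", credentials and query are placeholders; real code needs
// proper checking of every SQLRETURN value.
#include <sql.h>
#include <sqlext.h>
#include <iostream>

int main() {
    SQLHENV env;
    SQLHDBC dbc;
    SQLHSTMT stmt;

    SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
    SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (SQLPOINTER)SQL_OV_ODBC3, 0);
    SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);

    // Connect through a DSN configured in odbc.ini (MySQL or PostgreSQL driver).
    SQLConnect(dbc, (SQLCHAR*)"gamedb", SQL_NTS,
               (SQLCHAR*)"user", SQL_NTS, (SQLCHAR*)"secret", SQL_NTS);

    SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt);
    SQLExecDirect(stmt, (SQLCHAR*)"SELECT name FROM users", SQL_NTS);

    SQLCHAR name[256];
    SQLLEN len = 0;
    while (SQL_SUCCEEDED(SQLFetch(stmt))) {
        SQLGetData(stmt, 1, SQL_C_CHAR, name, sizeof(name), &len);
        std::cout << reinterpret_cast<char*>(name) << '\n';
    }

    SQLFreeHandle(SQL_HANDLE_STMT, stmt);
    SQLDisconnect(dbc);
    SQLFreeHandle(SQL_HANDLE_DBC, dbc);
    SQLFreeHandle(SQL_HANDLE_ENV, env);
    return 0;
}

The same code works against MySQL or PostgreSQL by pointing the DSN at the corresponding driver in odbc.ini/odbcinst.ini.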

Testing Distributed File System

I have been developing a robust distributed file system to be run over a TCP/UDP network.
I am writing the application in C++.
Currently I am looking for a test framework that I can use for basic testing of the DFS.
I am assuming I have to write some sort of plugin for the test framework.
I don't have a lot of computing power (only two machines), so I would also like ideas on whether to use some sort of simulator or buy some hardware for testing. Currently I am thinking about putting multiple VMs on my machines to create my test environment.
The test framework should be agnostic to the network protocol being used. I am assuming most are, but I'm not sure.
Any additional suggestions regarding the test environment/framework would be appreciated.
Using MIT's StarCluster toolkit you can launch a cluster on Amazon EC2. Amazon provides a free tier, which you can use.
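On the test-framework part of the question: most general-purpose C++ frameworks really are protocol-agnostic, because they only exercise whatever client API you expose. A hedged GoogleTest sketch, where DfsClient, its header and its methods are hypothetical stand-ins for the real DFS client interface:

// dfs_test.cpp - GoogleTest sketch; DfsClient and its methods are
// hypothetical stand-ins for the real DFS client API.
#include <gtest/gtest.h>
#include <string>
#include "dfs_client.h"   // assumed header exposing DfsClient

TEST(DfsBasic, WriteThenReadBack) {
    DfsClient client("127.0.0.1", 9000);   // node address is an assumption
    ASSERT_TRUE(client.Connect());

    const std::string payload = "hello dfs";
    ASSERT_TRUE(client.Write("/test/file1", payload));
    EXPECT_EQ(client.Read("/test/file1"), payload);
}

int main(int argc, char** argv) {
    ::testing::InitGoogleTest(&argc, argv);
    return RUN_ALL_TESTS();
}

Pointing the client at the addresses of your local VMs instead of 127.0.0.1 lets the same tests run against the multi-node setup.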

Alternative to CORBA's POA in the web services world

I have been told that CORBA programming is not modern and that I should use newer technologies. OK ...
But what I appreciated in the CORBA world was the POA (despite its complexity), because it was very flexible and let me choose adequate policies for my distributed objects.
Is there anything similar to the POA in the web services world, or should I code it myself?
Thanks for your replies!
Batches are a new approach to relational database access, remote procedure calls, and web services.
A Remote Batch Invocation (RBI) statement combines remote and local execution: all the remote code is executed in a single round-trip to the server, where all data sent to the server and all results from the batch are communicated in bulk. RBI supports remote blocks, iteration and conditionals, and local handling of remote exceptions. RBI is efficient even for fine-grained interfaces, eliminating the need for hand-optimized server interfaces.
Batch services also provide a simple and powerful interface to relational databases, with support for arbitrary nested queries and bulk updates. One important property of the system is that a single batch statement always generates a constant number of SQL queries, no matter how many nested loops are used.
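This is not RBI's actual API, but the core idea of queuing remote operations and flushing them in a single round-trip can be sketched with made-up names like this:

// batch_sketch.cpp - illustrative only; NOT the RBI API, just a hand-rolled
// sketch of the "one round-trip per batch" idea with made-up names.
#include <string>
#include <vector>
#include <iostream>

class RemoteBatch {
public:
    // Queue a remote call instead of sending it immediately.
    void Queue(const std::string& op) { ops_.push_back(op); }

    // Send everything in a single request and get all results back in bulk.
    std::vector<std::string> Execute() {
        std::cout << "sending " << ops_.size() << " ops in one round-trip\n";
        std::vector<std::string> results(ops_.size(), "ok");  // stand-in for the wire call
        ops_.clear();
        return results;
    }

private:
    std::vector<std::string> ops_;
};

int main() {
    RemoteBatch batch;
    batch.Queue("createOrder(42)");
    batch.Queue("addItem(42, \"book\")");
    batch.Queue("getTotal(42)");
    auto results = batch.Execute();  // one network round-trip for three calls
    std::cout << results.size() << " results received\n";
}

The point is that three logical remote calls still cost one network exchange, which is what makes fine-grained interfaces affordable.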

Using Amazon MapReduce

How does this work exactly... if I have a data mining system built in PHP, how would it work differently on MapReduce than it would on a simple server? Is it the mere fact that there's more than one server doing the processing?
If your code is made to partition work between multiple processes already, then MapReduce only adds the ability to split work among additional servers.
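Roughly speaking: if the code already splits its input into independent chunks ("map") and then combines the partial results ("reduce"), it fits the model directly; MapReduce just moves those chunks onto other servers and handles the grouping for you. A minimal local sketch of that same split/combine pattern, in C++ for illustration (mine_chunk is a placeholder for the per-chunk mining work):

// split_merge.cpp - the same partition-then-combine pattern MapReduce
// distributes across servers, shown here with local threads only.
#include <future>
#include <numeric>
#include <vector>
#include <iostream>

// Placeholder for the per-chunk data-mining work (the "map" step).
static long mine_chunk(const std::vector<int>& chunk) {
    return std::accumulate(chunk.begin(), chunk.end(), 0L);
}

int main() {
    std::vector<std::vector<int>> chunks = {{1, 2, 3}, {4, 5}, {6, 7, 8, 9}};

    // "Map": process chunks independently, here on local threads.
    std::vector<std::future<long>> partials;
    for (const auto& c : chunks)
        partials.push_back(std::async(std::launch::async, mine_chunk, c));

    // "Reduce": combine the partial results.
    long total = 0;
    for (auto& p : partials) total += p.get();
    std::cout << "total = " << total << '\n';   // prints 45
}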