Cluster simulator for Hadoop MapReduce

I am looking for a cluster simulator to test my Hadoop MapReduce Java applications (Driver, Mappers, Reducers, ...).
Is there something that can simulate HDFS and the execution of the tasks on each virtual node?
I am not particularly interested in performance; I have an Intel Core i7 and 16 GB of RAM, which should be enough to simulate a small cluster.

There are multiple ways to simulate your cluster:
Spawn a Hortonworks or Cloudera Docker image, which you can connect to from your code.
Run MapReduce locally by setting:
conf.set("fs.default.name", "file:///");
conf.set("mapred.job.tracker", "local");
See the integration test in http://bytepadding.com/big-data/map-reduce/word-count-map-reduce/
Use hbase-testing-util, which spawns HDFS and HBase services as threads and can be run locally:
http://bytepadding.com/big-data/hbase/hbase-readwrite-from-map-reduce/
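Assembled into a driver, the local-mode settings above look roughly like this (a minimal sketch: the input/output paths and the commented-out mapper/reducer classes are placeholders for your own; on newer Hadoop versions the equivalent keys are fs.defaultFS and mapreduce.framework.name):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LocalModeDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Use the local filesystem instead of HDFS
        conf.set("fs.default.name", "file:///");
        // Run map and reduce tasks in-process instead of on a cluster
        conf.set("mapred.job.tracker", "local");

        Job job = Job.getInstance(conf, "local-test");
        job.setJarByClass(LocalModeDriver.class);
        // Placeholders: plug in your own Mapper/Reducer classes here
        // job.setMapperClass(MyMapper.class);
        // job.setReducerClass(MyReducer.class);
        FileInputFormat.addInputPath(job, new Path("input"));
        FileOutputFormat.setOutputPath(job, new Path("output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With this configuration the whole job runs inside a single JVM against local files, which is enough to exercise Driver, Mapper, and Reducer logic without any cluster.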

Related

run gazebo on EC2

I'm trying to develop a robot application, but I don't have a GPU locally. So I created an EC2 instance and tried to run Gazebo on it. However, it always fails with the following error. Maybe it's because I'm using VNC to connect to the server.
Dec 11 08:26:29 ip-172-31-33-33 kernel: [ 1633.096463] gzclient[3264]: segfault at 20 ip 00007fb23955d867 sp 00007ffc7b0e0820 error 4 in libOgreMain.so.1.9.0[7fb2391f7000+571000]
Did anyone do this before (running Gazebo and ROS on AWS and connecting to it through remote desktop)? Is it possible to develop a ROS application in the cloud if I don't have a local GPU?
For robotics programming, including ROS 1, ROS 2, Gazebo, etc., AWS provides a managed service called AWS RoboMaker:
AWS RoboMaker is a service that makes it easy to create robotics applications at scale. AWS RoboMaker extends the Robot Operating System (ROS) framework with cloud services.
It supports a number of robotics applications, such as Gazebo:
Gazebo (versions 9 and 11): Tool to simulate robots in an environment.

Running multiple schedulers in Airflow 2.0 on same machine

I understand that Airflow 2.0 now supports running multiple schedulers concurrently for HA. Can I run multiple schedulers on the same machine (VM)? Do I do it just by running airflow scheduler twice if I want 2 schedulers, without configuring anything else?
Can I run multiple schedulers on the same machine (VM)?
Yes, you can. The second scheduler can be started like this:
airflow scheduler --skip-serve-logs
Can I run multiple schedulers on different machines (VMs)?
Yes, you can. I created a second VM to run multiple schedulers.
Check these dependencies:
Set the use_row_level_locking config to True (the default is True).
Check your backend database's version.
Make sure all schedulers' database configs point to the same database.
After checking these dependencies, you can run multiple schedulers on different machines.
After I started two schedulers on different VMs, I ran a task to check whether it would be executed twice. Fortunately, only one scheduler picked up the task.
If you run only one webserver, you will lose some task logs when tasks are executed on the other scheduler's machine. In that situation you need a log collector such as Elasticsearch.
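For example, on a single machine the two schedulers could be started like this (a sketch, assuming Airflow 2.x with a row-locking-capable metadata database such as PostgreSQL; 8793 is the default log-serving port):

```shell
# First scheduler; it also runs the log-serving sub-process on port 8793
airflow scheduler --daemon

# Second scheduler on the same machine; skip serving logs so the
# two processes don't both try to bind port 8793
airflow scheduler --skip-serve-logs --daemon
```

On separate VMs the --skip-serve-logs flag is not needed, since each machine has its own port 8793.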

Speed up Chromedriver/Selenium in AWS EC2 instance

Hi, I've developed a bot that automates the shopping process on a specific website. When testing it on my Mac it works perfectly and can place an order quite fast. I have tried to run the script on an AWS EC2 instance using the free t2.micro tier with an Ubuntu instance.
The script runs fine and all the packages work, but I've noticed the time it takes to open Chrome in headless mode and finish the process is 5-6 times longer than when I run it on my local MacBook. I've tried all the suggested things in the ChromeDriver options to do with the proxy server, but my EC2 instance still isn't fast enough.
Is it the small t2.micro free tier that's slowing me down, or should I be using a different instance (or something other than Ubuntu) if I want to speed up my Selenium script?
You're using an incredibly small machine, which is going to be much slower than the powerful machine you're running locally.
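That said, a few commonly used ChromeOptions arguments can shave some startup overhead on a small instance. A sketch with Selenium's Java bindings (the URL is a placeholder, and how much these flags help varies by environment):

```java
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

public class HeadlessBot {
    public static void main(String[] args) {
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless");
        // /dev/shm is small on many cloud instances; fall back to /tmp
        options.addArguments("--disable-dev-shm-usage");
        // Sandboxing and GPU setup add startup cost that headless runs don't need
        options.addArguments("--no-sandbox", "--disable-gpu");

        ChromeDriver driver = new ChromeDriver(options);
        try {
            driver.get("https://example.com"); // placeholder URL
            System.out.println(driver.getTitle());
        } finally {
            driver.quit();
        }
    }
}
```

Even so, a t2.micro has a single burstable vCPU and 1 GB of RAM, so upgrading the instance size is likely to matter more than any flag.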

Changes to ignite cluster membership unexplainable

I am running a 12-node JVM Ignite cluster. Each JVM runs on its own VMware node. I am using ZooKeeper to keep these Ignite nodes in sync using TCP discovery. I have been seeing a lot of node failures in the ZooKeeper logs.
Although the Java processes are running, I don't know why some Ignite nodes leave the cluster with "node failed" kinds of errors. VMware uses vMotion to do something they call "migration"; I am assuming that is some kind of filesystem sync process between VMware nodes.
I am also seeing fairly frequent "dumping pending object" and "Failed to wait for partition map exchange" messages in the JVM logs for Ignite.
My env setup is as follows:
Apache Ignite 1.9.0
RHEL 7.2 (Maipo) runs on each of the 12 nodes
Oracle JDK 1.8
Zookeeper 3.4.9
Please let me know your thoughts.
TIA
There are generally two possible reasons:
Memory issues. For example, if a node goes into a long GC pause, it can become unresponsive and therefore be removed from the topology. For more details read here: https://apacheignite.readme.io/docs/jvm-and-system-tuning
Network connectivity issues. Check if the network between your VMs is stable. You may also want to try increasing the failure detection timeout: https://apacheignite.readme.io/docs/cluster-config#failure-detection-timeout
VM Migrations sometimes involve suspending the VM. If the VM is suspended, it won't have a clean way to communicate with the rest of the cluster and will appear down.
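Raising the failure detection timeout could be sketched like this (the 30-second value is illustrative, not a recommendation for your cluster):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class TolerantNode {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        // Default is 10 seconds; raise it so short GC pauses or
        // vMotion-induced stalls don't get the node kicked from topology
        cfg.setFailureDetectionTimeout(30_000); // milliseconds

        Ignite ignite = Ignition.start(cfg);
    }
}
```

A longer timeout trades faster failure detection for tolerance of transient stalls, so genuinely dead nodes will take longer to be noticed.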

Deployment of VM workers running a C++ executable on Azure

I have been presented with a program developed in C++ that implements a compute-intensive algorithm and uses MPI to achieve better results. The executable has been tested on a single VM with 16 cores on Azure and the results have been satisfying. Now we need to test its performance and scalability on 64 or 128 cores. As far as I know, a single VM can't employ more than 16 cores, so I think I need to deploy a set of VMs that will execute the computation, but I don't know where to start with the deployment and the communication among the VMs. Any guidance or ideas would be appreciated.
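As a starting point, a classic way to run an MPI executable across several VMs (independent of any Azure-specific tooling) is a hostfile plus mpirun over passwordless SSH; all IPs, counts, and names below are placeholders:

```shell
# hosts.txt lists the private IPs of the VMs and the cores each offers:
#   10.0.0.4 slots=16
#   10.0.0.5 slots=16
#   10.0.0.6 slots=16
#   10.0.0.7 slots=16

# The binary must exist at the same path on every VM (e.g. via a shared
# mount), and each VM must be reachable over SSH without a password prompt.
mpirun --hostfile hosts.txt -np 64 ./my_program
```

For this to scale, all VMs should sit in the same virtual network so MPI traffic stays on private addresses.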