run a c++ application under spark cluster

run a c++ application under spark cluster - c++

I am working on my school project. there is a video duplication detection application wrote with c++. The application is designed to run on a single machine, and I would like to create a spark cluster and run that application under the cluster.
Is this possible? difficult?

Let me try to answer your question:
First, you should figure out that which format the c++ application has?
Is it source code or binary executable bin?
source code
You can implement the algorithm in java/scala, and make the most use of the cluster resouce, make you job much more quick that the single machine version.
if your time is limited, you can use gcc to compile your c++ source code, and follow the next method.
binary executable bin
Because java/scala bytecode(.class format) run on the jvm, not compatible with the native code on machine, which is determined by the combination of compiler and operating system.
In the case, the only choice is get a new process to execute the c++ executable bin, and get what you want through inter process communication, such as pipe.
In a word, you should get a new process on driver node to de-duplicate your data, and use spark engine to do the following parallel computing to accelerate you project.
and get new process is so easy in scala or java, please refer the doc:
Process

Related

Using DISM Api to Capture Image Programatically within Windows PE Environment

I've been going through the windows documentation for the Dism API with the goal of writing an exe in C++ (or whatever language can accomplish this) that can create a WIM image while running in Windows PE. I found a .NET Wrapper for the Dism API that seems like it might be useful for this purpose, but I'm unsure if a .NET app will successfully run in Windows PE. Overall, my problem is that I don't see a function that can create--and doesn't simply modify--a wim file.
If I didn't care about encapsulating this in an .exe file, the Dism documentation does show how to initially create a wim--which makes me curious why a similar function wouldn't exist within the api. Please advise if the simplest solution is to have my code call a function such as system() within the code.
To summarize, I'm looking for a way to create a wim file programmatically (called from executing an exe file) from within Windows PE.
As always, thank you for the help and advice.

I work on a project that works with DISM in WinPE quite a bit. We configure WinPE with all the .net packages as described here. Then WinPE can be configured to start an application.
I use c#, but you can do managed apps in c++ as I'm sure you know. I find putting c# code into WinPE substantially easier, but that's more a function of my experience, I suppose.
The main way we use to interact with DISM is run a command using System.Diagnostics.Process. The process runs in a separate thread, but the API is simple, and you can wait on (and/or timeout) your process for synchronization purposes. This just uses the DISM command line interface, although you can also use powershell cmdlets if you've added that package to your WinPE image. It may seem like a hacky "interface" from your app to DISM, but it works reliably, and you can keep the process window from showing up on the screen. This makes for a decent asynchronous platform for running bunches of windows imaging utilities, such as DISKPART, DISM and BCDEDIT.
The principal way you'd capture a new image is with DISM /Capture-Image. Sounds like you've already discovered this fact. Lots of options that are somewhat beyond the scope of this q/a, but I hope this gets you on a useful path.

Even though this post is a bit older, here is a possibly still relevant resource for you. Perhaps this one will help.
I've written a small GUI-based tool, project-named WIM-Backup, that uses the Windows Imaging Format (WIM) to create full backups of computer systems (operating system images) within WinPE and then restore them.
The application is hosted on GitHub, is open source, and is offered under the Apache 2.0 license.
In addition, the repository includes an illustrated step-by-step guide to help get it up and running.
Brief summary:
WIM-Backup always requires an external bootable media such as a USB flash drive.
From this drive WinPE is booted to perform a backup or restore to or from an external medium (e.g. a USB hard drive).
On the bootable USB flash drive the WinPE must be set up before (is documented illustrated in the readme).
After completion of the respective operation, a status message is displayed whether the operation was successful or failed.
After restoring a backup, you can boot normally from the destination drive.
Both the backup and restore process are relatively simple (not "rocket science").
To set up the solution, you need about 30 minutes time in the best case due to the necessary downloads (e .g. ADK)
Last but not least: it has a permissive license (non-proprietary) and is open source.
The project can be found here: WIM-Backup

TraCIScenarioManagerForker vs veins-launchd

I currently use TraCIScenarioManagerForker to spawn SUMO for each simulation, the "forker" method. However, the official VEINS documentation recommends launching the SUMO daemon separately using the veins-launchd script and then run simulations, the "launchd" method.
Using the forker method makes running simulations just a one command job since SUMO is killed when simulation ends. However, with the launchd method, one has to take care of setting up the SUMO daemon and killing it when simulation ends.
What are the advantages and disadvantages of each method? I'm trying to understand the recommended best practices when using VEINS.

Indeed, Veins 5.1 provides three (four, if you count an experimental one) ways of connecting a running OMNeT++ to SUMO:
assuming SUMO is already running and connecting there directly (TraCIScenarioManager)
running SUMO directly from the process - on Linux: as a fork, on Windows: as a process in the same context (TraCIScenarioManagerForker)
connecting to a Proxy (veins_launchd) that launches an isolated instance of SUMO for every client that connects to it (TraCIScenarioManagerLaunchd)
if you are feeling adventurous, the veins_libsumo fork of Veins offers a fourth option: including the SUMO engine directly in your OMNeT++ simulation and using it via method calls (instead of remote procedure calls via a network socket). Contrast, for example, TraCI based code vs. libsumo based code. This can be orders of magnitude faster with none of the drawbacks discussed below. At the time of writing (Mar 2021) this fork is just a proof of concept, though.
Each of these has unique benefits and drawbacks:
is the most flexible: you can connect to a long-running instance of SUMO which is only rolled backwards/forwards in time through the use of snapshots, connect multiple clients to a single instance, etc but
requires you to manually take care of running exactly as many instances of SUMO as you need at exactly the time when you need them
is very convenient, but
requires the simulation (as opposed to the person running the simulation) to "know" how to launch SUMO - so a simulation that works on one machine won't work on another because SUMO might be installed in a different path there etc.
launches SUMO in the directory of the simulation, so file output from multiple SUMO instances overwrites each other and file output is stored in the directory storing the simulation (which might be a slow or write protected disk, etc.)
results in both SUMO and OMNeT++ writing console output into what is potentially the same console window, requiring experience in telling one from the other when debugging crashes (and things get even more messy if one wants to debug SUMO)
does not suffer from any of these problems, but
requires the user to remember starting the proxy before starting the simulations

Isis2 in ns-3 and bridge tap

So I need to simulate Isis2 in ns-3. (I am also to modify Isis2 slightly, wrapping it with some C/C++ code since I need at least a quasi real-time mission-critical behavior)
Since I am far from having any of that implemented it would interesting to know if this is a suitable way of conduct. I need to specifically monitor the performance of the consensus during sporadic wifi (ad hoc) behavior.
Would it make sense to virtualize a machine for each instance of Isis2 and then use the tap bridge( model and analyze the traffic in the ns-3 channel?
(I also am to log the events on each instance; composing the various data into a unified presentation)

You need to start by building an Isis2 application program, and this would have to be done using C/CLI or C++/CLI. C++/CLI will be easier because the match with the Isis2 type system is closer. But as I type these words, I'm trying to remember whether Mono actually supports C++/CLI. If there isn't a Mono compiler for C++/CLI, you might be forced to use C# or IronPython. Basically, you have to work with what the compiler will support.
You'll build this and the library on your mono platform and should test it out, which you can do on any Linux system. Once you have it working, that's the thing you'll experiment with on NS/3. Notice that if you work on Windows, you would be able to use C++/CLI (for sure) and then can just make a Windows VM for NS3. So this would mean working on Windows, but not needing to learn C#.
This is because Isis2 is a library for group communication, multicast, file replication and sharing, DHTs and so forth and to access any particular functionality you need an application program to "drive" it. I wouldn't expect performance issues if you follow the recommendations in the video tutorials and the user manual; even for real-time uses the system is probably both fast enough and steady enough in its behavior.
Then yes, I would take a virtual machine with the needed binaries for Mono (Mono is loaded from DLLs so they need to be available at the right virtual file system locations) and your Isis2 test program and run that within NS3. I haven't tried this but don't see any reason it wouldn't work.
Keep in mind that the default timer settings for timeout and retransmission are very slow and tuned for running on Amazon AWS, inside a data center. So once you have this working, but before simulating your wifi setup, you may want to experiment with tuning the system to be more responsive in that setting. I'm thinking that ISIS_DEFAULTTIMEOUT will probably be way too long for you, and the RTDELAY setting may also be too long for you. Amazon AWS is a peculiar environment and what makes Isis2 stable in AWS might not be ideal in a Wifi setting with very different goals... but all of those parameters can be tuned by just setting the desired values in the Environment, which can be done in bash on the line that launches your test program, or using the bash "Export" command.

Writing a program which can open and use another program: (Audio program)

I've got a project going on, far away from complete, a stand alone audio mixer/effects processor. I plan to eventually, have all of my effects in stand alone program as VST, AU, and maybe TDM plugins.
I would like to be able to batch convert all the files in a project using an external sample rate converter. If not your choice of external converter, then just a specific program, R8 "brain free", or "R8 brain" pro, by Voxengo.
The second thing I would like to be able to do, is launch "Reaper", from within a project in my program, and have the files in a project, opened in reaper, and all of my effects plugins added with specific settings.
Is this even possible to do?

It depends on what level of automation interface the other programs offer. This could range from taking command line parameters to perform certain actions, to offering a sophisticated automation interface through a mechanism such as COM or OLE automation. It's a matter of checking what is offered by the software you plan to run from your own program.
The reaper documentation suggests it has quite a good API for automation purposes.

Is there a batch processing framework that could be used for C++ applications?

In my team we develop several command-line C++ applications that work together; they're currently run by hand in separate windows, and after a while managing windows gets really confusing.
I'm looking for a better way to manage the processing of these applications; ideally it would have a GUI with the ability to do the following:
start applications in a specified order
display application status
close particular applications
Is there anything available that does this type of thing or would we need to develop our own? Is there a better way than this?

I run a basic batch scheduler on Windows XP by scheduling a Windows command file to execute at a certain time, and putting the various batch processes in the Windows command file.
I did a Google search on "c++ batch scheduling" (without the quotes) and came up with a few candidates.
Enterprise Batch Scheduling
Vizant Software
Skybot Software

Batch processing, there's either bash if you're on Linux, or Windows Scripting Host (or Powershell) if you're on Windows.
Neither has a GUI, but they can both be scripted to call your apps as you like.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js