Has anyone used dataflow programming in a real project with a mainstream language? - clojure

I am looking at using some dataflow programming techniques in a Clojure program, but I am having difficulty finding much information from projects in Java, C#, or other mainstream languages that have used such techniques in the real world. I would be grateful to hear if anyone has any experiences they could share regarding this.

Here we are! We've made one (quoting one of my older posts):
We've designed and implemented a DF server for our automation project (dispatcher, component interface, a bunch of components, DF language, DF compiler, UI). It is written in bare C++ and runs on several Unix-like systems (Linux x86, MIPS, avr32 etc., Mac OS X). It lacks several features, e.g. sophisticated flow control and complex thread control (there is only a not-too-advanced component for it), so it is just a prototype, even though it works. We're now working on a full-featured server. We learnt a lot while implementing and using the prototype.
Also, we'll make a visual editor some day.
There are dataflow systems which don't even mention the dataflow approach:
SynthEdit: http://www.synthedit.com/ - an audio-related framework and component set for creating VST plugins
TinyOS: http://www.tinyos.net/ - an embedded operating system/framework
Digital synthesizers/samplers are dataflow systems, programmed (supposedly) in C, with some parts in assembly; check my answer to another post for some examples.
Quartz Composer, a graphics tool for Mac
Blender has a dataflow subsystem for image compositing.
Writing a dataflow system is not rocket science. Here's my older post about the basics of a dataflow framework.
The term dataflow is broad. There are real-time synchronous dataflow systems, like synthesizers and samplers; there are asynchronous ones, like our home automation system (the system is idle unless the user presses a button or a timer runs out); and there are even different architectures, like spreadsheets or make.
Want to read more about dataflow programming? Read J. Paul Morrison's site and book.

Pervasive DataRush is a framework for parallel dataflow programming for any JVM language, including Clojure.
Pervasive DataRush uses a dataflow architecture. The architecture implements a program that executes as a graph of computation nodes interconnected by dataflow queues. The nodes use the queues to share data. As the data is streaming, only data required by any active operation needs to be in memory at any given time, allowing very large data sets to be analyzed. Besides offering the potential for scaling to problems larger than available memory, dataflow graphs exploit multiple forms of parallelism.
Customers are using DataRush for big data analytics and data preparation (ETL).
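The graph-of-nodes-connected-by-queues idea described above can be sketched in a few lines. This is not DataRush's API; it is a minimal illustration using only the Python standard library, and the names `source`, `transform`, `sink`, and `run_pipeline` are made up for the example. Each node runs in its own thread and talks to its neighbours only through bounded queues, so only a handful of items are in memory at any time.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of a stream


def source(values, out_q):
    """Node that emits each value downstream, then signals end of stream."""
    for v in values:
        out_q.put(v)
    out_q.put(_DONE)


def transform(fn, in_q, out_q):
    """Node that applies fn to each item as it arrives."""
    while True:
        item = in_q.get()
        if item is _DONE:
            out_q.put(_DONE)
            return
        out_q.put(fn(item))


def sink(in_q, results):
    """Node that collects the final stream into a list."""
    while True:
        item = in_q.get()
        if item is _DONE:
            return
        results.append(item)


def run_pipeline(values):
    # Bounded queues: a fast producer blocks instead of buffering everything.
    q1, q2, q3 = (queue.Queue(maxsize=4) for _ in range(3))
    results = []
    nodes = [
        threading.Thread(target=source, args=(values, q1)),
        threading.Thread(target=transform, args=(lambda x: x * x, q1, q2)),
        threading.Thread(target=transform, args=(lambda x: x + 1, q2, q3)),
        threading.Thread(target=sink, args=(q3, results)),
    ]
    for t in nodes:
        t.start()
    for t in nodes:
        t.join()
    return results
```

Calling `run_pipeline(range(5))` returns `[1, 2, 5, 10, 17]`. The two transform nodes work on different items at the same time, which is the pipeline form of parallelism; a real framework would add fan-out, fan-in, and error handling.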

We've made another one: a collaborative spreadsheet with MySQL/PHP backend and AJAX frontend. The software is in beta state, documentation is under construction.

Related

What level of control is required for google cloud ml

When using google cloud ML to train models:
The official example https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/census/tensorflowcore/trainer/task.py uses hooks, is_client, MonitoredTrainingSession and some other complexity.
Is this required for Cloud ML, or is this example enough: https://github.com/amygdala/tensorflow-workshop/tree/master/workshop_sections/wide_n_deep?
The documentation is a bit limited in terms of best practices and optimisation. Will GCP ML handle the client/worker mode, or do we need to set devices ourselves, e.g. with replica_device_setter and so on?
CloudML Engine is largely agnostic to how you write your TensorFlow programs. You provide a Python program, and the service executes it for you, providing it with some environment variables you can use to perform distributed training (if necessary), e.g., task index, etc.
census/tensorflowcore demonstrates how to do things with the "core" TensorFlow library -- how to do everything "from scratch", including using replica_device_setters, MonitoredTrainingSessions, etc. This may sometimes be necessary for ultimate flexibility, but it can be tedious.
Alongside the census/tensorflowcore example, you'll also see a sample called census/estimator. This example is based on a higher level library, which unfortunately is in contrib and therefore does not yet have a fully stable API (expect lots of deprecation warnings, etc.). Expect it to stabilize in a future version of TensorFlow.
That particular library (known as Estimators) is a higher level API that takes care of a lot of the dirty work for you. It will parse TF_CONFIG for you and set up the replica_device_setter, as well as handle the MonitoredTrainingSession and the necessary Hooks, while remaining fairly customizable.
This is the same library that the wide and deep example you pointed to is based on, and it is fully supported on the service.
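To make "parse TF_CONFIG" concrete, here is a rough sketch of what reading that environment variable by hand could look like. It assumes TF_CONFIG is a JSON string with a `cluster` map and a `task` entry holding `type` and `index`; the function name `parse_tf_config` and the fallback to a single "master" task are choices made for this example, not something taken from the samples.

```python
import json
import os


def parse_tf_config(environ=None):
    """Return (cluster, task_type, task_index) from the TF_CONFIG variable."""
    environ = os.environ if environ is None else environ
    config = json.loads(environ.get("TF_CONFIG", "{}"))
    cluster = config.get("cluster", {})
    task = config.get("task", {})
    # Assumption: with no TF_CONFIG, treat the job as a single "master" task.
    return cluster, task.get("type", "master"), int(task.get("index", 0))
```

A from-scratch trainer would pass the returned cluster and task values to its own device placement and session setup; with Estimators this step happens inside the library.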

What is the Open Cloud Computing Interface?

I am going to develop a cloud application, and in my research into state-of-the-art tools for cloud computing I saw some references to OCCI (Open Cloud Computing Interface).
I was not able to find answers to the following questions:
1) Is it easy to use this interface?
2) What programming languages does this interface support?
3) Is this interface mature enough?
Any information is much appreciated!
This question has been asked quite some time ago but, hopefully, the answer is still relevant.
Is it easy to use?
Depends on what you want. If you want to write your own implementation, then probably not. If you use one of the existing implementations (see below), then yes.
What programming languages does this interface Support?
We know about two implementations (libraries, CLI), which are for Ruby and Java. See:
https://wiki.egi.eu/wiki/rOCCI:ROCCI
https://github.com/EGI-FCTF/jOCCI-api
rOCCI (the first one) also has a server side (the rOCCI-server) that translates OCCI to proprietary cloud management platforms such as OpenNebula.
Is this Interface mature enough?
Yes, given that it is being used by real-world infrastructures. Among them, e.g., the EGI Federated Cloud. That said, the current OCCI specification (1.1) has a few shortcomings that will be addressed in version 1.2 (due in Autumn 2015), so that if someone is just starting a project, it is worth implementing with 1.2 already.
Many of your questions can be answered (positively, by the way!) by visiting the OCCI-WG home site at http://occi-wg.org and/or searching on "occi implementation".
Another recent and useful resource is the tutorials and workshop talks given at the recent Cloud Interoperability Week held simultaneously with events in Madrid and Santa Clara, part of the Cloud Plugfest hands-on developer training series:
Or generally at http://www.cloudplugfest.org/
The basic specs are published by the Open Grid Forum.
The Open Cloud Computing Interface (OCCI) is a set of specifications delivered through the Open Grid Forum, for cloud computing service providers. OCCI has a set of implementations that act as proofs of concept. It builds upon World Wide Web fundamentals by using the Representational State Transfer (REST) approach for interacting with services.

Continuous build infrastructure recommendations for primarily C++; GreenHills Integrity

I need your recommendations for continuous build products for a large (1-2MLOC) software development project. Characteristics:
ClearCase revision control
Approx 80% C++; 15% Java; 5% script or low-level
Compiles for Green Hills Integrity OS, but also some windows and JVM chunks
Mostly an embedded system; also includes some UI pieces and some development support (simulation tools, config tools, etc...)
Each notional "version" of the deliverable includes deployment images for a number of boards, UI machines, etc... (~10 separate images; 5 distinct operating systems)
Need to maintain/track many simultaneous versions which, notably, are built for a variety of different board support packages
Build cycle time is a major issue on the project, need support for whatever features help address this (mostly need to manage a large farm of build machines, I guess..)
Operates in a secure environment (this is a gov't program) (Edited to add: This is a classified program; outsourcing the build infrastructure is a non-starter.)
Interested in any best practices or peripheral guidance you might offer. The build automation issue is one of several overlapping best practices that appear to be missing on the program, but please try to keep your answers focused on the build infrastructure piece and directly related observations.
Cost is not the driving concern. Scalability and ease of retrofitting onto an existing infrastructure are key.
(Edited to address #Dan's comment. ;-)
From my experience with similar systems, there are approximately two parts to this problem:
A repeatable method for checking out sources, building the software, and testing it (if you want to do continual testing as well as building), using a small number of command-line invocations.
A means of calling these command lines on various servers in the build farm.
For the latter, we've been using BuildBot, which seems to work pretty well.
For the former, we have a homegrown solution that started out as a simple bash shell script and grew ... rather substantially. From experience, I'd suggest starting out in Python rather than bash -- you'll write far more code for handling setup and configuration than for actually invoking programs. (Also, it's probably easier to run it on Windows if you're doing that.)
The things I've found to be really key in our script's usefulness are:
Ironclad repeatability. We have a standard set of build tools, and the scripts start out by scrubbing environment variables. There are very few command-line options; everything goes into configuration files, and those go in version control.
Logging. We produce a log of every command that the build script executes.
Configuration file inheritance. Each variant of our software gets a configuration file, and those files can include more-general settings (which include even-more-general settings).
Extensibility. When we add a new source component, it's pretty easy to add a set of instructions for building that component (and the instructions can be arbitrary bash code). The "can be arbitrary code" part is probably key here; no way is a pre-existing product going to be able to do all of the quirky things that you need for a large complex real-world system.
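As a rough illustration of three of these points (scrubbed environment, command logging, and configuration file inheritance), here is a small Python sketch. It is not our actual script, which is bash; the JSON format, the `include`, `path`, and `env` keys, and the function names are all assumptions made for the example.

```python
import json
import logging
import os
import subprocess

log = logging.getLogger("build")


def load_config(path):
    """Load a JSON config, merging in any parent named by its "include" key."""
    with open(path) as f:
        config = json.load(f)
    parent = config.pop("include", None)
    if parent is None:
        return config
    parent_path = os.path.join(os.path.dirname(path), parent)
    merged = load_config(parent_path)  # more-general settings first
    merged.update(config)              # the variant's own top-level keys win
    return merged


def clean_env(config):
    """Build the environment from scratch instead of inheriting the caller's."""
    env = {"PATH": config.get("path", "/usr/bin:/bin")}
    env.update(config.get("env", {}))
    return env


def run(cmd, config):
    """Log a command, then run it in the scrubbed environment."""
    log.info("running: %s", " ".join(cmd))
    return subprocess.run(cmd, env=clean_env(config), check=True,
                          capture_output=True, text=True)
```

A variant file such as `{"include": "base.json", "jobs": 16}` would then inherit everything from `base.json` and override only `jobs`. Note that the merge is shallow: a nested map like `env` in the variant replaces the parent's map rather than being combined with it.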
You can get started with a reasonably simple script and let it grow organically as the need arises; honestly, although ours is a bit messy, I think we got a much more usable result that way than we would have with heavy top-down design.
Cost isn't an object? I've worked for GreenHills, and they've solved these issues for their in-house build/test farms. Ask them to do the same for you.
When I see emphasis on things like scalability and security in a build system, I start thinking that you might be a candidate for the enterprise class build systems / CI systems. Conveniently, it sounds like you can afford them as well. A year old SD Times article provides a basic breakdown between the enterprise and team level build tools.
My company makes AnthillPro and we've worked with a number of companies on large embedded projects as well as highly secure projects. IBM is probably the largest other player in the space with BuildForge.
AnthillPro puts some extra emphasis on what you do with the images in the minutes/hours/days post build (do you install them onto simulators / hardware and run automated tests? stage them? promote them?) but we also see folks using it for just build.

An example of an embedded project for a single person

I've been trying to wrap my head around embedded. Since I will be self-taught in this specific niche, I realize it will be harder to get a job in the field, so I'm hoping to add a completed project to my resume to prove to potential employers that I've done it and can do it again for them.
Can someone suggest a project that I can undertake as a single person and actually be able to finish, but at the same time not too simple that it doesn't prove anything? Something reasonable that I can aim for.
If you can substantiate your example with a project you worked on yourself, and mention how many people were involved, and how long it took to finish it, that would also help me gauge the difficulty of projects I see in general and rule out the ones that are probably too big for my capacity. It's very difficult to gauge the amount of work a project needs from my position.
You should take a look at the Arduino. To quote their site:
Arduino is an open-source electronics prototyping platform based on flexible, easy-to-use hardware and software. It's intended for artists, designers, hobbyists, and anyone interested in creating interactive objects or environments.
There is a really handy playground listing a bunch of personal projects on the Arduino, any one of which might fulfil your need to do some embedded development. You can also trawl around the internet (e.g. Instructables) to find many other interesting Arduino applications -- I particularly like the one building a fancy control system for an espresso machine, and, of course, there is the mandatory fart-detecting chair that tweets its findings.
Being an Arduino experimenter myself, I can attest to the simplicity and power of this device -- and the great fun you will have playing with it. If you want to get started quickly, I can recommend buying the starter kit from the very helpful people at oomlout.
Are you looking specifically at embedded software development, or are you interested in circuit board design as well?
If it's just software, then I would suggest getting hold of an ARM development board (possibly the Philips LPC range - SparkFun have some nice ones) that you can program via a bootloader over USB and start hacking. Get one with a display and an Ethernet port and you can build up to making some sort of network-attached sensor (temperature, water level, object counter, etc). Start out small (turn on an LED from a button) and work your way up.
If you're also into the electronics side of things, I'd suggest something like an MP3 (or WAV) player, and maybe stick to the AVR or PIC 8-bit microcontrollers (AVR is used on the Arduino) as these are a little easier to deal with than ARM. Here you could start with a USB-powered device that streams WAV files from a PC serial port out to a pair of headphones, and build up to a battery-powered board feeding data to an MP3 decoder IC from an SD card.
Some things you may want to learn & demonstrate:
Understands the bounds of working with limited resources, including memory management (dynamic and/or static); resource management (locks, semaphores, mutex); multiple tasks (interrupts); and appropriate data structures
Ability to interface with other devices/ICs over various interconnects (analog & digital IO, serial bus (RS232, I2C, SPI))
Ability to sanely structure a program and segment the various modules without producing 'spaghetti' code
Ability to use source and integrate 3rd party libraries where appropriate (think FAT filesystem, or TCP/IP stack)
Misc Tips:
read and understand the datasheets (yes all of them)
code and test on the desktop where possible, but understand that there are differences and bugs will still creep through (this is where it helps to be using a tool-chain that is common with the desktop - GCC is good, but the tools are generally CLI)
use assert a lot - you can flash the line number of a failed assert using a single LED - this is invaluable
Most of all, have fun - it still makes me smile when I first get a new component working (display, motor, sensor). Embedded makes the world go round :)
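The tip about flashing the line number of a failed assert on a single LED comes down to a small encoding step. On a real target this would normally be written in C inside the assert handler; the sketch below uses Python only to show one possible encoding. The scheme (one burst of pulses per decimal digit, ten pulses for a zero) and the names `blink_pattern` and `flash` are assumptions for illustration.

```python
def blink_pattern(line_number):
    """Encode a line number as pulse counts, one entry per decimal digit.

    A zero digit is sent as ten pulses so that it is still visible.
    """
    if line_number < 0:
        raise ValueError("line number must be non-negative")
    return [int(d) or 10 for d in str(line_number)]


def flash(line_number, set_led, sleep, short=0.2, gap=1.0):
    """Blink the pattern using caller-supplied LED and delay functions."""
    for pulses in blink_pattern(line_number):
        for _ in range(pulses):
            set_led(True)
            sleep(short)
            set_led(False)
            sleep(short)
        sleep(gap)  # longer pause separates digits
```

For example, `blink_pattern(120)` gives `[1, 2, 10]`: one pulse, a pause, two pulses, a pause, then ten pulses for the zero. Passing `set_led` and `sleep` in as arguments keeps the logic testable on the desktop, in line with the tip about testing there where possible.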

Cross-platform, open-source development framework that needs 3d graphics

I'm thinking of developing a game-like piece of software. It will probably require a bit of OpenGL, MIDI input, and math. I'd like to eventually sell the software, so it needs to be installable on PCs with different OSes. And I don't want to have to spend a lot of time on memory management and other low-level details.
My question is this: what language/framework would you use for such software?
You have got a lot of options, my friend. Here are just a few which allow you to develop in a high-level language.
Torque 3D http://www.garagegames.com/
I've used this a bit and can tell you it's a pretty good solution. You can build your game logic in their TorqueScript. Using it also gives you the option to release on pretty much every major platform, including consoles and the browser. The only snag is that it costs money, but it is very affordable for indies.
Panda3d http://www.panda3d.org/
This is a completely free, open-source engine. It provides a lot of functionality and also allows you to program your game logic in Python. The platforms it supports are Linux, Mac, and PC.
Mono http://mono-project.com/Main_Page
I have not played around with this too much, and am not sure how good its 3D support is (it isn't known for it anyway). It does allow you to program in a number of high-level languages (C# and Python, to name a few). It also allows you to deploy to a number of platforms, including embedded devices and the iPhone (MonoTouch).
I would check these out and see if any are a fit for your situation. If none are then there are a large number of other options out there.
I think the closest thing to what you're after is Java. It has decent support for OpenGL(JOGL) and a good standard library that works on most systems.
Despite what some people will tell you, Java isn't as fast as C, and this can rear its head doubly so in a game. It is cross-platform, though, and you don't have to bother with all that tiresome memory management.
I would use C# for scripting & the Unity 3D Engine. http://unity3d.com/
Unity has a reasonable licensing fee but it's also free to download and get started. Check the details for when a licensing fee is payable.
Anyways, Unity3D takes care of:
3D rendering
Memory management
Input
Audio & Video
Networking
Asset pipelines
Scripting through the Mono CLR (ie, you can use C#)
And has a great level/world editor
If you are willing to trade raw speed for ease of use then this is for you. We started a project intending to use Unity. Our project needed greater customization than we could get from the engine. We wanted source code and to run at 60fps so we upgraded. But I would still recommend it as a solid multi-platform multi-OS solution.