Computing code metrics for emf eol - eclipse-emf

I would like to measure various code metrics (e.g. McCabe, Halstead) for .eol-scripts (Epsilon Object Language, for querying models).
I've already found and tried metrics for my modeling project, but it did not compute any metrics for .eol-files in my modeling project.
Eclipse Version is Luna (4.4.2).
Can anyone point me to a tool or into a direction, where I could find a tool that measures code metrics for EOL?

Sadly modelling languages are not popular enough to make it into the standard langauges that get support from metric anlaysis tools and it is often the case that you have to develop you own (or an extension if the metrics tool supports extensions e.g. via plugins).
Depending on the complexity of the metrics a simple scripts can be used to measure LOC and number of mappings, for example.
But for more complex metrics you definitley need to first do a static analysis of the EOL script and then compute the metrics.
Regarding the first part, static analysis, the Epsilon framework has recently been enhanced with an EOL static analysis tool! (early versions lacked this support). The tool is available here: Haetae. With it you can get the static analysis information of your script.
Regarding the second part I am no metrics expert but I guess once the static information is available it should no the hard to calculate them. Sadly a quick overview of the Eclipse metrics project you reference does not provide any information on how to add support for additional languages.

Related

What level of control is required for google cloud ml

When using google cloud ML to train models:
The official examples https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/census/tensorflowcore/trainer/task.py uses hooks, is_client, MonitoredTrainingSession and some other complexity.
Is this required for cloud ml or is using this example enough: https://github.com/amygdala/tensorflow-workshop/tree/master/workshop_sections/wide_n_deep?
The documentation is a bit limited in terms of best practices and optimisation, will GCP ML handle the client/worker mode or do we need to set devices e.g. replica_device_setter and so on?
CloudML Engine is largely agnostic to how you write your TensorFlow programs. You provide a Python program, and the service executes it for you, providing it with some environment variables you can use to perform distributed training (if necessary), e.g., task index, etc.
census/tensorflowcore demonstrates how to do things with the "core" TensorFlow library -- how to do everything "from scratch", including using replica_device_setters, MonitoredTrainingSessions, etc.. This may be necessary sometimes for ultimate flexibility, but can be tedious.
Alongside the census/tensorflowcore example, you'll also see a sample called census/estimator. This example is based on a higher level library, which unfortunately is in contrib and therefore does not yet have a fully stable API (expect lots of deprecation warnings, etc.). Expect it to stabilize in a future version of TensorFlow.
That particularly library (known as Estimators) is a higher level API that takes care of a lot of the dirty work for you. It will parse TF_CONFIG for you and setup the replica_device_setter as well as handle the MonitoredTrainingSession and necessary Hooks, while remaining fairly customizable.
This is the same library that the wide and deep example you pointed to is based on and they are fully supported on the service.

Collecting performance and runtime metrics in cpp

I am aware that go language has an excellent support for collecting metric data. We can visualize the data in any manner we want using handy tools like go-metrics and influxDB.
I have a CPP application (similar to bitcoin) and looking for an easy way to collect some metrics. Can any CPP developer recommend anything you are aware of?

Amazon Machine Learning for sentiment analysis

How flexible or supportive is the Amazon Machine Learning platform for sentiment analysis and text analytics?
You can build a good machine learning model for sentiment analysis using Amazon ML.
Here is a link to a github project that is doing just that: https://github.com/awslabs/machine-learning-samples/tree/master/social-media
Since the Amazon ML supports supervised learning as well as text as input attribute, you need to get a sample of data that was tagged and build the model with it.
The tagging can be based on Mechanical Turk, like in the example above, or using interns ("the summer is coming") to do that tagging for you. The benefit of having your specific tagging is that you can put your logic into the model. For example, the difference between "The beer was cold" or "The steak was cold", where one is positive and one was negative, is something that a generic system will find hard to learn.
You can also try to play with some sample data, from the project above or from this Kaggle competition for sentiment analysis on movie reviews: https://www.kaggle.com/c/sentiment-analysis-on-movie-reviews. I used Amazon ML on that data set and got fairly good results rather easily and quickly.
Note that you can also use the Amazon ML to run real-time predictions based on the model that you are building, and you can use it to respond immediately to negative (or positive) input. See more here: http://docs.aws.amazon.com/machine-learning/latest/dg/interpreting_predictions.html
It is great for starting out. Highly recommend you explore this as an option. However, realize the limitations:
you'll want to build a pipeline because models are immutable--you have to build a new model to incorporate new training data (or new hyperparameters, for that matter)
you are drastically limited in the tweakability of your system
it only does supervised learning
the target variable can't be other text, only a number, boolean or categorical value
you can't export the model and import it into another system if you want--the model is a black box
Benefits:
you don't have to run any infrastructure
it integrates with AWS data sources well
the UX is nice
the algorithms are chosen for you, so you can quickly test and see if it is a fit for your problem space.

Getting the amplitude(or rms voltage) of audio signal captured in C++ by wavin lib.?

I am working on a very basic robotics project, and wish to implement voice recognition in it.
i know its a complex thing but i wish to do it for only 3 or 4 commands(or words).
i know that using wavin i can record audio. but i wish to do real-time amplitude analysis on the audio signal, how can that be done, the wave will be inputed as 8-bit, mono.
i have thought of divinding the signal into a set of some specific time, further diving it into smaller subsets, getting the average rms value over the subset and then summing them up and then see how much different they are from the actual stored signal.If the error is below accepted value for all(or most) of the sets, then print the word.
How can this be implemented?
if you can provide me any other suggestion also, it would be great.
Thanks, in advance.
There is no simple way to recognize words, because they are basically a sequence of phonemes which can vary in time and frequency.
Classical isolated word recognition systems use signal MFCC (cepstral coefficients) as input data, and try to recognize patterns using HMM (hidden markov models) or DTW (dynamic time warping) algorithms.
You will also need a silence detection module if you don't want a record button.
For instance Edimburgh University toolkit provides some of these tools (with good documentation).
If you don't want to build it "from scratch" or have a source of inspiration, here is an (old but free) implementation of such a system (which uses its own toolkit) with a full explanation and practical examples on how it works.
This system is a LVCSR (Large-Vocabulary Continuous Speech Recognition) and you only need a subset of it. If someone know an open source reduced vocabulary system (like a simple IVR) it would be welcome.
If you want to make a basic system from your own, I recommend you to use MFCC and DTW:
For each target word to modelize:
record some instances of the word
compute some (eg each 10ms) delta-MFCC through the word to have a model
When you want to recognize a signal:
compute some delta-MFCC of this signal
use DTW to compare these delta-MFCC to each modelized word's delta-MFCC
output the word that fits the best (use a threshold to drop garbage)
If you just want to recognize a few commands, there are many commercial and free products you can use. See Need text to speech and speech recognition tools for Linux or What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition? or Speech Recognition on iPhone. The answers to these questions link to many available products and tools. Speech recognition and understanding of a list of commands is a very common problem solved commercially. Many of the voice automated phone systems you call uses this type of technology. The same technology is available for developers.
From watching these questions for few months, I've seen most developer choices break down like this:
Windows folks - use the System.Speech features of .Net or Microsoft.Speech and install the free recognizers Microsoft provides. Windows 7 includes a full speech engine. Others are downloadable for free. There is a C++ API to the same engines known as SAPI. See at http://msdn.microsoft.com/en-us/magazine/cc163663.aspx. or http://msdn.microsoft.com/en-us/library/ms723627(v=vs.85).aspx
Linux folks - Sphinx seems to have a good following. See http://cmusphinx.sourceforge.net/ and http://cmusphinx.sourceforge.net/wiki/
Commercial products - Nuance, Loquendo, AT&T, others
Online service - Nuance, Yapme, others
Of course this may also be helpful - http://en.wikipedia.org/wiki/List_of_speech_recognition_software

Continuous build infrastructure recommendations for primarily C++; GreenHills Integrity

I need your recommendations for continuous build products for a large (1-2MLOC) software development project. Characteristics:
ClearCase revision control
Approx 80% C++; 15% Java; 5% script or low-level
Compiles for Green Hills Integrity OS, but also some windows and JVM chunks
Mostly an embedded system; also includes some UI pieces and some development support (simulation tools, config tools, etc...)
Each notional "version" of the deliverable includes deployment images for a number of boards, UI machines, etc... (~10 separate images; 5 distinct operating systems)
Need to maintain/track many simultaneous versions which, notably, are built for a variety of different board support packages
Build cycle time is a major issue on the project, need support for whatever features help address this (mostly need to manage a large farm of build machines, I guess..)
Operates in a secure environment (this is a gov't program) (Edited to add: This is a classified program; outsourcing the build infrastructure is a non-starter.)
Interested in any best practices or peripheral guidance you might offer. The build automation issues is one of several overlapping best practices that appear to be missing on the program, but try to keep your answers focused on build infrastructure piece and observations directly related.
Cost is not the driving concern. Scalability and ease of retrofitting onto an existing infrastructure are key.
(Edited to address #Dan's comment. ;-)
From my experience with similar systems, there are approximately two parts to this problem:
A repeatable method for checking out sources, building the software, and testing it (if you want to do continual testing as well as building), using a small number of command-line invocations.
A means of calling these command lines on various servers in the build farm.
For the latter, we've been using BuildBot, which seems to work pretty well.
For the former, we have a homegrown solution that started out as a simple bash shell script and grew ... rather substantially. From experience, I'd suggest starting out in python rather than bash -- you'll spend far more code in handling setup and configuration than in actually invoking programs. (Also, it's probably easier to run it on Windows if you're doing that.)
The things I've found to be really key in our script's usefulness are:
Ironclad repeatability. We have a standard set of build tools, and the scripts start out by scrubbing environment variables. There are very few command-line options; everything goes into configuration files, and those go in version control.
Logging. We produce a log of every command that the build script executes.
Configuration file inheritance. Each variant of our software gets a configuration file, and those files can include more-general settings (which include even-more-general settings).
Extensibility. When we add a new source component, it's pretty easy to add a set of instructions for building that component (and the instructions can be arbitrary bash code). The "can be arbitrary code" part is probably key here; no way is a pre-existing product going to be able to do all of the quirky things that you need for a large complex real-world system.
You can get started with a reasonably simple script and let it grow organically as the need arises; honestly, although ours is a bit messy, I think we got a much more usable result that way than we would have with heavy top-down design.
Cost isn't an object? I've worked for GreenHills, and they've solved these issues for their in-house build/test farms. Ask them to do the same for you.
When I see emphasis on things like scalability and security in a build system, I start thinking that you might be a candidate for the enterprise class build systems / CI systems. Conveniently, it sounds like you can afford them as well. A year old SD Times article provides a basic breakdown between the enterprise and team level build tools.
My company makes AnthillPro and we've worked with a number of companies on large embedded projects as well as highly secure projects. IBM is probably the largest other player in the space with BuildForge.
AnthillPro puts some extra emphasis on what you do with the images in the minutes/hours/days post build (do you install them onto simulators / hardware and run automated tests? stage them? promote them?) but we also see folks using it for just build.