Is it possible in Akka to create a Flow from a Source=>Source function, as with RxJava/Reactor compose? - akka

I still have not found a easy way to do a "filtering" Flow in AkkaStreams. Doing a "mapping" flow is clear to me, with fromFunction, but doing a filtering one is not. In RxJava/Reactor there is a compose operator on Flowable/Observable that takes a function from a Flowable to another Flowable, so then a transformation can be described as a chain of operators, and of course the filter operator on a Source is what I need for a filtering Flow, but it is unclear to me how to define a filtering Flow, though clearly it is easy for me how to filter a Source of course.
Please advise

// Filter elements which are even (use the modulo operator: `%`)
def filterEvenValues: Flow[Int, Int, NotUsed] =
Flow[Int].filter(number => number % 2 == 0)

Related

terraform "element" and "concat" used together

we have a module that builds a security proxy that hosts an elasticsearch site using terraform. In its code there is this;
elastic_search_endpoint = "${element(concat(module.es_cluster.elasticsearch_endpoint, list("")),0)}"
which as I understand, then goes and finds the es_cluster module and gets the elasticsearch endpoint that was outputted from that. This then allows the proxy to have this endpoint available so it can run elasticsearch.
But I don't actually understand what this piece of code is doing and why the 'element' and 'concat' functions are there. Why can't it just be like this?
elastic_search_endpoint = "${module.es_cluster.elasticsearch_endpoint}"
Let's break this up and see what each part does.
It's not shown in the example, but I'm going to assume that module.es_cluster.elasticsearch_endpoint is an output value that is a list of eitehr zero or one ElasticSearch endpoints, presumably because that module allows disabling the generation of an ElasticSearch endpoint.
If so, that means that module.es_cluster.elasticsearch_endpoint would either be [] (empty list) or ["es.example.com"].
Let's consider the case where it's a one-element list first: concat(module.es_cluster.elasticsearch_endpoint, list("")) in that case will produce the list ["es.example.com", ""]. Then element(..., 0) will take the first element, giving "es.example.com" as the final result.
In the empty-list case, concat(module.es_cluster.elasticsearch_endpoint, list("")) produces the list [""]. Then element(..., 0) will take the first element, giving "" as the final result.
Given all of this, it seems like the intent of this expression is to either return the one ElasticSearch endpoint, if available, or to return an empty string as a placeholder if not.
I expect this is written this specific way because it was targeting an earlier version of the Terraform language which had fewer features. A different way to write this expression in current Terraform (v0.14 is current as of my writing this) would be:
elastic_search_endpoint = (
length(module.es_cluster.elasticsearch_endpoint) > 0 ? module.es_cluster.elasticsearch_endpoint : ""
)
It's awkward that this includes the full output reference twice though. That might be justification for using the concat approach even in modern Terraform, although arguably the intent wouldn't be so clear to a future reader:
elastic_search_endpoint = (
concat(module.es_cluster.elasticsearch_endpoint, "")[0]
)
Modern Terraform also includes the possibility of null values, so if I were writing a module like yours today I'd probably prefer to return a null rather than an empty string, in order to be clearer that it's representing the absense of a value:
elastic_search_endpoint = (
length(module.es_cluster.elasticsearch_endpoint) > 0 ? module.es_cluster.elasticsearch_endpoint : null
)
elastic_search_endpoint = (
concat(module.es_cluster.elasticsearch_endpoint, null)[0]
)
First things first: who wrote that code? Why is not documented? Ask the guy!
Just from that code... There's not much to do. I'd say that since concat expects two lists, module.es_cluster.elasticsearch_endpoint is a list(string). Also, depending on some variables, it might be empty. Concatenating an empty string will ensure that there's something at 0 position
So the whole ${element(concat(module.es_cluster.elasticsearch_endpoint, list("")),0)} could be translated to length(module.es_cluster.elasticsearch_endpoint) > 0 ? module.es_cluster.elasticsearch_endpoint[0] : "" (which IMHO is much readable)
Why can't it just be like this?
elastic_search_endpoint = "${module.es_cluster.elasticsearch_endpoint}"
Probably because elastic_search_endpoint is an string and, as mentioned before, module.es_cluster.elasticsearch_endpoint is a list(string). You should provide a default value in case the list is empty

About autograd in pyorch, Adding new user-defined layers, how should I make its parameters update?

everyone !
My demand is a optical-flow-generating problem. I have two raw images and a optical flow data as ground truth, now my algorithm is to generate optical flow using raw images, and the euclidean distance between generating optical flow and ground truth could be defined as a loss value, so it can implement a backpropagation to update parameters.
I take it as a regression problem, and I have to ideas now:
I can set every parameters as (required_grad = true), and compute a loss, then I can loss.backward() to acquire the gradient, but I don’t know how to add these parameters in optimizer to update those.
I write my algorithm as a model. If I design a “custom” model, I can initilize several layers such as nn.Con2d(), nn.Linear() in def init() and I can update parameters in methods like (torch.optim.Adam(model.parameters())), but if I define new layers by myself, how should I add this layer’s parameters in updating parameter collection???
This problem has confused me several days. Are there any good methods to update user-defined parameters? I would be very grateful if you could give me some advice!
Tensor values have their gradients calculated if they
Have requires_grad == True
Are used to compute some value (usually loss) on which you call .backward().
The gradients will then be accumulated in their .grad parameter. You can manually use them in order to perform arbitrary computation (including optimization). The predefined optimizers accept an iterable of parameters and model.parameters() does just that - it returns an iterable of parameters. If you have some custom "free-floating" parameters you can pass them as
my_params = [my_param_1, my_param_2]
optim = torch.optim.Adam(my_params)
and you can also merge them with the other parameter iterables like below:
model_params = list(model.parameters())
my_params = [my_param_1, my_param_2]
optim = torch.optim.Adam(model_params + my_params)
In practice however, you can usually structure your code to avoid that. There's the nn.Parameter class which wraps tensors. All subclasses of nn.Module have their __setattr__ overridden so that whenever you assign an instance of nn.Parameter as its property, it will become a part of Module's .parameters() iterable. In other words
class MyModule(nn.Module):
def __init__(self):
super(MyModule, self).__init__()
self.my_param_1 = nn.Parameter(torch.tensor(...))
self.my_param_2 = nn.Parameter(torch.tensor(...))
will allow you to write
module = MyModule()
optim = torch.optim.Adam(module.parameters())
and have the optim update module.my_param_1 and module.my_param_2. This is the preferred way to go, since it helps keep your code more structured
You won't have to manually include all your parameters when creating the optimizer
You can call module.zero_grad() and zero out the gradient on all its children nn.Parameters.
You can call methods such as module.cuda() or module.double() which, again, work on all children nn.Parameters instead of requiring to manually iterate through them.

Pass Parameter to GCMLE Prediction Graph

Ror my ML Engine Prediction Graph, I have a part of the graph which takes a long time to compute and is not always necessary. Is there a way to create a boolean flag that will skip over this section of the graph? I would like to pass this flag when creation a batch predict job or an online prediction. For example, it would be something like:
gcloud ml-engine predict --model $MODEL --version $VERSION --json-instance $JSON_INSTANCES --boolean_flag $BOOLEAN_FLAG
In the example above, I would either pass True/False as the $BOOLEAN_FLAG and then this would determine whether a part of the prediction graph is evaluated. I would imagine that this flag could also be passed in the body of the batch prediction job, just like model/version are. Is this at all possible?
I know that I could add a new input field to the prediction request that is True/False for each element in the batch and just pass that as False when I don't want to obtain the prediction, but I'm curious if there is a way to do this with just a single parameter.
This is not currently possible. We'd like to hear more about your requirements for this feature. Please reach out to us at cloudml-feedback#google.com
How about adding two different export signatures, each with a different head? Then you can deploy to two different endpoints? Choose the url to call depending on whether you want full or partial.
Write two serving input functions, one for each case. In the first case, set the flag to zero, and in the second case, set the flag to one. The reason to use ones_like and zeros_like is to ensure that you have a batch of zeros and ones:
def case1_serving_input_fn():
feature_placeholders = ...
features = ...
features['myflag'] = tf.zeros_like(features['other'])
return tf.estimator.export.ServingInputReceiver(features, feature_placeholders)
def case2_serving_input_fn():
feature_placeholders = ...
features = ...
features['myflag'] = tf.ones_like(features['other'])
return tf.estimator.export.ServingInputReceiver(features, feature_placeholders)
In your train_and_evaluate function, have two exporters:
def train_and_evaluate(output_dir, nsteps):
...
exporter1 = tf.estimator.LatestExporter('case1', case1_serving_input_fn)
exporter2 = tf.estimator.LatestExporter('case2', case2_serving_input_fn)
eval_spec=tf.estimator.EvalSpec(
input_fn = make_input_fn(eval_df, 1),
exporters = [exporter1, exporter2] )
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

Weka Resample to balance instances in binary dataset

I've only been using Weka for a couple of weeks but I am absolutely blown away by how great it is!
But I have a question, I have a dataset with a target column which is either True or False.
6709 instances in my dataset are True
25318 instances are False.
I want to randomly add duplicates of my True instances to produce a new dataset with 25318 True and 25318 False.
The only filter I can find which does this is the supervised Resample filter however I am having trouble understanding what parameters I should use.
(there might be a better filter to do what I want)
I've got some success with these parameters
biasToUniformClass = 1.0
invertSelection = False
noReplacement = False
randomSeed = 1
sampleSizePercent = 157.5 (a magic number I've arrived at by trial and error)
This produces 25277 True and 25165 False. Not exactly what I want, but quite close.
The problem is that I cant figure out how to arrive at the magic number. I'm also not getting exactly the numbers of instances that I really want.
Is there a better filter for this purpose?
If not, is there a way to calculate the sampleSizePercent magic number?
Any help is greatly appreciated :)
Supplemental question, am I best to run NominalToBinary on my boolean columns to ensure they are Binary? I'm using a NaiveBayes classifier (at the moment) and I don't have any missing instances.
Jason
I think the tricky part of this question is getting a perfect balance using the Resample Filter. This is because, as it is stated in the description, it 'Produces a random sub-sample of a dataset using either sampling with replacement or without replacement'. If these cases are being drawn randomly, there is no guarantee that you will get an equal measure between the two classes.
As for the magic number, this would be associated with the total number of cases that you would like to have when the filter is applied. In your case, it would be 50636 instead of 32027. In this case, your magic number would be 50636 / 32027 = 1.581. However, as stated above, you may not get an exact match of true and false cases.
If you really need an exact figure, you could use your favourite spreadsheet and preprocess the data. One possible method is to randomise the true cases (in a separate column), sort and copy all of the cases until the number matches the false one. It's not an automated solution, and the solution is outside of Weka, but I have used this method before and does the job reasonably quickly.
Hope this Helps!

How to detect and delete noise in rapidminer?

I am new in rapid miner 5, just want to know how to find noise in my data and show them in chart and how to delete them?
A complex problem because it depends what you mean by noise.
If you mean finding individual attributes whose values are plain wrong then you could plot a histogram view and work out some sort of limits on what constitutes a valid value. You could then impose that rule by using Filter Examples to remove them.
If you mean finding attributes that have some sort of random jitter applied to them it would be difficult to detect these. Only by knowing beforehand what the expected shape of the distribution is could you compare with observation and do something about it. However, the action to take is by no means obvious.
If you mean finding examples within an example set that are obviously different from other examples then you could consider using the various outlier functions. The simplest one to get started is Detect Outlier (Distances). This finds a set number of outliers (default 10) based on a distance calculation that uses all the attributes for examples. It creates a new attribute called outlier that is set to true or false. You could then use the Filter Examples operator to remove those that are set to true.
Hope that helps at least as a start.