referencing a typed element of a tuple - tuples

Having discovered TUPLEs only recently (I used to define classes instead), I was looking through the documentation and wondering whether there is a mechanism to get the correct type of a given TUPLE item.
The goal is both to anchor its types and to avoid having to test an item's type before getting it.
Is there a language mechanism?
I also found little documentation about them; maybe I'm not looking in the right place.
For the following code, I'd like to write something like tuple_items.types[1] and tuple_items.typed_item (1):
use_it
	do
		if attached {STRING} tuple_items.item (1) as l_tuple_item_1 then
			io.put_string (l_tuple_item_1)
		end
		if attached {DATE} tuple_items.item (2) as l_tuple_item_2 then
			io.put_string (l_tuple_item_2.out)
		end
		-- ...
	end

tuple_items: TUPLE [STRING, DATE, INTEGER]
	local
		l_first: STRING -- like tuple_items.first?
	do
		create l_first.make_empty
		Result := [l_first, create {DATE}.make_now, 1]
	end

A named tuple enables accessing the tuple items by name:
tuple_items: TUPLE [name: STRING; date: DATE; quantity: INTEGER]
Then the body of use_it simplifies to:
io.put_string (tuple_items.name)
io.put_string (tuple_items.date.out)
At the moment, tuple item names cannot be used in an anchored type, so there is no way to specify the type of l_first relative to the type of the tuple item. A workaround is to add an anchor feature:
name_anchor: STRING
	require
		is_callable: False
	do
		check from_precondition: False then end
	end
and using it in anchored types, including the tuple declaration:
tuple_items: TUPLE [name: like name_anchor; date: DATE; quantity: INTEGER]
	local
		l_first: like name_anchor

You may also use the anchor another way:
name_anchor: detachable STRING
Then the named-TUPLE reference changes slightly to:
tuple_items: TUPLE [name: attached like name_anchor]
If you want to ensure that name_anchor never becomes attached, you can add a class invariant:
invariant
	type_anchor_only: not attached name_anchor
This makes any attempt to attach an object to your anchor fail the invariant check (when assertion monitoring is enabled). The downside is that this check lives in the invariant, possibly far from the declaration of name_anchor, so another programmer may not easily see why the invariant is there. Hopefully, the tag on the contract helps tell that story.


terraform "element" and "concat" used together

We have a Terraform module that builds a security proxy hosting an Elasticsearch site. Its code contains this:
elastic_search_endpoint = "${element(concat(module.es_cluster.elasticsearch_endpoint, list("")),0)}"
As I understand it, this looks up the es_cluster module and gets the Elasticsearch endpoint it outputs, which makes that endpoint available to the proxy so it can serve Elasticsearch.
But I don't actually understand what this piece of code is doing and why the 'element' and 'concat' functions are there. Why can't it just be like this?
elastic_search_endpoint = "${module.es_cluster.elasticsearch_endpoint}"
Let's break this up and see what each part does.
It's not shown in the example, but I'm going to assume that module.es_cluster.elasticsearch_endpoint is an output value that is a list of either zero or one Elasticsearch endpoints, presumably because that module allows disabling the generation of an Elasticsearch endpoint.
If so, that means that module.es_cluster.elasticsearch_endpoint would either be [] (empty list) or ["es.example.com"].
Let's consider the case where it's a one-element list first: concat(module.es_cluster.elasticsearch_endpoint, list("")) in that case will produce the list ["es.example.com", ""]. Then element(..., 0) will take the first element, giving "es.example.com" as the final result.
In the empty-list case, concat(module.es_cluster.elasticsearch_endpoint, list("")) produces the list [""]. Then element(..., 0) will take the first element, giving "" as the final result.
Given all of this, it seems like the intent of this expression is to either return the one ElasticSearch endpoint, if available, or to return an empty string as a placeholder if not.
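To check both cases, here is a small Python model of the expression, with plain Python lists standing in for Terraform lists. It is an illustration of the logic, not how Terraform evaluates it, and the function names are hypothetical. The second function models the conditional form that avoids concat:

```python
def first_or_default(endpoints, default=""):
    """Model of element(concat(endpoints, [default]), 0).

    Appending the default guarantees the list has at least one element,
    so index 0 is the real endpoint when there is one, otherwise the default.
    """
    return (list(endpoints) + [default])[0]


def first_or_default_conditional(endpoints, default=""):
    """Model of: length(endpoints) > 0 ? endpoints[0] : default"""
    return endpoints[0] if len(endpoints) > 0 else default


if __name__ == "__main__":
    for endpoints in ([], ["es.example.com"]):
        # Both forms agree for the empty list and the one-element list.
        print(repr(first_or_default(endpoints)), repr(first_or_default_conditional(endpoints)))
```

Passing None as the default models the null-returning variant.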
I expect this is written this specific way because it was targeting an earlier version of the Terraform language which had fewer features. A different way to write this expression in current Terraform (v0.14 is current as of my writing this) would be:
elastic_search_endpoint = (
  length(module.es_cluster.elasticsearch_endpoint) > 0 ? module.es_cluster.elasticsearch_endpoint[0] : ""
)
It's awkward that this includes the full output reference twice though. That might be justification for using the concat approach even in modern Terraform, although arguably the intent wouldn't be so clear to a future reader:
elastic_search_endpoint = (
  concat(module.es_cluster.elasticsearch_endpoint, [""])[0]
)
Modern Terraform also includes the possibility of null values, so if I were writing a module like yours today I'd probably prefer to return null rather than an empty string, to make it clearer that it represents the absence of a value:
elastic_search_endpoint = (
  length(module.es_cluster.elasticsearch_endpoint) > 0 ? module.es_cluster.elasticsearch_endpoint[0] : null
)
or, equivalently:
elastic_search_endpoint = (
  concat(module.es_cluster.elasticsearch_endpoint, [null])[0]
)
First things first: who wrote that code? Why isn't it documented? Ask the author!
From the code alone, there's not much to go on. Since concat expects lists, module.es_cluster.elasticsearch_endpoint must be a list(string), and depending on some variables it might be empty. Concatenating a list containing an empty string ensures there is always something at index 0.
So the whole ${element(concat(module.es_cluster.elasticsearch_endpoint, list("")),0)} could be translated to length(module.es_cluster.elasticsearch_endpoint) > 0 ? module.es_cluster.elasticsearch_endpoint[0] : "" (which IMHO is much more readable).
Why can't it just be like this?
elastic_search_endpoint = "${module.es_cluster.elasticsearch_endpoint}"
Probably because elastic_search_endpoint is a string and, as mentioned before, module.es_cluster.elasticsearch_endpoint is a list(string). You should also provide a default value in case the list is empty.

Store Text into a Scalar within an If / Else Statement - Robot Framework

I want to accomplish the following in Robot Framework.
I want to store the text of the element xpath://*[@id="plsAttachGrid_0_1"] in a scalar ${name3}. However, the element is not always present on the web page, so I need an IF/ELSE statement to avoid an error when it is missing.
My idea was the following:
${countt2}=    Get Element Count    xpath://*[@id="plsAttachGrid_0_1"]
Run Keyword If    ${countt2}>0
...    ${name3}=    Get Text    xpath://*[@id="plsAttachGrid_0_1"]
...    ELSE    LOG    NO_FILE
However, I have learnt that it is not possible to assign to a scalar inside Run Keyword If, since it expects a keyword.
So how can I accomplish this?
A keyword run inside Run Keyword If can return a value that propagates up, i.e. it can be assigned to a variable placed before Run Keyword If. In your case:
${name3}=    Run Keyword If    ${countt2}>0    Get Text    xpath://*[@id="plsAttachGrid_0_1"]
So now ${name3} will have the return value of Get Text when ${countt2}>0.
What will its value be if the condition is not met? It will be None (Python's None object, not a string): with no ELSE branch, Run Keyword If returns nothing, so the variable is assigned but holds no value.
You can set it to a different value in this case, to handle it easier later in the code:
${name3}=    Run Keyword If    ${countt2}>0    Get Text    xpath://*[@id="plsAttachGrid_0_1"]
...    ELSE    Set Variable    not_set    # or ${0}, or any other value you need for ${countt2}<=0
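To make the return-value behaviour concrete, here is a toy Python model of it. It is not Robot Framework's implementation; run_keyword_if, get_text and set_variable below are hypothetical stand-ins that only mirror the rules described above: the chosen keyword's return value propagates to the caller, and a false condition without an ELSE branch yields None.

```python
def run_keyword_if(condition, keyword, *args, else_keyword=None, else_args=()):
    """Toy model of how Run Keyword If hands back a return value."""
    if condition:
        return keyword(*args)  # the keyword's return value propagates up
    if else_keyword is not None:
        return else_keyword(*else_args)  # ELSE branch result propagates up
    return None  # no branch ran, so the assigned variable gets None


def get_text(locator):
    # Hypothetical stand-in for SeleniumLibrary's Get Text.
    return f"text of {locator}"


def set_variable(value):
    # Stand-in for BuiltIn's Set Variable, which returns its argument.
    return value


if __name__ == "__main__":
    locator = 'xpath://*[@id="plsAttachGrid_0_1"]'
    countt2 = 0
    print(run_keyword_if(countt2 > 0, get_text, locator))  # None
    print(run_keyword_if(countt2 > 0, get_text, locator,
                         else_keyword=set_variable, else_args=("not_set",)))  # not_set
```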

What are the ways of Key-Value extraction from unstructured text?

I'm trying to figure out what ways there are (and which is best) to extract values for predefined keys from unstructured text.
Input:
The doctor prescribed me a drug called favipiravir.
His name is Yury.
Ilya has already told me about that.
The weather is cold today.
I am taking a medicine called nazivin.
Key list: ['drug', 'name', 'weather']
Output:
['drug=favipiravir', 'drug=nazivin', 'name=Yury', 'weather=cold']
So, as you can see, the 3rd sentence has no explicit key 'name' and therefore no value is extracted (I think this is the difference from NER). At the same time, 'drug' and 'medicine' are synonyms, so we should treat 'medicine' as the 'drug' key and extract its value too.
A follow-up question: what if the key set is mutable?
Should I use a regexp-based approach as the baseline, since the keys are predefined, or is there a way to implement it with supervised learning/NNs? (But in that case, how would I deal with mutable keys?)
You can use a tagger to label words. Your problem is similar to Named Entity Recognition (NER). Many libraries, like NLTK in Python, provide part-of-speech (POS) taggers and NER models you can try; NER models are generally trained to identify person names, locations, etc. Depending on the kinds of words you need, you may have to train your own model, so you'll also need some labeled data. Check out this link:
https://nlp.stanford.edu/software/CRF-NER.html
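For comparison, the regexp baseline the question asks about might look like this in Python. It is a sketch under strong assumptions: the synonym table and the two surface patterns ("<key> called <value>" and "<key> is <value>") are hypothetical and only cover the example sentences. It does show the requested behaviour: 'medicine' maps to the 'drug' key, and the sentence about Ilya yields nothing because no key word appears in it.

```python
import re

# Hypothetical synonym table: surface word -> canonical key.
# A mutable key set then amounts to editing this dict at runtime.
KEY_SYNONYMS = {
    "drug": "drug",
    "medicine": "drug",
    "name": "name",
    "weather": "weather",
}

# Hypothetical surface patterns; real text would need many more.
PATTERNS = [
    re.compile(r"\b(?P<key>\w+) called (?P<value>\w+)"),  # "a drug called favipiravir"
    re.compile(r"\b(?P<key>\w+) is (?P<value>\w+)"),      # "His name is Yury"
]


def extract(text, keys):
    """Return sorted 'key=value' strings for the requested canonical keys."""
    wanted = set(keys)
    found = []
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            canonical = KEY_SYNONYMS.get(match.group("key").lower())
            if canonical in wanted:  # unknown words map to None and are skipped
                found.append(f"{canonical}={match.group('value')}")
    return sorted(found)


if __name__ == "__main__":
    text = """The doctor prescribed me a drug called favipiravir.
His name is Yury.
Ilya has already told me about that.
The weather is cold today.
I am taking a medicine called nazivin."""
    print(extract(text, ["drug", "name", "weather"]))
```

The hand-written patterns are the weak point: every new phrasing needs a new rule, which is where a trained tagger pays off.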

Tuples in MDX SSAS OLAP queries (SS management studio)

I have a question about MDX tuples; I'd like some insight into something that confuses me.
Most of the literature I have read describes tuples as a set of coordinates essentially pointing to a cell that contains a measure value. From what I understand, a tuple is defined as containing only one distinct member from each dimension. Typically, when writing queries, we don't specify a member for every dimension; we let the SSAS engine use the default members and aggregate the measure data accordingly.
Straight out of the Adventure Works sample OLAP database (cube "Adventure Works"):
A super simple query that I understand represents a tuple:
SELECT
([Date].[Calendar Quarter of Year].&[CY Q3],[Measures].[Sales Amount]) --Tuple
ON COLUMNS
FROM [Adventure Works]
SQL Server Management Studio returns this result:
No problem here: the tuple specified by the &[CY Q3] member points to the cell containing the displayed measure amount. Clearly a tuple has been returned.
Typically though I use this sort of thing more often:
select
non empty ([Date].[Calendar].[Calendar Quarter],[Measures].[Sales Amount]) --Tuple??
ON COLUMNS
FROM [Adventure Works]
Which returns all the quarter totals across all years for said measure (not a great example but it's just an example):
I see this result as a set because more than one distinct member has been returned from the same dimension (Date). In fact, by default all members are returned; if so, how can it be a tuple?
So my question is this: the parentheses around the "tuple" in the query above indicate to me that I'm selecting a tuple. The query engine processes it, and the result looks to me like a set, not because more than one cell value is returned but because more than one member of the Date dimension has been used.
The query indicates that a tuple is being selected, and the query engine seems to accept it as one; however, the result set includes multiple members from the same dimension and their corresponding cell values, which indicates to me that more than one tuple is returned, i.e. a set.
Also, the query engine throws no error when I treat it as a set and use set functions on it:
select
nonempty({([Date].[Calendar].[Calendar Quarter],[Measures].[Sales Amount])}) --Set
ON COLUMNS
FROM [Adventure Works]
My question is this: assuming I am correct and the results do in fact represent a set (a set of tuples, one per distinct member), why does the query engine let you use parentheses, which indicate selection of a tuple, to return something that is not a tuple?
This makes more sense to me:
SELECT
nonemptycrossjoin(
{[Date].[Calendar].[Calendar Quarter]}, --Set 1
{[Measures].[Sales Amount]} --Set 2
)
ON COLUMNS
FROM [Adventure Works]
At least this code reflects the result set that's returned. Thoughts?
Or is it all just Analysis Services semantics?
Thanks
Unfortunately, MDX is an ambiguous language. From what I've understood, the () notation is a tuple or an operator precedence notation. And:
{...} , {...}
is actually a crossjoin:
{...} * {...}
Then, when you specify a level where a set is expected, level.members is used by default. When you specify a tuple where a set is expected, a singleton set containing only that tuple is created. So:
( [level], [measures].[amount] )
is equivalent to:
crossjoin( [level].members, { [measures].[amount] } )
The only tuple (a member is a tuple) specified is [Measures].[Amount] which by the way does not use the () notation ;-)
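The implicit conversions can be mimicked with a toy Python model (not SSAS itself) to show why the parenthesized "tuple" yields a set. The Level class, the as_set and crossjoin helpers, and the quarter member names are all hypothetical; real Adventure Works quarter members are per calendar year.

```python
from dataclasses import dataclass
from itertools import chain, product


@dataclass(frozen=True)
class Level:
    """Toy stand-in for an MDX level; its members are hypothetical names."""
    name: str
    members: tuple


def as_set(item):
    """Coerce an item to a set of tuples, mimicking MDX's implicit rules:
    a level becomes level.MEMBERS, a single member a one-tuple singleton set."""
    if isinstance(item, Level):
        return [(member,) for member in item.members]
    return [(item,)]


def crossjoin(*sets):
    """Every combination of one tuple from each set, flattened into one tuple."""
    return [tuple(chain.from_iterable(combo)) for combo in product(*sets)]


if __name__ == "__main__":
    quarters = Level("Calendar Quarter", ("Q1", "Q2", "Q3", "Q4"))
    # ( [level], [measures].[amount] ) behaves like this: a set of four tuples.
    print(crossjoin(as_set(quarters), as_set("Sales Amount")))
    # ( member, measure ) stays a single tuple, i.e. a one-element set.
    print(crossjoin(as_set("CY Q3"), as_set("Sales Amount")))
```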
You mention that you typically use this syntax, but I write MDX nearly every day and never use it:
SELECT
NON EMPTY ([Date].[Calendar].[Calendar Quarter],[Measures].[Sales Amount]) ON COLUMNS
FROM [Adventure Works];
I'm a little surprised it runs, as it seems to be asking Analysis Services to create a tuple from a level and a measure member.
MarcP mentions that MDX is ambiguous, but I'd rather say it can be written ambiguously: unfortunately it fails quite gracefully most of the time, and you get either no numbers or the wrong numbers. I wish it threw more errors and enforced tighter syntax rules, as that might make it more understandable.
In your original script, I would just use the * operator rather than typing out the full crossjoin function when you need one. In your script, though, it is much more readable to move the measure onto the rows and drop that tuple. As Marc mentions, SSAS implicitly uses the MEMBERS function; I find queries more readable when it is included explicitly:
SELECT
NON EMPTY
[Date].[Calendar].[Calendar Quarter].MEMBERS ON 0,
[Measures].[Sales Amount] ON 1
FROM [Adventure Works];
[Date].[Calendar].[Calendar Quarter]
As Marc mentioned, this is actually the same as [Date].[Calendar].[Calendar Quarter].MEMBERS. In this case MSDN is your friend (for MDX, unlike some other Microsoft languages, I find MSDN very good); here is the definition of the MEMBERS function:
https://msdn.microsoft.com/en-us/library/ms144851.aspx
which tells you the return type:
Returns the set of members in a dimension, level, or hierarchy.
nonemptycrossjoin
You mentioned nonemptycrossjoin: it is no longer needed. With the last two or three versions of SSAS, a simple crossjoin via * is all that is needed.

RapidMiner: Can I use a wildcard as an attribute value for training a decision tree model?

I am working on a fairly simple process in RapidMiner 5.3.013, which reads a CSV file and uses it as a training set to train the decision tree classifier. The result of the process is the model. A second CSV is read and used as the unlabeled set. The model (calculated earlier) is applied to the unlabeled test set, in an effort to label it properly.
Each line of the CSVs contains a few attributes, for example:
15, 0, 1555, abc*15, label1
but some lines of the training set may be like this:
15, 0, *, abc*15, label2
This is done because the third value may take various values, so the creator of the training set used a star as a wildcard in the place of the value.
What I would like to do is let the decision tree know that the star there means "match anything", so that it does not literally only match a star.
Notes:
the star in the 4th field (abc*15) should be matched literally and not as a wildcard.
if the 3rd field always contained stars, I could just not include it in the attributes, but that's not the case. Sometimes the 3rd field contains integer values, which should be matched literally.
I tried leaving the field blank, but it doesn't work
So, is there a way to use regular expressions, or at least a simple wildcard while training the classifier or using the model?
A different way to put it is: Can I instruct the classifier to not use some of the attributes in some of the entries (lines in the CSV)?
Thanks!
I would process the data so the missing value is valid in its own right and I would discretize the valid numbers to be in ranges.
In more detail, by missing I mean the situation where the value of an attribute is something like *. I would simply allow this to be one valid value that the attribute takes. All the other values of this attribute are numerical, so they need to be converted to nominal values to be compatible with the now-valid *.
This is fairly fiddly and I haven't tried it, but I would start with the Declare Missing Value operator to detect the * values and mark them as missing. From there, I would use the Discretize by Binning operator to convert the numbers into nominal values. Finally, I would use Replace Missing Values to change the missing values to a nominal value such as Missing. Why bother with the Declare Missing Value step first? Because it lets the discretizing operation work: it then operates on numbers alone, since the non-numbers are marked as missing.
The resulting example set can then be passed to a model in the normal way. Obviously, the model has to be able to cope with nominal attributes (Decision Tree does).
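The three operator steps can be sketched in plain Python to show the data flow on the third CSV field. This only illustrates the idea and is not RapidMiner's implementation: the equal-width binning, the bin count of 2 and the simplified rangeN labels are assumptions, and the function names are hypothetical.

```python
def declare_missing(raw_values, marker="*"):
    """Like Declare Missing Value: the wildcard becomes None, everything else a number."""
    return [None if v.strip() == marker else float(v) for v in raw_values]


def discretize_by_binning(values, n_bins=2):
    """Equal-width binning of the non-missing numbers into nominal labels."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to bin
    low, high = min(present), max(present)
    width = (high - low) / n_bins or 1.0  # avoid dividing by zero if all values are equal
    labels = []
    for v in values:
        if v is None:
            labels.append(None)  # missing values pass through untouched
        else:
            index = min(int((v - low) / width), n_bins - 1)  # the maximum lands in the last bin
            labels.append(f"range{index + 1}")
    return labels


def replace_missing(values, replacement="Missing"):
    """Like Replace Missing Values with a constant: None becomes a nominal value."""
    return [replacement if v is None else v for v in values]


def prepare_column(raw_values, n_bins=2):
    return replace_missing(discretize_by_binning(declare_missing(raw_values), n_bins))


if __name__ == "__main__":
    third_field = [" 1555", " *", " 10", " 2000"]
    print(prepare_column(third_field))  # ['range2', 'Missing', 'range1', 'range2']
```

Marking * as missing first is what lets the binning step see only numbers, matching the reasoning above.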
It occurred to me that some modelling operators are more tolerant of missing data. I think k-nearest-neighbours may be one. In this case, you could simply mark the missing ones as above and not bother with the discretizing step.
The whole area of missing data does need care because it's important to understand the source of missingness. If missing data is correlated with other attributes or with the label itself, handling it inappropriately can skew results.