Using a cutoff node in SAS after an HP model

I am having the following problem. We have a SAS Enterprise Miner project with several models. Some are "normal" and some are HP. We are using a cutoff node after the models. If the cutoff node is used after a "normal" node, everything is fine, but if we try to use it after an HP node, the cutoff node stops working.
Has anyone seen this? Does anyone have any idea?
Thanks in advance, Umberto

We have found a kind of solution. Our binary variables contained two strings. Once we changed the two strings to 0 and 1, the cutoff node started working. This is strange, since if we change the values to 'A1' and 'A2' (for example) it still works. It fails only for some specific strings.
I would consider this a workaround, but maybe it can save some time for others who hit the same problem.

Adding a node and an edge to a graph using Gremlin behaves strangely

I'm new to using Gremlin (up until now I was accessing Neptune using openCypher and gave up because of how slow it was) and I'm getting really confused over some things here.
Basically, what I'm trying to do is this:
Let us say we have some graph A-->B-->C. There are multiple such graphs in the database, so I'm looking for the specific A, B, C nodes that have the property 'idx' equal to '1'. I want to add a node D{'idx' = '1'} and an edge so that I end up with
A-->B-->C-->D
It is safe to assume A,B,C already exist and are connected together.
Also, we wish to add D only if it doesn't already exist.
So what I currently have is this:
g.V().
hasLabel('A').has('idx', '1').
out().hasLabel('B').has('idx', '1').
out().hasLabel('C').has('idx', '1').as('c').
V().hasLabel('D').has('idx', '1').fold().
coalesce(
unfold(),
addV('D').property('idx','1')).as('d').
addE('TEST_EDGE').from('c').to('d')
Now the problem is that this doesn't work, and I don't understand Gremlin well enough to see why. Neptune returns "An unexpected error has occurred in Neptune" with the code "InternalFailureException".
Another thing to mention is that if the node D does exist, I don't get an error at all, and in fact the node is properly connected to the graph as it should be.
Furthermore, I've seen in a different post that using ".as('c')" shouldn't work, since there is a 'fold' step afterwards which makes it unusable (for a reason I still don't understand, probably because I'm not sure how .as, .store and .aggregate work).
That post suggests using ".aggregate('c')" instead, but doing so changes the returned error to "addE(TEST_EDGE) could not find a Vertex for from() - encountered: BulkSet". This, added to the fact that the code I wrote actually works and connects node D to the graph if it already exists, makes me even more confused.
So I'm lost.
Any help, clarification, explanation or simplification would be much appreciated.
Thank you! :)
A few comments before getting to the query:
If the intent is to have multiple subgraphs of (A->B->C), then you may not want to use this labeling scheme. Labels are meant to be of lower variation - think of labels as groups of vertices of the same "type".
A lookup of a vertex by ID is the fastest way to find a vertex in a TinkerPop-based graph database. Just be aware of that as you build your access patterns. Instead of doing something like `hasLabel('x').has('idx','y')`, if both of those items combined identify a unique vertex, you may also want to think about creating a composite ID such as 'x-y' for that vertex for faster access/lookup.
On the query...
The first part of the query looks good. I think you have a good understanding of the imperative nature of Gremlin just up until you get to the second V() in the query. That V() is going to tell Neptune to start evaluating against all vertices in the graph again. But we want to continue evaluating beyond the 'C' vertex.
Unless you need to return an output in either case of existence or non-existence, you could get away with just doing the following without a coalesce() step:
g.V().
hasLabel('A').has('idx', '1').
out().hasLabel('B').has('idx', '1').
out().hasLabel('C').has('idx', '1').
where(not(out().hasLabel('D').has('idx','1'))).
addE('TEST_EDGE').to(
addV('D').property('idx','1'))
The where clause allows us to do the check for the non-existence of a downstream edge and vertex without losing our place in the traversal. It will only continue the traversal if the condition specified is not() found in this case. If it is not found, the traversal continues with where we left off (the 'C' vertex). So we can feed that 'C' vertex directly into an addE() step to create our new edge and new 'D' vertex.
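The control flow of the where(not(...)) query can be mimicked in plain Python on a toy in-memory graph. This is only an illustration of the logic, not Gremlin or Neptune code: the ToyGraph class, its methods and attach_d_if_missing are hypothetical names made up for this sketch.

```python
class ToyGraph:
    """Hypothetical in-memory stand-in for a property graph (not a Gremlin API)."""

    def __init__(self):
        self.vertices = {}   # vertex id -> (label, properties)
        self.edges = []      # (from_id, edge_label, to_id)
        self._next_id = 0

    def add_v(self, label, **props):
        vid = self._next_id
        self._next_id += 1
        self.vertices[vid] = (label, props)
        return vid

    def add_e(self, from_id, label, to_id):
        self.edges.append((from_id, label, to_id))

    def find(self, label, **props):
        # All vertices with this label whose properties match.
        return [v for v, (lbl, p) in self.vertices.items()
                if lbl == label and all(p.get(k) == val for k, val in props.items())]

    def out(self, vid, label, **props):
        # Matching vertices reachable over one outgoing edge from vid.
        targets = {t for f, _, t in self.edges if f == vid}
        return [v for v in self.find(label, **props) if v in targets]


def attach_d_if_missing(g, idx):
    """Walk A -> B -> C and add a D vertex plus edge only where C has no D yet."""
    created = []
    for a in g.find('A', idx=idx):
        for b in g.out(a, 'B', idx=idx):
            for c in g.out(b, 'C', idx=idx):
                # Same role as where(not(out().hasLabel('D').has('idx', idx)))
                if not g.out(c, 'D', idx=idx):
                    d = g.add_v('D', idx=idx)
                    g.add_e(c, 'TEST_EDGE', d)
                    created.append(d)
    return created


g = ToyGraph()
a, b, c = g.add_v('A', idx='1'), g.add_v('B', idx='1'), g.add_v('C', idx='1')
g.add_e(a, 'AB', b)
g.add_e(b, 'BC', c)

first_run = attach_d_if_missing(g, '1')    # creates D and the edge
second_run = attach_d_if_missing(g, '1')   # D already exists, nothing is added
```

Because the existence check is made from the 'C' vertex itself, running the function a second time leaves the graph unchanged.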

ROS How to check parameters from previous nodes

I'm working on rosparam and I have an exercise to write a node that prints out a number. I can change the number through params. There's another condition: the node can run multiple times, provided it has a different number from the previous nodes. Any idea how to check the parameters of the previous nodes?
ROS params are stored globally on the ROS parameter server. This means that individual nodes don't really own the param value themselves. Instead, you should just pull params normally with the correct namespace. You can see the difference in namespacing below:
std::string global_name, relative_name, other_node_name;
ros::param::get("/global_name", global_name);
ros::param::get("relative_name", relative_name);
ros::param::get("/some_node/param_number", other_node_name);
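How those three kinds of names map onto entries on the parameter server can be sketched in plain Python. This is a simplified model of ROS 1 name resolution (global names start with "/", relative names are resolved against the node's namespace, private "~" names against the node's own name). The function resolve_param_name and the namespace and node names used here are hypothetical and not part of any ROS client library.

```python
def resolve_param_name(name, namespace="/", node_name="my_node"):
    """Simplified sketch of how a ROS 1 name becomes a global parameter key."""
    ns = "/" + namespace.strip("/")
    if ns != "/":
        ns += "/"
    if name.startswith("/"):
        # Global name: used as-is, independent of the calling node.
        return name
    if name.startswith("~"):
        # Private name: lives under the node's own name.
        return ns + node_name + "/" + name[1:]
    # Relative name: resolved against the node's namespace.
    return ns + name


global_key = resolve_param_name("/global_name", namespace="/robot1")
relative_key = resolve_param_name("relative_name", namespace="/robot1")
private_key = resolve_param_name("~param_number", node_name="some_node")
```

Under this model, a number that another node stored as a private parameter is reachable from any node through its global key, which is what the last ros::param::get call above relies on.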

What are hp.Discrete and hp.Realinterval? Can I include more values in hp.realinterval instead of just 2?

I am doing hyperparameter tuning using the HParams Dashboard in TensorFlow 2.0-beta0, as suggested here: https://www.tensorflow.org/tensorboard/r2/hyperparameter_tuning_with_hparams
I am confused in step 1, I could not find any better explanation. My questions are related to following lines:
HP_NUM_UNITS = hp.HParam('num_units', hp.Discrete([16, 32]))
HP_DROPOUT = hp.HParam('dropout', hp.RealInterval(0.1, 0.2))
HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd']))
My question:
I want to try more dropout values instead of just two (0.1 and 0.2). If I pass more values, it throws an error: 'maximum 2 arguments can be given'. I looked for documentation but could not find where hp.Discrete and hp.RealInterval come from.
Any help would be appreciated. Thank you!
Good question. The notebook tutorial is lacking in many respects. At any rate, here is how you do it at a certain resolution res:
for dropout_rate in tf.linspace(
HP_DROPOUT.domain.min_value,
HP_DROPOUT.domain.max_value,
res,):
Looking at the implementation, it really doesn't seem to me to be GridSearch but MonteCarlo/random search (note: this is not 100% correct, please see my edit below).
So on every iteration a random float from that real interval is chosen.
If you want GridSearch behavior, just use "Discrete". That way you can even mix and match GridSearch with random search, pretty cool!
Edit, 27th of July '22 (based on the comment of @dpoiesz):
Just to make it a little clearer: as values are sampled from the intervals, concrete values are returned. Those are then added to the grid dimension and grid search is performed using them.
RealInterval is a (min, max) tuple from which the hparam will pick a number.
Here is a link to the implementation for a better understanding.
The thing is that, as currently implemented, there does not seem to be any difference between the two unless you call the sample_uniform method.
Note that tf.linspace breaks the mentioned sample code when saving current value.
See https://github.com/tensorflow/tensorboard/issues/2348
In particular OscarVanL's comment about his quick&dirty workaround.
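A minimal sketch of building the grid without tf.linspace, assuming the breakage comes from the loop variable being a tensor rather than a plain Python float. The helper float_grid is a hypothetical name, and the literals 0.1 and 0.2 stand in for HP_DROPOUT.domain.min_value and HP_DROPOUT.domain.max_value so that the snippet runs without TensorFlow installed.

```python
def float_grid(min_value, max_value, res):
    """Return `res` evenly spaced plain Python floats from min_value to max_value."""
    if res < 2:
        return [float(min_value)]
    step = (max_value - min_value) / (res - 1)
    # round() trims floating-point noise such as 0.15000000000000002
    return [round(min_value + i * step, 10) for i in range(res)]


# Stand-ins for HP_DROPOUT.domain.min_value / HP_DROPOUT.domain.max_value
dropout_values = float_grid(0.1, 0.2, 5)

for dropout_rate in dropout_values:
    # Each dropout_rate is an ordinary float, so it could be stored in the
    # hparams dict of the tutorial's loop in place of a tensor.
    pass
```

The same list could also be passed to hp.Discrete if a fixed grid is all that is needed.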

Use the function "mod" in the instructions "if" and "select case"

I wrote a small program in Fortran, but it doesn't behave as I expected, and I can't figure out where the problem is.
I will not post the code here because it is 1200 lines long, but here is its general idea:
I create a 3D grid represented by a four-dimensional array (I store a vector of 2 elements at each point of the grid, corresponding to the nature of the site and to what occupies the site). This grid represents what we call a crystal (where atoms are found periodically).
Once this grid is constructed, the code scans each point of the grid and looks at the neighboring sites to count the different types of atoms or the vacancies.
For this last step, I use a triple nested loop to explore the different sites, and I check the neighboring sites using either the if or the select case construct. As I want my grid to be periodic, I use the function mod in the argument of the if or the select case.
The problem is that sometimes it finds, in a neighboring site, an element different from the one actually stored in that site. As an example:
In the two output files where all the coordinates are written with the
element type, I have grid(0,0,1)=-1 (which corresponds to an empty site).
But while the code is looking at the neighboring sites of grid(0,0,1), it reports that there is actually an element indexed 2 in grid(0,0,1).
I looked carefully at the block in the triple nested loop, but it seems fine.
I would like to know if anyone has already met this kind of problem, or knows of any problems with using mod in an if or select case argument.
If some of you want to look closer, I can send you the code with some explanations.
Arrays are usually dimensioned as:
REAL(KIND=8),DIMENSION(0:N) ::A
or
REAL(KIND=8),DIMENSION(N) :: A
In the latter example, the indices are assumed to start at 1.
You could also use (-N:N) or (10:191).
If you use the compiler switch '-check bounds' or '-check all' you will see whether you are going outside the array, etc. This is not an uncommon thing to get wrong, but with the check enabled the program will abort quickly when an index is out of bounds.
Once it works, remove the -check bounds and/or -check all.
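For the periodic lookup described in the question, a wrapped neighbor index is one place where an out-of-range or unexpected site can appear. A minimal Python sketch, assuming a 0-based cubic grid of size n per axis and the six nearest neighbors of a simple cubic lattice (both are assumptions, the original code is not shown). It also shows that a truncated remainder, which is how Fortran's mod() behaves, keeps the sign of a negative index, while a floored remainder, like Fortran's modulo() or Python's %, wraps it back into range.

```python
import math


def wrap(i, n):
    """Map any integer index onto 0..n-1 (periodic boundary)."""
    return i % n   # floored remainder, never negative for n > 0


def neighbors(i, j, k, n):
    """Six nearest neighbors of site (i, j, k) on a periodic n x n x n grid."""
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    return [(wrap(i + di, n), wrap(j + dj, n), wrap(k + dk, n))
            for di, dj, dk in offsets]


# Truncated remainder keeps the sign of the dividend, like Fortran's mod()
truncated = int(math.fmod(-1, 4))
# Floored remainder takes the sign of the divisor, like Fortran's modulo()
floored = -1 % 4

sites = neighbors(0, 0, 1, 4)
```

With a lower bound of 0, an index of -1 produced by a truncated remainder falls outside the array, which the bounds check mentioned above would flag.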
Thanks for your consideration, francescalus and haraldkl.
It was not related to the dimension of the arrays, Holmz, but thank you for trying to help.
It seems I finally succeeded in fixing it. I will post another answer if I fully understand why it was not working properly.
Apparently, it was related to the combination of an argument order in the procedure call that differed from the subroutine header, plus a declaration in the subroutine with intent(inout).
It was as if the intent(inout) was masking the problem, but it is a bit strange to me.
Some explanations about the code:
As I said, the code creates a 3D grid where each intersection of the grid corresponds to a crystallographic site. I assign a value to each site: -1 for an empty site, 1 for a crystal atom (0 if there is a vacancy instead of a crystal atom), and 2, 3, 4, 5 for different impurities. The empty sites and the sites that receive crystal atoms are not of the same type, which is why an empty site and a vacancy are distinguished. The impurities can only occupy empty sites and are forbidden from occupying a crystal site.
The aim of the code is to explore the configurational space of the system, in other words all the possible distributions we can obtain with the different elements. To do so, I start from an initial configuration, randomly choose two sites (respecting the occupation rules) and virtually swap them. I calculate the energy of the old and new configurations; if the new one has a lower energy I keep it, and if not, I keep the old one. The energy calculation is based on knowledge of the environment of each vacancy and impurity, so we need to know their neighbors. I repeat the whole procedure again and again to converge to the most stable (and so the most probable) configuration.
The next step is to include the temperature effect, and to add the second type of empty sites.
Have a nice day,
M.

How can I select the Yes/No questionID dynamically in a Weka J48 app

I'm developing a Weka app like Akinator using the J48 method.
Sample:
http://jbossews-vdoctor.rhcloud.com/doctor
The following is the app's table definition and sample data
qa means question id (please refer to the master, which can be set by the user) + answer (1: Yes, 2: I don't know, 3: No).
One line per question and answer.
id,qa,class
A,13,1
A,23,1
B,13,2
B,21,2
The point is to find a way to select the question that maximizes the entropy.
Currently this app treats the first node of the decision tree as the best question.
It then narrows down the options by elimination.
But the accuracy was too poor for it to work correctly, so I'd like to improve it.
I noticed that the qa column was identified as numeric, so it could not build the correct decision tree.
I am unsure what I should improve. Dataset? Table definition? Logic?
This is quite a broad question that you are asking, and without code or a clear understanding of the problem it is quite difficult to answer, but I'll give some tips for improvement:
Table Definition
What may have made more sense here is to have an attribute for each question, instead of using a single instance per question. For example, instead of id, qa and class, you could have A, B, C, D, E, F and Disease. (I believe there were six questions, and giving each attribute a descriptive name would be better than A-F.)
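The reshaping suggested here can be sketched in a few lines of Python using the four sample rows from the question. It assumes that id identifies one case, that the last digit of qa is the answer and the digits before it are the question id. The function to_wide and the q1, q2 attribute names are hypothetical.

```python
# Sample rows from the question: (id, qa, class)
rows = [("A", "13", 1), ("A", "23", 1), ("B", "13", 2), ("B", "21", 2)]


def to_wide(rows):
    """Turn one-row-per-answer data into one record per case, one key per question."""
    wide = {}
    for case_id, qa, cls in rows:
        # Assumption: the last digit is the answer, the rest is the question id.
        question, answer = qa[:-1], qa[-1]
        record = wide.setdefault(case_id, {"class": cls})
        # Keep the answer as a string so it is treated as nominal, not numeric.
        record["q" + question] = answer
    return wide


wide_rows = to_wide(rows)
```

Each question then becomes its own nominal attribute, so a tree can split on one question at a time instead of on a single numeric qa column.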
Dataset
You will need at least as many cases as there are diseases, if not more for defining multiple subsets of the problem space for the same disease. There are likely cases where some questions are irrelevant or missing, and the model may need to handle such situations.
Logic
In such a case, you might be able to do the questionnaire by starting with the root node and asking questions until you reach the estimated class. This way, you can ask from node to node until a class is reached.
I hope this helps in improving your existing model.
NOTE: I tried your questionnaire and answered No to all of your questions, and I strangely ended up with Trichomoniasis. Perhaps there could be a 'No Disease' category for your training data also.
My nominal qa data builds a decision tree like the one below by binary split.
Actually, this structure doesn't make sense because there is a subtree on only one side. When qa equals 23 it would always be answer '3'. It's irrational.
http://www.fastpic.jp/viewer.php?file=2693704973.jpg
You should first reformat your features to get all possible questions A, B, C, D... as binary features and your final answer (i.e. what to guess) as the target class if you want your tree to produce a sequence of questions leading to your answer. Your data will certainly be sparse (many questions without data/answer).
By the way, a binary tree is not the right ML structure and algorithm for building an Akinator-like or 20Q/Guess Who game. Please look at some suggestions here: https://stats.stackexchange.com/questions/6074/akinator-com-and-naive-bayes-classifier