Linear programming feasibility: disconnected solution set

I would like to solve a feasibility problem subject to linear constraints. My constraints look like:
abs(x_i - x_j) < d_ij_1
abs(x_i - x_j - a) < d_ij_2
abs(x_i - x_j) > d_ij_3
etc...
I am attaching a picture of an example with just 3 variables (I fix the first variable to 0). I know that the white regions are valid solutions; for instance, I can choose the red dot.
My issue is that as I increase the number of unknowns x_j, I can no longer represent the problem in a way that makes it easy to find a solution. How can I solve such a problem? Would linear programming help, even though the solution space is not connected here? For scale, I am looking at solving it for ~6-10 variables. Also, I posted here because I don't know which Stack Exchange site is best suited for this kind of problem.
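One possible approach (not from the original post, and not plain LP): every constraint above bounds a difference x_i - x_j. Once each |x_i - x_j| > d is resolved into one of its two sides, what remains is a system of difference constraints, which is feasible exactly when its constraint graph has no negative cycle (Bellman-Ford). Each choice of sides is one convex piece of the disconnected solution set, so the sketch below backtracks over those choices and prunes a branch as soon as it becomes infeasible. Strict inequalities are approximated with a small margin `eps` (regions thinner than `eps` will be missed). The input format and the names `feasible_differences` / `solve_abs_constraints` are assumptions. For larger instances, the usual alternative is a mixed-integer program with one binary variable per ">" constraint (big-M formulation).

```python
def feasible_differences(n, cons):
    """cons: list of (i, j, c) meaning x_i - x_j <= c.
    Returns a satisfying assignment, or None if the constraint graph has a negative cycle."""
    # Distances from a virtual source joined to every variable by a 0-weight edge.
    dist = [0.0] * n
    for _ in range(n):
        changed = False
        for i, j, c in cons:
            # Edge j -> i with weight c enforces dist[i] <= dist[j] + c.
            if dist[j] + c < dist[i]:
                dist[i] = dist[j] + c
                changed = True
        if not changed:
            return dist
    # Still relaxing after n passes means a negative cycle: infeasible.
    for i, j, c in cons:
        if dist[j] + c < dist[i]:
            return None
    return dist


def solve_abs_constraints(n, inside, outside, eps=1e-6):
    """inside:  (i, j, a, d) meaning |x_i - x_j - a| < d  (a = 0 gives |x_i - x_j| < d).
    outside: (i, j, d) meaning |x_i - x_j| > d, i.e. x_i - x_j > d OR x_j - x_i > d."""
    base = []
    for i, j, a, d in inside:
        base.append((i, j, a + d - eps))  # x_i - x_j < a + d
        base.append((j, i, d - a - eps))  # x_j - x_i < d - a

    def search(k, cons):
        sol = feasible_differences(n, cons)
        if sol is None or k == len(outside):
            return sol  # prune an infeasible branch, or return a complete solution
        i, j, d = outside[k]
        # First try x_i - x_j > d, then x_j - x_i > d.
        for branch in ((j, i, -d - eps), (i, j, -d - eps)):
            found = search(k + 1, cons + [branch])
            if found is not None:
                return found
        return None

    sol = search(0, base)
    if sol is None:
        return None
    return [v - sol[0] for v in sol]  # shift so that x_0 = 0, as in the question


x = solve_abs_constraints(3, inside=[(1, 0, 0, 5), (2, 0, 0, 5)],
                          outside=[(1, 2, 2), (1, 0, 1)])
```

Shifting by `sol[0]` is safe because every constraint only involves differences, which a common shift does not change.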

What are hp.Discrete and hp.RealInterval? Can I include more values in hp.RealInterval instead of just 2?

I am tuning hyperparameters with the HParams dashboard in TensorFlow 2.0-beta0, as suggested here: https://www.tensorflow.org/tensorboard/r2/hyperparameter_tuning_with_hparams
I am confused by step 1 and could not find a better explanation. My questions relate to the following lines:
HP_NUM_UNITS = hp.HParam('num_units', hp.Discrete([16, 32]))
HP_DROPOUT = hp.HParam('dropout', hp.RealInterval(0.1, 0.2))
HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd']))
My question:
I want to try more dropout values than just two (0.1 and 0.2). If I pass more values, it throws an error: 'maximum 2 arguments can be given'. I looked for documentation but could not find where these hp.Discrete and hp.RealInterval functions come from.
Any help would be appreciated. Thank you!
Good question. The notebook tutorial is lacking in many respects. At any rate, here is how you do it at a given resolution res:
for dropout_rate in tf.linspace(
        HP_DROPOUT.domain.min_value,
        HP_DROPOUT.domain.max_value,
        res,):
    ...  # train and log one run with this dropout_rate
Looking at the implementation, it really doesn't seem to be grid search to me but Monte Carlo/random search (note: this is not 100% correct; please see my edit below).
So on every iteration a random float from that real interval is chosen.
If you want grid search behavior, just use Discrete. That way you can even mix and match grid search with random search, pretty cool!
Edit (27 July 2022), based on the comment by dpoiesz:
To make it a little clearer: since values are sampled from the intervals, concrete values are returned. Those values are then added to the grid dimension, and grid search is performed over them.
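The "mix and match" idea can be sketched in plain Python, without the HParams API: every combination of the discrete values is enumerated as a grid, and for each combination the interval-valued hyperparameter is drawn uniformly at random from its (min, max) range. This only mimics the behavior described above; `mixed_search` and its arguments are hypothetical names, and the values mirror HP_NUM_UNITS, HP_OPTIMIZER and HP_DROPOUT from the question.

```python
import itertools
import random


def mixed_search(discrete, intervals, samples_per_combo, seed=0):
    """discrete:  {name: [values]}, enumerated exhaustively (grid search).
    intervals: {name: (min_value, max_value)}, sampled uniformly (random search)."""
    rng = random.Random(seed)
    names = list(discrete)
    runs = []
    for combo in itertools.product(*(discrete[name] for name in names)):
        for _ in range(samples_per_combo):
            run = dict(zip(names, combo))
            for name, (lo, hi) in intervals.items():
                run[name] = rng.uniform(lo, hi)  # fresh random draw for every run
            runs.append(run)
    return runs


runs = mixed_search({'num_units': [16, 32], 'optimizer': ['adam', 'sgd']},
                    {'dropout': (0.1, 0.2)}, samples_per_combo=3)
```

With 2 x 2 grid points and 3 samples each, this produces 12 runs, each with its own dropout value inside [0.1, 0.2].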
RealInterval is a (min, max) pair from which the hparam picks a number.
Here a link to the implementation for better understanding.
As it is currently implemented, there does not seem to be any difference between the two unless you call the sample_uniform method.
Note that using tf.linspace breaks the sample code above when the current value is saved.
See https://github.com/tensorflow/tensorboard/issues/2348
In particular, see OscarVanL's comment about his quick-and-dirty workaround.
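One way to sidestep the tf.linspace problem without changing the idea is to build the grid as plain Python floats instead of tensors. The helper below is a hypothetical sketch (not part of TensorBoard); in the real script its bounds would come from HP_DROPOUT.domain.min_value and HP_DROPOUT.domain.max_value. Because the result is an ordinary list, it could alternatively be passed to hp.Discrete to get explicit grid-search behavior, as suggested above.

```python
def float_grid(min_value, max_value, res):
    """Return `res` evenly spaced plain Python floats from min_value to max_value (inclusive)."""
    if res <= 0:
        return []
    if res == 1:
        return [float(min_value)]
    step = (max_value - min_value) / (res - 1)
    return [float(min_value + k * step) for k in range(res)]


# Hypothetical bounds standing in for HP_DROPOUT.domain.min_value / max_value.
dropout_values = float_grid(0.1, 0.2, 5)
```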

Weka: improving a model's TP rate

[Screenshot: J48 output in Weka]
Hi,
I have a problem with my model in Weka (J48, cross-validation): many instances of the second class are misclassified. Is there any way to improve it? I'm not an expert in Weka. Thank you in advance. My output is above.
With NaiveBayes it performs better, but the TP rate for the second class is still below 0.5.
[Screenshot: NaiveBayes output in Weka]
It is hard to reproduce your example from the given information. However, the solution is probably to turn your classifier into a cost-sensitive classifier:
https://weka.wikispaces.com/CostSensitiveClassifier?responseToken=019a566fb2ce3b016b9c8c791c92e8e35
It assigns a higher cost to misclassifications of a particular class; in your case, the "True" class.
You can also simulate such an algorithm by oversampling your positive examples: if you have n positive examples, you sample k*n positive examples while keeping your negative examples as they are. You could also simply duplicate the positive examples.
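To make "a higher cost for misclassifications" concrete: given a cost matrix, a classifier that outputs class probabilities can predict the class with the lowest expected cost instead of the most probable one. This is a generic sketch of that decision rule in Python, not Weka's code; the function name and the cost values are assumptions. (As far as I know, Weka's CostSensitiveClassifier can either apply this rule or reweight the training instances instead.)

```python
def min_expected_cost_class(probs, cost):
    """probs[i]: predicted probability that the true class is i.
    cost[i][j]: cost of predicting class j when the true class is i."""
    n = len(probs)
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    # Pick the prediction with the smallest expected misclassification cost.
    return min(range(n), key=lambda j: expected[j])


# Class 0 = "False", class 1 = "True"; missing a "True" instance costs 5x more.
costly_misses = [[0, 1],
                 [5, 0]]
prediction = min_expected_cost_class([0.7, 0.3], costly_misses)
```

Here the most probable class is "False", but the expected costs are 1.5 for predicting "False" and 0.7 for predicting "True", so `prediction` is 1. That is how the TP rate of the costly class goes up, at the price of more false positives.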
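A minimal sketch of the duplication variant in plain Python (not Weka code); `oversample` and the toy data are hypothetical. In a cross-validation setting, duplicate only within the training folds; otherwise copies of the same instance can land in both a training fold and a test fold and inflate the measured TP rate.

```python
def oversample(rows, labels, positive, k):
    """Repeat every instance of class `positive` k times; keep all other instances once."""
    out_rows, out_labels = [], []
    for row, label in zip(rows, labels):
        copies = k if label == positive else 1
        out_rows.extend([row] * copies)
        out_labels.extend([label] * copies)
    return out_rows, out_labels


rows, labels = oversample([[1.0], [2.0], [3.0]], ['False', 'True', 'False'], 'True', k=3)
```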

How can I select the Yes/No question ID dynamically in a Weka J48 app?

I'm developing an Akinator-like app with Weka, using the J48 method.
Sample:
http://jbossews-vdoctor.rhcloud.com/doctor
The following is the app's table definition and sample data
qa is the question ID (see the question master table, which the user can edit) followed by the answer (1: Yes, 2: I don't know, 3: No).
There is one row per question and answer.
id,qa,class
A,13,1
A,23,1
B,13,2
B,21,2
The point is to find a way to select the question which can maximize the entropy.
Currently the app treats the first node of the decision tree as the best question.
It then narrows down the options by elimination.
But the accuracy was too poor for it to work correctly, so I'd like to improve it.
I noticed that the qa column was identified as numeric, so it could not build the correct decision tree.
I am unsure what I should change to improve it: the dataset, the table definition, or the logic?
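For a decision tree, picking "the question which maximizes the entropy" amounts to picking the question with the highest information gain over the cases that are still possible. A minimal Python sketch of that selection step, assuming the data has already been reshaped into one row per case with one answer per question (coded 1 = Yes, 2 = I don't know, 3 = No, as above) and treating a missing answer as 2; `best_question` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_question(rows, labels, questions, missing=2):
    """rows[k]: {question: answer code} for case k; labels[k]: its class.
    Returns (question, information gain) for the most informative question."""
    base = entropy(labels)
    best, best_gain = None, -1.0
    for q in questions:
        groups = defaultdict(list)  # answer code -> classes of the cases giving it
        for row, label in zip(rows, labels):
            groups[row.get(q, missing)].append(label)
        remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
        gain = base - remainder
        if gain > best_gain:
            best, best_gain = q, gain
    return best, best_gain


rows = [{'Q1': 1, 'Q2': 1}, {'Q1': 1, 'Q2': 3}, {'Q1': 3, 'Q2': 1}, {'Q1': 3, 'Q2': 3}]
labels = ['Flu', 'Flu', 'Cold', 'Cold']
question, gain = best_question(rows, labels, ['Q1', 'Q2'])
```

After each answer, the cases inconsistent with it can be dropped and the function called again on the remaining ones, which gives a dynamic question order.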
This is quite a broad question, and without code or a clear understanding of the problem it is difficult to answer, but I'll give some tips for improvement:
Table Definition
What may have made more sense here is to have an attribute for each question, instead of one instance per question. For example, instead of id, qa and class, you could have A, B, C, D, E, F and Disease. (I believe there were six questions; giving each attribute a meaningful name instead of A-F is recommended.)
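The reshaping suggested here can be sketched in Python, assuming (as described in the question) that each qa value is the question ID followed by a single answer digit, so divmod(qa, 10) splits it. Questions a case never answered are filled with 2 ("I don't know"), which is an assumption; `to_wide` is a hypothetical helper. When loading the result into Weka, declaring the answer columns and the class as nominal (e.g. `@attribute Q1 {1,2,3}` in an ARFF header) avoids the numeric-attribute problem mentioned in the question.

```python
def to_wide(rows, unanswered=2):
    """rows: (case_id, qa, cls) tuples, where qa = question_id * 10 + answer digit.
    Returns (header, table) with one row per case and one column per question."""
    cases, questions = {}, set()
    for case_id, qa, cls in rows:
        question, answer = divmod(int(qa), 10)
        questions.add(question)
        case = cases.setdefault(case_id, {'class': cls, 'answers': {}})
        case['answers'][question] = answer
    order = sorted(questions)
    header = [f'Q{q}' for q in order] + ['class']
    table = [[case['answers'].get(q, unanswered) for q in order] + [case['class']]
             for case in cases.values()]
    return header, table


header, table = to_wide([('A', 13, 1), ('A', 23, 1), ('B', 13, 2), ('B', 21, 2)])
```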
Dataset
You will need at least as many cases as there are diseases, if not more for defining multiple subsets of the problem space for the same disease. There are likely cases where some questions are irrelevant or missing, and the model may need to handle such situations.
Logic
In such a case, you might be able to run the questionnaire by starting at the root node and asking questions, moving from node to node until you reach the estimated class.
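A minimal sketch of that root-to-leaf questionnaire, assuming the trained tree has been exported into nested Python dicts. This format is hypothetical: internal nodes hold a question and one branch per answer code, and leaves hold a class. `ask` is any callback that returns the user's answer, so the same loop works for a console, a web form, or a test.

```python
def run_questionnaire(tree, ask):
    """Walk from the root, asking one question per node, until a leaf class is reached."""
    node = tree
    while 'class' not in node:
        answer = ask(node['question'])
        # Follow the branch for this answer; fall back to an optional 'default' subtree.
        node = node['branches'].get(answer, node.get('default'))
        if node is None:
            return None  # this answer is not covered by the tree
    return node['class']


TREE = {'question': 'Q1', 'branches': {
    1: {'class': 'Flu'},
    3: {'question': 'Q2', 'branches': {1: {'class': 'Cold'},
                                       3: {'class': 'No disease'}}}}}

result = run_questionnaire(TREE, {'Q1': 3, 'Q2': 3}.get)
```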
I hope this helps in improving your existing model.
NOTE: I tried your questionnaire, answered No to all of your questions, and strangely ended up with Trichomoniasis. Perhaps your training data could also include a 'No Disease' category.
My nominal qa data builds a decision tree like this, with binary splits.
Actually, this structure doesn't make sense, because the tree grows on only one side. When qa equals 23, the answer would always be '3'. It's irrational.
http://www.fastpic.jp/viewer.php?file=2693704973.jpg
You should first reformat your features so that every possible question A, B, C, D... is a binary feature and your final answer (i.e. what to guess) is the target class, if you want your tree to produce a sequence of questions leading to your answer. Your data will certainly be sparse (many questions without an answer).
By the way, a binary tree is not the right ML structure and algorithm for building an Akinator-like game or 20Q/Guess Who. See some suggestions here: https://stats.stackexchange.com/questions/6074/akinator-com-and-naive-bayes-classifier

The New Villa ACM solution strategy

I am trying to solve the ACM problem The New Villa and I can't figure out how to approach it. It is definitely a graph problem, but the doors and the rooms with switches for other rooms' lights make it confusing to design a generic solution. Can somebody help me define a strategy for this problem?
Also, I'd like a discussion forum for ACM problems; if you know one, please share it.
Thanks
A.S
It seems like a pathfinding problem on states.
You can represent each vertex as a binary vector of size n (which lights are on) plus an identifier recording which room you are currently in, where n is the number of rooms.
G=(V,E) where V = {all pairs (binary vector of size n, room you are in)} and E = {(u,v) | you can get from u to v by pressing a switch in the room you are in, or by moving to an adjacent room whose light is on}
Now you only need to run a search algorithm on the possible paths.
Possible search algorithms:
BFS - simplest to program, though slowest run time
bi-directional BFS - since there is only one target node, a bi-directional search will work here; it is expected to be much faster than BFS
A* - find an admissible heuristic function and run informed A* on the problem. It is harder to program than the rest, but if you find a good heuristic, it will most likely perform much better.
(*) All of the above are both complete [will find a solution if one exists] and optimal [will find the shortest solution, if one exists]
(*) This solution runs in time exponential in the number of rooms, but for d <= 10, as stated in the problem, it should finish in reasonable time.
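A minimal Python sketch of the plain BFS variant on this state graph. It assumes the usual statement of the problem: you start in the first room with only its light on, the goal is to stand in the last room with every other light off, you may only walk into lit rooms, and you may not switch off the light of the room you are in. The lights vector is stored as an integer bitmask, rooms are numbered from 0, and `new_villa_bfs`, `doors` and `switches` are hypothetical names.

```python
from collections import deque


def new_villa_bfs(n, doors, switches):
    """doors: list of (a, b) room pairs; switches[r]: rooms whose light can be toggled from room r.
    Returns the shortest list of actions, or None if the goal is unreachable."""
    adj = [[] for _ in range(n)]
    for a, b in doors:
        adj[a].append(b)
        adj[b].append(a)
    start = (0, 1 << 0)           # (current room, lights bitmask)
    goal = (n - 1, 1 << (n - 1))  # in the last room, only its light on
    parent = {start: None}        # state -> (previous state, action)
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while parent[state] is not None:
                state, action = parent[state]
                path.append(action)
            return path[::-1]
        room, lights = state
        moves = [((room, lights ^ (1 << other)), ('toggle', other))
                 for other in switches[room] if other != room]
        moves += [((nxt, lights), ('move', nxt))
                  for nxt in adj[room] if lights & (1 << nxt)]  # only lit rooms
        for nxt_state, action in moves:
            if nxt_state not in parent:
                parent[nxt_state] = (state, action)
                queue.append(nxt_state)
    return None


path = new_villa_bfs(3, doors=[(0, 1), (1, 2)], switches=[[1, 2], [0], [1]])
```

There are at most n * 2^n states, which is where the exponential running time mentioned above comes from.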

C++ pathfinding with a-star, optimization

I'm wondering if I can optimize my pathfinding code a bit. Let's look at this map:
+ - wall, . - free, S - start, F - finish
.S.............
...............
..........+++..
..........+F+..
..........+++..
...............
A human will look at it and say it's impossible, because the finish is surrounded. But A* MUST check all fields to ascertain that there is no possible route. That's not a problem with small maps, but when I have a 256x265 map, it takes a lot of time to check all points. I think I could stop searching once there are closed nodes around the finish, I mean:
+ - wall, . - free, S - start, F - finish, X - closed node
.S.............
.........XXXXX.
.........X+++X.
.........X+F+X.
.........X+++X.
.........XXXXX.
And I want to stop in this situation (there is no entrance to the "room" containing the finish). I thought of checking h and stopping when none of the open nodes gets any closer, but I'm not sure that's correct. Maybe there is a better way?
Thanks for any replies.
First of all, this problem is better solved with breadth-first search, but I will assume you have a good reason to use A* instead. Still, I recommend you first check the connectivity between S and F with a simple search (breadth-first or depth-first). This will solve your issue.
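A minimal sketch of that connectivity check, in Python rather than C++: a breadth-first search over the question's map that only answers "is F reachable from S?". It assumes 4-connected moves and treats '+' as a wall; `reachable` and `MAP` are illustrative names. When F is unreachable it still visits the whole region around S, but with much less per-node work than A*.

```python
from collections import deque

MAP = [".S.............",
       "...............",
       "..........+++..",
       "..........+F+..",
       "..........+++..",
       "..............."]


def reachable(grid, start, finish):
    """BFS from start over non-wall cells; True if finish is visited. Cells are (row, col)."""
    rows, cols = len(grid), len(grid[0])
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == finish:
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != '+' and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False
```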
Assuming the map doesn't change, you can preprocess it by dividing it into connected components. This can be done with a fast disjoint-set data structure. Then, before launching A*, you check in constant time whether the source and destination belong to the same component. If not, no path exists; otherwise you run A* to find the path.
The downside is that you need n additional bits per cell, where n = ceil(log2 C) and C is the number of connected components. If you have enough memory to afford it, that's OK.
Edit: if you fix n to be small (e.g. one byte) and have more components than that allows (e.g. more than 256 for 8-bit n), you can assign the same number to multiple components. For best results, make sure each component ID has roughly the same number of cells assigned to it.
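The preprocessing step can be sketched in Python, assuming a static map with 4-connected moves: a disjoint-set forest (with path halving; union by rank is left out for brevity) merges each free cell with its free right and lower neighbours, and the roots are then compacted into component IDs 0..C-1, with walls labeled -1. The check before A* is then a single comparison of two labels. `label_components` is a hypothetical helper name.

```python
MAP = [".S.............",
       "...............",
       "..........+++..",
       "..........+F+..",
       "..........+++..",
       "..............."]


def label_components(grid):
    """Return (label grid, component count); walls ('+') get label -1."""
    rows, cols = len(grid), len(grid[0])
    parent = list(range(rows * cols))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == '+':
                continue
            if r + 1 < rows and grid[r + 1][c] != '+':
                union(r * cols + c, (r + 1) * cols + c)
            if c + 1 < cols and grid[r][c + 1] != '+':
                union(r * cols + c, r * cols + c + 1)

    ids, label = {}, [[-1] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != '+':
                label[r][c] = ids.setdefault(find(r * cols + c), len(ids))
    return label, len(ids)


label, count = label_components(MAP)
same_component = label[0][1] == label[3][11]  # S vs. F
```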
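For sharing IDs between components, one simple way to keep the cell counts per ID roughly equal (an assumption; the answer does not prescribe a method) is the greedy rule "largest component first, to the currently lightest ID", sketched below with a heap. With shared IDs, different labels still prove that no path exists, but equal labels no longer guarantee one, so A* still has to run in that case. `assign_shared_ids` is a hypothetical name.

```python
import heapq


def assign_shared_ids(component_sizes, num_ids):
    """component_sizes[k]: cell count of component k.
    Returns ids[k] in 0..num_ids-1, balancing total cells per ID greedily."""
    heap = [(0, i) for i in range(num_ids)]  # (cells assigned so far, id); sorted, so a valid heap
    ids = [0] * len(component_sizes)
    for k in sorted(range(len(component_sizes)), key=lambda k: -component_sizes[k]):
        load, i = heapq.heappop(heap)        # currently lightest ID
        ids[k] = i
        heapq.heappush(heap, (load + component_sizes[k], i))
    return ids


ids = assign_shared_ids([10, 1, 1, 8], num_ids=2)
```

In this example, both IDs end up with 10 cells each.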