SegNet results of train set (test via test_segmentation.py) - python-2.7

I ran SegNet on my own dataset (following the SegNet tutorial) and I see great results via test_segmentation.py.
My problem is that I want to see the real net output, not test_segmentation's own colorization (via classes).
For example, if I have trained a net with 2 classes, after training I want to see not only 2 colors (one per class), but the real net segmentation values ([0.22, 0.19, 0.3, ...]), lighter and darker as the net sees it.
I hope that I explained myself well. Thanks for helping.

You could use a Python script to achieve what you want. Take a look at this script.
The command out = out['argmax'] extracts the raw output, so you can get a segmentation map with 'lighter and darker' values, as you wanted.

When you say the 'real' net color segmentation, I will assume that you mean the probability maps. Effectively, the last layer will have one map for every class, and if you check the function predict in inference.py, they take the argmax, that is, the channel (which represents a class) with the highest probability. If you want to get these maps, you just have to get the data without computing the argmax; something like:
predicted = net.blobs['prob'].data
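For example, a minimal sketch with pycaffe (the file names are placeholders, and the 'prob' blob name assumes a softmax output layer as described above):

import caffe

# placeholder paths; substitute your own model definition and weights
net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)
net.forward()  # run one batch through the network

prob = net.blobs['prob'].data  # shape: (batch, classes, height, width)
# continuous per-pixel values in [0, 1] for class 1 of the first image,
# instead of the hard argmax labels
class1_map = prob[0, 1, :, :]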

I solved it. The solution is to set cmin and cmax to the 0-1 range in the scipy saving method, for example: scipy.misc.toimage(output, cmin=0.0, cmax=1.0).save('/path/.../image.png')


UserWarning in pymc3: What does reparameterize mean?

I built a pymc3 model using the DensityDist distribution. I have four parameters, of which 3 use Metropolis and one uses NUTS (this is chosen automatically by pymc3). However, I get two different UserWarnings:
1. Chain 0 contains a number of diverging samples after tuning. If increasing target_accept does not help, try to reparameterize.
May I know what reparameterize means here?
2. The acceptance probability in chain 0 does not match the target. It is , but should be close to 0.8. Try to increase the number of tuning steps.
Digging through a few examples, I used 'random_seed', 'discard_tuned_samples', 'step = pm.NUTS(target_accept=0.95)' and so on, and got rid of these UserWarnings. But I couldn't find details of how these parameter values are decided. I am sure this has been discussed in various contexts, but I am unable to find solid documentation for it. I was doing trial and error as below:
with patten_study:
    # SEED = 61290425  # 51290425
    step = pm.NUTS(target_accept=0.95)
    trace = pm.sample(step=step)  # pm.sample(4000, tune=10000, step=step, discard_tuned_samples=False, random_seed=SEED)
I need to run these on different datasets, so I am struggling to fix these parameter values for each dataset I am using. Is there a way to set these values, check the outcome (whether there are any UserWarnings, and then try other values), and run this in a loop?
Pardon me if I am asking something stupid!
In this context, reparametrization basically means finding a different but equivalent model that is easier to compute. There are many things you can do, depending on the details of your model, for example (the second point is sketched in code after this list):
- Instead of using a Uniform distribution, use a Normal distribution with a large variance.
- Change from a centered hierarchical model to a non-centered one.
- Replace a Gaussian with a Student-T.
- Model a discrete variable as a continuous one.
- Marginalize variables, like in this example.
Whether these changes make sense or not is something you should decide, based on your knowledge of the model and the problem.
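A minimal sketch of the centered-versus-non-centered change in pymc3 (an assumed toy 8-group hierarchical model for illustration, not your DensityDist model):

import pymc3 as pm

# Centered: theta is drawn directly from Normal(mu, sigma). When sigma
# becomes small the posterior has a funnel-shaped geometry, which is
# where NUTS tends to produce the diverging-samples warning.
with pm.Model() as centered:
    mu = pm.Normal('mu', mu=0, sd=5)
    sigma = pm.HalfNormal('sigma', sd=5)
    theta = pm.Normal('theta', mu=mu, sd=sigma, shape=8)

# Non-centered: draw a standard normal and scale it. This is an
# equivalent model, but its geometry is much easier to sample.
with pm.Model() as non_centered:
    mu = pm.Normal('mu', mu=0, sd=5)
    sigma = pm.HalfNormal('sigma', sd=5)
    theta_raw = pm.Normal('theta_raw', mu=0, sd=1, shape=8)
    theta = pm.Deterministic('theta', mu + sigma * theta_raw)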

How to use uncertainties to weight residuals in a Savitzky-Golay filter.

Is there a way to incorporate the uncertainties on my data set into the result of the Savitzky-Golay fit? Since I am not passing this information into the function, I assume that it is simply calculating the 'best fit' via an unweighted least-squares process. I am currently working with data that has non-uniform uncertainty, so the fit could be improved by including the errors that I have for my main dataset.
The Wikipedia page for the Savitzky-Golay filter suggests how I might go about altering the calculation of the fit coefficients, and I am staring at the code for scipy.signal.savgol_filter, but I cannot get my head around what I need to adjust so that it will do what I want.
Are there any ready-made weighted SG filters floating about? I find it hard to believe that no one else has ever needed this tool in Python, but maybe I have missed something.
Check out this Python module: https://github.com/surhudm/savitzky_golay_with_errors
This python script improves upon the traditional Savitzky-Golay filter by accounting for errors or covariance in the data. The inputs and arguments are all modelled after scipy.signal.savgol_filter.
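If you would rather see the idea itself, here is a minimal sketch of a weighted Savitzky-Golay-style smoother, following the Wikipedia description rather than scipy's implementation: a weighted least-squares polynomial fit in a sliding window, with weights 1/sigma as numpy.polyfit expects (this is an illustration, not the module above):

import numpy as np

def weighted_savgol(y, sigma, window=7, order=2):
    # fit a polynomial by weighted least squares around each point and
    # evaluate it at the window center; windows shrink near the edges;
    # sigma holds the (positive) per-point uncertainties
    y, sigma = np.asarray(y, float), np.asarray(sigma, float)
    half = window // 2
    out = np.empty(len(y))
    for i in range(len(y)):
        lo, hi = max(0, i - half), min(len(y), i + half + 1)
        xs = np.arange(lo, hi) - i  # positions relative to the center
        coeffs = np.polyfit(xs, y[lo:hi], order, w=1.0 / sigma[lo:hi])
        out[i] = np.polyval(coeffs, 0.0)  # local fit at the center point
    return out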
Matlab function sgolayfilt supports weights. Check the documentation.

Data mining with Weka

I am learning how to do data mining and I am using this data set from UCI's website.
http://archive.ics.uci.edu/ml/datasets/Forest+Fires
The problem I am encountering is how to deal with the area class. My understanding from the description is that I need to apply ln(x+1) to area using AddExpression.
Am I going in the correct direction with this? Or are there other filters I should investigate? Thank you.
I'll try to answer your question based on the little information you provide. I haven't worked with the forest-fires data set, but by inspection I see that the class attribute "area" often has the value 0. You probably can't simply filter out these rows with area = 0: your dataset might become too small.
I think you are asked to perform regression of some attribute(s) against "log(area)" in order to linearize it. However, when you try to calculate the log of the area, values such as log(0) are a problem, and values between 0 and 1 might also be problematic.
So a common fix is to add 1 to the value of "area". This introduces a small systematic error, but it removes all 0-values, and you can still derive useful models from your log(x+1)-transformed dataset.
And yes, in Weka you do this via "Preprocess" with the AddExpression filter, using the expression log(x+1). This creates a new attribute; you can then remove the old area attribute.
Of course, in interpreting your model, you should be aware of the transformation. If you just want to find out which independent attributes are significant in your linear regression model, I'd say the transformation does not matter: the data points are just shifted a little bit.
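For what it's worth, the same transform outside Weka (with made-up area values) shows why the +1 matters: log(0) is undefined, while log1p(0) == 0.

import numpy as np

area = np.array([0.0, 0.36, 4.88, 196.0])  # illustrative values only
log_area = np.log1p(area)  # ln(x + 1)
print(log_area)  # [0.      0.3075  1.7716  5.2832]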

How to normalize a sequence of numbers?

I am working on a user behavior project. Based on user interaction I have got some data. There is a nice sequence which smoothly increases and decreases over time, but there are little discrepancies, which are very bad. Please refer to the graph below:
You can also find the data here:
2.0789 2.09604 2.11472 2.13414 2.15609 2.17776 2.2021 2.22722 2.25019 2.27304 2.29724 2.31991 2.34285 2.36569 2.38682 2.40634 2.42068 2.43947 2.45099 2.46564 2.48385 2.49747 2.49031 2.51458 2.5149 2.52632 2.54689 2.56077 2.57821 2.57877 2.59104 2.57625 2.55987 2.5694 2.56244 2.56599 2.54696 2.52479 2.50345 2.48306 2.50934 2.4512 2.43586 2.40664 2.38721 2.3816 2.36415 2.33408 2.31225 2.28801 2.26583 2.24054 2.2135 2.19678 2.16366 2.13945 2.11102 2.08389 2.05533 2.02899 2.00373 1.9752 1.94862 1.91982 1.89125 1.86307 1.83539 1.80641 1.77946 1.75333 1.72765 1.70417 1.68106 1.65971 1.64032 1.62386 1.6034 1.5829 1.56022 1.54167 1.53141 1.52329 1.51128 1.52125 1.51127 1.50753 1.51494 1.51777 1.55563 1.56948 1.57866 1.60095 1.61939 1.64399 1.67643 1.70784 1.74259 1.7815 1.81939 1.84942 1.87731
1.89895 1.91676 1.92987
I want to smooth out this sequence. The technique should be able to eliminate points with the characteristics of X and Y, i.e. errors in an otherwise monotonically increasing or decreasing run.
If not eliminate, the technique should be able to shift them so that the series is not affected by the errors.
What I have tried and failed:
I tried to test the difference between values. In some special cases it works, but for a sequence like the one presented here, the distance between numbers is not such that I can cut out the errors.
I tried applying a threshold X, so that a change is only accepted if it exceeds X; otherwise the point is mapped to the previous point. Here I have great trouble deciding on the value of X: because this is based on user interaction, I am not really in control of it. If the user interaction is such that its plot would be a zigzag pattern, I end up in a 'no user movement detected at all' situation.
Please share the techniques that you are aware of.
PS: The data made available in this example is a particular case. There is no typical pattern in which the numbers are going to occur, but we expect some range to be continuous across all the examples. The solution I am seeking is generic.
I do not know how much effort you want to invest in this problem, but if you want theoretical guarantees, topological persistence seems well adapted to your problem imho.
Basically, with that method you can filter local maxima/minima by fixing a scale, and there are theoretical proofs saying that if your sampling is close to your function, then you extract the correct number of maxima with persistence.
You can see these slides (mainly pages 7-9) to get an idea of the method.
Basically, if you take your points as a landscape and imagine a water level starting from the maximum height and decreasing, you get some peaks.
Every peak has a time when it is born, which is when it emerges, and a time when it dies, which is when it merges with a higher peak. A persistence diagram then pictures a point for every peak, whose x/y coordinates are its times of birth and death (by convention the first peak never dies and is not shown).
If a peak is a global maximum, it will be further from the diagonal in the persistence diagram than a local-maximum peak. To remove local maxima, you remove the peaks close to the diagonal. There are four local maxima in your example, as you can see in the persistence diagram of your data (thanks for providing the data btw), and two global ones (the first peak is not pictured in a persistence diagram).
Even if you add noise to your data, you will still get a very decent persistence diagram that will allow you to filter out local maxima as you want.
Please ask if you want more details or references.
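For illustration, here is a minimal self-contained sketch of 1D persistence for peaks (a toy implementation written for this answer, not an existing library): sweep the values from highest to lowest with a union-find; a peak is born when it emerges and dies when it merges into a higher peak, and peaks whose persistence (birth minus death) falls below your chosen scale are treated as noise.

def peak_persistence(values):
    # visit indices from the highest value down (the falling water level)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    parent = {}   # union-find over the indices that have emerged so far
    birth = {}    # birth height of each component's peak
    pairs = []    # (persistence, peak_index) for every peak that dies

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in order:
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if j in parent:  # neighbor already emerged
                ri, rj = find(i), find(j)
                if ri != rj:
                    # the lower-born peak dies at the current level
                    lo, hi = (ri, rj) if birth[ri] < birth[rj] else (rj, ri)
                    if birth[lo] > values[i]:  # a genuine peak, not a regular point
                        pairs.append((birth[lo] - values[i], lo))
                    parent[lo] = hi
    # the surviving component is the global maximum; it never dies
    return sorted(pairs, reverse=True)

# keep only the peaks whose persistence exceeds a chosen scale, e.g.:
# significant = [i for p, i in peak_persistence(data) if p > 0.05]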
Since you cannot decide on a cut-off frequency, and not even on the filter you want to use, I would implement several and let the user set the parameters.
The first thing that comes to mind is a running average, and you can see that there are already many things to tune to get different outputs.
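For instance, a minimal running-average sketch where the window size is the parameter you would expose to the user (note the zero-padded edges, one of the many choices to make):

import numpy as np

def running_average(y, window=5):
    # moving average via convolution; mode='same' keeps the length,
    # at the cost of edge effects from the implicit zero padding
    kernel = np.ones(window) / window
    return np.convolve(y, kernel, mode='same')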

Sequential/online kmeans clustering: how does it work? Existing code?

I'm a little confused about online kmeans clustering. I know that it allows me to cluster with just one data point at a time. But is this all limited to one session? Suppose that I have a bunch of data clustered via this method and I get the clustered result: would I be able to add more data to the clusters in the future?
I've also been looking for implementations of this, but to no avail. Does anyone know of any?
Update:
To clarify further, here is how my code works right now:
1. Images are taken from the live video feed; once enough pictures are saved, get the kmeans of their SIFT features.
2. Repeat step 1 with a new batch of live-feed pictures and get the kmeans again, then combine the kmeans vectors with the previous ones, like [A B].
You can see that this is bad, because I quickly get far too many clusters, and each batch of clusters will definitely overlap with another batch.
What I want:
1. Images are taken from the live video feed; once the pics are saved, get the kmeans.
2. Repeat step 1 and get the kmeans again, which updates and adds new clusters to the previous clusters.
Nothing that I've seen could accommodate that, unless I'm just not understanding them correctly.
If you look at the original (!) publications, the method proposed by MacQueen - where the name k-means comes from - was in fact an online algorithm. I'm not sure if MacQueen did multiple passes over the data to improve the result. I believe he used a single pass, and objects would never be reassigned to a different cluster. If so, it was already an online algorithm!
Means are commonly computed as sum / count. This is not very sensible from a numerical point of view. E.g. in the classic Knuth book you can find a method for incrementally updating means. Wikipedia has it also.
Things get slightly more complicated once you actually want to reassign earlier points. But usually in a streaming context you do not know the previous points, so you cannot do that anyway.
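To sketch that idea (a toy MacQueen-style updater written for this answer, not an existing library; you would seed the initial centroids from your first batch, e.g. from a regular kmeans run):

import numpy as np

class SequentialKMeans:
    def __init__(self, initial_centroids):
        # seed with centroids from a first batch; each starts with count 1
        self.centroids = np.asarray(initial_centroids, dtype=float)
        self.counts = np.ones(len(self.centroids))

    def update(self, x):
        # assign the new point to its nearest centroid...
        x = np.asarray(x, dtype=float)
        k = int(np.argmin(np.linalg.norm(self.centroids - x, axis=1)))
        # ...and nudge that centroid by the incremental mean formula
        self.counts[k] += 1
        self.centroids[k] += (x - self.centroids[k]) / self.counts[k]
        return k

To also grow new clusters over time, as the question asks, you could spawn a fresh centroid whenever a point lies farther than some chosen threshold from every existing centroid.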