Achieving Mutability When Mixing Primitives and Cocoa Collections - c++

Okay, I think I might be over-complicating this issue but I truly am stuck. Basically, I am trying to model a weight set, specifically an olympic weight set. So I have the bar which is 45 lbs, then I have 2 weights of 2.5 lbs, 4 of 5 lbs, and then 2 of 10, 25, 35, and 45 respectively. This makes a total of 300 lbs.
bar = 45 lbs
2 of 2.5
4 of 5
2 of 10
2 of 25
2 of 35
2 of 45
I want to model this weight set so that I have this information: the weight and the quantity of weights I have. I know I could hard-code this but I eventually want to let the user enter how many of each weight they may have.
Anyways, originally I thought I could simply have an NSDictionary with the key being the weight, such as 35, and the value being the quantity.
Naturally I cannot store primitives in an NSDictionary or other Cocoa collection, so I have to encapsulate each integer in an NSNumber. However, the point of my modeling this weight set is so that I can simulate the use of certain weights. For example, if I use a 35 lbs. weight that takes 2 off (one for each side), so I have to go and edit the value for the 35 lbs. weight to reflect the fact that I have deducted 2 from the quantity.
This involves the tedious task of unboxing the NSNumber, converting back to a primitive, doing the math, and then re-boxing into an NSNumber and assigning that new result to the appropriate location in the NSDictionary. After searching around a bit, I confirmed my initial premonition that this was not a good idea.
So I have a couple questions. First of all, is there a better way of modeling a weight set aside from using a dictionary-style solution? If not, what is the suggested way to go about doing this? Do I have to leave the cocoa-realm and resort to using some sort of C++ STL template such as a map?
I have seen some information on NSDecimalNumber, should I just use that?
Like I said, I wouldn't be surprised if I am over-complicating this. I would really appreciate any help, thanks.
EDIT: I am beginning to think that the weight set 'definition' as described should indeed be immutable, as it is a definition after all. Then when I use a certain weight, I can add to some sort of tally. The thing is that the tally will also be some form of collection whose values I will be modifying (adding to), so that I can correlate it to the specific weight. So I guess I am in the same problem.
I think where I am trying to get at is creating a 'clone' so to speak of the weight set definition which I can easily modify (to simulate the usage of individual weights).
Sorry, I'm burned out.

Storing this in a dictionary isn't a natural fit. I think the best approach would be to make a Weight class that represents the weights, and stick them in an NSCountedSet. You can get the individual kinds of Weight and the counts for each kind, and you can get the weight of the whole set with [weightSet valueForKeyPath:#"#sum.weightInPounds"] (assuming the Weights have a weightInPounds property that represents how heavy they are).
You could use NSNumbers in the NSCountedSet and sum them with #sum.integerValue if you wanted, but it seems a bit awkward to me. At any rate, NSCountedSet is definitely a more natural collection than an NSDictionary for storing — well, a counted set.

There's nothing wrong with storing your numbers in an NSDictionary! The question you referenced was referring to complicated, frequent math. Converting from NSNumber and back is slow compared to simple int addition, but is still super-fast compared to human perception. I think your dictionary idea is EDIT: not as good as Chuck's NSCountedSet idea. :)

Related

Linear Programming - Re-setting a variable based on it's cumulative count

Detailed business problem:
I'm trying to solve a production scheduling business problem as below:
I have two plants producing FG A and B respectively.
Both the products consume the same Raw Material x
I need to create a 30 day production schedule looking at the Raw Material availability.
FG A and B can be produced if there is sufficient raw material available on the day.
After every 6 days of production the plant has to undergo maintenance and the production on that day will be zero.
Objective is to maximize the margin looking at the day level Raw material available and adhere to the production constraint (i.e. shutdown after every 6th day)
I need to build a linear programming to address the below problem:
Variable y: (binary)
variable z: cumulative of y
When z > 6 then y = 0. I also need to reset the cumulation of z after this point.
Desired output:
How can I build the statement to MILP constraint. Are there any techniques for solving this problem. Thank you.
I think you can model your maintenance differently. Just forbid any sequences of 7 ones for y. I.e.
y[t-6]+y[t-5]+y[t-4]+y[t-3]+y[t-2]+y[t-1]+y[t] <= 6 for t=1,..,T
This is easier than using your accumulator. Note that the beginning needs some attention: you can use historic data for this. I.e., at t=1, the values for t=0,-1,-2,.. are known.
Your accumulator approach is not inherently wrong. We often use it to model inventory. An inventory capacity is a restriction on how large the accumulated inventory can be.

Data mining with Weka

I am learning how to do data mining and I am using this data set from UCI's website.
http://archive.ics.uci.edu/ml/datasets/Forest+Fires
The problem I am encountering is how to deal with the area class. My understanding from the description is that I need to apply ln(x+1) to area using AddExpression.
Am I going in the correct direction with this? Or are there other filters I should investigate? Thank you.
I try to answer your question based on the little information you provide. And I haven't worked with the forest-fires data set, but by inspection I see that the classifier attribute "area" often has the value 0. Maybe you can't simply filter out these rows with Area = 0. Your dataset might become too small, or whatnot.
I think you are asked to perform regression of some attribute(s) against "log(area)" in order to linearize it. However,when you try to calculate the log of the Area, values such as log(0) are a problem. values between 0 and 1 might also be problematic.
So a common fix is to add 1 to the value of "Area". This introduces a systematic error, but it is small, and it removes all 0-values, and you can still derive useful models from your log(x+1)-transformed dataset.
And yes, in Weka you do this by "Preprocess"/ AddExpression(x+1). This creates a new attribute. Then you might remove the old area attribute.
Of course, in interpreting your model, you should be aware of the transformation. If you just want to find out what the significant independent attributes are in your linear regression model, I'd say the transformation does not matter. The data points are just shifted a little bit.

How to normalize sequence of numbers?

I am working user behavior project. Based on user interaction I have got some data. There is nice sequence which smoothly increases and decreases over the time. But there are little discrepancies, which are very bad. Please refer to graph below:
You can also find data here:
2.0789 2.09604 2.11472 2.13414 2.15609 2.17776 2.2021 2.22722 2.25019 2.27304 2.29724 2.31991 2.34285 2.36569 2.38682 2.40634 2.42068 2.43947 2.45099 2.46564 2.48385 2.49747 2.49031 2.51458 2.5149 2.52632 2.54689 2.56077 2.57821 2.57877 2.59104 2.57625 2.55987 2.5694 2.56244 2.56599 2.54696 2.52479 2.50345 2.48306 2.50934 2.4512 2.43586 2.40664 2.38721 2.3816 2.36415 2.33408 2.31225 2.28801 2.26583 2.24054 2.2135 2.19678 2.16366 2.13945 2.11102 2.08389 2.05533 2.02899 2.00373 1.9752 1.94862 1.91982 1.89125 1.86307 1.83539 1.80641 1.77946 1.75333 1.72765 1.70417 1.68106 1.65971 1.64032 1.62386 1.6034 1.5829 1.56022 1.54167 1.53141 1.52329 1.51128 1.52125 1.51127 1.50753 1.51494 1.51777 1.55563 1.56948 1.57866 1.60095 1.61939 1.64399 1.67643 1.70784 1.74259 1.7815 1.81939 1.84942 1.87731
1.89895 1.91676 1.92987
I would want to smooth out this sequence. The technique should be able to eliminate numbers with characteristic of X and Y, i.e. error in mono-increasing or mono-decreasing.
If not eliminate, technique should be able to shift them so that series is not affected by errors.
What I have tried and failed:
I tried to test difference between values. In some special cases it works, but for sequence as presented in this the distance between numbers is not such that I can cut out errors
I tried applying a counter, which is some X, then only change is accepted otherwise point is mapped to previous point only. Here I have great trouble deciding on value of X, because this is based on user-interaction, I am not really controller of it. If user interaction is such that its plot would be a zigzag pattern, I am ending up with 'no user movement data detected at all' situation.
Please share the techniques that you are aware of.
PS: Data made available in this example is a particular case. There is no typical pattern in which numbers are going to occure, but we expect some range to be continuous with all the examples. Solution I am seeking is generic.
I do not know how much effort you want to involve in this problem but if you want theoretical guaranties,
topological persistence seems well adapted to your problem imho.
Basically with that method, you can filtrate local maximum/minimum by fixing a scale
and there are theoritical proofs that says that if you sampling is
close from your function, then you extracts correct number of maximums with persistence.
You can see these slides (mainly pages 7-9 to get the idea) to get an idea of the method.
Basically, if you take your points as a landscape and imagine a watershed starting from maximum height and decreasing, you have some picks.
Every pick has a time where it is born which is the time where it becomes emerged and a time where it dies which is when it merges with an higher pick. Now a persistence diagram pictures a point for every pick where its x/y coordinates are its time of birth/death (by assumption the first pick does not die and is not shown).
If a pick is a global maximal, then it will be further from the diagonal in the persistence diagram than a local maximum pick. To remove local maximums you have to remove picks close to the diagonal. There are fours local maximums in your example as you can see with the persistence diagram of your data (thanks for providing the data btw) and two global ones (the first pick is not pictured in a persistence diagram):
If you noise your data like that :
You will still get a very decent persistence diagram that will allow you to filter local maximum as you want :
Please ask if you want more details or references.
Since you can not decide on a cut off frequency, and not even on the filter you want to use, I would implement several, and let the user set the parameters.
The first thing that I thought of is running average, and you can see that there are so many things to set, to get different outputs.

Most efficient way to process complex histogram data?

I'm currently implementing a histogram that will show a very large scale data using Qt and I have some doubts about which data structure(s) I should be using for my problem. I will be displaying the amount of queries received from users of an application and the way I should display is as follows -in a single application that will show different histograms upon clicking different "show me this data etc." buttons-
1) Display the histogram of total queries per every month -4 months of data here, I
kept four variables and incremented them as I caught queries belonging to those months
in the CSV file-
2) Display the histogram of total queries per every single day in a selected month -I was thinking of using 4 QVectors to represent the months for this one, incrementing every element of the vector (day), as I come by that specific day -e.g. the vector represents the month of August and whenever I come across a data with 2011-08-XY , I will increment the (XY + 1)th element of that vector by 1- my second alternative is to use 4 QLinkedList's for the sake of better complexity but I'm not sure if the ways I've come up with are efficient enough and I'm willing to listen to any other idea.
3) Here's where things get a bit complicated. Display the histogram of total queries per every hour in a selected day and month. The data represented is multiplied in a vast manner and I don't know which data structure -or combination of structures- I should use to implement this one. A list of lists perhaps?
Any ideas on my problems at 2) and 3) would be helpful, Thanks in advance.
Actually, it shouldn't be too unmanageable to always do queries per hour. Assuming that the number of queries per hour is never greater than the maximum int value, that's only 24 ints per day = 32 bits or 64 depending on your machine. Assuming 32 bits, then you could get up to 28 years worth of data per MB.
There's no need to transfer the month/year - your program can work that out. Just assign hour 0 to the earliest point in your data, which you keep as a constant, then work out the date based on hours passed since then.
This avoids having to have a list of lists or anything fancy - just use an array where each address contains the number of hours since hour 0, and the number of queries for that hour.
Why don't you simply use a classical database?
When you start asking these kind of question I think it is a good time to consider a more robust structure.There are multiple data structures implemented inside any DB, optimized either for different access type. You should considerate at least lookup, insertion, deletion, range queries. There is no structure which is better than the others in all costs, so there is always a trade-off.
Qt has some database classes you can use. I never used the Qt SQL library, but I think you should give it a shot. Fortunately, there is a Qt SQL programming guide at the end of the page linked.

CLI/C++ How to store more than 15 digit float number?

For a school project, I have a simple program, which compares 20x20 photos. I put 20 photos, and then i put 21th photo, which is compared to existing 20, and pops up the answer, which photo i did insert (or which one is most similar). The problem is, my teacher wanted me to use nearest neighbour algorithm, so i am counting distance from every photo. I got everything working, but the thing is, if photos are too similar, i got the problem with saying which one is closer to my one. For example i get these distances with 2 different photos (well, they are ALMOST the same):
0 distance: 1353.07982026191
1 distance: 1353.07982026191
It is 15 digits already, and i am using double type. I was reading that long double is the same. Is there any "easy" way to store numbers with more than 15 digits and do math on them?
I count distance using Euclidean distance
I just need to be more precise, or thats limit i probably wont pass here, and i should talk to my teacher i cant compare such similar photos?
I think you need this: gmplib.org
There's a guide how to install this library on this site too.
And here's article about floats: http://gmplib.org/manual/C_002b_002b-Interface-Floats.html#C_002b_002b-Interface-Floats
Maybe you could use an algebraic approach.
Let us assume that you are trying to calcuate if vector x is closer to a or b. What you need to calculate is the sign of
d2(x, a) - d2(x, b)
Which becomes (I'll omit some passages for brevity)
and then
Which only contains differences between values which should be very similar. Summing over such small values should yield a better precision than working on the aggregate.