Dealing with vector of list of struct - c++

I read a text file into a structure of this type:
std::vector< std::list< struct> >
My data has the form:
1 0.933 0.9 2 0.865 0.6 3 0.919 0.2 4 0.726 0.5
3 0.854 0.6 5 0.906 0.2 6 0.726 0.5
1 0.906 0.2 2 0.726 0.5
1 0.933 0.2 2 0.865 0.5 4 0.919 0.1 5 0.726 0.5 6 0.933 0.9
Each line consists of some integers, and each integer is followed by two real numbers.
For example, in the first line the integer 1 has the two real numbers 0.933 and 0.9.
This is the code for scanning the data:
struct Lines1 {
int Item;
float Prob;
float W;
};
std::istream& operator>>(std::istream &is, Lines1 &d)
{
return is >> d.Item >> d.Prob>> d.W;
}
float threshold;
std::map<int, float> FFISupp;
std::map <int, vector <int> > AssociatedItem;
std::vector<std::list<Lines1>> data;
void ScanData()
{
ifstream in;
in.open(dataFile);
std::string line;
int i = 0;
while (std::getline(in, line))
{
std::stringstream Sline1(line);
std::stringstream ss(line);
std::list<Lines1 > inner;
Lines1 info;
while (ss >> info)
{
inner.push_back(info);
}
data.push_back(inner);
}
}
Now I have successfully stored the contents of the text file in data, which is a vector of lists of structs,
BUT I have not managed to work with this vector of lists of structs (data) to do the following:
1- Create a map named FFISupp such that:
FFISupp (key = each of the 6 distinct integers in the data, value = the sum of the probabilities for that integer)
For example:
since the integer 1 appears in the data set at three positions, its total probability is 0.933 + 0.906 + 0.933 = 2.772
==> The result of FFISupp
FFISupp (1, 2.772)
FFISupp (2, 2.456)
.
.
FFISupp (6,1.659)
2- Create a map named AssociatedItem such that:
AssociatedItem (key = each of the 6 distinct integers, value = the items associated with this integer)
"Associated items" means the other integers that appear in the dataset together with it; for example, the integer 1 appears with the integers (2,3,4,5,6):
AssociatedItem (1, (2,3,4,5,6))
AssociatedItem (2, (1,3,4,5,6))
AssociatedItem (3, (1,2,4,5,6))
AssociatedItem (4, (1,2,3,5,6))
AssociatedItem (5, (1,2,3,4,6))
AssociatedItem (6, (1,2,3,4,5))
3- Delete from FFISupp every item whose sum of probabilities is < threshold,
and update both FFISupp and AssociatedItem accordingly.
For example, if the two items 3 and 6 have total probabilities < threshold, then FFISupp becomes:
FFISupp (1, 2.772)
FFISupp (2, 2.456)
FFISupp (4, 1.645)
FFISupp (5,1.632)
also update AssociatedItem
AssociatedItem (1, (2,4,5))
AssociatedItem (2, (1,4,5))
AssociatedItem (4, (1,2,5))
AssociatedItem (5, (1,2,4))
This is my attempt:
void Pass()
{
for (unsigned i = 0; i < data.size() - 1; ++i)
{
for (unsigned k = 0; i < data[i].size() - 1; ++k)
{
for (unsigned l = k + 1; l < data[i].size(); ++l)
{
auto p1 = make_pair(data[i][k].Item, data[i][k].Prob);
FFISupp[p1.first] += p1.second;
AssociatedItem[data[i][k].Item].push_back(data[i][l].Item);
}
}
}
/* update FFISupp and AssociatedItem by erasing all items with <= Min_Threshold */
std::map<int, float> ::iterator current = FFISupp.begin();
std::map<int, vector <int>> ::iterator current2 = AssociatedItem.begin();
while (current != FFISupp.end())
{
if (current->second <= threshold)
{
current = FFISupp.erase(current);
while (current2 != AssociatedItem.end())
{
current2 = AssociatedItem.erase(current2);
++current2;
}
}
else
++current;
}
}

As I only understand what you meant in stage #1, I'll help only with that.
Your code should iterate over all the elements of the data vector, so the stop condition should simply be i < data.size().
Writing data.size() - 1 reminds me of C arrays. A std::vector is not an array, and by iterating only up to data.size() - 1 you lose the last item.
I don't understand what the goals of stage #2 and stage #3 are.
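For what it's worth, here is one possible reading of all three stages as a minimal sketch. It is not the asker's code: the Summary struct and the summarize function are hypothetical names, a std::set is swapped in for vector<int> so that associated items are not stored twice, and the pruning uses "< threshold" as in the text of the question rather than "<=" as in the posted code.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <set>
#include <vector>

struct Lines1 {
    int Item;
    float Prob;
    float W;
};

// Hypothetical container for the two maps from the question.
struct Summary {
    std::map<int, float> support;             // plays the role of FFISupp
    std::map<int, std::set<int>> associated;  // plays the role of AssociatedItem
};

Summary summarize(const std::vector<std::list<Lines1>>& data, float threshold)
{
    assert(threshold >= 0.0f);  // assumption: probabilities are non-negative
    Summary s;

    // Stages 1 and 2: range-based loops work for std::list, which has no operator[].
    for (const auto& row : data) {
        for (const auto& a : row) {
            s.support[a.Item] += a.Prob;
            auto& others = s.associated[a.Item];
            for (const auto& b : row)
                if (b.Item != a.Item)
                    others.insert(b.Item);
        }
    }

    // Stage 3: drop weak items from both maps and from every association set.
    for (auto it = s.support.begin(); it != s.support.end();) {
        if (it->second < threshold) {
            const int item = it->first;
            s.associated.erase(item);
            for (auto& kv : s.associated)
                kv.second.erase(item);
            it = s.support.erase(it);  // erase returns the next valid iterator
        } else {
            ++it;
        }
    }
    return s;
}
```

Calling summarize(data, threshold) after ScanData() would return both maps already pruned; a threshold of 0 keeps every item.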

Related

how to calculate multiset of elements given probability on each element?

Let's say I have a total number
tN = 12
and a set of elements
elem = [1,2,3,4]
and a probability for each element to be taken
prob = [0.0, 0.5, 0.75, 0.25]
I need to get a random multiset of these elements, such that
the taken elements reflect the probabilities
the sum of the taken elements is tN
with the example above, here's some possible outcome:
3 3 2 4
2 3 2 3 2
3 4 2 3
2 2 3 3 2
3 2 3 2 2
At the moment, the maximum tN will be 64, and the elements are the ones above (1,2,3,4).
Is this a knapsack problem? How would you solve it easily? Both "on the fly" and "pre-calculated" approaches are allowed (depending on the computation time). I'm doing it for a C++ app.
Goal: I don't need exactly these percentages in the final sequence. I just want an element with a higher probability to have a higher chance of appearing in the final sequence. In short: in the example, I prefer sequences with more 3s and 2s than 4s, and no 1s.
Here's an attempt to select elements according to their probabilities, over 10 draws:
Randomizer randomizer;
int tN = 12;
std::vector<int> elem = {2, 3, 4};
std::vector<float> prob = {0.5f, 0.75f, 0.25f};
float probSum = std::accumulate(begin(prob), end(prob), 0.0f, std::plus<float>());
std::vector<float> probScaled;
for (size_t i = 0; i < prob.size(); i++)
{
probScaled.push_back((i == 0 ? 0.0f : probScaled[i - 1]) + (prob[i] / probSum));
}
for (size_t r = 0; r < 10; r++)
{
float rnd = randomizer.getRandomValue();
int index = 0;
for (size_t i = 0; i < probScaled.size(); i++)
{
if (rnd < probScaled[i])
{
index = i;
break;
}
}
std::cout << elem[index] << std::endl;
}
which gives, for example, this choice:
3
3
2
2
4
2
2
4
3
3
Now I just need to build a multiset that sums up to tN. Any tips?
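One possible way to approach that last step, offered as a sketch rather than a definitive answer: keep drawing with the given weights, but only among the elements that still fit into the remaining total, and start over if nothing fits. The function name drawMultiset is hypothetical, and std::mt19937 with std::discrete_distribution stands in for the Randomizer class, whose interface is not shown. Restricting the draw to elements that fit slightly skews the distribution near the end, which seems acceptable given the stated goal.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Draws elements according to `prob` until they sum to exactly `target`.
// Returns an empty vector if no attempt succeeds (e.g. target unreachable).
std::vector<int> drawMultiset(const std::vector<int>& elem,
                              const std::vector<double>& prob,
                              int target, std::mt19937& rng)
{
    assert(elem.size() == prob.size());
    for (int attempt = 0; attempt < 10000; ++attempt) {
        std::vector<int> picked;
        int remaining = target;
        while (remaining > 0) {
            // Collect only the elements that still fit and have a positive weight.
            std::vector<std::size_t> candidates;
            std::vector<double> weights;
            for (std::size_t i = 0; i < elem.size(); ++i) {
                if (elem[i] > 0 && elem[i] <= remaining && prob[i] > 0.0) {
                    candidates.push_back(i);
                    weights.push_back(prob[i]);
                }
            }
            if (candidates.empty())
                break;  // dead end: nothing fits, restart the attempt
            std::discrete_distribution<int> dist(weights.begin(), weights.end());
            const int value = elem[candidates[dist(rng)]];
            picked.push_back(value);
            remaining -= value;
        }
        if (remaining == 0)
            return picked;
    }
    return {};
}
```

With elem = {1,2,3,4}, prob = {0.0, 0.5, 0.75, 0.25} and tN = 12, every returned sequence sums to 12 and never contains a 1, since its weight is zero.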

Find a selection of adjacent elements whose sum is 10

Description:
Given a matrix [x][y] with x rows and y columns, filled with random numbers from 0 to 5 inclusive.
Finding a solution: a solution is a set of matrix elements that are adjacent to each other (diagonal neighbours do not count) and whose sum is 10. Each element of the matrix can be used only once per solution. A solution may contain any number of elements. A solution must end with a number other than zero.
Example:
Given
0 1 2 3 4 5
1 2 3 4 5 0
2 3 4 5 1 2
Solution 1 : (1 - 2 - 3 - 4)
0 **1** 2 3 4 5
1 **2** 3 4 5 0
2 **3** **4** 5 1 2
I tried to do something like this, but it is wrong; I don't know when I must stop.
Solution is a class that contains pairs of indexes. Please help me.
void xxx(int colCount, int rowCount, int currentRow, int currentCol, int** matrix, int sum, Solution *solution, int solCount) {
sum += matrix[currentRow][currentCol];
matrix[currentRow][currentCol] = -1;
if(sum > 10){
sum -= matrix[currentRow][currentCol];
return;
} else if(sum == 10){
solution[solCount].additem(currentRow, currentCol);
return xxx(5,5,currentRow - 1, currentCol, matrix, sum, solution, solCount+1);
} else {
//UP
if( currentRow > 0 && matrix [currentRow - 1][currentCol] != -1){
xxx(5,5,currentRow - 1, currentCol, matrix, sum, solution,solCount);
}
//LEFT
if(currentCol > 0 && matrix [currentRow][currentCol-1] != -1){
xxx(5,5,currentRow, currentCol - 1, matrix, sum, solution,solCount);
}
//DOWN
if(currentRow + 1 < colCount && matrix[currentRow + 1][currentCol] != -1){
xxx(5,5,currentRow + 1, currentCol, matrix, sum, solution,solCount);
}
//RIGHT
if(currentCol + 1 < rowCount && matrix[currentRow][currentCol + 1] != -1){
xxx(5,5,currentRow, currentCol + 1, matrix, sum, solution,solCount);
}
}
}
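A possible way to structure the recursion, as a sketch under some assumptions: it searches only for chains of cells (each cell adjacent to the previous one), not arbitrary connected sets, and the names Grid, Path, extendPath and findPath are hypothetical. Unlike the posted code, it keeps a separate "used" table and undoes the mark when backtracking, so the matrix itself is never overwritten. The recursion stops when the sum reaches the target, overshoots it, or runs out of unused neighbours.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Grid = std::vector<std::vector<int>>;
using Path = std::vector<std::pair<int, int>>;  // (row, column) pairs

// Depth-first search with backtracking. Returns true as soon as the cells
// on `path` sum to exactly `target`; otherwise undoes its own step.
bool extendPath(const Grid& g, std::vector<std::vector<bool>>& used,
                int r, int c, int sum, int target, Path& path)
{
    const int rows = static_cast<int>(g.size());
    const int cols = static_cast<int>(g[0].size());
    if (r < 0 || r >= rows || c < 0 || c >= cols || used[r][c])
        return false;
    sum += g[r][c];
    if (sum > target)
        return false;  // overshoot: this cell cannot be part of the chain
    used[r][c] = true;
    path.emplace_back(r, c);
    if (sum == target)
        return true;  // the last cell added is non-zero, since it changed the sum
    static const int dr[] = {-1, 0, 1, 0};  // up, left, down, right
    static const int dc[] = {0, -1, 0, 1};
    for (int d = 0; d < 4; ++d)
        if (extendPath(g, used, r + dr[d], c + dc[d], sum, target, path))
            return true;
    used[r][c] = false;  // backtrack: free the cell for other chains
    path.pop_back();
    return false;
}

// Returns one chain starting at (startRow, startCol), or an empty path.
Path findPath(const Grid& g, int startRow, int startCol, int target = 10)
{
    assert(!g.empty() && !g[0].empty());  // assumes a rectangular grid
    std::vector<std::vector<bool>> used(g.size(),
                                        std::vector<bool>(g[0].size(), false));
    Path path;
    extendPath(g, used, startRow, startCol, 0, target, path);
    return path;
}
```

Calling findPath for every starting cell would enumerate one candidate per start; the search is exponential in the worst case, which should be fine for small matrices.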

codility MaxDistanceMonotonic, what's wrong with my solution

Question:
A non-empty zero-indexed array A consisting of N integers is given.
A monotonic pair is a pair of integers (P, Q), such that 0 ≤ P ≤ Q < N and A[P] ≤ A[Q].
The goal is to find the monotonic pair whose indices are the furthest apart. More precisely, we should maximize the value Q − P. It is sufficient to find only the distance.
For example, consider array A such that:
A[0] = 5
A[1] = 3
A[2] = 6
A[3] = 3
A[4] = 4
A[5] = 2
There are eleven monotonic pairs: (0,0), (0, 2), (1, 1), (1, 2), (1, 3), (1, 4), (2, 2), (3, 3), (3, 4), (4, 4), (5, 5). The biggest distance is 3, in the pair (1, 4).
Write a function:
int solution(vector<int> &A);
that, given a non-empty zero-indexed array A of N integers, returns the biggest distance within any of the monotonic pairs.
For example, given:
A[0] = 5
A[1] = 3
A[2] = 6
A[3] = 3
A[4] = 4
A[5] = 2
the function should return 3, as explained above.
Assume that:
N is an integer within the range [1..300,000];
each element of array A is an integer within the range [−1,000,000,000..1,000,000,000].
Complexity:
expected worst-case time complexity is O(N);
expected worst-case space complexity is O(N), beyond input storage (not counting the storage required for input arguments).
Elements of input arrays can be modified.
Here is my solution of MaxDistanceMonotonic:
int solution(vector<int> &A) {
long int result;
long int max = A.size() - 1;
long int min = 0;
while(A.at(max) < A.at(min)){
max--;
min++;
}
result = max - min;
while(max < (long int)A.size()){
while(min >= 0){
if(A.at(max) >= A.at(min) && max - min > result){
result = max - min;
}
min--;
}
max++;
}
return result;
}
This is my result. What is wrong with my answer for the last test?
If you have:
0 1 2 3 4 5
31 2 10 11 12 30
Your algorithm outputs 3, but the correct answer is 4 = 5 - 1.
This happens because your min goes to -1 on the first full run of the inner while loop, so the pair (1, 5) never gets checked: max starts out at 4 when entering the nested while loops.
Note that the problem description expects O(n) extra storage, while you use O(1). I don't think it's possible to solve the problem with O(1) extra storage and O(n) time.
I suggest you rethink your approach. If you give up, there is an official solution here.
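For reference, one common linear-time technique for this kind of problem uses the O(N) extra storage mentioned above: precompute the maximum of every suffix, then walk two indices forward. This is only a sketch and may or may not match the official solution; the function name maxMonotonicDistance is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

int maxMonotonicDistance(const std::vector<int>& A)
{
    assert(!A.empty());  // the task guarantees a non-empty array
    const std::size_t n = A.size();

    // suffixMax[i] = largest value among A[i], A[i+1], ..., A[n-1]
    std::vector<int> suffixMax(n);
    suffixMax[n - 1] = A[n - 1];
    for (std::size_t i = n - 1; i-- > 0;)
        suffixMax[i] = std::max(A[i], suffixMax[i + 1]);

    // If A[p] <= suffixMax[q], some index at or after q pairs with p,
    // so q - p is achievable and q can move right; otherwise p must move.
    int best = 0;
    std::size_t p = 0, q = 0;
    while (p < n && q < n) {
        if (A[p] <= suffixMax[q]) {
            best = std::max(best, static_cast<int>(q - p));
            ++q;
        } else {
            ++p;
        }
    }
    return best;
}
```

Both indices only move forward, so the loop runs at most 2N steps. On the counterexample above (31 2 10 11 12 30) it finds the pair (1, 5).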

Histogram of the distribution of dice rolls

I saw a question on careercup, but I did not find the answer I wanted there. I wrote an answer myself and would like your comments on my time-complexity analysis, the algorithm and the code. Or you could provide an algorithm with better time complexity. Thanks.
You are given d > 0 fair dice with n > 0 "sides". Write a function that returns a histogram of the frequency of the results of the dice rolls.
For example, for 2 dice, each with 3 sides, the results are:
(1, 1) -> 2
(1, 2) -> 3
(1, 3) -> 4
(2, 1) -> 3
(2, 2) -> 4
(2, 3) -> 5
(3, 1) -> 4
(3, 2) -> 5
(3, 3) -> 6
And the function should return:
2: 1
3: 2
4: 3
5: 2
6: 1
My solution: the time complexity of a brute-force depth-first search is O(n^d). However, you can use dynamic programming to solve this problem. For example, take d=3 and n=3. You can reuse the result for d==1 when computing d==2:
d==1
num #
1 1
2 1
3 1
d==2
first roll second roll is 1
num # num #
1 1 2 1
2 1 -> 3 1
3 1 4 1
first roll second roll is 2
num # num #
1 1 3 1
2 1 -> 4 1
3 1 5 1
first roll second roll is 3
num # num #
1 1 4 1
2 1 -> 5 1
3 1 6 1
Therefore,
second roll
num #
2 1
3 2
4 3
5 2
6 1
The time complexity of this DP algorithm is
SUM_i(1:d) {n*[n(d-1)-(d-1)+1]} ~ O(n^2*d^2)
~~~~~~~~~~~~~~~ <--eg. d=2, n=3, range from 2~6
The code is written in C++ as follows
vector<pair<int,long long>> diceHisto(int numSide, int numDice) {
int n = numSide*numDice;
vector<long long> cur(n+1,0), nxt(n+1,0);
for(int i=1; i<=numSide; i++) cur[i]=1;
for(int i=2; i<=numDice; i++) {
int start = i-1, end = (i-1)*numSide; // range of previous sum of rolls
//cout<<"start="<<start<<" end="<<end<<endl;
for(int j=1; j<=numSide; j++) {
for(int k=start; k<=end; k++)
nxt[k+j] += cur[k];
}
swap(cur,nxt);
for(int j=start; j<=end; j++) nxt[j]=0;
}
vector<pair<int,long long>> result;
for(int i=numDice; i<=numSide*numDice; i++)
result.push_back({i,cur[i]});
return result;
}
You can do it in O(n*d^2). First, note that the generating function for an n-sided dice is p(n) = x+x^2+x^3+...+x^n, and that the distribution for d throws has generating function p(n)^d. Representing the polynomials as arrays, you need O(nd) coefficients, and multiplying by p(n) can be done in a single pass in O(nd) time by keeping a rolling sum.
Here's some python code that implements this. It has one non-obvious optimisation: it throws out a factor x from each p(n) (or equivalently, it treats the dice as having faces 0,1,2,...,n-1 rather than 1,2,3,...,n) which is why d is added back in when showing the distribution.
def dice(n, d):
    r = [1] + [0] * (n-1) * d
    nr = [0] * len(r)
    for k in range(d):
        # t is the rolling sum of the last n entries of r
        t = 0
        for i in range(len(r)):
            t += r[i]
            if i >= n:
                t -= r[i-n]
            nr[i] = t
        r, nr = nr, r
    return r

def show_dist(n, d):
    for i, k in enumerate(dice(n, d)):
        if k: print(i + d, k)

show_dist(6, 3)
The time and space complexity are easy to see: there's nested loops with d and (n-1)*d iterations so the time complexity is O(n.d^2), and there's two arrays of size O(nd) and no other allocation, so the space complexity is O(nd).
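Since the question's code is in C++, the same rolling-sum idea can be sketched there as well. This is a direct transcription of the approach described above, with faces treated as 0..n-1; the function name rollingDiceCounts is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns counts where counts[s] is the number of ways to roll a total of
// s + d with d dice of n sides each (index 0 is the minimum total, d).
std::vector<long long> rollingDiceCounts(int n, int d)
{
    assert(n > 0 && d > 0);
    const std::size_t sides = static_cast<std::size_t>(n);
    const std::size_t len = (sides - 1) * static_cast<std::size_t>(d) + 1;
    std::vector<long long> cur(len, 0), next(len, 0);
    cur[0] = 1;  // zero dice: one way to reach a total of 0

    for (int k = 0; k < d; ++k) {
        long long window = 0;  // sum of the last n entries of cur
        for (std::size_t i = 0; i < len; ++i) {
            window += cur[i];
            if (i >= sides)
                window -= cur[i - sides];
            next[i] = window;
        }
        std::swap(cur, next);  // every entry of next is overwritten each pass
    }
    return cur;
}
```

To print the histogram, add d back to each index, as show_dist does in the Python version.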
Just in case, here is a simple example in Python using the OpenTURNS platform.
import openturns as ot
d = 2 # number of dice
n = 6 # number of sides per die
# possible values
dice_distribution = ot.UserDefined([[i] for i in range(1, n + 1)])
# create the distribution of the sum of d dice
sum_distribution = sum([dice_distribution] * d)
That's it!
print(sum_distribution)
will show you all the possible values and their corresponding probabilities:
>>> UserDefined(
{x = [2], p = 0.0277778},
{x = [3], p = 0.0555556},
{x = [4], p = 0.0833333},
{x = [5], p = 0.111111},
{x = [6], p = 0.138889},
{x = [7], p = 0.166667},
{x = [8], p = 0.138889},
{x = [9], p = 0.111111},
{x = [10], p = 0.0833333},
{x = [11], p = 0.0555556},
{x = [12], p = 0.0277778}
)
You can also draw the probability distribution function:
sum_distribution.drawPDF()

How to make a complete matrix with 2 loops in c++

I have a string in C++ that represents an upper triangular matrix. What I want to do is make a complete matrix from this string:
std::string input = "1,2,1,3,6,1,4,7,9,1";
//this represents
//1 2 3 4
//2 1 6 7
//3 6 1 9
//4 7 9 1
std::replace(input.begin(), input.end(), ',', ' ');
std::vector<double> Matrix;
std::istringstream inputStream(input);
double value;
int rowNum = 0;
int colNum = 0;
while (inputStream >> value){
for (colNum = 0; colNum < 2; colNum++){
if (colNum >= rowNum){
Matrix.push_back( value );
}
else{
Matrix.push_back( Matrix[colNum * 2 + rowNum]);
}
}
rowNum++;
}
inputStream >> std::ws;
I expect to get
1 2 3 4
2 1 6 7
3 6 1 9
4 7 9 1
But I am getting
1.0000 1.0000 1.0000 2.0000
1.0000 1.0000 2.0000 1.0000
1.0000 2.0000 1.0000 1.0000
2.0000 1.0000 1.0000 2.0000
What is my error? I cannot see it...
You should show the indexing scheme used for printing the output (i.e. how you expect the indexes to work): your choice of a flat vector instead of a matrix makes the code hard to correct. In any case, I see the following points that have no clear connection with the input pattern:
1) You increment the rowNum index for each number you read. The row should instead be incremented at 'steps' 1, 1+2, 1+2+3, ...
2) colNum should range from 0 to the current rowNum; instead it takes only the values 0 and 1.
3) There is no way to fill a row (say the first) before you have read its last value. You could if the input were 1 2 3 4 1 6 7 1 9 1.
All these points are related and originate from the wrong data representation, which makes a trivial task difficult.
In C++, a very effective way to tackle these problems is data hiding: consider how easily we can write a class that gives the correct logical representation and doesn't waste space:
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <algorithm>
template <class T = double>
class upper_triangular_matrix
{
std::vector<T> Matrix;
public:
upper_triangular_matrix(std::string input)
{
// trade time for space: store the values, compute indexing
std::replace(input.begin(), input.end(), ',', ' ');
std::istringstream inputStream(input);
T value;
while (inputStream >> value)
Matrix.push_back(value);
// validate size: ok 1,1+2,1+2+3 etc
}
T operator()(int r, int c) const
{
// compute the index, accounting for the mirrored values that are not stored
if (c > r)
std::swap(c, r);
int p = 0, n = 1;
while (r > 0)
{
p += n++;
r--;
}
return Matrix[p + c];
}
};
int main()
{
upper_triangular_matrix<> m("1,2,1,3,6,1,4,7,9,1");
for (int r = 0; r < 4; ++r)
{
for (int c = 0; c < 4; ++c)
std::cout << m(r, c) << ' ';
std::cout << std::endl;
}
}
when run, this prints
1 2 3 4
2 1 6 7
3 6 1 9
4 7 9 1
It is hard to tell exactly where the error is but here is where it starts:
std::vector<double> Matrix;
Yes, a non-empty std::vector<double> with n elements is a matrix: either a 1xn or a nx1 matrix (or both). In your context this view is, however, utterly unhelpful.
Let's look at the for-loop when you read the first element:
colNum == 0, rowNum == 0 => (1, 1) = Matrix[0] = 1
colNum == 1, rowNum == 0 => (2, 1) = Matrix[1] = 1
This start is clearly wrong. After this rowNum becomes 1:
colNum == 0, rowNum == 1 => (3, 1) = Matrix[2] = Matrix[colNum * 2 + rowNum] = Matrix[1] = 1
colNum == 1, rowNum == 1 => (4, 1) = Matrix[3] = 2
Well, I guess you can write the remainder up yourself. Of course, I could quickly write the code to solve your problem but I think this little exercise is for you. The way to do it is to fill the first row columns (where row is the current row being processed, using conventional index starting with 0) with the values from the transposed matrix and then read the remaining n - row columns (where n is the size of the matrix) from the file.
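If a fully expanded matrix is wanted rather than an accessor class, a variation on the filling order described above can be sketched with two nested loops. It relies on an assumption about the input: with the order shown in the question (1,2,1,3,6,1,...), row r contributes exactly r+1 values, so each value can be written to (r, c) and mirrored to (c, r) as soon as it is read. The function name expandSymmetric is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

std::vector<std::vector<double>> expandSymmetric(std::string input)
{
    // Same parsing as in the question: turn commas into whitespace.
    std::replace(input.begin(), input.end(), ',', ' ');
    std::istringstream in(input);
    std::vector<double> values;
    double v;
    while (in >> v)
        values.push_back(v);

    // The value count must be triangular: 1, 3, 6, 10, ... = n(n+1)/2.
    std::size_t n = 0;
    while (n * (n + 1) / 2 < values.size())
        ++n;
    assert(n * (n + 1) / 2 == values.size());

    std::vector<std::vector<double>> m(n, std::vector<double>(n, 0.0));
    std::size_t k = 0;
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t c = 0; c <= r; ++c)
            m[r][c] = m[c][r] = values[k++];  // fill the cell and its mirror
    return m;
}
```

For the string "1,2,1,3,6,1,4,7,9,1" this yields the 4x4 matrix shown in the question.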