C++ Incompatible types: calculating allele frequencies - c++

Here is what the input file looks like:
1-1_Sample 1
GCCCATGGCT
2-1_Sample 1
GAGTGTATGT
3-1_Sample 1
TGTTCTATCT
1-1_Sample 2
GCTTAGCCAT
2-1_Sample 2
TGTAGTCAGT
3-1_Sample 2
GGGAACCAAG
1-1_Sample 3
TGGAAGCGGT
2-1_Sample 3
CGGGAGGAGA
3-1_Sample 3
CTTCAGTTTT
#include <cstdlib>
#include <iostream>
#include <string>
#include <fstream>
#include <stdlib.h>
using namespace std;
const int pops = 10;
const int sequence = 100;
string w;
string popname;
string lastpop;
int totalpops;
string ind;
int i;
int j;
char c;
float dna[pops][4][sequence];
float Af[1][1][1];
int main(int argc, char *argv[])
{
ifstream fin0("dnatest.txt");
lastpop = "nonsense";
totalpops = -1;
if (fin0)
{
do
{
getline(fin0, w);
cout << w<<endl;
i=0;
ind = "";
popname = "";
do {c = w [i];
i++;
if ((c != '>')&(c!='-')) ind=ind+c; } while (c != '-');
do {c = w [i];
i++; } while (c != ' ');
do {c = w [i];
i++;
if (c!= '\n') popname=popname+c; } while (i< w.length());
if (popname != lastpop) { totalpops++;
lastpop=popname;
}
getline (fin0, w);
cout << w<<endl << w.length()<<endl;
for (i=0; i<w.length(); i++)
{if (w[i]=='A') dna[totalpops][0][i]++;
if (w[i]=='C') dna[totalpops][1][i]++;
if (w[i]=='G') dna[totalpops][2][i]++;
if (w[i]=='T') dna[totalpops][3][i]++;
}
for(int k=0;k<1;k++)
{for(int j=0; j<1;j++)
{for (int i=0;i<1;i++)
Af[0] = Af[0][0][0]+dna[i][j][k]; //RETURNS THE ERROR "INCOMPATIBLE TYPES IN ASSIGNMENT OF 'FLOAT' TO 'FLOAT[1][1]'
cout<<Af<<endl;}
}
while (!fin0.eof());
}
system("PAUSE");
return EXIT_SUCCESS;
}
Background:
I am very new to C++ and am teaching myself to use it to supplement my graduate research. I am a genetics PhD candidate trying to model different evolutionary histories and how they affect the frequency of alleles across populations.
Question:
I am trying to extract certain portions of data from the "dna" array that I created from the input file.
For example, here I have created another array "Af" where I am trying to extract counts for the first "cell," so to speak, of the dna array. The purpose of doing this is to calculate a frequency by comparing the counts in certain groups of cells to the entire dna array. I can't figure out how to do this. I keep getting the error message: "INCOMPATIBLE TYPES IN ASSIGNMENT OF 'FLOAT' TO 'FLOAT[1][1]'"
I have spent a great deal of time researching this on different forums, but I cannot seem to understand what this error means, and how else to achieve what I'm trying to achieve.
So the dna array I'm visualizing is built from the input file such that there are 4 rows (A, C, G, T) and 10 columns (one column for each nucleotide position in the sequence). This "grid" is then stacked 3 times, one "sheet" for each Sample (here sample means population, and there are three individuals per population, as listed in the input file).
So from this stack of grids I want to extract, for example, the first cell (the number of A's in Sample 1 at position 1). I would then want to compare this number to the total number of A's at position 1 across all samples. This frequency would then be a meaningful number for the model I'm testing.
The problem is, I don't know how to extract portions of the dna array - once I figure out this condensed example, I will be applying it to very large input files, and will want to extract more than one cell at a time.

Af is a 3-dimensional array:
float Af[1][1][1];
However, it contains only a single element. It has one row, one column, and one "layer" (or however you want to name the 3rd dimension). That makes it a bit pointless. You might as well just have this:
float Af;
Nonetheless, you don't have that - you have a 3D array. Now let's look at this line:
Af[0] = Af[0][0][0] + dna[i][j][k];
So first it takes the (0, 0, 0)th element from Af (which, as we've just seen, is the only element in Af) and adds the (i, j, k)th element from dna to it. That bit is fine because both of these elements are of type float. That is:
Af[0] = Af[0][0][0] + dna[i][j][k];
// ^^^^^^^^^^^ ^^^^^^^^^^^^
// These are both floats
So the result of this addition is also a float. Then what do you try to assign this result to? Well, you try to assign it to Af[0], but that is not a float. You've simply specified the 0th index in the first dimension. There are still two other dimensions to specify. The type of Af[0] is actually float[1][1] (a two-dimensional array of floats). This would work, for example:
Af[0][0][0] = Af[0][0][0] + dna[i][j][k];
// Or equivalently:
Af[0][0][0] += dna[i][j][k];
Whether that's what you want to do or not is completely dependent on the problem, which I can't begin to understand. However, as I said, it makes very little sense to have Af as a 3 dimensional array with only a single element in it. If it's just one float, make it a float, not an array. Then you would do the above line as:
Af += dna[i][j][k];
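To connect this back to the frequency calculation described in the question, here is a minimal sketch of pulling a single cell out of a counts table and dividing it by the total over all populations. The names BaseCounts, baseIndex, countBases and alleleShare are hypothetical, and the sketch assumes the table is stored as one 4 x length grid per population rather than as the fixed-size float array from the question.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// One population's counts: counts[base][position], bases ordered A, C, G, T.
using BaseCounts = std::array<std::vector<double>, 4>;

// Map a nucleotide letter to its row index, or -1 for anything else.
int baseIndex(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return -1;
    }
}

// Tally every sequence of one population into a 4 x length table.
BaseCounts countBases(const std::vector<std::string>& sequences, std::size_t length) {
    BaseCounts counts;
    for (auto& row : counts) row.assign(length, 0.0);
    for (const std::string& seq : sequences) {
        for (std::size_t pos = 0; pos < seq.size() && pos < length; ++pos) {
            int b = baseIndex(seq[pos]);
            if (b >= 0) counts[b][pos] += 1.0;
        }
    }
    return counts;
}

// Count of one base at one position in population `pop`, divided by the
// count of that base at that position summed over all populations.
double alleleShare(const std::vector<BaseCounts>& dna, std::size_t pop,
                   std::size_t base, std::size_t pos) {
    double total = 0.0;
    for (const BaseCounts& population : dna) total += population[base][pos];
    return total == 0.0 ? 0.0 : dna[pop][base][pos] / total;
}
```

With the three samples from the input file, alleleShare(dna, 0, baseIndex('G'), 0) compares the G count at the first position in Sample 1 with the G count at that position across all samples. Returning 0 when a base never occurs at a position is a choice made for this sketch; a real model might want to treat that case differently.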

Related

Does anyone know how to solve problems on variable length arrays?

Input Format
The first line contains two space-separated integers denoting the respective values of n (the number of variable-length arrays) and q (the number of queries).
Each line i of the n subsequent lines contains a space-separated sequence in the format k a[i]0 a[i]1 … a[i]k-1 describing the k-element array located at a[i].
Each of the q subsequent lines contains two space-separated integers describing the respective values of i (an index in the array a) and j (an index in the array referenced by a[i]) for a query.
Output Format
For each pair of i and j values (i.e., for each query), print a single integer denoting the element located at index j of the array referenced by a[i]. There should be a total of q lines of output.
Sample Input
2 2
3 1 5 4
5 1 2 8 9 3
0 1
1 3
Sample Output
5
9
Somebody has solved this problem like this:
int main() {
/* Enter your code here. Read input from STDIN. Print output to STDOUT */
int n,q; // n: number of variable-length arrays
// q: number of queries
cin >>n >>q;
int ** Vectors = new int *[n]; // one pointer per variable-length array
int j;
for (int i=0;i<n;i++)
{
cin>>j;
Vectors[i] = new int [j];
for (int y=0;y<j;y++)
cin>>Vectors[i][y];
}
int q1,q2;
for (int i=0;i<q;i++)
{
cin >>q1 >> q2;
cout<<Vectors[q1][q2]<<endl;
}
return 0;
}
Can somebody explain this code to me? Or, if anyone has a better approach to this problem, please explain it in detail.
This shouldn't be hard to understand. The code basically allocates a dynamic 2D array at run time, inserts values into it, and then accesses it by index:
int ** Vectors = new int *[n]; // one pointer per variable-length array
int j;
for (int i=0;i<n;i++)
{
cin>>j;
Vectors[i] = new int [j]; // allocate the inner array: row i gets j columns (j can differ per row)
for (int y=0;y<j;y++)
cin>>Vectors[i][y]; // insert element at specified index
}
cout<<Vectors[q1][q2]<<endl; // access element from 2D array
What you might want to use is a Matrix class.
Using
vector<vector<int>>
should do it.
Alternatively, the snippet should be refactored into a Matrix class with a constructor and a destructor.
The example you give has a memory leak, since the allocated memory is never freed.

Dynamic Programming w/ 1D array USACO Training: Subset Sums

While working through the USACO Training problems, I found out about Dynamic Programming. The first training problem that deals with this concept is a problem called Subset Sums.
The Problem Statement Follows:
For many sets of consecutive integers from 1 through N (1 <= N <= 39), one can partition the set into two sets whose sums are identical.
For example, if N=3, one can partition the set {1, 2, 3} in one way so that the sums of both subsets are identical:
{3} and {1,2}
This counts as a single partitioning (i.e., reversing the order counts as the same partitioning and thus does not increase the count of partitions).
If N=7, there are four ways to partition the set {1, 2, 3, ... 7} so that each partition has the same sum:
{1,6,7} and {2,3,4,5}
{2,5,7} and {1,3,4,6}
{3,4,7} and {1,2,5,6}
{1,2,4,7} and {3,5,6}
Given N, your program should print the number of ways a set containing the integers from 1 through N can be partitioned into two sets whose sums are identical. Print 0 if there are no such ways.
Your program must calculate the answer, not look it up from a table.
INPUT FORMAT
The input file contains a single line with a single integer representing N, as above.
SAMPLE INPUT (file subset.in)
7
OUTPUT FORMAT
The output file contains a single line with a single integer that tells how many same-sum partitions can be made from the set {1, 2, ..., N}. The output file should contain 0 if there are no ways to make a same-sum partition.
SAMPLE OUTPUT (file subset.out)
4
After much reading, I found an algorithm that was explained to be a variation of the 0/1 knapsack problem. I implemented it in my code, and I solved the problem. However, I have no idea how my code works or what is going on.
*Main Question: I was wondering if someone could explain to me how the knapsack algorithm works, and how my program could possibly be implementing this in my code?
My code:
#include <iostream>
#include <fstream>
using namespace std;
int main()
{
ifstream fin("subset.in");
ofstream fout("subset.out");
long long num=0, ways[800]={0};
ways[0]=1;
fin >> num; // read N from subset.in rather than from standard input
if(((num*(num+1))/2)%2 == 1)
{
fout << "0" << endl;
return 0;
}
//THIS IS THE BLOCK OF CODE THAT IS SUPPOSED TO BE DERIVED FROM THE
// 0/1 KNAPSACK PROBLEM
for (int i = 1; i <= num; i++)
{
for (int j = (num*(num+1))/2 - i; j >= 0; --j)
{
ways[j + i] += ways[j];
}
}
fout << ways[(num*(num+1))/2/2]/2 << endl;
return 0;
}
*note: Just to emphasize, this code does work, I just would like an explanation why it works. Thanks :)
I wonder why numerous sources could not help you.
Trying one more time with my ugly English:
ways[0]=1;
There is a single way to make the empty sum.
(num*(num+1))/2
This is MaxSum, the sum of all numbers in the range 1..num (the sum of an arithmetic progression).
if(((num*(num+1))/2)%2 == 1)
An odd value cannot be divided into two equal parts.
for (int i = 1; i <= num; i++)
For every number in the range:
for (int j = (num*(num+1))/2 - i; j >= 0; --j)
ways[j + i] += ways[j];
The sum j + i can be built from the sum j plus the item with value i.
For example, suppose you want to make the sum 15.
At the first step of the outer loop you are using the number 1, and there are ways[14] variants to make this sum.
At the second step of the outer loop you are using the number 2, and there are ways[13] new variants to make this sum; you have to add these new variants.
At the third step of the outer loop you are using the number 3, and there are ways[12] new variants to make this sum; you have to add these new variants.
ways[(num*(num+1))/2/2]/2
Output the number of ways to make MaxSum/2, divided by two to exclude symmetric variants ([1,4]+[2,3] / [2,3]+[1,4]).
Question to think about: why does the inner loop run in reverse?
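One way to approach the loop-direction question is to look at a compact version of the same counting step. The sketch below is a rewrite for illustration, not the asker's exact code: the name countEqualPartitions is hypothetical, and the table only goes up to MaxSum/2 because larger sums are never needed. When j runs downward, ways[j - i] still holds the count from before item i was considered, so each number is used at most once. Running j upward would let the same number be added again within one pass.

```cpp
#include <cstddef>
#include <vector>

// Number of ways to split {1, ..., n} into two sets with equal sums.
long long countEqualPartitions(int n) {
    const int total = n * (n + 1) / 2;   // sum of 1..n
    if (total % 2 != 0) return 0;        // an odd total cannot be halved
    const int target = total / 2;

    // ways[s] = number of subsets of the items seen so far that sum to s.
    std::vector<long long> ways(static_cast<std::size_t>(target) + 1, 0);
    ways[0] = 1;                         // the empty subset

    for (int i = 1; i <= n; ++i) {
        // Descending j: ways[j - i] has not yet been updated for item i,
        // so item i contributes to each subset at most once (0/1 knapsack).
        for (int j = target; j >= i; --j) {
            ways[j] += ways[j - i];
        }
    }
    // Every partition is counted twice, once from each side.
    return ways[target] / 2;
}
```

For N = 7 this gives 4 and for N = 3 it gives 1, matching the examples in the problem statement.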

Generate an exponentially-spaced list of numbers

I want to generate an exponentially-spaced list of numbers in C++, where the number of points and the bounds are known (just like Matlab's linspace or logspace, or Python's numpy.logspace). I have found several implementations for log-spaced numbers (see below), but couldn't think of a way to invert these to exponentially-spaced numbers. In addition, the bounds can be negative.
Here is what I have found so far:
Is there something like numpy.logspace in C++?
EXPLIST: Stata module to generate an exponentially-spaced list of numbers (No idea what this language is actually)
Generating a logarithmically spaced numbers
EDIT :
I should have given the problem a little more thought before rushing to Stack Overflow. Here's what I actually did (inspired by this question):
Given two bounds first and last, I wanted to generate an array of size n that starts with first and ends with last, where each element is the exponential of some x.
Mathematically, this is a simple geometric series U(i) that starts with U(0) = first and ends with U(n-1) = last, with U(i) = first * q^i (for i in {0, 1, ..., n-1}) and q = pow(last / first, 1 / (n - 1)).
Here's the raw code:
#include <cmath> // pow
#include <Eigen/Dense>
using namespace Eigen;
VectorXd expList(double first, double last, DenseIndex n)
{
VectorXd vector(n); // native C++ array or vector can be used of course
double m = (double) 1 / (n - 1);
double quotient = pow(last / first, m);
vector(0) = first;
for (DenseIndex i = 1; i < n; i++) // DenseIndex is just a typedef ptrdiff_t from the Eigen library
vector(i) = vector(i - 1) * quotient;
return vector;
}
This works for any doubles first and last of the same sign where first < last, of course, but it can work for a negative first and a positive last too with a little tweaking.
Example:
for first = 50 and last = 300 000 and a 100-element array
I assume what you mean is a list of doubles (d1,...,dn) such that e^d(i+1)-e^di is constant?
In that case the following function should do what you want:
#include <vector>
#include <math.h>
#include <utility> // std::swap
#include <iostream>
std::vector<double> explist(double first, double last, int size)
{
if(first>last) std::swap(first,last);
double expfirst = exp(first);
double explast = exp(last);
std::vector<double> out;
if(size<2) { if(size==1) out.push_back(first); return out; }
double step = (explast-expfirst)/(size-1);
// Index-based loop: accumulating x+=step can drift past explast and drop the last point
for(int i=0; i<size; i++)
{
double a = log(expfirst + i*step);
out.push_back(a);
}
return out;
}
int main()
{
std::vector<double> test = explist(0,1,6);
for(double d : test)
{
std::cout<<d<<" ";
}
std::cout<<std::endl;
for(double d : test)
{
std::cout<<exp(d)<<" ";
}
std::cout<<std::endl;
}
Output:
0 0.295395 0.523137 0.708513 0.86484 1
1 1.34366 1.68731 2.03097 2.37463 2.71828
At the moment this function only produces ascending lists (it just assumes that the smaller value is the left bound). There are several ways to make it work for descending lists as well (always assuming the leftmost argument to be the left bound). I just wanted to make the function as simple as possible and I think if you understand the function it will be easy for you to add that functionality.
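For comparison, the geometric progression described in the question's edit (each element is the previous one times a fixed ratio) can be sketched with a plain std::vector instead of Eigen. The name geometricList is hypothetical, and the sketch assumes first is nonzero and has the same sign as last, which also covers two negative bounds.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// n points from `first` to `last` where consecutive elements share one ratio.
std::vector<double> geometricList(double first, double last, std::size_t n) {
    // Assumption of this sketch: at least two points, same-sign nonzero bounds.
    assert(n >= 2 && first != 0.0 && last / first > 0.0);

    const double ratio = std::pow(last / first, 1.0 / static_cast<double>(n - 1));
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Computing each term from `first` avoids accumulating rounding error.
        out[i] = first * std::pow(ratio, static_cast<double>(i));
    }
    out.back() = last;  // pin the end point exactly
    return out;
}
```

For example, geometricList(1.0, 1000.0, 4) yields values close to 1, 10, 100, 1000. Computing each term directly rather than multiplying the previous one is a design choice here; the running product in the question is slightly cheaper but lets rounding error build up over long lists.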

Comparisons of strings with c++

I used to have some code in C++ which stores strings as a series of characters in a character matrix (a string is a row). The classes CharacterMatrix and LogicalVector are provided by Rcpp.h:
LogicalVector unq_mat( CharacterMatrix x ){
int nc = x.ncol() ; // Get the number of columns in the matrix.
LogicalVector out(nc); // Make a logical (bool) vector of the same length.
// For every col in the matrix, assess whether the column contains more than one unique character.
for( int i=0; i < nc; i++ ) {
out[i] = unique( x(_,i) ).size() != 1 ;
}
return out;
}
The logical vector identifies which columns contain more than one unique character. This is then passed back to the R language and used to manipulate a matrix. This is a very R way of thinking about the problem. However, I'm interested in developing my thinking in C++, and I'd like to write something that achieves the above: something that finds out which character positions in n strings are not all the same, preferably using STL classes like std::string. As a conceptual example, given three strings:
A = "Hello", B = "Heleo", C = "Hidey". The code would point out that positions/characters 2,3,4,5 are not one value, but position/character 1 (the 'H') is the same in all strings (i.e. there is only one unique value). I have something below that I thought worked:
std::vector<int> StringsCompare(std::vector<string>& stringVector) {
std::vector<int> informative;
for (int i = 0; i < stringVector[0].size()-1; i++) {
for (int n = 1; n < stringVector.size()-1; n++) {
if (stringVector[n][i] != stringVector[n-1][i]) {
informative.push_back(i);
break;
}
}
}
return informative;
}
It's supposed to go through every character position (0 to size of string-1) with the outer loop, and with the inner loop, see if the character in string n is not the same as the character in string n-1. In cases where the characters are all the same, for example the H in my hello example above, this will never be true. For cases where the characters in the strings are different, the inner loop's if statement will be satisfied, the character position recorded, and the inner loop broken out of. I then get a vector out containing the indices of the characters in the n strings where the characters are not all identical. However, these two functions give me different answers. How else can I go through n strings char by char and check they are not all identical?
Thanks,
Ben.
I expected @doctorlove to provide an answer. I'll enter one here in case he does not.
To iterate through all of the elements of a string or vector by index, you want i from 0 to size()-1. for (int i=0; i<str.size(); i++) stops just short of size, i.e., stops at size()-1. So remove the -1's.
Second, C++ arrays are 0-based, so to report 1-based positions as in your example you must adjust by adding 1 to the value that is pushed into the vector.
std::vector<int> StringsCompare(std::vector<std::string>& stringVector) {
std::vector<int> informative;
for (int i = 0; i < stringVector[0].size(); i++) {
for (int n = 1; n < stringVector.size(); n++) {
if (stringVector[n][i] != stringVector[n-1][i]) {
informative.push_back(i+1);
break;
}
}
}
return informative;
}
A few things to note about this code:
The function should take a const reference to vector, as the input vector is not modified. Not really a problem here, but for various reasons, it's a good idea to declare unmodified input references as const.
This assumes that all the strings are at least as long as the first. If that doesn't hold, the behavior of the code is undefined. For "production" code, you should include a check for the length prior to extracting the ith element of each string.
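The two notes above can be folded into the function as in the following sketch. The name differingPositions is hypothetical, and limiting the comparison to the length of the shortest string is just one possible policy for unequal lengths; rejecting such input would be another.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// 1-based positions at which the strings do not all share the same character.
std::vector<std::size_t> differingPositions(const std::vector<std::string>& strings) {
    std::vector<std::size_t> informative;
    if (strings.empty()) return informative;

    // Only compare positions that exist in every string.
    std::size_t common = strings[0].size();
    for (const std::string& s : strings) common = std::min(common, s.size());

    for (std::size_t i = 0; i < common; ++i) {
        const char reference = strings[0][i];
        const bool allSame = std::all_of(
            strings.begin(), strings.end(),
            [&](const std::string& s) { return s[i] == reference; });
        if (!allSame) informative.push_back(i + 1);  // 1-based, as in the question
    }
    return informative;
}
```

For the strings "Hello", "Heleo" and "Hidey" this returns the positions 2, 3, 4 and 5.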

Accessing specific column / row and compare to another (C++)

I'm attempting to take a text file as an input with, let's say, six columns and twenty rows and make various calculations based on the data in the text file.
Is there a way to be able to access a specific column/row in the code and compare it to another? I'm basically trying to see how many numbers in, let's say, column two are +10 away from each other so if column two was 10 11 16 20 21 25 30 31 34 40 50, the program would give me the solution 10,20,30,40,50 and 11,21,31.
It sounds like you may want to use this functionality for more than just figuring out whether numbers in a row are a set distance from each other, so I'll provide a more generalized solution.
First create a 20x6 matrix of character pointers:
char *inputmatrix[20][6];
Then load up the matrix with the values from the file. We first get the whole line from the file with fgets, then split the line on spaces using strtok. For each token we allocate space with malloc, copy in the value from strtok (because the buffer it points into is overwritten when the next line is read), and then store the pointer in our array:
char buffer[256];
char *value;
int currow = 0, curcol = 0;
// f is assumed to be a FILE* that is already open for reading
while(currow < 20 && fgets(buffer,256,f)){
value = strtok(buffer," \n");
while(value != NULL && curcol < 6){
// strlen(value)+1 leaves room for the terminating '\0'
inputmatrix[currow][curcol] = (char*)malloc(strlen(value)+1);
memcpy(inputmatrix[currow][curcol],value,strlen(value)+1);
curcol++;
value = strtok(NULL," \n");
}
currow++;
curcol = 0;
}
Now that we've got a matrix of strings, we can go through and run any algorithm you want. For instance, to find the elements in a row that are +10 away from each other, we first convert each element to an int using atoi (which returns 0 when no conversion is possible), then compare it with the next int in the row, and so on:
int curelement, nextelement;
for(int i=0;i<currow;i++){ // only the rows that were actually read
curelement = -1; // start fresh on every row
for(int j=0;j<6;j++){
if((nextelement = atoi(inputmatrix[i][j])) != 0){
if(nextelement - curelement == 10){
printf("row %i: %i,%i\n",i,curelement,nextelement);
}
curelement = nextelement;
}
}
}
The above algorithm only compares each integer with the one directly before it in the row, so it only finds neighbors that are exactly 10 apart. To find every such pair, you have to take each integer and compare it with the rest of the integers in the row.
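The question itself asks about a column, so here is a minimal sketch of the compare-with-every-other-value idea applied to one column, using std::vector and stream extraction instead of malloc and strtok. The names Table, readTable and pairsApart are hypothetical, and the sketch assumes the file holds whitespace-separated integers.

```cpp
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Table = std::vector<std::vector<int>>;

// Read whitespace-separated integers, one table row per input line.
Table readTable(std::istream& in) {
    Table table;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::vector<int> row;
        int value = 0;
        while (fields >> value) row.push_back(value);
        if (!row.empty()) table.push_back(row);
    }
    return table;
}

// All pairs of values in one column whose difference is exactly `gap`.
// Every value is compared with every later one, so order does not matter.
std::vector<std::pair<int, int>> pairsApart(const Table& table,
                                            std::size_t column, int gap) {
    std::vector<std::pair<int, int>> pairs;
    for (std::size_t a = 0; a < table.size(); ++a) {
        if (column >= table[a].size()) continue;  // short row: skip it
        for (std::size_t b = a + 1; b < table.size(); ++b) {
            if (column >= table[b].size()) continue;
            if (std::abs(table[b][column] - table[a][column]) == gap) {
                pairs.emplace_back(table[a][column], table[b][column]);
            }
        }
    }
    return pairs;
}
```

For a column holding 10 11 16 20 21 25 30 31 34 40 50 and a gap of 10, this reports the pairs (10,20), (11,21), (20,30), (21,31), (30,40) and (40,50), which chain together into the two sequences given in the question. The double loop is quadratic in the number of rows, which is fine for twenty rows; larger inputs might call for sorting or a hash set first.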