I am using xline() to add vertical lines to scatter plots in Stata. I stored the values for the lines, which are the means for different subsamples, in a matrix. Now I want to use the values from the matrix as coordinates in xline().
I tried:
mat means=J(1,5,.)
mat means[1,1]=mean(subsample1)
...
scatter data1 data2, xline(means[1,1])
scatter data3 data4, xline(means[1,2])
...
However, I get the error invalid line argument.
I am grateful for any hint!
// open some example data
sysuse nlsw88, clear
// create a matrix of means
reg grade ibn.race, hascons
matrix means = e(b)
// -xline()- expects numbers, so evaluate each matrix element inline with `=exp'
scatter wage grade, xline(`=el(means,1,1)' `=el(means,1,2)' `=el(means,1,3)')
I am currently having a problem with using Stata to draw a scatterplot when A (independent variable) and B (dependent variable) are two matrix vectors of size 1 x 1000.
I used the command twoway scatter, but this keeps failing because Stata does not treat A and B as variables, even though I defined A and B with the command matrix define.
The Variables window is empty and I am not sure why A and B are not variables.
Sample Code:
matrix define A = [1,2,3,4,5,6,7,8,9,10]'
matrix define B = [2,3,4,5,6,7,8,9,10]'
// draw a scatterplot of A vs B and overlay a vertical line at x = 5 onto the scatterplot
twoway scatter A B || xline(5)
Can I declare a matrix as a variable-type and save it so that I can re-use it with twoway scatter?
You need to use the svmat command to first create the variables and then draw the graph:
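clear
matrix define A = (1,2,3,4,5,6,7,8,9,10)'
matrix define B = (2,3,4,5,6,7,8,9,10)'
// svmat creates one variable per matrix column, named A1 and B1
svmat A
svmat B
// B has only 9 rows, so B1 is missing in the 10th observation
twoway scatter A1 B1, xline(5)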
Matrices and variables in Stata are two different things.
I am trying to draw random samples from some distribution. My code runs, but the numbers look strange: the elements are extremely large. I am not sure what went wrong; maybe some of the operators are wrong.
My attempt:
C_hat=(((x`)*x)**(-1))*((x`)*z);
S=((z-x*c_hat)`)*((z-x*c_hat));
*draw sigma;
sigma = shape(RandWishart(1, 513 - 3 - 2,s**(-1)),4,4);
*draw vec(c);
vec_c_hat= colvec(c_hat`); *vectorization of c_hat;
call randseed(4321);
vec_c = RandNormal(1,vec_c_hat,(sigma`)#(((x`)*x)**(-1)));
c = shape(vec_c,4,4);
print c;
Since you haven't provided data or a reference, it is difficult to guess whether your "strange" and "extremely large" numbers are correct. However, the program looks mostly correct, so check your data.
A minor problem with your program is that you use the SHAPE function to reshape the vec_c vector into a matrix. SHAPE fills the matrix row by row, whereas vec_c stacks the columns of c, so you should use the SHAPECOL function (or transpose the result).
The following program uses the Sashelp.Cars data, which is distributed with SAS, to initialize the X and Z matrices. The program draws a random matrix C whose distribution is centered at the least squares estimate C_hat. I've also added some intermediate computations and comments. This version works as expected on the Sashelp.Cars data:
proc iml;
use sashelp.cars;
read all var {weight wheelbase enginesize horsepower} into X;
read all var {mpg_city mpg_highway} into Z;
close;
*normal equations and covariance;
xpx = x`*x;
invxpx = inv(xpx);
C_hat = invxpx*(x`*z);
r = z-x*c_hat;
S = r`*r;
*draw sigma;
call randseed(4321);
DF = nrow(X)-ncol(X)-2;
W = RandWishart(1, DF, inv(S)); /* 1 x (p*p) row vector */
sigma = shape(W, sqrt(ncol(W))); /* reshape to p x p matrix */
*draw vec(c);
vec_c_hat = colvec(c_hat`); /* stack columns of c_hat */
vec_c = RandNormal(1, vec_c_hat, sigma#invxpx);
c = shapecol(vec_c, nrow(C_hat), ncol(C_hat)); /* reshape w/ SHAPECOL */
print C_hat, c;
I was trying to read my nc file. There are 3 variables in it, they are:
zonalWind (height, lon, lat)
meridionalWind (height, lon, lat)
verticalVelocity (height_2, lon, lat)
Below is my code reading the arrays:
vtkNetCDFCFReader *reader = vtkNetCDFCFReader::New();
reader->SetFileName(fileName);
reader->SetOutputTypeToStructured();
reader->UpdateMetaData();
reader->Update();
reader->Print(std::cout);
reader->SetVariableArrayStatus("verticalVelocity", 1);
reader->SetVariableArrayStatus("zonalWind", 1);
reader->SetVariableArrayStatus("meridionalWind", 1);
But then I got the following error in the terminal; the reader skips the verticalVelocity array because of a dimension mismatch:
vtkNetCDFCFReader (0x7fb1f1517350): Variable verticalVelocity dimensions (height_2 lat lon) are different than the other variable dimensions (height lat lon). Skipping
Is there any way to read in all 3 variables instead of skipping one, and do some processing afterwards?
TIA
No. You should create 2 vtkNetCDFCFReader instances and read variables with the same dimensions for each.
If you want to extract just a portion of the larger grid and use those values on the smaller grid, then attach a vtkExtractGrid filter to one or both of the reader outputs to obtain datasets of the same size. Finally, run a vtkMergeArrays filter on the results to generate a single dataset with all the array values.
I have two data sets, 1 and 2, each containing 2D points.
For example,
Data1 :
(x1, y1), (x2, y2), (x3, y3) .... (xn, yn)
Data2 : (x1', y1'), (x2', y2'), .... (xm', ym')
I'd like to compare them using histograms and the Earth Mover's Distance (EMD), if possible.
Because the data are 2D, the points should be placed on a 2D map, with the height of the histogram over each location giving the frequency of the data, so I guess it should be a 3D histogram. I managed to write an example that builds and compares histograms of 1D data, but I failed to adapt it to 2D data. How does this work?
For example,
calcHist(&greyImg, 1, channel_numbers, Mat(), histogram1, 1, &number_bins, &channel_ranges);
This code turns the image's grayscale intensities (1D data) into a histogram, but I could not adapt it to 2D data.
My idea is this:
I create cv::Mat Data1Mat, Data2Mat; (the Mat size is set to the maximum values of x and y).
Then I push Data1's x values into the first channel of Data1Mat and the y values into the second channel (and likewise for Data2 and Data2Mat).
For example, for (x1, y1), I set
Data1Mat.at(x1,y1)[0] = x1, Data1Mat.at(x1, y1)[1] = y1;
and so on. Then I create histograms of them and compare. Is my thinking correct?
I think it is more accurate to speak of a histogram of 1D data or a histogram of 2D data.
You need a histogram of 2D data.
A 1D histogram counts how many scalar values fall into each bin interval.
A 2D histogram divides the plane into regions and counts how many 2D points fall into each region.
A 2D H-S histogram of an image is computed in the question "Calculate HSV histogram of a coloured image is it different from H-S histogram?".
Your problem is nearly the same: use your x in place of H and your y in place of S.
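To make the binning step concrete, here is a minimal, dependency-free sketch of a 2D histogram over (x, y) points. It deliberately does not use OpenCV's calcHist; the names Point2 and histogram2D, the row-major layout, and the choice to ignore out-of-range points are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical point type for illustration (not an OpenCV type).
struct Point2 {
    double x;
    double y;
};

// Count how many points fall into each cell of a binsX x binsY grid
// covering [minX, maxX) x [minY, maxY). Points outside that range are ignored.
// The result is row-major: cell (ix, iy) is stored at index iy * binsX + ix.
std::vector<int> histogram2D(const std::vector<Point2>& points,
                             std::size_t binsX, std::size_t binsY,
                             double minX, double maxX,
                             double minY, double maxY) {
    assert(binsX > 0 && binsY > 0 && maxX > minX && maxY > minY);
    std::vector<int> counts(binsX * binsY, 0);
    const double cellW = (maxX - minX) / static_cast<double>(binsX);
    const double cellH = (maxY - minY) / static_cast<double>(binsY);
    for (const Point2& p : points) {
        if (p.x < minX || p.x >= maxX || p.y < minY || p.y >= maxY) {
            continue;  // outside the histogram range
        }
        std::size_t ix = static_cast<std::size_t>((p.x - minX) / cellW);
        std::size_t iy = static_cast<std::size_t>((p.y - minY) / cellH);
        // Guard against floating-point rounding at the upper edge.
        if (ix >= binsX) ix = binsX - 1;
        if (iy >= binsY) iy = binsY - 1;
        ++counts[iy * binsX + ix];
    }
    return counts;
}
```

To compare Data1 and Data2, both histograms would be built with the same bin counts and ranges; the two count grids can then be normalized and passed to a histogram distance such as EMD.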
How should I be using the var() function in Armadillo?
I have a matrix in which rows are variables/features and columns are observations/instances.
I wish to get the variance of each row so I can determine the variables/features with the greatest variance.
Currently I am calling:
auto variances = arma::var(data, 0, 1);
Where data is my matrix.
As far as I can tell, I am currently getting a matrix back, and the documentation suggests this is correct. I was expecting to get back a single vector with variance scores for each of my matrix rows.
I can loop through my rows and get the variance for each row individually like so:
for (arma::uword i = 0; i < data.n_rows; ++i) {
    auto rowVariance = arma::var(data.row(i)); // variance of row i
}
But I would prefer not to do this.
I would like to get back a single vector containing variance values for each row in my matrix and then use arma::sort_index() on this vector to get a sorted set of indices corresponding to the sorted variances.
Thanks in advance.
It turns out the error occurred because I was using arma::vec variances = arma::var(data, 0, 1) and should have been using arma::Col<T> variances = arma::var(data, 0, 1), since my data matrix is of type arma::Mat<T> (I allow both float and double precision).
The comment above from vagoberto set me on the right track.
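For reference, here is a dependency-free sketch of what the row-wise variance followed by an index sort computes. It does not use Armadillo: rowVariances and sortIndexDescending are hypothetical helper names, the matrix is assumed to be a vector of rows with at least two observations each, and the variance is normalized by N-1 (the meaning of a norm_type of 0 in arma::var).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sample variance (normalized by N-1) of each row of a row-major matrix.
// Hypothetical helper; assumes every row holds at least two observations.
std::vector<double> rowVariances(const std::vector<std::vector<double>>& data) {
    std::vector<double> variances;
    variances.reserve(data.size());
    for (const std::vector<double>& row : data) {
        assert(row.size() >= 2);
        const double n = static_cast<double>(row.size());
        const double mean = std::accumulate(row.begin(), row.end(), 0.0) / n;
        double sumSq = 0.0;
        for (double v : row) {
            sumSq += (v - mean) * (v - mean);
        }
        variances.push_back(sumSq / (n - 1.0));
    }
    return variances;
}

// Indices of `values` ordered from largest to smallest value.
// Hypothetical helper; ties keep their original relative order.
std::vector<std::size_t> sortIndexDescending(const std::vector<double>& values) {
    std::vector<std::size_t> idx(values.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::stable_sort(idx.begin(), idx.end(),
                     [&values](std::size_t a, std::size_t b) {
                         return values[a] > values[b];
                     });
    return idx;
}
```

With Armadillo itself, the corresponding calls would presumably be arma::var(data, 0, 1) stored in an arma::Col<T>, followed by arma::sort_index(variances, "descend").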