I want to display a 10x48 dataset (10 sampling points with 48 or more samples each) as a box plot using MathGL. The result should be 10 box plots, one summarizing the data for each sampling point. The problem is that I can't figure out what data format MathGL's boxplot function needs. The documentation says that 5 values are provided for each entry (Minimum, Q1, Q2/Median, Q3, Maximum), yet when I structure the mglData like this:
mglData(10x5) =
{
Min_1, Q1_1, Q2_1, Q3_1, Max_1,
Min_2, Q1_2, Q2_2, Q3_2, Max_2,
Min_3, Q1_3, Q2_3, Q3_3, Max_3,
...
Min_10, Q1_10, Q2_10, Q3_10, Max_10
}
I do not get the correct output. If I instead structure it with the raw data like this:
mglData(10x48) =
{
Data_1_1, Data_1_2, Data_1_3, ... , Data_1_48,
Data_2_1, Data_2_2, Data_2_3, ... , Data_2_48,
Data_3_1, Data_3_2, Data_3_3, ... , Data_3_48,
...
Data_10_1, Data_10_2, Data_10_3, ... , Data_10_48
}
it outputs nice box plots, but with the wrong values. The example shows that the mglData needs to contain Nx7 values, hence 10x7 in my case? But I can only see 5 possible values (not 7). Or is there more to a box plot than Minimum, Q1, Q2/Median, Q3, Maximum?
Any help is very appreciated.
Thanks
Max
The first variant is the correct one, but with the dimensions transposed. You can create a 10*5 array directly (10 points in the x-direction) or transpose the current one with the function mglData::Transpose().
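If the five values per sampling point still have to be computed from the raw 10x48 samples, a minimal sketch in Python could look like the following. It only builds the table of (Minimum, Q1, Median, Q3, Maximum) rows and does not call MathGL; the function names and the choice of the "inclusive" quantile method are assumptions made for illustration.

```python
import random
import statistics

def five_number_summary(samples):
    """Return (min, Q1, median, Q3, max) for one sampling point."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, q2, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    return (min(samples), q1, q2, q3, max(samples))

def summarize(dataset):
    """Turn a list of sample rows into one 5-value row per sampling point."""
    return [five_number_summary(row) for row in dataset]

# Hypothetical stand-in for the 10x48 dataset from the question.
rng = random.Random(0)
dataset = [[rng.gauss(0.0, 1.0) for _ in range(48)] for _ in range(10)]
table = summarize(dataset)  # 10 rows, 5 values each
```

The resulting table has one row per sampling point, so it would still need to be stored with the dimension order described in the answer above.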
I have a simple question that I can't seem to answer. I have a large data set in which I am searching for values of column 2 that are found in column 1, until column 2 is a specific value. It sounds like a DO loop, but I don't have much experience using them. Please see the image, as it will likely explain this better.
Essentially, I have a "starting" point (with the first_match flag=1). Then, I want to grab the value of column 2 in this row (B in this example). Next, I want to search for this value (B) in column 1. Once I find that row (with column 1 = B & column 2 = C), I again grab the value in column 2 (C). Again, I find where in column 1 this new value occurs and obtain the corresponding value of column 2. I repeat this process until column 2 has a value of Z. That's my stopping point. The WANT table shows my desired output.
My apologies if the above is confusing, but it seems like a simple exercise that I can't seem to solve. Any help would be greatly appreciated. Glad to supply further clarification as well.
Have & Want
I have tried PROC SQL to create flags and grab the appropriate rows, but the code is extremely bulky and doesn't seem efficient. Also, the example I laid out has a desired output table with 3 rows. This may not be the case as the desired output could contain between 1 and 10 rows.
This question has been asked and answered previously.
Path traversal can be done using a DATA Step hash object.
Example:
data have;
length vertex1 vertex2 $8;
input vertex1 vertex2;
datalines;
A B
X B
D B
E B
B C
Q C
C Z
Z X
;
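data want(keep=vertex1 vertex2 crumb);
  length vertex1 vertex2 $8 crumb $1;

  /* Load every edge into a hash keyed on the starting vertex. */
  declare hash edges ();
  edges.defineKey('vertex1');
  edges.defineData('vertex2', 'crumb');
  edges.defineDone();

  crumb = ' ';
  do while (not last_edge);
    set have end=last_edge;
    edges.add();
  end;

  /* Follow the path from the trailhead until no edge is found */
  /* or an already visited edge is reached.                    */
  trailhead = 'A';
  vertex1 = trailhead;
  do while (0 = edges.find());
    if not missing(crumb) then leave;                    /* edge already visited: cycle */
    output;
    edges.replace(key:vertex1, data:vertex2, data:'*');  /* mark the edge as visited */
    vertex1 = vertex2;
  end;

  if not missing(crumb) then output;                     /* also output the edge that closed the cycle */
  stop;
run;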
All paths in the data can be discovered with an additional outer loop iterating (HITER) over a hash of the vertex1 values.
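For readers who want to see the traversal logic outside of SAS, here is a rough Python sketch of the same idea. The function name follow_path and the stop parameter are assumptions for illustration; unlike the DATA step above, this version can also stop as soon as column 2 reaches the stopping value ('Z' in the question).

```python
def follow_path(edges, start, stop="Z"):
    """Follow vertex1 -> vertex2 links from `start`.

    Stops when the target equals `stop`, when no further edge exists,
    or when a vertex is revisited (cycle protection).
    """
    # If a vertex1 value appears more than once, the last edge wins.
    next_vertex = dict(edges)
    visited = set()
    path = []
    current = start
    while current in next_vertex and current not in visited:
        visited.add(current)
        target = next_vertex[current]
        path.append((current, target))
        if target == stop:
            break
        current = target
    return path

# Same edges as the HAVE data set above.
have = [("A", "B"), ("X", "B"), ("D", "B"), ("E", "B"),
        ("B", "C"), ("Q", "C"), ("C", "Z"), ("Z", "X")]
want = follow_path(have, "A")
```

Passing stop=None disables the stopping value, in which case the walk continues until the cycle is detected.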
I want to write a function that gets a time series and a standard deviation as parameters and returns an adjusted time series which looks like a forecast.
With this function I want to test a system for stability, which gets a forecasted time series list for weather as input parameter.
My approach for such a function is described below:
vector<tuple<datetime, double>> get_adjusted_timeseries(vector<tuple<datetime, double>>& timeseries_original, const double stddev, const double dist_mid)
{
    auto timeseries_copy(timeseries_original);
    // Random initial sign: +1 or -1.
    int sign = randInRange(0, 1) == 0 ? 1 : -1;
    // Limits around zero inside which a drawn value triggers a sign change.
    auto left_limit = normal_cdf_inverse(0.5 - dist_mid, 0, stddev);
    auto right_limit = normal_cdf_inverse(0.5 + dist_mid, 0, stddev);
    for (auto& pair : timeseries_copy)
    {
        double nd_value;
        // Redraw until the value has the same sign as `sign`.
        do
        {
            nd_value = normal_distribution_r(0, stddev);
        }
        while ((sign == -1 && nd_value > 0.0) || (sign == 1 && nd_value < 0.0));
        // Shift the original value by nd_value percent.
        pair = make_tuple(get<0>(pair), get<1>(pair) + (nd_value / 100) * get<1>(pair));
        // Flip the sign when the drawn value lies close to zero.
        if ((nd_value > 0.0 && nd_value < right_limit) || (nd_value < 0.0 && nd_value > left_limit))
        {
            sign = sign == -1 ? 1 : -1;
        }
    }
    return timeseries_copy;
}
1. Make a copy of the original time series, which is also of type vector<tuple<datetime, double>>.
2. Get a random number that is either 0 or 1 and use it to set the sign.
3. Use the inverse cumulative distribution function to get the limits that indicate when the sign is changed. The sign is changed when the value of the copied time series is close to the original value. The implementation of the inverse CDF is shown here
4. Loop over each item in the time series:
   - get a normally distributed value, which should be below zero when sign == -1 and above zero when sign == 1
   - adjust the old value of the time series according to the normally distributed value
   - change the sign if the normally distributed value is close to the original value.
The result for a low standard deviation, for example, can be seen here in yellow:
If the mean absolute percentage error (MAPE) of the two time series is calculated, the following relationship results:
stddev: 5 -> MAPE: ~0.04
stddev: 10 -> MAPE: ~0.08
stddev: 15 -> MAPE: ~0.12
stddev: 20 -> MAPE: ~0.16
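For reference, a minimal sketch of how such a MAPE figure could be computed, assuming the usual definition as a fraction (the mean of |actual - forecast| / |actual|) rather than a percentage. The function name mape is hypothetical. The figures above are consistent with this definition, since the expected absolute value of a zero-mean normal variable is about 0.8 times its standard deviation and the drawn value is applied as a percentage.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, returned as a fraction (0.04 means 4 %)."""
    if len(actual) != len(forecast):
        raise ValueError("series must have the same length")
    # Assumes no actual value is zero; otherwise the ratio is undefined.
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return sum(errors) / len(errors)

# Hypothetical values: each forecast is off by 10 % of the actual value.
example = mape([100.0, 200.0], [110.0, 180.0])
```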
What do you think of this approach?
Can this function be used to test a system that has to deal with predicted time series?
You want to generate time series data that behave like some existing time series data that you have from real phenomena (weather and stock exchange). That generated time series data will be fed into some system to test its stability.
What you could do is fit some model to your existing data, and then use that model to generate data that follow the model, and hence your existing data. Fitting a model to data yields a set of model parameters and a set of deviations (differences not explained by the model). The deviations may follow some known density function, but not necessarily. Given the model parameters and deviations, you can generate data that look like the original data. Note that if the model does not explain the data well, the deviations will be large, and the data generated with the model will not look like the original data.
For example, if you know your data is linear, you fit a line through them, and your model would be:
y = M x + B + E
where E is a random variable that follows the distribution of the error around the line that fits your data, and where M and B are the model parameters. You can now use that model to generate (x, y) coordinates that are roughly linear. When sampling the random variable E, you can assume that it follows some known distribution such as a normal distribution, or use a histogram to generate deviations that follow an arbitrary density function.
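As an illustration of the linear case, here is a small Python sketch that fits M and B by ordinary least squares and then generates new points by resampling the observed residuals for E (the histogram-like option, as opposed to assuming a normal distribution). The function names and the sample data are made up for this example.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares fit of y = M*x + B; also returns the residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
    return m, b, residuals

def generate(xs, m, b, residuals, rng):
    """Generate y values from the fitted line plus a resampled residual E."""
    return [m * x + b + rng.choice(residuals) for x in xs]

# Hypothetical observations scattered around y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
m, b, residuals = fit_line(xs, ys)
synthetic = generate(xs, m, b, residuals, random.Random(42))
```

Resampling residuals keeps the generated deviations on the same scale as the observed ones without committing to a particular distribution.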
There are several time series models that you could use to fit your weather and stock exchange data. You could look at exponential smoothing. It has several different models. I am sure you can find many other models on Wikipedia.
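To give a feel for the simplest member of that family, here is a sketch of simple exponential smoothing in Python. The smoothed level is updated as s_t = alpha * x_t + (1 - alpha) * s_(t-1); starting the level at the first observation is one common convention and an assumption here, as are the function name and the sample values.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level for each observation in `series`."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    level = series[0]  # assumption: initialise the level with the first value
    for x in series:
        level = alpha * x + (1.0 - alpha) * level
        smoothed.append(level)
    return smoothed

# The differences between the data and the smoothed level play the role
# of the deviations described above.
data = [1.0, 2.0, 3.0]
levels = simple_exponential_smoothing(data, 0.5)
deviations = [x - s for x, s in zip(data, levels)]
```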
If a model does not fit your data well, you can also treat its parameters as random variables. In our example above, suppose that we have observed data where the slope seems to be changing. We would fit several lines and obtain a distribution for M. We would then sample that variable along with E when generating data.
I have two variables, say x and y, and each holds around 60 points (basically the values for the x and y axes of a plot). When I try to write them to the result file as a table with each x value next to the corresponding y value, I end up with all the x values displayed across both columns, followed by the y values. I am unable to get it out correctly.
This is a small part of the code
xpts = PIC1(1,6:NYPIX,1)
ypts = PIC1(2,6:NYPIX,1)
write(21,*), NYPIX
write(21,"(T2,F10.4: T60,F10.4)"), xpts, ypts
This is the output I get. The x values continue from column 1 to column 2 until all are displayed, and then the y values are displayed.
128.7018 128.7042
128.7066 128.7089
128.7113 128.7137
128.7160 128.7184
128.7207 128.7231
128.7255 128.7278
128.7302 128.7325
128.7349 128.7373
128.7396 128.7420
128.7444 128.7467
128.7491 128.7514
128.7538 128.7562
128.7585 128.7609
128.7633 128.7656
128.7680 128.7703
128.7727 128.7751
128.7774 128.7798
128.7822 128.7845
128.7869 128.7892
128.7916 128.7940
128.7963 128.7987
128.8011 128.8034
86.7117 86.7036
86.6760 86.6946
86.6317 86.6467
86.6784 86.8192
86.8634 87.0909
87.2584 87.6427
88.1245 88.8343
89.5275 90.2652
91.0958 91.8668
92.6358 93.2986
93.8727 94.4631
You could use a do loop:
do i = 1, size(xpts)
  write(21,"(T2,F10.4: T60,F10.4)") xpts(i), ypts(i)
enddo
There is already an answer saying how to get the output as wanted. It may be good, though, to explicitly say why the (unwanted) output as in the question comes about.
In the (generalized) statement
write(unit,fmt) xpts, ypts
the xpts, ypts is the output list. In the description of how the output list is treated we see (Fortran 2008 9.6.3)
If an array appears as an input/output list item, it is treated as if the elements, if any, were specified in array element order
That is, it shouldn't be too surprising that (assuming the lower bounds of xpts and ypts are both 1)
write(unit, fmt) xpts(1), xpts(2), xpts(3), ..., ypts(1), ypts(2), ...
gives the output seen.
Using a do loop expanded as
write(unit, fmt) xpts(1), ypts(1)
write(unit, fmt) xpts(2), ypts(2)
...
is indeed precisely what is wanted here. However, a more general "give me the elements of the arrays interleaved" could be done with an output implied-do:
write(unit, fmt) (xpts(i), ypts(i), i=LBOUND(xpts,1),UBOUND(xpts,1))
(assuming that the upper and lower bounds of ypts are the same as xpts).
This is equivalent to
write(unit, fmt) xpts(1), ypts(1), xpts(2), ypts(2), ...
(again, for convenience switching to the assumption about lower bounds).
This implied-do may be more natural in some cases. In particular, note that the first explicit do loop writes one record for each pair of elements from xpts and ypts; for the implied-do, the new record comes about from format reversion. For the format in the question the two are equivalent, but for some more exotic formats the former may not be what is wanted, and it ties the structure of the do loop to the format.
This splitting of records matters even more for unformatted output (which has no format reversion).
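The difference between the two output lists can be mimicked outside Fortran. The Python sketch below is only a model of the behaviour, not Fortran I/O: the hypothetical records function cuts a flat output list into records of two items each, the way format reversion starts a new record once the two edit descriptors are used up. The sample values are the first few from the output in the question.

```python
def records(values, per_record):
    """Split a flat output list into records of `per_record` items each."""
    return [values[i:i + per_record] for i in range(0, len(values), per_record)]

xpts = [128.7018, 128.7042, 128.7066]
ypts = [86.7117, 86.7036, 86.6760]

# write(unit, fmt) xpts, ypts : all of xpts first, then all of ypts.
array_order = records(xpts + ypts, 2)

# write(unit, fmt) (xpts(i), ypts(i), i=...) : elements interleaved.
interleaved = records([v for pair in zip(xpts, ypts) for v in pair], 2)
```

In the first case the records pair up consecutive x values before any y value appears; in the second each record holds one x value and its matching y value.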
This is a programming question, but I'll give you a little of the stats background first. This question refers to part of a data sim for a mixed-effects location scale model (i.e., heterogeneous variances). I'm trying to simulate two MVN variance components using the RANDNORMAL function in IML. Because both variance components are heterogeneous, the variances used by RANDNORMAL will differ across people. Thus, I need IML to select the specific row (e.g., row 1 = person 1) and use the RANDNORMAL function before moving onto the next row, and so on.
My example code below is for 2 people. I use DO to loop through each person's specific variance components (VC1 and VC2). I get the error: "Module RANDNORMAL called again before exit from prior call." I am assuming I need some kind of BREAK or EXIT function in the DO loop, but none I have tried work.
PROC IML;
ColNames = {"ID" "VC1" "VC2"};
A = {1 2 3,
2 8 9};
PRINT A[COLNAME=ColNames];
/*Set mean of each variance component to 0*/
MeanVector = {0, 0};
/*Loop through each person's data using THEIR OWN variances*/
DO i = 1 TO 2;
VC1 = A[i,2];
VC2 = A[i,3];
CovMatrix = {VC1 0,
0 VC2};
CALL RANDSEED(1);
U = RANDNORMAL(2, MeanVector, CovMatrix);
END;
QUIT;
Any help is appreciated. Oh, and I'm using SAS 9.4.
You want to move some things around, but mostly you don't want to overwrite U on every iteration: you need to write U's 1st row, then U's 2nd row, if I understand what you're trying to do. The code below is also a bit more efficient, since I preallocate the U and _cv matrices with j() rather than constructing them de novo every time through the loop (which is slow).
proc iml;
a = {1 2 3,2 8 9};
print(a);
_mv = {0,0};
U = J(2,2);
_cv = J(2,2,0);
CALL RANDSEED(1);
do i = 1 to 2;
_cv[1,1] = a[i,2];
_cv[2,2] = a[i,3];
U[i,] = randnormal(1,_mv, _cv);
end;
print(u);
quit;
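The same per-person logic can be sketched in Python for comparison. This relies on an assumption that holds for the example above: the covariance matrix is diagonal, so a multivariate normal draw reduces to independent univariate normal draws, one per variance component. The function name simulate_effects is hypothetical, and the random stream will not match the one SAS produces.

```python
import math
import random

def simulate_effects(variances, seed=1):
    """Draw one zero-mean normal vector per person.

    `variances` holds one (VC1, VC2) pair per person; the covariance
    matrix is assumed diagonal, so each component is drawn independently.
    """
    rng = random.Random(seed)  # seed once, outside the per-person loop
    return [[rng.gauss(0.0, math.sqrt(v)) for v in person]
            for person in variances]

# Variance components for the two people in the example matrix.
u = simulate_effects([(2, 3), (8, 9)])
```

Seeding once before the loop mirrors the placement of CALL RANDSEED ahead of the DO loop in the code above.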
Your mistake is the line
CovMatrix = {VC1 0, 0 VC2}; /* wrong */
which is not valid SAS/IML syntax. Instead, use @Joe's approach or use
CovMatrix = (VC1 || 0) // (0 || VC2);
For details, see the article "How to build matrices from expressions."
You might also be interested in this article that describes how to carry out this simulation with a block-diagonal matrix: "Constructing block matrices with applications to mixed models."
I am trying to implement an IIR filter that I designed in Matlab in a C++ program, to filter an unwanted signal out of a wave file. The fdatool in Matlab generated this C header to use (it is a bandstop filter):
#include "tmwtypes.h"
/*
* Expected path to tmwtypes.h
* C:\Program Files (x86)\MATLAB\R2013a Student\extern\include\tmwtypes.h
*/
const int al = 7;
const real64_T a[7] = {
0.9915141178644, -5.910578456199, 14.71918523779, -19.60023964796,
14.71918523779, -5.910578456199, 0.9915141178644
};
const int bl = 7;
const real64_T b[7] = {
1, -5.944230431733, 14.76096188047, -19.60009655976,
14.67733658492, -5.877069568864, 0.9831002459245
};
After hours of exhausting research, I still can't figure out the proper way to use these values to determine the W values and then how to use those W values to properly calculate my Y outputs. If anyone has any insight into the ordering these values should be used to do all these conversions, it would be a major help.
All the methods I've developed and tried to this point do not generate a valid wave file, the header values all translate correctly, but everything beyond cannot be evaluated by a media player.
Thanks.
IIR filters work this way:
Assuming an array of samples A and an array of coefficients named 'c', the result array B will be:
B[i] = (A[i] * c[0]) + (B[i-1] * c[1]) + ... + (B[i-n] * c[n])
Note that only the newest element is taken from A.
This is easier to do in-place, just update A as you move along.
These filter coefs are very violent, are you sure you got them right?
The first one is also symmetrical which probably indicates it's an FIR filter.
It appears to me that you have a 3 pole IIR filter with the coefficients given for an Nth order implementation (as opposed to a series of 2nd order sections). Since this is a band reject (or band pass) the polynomial order is twice the pole count.
I am not sure what you mean by W values, unless you are trying to evaluate the frequency response of this filter.
To calculate the Y values, as you put it, see this link for code on implementing IIR filters. See the Nth order implementation code in particular.
http://www.iowahills.com/A7ExampleCodePage.html
BTW: I assumed these were Nth order coefficients and simulated them. I got a 10 dB notch at 0.05 Pi. Sound about right?
where
b6 = 0.9915141178644, ..., b0 = 0.9915141178644
a6 = 0.9831002459245, ..., a0 = 1
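A minimal sketch of an Nth order (direct form) implementation in Python, for illustration only. It assumes the coefficient naming used in the list above, where the b values are the feed-forward (numerator) coefficients and the a values are the feedback (denominator) coefficients with a0 = 1. Under that assumption the array called a in the generated header supplies the b values and the array called b supplies the a values. The function name iir_filter is hypothetical.

```python
def iir_filter(b, a, x):
    """Direct form difference equation:
    a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k]
    Samples before the start of the input are treated as zero.
    """
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y

# Coefficients copied from the header in the question (assumed mapping).
B = [0.9915141178644, -5.910578456199, 14.71918523779, -19.60023964796,
     14.71918523779, -5.910578456199, 0.9915141178644]
A = [1.0, -5.944230431733, 14.76096188047, -19.60009655976,
     14.67733658492, -5.877069568864, 0.9831002459245]

impulse = [1.0] + [0.0] * 15
response = iir_filter(B, A, impulse)  # first 16 samples of the impulse response
```

Each output sample depends on earlier outputs, so the audio samples have to be processed in order. For a high-order filter like this one, a cascade of 2nd order sections is generally less sensitive to rounding than a single Nth order section.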
Also, you may want to post a question like this on:
https://dsp.stackexchange.com/