Stata: tsline graphs values not at the right date

I want to depict the evolution of a variable called share over time. I do so using tsline, but the resulting graph looks off: although my data start in May 1989 and end in December 1993, the trendline is drawn so that it begins in January 1989 and ends in mid-1993.
gen double time3 = monthly(time2, "YM")
format time3 %tm
tsset time3
tsline share, ///
ttitle("years") ytitle("") ylabel(0(.2).65) ///
tlabel(1989m5(12)1994m5, format(%tmY) labsize(small))
I know that Stata stores dates as integers, so I tried replacing the year-month values in tlabel() with integers. Since the time variable counts months since 1960m1, 1989m5 is stored internally as 352 and 1993m12 as 407 (I checked this by running dis tm(1989m5)). But even with tlabel(352(12)407), the trendline is not drawn correctly. Does anyone have an idea how to fix this? This is how the graph currently looks:
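The integer encoding described above is plain arithmetic. Here is a minimal C++ sketch of it, assuming only that a %tm value counts months from 1960m1 (so 1960m1 is 0). The function names are hypothetical, and this is not Stata's own implementation.

```cpp
#include <utility>

// Year and month -> Stata-style %tm integer: months elapsed since 1960m1.
int stataMonthly(int year, int month) {
    return (year - 1960) * 12 + (month - 1);
}

// %tm integer -> (year, month). Uses floor division so that values
// before 1960m1 (negative integers) also map back correctly.
std::pair<int, int> fromStataMonthly(int tm) {
    int yearOffset = tm >= 0 ? tm / 12 : -((-tm + 11) / 12);
    int month = tm - yearOffset * 12 + 1;  // always in 1..12
    return {1960 + yearOffset, month};
}
```

For example, stataMonthly(1989, 5) gives 352 and stataMonthly(1993, 12) gives 407, matching the values quoted above.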
This is a subsample of the data that I used:
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input str7 time2 double(time3 share)
"1989-05" 352 .1536926147704591
"1989-06" 353 .1665024630541872
"1989-08" 355 .12674650698602793
"1989-09" 356 .18095712861415753
"1989-10" 357 .24629080118694363
"1989-11" 358 .23008849557522124
"1989-12" 359 .17638036809815952
"1990-01" 360 .20521653543307086
"1990-02" 361 .1754473161033797
"1990-03" 362 .17401960784313725
"1990-04" 363 .14173998044965788
"1990-05" 364 .1669970267591675
"1990-06" 365 .1398838334946757
"1990-08" 367 .10461689587426326
"1990-09" 368 .14965312190287414
"1990-10" 369 .1921182266009852
"1990-11" 370 .18038617886178862
"1990-12" 371 .19577735124760076
"1991-01" 372 .10562685093780849
"1991-02" 373 .09596928982725528
"1991-03" 374 .1941747572815534
"1991-04" 375 .1889106967615309
"1991-05" 376 .1794234592445328
"1991-06" 377 .1968390804597701
"1991-08" 379 .17846309403437816
"1991-09" 380 .19425173439048563
"1991-10" 381 .14556962025316456
"1991-11" 382 .15569143932267168
"1991-12" 383 .1694015444015444
"1992-01" 384 .20812928501469147
"1992-02" 385 .257590597453477
"1992-03" 386 .2204724409448819
"1992-04" 387 .22096456692913385
"1992-05" 388 .21601941747572814
"1992-06" 389 .1675025075225677
"1992-07" 390 .22176591375770022
"1992-09" 392 .15128968253968253
"1992-10" 393 .15841584158415842
"1992-11" 394 .1849112426035503
"1992-12" 395 .19642857142857142
"1993-01" 396 .22469252601702933
"1993-02" 397 .2796528447444552
"1993-03" 398 .290811339198436
"1993-04" 399 .24108910891089108
"1993-05" 400 .2562437562437562
"1993-06" 401 .22127872127872128
"1993-07" 402 .27874743326488705
"1993-09" 404 .3391472868217054
"1993-10" 405 .3840155945419103
"1993-11" 406 .45184824902723736
"1993-12" 407 .43987975951903807
end
format %tm time3
[/CODE]

The graph you've posted doesn't seem surprising.
Using the data and code you've posted
clear
input str7 time2 double(time3 share)
"1989-05" 352 .1536926147704591
"1989-06" 353 .1665024630541872
"1989-08" 355 .12674650698602793
"1989-09" 356 .18095712861415753
"2019-10" 717 .13052208835341367
"2019-11" 718 .13559059987631417
"2019-12" 719 .13997555012224938
end
format %tm time3
tsset time3
tsline share, ///
ttitle("years") ytitle("") ylabel(0(.2).65) ///
tlabel(1989m5(12)2019m12, format(%tmY) labsize(small))
it's hard to see what might be a problem.
tsline doesn't purport to draw a trend line, just a line graph for the data specified.


How to shift side-by-side bars in Proc SGPLOT for Time Data? discreteoffset doesn't work with (type=time) option

I am trying to create a side-by-side, dual-axis bar chart in proc sgplot for date-based data. I am stuck on one last thing: I cannot shift the bars with the discreteoffset option on vbar, because I am using type=time on the xaxis. If I comment that option out, the bars are shifted, but the x-axis tick values look messy. Is there another option that can shift the bars for date/time data? My SAS code follows.
data input;
input people visits outcome date date9.;
datalines;
41 448 210 1-Jan-18
43 499 207 1-Feb-18
45 544 221 1-Mar-18
49 564 239 1-Apr-18
39 575 236 1-May-18
37 549 210 1-Jun-18
51 602 263 1-Jul-18
32 586 208 1-Aug-18
52 557 225 1-Sep-18
41 534 227 1-Oct-18
48 499 217 1-Nov-18
44 514 235 1-Dec-18
31 582 281 1-Jan-19
33 545 269 1-Feb-19
38 574 259 1-Mar-19
29 564 247 1-Apr-19
29 642 274 1-May-19
28 556 216 1-Jun-19
20 531 187 1-Jul-19
31 604 226 1-Aug-19
19 513 186 1-Sep-19
24 483 185 1-Oct-19
28 401 156 1-Nov-19
18 450 158 1-Dec-19
21 418 178 1-Jan-20
28 396 149 1-Feb-20
43 488 177 1-Mar-20
33 539 205 1-Apr-20
57 631 244 1-May-20
54 695 291 1-Jun-20
58 732 309 1-Jul-20
62 681 301 1-Aug-20
42 654 291 1-Sep-20
57 749 365 1-Oct-20
60 627 249 1-Nov-20
56 623 244 1-Dec-20
54 712 298 1-Jan-21
62 655 262 1-Feb-21
;
run;
proc sgplot data=input;
format date monyy7.;
styleattrs datacolors=(Red DarkBlue) datacontrastcolors=(black black) datalinepatterns=(solid);
vbar date / response=visits discreteoffset=-0.17 barwidth=0.3;
vbar date / response=outcome discreteoffset=0.17 barwidth=0.3;
vline date / response=people y2axis lineattrs=(color=black thickness=3);
xaxis display=(nolabel) /*fitpolicy=rotate valuesrotate=vertical*/ type=time /*interval=month*/;
yaxis grid label='Label1' values=(0 to 800 by 100);
y2axis label='Label2' values=(0 to 70 by 10);
keylegend / title="";
run;
Output I am getting:
Output I want (with shifted bars, but the dates are changed):
Appreciate any help!
Thank you.
Reshape the data with proc transpose so that the variables you want side by side become categorical, i.e. name/value pairs. The name can then be used in vbar as group=, together with groupdisplay=cluster.
Note: xaxis type=time appears to perform special checks based on the format of the vbar variable, and it renders a nicely formatted two-line axis label when that format is date9. I have never seen this discussed in the documentation.
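Conceptually, the reshape turns each wide row into one (date, name, value) record per variable. The C++ sketch below illustrates only that wide-to-long step. The struct and field names are assumptions chosen to mirror the SAS data set. It is not what proc transpose does internally, and it leaves out the copy people handling.

```cpp
#include <string>
#include <vector>

// One row of the original (wide) data set.
struct WideRow {
    std::string date;
    int people;
    int visits;
    int outcome;
};

// One name/value record in the reshaped (long) data set.
struct LongRow {
    std::string date;
    std::string name;  // plays the role of _name_, usable as group=
    int value;         // plays the role of col1, usable as response=
};

// Emits one record per reshaped variable, in the order visits, outcome.
std::vector<LongRow> toLong(const std::vector<WideRow>& wide) {
    std::vector<LongRow> out;
    out.reserve(wide.size() * 2);
    for (const WideRow& r : wide) {
        out.push_back({r.date, "visits", r.visits});
        out.push_back({r.date, "outcome", r.outcome});
    }
    return out;
}
```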
Example:
The example uses name= in the plotting statements so that the keylegend can look tidier.
proc transpose data=input out=plot;
by date; /* input is already sorted by date, as BY requires */
copy people;
var visits outcome;
run;
proc sgplot data=plot;
vbar date / response=col1 group=_name_ groupdisplay=cluster name='relatedcounts';
vline date / response=people group=_name_ y2axis lineattrs=(color=black thickness=3) name='people';
xaxis
type = time
interval = month
;
format date date9.;
yaxis grid label='Related counts' values=(0 to 800 by 100);
y2axis label='# People' values=(0 to 70 by 10);
keylegend 'relatedcounts' / title="";
run;
Will produce

How to determine the number of filled drums, and the room left in each drum

Not quite a homework problem, but it may as well be:
You have a long list of positive integer values stored in column A. These are packets in unit U.
A drum can fit up to 500 U, but you cannot break up packets.
How many drums are required for any given list of values in column A?
This does not have to be the most efficient answer, processing in row order is absolutely fine.
I think you should be able to solve this with a formula, but the closest I got was
=CEILING(SUM(A1:A1000)/500;1)
Of course, this breaks up packets.
Additionally, this problem requires me to find the room left in each drum used, but the emphasis of this question should remain on the number of drums required.
This cannot be done with a single simple formula. Each drum and packet needs to be counted. However, contrary to my comment, a spreadsheet works well for this particular problem, and there is no need for a macro.
First, set B2 to 500 for use in other formulas. If column A is not yet filled, use the formula =RANDBETWEEN(1,B$2) to add some values.
Column C is the main formula that determines how full each drum is. Set C2 to =A2. C3 is =IF(C2+A3>B$2,A3,C2+A3). Fill C3 down to fill the remaining rows.
For column D, enter =IF(C2+A3>B$2,B$2-C2,"") in D2 and fill it down to the second-to-last row. The last row of column D needs a simpler formula, =B$2-C21; change 21 to whatever the last row is.
Finally in column E we find the answer, which is simply =COUNT(D2:D21).
Packets (A)  Drum Size (B)  How Full (C)  Room left in each drum used (D)  Number of filled drums (E)
-----------  -------------  ------------  -------------------------------  --------------------------
206          500            206           294                              13
309                         309
68                          377
84                          461           39
305                         305           195
387                         387           113
118                         118
8                           126           374
479                         479           21
492                         492           8
120                         120
291                         411           89
262                         262
108                         370           130
440                         440           60
88                          88
100                         188
102                         290           210
478                         478           22
87                          87            413
For OpenOffice Calc, use semicolons ; instead of commas , in formulas.
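Outside a spreadsheet, the same row-order rule can be sketched in C++: start a new drum whenever the next packet would overflow the current one. This is known as the next-fit heuristic for bin packing. The names below are hypothetical, and the sketch assumes every packet fits in an empty drum, as the question implies.

```cpp
#include <stdexcept>
#include <vector>

// Outcome of packing packets into drums strictly in list order.
struct DrumPacking {
    std::vector<int> fill;      // final contents of each drum (column C at each drum's last row)
    std::vector<int> roomLeft;  // capacity minus fill for each drum (column D)
};

// Next-fit packing: keep adding packets to the current drum until the
// next packet would overflow it, then start a new drum (old drums are never reopened).
DrumPacking packInRowOrder(const std::vector<int>& packets, int capacity) {
    DrumPacking result;
    for (int p : packets) {
        if (p <= 0 || p > capacity) {
            throw std::invalid_argument("each packet must be between 1 and the drum capacity");
        }
        if (!result.fill.empty() && result.fill.back() + p <= capacity) {
            result.fill.back() += p;   // packet fits in the current drum
        } else {
            result.fill.push_back(p);  // start a new drum with this packet
        }
    }
    for (int f : result.fill) {
        result.roomLeft.push_back(capacity - f);
    }
    return result;
}
```

The number of drums is result.fill.size(). With the 20 packets from the table above and a capacity of 500, it reproduces column E (13 drums) and the values in column D.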

Verifying output to "Find the numbers between 1 and 1000 whose prime factors' sum is itself a prime" from Allain's Jumping into C++ (ch7 #3)

The question:
Design a program that finds all numbers from 1 to 1000 whose prime factors, when added
together, sum up to a prime number (for example, 12 has prime factors of 2, 2, and 3, which
sum to 7, which is prime). Implement the code for that algorithm.
I modified the problem to only sum unique factors, because I don't see why you'd count a factor twice, as in his example using 12.
My solution. Is there any good (read: automated) way to verify the output of my program?
Sample output for 1 to 1000:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
17
19
20
22
23
24
25
26
28
29
30
31
34
37
40
41
43
44
46
47
48
49
52
53
58
59
60
61
63
67
68
70
71
73
76
79
80
82
83
88
89
92
94
96
97
99
101
103
107
109
113
116
117
118
120
121
124
127
131
136
137
139
140
142
147
148
149
151
153
157
160
163
164
167
169
171
172
173
176
179
181
184
188
189
191
192
193
197
198
199
202
207
210
211
212
214
223
227
229
232
233
239
240
241
244
251
252
257
261
263
268
269
271
272
273
274
275
277
279
280
281
283
286
289
292
293
294
297
298
306
307
311
313
317
320
325
331
332
333
334
337
347
349
351
352
353
358
359
361
367
368
369
373
376
379
382
383
384
388
389
394
396
397
399
401
404
409
412
414
419
421
423
424
425
428
431
433
439
443
449
454
457
459
461
462
463
464
467
468
472
475
478
479
480
487
491
495
499
503
509
513
521
522
523
524
529
531
538
539
541
544
546
547
548
549
550
557
560
561
562
563
567
569
571
572
575
577
587
588
593
594
599
601
603
604
605
607
612
613
617
619
621
622
628
631
639
640
641
643
646
647
651
652
653
659
661
664
668
673
677
683
684
691
692
694
701
704
709
712
714
718
719
725
726
727
733
736
738
739
741
743
751
752
756
757
759
761
764
765
768
769
772
773
775
777
783
787
792
797
798
801
809
811
821
823
825
827
828
829
833
837
838
839
841
846
847
848
850
853
856
857
859
862
863
873
877
881
883
887
891
892
903
904
907
908
909
911
918
919
922
925
928
929
932
937
941
944
947
953
954
957
960
961
966
967
971
975
977
981
983
991
997
999
Update: I have solved my problem and verified the output of my program against an OEIS series, as suggested by @MVW (shown in the source of my new GitHub solution). In the future, I will aim to test my programs by doing zero or more of the following (depending on the scope/importance of the problem):
google keywords for an existing solution to the problem, comparing it against my solution if I find it
unit test components for correctness as they're built and integrated, comparing these tests with known correct outputs
Some suggestions:
You need to check the properties of your calculated numbers.
Here that means
calculating the prime factors and
calculating their sum and
testing if that sum is a prime number.
Which is what your program should do in the first place, by the way.
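Here is a sketch of checking those properties, assuming the asker's variant: only distinct prime factors are summed, so 1 has sum 0 and never qualifies. The C++ below recomputes the property for every n up to a limit and reports which numbers a claimed output list wrongly includes or misses. All names are illustrative.

```cpp
#include <set>
#include <vector>

// Trial-division primality test; adequate for the small sums that occur here.
bool isPrime(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Sum of the distinct prime factors of n (0 for n == 1).
int distinctPrimeFactorSum(int n) {
    int sum = 0;
    for (int p = 2; p * p <= n; ++p) {
        if (n % p == 0) {
            sum += p;
            while (n % p == 0) n /= p;  // strip every copy of p
        }
    }
    if (n > 1) sum += n;  // leftover cofactor is itself prime
    return sum;
}

struct Audit {
    std::vector<int> wronglyIncluded;  // listed, but the property fails
    std::vector<int> missing;          // property holds, but not listed
};

// Compares a claimed output list against a recomputation for 1..limit.
Audit auditClaimedOutput(const std::vector<int>& claimed, int limit) {
    std::set<int> claimedSet(claimed.begin(), claimed.end());
    Audit audit;
    for (int n = 1; n <= limit; ++n) {
        bool listed = claimedSet.count(n) > 0;
        bool qualifies = isPrime(distinctPrimeFactorSum(n));
        if (listed && !qualifies) audit.wronglyIncluded.push_back(n);
        if (!listed && qualifies) audit.missing.push_back(n);
    }
    return audit;
}
```

Take the first eighteen entries of the sample output above (1 through 15, 17, 19, 20) with a limit of 20. Under that variant, it reports 1, 14 and 15 as wrongly included and 16 and 18 as missing.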
So one nice option for checking is to compare your output with a known solution, or with the output of another program that is known to work. The tricky bit is having such a solution or program available. And I am ignoring that your comparison could be plagued by errors as well :-)
If you just compare it with other implementations, e.g. programs from other folks here, it is more of a vote than a proof. It only increases the probability that your program is correct if several independent implementations come up with the same result. Of course, all implementations could err :-)
The more implementations agree, the better.
And the more diverse the implementations are, the better.
E.g. you could use different programming languages, algebraic systems or a friend with time and paper and pencil and Wikipedia. :-)
Another means is to add checks to your intermediate steps, to get more confidence in your result. Kind of building a chain of trust.
You could output the prime factors you determined and compare them with the output of a prime factorization program which is known to work.
Then you check if your summing works.
Finally you could check if the primality test you apply to the candidate sums is working correctly by feeding it with known prime numbers and non prime numbers and so on.
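Here is a minimal C++ sketch of that chain of checks. It assumes a trial-division primality test and a factorization routine like those a solution might contain; both are hypothetical here. Each component is checked against an independent sieve of Eratosthenes, and every factorization must consist of primes that multiply back to n.

```cpp
#include <utility>
#include <vector>

// Component under test: trial-division primality test.
bool isPrime(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Component under test: factorization into (prime, exponent) pairs,
// e.g. 12 -> {(2, 2), (3, 1)}.
std::vector<std::pair<int, int>> primePowers(int n) {
    std::vector<std::pair<int, int>> result;
    for (int p = 2; p * p <= n; ++p) {
        int e = 0;
        while (n % p == 0) { n /= p; ++e; }
        if (e > 0) result.push_back({p, e});
    }
    if (n > 1) result.push_back({n, 1});  // remaining cofactor is prime
    return result;
}

// Independent oracle: sieve of Eratosthenes for 0..limit.
std::vector<bool> sieve(int limit) {
    std::vector<bool> prime(limit + 1, true);
    prime[0] = false;
    if (limit >= 1) prime[1] = false;
    for (int i = 2; i * i <= limit; ++i)
        if (prime[i])
            for (int j = i * i; j <= limit; j += i) prime[j] = false;
    return prime;
}

// Returns the first n in [1, limit] where a component disagrees with the
// oracle (wrong primality answer, non-prime factor, or factors that do not
// multiply back to n); returns 0 if every check passes. Assumes limit >= 1.
int firstFailure(int limit) {
    const std::vector<bool> oracle = sieve(limit);
    for (int n = 1; n <= limit; ++n) {
        if (isPrime(n) != oracle[n]) return n;
        long long product = 1;
        for (const auto& pe : primePowers(n)) {
            if (!oracle[pe.first]) return n;  // every factor must be prime
            for (int k = 0; k < pe.second; ++k) product *= pe.first;
        }
        if (product != n) return n;
    }
    return 0;
}
```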
That is roughly what folks do with unit testing, for example: checking that most parts of the code work, in the hope that if the parts work, the whole will work.
Or you could formally prove your program step by step, using Hoare Calculus for example or another formal method.
But that is tricky, and you might end up shifting program errors to errors in the proof.
And today, in the era of the internet, you could of course search online for the solution:
Try searching for "sum of prime factors is prime" in the On-Line Encyclopedia of Integer Sequences, which should give you series A100118. :-)
It is the problem with multiplicity, but shows you what the number theory pros do, with Mathematica and program fragments to calculate the series, the argument for the case of 1 and literature. Quite impressive.
Here's the answer I get. I exclude 1 as it has no prime divisors so their sum is 0, not a prime.
Haskell> filter (isPrime . sum . map fst . primePowers) [2..1000]
[2,3,4,5,6,7,8,9,10,11,12,13,16,17,18,19,20,22,23,24,25,27,29,31,32,34,36,37,40,
41,43,44,47,48,49,50,53,54,58,59,61,64,67,68,71,72,73,79,80,81,82,83,88,89,96,97
,100,101,103,107,108,109,113,116,118,121,125,127,128,131,136,137,139,142,144,149
,151,157,160,162,163,164,165,167,169,173,176,179,181,191,192,193,197,199,200,202
,210,211,214,216,223,227,229,232,233,236,239,241,242,243,250,251,256,257,263,269
,271,272,273,274,277,281,283,284,288,289,293,298,307,311,313,317,320,324,328,331
,337,343,345,347,349,352,353,358,359,361,367,373,379,382,383,384,385,389,390,394
,397,399,400,401,404,409,419,420,421,428,431,432,433,435,439,443,449,454,457,461
,462,463,464,467,472,478,479,484,486,487,491,495,499,500,503,509,512,521,523,529
,538,541,544,547,548,557,561,562,563,568,569,570,571,576,577,578,587,593,595,596
,599,601,607,613,617,619,622,625,630,631,640,641,643,647,648,651,653,656,659,661
,665,673,677,683,691,694,701,704,709,714,715,716,719,727,729,733,739,743,751,757
,759,761,764,768,769,773,777,780,787,788,795,797,798,800,808,809,811,819,821,823
,825,827,829,838,839,840,841,853,856,857,858,859,862,863,864,877,881,883,885,887
,903,907,908,911,919,922,924,928,929,930,937,941,944,947,953,956,957,961,967,968
,971,972,977,983,991,997,1000]
Haskell> primePowers 12
[(2,2),(3,1)]
Haskell> primePowers 14
[(2,1),(7,1)]
You could hard-code this list in and test against it. I'm pretty confident these results are without error.
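For example, the values up to 100 from the list above could be hard-coded and compared against a fresh computation. The C++ sketch below does that for the distinct-prime-factor variant, which is what map fst . primePowers computes. The function and constant names are illustrative. Only the prefix up to 100 is copied, to keep the block short.

```cpp
#include <vector>

// Trial-division primality test.
bool isPrime(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Sum of the distinct prime factors of n (0 for n == 1).
int distinctPrimeFactorSum(int n) {
    int sum = 0;
    for (int p = 2; p * p <= n; ++p) {
        if (n % p == 0) {
            sum += p;
            while (n % p == 0) n /= p;
        }
    }
    if (n > 1) sum += n;
    return sum;
}

// All n in [from, to] whose distinct-prime-factor sum is prime.
std::vector<int> qualifyingNumbers(int from, int to) {
    std::vector<int> out;
    for (int n = from; n <= to; ++n)
        if (isPrime(distinctPrimeFactorSum(n))) out.push_back(n);
    return out;
}

// Entries up to 100, copied from the Haskell output above.
const std::vector<int> kReferenceUpTo100 = {
    2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 20, 22, 23, 24,
    25, 27, 29, 31, 32, 34, 36, 37, 40, 41, 43, 44, 47, 48, 49, 50, 53, 54,
    58, 59, 61, 64, 67, 68, 71, 72, 73, 79, 80, 81, 82, 83, 88, 89, 96, 97, 100};

// True if a fresh computation over 1..100 (so 1 must be excluded too)
// reproduces the hard-coded reference exactly.
bool matchesReference() {
    return qualifyingNumbers(1, 100) == kReferenceUpTo100;
}
```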

SQL Server 2008, numeric library, c++, LAPACK, memory question

I am trying to send a table of numbers in SQL Server 2008 like:
1att 2att 3att 4att 5att 6att 7att ... attn
--------------------------------------------
565 526 472 527 483 529 476 470 502
497 491 483 488 488 483 496 515 491
467 516 480 477 494 497 478 519 471
488 466 547 498 477 466 475 480 516
543 491 449 485 495 468 452 479 516
473 475 431 474 460 342 471 386 549
489 477 462 428 489 491 481 483 475
485 474 472 452 525 508 459 561 529
473 457 476 498 485 465 540 475 525
455 477 415 434 475 499 476 482 551
463 476 476 471 488 526 394 439 475
479 473 491 519 483 474 476 474 478
455 518 465 445 496 500 518 470 536
557 498 492 449 478 491 492 476 460
484 509 538 473 548 497 551 477 498
471 430 482 437 516 483 487 453 456
505 476 489 495 472 476 487 516 466
466 495 488 475 550 565 510 473 515
470 490 480 475 479 544 468 486 496
484 495 524 435 469 612 493 467 477
....
.... (several more rows)
....
511 471 529 553 539 501 477 474 494
via Visual Studio 2008 (in a C++ project) to the LAPACK math library.
Is it possible to pass the table from SQL Server to LAPACK (via C++ in Visual Studio 2008) as a memory pointer? Alternatively, could I hold the whole table in RAM and have LAPACK read it through a pointer, without writing it to a file and reading it back?
Could you please suggest how to pass a table like this to LAPACK (perhaps via its location in memory, or something similar)?
(That way I could do computations with LAPACK on the table stored in SQL Server from a Visual Studio 2008 C++ project.)
----EDIT---
@MarkD, as you said in your answer, could you please give an example of computing an SVD with the idea from your example, using the std::vector class?
LAPACK requires the data passed to it to be in a Fortran-style (column-major) array. You won't be able to pass the data directly from SQL to LAPACK. Instead, read the data into a contiguous column-major array and pass a pointer to the first element of that array to the LAPACK routine of interest.
There are many LAPACK wrappers for C/C++ out there that make this much easier.
Edit: I just saw you are looking specifically for how to pass such an array. As I mentioned, there are many wrappers out there for doing this (just search for C/C++ LAPACK). An easy way to create your array is to use the std::vector class. You would read the data in column by column, appending the elements to your vector. So if you wanted to column-order the array shown in your example, your vector would end up looking something like:
//Column 1 Column 2 Column 3 ... last element
[565 497 467 488 ... 526 491 516 466 ... 472 483 480 547 ... ... 494]
You would then pass the LAPACK routine of interest the memory location of the first element, eg:
&myVector[0]
This is possible using std::vector, as the standard ensures that a vector uses contiguous memory storage. The LAPACK routines all also require the size/dimensions of the matrix/vectors you are passing to it (so you'll need to calculate/specify these values for the function call).
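Here is a minimal sketch of the packing step, assuming the table has already been fetched row by row into memory; how the rows are read from SQL Server is not shown. The function name is hypothetical. It copies the rows into one contiguous column-major std::vector<double>, so element (i, j) lands at index i + j * m.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Copies a table held row by row (e.g. as rows come back from a query)
// into one contiguous column-major buffer, the Fortran layout LAPACK expects.
// Element (i, j) ends up at index i + j * m, so the leading dimension is m.
std::vector<double> toColumnMajor(const std::vector<std::vector<double>>& rows) {
    if (rows.empty()) return {};
    const std::size_t m = rows.size();     // number of rows
    const std::size_t n = rows[0].size();  // number of columns
    std::vector<double> a(m * n);
    for (std::size_t i = 0; i < m; ++i) {
        if (rows[i].size() != n) {
            throw std::invalid_argument("all rows must have the same length");
        }
        for (std::size_t j = 0; j < n; ++j) {
            a[i + j * m] = rows[i][j];
        }
    }
    return a;
}
```

A routine such as dgesvd takes M (rows), N (columns), A and LDA among its arguments. You would pass a.data() (equivalently &a[0]) as A, with LDA equal to the number of rows. The exact C declaration of that routine depends on the LAPACK build or wrapper you link against, so the call itself is not shown.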
If you can post the specific LAPACK routine you want to use, I can give a more thorough example.

How to draw a (bezier) path with a fill color using GDI+?

I am making an SVG renderer for Windows using the Windows API and GDI+. SVG allows setting the 'fill' and 'stroke' style attributes on a path. I am having some difficulty implementing the 'fill' attribute.
The following path represents a spiral:
<svg:path style="fill:yellow;stroke:blue;stroke-width:2"
d="M153 334
C153 334 151 334 151 334
C151 339 153 344 156 344
C164 344 171 339 171 334
C171 322 164 314 156 314
C142 314 131 322 131 334
C131 350 142 364 156 364
C175 364 191 350 191 334
C191 311 175 294 156 294
C131 294 111 311 111 334
C111 361 131 384 156 384
C186 384 211 361 211 334
C211 300 186 274 156 274" />
The fill color is yellow and should fill the entire shape; however, this is what I get:
My GDI+ calls look like this:
Gdiplus::GraphicsPath bezierPath;
bezierPath.AddBeziers(&gdiplusPoints[0], gdiplusPoints.size());
g.FillPath(&solidBrush, &bezierPath);
g.DrawPath(&pen, &bezierPath);
Apparently the code is correct for drawing the shape, but not for filling it. Can anyone help me in figuring out what's going wrong?
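Try setting the fill mode of your GraphicsPath to winding, a different filling rule that should suit your needs. In the C++ API that is bezierPath.SetFillMode(Gdiplus::FillModeWinding); the default is FillModeAlternate. Winding mode corresponds to SVG's default fill-rule, nonzero.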
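To see why the fill mode matters, it helps to compare the two rules directly. GDI+ fills each figure as though it were closed. FillModeAlternate uses the even-odd rule and FillModeWinding the nonzero winding rule. The C++ sketch below tests a point against a polygon under both rules. It uses plain geometry with no GDI+ calls, and the point and polygon types are made up for illustration. The rules disagree wherever the outline wraps around a region more than once, which can happen when an open spiral is implicitly closed.

```cpp
#include <cstddef>
#include <vector>

struct Pt { double x, y; };

// Positive if p is left of the directed line a->b, negative if right, 0 if on it.
double isLeft(Pt a, Pt b, Pt p) {
    return (b.x - a.x) * (p.y - a.y) - (p.x - a.x) * (b.y - a.y);
}

// Winding number of the closed polygon around p (last vertex joins the first).
int windingNumber(const std::vector<Pt>& poly, Pt p) {
    int wn = 0;
    for (std::size_t i = 0; i < poly.size(); ++i) {
        Pt a = poly[i];
        Pt b = poly[(i + 1) % poly.size()];
        if (a.y <= p.y) {
            if (b.y > p.y && isLeft(a, b, p) > 0) ++wn;   // upward edge passing right of p
        } else {
            if (b.y <= p.y && isLeft(a, b, p) < 0) --wn;  // downward edge passing right of p
        }
    }
    return wn;
}

// Nonzero rule (GDI+ FillModeWinding, SVG fill-rule:nonzero).
bool insideNonzero(const std::vector<Pt>& poly, Pt p) {
    return windingNumber(poly, p) != 0;
}

// Even-odd rule (GDI+ FillModeAlternate): an odd number of edge crossings
// along a horizontal ray from p towards +x means inside.
bool insideEvenOdd(const std::vector<Pt>& poly, Pt p) {
    bool inside = false;
    for (std::size_t i = 0; i < poly.size(); ++i) {
        Pt a = poly[i];
        Pt b = poly[(i + 1) % poly.size()];
        if ((a.y > p.y) != (b.y > p.y)) {
            double xCross = a.x + (p.y - a.y) * (b.x - a.x) / (b.y - a.y);
            if (p.x < xCross) inside = !inside;
        }
    }
    return inside;
}
```

Take an outline that goes around a square twice. The center has winding number 2, so it is filled under the nonzero rule but left empty under the even-odd rule.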