Stata: tsline graphs values not at the right date
I want to depict the evolution of a variable called share over time. I do so using tsline, but the resulting graph looks off: although my data start in May 1989 and end in December 1993, the trendline is drawn so that it begins in January 1989 and ends in mid-1993.
gen double time3 = monthly(time2, "YM")
format time3 %tm
tsset time3
tsline share, ///
ttitle("years") ytitle("") ylabel(0(.2).65) ///
tlabel(1989m5(12)1994m5, format(%tmY) labsize(small))
I know that Stata stores dates as integers, and I tried replacing the year-month values after tlabel with integers. Since the time variable is defined as months since 1960m1, 1989m5 is stored internally as 352 and 1993m12 as 407; I learned this by running dis tm(1989m5). But even with tlabel(352(12)407), the trendline is not drawn correctly. Does anyone have an idea how to fix this? This is how the graph looks so far.
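The integer encoding described above is easy to check by hand. Below is a minimal C++ sketch that assumes only the rule stated in the question (%tm counts months since 1960m1, which is 0). The helper names are hypothetical and are not part of Stata. One thing the sketch makes visible: a tick placed at 1989m5 (352) but displayed with format(%tmY) shows only "1989", so the labels by themselves cannot tell a reader that the series starts in May rather than January. This is one plausible reading of the graph, not a confirmed diagnosis.
```cpp
#include <cassert>

// Stata %tm monthly dates: months elapsed since 1960m1, which is 0.
// Hypothetical helper; mirrors what `display tm(1989m5)` reports.
int stata_tm(int year, int month) {
    assert(month >= 1 && month <= 12);
    return (year - 1960) * 12 + (month - 1);
}

// Inverse for non-negative %tm values (dates from 1960m1 on).
int stata_tm_year(int tm) {
    assert(tm >= 0);
    return 1960 + tm / 12;   // the only part a %tmY label shows
}

int stata_tm_month(int tm) {
    assert(tm >= 0);
    return tm % 12 + 1;      // the part a %tmY label drops
}
```
For example, stata_tm(1989, 5) gives 352 and stata_tm(1993, 12) gives 407, matching the values in the dataex listing.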
This is a subsample of the data that I used:
[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input str7 time2 double(time3 share)
"1989-05" 352 .1536926147704591
"1989-06" 353 .1665024630541872
"1989-08" 355 .12674650698602793
"1989-09" 356 .18095712861415753
"1989-10" 357 .24629080118694363
"1989-11" 358 .23008849557522124
"1989-12" 359 .17638036809815952
"1990-01" 360 .20521653543307086
"1990-02" 361 .1754473161033797
"1990-03" 362 .17401960784313725
"1990-04" 363 .14173998044965788
"1990-05" 364 .1669970267591675
"1990-06" 365 .1398838334946757
"1990-08" 367 .10461689587426326
"1990-09" 368 .14965312190287414
"1990-10" 369 .1921182266009852
"1990-11" 370 .18038617886178862
"1990-12" 371 .19577735124760076
"1991-01" 372 .10562685093780849
"1991-02" 373 .09596928982725528
"1991-03" 374 .1941747572815534
"1991-04" 375 .1889106967615309
"1991-05" 376 .1794234592445328
"1991-06" 377 .1968390804597701
"1991-08" 379 .17846309403437816
"1991-09" 380 .19425173439048563
"1991-10" 381 .14556962025316456
"1991-11" 382 .15569143932267168
"1991-12" 383 .1694015444015444
"1992-01" 384 .20812928501469147
"1992-02" 385 .257590597453477
"1992-03" 386 .2204724409448819
"1992-04" 387 .22096456692913385
"1992-05" 388 .21601941747572814
"1992-06" 389 .1675025075225677
"1992-07" 390 .22176591375770022
"1992-09" 392 .15128968253968253
"1992-10" 393 .15841584158415842
"1992-11" 394 .1849112426035503
"1992-12" 395 .19642857142857142
"1993-01" 396 .22469252601702933
"1993-02" 397 .2796528447444552
"1993-03" 398 .290811339198436
"1993-04" 399 .24108910891089108
"1993-05" 400 .2562437562437562
"1993-06" 401 .22127872127872128
"1993-07" 402 .27874743326488705
"1993-09" 404 .3391472868217054
"1993-10" 405 .3840155945419103
"1993-11" 406 .45184824902723736
"1993-12" 407 .43987975951903807
end
format %tm time3
[/CODE]
The graph you've posted doesn't seem surprising.
Using the data and code you've posted
clear
input str7 time2 double(time3 share)
"1989-05" 352 .1536926147704591
"1989-06" 353 .1665024630541872
"1989-08" 355 .12674650698602793
"1989-09" 356 .18095712861415753
"2019-10" 717 .13052208835341367
"2019-11" 718 .13559059987631417
"2019-12" 719 .13997555012224938
end
format %tm time3
tsset time3
tsline share, ///
ttitle("years") ytitle("") ylabel(0(.2).65) ///
tlabel(1989m5(12)2019m12, format(%tmY) labsize(small))
it's hard to see what might be a problem.
tsline doesn't purport to draw a trend line, just a line graph for the data specified.
How to shift side-by-side bars in Proc SGPLOT for Time Data? discreteoffset doesn't work with (type=time) option
I am trying to create a side-by-side, dual-axis bar chart in PROC SGPLOT for date-based data. I am stuck on one last thing: I cannot shift the bars with the discreteoffset= option on vbar, because I am using type=time on the x-axis. If I comment that out, the bars are shifted, but the x-axis tick values look cluttered. Is there another option that can shift the bars for date/time data? Here is my SAS code:
data input;
  input people visits outcome date date9.;
  datalines;
41 448 210 1-Jan-18
43 499 207 1-Feb-18
45 544 221 1-Mar-18
49 564 239 1-Apr-18
39 575 236 1-May-18
37 549 210 1-Jun-18
51 602 263 1-Jul-18
32 586 208 1-Aug-18
52 557 225 1-Sep-18
41 534 227 1-Oct-18
48 499 217 1-Nov-18
44 514 235 1-Dec-18
31 582 281 1-Jan-19
33 545 269 1-Feb-19
38 574 259 1-Mar-19
29 564 247 1-Apr-19
29 642 274 1-May-19
28 556 216 1-Jun-19
20 531 187 1-Jul-19
31 604 226 1-Aug-19
19 513 186 1-Sep-19
24 483 185 1-Oct-19
28 401 156 1-Nov-19
18 450 158 1-Dec-19
21 418 178 1-Jan-20
28 396 149 1-Feb-20
43 488 177 1-Mar-20
33 539 205 1-Apr-20
57 631 244 1-May-20
54 695 291 1-Jun-20
58 732 309 1-Jul-20
62 681 301 1-Aug-20
42 654 291 1-Sep-20
57 749 365 1-Oct-20
60 627 249 1-Nov-20
56 623 244 1-Dec-20
54 712 298 1-Jan-21
62 655 262 1-Feb-21
;
run;
proc sgplot data=input;
  format date monyy7.;
  styleattrs datacolors=(Red DarkBlue) datacontrastcolors=(black black) datalinepatterns=(solid);
  vbar date / response=visits discreteoffset=-0.17 barwidth=0.3;
  vbar date / response=outcome discreteoffset=0.17 barwidth=0.3;
  vline date / response=people y2axis lineattrs=(color=black thickness=3);
  xaxis display=(nolabel) /*fitpolicy=rotate valuesrotate=vertical*/ type=time /*interval=month*/;
  yaxis grid label='Label1' values=(0 to 800 by 100);
  y2axis label='Label2' values=(0 to 70 by 10);
  keylegend / title="";
run;
Output I am getting:
Output I want (with shifted bars, but the dates change):
Appreciate any help! Thank you.
Reshape the data with PROC TRANSPOSE so that the variables you want side by side become categorical, i.e. name/value pairs. The name can then be used in vbar as group= with groupdisplay=cluster.
Note: xaxis type=time appears to perform special checks based on the format of the vbar variable, and renders a nice two-line axis label when that format is date9. I've never seen this discussed in the documentation.
Example (it uses name= in the plotting statements so that the keylegend looks cleaner):
proc transpose data=input out=plot;
  by date;  /* input has no rowid variable; it is already sorted by date */
  copy people;
  var visits outcome;
run;
proc sgplot data=plot;
  vbar date / response=col1 group=_name_ groupdisplay=cluster name='relatedcounts';
  vline date / response=people group=_name_ y2axis lineattrs=(color=black thickness=3) name='people';
  xaxis type=time interval=month;
  format date date9.;
  yaxis grid label='Related counts' values=(0 to 800 by 100);
  y2axis label='# People' values=(0 to 70 by 10);
  keylegend 'relatedcounts' / title="";
run;
This produces the visits and outcome bars clustered side by side for each month, with people drawn as a line on the secondary axis.
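The wide-to-long reshape described in this answer can be sketched outside SAS to show the data layout it produces. The C++ below illustrates only the layout, not PROC TRANSPOSE itself; the struct and function names are hypothetical. Each wide row holding visits and outcome becomes two name/value rows, which is what lets group=_name_ with groupdisplay=cluster place the bars side by side. For simplicity people is repeated on both rows, which may differ from what the COPY statement actually does.
```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of one row of the wide `input` data set.
struct WideRow {
    std::string date;
    int people, visits, outcome;
};

// One row of the long (name/value) layout, analogous to _NAME_ and COL1.
struct LongRow {
    std::string date;
    std::string name;   // which original variable the value came from
    int value;
    int people;         // simply repeated on both rows in this sketch
};

// Wide-to-long reshape: each input row yields one row per transposed variable.
std::vector<LongRow> to_long(const std::vector<WideRow>& in) {
    std::vector<LongRow> out;
    out.reserve(in.size() * 2);
    for (const WideRow& r : in) {
        assert(!r.date.empty());
        out.push_back({r.date, "visits", r.visits, r.people});
        out.push_back({r.date, "outcome", r.outcome, r.people});
    }
    return out;
}
```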
How to determine the number of filled drums, and the room left in each drum
Not quite a homework problem, but it may as well be:
You have a long list of positive integer values stored in column A. These are packets in unit U. A drum can fit up to 500 U, but you cannot break up packets. How many drums are required for any given list of values in column A? This does not have to be the most efficient answer; processing in row order is absolutely fine.
I think you should be able to solve this with a formula, but the closest I got was =CEILING(SUM(A1:A1000)/500;1). Of course, this breaks up packets. Additionally, this problem requires me to find the room left in each drum used, but the emphasis of this question should remain on just the number of drums required.
This cannot be done with a single simple formula; each drum and packet needs to be counted. However, contrary to my comment, a spreadsheet works well for this particular problem, and there is no need for a macro.
First, set B2 to 500 for use in the other formulas. If column A is not yet filled, use the formula =RANDBETWEEN(1,B$2) to add some values.
Column C holds the main formula, which determines how full each drum is. Set C2 to =A2. C3 is =IF(C2+A3>B$2,A3,C2+A3). Fill C3 down to the remaining rows.
For column D, set D2 to =IF(C2+A3>B$2,B$2-C2,"") and fill it down. The last row of column D is shorter: =B$2-C21, where 21 should be changed to whatever the last row is.
Finally, column E holds the answer, which is simply =COUNT(D2:D21), again with 21 changed to the last row (row 23 in the example below). COUNT ignores the empty strings, so it counts exactly one entry per drum.
Packets  Drum Size  How Full  Room left in each drum used  Number of filled drums
-------  ---------  --------  ---------------------------  ----------------------
206      500        206       294                          13
309                 309
68                  377
84                  461       39
305                 305       195
387                 387       113
118                 118
8                   126
374                 500       0
479                 479       21
492                 492       8
120                 120
291                 411       89
262                 262
108                 370       130
440                 440       60
88                  88
100                 188
102                 290
210                 500       0
478                 478       22
87                  87        413
For OpenOffice Calc, use semicolons ; instead of commas , in formulas.
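The spreadsheet logic in columns C and D is the classic next-fit bin-packing heuristic: keep filling the current drum in row order and start a new drum when the next packet would overflow it. Below is a minimal C++ sketch of the same procedure. It assumes every packet is at most one drum in size, and the names are hypothetical.
```cpp
#include <cassert>
#include <vector>

struct DrumResult {
    int drums;                   // number of drums used
    std::vector<int> room_left;  // free space in each drum, in fill order
};

// Next-fit packing in row order, mirroring columns C and D of the sheet:
// a packet goes into the current drum if it fits, otherwise a new drum starts.
DrumResult pack_next_fit(const std::vector<int>& packets, int capacity) {
    assert(capacity > 0);
    DrumResult r{0, {}};
    int fill = 0;  // how full the current drum is (column C)
    for (int p : packets) {
        assert(p > 0 && p <= capacity);  // a larger packet could never fit
        if (r.drums == 0 || fill + p > capacity) {
            if (r.drums > 0) r.room_left.push_back(capacity - fill);  // column D
            ++r.drums;
            fill = p;
        } else {
            fill += p;
        }
    }
    if (r.drums > 0) r.room_left.push_back(capacity - fill);  // last drum
    return r;
}
```
Run on the 22 packets from the table with a capacity of 500, it reports 13 drums, the same count as column E.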
Verifying output to "Find the numbers between 1 and 1000 whose prime factors' sum is itself a prime" from Allain's Jumping into C++ (ch7 #3)
The question: Design a program that finds all numbers from 1 to 1000 whose prime factors, when added together, sum up to a prime number (for example, 12 has prime factors of 2, 2, and 3, which sum to 7, which is prime). Implement the code for that algorithm. I modified the problem to only sum unique factors, because I don't see why you'd count a factor twice, as in his example using 12. My solution. Is there any good (read: automated) way to verify the output of my program? Sample output for 1 to 1000: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 17 19 20 22 23 24 25 26 28 29 30 31 34 37 40 41 43 44 46 47 48 49 52 53 58 59 60 61 63 67 68 70 71 73 76 79 80 82 83 88 89 92 94 96 97 99 101 103 107 109 113 116 117 118 120 121 124 127 131 136 137 139 140 142 147 148 149 151 153 157 160 163 164 167 169 171 172 173 176 179 181 184 188 189 191 192 193 197 198 199 202 207 210 211 212 214 223 227 229 232 233 239 240 241 244 251 252 257 261 263 268 269 271 272 273 274 275 277 279 280 281 283 286 289 292 293 294 297 298 306 307 311 313 317 320 325 331 332 333 334 337 347 349 351 352 353 358 359 361 367 368 369 373 376 379 382 383 384 388 389 394 396 397 399 401 404 409 412 414 419 421 423 424 425 428 431 433 439 443 449 454 457 459 461 462 463 464 467 468 472 475 478 479 480 487 491 495 499 503 509 513 521 522 523 524 529 531 538 539 541 544 546 547 548 549 550 557 560 561 562 563 567 569 571 572 575 577 587 588 593 594 599 601 603 604 605 607 612 613 617 619 621 622 628 631 639 640 641 643 646 647 651 652 653 659 661 664 668 673 677 683 684 691 692 694 701 704 709 712 714 718 719 725 726 727 733 736 738 739 741 743 751 752 756 757 759 761 764 765 768 769 772 773 775 777 783 787 792 797 798 801 809 811 821 823 825 827 828 829 833 837 838 839 841 846 847 848 850 853 856 857 859 862 863 873 877 881 883 887 891 892 903 904 907 908 909 911 918 919 922 925 928 929 932 937 941 944 947 953 954 957 960 961 966 967 971 975 977 981 983 991 997 999 Update: I have solved my problem and verified the 
output of my program against a sequence given in the OEIS, as suggested by @MVW (shown in the source of my new GitHub solution). In the future, I will aim to test my programs by doing zero or more of the following, depending on the scope and importance of the problem:
- googling keywords for an existing solution to the problem and, if I find one, comparing it against my solution;
- unit testing components for correctness as they're built and integrated, comparing these tests with known correct outputs.
Some suggestions:
You need to check the properties of your calculated numbers. Here that means calculating the prime factors, calculating their sum, and testing whether that sum is a prime number. Which is what your program should do in the first place, by the way.
So one nice option for checking is to compare your output with a known solution, or with the output of another program that is known to work. The tricky bit is having such a solution or program available. And I'm ignoring the possibility that your comparison itself could be plagued by errors as well :-)
If you just compare it with other implementations, e.g. programs from other folks here, it would turn out to be more of a vote; it would not be a proof. It would just increase the probability that your program is correct if several independent implementations come up with the same result. Of course, all implementations could err :-) The more that agree, the better. And the more diverse the implementations are, the better: e.g. you could use different programming languages, computer algebra systems, or a friend with time, paper, pencil and Wikipedia. :-)
Another approach is to add checks to your intermediate steps to get more confidence in your result, kind of building a chain of trust. You could output the prime factors you determined and compare them with the output of a prime factorization program that is known to work. Then you check whether your summing works. Finally, you could check whether the primality test you apply to the candidate sums works correctly by feeding it known prime and non-prime numbers, and so on. That is roughly what folks do with unit testing, for example: trying to cover most parts of the code as working, hoping that if the parts work, the whole will work.
Or you could formally prove your program step by step, using Hoare calculus, for example, or another formal method. But that is tricky, and you might end up shifting program errors into errors in the proof.
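The chain-of-trust idea can be sketched in C++, the language of the book in question: factorize with multiplicity, then check that every factor is prime and that the factors multiply back to the original number. This is an illustrative self-check that assumes plain trial division is fast enough for 1..1000. The function names are hypothetical.
```cpp
#include <cassert>
#include <vector>

// Trial-division primality test; adequate for small n.
bool is_prime(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Prime factors of n with multiplicity, e.g. 12 -> {2, 2, 3}.
std::vector<int> prime_factors(int n) {
    assert(n >= 1);
    std::vector<int> f;
    for (int d = 2; d * d <= n; ++d)
        while (n % d == 0) { f.push_back(d); n /= d; }
    if (n > 1) f.push_back(n);  // whatever remains is a prime factor
    return f;
}

// Intermediate-step check: all factors prime, and their product equals n.
bool factorization_is_consistent(int n) {
    long long product = 1;
    for (int p : prime_factors(n)) {
        if (!is_prime(p)) return false;
        product *= p;
    }
    return product == n;
}
```
The primality test itself can be checked the same way, by feeding it a few numbers whose status is known (2, 7, 997 prime; 1, 9, 1000 not).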
And today, in the era of the internet, you could of course search online for the solution: try searching for "sum of prime factors is prime" in the On-Line Encyclopedia of Integer Sequences (OEIS), which should give you sequence A100118. :-) It is the version of the problem with multiplicity, but it shows you what the number theory pros do: Mathematica and program fragments to calculate the sequence, the argument for the case of 1, and literature references. Quite impressive.
Here's the answer I get. I exclude 1, as it has no prime divisors, so their sum is 0, which is not prime.
Haskell> filter (isPrime . sum . map fst . primePowers) [2..1000]
[2,3,4,5,6,7,8,9,10,11,12,13,16,17,18,19,20,22,23,24,25,27,29,31,32,34,36,37,40,
41,43,44,47,48,49,50,53,54,58,59,61,64,67,68,71,72,73,79,80,81,82,83,88,89,96,97
,100,101,103,107,108,109,113,116,118,121,125,127,128,131,136,137,139,142,144,149
,151,157,160,162,163,164,165,167,169,173,176,179,181,191,192,193,197,199,200,202
,210,211,214,216,223,227,229,232,233,236,239,241,242,243,250,251,256,257,263,269
,271,272,273,274,277,281,283,284,288,289,293,298,307,311,313,317,320,324,328,331
,337,343,345,347,349,352,353,358,359,361,367,373,379,382,383,384,385,389,390,394
,397,399,400,401,404,409,419,420,421,428,431,432,433,435,439,443,449,454,457,461
,462,463,464,467,472,478,479,484,486,487,491,495,499,500,503,509,512,521,523,529
,538,541,544,547,548,557,561,562,563,568,569,570,571,576,577,578,587,593,595,596
,599,601,607,613,617,619,622,625,630,631,640,641,643,647,648,651,653,656,659,661
,665,673,677,683,691,694,701,704,709,714,715,716,719,727,729,733,739,743,751,757
,759,761,764,768,769,773,777,780,787,788,795,797,798,800,808,809,811,819,821,823
,825,827,829,838,839,840,841,853,856,857,858,859,862,863,864,877,881,883,885,887
,903,907,908,911,919,922,924,928,929,930,937,941,944,947,953,956,957,961,967,968
,971,972,977,983,991,997,1000]
Haskell> primePowers 12
[(2,2),(3,1)]
Haskell> primePowers 14
[(2,1),(7,1)]
You could hard-code this list and test against it. I'm pretty confident these results are error-free. (Read . as "of".)
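For comparison, here is a C++ sketch of the same filter as the Haskell one-liner. It sums distinct prime factors only, as both the asker and this answer do, and the function names are hypothetical. Its output for 2..100 agrees with the start of the list above, so it could regenerate the reference list for a test instead of hard-coding it.
```cpp
#include <cassert>
#include <vector>

// Trial-division primality test; adequate for small n.
bool is_prime(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Sum of the distinct prime factors of n (12 -> 2 + 3 = 5),
// like `sum . map fst . primePowers`. Returns 0 for n = 1.
int distinct_prime_factor_sum(int n) {
    int sum = 0;
    for (int d = 2; d * d <= n; ++d) {
        if (n % d == 0) {
            sum += d;
            while (n % d == 0) n /= d;  // skip repeated copies of d
        }
    }
    if (n > 1) sum += n;  // remaining factor is prime
    return sum;
}

// All n in [lo, hi] whose distinct prime factors sum to a prime.
std::vector<int> numbers_with_prime_factor_sum(int lo, int hi) {
    std::vector<int> out;
    for (int n = lo; n <= hi; ++n)
        if (is_prime(distinct_prime_factor_sum(n))) out.push_back(n);
    return out;
}
```
Because the sum for 1 is 0, which is not prime, 1 is excluded automatically, matching the reasoning above.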
SQL Server 2008, numeric library, c++, LAPACK, memory question
I am trying to send a table of numbers in SQL Server 2008 like this:
1att 2att 3att 4att 5att 6att 7att ... attn
--------------------------------------------
565 526 472 527 483 529 476 470 502
497 491 483 488 488 483 496 515 491
467 516 480 477 494 497 478 519 471
488 466 547 498 477 466 475 480 516
543 491 449 485 495 468 452 479 516
473 475 431 474 460 342 471 386 549
489 477 462 428 489 491 481 483 475
485 474 472 452 525 508 459 561 529
473 457 476 498 485 465 540 475 525
455 477 415 434 475 499 476 482 551
463 476 476 471 488 526 394 439 475
479 473 491 519 483 474 476 474 478
455 518 465 445 496 500 518 470 536
557 498 492 449 478 491 492 476 460
484 509 538 473 548 497 551 477 498
471 430 482 437 516 483 487 453 456
505 476 489 495 472 476 487 516 466
466 495 488 475 550 565 510 473 515
470 490 480 475 479 544 468 486 496
484 495 524 435 469 612 493 467 477
....
(several more rows)
....
511 471 529 553 539 501 477 474 494
via Visual Studio 2008 (in a C++ project) to the LAPACK math library.
Is it possible to pass the table in SQL Server to LAPACK (via C++ in Visual Studio 2008) as a memory pointer, or to store the whole table in RAM and have LAPACK read that memory (or a pointer to it), without writing the table to a file and reading it back?
Could you please suggest how to pass a table like this (maybe the location of the table in memory, or something similar) to LAPACK, so that I can do some computations with LAPACK on the table stored in SQL Server from a Visual Studio 2008 C++ project?
----EDIT----
@MarkD, as you said in your answer, could you please give an example of computing an SVD with the idea in the example, using the std::vector class?
LAPACK requires the data sent to it to be in a FORTRAN-style (column-major) array. You won't be able to pass the data directly from SQL to LAPACK; you will need to read the data into a contiguous, column-ordered array in memory and pass a pointer to the first element of that array to the LAPACK routine of interest. There are many LAPACK wrappers for C/C++ that make this much easier.
Edit: I just saw you are looking specifically for how to pass such an array. As I mentioned, there are many wrappers for doing this (just search for C/C++ LAPACK). An easy way to create your array is to use the std::vector class. You would then read the data in column by column, adding the elements to your vector. So if you wanted to column-order the array shown in your example, your vector would end up looking something like:
//Column 1           Column 2            Column 3               last element
[565 497 467 488 ... 526 491 516 466 ... 472 483 480 547 ... ... 494]
You would then pass the LAPACK routine of interest the memory location of the first element, e.g.:
&myVector[0]
This is possible with std::vector because the standard guarantees that a vector uses contiguous storage. The LAPACK routines also all require the sizes/dimensions of the matrices/vectors you pass to them, so you'll need to calculate or specify these values for the function call. If you can post the specific LAPACK routine you want to use, I can give a more thorough example.
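Below is a minimal sketch of the column-major packing. It assumes the rows have already been fetched from SQL Server into memory; the database access is not shown, and to_column_major is a hypothetical helper. Element (i, j) ends up at index i + j*nrows, so nrows serves as the leading dimension (LDA). A routine such as DGESVD takes M, N, the array pointer and LDA among its arguments. The actual call, and how the Fortran symbol is declared, depend on the LAPACK build or wrapper, so they are left out here.
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Packs row-major data (one inner vector per fetched table row) into one
// contiguous column-major buffer, the layout LAPACK routines expect.
// Element (i, j) is stored at index i + j * nrows, so LDA = nrows.
std::vector<double> to_column_major(const std::vector<std::vector<double> >& rows) {
    const std::size_t nrows = rows.size();
    const std::size_t ncols = nrows ? rows[0].size() : 0;
    std::vector<double> a(nrows * ncols);
    for (std::size_t i = 0; i < nrows; ++i) {
        assert(rows[i].size() == ncols);  // every row must have the same width
        for (std::size_t j = 0; j < ncols; ++j)
            a[i + j * nrows] = rows[i][j];
    }
    return a;
}
```
With the result a, &a[0] is the pointer to hand over, together with M = nrows, N = ncols and LDA = nrows.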
How to draw a (bezier) path with a fill color using GDI+?
I am making an SVG renderer for Windows using the Windows API and GDI+. SVG allows setting the 'fill' and 'stroke' style attributes on a path. I am having some difficulty with the implementation of the 'fill' attribute. The following path represents a spiral:
<svg:path style="fill:yellow;stroke:blue;stroke-width:2" d="M153 334 C153 334 151 334 151 334 C151 339 153 344 156 344 C164 344 171 339 171 334 C171 322 164 314 156 314 C142 314 131 322 131 334 C131 350 142 364 156 364 C175 364 191 350 191 334 C191 311 175 294 156 294 C131 294 111 311 111 334 C111 361 131 384 156 384 C186 384 211 361 211 334 C211 300 186 274 156 274" />
The fill color is yellow, and it should fill the entire shape; however, this is what I get:
My GDI+ calls look like this:
Gdiplus::GraphicsPath bezierPath;
bezierPath.AddBeziers(&gdiplusPoints[0], gdiplusPoints.size());
g.FillPath(&solidBrush, &bezierPath);
g.DrawPath(&pen, &bezierPath);
Apparently the code is correct for drawing the shape, but not for filling it. Can anyone help me figure out what's going wrong?
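Try setting the fill mode of your GraphicsPath to winding: in native GDI+ that is bezierPath.SetFillMode(Gdiplus::FillModeWinding), or pass Gdiplus::FillModeWinding to the GraphicsPath constructor (FillMode::Winding if you are using the .NET classes). The default, FillModeAlternate, uses the even-odd rule, which can leave overlapping parts of a path unfilled. The winding (nonzero) rule is also SVG's default fill-rule, so it should suit your needs.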
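To see why the fill mode matters, here is a small C++ sketch of the two rules for straight-edged polygons. It is not GDI+ code: the Bezier curves would first need to be flattened into line segments, and all names are hypothetical. Under the even-odd rule a region that the outline encloses twice counts as outside, while under the nonzero winding rule it counts as inside; a square traced twice shows the difference.
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };

// Even-odd ("alternate") rule: cast a ray from p towards +x and flip the
// inside flag at every edge it crosses. The point list is treated as closed.
bool inside_alternate(const std::vector<Pt>& poly, Pt p) {
    assert(!poly.empty());
    bool inside = false;
    for (std::size_t i = 0, j = poly.size() - 1; i < poly.size(); j = i++) {
        const Pt& a = poly[j];
        const Pt& b = poly[i];
        if ((a.y > p.y) != (b.y > p.y)) {
            double x_cross = a.x + (p.y - a.y) * (b.x - a.x) / (b.y - a.y);
            if (p.x < x_cross) inside = !inside;
        }
    }
    return inside;
}

// Nonzero ("winding") rule: add +1 for each upward edge with p on its left,
// -1 for each downward edge with p on its right.
int winding_number(const std::vector<Pt>& poly, Pt p) {
    assert(!poly.empty());
    int wn = 0;
    for (std::size_t i = 0, j = poly.size() - 1; i < poly.size(); j = i++) {
        const Pt& a = poly[j];
        const Pt& b = poly[i];
        double side = (b.x - a.x) * (p.y - a.y) - (p.x - a.x) * (b.y - a.y);
        if (a.y <= p.y) {
            if (b.y > p.y && side > 0) ++wn;
        } else {
            if (b.y <= p.y && side < 0) --wn;
        }
    }
    return wn;
}

bool inside_winding(const std::vector<Pt>& poly, Pt p) {
    return winding_number(poly, p) != 0;
}
```
For the centre of a square traced twice, the winding number is 2, so the winding rule fills it, while the even-odd rule sees two crossings and leaves it empty.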