How do you resample daily with a conditional statement in pandas - python-2.7

I have the pandas data frame below (it does have other columns, but these are the important ones); the Date column is the index:
Number_QA_VeryGood Number_Valid_Cells Time
Date
2015-01-01 91 92 18:55
2015-01-02 6 6 18:00
2015-01-02 13 13 19:40
2015-01-03 106 106 18:45
2015-01-05 68 68 18:30
2015-01-06 111 117 19:15
2015-01-07 89 97 18:20
2015-01-08 86 96 19:00
2015-01-10 9 16 18:50
I need to resample daily: the first two columns should be resampled with sum.
The Time column needs to take the value from the row with the highest daily Number_Valid_Cells.
Example output should be (2015-01-02 is the line which changed):
Number_QA_VeryGood Number_Valid_Cells Time
Date
2015-01-01 91 92 18:55
2015-01-02 19 19 19:40
2015-01-03 106 106 18:45
2015-01-05 68 68 18:30
2015-01-06 111 117 19:15
2015-01-07 89 97 18:20
2015-01-08 86 96 19:00
2015-01-10 9 16 18:50
What is the best way to get this to work?

Or you can try
df.groupby(df.index).agg({'Number_QA_VeryGood':'sum','Number_Valid_Cells':'sum','Time':'last'})
Out[276]:
Time Number_QA_VeryGood Number_Valid_Cells
Date
2015-01-01 18:55 91 92
2015-01-02 19:40 19 19
2015-01-03 18:45 106 106
2015-01-05 18:30 68 68
2015-01-06 19:15 111 117
2015-01-07 18:20 89 97
2015-01-08 19:00 86 96
2015-01-10 18:50 9 16
Update: call sort_values first, so that 'last' picks the time from the row with the daily maximum:
df.sort_values('Number_Valid_Cells').groupby(df.sort_values('Number_Valid_Cells').index)\
.agg({'Number_QA_VeryGood':'sum','Number_Valid_Cells':'sum','Time':'last'})
Out[314]:
Time Number_QA_VeryGood Number_Valid_Cells
Date
1/1/2015 18:55 91 92
1/10/2015 18:50 9 16
1/2/2015 16:40 # changed here 19 19
1/3/2015 18:45 106 106
1/5/2015 18:30 68 68
1/6/2015 19:15 111 117
1/7/2015 18:20 89 97
1/8/2015 19:00 86 96
Data input :
Number_QA_VeryGood Number_Valid_Cells Time
Date
1/1/2015 91 92 18:55
1/2/2015 6 6 18:00
1/2/2015 13 13 16:40 # I changed here
1/3/2015 106 106 18:45
1/5/2015 68 68 18:30
1/6/2015 111 117 19:15
1/7/2015 89 97 18:20
1/8/2015 86 96 19:00
1/10/2015 9 16 18:50

You can use groupby with sum for the first two columns; if the values of Number_Valid_Cells are already sorted, then:
ndf = df.reset_index().groupby('Date').sum()
ndf['Time'] = df.reset_index().drop_duplicates(subset='Date',keep='last').set_index('Date')['Time']
Number_QA_VeryGood Number_Valid_Cells Time
Date
2015-01-01 91 92 18:55
2015-01-02 19 19 19:40
2015-01-03 106 106 18:45
2015-01-05 68 68 18:30
2015-01-06 111 117 19:15
2015-01-07 89 97 18:20
2015-01-08 86 96 19:00
2015-01-10 9 16 18:50
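If the rows are not guaranteed to be pre-sorted, a sketch of an order-independent variant (assuming a reasonably recent pandas) that uses idxmax to pick the Time from the row holding each day's maximum Number_Valid_Cells:

```python
import pandas as pd

df = pd.DataFrame(
    {"Number_QA_VeryGood": [91, 6, 13, 106],
     "Number_Valid_Cells": [92, 6, 13, 106],
     "Time": ["18:55", "18:00", "19:40", "18:45"]},
    index=pd.to_datetime(["2015-01-01", "2015-01-02",
                          "2015-01-02", "2015-01-03"]),
)
df.index.name = "Date"

tmp = df.reset_index()  # unique RangeIndex, so idxmax returns row positions
out = tmp.groupby("Date")[["Number_QA_VeryGood", "Number_Valid_Cells"]].sum()
best = tmp.groupby("Date")["Number_Valid_Cells"].idxmax()  # row of daily max
out["Time"] = tmp.loc[best].set_index("Date")["Time"]
```

This avoids relying on input order: for 2015-01-02 it sums to 19/19 and keeps 19:40, the time of the larger of the two readings.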

Related

Is there a way to make a dummy variable in SAS for a Country in my SAS Data Set?

I am looking to create a dummy variable for Latin American countries in my data set, which I need for a log-log model. I know how to log all of the variables for my later regression. Any suggestion or help on how to make a dummy variable for the Latin American countries with my data would be appreciated.
data HW6;
input country : $25. midyear sancts lprots lfrac ineql pop;
cards;
CHILE 1955 58 44 65 57 6.743
CHILE 1960 19 34 65 57 7.785
CHILE 1965 27 24 65 57 8.510
CHILE 1970 36 29 65 57 9.369
CHILE 1975 38 58 65 57 10.214
COSTA_RICA 1955 16 7 54 60 1.024
COSTA_RICA 1960 6 1 54 60 1.236
COSTA_RICA 1965 2 1 54 60 1.482
COSTA_RICA 1970 3 1 54 60 1.732
COSTA_RICA 1975 2 3 54 60 1.965
INDIA 1955 81 134 47 52 404.478
INDIA 1960 101 190 47 52 445.857
INDIA 1965 189 845 47 52 494.882
INDIA 1970 133 915 47 52 553.619
INDIA 1975 132 127 47 52 616.551
JAMICA 1955 11 12 47 62 1.542
JAMICA 1960 9 2 47 62 1.629
JAMICA 1965 8 6 47 62 1.749
JAMICA 1970 1 1 47 62 1.877
JAMICA 1975 7 1 47 62 2.043
PHILIPPINES 1955 26 123 48 56 24.0
PHILIPPINES 1960 20 38 48 56 27.898
PHILIPPINES 1965 9 5 48 56 32.415
PHILIPPINES 1970 79 25 48 56 37.540
SRI_LANKA 1955 29 2 73 52 8.679
SRI_LANKA 1960 75 35 73 52 9.879
SRI_LANKA 1965 25 63 73 52 11.202
SRI_LANKA 1970 34 14 73 52 12.532
TURKEY 1955 79 1 67 61 24.145
TURKEY 1960 138 19 67 61 28.217
TURKEY 1965 36 51 67 61 31.951
TURKEY 1970 51 22 67 61 35.743
URUGUAY 1955 8 4 57 48 2.372
URUGUAY 1960 12 1 57 48 2.538
URUGUAY 1965 16 14 57 48 2.693
URUGUAY 1970 21 19 57 48 2.808
URUGUAY 1975 24 45 57 48 2.829
VENEZUELA 1955 38 14 76 65 6.110
VENEZUELA 1960 209 23 76 65 7.632
VENEZUELA 1965 100 162 76 65 9.119
VENEZUELA 1970 9 27 76 65 10.709
VENEZUELA 1975 4 12 76 65 12.722
;
data newData;
set HW6;
sancts = log (sancts);
lprots = log (lprots);
lfrac = log (lfrac);
ineql = log (ineql);
pop = log (pop);
run;
The GLMSELECT procedure is one simple way of creating dummy variables.
There is a nice article about how to use it to generate dummy variables.
data newData;
set HW6;
sancts = log (sancts);
lprots = log (lprots);
lfrac = log (lfrac);
ineql = log (ineql);
pop = log (pop);
Y = 0; *-- Create a fake response variable --*
run;
proc glmselect data=newData noprint outdesign(addinputvars)=want(drop=Y);
class country;
model Y = country / noint selection=none;
run;
If needed in a further step, use the macro variable &_GLSMOD created by the procedure, which contains the names of the dummy variables.
The real question here is not related to SAS; it is about how to get the region of a country from its name.
I would give ISO 3166 a try, which lists all countries and their geographical location.
Getting that list is straightforward; then import it into SAS, merge by country, and finally flag the countries in Latin America.
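For comparison only (the question itself is about SAS): once you have decided which countries count as Latin American, the dummy is a simple membership test. A Python/pandas sketch, where the membership set is an illustrative assumption, not an authoritative region list:

```python
import pandas as pd

# Assumed subset of Latin American countries from the data set; build the
# real set from an ISO 3166 region list as suggested above.
latin_america = {"CHILE", "COSTA_RICA", "URUGUAY", "VENEZUELA"}

df = pd.DataFrame({"country": ["CHILE", "INDIA", "TURKEY", "VENEZUELA"]})
df["latam"] = df["country"].isin(latin_america).astype(int)  # 0/1 dummy
```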

SAS Restructure Data

I need help restructuring the data. My Table looks like this
NameHead Department Per_test Per_Delta Per_DB Per_Vul
Nancy Health 55 33.2 33 63
Jim Air 25 22.8 23 11
Shu Water 26 88.3 44 12
Dick Electricity 77 55.9 66 10
Elena General 88 22 67 9
Nancy Internet 66 12 44 79
And I want my table to look like this
NameHead Nancy Jim Shu Dick Elena Nancy
Department Health Air Water Electricity General Internet
Per_test 55 25 26 77 88 66
Per_Delta 33.2 22.8 88.3 55.9 22 12
Per_DB 33 23 44 66 67 44
Per_Vul 63 11 12 10 9 79
I tried proc transpose but couldn't get the desired result. Please help!
Thanks!
PROC TRANSPOSE does exactly what you want. You must include a VAR statement if you want to include the character variables.
proc transpose data=have out=want;
var _all_;
run;
Note that you cannot have variables that do not have names, so the transposed columns are named COL1-COL6. Here is what the output dataset looks like:
Obs _NAME_ COL1 COL2 COL3 COL4 COL5 COL6
1 NameHead Nancy Jim Shu Dick Elena Nancy
2 Department Health Air Water Electricity General Internet
3 Per_test 55 25 26 77 88 66
4 Per_Delta 33.2 22.8 88.3 55.9 22 12
5 Per_DB 33 23 44 66 67 44
6 Per_Vul 63 11 12 10 9 79
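For readers more familiar with pandas, the same whole-table transpose is a one-liner; this sketch (frame contents abbreviated) mirrors what PROC TRANSPOSE with var _all_ does:

```python
import pandas as pd

df = pd.DataFrame({"NameHead": ["Nancy", "Jim", "Shu"],
                   "Department": ["Health", "Air", "Water"],
                   "Per_test": [55, 25, 26]})
wide = df.T  # variable names become the row labels, one column per record
```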

How to parse & read from .txt file (NOAA Weather)

I am trying to parse and read from a .txt file, from the NOAA Weather site. How can I search the file to find certain text and insert it into the database?
I'm trying to search for Greenville and pull the conditions and temp. Then push that into my database, along with other cities? Any code that you are willing to share would be appreciated.
Code:
<cffile action="read" file="#expandPath("./NWS Raleigh Durham.txt")#" variable="myFile">
<cfdump var="#myfile#">
Content:
National Weather Service Text Product Display
Skip Navigation Regional Weather Roundup
Issued by NWS Raleigh/Durham, NC
Home | Current Version | Previous Version | Graphics & Text | Print | Product List | Glossary
On Versions: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
000
ASUS42 KRAH 081410
RWRRAH
NORTH CAROLINA WEATHER ROUNDUP
NATIONAL WEATHER SERVICE RALEIGH NC
900 AM EST THU MAR 08 2018
NOTE: "FAIR" INDICATES FEW OR NO CLOUDS BELOW 12,000 FEET WITH NO SIGNIFICANT WEATHER AND/OR OBSTRUCTIONS TO VISIBILITY.
NCZ001-053-055-056-065-067-081500-
WESTERN NORTH CAROLINA
CITY           SKY/WX   TMP DP RH WIND    PRES   REMARKS
ASHEVILLE      FAIR     35  23 61 VRB3    29.92F
JEFFERSON      FLURRIES 26  16 65 W9      29.83F WCI 16
MORGANTON      FAIR     37  25 64 NW3     29.97F
HICKORY        CLOUDY   35  24 64 SW5     29.94F WCI 31
RUTHERFORDTON  CLOUDY   37  27 67 W6      29.97S WCI 33
MOUNT AIRY     FAIR     37  21 53 NW8     29.94F WCI 31
BOONE          PTSUNNY  27  16 63 NW13G18 29.85F WCI 16
$$
NCZ021-022-025-041-071-084-088-081500-
CENTRAL NORTH CAROLINA
CITY           SKY/WX   TMP DP RH WIND    PRES   REMARKS
CHARLOTTE      CLOUDY   38  27 64 W5      29.97F WCI 34
GREENSBORO     PTSUNNY  38  24 57 W8      29.93S WCI 32
WINSTON-SALEM  FAIR     38  20 48 W8      29.94F WCI 32
RALEIGH-DURHAM PTSUNNY  36  26 67 CALM    29.96R
FORT BRAGG     CLOUDY   39  23 52 NW5     29.97R
FAYETTEVILLE   CLOUDY   38  28 67 W6      29.98R WCI 33
BURLINGTON     CLOUDY   39  25 57 SW5     29.94S
LAURINBURG     CLOUDY   38  28 67 NW8     29.99R WCI 32
$$
NCZ011-015-027-028-043-044-047-080-103-081500-
NORTHEASTERN NORTH CAROLINA
CITY           SKY/WX   TMP DP RH WIND    PRES   REMARKS
ROCKY MT-WILSO PTSUNNY  40  24 53 NW6     29.96R
GREENVILLE     FAIR     41  23 48 N6      29.97S
WASHINGTON     FAIR     41  25 51 NW9     29.94F
ELIZABETH CITY PTSUNNY  40  27 59 NW7     29.92S
MANTEO         CLOUDY   42  28 58 N9      29.91S
CAPE HATTERAS  FAIR     45  33 63 N5      29.90S
$$
NCZ078-087-090-091-093-098-101-081500-
SOUTHEASTERN NORTH CAROLINA
CITY           SKY/WX   TMP DP RH WIND    PRES   REMARKS
LUMBERTON      CLOUDY   40  29 64 NW8     29.99R WCI 34
GOLDSBORO      CLOUDY   39  25 57 NW5     29.94S
KINSTON        PTSUNNY  43  25 49 NW8     29.96S
KENANSVILLE    CLOUDY   39  27 60 NW7     29.96S WCI 34
NEW BERN       FAIR     41  27 57 NW8     29.95S
CHERRY POINT   NOT AVBL
BEAUFORT       FAIR     45  28 51 NW13    29.93S
JACKSONVILLE   CLOUDY   43  27 53 NW9     29.95S
WILMINGTON     FAIR     44  27 51 NW13    29.96S
$$
National Weather Service Raleigh, NC Weather Forecast Office
1005 Capability Drive, Suite 300, Centennial Campus, Raleigh, NC 27606-5226
(919) 326-1042
Page Author: RAH Webmaster
Web Master's E-mail: rah.webmaster@noaa.gov
Page last modified: Jan 10th, 2018 19:57 UTC
Disclaimer Credits Glossary Privacy Policy About Us Career Opportunities
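The CFML above only reads the file into a string; pulling one city's observation out of the fixed-format roundup is then a pattern-matching job. Here is a minimal sketch in Python (CFML's reFind/reMatch can apply the same regex); the pattern is an assumption based on the sample content and would need hardening for real use:

```python
import re

# Short excerpt of the roundup; in practice this is the whole file text.
text = ("ROCKY MT-WILSO PTSUNNY 40 24 53 NW6 29.96R "
        "GREENVILLE FAIR 41 23 48 N6 29.97S")

# After the city name: sky/weather word, then temperature and dew point.
m = re.search(r"GREENVILLE\s+([A-Z]+)\s+(\d+)\s+(\d+)", text)
conditions, temp, dewpoint = m.group(1), int(m.group(2)), int(m.group(3))
```

Looping over a list of city names and running the same search per city would give you one (city, conditions, temp) row per insert into the database.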

SAS Proc Ttest finding differences and where statement

DATA OZONE;
INPUT MONTH $ STMF YKRS ##;
CARDS;
A 80 66 A 68 82 A 24 47 A 24 28 A 82 44 A 100 55
A 55 34 A 91 60 A 87 70 A 64 41 A . 67 A . 127 A 170 96 A . 56
JN 215 93 JN 230 106 JN . 49 JN 69 64 JN 98 83 JN 125 97
JN 72 51 JN 125 75 JN 143 104 JN 192 107 JN . 56 JN 122 68
JN 32 20 JN 23 35 JN 71 30 JN 38 31 JN 136 81 JN 169 119
JL 152 76 JL 201 108 JL 134 85 JL 206 96 JL 92 48 JL 101 60
JL 133 . JL 83 50 JL . 27 JL 60 37 JL 124 47 JL 142 71
JL 75 49 JL 103 59 JL . 53 JL 46 25 JL 68 45 JL . 78
S 38 23 S 80 50 S 80 34 S 99 58 S 71 35 S 42 24 S 52 27 S 33 17
;
run;
Proc Ttest data=Ozone PLOT=NONE ALPHA=0.01;
Where MONTH='JN';
Paired STMF*YKRS;
Run;
Question 2
Data Baseball;
Input ba league & $16.;
Datalines;
276 National League
288 National League
281 National League
290 National League
303 National League
257 American League
254 American League
263 American League
261 American League
;
Run;
Proc Ttest data=Baseball ALPHA=0.02 ;
Question 3
Proc Ttest data=ozone ALPHA=0.01 Plot=NONE;
Where Month='A'-'S';
Paired STMF*YKRS;
Run;
Question 2 Test to see if both leagues have different batting averages. Use a alpha = 0.02 in your conclusions and compute a 98% confidence interval for the means.
Question 3 From the first question test to see the differences the A and S average ozone values. Use alpha = 0.01 in your conclusions. Include a 99% confidence interval for the difference.
So my questions are: for Question 2 (kind of a stupid question), for whatever reason I am confused as to what you are supposed to do.
For Question 3 (my main question), how do I use one proc ttest to check the differences between months A and S? I tried using a Where statement as you can see above, but of course that does not work, and I'm a bit stumped on where to go from here. Also, I omitted a good bit of the month data in the ozone portion, as I couldn't properly format all of the data without it looking extremely confusing.
Thanks for your help in advance!

Find edge of black and white image

I need to find the edge and generate points of a black and white image like the one below:
I am not sure how to go about doing this. I know OpenCV is an option, but that is way overkill for what is sure to be a simple task. Does anyone know any easy way to do this? Libraries are okay, as long as they aren't too heavyweight (header-only preferred).
I would use Canny Edge Detection, though you can easily experiment with the others that @therainmaker suggests. I would use ImageMagick, which is free, installed on most Linux distros, and also available for OS X and Windows.
At the command line, you would use this:
convert blob.png -canny 0x1+10%+30% result.png
or this:
convert blob.png -canny 0x1+10%+30% -negate result.png
To use with C++, you would use Magick++, which is described here. There is a reasonable tutorial here.
If you want a description of the theory and examples of usage, including Sobel etc, please look at Anthony Thyssen's excellent pages here.
Depending on what you are actually doing, you may be better served by a Morphological technique (shape detection) rather than an Edge Detection technique. If so, ImageMagick can do that for you also. For example:
convert blob.png -morphology EdgeIn Octagon edgein.png
That technique is described nicely here.
If you want the outline as a vector path, you can combine ImageMagick and potrace through an intermediate PBM file like this:
convert blob.png -canny 0x1+10%+30% -negate pbm:- | potrace -s -o result.svg
That will give you a nice smooth vector path like this:
<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20010904//EN"
"http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
<svg version="1.0" xmlns="http://www.w3.org/2000/svg"
width="745.000000pt" height="1053.000000pt" viewBox="0 0 745.000000 1053.000000"
preserveAspectRatio="xMidYMid meet">
<metadata>
Created by potrace 1.12, written by Peter Selinger 2001-2015
</metadata>
<g transform="translate(0.000000,1053.000000) scale(0.100000,-0.100000)"
fill="#000000" stroke="none">
<path d="M6145 8276 c-159 -39 -545 -231 -975 -485 -276 -163 -313 -179 -630
-267 -567 -157 -1108 -385 -1550 -652 -182 -111 -178 -107 -359 -289 -173
-174 -351 -387 -483 -579 -42 -61 -84 -116 -92 -123 -8 -7 -18 -25 -21 -41 -3
-16 -13 -34 -21 -41 -8 -7 -27 -33 -41 -58 -14 -25 -41 -68 -58 -96 -18 -27
-48 -81 -66 -120 -18 -38 -44 -83 -57 -100 -38 -46 -183 -353 -246 -516 -142
-373 -156 -550 -76 -979 76 -403 215 -867 299 -999 40 -62 121 -138 167 -157
58 -24 119 -32 179 -22 74 11 276 94 775 316 423 188 561 243 900 362 568 199
1059 434 1478 706 261 170 403 298 552 496 261 346 439 756 494 1138 38 261
72 696 81 1025 8 272 17 342 72 554 85 332 112 563 79 691 -49 188 -210 283
-401 236z m221 -27 c64 -30 115 -84 150 -155 28 -57 29 -64 28 -199 0 -165
-16 -262 -84 -531 -59 -229 -67 -295 -75 -569 -13 -471 -64 -995 -120 -1230
-86 -363 -361 -858 -621 -1119 -229 -229 -721 -529 -1279 -778 -220 -99 -319
-138 -615 -242 -340 -120 -556 -208 -1001 -406 -581 -260 -633 -278 -736 -259
-103 20 -207 116 -273 253 -106 221 -260 821 -301 1176 -35 311 33 578 273
1062 37 75 78 149 91 165 12 15 38 60 56 98 18 39 48 93 66 120 17 28 44 71
58 96 14 25 33 51 41 58 8 7 18 25 21 41 3 16 13 34 21 41 8 7 50 62 92 123
207 300 562 688 732 801 45 30 85 55 88 55 3 0 37 20 76 44 375 232 967 478
1521 631 268 74 353 108 535 216 333 197 793 440 927 491 143 54 243 59 329
17z"/>
</g>
</svg>
What you are looking for is edge detection. If the image is as clean as the one posted above, the results of edge detection will be perfect, and no other processing will be needed after it.
So how do we do the edge detection? I'm assuming you know that an image is stored in the computer as a 2D matrix of intensity values. If you apply a mask over the image, i.e. take a small matrix, compute its weighted sum at different points of the image, and substitute the value at the center of the neighborhood with the computed result, you can do edge detection.
There are many masks for this purpose. I suggest you look at the Sobel, Roberts and Prewitt filters. One of the simplest filters you can use is
0 1 0
1 -4 1
0 1 0
You can do this in OpenCV (but I don't have much experience with it). My preferred tool is MATLAB. You can use its built-in functions such as edge (here's a tutorial), or write a simple code in which you use two for loops to iterate over all pixels in the image and calculate the values applied by these filters.
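The two-loop approach described above can be sketched in Python/NumPy (illustrative names; a real implementation would use something like scipy.ndimage.convolve or cv2.filter2D instead of explicit loops):

```python
import numpy as np

def laplacian_edges(img, thresh=0.0):
    """Apply the 3x3 Laplacian mask shown above to a 2D grayscale array."""
    kernel = np.array([[0, 1, 0],
                       [1, -4, 1],
                       [0, 1, 0]], dtype=float)
    h, w = img.shape
    out = np.zeros((h, w))
    for y in range(1, h - 1):        # iterate over interior pixels
        for x in range(1, w - 1):
            out[y, x] = np.sum(img[y - 1:y + 2, x - 1:x + 2] * kernel)
    return np.abs(out) > thresh      # True where intensity is not locally flat

# A flat white square on a black background: only its border responds,
# because the Laplacian is zero over constant regions.
img = np.zeros((5, 5))
img[1:4, 1:4] = 1.0
edges = laplacian_edges(img)
```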