I am trying to parse a .txt file from the NOAA Weather site. How can I search the file for certain text and insert it into the database?
I'm trying to search for Greenville and pull out the conditions and temperature, then push those into my database along with other cities. Any code you are willing to share would be appreciated.
Code:
<cffile action="read" file="#expandPath('./NWS Raleigh Durham.txt')#" variable="myFile">
<cfdump var="#myfile#">
Content:
National Weather Service Text Product Display Skip Navigation Regional Weather Roundup Issued by NWS Raleigh/Durham, NC Home | Current Version | Previous Version | Graphics & Text | Print | Product List | Glossary On Versions: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 000 ASUS42 KRAH 081410 RWRRAH NORTH CAROLINA WEATHER ROUNDUP NATIONAL WEATHER SERVICE RALEIGH NC 900 AM EST THU MAR 08 2018 NOTE: "FAIR" INDICATES FEW OR NO CLOUDS BELOW 12,000 FEET WITH NO SIGNIFICANT WEATHER AND/OR OBSTRUCTIONS TO VISIBILITY. NCZ001-053-055-056-065-067-081500- WESTERN NORTH CAROLINA CITY SKY/WX TMP DP RH WIND PRES REMARKS ASHEVILLE FAIR 35 23 61 VRB3 29.92F JEFFERSON FLURRIES 26 16 65 W9 29.83F WCI 16 MORGANTON FAIR 37 25 64 NW3 29.97F HICKORY CLOUDY 35 24 64 SW5 29.94F WCI 31 RUTHERFORDTON CLOUDY 37 27 67 W6 29.97S WCI 33 MOUNT AIRY FAIR 37 21 53 NW8 29.94F WCI 31 BOONE PTSUNNY 27 16 63 NW13G18 29.85F WCI 16 $$ NCZ021-022-025-041-071-084-088-081500- CENTRAL NORTH CAROLINA CITY SKY/WX TMP DP RH WIND PRES REMARKS CHARLOTTE CLOUDY 38 27 64 W5 29.97F WCI 34 GREENSBORO PTSUNNY 38 24 57 W8 29.93S WCI 32 WINSTON-SALEM FAIR 38 20 48 W8 29.94F WCI 32 RALEIGH-DURHAM PTSUNNY 36 26 67 CALM 29.96R FORT BRAGG CLOUDY 39 23 52 NW5 29.97R FAYETTEVILLE CLOUDY 38 28 67 W6 29.98R WCI 33 BURLINGTON CLOUDY 39 25 57 SW5 29.94S LAURINBURG CLOUDY 38 28 67 NW8 29.99R WCI 32 $$ NCZ011-015-027-028-043-044-047-080-103-081500- NORTHEASTERN NORTH CAROLINA CITY SKY/WX TMP DP RH WIND PRES REMARKS ROCKY MT-WILSO PTSUNNY 40 24 53 NW6 29.96R GREENVILLE FAIR 41 23 48 N6 29.97S WASHINGTON FAIR 41 25 51 NW9 29.94F ELIZABETH CITY PTSUNNY 40 27 59 NW7 29.92S MANTEO CLOUDY 42 28 58 N9 29.91S CAPE HATTERAS FAIR 45 33 63 N5 29.90S $$ NCZ078-087-090-091-093-098-101-081500- SOUTHEASTERN NORTH CAROLINA CITY SKY/WX TMP DP RH WIND PRES REMARKS LUMBERTON CLOUDY 40 29 64 NW8 29.99R WCI 34 GOLDSBORO CLOUDY 39 25 57 NW5 29.94S KINSTON PTSUNNY 43 25 49 NW8 29.96S KENANSVILLE CLOUDY 39 27 60 NW7 29.96S WCI 
34 NEW BERN FAIR 41 27 57 NW8 29.95S CHERRY POINT NOT AVBL BEAUFORT FAIR 45 28 51 NW13 29.93S JACKSONVILLE CLOUDY 43 27 53 NW9 29.95S WILMINGTON FAIR 44 27 51 NW13 29.96S $$ National Weather Service Raleigh, NC Weather Forecast Office 1005 Capability Drive, Suite 300 Centennial Campus Raleigh, NC 27606-5226 (919) 326-1042 Page Author: RAH Webmaster Web Master's E-mail: rah.webmaster#noaa.gov Page last modified: Jan 10th, 2018 19:57 UTC Disclaimer Credits Glossary Privacy Policy About Us Career Opportunities
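For the extraction itself, a regular expression keyed on the city name can pull the conditions and temperature out of each roundup line; ColdFusion's reFind/reMatch functions accept the same pattern. A minimal sketch in Python, assuming the fixed column order CITY SKY/WX TMP shown in the sample above (multi-word station names like ROCKY MT-WILSO would need a looser pattern):

```python
import re

# One line from the roundup text, as dumped above.
line = "GREENVILLE FAIR 41 23 48 N6 29.97S"

# City name, then the sky/weather conditions, then the TMP column.
m = re.search(r"GREENVILLE\s+(\S+)\s+(\d+)", line)
conditions = m.group(1)   # sky/weather, e.g. "FAIR"
temp = int(m.group(2))    # temperature, e.g. 41
```

Once the values are captured, they can be bound as query parameters in a cfquery INSERT; looping the same pattern over a list of city names covers the other cities.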
Link to CausalFS GitHub. I'm using v2.0 of the CausalFS C++ package by Kui Yu.
Upon running the structure-learning algorithms, my DAG and Markov blankets (MB) do not match.
I'm trying to generate a DAG from the data in CDD/data/data.txt and CDD/data.txt via some of the local-to-global structure learning algorithms described in the manual (PCMB-CSL, STMB-CSL, etc.), running the commands as given in the manual (pg. 18 of 26).
But my resulting DAG is filled with zeros (for the most part). Given that this is an example dataset, that looks suspicious. Upon checking CDD/mb/mb.out, I find that the Markov blankets for the variables do not agree with the DAG output.
For example, running ./main ./data/data.txt ./data/net.txt 0.01 PCMB-CSL "" "" -1 gives a 1 at position (1,22) (one-indexed) only; relaxing the alpha value to 0.1 (it is 0.01 in the example) adds just one more 1. However, this doesn't agree with the output MB for each variable, which looks like this (upon running IAMB as ./main ./data/data.txt ./net/net.txt 0.01 IAMB all "" ""):
0 21
1 22 26 28
2 29
3 14 21
4 5 12
5 4 12
6 8 12
7 8 12
8 6 7 12
9 11 15
10 35
11 9 12 15 33
12 4 6 7 8 11 13
13 8 12 14 15 17 30 34
14 3 13 20
15 8 9 11 13 14 17 30
16 15
17 13 15 18 27 30
18 17 19 20 27
19 18 20
20 14 18 21 28
21 0 3 20 26
22 1 21 23 24 28
23 1 22 24
24 5 22 23 25
25 24
26 1 21 22
27 17 18 28 29
28 1 18 21 22 27 29
29 2 27
30 13 14 15 17
31 34
32 15 18 34
33 11 12 32 35 36
34 30 31 32 35 36
35 10 33 34
36 33 34 35
Such an MB profile suggests the DAG should be much more connected.
I would love to hear suggestions from people who've managed to get the package to behave correctly; I just don't understand where my error is. (I'm running PopOS 20.04.)
Thanks a bunch <3
P.S. The output files are appended to on every rerun, so make sure to delete them between runs.
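One way to quantify the mismatch is to parse the mb.out listing into an undirected edge set and compare its size with the number of nonzero entries in the DAG; every MB member is either adjacent to the variable or a co-parent, so a near-empty DAG is hard to reconcile with blankets this large. A rough sketch in Python (the mb.out format is assumed from the listing above, truncated here to three lines):

```python
# First few lines of the IAMB output above: variable index, then its MB.
mb_text = """0 21
1 22 26 28
2 29"""

pairs = set()
for row in mb_text.splitlines():
    var, *blanket = map(int, row.split())
    for member in blanket:
        # Unordered pair: MB membership is symmetric.
        pairs.add(frozenset((var, member)))
```

Even these three blankets already imply five distinct variable pairs, versus the one or two nonzero DAG entries reported above.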
I am looking to create a dummy variable for the Latin American countries in my data set, which I need for a log-log model. I know how to log all the variables for my later regression. Any suggestion or help on how to make a dummy variable for the Latin American countries with my data would be appreciated.
data HW6;
input country : $25. midyear sancts lprots lfrac ineql pop;
cards;
CHILE 1955 58 44 65 57 6.743
CHILE 1960 19 34 65 57 7.785
CHILE 1965 27 24 65 57 8.510
CHILE 1970 36 29 65 57 9.369
CHILE 1975 38 58 65 57 10.214
COSTA_RICA 1955 16 7 54 60 1.024
COSTA_RICA 1960 6 1 54 60 1.236
COSTA_RICA 1965 2 1 54 60 1.482
COSTA_RICA 1970 3 1 54 60 1.732
COSTA_RICA 1975 2 3 54 60 1.965
INDIA 1955 81 134 47 52 404.478
INDIA 1960 101 190 47 52 445.857
INDIA 1965 189 845 47 52 494.882
INDIA 1970 133 915 47 52 553.619
INDIA 1975 132 127 47 52 616.551
JAMICA 1955 11 12 47 62 1.542
JAMICA 1960 9 2 47 62 1.629
JAMICA 1965 8 6 47 62 1.749
JAMICA 1970 1 1 47 62 1.877
JAMICA 1975 7 1 47 62 2.043
PHILIPPINES 1955 26 123 48 56 24.0
PHILIPPINES 1960 20 38 48 56 27.898
PHILIPPINES 1965 9 5 48 56 32.415
PHILIPPINES 1970 79 25 48 56 37.540
SRI_LANKA 1955 29 2 73 52 8.679
SRI_LANKA 1960 75 35 73 52 9.879
SRI_LANKA 1965 25 63 73 52 11.202
SRI_LANKA 1970 34 14 73 52 12.532
TURKEY 1955 79 1 67 61 24.145
TURKEY 1960 138 19 67 61 28.217
TURKEY 1965 36 51 67 61 31.951
TURKEY 1970 51 22 67 61 35.743
URUGUAY 1955 8 4 57 48 2.372
URUGUAY 1960 12 1 57 48 2.538
URUGUAY 1965 16 14 57 48 2.693
URUGUAY 1970 21 19 57 48 2.808
URUGUAY 1975 24 45 57 48 2.829
VENEZUELA 1955 38 14 76 65 6.110
VENEZUELA 1960 209 23 76 65 7.632
VENEZUELA 1965 100 162 76 65 9.119
VENEZUELA 1970 9 27 76 65 10.709
VENEZUELA 1975 4 12 76 65 12.722
;
data newData;
set HW6;
sancts = log (sancts);
lprots = log (lprots);
lfrac = log (lfrac);
ineql = log (ineql);
pop = log (pop);
run;
The GLMSELECT procedure is one simple way of creating dummy variables; there is a nice article about how to use it to generate them.
data newData;
set HW6;
sancts = log (sancts);
lprots = log (lprots);
lfrac = log (lfrac);
ineql = log (ineql);
pop = log (pop);
Y = 0; *-- Create a fake response variable --*
run;
proc glmselect data=newData noprint outdesign(addinputvars)=want(drop=Y);
class country;
model Y = country / noint selection=none;
run;
If needed in a later step, use the macro variable &_GLSMOD created by the procedure, which contains the names of the dummy variables.
The real question here is not related to SAS; it is about how to get the region of a country from its name.
I would give ISO 3166 a try, which lists all countries and their geographical location.
Getting that list is straightforward. Import it into SAS, merge by country, and finally flag the countries in Latin America.
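The lookup-and-merge idea, sketched here in pandas rather than SAS (the region table is a tiny hand-made stand-in for the real ISO 3166 list, with JAMICA spelled as in the dataset; whether Jamaica counts as Latin America is your call):

```python
import pandas as pd

# Hand-made region lookup; a real one would be imported from ISO 3166 data.
regions = pd.DataFrame({
    "country": ["CHILE", "COSTA_RICA", "URUGUAY", "VENEZUELA", "JAMICA",
                "INDIA", "PHILIPPINES", "SRI_LANKA", "TURKEY"],
    "latin": [1, 1, 1, 1, 1, 0, 0, 0, 0],
})

hw6 = pd.DataFrame({"country": ["CHILE", "INDIA", "VENEZUELA"]})

# Left-join so every observation keeps its row and gains the dummy.
flagged = hw6.merge(regions, on="country", how="left")
```

The SAS equivalent is a sort on country followed by a MERGE (or a PROC SQL left join) against the imported lookup table.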
I need help restructuring the data. My Table looks like this
NameHead Department Per_test Per_Delta Per_DB Per_Vul
Nancy Health 55 33.2 33 63
Jim Air 25 22.8 23 11
Shu Water 26 88.3 44 12
Dick Electricity 77 55.9 66 10
Elena General 88 22 67 9
Nancy Internet 66 12 44 79
And I want my table to look like this
NameHead Nancy Jim Shu Dick Elena Nancy
Department Health Air Water Electricity General Internet
Per_test 55 25 26 77 88 66
Per_Delta 33.2 22.8 88.3 55.9 22 12
Per_DB 33 23 44 66 67 44
Per_Vul 63 11 12 10 9 79
I tried PROC TRANSPOSE but couldn't get the desired result. Please help!
Thanks!
PROC TRANSPOSE does exactly what you want. You must include a VAR statement if you want to include the character variables.
proc transpose data=have out=want;
var _all_;
run;
Note that the transposed columns get default names (COL1, COL2, ...) because variables cannot be nameless and there is no ID variable to supply names. Here is what the output dataset looks like.
Obs _NAME_ COL1 COL2 COL3 COL4 COL5 COL6
1 NameHead Nancy Jim Shu Dick Elena Nancy
2 Department Health Air Water Electricity General Internet
3 Per_test 55 25 26 77 88 66
4 Per_Delta 33.2 22.8 88.3 55.9 22 12
5 Per_DB 33 23 44 66 67 44
6 Per_Vul 63 11 12 10 9 79
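For comparison, the same reshape in pandas is a plain transpose; as in SAS, the transposed columns get positional labels unless you assign names (column names taken from the question, data truncated):

```python
import pandas as pd

df = pd.DataFrame({
    "NameHead":   ["Nancy", "Jim", "Shu"],
    "Department": ["Health", "Air", "Water"],
    "Per_test":   [55, 25, 26],
    "Per_Vul":    [63, 11, 12],
})

wide = df.T  # original rows become columns 0, 1, 2, ...
```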
I have a pandas data frame below (it does have other columns, but these are the important ones). The Date column is the index:
Number_QA_VeryGood Number_Valid_Cells Time
Date
2015-01-01 91 92 18:55
2015-01-02 6 6 18:00
2015-01-02 13 13 19:40
2015-01-03 106 106 18:45
2015-01-05 68 68 18:30
2015-01-06 111 117 19:15
2015-01-07 89 97 18:20
2015-01-08 86 96 19:00
2015-01-10 9 16 18:50
I need to resample daily: the first two columns should be resampled with sum.
The last column needs to take the Time of the row with the highest daily Number_Valid_Cells value.
Example output should be (2015-01-02 is the line that changed):
Number_QA_VeryGood Number_Valid_Cells Time
Date
2015-01-01 91 92 18:55
2015-01-02 19 19 19:40
2015-01-03 106 106 18:45
2015-01-05 68 68 18:30
2015-01-06 111 117 19:15
2015-01-07 89 97 18:20
2015-01-08 86 96 19:00
2015-01-10 9 16 18:50
What is the best way to get this to work?
Or you can try
df.groupby(df.index).agg({'Number_QA_VeryGood':'sum','Number_Valid_Cells':'sum','Time':'last'})
Out[276]:
Time Number_QA_VeryGood Number_Valid_Cells
Date
2015-01-01 18:55 91 92
2015-01-02 19:40 19 19
2015-01-03 18:45 106 106
2015-01-05 18:30 68 68
2015-01-06 19:15 111 117
2015-01-07 18:20 89 97
2015-01-08 19:00 86 96
2015-01-10 18:50 9 16
Update: sort_values first
sorted_df = df.sort_values('Number_Valid_Cells')
sorted_df.groupby(sorted_df.index)\
    .agg({'Number_QA_VeryGood':'sum','Number_Valid_Cells':'sum','Time':'last'})
Out[314]:
Time Number_QA_VeryGood Number_Valid_Cells
Date
1/1/2015 18:55 91 92
1/10/2015 18:50 9 16
1/2/2015 16:40 19 19 # changed here
1/3/2015 18:45 106 106
1/5/2015 18:30 68 68
1/6/2015 19:15 111 117
1/7/2015 18:20 89 97
1/8/2015 19:00 86 96
Data input :
Number_QA_VeryGood Number_Valid_Cells Time
Date
1/1/2015 91 92 18:55
1/2/2015 6 6 18:00
1/2/2015 13 13 16:40 # I changed here
1/3/2015 106 106 18:45
1/5/2015 68 68 18:30
1/6/2015 111 117 19:15
1/7/2015 89 97 18:20
1/8/2015 86 96 19:00
1/10/2015 9 16 18:50
You can use groupby sum for the first two columns; if the values of Number_Valid_Cells are already sorted, then:
ndf = df.reset_index().groupby('Date').sum()
ndf['Time'] = df.reset_index().drop_duplicates(subset='Date',keep='last').set_index('Date')['Time']
Number_QA_VeryGood Number_Valid_Cells Time
Date
2015-01-01 91 92 18:55
2015-01-02 19 19 19:40
2015-01-03 106 106 18:45
2015-01-05 68 68 18:30
2015-01-06 111 117 19:15
2015-01-07 89 97 18:20
2015-01-08 86 96 19:00
2015-01-10 9 16 18:50
I am doing some research on image compression via discrete cosine transforms, and I want to change the quantization table sizes so that I can study what happens when I change the size of the sub-matrices I divide my pictures into. The standard sub-matrix size is 8x8 and there are a lot of tables for those dimensions. For example, the standard JPEG quantization table (that I use) is:
standardmatrix8 = np.matrix('16 11 10 16 24 40 51 61;\
12 12 14 19 26 58 60 55;\
14 13 16 24 40 57 69 56;\
14 17 22 29 51 87 80 62;\
18 22 37 56 68 109 103 77;\
24 35 55 64 81 104 103 92;\
49 64 78 77 103 121 120 101;\
72 92 95 98 112 100 103 99').astype('float')
I have assumed that the quantization tables for 2x2 and 4x4 would be:
standardmatrix2 = np.matrix('16 11; 12 12').astype('float')
standardmatrix4 = np.matrix('16 11 10 16;\
                             12 12 14 19;\
                             14 13 16 24;\
                             18 22 37 56').astype('float')
since the entries in the standard table correspond to the same frequencies in the smaller matrices.
But what about quantization tables with dimensions 16x16, 24x24 and so on? I know that the standard quantization tables are worked out by experiments and can't be calculated from some formula, but I assume that someone has tried changing the matrix sizes before me! Where can I find these tables? Or can I just make something up and scale the last entries to higher frequencies?
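On the last point: since there is no official table beyond 8x8, one common heuristic is to resample the standard 8x8 table onto the new frequency grid by interpolation. This is a made-up scaling, not part of the JPEG standard, but it keeps low-frequency entries aligned across sizes. A sketch with NumPy:

```python
import numpy as np

# Standard JPEG 8x8 luminance table from the question.
Q8 = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 103, 92],
    [49, 64, 78, 77, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
], dtype=float)

def resize_qtable(q, n):
    """Bilinearly resample a quantization table to n x n over a
    normalized frequency grid (a heuristic, not a JPEG standard)."""
    old = np.linspace(0.0, 1.0, q.shape[0])
    new = np.linspace(0.0, 1.0, n)
    rows = np.array([np.interp(new, old, r) for r in q])         # stretch rows
    return np.array([np.interp(new, old, c) for c in rows.T]).T  # then columns

Q16 = resize_qtable(Q8, 16)
Q4 = resize_qtable(Q8, 4)
```

Note this differs from simply cropping the top-left corner (as assumed for the 2x2 and 4x4 tables above): interpolation maps the full frequency range onto the new grid, whereas cropping keeps only the lowest frequencies.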