Output specific strings from a file in RedHat Linux 7 - Regex possibly

This is my first time asking a question, so my apologies if I skipped over some of the basics before posting.
My question is fairly simple. I have a file that gets written to very often, and the first string/column always has the word "CLEAR" or "CRITICAL", sometimes "WARNING", but I want to ignore those entries.
Around the 17th column there is a specific 32-character alphanumeric ID that accompanies each entry. I'm trying to find a way, without modifying the original file, to write out just the 1st column and the 32-character ID into a new file for starters. Unfortunately the ID is not always in column 17, or else I could do this on my own.
Here is a portion of the log file that I'm referring to. Please don't bash me too hard for my ignorance if my question is not detailed enough or has already been answered before.
CLEAR ; lnx20162.csxt.csx.com ; Database Instance ; actd ; Dec 14,
2012 4:46:31 PM EST ; D0C53D1FB19075C2E0405C0A6FF002BF ; Metric Alert
; Response:State ; The database status is OPEN.
CRITICAL ; lnx20016.csxt.csx.com ; Database Instance ; GISP_GISP2 ;
Dec 14, 2012 4:39:54 PM EST ; D0C53D32C0E53F85E0405C0A6FF002C9 ;
Metric Alert ; alertLog:genericErrStack ; ORA-error stack (4,031)
logged in
/oramisc01/oracle/diag/rdbms/gisp/GISP2/trace/alert_GISP2.log.
CRITICAL ; lnx20016.csxt.csx.com ; Database Instance ; GISP_GISP2 ;
Dec 14, 2012 4:40:00 PM EST ; D0C53D32C1093F85E0405C0A6FF002C9 ;
Metric Alert ; alertLog:genericErrStack ; ORA-error stack (04031,
04031) logged in
/oramisc01/oracle/diag/rdbms/gisp/GISP2/trace/alert_GISP2.log.
CRITICAL ; lnx20016.csxt.csx.com ; Database Instance ; GISP_GISP2 ;
Dec 14, 2012 4:39:55 PM EST ; D0C53D32C0EB3F85E0405C0A6FF002C9 ;
Metric Alert ; alertLog:genericErrStack ; ORA-error stack (04031,
04031, 04031, 04031, 04031) logged in
/oramisc01/oracle/diag/rdbms/gisp/GISP2/trace/alert_GISP2.log.

grep -E -o "EST ;.{0,33}" file1| cut -d ";" -f2 > outputfile
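You need to find a consistent "hook", which here is "EST ;". grep prints the hook plus up to 33 characters after it, and cut keeps the field after the semicolon, which is the ID (with a leading space).
If you want this done regularly, say every minute, put the command in a script and schedule it with crontab.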
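The question also asks for the first column next to the ID, which the one-liner above does not print. Here is a minimal Python sketch of that step, not a drop-in replacement. It assumes each record starts with CLEAR, CRITICAL or WARNING at the beginning of a line, that the ID is exactly 32 uppercase hex characters as in the sample, and that WARNING entries are the ones to skip. The function name `extract_pairs` is made up for illustration.

```python
import re

# A record starts with one of the severity words at the beginning of a line.
SEVERITY = re.compile(r"^(CLEAR|CRITICAL|WARNING)\b")
# Assumption: the ID is a standalone run of exactly 32 uppercase hex characters.
ALERT_ID = re.compile(r"\b[0-9A-F]{32}\b")

def extract_pairs(lines, skip=("WARNING",)):
    """Return (severity, id) pairs, tolerating records wrapped over several lines."""
    current = None
    pairs = []
    for line in lines:
        start = SEVERITY.match(line)
        if start:
            current = start.group(1)      # a new record begins here
        if current is None or current in skip:
            continue                      # no open record, or one we ignore
        found = ALERT_ID.search(line)
        if found:
            pairs.append((current, found.group(0)))
            current = None                # only the first ID per record is kept
    return pairs
```

To write the result to a new file without touching the original, open the log read-only, pass the file object to `extract_pairs`, and write one "severity id" line per pair.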

Related

Format error, operation not executed in WPS using SAS language

Good afternoon.
Data description first:
if cell = . , rightfully blank
if cell = yyyymm, missing value needs to be imputed
if cell = 0 or other numeric value, number of days
I have monthly data from January 2010 to November 2016. I need to derive the missing values based on the available data. I don't change anything if it should be blank, but if it's missing, I need to derive it.
Here is my code (I've changed the variable names because it might be confusing):
data trial3;
set work.trial2;
if currentmonth=. Then currentmonth=.; else do;
if currentmonth=201001 then do;
if nextmonth=0 then currentmonth=0;
else do;
if nextmonth ne 201002 then _201001=nextmonth-31;
if currentmonth<0 then currentmonth=0;
end;
end;
end;
There's no error message, but the log just stops, and every line that did run is preceded by an exclamation point.

Defining a new field conditionally using put function with user-defined formats

I am trying to define a new value for an observation with a user defined format. However, my if/then/else statement seems to only work for observations with a year value of "2014". The put statements are not working for other values. In SAS, the put statement is blue in the first statement, and black in the other two. Here is a picture of what I mean:
Does anyone know what I am missing here? Here is my complete code:
data claims_t03_group;
set output.claims_t02_group;
if year = "2014" then test = put(compress(lookup,"_"),$G_14_PROD35.);
else if year = "2015" then test = put(compress(lookup,"_"),$G_15_PROD35.);
else test = put(compress(lookup,"_"),$G_16_PROD35.);
run;
Here is an example of what I mean when I say that the process seems to "work" for 2014:
As you can see, when the Year value is 2014, the format lookup works correctly, and the test field returns the value I am expecting. However, for years 2015 and 2016, the test field returns the lookup value without any formatting.
Your code utilises user-defined formats, $G_14_PROD.-$G_16_PROD.. My guess would be that there is a problem with one or more of these, but unless you can provide the format definitions it will be difficult to assist you further.
Try running the following and sharing the resulting output dataset work.prdfmts:
proc sql noprint;
select cats(libname,'.',memname) into :myfmtlib
from sashelp.vcatalg
where objname = 'G_14_PROD';
quit;
proc format cntlout = prdfmts library=&myfmtlib;
select $G_14_PROD $G_15_PROD $G_16_PROD; /* character formats need the $ prefix here */
run;
N.B. this assumes that you only have one catalogue containing a format with that name, and that the format definitions for all 3 formats are contained in the same catalogue. If not, you will need to adapt this a bit and run it once for each format to find and export the definition.
Not that it solves your actual problem, but you could eliminate the IF/THEN by using the PUTC() function instead.
data have ;
do year=2014,2015,2016;
do lookup='00_01','00_02' ;
output;
end;
end;
run;
proc format ;
value $G_14_PROD '0001'='2014 - 1' '0002'='2014 - 2' ;
value $G_15_PROD '0001'='2015 - 1' '0002'='2015 - 2' ;
value $G_16_PROD '0001'='2016 - 1' '0002'='2016 - 2' ;
run;
data want ;
set have ;
length test $35 ;
if 2014 <= year <= 2016 then
test = putc(compress(lookup,'_'),cats('$G_',year-2000,'_PROD.'))
;
run;
Result
Obs year lookup test
1 2014 00_01 2014 - 1
2 2014 00_02 2014 - 2
3 2015 00_01 2015 - 1
4 2015 00_02 2015 - 2
5 2016 00_01 2016 - 1
6 2016 00_02 2016 - 2
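For readers who do not use SAS, the idea behind PUTC() (build the format name at run time, then look the value up in that format) can be sketched in Python with a dictionary of lookup tables. This is only an analogy under stated assumptions: the `FORMATS` table and the `apply_format` helper are hypothetical names, and the fallback of returning the unformatted key mimics how a character format leaves unmatched values unchanged.

```python
# Hypothetical stand-ins for the three user-defined formats above.
FORMATS = {
    "G_14_PROD": {"0001": "2014 - 1", "0002": "2014 - 2"},
    "G_15_PROD": {"0001": "2015 - 1", "0002": "2015 - 2"},
    "G_16_PROD": {"0001": "2016 - 1", "0002": "2016 - 2"},
}

def apply_format(lookup, year):
    """Pick the table by a name computed from the year, then translate the key."""
    key = lookup.replace("_", "")        # same role as compress(lookup,'_')
    name = f"G_{year - 2000}_PROD"       # same role as cats('$G_',year-2000,'_PROD.')
    table = FORMATS.get(name)
    if table is None:
        return None                      # year outside the known range: leave blank
    return table.get(key, key)           # unmatched keys come back unformatted
```

The last line is worth noting for the original problem: a lookup that returns the raw key is what you would see if the 2015 and 2016 tables did not contain the key being looked up.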

Assigning time-tied variable data to the previous minute's seconds in SAS

I'm trying to figure out how to do the following in SAS. I have data taken by an NO2 sensor every minute. A GPS is recording a location at give or take every second. I need to assign the value of each data recording for every minute to the previous seconds of that past minute. The NO2 data recorded is an average of the previous minute.
Here's a sample of my data: Sample Data
I am looking to bring the data from the last line (NO2, Humidity, Temperature) "up" to seconds of the previous minute which have a GPS reading. The column is in DateTime format.
Would love any pointers on how to do this... Thanks in advance!
Your question is a bit unclear unfortunately. Assuming that you want to retrospectively assign NO2, Humidity and Temperature for all missing values within the last minute (60 seconds) you can do the following:
Dummy data:
data input ;
format datetime datetime20. ;
datetime='28SEP2015:07:21:26'dt ;
do i=1 to 120 ;
datetime+1 ;
if i=80 then do ; no2=0.007 ; humidite=55.9 ; temperature=22.4 ; end ;
else if i=120 then do ; no2=0.020 ; humidite=65.0 ; temperature=23.5 ; end ;
else call missing(no2, humidite,temperature) ;
output ;
end ;
run ;
Solution:
Keep only the key records that contain values:
data key(rename=(datetime=keytime)) ;
set input(where=(nmiss(no2,humidite,temperature) ne 3));
run ;
Run the key records over the original data as a hashtable:
data output(drop=rc rd i keytime) ;
*Load key table into memory ;
if _n_=1 then do ;
declare hash pt(dataset:"key",multidata:"yes",ordered:"yes");
declare hiter iter('pt');
rc=pt.defineKey('id');
rc=pt.defineData('keytime','no2','humidite','temperature');
rc=pt.defineDone();
end ;
*Read in original data ;
set input ;
rc=pt.find() ;
*Update with key table values whenever needed ;
if rc=0 then do ;
rd=iter.first() ;
if datetime gt keytime then rd=iter.next();
if rd=0 and keytime-60 <= datetime <= keytime then output ;
else do ;
call missing(no2,humidite,temperature);
output ;
end;
end ;
run;
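The rule being applied (give each per-second record the next reading, provided that reading is at most 60 seconds later) can also be sketched outside SAS. The Python below is an illustration under assumptions, not a translation of the hash code: `backfill` is a made-up name, readings are assumed to be sorted by time, and a binary search replaces the hash iterator.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

def backfill(gps_times, readings, window=timedelta(seconds=60)):
    """Attach to each GPS time the first reading at or after it, if within the window.

    gps_times: list of datetimes
    readings:  list of (datetime, value) pairs, sorted by datetime
    Returns a list of (gps_time, value_or_None).
    """
    key_times = [t for t, _ in readings]
    result = []
    for t in gps_times:
        i = bisect_left(key_times, t)    # index of first reading with time >= t
        if i < len(key_times) and key_times[i] - t <= window:
            result.append((t, readings[i][1]))
        else:
            result.append((t, None))     # no reading close enough after t
    return result
```

With readings 80 and 120 seconds after the start, as in the dummy data, seconds 20 to 80 pick up the first reading, seconds 81 to 120 the second, and seconds 1 to 19 stay missing.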
It sounds like you have two data sources. One generates an observation every minute. The other generates an observation as often as every second. You want to join the two datasets together such that all of the frequently-sampled values within a given minute get the same occasionally-sampled value that corresponds to that minute.
Assume that the NO2 samples really are taken every minute and have a datetime value named NO2_TD, and that the GPS datetime stamp is GPS_TD [having a variable named the same as a function, informat, and format is generally a bad idea]. One of the best ways to join tables is through SQL. Doing a FULL OUTER JOIN guarantees that if we have missing data on either side, we still get the observations from the other data source.
PROC SQL ;
CREATE TABLE joined_data AS
SELECT a.*, b.NO2_MSR, b.NO2_TD
FROM gps_data a
FULL OUTER JOIN
no2_data b
ON INTNX('DTMINUTE',a.GPS_TD,0,'B') EQ INTNX('DTMINUTE',b.NO2_TD,0,'B')
;
QUIT ;
Here we use INTNX to essentially ignore the seconds. We shift the datetime values to a minute boundary (DTMINUTE), by 0 minutes, to the Beginning of the minute, essentially truncating the seconds for the purposes of the join. INTNX is very commonly used to get the first of the month, last of the month, etc.
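To make the truncate-then-join idea concrete for readers outside SAS, here is a small Python sketch. It is an analogy, not the SQL above: `minute_floor` and `full_join_on_minute` are hypothetical names, rows are plain tuples, and it assumes at most one NO2 row per minute.

```python
from datetime import datetime

def minute_floor(ts):
    """Drop seconds and microseconds, like shifting to the beginning of the minute."""
    return ts.replace(second=0, microsecond=0)

def full_join_on_minute(gps_rows, no2_rows):
    """gps_rows: list of (gps_td, position); no2_rows: list of (no2_td, no2_msr).

    Returns (gps_td, position, no2_msr) tuples. Rows without a partner on the
    other side are kept with None in the missing fields, as in a full outer join.
    """
    no2_by_minute = {minute_floor(t): v for t, v in no2_rows}  # one row per minute assumed
    matched = set()
    joined = []
    for gps_td, position in gps_rows:
        key = minute_floor(gps_td)
        if key in no2_by_minute:
            matched.add(key)
        joined.append((gps_td, position, no2_by_minute.get(key)))
    for key, value in no2_by_minute.items():
        if key not in matched:
            joined.append((None, None, value))   # NO2 minute with no GPS fix
    return joined
```

Note that this pairs each GPS fix with the NO2 row stamped in the same minute. If the NO2 value describes the previous minute, shift its timestamp back by one minute before joining.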
If all you need to do is back-fill the "time sorted" environmental data without regard to the length of time between GPS and environmental data sampling, then the following code pattern (based on Bendy's dummy data creation code) should be sufficient.
/* assuming input data is sorted by datetime value */
/* based on dummy data created with Bendy's code */
data fill_missing;
* read in the data but only keep the gps data - need to add lat and long to keep statement;
set input(keep=i datetime);
if _n_=1 or datetime > datetime2 then do;
drop datetime2; * drop the datetime value associated with the non-gps data ;
* read only the rows that have the non-missing data using this set statement ;
* and only keep the datetime (renamed to datetime2) and the non-gps data ;
set input(keep=datetime no2 humidite temperature
rename=(datetime=datetime2)
where=(no2 ne .));
end;
run;
Also note that this code pattern will stop outputting data when the datetime value of the input data exceeds the max datetime value associated with non-missing environmental data (non-gps). It sounds like this likely isn't a problem for your usage.

Reading next k observation from current observation

Here's a very similar question
My question is a bit different from the one in the above link.
Background
I have a data set that contains hourly data, so each object has 24 records per day. Now I want to create K new columns that hold the next 1, 2, ..., K hourly records for each object. If they do not exist, replace them with missing values.
K is dynamic and is defined by the user.
The original order must be preserved, whether that is guaranteed by the data steps themselves or by sorting at the end.
I'm looking for an efficient way to achieve this.
Example
Original data:
Object Hour Value
A 1 2.3
A 2 2.3
A 3 4.0
A 4 1.3
Given K = 2, desired output is
Object Hour Value Value1 Value2
A 1 2.3 2.3 4.0
A 2 2.3 4.0 1.3
A 3 4.0 1.3 .
A 4 1.3 . .
Possible solutions
Sort in reverse order -> obtain the previous k records -> sort them back.
When the number of observations is large, this is not an ideal way.
proc expand. I'm not familiar with it because it was never licensed on my PC.
Using point in a data step.
A retain statement inside a data step. I'm not sure how this works.
Assuming this is provided as a macro variable, this is pretty easily done with a side to side merge-ahead. Certainly faster than a transpose for K much larger than the total record count, and probably faster than looping POINTs.
Basically you merge the original dataset to itself, and use FIRSTOBS to push the starting point down one for each successive merge iteration. This needs a bit of extra work if you have BY groups that need protecting, but that's usually not too hard to manage.
Here's an example using SASHELP.CLASS:
%let K=5;
%macro makemergesets(k=, datain=, varin=, keepin=);
%do _i = 2 %to &k;
&datain (firstobs=&_i rename=&varin.=&varin._&_i. keep=&keepin. &varin.)
%end;
%mend makemergesets;
data class_all;
merge sashelp.class
%makemergesets(k=&k,datain=sashelp.class, varin=age,keepin=)
;
run;
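The look-ahead itself, including the BY-group protection mentioned above, is easy to state in a few lines of Python. This is a sketch of the logic only, not of the SAS merge: `add_leads` is a made-up name, rows are (object, hour, value) tuples in their original order, and rows of the same object are assumed to be contiguous.

```python
def add_leads(rows, k):
    """Append the next k values of the same object to each row (None when absent).

    rows: list of (obj, hour, value) tuples in original order.
    """
    out = []
    n = len(rows)
    for j, (obj, hour, value) in enumerate(rows):
        leads = []
        for step in range(1, k + 1):
            nxt = j + step
            # Only look ahead within the same object, so groups do not leak.
            if nxt < n and rows[nxt][0] == obj:
                leads.append(rows[nxt][2])
            else:
                leads.append(None)
        out.append((obj, hour, value, *leads))
    return out
```

Applied to the four example rows from the question with K = 2, this gives the desired Value1 and Value2 columns, with None standing in for the missing value.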
You could transpose the hours and then freely access the hours ahead within each object. Just to set the value of K and generate some dummy data:
* Assign K ;
%let K=3 ;
%let Kn=value&k;
* Generate test objects each containing 24 hourly records ;
data time ;
do object=1 to 10 ;
do hour=1 to 24 ;
value=round(ranuni(1)*10,0.1) ;
output ;
end ;
end ;
run ;
EDIT: I updated the step below as I realised the transpose isn't needed. Doing it all in one step gives ~20% improvement in CPU time.
Use an array of the 24 hour values and loop through do i=1 to &k for each hour:
* Populate K variables ;
data output(keep=object hour value value1-&kn ) ;
set time ;
by object ;
retain k1-k24 . ;
array k(2,24) k1-k24 value1-value24 ;
k(1,hour)=value ;
if last.object then do hour=1 to 24 ;
value=k(1,hour) ;
do i=1 to &k ;
if hour+i <=24 then k(2,i)=k(1,hour+i) ;
else k(2,i)=.;
end ;
output ;
end ;
run ;

month and year function combination is not giving expected results in SAS

I'm trying to delete all the rows that are in the BATCH: May-2014.
data out;
set INPUT;
if MONTH(BATCH) NE 05 and YEAR(BATCH) NE 2014;
RUN;
The data in the BATCH column is numeric, with the format MONYY5.
Examples: MAR13, APR14, MAY14, FEB14, JAN14, FEB12
After I run the code it is deleting all 2014 records instead of deleting MAY and 2014.
Thanks in advance.
Because you're asking for everything that is neither 2014 nor 05. You want everything that is not (both 2014 and 05).
data out;
set INPUT;
if NOT (MONTH(BATCH) eq 05 and YEAR(BATCH) eq 2014);
RUN;
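The difference between the two conditions can be checked on the example values from the question. The Python below is just an illustration of the boolean logic (De Morgan's law), with made-up helper names `keep_wrong` and `keep_right` and each batch represented as the first day of its month.

```python
from datetime import date

# The example batches from the question.
BATCHES = {
    "MAR13": date(2013, 3, 1), "APR14": date(2014, 4, 1), "MAY14": date(2014, 5, 1),
    "FEB14": date(2014, 2, 1), "JAN14": date(2014, 1, 1), "FEB12": date(2012, 2, 1),
}

def keep_wrong(d):
    # "not May AND not 2014": drops every May row and every 2014 row.
    return d.month != 5 and d.year != 2014

def keep_right(d):
    # "not (May AND 2014)": drops only rows that are both May and 2014.
    return not (d.month == 5 and d.year == 2014)

kept_wrong = [name for name, d in BATCHES.items() if keep_wrong(d)]
kept_right = [name for name, d in BATCHES.items() if keep_right(d)]
```

Here `kept_wrong` contains only MAR13 and FEB12, while `kept_right` contains everything except MAY14.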
Another option if you know it's MONYY:
data out;
set INPUT;
if vvalue(batch) ne 'MAY14'; *vvalue gives formatted value;
RUN;
Only works if you're sure it's formatted that way, though.