Input keeps missing record at end of line

Input keeps missing record at end of line - sas

I am learning little sas book. Below is a code from book. and raw data. The issue is when I run it, the final data set keeps missing the record at end of line, i.e., it keeps missing 75 and 56, and label them as missing ("."). Could anyone point out where could possible be the problem? When I add spaces after 75 and 56 at line ends, the problem is gone.
DATA class;
INFILE 'c:\MyRawData\Scores.dat';
INPUT Score ##;
RUN;
PROC UNIVARIATE DATA = class;
VAR Score;
TITLE;
RUN;
Data in that file:
56 78 84 73 90 44 76 87 92 75
85 67 90 84 74 64 73 78 69 56
87 73 100 54 81 78 69 64 73 65
after run it shows more like
56 78 84 73 90 44 76 87 92 .
85 67 90 84 74 64 73 78 69 .
87 73 100 54 81 78 69 64 73 65

My suspicion is that you have something wrong with your end of lines; either you have a spurious character, or your end of line isn't correct in some fashion. Most likely you are using a windows file and you are running in Unix, so you have
75CRLF85
and since Unix uses only LF for line terminator, it sees "75CR" endofline "85", not "75" endofline "85" like it should.
In that case you can either do what you did - add a space, though that likely will still leave some 'blank' records in there - or use TERMSTR in your infile statement to tell SAS how to properly read the file in.
Otherwise, you may have some spurious end characters - for example, if you pasted this from the web, it's possible you have a non-breaking space that is not converted to a regular space.
You can find out by doing this:
data _null_;
infile 'c:\rawdata\myfile.dat';
input #;
put _infile_ $HEX60.;
run;
The 60 is 2x the length of the line. That tells you what SAS is seeing. What you should see:
3536203738203834203733203930203434203736203837203932203735
3835203637203930203834203734203634203733203738203639203536
383720373320313030203534203831203738203639203634203733203635
Digits in ASCII are 30+digit, so 35 is a 5, 36 is a 6, etc. Space is 20. The first line:
35|36|20|37|38|20|38|34|20|37|33|20| ...
so 5 6 space 7 8 space 3 8 space 7 3 space. If you see something else after the 37 35, then you know there is a problem. You might see any of the following:
0A = Line feed.
0D = Carriage return.
A0 = Nonbreaking (web) space.
There are lots of other things you could see, but those are the most likely to trip you up. Pasting from the web is often a problem.

Related

Why does SAS skip an entire row of data values due to missing value?

When I run the following code the third observation is not output. Why does SAS omit the third observation?
data info;
input Gender $ Age Height Weight;
datalines;
M 45 72 149
F 64 62
M 61 72 271
F 29 73 125
M 16 65 178
;
Run;
title "Listing of Dataset Demographics";
proc print data=info;
run;

Defaults will get you, the default in SAS is FLOWOVER, so if a record is missing it looks for it on the next line. You want MISSOVER or TRUNCOVER instead.
Your log tells you this happened with the following note:
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
This works:
data info;
infile cards truncover;
input Gender $ Age Height Weight;
datalines;
M 45 72 149
F 64 62
M 61 72 271
F 29 73 125
M 16 65 178
;
Run;
More details are available in the Example 2 in the documentation here.
Specifically:
When you omit the MISSOVER option or use FLOWOVER (which is the default), SAS moves the
input pointer to line 2 and reads values for TEMP4 and TEMP5 (variables it cannot find). The next
time the DATA step executes, SAS reads a new line which, in this case,
is line 3. This message appears in the SAS log:
NOTE: SAS went to a new line when INPUT statement
reached past the end of a line.

Lines of text do not have "observations". They just have lines.
It didn't skip any of the lines of data. It just used two lines for the second observation because the first of the lines only had values for 3 of the 4 variables the INPUT statement requested.
This behavior is what SAS calls the flowover option of the INFILE statement. This allows you to have more than one line of text to represent the data for a single observation without having to be too persnickety about which fields you insert the line breaks between across the different observations of data.
If you don't want it to have to go hunt for the next field on the next line of text then make sure every variable has a value in the text lines. You can represent missing values by using a period for either numeric or character variables.
So use something like this:
data info;
input Gender $ Age Height Weight;
datalines;
M 45 72 149
F 64 62 .
M 61 72 271
. 29 73 125
M 16 65 178
;
When using flowover you can insert as many extra line breaks as you want as long as each new observation starts on a new line. Like this
data info;
input Gender $ Age Height Weight;
datalines;
M 45 72
149
F 64
62 .
M
61 72 271
F 29 73 125
M 16 65 178
;
If you want SAS to just give up when a there are no more values on the line use the flowover option on the infile statement.
data info;
infile datalines flowover;
input Gender $ Age Height Weight;
datalines;
M 45 72 149
F 64 62
M 61 72 271
F 29 73 125
M 16 65 178
;
There is also the older missover option, but you would normally never want that as it will set values at the end of the line that too short for an explicit INFORMAT width to missing instead of just use the number of characters that are available.
PS Don't indent lines of data. That will just make the code harder to read and the diagnostic messages about invalid data values harder to interpret. To make it easier don't intend the DATALINES (aka CARDS) statement line either. That will also make it clearer the data step definition ends where the lines of data starts and prevent you from accidentally inserting other statements for the data step after the data.

JREPL.BAT - regex to remove lines in file

I have a source data file which I have been using JREPL.BAT to perform some very simple search and replace operations quite successfully. I now need to expand on that to do 2 jobs.
1. remove all lines that start with the string "appX_formContent". This line contain a lot of html output also, it all needs to be deleted on that line.
2. remove all lines that start with "Hex Payload:" and the subsequent line that comes with it.
This is an example of the input data file which shows 2 records. The delimiter between each record is the row that contains "-----------------".
-----------------
Message Headers
JMSCorrelationID: 60bb7750-e9e2-11e9-98bb-42010a961307
JMSPriority: 4
JMSRedelivered: false
Message Properties
app_trackingId: 190990a2-d8d8-43eb-814a-36ceba7a9111
appX_formInstanceIdentifier: FRM389083
appX_formContent: {"data":{"C7d14a6eb-70e7-402d-9d6e-4efd01ba561c":"N","Y","test.</p>\n<p>test form data to be informed </p>\n<p>...............</p>\n<p><strong>Update</strong></p>\n<p><strong>years</strong>"<p>supervision</p>","<p>:true,"c9377ae2-901d-4461-929c-c76e26dc6183":false}}}
app_sourceSystemId: source
app_eventCode: FORM_OUTPUT
app_instigatingUserId: 66
JMSXGroupSeq: 0
Hex Payload:
25 50 44 46 2D 31 2E 35 0D 0A 34 20 30 20 6F 62 6A 0D 0A 3C 3C 2F 54
-----------------
Message Headers
JMSCorrelationID: 641a80d0-e9e2-11e9-98bb-42010a961307
JMSPriority: 4
JMSTimestamp: 2019 10 08 16:43:40
JMSRedelivered: false
Message Properties
app_trackingId: a3c2fe93-ef71-4611-9605-9858ff67a6e8
appX_formInstanceIdentifier: FRM388843
appX_formContent: {"data":{"C7d14a6eb-70e7-402d-9d6e-4efd01ba561c":"N","Y","test.</p>\n<p>test form data to be informed </p>\n<p>...............</p>\n<p><strong>Update</strong></p>\n<p><strong>years</strong>"<p>supervision</p>","<p>:true,"c9377ae2-901d-4461-929c-c76e26dc6183":false}}}
app_sourceSystemId: source
app_eventCode: FORM_OUTPUT
app_instigatingUserId: 433
JMSXGroupSeq: 0
Hex Payload:
25 50 44 46 2D 31 2E 35 0D 0A 34 20 30 20 6F 62 6A 0D 0A 3C 3C 2F
-----------------
This is the batch file that I use to call jrepl - very simple.
call jrepl ".*(?:appX_formContent: .*)" "" /m /f "inpu.txt" /o "output.txt"
I've only tried to remove the appX_formContent line with the regex but it isn't producing any output. I'm not good with regex so help appreciated.
Not sure how to handle the second task of deleting the Hex Payload: line.

SAS Proc Ttest finding differences and where statement

DATA OZONE;
INPUT MONTH $ STMF YKRS ##;
CARDS;
A 80 66 A 68 82 A 24 47 A 24 28 A 82 44 A 100 55
A 55 34 A 91 60 A 87 70 A 64 41 A . 67 A . 127 A 170 96 A . 56
JN 215 93 JN 230 106 JN . 49 JN 69 64 JN 98 83 JN 125 97
JN 72 51 JN 125 75 JN 143 104 JN 192 107 JN . 56 JN 122 68
JN 32 20 JN 23 35 JN 71 30 JN 38 31 JN 136 81 JN 169 119
JL 152 76 JL 201 108 JL 134 85 JL 206 96 JL 92 48 JL 101 60
JL 133 . JL 83 50 JL . 27 JL 60 37 JL 124 47 JL 142 71
JL 75 49 JL 103 59 JL . 53 JL 46 25 JL 68 45 JL . 78
S 38 23 S 80 50 S 80 34 S 99 58 S 71 35 S 42 24 S 52 27 S 33 17
;
run;
Proc Ttest data=Ozone PLOT=NONE ALPHA=0.01;
Where MONTH='JN';
Paired STMF*YKRS;
Run;
Question 2
Data Baseball;
Input ba league;
Datalines;
276 National League
288 National League
281 National League
290 National League
303 National League
257 American League
254 American League
263 American League
261 American League
Run;
Proc Ttest data=Baseball ALPHA=0.02 ;
Question 3
Proc Ttest data=ozone ALPHA=0.01 Plot=NONE;
Where Month='A'-'S';
Paired STMF*YKRS;
Run;
Question 2 Test to see if both leagues have different batting averages. Use a alpha = 0.02 in your conclusions and compute a 98% confidence interval for the means.
Question 3 From the first question test to see the differences the A and S average ozone values. Use alpha = 0.01 in your conclusions. Include a 99% confidence interval for the difference.
So my questions are for Question 2 kind of a stupid question but for whatever reason I am confused as to what you are suppose to do.
For question 3 (my main question) how do I use one proc ttest to check and see the differences between months A and S? I tried using a Where statement as you can see above, but of course that does not work I’m abit stomped on where to go from here. Also I omited a good bit of the month data in the ozone portion as I couldn't properly format all of the data without it looking extremely confusing.
Thanks for your help in advance!

VSS Hardware Provider Get_TargetLuns copy serial number in m_rgbIdentifier

In Get_Targetluns, i cloned the ZFS volume and shared with targetGroups and get the serial number of the disk
My serial number is in the form of 69 71 6e 2e 32 30 31 30 2d 30 38 2e
6f 72 67 2e
In BSTR string i get the serial number in form of 69716e2e323031302d30382e6f72672e
Now i want to copy this BSTR value to the
rgDestinationLuns[0]->m_deviceIdDescriptor[0]->m_rgIdentifiers[0]->m_rgbIdentifier
When i do the mem copy and copy the string its not working, always getting Missing Disk Error
Even in the VSS trace am not getting the expected value in the
Am getting some thing like this
[ 1:34:09.466 P:03C0 T:0878 CORHWUTC(2805) HWDIAG] * PARAM OUT:
m_rgbIdentifier:
[ 1:34:09.466 P:03C0 T:0878 CORHWUTC(2971) HWDIAG]
* PARAM OUT: 30 31 34 30 42 30 41 41 30 30 35 35 37 36 30 32 0140B0AA00557602
How to assign the BSTR value to the m_rgbIdentifier ?

Separate adress-chunk: making 3 columns out of one

i have a spreadsheed in calc. with some records. There is a column that contains the following information
Ecole Saint-Exupery
Rue Saint-Malo 24
67544 Paris
Well i need to have those lines divided into at least three columns
name: Ecole Saint-Exupery
street: Rue Saint-Malo 24
postal code and town 67544 Paris
Or even better - i have divided the postal code and town into two seperate columns!?
Question: is this possible? Can (or should) i do this in calc (open document-formate)?
Do i need to have to use a regex and perl or am i able to solve this issues without an regex?
Note - finally i need to transfer the data into MySQL-database...
I look forward to a tipp...
greetings
BTW: you can see all the things in a real world-live-demo: http://192.68.214.70/km/asps/schulsuche.asp?q=a&a=50&s=1750 - see the filed
Schulname
Straße
PLZ Ort
These field contains three things - the name, the street and the Postal Code and the town!
Question: can this be divided into parts!? If you copy and paste the information - and drop it to calc then you get all the information in only one cell. How to divide and seperate all those information into three cells or even four?
BTW - i tried to translate the information to hex-code - see the follwoing...:
Staatl. Realschule Grafenau
Rachelweg 20
94481 Grafenau
00000000: 53 74 61 61 74 6C 2E 20 52 65 61 6C 73 63 68 75
00000010: 6C 65 20 47 72 61 66 65 6E 61 75 20 0A 52 61 63
00000020: 68 65 6C 77 65 67 20 32 30 0A 39 34 34 38 31 20
00000030: 20 47 72 61 66 65 6E 61 75 20 20
but i do not know if this helps here!??
Can you help me to solve the problem. Do i need to have a regex!?
Many thanks in advance for any and all help!

You may not need a regex. You should be able to take the contents of the cell in question and split it up using the newline character that is present. I am not familiar with calc, but if there is a split() or explode() function that returns an array, then splitting on a newline will yield the 3 pieces you are looking for.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js