awk command with if and substring


I have an input file (input.txt) that looks like this:
id01 90 5
id01 80 4
id01 79 3
id13 95 5
id01 77 3
id01 85 4
id15 92 5
id17 99 5
id18 65 2
id19 72 3
And I want to output the file as in output.txt:
1 90 5
1 80 4
1 79 3
13 95 5
1 77 3
1 85 4
15 92 5
17 99 5
18 65 2
19 72 3
I searched and found code examples that work individually (just the substring part, or just the if part), but when I put everything together I get syntax errors. I am working over SSH, and I noticed there is a slight difference in syntax between sh and bash. Below is what I came up with, which gives me syntax errors:
awk -F $'\t' 'BEGIN {OFS = FS} { num = substr($1, 3, 1) if (num == "0") num2 = substr($1,4,1) else num2= substr($1,3,2) {print num2, $2, $3 } }' input.txt > output.txt
I will appreciate any help on this one.
Thanks!

Something like this awk:
awk '{sub(/^id/,"",$1); $1=$1+0} 1' OFS="\t" input.txt
1 90 5
1 80 4
1 79 3
13 95 5
1 77 3
1 85 4
15 92 5
17 99 5
18 65 2
19 72 3
Updated to get rid of leading 0

Try this sed:
sed 's/^id//' file.txt
To get rid of the leading zeros as well:
sed 's/^id0*//' file.txt
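For comparison, here is the same transformation as a minimal Python sketch. It assumes whitespace-separated fields and that the first field is always "id" followed by digits; strip_id is a hypothetical helper name, not something from the question.

```python
def strip_id(line: str) -> str:
    """Turn 'id01<TAB>90<TAB>5' into '1<TAB>90<TAB>5'."""
    first, *rest = line.split()
    # Drop the literal "id" prefix if present (assumed format: id + digits).
    digits = first[2:] if first.startswith("id") else first
    # int() discards leading zeros, like $1=$1+0 in the awk answer.
    return "\t".join([str(int(digits)), *rest])


sample = ["id01\t90\t5", "id13\t95\t5", "id19\t72\t3"]
for row in sample:
    print(strip_id(row))
```

Converting to an integer rather than stripping a single "0" character also handles ids such as id007, which the substr-based attempt in the question would not.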

Related

How to add a row where there is a disruption in series of numbers in Stata

I'm attempting to format a table of 40 different age-race-sex strata to be inputted into R-INLA and noticed that it's important to include all strata (even if they are not present in a county). These should be zeros. However, at this point my table only contains records for strata that are not empty. I can identify places where strata are missing for each county by looking at my strata variable and finding the breaks in the series 1 through 40 (marked with a red x in the image below).
In these places (marked by the red x) I need to add the missing rows and fill in the corresponding county code, strata code, population=0, and the correct corresponding race, sex, age code for the strata.
If I can figure out a way to add an empty row in the spaces with the red Xs from the image, and correctly assign the strata code (and county code) to these empty/missing rows, I am able to populate the rest of the values with the code below:
replace race = 1 if strata == 4
replace sex = 1 if strata == 4
replace age = 4 if strata == 4
...etc
I'm wondering if there is a way to add the missing rows using an if statement that considers the fact that there are supposed to be forty strata for each county code. It would be ideal if this could populate the correct county code and strata code as well!
Dataex sample data:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float OID str5 fips_statecounty double population byte(race sex age) float strata
1 "" 672 1 1 1 1
2 "" 1048 1 1 2 2
3 "" 883 1 1 3 3
4 "" 1129 1 1 4 4
5 "" 574 1 2 1 5
6 "" 986 1 2 2 6
7 "" 899 1 2 3 7
8 "" 1820 1 2 4 8
9 "" 96 2 1 1 9
10 "" 142 2 1 2 10
11 "" 81 2 1 3 11
12 "" 99 2 1 4 12
13 "" 71 2 2 1 13
14 "" 125 2 2 2 14
15 "" 103 2 2 3 15
16 "" 162 2 2 4 16
17 "" 31 3 1 1 17
18 "" 32 3 1 2 18
19 "" 18 3 1 3 19
20 "" 31 3 1 4 20
21 "" 22 3 2 1 21
22 "" 28 3 2 2 22
23 "" 28 3 2 3 23
24 "" 44 3 2 4 24
25 "" 20 4 1 1 25
26 "" 24 4 1 2 26
27 "" 21 4 1 3 27
28 "" 43 4 1 4 28
29 "" 19 4 2 1 29
30 "" 26 4 2 2 30
31 "" 24 4 2 3 31
32 "" 58 4 2 4 32
33 "" 6 5 1 1 33
34 "" 11 5 1 2 34
35 "" 13 5 1 3 35
36 "" 7 5 1 4 36
37 "" 7 5 2 1 37
38 "" 9 5 2 2 38
39 "" 10 5 2 3 39
40 "" 11 5 2 4 40
41 "01001" 239 1 1 1 1
42 "01001" 464 1 1 2 2
43 "01001" 314 1 1 3 3
44 "01001" 232 1 1 4 4
45 "01001" 284 1 2 1 5
46 "01001" 580 1 2 2 6
47 "01001" 392 1 2 3 7
48 "01001" 440 1 2 4 8
49 "01001" 41 2 1 1 9
50 "01001" 38 2 1 2 10
51 "01001" 23 2 1 3 11
52 "01001" 26 2 1 4 12
53 "01001" 34 2 2 1 13
54 "01001" 52 2 2 2 14
55 "01001" 40 2 2 3 15
56 "01001" 50 2 2 4 16
57 "01001" 4 3 1 1 17
58 "01001" 2 3 1 2 18
59 "01001" 3 3 1 3 19
60 "01001" 6 3 2 1 21
61 "01001" 4 3 2 2 22
62 "01001" 6 3 2 3 23
63 "01001" 4 3 2 4 24
64 "01001" 1 4 1 4 28
65 "01003" 1424 1 1 1 1
66 "01003" 2415 1 1 2 2
67 "01003" 1680 1 1 3 3
68 "01003" 1823 1 1 4 4
69 "01003" 1545 1 2 1 5
70 "01003" 2592 1 2 2 6
71 "01003" 1916 1 2 3 7
72 "01003" 2527 1 2 4 8
73 "01003" 68 2 1 1 9
74 "01003" 82 2 1 2 10
75 "01003" 52 2 1 3 11
76 "01003" 54 2 1 4 12
77 "01003" 72 2 2 1 13
78 "01003" 129 2 2 2 14
79 "01003" 81 2 2 3 15
80 "01003" 106 2 2 4 16
81 "01003" 10 3 1 1 17
82 "01003" 14 3 1 2 18
83 "01003" 8 3 1 3 19
84 "01003" 4 3 1 4 20
85 "01003" 8 3 2 1 21
86 "01003" 14 3 2 2 22
87 "01003" 17 3 2 3 23
88 "01003" 10 3 2 4 24
89 "01003" 4 4 1 1 25
90 "01003" 1 4 1 3 27
91 "01003" 2 4 1 4 28
92 "01003" 2 4 2 1 29
93 "01003" 3 4 2 2 30
94 "01003" 4 4 2 3 31
95 "01003" 10 4 2 4 32
96 "01003" 5 5 1 1 33
97 "01003" 4 5 1 2 34
98 "01003" 3 5 1 3 35
99 "01003" 5 5 1 4 36
100 "01003" 5 5 2 2 38
end
label values race race
label values sex sex

My answer to your previous question, "Nested for-loop: error variable already defined", detailed how to create a minimal dataset with all strata present. Therefore you should just merge that with your main dataset and replace missings on the absent strata with whatever your other software expects, zeros it seems.
The complication most obvious at this point is you need to factor in a county variable. I can't see any information on how many counties you have in your dataset, which may affect what is practical. You should be able to break down the preparation into: first, prepare a minimal county dataset with identifiers only; then merge that with a complete strata dataset.
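The merge logic can be sketched outside Stata. The Python below is only an illustration under stated assumptions: rows are (county, strata, population) tuples, and the function names are hypothetical. The decoding of race, sex and age from the strata number is inferred from the sample data in the question (5 races x 2 sexes x 4 ages), so check it against your own coding before relying on it.

```python
from itertools import product


def fill_missing_strata(rows, n_strata=40):
    """rows: iterable of (county, strata, population) tuples.

    Returns one tuple per county/strata pair, with population 0
    wherever the pair is absent from the input.
    """
    observed = {(county, strata): pop for county, strata, pop in rows}
    counties = sorted({county for county, _ in observed})
    return [
        (county, strata, observed.get((county, strata), 0))
        for county, strata in product(counties, range(1, n_strata + 1))
    ]


def strata_codes(strata):
    """Recover (race, sex, age) from a strata number.

    Assumption read off the sample data: strata = (race-1)*8 + (sex-1)*4 + age.
    """
    race, remainder = divmod(strata - 1, 8)
    sex, age = divmod(remainder, 4)
    return race + 1, sex + 1, age + 1


# Toy data: county 01001 lacks stratum 2 (n_strata shrunk to 3 for brevity).
rows = [("01001", 1, 239), ("01001", 3, 314),
        ("01003", 1, 1424), ("01003", 2, 2415), ("01003", 3, 1680)]
for record in fill_missing_strata(rows, n_strata=3):
    print(record)
```

The cross product of counties and strata plays the role of the complete identifier dataset; the dictionary lookup with a default of 0 plays the role of the merge followed by replacing missings.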

Replace middle row elements of nested list with new list elements Q kdb

Hi so I have created the nested list/matrix:
q)m:((1 2 3);(4 5 6);(7 8 9))
q)m
1 2 3
4 5 6
7 8 9
I have also identified the middle column in the list:
q)a:m[0;1],m[1;1],m[2;1]
I now want to replace the middle row (4 5 6) with a to finish with m looking like:
q)m
1 2 3
2 5 8
7 8 9

You've already seen that you can index into the matrix with syntax like m[0;1], where 0 refers to the first level of nesting and 1 refers to the second level.
KDB also allows you to assign to an index of a list in a similar way, e.g.
q)l:1 2 3 4
q)l[1]:20
q)l
1 20 3 4
So you can use something similar in this example:
q)m[1]:a
q)m
1 2 3
2 5 8
7 8 9
As an aside, KDB also allows you to leave out an index, in which case it will take all items from the corresponding level of nesting, e.g.
q)m[0] /first level of nesting i.e. first row
1 2 3
q)m[;0] /second level of nesting i.e. first column
1 4 7
Hope that helps
Jonathon McMurray
AquaQ Analytics

You want to generalise for larger matrices (which must also be square) so your answer needs two parts:
how to construct a
how to insert it
for row/col x where x<count m.
The general expression you want is simply m[x;]:m[;x], because m[x;] denotes row x and m[;x] denotes column x.
See Q for Mortals 3.11.3 Two- and Three-Dimensional Matrices
You can make this a function of the index and the matrix:
q)show m:5 5#1_til 26
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
q){y[x;]:y[;x];:y}[3;m]
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
4 9 14 19 24
21 22 23 24 25
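For readers less familiar with q, the same operation can be sketched in Python with plain nested lists. This is only an illustration; row_from_column is a hypothetical name, and the matrix is assumed to be square, as the answer requires.

```python
def row_from_column(matrix, x):
    """Return a copy of a square matrix whose row x holds the old column x."""
    column = [row[x] for row in matrix]      # like m[;x] in q
    result = [list(row) for row in matrix]   # copy so the input is untouched
    result[x] = column                       # like m[x;]:... in q
    return result


m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for row in row_from_column(m, 1):
    print(row)
```

The column is read before anything is assigned, so the result does not depend on the order of evaluation.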

Just adding another approach for you.
q)m:8 cut til 64
q)0 0+\:til 8
0 1 2 3 4 5 6 7
0 1 2 3 4 5 6 7
q)(m)./:flip 0 0+\:til 8
0 9 18 27 36 45 54 63
q)@[m;4;:;(m)./:flip 0 0+\:til 8]
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31
0 9 18 27 36 45 54 63
40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55
56 57 58 59 60 61 62 63
q)
For fun, here it is in a function which takes the length&width of the matrix and replaces the 'middle' row with the diagonal values
q){n:x*x;m:x cut til n;@[m;x div 2;:;](m)./:flip 0 0+\:til x}8
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31
0 9 18 27 36 45 54 63
40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55
56 57 58 59 60 61 62 63
q){n:x*x;m:x cut til n;@[m;x div 2;:;](m)./:flip 0 0+\:til x}5
0 1 2 3 4
5 6 7 8 9
0 6 12 18 24
15 16 17 18 19
20 21 22 23 24
q){n:x*x;m:x cut til n;@[m;x div 2;:;](m)./:flip 0 0+\:til x}4
0 1 2 3
4 5 6 7
0 5 10 15
12 13 14 15
q)
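Note that this approach writes the diagonal, not a column, into the middle row. A rough Python equivalent of the function, with the hypothetical name middle_row_to_diagonal and the same 0..n*n-1 test matrix, might look like this:

```python
def middle_row_to_diagonal(n):
    """Build an n x n matrix of 0..n*n-1 and overwrite row n // 2 with its diagonal."""
    m = [list(range(r * n, (r + 1) * n)) for r in range(n)]  # like n cut til n*n
    diagonal = [m[i][i] for i in range(n)]                   # index pairs (i, i)
    m[n // 2] = diagonal                                     # amend the 'middle' row
    return m


for row in middle_row_to_diagonal(4):
    print(row)
```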

q)@[((1 2 3);(4 5 6);(7 8 9));1;:;(2;5;8)]
1 2 3
2 5 8
7 8 9

Indexing in q can be straightforward, and I believe the intermediate variable can be omitted:
q)m:((1 2 3);(4 5 6);(7 8 9))
q)m[1]:m[;1]
q)m
1 2 3
2 5 8
7 8 9

SAS macro variables in PROC MIXED

This is my first foray into using SAS macros, and I'm following this page from the amazing UCLA Stats Consulting Group. I'm interested in using macro variables in PROC MIXED to avoid copying and pasting blocks of code (my actual data set has ~400 variables).
My example modifies the UCLA example to have students in many schools.
data hsb3;
input id school female race ses prog
read write math science socst;
datalines;
1 1 0 4 1 1 57 52 41 47 57
2 1 1 4 2 3 68 59 53 63 61
3 1 0 2 3 1 44 33 54 58 31
4 1 0 4 3 3 63 44 47 53 56
5 1 0 4 2 2 47 51 43 50 61
6 1 1 4 2 2 44 52 51 50 61
7 1 0 3 2 1 50 59 60 56 52
8 1 0 1 2 2 34 46 52 53 57
9 1 0 4 2 2 63 57 51 63 61
19 2 0 3 1 2 57 63 41 63 61
20 2 1 4 2 2 60 57 51 58 31
21 2 0 4 3 2 57 55 51 53 56
22 2 0 4 3 2 73 46 71 50 61
23 2 0 4 2 1 54 65 57 50 61
24 2 1 4 2 2 45 60 50 56 52
25 2 0 3 2 1 42 63 43 53 57
26 2 0 1 1 2 34 57 51 63 61
27 2 0 4 2 2 63 49 60 55 31
10 3 1 3 2 2 57 55 51 55 31
11 3 1 4 3 3 60 46 71 31 56
12 3 1 4 2 2 57 66 57 55 61
13 3 0 3 3 2 50 60 50 31 61
14 3 0 4 3 2 57 57 57 55 46
15 3 0 3 3 3 68 55 50 31 56
16 3 0 4 1 2 34 46 43 50 56
17 3 0 4 3 2 34 65 51 50 56
18 3 0 4 1 2 63 60 60 47 57
28 4 1 3 2 2 57 52 52 53 61
29 4 1 4 2 3 60 57 51 63 61
30 4 1 1 2 2 57 65 51 55 46
31 4 0 4 3 2 73 60 71 31 56
32 4 0 4 3 2 54 63 57 55 46
33 4 0 3 1 2 45 57 50 31 56
34 4 0 1 1 1 42 49 43 50 56
35 4 0 4 3 2 47 52 51 50 56
36 4 0 4 2 1 57 57 60 56 52
;
run;
The UCLA example shows how to use macro variables with proc reg to do several simple linear regression models to predict reading score with any of the other variables:
%let indvars = write math female socst;
proc reg data = hsb3;
model read = &indvars;
run;
quit;
To do this taking school into account, we can use PROC MIXED instead:
proc mixed data = hsb3;
class school;
model read = &indvars;
random school;
run;
quit;
But what I really want to do is to see if any of the scores differ by gender (still taking school into account).
%let scores = read write math science socst;
proc mixed data = hsb3;
class school;
model &scores = female;
random school;
run;
quit;
Now I get the error:
NOTE: The SAS System stopped processing this step because of errors.
167 class school;
168 model &indvars = female;
-
22
200
NOTE: Line generated by the macro variable "INDVARS".
1 write math female socst
----
73
ERROR 22-322: Syntax error, expecting one of the following: a name, ;, (, *, -, /, :, #,
_CHARACTER_, _CHAR_, _NUMERIC_, |.
ERROR 200-322: The symbol is not recognized and will be ignored.
ERROR 73-322: Expecting an =.
Somehow the macro variable is not working. Is there a problem with using macro variables as a response variable in PROC MIXED? They work as a response variable in PROC REG....
proc reg data = hsb3;
model &scores = female;
run;
quit;

Your problem doesn't have anything to do with macro variables or macro code. Instead you are not creating a valid MODEL statement to use in PROC MIXED.
The MODEL statement names a single dependent variable ...
Try transforming the data perhaps?
%let scores = read write math science socst;
data want ; set hsb3 ;
array scores &scores ;
do i=1 to dim(scores);
score=scores(i);
name=vname(scores(i));
output;
end;
run;
proc sort; by name ; run;
proc mixed data = want;
by name;
class school;
model score = female;
random school;
run;
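The DATA step above is a wide-to-long reshape: each student row becomes one row per score, tagged with the score's name, so that BY name can fit one model per score. A minimal Python sketch of the same reshape, assuming each record is a dict keyed by variable name (wide_to_long is a hypothetical helper, and unlike the DATA step it drops the original score columns from the output):

```python
def wide_to_long(records, score_names):
    """Emit one output row per (record, score name) pair, sorted by score name."""
    long_rows = []
    for rec in records:
        for name in score_names:
            # Keep the identifying columns, drop the wide score columns.
            row = {k: v for k, v in rec.items() if k not in score_names}
            row["name"] = name
            row["score"] = rec[name]
            long_rows.append(row)
    # Stable sort by name, mirroring PROC SORT; BY name;
    return sorted(long_rows, key=lambda r: r["name"])


students = [
    {"id": 1, "school": 1, "female": 0, "read": 57, "write": 52,
     "math": 41, "science": 47, "socst": 57},
    {"id": 2, "school": 1, "female": 1, "read": 68, "write": 59,
     "math": 53, "science": 63, "socst": 61},
]
scores = ["read", "write", "math", "science", "socst"]
for row in wide_to_long(students, scores):
    print(row)
```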

Consecutively regex-replace separated values

I am reading a raster grid file into @grid, containing arbitrary numbers like
82 8 98 98 42 12 3342 321 34 34 09434 9232
(and many more of those rows).
Here I would like to replace some numbers, e.g. 34 with 42.
But only single, separated numbers! E.g. I do not want to replace the 34 in 3342.
So for numbers $a (search, e.g. 34) and $b (replace, e.g. 42), my approach is
s/(^|\s)$a(\s|$)/$1$b$2/g for @grid;
But this only replaces every second of consecutive occurrences (like 34 34 34 34 => 42 34 42 34), because the trailing \s is consumed by one match and so cannot serve as the prefix of the next one.
Is there any solution for this problem, other than putting two of those commands back-to-back (which is slow for large arrays)?

You're looking for \b: the boundary between a word character (\w) and something that is not a word character.
s/\b$a\b/$b/g
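The difference is easy to reproduce in Python's re module, which treats \b and lookarounds the same way for this purpose. This is a sketch with hypothetical function names. The lookaround variant is an alternative to \b, not part of the answer above, and may be preferable if the grid can contain values such as 12.34, where \b would also match after the dot.

```python
import re


def replace_naive(line, a, b):
    # The asker's pattern: the trailing \s is consumed by each match,
    # so it cannot also be the leading \s of the next one.
    pattern = r"(^|\s)%s(\s|$)" % re.escape(a)
    return re.sub(pattern, lambda m: m.group(1) + b + m.group(2), line)


def replace_word(line, a, b):
    # \b is zero-width, so adjacent occurrences do not compete for the space.
    return re.sub(r"\b%s\b" % re.escape(a), lambda m: b, line)


def replace_whole(line, a, b):
    # Zero-width lookarounds: only whitespace or the line edges delimit a number.
    return re.sub(r"(?<!\S)%s(?!\S)" % re.escape(a), lambda m: b, line)


print(replace_naive("34 34 34 34", "34", "42"))
print(replace_word("34 34 34 34", "34", "42"))
print(replace_whole("12.34 34", "34", "42"))
```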

You can set up a hash that contains your replacement pairs, and then capture each number on a line and do the replacement if that number's a hash key:
use strict;
use warnings;
my %replacements = ( 34 => 42, 8 => 100 );
while (<DATA>) {
s/(\d+)/exists $replacements{$1} ? $replacements{$1} : $1/ge;
print;
}
__DATA__
82 8 98 98 42 12 3342 321 34 34 09434 9232
97 8 8 8 27 37 34 55 19 100 8 34 07932 8
Output:
82 100 98 98 42 12 3342 321 42 42 09434 9232
97 100 100 100 27 37 42 55 19 100 100 42 07932 100
Hope this helps!
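The same lookup-table idea carries over to other languages. A minimal Python sketch, assuming the replacement pairs are stored as strings (replace_numbers is a hypothetical name):

```python
import re

replacements = {"34": "42", "8": "100"}


def replace_numbers(line, table):
    # \d+ grabs each maximal run of digits, so 3342 and 09434 are looked up
    # as whole numbers and left alone when they are not keys in the table.
    return re.sub(r"\d+", lambda m: table.get(m.group(0), m.group(0)), line)


for line in ["82 8 98 98 42 12 3342 321 34 34 09434 9232",
             "97 8 8 8 27 37 34 55 19 100 8 34 07932 8"]:
    print(replace_numbers(line, replacements))
```

Because every number is matched exactly once and looked up, all replacements happen in a single pass over each line, however many pairs the table holds.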

Reorder lines and columns by a specific pattern in a text file [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 8 years ago.
I have a tab-separated file where column 1 is either 0 or 16. Where column 1 is 16, I need to move that row's 2nd and 3rd columns alongside the 2nd and 3rd columns of a row where the first column is 0. Below is an example.
0 69 24
0 69 27
16 55 27
0 85 25
16 77 23
to
0 69 24 55 27
0 69 27 77 23
0 85 25
I guess this can be done with awk, will appreciate any help.
Thanks

This should work:
awk '
BEGIN{FS=OFS="\t"}
$1==0{zero[++i]=$0;next}
{notzero[++y]=$2 OFS $3}
END{for(c=1;c<=i;c++) print zero[c] (c<=y ? OFS notzero[c] : "")}' file
Test:
$ cat file
0 69 24
0 69 27
16 55 27
0 85 25
16 77 23
[JS웃:~/Temp]$ awk 'BEGIN{FS=OFS="\t"}$1==0{zero[++i]=$0;next}{notzero[++y]=$2 OFS $3}END{for(c=1;c<=i;c++) print zero[c] (c<=y ? OFS notzero[c] : "")}' file
0 69 24 55 27
0 69 27 77 23
0 85 25
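The pairing logic, the n-th row starting with 0 joined with the n-th row starting with 16, can also be sketched in Python. This is only an illustration, assuming tab-separated input lines; pair_rows is a hypothetical name.

```python
def pair_rows(lines):
    """Append columns 2-3 of the n-th non-zero row to the n-th zero row."""
    zeros, others = [], []
    for line in lines:
        fields = line.split("\t")
        if fields[0] == "0":
            zeros.append(fields)
        else:
            others.append(fields[1:3])
    paired = []
    for i, zero_fields in enumerate(zeros):
        # Zero rows without a partner are emitted unchanged.
        extra = others[i] if i < len(others) else []
        paired.append("\t".join(zero_fields + extra))
    return paired


data = ["0\t69\t24", "0\t69\t27", "16\t55\t27", "0\t85\t25", "16\t77\t23"]
for row in pair_rows(data):
    print(row)
```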

With Perl:
perl -lane '{!$F[0]&&push(@h,$_)||print(shift @h," $F[1] $F[2]")}
END{print for @h}' input

Code for GNU sed:
sed -nr '/0\s/{H;${x;s/\`\n(.*)/\1/mp};d};/16\s/s/16\s(.*)/\1/;H;g;s/\`\n(.*)\n.*\n(.*)\'/\1 \2/p;g;s/\`\n(.*)(\n.*)\n(.*)\'/\2/;h;${s/\`\n(.*)/\1/mp};d' file
$cat file
0 69 24
0 69 27
16 55 27
0 85 25
16 77 23
$sed -nr '/0\s/{H;${x;s/\`\n(.*)/\1/mp};d};/16\s/s/16\s(.*)/\1/;H;g;s/\`\n(.*)\n.*\n(.*)\'/\1 \2/p;g;s/\`\n(.*)(\n.*)\n(.*)\'/\2/;h;${s/\`\n(.*)/\1/mp};d' file
0 69 24 55 27
0 69 27 77 23
0 85 25
