Consecutively regex-replace separated values


I am reading a raster grid file into @grid, containing arbitrary numbers like
82 8 98 98 42 12 3342 321 34 34 09434 9232
(and many more of those rows).
In this data, I would like to replace some numbers, e.g. 34 with 42.
But only single, separated numbers! E.g. I do not want to replace the 34 in 3342.
So for numbers $a (search, e.g. 34) and $b (replacement, e.g. 42), my approach is
s/(^|\s)$a(\s|$)/$1$b$2/g for @grid;
But this only replaces every second of consecutive occurrences (like 34 34 34 34=>42 34 42 34), because the suffix \s is not taken into account as prefix of the next pattern.
Is there any solution for this problem, other than putting two of those commands back-to-back (which is slow for large arrays)?

You're looking for \b : the boundary between a word char (\w) and something that is not a word char
s/\b$a\b/$b/g
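To illustrate why the zero-width boundary helps, here is a minimal Python sketch (not the asker's Perl) comparing the whitespace-consuming pattern with two zero-width alternatives. The sample row and the variable names are assumptions made for the illustration. Note that \b treats any non-word character as a boundary, so the lookaround form may be closer to "separated by whitespace only".

```python
import re

row = "34 34 34 34 3342 09434"   # hypothetical sample row
a, b = "34", "42"                # search value and replacement value
pat = re.escape(a)               # guard against regex metacharacters in a

# Consuming the surrounding whitespace: the space after one match is used up,
# so it cannot also serve as the prefix of the next match.
consuming = re.sub(rf"(^|\s){pat}(\s|$)", rf"\g<1>{b}\g<2>", row)

# Word boundaries are zero-width, so adjacent occurrences all match.
boundary = re.sub(rf"\b{pat}\b", b, row)

# Lookarounds: not preceded and not followed by a non-whitespace character.
lookaround = re.sub(rf"(?<!\S){pat}(?!\S)", b, row)

print(consuming)
print(boundary)
print(lookaround)
```

The same lookaround pattern should carry over to Perl as s/(?<!\S)$a(?!\S)/$b/g.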

You can set up a hash that contains your replacement pairs, and then capture each number on a line and do the replacement if that number's a hash key:
use strict;
use warnings;
my %replacements = ( 34 => 42, 8 => 100 );
while (<DATA>) {
s/(\d+)/exists $replacements{$1} ? $replacements{$1} : $1/ge;
print;
}
__DATA__
82 8 98 98 42 12 3342 321 34 34 09434 9232
97 8 8 8 27 37 34 55 19 100 8 34 07932 8
Output:
82 100 98 98 42 12 3342 321 42 42 09434 9232
97 100 100 100 27 37 42 55 19 100 100 42 07932 100
Hope this helps!
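For comparison, the same lookup-per-token idea can be sketched in Python. The function name replace_tokens is hypothetical; the replacement pairs are the ones from the Perl hash above.

```python
import re

# Same pairs as the Perl hash above; keys are compared as whole tokens.
replacements = {"34": "42", "8": "100"}

def replace_tokens(line: str) -> str:
    """Replace each whole run of digits that is a key; leave the rest alone."""
    return re.sub(
        r"\d+",
        lambda m: replacements.get(m.group(0), m.group(0)),
        line,
    )

print(replace_tokens("82 8 98 98 42 12 3342 321 34 34 09434 9232"))
```

Because \d+ is greedy, 3342 and 09434 are matched as whole tokens and are not keys, so they stay unchanged.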

Related

Algorithm for finding minimum value of abs(A[i]+A[j]-k)

I have an integer array A containing both positive and negative numbers. I have to find the minimum value of abs(A[i] + A[j] - k), where i != j.
I thought of sorting the array and using the two-pointer approach (as described at https://www.geeksforgeeks.org/two-pointers-technique/) and find the minimum. Time complexity is O(n*log(n)). Can this be done in O(n)?

Assuming that the O(n) requirement applies after any sorting (or that your problem domain supports linear-time sorting), you can use a trivial variation on the two-pointers algorithm (even for the case with two distinct arrays, where presumably one would not require i!=j). Consider the sums of the elements of two sorted arrays laid out in a rectangle:
    A=      4   9  17  22  29
B=   7     11  16  24  29  36
    19     23  28  36  41  48
    20     24  29  37  42  49
    35     39  44  52  57  64
Suppose that k=40. By checking the lower-leftmost value (which is smaller), we can immediately rule out most of a column as containing the closest value, since those values must be even smaller:
    A=      4   9  17  22  29
B=   7         16  24  29  36
    19         28  36  41  48
    20         29  37  42  49
    35     39  44  52  57  64
So we next check the value to the right (which is to say we increment the pointer into A). It is larger than k, so it eliminates the rest of that row:
    A=      4   9  17  22  29
B=   7         16  24  29  36
    19         28  36  41  48
    20         29  37  42  49
    35     39  44
The next move must then be --b. Continuing this way cuts a path through the rectangle:
    A=      4   9  17  22  29
B=   7                 29  36
    19                 41
    20         29  37  42
    35     39  44
You can move either direction (or diagonally) on an exact match (or just bail early if one hit is enough). In general, the path may exit the rectangle other than at a corner. For the case with only one array, you can stop as soon as it hits the diagonal (i.e., when i>=j), disregarding any last value stepped to.
This path obviously has O(n) entries, since at every step it moves up or right (or both). One of them must be the closest to k (here, 4+35 and 22+19 are tied).
See also X+Y sorting; this problem is a sort of "X+Y binary search".
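The walk through the rectangle can be sketched in Python as follows. This is an illustration under stated assumptions, not the answerer's code: the function names are made up, the two-array version expects both inputs already sorted and non-empty, and the single-array version sorts first (so it is O(n log n) overall unless a linear-time sort applies).

```python
def closest_sum_two(a, b, k):
    """Smallest |x + y - k| for x in sorted a and y in sorted b (both non-empty)."""
    i, j = 0, len(b) - 1              # lower-left corner: smallest a, largest b
    best = abs(a[i] + b[j] - k)
    while i < len(a) and j >= 0:
        total = a[i] + b[j]
        best = min(best, abs(total - k))
        if total < k:
            i += 1                    # everything above in this column is smaller
        elif total > k:
            j -= 1                    # everything to the right in this row is larger
        else:
            return 0                  # exact match, cannot do better
    return best

def closest_sum_one(values, k):
    """Smallest |v[i] + v[j] - k| with i != j, for a list of at least two items."""
    s = sorted(values)
    i, j = 0, len(s) - 1
    best = abs(s[i] + s[j] - k)
    while i < j:                      # stop when the path hits the diagonal
        total = s[i] + s[j]
        best = min(best, abs(total - k))
        if total < k:
            i += 1
        elif total > k:
            j -= 1
        else:
            return 0
    return best

print(closest_sum_two([4, 9, 17, 22, 29], [7, 19, 20, 35], 40))
print(closest_sum_one([4, 9, 17, 22, 29], 40))
```

Each loop iteration advances one of the two pointers, so the walk visits at most len(a) + len(b) cells.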

Use ODS Graphics to produce grouped histogram

I have this data set:
data a1q1;
input pid los age gender $ temp wbc anti service $ ;
cards;
1 5 30 F 99 82 2 M
2 10 73 F 98 52 1 M
3 6 40 F 99 122 2 S
4 11 47 F 98 42 2 S
5 5 25 F 99 112 2 S
6 14 82 M 97 61 2 S
7 30 60 M 100 81 1 M
8 11 56 F 99 72 2 M
9 17 43 F 98 72 2 M
10 3 50 M 98 122 1 S
11 9 59 F 98 72 1 M
12 3 4 M 98 32 2 S
13 8 22 F 100 111 2 S
14 8 33 F 98 141 1 S
15 5 20 F 98 112 1 S
16 5 32 M 99 92 2 S
17 7 36 M 99 61 2 S
18 4 69 M 98 62 2 S
19 3 47 M 97 51 2 M
20 7 22 M 98 62 2 S
21 9 11 M 98 102 2 S
22 11 19 M 99 141 2 S
23 11 67 F 98 42 2 M
24 9 43 F 99 52 2 S
25 4 41 F 98 52 2 M
;
I need to use PROC SGPLOT to produce a bar chart identical, or at least similar, to the one produced by the following PROC:
proc gchart data = a1q1;
vbar wbc / group = gender;
run;
I need PROC SGPLOT to group the two genders together and not stack them. I have tried coding this way but to no avail:
proc sgplot data = a1q1;
vbar wbc / group= gender response =wbc stat=freq nostatlabel;
run;
How would I go about coding to get the output I need?
Thank you for your time!

Sounds like you should use SGPANEL, not SGPLOT. SGPLOT can make grouped bar charts, but it cannot create histogram bins automatically without a format (you could do that if you want), and it doesn't support GROUP with the histogram plot. SGPANEL, however, can handle that.
proc sgpanel data=a1q1;
panelby gender;
histogram wbc;
run;

awk command with if and substring

I have an input file (input.txt) that looks like this:
id01 90 5
id01 80 4
id01 79 3
id13 95 5
id01 77 3
id01 85 4
id15 92 5
id17 99 5
id18 65 2
id19 72 3
And I want to output the file as in output.txt:
1 90 5
1 80 4
1 79 3
13 95 5
1 77 3
1 85 4
15 92 5
17 99 5
18 65 2
19 72 3
I did search and was able to find some code example that worked individually (like just the substring part, or just if part) but when I put the entire thing together I am getting syntax errors. I am doing this in ssh environment and I saw there is a slight difference in syntax between sh and bash. Below is what I was able to come up with but gives me syntax errors:
awk -F $'\t' 'BEGIN {OFS = FS} { num = substr($1, 3, 1) if (num == "0") num2 = substr($1,4,1) else num2= substr($1,3,2) {print num2, $2, $3 } }' input.txt > output.txt
I will appreciate any help on this one.
Thanks!

Something like this awk:
awk '{sub(/id/,"",$1);$1=$1+0}8' OFS="\t" input.txt
1 90 5
1 80 4
1 79 3
13 95 5
1 77 3
1 85 4
15 92 5
17 99 5
18 65 2
19 72 3
Updated to get rid of leading 0

Try this sed,
sed 's/id//g' file.txt
To get rid of the leading zeros,
sed 's/id0*//g' file.txt
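For readers who prefer to see the transformation spelled out, here is a rough Python equivalent of the answers above. The function name strip_id is hypothetical, and the sketch assumes every line starts with "id" followed by digits.

```python
import re

def strip_id(line: str) -> str:
    """Turn 'id01 90 5' into '1<TAB>90<TAB>5' (assumes an 'id<digits>' first field)."""
    fields = line.split()
    # Drop the literal "id" prefix, then let int() discard any leading zeros.
    fields[0] = str(int(re.sub(r"^id", "", fields[0])))
    return "\t".join(fields)

for sample in ["id01 90 5", "id13 95 5"]:
    print(strip_id(sample))
```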

Reorder lines and columns by a specific pattern in a text file [closed]

I have a tab-separated file where column 1 is either 0 or 16. If Column 1 has 16 then I need to move 2nd and 3rd column parallel to the 2nd and 3rd column where first column has 0. Below is an example.
0 69 24
0 69 27
16 55 27
0 85 25
16 77 23
to
0 69 24 55 27
0 69 27 77 23
0 85 25
I guess this can be done with awk, will appreciate any help.
Thanks

This should work:
awk '
BEGIN{FS=OFS="\t"}
$1==0{zero[++i]=$0;next}
{notzero[++y]=$2"\t"$3}
END{for(c=1;c<=i;c++) print zero[c],notzero[c]}' file
Test:
$ cat file
0 69 24
0 69 27
16 55 27
0 85 25
16 77 23
[JS웃:~/Temp]$ awk 'BEGIN{FS=OFS="\t"}$1==0{zero[++i]=$0;next}{notzero[++y]=$2"\t"$3}END{for(c=1;c<=i;c++) print zero[c],notzero[c]}' file
0 69 24 55 27
0 69 27 77 23
0 85 25

With Perl:
perl -lane '{!$F[0]&&push(@h,$_)||print(shift @h," $F[1] $F[2]")}
END{print for @h}' input

Code for GNU sed:
sed -nr '/0\s/{H;${x;s/\`\n(.*)/\1/mp};d};/16\s/s/16\s(.*)/\1/;H;g;s/\`\n(.*)\n.*\n(.*)\'/\1 \2/p;g;s/\`\n(.*)(\n.*)\n(.*)\'/\2/;h;${s/\`\n(.*)/\1/mp};d' file
$cat file
0 69 24
0 69 27
16 55 27
0 85 25
16 77 23
$sed -nr '/0\s/{H;${x;s/\`\n(.*)/\1/mp};d};/16\s/s/16\s(.*)/\1/;H;g;s/\`\n(.*)\n.*\n(.*)\'/\1 \2/p;g;s/\`\n(.*)(\n.*)\n(.*)\'/\2/;h;${s/\`\n(.*)/\1/mp};d' file
0 69 24 55 27
0 69 27 77 23
0 85 25
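The pairing logic shared by these answers (the n-th row starting with 0 receives columns 2 and 3 of the n-th row starting with 16) can be sketched in Python as below. The function name pair_rows is hypothetical, and the sketch assumes whitespace-separated input and that surplus 16-rows, if any, may be dropped.

```python
def pair_rows(lines):
    """Append columns 2-3 of the n-th non-zero row to the n-th zero row."""
    zero, other = [], []
    for line in lines:
        cols = line.split()
        if cols[0] == "0":
            zero.append(cols)
        else:
            other.append(cols[1:])    # keep only columns 2 and 3
    out = []
    for idx, row in enumerate(zero):
        # Zero rows without a partner are emitted unchanged.
        extra = other[idx] if idx < len(other) else []
        out.append("\t".join(row + extra))
    return out

sample = ["0 69 24", "0 69 27", "16 55 27", "0 85 25", "16 77 23"]
for row in pair_rows(sample):
    print(row)
```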

c source code to remove subset transactions from text file

I have a file containing data as follows
10 20 30 40 70
20 30 70
30 40 10 20
29 70
80 90 20 30 40
40 45 65 10 20 80
45 65 20
I want to remove all subset transactions from this file.
The output file should look as follows:
10 20 30 40 70
29 70
80 90 20 30 40
40 45 65 10 20 80
Where records like
20 30 70
30 40 10 20
45 65 20
are removed because they are subsets of other records.

The algorithm could be like this:
sets = list()
with open("data.txt") as f:
    for line in f:
        currentSet = set()
        for item in line.split():
            currentSet.add(int(item))
        # Keep the line unless it is a subset of an earlier kept line
        printIt = True
        for s in sets:
            if currentSet.issubset(s):
                printIt = False
                break
        if printIt:
            print(line, end="")  # the line already ends with a newline
            sets.append(currentSet)
Incidentally, this is also a Python program :) Also, I believe that an algorithm with better efficiency could be made.
Your next step: rewrite this to C/C++. Good luck :)
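One caveat: a check against earlier lines only would miss a subset that appears before its superset. A minimal order-independent sketch is shown below. The function name remove_subsets is hypothetical, it is quadratic in the number of lines rather than more efficient, and treating an exact duplicate as removable (keeping the first copy) is an assumption the question does not settle.

```python
def remove_subsets(lines):
    """Keep lines whose item set is not contained in any other line's item set."""
    sets = [frozenset(line.split()) for line in lines]
    keep = []
    for i, s in enumerate(sets):
        dominated = any(
            s < t or (s == t and j < i)   # proper subset, or a later duplicate
            for j, t in enumerate(sets)
            if j != i
        )
        if not dominated:
            keep.append(lines[i])
    return keep

data = [
    "10 20 30 40 70",
    "20 30 70",
    "30 40 10 20",
    "29 70",
    "80 90 20 30 40",
    "40 45 65 10 20 80",
    "45 65 20",
]
for line in remove_subsets(data):
    print(line)
```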
