Right Align Columns in Text File with Sed

I have a file containing a lot of information that I want to get into a specific format, i.e. add a specific number of spaces between the different columns. I can add the same number of spaces to every line, but some of the columns need to be right-aligned, meaning that I might need to add more spaces on some lines. I have no idea how to do this, and awk doesn't seem to work since I have more than two lines to modify.
Here's an example:
I have managed to get a file looking something like this
apple 1 33.413 C cat 10
banana 2 21.564 B horse 356
cherry 3 43.223 D cow 32
pear 4 26.432 A goat 22
raspberry 5 72.639 C eagle 4
watermelon 6 54.436 A fox 976
pumpkin 7 42.654 B mouse 1
peanut 8 36.451 B dog 56
orange 9 57.333 C elephant 32
coconut 10 10.445 A frog 3
blueberry 11 46.435 B camel 446
But I want to get the file in this format:
apple            1    33.413 C            cat 10
banana           2    21.564 B          horse 356
cherry           3    43.223 D            cow 32
pear             4    26.432 A           goat 22
raspberry        5    72.639 C          eagle 4
watermelon       6    54.436 A            fox 976
pumpkin          7    42.654 B          mouse 1
peanut           8    36.451 B            dog 56
orange           9    57.333 C       elephant 32
coconut         10    10.445 A           frog 3
blueberry       11    46.435 B          camel 446
What bash command can I use to right align the second and fifth columns?

You can use printf with whatever field widths you want, like this:
awk '{printf "%-15s%3d%10s%2s%15s %-5d\n", $1, $2, $3, $4, $5, $6}' file
apple            1    33.413 C            cat 10
banana           2    21.564 B          horse 356
cherry           3    43.223 D            cow 32
pear             4    26.432 A           goat 22
raspberry        5    72.639 C          eagle 4
watermelon       6    54.436 A            fox 976
pumpkin          7    42.654 B          mouse 1
peanut           8    36.451 B            dog 56
orange           9    57.333 C       elephant 32
coconut         10    10.445 A           frog 3
blueberry       11    46.435 B          camel 446
Feel free to adjust widths to tweak the output.
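If you'd rather not hard-code the widths, a common pattern is a two-pass awk that measures each column first. This is only a sketch: the sample data is inlined, the file name "file" is a placeholder, and it right-aligns columns 2 and 4 of a four-column sample while left-aligning the rest; the same idea extends to six columns.

```shell
# Two passes over the same file: the first records the widest entry in each
# column, the second builds a per-column printf format from those widths.
printf '%s\n' 'apple 1 cat 10' 'watermelon 11 elephant 976' > file
awk 'NR==FNR {                               # first pass: measure widths
       for (i = 1; i <= NF; i++)
         if (length($i) > w[i]) w[i] = length($i)
       next
     }
     {                                       # second pass: print aligned
       for (i = 1; i <= NF; i++) {
         # right-align columns 2 and 4 here; left-align the others
         fmt = (i == 2 || i == 4) ? "%" w[i] "s" : "%-" w[i] "s"
         printf fmt ((i < NF) ? " " : "\n"), $i
       }
     }' file file
```

Building the format string from `w[i]` (rather than using a `*` width) keeps the sketch portable across awk implementations.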

Find most frequent word in string variables

I have a string variable with different colors:
gen cols="red green red red blue maroon green pink"
I want to find which color in this list appears most frequently.
I tried the count command but this produces wrong results.
There is a community-contributed command that does this in one step: tabsplit, from the tab_chi package on SSC, is designed for exactly this purpose.
clear
input strL (colors numbers)
"red green red red blue maroon green pink" "87 45 65 87 98 12 90 43"
end
tabsplit colors, sort
     colors |      Freq.     Percent        Cum.
------------+-----------------------------------
        red |          3       37.50       37.50
      green |          2       25.00       62.50
       blue |          1       12.50       75.00
     maroon |          1       12.50       87.50
       pink |          1       12.50      100.00
------------+-----------------------------------
      Total |          8      100.00
tabsplit numbers, sort
    numbers |      Freq.     Percent        Cum.
------------+-----------------------------------
         87 |          2       25.00       25.00
         12 |          1       12.50       37.50
         43 |          1       12.50       50.00
         45 |          1       12.50       62.50
         65 |          1       12.50       75.00
         90 |          1       12.50       87.50
         98 |          1       12.50      100.00
------------+-----------------------------------
      Total |          8      100.00
EDIT: As documented in its help, tabsplit allows options of tabulate as appropriate, including those for saving results. However, that is not especially helpful here, as matrow() won't work for string variables. That isn't documented directly, but follows from the principle that Stata matrices can't hold strings. matcell() does work here, but knowing the frequencies alone is not especially helpful. The overarching principle is that for many questions involving words within strings, a structure with single words in each value of a string variable is much easier to work with.
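For comparison, outside Stata the same "most frequent word" question is usually answered with a short shell pipeline; a sketch using the colors from the example:

```shell
# Split the string into one word per line, count duplicates, and keep the
# word with the highest count.
echo "red green red red blue maroon green pink" |
  tr ' ' '\n' |      # one word per line
  sort |             # group duplicates together
  uniq -c |          # count each distinct word
  sort -rn |         # most frequent first
  head -n 1          # keep only the winner
```

This prints the count and the word (here, 3 and red); the exact padding of the count comes from uniq -c and varies between implementations.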

Linux grep - print numbers from file in x to y

I just asked how to print numbers from -10 to 10. Although I understand that now, I don't understand how to print a different range, e.g. from -8 to 23.
What I first did
egrep '^-?[0-8]?[0]?[1-9]$' numbers.txt
Prints from -24 to 24
egrep '^[-]?[0-8]$+\.?' numbers.txt
Prints from -8 to 8.
How could I combine these so that the result would be -8 .. 23?
You can for example say:
egrep '^(-?0?[0-8]|9|1[0-9]|2[0-3])$'
This uses ^(option1|option2|...|option_n)$ alternation to match the following cases:
-?0?[0-8]   -8 to 8
9           9
1[0-9]      10 to 19
2[0-3]      20 to 23
My version
egrep --color '[-][1-8]|([0]|[1])[0-9]|[2][0-3]'
[-][1-8]         # -1 to -8
([0]|[1])[0-9]   # 0 to 19
[2][0-3]         # 20 to 23
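If the tool doesn't have to be grep, note that awk compares numerically, so an arbitrary range like -8 to 23 needs no regex construction at all; a sketch with inline test data:

```shell
# awk treats the field as a number, so the range is two plain comparisons.
# Sample numbers stand in for numbers.txt.
printf '%s\n' -24 -9 -8 0 9 23 24 | awk '$1 >= -8 && $1 <= 23'
```

Changing the range then means editing two numbers rather than re-deriving an alternation.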

Why is grep showing lines that don't match?

I am trying to print out all lines with at least one character that is NOT numeric.
My grep code looks like this: grep '[^[:digit:]]' GTEST
Where GTEST is this:
TEST
55 55 Pink
123
sss
aaa
ss aaa ss
a 1 b 2 a b a
Doop Dap
12 13
77a
59360
And the output is exactly what is in GTEST, except with the matching parts of lines (i.e. all of the alphabetic characters) in red. Instead of displaying the matching characters in red, I *only* want to print out the lines that contain matching characters.
I've been looking through the grep flags (-o, -w, etc.), but none of them seem to do it for me.
Am I missing something?
EDITED:
Expected output would be:
TEST
55 55 Pink
sss
aaa
ss aaa ss
a 1 b 2 a b a
Doop Dap
77a
From your data, I get this output:
grep '[^[:digit:]]' file
TEST
55 55 Pink
sss
aaa
ss aaa ss
a 1 b 2 a b a
Doop Dap
12 13
77a
You get 12 13 because the space between 12 and 13 is a non-digit character.
This will also match lines that have a space before or after the digits, like 123<space>.
To overcome this, add the space to the bracket expression so it no longer counts as a match:
grep '[^[:digit:] ]' file
TEST
55 55 Pink
sss
aaa
ss aaa ss
a 1 b 2 a b a
Doop Dap
77a
Or even better:
grep '[^[:digit:][:blank:]]' file
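An equivalent way to frame the requirement: instead of looking for one forbidden character, reject every line that consists only of digits and whitespace with grep -v; a sketch with a few of the sample lines:

```shell
# Invert the match: drop lines made up entirely of digits and whitespace,
# keeping every line that contains anything else.
printf '%s\n' 'TEST' '123' '12 13' '77a' '59360' |
  grep -v '^[[:digit:][:space:]]*$'
```

Both forms give the same result on this data; the inverted form sometimes reads more naturally when the rule is "throw away purely numeric lines".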

How to perform "greater than" with sed or awk to delete specific lines?

As the title says: how can sed perform "greater than" to delete specific lines?
I got a file like this:
bash-4.2$ cat testfile.txt
string1 1 1
AAA 2 2
string2 2 2
BBB 3 3
string3 3 3
string4 4 4
string5 5 5
string6 6 6
CCC 6 6
string7 7 7
string8 8 8
string9 9 9
string10 10 10
DDD 11 11
string11 11 11
string12 12 12
string13 13 13
I want to delete the lines which contain "string[[:digit:]]", but string1 to string"$num" must be kept, where num is defined by a variable. For example, to keep the lines containing string1-string5 and delete string6-string99, I tried:
#!/bin/bash
read -p "Please Assign the Number of String Line that You Wanna Keep: " num
cat testfile.txt | sed -e "/string[`expr $num + 1`-9]/d" > new_testfile.txt
but it only works for the range 0-8. Is there any way to do this with sed or awk?
This `awk` should do:
awk '/^string/ {n=substr($1,7)+0;if (n>5 && n<100) next}1' file
string1 1 1
AAA 2 2
string2 2 2
BBB 3 3
string3 3 3
string4 4 4
string5 5 5
CCC 6 6
DDD 11 11
It just skips any line with string"x" where x is larger than 5 and less than 100.
If high/low comes from variables, this should do:
high=99
low=6
awk '/^string/ {n=substr($1,7)+0;if (n>=l && n<=h) next}1' h="$high" l="$low" file
string1 1 1
AAA 2 2
string2 2 2
BBB 3 3
string3 3 3
string4 4 4
string5 5 5
CCC 6 6
DDD 11 11
Here is one way with awk:
$ read -p "Please Assign the Number of String Line that You Wanna Keep: " num
Please Assign the Number of String Line that You Wanna Keep: 5
$ awk -v max="$num" '/string/{line=$0;sub(/string/,"",$1);if($1+0<=max){print line};next}1' file
string1 1 1
AAA 2 2
string2 2 2
BBB 3 3
string3 3 3
string4 4 4
string5 5 5
CCC 6 6
DDD 11 11
This is an old question, but if you are coming from a duplicate, perhaps the important thing to understand is that sed does not have any facilities for arithmetic, which is why all the old answers here use Awk.
If you can articulate a regex which reimplements your mathematical constraint as a textual constraint, these things are possible:
sed '/string\([1-9][0-9]\+\|[6-9]\)/d' testfile.txt
To briefly spell this out, this deletes any line where string is followed by two or more digits, or by a single digit in the range 6-9, which implements the numeric requirement by matching the digits as text.
GNU sed also has a limited facility for executing external commands on the matched text via the non-standard e flag to the s command, but my advice would be to switch to Awk at that point anyway, which lets you reason about the mathematical properties of numbers with a more readable and beginner-friendly syntax, as well as vastly better efficiency, by avoiding spawning an external process for each expression you want to evaluate.
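For completeness, the awk route can also take the threshold directly from the shell variable with -v, so nothing has to be spliced into the pattern; a condensed sketch with inline sample lines:

```shell
# Keep non-"string" lines unconditionally; for "stringN" lines, strip the
# prefix and compare N numerically against the shell-supplied cut-off.
num=5
printf '%s\n' 'string5 5 5' 'string6 6 6' 'CCC 6 6' 'string10 10 10' |
  awk -v max="$num" '!/^string/ || substr($1, 7) + 0 <= max'
```

Because the comparison is numeric, this works for any cut-off without rebuilding a regex.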

Print specific number of lines furthest from the current pattern match and just before matching another pattern

I have a tab-delimited file such as the one below. I want to find a specific number of minimum values in each group. A group starts at a line with E in the last column and runs until just before the next such line (or the end of the file); within a group the records are sorted in descending order on that column, so the minimum values are the lines furthest from the E line. For example, I want to print the two records furthest from the first occurrence of E (the group starting at Jack's record), and likewise for the second occurrence of E (the group starting at Gareth's record).
Jack 2 98 E
Jones 6 25 8.11
Mike 8 11 5.22
Jasmine 5 7 4
Simran 5 7 3
Gareth 1 85 E
Jones 4 76 178.32
Mark 11 12 157.3
Steve 17 8 88.5
Clarke 3 7 12.3
Vid 3 7 2.3
I want my result to be
Jasmine 5 7 4
Simran 5 7 3
Clarke 3 7 12.3
Vid 3 7 2.3
There can be a different number of records in each group. I tried with grep
grep -B 2 F$ inputfile.txt
But it repeats the results with E and also does not work for the last record.
quick & dirty:
kent$ awk '/E$/&&a&&b{print b RS a;a=b="";next}{b=a;a=$0}END{print b RS a}' file
Jasmine 5 7 4
Simran 5 7 3
Clarke 3 7 12.3
Vid 3 7 2.3
Using arrays of arrays in GNU Awk version 4, you can try
gawk -vnum=2 -f e.awk input.txt
where e.awk is:
$4=="E" {
    N[j++] = i
    i = 0
}
{
    l[j][++i] = $0
}
END {
    N[j] = i; ngr = j
    for (i = 1; i <= ngr; i++) {
        m = N[i]
        for (j = m - num + 1; j <= m; j++)
            print l[i][j]
    }
}
I don't see an F in your last column. But assuming you want to get every 2 lines above a line ending in E:
grep -B2 'E$' <(cat inputfile.txt;echo "E")|sed "/E$\|^--/d"
Should do the trick
'E$' looks for an "E" at the end of a line
the -B2 gets the 2 lines before it as well
<(cat inputfile.txt;echo "E") adds an "E" as the last line, to match the final group as well (this does not change the actual file)
sed "/E$\|^--/d" deletes all lines ending in "E" or beginning with "--" (grep's group separator)
awk '$2 ~/5|3/ && $3 ~/7/' file
Jasmine 5 7 4
Simran 5 7 3
Clarke 3 7 12.3
Vid 3 7 2.3
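For a group-size-independent take on the answers above, one can buffer the current group and flush its last num lines at each new E line and at end of file; a sketch with an abbreviated version of the sample data (space-separated here, but awk's default field splitting handles tabs the same way):

```shell
# Buffer every line of the current group; when the next E line arrives
# (or input ends), print only the last num buffered lines.
printf '%s\n' 'Jack 2 98 E' 'Jones 6 25 8.11' 'Jasmine 5 7 4' 'Simran 5 7 3' \
              'Gareth 1 85 E' 'Clarke 3 7 12.3' 'Vid 3 7 2.3' |
  awk -v num=2 '
    function dump(   j) {               # print the last num buffered lines
      for (j = n - num + 1; j <= n; j++) print buf[j]
      n = 0
    }
    $NF == "E" && n { dump() }          # a new group starts: flush the old one
    { buf[++n] = $0 }                   # buffer every line, E line included
    END { if (n) dump() }'
```

Unlike the fixed two-variable shuffle, num can be any group tail size, and groups of different lengths are handled uniformly.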