Columns not aligning when using sed \t - regex

I have an input file named rectangle.txt and my aim is to re-format the contents and output it to rectangle_f.txt using only sed commands.
I have managed to reformat the content from:
Name,Length,Width,Area,Owner
Rec1,9,9,81,Em
Rec2,2,2,4,Soph
(etc...)
To look like this:
Name: Rec1 Length: 9 Width: 9 Area: 81 Owner: Em
Name: Rec2 Length: 2 Width: 2 Area: 4 Owner: Soph
My issue is that for names with two-digit numbers (for example Rec10), the tab I inserted between Name and Length ends up one space too long. For example:
Name: Rec9 Length: 6 Width: 6 Area: 36 Owner: Jay
Name: Rec10 Length: 7 Width: 7 Area: 49 Owner: Chris
Does anyone know the best way to make all columns line up?
The code I currently have is:
sed -e "1d" \
-e 's/^/Name: /;s/,/ \t\tHeight: /;s/,/ \t\tWidth: /;s/,/\t\tArea: /;s/,/ \t\tColour: /' rectangle.txt > rectangle_f.txt
I need it to apply only to Rec10 to Rec20 (inclusive) which also are code lines 10-20.
Any help would be great!

Have you tried:
sed 1d file.txt | sed -E 's/([^,]+),([^,]+),([^,]+),([^,]+),([^,]+)/Name: \1\tLength: \2\tWidth: \3\tArea: \4\tOwner: \5\t/'
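Tabs only jump to the next tab stop, so a name that is one character longer can still push the following column over. If awk is acceptable alongside sed (an assumption, since the question asks for sed only), padding each field to a fixed width with printf avoids tab stops altogether. This is a minimal sketch: the function name format_rectangles and the widths 6, 3, 3 and 4 are illustrative choices, not taken from the question.

```shell
# Hypothetical helper: reads the CSV on stdin, writes aligned rows to stdout.
format_rectangles() {
  awk -F, '
    NR > 1 {   # skip the header line, like sed "1d"
      # %-6s left-justifies a field and pads it with spaces to 6 characters
      printf "Name: %-6s Length: %-3s Width: %-3s Area: %-4s Owner: %s\n", $1, $2, $3, $4, $5
    }'
}

# Usage: format_rectangles < rectangle.txt > rectangle_f.txt
```

The widths need to be at least as long as the longest value in each column; anything longer will still shift the columns to its right.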


How to parse text blocks and get a value depending on another value in this block

Let's consider example output of vboxmanage list hdds:
UUID: abc
Parent UUID: base
State: created
Type: normal (base)
Location: /home/me/VirtualBox VMs/not_me/b.vmdk
Storage format: VMDK
Capacity: 100000 MBytes
Encryption: disabled
UUID: def
Parent UUID: base
State: created
Type: normal (base)
Location: /home/me/VirtualBox VMs/my_file/a.vmdk
Storage format: VMDK
Capacity: 100000 MBytes
Encryption: disabled
UUID: ghi
Parent UUID: base
State: created
Type: normal (base)
Location: /home/me/VirtualBox VMs/my_file/a.vmdk
Storage format: VMDK
Capacity: 100000 MBytes
Encryption: disabled
I would like to get output like:
def
ghi
In other words, I need the UUIDs of the disks under /home/me/VirtualBox VMs/my_file, and not the UUID that belongs to /home/me/VirtualBox VMs/not_me/b.vmdk.
Using grep and sed:
vboxmanage list hdds | grep -B 4 '/home/me/VirtualBox VMs/my_file/' |
sed -n 's/^UUID:\s*//p'
Using awk:
vboxmanage list hdds | awk '$1=="UUID:"{uid=$2} /^Location:.*my_file/{print uid}'
Output is
def
ghi
Using perl. With this, the order of the lines within a block can change, unlike with the other answers:
vboxmanage list hdds | perl -lne '
BEGIN{ $/ = "\n\n" }
print $1 if m/UUID:\s+(\w+)\s+.*my_file/s
'
Output
def
ghi
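The same block-at-a-time idea can be sketched in awk using paragraph mode. This assumes, as the Perl answer's `$/ = "\n\n"` does, that the blocks in the real output are separated by blank lines. The function name uuids_for_location is hypothetical.

```shell
# Hypothetical helper: reads "vboxmanage list hdds" output on stdin and prints
# the UUID of every block whose Location: line contains the string given as $1.
uuids_for_location() {
  awk -v dir="$1" '
    BEGIN { RS = ""; FS = "\n" }   # paragraph mode: one record per block, one field per line
    {
      uuid = ""; hit = 0
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^UUID:/) { uuid = $i; sub(/^UUID:[ \t]*/, "", uuid) }
        if ($i ~ /^Location:/ && index($i, dir)) hit = 1
      }
      if (hit && uuid != "") print uuid
    }'
}

# Usage: vboxmanage list hdds | uuids_for_location "/home/me/VirtualBox VMs/my_file/"
```

Because each block is scanned in full before anything is printed, it does not matter whether UUID: comes before or after Location: within the block.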

Using Regex to capture content after first occurrence of a string

I've done some research and I'm struggling to figure out how to answer this question. I have the following text and I want to extract the zip code in the business address field:
BUSINESS ADDRESS:
STREET 1: 101 AWESOME DRIVE
STREET 2: P O BOX 144
CITY: HOUSTON
STATE: TX
ZIP: 77027
BUSINESS PHONE: 7138675309
MAIL ADDRESS:
STREET 1: P O BOX 144
CITY: HOUSTON
STATE: TX
ZIP: 77001
This code captures the last instance (77001):
(BUSINESS\s*ADDRESS:)(.*)(ZIP:\s*)(.*)
How can I capture the first zip code (77027)?
Thanks for helping a noob.
Well, in your example you just need to add a question mark to make the quantifier lazy, (.*?), and specify that the zip consists only of digits:
BUSINESS\s*ADDRESS:.*?ZIP:\s*(\d+)
By default, the asterisk and plus quantifiers are greedy.
Also, there is no need to capture anything other than the zip code.
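For line-oriented tools that have no lazy quantifiers, the same "stop at the first ZIP" idea can be approximated with a sed address range. This is only a sketch under the assumption that the text arrives one field per line, as in the question; the function name first_business_zip is hypothetical.

```shell
# Hypothetical helper: prints the first ZIP that follows a BUSINESS ADDRESS: line.
first_business_zip() {
  # The address range opens at the BUSINESS ADDRESS: line and closes at the
  # first later line containing ZIP:, so the MAIL ADDRESS zip is never reached.
  sed -n '/BUSINESS ADDRESS:/,/ZIP:/ s/.*ZIP:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Usage: first_business_zip < test.txt
```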
Given:
my $tgt="BUSINESS ADDRESS:
STREET 1: 101 AWESOME DRIVE
STREET 2: P O BOX 144
CITY: HOUSTON
STATE: TX
ZIP: 77027
BUSINESS PHONE: 7138675309
MAIL ADDRESS:
STREET 1: P O BOX 144
CITY: HOUSTON
STATE: TX
ZIP: 77001";
You can do:
print "$1: $2\n" while $tgt=~/^(\S[^:]+):\N*\R.*?^\s+ZIP:\s+(\d+)/gms;
Prints:
BUSINESS ADDRESS: 77027
MAIL ADDRESS: 77001
With the same method you can construct a hash mapping each address block to its zip.
The match operator running in list context will return all the matching values that were found. So you could do something like this:
my $data = '
BUSINESS ADDRESS:
STREET 1: 101 AWESOME DRIVE
STREET 2: P O BOX 144
CITY: HOUSTON
STATE: TX
ZIP: 77027
BUSINESS PHONE: 7138675309
MAIL ADDRESS:
STREET 1: P O BOX 144
CITY: HOUSTON
STATE: TX
ZIP: 77001
';
my @allzips = ($data =~ /ZIP:\s*(\d+)/g);
foreach my $zip (@allzips) {
print "Found ZIP: $zip\n";
}
Which prints:
Found ZIP: 77027
Found ZIP: 77001
For those about to awk...
Here is a tested version, assuming the file is named test.txt in the current directory:
awk '{if ($0 ~ /BUSINESS ADDRESS:/) { inzone=1; } if (inzone) {if ($0 ~ /ZIP:/) { print $2; } else if ($0 ~ /MAIL ADDRESS:/) { inzone=0; }}}' test.txt
It prints the second field of every line containing ZIP:, but only for lines inside a block that starts at a line containing BUSINESS ADDRESS: and ends at a line containing MAIL ADDRESS:.
The test is below:
awk '{if ($0 ~ /BUSINESS ADDRESS:/) { inzone=1; } if (inzone) {if ($0 ~ /ZIP:/) { print $2; } else if ($0 ~ /MAIL ADDRESS:/) { inzone=0; }}}' test.txt
77027

bash/sed script to get output from a file using a regex

I have a file which contains lines of the form object 0: data: 2, object 0: data: 232132 in between other lines in the file. I need to extract the data values from the file for all objects i and store them space-separated in an output file, say output, using bash or sed. It would be great if someone could help me achieve this.
Example input:
num objects: 3
object 0: name: 'x'
object 0: size: 4
object 0: data: 1
object 1: name: 'y'
object 1: size: 4
object 1: data: 3231
object 2: name: 'x'
object 2: size: 4
object 3: data: -32
Example output:
1 3231 -32
You could use something like this:
awk '$3=="data:"{print $4}' file
This outputs the 4th field when the 3rd field is equal to "data:".
Shorter still, you could just match the pattern /data:/:
awk '/data:/{print $4}' file
To output the numbers on the same line, use printf rather than print. To keep things cleaner, you can use an array and print the values in the END block:
awk '/data:/{a[++n]=$4}END{for(i=1;i<=n;++i)printf "%s%s",a[i],(i<n?FS:RS)}' file
Using an array like this makes it easy to separate each value with a space FS and add a newline RS at the end.
Any of these commands can produce an output file using redirection > output.
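If collecting the values in an array feels heavier than needed, another option is to let awk print one value per line and join the lines afterwards. This is a sketch of that alternative, not part of the answer above; the function name data_values is hypothetical.

```shell
# Hypothetical helper: reads the file on stdin, prints all data values on one line.
data_values() {
  # awk emits one value per line; paste -s joins those lines with a single space
  awk '$3 == "data:" { print $4 }' | paste -s -d ' ' -
}

# Usage: data_values < file > output
```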

How do I create my own classifier

I am creating my own classifier for face detection. I have two folders, one for storing positive images and the other for storing negative images, and I have made a .txt file for each. Now I want to create training samples from the positive images, so I run the command 'opencv_createsamples -info positives.txt -vec myvec.vec -w 24 -h 24'. It prints the output below but doesn't create any samples. What is the reason? Could anyone help me? Thanks in advance.
Info file name: positives.txt
Img file name: (NULL)
Vec file name: myvec.vec
BG file name: (NULL)
Num: 1000
BG color: 0
BG threshold: 80
Invert: FALSE
Max intensity deviation: 40
Max x angle: 1.1
Max y angle: 1.1
Max z angle: 0.5
Show samples: FALSE
Width: 24
Height: 24
Create training samples from images collection...
positives.txt(1) : parse errorDone. Created 0 samples
The info file should contain not only file names but also an ROI specification.
Each line should look like this:
path/to/image.bmp num_rois x y width height x y width height ...
For example, if you have files that are exactly as big as the sample size, each line should be:
path/to/image.bmp 1 0 0 24 24
Note that the path to the image file should be relative to the location of the info file. Also, the default number of samples is 1000; if you want to include all the samples in your info file, you should specify the number on the command line.
A good guide can be found on the OpenCV web site: http://docs.opencv.org/doc/user_guide/ug_traincascade.html#positive-samples
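For the simple case described above, the info file can be generated rather than typed by hand. The following sketch assumes every positive image is a .bmp that is already exactly 24x24, so the single ROI covers the whole image. The function name make_info_lines and the directory name positives are hypothetical.

```shell
# Hypothetical helper: prints one info-file line per .bmp image in directory $1,
# assuming each image is exactly 24x24 so the single ROI is "0 0 24 24".
make_info_lines() {
  for img in "$1"/*.bmp; do
    [ -e "$img" ] || continue   # the glob matched nothing
    printf '%s 1 0 0 24 24\n' "$img"
  done
}

# Usage (run from the directory that will hold positives.txt, so paths stay relative):
#   make_info_lines positives > positives.txt
```

File names containing spaces would likely confuse the parser, since the fields on each line are space-separated. As far as I know, the sample count is passed to opencv_createsamples with -num.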

How to parse through a string in perl to extract certain value?

I have the following string:
> show box detail
2 boxes:
1) Box ID: 1
IP: 127.0.0.1
Interface: 1/1
Priority: 31
2) Box ID: 2
IP: 192.68.1.1
Interface: 1/2
Priority: 31
How can I get the Box ID from the above string in Perl?
The number of boxes can vary. So, given n boxes, how can I extract the box IDs if the show box detail output can go up to n nodes in the same format?
my @ids = $string =~ /Box ID: ([0-9]+)/g;
More restrictive:
my @ids = $string =~ /^[0-9]+\) Box ID: ([0-9]+)$/mg;