Parse a file without common delimiter in shell

I would like to ask you for help with parsing a file in shell.
Here is my data:
ID:1 g-t="Demo one" rfid="af7e 25" t-link="http://demo.site.com/api2",User af73 25 http://example.com/useraf73
ID:2 g-t="Demo one" rfid="77 63" t-link="http://demo.site.com/api",User 77 http://example.com/user77
There is no common delimiter, basically I need these fields:
ID=1 | g-t="Demo one" | rfid="af7e 25" | t-link="http://demo.site.com/api2" | User af73 25 | http://example.com/useraf73
Here is where I am stuck:
awk '{match($0,"g-t=([^\" ]+)",a)}END{print a[1]}'
I am trying to match double quote with space but I have no idea why it is not printing the result. All the chars work fine except double quotes.
What am I doing wrong? Awk is not a must here; I am open to suggestions.
Thanks.

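It has been quite a while since I used awk regularly, but a few things stand out. In POSIX awk, match() takes only two arguments; the three-argument form that fills an array is a GNU awk (gawk) extension. The END{} block runs only once, after the last line, not for every line as I think you want. Your pattern also cannot succeed, because the character class [^\" ] excludes the double quote, yet the very next character after g-t= is a double quote. Something like:
awk 'match($0, /g-t="[^"]+"/) {print substr($0, RSTART, RLENGTH)}' dataFile
may be closer to what you had in mind.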
A brute force Perl one-liner could look something like this:
perl -lne 'if (m/ID:(\S+) g-t="([^"]+)" rfid="([^"]+)" t-link="([^"]+)",User (.*) (http:.*)/){print "$1|$2|$3|$4|$5|$6"}' dataFile
and demonstrates getting all of the fields' data separated by pipe characters (|). You can move the () groups around to capture more or less of the text for each resulting $1, $2, etc. See perldoc perlre for more information on Perl regular expressions.
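For the exact layout requested in the question, a single sed substitution may be enough. The following is a minimal sketch, assuming every line has the same shape as the two sample records and that your sed accepts -E for extended regular expressions (GNU and BSD sed both do). The variable name result is only for illustration.

```shell
# Feed the two sample records from the question through sed.
# Each (...) group captures one field; the replacement joins them with " | ".
result=$(printf '%s\n' \
  'ID:1 g-t="Demo one" rfid="af7e 25" t-link="http://demo.site.com/api2",User af73 25 http://example.com/useraf73' \
  'ID:2 g-t="Demo one" rfid="77 63" t-link="http://demo.site.com/api",User 77 http://example.com/user77' |
  sed -E 's/^ID:([^ ]+) (g-t="[^"]*") (rfid="[^"]*") (t-link="[^"]*"),(User .*) (https?:[^ ]+)$/ID=\1 | \2 | \3 | \4 | \5 | \6/')
printf '%s\n' "$result"
```

Lines that do not fit the pattern pass through unchanged, which makes malformed records easy to spot in the output.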

Related

AWK\SED Replace both ^(beginning) and $(end) of a string in a single command

I've been looking around but couldn't find a way to do it with both AWK and SED.
I was wondering if there's a way to replace a string's start and end in a single command.
more specifically, there's a file with a lot of words in it, and I would like to add something before the word and after the word.
Thanks,
Roy

Since you said: "more specifically, there's a file with a lot of words in it, and I would like to add something before the word and after the word."
The only thing you need is $&, which holds the text of the last successful match. You can simply print whatever you want before and after this variable.
For example say you have this file:
this is line 1.
this is line 2.
this is line 3.
And I tested with perl:
perl -lne 'print "beginning->", $&, "<-end" if /.+/g' file
whose output is:
beginning->this is line 1.<-end
beginning->this is line 2.<-end
beginning->this is line 3.<-end
Maybe you would like to match only one word; the same approach still works, for example:
perl -lne 'use English; print "$PREMATCH", "[$MATCH]","$POSTMATCH" if /line/g' file
Here I matched the word line and wrapped it in brackets: [ then the match ($MATCH, the English name for $&) then ].
The output:
this is [line] 1.
this is [line] 2.
this is [line] 3.
NOTE
As you can see, all you need are the prematch, the match and the postmatch. Since you gave no specific examples, I tested this with Perl; if you are interested in Perl you can use it as is, or you may prefer to use sed or awk.

If you want to wrap a particular word with markers you can use & in the replacement string to achieve what you want.
For example to put square brackets around every occurrence of the word bird:
$ echo "hello bird, are you really a bird?" | sed "s/\bbird\b/[&]/g"
hello [bird], are you really a [bird]?
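If the goal is literally to add text at the start and at the end of every line in one command, the same & trick covers both ends at once, because & stands for the whole match. A small sketch using only POSIX sed and awk features; the markers [ and ], the sample words and the variable names are made up for illustration.

```shell
# sed: .* matches the entire line, & re-inserts it between the markers.
sed_out=$(printf '%s\n' apple pear | sed 's/.*/[&]/')

# awk: concatenate a prefix, the whole record ($0) and a suffix.
awk_out=$(printf '%s\n' apple pear | awk '{ print "[" $0 "]" }')

printf '%s\n' "$sed_out" "$awk_out"
```

Both commands produce the same two lines, each input word wrapped in brackets.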

to replace a string's start and end in a single command
Let's say we have a test file with line:
tag hello, world tag
To enclose each tag word with angle brackets < ... > we can apply:
awk approach with gsub() function:
awk '{ gsub(/\<tag\>/, "<&>"); print}' test_file
Note: the word-boundary operators \< and \> are a GNU awk extension and may not be available in other awk implementations.
sed approach:
sed 's/\btag\b/<&>/g' test_file
The output (for both approaches):
<tag> hello, world <tag>

Replace last occurrence of a character in a field with awk

I'm trying to replace the last occurrence of a character in a field with awk. Given is a file like this one:
John,Doe,Abc fgh 123,Abc
John,Doe,Ijk-nop 45D,Def
John,Doe,Qr s Uvw 6,Ghi
I want to replace the last space " " with a comma ",", basically splitting the field into two. The result is supposed to look like this:
John,Doe,Abc fgh,123,Abc
John,Doe,Ijk-nop,45D,Def
John,Doe,Qr s Uvw,6,Ghi
I've tried to create a variable with the number of occurrences of spaces in the field with
{var1=gsub(/ /,"",$3)}
and then integrate it in
{var2=gensub(/ /,",",var1,$4); print var2}
but the how-argument in gensub does not allow any characters besides numbers and G/g.
I've found a similar thread here but wasn't able to adapt the solution to my problem.
I'm fairly new to this so any help would be appreciated!

With GNU awk for gensub():
$ awk 'BEGIN{FS=OFS=","} {$3=gensub(/(.*) /,"\\1,","",$3)}1' file
John,Doe,Abc fgh,123,Abc
John,Doe,Ijk-nop,45D,Def
John,Doe,Qr s Uvw,6,Ghi
Get the book Effective Awk Programming by Arnold Robbins.
Very well-written question btw!
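gensub() is specific to GNU awk. If only a POSIX awk is available, roughly the same result can be obtained with match() and substr(). This is a sketch under the assumption that the field to split is always the third one; the variable name result is illustrative.

```shell
result=$(printf '%s\n' \
  'John,Doe,Abc fgh 123,Abc' \
  'John,Doe,Ijk-nop 45D,Def' \
  'John,Doe,Qr s Uvw 6,Ghi' |
  awk 'BEGIN { FS = OFS = "," }
       # A space followed only by non-spaces up to the end of the field
       # can match at one place only: the last space in that field.
       match($3, / [^ ]*$/) {
         $3 = substr($3, 1, RSTART - 1) "," substr($3, RSTART + 1)
       }
       { print }')
printf '%s\n' "$result"
```

Records whose third field contains no space are printed unchanged, since the match() pattern simply does not fire for them.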

Here is a short awk
awk '{$NF=RS$NF;sub(" "RS,",")}1' file
John,Doe,Abc fgh,123,Abc
John,Doe,Ijk-nop,45D,Def
John,Doe,Qr s Uvw,6,Ghi
Updated due to Ed's comment.
Or you can use the rev tool:
rev file | sed 's/ /,/' | rev
John,Doe,Abc fgh,123,Abc
John,Doe,Ijk-nop,45D,Def
John,Doe,Qr s Uvw,6,Ghi
This reverses each line, replaces the first space with a comma, then reverses the line again.

Awk 3 Spaces + 1 space or hyphen

I have a rather large chart to parse. Each column is separated by either 4 spaces or by 3 spaces and a hyphen (since the numbers in the chart can be negative).
cat DATA.txt | awk "{ print match($0,/\s\s/) }"
does nothing but print a slew of 0's. I'm trying to understand AWK and when to escape, etc, but I'm not getting the hang of it. Help is appreciated.
One line:
1979 1 -0.176 -0.185 -0.412 0.069 -0.129 0.297 -2.132 -0.334 -0.019
1979 1 -0.176 0.185 -0.412 0.069 -0.129 0.297 -2.132 -0.334 -0.019
I would like to get just, say, the second column. I copied the line, but I'd like to see -0.185 and 0.185.

You need to start by thinking about bash quoting, since it is bash which interprets the argument to awk which will be the awk program. Inside double-quoted strings, bash expands $0 to the name of the bash executable (or current script); that's almost certainly not what you want, since it will not be a quoted string. In fact, you almost never want to use double quotes around the awk program argument, so you should get into the habit of writing awk '...'.
Also, POSIX awk regular expressions don't understand \s (although GNU awk handles it as an extension). And match returns the position of the match, which I don't think you care about either.
Since awk by default treats any run of whitespace as a field separator, you don't need to play any games to get the fourth column. Just use awk '{print $4}'.
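To make the quoting point concrete, here is a small sketch; the shell variable names line, col, n and col_v are illustrative only. Inside single quotes the shell leaves $4 alone, so awk sees it as a field reference. A value that lives in the shell is better passed in with -v than by switching to double quotes.

```shell
line='1979 1 -0.176 -0.185 -0.412 0.069 -0.129 0.297 -2.132 -0.334 -0.019'

# Single quotes: awk receives the program text unchanged and prints field 4.
col=$(printf '%s\n' "$line" | awk '{ print $4 }')

# Pass a shell value into awk with -v instead of relying on shell expansion.
n=4
col_v=$(printf '%s\n' "$line" | awk -v n="$n" '{ print $n }')

printf '%s\n' "$col" "$col_v"
```

Both commands print -0.185, and the leading minus sign needs no special handling because it is simply part of the field.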

Why not just use this simple awk
awk '$0=$4' Data.txt
-0.185
0.185
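It sets $0 to the value of $4. Because the assignment is used as a pattern and its result is true, the default action, print, runs.
PS: do not use cat with a program that can read the file itself, like awk.
If field 4 can be 0 or empty, the assignment evaluates to false and the line is skipped, so a more robust form is: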
awk '{$0=$4}1' Data.txt

If you split the input on runs of 3 or 4 spaces, the value you want ends up in column 3:
$ awk -v FS=" {3,4}" '{print $3}' file
-0.185
0.185
Here FS=" {3,4}" passes a regex as the FS value, which sets the field separator to three or four spaces. In a regex, {min,max} is a range quantifier that repeats the previous token from min to max times.

How to fetch the matched items using awk and regexp?

I am trying to parse "/boot/grub/grubenv" but really not very good at regexp.
Suppose the content of /boot/grub/grubenv is:
saved_entry=1
I want to output the number "1", like below. I am currently using "awk", but open to other tools.
$ awk '/^(saved_entry=)([0-9]+)/ {print $2}' /boot/grub/grubenv
But obviously not working, thanks for the help.

Specify a field separator with the -F option:
awk -F= '/^saved_entry=/ {print $2}' /boot/grub/grubenv
Here $1, $2, ... refer to fields (separated by =), not to backreferences for captured groups.

If you want to match things, it is probably best to use match! (The three-argument form used here, which fills an array, requires GNU awk.)
This works even if there are more fields afterwards and does not require changing the field separator (in case you are doing other things with the data).
The only drawback of this method is that it finds only the leftmost match in the record, so if the data appears twice in the same record (line), only the first occurrence is matched.
awk 'match($0,/^(saved_entry=)([0-9]+)/,a){print a[2]}' file
Example
input
saved_entry=1 blah blah more stuff
output
1
Explanation
Matches the regex against $0 (the record) and stores the text captured by each parenthesized group as a separate array element.
For the example, the array elements are:
a[0] is saved_entry=1
a[1] is saved_entry=
a[2] is 1
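If the awk at hand is not GNU awk, a capture group in sed gives a similar effect. A sketch using only POSIX sed syntax, with the sample line from above as input; the variable name value is illustrative.

```shell
# -n suppresses the default output; the p flag prints only lines where the
# substitution succeeded, and \1 is the captured run of digits.
value=$(printf '%s\n' 'saved_entry=1 blah blah more stuff' |
  sed -n 's/^saved_entry=\([0-9][0-9]*\).*/\1/p')
printf '%s\n' "$value"
```

Against the real file, the same sed command would take /boot/grub/grubenv as its argument instead of reading from the pipe.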

Split string on a backslash ("\") delimiter in awk?

I am trying to split a string in a file on a delimiter, but I am not able to do it correctly. Here is my code:
awk 'var=split($2,arr,'\'); {print $var}' file1.dat
Here is my sample data guys.
Col1 Col2
abc 123\abc
abcd 123\abcd
Desired output:
Col1 Col2
abc abc
abcd abcd

You don't need to call split. Just use \\ as field separator:
echo 'a\b\c\d' | awk -F\\ '{printf("%s,%s,%s,%s\n", $1, $2, $3, $4)}'
OUTPUT:
a,b,c,d

Sample data and output are my best guess at your requirement:
printf '%s\n' '1:2\a\b:3' | awk -F: '{
n=split($2,arr,"\\")
# print "#dbg:n=" n
var=arr[3]
print var
}'
output
b
Recall that split returns the number of pieces it produced. You can uncomment the debug line and you'll see the value 3 returned.
Note that the test data is fed with printf '%s\n' rather than echo: some shells' echo interprets backslash sequences and others do not, so with echo the number of '\' chars you have to type varies from shell to shell. Data read from a file needs no doubling. Inside the awk program the separator is written "\\" because awk string literals use '\' as the escape character. Others are welcome to comment!
I hope this helps.

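As some of the comments mentioned, you have nested single quotes. Switching the inner set to double quotes fixes the quoting, but the backslash must then be escaped as "\\", because "\" on its own leaves the awk string unterminated. Using the count returned by split() to pick the last piece gives:
awk '{n=split($2,arr,"\\"); print $1, arr[n]}' file1.dat
I'd prefer piping to another awk command over using split. I don't know that one is better than the other; it is just a preference.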
awk '{print $2}' file1.dat | awk -F'\' '{...}'

You need to escape the backslash you're trying to split on. You can do this in your split using double quotes, like this: "\\"
Also, you can index the array directly to make your code more readable (and avoid defining another variable). This should work for you:
awk 'NR==1 { print } NR>=2 { split($0,array,"\\"); print $1,array[2] }' file1.dat
HTH

awk '{sub(/123\\/,"")}1' file
Col1 Col2
abc abc
abcd abcd
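The same idea can be written without hard-coding 123, by deleting everything in the second field up to and including the last backslash. A minimal sketch, assuming the data has exactly two whitespace-separated columns as in the sample; the variable name result is illustrative.

```shell
# printf '%s\n' passes the backslashes through untouched.
# In the awk regex, \\ stands for one literal backslash.
result=$(printf '%s\n' 'Col1 Col2' 'abc 123\abc' 'abcd 123\abcd' |
  awk '{ sub(/^.*\\/, "", $2) } 1')
printf '%s\n' "$result"
```

The header line contains no backslash, so sub() leaves it alone and it is printed as is.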
