reorder great number of columns in a text file [closed] - regex

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
I need a method (awk/perl/sed/shell) to modify the contents of a file as below:
Before:
123456|ABCDEF|123|011|A|E|NULL|R|UNKNOWN|A1|A2|B1|B2|C1|C2|2013|2013|9999|Y
After:
123456|ABCDEF|123|011|A|E|NULL|R|UNKNOWN|9999|Y|A1|B1|C1|NULL|NULL|NULL|2013|2013
I need to move the last 2 columns to just after the 9th column, remove columns 11, 13 and 15, and insert NULL|NULL|NULL between C1 and the first 2013. Any tips appreciated. The cut command cannot reorder columns, so I will need to go another way. The input file has 10 million such rows and I'm looking for the best way to do this.

Ugly question calls for ugly solution:
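awk -F"|" '{
  for(i=1;i<=9;i++) { printf "%s|" ,$i }        # columns 1-9 unchanged
  printf "%s|%s|",$(NF-1),$NF                   # last two columns moved up
  for(i=10;i<16;i+=2) { printf "%s|" ,$i }      # keep 10, 12, 14; drop 11, 13, 15
  printf "%s|%s|%s|","NULL","NULL","NULL"       # inserted placeholders
  for(i=16;i<(NF-2);i++) { printf "%s|" ,$i }   # remaining columns before the old tail
  print $(NF-2)                                 # final column, no trailing delimiter
}' inputFile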

Code for GNU sed:
sed -r 's/((\w+\|){9})(\w+\|)\w+\|(\w+\|)\w+\|(\w+\|)\w+(\|\w+\|)(\w+)\|(\w+\|\w+)/\1\8|\3\4\5NULL|NULL|NULL\6\7/' file
$ cat file
123456|ABCDEF|123|011|A|E|NULL|R|UNKNOWN|A1|A2|B1|B2|C1|C2|2013|2013|9999|Y
$ sed -r 's/((\w+\|){9})(\w+\|)\w+\|(\w+\|)\w+\|(\w+\|)\w+(\|\w+\|)(\w+)\|(\w+\|\w+)/\1\8|\3\4\5NULL|NULL|NULL\6\7/' file
123456|ABCDEF|123|011|A|E|NULL|R|UNKNOWN|9999|Y|A1|B1|C1|NULL|NULL|NULL|2013|2013

You can use awk for this:
awk 'BEGIN{FS=OFS="|"}{print $1,$2,...,"9999|Y",..."NULL|NULL|NULL",...}'
$1 is the first field, $2 the second, etc.
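For the sample row in the question, the elided field list can be written out in full. This is only a minimal sketch, assuming every row has exactly 19 columns as in the sample; the function name reorder_columns is hypothetical and not part of the original answer.

```shell
# Hypothetical helper: reorder pipe-delimited rows read from stdin.
# Assumes 19 columns per row, as in the sample from the question.
reorder_columns() {
  awk 'BEGIN { FS = OFS = "|" }
       {
         # 1-9 unchanged, then the last two columns (18, 19),
         # then 10, 12, 14, three NULL placeholders, then 16 and 17.
         print $1, $2, $3, $4, $5, $6, $7, $8, $9, $18, $19, \
               $10, $12, $14, "NULL", "NULL", "NULL", $16, $17
       }'
}

printf '%s\n' '123456|ABCDEF|123|011|A|E|NULL|R|UNKNOWN|A1|A2|B1|B2|C1|C2|2013|2013|9999|Y' | reorder_columns
```

Referring to the moved columns as $18 and $19 rather than as literal text keeps the command valid for rows whose last two values differ from the sample.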

I don't want to count your columns, but you can get the idea from the following Perl script:
perl -F'/\|/' -lanE 'say join("|", $F[2], "NULL", "NULL", $F[0], $F[3], $F[1])'
for the input
123456|ABCDEF|123|011
produces
123|NULL|NULL|123456|011|ABCDEF
The autosplit mode splits each line on the | character, and you can reorder the fields any way you need. The join puts the fields back together with |.
For fun - pure bash - and slow :)
while IFS='|' read -r a b c d
do
echo "$a|NULL|$d|$c|NULL|$b"
done << EOF
123456|ABCDEF|123|011
EOF
prints
123456|NULL|011|123|NULL|ABCDEF

Related

AWK/SED Split a list if the row starts with OR ends with [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I am trying to split my list by redirecting all lines that match a condition (starts with OR ends with a given string) into one file and all other lines into another.
I tried this awk command, but it does not seem to work:
awk '{print >out}; /^abc|abc.com$/{out="file2"}' out=file1 MyLargeList.lst
Any help would be appreciated...
The problem is that you assign out on the first match and then never change it again. In short, you seem to assume that out=file1 on the command line will be re-evaluated on each iteration of the script, but this is not true.
Also, you print before you reassign, so the first match goes to the wrong file.
awk '{ if (/^abc|abc\.com$/) out="file2"
       else out="file1"
       print >out }' MyLargeList.lst
As already suggested (without any explanation) in a comment, this can be rewritten more compactly, if somewhat obscurely, with the ternary conditional operator.
awk '{ print > (/^abc|abc\.com$/ ? "file2" : "file1") }' MyLargeList.lst
In brief, x ? y : z returns y if x is true, otherwise z.
This might work for you (GNU sed):
sed -ne '/^abc\|abc\.com$/w file2' -e '//!w file1' file
Turn off implicit printing -n.
If a line begins abc or ends abc.com, write to file2, otherwise write to file1.
Or, if file1 may hold every line and only the matching lines need to be copied out to file2, use:
sed '/^abc\|abc\.com$/w file2' file > file1
Of course, grep would probably be quicker:
grep '^abc\|abc\.com$' file > file2
and
grep -v '^abc\|abc\.com$' file > file1
A simple solution for this task, in one awk pass, is:
awk '/^abc|abc\.com$/{print > "file1"; next} {print > "file2"}' file
In general, if you want to print lines to several files based on several patterns, this scales to the following, where f1, f2, ... are awk variables holding the output file names (set with -v, for example):
awk '/pattern1/{print > f1; next} /pattern2/{print > f2; next} ...' file
You will probably also need a default output for lines that match no pattern, like this:
awk '... /pattern3/{print > f3; next} {print > f}' file
With many outputs, to avoid the "too many open files" error, you may need a close statement at the beginning. Because a closed file is truncated when it is reopened with >, append with >> instead:
awk '{close(out)} /pattern1/{out=f1} ... {print >> out}' file
Testing
Here is an example file:
> cat file
abc
test
abc.com
test
test
abc
end
And the result:
> cat file1
abc
abc.com
abc
> cat file2
test
test
test
end
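The multi-pattern form with a default output can be sketched end to end as follows. This is an illustration only: the helper name split_lines, the output file names and the sample data are all made up, and the output paths are passed into awk with -v.

```shell
# Hypothetical helper: split the lines of file "$1" into three files under directory "$2".
split_lines() {
  awk -v f1="$2/starts.txt" -v f2="$2/ends.txt" -v f="$2/other.txt" '
    /^abc/      { print > f1; next }   # lines starting with abc
    /abc\.com$/ { print > f2; next }   # lines ending with abc.com
                { print > f }          # default output for all other lines
  ' "$1"
}

# Made-up sample data, one entry per line.
workdir=$(mktemp -d)
printf '%s\n' abc test www.abc.com end > "$workdir/list.txt"
split_lines "$workdir/list.txt" "$workdir"
```

Each rule prints and then calls next, so a line is written to exactly one file and the first matching pattern wins.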

More elegant way to extract substring in shell [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I wrote a command to get the chart name (auth-token-service), but it seems very crude. Can someone suggest a more precise way?
chartname=`echo my-auth-token-service=xxx.azurecr.io/auth-token-service:latest | cut -d= -f1 | sed -e "s/^.*-//"`
Gets the text between '=' and '/':
sed "s/.*=\(.*\)\/.*/\1/"    # yields xxx.azurecr.io
Gets the text between '/' and ':':
sed "s/.*\/\(.*\):.*/\1/"    # yields auth-token-service
Gets the text after ':':
sed "s/.*:\(.*\)/\1/"    # yields latest
Not familiar with the format of token, but if I understood correctly you just want the part after the slash and before the colon.
echo my-auth-token-service=xxx.azurecr.io/auth-token-service:latest | sed -e 's/^.\+\/\([^\/]\+\):[^:]\+$/\1/'
Since you asked for a regex solution:
string=my-auth-token-service=xxx.azurecr.io/auth-token-service:latest
[[ $string =~ /([^:]*) ]] && chartname=${BASH_REMATCH[1]}
This assumes that the chart name is always between the / and the :. Note that chartname is not assigned at all if the regex does not match.
The Unix shell has parameter expansion built in. You can't nest these, so it takes multiple steps, but you avoid the overhead of starting multiple external processes.
var='my-auth-token-service=xxx.azurecr.io/auth-token-service:latest'
chartname=${var%%=*}
chartname=${chartname#*-}
The suffix operator ${var%pattern} returns the value of $var with any suffix matching pattern removed; the ${var#pattern} operator does the same for a prefix match. Doubling the operator changes it to trim the longest possible pattern match instead of the shortest. (These are shell glob patterns, not regular expressions, though.)
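The difference between the single and doubled operators is easiest to see side by side. The sketch below applies all four forms to the string from the question, using - as the delimiter; the variable names are made up for illustration.

```shell
var='my-auth-token-service=xxx.azurecr.io/auth-token-service:latest'

short_prefix=${var#*-}    # removes "my-" (shortest match of *- at the start)
long_prefix=${var##*-}    # removes everything up to the last "-"
short_suffix=${var%-*}    # removes "-service:latest" (shortest match of -* at the end)
long_suffix=${var%%-*}    # removes everything from the first "-" onward

# Two steps give the image name between "/" and ":".
image=${var#*/}
image=${image%%:*}

printf '%s\n' "$short_prefix" "$long_prefix" "$short_suffix" "$long_suffix" "$image"
```

All of these expansions are POSIX, so they work in sh as well as in Bash.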
If you require a one-liner, you can refactor the cut into the sed script.
chartname=$(sed 's/[^-]*-\([^=]*\)=.*/\1/' <<< 'my-auth-token-service=xxx.azurecr.io/auth-token-service:latest')
Notice the modernized syntax $(cmd ...) over the obsolescent `cmd ...` and the Bash "here string" with <<< (not POSIX-compatible though).
With awk (only tested on the GNU variant):
var=my-auth-token-service=xxx.azurecr.io/auth-token-service:latest
echo "$var" | awk -F'[=:/]' -vOFS='\n' '{print $1, $2, $3, $NF}'
Output
my-auth-token-service
xxx.azurecr.io
auth-token-service
latest

Regular Expression to extract multiple values from a delimited string [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 4 years ago.
I want to extract both i-name & ipaddress from the below string (where ; is delimiter)
INPUT:
i-03ghijklmn345;abc;xyz;pqr;null;abc;null;null;null;disabled;/dev/sda1;abc;abc: User initiated shutdown;abc;abc;vpc-abc;**192.186.40.255**;abc /dev/sda1 vol-abc 2017-15-14T12:04:17.000z
I was able to retrieve the ipaddress only from this using ([0-9]{1,3}[\.]){3}[0-9]{1,3} but I need both strings in one line
OUTPUT:
i-03ghijklmn345;192.186.40.255
No need for awk. Use grep (the -P option requires GNU grep):
# Partial Bash script
I_NAME=$(grep -Po 'i-\w+' your_file)
IP_ADDR=$(grep -Po '\d{1,3}(?:\.\d{1,3}){3}' your_file)
The regex is between the single quotes in the commands above.
If you want an awk solution, and for a bit of diversity, you can use the following commands:
iName=$(awk 'BEGIN{RS=";"}/^i-\w+/{print $1; exit}' inputFile)
ipAddress=$(awk 'BEGIN{RS=";"}/([0-9]{1,3}[\.]){3}[0-9]{1,3}/{print $1; exit}' inputFile)
echo "$iName"
echo "$ipAddress"
output:
i-03ghijklmn345
192.186.40.255
Explanations:
BEGIN{RS=";"} sets ; as the record separator.
/^i-\w+/{print $1; exit} prints the i-name when it is reached and then exits, so the rest of the input is not analyzed. Note that \w is a GNU awk extension.
/([0-9]{1,3}[\.]){3}[0-9]{1,3}/{print $1; exit} works the same way to extract the IP address.
Finally, the results are assigned to the two variables, which you can display or use as needed.
Replace inputFile with whatever fits your needs.
If you want to put it in one variable use the following awk command:
$ awk 'BEGIN{RS=";"}/^i-\w+/{printf "%s", $1}/([0-9]{1,3}[\.]){3}[0-9]{1,3}/{print ";"$1;exit}' inputFile
i-03ghijklmn345;192.186.40.255
TESTED:
Considering your pattern, the first field is some sort of ID, and an ID should not contain an asterisk (*). Also, the IP address is always enclosed between asterisks. In that case, the awk command below would also work.
$ cat 48437686
i03ghijklmn345;abc;xyz;pqr;null;abc;null;null;null;disabled;/dev/sda1;abc;abc: User initiated shutdown;abc;abc;vpc-abc;**192.186.40.255**;abc /dev/sda1 vol-abc 2017-15-14T12:04:17.000z
$ awk -v RS=";" 'BEGIN{oldORS=ORS}NR==1 || /^\*\*.*\*\*$/{gsub(/\*/,"");ORS=NR==1?";":oldORS;print}' 48437686
i03ghijklmn345;192.186.40.255
With awk. Set input and output field separator to ; and print columns 1 and 17:
awk 'BEGIN{FS=OFS=";"} {print $1,$17}' file
Output:
i-03ghijklmn345;192.186.40.255
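If the asterisks around the IP address are literally present in the input, field 17 would be printed with them. A minimal sketch that strips them first, assuming the IP address is always in column 17 as in the sample (the function name extract_name_and_ip is hypothetical):

```shell
# Hypothetical helper: print the first field and the IP address from field 17,
# with any literal asterisks removed from the IP.
extract_name_and_ip() {
  awk 'BEGIN { FS = OFS = ";" }
       { ip = $17; gsub(/\*/, "", ip); print $1, ip }'
}

line='i-03ghijklmn345;abc;xyz;pqr;null;abc;null;null;null;disabled;/dev/sda1;abc;abc: User initiated shutdown;abc;abc;vpc-abc;**192.186.40.255**;abc /dev/sda1 vol-abc 2017-15-14T12:04:17.000z'
printf '%s\n' "$line" | extract_name_and_ip
```

Copying the field into a variable before calling gsub leaves the original record untouched.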

how to make the data in a file inorder? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I have a testfile.txt as below
/path/ 345 firstline
/path2/ 346 second line
/path3/ 347 third line having spaces
/path4/ 3456 fourthline
Now I want to make it as below
/path/,345,firstline
/path2/,346,second line
/path3/,347,third line having spaces
/path4/,3456,fourthline
As shown in the output above, the data should be arranged in 3 columns, all separated by commas. There may be one or more spaces between the columns in the input file. For example,
/path/ and 3456
may also be separated by 2 tabs. The third column keeps its spaces as they are.
Can anyone help me with this?
You can use the sed command to achieve this:
I tried this and it worked for me:
sed 's/[ \t]\+/,/' testfile.txt | sed 's/[ \t]\+/,/'
It replaces only the first and second runs of spaces or tabs on each line.
Update
sed (stream editor) uses commands to perform editing. The first parameter 's/[ \t]\+/,/' is the command that is used for replacing one expression with another. Splitting this command further:
Syntax: s/expression1/expression2/
s - substitute
/ - expression separator
[ \t]\+ - (expression1) this is the expression to find and replace (This regular expression matches one or more space or tab characters)
, - (expression2) this is what you want to replace the first expression with
When the above command is run, on each line the first instance of your expression is replaced. Because you need the second delimiter also to be replaced, I piped the output of the first sed command to another similar command. This replaced the second delimiter with a ,.
Hope this helps!
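The same idea also fits in a single sed invocation with two -e expressions. The sketch below swaps the GNU-specific [ \t]\+ for the POSIX class [[:space:]] so that it does not depend on GNU extensions; the function name to_three_columns and the sample lines are made up.

```shell
# Hypothetical helper: turn the first two runs of whitespace on each line into commas.
to_three_columns() {
  sed -e 's/[[:space:]][[:space:]]*/,/' -e 's/[[:space:]][[:space:]]*/,/'
}

# Sample input: single spaces on the first line, two tabs and two spaces on the second.
printf '/path3/ 347 third line having spaces\n/path4/\t\t3456  fourthline\n' | to_three_columns
```

Because neither substitution uses the g flag, each one replaces only the first remaining run, so spaces inside the third column are left alone.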
tr -s ' ' <data |sed -e 's/\s/,/' -e 's/\s/,/'
/path/,345,firstline
/path2/,346,second line
/path3/,347,third line having spaces
/path4/,3456,fourthline
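Not the best way to do this, but it generates the desired output for the sample. Note that tr -s ' ' squeezes only spaces, so columns separated by tabs are not handled, and repeated spaces inside the third column are squeezed as well.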

awk/Perl, select different fields and combine them [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I have stream output in the format below, from which I would like to filter and print specific fields:
customer $2 $3
Address $2 $3 $4 $5 $6
for example print field#2 from line 1 and print field#6 from line 2 and then print them together separated by space.
Can someone share how this can be done in perl, awk, sed, etc.?
In awk you can hold data in variables (and use the line number in patterns). For example, in your sample
print field#2 from line 1 and print field#6 from line 2 and then print them together separated by space.
The command would be
awk 'NR==1 {x=$2} NR==2 {print x " " $6}'
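To see the command at work, here it is wrapped in a function and fed two lines of made-up data; the customer and address values and the function name join_fields are hypothetical, chosen only so that line 2 has at least six fields.

```shell
# Hypothetical helper: remember field 2 of line 1, then print it next to field 6 of line 2.
join_fields() {
  awk 'NR == 1 { x = $2 }
       NR == 2 { print x " " $6 }'
}

printf '%s\n' 'customer John Smith' 'Address 12 Main Street Springfield 12345' | join_fields
```

The variable x survives from one input line to the next, which is what lets the second rule combine values from both lines.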
awk 'NR==1{s=$2;next} {print s ORS $6 ORS s, $6}' file
This might work for you (GNU sed):
sed -rn '/^customer/{N;s/^((\S+)\s*){2}.*\n((\S+)\s*){6}.*/\2 \4/p}' file