C#: How to convert one file format to another?

I have two pipe-delimited text files (say, A.txt and B.txt).
Below is the A.txt file format, a standard format (think of it as the universal set):
"EmpId"|"FName"|"LName"|"Sex"|"DOB"|"SSN"|"TagId1"|"TagId2"
Below is the B.txt file (think of it as a subset), with a header row and 2 records:
"SSN"|"LName"|"FName"|"DOB"|"Sex"|"EmpId"
"123"|"Barrat"|"Alanzon"|"1983"|"F"|"4455"
"678"|"Alexia"|"Timothy"|"1975"|"M"|"2222"||"baz"
I need to convert B.txt into A.txt's column order.
Expected result is:
"EmpId"|"FName"|"LName"|"Sex"|"DOB"|"SSN"|"TagId1"|"TagId2"
"4455"|"Alanzon"|"Barrat"|"F"|"1983"|"123"|||
"2222"|" Timothy "|" Alexia"|"M"|"1975"|"678"||"baz"
How to go about it?

Since the two formats are basically the same except that the columns are reordered, I would recommend just reading each B.txt record into a dictionary (with the column name as the key) and then printing that dictionary back out with the columns in A's order.
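A minimal awk sketch of that idea (an assumption here: A.txt already exists and its first line is the header, so the target order can be read from it rather than hard-coded; columns that B.txt's header does not name come out empty, so the stray trailing "baz" field would need the positional one-liner below instead):

awk -F'|' -v OFS='|' '
NR==FNR { if (FNR==1) { hdr=$0; n=NF; for (i=1;i<=NF;i++) order[i]=$i }; next }
FNR==1  { for (i=1;i<=NF;i++) pos[$i]=i; print hdr; next }   # B header: map column name -> position
{
    line = ""
    for (i=1; i<=n; i++)                                     # emit fields in A.txt order
        line = line (i>1 ? OFS : "") ((order[i] in pos) ? $(pos[order[i]]) : "")
    print line
}' A.txt B.txt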

awk -F'|' -v OFS='|' '{print $6, $3, $2, $5, $4, $1, $7, $8}' B.txt > A.txt

Related

extract data column-wise from text file using regex

I have a text file like
27/02/2017 17:59:39 562803 299060 235155
27/02/2017 17:59:44 562803 299058 235158
27/02/2017 17:59:49 562803 299057 235158
27/02/2017 17:59:54 562803 299057 235158
I want to extract data from a particular column using regex. Which expression will extract the 3rd column?
If you have Linux, awk is the best utility for this
awk '{print $3}' file
awk divides a file into fields, usually separated by tabs or whitespace. $3 represents the third column or field.
Another good utility is cut
cut -d " " -f3 file

Adding commas when necessary to a csv file using regex

I have a csv file like the following:
entity_name,data_field_name,type
Unit,id
Track,id,LONG
The second row is missing a comma. I wonder if there might be some regex or awk-like tool that can append commas to the end of a line wherever these rows are missing them?
Update
I know the requirements are a little vague. There might be several alternative ways to narrow down the requirements such as:
The header row should define the number of columns (and commas) that is valid for the whole file. The script should read the header row first and find out the correct number of columns.
The number of columns might be passed as an argument to the script.
The number of columns can be hardcoded into the script.
I didn't narrow down the requirements at first because I was ok with any of them. Of course, the first alternative is the best but I wasn't sure if this was easy to implement or not.
Thanks for all the great answers and comments. Next time, I will state acceptable alternative requirements explicitly.
You can use this awk command to fill up all rows (from the 2nd row on) with empty cell values, based on the number of columns in the header row, to avoid hard-coding the column count:
awk 'BEGIN{FS=OFS=","} NR==1{nc=NF} NF{$nc=$nc} 1' file
entity_name,data_field_name,type
Unit,id,
Track,id,LONG
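The trick is the $nc=$nc assignment: assigning to the highest-numbered column forces awk to rebuild the record, creating any missing fields as empty strings and putting an OFS between all of them. Spelled out with comments:

awk 'BEGIN { FS = OFS = "," }   # read and write comma-separated fields
     NR == 1 { nc = NF }        # remember the column count of the header row
     NF { $nc = $nc }           # non-empty lines: touch field nc to pad the record
     1                          # condition is true, so print every line
' file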
Earlier solution:
awk 'BEGIN{FS=OFS=","} NR==1{nc=NF} {printf "%s", $0;
for (i=NF+1; i<=nc; i++) printf "%s", OFS; print ""}' file
I would use sed:
sed 's/^[^,]*,[^,]*$/&,/' file
Example:
$ echo 'Unit,id' | sed 's/^[^,]*,[^,]*$/&,/'
Unit,id,
$ echo 'Unit,id,bar' | sed 's/^[^,]*,[^,]*$/&,/'
Unit,id,bar
Try this:
$ awk -F , 'NF==2{$2=$2","}1' file
Output:
entity_name,data_field_name,type
Unit,id,
Track,id,LONG
With another awk (assigning an empty $3 creates the missing third field and forces awk to rebuild the record with the right number of separators):
awk -F, 'NF==2{$3=""}1' OFS=, yourfile.csv
To present some balance to all the awk solutions, the following could be a vim-only solution:
:v/,.*,/norm A,
Rationale:
/,.*,/ searches for 2 commas in a line
:v applies a global command to each line NOT matching the search
norm A, enters normal mode and appends a , to the end of the line
This MIGHT be all you need, depending on the info you haven't shared with us in your question:
$ awk -F, '{print $0 (NF<3?FS:"")}' file
entity_name,data_field_name,type
Unit,id,
Track,id,LONG

awk search column from one file, if match print columns from both files

I'm trying to compare column 1 from file1 with column 3 from file2; if they match, print the first column from file1 and the first two columns from file2.
Here's a sample from each file:
file1
Cre01.g000100
Cre01.g000500
Cre01.g000650
file2
chromosome_1 71569 |655|Cre01.g000500|protein_coding|CODING|PAC:26902937|1|1)
chromosome_1 93952 |765|Cre01.g000650|protein_coding|CODING|PAC:26903448|11|1)
chromosome_1 99034 |1027|Cre01.g000100 |protein_coding|CODING|PAC:26903318|9|1)
desired output
Cre01.g000100 chromosome_1 99034
Cre01.g000500 chromosome_1 71569
Cre01.g000650 chromosome_1 93952
I've been looking at various threads that are somewhat similar, but I can't seem to get it to print the columns from both files. Here are some links that are somewhat related:
awk compare 2 files, 2 fields different order in the file, print or merge match and non match lines
Obtain patterns from a file, compare to a column of another file, print matching lines, using awk
awk compare columns from two files, impute values of another column
Obtain patterns in one file from another using ack or awk or better way than grep?
Awk - combine the data from 2 files and print to 3rd file if keys matched
I feel like I should have been able to figure it out based on these threads, but I've been trying different variations of the code for two days and haven't gotten anywhere.
Here is some code that I've tried using on my files:
awk 'FNR==NR{a[$3]=$1;next;}{print $0 ($3 in a ? a[$3]:"NA")}' file1 file2
awk 'NR==FNR{ a[$1]; next} ($3 in a) {print $1 $2 a[$1]}' file1 file2
awk 'FNR==NR{a[$1]=$0; next}{print a[$1] $0}' file1 file2
I know I have to create a temporary array that contains the first column of file1 (or the 3rd column of file2), then compare it to the other file. If there is a match, print the first column from file1 and columns 1 and 2 from file2.
Thanks for the help!
You can use this awk:
awk -F '[| ]+' -v OFS='\t' 'NR==FNR{a[$4]=$1 OFS $2; next}
$1 in a{print $1, a[$1]}' file2 file1
Cre01.g000100 chromosome_1 99034
Cre01.g000500 chromosome_1 71569
Cre01.g000650 chromosome_1 93952
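For readers following along, here is the same program spread out with comments; the [| ]+ separator treats any run of pipes and spaces as one delimiter, which also absorbs the stray space after Cre01.g000100 in file2:

awk -F '[| ]+' -v OFS='\t' '
NR==FNR {                        # first file on the command line: file2
    a[$4] = $1 OFS $2            # key: the id in the 4th field; value: chromosome and position
    next
}
$1 in a { print $1, a[$1] }      # file1: print ids that were seen in file2, plus saved fields
' file2 file1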
Your middle attempt of the three is closest, but:
You haven't specified that the field delimiter is |.
You don't assign to a[$1].
Your sample code's output is inconsistent with your desired output (the code would print column 1 from file1 and column 1 from file2; the desired output is reputedly column 1 from file1 and columns 1 and 2 from file2, though this depends on interpreting $3 in file2 as the name between two pipe symbols).
Citing the question at the time this answer was created:
… compare column 1 from file1 and column 3 from file 2, if they match then print the first column from file1 and the two first columns from file2.
desired output
Cre01.g000100 chromosome_1 99034
Cre01.g000500 chromosome_1 71569
Cre01.g000650 chromosome_1 93952
We can observe that if $3 in file2 is equal to a value from file1, then it is just as easy to print $3 directly as to print a saved value.
So, fixing this up:
awk -F'|' 'NR==FNR { a[$1]=1; next } ($3 in a) { print $3, $1 }' file1 file2
The key change is the assignment to a[$1] (and the -F'|'); the rest is cosmetic and can be tweaked to suit your requirements (since the question is self-inconsistent, it is hard to give a better answer).

file comparison using awk in Linux

I have two files
File A.txt (Groupname:Groupid)
wheel:1
www:2
ftpshare:3
others:4
File B.txt (username:UserID:Groupid)
pi:1:1
useradmin:2:3
usertwo:3:3
trout:4:3
apachecri:5:2
guestthree:6:4
I need to create output that shows username:UserID:Groupname, like below:
pi:1:wheel
useradmin:2:ftpshare
(and so on)
This needs to be done using awk for a Unix class. After spending countless hours trying to figure it out, here is what I came up with:
awk -F ':' 'NR==FNR{ if ($2==[a-z]) a[$1] = $4;next} NF{ print $1, $2, $4}' fileA.txt fileB.txt
OR
awk -F, 'NR==FNR{ a[$2]=$2$1;next } NF{ print $1, $2 ((a[$2]==$2$3)?",ok":",nok") }' FileA.txt FileB.txt
Can someone help me figure this out, get the right output, and explain what I am doing wrong?
You can use awk:
awk 'BEGIN{FS=OFS=":"} FNR==NR{a[$2]=$1; next} $3 in a{print $1, $2, a[$3]}' a.txt b.txt
pi:1:wheel
useradmin:2:ftpshare
usertwo:3:ftpshare
trout:4:ftpshare
apachecri:5:www
guestthree:6:others
How it works:
BEGIN{FS=OFS=":"} - Make input and output field separator as colon
FNR==NR - Execute this block for fileA only
{a[$2]=$1; next} - Create an associative array a with key as $2 and value as $1 and then skip to next record
$3 in a - Execute this block for 2nd file if $3 is found in array a
print $1, $2, a[$3] Print field1, field2 and a[field3]
I know you said you want to use awk, but you should also consider the standard tool designed for a task like this, namely join. Here is one way you could apply it:
join -o '2.1 2.2 1.1' -t: -1 2 -2 3 <(sort -t: -k2,2n fileA.txt) \
<(sort -t: -k3,3n fileB.txt)
Because the input to join needs to be sorted on the join field, this method leaves the output unordered; if ordering is important, use anubhava's answer.
Output in this case:
pi:1:wheel
apachecri:5:www
trout:4:ftpshare
useradmin:2:ftpshare
usertwo:3:ftpshare
guestthree:6:others
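If you do want the rows back in UserID order, a simple option is to re-sort the joined output numerically on its second field, e.g.:

join -o '2.1 2.2 1.1' -t: -1 2 -2 3 <(sort -t: -k2,2n fileA.txt) \
     <(sort -t: -k3,3n fileB.txt) | sort -t: -k2,2n

which reproduces the ordered output shown in the first answer.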

How to fetch the matched items using awk and regexp?

I am trying to parse /boot/grub/grubenv but I'm really not very good at regexps.
Suppose the content of /boot/grub/grubenv is:
saved_entry=1
I want to output just the number "1". I am currently using awk, but I am open to other tools. Here is my attempt:
$ awk '/^(saved_entry=)([0-9]+)/ {print $2}' /boot/grub/grubenv
But obviously it is not working. Thanks for the help.
Specify a field separator with the -F option:
awk -F= '/^saved_entry=/ {print $2}' /boot/grub/grubenv
$1, $2, ... here represent fields (separated by =), not backreferences to capture groups.
If you want to match things, it is probably best to use match()! (Note that the three-argument form of match() used below is a GNU awk extension.)
This will work even if there are more fields afterwards, and it does not require changing the field separator (in case you are doing other things with the data).
The only drawback of this method is that it uses only the left-most match in the record, so if the data appears twice in the same record (line), it will only find the first occurrence.
awk 'match($0,/^(saved_entry=)([0-9]+)/,a){print a[2]}' file
Example
input
saved_entry=1 blah blah more stuff
output
1
Explanation
Matches the regex against $0 (the record) and then stores anything in parentheses (the capture groups) as separate array elements.
From the example, the array would hold these values:
a[0] is saved_entry=1
a[1] is saved_entry=
a[2] is 1
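If your awk is not GNU awk, the three-argument match() above will not be available. A portable sketch with plain match() plus RSTART/RLENGTH gets the same number:

awk 'match($0, /^saved_entry=[0-9]+/) {
    s = substr($0, RSTART, RLENGTH)   # the full match, e.g. saved_entry=1
    sub(/^saved_entry=/, "", s)       # strip the fixed prefix
    print s                           # leaves just the number
}' /boot/grub/grubenv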