match variable string at end of field with awk - regex

Yet again my unfamiliarity with AWK lets me down, I can't figure out how to match a variable at the end of a line?
This would be fairly trivial with grep etc, but I'm interested in matching integers at the end of a string in a specific field of a tsv, and all the posts suggest (and I believe it to be the case!) that awk is the way to go.
If I want to just match a single one explicity, that's easy:
Here's my example file:
PVClopT_11 PAU_02102 PAU_02064 1pqx 1pqx_A 37.4 13 0.00035 31.4 >1pqx_A Conserved hypothetical protein; ZR18,structure, autostructure,spins,autoassign, northeast structural genomics consortium; NMR {Staphylococcus aureus subsp} SCOP: d.267.1.1 PDB: 2ffm_A 2m6q_A 2m8w_A No DOI found.
PVCpnf_18 PAK_3526 PAK_03186 3fxq 3fxq_A 99.7 2.7e-21 7e-26 122.2 >3fxq_A LYSR type regulator of TSAMBCD; transcriptional regulator, LTTR, TSAR, WHTH, DNA- transcription, transcription regulation; 1.85A {Comamonas testosteroni} PDB: 3fxr_A* 3fxu_A* 3fzj_A 3n6t_A 3n6u_A* 10.1111/j.1365-2958.2010.07043.x
PVCunit1_19 PAU_02807 PAU_02793 3kx6 3kx6_A 19.7 45 0.0012 31.3 >3kx6_A Fructose-bisphosphate aldolase; ssgcid, NIH, niaid, SBRI, UW, emerald biostructures, glycolysis, lyase, STRU genomics; HET: CIT; 2.10A {Babesia bovis} No DOI found.
PVClumt_17 PAU_02231 PAU_02190 3lfh 3lfh_A 39.7 12 0.0003 28.9 >3lfh_A Manxa, phosphotransferase system, mannose/fructose-speci component IIA; PTS; 1.80A {Thermoanaerobacter tengcongensis} No DOI found.
PVCcif_11 plu2521 PLT_02558 3h2t 3h2t_A 96.6 2.6e-05 6.7e-10 79.0 >3h2t_A Baseplate structural protein GP6; viral protein, virion; 3.20A {Enterobacteria phage T4} PDB: 3h3w_A 3h3y_A 10.1016/j.str.2009.04.005
PVCpnf_16 PAU_03338 PAU_03377 5jbr 5jbr_A 29.2 22 0.00058 23.9 >5jbr_A Uncharacterized protein BCAV_2135; structural genomics, PSI-biology, midwest center for structu genomics, MCSG, unknown function; 1.65A {Beutenbergia cavernae} No DOI found.
PVCunit1_17 PAK_2892 PAK_02622 1cii 1cii_A 63.2 2.7 6.9e-05 41.7 >1cii_A Colicin IA; bacteriocin, ION channel formation, transmembrane protein; 3.00A {Escherichia coli} SCOP: f.1.1.1 h.4.3.1 10.1038/385461a0
PVCunit1_11 PAK_2886 PAK_02616 3h2t 3h2t_A 96.6 1.9e-05 4.9e-10 79.9 >3h2t_A Baseplate structural protein GP6; viral protein, virion; 3.20A {Enterobacteria phage T4} PDB: 3h3w_A 3h3y_A 10.1016/j.str.2009.04.005
PVCpnf_11 PAU_03343 PAU_03382 3h2t 3h2t_A 97.4 4.4e-07 1.2e-11 89.7 >3h2t_A Baseplate structural protein GP6; viral protein, virion; 3.20A {Enterobacteria phage T4} PDB: 3h3w_A 3h3y_A 10.1016/j.str.2009.04.005
PVCunit1_5 afp5 PAU_02779 4tv4 4tv4_A 63.6 2.6 6.7e-05 30.5 >4tv4_A Uncharacterized protein; unknown function, ssgcid, virulence, structural genomics; 2.10A {Burkholderia pseudomallei} No DOI found.
And I can pull out all the lines which have a "_11" at the end of the first column by running the following on the commandline:
awk '{ if ($1 ~ /_11$/) { print } }' 02052017_HHresults_sorted.tsv
I want to enclose this in a loop to cover all integers from 1 - 5 (for instance), but I'm having trouble passing a variable in to the text match.
I expect it should be something like the following, but $i$ seems like its probably incorrect and by google-fu failed me:
awk 'BEGIN{ for (i=1;i<=5;i++){ if ($1 ~ /_$i$/) { print } } }' 02052017_HHresults_sorted.tsv
There may be other issues I haven't spotted with that awk command too, as I say, I'm not very awk-savvy.
EDIT FOR CLARIFICATION
I want to separate out all the matches, so can't use a character class. i.e. I want all the lines ending in "_1" in one file, then all the ones ending in "_2" in another, and so on (hence the loop).

You can't put variables inside //. Use string concatenation, which is done by simply putting the strings adjacent to each other in awk. You don't need to use a regexp literal when you use the ~ operator, it always treats the second argument as a regexp.
awk '{ for (i = 1; i <= 5; i++) {
if ( $1 ~ ("_" i "$") ) { print; break; }
}' 02052017_HHresults_sorted.tsv

It sounds like you're thinking about this all wrong and what you really need is just (with GNU awk for gensub()):
awk '{ print > ("out" gensub(/.*_/,"",1,$1)) }' 02052017_HHresults_sorted.tsv
or with any awk:
awk '{ n=$1; sub(/.*_/,"",n); print > ("out" n) }' 02052017_HHresults_sorted.tsv

No need to loop, use regex character class [..]:
awk 'match($1,/_([1-5])$/,a){ print >> a[1]".txt" }' 02052017_HHresults_sorted.tsv

Related

How do I retrieve values from successive lines in perl?

I have this data below,called data.txt, I want to retrieve four columns from this data. First, I want to retrieve degradome category, then p-value, then the text before and after Query:. So the result should look like this(showing the first row only):
Degardome Category: 3 Degradome p-value: 0.0195958324320822 3' UGACGUUUCAGUUCCCAGUAU 5' Seq_3694_200
data.txt:
5' CCGGUAAGGUUAUGGGUCAUG 3' Transcript: Supercontig_2.8_1446328:1451-1471 Slice Site:1462
|o||o||o| |||||||o
3' UGACGUUUCAGUUCCCAGUAU 5' Query: Seq_3694_200
SiteID: Supercontig_2.8_1446328:1462
MFE of perfect match: -36.10
MFE of this site: -23.60
MFEratio: 0.653739612188366
Allen et al. score: 7.5
Paired Regions (query5'-query3',transcript3'-transcript5')
1-8,1471-1464
10-18,1462-1454
Unpaired Regions (query5'-query3',transcript3'-transcript5')
9-9,1463-1463 SIL: Symmetric internal loop
19-21,1453-1451 UP3: Unpaired region at 3' of query
Degradome data file: /media/owner/newdrive/phasing/degradome/_degradome.20171210/bbduk_trimmed/merged_HV2.fasta_dd.txt
Degardome Category: 3
Degradome p-value: 0.0195958324320822
T-Plot file: T-plots-IGR/Seq_3694_200_Supercontig_2.8_1446328_1462_TPlot.pdf
Position Reads Category
1462 4 3 <<<<<<<<<<
2949 7 3
4179 517 0
---------------------------------------------------
---------------------------------------------------
5' GGUGAGGAGGGGGGUUUG-GUC 3' Transcript: Supercontig_2.8_1511075:1311-1331 Slice Site:1323
| |||||oo||| |||o |||
3' AC-CUCCUUUCCCGAAAUACAG 5' Query: Seq_2299_664
SiteID: Supercontig_2.8_1511075:1323
MFE of perfect match: -37.90
MFE of this site: -25.30
MFEratio: 0.66754617414248
Allen et al. score: 8
Paired Regions (query5'-query3',transcript3'-transcript5')
1-3,1331-1329
5-8,1328-1325
10-19,1323-1314
20-20,1312-1312
Unpaired Regions (query5'-query3',transcript3'-transcript5')
4-4,x-x BULq: Bulge on query side
9-9,1324-1324 SIL: Symmetric internal loop
x-x,1313-1313 BULt: Bulge on transcript side
21-21,1311-1311 UP3: Unpaired region at 3' of query
Degradome data file: /media/owner/newdrive/phasing/degradome/_degradome.20171210/bbduk_trimmed/merged_HV2.fasta_dd.txt
Degardome Category: 4
Degradome p-value: 0.013385336399181
I tried to do this for before and after values, then I keep getting errors. Sorry I am new to perl and would really appreciate your help. Here are some of the codes I tried:
#!/usr/bin/perl
use warnings;
use strict;
use LWP::Simple;
use Modern::Perl;
my word = "Query:";
my $filename = $ARGV[0];
open(INPUT_FILE, $filename);
while (<<>>) {
chomp;
my ($before, $after) = m/(.+)(?:\t\Q$word\E:\t)(.+)/i;
say "word: $word\tbefore: $before\tafter: $after";
}
Since you need straight pieces of data from each section, and both sections and data come clearly demarcated, the only question is of what data structure to use. Given that you want mere lines with values collected from each section a simple array should be fine.
It is known that the phrases of interest, Query: then Degardome Category: N then p-value, are unique to the context and places shown in the sample.
use warnings;
use strict;
use feature 'say';
my $file = shift || die "Usage $0 file\n";
open my $fh, '<', $file or die "Can't open $file: $!";
my (#res, #query, $category, $pvalue);
while (<$fh>) {
next if not /\S/;
if (/(.*?)\s+Query:\s+(.*)/) {
#query = ($1, $2);
next;
}
if (/^\s*(Degardome Category:\s+[0-9]+)/) {
$category = $1;
}
elsif (/^\s*(Degradome p-value:\s+[0-9.]+)/) {
$pvalue = $1;
push #res, [$category, $pvalue, #query];
}
}
say "#$_" for #res;
The end of a section is detected with the p-value: line, at which point we add to the #res an arrayref with all needed values captured up to that point.
The regex throughout depends on properties of data seen in the sample. Please review and adjust if some of my assumptions aren't right.
Details can also be pried from data more precisely, even by simply adding capture groups to the regexes above (and saving those captures into additional data structures).

Extracting part of lines with specific pattern and sum the digits using bash

I am just learning bash scripting and commands and i need some help with this assignment.
I have txt file that contains the following text and i need to:
Extract guest name ( 1.1.1 ..)
Sum guest result and output the guest name with result.
I used sed with simple regex to extract out the name and the digits but i have no idea about how to summarize the numbers becuase the guest have multiple lines record as you can see in the txt file. Note: i can't use awk for processing
Here is my code:
cat file.txt | sed -E 's/.*([0-9]{1}.[0-9]{1}.[0-9]{1}).*([0-9]{1})/\1 \2/'
And result is:
1.1.1 4
2.2.2 2
1.1.1 1
3.3.3 1
2.2.2 1
Here is the .txt file:
Guest 1.1.1 have "4
Guest 2.2.2 have "2
Guest 1.1.1 have "1
Guest 3.3.3 have "1
Guest 2.2.2 have "1
and the output should be:
1.1.1 = 5
2.2.2 = 3
3.3.3 = 1
Thank you in advance
I know your teacher wont let you use awk but, since beyond this one exercise you're trying to learn how to write shell scripts, FYI here's how you'd really do this job in a shell script:
$ awk -F'[ "]' -v OFS=' = ' '{sum[$2]+=$NF} END{for (id in sum) print id, sum[id]}' file
3.3.3 = 1
2.2.2 = 3
1.1.1 = 5
and here's a bash builtins equivalent which may or may not be what you've covered in class and so may or may not be what your teacher is expecting:
$ cat tst.sh
#!/bin/env bash
declare -A sum
while read -r _ id _ cnt; do
(( sum[$id] += "${cnt#\"}" ))
done < "$1"
for id in "${!sum[#]}"; do
printf '%s = %d\n' "$id" "${sum[$id]}"
done
$ ./tst.sh file
1.1.1 = 5
2.2.2 = 3
3.3.3 = 1
See https://www.artificialworlds.net/blog/2012/10/17/bash-associative-array-examples/ for how I'm using the associative array. It'll be orders of magnitude slower than the awk script and I'm not 100% sure it's bullet-proof (since shell isn't designed to process text there are a LOT of caveats and pitfalls) but it'll work for the input you provided.
OK -- since this is a class assignment, I will tell you how I did it, and let you write the code.
First, I sorted the file. Then, I read the file one line at a time. If the name changed, I printed out the previous name and count, and set the count to be the value on that line. If the name did not change, I added the value to the count.
Second solution used an associative array to hold the counts, using the guest name as the index. Then you just add the new value to the count in the array element indexed on the guest name.
At the end, loop through the array, print out the indexes and values.
It's a lot shorter.

Replace empty CSV value with NULL using sed

I have an CSV file which looks like this:
16949839,49.5474463,8.6692215,4,31336605,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Schau- und Sichtungsgarten Hermannshof,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,,
17025149,49.5444114,8.6715051,2,670583,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Voliere,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,,
24557757,50.0550103,11.8494971,5,8289559,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1000-Meter-Stein,stone,,,,,,,,,,,,,,,,,,,attraction,,,,,,,,
25505794,50.0407824,11.8647266,5,7301040,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Nußhardtstube,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,,
25631399,49.8270356,11.3753338,9,39834385,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sophienhöhle,cave_entrance,,,,,,,,,,,,,,,,,,,attraction,,,,,,,,
25932371,50.0527832,11.2319208,2,6747603,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Gaaskerng,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,,
26309933,50.1225540,11.3759787,2,11211206,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Staafelsn,cliff,,,,,,,,,,,,,,,,,,,attraction,,,,,,,,
26945605,49.7668812,11.4281553,7,39554170,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Großes Hasenloch,cave_entrance,,,,,,,,,,,,,,,,,,,attraction,,,,,,,,
27133657,49.6351754,8.4938556,2,1638377,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Schießbuckel,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,,
27133658,49.6339946,8.4853839,1,9218,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Grillhütte,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,,
27153008,50.1229239,11.4117973,1,7826,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Schwedenschanze,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,,
27374034,51.5455027,12.9933573,2,4177298,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Sommer-Rodelbahn,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,,
27440567,50.7319121,7.0993708,10,11317500,,,,,,,,bus_station,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Busbahnhof Bonn,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,,
27543115,51.8473136,12.2314568,8,19447141,,,,,,,,,,,,,,,,,,,,,,,,,,,,,building,,,,,,,,,,,Weißer Bogen,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,,
28788682,51.3384522,10.8616651,9,31145828,,,,,,,,,,,,,,,,,,,,,,,430,,,,,,,,,,,,,,,,,Freizeit und Erholungpark "Zum Possen",,,,,,,,,,,,,,,,,,,,attraction,,,,,,,de:Possen (Sondershausen),
And I want to replace every empty CSV value with NULL. Therefor I'm using sed:
sed -r 's;^,|,$;NULL,;g
:l
s;,,;,NULL,;g
t l'
Which creates this:
16949839,49.5474463,8.6692215,4,31336605,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Schau- und Sichtungsgarten Hermannshof,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
17025149,49.5444114,8.6715051,2,670583,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Voliere,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
24557757,50.0550103,11.8494971,5,8289559,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,1000-Meter-Stein,stone,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
25505794,50.0407824,11.8647266,5,7301040,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Nußhardtstube,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
25631399,49.8270356,11.3753338,9,39834385,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Sophienhöhle,cave_entrance,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
25932371,50.0527832,11.2319208,2,6747603,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Gaaskerng,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
26309933,50.1225540,11.3759787,2,11211206,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Staafelsn,cliff,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
26945605,49.7668812,11.4281553,7,39554170,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Großes Hasenloch,cave_entrance,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
27133657,49.6351754,8.4938556,2,1638377,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Schießbuckel,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
27133658,49.6339946,8.4853839,1,9218,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Grillhütte,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
27153008,50.1229239,11.4117973,1,7826,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Schwedenschanze,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
27374034,51.5455027,12.9933573,2,4177298,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Sommer-Rodelbahn,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
27440567,50.7319121,7.0993708,10,11317500,NULL,NULL,NULL,NULL,NULL,NULL,NULL,bus_station,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Busbahnhof Bonn,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
27543115,51.8473136,12.2314568,8,19447141,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,building,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Weißer Bogen,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
28788682,51.3384522,10.8616651,9,31145828,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,430,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Freizeit und Erholungpark "Zum Possen",NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,de:Possen (Sondershausen)NULL,
Which is nearly what I want. The problem is that in the last line it adds NULL after de:Possen (Sondershausen) which is not what I want. Can someone tell me what is wrong with my sed command?
You can use this sed to replace all ,, with NULL or if , is at start, using a recursive label:
sed 's/^,/NULL,/; :a;s/,,/,NULL,/g;ta' file
This will not add NULL after trailing , in each line.
Output:
16949839,49.5474463,8.6692215,4,31336605,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Schau- und Sichtungsgarten Hermannshof,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
17025149,49.5444114,8.6715051,2,670583,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Voliere,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
24557757,50.0550103,11.8494971,5,8289559,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,1000-Meter-Stein,stone,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
25505794,50.0407824,11.8647266,5,7301040,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Nußhardtstube,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
25631399,49.8270356,11.3753338,9,39834385,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Sophienhöhle,cave_entrance,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
25932371,50.0527832,11.2319208,2,6747603,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Gaaskerng,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
26309933,50.1225540,11.3759787,2,11211206,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Staafelsn,cliff,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
26945605,49.7668812,11.4281553,7,39554170,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Großes Hasenloch,cave_entrance,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
27133657,49.6351754,8.4938556,2,1638377,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Schießbuckel,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
27133658,49.6339946,8.4853839,1,9218,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Grillhütte,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
27153008,50.1229239,11.4117973,1,7826,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Schwedenschanze,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
27374034,51.5455027,12.9933573,2,4177298,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Sommer-Rodelbahn,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
27440567,50.7319121,7.0993708,10,11317500,NULL,NULL,NULL,NULL,NULL,NULL,NULL,bus_station,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Busbahnhof Bonn,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
27543115,51.8473136,12.2314568,8,19447141,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,building,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Weißer Bogen,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
28788682,51.3384522,10.8616651,9,31145828,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,430,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Freizeit und Erholungpark "Zum Possen",NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,de:Possen (Sondershausen),
I know the question wasn't tagged for awk, but for reference, here is a script doing the job:
awk -F, -v OFS=',' '{for(i=1;i<NF;i++) if(!$i)$i="NULL"}1' file
-F, -v OFS=',' set the input and output delimeter to ,, aka field separator.
for(i=1;i<NF;i++) is iterating through all fields except the last one.
if(!$i)$i="NULL" is setting the parameter to the string "NULL" if the parameter is empty.
The 1 at the end of the command is default action in awk: print the whole line.
This awk script will also do the same job
awk 'BEGIN{FS=",";OFS=","}
{
for(i=1;i<=(NF-1);i++)
{
if($i == ""){
$i="NULL"
}
}
print
}' file
Output
16949839,49.5474463,8.6692215,4,31336605,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Schau- und Sichtungsgarten Hermannshof,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
17025149,49.5444114,8.6715051,2,670583,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Voliere,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
24557757,50.0550103,11.8494971,5,8289559,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,1000-Meter-Stein,stone,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
25505794,50.0407824,11.8647266,5,7301040,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Nußhardtstube,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
25631399,49.8270356,11.3753338,9,39834385,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Sophienhöhle,cave_entrance,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
25932371,50.0527832,11.2319208,2,6747603,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Gaaskerng,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
26309933,50.1225540,11.3759787,2,11211206,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Staafelsn,cliff,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
26945605,49.7668812,11.4281553,7,39554170,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Großes Hasenloch,cave_entrance,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
27133657,49.6351754,8.4938556,2,1638377,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Schießbuckel,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
27133658,49.6339946,8.4853839,1,9218,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Grillhütte,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
27153008,50.1229239,11.4117973,1,7826,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Schwedenschanze,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
27374034,51.5455027,12.9933573,2,4177298,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Sommer-Rodelbahn,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
27440567,50.7319121,7.0993708,10,11317500,NULL,NULL,NULL,NULL,NULL,NULL,NULL,bus_station,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Busbahnhof Bonn,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
27543115,51.8473136,12.2314568,8,19447141,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,building,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Weißer Bogen,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
28788682,51.3384522,10.8616651,9,31145828,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,430,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,Freizeit und Erholungpark "Zum Possen",NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,attraction,NULL,NULL,NULL,NULL,NULL,NULL,de:Possen (Sondershausen),
What happens here
In the for-loop,we iterate through each field using the delimiter ,
If any field is empty ie if($i == "") we assign the field to null
Since csv file ends with , we don't have to process the last record, so the for loop has i<=(NF-1)

Print remaining lines in file after regular expression that includes variable

I have the following data:
====> START LOG for Background Process: HRBkg Hello on 2013/09/27 23:20:20 Log Level 3 09/27 23:20:20 I Background process is using
processing model #: 3 09/27 23:20:23 I 09/27 23:20:23 I --
Started Import for External Key
====> START LOG for Background Process: HRBkg Hello on 2013/09/30 07:31:07 Log Level 3 09/30 07:31:07 I Background process is using
processing model #: 3 09/30 07:31:09 I 09/30 07:31:09 I --
Started Import for External Key
I need to extract the remaining file contents after the LAST match of ====> START LOG.....
I have tried numerous times to use sed/awk, however, I can not seem to get awk to utilize a variable in my regular expression. The variable I was trying to include was for the date (2013/09/30) since that is what makes the line unique.
I am on an HP-UX machine and can not use grep -A.
Any advice?
There's no need to test for a specific time just to find the last entry in the file:
awk '
BEGIN { ARGV[ARGC] = ARGV[ARGC-1]; ARGC++ }
NR == FNR { if (/START LOG/) lastMatch=NR; next }
FNR == lastMatch { found=1 }
found
' file
This might work for you (GNU sed):
a=2013/09/30
sed '\|START LOG.*'"$a"'|{h;d};H;$!d;x' file
This will return your desired output.
sed -n '/START LOG/h;/START LOG/!H;$!b;x;p' file
If you have tac available, you could easily do..
tac <file> | sed '/START LOG/q' | tac
Here is one in Python:
#!/usr/bin/python
import sys, re
for fn in sys.argv[1:]:
with open(fn) as f:
m=re.search(r'.*(^====> START LOG.*)',f.read(), re.S | re.M)
if m:
print m.group(1)
Then run:
$ ./re.py /tmp/log.txt
====> START LOG for Background Process: HRBkg Hello on 2013/09/30 07:31:07 Log Level 3
09/30 07:31:07 I Background process is using processing model #: 3
09/30 07:31:09 I
09/30 07:31:09 I -- Started Import for External Key
If you want to exclude the ====> START LOGS.. bit, change the regex to:
r'.*(?:^====> START LOG.*?$\n)(.*)'
For the record, you can easily match a variable against a regular expression in Awk, or vice versa.
awk -v date='2013/09/30' '$0 ~ date {p=1} p' file
This sets p to 1 if the input line matches the date, and prints if p is non-zero.
(Recall that the general form in Awk is condition { actions } where the block of actions is optional; if omitted, the default action is to print the current input line.)
This prints the last START LOG, it set a flag for the last block and print it.
awk 'FNR==NR { if ($0~/^====> START LOG/) f=NR;next} FNR>=f' file file
You can use a variable, but if you have another file with another date, you need to know the date in advance.
var="2013/09/30"
awk '$0~v && /^====> START LOG/ {f=1}f' v="$var" file
====> START LOG for Background Process: HRBkg Hello on 2013/09/30 07:31:07 Log Level 3
09/30 07:31:07 I Background process is using processing model #: 3
09/30 07:31:09 I
09/30 07:31:09 I -- Started Import for External Key
With GNU awk (gawk) or Mikes awk (mawk) you can set the record separator (RS) so that each record will contain a whole log message. So all you need to do is print the last one in the END block:
awk 'END { printf "%s", RS $0 }' RS='====> START LOG' infile
Output:
====> START LOG for Background Process: HRBkg Hello on 2013/09/30 07:31:07 Log Level 3
09/30 07:31:07 I Background process is using processing model #: 3
09/30 07:31:09 I
09/30 07:31:09 I -- Started Import for External Key
Answer in perl:
If your logs are in assume filelog.txt.
my #line;
open (LOG, "<filelog.txt") or "die could not open filelog.tx";
while(<LOG>) {
#line = $_;
}
my $lengthline = $#line;
my #newarray;
my $j=0;
for(my $i= $lengthline ; $i >= 0 ; $i++) {
#newarray[$j] = $line[$i];
if($line[$i] =~ m/^====> START LOG.*/) {
last;
}
$j++;
}
print "#newarray \n";

How do I replace G with white space?

I have to check the free space on a disk volume and am using the command df -h, which produces this output:
Filesystem Size Used Avail Use% Mounted on
/dev/md0 27G 24G 2.1G 92% /
udev 506M 112K 506M 1% /dev
/dev/sda1 102M 40M 63M 39% /boot
This piece of script
my (#space, #freesp);
#space = grep /\/dev\/md0/,`df -h`;
print "#space\n";
is giving me
/dev/md0 27G 24G 2.1G 92% /
Which I can split on whitespace giving the values in $freesp[0], $freesp[1] etc.
But I want to remove or replace that G with white space, so that I can compare $freesp[3] with some value and can proceed with my script.
So what is the regex I have to use for this to happen, or is there any other better way to do this?
This code is working for me, but I'm looking in a direction I have mentioned.
#!/usr/bin/perl
use strict;
use warnings;
my (#space, #freesp);
#space = grep /\/dev\/md0/, `df`;
print "#space\n";
for (#space) {
chomp;
#freesp = split /\s+/, $_;
}
if ($freesp[3]/1024/1024 < 2.0) {
print "Space is less\n";
}
else {
print "Space is OK\n";
}
#print ($freesp[3]/1024/1024);
The real solution is not to use the -h option to df if you intend to parse the output with a script. That way, the output won't contain the "human-readable" K / M / G suffixes in the first place.
You may wish to instead use the -B1 option, which causes all sizes to be reported in bytes.
Edit: To answer the literal question, you could remove the suffixes and rescale the values appropriately like this:
my %units = (K => 2**10, M => 2**20, G => 2**30, T => 2**40,
P => 2**50, E => 2**60, Z => 2**70, Y => 2**80);
s/^(\d+(?:\.\d+)?)([KMGTPEZY])$/$1 * $units{$2}/e for #freesp[1 .. 3];
However, using the -B1 switch instead of -h would give the same result more easily, not to mention more accurately and reliably.