removing very first token of a file only - regex

i am not much familiar in scripting. it can be very easy problem. I want to remove first token of every file.
file 1
1 this is good
file 2
2 this is another file.
i would like to remove 1 and 2 from file 1 and file 2. how would do it? any bash command for it?

Or with awk:
$ awk '{if (NR==1) {$1="";print $0;} else print $0}' input_file
(This preserves the space at the start of the line)

Using sed and assuming you don't want to preserve a leading space:
sed '1{s/\s*\w*//}' input_file
This will works on the very first line (1{}) and uses substitute command (s/pattern/replace/) to delete the first white spaces and following word characters (\s*\w*). The word characters are [a-zA-Z0-9].

$ sed '0,/1/{s/1//}' f1
this is good

Related

Skipping a part of a line using sed

I have a file with content like so - #1: 00001109
Each line is of the same format. I want the final output to be #1: 00 00 11 09.
I used command in sed to introduce a space every 2 characters - sed 's/.\{2\}/& /g'. But that will give me spaces in the part before the colon too which I want to avoid. Can anyone advise how to proceed?
Could you please try following, written and tested with shown samples.
awk '{gsub(/../,"& ",$2);sub(/ +$/,"")} 1' Input_file
Explanation: First globally substituting each 2 digits pair with same value by appending space to it where gsub is globally substitution to perform it globally). Once this is done, using single sub to substitute last coming space with NULL to avoid spaces at last of lines.
With sed:
sed -E 's/[0-9]{2}/& /g;s/ +$//' Input_file
Explanation: Globally substituting each pair of digits with its same value and appending spaces to it. Then substituting space coming last space of line(added by previous substitution) with NULL.
This might work for you (GNU sed):
sed 's/[0-9][0-9]\B/& /g' file
After a pair of digits within a word, insert a space.
If perl happens to be your option, how about:
perl -pe '1 while s/(\d+)(\d\d)/$1 $2/g' file
you can use pure bash:
for line in "$(<your_file.txt)"; do
first=`echo $line | cut -d' ' -f1`" "
last=`echo $line | cut -d' ' -f2`
for char in `seq 0 2 ${#last}`; do
first+=${last:$char:2}" "
done;
done;

How can I print 2 lines if the second line contains the same match as the first line?

Let's say I have a file with several million lines, organized like this:
#1:N:0:ABC
XYZ
#1:N:0:ABC
ABC
I am trying to write a one-line grep/sed/awk matching function that returns both lines if the NCCGGAGA line from the first line is found in the second line.
When I try to use grep -A1 -P and pipe the matches with a match like '(?<=:)[A-Z]{3}', I get stuck. I think my creativity is failing me here.
With awk
$ awk -F: 'NF==1 && $0 ~ s{print p ORS $0} {s=$NF; p=$0}' ip.txt
#1:N:0:ABC
ABC
-F: use : as delimiter, makes it easy to get last column
s=$NF; p=$0 save last column value and entire line for printing later
NF==1 if line doesn't contain :
$0 ~ s if line contains the last column data saved previously
if search data can contain regex meta characters, use index($0,s) instead to search literally
note that this code assumes input file having line containing : followed by line which doesn't have :
With GNU sed (might work with other versions too, syntax might differ though)
$ sed -nE '/:/{N; /.*:(.*)\n.*\1/p}' ip.txt
#1:N:0:ABC
ABC
/:/ if line contains :
N add next line to pattern space
/.*:(.*)\n.*\1/ capture string after last : and check if it is present in next line
again, this assumes input like shown in question.. this won't work for cases like
#1:N:0:ABC
#1:N:0:XYZ
XYZ
This might work for you (GNU sed):
sed -n 'N;/.*:\(.*\)\n.*\1/p;D' file
Use grep-like option -n to explicitly print lines. Read two lines into the pattern space and print both if they meet the requirements. Always delete the first and repeat.
If you actual Input_file is same as shown example then following may help you too here.
awk -v FS="[: \n]" -v RS="" '$(NF-1)==$NF' Input_file
EDIT: Adding 1 more solution as per Sundeep suggestion too here.
awk -v FS='[:\n]' -v RS= 'index($NF, $(NF-1))' Input_file

Remove new line only if after a number

I've collected some CSV data from terminal but every line is only 80 characters long so it's not importing properly.
Here's two lines of data:
28,26166,25180,23645,22824,21257,20080,18921,17893,16702,15650,14647,13667,12691
,11971,11179,10393,9885,9294,8930,8390,8079,7660,7341,6907,6425,6120,5789,5588,5
267,4924,4581,4246,4025,3857,
3423,3567,3636,3633,3714,3844,4543,5887,7287,8499,9
746,10704,11658,12591,13379,13950,14679,14954,14756,14224,13921,13494,12849,1230
0,11970,12240,12867,13475,14310,15962,17624,19105,21075,
I wanna remove the newline char only if it's after any number or comma, but not if it's only on it's own, since that means it's a new line of CSV data.
I couldn't figure out how to do this on shell with sed. If any other program like awk or perl is better for this scenario then feel free to show me a solution for that.
Expected output:
28,26166,25180,23645,22824,21257,20080,18921,17893,16702,15650,14647,13667,12691,11971,11179,10393,9885,9294,8930,8390,8079,7660,7341,6907,6425,6120,5789,5588,5267,4924,4581,4246,4025,3857,
3423,3567,3636,3633,3714,3844,4543,5887,7287,8499,9746,10704,11658,12591,13379,13950,14679,14954,14756,14224,13921,13494,12849,12300,11970,12240,12867,13475,14310,15962,17624,19105,21075,
Just remove the newline if it's preceded by a digit or comma:
perl -pe 'chomp if /[\d,]$/' input-file > output-file
-p reads the input line by line and prints the result
chomp removes newline if present at the end
\d matches a digit
$ matches the end of line
With awk by reading in paragraph mode and replacing all \n
$ awk -v RS= '{gsub("\n","")} 1' ip.txt
28,26166,25180,23645,22824,21257,20080,18921,17893,16702,15650,14647,13667,12691,11971,11179,10393,9885,9294,8930,8390,8079,7660,7341,6907,6425,6120,5789,5588,5267,4924,4581,4246,4025,3857,
3423,3567,3636,3633,3714,3844,4543,5887,7287,8499,9746,10704,11658,12591,13379,13950,14679,14954,14756,14224,13921,13494,12849,12300,11970,12240,12867,13475,14310,15962,17624,19105,21075,
To leave the blanks, set ORS to double newlines, however this will add an extra newline at end
$ awk -v RS= -v ORS='\n\n' '{gsub("\n","")} 1' ip.txt
28,26166,25180,23645,22824,21257,20080,18921,17893,16702,15650,14647,13667,12691,11971,11179,10393,9885,9294,8930,8390,8079,7660,7341,6907,6425,6120,5789,5588,5267,4924,4581,4246,4025,3857,
3423,3567,3636,3633,3714,3844,4543,5887,7287,8499,9746,10704,11658,12591,13379,13950,14679,14954,14756,14224,13921,13494,12849,12300,11970,12240,12867,13475,14310,15962,17624,19105,21075,
You can use this regex:
(?<!\n)\n(?!\n)
and replace with empty string.
perl -0pe 's/([\d,])\n([\d,])/$1$2/sg' (file)
should do it.
That is, read the file without line delimiters, treat the whole thing as one string and remove the newlines that are preceded and followed by a digit or comma.

replace a pipe delimiter with a space using awk or sed

I have a pipe delimited file with a sample lines like below;
/u/chaintrk/bri/sh/picklist_autoprint.sh|-rwxrwxr-x|bdr|bdr|2665|Oct|23|14:04|3919089454
/u/chaintrk/bri/sh/generate_ct2020.pl|-rwxrwxr-x|bdr|bdr|15916|Oct|23|14:04|957147508
is there a way that awk or sed can transform the lines into the output like below where the pipe between the month and the date was replaced by space?
/u/chaintrk/bri/sh/picklist_autoprint.sh|-rwxrwxr-x|bdr|bdr|2665|Oct 23|14:04|3919089454
/u/chaintrk/bri/sh/generate_ct2020.pl|-rwxrwxr-x|bdr|bdr|15916|Oct 23|14:04|957147508
With GNU sed:
sed -E 's/(\|[A-Z][a-z]{2})\|([0-9]{1,2}\|)/\1 \2/' file
Output:
/u/chaintrk/bri/sh/picklist_autoprint.sh|-rwxrwxr-x|bdr|bdr|2665|Oct 23|14:04|3919089454
/u/chaintrk/bri/sh/generate_ct2020.pl|-rwxrwxr-x|bdr|bdr|15916|Oct 23|14:04|957147508
If you want to edit file "in place" add sed's option -i.
Yes, it is possible to change a "|" with an space.
The real problem is to identify which of the field(s) to change.
Are those always the 6th and 7th? If so, this works:
awk -vFS='|' '{sub($6"|"$7,$6" "$7)}1' file
Are those with a text Upper-lower-lower followed by a 1 or 2 digits?
If so, this other works:
gawk '{c="[|]([[:upper:]][[:lower:]]{2})[|]([0-9]{1,2})[|]";print gensub(c,"|\\1 \\2|",1,$0)}' file

Replace strings with double quotes in a XML file

I have a huge XML file with longer lines (5000-10000 characters per line) with following text:
Pattern="abc"
and I want to replace it with
Pattern="def"
As the line sizes are huge, I have no choice but to use awk. Please suggest how this can be achieved. I tried with the below but it is not working:
CMD="{sub(\"Pattern=\"abc\"\",\"Pattern=\"def\"\"); print}"
echo "$CMD"
awk "$CMD" "Some File Name.xml"
Any help is highly appreciated.
one suggestion with awk
BEGIN {FS="\""; OFS=""}
/Pattern="abc"/{$2="\"def\""}1
I don't understand why you said "As the line sizes are huge, I have no choice but to use awk". AFAIK sed is no more limited on line length than awk is and since this is a simple substitution on a single line, sed is the better choice of tool:
$ cat file
Pattern="abc"
$ sed -r 's/(Pattern=")[^"]+/\1def/' file
Pattern="def"
If the pattern occurs multiple times on the line, add a "g" to the end of the line.
Since you mention in your comment being stuck with a sed that can't handle long lines, let's assume you can't install GNU tools so you'll need a non-GNU awk solution like this:
$ awk '{sub(/Pattern="[^"]+/,"Pattern=\"def")}1' file
Pattern="def"
If you LITERALLY mean you only want to replace Pattern="abc" then just do:
$ awk '{sub(/Pattern="abc"/,"Pattern=\"def\"")}1' file
Pattern="def"
If You have bash you can try this:
Create file with long lines (>10_000 chars):
for((i=0;i<2500;++i));{ s="x$s";}
l="${s}Pattern=\"abc\"$s"
for i in {1..5}; { echo "$l$l";} >infile
The script:
while read x; do echo "${x//Pattern=\"abc\"/Pattern=\"def\"}";done <infile
This replaces all occurrences of Pattern="abc" to Pattern="def" in each line.