sed not substituting as expected - regex

I need to substitute each \n in a line with "\n (double quote followed by newline).
This should work. But it does nothing. Reports no error either. Any clues anyone?
sed -i 's/\n/\"\n/' filename
before, the file contains:
line 1
line 2
after, it contains the exact same.
Thanks
Balt

A line can't contain \n, because \n is the delimiter between lines. sed operates on a single line at a time, and the newline is not included in it.
If you want to put a character before the end of each line, use the $ regexp:
sed -i 's/$/"/' filename
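A minimal sketch of the point above, run on a throwaway two-line sample rather than a real file (the variable names input, unchanged and result are only illustrative). It assumes nothing beyond a standard sed: the \n pattern finds nothing to match because the newline is not part of the pattern space, while $ anchors at the end of every line.

```shell
# Sample data standing in for the asker's file (illustrative only).
input=$(printf 'line 1\nline 2\n')

# The newline is never inside the pattern space, so this substitution is a no-op.
unchanged=$(printf '%s\n' "$input" | sed 's/\n/"\n/')

# $ matches the empty string at the end of each line, so a quote is appended there.
result=$(printf '%s\n' "$input" | sed 's/$/"/')

printf '%s\n' "$result"
```

Dropping -i, as done here, writes to standard output, which is a convenient way to check a substitution before editing a file in place.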

Try the following:
sed -i 's/$/"/' filename
Here $ denotes the end of the line.

Using awk
awk '{$0=$0"\""}1' filename

Related

Sed is not matching a backslash literal or am I doing something wrong?

I have a file which has 3 lines:
%fddfdffd
\%dffdfd
hello %12345678
I need to remove everything after a "%" character (including the "%" itself), but not if the "%" is preceded by a "\".
I tried this but it didn't work:
sed -i "s/[^\\]%.*//g"
The task is actually working on a latex file to remove the comments using sed
The file after using sed should be:
\%dffdfd
hello
I suggest with your three cases:
sed '/^%/d; /\\%/b; s/%.*//' file
Output:
\%dffdfd
hello
See: man sed
This might work for you (GNU sed):
sed -E 's/(^|[^\])%.*/\1/' file
If the line starts with a % or % follows any character other than \, delete the rest of the line.
If as a consequence the line is empty and is also to be deleted, use:
sed -E '/^%/d;s/([^\])%.*/\1/' file
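As a rough check of the last command, here is a sketch that feeds it the three lines from the question. It uses the basic-regex spelling \( \) instead of -E, purely on the assumption that this runs on more sed variants; the variable names tex and stripped are made up for the example.

```shell
# The three lines from the question.
tex=$(printf '%s\n' '%fddfdffd' '\%dffdfd' 'hello %12345678')

# First expression: delete lines that start with %.
# Second expression: cut from a % that is not preceded by a backslash to the
# end of the line, putting back the character in front of it via \1.
stripped=$(printf '%s\n' "$tex" | sed -e '/^%/d' -e 's/\([^\]\)%.*/\1/')

printf '%s\n' "$stripped"
```

The space in front of the comment on the last line is kept, so that line comes out as "hello " with a trailing space. A % that follows an escaped backslash (\\%) would also be left alone, which is probably not what LaTeX itself would do.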

Remove new line only if after a number

I've collected some CSV data from terminal but every line is only 80 characters long so it's not importing properly.
Here's two lines of data:
28,26166,25180,23645,22824,21257,20080,18921,17893,16702,15650,14647,13667,12691
,11971,11179,10393,9885,9294,8930,8390,8079,7660,7341,6907,6425,6120,5789,5588,5
267,4924,4581,4246,4025,3857,

3423,3567,3636,3633,3714,3844,4543,5887,7287,8499,9
746,10704,11658,12591,13379,13950,14679,14954,14756,14224,13921,13494,12849,1230
0,11970,12240,12867,13475,14310,15962,17624,19105,21075,
I want to remove the newline character only if it comes after a digit or comma, but not if it is on its own, since that marks a new line of CSV data.
I couldn't figure out how to do this on shell with sed. If any other program like awk or perl is better for this scenario then feel free to show me a solution for that.
Expected output:
28,26166,25180,23645,22824,21257,20080,18921,17893,16702,15650,14647,13667,12691,11971,11179,10393,9885,9294,8930,8390,8079,7660,7341,6907,6425,6120,5789,5588,5267,4924,4581,4246,4025,3857,
3423,3567,3636,3633,3714,3844,4543,5887,7287,8499,9746,10704,11658,12591,13379,13950,14679,14954,14756,14224,13921,13494,12849,12300,11970,12240,12867,13475,14310,15962,17624,19105,21075,
Just remove the newline if it's preceded by a digit or comma:
perl -pe 'chomp if /[\d,]$/' input-file > output-file
-p reads the input line by line and prints the result
chomp removes newline if present at the end
\d matches a digit
$ matches the end of line
With awk, by reading in paragraph mode and removing every \n:
$ awk -v RS= '{gsub("\n","")} 1' ip.txt
28,26166,25180,23645,22824,21257,20080,18921,17893,16702,15650,14647,13667,12691,11971,11179,10393,9885,9294,8930,8390,8079,7660,7341,6907,6425,6120,5789,5588,5267,4924,4581,4246,4025,3857,
3423,3567,3636,3633,3714,3844,4543,5887,7287,8499,9746,10704,11658,12591,13379,13950,14679,14954,14756,14224,13921,13494,12849,12300,11970,12240,12867,13475,14310,15962,17624,19105,21075,
To keep the blank lines, set ORS to a double newline; note that this adds an extra newline at the end:
$ awk -v RS= -v ORS='\n\n' '{gsub("\n","")} 1' ip.txt
28,26166,25180,23645,22824,21257,20080,18921,17893,16702,15650,14647,13667,12691,11971,11179,10393,9885,9294,8930,8390,8079,7660,7341,6907,6425,6120,5789,5588,5267,4924,4581,4246,4025,3857,

3423,3567,3636,3633,3714,3844,4543,5887,7287,8499,9746,10704,11658,12591,13379,13950,14679,14954,14756,14224,13921,13494,12849,12300,11970,12240,12867,13475,14310,15962,17624,19105,21075,
You can use this regex:
(?<!\n)\n(?!\n)
and replace with empty string.
perl -0pe 's/([\d,])\n([\d,])/$1$2/sg' (file)
should do it.
That is, read the file without line delimiters, treat the whole thing as one string and remove the newlines that are preceded and followed by a digit or comma.

How to add a line break before and after a regex in a text file?

This is an excerpt from the file I want to edit:
>chr1|-|9|S|somatic ACCACAGCCCTGTTTTACGTTGCGTCATCGCCCCGGGTGCCTGGTGACGTCACCAGCCCGCTCG >chr1|+|9|Y|somatic ACCACAGCCCTGTTTTACGTTGCGTCATCGCCCCGGGTGCCTGGTGACGTCACCAGCCCGCTCG
I would like a new text file in which a line break is added before ">" and after "somatic" or "germline". How can I do this in R or Unix?
Expected output:
>chr1|-|9|S|somatic
ACCACAGCCCTGTTTTACGTTGCGTCATCGCCCCGGGTGCCTGGTGACGTCACCAGCCCGCTCG
>chr1|+|9|Y|somatic
ACCACAGCCCTGTTTTACGTTGCGTCATCGCCCCGGGTGCCTGGTGACGTCACCAGCCCGCTCG
By the looks of your input, you could simply replace spaces with newlines:
tr -s ' ' '\n' <infile >outfile
(Some tr dialects don't like \n. Try '\012' or a literal newline: opening quote, newline, closing quote.)
If that won't work, you can easily do this in sed. If somatic is static, just hard-code it:
sed -e 's/somatic */&\n/g' -e 's/ >/\n>/g' file >newfile
The usual caveats about different sed dialects apply. Some versions don't like \n for newline, some want a newline or a semicolon instead of multiple -e arguments.
On Linux, you can modify the file in-place:
sed -i 's/somatic */&\
/g
s/ >/\
>/g' file
(For variation, I'm showing how to do this if your sed doesn't recognize \n but allows literal newlines, and how to put the script in a single multi-line string.)
On *BSD (including MacOS) you need to add an argument to -i always; sed -i '' ...
If somatic is variable, but you always want to replace the first space after a wedge, try something like
sed 's/\(>[^ ]*\) /\1\n/g'
>[^ ]* matches a wedge followed by zero or more non-space characters. The parentheses capture the matched string into \1. Again, some sed variants don't want backslashes in front of the parentheses, or are otherwise just ... different.
If you have very long lines, you might bump into a sed which has problems with that. Maybe try Perl instead. (Luckily, no dialects to worry about!)
perl -i -pe 's/(>[^ ]*) /$1\n/g;s/ >/\n>/g' file
(Skip the -i option if you don't want to modify the input file. Then output will be to standard output.)
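To see the sed variant in action without touching a file, here is a small sketch on a shortened, made-up header/sequence line (fasta, split and squeezed are illustrative names). It uses the backslash-plus-literal-newline form in the replacement, on the assumption that this is the spelling most sed dialects accept, and compares the result with the tr approach.

```shell
# Shortened stand-in for the question's input (sequences are made up).
fasta='>a|somatic ACGT >b|germline TTGA'

# First command: newline after the first word following each wedge.
# Second command: newline in front of every remaining " >".
split=$(printf '%s\n' "$fasta" | sed 's/\(>[^ ]*\) /\1\
/g
s/ >/\
>/g')

# The simpler route: turn every run of spaces into a single newline.
squeezed=$(printf '%s\n' "$fasta" | tr -s ' ' '\n')

printf '%s\n' "$split"
```

On input like this, where spaces only ever separate headers from sequences, both commands give the same four lines.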
(\bsomatic\b|\bgermline\b)|(?=>)
Try this and replace with $1\n. See the demo:
http://regex101.com/r/tF5fT5/53
If lookahead is not supported, use two passes instead. First:
(\bsomatic\b|\bgermline\b)
Replace with $1\n. See the demo:
http://regex101.com/r/tF5fT5/50
and then:
(>)
Replace with \n$1. See the demo:
http://regex101.com/r/tF5fT5/51
Thank you everyone!
I used:
tr -s ' ' '\n' <infile >outfile
as suggested by tripleee and it worked perfectly!

how to trim trailing spaces after all delimiter in a text file

Need help to remove trailing spaces after all delimiter in a text file
I have Text file with below data.
eg.
ADDRESS_ID| COUNTRY_TP_CD| RESIDENCE_TP_CD| PROV_STATE_TP_CD|ADDR_LINE_ONE|P_ADDR_LINE_ONE
885637959852960985.0| 76.0|||169 Park lane||Scottish||lane||KU|||||||2013-09-19 14:48:49.609000|
I want to remove the spaces between the delimiter and the first letter of the following word.
Any regex or unix script that can do the same. Looking for output as below:
ADDRESS_ID|COUNTRY_TP_CD|RESIDENCE_TP_CD|PROV_STATE_TP_CD|ADDR_LINE_ONE|P_ADDR_LINE_ONE
885637959852960985.0|76.0|||169 Park lane||Scottish||lane||KU|||||||2013-09-19 14:48:49.609000|
Any help will be appreciated.
awk 'BEGIN{FS=OFS="|"} {for (i=1;i<=NF;i++) gsub(/^[[:space:]]+|[[:space:]]+$/,"",$i)} 1' file
Using a perl one-liner to remove the spacing around every field. Assumes no embedded delimiters:
perl -i -lpe 's/\s*([^|]*?)\s*(\||$)/$1$2/g' file.txt
Switches:
-i: Edit <> files in place (makes backup if extension supplied)
-l: Enable line ending processing
-p: Creates a while(<>){...; print} loop for each “line” in your input file.
-e: Tells perl to execute the code on command line.
The Perl code below removes spaces at the start of a line and spaces right after the delimiter |:
$ perl -pe 's/(?<=\|) +|^ +//g' file
ADDRESS_ID|COUNTRY_TP_CD|RESIDENCE_TP_CD|PROV_STATE_TP_CD|ADDR_LINE_ONE|P_ADDR_LINE_ONE
885637959852960985.0|76.0|||169 Park lane||Scottish||lane||KU|||||||2013-09-19 14:48:49.609000|
To save the changes made to that file,
perl -i -pe 's/(?<=\|) +|^ +//g' file
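sed 's/\ //g' input.txt > output.txt
Note that this removes every space on the line, including the spaces inside field values such as 169 Park lane.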
With sed:
sed -r -e 's/(^|\|)\s+/\1/g' -e 's/\s+$//' filename
In the first expression:
(^|\|) matches the beginning of the line or a | character, and saves this in capture group 1.
\s+ matches a sequence of whitespace characters after that.
The replacement \1 substitutes capture group 1, so this deletes the whitespace at the beginning of the line and after the delimiter.
The g modifier makes it operate on all the matches in the line.
In the second expression:
\s+ again matches a sequence of whitespace
$ matches the end of the line
The replacement replaces the whole thing with an empty string, thus removing trailing spaces.
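The two expressions can be tried on a single made-up record, as in the sketch below. It swaps -r for -E and \s for [[:space:]], on the assumption that these spellings are accepted by more sed implementations; row and cleaned are illustrative names.

```shell
# Illustrative record with stray whitespace (not taken from the question's file).
row='  A| B|C d |E  '

# First expression: strip whitespace at the start of the line and after each |.
# Second expression: strip whitespace at the end of the line.
cleaned=$(printf '%s\n' "$row" |
  sed -E -e 's/(^|\|)[[:space:]]+/\1/g' -e 's/[[:space:]]+$//')

printf '%s\n' "$cleaned"
```

The space inside "C d" and the space before the following | are left alone, since neither sits directly after a delimiter or at the end of the line.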
For POSIX sed (with GNU sed, add --posix):
sed 's/^[[:space:]]*//;s/|[[:space:]]*/|/g' YourFile
This uses two substitutions, because POSIX basic regular expressions have no alternation operator (|).
The first removes leading whitespace by replacing ^[[:space:]]* with nothing.
The second replaces every pipe followed by any whitespace (|[[:space:]]*) with a bare pipe.
[[:space:]] can be replaced by a single space character if the text only contains plain spaces (ASCII 32).

How can I use regex to exclude lines with extra characters?

I have a bunch of email addresses:
abc#google.com
bdc#yahoo.com
\\ske#google.com
I'd like to delete the last line (\\ske#google.com) because it contains characters other than #, . and letters. How do I do this?
Through awk,
$ awk '/^\w+#\w+/{print}' file
abc#google.com
bdc#yahoo.com
Awk looks for lines that start with one or more word characters, followed by a # symbol, followed by one or more word characters. If it finds one, it prints the whole line.
The line \\ske#google.com does not start with a word character, so it is not printed.
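The awk pattern above only anchors the start of the line, and \w is not available in every awk. As a hedged alternative, the sketch below uses grep -E with POSIX character classes and anchors both ends, so a stray character anywhere on the line rejects it. The exact character set allowed after the # is an assumption based on the question's wording.

```shell
# The three addresses from the question.
addresses=$(printf '%s\n' 'abc#google.com' 'bdc#yahoo.com' '\\ske#google.com')

# Keep a line only if the whole line is: letters/digits, one #,
# then letters, digits or dots up to the end of the line.
valid=$(printf '%s\n' "$addresses" | grep -E '^[[:alnum:]]+#[[:alnum:].]+$')

printf '%s\n' "$valid"
```

grep -v with the same pattern would list the rejected lines instead, which can help when checking what is about to be dropped.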
You can use this sed:
sed -i.bak -n '/^[[:alnum:]]*#/p' file
You can use vim to take care of it too:
vim -c 'v/^[[:alnum:]]*#/d' -c 'wq' file
You could also use a perl module:
perl -ne 'use Email::Valid; print if Email::Valid->address($_)'