Remove new lines except when preceded by specific set of characters - regex

How can I remove newlines using Perl and/or sed at the bash command line, except after a specific set of characters?
The closest I have come is:
perl -C -i -p -e 's/[^.:]\n//' ~/Desktop/bak2
The above code correctly leaves alone lines that end with a dot or a colon, but it fails in another way: when it removes a newline, it also erases the last character of the line, because [^.:] is part of the match. I also need each removed \n to be replaced by a space.
It would be great, if possible, to have a solution in Perl and also in sed.
I've searched for a similar solution in Perl or sed and haven't found one; sorry if it already exists.
Examples:
Existing content:
Violets are blue and
Buda has great teachings.
Programming can be easy because:
Stackoverflow exists,
and the community always helps
a lot.
Desired output:
Violets are blue and Buda has great teachings.
Programming can be easy because:
Stackoverflow exists, and the community always helps a lot.

With sed
sed -e ':A;/[^.:]$/{N;bA' -e '};y/\n/ /' ~/Desktop/bak2
or, with GNU sed:
sed -z 's/\([^.:]\)\n/\1 /g' ~/Desktop/bak2
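As a rough check of the loop-based approach, here is a sketch that runs a variant of it on the sample text from the question. The script is split across several -e options, and the `$!` guard around `N` is an addition not present in the answer: it is there so that a final line without a dot or colon still reaches the `y` command instead of ending the script at `N`.

```shell
# Sample input copied from the question.
input='Violets are blue and
Buda has great teachings.
Programming can be easy because:
Stackoverflow exists,
and the community always helps
a lot.'

# While the pattern space does not end in "." or ":", append the next line (N)
# and loop; afterwards turn every embedded newline into a space.
result=$(printf '%s\n' "$input" |
  sed -e ':A' -e '/[^.:]$/{' -e '$!{' -e 'N' -e 'bA' -e '}' -e '}' -e 'y/\n/ /')

printf '%s\n' "$result"
```

This should print the three lines shown under "Desired output".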

You can capture the character before the newline and put it back, followed by the space you asked for (the ^ alternative also handles empty lines):
perl -C -i -p -e 's/(^|[^.:])\n/$1 /' ~/Desktop/bak2
or use a positive lookbehind:
perl -C -i -p -e 's/(?<=[^.:])\n/ /' ~/Desktop/bak2

perl -i -pe 's/[^.:]\K\n/ /' ~/Desktop/bak2

Related

Why does `perl -i -p0e <expression>` work, but not `perl -0 -pie <expression>`?

If I try perl -pie 's/foo/bar/' file.txt it works as expected: the find-replace expression is executed, and the result is saved to the original file.
However, if I want to use the -0 to run an expression that includes newlines, simply prepending the option doesn't work:
$ perl -0 -pie 's/foo\nbar/qux/' file.txt
Can't open perl script "s/foo\nbar/qux/": No such file or directory
After several attempts, the following combination worked:
$ perl -i -p0e 's/foo\nbar/qux/' file.txt
My question is: why does the first order of options produce an error (especially when plain -pie works as expected), while the second ordering is correctly handled?
-i means work in-place without a backup.
-ie means work in-place with a backup: the e is taken as the backup suffix rather than as the -e option, so the backup has the same name as the original file, but with e appended.
That means that perl -pie 's/foo/bar/' file.txt didn't work either (unless you have a Perl file named s/foo/bar/), because with no -e left, the expression is read as a script file name.
If you simply arrange the options logically, you avoid the problem. -i has nothing to do with the program (it will still work whether -i is added or removed), so it makes more sense to place it first anyway. -p and -0777, on the other hand, are part of the program, so it makes sense to place them next to -e. Writing the command sensibly results in one of the following:
perl -i -0777pe'...' ...
perl -i~ -0777pe'...' ...
perl -0777pe'...' ...
Note that I used -0777, since -0 treats the input as NUL-terminated lines rather than activating slurp mode.

simple SED replace

I'm attempting to write a script that does a simple regex replace in php.ini: I want to replace the line ;cgi.fix_pathinfo=1 with cgi.fix_pathinfo=0.
Ideally I want to avoid installing any additional packages, so sed seems a logical choice since it is bundled with FreeBSD. I have tried the following, but it doesn't seem to work:
sed 's/;cgi\.fix_pathinfo=1/cgi\.fix_pathinfo=0/' /usr/local/etc/php.ini
Without -i, sed only prints the result to standard output and leaves the file unchanged. To change the content of a file in place with BSD sed, you can do this:
sed -i.bak -e 's/;cgi\.fix_pathinfo=1/cgi.fix_pathinfo=0/;' /usr/local/etc/php.ini
That creates a copy of the old file with a .bak extension.
Or without creating a copy:
sed -i '' -e 's/;cgi\.fix_pathinfo=1/cgi.fix_pathinfo=0/;' /usr/local/etc/php.ini
Note that in this case, a space and an empty string enclosed between quotes are mandatory. You can't simply write sed -i -e '... like with GNU sed.
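The backup form can be tried safely on a scratch copy first. The sketch below uses a made-up three-line php.ini in a temporary directory; the suffix is attached directly to -i, a spelling that GNU sed accepts as well.

```shell
# Build a small stand-in for php.ini (contents are invented for this sketch).
dir=$(mktemp -d)
printf '%s\n' '; sample settings' ';cgi.fix_pathinfo=1' 'memory_limit=128M' > "$dir/php.ini"

# Edit in place, keeping the original as php.ini.bak.
sed -i.bak -e 's/;cgi\.fix_pathinfo=1/cgi.fix_pathinfo=0/' "$dir/php.ini"

result=$(cat "$dir/php.ini")
backup=$(cat "$dir/php.ini.bak")
printf '%s\n' "$result"
rm -rf "$dir"
```

Only the middle line should change; the backup still holds the commented-out setting.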

How to scrub emails from all CSVs in a directory?

I have this regex that works fine enough for my purposes for identifying emails in CSVs within a directory using grep on Mac OS X:
grep --no-filename -E -o "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" *
I've tried to get this working with sed so that I can replace the emails with foo@bar.baz:
sed -E -i '' -- 's/\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b/foo@bar.baz/g' *
However, I can't seem to get it to work. Admittedly, sed and regex are not my strong points. Any ideas?
\b is not part of POSIX regular expressions, and the BSD sed bundled with OS X may not treat it as a word boundary. You can install GNU sed with Homebrew and use it instead of the bundled one. Use this command for installation (Homebrew is meant to be run without sudo):
brew install gnu-sed
and use this for substitution (Homebrew installs GNU sed as gsed by default):
gsed -E -i 's/\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b/foo@bar.baz/g' *
You seem to assume that grep and sed support the same regex dialect, but that is not necessarily, or even usually, the case.
If you want a portable solution, you could easily use Perl for this, which however supports yet another regex dialect (and needs the @ in the replacement escaped, since Perl would otherwise interpolate @bar as an array)...
perl -i -p -e 's/\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b/foo\@bar.baz/g' *
For a bit of an overview of regex dialects, see https://stackoverflow.com/a/11857890/874188
Your regex kind of sucks, but I understand that is sort of beside the point here.

What's the difference between sed -E and sed -e?

I'm working on some old code and I found that I used to use
sed -E 's/findText/replaceWith/g' #findText would contain a regex
but I now try
sed -e 's/findText/replaceWith/g'
It seems to do the same thing, or does it?
I vaguely remember there being a reason I did it, but I can't recall it, and "man sed" doesn't help: it has nothing about -E, only -e, and that entry doesn't make much sense to me either.
-e, --expression=script
Append the editing commands in script to the end of
the editing command script. script may contain more
than one newline separated command.
I thought -e meant it would match with a regex...
GNU sed version 4.2.1
From source code, -E is an undocumented option for compatibility with BSD sed.
/* Undocumented, for compatibility with BSD sed. */
case 'E':
case 'r':
  if (extended_regexp_flags)
    usage(4);
  extended_regexp_flags = REG_EXTENDED;
  break;
And from manual, -E in BSD sed is used to support extended regular expressions.
From sed's documentation:
-E
-r
--regexp-extended
Use extended regular expressions rather than basic regular expressions. Extended regexps are those that egrep accepts; they can be clearer because they usually have fewer backslashes. Historically this was a GNU extension, but the -E extension has since been added to the POSIX standard (http://austingroupbugs.net/view.php?id=528), so use -E for portability. GNU sed has accepted -E as an undocumented option for years, and *BSD seds have accepted -E for years as well, but scripts that use -E might not port to other older systems. See Extended regular expressions.
Therefore it seems that -E should be the preferred way to declare that you are going to use (E)xtended regular expressions, rather than -r.
By contrast, -e just specifies that what follows is the script that you want to execute with sed (something like 's/bla/abl/g').
Also from the documentation:
Without -e or -f options, sed uses the first non-option parameter as the script, and the following non-option parameters as input files.
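Since the two options do unrelated jobs, they can be combined. The sketch below swaps two words, first with a basic regular expression and then with the extended syntax enabled by -E. The sample input is made up.

```shell
# Basic regular expressions: groups and intervals need backslashes.
basic=$(printf 'aaa bbb\n' | sed -e 's/\(a\{1,\}\) \(b\{1,\}\)/\2 \1/')

# -E selects extended regular expressions; -e still only introduces the script.
extended=$(printf 'aaa bbb\n' | sed -E -e 's/(a+) (b+)/\2 \1/')

printf '%s\n%s\n' "$basic" "$extended"
```

Both commands should print bbb aaa; only the amount of escaping differs.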

How do I use a new-line replacement in a BSD sed?

Greetings, how do I perform the following in BSD sed?
sed 's/ /\n/g'
The man page states that \n will be treated literally within a replacement string; how do I avoid this behavior? Is there an alternative?
I'm using Mac OS Snow Leopard, I may install fink to get GNU sed.
In a shell, you can do:
sed 's/ /\
/g'
hitting the enter key after the backslash to insert a newline.
Another way:
sed -e 's/ /\'$'\n/g'
See here.
For ease of use, I personally often use
cr="\n"
# or (depending version and OS)
cr="
"
sed "s/ /\\${cr}/g"
so it stays on 1 line.
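The two literal-newline forms can be compared side by side. This sketch uses only POSIX shell quoting: the variable nl (a name chosen here) holds a real newline, and the sample input is made up.

```shell
# A variable holding one literal newline.
nl='
'

# Form 1: backslash followed by a real newline inside the sed script.
inline=$(printf 'one two three\n' | sed 's/ /\
/g')

# Form 2: the same script built on one line; "\\" yields the backslash
# and ${nl} supplies the newline that follows it.
via_var=$(printf 'one two three\n' | sed "s/ /\\${nl}/g")

printf '%s\n' "$inline"
```

Both forms should put one, two and three on separate lines.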
To expand on @sikmir's answer: In Bash, which is the default shell on Mac OS X, all you need to do is place a $ character in front of the quoted string containing the escape sequence that you want to get interpreted. Bash will automatically translate it for you.
For example, I removed all MS-DOS carriage returns from all the source files in lib/ and include/ by writing:
grep -lr $'\r' lib include | xargs sed -i -e $'s/\r//'
find . -name '*-e' -delete
BSD grep would have interpreted '\r' correctly on its own, but using $'\r' doesn't hurt.
BSD sed would have misinterpreted 's/\r//' on its own, but by using $'s/\r//', I avoided that trap.
Notice that we can put $ in front of the entire string, and it will take care of all the escape sequences in the whole string.
$ echo $'hello\b\\world'
hell\world