How to trim lines that include a specific string with sed (regex)

I have separate files that each include a path string like:
path = /aaa/bbb/ccc.com/user#ccc.com/dddd/user#yahoo.com/
path = /aaa/bbb/ccc.com/user#ccc.com/dddd/user#hotmail.co.uk/
path = /aaa/bbb/ccc.com/user#ccc.com/dddd/user#abc.xxx.co.uk/
path = /aaa/bbb/ccc.com/user#ccc.com/dddd/user55#ccc.com/
What I want is to trim the lines so that they look like:
path = /aaa/bbb/ccc.com/user/dddd/.user#yahoo/
path = /aaa/bbb/ccc.com/user/dddd/.user#hotmail/
path = /aaa/bbb/ccc.com/user/dddd/.user#abc/
path = /aaa/bbb/ccc.com/user/dddd/.user55#ccc.com/
I can almost achieve this with the command below (each string is in a separate file, always on the 15th line):
sed -r '15s!#[^/]+(/[^/]+/[^.#]+#[^.]+).*$!\1/!g' $file
However, I have an issue with the dot part; the command produces:
path = /aaa/bbb/ccc.com/user/dddd/user55#ccc/
when it should have been:
path = /aaa/bbb/ccc.com/user/dddd/.user55#ccc/
Thanks in advance,

Using a pattern with three capture groups should do what you need. The first group captures the portion starting at the initial # (this group is omitted from the replacement), the second captures the /dddd/ portion, and the third captures the complete user#somewhere, which the replacement prefixes with a dot.
's!(#.+\..+)(/.+/)(.+#.+)!\2.\3!g'
Depending on your version of sed (GNU or BSD/macOS) you could use it like this; the lines marked ↳ show the environment for each command:
sed -i.bak -r 's!(#.+\..+)(/.+/)(.+#.+)!\2.\3!g' $file
↳ GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
sed -i bak -E 's!(#.+\..+)(/.+/)(.+#.+)!\2.\3!g' $file
↳ GNU bash, version 3.2.48(1)-release (x86_64-apple-darwin12)
result:
path = /aaa/bbb/ccc.com/user/dddd/.user#yahoo.com/
path = /aaa/bbb/ccc.com/user/dddd/.user#hotmail.co.uk/
path = /aaa/bbb/ccc.com/user/dddd/.user#abc.xxx.co.uk/
path = /aaa/bbb/ccc.com/user/dddd/.user55#ccc.com/
It's a little unclear whether you want to keep the full extension on the end of the last match. If not, sed is probably not the best choice, because it supports neither look-ahead/look-behind assertions nor non-greedy quantifiers in any straightforward manner. If that's a deal breaker, you could use this pattern with a tool whose regex engine supports non-greedy matching:
(#.+\..+)(/.+/)(.+#.+?)(\..*/)
result:
path = /aaa/bbb/ccc.com/user/dddd/.user#yahoo
path = /aaa/bbb/ccc.com/user/dddd/.user#hotmail
path = /aaa/bbb/ccc.com/user/dddd/.user#abc
path = /aaa/bbb/ccc.com/user/dddd/.user55#ccc
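If the shortened form is what you are after, sed may still manage it without non-greedy matching: negated bracket expressions can stop the match at the first dot. The following is only a sketch, assuming every line has the same shape as the samples in the question and that the path ends with a slash. The function name trim_path is made up for illustration.

```shell
# Hypothetical helper: takes one line as its argument and prints the trimmed path.
trim_path() {
  # #[^/]+       the first "#domain" segment, dropped from the output
  # (/[^/]+/)    the following directory, e.g. /dddd/
  # ([^/#]+#[^./]+)  user#name, stopping before the first dot
  # [^/]*/$      the rest of the domain and the final slash
  printf '%s\n' "$1" |
    sed -E 's!#[^/]+(/[^/]+/)([^/#]+#[^./]+)[^/]*/$!\1.\2/!'
}

trim_path 'path = /aaa/bbb/ccc.com/user#ccc.com/dddd/user#hotmail.co.uk/'
# prints: path = /aaa/bbb/ccc.com/user/dddd/.user#hotmail/
```

Unlike the non-greedy pattern above, this version keeps the trailing slash, which matches the output requested in the question.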

You would have to use two matches:
sed -E 's/(.*?\..*?)\/(.*?)#\1/\1\/\2/g'
Regex: (.*?\..*?)\/(.*?)#\1
Replacement: \1\/\2
Flags: g (Global)
Result:
path = /aaa/bbb/ccc.com/user/dddd/user#yahoo.com/
path = /aaa/bbb/ccc.com/user/dddd/user#hotmail.co.uk/
path = /aaa/bbb/ccc.com/user/dddd/user#abc.xxx.co.uk/
path = /aaa/bbb/ccc.com/user/dddd/user55#ccc.com/
sed -E 's/(\w+#\w+)[\w\.]*/\1/g'
Regex: (\w+#\w+)[\w\.]*
Replacement: \1
Flags: g (Global)
Result:
path = /aaa/bbb/ccc.com/user/dddd/user#yahoo/
path = /aaa/bbb/ccc.com/user/dddd/user#hotmail/
path = /aaa/bbb/ccc.com/user/dddd/user#abc/
path = /aaa/bbb/ccc.com/user/dddd/user55#ccc/
If the -E switch is not available on your version of sed, then you might have to use perl.
Example:
perl -pe 's/(.*?\..*?)\/(.*?)#\1/\1\/\2/g' -i filename.ext
If I try this in bash, I get the following result:
root#home [~]# echo "path = /aaa/bbb/ccc.com/user#ccc.com/dddd/user55/" | sed -E 's/(.*?\..*?)\/(.*?)#\1/\1\/\2/g'
path = /aaa/bbb/ccc.com/user/dddd/user55/
root#home [~]# echo "path = /aaa/bbb/ccc.com/user/dddd/user55/" | sed -E 's/(\w+#\w+)[\w\.]*/\1/g'
path = /aaa/bbb/ccc.com/user/dddd/user55/

Related

Using sed regex for string replacement

I am trying to replace a string in the config file.
I would like to run something like this:
OS
(docker image php:8.1-apache-buster)
Debian GNU/Linux 10 (buster)
sed (GNU sed) 4.7 Packaged by Debian
Possible inputs:
post_max_size = 4M
post_max_size = 24M
post_max_size = 248M
...
Example output (any user given value):
post_max_size = 128M
Example cmd:
sed -i 's/(post_max_size = ([0-9]{1,})M/post_max_size = 128M/g' /usr/local/etc/php/php.ini
Joining regex with strings does not work here.
It works when I run string replace without any regex
sed -i 's/post_max_size = 8M/post_max_size = 128M/g' /usr/local/etc/php/php.ini
This works only if the value of post_max_size is set to exactly 8M. I would like to be able to make the change with a regex regardless of the value set.
I searched the Internet and sed cmd docs but did not find anything which fits my use case.
The following should work:
sed -i 's/^post_max_size = .*/post_max_size = 128M/g' /usr/local/etc/php/php.ini
You can match optional spaces with [[:space:]]*, enable extended regular expressions with -E, and capture the key in group 1, referenced as \1, followed by the new value, as in \1128M:
sed -E -i 's/(post_max_size[[:space:]]*=[[:space:]]*)[0-9]+M/\1128M/g' /usr/local/etc/php/php.ini
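To try the substitution without touching php.ini, the same expression can be run as a filter on sample input. This is a minimal sketch; the function name set_post_max_size and the sample lines are assumptions made for illustration, and the new value is passed as an argument.

```shell
# Hypothetical helper: filters php.ini-style text on stdin and sets post_max_size to $1.
set_post_max_size() {
  # Inside double quotes \\1 reaches sed as \1, so with $1=128M the replacement is \1128M.
  sed -E "s/^(post_max_size[[:space:]]*=[[:space:]]*)[0-9]+M/\\1$1/"
}

printf '%s\n' 'post_max_size = 4M' 'upload_max_filesize = 2M' | set_post_max_size 128M
# prints:
# post_max_size = 128M
# upload_max_filesize = 2M
```

sed backreferences are a single digit, so \1128M is read as group 1 followed by the literal text 128M.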

Finding a string with sed, then replacing a number within that string with incremented number

I have a file that contains something like this:
[project]
name = "sinntelligence"
version = "1.1.dev12"
dependencies = [
"opencv-python",
"matplotlib",
"PySide6",
"numpy",
"numba"
]
Now I want to find the "version" string and increment the last number after "dev". Thus in the above example I would like to change
version = "1.1.dev12"
to
version = "1.1.dev13"
and so forth. With grep I was able to get this line with this regular expression:
grep -P "^version.*dev[0-9]+"
But since I want to replace something in a file I thought it would make more sense to use sed instead. However, with sed I don't even find that line (i.e. nothing is replaced) with this:
sed -i "s/^version.*dev[0-9]+/test/g" sed-test.txt
Any ideas 1) what I am doing wrong here with sed and 2) how I can increase that "dev" number by one and write it back to the file (with just typical Ubuntu Linux command line tools)?
You used grep with -P option that enables the PCRE regex engine, and with sed you are using a POSIX BRE pattern. That is why you do not even match that line.
Also, sed cannot easily evaluate and change the number, but you can do that with perl:
perl -i -pe 's/^version.*dev\K(\d+)/$1 + 1/e' sed-test.txt
See the online demo:
#!/bin/bash
s='[project]
name = "sinntelligence"
version = "1.1.dev12"
dependencies = [
"opencv-python",
"matplotlib",
"PySide6",
"numpy",
"numba"
]'
perl -pe 's/^version.*dev\K(\d+)/$1 + 1/e' <<< "$s"
Output:
[project]
name = "sinntelligence"
version = "1.1.dev13"
dependencies = [
"opencv-python",
"matplotlib",
"PySide6",
"numpy",
"numba"
]
what I am doing wrong here with sed
You have to use the -E option to enable extended regular expressions:
$ sed -E "s/^version.*dev[0-9]+/test/g" sed-test.txt
[project]
name = "sinntelligence"
test"
dependencies = [
"opencv-python",
"matplotlib",
"PySide6",
"numpy",
"numba"
]
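The difference comes down to how + is read in each syntax. A small sketch, using only the version line from the question, shows the three cases side by side:

```shell
line='version = "1.1.dev12"'

# BRE (the default): + is a literal plus sign, so nothing matches and the line is unchanged.
printf '%s\n' "$line" | sed 's/^version.*dev[0-9]+/test/'

# BRE: [0-9][0-9]* spells "one or more digits" without needing +.
printf '%s\n' "$line" | sed 's/^version.*dev[0-9][0-9]*/test/'

# ERE (-E): + is the "one or more" quantifier.
printf '%s\n' "$line" | sed -E 's/^version.*dev[0-9]+/test/'
```

The first command prints the line untouched, while the other two print test" (the closing quote is left over because the pattern stops after the digits).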
how can increase that "dev" number by one and write that back to the file (with just typical Ubuntu Linux command line tools)?
I'd use awk (GNU awk is required for -i inplace and for the four-argument split); below is an adaptation of the solution in Ed Morton's answer:
awk -i inplace '/^version/ {split($3,lets,/[0-9]+"$/,digs); $3=lets[1] digs[1]+1 "\""} 1' sed-test.txt
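If GNU awk is not available, one possible alternative is to let sed extract the number, do the arithmetic in the shell, and let sed write the new value. This is a sketch under stated assumptions: the file has exactly one version line, the number has no leading zeros (the shell would read 08 as invalid octal), and the function name bump_dev is made up for illustration.

```shell
# Hypothetical helper: prints the file named by $1 with the dev number incremented by one.
bump_dev() {
  # Pull out the digits after "dev" on the version line.
  current=$(sed -n -E 's/^version.*dev([0-9]+)".*/\1/p' "$1")
  [ -n "$current" ] || return 1   # no matching version line found
  next=$((current + 1))
  # \\1 inside double quotes reaches sed as \1, followed by the new number.
  sed -E "s/^(version.*dev)[0-9]+/\\1${next}/" "$1"
}
```

The function writes to standard output, so the result can be inspected first and then redirected to a new file that replaces the original.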

How can I use perl to delete files matching a regex

Due to a Makefile mistake, I have some fake files in my git repo...
$ ls
=0.1.1 =4.8.0 LICENSE
=0.5.3 =5.2.0 Makefile
=0.6.1 =7.1.0 pyproject.toml
=0.6.1, all_commands.txt README_git_workflow.md
=0.8.1 CHANGES.md README.md
=1.2.0 ciscoconfparse/ requirements.txt
=1.7.0 configs/ sphinx-doc/
=2.0 CONTRIBUTING.md tests/
=2.2.0 deploy_docs.py tutorial/
=22.2.0 dev_tools/ utils/
=22.8.0 do.py
=2.7.0 examples/
$
I tried this, but it seems that there may be some more efficient means to accomplish this task...
# glob "*" will list all files globbed against "*"
foreach my $filename (grep { /\W\d+\.\d+/ } glob "*") {
my $cmd1 = "rm $filename";
`$cmd1`;
}
Question:
I want a remove command that matches against a pcre.
What is a more efficient perl solution to delete the files matching this perl regex: /\W\d+\.\d+/ (example filename: '=0.1.1')?
Fetch a wider set of files and then filter through whatever you want
my @files_to_del = grep { /^\W[0-9]+\.[0-9]+/ and not -d } glob "$dir/*";
I added an anchor (^) so that the regex can only match a string that begins with that pattern, otherwise this can blow away files other than intended. Reconsider what exactly you need.
Altogether perhaps (or see a one-liner below †)
use warnings;
use strict;
use feature 'say';
use File::Glob ':bsd_glob'; # for better glob()
use Cwd qw(cwd); # current-working-directory
my $dir = shift // cwd; # cwd by default, or from input
my $re = qr/^\W[0-9]+\.[0-9]+/;
my @files_to_del = grep { /$re/ and not -d } glob "$dir/*";
say for @files_to_del; # please inspect first
#unlink or warn "Can't unlink $_: $!" for @files_to_del;
where that * in glob might as well have some pre-selection, if suitable. In particular, if the = is a literal character (and not an indicator printed by the shell, see footnote)‡ then glob "=*" will fetch files starting with it, and then you can pass those through a grep filter.
I exclude directories, identified by -d filetest, since we are looking for files (and to not mix with some scary language about directories from unlink, thanks to brian d foy comment).
If you need to scan subdirectories and do the same with them, perhaps recursively (which doesn't seem to be the case here), then we could employ this logic in File::Find::find (or File::Find::Rule, or yet others).
Or read the directory any other way (opendir+readdir, libraries like Path::Tiny), and filter.
† Or, a quick one-liner ... print (to inspect) what's about to get blown away
perl -wE'say for grep { /^\W[0-9]+\.[0-9]+/ and not -d } glob "*"'
and then delete 'em
perl -wE'unlink or warn "$_: $!" for grep /^\W[0-9]+\.[0-9]+/ && !-d, glob "*"'
(I switched to a more compact syntax just so. Not necessary)
If you'd like to be able to pass a directory to it (optionally, or work in the current one) then do
perl -wE'$d = shift//q(.); ...' dirpath (relative path fine. optional)
and then use glob "$d/*" in the code. This works the same way as in the script above -- shift pulls the first element from @ARGV, if anything was passed to the script on the command line, or if @ARGV is empty it returns undef and then the // (defined-or) operator picks up the string q(.).
‡ That leading = may be an "indicator" of a file type if ls has been aliased to ls -F, which can be checked by running ls with aliases suppressed, one way being \ls (or check alias ls).
If that is so, the = stands for it being a socket, which in Perl can be tested for with the -S filetest.
Then that \W in the proposed regex may need to be changed to \W? to allow for no non-word characters preceding a digit, along with a test for a socket. Like
my $re = qr/^\W? [0-9]+ \. [0-9]+/x;
my @files_to_del = grep { /$re/ and -S } glob "$dir/*";
Why not just:
$ rm =*
Sometimes, shell commands are the best option.
In these cases, I use perl to merely filter the list of files:
ls | perl -ne 'print if /\A\W\d+\.\d+/a' | xargs rm
And, when I do that, I feel guilty for not doing something simpler with an extended pattern in grep:
ls | grep -E '^\W[0-9]+\.[0-9]+' | xargs rm
Eventually I'll run into a problem where there's a directory so I need to be more careful about the file list:
find . -maxdepth 1 -type f | grep -E '^\./\W[0-9]+\.[0-9]+' | xargs rm
Or I need to allow rm to remove directories too should I want that:
ls | grep -E '^\W[0-9]+\.[0-9]+' | xargs rm -r
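A shell loop over a glob is another way to build the list without parsing the output of ls. This is a rough sketch, assuming the = is a literal first character of each stray filename. The glob is looser than the regex in the question, and the function name list_stray_files is made up for illustration. It only prints the candidates so they can be inspected before anything is deleted.

```shell
# Hypothetical helper: prints regular files in directory $1 (default: .) named like =1.2 or =0.6.1
list_stray_files() {
  dir=${1:-.}
  for f in "$dir"/=[0-9]*; do
    [ -f "$f" ] || continue          # skip directories and an unmatched glob
    name=${f##*/}                    # strip the directory part
    case $name in
      =[0-9]*.[0-9]*) printf '%s\n' "$f" ;;
    esac
  done
}
```

Once the printed list looks right, the printf could be swapped for rm -- "$f" to do the actual removal.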
Here you go.
unlink( grep { /\W\d+\.\d+/ && !-d } glob( "*" ) );
This matches the filename, and excludes directories.
To delete filenames matching this: /\W\d+\.\d+/ pcre, use the following one-liners...
1> $fn is a filename... I'm also removing the my keywords since the one-liner doesn't have to worry about perl lexical scopes:
perl -e 'foreach $fn (grep { /\W\d+\.\d+/ } glob "*") {$cmd1="rm $fn";`$cmd1`;}'
2> Or as Andy Lester responded, perhaps his answer is as efficient as we can make it...
perl -e 'unlink(grep { /\W\d+\.\d+/ } glob "*");'

Trouble getting regex to work with grep/sed

I've been working on a script to update the PRODUCT_BUNDLE_IDENTIFIER in a pbxproj file with a new value using a build script. The regex I've come up with selects everything between 'PRODUCT_BUNDLE_IDENTIFIER = ' and any text following up to 2 occurrences of '.' which is what I want.
The regex I've put together to find these occurrences is shown here:
(?<=PRODUCT_BUNDLE_IDENTIFIER = )([a-zA-Z0-9_]+(?:\.[a-zA-Z0-9_]+){0,2})
I've tested it with a validator here: https://regex101.com/r/jUhJm7/1
To save time, here's a screenshot with the regex applied and the green portions selected as desired, so the regex seems to be working and recognizes the bundle id portion of the following examples as expected:
The issue I'm experiencing is that when using this regex with grep, grep -e, egrep, or sed it doesn't seem to be working in the same manner. I would like to use sed to run the string replacement and have tried the following methods to achieve this:
# variable definitions
BUNDLE_ID='mynew.bundle.id'
PBXFILE="$SRCROOT/myproject.xcodeproj/project.pbxproj"
# check if the test bundle id is currently in the file
if grep -Fq "REPLACEABLE_BUNDLE_ID" $PBXFILE; then
# this commented version works as expected as it's using simple string replacement
#sed -i '' "s/REPLACEABLE_BUNDLE_ID/$BUNDLE_ID/g" $PBXFILE
# these are the versions of the regex I've tried with sed #
# basic version working in validator & testing with sublime text regex engine
(?<=PRODUCT_BUNDLE_IDENTIFIER = )([a-zA-Z0-9_]+(?:\.[a-zA-Z0-9_]+){0,2})
# added extra parentheses around product id first portion
(?<=(PRODUCT_BUNDLE_IDENTIFIER = ))([a-zA-Z0-9_]+(?:\.[a-zA-Z0-9_]+){0,2})
# escaped version
\(?<=PRODUCT_BUNDLE_IDENTIFIER = \)\([a-zA-Z0-9_]+\(?:\.[a-zA-Z0-9_]+\){0,2}\)
# try replacing the current bundle id using the regex
sed -i -E '' "s/I put the regex here/$BUNDLE_ID/g" $PBXFILE
fi
I'm fairly new with regex and have not used sed before. I've read about extended regular expressions here: http://www.grymoire.com/Unix/Regular.html#uh-12 and feel like I'm just failing to put the pieces together properly.
Try this with GNU sed (use -E instead of -r on BSD/macOS sed):
$ sed -r "s/(PRODUCT_BUNDLE_IDENTIFIER = )[a-zA-Z0-9_]+(\.[a-zA-Z0-9_]+){0,2}/\1${BUNDLE_ID}/"
for example:
$ cat test.txt
PRODUCT_BUNDLE_IDENTIFIER = com.test.mybundle.keyboard;
PRODUCT_BUNDLE_IDENTIFIER = com.test.mybundle.iMessage;
PRODUCT_BUNDLE_IDENTIFIER = com.test;
PRODUCT_BUNDLE_IDENTIFIER = replaceable;
$ BUNDLE_ID='mynew.bundle.id'
$ sed -r "s/(PRODUCT_BUNDLE_IDENTIFIER = )[a-zA-Z0-9_]+(\.[a-zA-Z0-9_]+){0,2}/\1${BUNDLE_ID}/" test.txt
PRODUCT_BUNDLE_IDENTIFIER = mynew.bundle.id.keyboard;
PRODUCT_BUNDLE_IDENTIFIER = mynew.bundle.id.iMessage;
PRODUCT_BUNDLE_IDENTIFIER = mynew.bundle.id;
PRODUCT_BUNDLE_IDENTIFIER = mynew.bundle.id;
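Since the in-place flag differs between GNU sed (-i) and BSD/macOS sed (-i ''), a build script could sidestep it by writing to a temporary file and copying the result back. The following is a sketch only: the function name replace_bundle_id is made up for illustration, and it assumes the new id contains no / or & characters, which would need escaping in the replacement.

```shell
# Hypothetical helper: rewrites the bundle id in file $1 to $2 without relying on sed -i.
replace_bundle_id() {
  tmp=$(mktemp) || return 1
  # The capture group stands in for the look-behind: it is matched, then put back via \1.
  sed -E "s/(PRODUCT_BUNDLE_IDENTIFIER = )[a-zA-Z0-9_]+(\.[a-zA-Z0-9_]+){0,2}/\\1$2/" "$1" > "$tmp" &&
    cat "$tmp" > "$1"    # copy back so the original file keeps its permissions
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

In the question's script this would be called as replace_bundle_id "$PBXFILE" "$BUNDLE_ID".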

How do I replace this text with quotation marks?

I want to replace
$rcmail_config['default_host'] = '';
in /var/lib/roundcube/config/main.inc.php
with
$rcmail_config['default_host'] = 'localhost';
I've tried:
sed -i "s/$rcmail_config['default_host'] = '';/$rcmail_config['default_host'] = 'localhost';/g" /var/lib/roundcube/config/main.inc.php
and
sed -i s/$rcmail_config['default_host'] = '';/$rcmail_config['default_host'] = 'localhost';/g /var/lib/roundcube/config/main.inc.php
But it does not work.
What could I try next?
You need to escape the $ and [ characters, and you don't need to repeat the same string in the replacement part. Instead, you can use a capturing group:
sed -i "s/\(\$rcmail_config\['default_host'\] = \)'';/\1'localhost';/g" file
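The quoting can be checked on sample input before editing the real file. This is a minimal sketch with two deviations from the command above, both assumptions of mine: it runs as a filter on stdin, and it uses '[^']*' so that any existing value is replaced, not only an empty one. The function name set_default_host is made up for illustration.

```shell
# Hypothetical helper: filters config text on stdin and sets default_host to $1.
set_default_host() {
  # \$ stops the shell from expanding $rcmail_config inside double quotes;
  # \[ and \] make the brackets literal for sed instead of a bracket expression.
  sed "s/\(\$rcmail_config\['default_host'\] = \)'[^']*';/\1'$1';/"
}

printf '%s\n' "\$rcmail_config['default_host'] = '';" | set_default_host localhost
# prints: $rcmail_config['default_host'] = 'localhost';
```

Without the backslash before $, the shell would expand $rcmail_config to an empty string before sed ever saw the pattern, which is one reason the double-quoted attempt in the question did not match.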