Replace more than 150000 characters with sed

I want to replace this LONG string with sed.
I got the string from grep and stored it in the variable var.
Here is my grep command and my var variable:
var=$(grep -P -o "[^:]//.{0,}" /home/lazuardi/project/assets/static/admin/bootstrap3/css/bootstrap.css.map | grep -P -o "//.{0,}")
Here is the output from grep : string
Then I tried to replace it with this sed command:
sed -i "s|$var||g" /home/lazuardi/project/assets/static/admin/bootstrap3/css/bootstrap.css.map
But it gives me the output bash: /bin/sed: Argument list too long
How can I replace it?
NB: The string has 183544 characters on one line.

What are you actually trying to accomplish here? sed is line-oriented, so you cannot replace a multi-line string (not even if you replace literal newlines with \n). Well, there are ways to write a sed script which effectively replaces a sequence of lines, but it gets tortured quickly.
bash$ var=$(head -n 2 /etc/mtab)
bash$ sed "s|$var||" /etc/mtab
sed: -e expression #1, char 25: unterminated `s' command
bash$ sed "s|${var//$'\n'/\\n}||" /etc/mtab | diff -u /etc/mtab -
bash$ # (didn't replace anything, so no output)
As a workaround, what you probably want could be approached by replacing the newlines in $var with \| (or possibly just |, depending on your sed dialect) similarly to what was demonstrated above, but you'd still be bumping into the ARG_MAX limit and have a bunch of other pesky wrinkles to iron out, so let's not go there.
However, what you are attempting can be magnificently completed by sed itself, all on its own. You don't need a list of the strings; after all, sed too can handle regular expressions (and nothing in the regex you are using actually requires Perl extensions, so the -P option is by and large superfluous).
sed -i 's%\([^:]\)//.*%\1%' file
There is a minor caveat -- if there are strings which occur both with and without : in front, your original command would have replaced them all (if it had worked), whereas this one will only replace the occurrences which do not have a colon in front. That means comments at beginning of line will not be touched -- if you want them removed too, just add a line anchor as an alternative; sed -i 's%\(^\|[^:]\)//.*%\1%' file
If you want the comments in var for other reasons, the grep can be cleaned up significantly, too. (Obviously, you'd run this before performing the replacement.)
var=$(grep -P -o '[^:]\K//.*' file)
(The \K extension is one which genuinely requires -P. And of course, the common, clear, standard, readable, portable, obvious, simple way to write {0,} is *.)
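
As a quick sanity check of the anchored variant above, here is a minimal sketch on a throwaway file. The sample lines and the temporary file are made up for illustration, and the \| alternation in a basic regex assumes GNU sed.

```shell
# Hypothetical sample input; not taken from the real bootstrap.css.map.
tmp=$(mktemp)
printf '%s\n' \
    '// whole-line comment' \
    'a { color: red } // trailing comment' \
    'url(http://example.com/x.png)' > "$tmp"

# Same expression as above; \| in a basic regex assumes GNU sed.
stripped=$(sed 's%\(^\|[^:]\)//.*%\1%' "$tmp")
printf '%s\n' "$stripped"
rm -f "$tmp"
```

The whole-line comment becomes an empty line, the second line loses its trailing comment, and the URL is left alone because its // follows a colon.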

On most systems these days, the value of ARG_MAX is big enough to handle 150k without problems, but it is important to note that while the limit is called ARG_MAX and the error message indicates that the command line is too long, the real limit is the sum of the sizes of the arguments and all (exported) environment variables. Also, Linux imposes a limit of 128k (131,072 bytes) for a single argument string. Exceeding any of these limits triggers an error return of E2BIG, which is printed as "Argument list too long".
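
To see which limit is in play, one can compare the string length against both numbers. This is only a rough sketch, assuming getconf is available; the 131072 figure is the Linux per-argument limit mentioned above, and the length is the one reported in the question.

```shell
arg_max=$(getconf ARG_MAX)        # total budget for arguments plus environment
single_arg_limit=131072           # per-string limit on Linux, as noted above
var_len=183544                    # string length reported in the question

if [ "$var_len" -ge "$single_arg_limit" ]; then
    verdict="exceeds the per-argument limit"
else
    verdict="fits in a single argument"
fi
printf 'ARG_MAX=%s, string length=%s: %s\n' "$arg_max" "$var_len" "$verdict"
```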
In any case, bash built-ins are exempt from the limit, so you should be able to feed the command into sed as a command file:
echo "s|$var||g" | sed -f - -i /home/lazuardi/project/assets/static/admin/bootstrap3/css/bootstrap.css.map
That may not help you much, though. Your variable is full of regex metacharacters, so it will not match the string itself. You'll need to clean it up in order to be able to use it as a regular expression.
There's probably a cleaner way to do that edit, though.
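
One possible way to do that clean-up is sketched below, assuming the string should be matched literally as a basic regular expression with | as the delimiter. The sample string and temporary file are invented for the demonstration, -f - is used as in the command above (GNU sed assumed), and a value containing newlines would still need extra handling.

```shell
# Hypothetical stand-ins for the real file and the grep result.
tmp=$(mktemp)
printf '%s\n' 'keep //# a.b*c [x] $1 tail' > "$tmp"
var='//# a.b*c [x] $1'

# Backslash-escape BRE metacharacters and the | delimiter.
escaped=$(printf '%s\n' "$var" | sed 's/[][\.*^$|]/\\&/g')

# Feed the script on stdin so the long string never becomes an argument of sed.
result=$(printf 's|%s||g\n' "$escaped" | sed -f - "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

printf is a shell built-in here, so the size of the string is not limited by the kernel's argument limits.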

Related

Can I perform a 'non-global' grep and capture only the first match found for each line of input?

I understand that what I'm asking can be accomplished using awk or sed, I'm asking here how to do this using GREP.
Given the following input:
.bash_profile
.config/ranger/bookmarks
.oh-my-zsh/README.md
I want to use GREP to get:
.bash_profile
.config/
.oh-my-zsh/
Currently I'm trying
grep -Po '([^/]*[/]?){1}'
Which results in output:
.bash_profile
.config/
ranger/
bookmarks
.oh-my-zsh/
README.md
Is there some simple way to use GREP to only get the first matched string on each line?

I think you can grep non / letters like:
grep -Eo '^[^/]+'
On another SO site there is another similar question with solution.

You don't need grep for this at all.
cut -d / -f 1

The -o option says to print every substring which matches your pattern, instead of printing each matching line. Your current pattern matches every string which doesn't contain slashes (optionally including a trailing slash); but it's easy to switch to one which only matches this pattern at the beginning of a line.
grep -o '^[^/]*' file
Notice the addition of the ^ beginning of line anchor, and the omission of the -P option (which you were not really using anyway) as well as the silly beginner error {1}.
(I should add that plain grep doesn't support parentheses or repetitions; grep -E would support these constructs just fine, or you could switch to the POSIX BRE variation which requires a backslash to use round or curly parentheses as metacharacters. You can probably ignore these details and just use grep -E everywhere unless you really need the features of grep -P, though also be aware that -P is not portable.)
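
If the trailing slash from the desired output should be kept, the anchored pattern can be extended with an optional slash. A small sketch using the input from the question, assuming a grep that supports -E together with -o (GNU grep does):

```shell
input='.bash_profile
.config/ranger/bookmarks
.oh-my-zsh/README.md'

# ^ pins the match to the start of each line; /? keeps one trailing slash if present.
first=$(printf '%s\n' "$input" | grep -Eo '^[^/]*/?')
printf '%s\n' "$first"
```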

Where is this Regex expression not closed in sed (apostrophe parenthesis)?

I'm trying to update some setting for wordpress and I need to use sed. When I run the below command, it seems to think the line is not finished. What am I doing wrong?
$ sed -i 's/define\( \'DB_NAME\', \'database_name_here\' \);/define\( \'DB_NAME\', \'wordpress\' \);/g' /usr/share/nginx/wordpress/wp-settings.php
> ^C
Thanks.

Single quotes in most shells don't support any escaping. If you want to include a single quote, you need to close the single quotes and add the single quote - either in double quotes, or backslashed:
sed 's/define\( '\''DB_NAME'\'', '\''database_name_here'\'' \);/define\( '\''DB_NAME'\'', '\''wordpress'\'' \);/g'
I fear it still wouldn't work for you, as \( is special in sed. You probably want just a simple ( instead.
sed 's/define( '\''DB_NAME'\'', '\''database_name_here'\'' );/define( '\''DB_NAME'\'', '\''wordpress'\'' );/g'
or
sed 's/define( '"'"'DB_NAME'"'"', '"'"'database_name_here'"'"' );/define( '"'"'DB_NAME'"'"', '"'"'wordpress'"'"' );/g'
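
To convince yourself that these quoting styles hand the same text to the command, it can help to print the resulting strings. A minimal sketch with a shortened, made-up fragment of the pattern:

```shell
# Three spellings of the same string containing single quotes.
a='define( '\''DB_NAME'\'' );'      # close quote, escaped quote, reopen
b='define( '"'"'DB_NAME'"'"' );'    # close quote, double-quoted quote, reopen
c="define( 'DB_NAME' );"            # double quotes around everything
printf '%s\n' "$a" "$b" "$c"
```

All three lines printed are identical, so the choice is purely about readability.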

Normally, using single quotes around a sed script is sensible. This is a case where double quotes would be a better choice: there are no shell metacharacters other than single quotes in the sed script:
sed -e "s/define( 'DB_NAME', 'database_name_here' );/define( 'DB_NAME', 'wordpress' );/g" /usr/share/nginx/wordpress/wp-settings.php
or:
sed -e "s/\(define( 'DB_NAME', '\)database_name_here' );/\1wordpress' );/g" /usr/share/nginx/wordpress/wp-settings.php
or even:
sed -e "/define( 'DB_NAME', 'database_name_here' );/s/database_name_here/wordpress/g" /usr/share/nginx/wordpress/wp-settings.php
One other option to consider is using sed's -f option to provide the script as a file. That saves you from having to escape the script contents from the shell. The downside may be that you have to create the file, run sed using it, and then remove the file. It is likely that's too painful for the current task, but it can be sensible — it can certainly make life easier when you don't have to worry about shell escapes.
I'm not convinced the g (global replace) option is relevant; how many single lines are you going to find in the settings file containing two independent define DB_NAME operations with the default value?
You can add the -i option when you've got the basic code working. Do note that if you might ever work on macOS or a BSD-based system, you'll need to provide a suffix as an extra argument to the -i option (e.g. -i '' for a null suffix or no backup), or use -i.bak to be able to work reliably on both Linux (or, more accurately, with GNU sed) and macOS and BSD (or, more accurately, with BSD sed). Appealing to POSIX is no help; it doesn't support an overwrite option.
Test case (second alternative):
$ echo "define( 'DB_NAME', 'database_name_here' );" |
> sed -e "s/\(define( 'DB_NAME', '\)database_name_here' );/\1wordpress' );/g"
define( 'DB_NAME', 'wordpress' );
$
If the spacing around 'DB_NAME' is not consistent, then you'd end up with more verbose regular expressions, using [[:space:]]* in lieu of blanks, and you'd find the third alternative better than the others, but the second could capture both the leading and trailing contexts and use both captures in the replacement.
Parting words: this technique works this time because the patterns don't involve shell metacharacters like $ or  ` . Very often, the script does need to match those, and then using mainly single quotes around the script argument is sensible. Tackling a different task — replace $DB_NAME in the input with the value of the shell variable $DB_NAME (leaving $DB_NAMEORHOST unchanged):
sed -e 's/$DB_NAME\([^[:alnum:]]\)/'"$DB_NAME"'\1/'
There are three separate shell strings, all concatenated with no spaces. The first is single-quoted and contains the s/…/ part of a s/…/…/ command; the second is "$DB_NAME", the value of the shell variable, double-quoted so that if the value of $DB_NAME is 'autonomous vehicle recording', you still have a single argument to sed; the third is the '\1/' part, which puts back whatever character followed $DB_NAME in the input text (with the observation that if $DB_NAME could appear at the end of an input line, this would not match it).
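
A short run of that command, as a sketch: the input line is invented for illustration, and the variable value is the one used in the explanation above.

```shell
DB_NAME='autonomous vehicle recording'

# Single quotes keep the shell from expanding $DB_NAME inside the input text.
input='host=$DB_NAMEORHOST name=$DB_NAME;'

# Three concatenated shell strings: 's/.../' + "$DB_NAME" + '\1/'
result=$(printf '%s\n' "$input" |
    sed -e 's/$DB_NAME\([^[:alnum:]]\)/'"$DB_NAME"'\1/')
printf '%s\n' "$result"
```

$DB_NAMEORHOST is left unchanged because the character after $DB_NAME there is alphanumeric. A value containing / or & would need escaping before being spliced into the replacement.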
Most regexes do fuzzy matching; you have to consider variations on what might be in the input to determine how hard your regular expressions have to work to identify the material accurately.

Extracting a match from a string with sed and a regular expression in bash

In bash, I want to get the name of the last folder in a folder path.
For instance, given ../parent/child/, I want "child" as the output.
In a language other than bash, the regex .*\/(.*)\/$ works.
Here's one of my attempts in bash:
echo "../parent/child/" | sed "s_.*/\(.*?\)/$_\1_p"
This gives me the error:
sed: -e expression #1, char 17: unterminated `s' command
What have I failed to understand?

One problem with your script is that inside the "s_.*/\(.*?\)/$_\1_p" the $_ is interpreted by the shell as a variable name.
You could either replace the double-quotes with single-quotes or escape the $.
Once that's fixed, the .*? may or may not work with your implementation of sed. It will be more robust to write something roughly equivalent that's more widely supported, for example:
sed -e 's_.*/\([^/]*\)/$_\1_'
Note that I dropped the p flag of sed to avoid printing the result twice.
A much simpler solution is to use the basename command.
$ basename ../parent/child/
child
Finally, a native Bash solution is also possible using parameter expansion:
path=../parent/child/
path=${path%/}
path=${path##*/}
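
The three approaches can be compared side by side. In this sketch, last_dir is a hypothetical helper name wrapping the parameter expansions shown above:

```shell
last_dir() {
    # Strip one trailing slash, then everything up to the last remaining slash.
    p=${1%/}
    printf '%s\n' "${p##*/}"
}

from_expansion=$(last_dir ../parent/child/)
from_sed=$(printf '%s\n' ../parent/child/ | sed -e 's_.*/\([^/]*\)/$_\1_')
from_basename=$(basename ../parent/child/)
printf '%s %s %s\n' "$from_expansion" "$from_sed" "$from_basename"
```

The parameter expansion version avoids starting any external process, which matters mostly when it runs in a loop.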

You can use cut too
echo '../parent/child/' | cut -d/ -f3

Use of grep + sed based on a pattern file?

Here's the problem: i have ~35k files that might or might not contain one or more of the strings in a list of 300 lines containing a regex each
if I grep -rnwl 'C:\out\' --include=*.txt -E --file='comp.log' i see there are a few thousands of files that contain a match.
now how do i get sed to delete each line in these files containing the strings in comp.log used before?
edit: comp.log contains a simple regex in each line, but for the most part each string to be matched is unique
this is an example of how it is structured:
server[0-9]\/files\/bobba fett.stw
[a-z]+ mochaccino
[2-9] CheeseCakes
...
etc. silly examples aside, it goes to show each line is unique save for a few variations so it shouldn't affect what i really want: see if any of these lines match the lines in the file being worked on. it's no different than 's/pattern/replacement/' except that i want to use the patterns in the file instead of inline.
Ok here's an update (S.O. gets impatient if i don't declare the question answered after a few days)
after MUCH fiddling with the #Kenavoz/#Fischer approach, i found a totally different solution, but first things first.
creating a modified pattern list for sed to work with does work.
as well as #werkritter's approach of dropping sed altogether. (this one i find the most... err... "least convoluted" way around the problem).
I couldn't make #Mklement's answer work under windows/cygwin (it did work under ubuntu, so...not sure what that means. figures.)
What ended up solving the problem in a more... long term, reusable form was a wonderful program pointed out by a colleague called PowerGrep. it really blows every other option out of the water. unfortunately it's windows only AND it's not free. (not even advertising here, the thing is not cheap, but it does solve the problem).
so considering #werkiter's reply was not a "proper" answer and i can't just choose both #Lars Fischer and #Kenavoz's answer as a solution (they complement each other), i am awarding #Kenavoz the tickmark for being first.
final thoughts: i was hoping for a simpler, universal and free solution but apparently there is not.

You can try this :
sed -f <(sed 's/^/\//g;s/$/\/d/g' comp.log) file > outputfile
All regex in comp.log are formatted to a sed address with a d command : /regex/d. This command deletes lines matching the patterns.
The output of the inner sed is passed as a file (with process substitution) to the -f option of the outer sed, which is applied to file.
To delete just string matching the patterns (not all line) :
sed -f <(sed 's/^/s\//g;s/$/\/\/g/g' comp.log) file > outputfile
Update :
The command output is redirected to outputfile.

Some ideas but not a complete solution, as it requires some adapting to your script (not shown in the question).
I would convert comp.log into a sed script containing the necessary deletes:
cat comp.log | sed -r "s+(.*)+/\1/ d;+" > comp.sed
That would make your example comp.sed look like:
/server[0-9]\/files\/bobba fett.stw/ d;
/[a-z]+ mochaccino/ d;
/[2-9] CheeseCakes/ d;
then I would apply the comp.sed script to each file reported by grep (With your -rnwl that would require some filtering to get the filename.):
sed -i.bak -f comp.sed $AFileReportedByGrep
If you have GNU sed, you can use -i for in-place replacement, creating a .bak backup; otherwise, pipe to a temporary file.
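
Here is a self-contained sketch of that idea on throwaway files. The pattern lines are the ones from the question, while the sample text, the temporary directory and the file names inside it are invented for the demonstration. It assumes a sed that accepts -E for extended regular expressions, since the patterns use +.

```shell
work=$(mktemp -d)

# Pattern list as shown in the question.
printf '%s\n' \
    'server[0-9]\/files\/bobba fett.stw' \
    '[a-z]+ mochaccino' \
    '[2-9] CheeseCakes' > "$work/comp.log"

# Made-up file to clean.
printf '%s\n' \
    'keep this line' \
    'one mochaccino please' \
    '7 CheeseCakes' \
    'server3/files/bobba fett.stw' \
    'also kept' > "$work/sample.txt"

# Turn every pattern into a /pattern/d command, then run the generated script.
sed 's#.*#/&/d#' "$work/comp.log" > "$work/comp.sed"
remaining=$(sed -E -f "$work/comp.sed" "$work/sample.txt")
printf '%s\n' "$remaining"
rm -rf "$work"
```

Only the two lines that match none of the patterns are printed. Writing the generated script to a file, rather than using process substitution, keeps the sketch usable in shells without <(...).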

Both Kenavoz's answer and Lars Fischer's answer use the same ingenious approach:
transform the list of input regexes into a list of sed match-and-delete commands, passed as a file acting as the script to sed via -f.
To complement these answers with a single command that puts it all together, assuming you have GNU sed and your shell is bash, ksh, or zsh (to support <(...)):
find 'c:/out' -name '*.txt' -exec sed -i -r -f <(sed 's#.*#/\\<&\\>/d#' comp.log) {} +
find 'c:/out' -name '*.txt' matches all *.txt files in the subtree of dir. c:/out
-exec ... + passes as many matching files as will fit on a single command line to the specified command, typically resulting only in a single invocation.
sed -i updates the input files in-place (conceptually speaking - there are caveats); append a suffix (e.g., -i.bak) to save backups of the original files with that suffix.
sed -r activates support for extended regular expressions, which is what the input regexes are.
sed -f reads the script to execute from the specified filename, which in this case, as explained in Kenavoz's answer, uses a process substitution (<(...)) to make the enclosed sed command's output act like a [transient] file.
The s/// sed command - which uses alternative delimiter # to facilitate use of literal / - encloses each line from comp.log in /\<...\>/d to yield the desired deletion command; the enclosing of the input regex in \<...\> ensures matching as a word, as grep -w does.
This is the primary reason why GNU sed is required, because neither POSIX EREs (extended regular expressions) nor BSD/OSX sed support \< and \>.
However, you could make it work with BSD/OSX sed by replacing -r with -E, and \< / \> with [[:<:]] / [[:>:]]
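
The effect of the word-boundary wrapping can be seen on a tiny example. This is only a sketch with a made-up pattern and input, and it assumes GNU sed for \< and \>:

```shell
# Build the delete command for one made-up pattern, as the inner sed above does.
script=$(printf '%s\n' 'cat' | sed 's#.*#/\\<&\\>/d#')
printf '%s\n' "$script"

# Only the line containing "cat" as a whole word is deleted.
kept=$(printf '%s\n' 'a cat sat' 'concatenate' | sed -r -e "$script")
printf '%s\n' "$kept"
```

Without the \< and \> wrapping, the line containing concatenate would be deleted as well.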

Sed substitution not doing what I want and think it should do

I am trying to use sed to get some info that is encoded within the path of a file which is passed as a parameter to my script (Bourne sh, if it matters).
From this example path, I'd like the result to be 8
PATH=/foo/bar/baz/1-1.8/sing/song
I first got the regex close by using sed as grep:
echo $PATH | sed -n -e "/^.*\/1-1\.\([0-9][0-9]*\).*/p"
This properly recognized the string, so I edited it to make a substitution out of it:
echo $PATH | sed -n -e "s/^.*\/1-1\.\([0-9][0-9]*\).*/\1/"
But this doesn't produce any output. I know I'm just not seeing something simple, but would really appreciate any ideas about what I'm doing wrong or about other ways to debug sed regular expressions.
(edit)
In the example path the components other than the numerical one can contain numbers similar to the numeric path component that I listed, but not quite the same. I'm trying to exactly match the component that is 1-1.<some-number> and see what some-number is.
It is also possible to have an input string that the regular expression should not match and that should produce no output.

The -n option to sed suppresses normal output, and since your second command doesn't have a p flag, nothing is output. Get rid of the -n or stick a p back on the end.
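
A small sketch of the difference, using the path from the question. The variable names are made up, and the non-matching path is an invented example:

```shell
path_arg=/foo/bar/baz/1-1.8/sing/song

# -n without p: the substitution happens, but nothing is printed.
silent=$(printf '%s\n' "$path_arg" |
    sed -n -e 's/^.*\/1-1\.\([0-9][0-9]*\).*/\1/')

# -n with the p flag: only lines where the substitution succeeded are printed.
printed=$(printf '%s\n' "$path_arg" |
    sed -n -e 's/^.*\/1-1\.\([0-9][0-9]*\).*/\1/p')

# A path that does not match produces no output at all.
nomatch=$(printf '%s\n' /foo/bar/baz/sing/song |
    sed -n -e 's/^.*\/1-1\.\([0-9][0-9]*\).*/\1/p')

printf 'silent=[%s] printed=[%s] nomatch=[%s]\n' "$silent" "$printed" "$nomatch"
```

Keeping -n and adding p also covers the requirement that non-matching input yields nothing, which simply dropping -n would not.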

It looks like you're trying to get the 8 from the 1-1.8 (where 8 is any sequence of numerics), yes? If so, I would just use:
echo /foo/bar/baz/1-1.8/sing/song | sed -e "s/.*\/1-1\.//" -e "s/[^0-9].*//"
No doubt you could get it working with one sed "instruction" (-e) but sometimes it's easier just to break it down.
The first strips out everything from the start up to and including 1-1., the second strips from the first non-numeric after that to the end.
$ echo /foo/bar/baz/1-1.8/sing/song | sed -e "s/.*\/1-1\.//" -e "s/[^0-9].*//"
8
$ echo /foo/bar/baz/1-1.752/sing/song | sed -e "s/.*\/1-1\.//" -e "s/[^0-9].*//"
752
And, as an aside, this is actually how I debug sed regular expressions. I put simple ones in independent instructions (or independent part of a pipeline for other filtering commands) so I can see what each does.
Following your edit, this also works:
$ echo /foo/bar/baz/1-1.962/sing/song | sed -e "s/.*\/1-1\.\([0-9][0-9]*\).*/\1/"
962
As to your comment:
In the example path the components other than the numerical one can contain numbers similar to the numeric path component that I listed, but not quite the same. I'm trying to exactly match the component that is 1-1.<some-number> and see what some-number is.
The two-part sed command I gave you should work with numerics anywhere in the string (as long as there's no 1-1. after the one you're interested in). That's because it actually deletes up to the specific 1-1. string and thereafter from the first non-numeric. If you have some examples that don't work as expected, toss them into the question as an update and I'll adjust the answer.

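You can shorten your command by using \+ (one or more, a GNU sed extension in basic regular expressions) instead of repeating [0-9] followed by * (zero or more). Keep the p flag if you keep -n:
sed -n -e "s/^.*\/1-1\.\([0-9]\+\).*/\1/p"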

Don't use PATH as your variable name. It clashes with the PATH environment variable.
echo "$path" | sed -e 's/.*1-1\.//;s/\/.*//'

You needn't delimit your patterns with / (s/a/b/g); you may choose almost any character, so if you're dealing with paths, # is more useful than /:
echo /foo/1-1.962/sing | sed -e "s#.*/1-1\.\([0-9]\+\).*#\1#"
