How to delete a specific number of random lines matching a pattern

I have an svg file with a grid of dots represented by lines that have the word use in them. I would like to delete a specific number of random lines matching that use pattern, then save a new version of the file. This answer was very close.
So it will be a combination of this (delete one random line in a specific range):
sed -i '.svg' $((9 + RANDOM % 579))d /filename.svg
and this (delete all lines matching pattern use):
sed -i '.svg' /use/d /filename.svg
In other words, the logic would go something like this:
sed -i delete 'x' number of RANDOM lines matching 'use' from 'input.svg' and save to 'output.svg'
I'm running these commands from Terminal on a Mac and am inexperienced with syntax so formatting the command for that would be ideal.

Delete each line containing "use" with a probability of 10%:
awk '!/use/ || rand() > 0.10' file
Randomly delete exactly one line containing "use":
awk -v n="$(( RANDOM % $(grep -c "use" file) ))" '!/use/ || n-- != 0' file
Here's an example invocation:
$ cat file
some string
a line containing "use"
another use-ful line
more random data
$ awk -v n="$(( RANDOM % $(grep -c "use" file) ))" '!/use/ || n-- != 0' file
some string
another use-ful line
more random data
One of the lines containing use was removed.

This might work for you: (GNU sed & sort):
sed -n '/\<use\>/=' file | sort -R | head -5 | sed 's/$/d/' | sed -i.bak -f - file
Extract the line numbers of the lines containing the word use from the file. Randomly sort those line numbers then take the first say 5 and build a sed script to delete them from the original file.
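The question asked for an exact number of deletions. The following is only a sketch of one way to get that with bash, grep and awk; it is not taken from any of the answers. The function name delete_random_matches is hypothetical. It counts the matching lines first, then walks the file once and drops each match with probability need/left (selection sampling), so exactly COUNT matches are removed, or all of them if fewer exist. It assumes the pattern means the same thing to grep and awk (true for a plain word such as use) and seeds awk explicitly, because many awk implementations repeat the same rand() sequence on every run unless srand() is called.

```shell
# delete_random_matches PATTERN COUNT FILE
# Print FILE with exactly COUNT randomly chosen lines matching PATTERN removed.
# Hypothetical helper; assumes bash (for $RANDOM) and any POSIX awk.
delete_random_matches() {
    local pattern=$1 count=$2 file=$3 total
    total=$(grep -c -- "$pattern" "$file")
    awk -v pat="$pattern" -v need="$count" -v left="$total" -v seed="$RANDOM" '
        BEGIN { srand(seed) }
        $0 ~ pat {
            # Selection sampling: drop this line with probability need/left.
            if (rand() * left < need) { need--; left--; next }
            left--
        }
        { print }
    ' "$file"
}
```

A call such as delete_random_matches use 25 input.svg > output.svg writes the reduced file without touching the original.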

Related

SED: How to search for word "tokens" on consecutive lines (Windows)?

I have EDI files I need to find, by using SED to search for some anomalies.
The anomaly is when I search for a "token" called SGP, and where they are on multiple consecutive lines — so one SGP on one line and another SGP on another line — regardless of what's after the token:
SGP+SEGU1037087'
SGP+DFSU1143210'
SGP+SEGU1166926'
SGP+TGHU1203545'
But I don't want to find files where there are other segment lines between each SGP line:
SGP+TGHU1643436'
GID+2+3:BAG'
FTX+AAA+++sdfjkhsdfjkhsdfjkh'
MEA+AAE+AAB+KGM:20000.0000'
MEA+AAE+AAW+MTQ:.0000'
SGP+HCIU2090577'
So I've tried this:
sed 'SGP.*\n.*SGP' < *.txt
And as probably expected, I get nothing.
Any ideas on how to feed into SED a list of files in DOS, and get a list of files that meet the above criteria?
UPDATE
I think I have the "feed the files" bit here. But I am still stuck on how to use SED properly.
for i in *.txt; do
sed -i '<<WHAT DO I PLACE HERE?>>' $i
done
UPDATE 2
Please no Unix/Bash/etc solutions.. I am in Windows only! Thank you
UPDATE 3
Tried a DOS equivalent of @tshiono's answer but I get nothing.
for %%f in (*.txt) do (
sed -ne ':l;N;$!b l;/SGP[^\n]\+\nSGP/p' %%f
)
UPDATE 4
@tshiono - I want the script to find files that have this pattern...
SGP+SEGU1037087'
SGP+DFSU1143210'
SGP+SEGU1166926'
SGP+TGHU1203545'
Not this pattern ...
SGP+SEGU1037087'
FTT+asdjkfhsdkf hsdjkfh sdfjkh sdf
FTX+f sdfjsdfkljsdkfljsdklfj
GID+sdfjkhsdjkfhsdjkfsdf
SGP+DFSU1143210'
FTT+asdjkfhsdkf hsdjkfh sdfjkh sdf
FTX+f sdfjsdfkljsdkfljsdklfj
GID+sdfjkhsdjkfhsdjkfsdf
SGP+SEGU1166926'
FTT+asdjkfhsdkf hsdjkfh sdfjkh sdf
FTX+f sdfjsdfkljsdkfljsdklfj
GID+sdfjkhsdjkfhsdjkfsdf
SGP+TGHU1203545'
Again - only lines with SGP as a token on every NEWLINE
Could you please try following.
awk '
FNR==1{
  if(count){
    if(fnr==count){
      print prev_file " has all lines of SGP."
    }
  }
  prev_file=FILENAME
  count=fnr=""
}
/^SGP/{
  ++count
}
{
  fnr++
}
END{
  if(fnr==count){
    print prev_file " has all lines of SGP."
  }
}
' *.txt
The requirement is to detect which files contain consecutive lines both starting SGP.
Using standard (POSIX) sed, there's no way to get sed to print the file name. You can use this combination of shell script and sed, though, to detect which files contain consecutive lines starting with SGP:
for file in *.txt;
do
if [ -n "$(sed -n -e '/^SGP/{N;/^SGP.*\nSGP/{p;q;}}' "$file")" ]
then echo "$file"
fi
done
The shell test [ … ] checks whether the output of $(sed …) is a non-empty string, and reports the name of the file if it is. Note that the script is more flexible if, instead of using the glob *.txt, it uses "$@" (the list of arguments, preserving spaces etc). You can then write:
sh find-consecutive-SGP.sh *.txt
or use other more fanciful ways of specifying the file names as arguments.
The sed command doesn't print by default (-n). It looks for a line starting SGP and appends the next line into the 'pattern space'. It then looks to see if the result has two lots of SGP in it; one at the start (we know that will be there) and one after a newline. If that's found, it prints both lines (the pattern space) and quits because its job is done; it has found two consecutive lines both starting SGP. If the pattern space doesn't match, it is not printed (because of the -n) and more data is read. Any lines that don't start SGP are ignored and not printed.
With GNU sed, the F command prints the file name and a newline, so you could use:
for file in *.txt;
do
sed -n -e '/^SGP/{N;/^SGP.*\nSGP/{F;q;}}' "$file"
done
AFAICT from the GNU sed manual, there's no way to 'skip to the start of the next file' so you have to test each file separately as shown, rather than trying sed -n -e '…' *.txt — that will only report the first file that breaches the condition, not all the files.
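As a rough sketch of the argument-driven variant mentioned in this answer, the loop can be wrapped in a function that iterates over "$@" instead of a hard-coded glob. The function name find_consecutive_sgp is made up for illustration; the sed script is the one from the answer, unchanged.

```shell
# find_consecutive_sgp FILE...
# Print the name of every FILE that contains two consecutive lines
# starting with SGP. Hypothetical wrapper around the sed test shown above.
find_consecutive_sgp() {
    local file
    for file in "$@"; do
        if [ -n "$(sed -n -e '/^SGP/{N;/^SGP.*\nSGP/{p;q;}}' "$file")" ]; then
            printf '%s\n' "$file"
        fi
    done
}
```

Calling find_consecutive_sgp *.txt then behaves like the original loop, and file names containing spaces are passed through intact.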
If your objective is to get the list of filenames which meet the criteria,
how about:
for i in *.txt; do
[[ -n $(sed -ne ':l;N;$!b l;/SGP[^\n]\+\nSGP/p' "$i") ]] && echo "$i"
done
The sed commands :l;N;$!b l form a loop that slurps all the lines
into the pattern space, including the "\n" separators.
Then the pattern is matched against two consecutive lines
which both contain SGP.
If the sed output is non-empty, the loop prints the current filename.
[Update]
If your requirement is DOS platform, please try instead:
setlocal EnableDelayedExpansion
for %%f in (text*.txt) do (
set result=
for /f "usebackq tokens=*" %%a in (`sed.exe -ne ":l;N;$!b l;/SGP.\+\nSGP.\+/p" %%f`) do set result=!result!%%a
if "!result!" neq "" (
echo %%f
)
)
I've tested with Windows10 and sed-4.2.1.

Regex to add a character at the beginning of a particular line in a file

I have a file with n lines, but I want to find only one line and edit it without printing the file contents on the screen. The file is dynamically created, so I can't count the spaces. So I want to use a regex for this.
My file is:
hey retry=3
hello so
password so
And I want to make it as:
#hey retry=3
hello so
password so
I tried all these:
sed 's/password[ \t]+requisite[ \t]+pam_pwquality.so/s/^/#/' test1
x='/password[ \t]+requisite[ \t]+pam_pwquality.so/'
sed -i -e "s/\($x\)/#\1/" test1
re="^[password][[ :blank: ]]*[requisite][[ :blank:]]*[pam_pwquality.so][[ :blank:]]*[retry=3]"
But no changes in the file.
I would use awk:
awk '$1=="password" && $2=="requisite" && $3=="pam_deny.so" {
$0="#"$0
}1' file
awk splits the line into fields separated by one or more whitespace characters (which includes tabs). That makes it simple to check the content of the individual fields.
With gawk you can change the file in place:
gawk -i inplace '$1=="password" && $2=="requisite" && $3=="pam_deny.so" {
$0="#"$0
}1' file
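For comparison, here is a sed sketch of the same edit. The attempts in the question fail partly because + is a literal character in sed's basic regular expressions, and because [password] is a bracket expression that matches a single character rather than the word. The version below uses the POSIX class [[:blank:]] with the interval \{1,\} for "one or more spaces or tabs". The function name comment_pam_line is hypothetical, and the module name pam_deny.so is simply the one used in the awk answer.

```shell
# comment_pam_line [FILE...]
# Prefix "#" to lines starting with: password requisite pam_deny.so
# (fields separated by any run of spaces or tabs). Hypothetical helper.
comment_pam_line() {
    sed 's/^password[[:blank:]]\{1,\}requisite[[:blank:]]\{1,\}pam_deny\.so/#&/' "$@"
}
```

The & in the replacement stands for the matched text, so the line is kept as-is with a # in front of it; all other lines pass through unchanged.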
grep -n "password requisite pam_deny.so"
man grep states
-n, --line-number
Prefix each line of output with the 1-based line number within its input file. (-n is specified by POSIX.)

Search for text between two patterns with multiple lines in between

I have a simple question. I have a file containing:
more random text
*foo*
there
is
random
text
here
*foo*
foo
even
more
random
text
here
foo
more random text
(To clarify which parts I want the result from, I added the *'s next to foo. The *'s are not in the file.)
I only want to print out the lines between the first 2 instances of foo.
I tried searching for ways to let "foo" occur only once and then remove it, but I didn't get that far. However, I did find a way to remove all the "more random text" using: sed '/foo/,/foo/p' but I couldn't find a way using sed or awk to match only once and print the output.
Can anyone help me out?
With sed:
$ sed -n '/foo/{:a;n;/foo/q;p;ba}' infile
there
is
random
text
here
Explained:
/foo/ { # If we match "foo"
:a # Label to branch to
n # Discard current line, read next line (does not print because of -n)
/foo/q # If we match the closing "foo", then quit
p # Print line (is a line between two "foo"s)
ba # Branch to :a
}
Some seds complain about braces in one-liners; in those cases, this should work:
sed -n '/foo/ {
:a
n
/foo/q
p
ba
}' infile
$ awk '/foo/{++c;next} c==1' file
there
is
random
text
here
$ awk '/foo/{++c;next} c==3' file
even
more
random
text
here
or with GNU awk for multi-char RS you COULD do:
$ awk -v RS='(^|\n)[^\n]*foo[^\n]*(\n|$)' 'NR==2' file
there
is
random
text
here
$ awk -v RS='(^|\n)[^\n]*foo[^\n]*(\n|$)' 'NR==4' file
even
more
random
text
here
See https://stackoverflow.com/a/17914105/1745001 for other ways of printing after a condition is true.
Since checking for "foo" (using /foo/) is relatively expensive, the following avoids that check and will work with all awks worthy of the name:
awk 'c==2 {next} /foo/{++c;next} c==1' file
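The counter idea in the awk commands above can be generalised a little. The sketch below prints the lines between the k-th and the (k+1)-th matching line and exits as soon as the closing match is seen, so the rest of the file is never read. The function name between_matches and its argument order are made up for illustration.

```shell
# between_matches PATTERN K [FILE...]
# Print the lines between the K-th and (K+1)-th line matching PATTERN.
# Hypothetical helper built on the counter technique shown above.
between_matches() {
    local pat=$1 k=$2
    shift 2
    awk -v pat="$pat" -v k="$k" '
        $0 ~ pat { if (++c > k) exit; next }   # count matches, stop after the closing one
        c == k                                 # print only inside the wanted block
    ' "$@"
}
```

With the sample file from the question, between_matches foo 1 file selects the first block and between_matches foo 3 file the second one, mirroring the c==1 and c==3 commands.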

Using multiple sed commands

Hi, I'm looking to search through a file and output the values of the lines that match the following regexes, with the matching text removed. I don't need it output to a file. This is what I am currently using; it outputs the required text, but multiple times:
#!/bin/sh
for file in *; do
sed -e 's/^owner //g;p;!d ; s/^admin //g;p;!d ; s/^loc //g;p;!d ; s/^ser //g;p;!d' $file
done
The preferred format would be something like this, so I could have control over what happens in between:
for file in *; do
sed 's/^owner //g;p' $file | head -1
sed 's/^admin //g;p' $file | head -1
sed 's/^loc //g;p' $file | head -1
sed 's/^ser //g;p' $file | head -1
done
An example input file would be the following:
owner sys group
admin guy
loc Q-30934
ser 18r9723
comment noisy fan is something
and the required output is the following:
sys group
guy
Q-30934
18r9723
You're giving sed the p (for Print) command several times. It prints the entire line each time. And unless you tell it not to with the -n option, sed will print the line at the end anyway.
You also give the !d command multiple times.
Edited after you added the multiple-sed version: instead of using head -1, just use -n to avoid printing lines you don't want. Or even use q (Quit) to stop processing after printing the bit you do want.
For instance:
sed -n '/^owner / { s///gp; q; }' $file
The {} group the substitution and quit commands together, so that they are both executed if and only if the pattern is matched. Having used the pattern in the address at the beginning, you can leave it out of the s command. So that command is short for:
sed -n '/^owner / { s/^owner //gp; q; }' $file
I'd suggest:
sed -n -e '/^owner / { s///; p; }' \
-e '/^admin / { s///; p; }' \
-e '/^loc / { s///; p; }' \
-e '/^ser / { s///; p; }' \
*
sed is perfectly capable of reading many files, so the loop control is unnecessary (you aren't doing per-file I/O redirection, for example) and it's reasonable to list the files after the rest of the sed command (that's the * on its own). If you've got a more modern version of sed (e.g. GNU sed), you can combine the patterns into a single line:
sed -r -n -e '/^(owner|admin|loc|ser) / { s///; p; }' *
This might work for (GNU sed):
sed -n '0,/^owner /{//s///p};0,/^admin /{//s///p};0,/^loc /{//s///p};0,/^ser /{//s///p}' file
This creates a series of toggle switches, one for each of the desired strings. Each switch applies only once per file, i.e. only the first occurrence of each string is printed.
An alternative and depending on file sizes maybe quicker method:
sed -rn '1{x;s/^/owner admin loc ser /;x};/^(owner |admin |loc |ser )/{G;/^(owner |admin |loc |ser )(.*\n.*)\1/!b;s//\2/;P;/\n$/q;s/.*\n//;h}' file
This preps the hold space with the desired strings. For only those lines that contain the desired strings, append the hold space and check if the current line needs to be amended. Match the desired string with the same string in the hold space. If the line has already appeared the match will fail and the line can be disregarded. If the line is yet to be amended, the desired string is removed from the current line and then the first half of the line is printed. If no strings appear in the remaining half of the line the process is over and can be quit. Otherwise remove the first half of the string and replace the hold space with the desired string removed.
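If awk is acceptable, the "first occurrence of each key, with the key removed" behaviour can also be sketched without the hold space. This is an alternative technique and does not come from the answers above. The function name first_values is hypothetical, and the sketch assumes that each key sits at the very start of its line and is followed by a single space, as in the sample input.

```shell
# first_values [FILE...]
# For each file, print the text after the first "owner ", "admin ",
# "loc " and "ser " line. Hypothetical helper; keys must start the line.
first_values() {
    awk '
        FNR == 1 { split("", seen) }            # forget the keys seen in the previous file
        /^(owner|admin|loc|ser) / && !seen[$1]++ {
            sub(/^[^ ]+ /, "")                  # strip the key and the space after it
            print
        }
    ' "$@"
}
```

Because the seen array is reset whenever FNR returns to 1, first_values * reports the four values once per file in a single awk run.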

sed/awk replace in all matches

I want to invert all the color values in a bunch of files. The colors are all in the hex format #ff3300 so the inversion could be done characterwise with the sed command
y/0123456789abcdef/fedcba9876543210/
How can I loop through all the color matches and do the char translation in sed or awk?
EDIT:
sample input:
random text... #ffffff_random_text_#000000__
asdf#00ff00
asdfghj
desired output:
random text... #000000_random_text_#ffffff__
asdf#ff00ff
asdfghj
EDIT: I changed my response as per your edit.
OK, this is awkward to do in sed. awk could do the trick without too much trouble, but I find perl much easier for this task:
$ perl -pe 's/#[0-9a-f]+/$&=~tr%0123456789abcdef%fedcba9876543210%r/ge' <infile >outfile
Basically you find the pattern, then execute the right-hand side, which executes the tr on the match, and substitutes the value there.
The inversion is really a subtraction. To invert a hex, you just subtract it from ffffff.
With this in mind, you can build a simple script to process each line, extract hexes, invert them, and inject them back to the line.
This is using Bash (see arrays, printf -v, += etc) only (no external tools there):
#!/usr/bin/env bash
[[ -f $1 ]] || { printf "error: cannot find file: %s\n" "$1" >&2; exit 1; }
while read -r; do
    # split line with '#' as separator
    IFS='#' toks=( $REPLY )
    for tok in "${toks[@]}"; do
        # extract hex
        read -n6 hex <<< "$tok"
        # is it really a hex ?
        if [[ $hex =~ [0-9a-fA-F]{6} ]]; then
            # compute inversion
            inv="$((16#ffffff - 16#$hex))"
            # zero pad the result
            printf -v inv "%06x" "$inv"
            # replace hex with inv
            tok="${tok/$hex/$inv}"
        fi
        # build the modified line
        line+="#$tok"
    done
    # print the modified line and clean it for reuse
    printf "%s\n" "${line#\#}"
    unset line
done < "$1"
use it like:
$ ./invhex infile > outfile
test case input:
random text... #ffffff_random_text_#000000__
asdf#00ff00
bdf#cvb_foo
asdfghj
#bdfg
processed output:
random text... #000000_random_text_#ffffff__
asdf#ff00ff
bdf#cvb_foo
asdfghj
#bdfg
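The subtraction identity behind this script can be checked in isolation. The small sketch below (the function name invert_hex is made up) inverts a single six-digit value with shell arithmetic and zero-pads the result, which is the same computation the script performs per token.

```shell
# invert_hex RRGGBB
# Print ffffff minus the given six-digit hex value, zero-padded to six digits.
# Hypothetical helper illustrating the arithmetic used in the script.
invert_hex() {
    printf '%06x\n' "$(( 0xffffff - 0x$1 ))"
}

invert_hex 00ff00   # prints ff00ff
```

Since each hex digit d becomes f - d with no borrow between digits, the result is identical to the character-wise y/0123456789abcdef/fedcba9876543210/ translation from the question.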
This might work for you (GNU sed):
sed '/#[a-f0-9]\{6\}\>/!b
s//\n&/g
h
s/[^\n]*\(\n.\{7\}\)[^\n]*/\1/g
y/0123456789abcdef/fedcba9876543210/
H
g
:a;s/\n.\{7\}\(.*\n\)\n\(.\{7\}\)/\2\1/;ta
s/\n//' file
Explanation:
/#[a-f0-9]\{6\}\>/!b bail out on lines not containing the required pattern
s//\n&/g prepend every pattern with a newline
h copy this to the hold space
s/[^\n]*\(\n.\{7\}\)[^\n]*/\1/g delete everything but the required pattern(s)
y/0123456789abcdef/fedcba9876543210/ transform the pattern(s)
H append the new pattern(s) to the hold space
g overwrite the pattern space with the contents of the hold space
:a;s/\n.\{7\}\(.*\n\)\n\(.\{7\}\)/\2\1/;ta replace the old pattern(s) with the new.
s/\n// remove the newline artifact from the H command.
This works...
cat test.txt |sed -e 's/\#\([0123456789abcdef]\{6\}\)/\n\#\1\n/g' |sed -e ' /^#.*/ y/0123456789abcdef/fedcba9876543210/' | awk '{lastType=type;type= substr($0,1,1)=="#";} type==lastType && length(line)>0 {print line;line=$0} type!=lastType {line=line$0} length(line)==0 {line=$0} END {print line}'
The first sed command inserts line breaks around the hex codes, which makes it possible to apply the substitution to every line starting with a hash. There is probably a more elegant way to merge the lines back together, but the awk command does the job. The only assumption is that two hex codes never follow each other directly; if they can, this step has to be revised.
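Since the question also asks about awk, here is a sketch of a single-process awk version. It is not taken from the answers above. It uses match() to find each # followed by six lowercase hex digits and translates those digits through a lookup table, leaving all other text alone. The function name invert_colors is hypothetical; the digits are spelled out six times instead of using {6} because some older awk builds do not support interval expressions.

```shell
# invert_colors [FILE...]
# Invert every #rrggbb colour (lowercase hex) on each input line.
# Hypothetical helper; works with any POSIX awk.
invert_colors() {
    awk '
    BEGIN {
        from = "0123456789abcdef"; to = "fedcba9876543210"
        for (i = 1; i <= 16; i++) map[substr(from, i, 1)] = substr(to, i, 1)
    }
    {
        out = ""; rest = $0
        while (match(rest, /#[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]/)) {
            out = out substr(rest, 1, RSTART)          # text up to and including "#"
            for (i = 1; i <= 6; i++) {
                out = out map[substr(rest, RSTART + i, 1)]
            }
            rest = substr(rest, RSTART + RLENGTH)      # continue after the colour
        }
        print out rest
    }' "$@"
}
```

Unlike the pipeline above, this handles hex codes that follow each other directly, because each match is consumed before the next search starts.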