Bash add regex values to an array

I am trying to write a bash script that will take in a file and look for all values matching a regular expression, and then add those to an array.
As a first step, I wrote a script that adds all lines in the log file to an array, and echoes them. Then, I tried editing that script to search for the regular expression in the log file, which is where I got a tremendous amount of errors.
What I am trying to do is grab the value inside the brackets in the log file. Some lines in the log file contain a syntax like [23423234 s], which is a time stamp. I want to grab the values (digits, space, and the "s") inside the brackets (but not the brackets!) and add those values to an array.
My initial script is below:
#!/bin/bash
echo "STARTING SCRIPT"
getArray(){
array=()
while IFS= read -r line
do
array+=("$line")
done <"$1"
}
getArray "testlog.txt"
for e in "${array[@]}"
do
echo "$e"
done
echo "DONE SCRIPT"
The log I am looking at looks like this:
[1542053213 s] Starting Program:
-----------------------------------------
[1542053213 s] PROGRAM ERROR
ERRHAND: 1033
ERRHAND: 233545
ERRHAND: 1
[1542053213 s] Program completed!
[1542053213 s] Config File complete. Stopping!
What I am aiming to do is do something with the following pseudocode:
For each line in file{
regex = [\d\ws]
if line matches regex{
add to array
}
}
for each item in array{
echo item
}
Currently, I have edited my script to look like below:
#!/bin/bash
echo "STARTING SCRIPT"
getArray(){
array=()
while IFS= read -r line
do
if [[$line =~ [\d\ws]; then
array+=("$line");
fi
done <"$1"
}
getArray "log.txt"
for e in "${array[@]}"
do
echo "$e"
done
echo "DONE SCRIPT"
But whenever I run it, I get the following set of errors:
[jm@local Home]$ ./Parser.sh
STARTING SCRIPT
./Parser.sh: line 9: [[[1542053213: command not found
./Parser.sh: line 9: [[-----------------------------------------: command not found
./Parser.sh: line 9: [[[1542053213: command not found
./Parser.sh: line 9: [[ERRHAND:: command not found
./Parser.sh: line 9: [[ERRHAND:: command not found
./Parser.sh: line 9: [[ERRHAND:: command not found
./Parser.sh: line 9: [[[1542053213: command not found
./Parser.sh: line 9: [[: command not found
./Parser.sh: line 9: [[[1542053213: command not found
DONE SCRIPT
Any advice would be greatly appreciated. I have tried looking at other posts but none have been able to really address my problem, which is creating a proper regex for the [2342323 s] pattern, and then adding that to an array. TiA

As pointed out in the comments:
if [[ is missing its closing ]], and [[ must be followed by a space. Without that space bash reads [[$line as a command name, which is what produces the "command not found" errors.
In a regex, [ is not a literal but starts a character group (a bracket expression). To match something like [1234 s] you have to write \[[0-9]* s\]. Note also that bash's =~ does not support \d; use [0-9] instead.
To extract just the number 1234 from [1234 s] you can use tr, sed, perl, or a second grep -o.
Overall, your script seems way too complicated. You can drastically simplify it. Replace the while loop with mapfile and use grep -o to extract matches. You can replace your whole script with this:
mapfile -t array < <(grep -o '\[[0-9]* s\]' logfile | tr -d '[] s')
printf '%s\n' "${array[@]}"
Note that if you only want to print the matches then you don't need an array. Just the grep part would be sufficient:
grep -o '\[[0-9]* s\]' logfile | tr -d '[] s'
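If you would rather stay in pure bash, the bracket contents can also be captured with =~ and BASH_REMATCH. This is only a sketch: the function name get_stamps and the global array stamps are made up for illustration, and it keeps the digits, the space and the "s" as asked in the question.

```shell
#!/bin/bash
# Hypothetical helper: fill the global array `stamps` with the text found
# inside [... s] on each line of the file given as $1.
get_stamps() {
    # Keeping the regex in a variable avoids quoting surprises inside [[ ]].
    local re='\[([0-9]+ s)\]'
    local line
    stamps=()
    while IFS= read -r line; do
        # Note the spaces after [[ and before ]].
        if [[ $line =~ $re ]]; then
            # BASH_REMATCH[1] holds the first capture group (no brackets).
            stamps+=("${BASH_REMATCH[1]}")
        fi
    done < "$1"
}
```

After calling get_stamps log.txt, printf '%s\n' "${stamps[@]}" prints one time stamp per matching line.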

SED: How to search for word "tokens" on consecutive lines (Windows)?

I have EDI files I need to find, by using SED to search for some anomalies.
The anomaly is when I search for a "token" called SGP, and where they are on multiple consecutive lines — so one SGP on one line and another SGP on another line — regardless of what's after the token:
SGP+SEGU1037087'
SGP+DFSU1143210'
SGP+SEGU1166926'
SGP+TGHU1203545'
But I don't want to find files where there are other segment lines between each SGP line:
SGP+TGHU1643436'
GID+2+3:BAG'
FTX+AAA+++sdfjkhsdfjkhsdfjkh'
MEA+AAE+AAB+KGM:20000.0000'
MEA+AAE+AAW+MTQ:.0000'
SGP+HCIU2090577'
So I've tried this:
sed 'SGP.*\n.*SGP' < *.txt
And as probably expected, I get nothing.
Any ideas on how to feed into SED a list of files in DOS, and get a list of files that meet the above criteria?
UPDATE
I think I have the "feed the files" bit here. But I am still stuck on how to use SED properly.
for i in *.txt; do
sed -i '<<WHAT DO I PLACE HERE?>>' $i
done
UPDATE 2
Please no Unix/Bash/etc solutions.. I am in Windows only! Thank you
UPDATE 3
Tried a DOS equivalent of @tshiono's answer but I get nothing..
for %%f in (*.txt) do (
sed -ne ':l;N;$!b l;/SGP[^\n]\+\nSGP/p' %%f
}
UPDATE 4
@tshiono - I want the script to find files that have this pattern...
SGP+SEGU1037087'
SGP+DFSU1143210'
SGP+SEGU1166926'
SGP+TGHU1203545'
Not this pattern ...
SGP+SEGU1037087'
FTT+asdjkfhsdkf hsdjkfh sdfjkh sdf
FTX+f sdfjsdfkljsdkfljsdklfj
GID+sdfjkhsdjkfhsdjkfsdf
SGP+DFSU1143210'
FTT+asdjkfhsdkf hsdjkfh sdfjkh sdf
FTX+f sdfjsdfkljsdkfljsdklfj
GID+sdfjkhsdjkfhsdjkfsdf
SGP+SEGU1166926'
FTT+asdjkfhsdkf hsdjkfh sdfjkh sdf
FTX+f sdfjsdfkljsdkfljsdklfj
GID+sdfjkhsdjkfhsdjkfsdf
SGP+TGHU1203545'
Again - only lines with SGP as a token on every NEWLINE

Could you please try the following.
awk '
FNR==1{
if(count){
if(fnr==count){
print prev_file " has all lines of SGP."
}
}
prev_file=FILENAME
count=fnr=""
}
/^SGP/{
++count
}
{
fnr++
}
END{
if(fnr==count){
print prev_file " has all lines of SGP."
}
}
' *.txt

The requirement is to detect which files contain consecutive lines both starting SGP.
Using standard (POSIX) sed, there's no way to get sed to print the file name. You can use this combination of shell script and sed, though, to detect which files contain consecutive lines starting with SGP:
for file in *.txt;
do
if [ -n "$(sed -n -e '/^SGP/{N;/^SGP.*\nSGP/{p;q;}}' "$file")" ]
then echo "$file"
fi
done
The shell test [ … ] checks whether the output of $(sed …) is a non-empty string, and reports the name of the file if it is. Note that the script is more flexible if, instead of using the glob *.txt, it uses "$@" (the list of arguments, preserving spaces etc.). You can then write:
sh find-consecutive-SGP.sh *.txt
or use other more fanciful ways of specifying the file names as arguments.
The sed command doesn't print by default (-n). It looks for a line starting SGP and appends the next line into the 'pattern space'. It then looks to see if the result has two lots of SGP in it; one at the start (we know that will be there) and one after a newline. If that's found, it prints both lines (the pattern space) and quits because its job is done; it has found two consecutive lines both starting SGP. If the pattern space doesn't match, it is not printed (because of the -n) and more data is read. Any lines that don't start SGP are ignored and not printed.
With GNU sed, the F command prints the file name and a newline, so you could use:
for file in *.txt;
do
sed -n -e '/^SGP/{N;/^SGP.*\nSGP/{F;q;}}' "$file"
done
AFAICT from the GNU sed manual, there's no way to 'skip to the start of the next file' so you have to test each file separately as shown, rather than trying sed -n -e '…' *.txt — that will only report the first file that breaches the condition, not all the files.
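The same per-file test can also be written with awk, which reports the result through its exit status instead of its output. This is a sketch under assumptions, not part of the original answer: the function name list_consecutive_sgp is made up, and it needs a POSIX shell and awk to be available.

```shell
#!/bin/bash
# Hypothetical helper: print the name of every file given as an argument
# that contains two consecutive lines starting with SGP.
list_consecutive_sgp() {
    local file
    for file in "$@"; do
        # prev is 1 when the previous line started with SGP.
        # awk exits with status 0 only if such a pair was found.
        if awk '
            /^SGP/ { if (prev) { found = 1; exit } prev = 1; next }
                   { prev = 0 }
            END    { exit !found }
        ' "$file"; then
            printf '%s\n' "$file"
        fi
    done
}
```

Called as list_consecutive_sgp *.txt, it stops reading each file at the first matching pair.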

If your objective is to get the list of filenames which meet the criteria,
how about:
for i in *.txt; do
[[ -n $(sed -ne ':l;N;$!b l;/SGP[^\n]\+\nSGP/p' "$i") ]] && echo "$i"
done
The sed commands :l;N;$!b l make a loop that slurps all lines
into the pattern space, including the "\n" characters.
Then it looks for a line containing SGP that is immediately
followed by a line starting with SGP.
If the sed output is non-empty, it prints the current filename.
[Update]
If your requirement is DOS platform, please try instead:
setlocal EnableDelayedExpansion
for %%f in (text*.txt) do (
set result=
for /f "usebackq tokens=*" %%a in (`sed.exe -ne ":l;N;$!b l;/SGP.\+\nSGP.\+/p" %%f`) do set result=!result!%%a
if "!result!" neq "" (
echo %%f
)
)
I've tested with Windows 10 and sed-4.2.1.

Source and run shell function within perl regex

The Problem
I am attempting to reuse a shell function I have defined in a bash script later on in the script, within a perl cmd execution block. The call to perl basically needs to run the defined shell function after matching a piece of the regex (capture group #2). See code definitions below.
The Code
The pertinent function definition in bash shell script:
evalPS() {
PS_ARGS=$(eval 'echo -en "'${1}'"' | sed -e 's#\\\[##g' -e 's#\\\]##g')
PS_STR=$((set +x; (PS4="+.$PS_ARGS"; set -x; :) 2>&1) | cut -d':' -f1 | cut -d'.' -f2)
echo -en "${PS_STR}"
}
The definition above uses some bashisms and hacks to evaluate the users real prompt to a string.
That function needs to be called within perl in the next function:
remPS() {
# store evalPS definition
EVALPS_SOURCE=$(declare -f evalPS)
# initialize prompt
eval "$PROMPT_COMMAND" &> /dev/null
# handle args
( [[ $# -eq 0 ]] && cat - || cat "${1}" ) |
# ridiculous regex
perl -pe 's/^[^\e].*//gs' |
perl -s -0777 -e '`'"$EVALPS_SOURCE"'`; while (<>) { s%(.*?\Q$ps1\E)(?{`eval $PROMPT_COMMAND`})|(.*?\Q$ps2\E)(?{$ps2=`$\(evalPS "${PS2}"\)`})|(\Q$ps3\E)|(^\Q$ps4\E+.*?\b)%%g; } continue {print;}' -- \
-ps1="$(evalPS "${PS1}")" -ps2="$(evalPS "${PS2}")" \
-ps3="${PS3}" -ps4="${PS4:0:1}" |
perl -pe 's/(.*?)\e\[[A-Z](\a)*/\1/g'
}
The call to perl could be moved to a separate script, but either way the issue is that I cannot find a way to "import" or "source" the evalPS() function within the perl script. I also tried sourcing the function from a separate file definition into the perl command, like so:
perl -s -0777 -e '`. /home/anon/Desktop/flyball_labs/scripts/recsesh_lib.sh`; while (<>) { s%(.*?\Q$ps1\E)(?{`eval $PROMPT_COMMAND`})|(.*?\Q$ps2\E)(?{$ps2=`$\(evalPS "${PS2}"\)`})|(\Q$ps3\E)|(^\Q$ps4\E+.*?\b)%%g; } continue {print;}'
...
Or using the source builtin:
perl -s -0777 -e '`source /home/anon/Desktop/flyball_labs/scripts/recsesh_lib.sh`; while (<>) { s%(.*?\Q$ps1\E)(?{`eval $PROMPT_COMMAND`})|(.*?\Q$ps2\E)(?{$ps2=`$\(evalPS "${PS2}"\)`})|(\Q$ps3\E)|(^\Q$ps4\E+.*?\b)%%g; } continue {print;}'
...
And for clarity, the final attempt was passing the function declaration into perl like so:
perl -s -0777 -e '`'"$EVALPS_SOURCE"'`; while (<>) { s%(.*?\Q$ps1\E)(?{`eval $PROMPT_COMMAND`})|(.*?\Q$ps2\E)(?{$ps2=`$\(evalPS "${PS2}"\)`})|(\Q$ps3\E)|(^\Q$ps4\E+.*?\b)%%g; } continue {print;}'
...
The Results
With no luck in any of the above cases.. It seems that the . cmd runs whereas the source cmd does not, and the syntax for passing the declaration of the function into perl may not be possible, as shown from the output of my tests:
Sourcing library definitions w/ source cmd
(1)|anon@devbox /tmp|$ remPS "${TEXT_FILE}"
sh: 1: source: not found
...
Sourcing w/ shell . cmd
(1)|anon@devbox /tmp|$ remPS "${TEXT_FILE}"
sh: 1: evalPS: not found
...
Passing declaration to perl
(1)|anon@devbox /tmp|$ remPS "${TEXT_FILE}"
sh: 3: Syntax error: ")" unexpected (expecting "}")
sh: 1: evalPS: not found
...
To summarize
Q) How to "import" and run a user defined shell command within perl?
A) 2 Possible solutions:
source the function from separate file definition
pass into perl command from bash using variable expansion
Sources & Research
Evaluating real bash prompt value:
how-to-print-current-bash-prompt echo-expanded-ps1
Note: I chose this implementation of evalPS() because using the script cmd workaround was unreliable and using call bind_variable() bash function required root privileges (effectively changing user's prompt).
Perl regex embedded code
Note: the function has to be run after every match of $PS2 to re-evaluate the new prompt and effectively match the next iteration (as it would in a real shell session). The use case I have for this is many people have (including myself) set their $PROMPT_COMMAND to iterate an integer indicating which line number (or offset from $PS1) the current line is, and displayed within $PS2.
running a shell command in perl
Sourcing shell code in perl:
how-to-run-source-command-linux-from-a-perl-script can-we-source-a-shell-script-in-perl-script sourcing-a-shell-script-from-a-perl-script
Alternatively if anyone knows how to translate my implementation of evalPS() into perl code, that would work too, but I believe this is impossible because the evaluated string is obtained using a "set hack" which as far as I know is strictly a bashism.
how-can-i-translate-a-shell-script-to-perl
Any suggestions would be much appreciated!
Edit
Some more info on the data being parsed..
The text file looks like the following (cat -A output):
^[]0;anon# - 3^G^[[1m^[[31m(^[[36m1^[[31m)|^[[33manon^[[32m@^[[34mdevbox ^[[35m/tmp^[[31m|^[[36m^[[37m$ ^[(B^[[mecho test^M$
test^M$
^[[1m^[[31m(^[[36m1^[[31m)|^[[33manon^[[32m@^[[34mdevbox ^[[35m/tmp^[[31m|^[[36m^[[37m$ ^[(B^[[mecho \^M$
^[[1m^[[31m[^[[36m2^[[31m]|^[[33m-^[[32m-^[[34m-^[[35m> ^[(B^[[m\^M$
^[[1m^[[31m[^[[36m3^[[31m]|^[[33m-^[[32m-^[[34m-^[[35m> ^[(B^[[m\^M$
^[[1m^[[31m[^[[36m4^[[31m]|^[[33m-^[[32m-^[[34m-^[[35m> ^[(B^[[mtest^M$
test^M$
^[[1m^[[31m(^[[36m1^[[31m)|^[[33manon^[[32m@^[[34mdevbox ^[[35m/tmp^[[31m|^[[36m^[[37m$ ^[(B^[[mexit^M$
exit^M$
Or similarly (less formatted):
ESC]0;anon# - 3^GESC[1mESC[31m(ESC[36m1ESC[31m)|ESC[33manonESC[32m@ESC[34mdevbox ESC[35m/tmpESC[31m|ESC[36mESC[37m$ ESC(BESC[mecho test
test
ESC[1mESC[31m(ESC[36m1ESC[31m)|ESC[33manonESC[32m@ESC[34mdevbox ESC[35m/tmpESC[31m|ESC[36mESC[37m$ ESC(BESC[mecho \
ESC[1mESC[31m[ESC[36m2ESC[31m]|ESC[33m-ESC[32m-ESC[34m-ESC[35m> ESC(BESC[m\
ESC[1mESC[31m[ESC[36m3ESC[31m]|ESC[33m-ESC[32m-ESC[34m-ESC[35m> ESC(BESC[m\
ESC[1mESC[31m[ESC[36m4ESC[31m]|ESC[33m-ESC[32m-ESC[34m-ESC[35m> ESC(BESC[mtest
test
ESC[1mESC[31m(ESC[36m1ESC[31m)|ESC[33manonESC[32m@ESC[34mdevbox ESC[35m/tmpESC[31m|ESC[36mESC[37m$ ESC(BESC[mexit
exit
My $PROMPT_COMMAND and corresponding prompts ($PS1-$PS4) for example:
PROMPT_COMMAND='TERM_LINE_NO=1'
PS1="\[$(tput bold)\]\[$(tput setaf 1)\](\[$(tput setaf 6)\]\${TERM_LINE_NO}\[$(tput setaf 1)\])|\[$(tput setaf 3)\]\u\[$(tput setaf 2)\]@\[$(tput setaf 4)\]\h \[$(tput setaf 5)\]\w\[$(tput setaf 1)\]|\[$(tput setaf 6)\]\$(parse_git_branch)\[$(tput setaf 7)\]\\$ \[$(tput sgr0)\]"
PS2="\[$(tput bold)\]\[$(tput setaf 1)\][\[$(tput setaf 6)\]\$((++TERM_LINE_NO))\[$(tput setaf 1)\]]|\[$(tput setaf 3)\]-\[$(tput setaf 2)\]-\[$(tput setaf 4)\]-\[$(tput setaf 5)\]> \[$(tput sgr0)\]"
PS3=""
PS4="+ "

The answer was to scrap this whole idea and use a better one..
Let's step back first.. Big Picture:
Goal was to make the script program output an executable shell script of the entire recorded session.
Back to Answers..
The above implementation was supposed to remove all prompts and control characters from the output of script (which is the input examples I gave) and then remove the output of each command (i.e. any line that didn't contain control characters).
Passing the evalPS function to perl to execute proved to be quite redundant and getting bash and perl to expand the parameters correctly was a nightmare..
The Final Solution
Scrapped the perl regex idea and used a combination of subshell and history redirection to grab the commands for the entire script session, while it was running.
The entire implementation looks like this:
# log cmds to script file as they are entered (unbuffered)
# spawn script cmd in subshell and wait for it to finish
wait -n
(
history -c
export HISTFILE="${SCRIPT_FILE}"
shopt -s histappend
script -q --timing="${TIME_FILE}" "${REC_FILE}"
history -a
)
...
Simple and much easier to read! :)
Hope this helps anyone trying to make their own mods to script in the future, cheers!

Shell :Select lowercase words from a file,sort them and copy to another file

I want to write a shell script that takes two parameters from the command line: the first should be an existing file, the second a new file that will contain the result. From the first file, I want to select the lowercase words, sort them, and copy the result into the second file. The grep command is obviously not right; how should I change it to get the result?
#!/bin/bash
file1=$1
file2=$2
if [ ! -f $file1]
then
echo "this file doesn't exist or is not a file
break
else
grep '/[a-z]*/' $file1 | sort > $file2

You can change the grep command like this:
grep -o '\<[[:lower:]][[:lower:]]*\>' "$file1" | sort -u > "$file2"
The -o switch makes grep print only the matched parts of a line, each match on its own line.
\< is a left word boundary and \> a right word boundary (this way the word Site doesn't return ite).
[[:lower:]][[:lower:]]* ensures there's at least one lowercase letter.
(Using [[:lower:]] instead of the range [a-z] is preferable because in some locales letters are ordered alphabetically regardless of case: aBbCcDd...YyZz, so [a-z] can match uppercase letters too.)
Notice: I added the -u switch to the sort command to remove duplicate entries; if you don't want this behaviour, remove it.
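To see the effect of the word boundaries, here is a small sketch that wraps the command in a filter. The function name lower_words is made up, and \< and \> are assumed to be supported by your grep (they are in GNU grep).

```shell
#!/bin/bash
# Hypothetical filter: read text on stdin, print the all-lowercase words,
# sorted and without duplicates.
lower_words() {
    grep -o '\<[[:lower:]][[:lower:]]*\>' | sort -u
}
```

For example, printf 'Site map of the Site\n' | lower_words prints map, of and the on separate lines; Site is skipped entirely instead of yielding ite.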

I'm in a hurry so I won't rewrite what I pointed out in a comment, but here is your code with all these problems fixed :
#!/bin/bash
file1=$1
file2=$2
if [ ! -f $file1 ]
then
echo "this file doesn't exist or is not a file"
else
grep -wo '[a-z][a-z]*' $file1 | sort > $file2
fi
ShellCheck gives one more tip which you should definitely apply, I'll let you check it out.
It would also be a good practice to exit with a non-zero code when the script can't execute its task, that is in your case when the file isn't found.

Using awk and sort, First the test file:
$ cat file
This is a test.
This is another one.
Code:
$ awk -v RS="[ .\n]+" '/^[[:lower:]]+$/' file | sort
a
another
is
is
one
test
I'm using space, newline and period as the record separator, so that each word becomes its own record, and printing the words that consist only of lowercase letters.

Your shell code could use some fixing up.
#!/bin/bash
file1=$1
file2=$2
if [ ! -f "$file1" ]; then # need space before ]; quote expansions
# send error messages to stderr instead of stdout
# include program and file name in message
printf >&2 '%s: file "%s" does not exist or is not a file\n' "$0" "$file1"
# exit with nonzero code when something goes wrong
exit 1
fi
# -w to get only whole words
# -o to print out each match on a separate line
grep -wo '[a-z][a-z]*' "$file1" | sort > "$file2"
As written that will include multiple copies of the same word if it occurs multiple times in the file; change to sort -u if you don't want that.

Recreate output of tail -n to text files

I had a bunch of bash scripts in a directory that I "backed up" doing $ tail -n +1 -- *.sh
The output of that tail is something like:
==> do_stuff.sh <==
#! /bin/bash
cd ~/my_dir
source ~/my_dir/bin/activate
python scripts/do_stuff.py
==> do_more_stuff.sh <==
#! /bin/bash
cd ~/my_dir
python scripts/do_more_stuff.py
These are all fairly simple scripts with 2-10 lines.
Given the output of that tail, I want to recreate all of the above files with the same content.
That is, I'm looking for a command that can ingest the above text and create do_stuff.sh and do_more_stuff.sh with the appropriate content.
This is more of a one-off task so I don't really need anything robust and I believe there are no big edge cases given files are simple (e.g none of the files actually contain ==> in them).
I started by trying to come up with a matching regex, and it will probably look something like this (==>.*\.sh <==)(.*)(==>.*\.sh <==), but I'm stuck on actually getting it to capture the filename and content and write them to a file.
Any ideas?

Presume your backup file is named backup.txt
perl -ne "if (/==> (\S+) <==/){open OUT,'>',$1;next}print OUT $_" backup.txt
Above version is for Windows
fixed version on *nix:
perl -ne 'if (/==> (\S+) <==/){open OUT,">",$1;next}print OUT $_' backup.txt

#!/bin/bash
while IFS= read -r line; do
if [[ $line =~ ^==\>[[:space:]](.*)[[:space:]]\<==$ ]]; then
out="${BASH_REMATCH[1]}"
continue
fi
printf "%s\n" "$line" >> "$out"
done < backup.txt
Drawback: extra blank line at the end of every created file except the last one.
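The drawback noted above comes from the blank separator line that tail prints before every header except the first. A sketch that holds blank lines back until it knows whether a header follows is shown below. The function name restore_from_tail and its optional destination-directory argument are made up for illustration, and it assumes, as in the question, that no file contains a line of the form ==> name <==.

```shell
#!/bin/bash
# Hypothetical helper: rebuild files from the output of `tail -n +1 -- FILES`.
# Usage: restore_from_tail BACKUP [DESTDIR]
restore_from_tail() {
    local backup=$1 dest=${2:-.}
    local re='^==> (.+) <==$'
    local out='' blanks='' line

    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            if [[ -n $out ]]; then
                # Drop the one separator line tail added, keep other blanks.
                blanks=${blanks#$'\n'}
                printf '%s' "$blanks" >> "$out"
            fi
            out=$dest/${BASH_REMATCH[1]}
            : > "$out"          # create or truncate the target file
            blanks=''
        elif [[ -z $out ]]; then
            continue            # ignore anything before the first header
        elif [[ -z $line ]]; then
            blanks+=$'\n'       # hold blank lines back for now
        else
            printf '%s%s\n' "$blanks" "$line" >> "$out"
            blanks=''
        fi
    done < "$backup"

    # Blank lines at the very end belong to the last file.
    if [[ -n $out ]]; then
        printf '%s' "$blanks" >> "$out"
    fi
}
```

Calling restore_from_tail backup.txt recreates the files in the current directory and overwrites existing files of the same name.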

Regex to remove lines in file(s) that ending with same or defined letters

I need a bash script for Mac OS X that works like this:
./script.sh * folder/to/files/
#
# or #
#
./script.sh xx folder/to/files/
This script
read a list of files
open each file and read each lines
if lines ended with the same letters ('*' mode) or with custom letters ('xx') then
remove line and RE-SAVE file
backup original file
My first approach to do this:
#!/bin/bash
# ck init params
if [ $# -le 0 ]
then
echo "Usage: $0 <letters>"
exit 0
fi
# list files in current dir
list=`ls BRUTE*`
for i in $list
do
# prepare regex
case $1 in
"*") REGEXP="^.*(.)\1+$";;
*) REGEXP="^.*[$1]$";;
esac
FILE=$i
# backup file
cp $FILE $FILE.bak
# removing line with same letters
sed -Ee "s/$REGEXP//g" -i '' $FILE
cat $FILE | grep -v "^$"
done
exit 0
But it doesn't work as i want....
What's wrong?
How can i fix this script?
Example:
$cat BRUTE02.dat BRUTE03.dat
aa
ab
ac
ad
ee
ef
ff
hhh
$
If I use '*', I want every line that ends with a repeated letter to be removed from all files.
If I use 'ff', I want every line that ends with 'ff' to be removed.
Ah, it's on Mac OS X. Remember that its sed is a little different from the classic Linux sed.
man sed
sed [-Ealn] command [file ...]
sed [-Ealn] [-e command] [-f command_file] [-i extension] [file ...]
DESCRIPTION
The sed utility reads the specified files, or the standard input if no files are specified, modifying the input as specified by a list of commands. The input is then written to the standard output.
A single command may be specified as the first argument to sed. Multiple commands may be specified by using the -e or -f options. All commands are applied to the input in the order they are specified regardless of their origin.
The following options are available:
-E Interpret regular expressions as extended (modern) regular expressions rather than basic regular expressions (BRE's). The re_format(7) manual page fully describes both formats.
-a The files listed as parameters for the ``w'' functions are created (or truncated) before any processing begins, by default. The -a option causes sed to delay opening each file until a command containing the related ``w'' function is applied to a line of input.
-e command
Append the editing commands specified by the command argument to the list of commands.
-f command_file
Append the editing commands found in the file command_file to the list of commands. The editing commands should each be listed on a separate line.
-i extension
Edit files in-place, saving backups with the specified extension. If a zero-length extension is given, no backup will be saved. It is not recommended to give a zero-length extension when in-place editing files, as you risk corruption or partial content in situations where disk space is exhausted, etc.
-l Make output line buffered.
-n By default, each line of input is echoed to the standard output after all of the commands have been applied to it. The -n option suppresses this behavior.
The form of a sed command is as follows:
[address[,address]]function[arguments]
Whitespace may be inserted before the first address and the function portions of the command.
Normally, sed cyclically copies a line of input, not including its terminating newline character, into a pattern space, (unless there is something left after a ``D'' function), applies all of the commands with addresses that select that pattern space, copies the pattern space to the standard output, appending a newline, and deletes the pattern space.
Some of the functions use a hold space to save all or part of the pattern space for subsequent retrieval.
Anything else?
Is my problem clear?
Thanks.

I don't know bash shell too well so I can't evaluate what the failure is.
This is just an observation of the regex as understood (this may be wrong).
The * mode regex looks ok:
^.*(.)\1+$ that ended with same letters..
But the literal mode might not do what you think.
current: ^.*[$1]$ that ended with 'literal string'
This shouldn't use a character class.
Change it to: ^.*$1$
Realize though that the string in $1 (before it goes into the regex) should be escaped
in case it contains any regex metacharacters.
Otherwise, do you intend to have a character class?

perl -ne '
BEGIN {$arg = shift; $re = $arg eq "*" ? qr/([[:alpha:]])\1$/ : qr/$arg$/}
/$re/ && next || print
'
Example:
echo "aa
ab
ac
ad
ee
ef
ff" | perl -ne '
BEGIN {$arg = shift; $re = $arg eq "*" ? qr/([[:alpha:]])\1$/ : qr/$arg$/}
/$re/ && next || print
' '*'
produces
ab
ac
ad
ef

A possible issue:
When you put an unquoted * on the command line, the shell replaces it with the names of all the files in your directory, so your $1 will usually not equal *. Quote it ('*') when calling the script.
And some tips:
You can replace this:
# list files in current dir
list=`ls BRUTE*`
for i in $list
With:
for i in BRUTE*
And:
This:
cat $FILE | grep -v "^$"
With:
grep -v "^$" $FILE
Besides the possible issue, I can't see anything jumping out at me. What do you mean clean? Can you give an example of what a file should look like before and after and what the command would look like?
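Putting these tips together, a sketch of the whole task could look like the following. It is not the asker's script: the function name strip_lines is made up, the mode '*' must be quoted by the caller, and the file is rewritten from its .bak copy so that no sed -i is needed (its syntax differs between Mac OS X and GNU sed).

```shell
#!/bin/bash
# Hypothetical helper: strip_lines MODE FILE...
#   MODE '*'  removes lines whose last two characters are identical
#   MODE xx   removes lines that end with the literal string xx
strip_lines() {
    local suffix=$1 file
    shift
    for file in "$@"; do
        # Keep a backup of the original, then rewrite the file from it.
        cp -- "$file" "$file.bak" || return
        if [[ $suffix == '*' ]]; then
            # Basic regex: a captured character followed by itself at end of line.
            grep -v '\(.\)\1$' "$file.bak" > "$file"
        else
            # Plain string comparison, so regex metacharacters in the suffix
            # need no escaping (awk -v still interprets backslash escapes).
            awk -v s="$suffix" \
                'substr($0, length($0) - length(s) + 1) != s' "$file.bak" > "$file"
        fi
    done
}
```

For example, strip_lines '*' BRUTE*.dat or strip_lines ff BRUTE*.dat; each original is kept next to the cleaned file with a .bak extension.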

This is the problem!
grep '\(.\)\1[^\r\n]$' *
on MAC OSX, ( ) { }, etc... must be quoted!!!
Solved, thanks.
