Strict Regular Expression String Comparison in Bash Script

I'm attempting to write a simple Bash script that will help me verify some Windows registry settings. All the data is in folders, each named after a computer's hostname, with each host's file at the path stated in the script. The end goal is to find the hosts that have the value set incorrectly, based on the registry key value at the end of the line.
For reference, parameter one is the registry key I want to look for, and parameter two is the value it should have. If $control matches the end of $value's string, the script should output the machine name, which is the variable $FOLDER.
list=$(ls)
regkey=$1
control=$2
for FOLDER in $list
do
value=$(grep "$regkey" "$FOLDER/policies/Effective-Security-policy.txt")
if [[ "$value" =~ $control ]] ;
then
echo "$FOLDER"
else
continue
fi
done
However, I can't get it to do a strict comparison: there is also a registry key named RestrictAnonymousSAM, which the pattern RestrictAnonymous matches as well, so the script lists hosts based on the wrong key's value.
Here are some of the lines from the text file. I need to be able to differentiate between the two, so that the returned values are for that particular registry key:
MACHINE\System\CurrentControlSet\Control\Lsa\RestrictAnonymous=4,0
MACHINE\System\CurrentControlSet\Control\Lsa\RestrictAnonymousSAM=4,1

If I understand correctly what you're trying to do, I think you should write:
regkey="$1"
control="$2"
for FOLDER in * ; do
file="$FOLDER/policies/Effective-Security-policy.txt"
if iconv -s -f utf-16 -t utf-8 "$file" \
| grep -q '^[^=]*\\'"$regkey=$control$"
then
echo "$FOLDER"
fi
done
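grep -q searches its input (here, the UTF-8 output of iconv, which assumes the policy file is UTF-16 encoded) for lines matching the pattern, but does not print them. This makes it well suited to if-tests, since grep exits with 0 (success/true) if it finds a match, 1 (false) if it does not, and 2 if an error occurs. The pattern only matches when the key name sits directly between a backslash and the =, and the final $ anchors the value to the end of the line, so RestrictAnonymous=4,0 is no longer confused with RestrictAnonymousSAM=4,1.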
(Important note: the above assumes that $regkey and $control can't contain any metacharacters that grep might treat specially, such as ., *, [ or \. If they can, then this becomes trickier.)
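If the key or value might contain such characters, one way around it (a sketch, not part of the answer above) is to skip regular expressions entirely and compare strings literally in awk. The function name has_setting is hypothetical; it expects UTF-8 input, so the policy file still needs to go through iconv first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the input has a line whose key name
# (the text between the last backslash and the first "=") equals $1 and whose
# value (everything after the first "=") equals $2. Both comparisons are plain
# string equality, so nothing in $1 or $2 acts as a regex metacharacter.
# Input is file $3, or standard input if $3 is omitted.
has_setting() {
  # Pass the strings via the environment: awk -v would process backslash
  # escapes in them, ENVIRON does not.
  KEY="$1" VAL="$2" awk '
    {
      sub(/\r$/, "")                 # tolerate CRLF line endings
      eq = index($0, "=")
      if (eq == 0) next              # not a key=value line
      path = substr($0, 1, eq - 1)
      val  = substr($0, eq + 1)
      sub(/.*\\/, "", path)          # keep only the part after the last backslash
      if (path == ENVIRON["KEY"] && val == ENVIRON["VAL"]) { found = 1; exit }
    }
    END { exit !found }
  ' "${3:--}"
}
```

In the loop it would be used as: if iconv -f utf-16 -t utf-8 "$file" | has_setting "$regkey" "$control"; then echo "$FOLDER"; fi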

Related

Using grep for listing files by owner/read perms

The rest of my bash script works, just having trouble using grep. On each file I am using the following command:
ls -l $filepath | grep "^.r..r..r.*${2}$"
How can I properly use the second argument in the regular expression? What I am trying to do is print the file if it can be read by anyone and the owner is who is passed by the second argument.
Using:
ls -l $filepath | grep "^.r..r..r"
Will print the information successfully based on the read permissions. What I am trying to do is print based on... [read permission][any characters in between][ending with the owner's name]
The immediate problem with your attempt is the final $, which anchors the match to the end of the line, i.e. the end of the file name, not the owner field. A better solution would replace grep with Awk, which has built-in support for examining only specific fields. But actually, don't use ls for this, or really in scripts at all.
Unfortunately, the stat command's options are not entirely portable, but for Linux (GNU coreutils stat, where %a is the octal permissions and %U the owner's user name), try
case $(stat -c %a:%U "$filepath") in
[4-7][4-7][4-7]:"$2") ls -l "$filepath";;
esac
or maybe more portably
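find "$filepath" -user "$2" -perm -444 -ls
Here -perm -444 requires all three read bits to be set, matching the ^.r..r..r test, and is specified by POSIX; the GNU-style -perm /444 would match if any one of them were set. Sadly, -ls itself is not in POSIX either, although GNU and BSD find both support it.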
Paradoxically, the de facto most portable replacement for stat to get a file's permissions might actually be
perl -e '@s = stat($ARGV[0]); printf "%03o\n", $s[2] & 07777' "$filepath"
The stat call returns a list of fields; $s[2] is the full mode, so & 07777 masks off the file-type bits. If you want the owner, too, the numeric UID is in $s[4], and getpwuid($s[4]) in scalar context gets the user name.

For the love of BASH, regex, locate & find - contains A not B

Goal: Regex pattern for use with find and locate that "Contains A but not B"
So I have a bash script that manipulates a few video files.
In its current form, I create a variable to act on later with a for loop that works well:
if [ "$USE_FIND" = true ]; then
vid_files=$(find "${DIR}" -type f -regex ".*\.\(mkv\|avi\|ts\|mp4\|m2ts\)")
else
vid_files=$(locate -ir "${DIR}.*\.\(mkv\|avi\|ts\|mp4\|m2ts\)")
fi
So "contains A" is any one of the listed extensions.
I'd like to add a condition so that if a certain string (B) is contained in the path (in either a directory name or the filename), the file isn't added to the array.
I've spent some time trying to implement this with lookaheads, to no avail. With "Robot" as an example of B, I've tried different forms of .*(?!Robot).*, such as
".*\(\?\!Robot\).*\.\(mkv\|avi\|ts\|mp4\|m2ts\)" for find, but it doesn't work.
I've more or less exhausted regex101.com, the terminal and chmod +x at this point and would welcome some help. I suspect the fact that it's called from a bash script is what's causing me the difficulty.
One of my many sources of reference in trying to sort this:
Ref: Is there a regex to match a string that contains A but does not contain B
You may want to avoid using find inside a command substitution to build a list of files since, while this is admittedly rare, filenames can contain newlines.
You could use an array, which will handle file names without issues (assuming the array is later expanded properly).
declare -a vid_files=()
while IFS= read -r -d '' file
do
! [[ "$file" =~ Robot ]] || continue
vid_files+=("$file")
done < <(find "${DIR}" -type f -regex ".*\.\(mkv\|avi\|ts\|mp4\|m2ts\)" -print0)
The -print0 option of find generates a null byte to separate the file names, and the -d '' option of read allows a null byte to be used as a record separator (both obviously go together).
You can get the list of files using "${vid_files[@]}" (the double quotes are important to prevent word splitting). You can also iterate over the list easily:
for file in "${vid_files[@]}"
do
echo "$file"
done
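As for the lookahead itself: none of the regex types GNU find offers support (?!...), which is why those attempts fail. A possibly simpler variant (a sketch, assuming GNU find since -regex is not POSIX) moves the "not B" part into find with ! -path, which drops any file whose full path contains the string, in a directory name or in the filename. The function name collect_videos is hypothetical, and it fills the global vid_files array.

```shell
#!/usr/bin/env bash
# Sketch: collect video files under $1 whose path does NOT contain $2.
# Assumes GNU find (-regex, -print0). Note that -path tests the whole path,
# including $1 itself, so a start directory containing $2 excludes everything.
collect_videos() {
  local dir=$1 exclude=$2 file
  vid_files=()                      # global array, as in the answer above
  while IFS= read -r -d '' file; do
    vid_files+=("$file")
  done < <(find "$dir" -type f \
             -regex '.*\.\(mkv\|avi\|ts\|mp4\|m2ts\)' \
             ! -path "*$exclude*" \
             -print0)
}
```

For example, collect_videos "${DIR}" Robot and then iterate over "${vid_files[@]}" as before.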

Extracting group from regex in shell script using grep

I want to store part of a command's output in a variable in a shell script, but I am not able to do it. I am using grep for this. Please help me get the desired output into a variable.
x=$(pwd)
pw=$(grep '\(.*\)/bin' $x)
echo "extracted is:"
echo $pw
The output of the pwd command is /root/abc/bin and I want only the /root/abc part of it. Thanks in advance.
Use dirname to get the path without its last segment.
You can use:
x=$(pwd)
pw=$(dirname "$x")
echo "$pw"
Or simply:
pw=$(dirname "$(pwd)")
echo "$pw"
All of what you're doing can be done in a single echo:
echo "${PWD%/*}"
The $PWD variable holds the current directory, and %/* removes the shortest trailing match of /*, i.e. the last / and everything after it.
For your case it will output: /root/abc
The second (and any subsequent) argument to grep is the name of a file to search, not a string to perform matching against.
Furthermore, grep prints the matching line or (with -o) the matching string, not whatever the parentheses captured. For that, you want a different tool.
Minimally fixing your code would be
x=$(pwd)
pw=$(printf '%s\n' "$x" | sed 's%\(.*\)/bin.*%\1%')
(If you only care about Bash, not other shells, you could do sed ... <<<"$x" without the explicit pipe; the syntax is also somewhat more satisfying.)
But of course, the shell has basic string manipulation functions built in.
pw=${x%/bin*}
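If the script only needs to run in Bash, the shell's own [[ =~ ]] operator can also do what the grep attempt was aiming for: it stores the text captured by each parenthesized group in the BASH_REMATCH array. A minimal sketch, with the path hardcoded for illustration instead of x=$(pwd):

```shell
#!/usr/bin/env bash
# Sketch: capture the part of a path that precedes a "/bin" component.
# In the question this would be x=$(pwd); the path is hardcoded here.
x=/root/abc/bin
re='^(.*)/bin(/|$)'          # POSIX ERE; (.*) is the first capture group
if [[ $x =~ $re ]]; then     # $re must stay unquoted to be used as a regex
  pw=${BASH_REMATCH[1]}      # text matched by the first (...) group
  echo "extracted is: $pw"
else
  echo "no /bin component found" >&2
fi
```

This prints "extracted is: /root/abc". Keeping the regex in a variable avoids quoting surprises inside [[ ]].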

batch renaming of files with perl expressions

This should be a basic question for a lot of people, but I am a biologist with no programming background, so please excuse my question.
What I am trying to do is rename about 100,000 gzipped data files whose existing names are a code (example: XG453834.fasta.gz). I'd like to rename them to something easily readable and parseable by me (example: Xanthomonas_galactus_str_453.fasta.gz).
I've tried to use sed, rename, and mmv, to no avail. If I use any of those commands as a one-off, they work fine; it's only when I try to incorporate variables into a shell script that I run into problems. I'm not getting any errors, but no names are changed, so I suspect it's an I/O error.
Here's what my files look like:
#! /bin/bash
# change a bunch of file names
file=names.txt
while IFS=' ' read -r r1 r2;
do
mmv ''$r1'.fasta.gz' ''$r2'.fasta.gz'
# or I tried many versions of: sed -i 's/"$r1"/"$r2"/' *.gz
# and I tried many versions of: rename -i 's/$r1/$r2/' *.gz
done < "$file"
...and here are the first lines of my text file, which uses a single space as the delimiter:
cat names.txt
#find #replace
code1 name1
code2 name2
code3 name3
I know I can do this with python or perl, but since I'm stuck here working on this particular script I want to find a simple solution to fixing this bash script and figure out what I am doing wrong. Thanks so much for any help possible.
Also, I tried to cat the names file (see comment from Ashoka Lella below) and then use awk to move/rename. Some of the files have variable names (but will always start with the code), so I am looking for a find & replace option to just replace the "code" with the "name" and preserve the file name structure.
I suspect I am not escaping the variable within the single quotes of the perl expression, but I have pored over a lot of manuals and I can't find the way to do this.
If you're absolutely sure that the filenames don't contain spaces or tabs, you can try the following:
xargs -n2 < names.txt echo mv
This is a dry run (it only prints what it would do). If you're satisfied with the result, remove the echo.
If you want mv to ask before overwriting an existing target, use
xargs -n2 < names.txt echo mv -i
If you never want to allow overwriting of the target, use
xargs -n2 < names.txt echo mv -n
Again, remove the echo if you're satisfied.
I don't think that you need to be using mmv, a simple mv will do. Also, there's no need to specify the IFS, the default will work for you:
while read -r src dest; do mv "$src" "$dest"; done < names.txt
I have double-quoted the variables, as that is generally considered good practice, but in this case a space in either filename would still make read split the line in the wrong place.
You can put an echo before the mv inside the loop to ensure that the correct command will be executed.
Note that in your file names.txt, the .fasta.gz suffix is already included, so you shouldn't be adding it inside the loop as well. Perhaps that was your problem?
This should rename each file named in column 1 of names.txt to the name in column 2, provided the files are in the same folder as names.txt:
awk '{print "mv", $1, $2}' names.txt | sh
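Since some filenames only start with the code (as mentioned in the question), here is a sketch, not taken from any of the answers above, that replaces just the leading code and keeps the rest of the name. The function name rename_by_prefix and the DRY_RUN switch are hypothetical; it assumes the files are in the current directory, that codes and names contain no whitespace, and it skips the "#find #replace" header line, which would otherwise be treated as a code/name pair.

```shell
#!/usr/bin/env bash
# Sketch: for each "code name" line in mapping file $1, rename every file in
# the current directory whose name starts with code, replacing only that
# leading code (XG453834.fasta.gz -> Xanthomonas_galactus_str_453.fasta.gz).
# Set DRY_RUN=1 to only print the mv commands. mv -n (never overwrite) is a
# GNU/BSD option, not POSIX.
rename_by_prefix() {
  local src dest f new
  while read -r src dest; do
    # skip blank or incomplete lines and header/comment lines starting with #
    [[ -z $src || -z $dest || $src == \#* ]] && continue
    for f in "$src"*; do
      [[ -e $f ]] || continue        # the glob matched nothing
      new=$dest${f#"$src"}           # quoted $src: removed literally as a prefix
      ${DRY_RUN:+echo} mv -n -- "$f" "$new"
    done
  done < "$1"
}
```

Running DRY_RUN=1 rename_by_prefix names.txt first shows what would happen without touching any files.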

Extracting username from UNIX path using Regex

I need to get a username from an Unix path with this format:
/home/users/myusername/project/number/files
I just want "myusername". I've been trying for almost an hour and I'm completely clueless.
Any idea?
Thanks!
Maybe just /home/users/([a-zA-Z0-9_\-]*)/.*?
Note that the critical part [a-zA-Z0-9_\-]* has to contain all valid characters for Unix usernames. I took from here that a username should contain only digits, letters, dashes and underscores.
Also note that the extracted username is not the whole matching, but the first group (indicated by (...)).
The best answer to this depends on what you are trying to achieve. If you want to know the user who owns that file, you can use the stat command. Unfortunately, its syntax differs slightly between operating systems, but the following two commands work:
Mac OS X
stat -f '%Su' /home/users/myusername/project/number/files
Red Hat/Fedora/CentOS
stat -c '%U' /home/users/myusername/project/number/files
If you really do want the string following /home/users, then either of the regexes provided above will do that. You could use one in a bash script as follows (Mac OS X):
USERNAME=$(echo '/home/users/myusername/project/number/files' | \
sed -E -e 's!^/home/users/([^/]+)/.*$!\1!g')
Check http://rubular.com/r/84zwJmV62G. The first capture group, not the entire match, is the username.
In a Bourne shell, something like:
string="/home/users/STRINGWEWANT/some/subdir/here"
echo "$string" | awk -F/ '{print $4}'
would be one option, assuming it's always the fourth field (the leading / makes the first field empty). There are more lightweight options that use only shell builtins:
echo "${string#*users/}"
will strip out everything up to and including 'users/', and then, with that result stored in y,
echo "${y%%/*}"
will strip out the remainder.
So, to put it all together:
path="/home/users/STRINGWEWANT/some/other/dirs"
y=${path#*users/} && echo "${y%%/*}"
STRINGWEWANT
Also check out the bash manpage and search for "Parameter Expansion".
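The same two expansions can be wrapped in a small function. This sketch (the name username_from_path is hypothetical) anchors on the full /home/users/ prefix rather than *users/, so a path such as /srv/users/bob/x is rejected instead of silently yielding bob:

```shell
#!/usr/bin/env bash
# Sketch: print the path component that follows /home/users/ using only
# parameter expansion, or return 1 if the path is not under /home/users/.
username_from_path() {
  local rest
  case $1 in
    /home/users/?*) ;;           # prefix plus at least one more character
    *) return 1 ;;
  esac
  rest=${1#/home/users/}         # drop the fixed, anchored prefix
  printf '%s\n' "${rest%%/*}"    # drop the first remaining / and what follows
}
```

For example, username_from_path /home/users/myusername/project/number/files prints myusername.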
(\/home\/users\/)([^\/]+)
The second capture group will be myusername (group 2 in most regex APIs, or index 1 if your API returns only the captures as a zero-based list).