Using sed to trim the beginning of stdout - regex

I'm writing a small script to list all the directories being shared on a macOS system. macOS has a simple tool, sharing -l, and filtering its output with sharing -l | grep path lists all the paths. The problem is the output looks like this:
path: /Volumes/Storage A/File Server/
and I need it to look like this instead
/Volumes/Storage\ A/File\ Server/
So the white space needs to be escaped, and the leading path: together with the white space after it needs to be trimmed. I've been messing about with sed for hours now, but I just don't know enough about it to do this all in one command. I'm hoping to append something to the end of sharing -l | grep path

You may use this:
sharing -l | sed -En '/^path:/{ s/^path:[[:blank:]]*//; s/[[:blank:]]+/\\&/g; p;}'

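Could you please try the following. Note that it hard-codes a backslash after the second and third fields, so it only fits paths shaped like the example: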
sharing -l | awk '{$2=$2"\\";$3=$3"\\";sub(/^path: +/,"")} 1'

If you don't need the white spaces escaped:
$ sharing -l | sed -n 's/^path:[[:space:]]*//p'
/Volumes/Storage A/File Server/
and if you do:
$ sharing -l | awk 'sub(/^path:[[:space:]]*/,""){gsub(/[[:space:]]/,"\\\\&"); print}'
/Volumes/Storage\ A/File\ Server/
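For anyone who wants to drop this into a script, here is a minimal sketch of the same idea wrapped in a function. The name escape_share_paths is made up for illustration, and the printf line only stands in for real sharing -l output, which I am assuming starts the relevant lines with path: as in the question.

```shell
# escape_share_paths (hypothetical name): read `sharing -l`-style text on
# stdin and print each path with its blanks backslash-escaped.
escape_share_paths() {
  sed -n '/^path:/{
s/^path:[[:blank:]]*//
s/[[:blank:]]/\\&/g
p
}'
}

# Sample input standing in for the real `sharing -l` output.
printf 'name: Files\npath: /Volumes/Storage A/File Server/\n' | escape_share_paths
# prints: /Volumes/Storage\ A/File\ Server/
```

On a real system the call would presumably be sharing -l | escape_share_paths, with no separate grep needed because the sed address already selects the path: lines.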

Related

How to use find command with sed and awk to remove duplicate IP from files

Howdie do,
I'm writing a script that will remove duplicate IPs from two files. For example,
grep -rw "123.234.567" /home/test/ips/
/home/test/ips/codingte:123.234.567
/home/test/ips/codingt2:123.234.567
Ok, so that IP is in two different files and so I need to remove the IP from the second file.
The grep gives me the file path and the IP address. My thinking: store the file path in a variable with awk and then use find to go to that file and use sed to remove the duplicate IP, so I changed my grep statement to:
grep -rw "123.234.567" . | awk -F ':' '{print $1}'
which returns:
./codingte
./codingt2
I originally tried to use the full pathname in the find command, but that didn't work either
find -name /var/cpanel/dips/codingte -exec sed '/123.234.567/d' {} \;
So, I just changed into the directory with cd and changed the find command to:
find -name 'codingt2' -exec sed '/123.234.567/d' {} \;
Which runs, but doesn't delete the IP address:
cat codingt2
123.234.567
Now, I know the issue is with the dots in the IP address. They need to be escaped, but I'm not sure how to do this. I've been reading for hours on escaping the regex, but I'm not sure how to do this with sed
Any help would be appreciated. I'm just trying to learn more about regex and using them with other linux tools such as awk and find.
I haven't written the full script yet. I'm trying to break it into pieces and then bring it together in the script.
So you know what the output should look like:
codingte
123.234.567
codingt2
The second file would just have the IP removed
cat FILE1.txt | while read IP ; do sed -i "/^${IP}$/d" FILE2.txt ; done
The command does the following:
There are two files: FILE1.txt and FILE2.txt
It will remove in FILE2.txt lines (in your case, IP addresses) found in FILE1.txt
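One caveat with the loop above is that each line of FILE1.txt is used as a regex, so the dots in an IP match any character. A rough alternative sketch, using grep with fixed strings instead of sed, is below. The function name remove_listed_lines and the demo files are assumptions for illustration; this is a swapped-in technique, not the command from the answer.

```shell
# remove_listed_lines LIST TARGET (hypothetical name)
# Print the lines of TARGET that are not exact, literal copies of a line in LIST.
# -F: fixed strings, -x: whole-line match, -v: invert, -f: read patterns from LIST.
remove_listed_lines() {
  grep -vxFf "$1" "$2"
}

# Demo with throwaway files standing in for FILE1.txt and FILE2.txt.
demo=$(mktemp -d)
printf '123.234.567\n' > "$demo/FILE1.txt"
printf '123.234.567\n123x234y567\n10.0.0.1\n' > "$demo/FILE2.txt"
remove_listed_lines "$demo/FILE1.txt" "$demo/FILE2.txt"
rm -rf "$demo"
```

The result goes to stdout, so to update the file you would redirect it to a temporary file and move that over FILE2.txt.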
You want grep -l, which only prints the names of the files containing a match:
grep -lrw "123.234.567" /home/test/ips/
would print
/home/test/ips/codingte
/home/test/ips/codingt2
So, to skip the first file and work on the rest:
grep -l ... | sed 1d | while IFS= read -r filename; do
whatever with "$filename"
done
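To make the "whatever" step concrete, here is one possible sketch. It assumes the IP sits on a line by itself (as in the cat codingt2 output in the question), sorts the file list so that "first" is well defined, and rewrites each later file through a temporary copy rather than sed -i. The name dedupe_ip is hypothetical.

```shell
# dedupe_ip IP DIR (hypothetical name)
# Keep IP in the first file (in sorted order) under DIR that contains it as a
# whole line, and remove that line from every other file.
dedupe_ip() {
  ip=$1
  dir=$2
  grep -rlxF -- "$ip" "$dir" | sort | sed 1d |
  while IFS= read -r filename; do
    # Copy every line except the exact IP line, then replace the original.
    grep -vxF -- "$ip" "$filename" > "$filename.tmp"
    mv "$filename.tmp" "$filename"
  done
}

# Demo in a throwaway directory.
demo=$(mktemp -d)
printf '123.234.567\n' > "$demo/a"
printf '123.234.567\n10.0.0.1\n' > "$demo/b"
dedupe_ip 123.234.567 "$demo"
cat "$demo/b"
rm -rf "$demo"
```

Because sort has to read the whole list before printing anything, the search has finished before any file is rewritten.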
I think you're just missing the -i argument to sed to edit the files in place.
echo foo > test
find -name test -exec sed -i 's/foo/bar/' {} \;
seems to do the trick.
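On the side question of escaping the dots: an unescaped dot still matches a literal dot, so it is not what stops the delete, but it does make the pattern looser than intended. A small sketch of escaping them before handing the IP to sed follows. The helper name escape_regex_dots is made up, and it only handles dots, not other regex metacharacters.

```shell
# escape_regex_dots STRING (hypothetical name)
# Print STRING with every "." turned into "\." so sed treats it literally.
escape_regex_dots() {
  printf '%s\n' "$1" | sed 's/\./\\./g'
}

ip=123.234.567
pattern=$(escape_regex_dots "$ip")
printf '%s\n' "$pattern"
# prints: 123\.234\.567

# Delete only the exact IP line; the look-alike line survives.
printf '123.234.567\n123x234y567\n' | sed "/^${pattern}\$/d"
# prints: 123x234y567
```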

Regex grep file contents and invoke command

I have a file that has been generated containing MD5 info along with filenames. I'm wanting to remove the files from the directory they are in. I'm not sure how to go about doing this exactly.
filelist (file) contains:
MD5 (dupe) = 1fb218dfef4c39b4c8fe740f882f351a
MD5 (somefile) = a5c6df9fad5dc4299f6e34e641396d38
my command (which i would like to include with rm) looks like this:
grep -o "\((.*)\)" filelist
returns this:
(dupe)
(somefile)
That is almost good, although the parentheses need to be eliminated (not sure how). I tried grep -Po "(?<=\().*(?=\))" filelist, using a lookbehind and a lookahead, but the command didn't work.
The next thing I would like to do is take the output filenames and delete them from the directory they are in. I'm not sure how to script it, but it would essentially do:
<returned results from grep>
rm dupe $target
rm somefile $target
If I understand correctly, you want to take lines like these
MD5 (dupe) = 1fb218dfef4c39b4c8fe740f882f351a
MD5 (somefile) = a5c6df9fad5dc4299f6e34e641396d38
extract the second column without the parentheses to get the filenames
dupe
somefile
and then delete the files?
Assuming the filenames don't have spaces, try this:
# this is where your duplicate files are.
dupe_directory='/some/path'
# Check that you found the right files:
awk '{print $2}' file-with-md5-lines.txt | tr -d '()' | xargs -I{} ls -l "$dupe_directory/{}"
# Looks ok, delete:
awk '{print $2}' file-with-md5-lines.txt | tr -d '()' | xargs -I{} rm -v "$dupe_directory/{}"
xargs -I{} tells xargs to replace each {} in the command with the input line (here, the dupe filename), so the argument can be used in a more complex command.
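If the filenames might contain spaces, a variation along these lines may help. It is only a sketch: it assumes every line has the exact shape MD5 (name) = hexdigest shown in the question, and the helper names md5_names and remove_listed are invented for the example.

```shell
# md5_names (hypothetical name): read "MD5 (name) = hash" lines on stdin
# and print just the names, spaces included.
md5_names() {
  sed -n 's/^MD5 (\(.*\)) = [0-9a-f]*$/\1/p'
}

# remove_listed DIR (hypothetical name): delete each name read from stdin
# from DIR, one name per line.
remove_listed() {
  while IFS= read -r name; do
    rm -- "$1/$name"
  done
}

# Demo in a throwaway directory.
dupe_directory=$(mktemp -d)
touch "$dupe_directory/dupe" "$dupe_directory/some file"
printf 'MD5 (dupe) = 1fb218dfef4c39b4c8fe740f882f351a\nMD5 (some file) = a5c6df9fad5dc4299f6e34e641396d38\n' |
  md5_names | remove_listed "$dupe_directory"
ls -A "$dupe_directory"   # lists whatever is left in the directory
rmdir "$dupe_directory"
```

As with the ls -l check above, it is worth running md5_names on its own first and reading the list before piping it into anything that deletes.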
The tool you're looking for is xargs: http://unixhelp.ed.ac.uk/CGI/man-cgi?xargs
It's pretty standard on *nix systems.
UPDATE: Given that target equals the directory where the files live...
I believe the syntax would look something like:
yourgrepcmd | xargs -I{} rm "$target{}"
The -I creates a placeholder string, and each line from your grep command gets inserted there.
UPDATE:
The step you need to remove the parens is a little use of sed's substitution command (http://unixhelp.ed.ac.uk/CGI/man-cgi?sed)
Something like this:
cat filelist | sed "s/MD5 (\([^)]*\)) .*$/\1/" | xargs -I{} rm "$target/{}"
The moral of the story here is, if you learn to utilize sed and xargs (or awk if you want something a little more advanced) you'll be a more capable linux user.

Regular expression required for replacing string in shell script

Can anyone please help me write a shell script in linux which would replace the hostname in a particular file.
eg : I have multiple files which have certain ip addresses.
http://10.160.228.12:8001/soa-infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL
http://VQAIAAPPDEV:8001/soa-infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL
Basically what I would want to replace is the string between "http://" and ":8001" with any required string.
Can someone help me with this please.
Some More info:-
I want to do this iteratively across many folders. So basically it will search all the files in each folder and perform the necessary changes.
You could use sed. Saying:
sed -r 's|(http://)([^:]*)(:8001)|\1something\3|g' filename
would replace the string between "http://" and ":8001" with something.
If you want to make the change to the file in-place, use the -i option:
sed -i -r 's|(http://)([^:]*)(:8001)|\1something\3|g' filename
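Since the question also asks about doing this across many folders, here is a hedged sketch that combines find with the same substitution. The function name replace_host is hypothetical, it uses [^:/]* instead of [^:]* so the host part cannot run past a slash, and it assumes the new hostname contains no |, & or backslash. It writes through a temporary file instead of sed -i so that it does not depend on one sed flavour.

```shell
# replace_host NEW_HOST ROOT (hypothetical name)
# In every regular file under ROOT, replace the host between "http://" and
# ":8001" with NEW_HOST.
replace_host() {
  new_host=$1
  root=$2
  tmp=$(mktemp) || return 1
  # Only touch files that actually contain a matching URL.
  find "$root" -type f -exec grep -lE 'http://[^:/]*:8001' {} + |
  while IFS= read -r f; do
    sed -E "s|(http://)[^:/]*(:8001)|\1${new_host}\2|g" "$f" > "$tmp" &&
      cat "$tmp" > "$f"   # overwrite in place, keeping the file's permissions
  done
  rm -f "$tmp"
}

# Demo in a throwaway directory tree.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
printf 'http://10.160.228.12:8001/soa-infra/client?WSDL\n' > "$demo/sub/a.txt"
replace_host VQAIAAPPDEV "$demo"
cat "$demo/sub/a.txt"
# prints: http://VQAIAAPPDEV:8001/soa-infra/client?WSDL
rm -rf "$demo"
```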
Use sed command from Linux shell
sed -i 's%OldHost%NewHost%g' /yourfolder/yourfile
Tried with "for"
# cat replace.txt
http://10.160.228.12:8001/soa- infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL
http://VQAIAAPPDEV:8001/soa-infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL
# for i in `cat replace.txt | awk -F: '{print $2}' | sed 's/^\/\///g' | sed '/^$/d'` ; do sed -i "s/$i/Your_hostname/" replace.txt ; done
# cat replace.txt
http://Your_hostname:8001/soa- infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL
http://Your_hostname:8001/soa-infra/services/default/AIAAsyncErrorHandlingBPELProcess/client?WSDL
It's working for me.

How to find all files in a Directory with grep and regex?

I have a directory (Linux/Unix) on an Apache server with a lot of subdirectories containing a lot of files, like this:
- Dir
  - 2010_01/
    - 142_78596_101_322.pdf
    - 12_10.pdf
    - ...
  - 2010_02/
    - ...
How can I find all files with filenames looking like *_*_*_*.pdf, where * is always a digit?
I try to solve it like this:
ls -1Rl 2010-01 | grep -i '\(\d)+[_](\d)+[_](\d)+[_](\d)+[.](pdf)$' | wc -l
But the regular expression \(\d)+[_](\d)+[_](\d)+[_](\d)+[.](pdf)$ doesn't work with grep.
Edit 1: Trying ls -l 2010-03 | grep -E '(\d+_){3}\d+\.pdf' | wc -l, for example, just returns nothing, so it doesn't work either.
Try using find.
The command that satisfies your specification *_*_*_*.pdf where * is always a digit (keep in mind that find's -regex has to match the whole path, not just the file name):
find 2010_10/ -regextype posix-extended -regex '.*/[0-9]+_[0-9]+_[0-9]+_[0-9]+\.pdf'
You seem to be wanting a sequence of 4 numbers separated by underscores, however, based on the regex that you tried.
(\d+_){3}\d+\.pdf
Or do you want to match all names containing solely numbers/underscores?
[\d_]+\.pdf
First, you should be using egrep rather than grep, or call grep with -E for extended patterns.
So this works for me:
$ cat test2.txt
- Dir
- 2010_01/
- 142_78596_101_322.pdf
- 12_10.pdf
- ...
- 2010_02/
- ...
Now egrep that file:
cat test2.txt | egrep '((?:\d+_){3}(?:\d+)\.pdf$)'
- 142_78596_101_322.pdf
Since there are parentheses around the whole pattern, the entire file name will be captured.
Note that the pattern does NOT work with grep in traditional mode:
$ cat test2.txt | grep '((?:\d+_){3}(?:\d+)\.pdf$)'
... no return
But DOES work if you use the extended pattern switch (the same as calling egrep):
$ cat test2.txt | grep -E '((?:\d+_){3}(?:\d+)\.pdf$)'
- 142_78596_101_322.pdf
Thanks to gbchaosmaster and the wolf, I found a way that works for me:
Inside a directory:
find . | grep -P "(\d+_){3}\d+\.pdf" | wc -l
At the Root Directory:
find 20*/ | grep -P "(\d+_){3}\d+\.pdf" | wc -l
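For systems where grep has no -P option, or where \d is not understood under -E, a more portable sketch uses POSIX bracket expressions. The function name find_four_number_pdfs is invented here, and the leading / in the pattern is my addition so that the whole file name, not just its tail, has to consist of four numbers.

```shell
# find_four_number_pdfs ROOT (hypothetical name)
# List files under ROOT whose names are four underscore-separated numbers
# followed by .pdf, e.g. 142_78596_101_322.pdf.
find_four_number_pdfs() {
  find "$1" -type f | grep -E '/[0-9]+(_[0-9]+){3}\.pdf$'
}

# Demo in a throwaway directory shaped like the question's layout.
demo=$(mktemp -d)
mkdir "$demo/2010_01"
touch "$demo/2010_01/142_78596_101_322.pdf" "$demo/2010_01/12_10.pdf"
find_four_number_pdfs "$demo"   # lists only .../2010_01/142_78596_101_322.pdf
rm -rf "$demo"
```

Appending | wc -l gives the count, as in the commands above.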

How can I exclude one word with grep?

I need something like:
grep ^"unwanted_word"XXXXXXXX
You can do it using -v (for --invert-match) option of grep as:
grep -v "unwanted_word" file | grep XXXXXXXX
grep -v "unwanted_word" file will filter out the lines that contain unwanted_word, and grep XXXXXXXX will then list only the lines matching the pattern XXXXXXXX.
EDIT:
From your comment it looks like you want to list all lines without the unwanted_word. In that case all you need is:
grep -v 'unwanted_word' file
I understood the question as "How do I match a word but exclude another", for which one solution is two greps in series: First grep finding the wanted "word1", second grep excluding "word2":
grep "word1" | grep -v "word2"
In my case: I need to differentiate between "plot" and "#plot", which grep's "word" option won't do ("#" not being an alphanumeric character).
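If your grep supports Perl regular expressions with the -P option, you can do (if bash; if tcsh you'll need to escape the !):
grep -P '^(?!.*unwanted_word).*keyword' file
The lookahead is anchored at the start of the line so that unwanted_word is rejected wherever it appears, not only when it follows keyword.
Demo:
$ cat file
foo1
foo2
foo3
foo4
bar
baz
Let us now list all foo except foo3
$ grep -P '^(?!.*foo3).*foo' file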
foo1
foo2
foo4
$
The right solution is to use grep -v "word" file, with its awk equivalent:
awk '!/word/' file
However, if you happen to have a more complex situation in which you want, say, XXX to appear and YYY not to appear, then awk comes handy instead of piping several greps:
awk '/XXX/ && !/YYY/' file
#    ^^^^^    ^^^^^^
# I want it      |
#            I don't want it
You can even say something more complex. For example: I want those lines containing either XXX or YYY, but not ZZZ:
awk '(/XXX/ || /YYY/) && !/ZZZ/' file
etc.
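If the wanted and unwanted strings come from variables and should be taken literally rather than as regexes, one possible variation uses awk's index() function. This is a sketch with an invented function name, want_without, and it keeps awk's usual caveat that -v interprets backslash escapes in the values.

```shell
# want_without WANTED UNWANTED (hypothetical name)
# Print stdin lines that contain WANTED but not UNWANTED, both as plain strings.
want_without() {
  awk -v want="$1" -v unwanted="$2" 'index($0, want) && !index($0, unwanted)'
}

printf 'plot a\n#plot b\nother\n' | want_without 'plot' '#plot'
# prints: plot a
```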
Invert match using grep -v:
grep -v "unwanted word" file pattern
grep provides '-v' or '--invert-match' option to select non-matching lines.
e.g.
grep -v 'unwanted_pattern' file_name
This will output all the lines from the file file_name that do not contain 'unwanted_pattern'.
If you are searching for the pattern in multiple files inside a folder, you can use the recursive search option as follows:
grep -r 'wanted_pattern' * | grep -v 'unwanted_pattern'
Here grep will list all the occurrences of 'wanted_pattern' in all the files within the current directory and pass them to the second grep, which filters out the lines containing 'unwanted_pattern'.
'|' - the pipe tells the shell to connect the standard output of the left program (grep -r 'wanted_pattern' *) to the standard input of the right program (grep -v 'unwanted_pattern').
The -v option will show you all the lines that don't match the pattern.
grep -v ^unwanted_word
I excluded the root ("/") mount point by using grep -vw "^/".
# cat /tmp/topfsfind.txt| head -4 |awk '{print $NF}'
/
/root/.m2
/root
/var
# cat /tmp/topfsfind.txt| head -4 |awk '{print $NF}' | grep -vw "^/"
/root/.m2
/root
/var
I've a directory with a bunch of files. I want to find all the files that DO NOT contain the string "speedup" so I successfully used the following command:
grep -iL speedup *
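Where grep's -L option is not available, roughly the same listing can be built from grep -q in a loop. A minimal sketch, with the made-up name files_without, is below. Note that a file grep cannot read is also listed, because grep then exits with a non-zero status.

```shell
# files_without PATTERN FILE... (hypothetical name)
# Print the name of every FILE that has no line matching PATTERN,
# ignoring case.
files_without() {
  pattern=$1
  shift
  for f in "$@"; do
    grep -qi -- "$pattern" "$f" || printf '%s\n' "$f"
  done
}

# Demo in a throwaway directory.
demo=$(mktemp -d)
printf 'SpeedUp: 2x\n' > "$demo/a.txt"
printf 'nothing relevant\n' > "$demo/b.txt"
files_without speedup "$demo"/*   # lists only .../b.txt
rm -rf "$demo"
```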