How to retrieve filename or extension within bash [duplicate] - regex

This question already has answers here:
Extract filename and extension in Bash
(38 answers)
Closed 8 years ago.
I have a script that is pushing out some filesystem data to be uploaded to another system.
It would be very handy if I could tell myself what 'kind' of file each file actually is, because it will help with some querying later on down the road.
So, for example, say that my script is spitting out the following:
/home/myuser/mydata/myfile/data.log
/home/myuser/mydata/myfile/myfile.gz
/home/myuser/mydata/myfile/mod.conf
/home/myuser/mydata/myfile/security
/home/myuser/mydata/myfile/last
In the end, I'd like to see:
/home/myuser/mydata/myfile/data.log log
/home/myuser/mydata/myfile/myfile.gz gz
/home/myuser/mydata/myfile/mod.conf conf
/home/myuser/mydata/myfile/security security
/home/myuser/mydata/myfile/last last
There's got to be a way to do this with regular expressions and sed, but I can't figure it out.
Any suggestions?
EDIT:
I need to get this info via the command line. Looking at the answers so far, I obviously have not made this clear. So with the example data I provided, assume that the data is all being fed via greps and seds (the data is already sanitized). I need to be able to pipe the example data to sed/grep/awk/whatever in order to produce the desired results.

Print the last field, where fields are separated by any non-alphabetic character:
awk -F '[^[:alpha:]]' '{ print $0,$NF }'
/home/myuser/mydata/myfile/data.log log
/home/myuser/mydata/myfile/myfile.gz gz
/home/myuser/mydata/myfile/mod.conf conf
/home/myuser/mydata/myfile/security security
/home/myuser/mydata/myfile/last last
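One caveat with splitting on every non-letter: a digit in the extension also counts as a separator. A minimal bash sketch comparing it with splitting only on / and . (the input /music/song.mp3 is hypothetical, and by_nonalpha and by_slash_dot are illustrative names):

```shell
# Split on any non-letter: for "song.mp3" the "3" is itself a separator,
# so the last field is the empty string that follows it.
by_nonalpha() { awk -F '[^[:alpha:]]' '{ print $0, $NF }'; }

# Split only on "/" and ".": the last field is everything after the
# final slash or dot, digits included.
by_slash_dot() { awk -F '[/.]' '{ print $0, $NF }'; }

printf '%s\n' /music/song.mp3 | by_nonalpha   # prints "/music/song.mp3 " (empty type)
printf '%s\n' /music/song.mp3 | by_slash_dot  # prints "/music/song.mp3 mp3"
```

Both read paths on standard input, so either can go at the end of the existing grep/sed pipeline.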

This should work for you:
x='/home/myuser/mydata/myfile/security'
( IFS='/.' && arr=( $x ) && echo "${arr[@]:(-1):1}" )
security
x='/home/myuser/mydata/myfile/data.log'
( IFS='/.' && arr=( $x ) && echo "${arr[@]:(-1):1}" )
log

To extract the last component of a path:
filename=${path##*/}
To extract the characters after the last dot in a filename:
extension=${filename##*.}
But (my comment): rather than looking at the extension, it might be better to use the file command. See man file.
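Since the question's EDIT asks for something that works at the end of a pipe, here is a minimal bash sketch that applies the two expansions above to each line of standard input. path_type is a hypothetical name; it assumes one path per line and, like ${filename##*.}, leaves a name with no dot (such as security) unchanged:

```shell
# path_type: read one path per line on stdin and print "PATH TYPE".
# TYPE is the text after the last dot of the final path component,
# or the whole component when it contains no dot.
path_type() {
    local path name
    # "|| [ -n ... ]" also handles a last line with no trailing newline
    while IFS= read -r path || [ -n "$path" ]; do
        name=${path##*/}                        # drop everything up to the last /
        printf '%s %s\n' "$path" "${name##*.}"  # drop everything up to the last .
    done
}
```

Usage: printf '%s\n' /home/myuser/mydata/myfile/data.log | path_type, or append | path_type to the existing pipeline.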

As others have already answered, to parse the file names:
extension="${full_file_name##*.}" # BASH and Kornshell/POSIX only
filename=$(basename "$full_file_name")
dirname=$(dirname "$full_file_name")
Quotes are needed if file names could have spaces, tabs, or other strange characters in them.
You can also test whether a path is a directory, a regular file, or a symbolic link with the test command (which is also available as [, so test -f foo is the same as [ -f foo ]).
However, you said: "it would be very handy if i could tell myself what kind of file each file actually is".
In that case, you may want to investigate the file command. It reports the file type as determined by a magic file (traditionally /etc/magic), and newer implementations can also use the user's own magic file. Rather than relying on the extension, it identifies a file by the magic number in its header, or by looking at its first few lines (for example, a first line matching the regular expression ^#! .*/bash$).
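For the test-command approach mentioned above, a minimal bash sketch (classify is a hypothetical helper name). It checks -L first because -d and -f follow symbolic links:

```shell
# classify PATH: print what kind of filesystem object PATH is, using the
# same test operators as [ -f foo ].
classify() {
    if [ -L "$1" ]; then
        echo symlink          # checked first: -d and -f follow symlinks
    elif [ -d "$1" ]; then
        echo directory
    elif [ -f "$1" ]; then
        echo file             # a regular file
    elif [ -e "$1" ]; then
        echo other            # e.g. a FIFO, socket or device node
    else
        echo missing
    fi
}
```

Usage: classify /home/myuser/mydata/myfile/security. Note that this describes the filesystem object, not its contents; for content-based types, file is the better tool.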

This extracts the last component after a slash or a dot.
awk -F '[/.]' '{ print $NF }'

Related

How to change the current line in BASH using regex

My task is to rename all function names in C source code (in definitions, declarations and comments) that contain exactly one underscore and only lowercase letters. For example, my_func(void) -> myFunc(void). How should I do it? In my code I check whether a line of the file contains the name of a function that I have to change, but I have no idea how to do the change itself. Or maybe this task has a more efficient and better solution?
while IFS= read -r line; do
if [[ "$line" =~ ^(int|char|float|long|short|void|double)?[[:space:]]?[^_]([a-z0-9]+[_]?)+[(]([a-z[:space:]])*[)][[:space:]]*[{]? ]]; then
# here should be the code
fi
done < "${FILENAMES[i]}"
I understand that sed should be used here, like:
sed -i 's/_//' "${FILENAMES[i]}"
but this command changes every line of my file (it deletes the first underscore on each line), not only the names I want and have to change. Thanks.
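A minimal sketch of one way to do this with a single sed substitution instead of a read loop. It assumes GNU sed (\b and the \u case conversion in the replacement are GNU extensions) and assumes that a "function name" is a lowercase identifier with exactly one underscore that is followed, possibly after spaces, by (. It does not parse C, so matching names inside string literals are rewritten too, and a name mentioned in a comment without parentheses is left alone. snake_to_camel is a hypothetical name:

```shell
# snake_to_camel FILE...
# Edits each FILE in place (GNU sed), turning e.g. my_func(void) into
# myFunc(void). \b stops matches inside longer names such as
# my_big_func, because "_" counts as a word character.
snake_to_camel() {
    sed -E -i -e 's/\b([a-z]+)_([a-z])([a-z]*)([[:space:]]*)\(/\1\u\2\3\4(/g' -- "$@"
}
```

Usage: snake_to_camel "${FILENAMES[@]}". Only text matching the pattern changes, so the other lines are untouched; try it on a copy of the sources first.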

How to run list of perl regex from file in terminal

I'm fairly new to the whole coding game, and am very grateful for every answer!
I am working in a directory with many .txt files in it and have a file with a looong list of regex commands like "perl -p -i -e 's/\n\n/\n/g' *.xml". They all work if I copy them into the terminal. But is it possible to run them straight from the file?
I tried ./unicode.sh but that resulted in:
No such file or directory.
Any ideas?
Thank you so much!
Here's a (mostly) equivalent Perl script to the oneliner perl -p -i -e 's/\n\n/\n/g' *.xml (one main difference being that this has strict and warnings enabled, which is strongly recommended), which you could expand upon by putting more code to modify the current line in the body of the while loop.
#!/usr/bin/env perl
use warnings;
use strict;
if (!@ARGV) {               # if no files on command line
    @ARGV = glob('*.xml');  # get a default list of files
}
local $^I = '';             # enable inplace editing (like perl -i)
while (<>) {                # read each line of each file into $_
    s/\n\n/\n/g;            # modify $_ with a regex
    # more regexes here...
    print;                  # write the line $_ back out
}
You can save this script in a file such as process.pl, and then run it with perl process.pl, or do chmod u+x process.pl and then run it via ./process.pl.
On the other hand, you really shouldn't modify XML files with regular expressions, there are lots of Perl modules to do XML processing - I wrote about that some more here. Also, in the example you showed, s/\n\n/\n/g actually won't have any effect, since when reading files line-by-line, no string will contain two \n's (you can change how Perl reads files, but I don't see any mention of that in the question).
Edit: You've named the script in your example unicode.sh - if you're processing Unicode files, then Perl has very powerful features to help with that, although the code won't necessarily end up as nice and short as I've showed above. You'll have to tell us some more about what you're doing, and show some example input and output, to get suggestions about that. See also e.g. perlunitut.
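If you got "No such file or directory", check that unicode.sh really is in the current directory and that the interpreter named on its #! line exists (a script saved with Windows CRLF line endings can also trigger this error). If you had simply forgotten to make it executable, you would see "Permission denied" instead; fix that with chmod +x unicode.sh, assuming it's a script that you wrote.
Of course, the normal way to run multiple Perl commands is to put them in a file such as runme.pl, i.e., a Perl script.
That said, yes, everything will work from the terminal; you just need to be careful about the quoting and escaping that bash performs.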

Remove lines from a file which has a matching regex from another file [duplicate]

This question already has answers here:
How to remove the lines which appear on file B from another file A?
(12 answers)
Closed 7 years ago.
I have this shell script:
AVAIL_REMOVAL=$(grep -oPa '^.*(?=(\.com))' $HOME/dcheck/files/available.txt) | sed -i "/$AVAIL_REMOVAL/d" $HOME/dcheck/files/domains.txt
$HOME/dcheck/files/available.txt
unregistereddomain1.com available 15/12/28_14:05:27
unregistereddomain3.com available 15/12/28_14:05:28
$HOME/dcheck/files/domains.txt
unregistereddomain1
registereddomain2
unregistereddomain3
I want to remove unregistereddomain1 and unregistereddomain3 lines from domains.txt. How is it possible?
Also, is there a faster solution than grep? This benchmark showed that grep needed the most time to execute: Deleting lines from one file which are in another file
EDIT:
This works with one-line files, but not with multi-line ones:
sed -i "/$(grep -oPa '^.*(?=(\.com))' $HOME/dcheck/files/available.txt)/d" $HOME/dcheck/files/domains.txt
EDIT 2:
Copying this here as a backup. I needed this solution for a domain checker bash script: if the script terminates for some reason, then at the next restart this removes the already-checked lines from the input file:
grep -oPa --no-filename '^.*(?=(\.com))' "$AVAILABLE" "$REGISTERED" > "$GREPINPUT" \
&& awk 'FNR==NR { a[$0]; next } !($0 in a)' "$GREPINPUT" "$DOMAINS" > "$DOMAINSDIFF" \
&& cat "$DOMAINSDIFF" > "$DOMAINS" \
&& rm -f "$GREPINPUT" "$DOMAINSDIFF"
Most of the domain checker scripts here try to do this removal at the end of the script. But what happens when the script is terminated without a graceful shutdown? Then it will check every single line of the input file again, including the ones that were already checked. This solves that problem: with proper service management (docker-compose, systemd, supervisord), the script can run for years on input lists with millions of lines, until it has consumed the entire input file.
From man grep:
-f file
--file=file
Obtain patterns from file, one per line. The empty file contains
zero patterns, and therefore matches nothing. (-f is specified by POSIX.)
Regarding speed: depending on the regexp, performance may differ drastically. The one you use seems /suspicious/. Fixed-string, whole-line matches (grep -F -x) are almost always the fastest.
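Putting the -f suggestion together with fixed-string, whole-line matching gives a minimal bash sketch like this. remove_available is a hypothetical name, and it assumes the available.txt layout shown in the question, with the .com name in the first space-separated field:

```shell
# remove_available AVAILABLE DOMAINS
# Deletes from DOMAINS every line equal to a name listed in AVAILABLE.
remove_available() {
    local available=$1 domains=$2 tmp status
    tmp=$(mktemp) || return 1
    # awk turns "name.com available ..." into "name"; grep treats each of
    # those as a fixed string (-F) that must match a whole line (-x), and
    # -v keeps only the DOMAINS lines that match none of them.
    grep -v -x -F -f <(awk '{ sub(/\.com$/, "", $1); print $1 }' "$available") \
        "$domains" > "$tmp"
    status=$?
    # grep exits 1 when it selects no lines (every domain was removed);
    # only a status above 1 is a real error.
    if [ "$status" -le 1 ]; then
        cat "$tmp" > "$domains"   # overwrite in place, keeping permissions
    fi
    rm -f -- "$tmp"
    [ "$status" -le 1 ]
}
```

Usage: remove_available "$HOME/dcheck/files/available.txt" "$HOME/dcheck/files/domains.txt". Because the patterns are fixed strings, a dot in a domain name is not treated as a regex wildcard.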

bulk file renaming in bash, to remove name with spaces, leaving trailing digits

Can a bash/shell expert help me with this? Each time I use Adobe PDF to split a large PDF file (say its name is X.pdf) into separate pages, where each page becomes its own PDF file, it creates files with this pattern:
"X 1.pdf"
"X 2.pdf"
"X 3.pdf" etc...
The file name "X" above is the original file name, which can be anything. It then adds one space after the name, then the page number. Page numbers always start from 1 and go up to the number of pages. There is no option in Adobe PDF to change this.
I need to run a shell command to simply remove/strip out all the "X " part, and just leave the digits, like this
1.pdf
2.pdf
3.pdf
....
100.pdf ...etc..
I'm not good at pattern matching, so I'm not sure what regular expression I need.
I know I need something like
for i in *.pdf; do mv "$i" ........; done
And it is the ....... part I do not know how to do.
This only needs to run on Linux/Unix system.
Use sed:
for i in *.pdf; do mv -- "$i" "$(sed 's/.*[[:blank:]]//' <<< "$i")"; done
It's even simpler with the Perl-based rename utility (not the util-linux rename, which takes different arguments):
rename 's/.*\s//' *.pdf
You can remove everything up to (including) the last space in the variable with this:
${i##* }
That's "star space" after the double hash, meaning "anything followed by space". ${i#* } would remove up to the first space.
So run this to check:
for i in *.pdf; do echo mv -i -- "$i" "${i##* }" ; done
and remove the echo if it looks good. The -i suggested by Gordon Davisson will prompt you before overwriting, and -- signifies end of options, which prevents things from blowing up if you ever have filenames starting with -.
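A slightly more defensive version of the same loop, as a minimal bash sketch (strip_prefix_pdfs is a hypothetical name). It only touches names whose part after the last space is digits followed by .pdf, and it skips, rather than overwrites, a target that already exists:

```shell
# strip_prefix_pdfs DIR
# Renames "DIR/<anything> <digits>.pdf" to "DIR/<digits>.pdf".
strip_prefix_pdfs() {
    local dir=${1:-.} f base new
    for f in "$dir"/*\ *.pdf; do
        [ -e "$f" ] || continue                 # the glob matched nothing
        base=${f##*/}                           # file name without the directory
        new=${base##* }                         # text after the last space
        [[ $new =~ ^[0-9]+\.pdf$ ]] || continue # only "<digits>.pdf" targets
        if [ -e "$dir/$new" ]; then
            printf 'skipping %s: %s exists\n' "$base" "$new" >&2
            continue
        fi
        mv -- "$f" "$dir/$new"
    done
}
```

Usage: strip_prefix_pdfs /path/to/split/pages, or strip_prefix_pdfs with no argument for the current directory.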
If you just want to do bulk renaming of files (or directories) and don't mind using external tools, then here's mine: rnm
The command to do what you want would be:
rnm -rs '/.*\s//' *.pdf
.*\s selects the part up to and including the last whitespace character and replaces it with an empty string.
Note:
It doesn't overwrite any existing files (throws warning if it finds an existing file with the target name).
This operation is also failsafe: you can revert the changes made by the last rnm command with rnm -u.
Here's a list of documents for rnm.

using awk to match a column in log file and print the entire line

I'm trying to write a script that will analyse a log file.
I want to give the user the option to enter a pattern and then print any line that matches this pattern in a specific column (the fifth one).
The following works from the terminal:
awk ' $5=="acpid:" {print$0}' filename
OK, so above I'm trying to match "acpid:". This works fine, but in the script I want to allow multiple entries and search for all of them. The problem is that I'm messing up the variable in the script. This is what I have:
echo "enter any services you want details on, separated by spaces"
read -a details
for i in ${details[@]}
do
echo $i
awk '$5 == "${i}" {print $0}' ${FILE}
done
Again, if I directly put in a matching expression instead of the variable it works, so I guess my problem is here. Any tips would be great.
UPDATE
So I'm using the second option suggested (shown below) by @ghoti, as it matches my log file slightly better.
However, I'm not having any luck with multiple entries. I've added two lines, echo $i and echo "finish loop", to illustrate the results I'm getting; as placed, they should tell me which input the loop is currently on and when I'm leaving the loop.
read -a details
re=""
for i in "${details[@]}"; do
re="$re${re:+|}$i"
echo $i
echo "finish loop"
done
awk -v re="$re" '$5 ~ re' "$FILE"
When I give read an input of either "acpid" or "init" separately, a perfect result is matched. However, when the input is "acpid init", the output is:
acpid init
finish loop
What I'm seeing from this is that read is taking both words as one entry, and then awk searches but doesn't match them (as would be expected). So why is the input not being taken as two separate entries? I had thought the -a option of read placed words separated by a space into separate elements of the array. Perhaps I have not declared the array correctly?
Update to the update:
OK, cancel the above update. Like a fool, I'd forgotten that I'd changed IFS to \n earlier in the script. I changed it back and bingo!!!
Many thanks again to @ghoti for his help!!
There are a few ways that you could do what you want.
One option might be to run through a for loop for each word, then apply a different call to awk, and show the results sequentially. For example, if you entered foo bar into the $details variable, you might get a list of foo matches, followed by a list of bar matches:
read -a details
for i in "${details[@]}"; do
awk -v s="$i" '$5 == s' "$FILE"
done
The idea here is that we use awk's -v option to get each word into the script, rather than expanding the variable inside the quoted script. You should read about how bash deals with different kinds of quotes. (There are also a few Stack Overflow questions on the topic, here and here and elsewhere.)
Another option might be to construct a regular expression that searches for all the words you're interested in, all at once. This has the benefit of using a single run of awk to search through $FILE:
read -a details
re=""
for i in "${details[@]}"; do
re="$re${re:+|}$i"
done
awk -v re="$re" '$5 ~ re' "$FILE"
The result will contain all the interesting lines from $FILE in the order in which they appear in $FILE, rather than ordered by the words you provided.
Note that this is a fairly rudimentary search, without word boundaries, so if you search for "foo bar babar", you may get results you don't want. You can play with the regex yourself, though. :)
Does that answer your question?
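If exact matches are what you want (as in the original $5 == "acpid:" test), one alternative to the regex is to load the words into an awk array and test membership with in. A minimal bash sketch; match_field5 is a hypothetical name, and it sets IFS locally because "$*" joins the words with the first character of IFS, which matters if IFS was changed earlier in the script, as happened in the question:

```shell
# match_field5 FILE WORD...
# Prints the lines of FILE whose fifth field is exactly one of the WORDs.
match_field5() {
    local file=$1 IFS=' '
    shift
    # "$*" joins the words with a space; awk's split() turns them back
    # into array keys, and "$5 in want" keeps only exact matches.
    awk -v list="$*" '
        BEGIN { n = split(list, w, " "); for (i = 1; i <= n; i++) want[w[i]] }
        $5 in want
    ' "$file"
}
```

Usage: read -a details; match_field5 "$FILE" "${details[@]}". Lines come out in file order, and "bar" no longer matches a field like "babar:".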