This question already has answers here:
Brace expansion with a Bash variable - {0..$foo}
(5 answers)
Closed 3 years ago.
If I run this command in a shell with static values (i.e., 27..97), it works fine and gives the desired output:
>ls -d $WORKDIR/batch/somefilename_{27..97}.* 2>/dev/null
somefilename_27.sometxt
somefilename_28.sometxt
somefilename_29.sometxt
..
somefilename_97.sometxt
But if I want to include variables or pass arguments to the regular expression, then I get the error:
ls: cannot access /home/work/batch/somefilename_{27..96}.*: No such file or directory
But that's not true: the files are present, yet somehow the regex does not work with variables.
>segStart="27"
>segEnd="96"
>myvar="$segStart..$segEnd"
>echo $segStart
27
>echo $segEnd
96
>echo $myvar
27..96
>ls -d $WORKDIR/batch/somefilename_{$myvar}.*
ls: cannot access /home/work/batch/somefilename_{27..96}.*: No such file or directory
>array=($(ls -d $WORKDIR/batch/somefilename_$myvar.* 2>/dev/null))
>len=${#array[*]}
>echo $len
0
Note: I am trying to store in an array the directory names where the number in the name falls between two integers: e.g., there are 100 directories available, named file_1.some, file_2.some, file_3.some, … file_100.some.
If a user wants to get the directories from 47 through 97, then I want to read those numbers, store them, and pass them to the above ls command.
Any other alternative would also help.
Brace expansion and globbing are very different from regular expressions.
Brace expansions happen at defined places in the shell grammar (not in quotes or after parameter expansion; see man bash).
Please don't use ls to get a list of files into an array; the following should work just fine:
arr=(path{1..10}*)
Or use arr=(path*) and then filter out the entries you don't want by testing them with [[, as in the sketch below.
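For the variable bounds in the question, a minimal sketch in that spirit (variable and path names are taken from the question; the regex filter is my addition) globs everything once and keeps only the numbers in range:
arr=()
for f in "$WORKDIR"/batch/somefilename_*.*; do
    [[ $f =~ _([0-9]+)\. ]] || continue    # extract the number from the name
    n=${BASH_REMATCH[1]}
    if (( n >= segStart && n <= segEnd )); then
        arr+=("$f")                        # keep files inside the range
    fi
done
echo "${#arr[@]}"                          # how many files matched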
You can use the seq command to generate the numbers. Note that a plain $(seq ...) embedded in the middle of one word gets word-split, which breaks the filename apart, so loop over its output instead:
for i in $(seq "$segStart" "$segEnd"); do ls -d "$WORKDIR/batch/somefilename_${i}".* 2>/dev/null; done
If you start from 1, for example, and want leading zeros (01), you can use the formatted output of seq like this:
seq -f "%02.0f" $segStart $segEnd
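Putting that together with the array from the question (a sketch; shopt -s nullglob is my addition, so patterns that match no files disappear instead of being kept literally):
shopt -s nullglob
array=()
for i in $(seq -f "%02.0f" "$segStart" "$segEnd"); do
    array+=("$WORKDIR"/batch/somefilename_"$i".*)    # the glob expands here
done
shopt -u nullglob
echo "${#array[@]}"    # number of matching files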
There is a string located within a file that starts with 4bceb and is 32 characters long.
To find it, I tried the following.
Input:
find / -type f 2>/dev/null | xargs grep "4bceb\w{27}" 2>/dev/null
After entering the command, it seems as if it is awaiting some additional input.
Your command seems all right in principle, i.e. it should correctly execute the grep command for each file find returns. However, I don't believe your regular expression (or rather the way you call grep) is correct for what you want to achieve.
First, in order to get your expression to work, you need to tell grep that you are using Perl syntax by specifying the -P flag.
Second, your regexp will return the full lines that contain sequences starting with "4bceb" that are at least 32 characters long, but may be longer as well. If, for example, your ./test.txt file contents were
4bcebUUUUUUUUUUUUUUUUUUUUUUUU31
4bcebVVVVVVVVVVVVVVVVVVVVVVVVV32
4bcebWWWWWWWWWWWWWWWWWWWWWWWWWW33
sometext4bcebYYYYYYYYYYYYYYYYYYYYYYYYY32somemoretext
othertext 4bcebZZZZZZZZZZZZZZZZZZZZZZZZZ32 evenmoretext
your output would include all lines except the first one (in which the sequence is shorter than 32 characters). If you actually want to limit your results to lines that just contain sequences that are exactly 32 characters long, you can use the -w flag (for word-regexp) with grep, which would only return lines 2 and 5 in the above example.
Third, if you only want the match but not the surrounding text, the grep flag -o will do exactly this.
And finally, you don't need to pipe the find output into xargs, as grep can directly do what you want:
grep -rnPow / -e "4bceb\w{27}"
will recursively (-r) scan all files starting from / and return just the ones that contain matching words, along with the matches (as well as the line numbers they were found in, as result of the flag -n):
./test.txt:2:4bcebVVVVVVVVVVVVVVVVVVVVVVVVV32
./test.txt:5:4bcebZZZZZZZZZZZZZZZZZZZZZZZZZ32
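To reproduce this, you can recreate the example file and run the command (a sketch; -P requires GNU grep, and test.txt is the sample from above):
printf '%s\n' \
    '4bcebUUUUUUUUUUUUUUUUUUUUUUUU31' \
    '4bcebVVVVVVVVVVVVVVVVVVVVVVVVV32' \
    '4bcebWWWWWWWWWWWWWWWWWWWWWWWWWW33' \
    'sometext4bcebYYYYYYYYYYYYYYYYYYYYYYYYY32somemoretext' \
    'othertext 4bcebZZZZZZZZZZZZZZZZZZZZZZZZZ32 evenmoretext' > test.txt
grep -nPow '4bceb\w{27}' test.txt    # prints only the 32-character words on lines 2 and 5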
I am trying to use csplit in Bash to split a file, using years in the 1500s-1600s as delimiters.
When I do the command
csplit Shakespeare.txt '/1[56]../' '{36}'
it almost works, except for at least two issues:
This outputs 38 files, not 36, numbered xx00 through xx37. (Also xx00 is completely blank.) I don't understand how this is possible.
One of the files (which, it seems, is why csplit returns 37 non-empty files instead of the 36 non-empty files I expected) doesn't begin with 15XX or 16XX -- it begins with "ACT 4 SCENE 15\n" (where \n is supposed to denote a newline or line break). I don't understand how csplit can match a new line/line break with a number.
When I do the command (which is what I want)
csplit Shakespeare.txt '/1[56][0-9][0-9]/' '{36}'
the terminal returns the error csplit: 1[56][0-9][0-9]: no match, plus the same list of numbers (the byte counts of the pieces) that it prints when the above command is executed.
This especially doesn't make sense to me, since grep says otherwise:
grep -c "1[56][0-9][0-9]" Shakespeare.txt
36
grep -c "1[56].." Shakespeare.txt
36
Note: man csplit indicates that I have the BSD version from January 26, 2005. man grep indicates that I have the BSD version from July 28, 2010.
Based on the answer given here by user 'DRL' on 06-20-2008, I decided to try adding the -k option to csplit.
csplit -k Shakespeare.txt '/^1[56][0-9][0-9]/' '{36}'
This returned an error: csplit: ^1[56][0-9][0-9]: no match
However, it still gave (more or less) the desired output: files xx00.txt through xx36.txt (not xx37.txt), and each of the non-empty files, xx01.txt-xx36.txt, had the expected/desired content. (In particular, no file began with "ACT 4 SCENE 15".)
The man page for csplit says the following about the -k flag:
-k Do not remove output files if an error occurs or a HUP, INT or TERM signal is received.
Honestly, I don't quite understand what this means, but I still have the following conjecture about why this solution worked:
Conjecture: csplit expects the beginning of the file to match the regex. Thus, since the beginning line of the file did not match ^1[56][0-9][0-9], it threw a tantrum and quit without the -k flag.
Nevertheless, I still don't understand why 1[56][0-9][0-9] did not work; maybe it failed for the same reason. And I definitely don't understand why 1[56].. did not work (i.e., why csplit produced a 37th file not beginning with the pattern).
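For what it's worth, a hedged explanation consistent with the POSIX description of csplit: the pattern operand itself performs one split, and '{36}' repeats it 36 more times, so '/re/' '{36}' needs 37 matches, one more than the 36 that grep counts. When the 37th repetition finds no match, csplit reports the error and, without -k, removes all of its output files; with -k the pieces already written survive. If that reading is right, lowering the repeat count by one should exit cleanly:
csplit Shakespeare.txt '/1[56][0-9][0-9]/' '{35}'
# one split for the pattern itself + 35 repeats = 36 splits, matching the 36 occurrences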
This seems like a relatively basic question, but I can't find it anywhere after an hour of searching. Many (there are a lot!) of the similar questions do not seem to hit the point.
I am writing a script ("vims") to use vim in a sed-like mode (so I can call normal vim commands on a stream input without actually opening vim), so I need to pass each argument to vim with a "-c" flag prepended to it. There are also many characters that need to be escaped (I need to pass regex expressions), so some of the usual methods on SO do not work.
Basically, when I write:
cat myfile.txt | vims ':%g/foo/exe "norm yyPImyfile: \<esc>\$dF,"' ':3p'
which are two vim command-line arguments to run on the piped input,
I need these two single-quoted arguments to be passed exactly the way they are to my function vims(), which then tags each of them with a -c flag, so they are interpreted as commands in vim.
Here's what I've tried so far:
vims() {
    vim - -nes -u NONE -c '$1' -c ':q!' | tail -n +2
}
This seems to work perfectly for a single command. No characters get escaped, and the "-c" flag is there.
Then, using the oft-duplicated question-answer, the "$@" trick, I tried:
vims() {
    vim - -nes -u NONE $(for arg in "$@"; do echo -n " -c $arg "; done) -c ':q!' | tail -n +2
}
This seems to break the spaces within each string I pass it, so does not work. I also tried a few variations of the printf command, as suggested in other questions, but this has weird interactions with the vim command sequences. I've tried many other different backslash-quote-combinations in a perpetual edit-test loop, but have always found a quirk in my method.
What is the command sequence I am missing?
Add all the arguments to an array one at a time, then pass the entire array to vim with proper quoting to ensure whitespace is correctly preserved.
vims() {
    local args=()
    # Prepend -c to each argument, keeping each argument as a single word
    while (($# > 0)); do
        args+=(-c "$1")
        shift
    done
    vim - -nes -u NONE "${args[@]}" -c ':q!' | tail -n +2
}
As a rule of thumb, if you find yourself trying to escape things, add backslashes, use printf, etc., you are likely going down the wrong path. Careful use of quoting and arrays will cover most scenarios.
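To see the whitespace preservation at work, here is a tiny sketch (showargs is a hypothetical stand-in that prints the words vim would receive):
showargs() {
    local args=()
    while (($# > 0)); do args+=(-c "$1"); shift; done
    printf '<%s> ' "${args[@]}"; echo
}
showargs ':%g/foo/exe "norm yyP"' ':3p'
# <-c> <:%g/foo/exe "norm yyP"> <-c> <:3p>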
This question already has answers here:
Extract filename and extension in Bash
(38 answers)
Closed 8 years ago.
I have a script that is pushing out some filesystem data to be uploaded to another system.
It would be very handy if I could tell myself what 'kind' of file each file actually is, because it will help with some querying later on down the road.
So, for example, say that my script is spitting out the following:
/home/myuser/mydata/myfile/data.log
/home/myuser/mydata/myfile/myfile.gz
/home/myuser/mydata/myfile/mod.conf
/home/myuser/mydata/myfile/security
/home/myuser/mydata/myfile/last
In the end, I'd like to see:
/home/myuser/mydata/myfile/data.log log
/home/myuser/mydata/myfile/myfile.gz gz
/home/myuser/mydata/myfile/mod.conf conf
/home/myuser/mydata/myfile/security security
/home/myuser/mydata/myfile/last last
There has got to be a way to do this with regular expressions and sed, but I can't figure it out.
Any suggestions?
EDIT:
I need to get this info via the command line. Looking at the answers so far, I obviously have not made this clear. So, with the example data I provided, assume that the data is all being fed via greps and seds (the data is already sterilized). I need to be able to pipe the example data to sed/grep/awk/whatever in order to produce the desired results.
Print the last field, where fields are separated by any non-alphabetic character:
awk -F '[^[:alpha:]]' '{ print $0,$NF }'
/home/myuser/mydata/myfile/data.log log
/home/myuser/mydata/myfile/myfile.gz gz
/home/myuser/mydata/myfile/mod.conf conf
/home/myuser/mydata/myfile/security security
/home/myuser/mydata/myfile/last last
This should work for you:
x='/home/myuser/mydata/myfile/security'
( IFS='/.' && arr=( $x ) && echo ${arr[@]:(-1):1} )
security
x='/home/myuser/mydata/myfile/data.log'
( IFS='/.' && arr=( $x ) && echo ${arr[@]:(-1):1} )
log
To extract the last element in a filename path:
filename=${path##*/}
To extract characters after a dot in a filename:
extension=${filename##*.}
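Note that for names without a dot (security, last), the *. pattern fails to match, so the expansion returns the whole name unchanged, which happens to be exactly the output asked for:
filename=security
echo "${filename##*.}"    # security (no dot, so the name comes back whole)
filename=data.log
echo "${filename##*.}"    # log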
But (my comment) rather than looking at the extension, it might be better to use file. See man file.
As others have already answered, to parse the file names:
extension="${full_file_name##*.}" # BASH and Kornshell/POSIX only
filename=$(basename "$full_file_name")
dirname=$(dirname "$full_file_name")
Quotes are needed if file names could have spaces, tabs, or other strange characters in them.
You can also test whether a file is a directory, a file, or a link with the test command (which is linked to [, so that test -f foo is the same as [ -f foo ]).
However, you said: "it would be very handy if i could tell myself what kind of file each file actually is".
In that case, you may want to investigate the file command. This command returns the file type as determined by a magic file (traditionally in /etc/magic), though newer implementations can use the user's own scheme. It can tell the file type by extension, by the magic number in the file's header, or by looking at the first few lines of the file (for instance, looking for the regular expression ^#! .*/bash$ in the first line).
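For example, to build the "path plus kind" listing with file instead (a sketch; filelist.txt is a hypothetical stand-in for whatever feeds your pipeline, and -b suppresses the leading filename in file's output):
while IFS= read -r path; do
    printf '%s %s\n' "$path" "$(file -b "$path")"
done < filelist.txt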
This extracts the last component after a slash or a dot.
awk -F '[/.]' '{ print $NF }'
I'm writing a program, foo, in C++. It's typically invoked on the command line like this:
foo *.txt
My main() receives the arguments in the normal way. On many systems, argv[1] is literally *.txt, and I have to call system routines to do the wildcard expansion. On Unix systems, however, the shell expands the wildcard before invoking my program, and all of the matching filenames will be in argv.
Suppose I wanted to add a switch to foo that causes it to recurse into subdirectories.
foo -a *.txt
would process all text files in the current directory and all of its subdirectories.
I don't see how this is done, since, by the time my program gets a chance to see the -a, the shell has already done the expansion and the user's *.txt input is lost. Yet there are common Unix programs that work this way. How do they do it?
In Unix land, how can I control the wildcard expansion?
(Recursing through subdirectories is just one example. Ideally, I'm trying to understand the general solution to controlling the wildcard expansion.)
Your program has no influence over the shell's command-line expansion. Which program will be called is determined only after all the expansion is done, so it's already too late to change anything about the expansion programmatically.
The user calling your program, on the other hand, can construct whatever command line they like. Shells allow you to easily prevent wildcard expansion, usually by putting the argument in single quotes:
program -a '*.txt'
If your program is called like that it will receive two parameters -a and *.txt.
On Unix, you should just leave it to the user to manually prevent wildcard expansion if it is not desired.
As the other answers said, the shell does the wildcard expansion - and you stop it from doing so by enclosing arguments in quotes.
Note that the options -R and -r are usually used to indicate recursion - see cp, ls, etc. for examples.
Assuming you organize things appropriately so that wildcards are passed to your program as wildcards and you want to do recursion, then POSIX provides routines to help:
nftw - file tree walk (recursive access).
fnmatch, glob, wordexp - to do filename matching and expansion
There is also ftw, which is very similar to nftw but it is marked 'obsolescent' so new code should not use it.
Adrian asked:
But I can say ls -R *.txt without single quotes and get a recursive listing. How does that work?
To adapt the question to a convenient location on my computer, let's review:
$ ls -F | grep '^m'
makefile
mapmain.pl
minimac.group
minimac.passwd
minimac_13.terminal
mkmax.sql.bz2
mte/
$ ls -R1 m*
makefile
mapmain.pl
minimac.group
minimac.passwd
minimac_13.terminal
mkmax.sql.bz2
mte:
multithread.ec
multithread.ec.original
multithread2.ec
$
So, I have a sub-directory 'mte' that contains three files. And I have six files with names that start 'm'.
When I type 'ls -R1 m*', the shell notes the metacharacter '*' and uses its equivalent of glob() or wordexp() to expand that into the list of names:
makefile
mapmain.pl
minimac.group
minimac.passwd
minimac_13.terminal
mkmax.sql.bz2
mte
Then the shell arranges to run '/bin/ls' with 9 arguments (the program name, the option -R1, and the 7 file names), plus a terminating null pointer.
The ls command notes the options (recursive and single-column output), and gets to work.
The first 6 names (as it happens) are simple files, so there is nothing recursive to do.
The last name is a directory, so ls prints its name and its contents, invoking its equivalent of nftw() to do the job.
At this point, it is done.
This uncontrived example doesn't show what happens when there are multiple directories, and so the description above over-simplifies the processing.
Specifically, ls processes the non-directory names first, and then processes the directory names in alphabetic order (by default), and does a depth-first scan of each directory.
Part of the shell's job (on Unix) is to expand command-line wildcard arguments. You prevent this with quotes:
foo -a '*.txt'
Also, on Unix systems, the "find" command does what you want:
find . -name '*.txt'
will list all files recursively from the current directory down.
Thus, you could do
foo `find . -name '*.txt'`
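Note that the backtick substitution word-splits its output, so this breaks on filenames containing spaces; find can also invoke the program directly:
find . -name '*.txt' -exec foo {} +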
I wanted to point out another way to turn off wildcard expansion. You can tell your shell to stop expanding wildcards with the noglob option.
With bash use set -o noglob:
> touch a b c
> echo *
a b c
> set -o noglob
> echo *
*
And with csh, use set noglob:
> echo *
a b c
> set noglob
> echo *
*
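To turn expansion back on afterwards, use set +o noglob in bash, or unset noglob in csh:
> set +o noglob
> echo *
a b c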