List only files but not directories using list.files - regex

How can I list only files, but not directories using list.files (not recursively)? It has an include.dirs argument, but this is ignored when not being used recursively.
I had been thinking something like
list.files(path=myDir, pattern="[^/]$")
but that doesn't seem to work, nor do a few variations on it. Is there a regex that I can plug in here, or another function? I know I can do list.dirs and take a setdiff, but this is already slow enough; I want this to be quicker.
PS: currently on linux, but need something that works cross-platform.
PPS: file.info is really slow, so I think that is also not going to work.
PPPS: It doesn't need to be list.files, that is just the function I had thought should do it.

Consider a regex pattern that matches file names made of letters or digits followed by a dot extension (this leaves out subdirectories, but unfortunately also files without extensions):
# ANCHORED AT THE END OF THE NAME
files <- list.files(myDir, pattern = "[a-zA-Z0-9]*[.][a-zA-Z0-9]*$")
# LETTERS AND/OR DIGITS FOLLOWED BY A DOT
files <- list.files(myDir, pattern = "[a-zA-Z0-9]*[.]")
# ANY CHARACTERS FOLLOWED BY A DOT (a regex needs ".*", not a bare "*")
files <- list.files(myDir, pattern = ".*[.]")
Some other regex variations to catch files with periods (note that these also match directories with periods and miss files with no extension):
list.files(pattern="\\..+$")
list.files(pattern="\\.[[:alnum:]]+$")
Using system2 with ls also seems to work pretty well on Unix-like systems (thanks to #42- in the comments):
system2("ls", args=c("-al", "|", "grep", "^-"))
should get only regular files (including ones without extensions), or
system2("ls", args=c("--classify"))
should return files with directories having a "/" appended so they can be determined.
As an alternative open-source solution, consider Python, which lets you test whether each item is a regular file; building the path with os.path.join() keeps it independent of the OS platform.
import os
files = [f for f in os.listdir(myDir) if os.path.isfile(os.path.join(myDir, f))]
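Since speed was the main concern in the question, here is a minimal sketch of the same idea using os.scandir, which can usually tell files from directories using information gathered during the directory scan instead of a separate stat call per entry. The function name list_regular_files is my own, not something from the question.

```python
import os


def list_regular_files(directory):
    """Return the sorted names of regular files directly inside `directory`."""
    # os.scandir yields DirEntry objects. entry.is_file() follows symlinks by
    # default, so a symlink pointing at a file is counted as a file.
    with os.scandir(directory) as entries:
        return sorted(entry.name for entry in entries if entry.is_file())
```

Calling list_regular_files(myDir) keeps files without an extension and drops directories whose names contain a dot, the two cases the regex approaches get wrong.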

Related

Do not include certain source files

I have a folder containing all the log files; the filenames are colour-red, colour-green, colour-blue, colour-yellow, etc. I am writing the SPL to include all the files except one, e.g. colour-white.
I know that * performs a wildcard search and that [^c] excludes the specific character in the bracket. But I don't know how to combine them to exclude a certain word. On the other hand, I am not sure the same regex rules apply in Splunk.
source= "log/colour-*"
source= "log/colour-[^w]"
The desired result of the query is to retrieve all the files except colour-white.
Maybe some filters can be applied to retrieve the desired result, but so far the filters I know are for the file contents, not the file names.
You can also use something like this in your search query,
source!="log/colour-white"
You can also check the difference between != and NOT on Splunk Answers to get a clearer idea of which one to use.
The search command (the implicit command before the first |) does not support regex. To exclude something, use NOT.
(source = "log/colour-*" NOT source = "log/colour-w*")
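To see how the two suggested exclusions differ, here is a small sketch that mimics the include/exclude logic in Python. It uses fnmatch only as a stand-in for Splunk's wildcard matching, and the file colour-wheat is a hypothetical name added to show the difference.

```python
from fnmatch import fnmatchcase

# Hypothetical source names; colour-wheat is invented for illustration.
sources = [
    "log/colour-red",
    "log/colour-green",
    "log/colour-white",
    "log/colour-wheat",
]


def select_sources(names, include, exclude):
    """Keep names matching `include` that do not match `exclude`."""
    return [
        name
        for name in names
        if fnmatchcase(name, include) and not fnmatchcase(name, exclude)
    ]


# Excluding the wildcard "colour-w*" also drops colour-wheat.
broad = select_sources(sources, "log/colour-*", "log/colour-w*")
# Excluding the exact name keeps every other file that starts with "w".
exact = select_sources(sources, "log/colour-*", "log/colour-white")
```

If only colour-white should be dropped, excluding the exact name is the safer of the two.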

give sudo permission to log files on different paths like /a/b1/c.log and /a/b2/d.log etc. files

I need a nice column for the Centrify tool which includes all the log files under the different folders, for example:
/oradata1/oracle/admin/A/scripts/rman_logs/*.log
/oracle/oracle/admin/B/scripts/rman_logs/*.log
/oradata2/admin/C/scripts/logs/*.log
I used this, but with the * character the user can see all logs:
/ora(data(1|2)|cle)/oracle|admin/admin/*/scripts/rman_logs
Which expression should I use?
If I understand your question correctly, you want only .log files. You can use a positive lookahead to assert that it is indeed a log file (.log at the end of the filename) and then match the filename, whatever it is (.*).
Then it's really easy. (?=.*\.log(?:$|\s)).* Of course, you can also add specific folders if you wish to restrict the matches, but the positive lookahead will still do its work. I.e. (?=.*\.log(?:$|\s)).*/scripts/.*
EDIT: As per your comment, you only need those folders, so you just specify their paths as alternations and add [^\s.\/]*\.log at the end. So:
(?:\/oradata1\/oracle\/admin\/A\/scripts\/rman_logs\/|\/oracle\/oracle\/admin\/B\/scripts\/rman_logs\/|\/oradata2\/admin\/C\/scripts\/logs\/)[^\s.\/]*\.log
You may shorten the regex by combining path elements, but, imo, that is not necessary; you might as well specify each path individually if they don't overlap too much.
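As a quick sanity check of the alternation regex, here is a sketch that runs it through Python's re module. Whether Centrify accepts this exact regex dialect is an assumption on my part, the file names are hypothetical, and using fullmatch to anchor the whole path is my choice (the regex as written is unanchored).

```python
import re

# Same alternation as above; "/" needs no escaping in Python.
LOG_PATH = re.compile(
    r"(?:/oradata1/oracle/admin/A/scripts/rman_logs/"
    r"|/oracle/oracle/admin/B/scripts/rman_logs/"
    r"|/oradata2/admin/C/scripts/logs/)"
    r"[^\s./]*\.log"
)


def is_allowed_log(path):
    """True if the whole path is a .log file directly inside a listed folder."""
    return LOG_PATH.fullmatch(path) is not None
```

Note that [^\s./]* rejects file names containing an extra dot, so a hypothetical rman.2020.log would not match; files in subfolders and files from other directories are rejected as well.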
I have found a global expression.
This is not a good way, but it works and saves me a lot of work. The main files are under ....../scripts/rman_logs/ on all servers, so I use this approach.
I can produce these lines and put them in a command group for users, so this works well:
tail /////scripts/rman_logs/*.log
tail ////scripts/rman_logs/.log
Thanks for your help.

How to search files in windows file explorer with specified extension name?

We can search for files in Windows 7 or higher using the following tool:
(I don't have image uploading privileges. I mean the search box in the top-right area of Windows File Explorer.)
When I search for MATLAB files using "*.m", it not only returns *.m files, but also returns *.mp3, *.mp4 files. Is there any way to show *.m files exclusively?
Thanks!
I assume you used the quotation marks here to show the text you typed, because ironically the exact way how it should work is to put the search in quotation marks...
so
*.m
finds .mp3 as well as .m but
"*.m"
should only find the .m files. Alternatively you could also write
ext:".m"
which would guarantee that only extensions are searched. (Although I am not sure if this is ever necessary here, because while windows can have a dot in the filename and also can have files without extensions I am not sure if it is possible to have both at the same time.)
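Outside Explorer, the same distinction (exact extension versus "starts with .m") can be sketched in a few lines of Python. The helper has_extension and the file names are made up for illustration.

```python
from pathlib import PurePath


def has_extension(filename, extension):
    """True if the final extension of `filename` is exactly `extension`."""
    # PurePath.suffix is the last dot-component, e.g. ".mp3" for "song.mp3".
    return PurePath(filename).suffix.lower() == extension.lower()


# Hypothetical file names.
names = ["plot.m", "song.mp3", "clip.mp4", "README", "archive.tar.m"]
matlab_files = [name for name in names if has_extension(name, ".m")]
```

Comparing the whole suffix rather than checking a prefix is what keeps .mp3 and .mp4 out of the result.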
Using the following
"*.m"
will solve your problem. You can find more information on the search syntax on MSDN under "Advanced query syntax".
Beyond that, you can also take advantage of the wildcard character *.
For example, if you want to search for a file with a name ending with 024 or starting with 024, you can type *024.* or 024*.* respectively in the search box.
Here the * after the . stands for any extension; if you want a particular one, spell out the extension, like 024.png.
Explorer doesn't have a function for searching with regex.
You need to use PowerShell instead of Windows Explorer.
For example (where '(?i)Out' is a regex):
Get-ChildItem -Path e:\temp -Recurse -File | Where-Object { $_.Name -match '(?i)Out' }
Alternatively, you can simply search for your extension like this:
.extension
eg:
typing .exe will give you all the files with .exe extensions in a folder.
PS: Typing .xml OR .vmcx will give you both type of files. It is useful if you seek to make an archive of different kinds of files stored in different folders or locations.
You can get close to proper regex support from the mostly awesome Cygwin, and as a bonus you get almost every Linux tool running natively on Windows. But it still doesn't know that .* means "zero or more of anything", ^ means the start of a line (and $ the end), so some things are still weird.
And a startlingly large bunch of weird corner cases that only deranged perl programmers notice fail the test.
So many other things it gets wrong, but it's more workable than anything in any windows OS, plus you get perl, grep, diff, wget, curl, etc. -- the whole GNU lib for free.
If you want a full-on bash shell with proper respect for regex, install the super neat-o Bash for Windows 10.
Either will do what you want. And they're a billion times faster than that stupid search bar that takes off at 100 mph then crawls to 1 pixel per 10 minutes near the end.

pattern matching a filename in R

This is probably real simple, but I can't seem to figure out how to do it.
I have an application in R (Shiny) where a user uploads to the application a *.zip file that contains all the components of an ESRI shapefile. I unpack these files into their own directory. This folder then, may or may not, contain a *.shp.xml file. At some point in my R code, I need to find the exact name of the *.shp file that has been unpacked, and distinguish it from the *.shp.xml file. How do I write the expression that will do that? I was thinking to use list.files, but I am unsure how to write the rest of the expression.
thanks!
With R regex patterns the "$" has a special meaning as the end of a character element (and the 'dots' need to be escaped with \\), so
shpfils <- list.files(path, pattern="\\.shp$")
This should isolate your file -
Sys.glob("*shp")
as compared to
Sys.glob("*shp*")
which should give both the files
or
Sys.glob("*shp.xml")
which should give the .shp.xml file
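To make the difference between the anchored regex and the globs concrete, here is a sketch in Python using the same patterns. The file names are hypothetical shapefile components.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical contents of the unpacked directory.
names = ["parcels.dbf", "parcels.shp", "parcels.shp.xml", "parcels.shx"]

# Anchored regex: a literal ".shp" at the very end of the name.
SHP_ONLY = re.compile(r"\.shp$")
shp_by_regex = [name for name in names if SHP_ONLY.search(name)]

# Glob "*shp": the name must end in "shp", so .shp.xml is excluded.
shp_by_glob = [name for name in names if fnmatchcase(name, "*shp")]

# Glob "*shp*": "shp" may appear anywhere, so both files match.
shp_anywhere = [name for name in names if fnmatchcase(name, "*shp*")]
```

One difference to keep in mind: the glob "*shp" has no dot, so it would also match a name that merely ends in the letters "shp", whereas the regex requires the dot.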

white space free path to My Documents

In building a C++ project with the GNU tool chain, make tells me ~
src/Adapter_FS5HyDE.d:1: *** multiple target patterns. Stop.
Search, search, search, and I found out that make thinks that it has multiple targets because the path to my included headers has spaces in it. If you've got your headers stored in some sane place like C:\Program Files then you can take care of this by using the old DOS paths (e.g. C:\PROGRA~1). However, when you have your headers in a truly insane place like My Documents you can't get around the problem with MY DOC~1 because there's still a space.
Any idea how to tell my compiler to look in My Documents for headers without make confusing the path as two objects?
(Note: Feel free to throw tomatoes at me for putting header files in My Documents if you'd like, but there is a little rationale for doing that which I don't feel like explaining. If the solution to this question is easy, I'd rather keep it the way it is.)
You can figure out what the old path is by doing a DIR /X in your command prompt.
Or, most of the time you can fake it with the first 6 characters - spaces + ~1 + extension (8.3 paths won't have spaces).
Or, you can use quotes: "C:\Documents and Settings\Administrator\My Documents".
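The "first 6 characters minus spaces, plus ~1, plus extension" rule of thumb can be written out as a tiny function. This is only a sketch of that heuristic for long names, not of how Windows really generates 8.3 names: the numeric tail is not always ~1, and short-name generation can be disabled, so DIR /X remains the authoritative answer. The function name is mine.

```python
def guess_short_name(name):
    """Guess the 8.3 short form of a long file or folder name (heuristic only)."""
    base, dot, ext = name.rpartition(".")
    if not dot:  # no extension: rpartition leaves the whole name in `ext`
        base, ext = name, ""
    # Drop spaces, upper-case, keep 6 characters of the base and 3 of the extension.
    base = base.replace(" ", "").upper()
    ext = ext.replace(" ", "").upper()[:3]
    short = base[:6] + "~1"
    return short + "." + ext if ext else short
```

For example, guess_short_name("My Documents") gives MYDOCU~1 and guess_short_name("Program Files") gives PROGRA~1.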
I don't know about make specifically, but the normal way around this is to put quotes around the path, i.e.
cd "C:\Program Files\"
does that work?
Side note: the short name (8.3) for the same folder might not be the same on different OS installations. Thus, you can't be sure that C:\Program Files will always be C:\PROGRA~1.
Short names can't contain spaces in them either, so the usual short name for My Documents is MYDOCU~1, not MY DOC~1.
You can find the exact short name for any folder or file (including My Documents) using dir /x <filename>.
If you are using the GNU toolchain from Windows command line (cmd.exe), you should be able to use quotes (") around the folder/file names to work around this problem.
For some folders, including My Documents, you can specify an alternative location. To do this, right-click the folder, select Properties, select Location tab, and away you go. I use this to put my downloads and music on another drive (D:).
Write a wrapper script (e.g. batchfile) to translate the path names to short form.
I have a script "runwin" that does stuff like this - instead of, e.g. gcc <args> I can call runwin gcc <args>;
runwin will make heuristic guesses as to which arguments are filename paths and translate them, then call gcc on the resulting string of arguments.