How to break from groovy 'eachFileMatch()' - regex

I have a working script that lists all pdf files in a directory. It's working as desired but all I need is actually the file name of the first pdf file. Then I want to break the eachFileMatch() as there could be thousands of pdf files in the directory.
I tried to use find, as suggested in the answer to "Break from groovy each closure", by calling eachFileMatch().find, but it didn't work:
Caught: groovy.lang.MissingMethodException: No signature of method: java.io.File.eachFileMatch() is applicable for argument types: (java.util.regex.Pattern) values: [.*.(?i)pdf]
def directory="c:\\tmp" // place 2 or more pdf files in that
// directory and run the script
def p = ~/.*.(?i)pdf/
new File( directory ).eachFileMatch(p) { pdf ->
    println pdf // and break
}
Could anyone give me an idea how to do so?

You cannot break out of these each{} methods (exceptions will work, but that would be really dirty). If you check the code for eachFileMatch, you see that it already reads the whole list() and iterates over it. So one option here is to just use the regular JDK methods and use find to return the first match:
// only deal with filenames as strings
println new File('/tmp').list().find { it =~ /\.tmp$/ }
// work with `File` objects
println new File('/tmp').listFiles().find { it.isFile() && it =~ /\.tmp$/ }
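For comparison, here is a minimal Python sketch of the same "stop at the first match" idea. It is not part of the Groovy answer; the function name first_matching_file is made up for illustration, and the sketch assumes that a lazy directory scan is acceptable and that the order in which entries are returned does not matter.

```python
import os
import re

def first_matching_file(directory, pattern):
    """Return the name of the first file whose name matches pattern, or None."""
    regex = re.compile(pattern, re.IGNORECASE)
    with os.scandir(directory) as entries:
        # next() stops pulling directory entries as soon as one matches,
        # so the remaining files are never examined.
        return next(
            (e.name for e in entries if e.is_file() and regex.search(e.name)),
            None,
        )
```

A call such as first_matching_file("c:\\tmp", r"\.pdf$") would return one PDF file name, or None if the directory holds none.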

use
^.*?.(?i)pdf
this will give only the first match

Related

how to search for all constructors in c++ code?

I have to find all the constructors in my code base (which is huge). Is there any easy way to do it (without opening each file, reading it and finding all classes)? Is there any language-specific feature that I can use in my grep?
Finding destructors is easy: I can search for "~".
I can write some code to find "::" and match the words to its left and right; if they are equal, I can print that line.
But if the constructor is defined inside the class (within an H/HPP file), the above logic misses it.
Since you're thinking of using grep, I'm assuming you want to do it programmatically, and not in an IDE.
It also depends on whether you're parsing the header or the code; again, I'm assuming you want to parse the header.
I did it using python:
import re

in_class = False
class_name = ""
motif_class = re.compile(r"class\s+([A-Za-z_][A-Za-z0-9_]*)")  # captures the class name
motif_end_class = re.compile(r"\};")  # not sure that'll work for every file
motif_constructor = None  # built once the class name is known
res = []
# assuming you already got the file loaded into `lines`
for line in lines:
    if not in_class:  # we're searching to be in one
        temp = motif_class.search(line)
        if temp:
            class_name = temp.group(1)
            # matches both constructors and destructors of the current class
            motif_constructor = re.compile(r"~?\b" + re.escape(class_name) + r"\s*\(.*\)")
            in_class = True
    else:
        temp = motif_end_class.search(line)
        if temp:  # don't stop at the end of the class, since multiple classes can be in a file
            in_class = False
            continue
        temp = motif_constructor.search(line)
        if temp:
            res.append(line)  # we're adding the line that matched
# do whatever you want with res here!
I didn't test it; I wrote it rather quickly while trying to simplify an old piece of code, so numerous things are not supported, such as nested classes.
From that, you can write a script that looks for every header in a directory, and use the result however you like!
Search for all class names and then find the functions that have the same name as a class. A second option: since constructors are usually (though not always) public, search for the word public and look for the constructor there.
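The idea from the question of comparing the words on either side of "::" can be sketched with a regex backreference. This is only an illustration for out-of-class definitions in source files; the names OUT_OF_CLASS_CTOR and find_out_of_class_ctors are hypothetical, and template classes such as Foo<T>::Foo are assumed not to matter here.

```python
import re

# `Name::Name(` or `Name::~Name(`: the backreference \1 requires the word
# after `::` to repeat the word before it.
OUT_OF_CLASS_CTOR = re.compile(r"\b(\w+)::(~?)\1\s*\(")

def find_out_of_class_ctors(lines):
    """Return (line_number, class_name, is_destructor) for each matching line."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = OUT_OF_CLASS_CTOR.search(line)
        if match:
            hits.append((number, match.group(1), match.group(2) == "~"))
    return hits
```

Constructors defined inside the class body are not found this way; for those, a header scan that tracks the current class name is still needed.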

Applescript to extract the Digital Object Identifier (DOI) from a PDF file

I looked for an applescript to extract the DOI from a PDF file, but could not find it. There is enough information available on the actual format of the DOI (i.e. the regular expression), but how could I use this to get the identifier from the PDF file?
(It would be no problem if some external program were used, such as Hazel.)
If you're ok with using an app, I'd recommend Skim. Good AppleScript support. I'd probably structure it like this (especially if the document might be large):
set DOIFound to false
tell application "Skim"
    set pp to pages of document 1
    repeat with p in pp
        set t to text of p
        -- look for DOI and set DOIFound to true
        if DOIFound then exit repeat -- if it's not found then use url?
    end repeat
end tell
I'm assuming a DOI would always sit on a single page (not split across two). It looks like they are invariably (?) on the first page of an article, which would make this quick of course, even with a large doc.
[edit]
Another way would be to get the Xpdf OSX binaries from http://www.foolabs.com/xpdf/download.html and use pdftotext in the command line (just tested this; it works well) and parse the text using AppleScript. If you want to stay in AppleScript, you can do something like:
do shell script "path/to/pdftotext 'path/to/pdf/file.pdf'"
which would output a file in the same directory with a txt file extension -- you parse that for DOI.
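Parsing the extracted text for a DOI could look like the Python sketch below. The pattern is an assumption (the prefix "10.", a 4 to 9 digit registrant code, a slash, and a suffix drawn from a limited character set), not a full implementation of the DOI syntax, and the name find_doi is made up for illustration.

```python
import re

# Assumed pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")

def find_doi(text):
    """Return the first DOI-like string in text, or None if there is none."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Sentence punctuation right after a DOI is usually not part of it.
    return match.group(0).rstrip(".,;")
```

The text file written by pdftotext can be read and passed to find_doi; since search stops at the first hit, a DOI on the first page is found quickly.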
Have you tried pdfgrep? It works really well on the command line:
pdfgrep -n --max-count 1 --include "*.pdf" "DOI"
I have no idea how to build an AppleScript for this, though I would be interested in one as well, so that when I drop a PDF into a folder it automatically extracts the DOI and renames the file with the DOI in the filename.

Sublime Text 2 TM_FILEPATH regex snippet

I'm trying to make a CodeIgniter CRUD snippet for Sublime Text 2, and I can't figure out how to write a regex snippet that returns a specific part of the TM_FILEPATH variable.
I found this one in one of the CodeIgniter snippets:
${TM_FILEPATH/.+((?:application).+)/$1/:application/controllers/${1/(.+)/\l$1.php/}}
If the file location is for example:
/D/Web/MyApp/application/controllers/admin/user.php
This snippet will return:
application/controllers/admin/user.php
What I need is only the part after "controllers" and without extension, in this example:
admin/user
PS: The path after controllers can contain a varying number of directories; it can be user or also admin/something/user.
${TM_FILEPATH/.+(?:controllers\/)(.+)\.\w+/PATH\l$1/}
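The capture group in this pattern can be checked outside the editor with a short Python sketch. This only tests the regex itself; the function name controller_route is hypothetical, and Sublime's replacement syntax (such as \l) has no direct equivalent here.

```python
import re

# Same pattern as in the snippet, with the `/` left unescaped for Python.
CONTROLLER_PATH = re.compile(r".+(?:controllers/)(.+)\.\w+")

def controller_route(filepath):
    """Return the part after 'controllers/' without the extension, or None."""
    match = CONTROLLER_PATH.search(filepath)
    return match.group(1) if match else None
```

For the example path from the question, controller_route returns admin/user, and deeper paths such as admin/something/user are kept whole because (.+) is greedy up to the last extension.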

Using Regex to find function containing a specific method or variable

This is my first post on stackoverflow, so please be gentle with me...
I am still learning regex - mostly because I have finally discovered how useful they can be and this is in part through using Sublime Text 2. So this is Perl regex (I believe)
I have done searching on this and other sites but I am now genuinely stuck. Maybe I am trying to do something that can't be done
I would like to find a regex (pattern) that will let me find the function or method or procedure etc that contains a given variable or method call.
I have tried a number of expressions and they seem to get part of the way but not all the way. Particularly when searching in Javascript I pick up multiple function declarations instead of the one nearest to the call/variable that I am looking for.
for example:
I am looking for the function that calls the method savedata()
I have learnt, from this excellent site that I can use (?s) to switch . to include newlines
function.*(?=(?s).*?savedata\(\))
However, that will find the first instance of the word function and then all the text up to and including savedata().
If there are multiple procedures, then it will start at the next function and repeat until it gets to savedata() again.
function(?s).*?savedata\(\) does something similar
I have tried asking it to ignore the second function (I believe) by using something like:
function(?s).*?(?:(?!function).*?)*savedata\(\)
But that doesn't work.
I have done some investigation with look forwards and look backwards but either I am doing it wrong (highly possible) or they are not the right thing.
In summary (I guess), how do I go backwards, from a given word to the nearest occurrence of a different word.
At the moment I am using this to search through some javascript files to try and understand the structure/calls etc but ultimately I am hoping to use on c# files and some vb.net files
Many thanks in advance
Thanks for the swift responses, and sorry for not adding an example block of code, which I will do now (modified but still sufficient to show the issue).
if I have a simple block of javascript like the following:
function a_CellClickHandler(gridName, cellId, button){
    var stuffhappenshere;
    var and here;
    if(something or other){
        if (anothertest) {
            event.returnValue=false;
            event.cancelBubble=true;
            return true;
        }
        else{
            event.returnValue=false;
            event.cancelBubble=true;
            return true;
        }
    }
}
function a_DblClickHandler(gridName, cellId){
    var userRow = rowfromsomewhere;
    var userCell = cellfromsomewhereelse;
    //this will need to save the local data before allowing any inserts to ensure that they are inserted in the correct place
    if (checkforarangeofthings){
        if (differenttest) {
            InsSeqNum = insertnumbervalue;
            InsRowID = arow.getValue()
            blnWasInsert = true;
            blnWasDoubleClick = true;
            SaveData();
        }
    }
}
When I run the regex against this (including the second one, which was identified as one that should work), Sublime Text 2 selects everything from the first function through to SaveData().
I would like to be able to get just the a_DblClickHandler function in this case, not both.
Hopefully this code snippet adds some clarity; sorry for not posting it originally, as I had hoped a standard code file would suffice.
This regex will find every Javascript function containing the SaveData method:
(?<=[\r\n])([\t ]*+)function[^\r\n]*+[\r\n]++(?:(?!\1\})[^\r\n]*+[\r\n]++)*?[^\r\n]*?\bSaveData\(\)
It will match all the lines in the function up to, and including, the first line containing the SaveData method.
Caveat:
The source code must have well-formed indentation for this to work, as the regex uses matching indentations to detect the end of functions.
Will not match a function if it starts on the first line of the file.
Explanation:
(?<=[\r\n]) Start at the beginning of a line
([\t ]*+) Capture the indentation of that line in Capture Group 1
function[^\r\n]*+[\r\n]++ Match the rest of the declaration line of the function
(?:(?!\1\})[^\r\n]*+[\r\n]++)*? Match more lines (lazily) which are not the last line of the function, until:
[^\r\n]*?\bSaveData\(\) Match the first line of the function containing the SaveData method call
Note: The *+ and ++ are possessive quantifiers, only used to speed up execution.
EDIT:
Fixed two minor problems with the regex.
EDIT:
Fixed another minor problem with the regex.
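A different technique, not the one used in the answer above, is a "tempered dot": each character between function and the call is consumed only if the word function does not start there, so the match cannot span two declarations. The Python sketch below assumes the word function never appears in comments or strings inside the body; the name enclosing_function is made up for illustration.

```python
import re

def enclosing_function(source, call="SaveData"):
    """Return the text from the nearest preceding `function` up to `call()`."""
    pattern = re.compile(
        # (?:(?!\bfunction\b).)*? advances one character at a time and
        # refuses to step onto the start of another `function` keyword.
        r"\bfunction\b(?:(?!\bfunction\b).)*?\b" + re.escape(call) + r"\(\)",
        re.DOTALL,
    )
    match = pattern.search(source)
    return match.group(0) if match else None
```

Unlike the indentation-based regex, this does not depend on how the code is indented, but nested or anonymous functions inside the body will move the start of the match to the innermost function keyword.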

Finding multiple files from different folders using regular expressions

I'm trying to load multiple .txt files in R, from different folders.
I have problems writing the path and pattern using regular expressions.
My path has this structure:
'/Users/folderA/folderB/folderC/folderD/01_01_2012/folderE/file.txt'
So, the path is almost the same, except that the folder with the date name always changes.
I have tried to load it like this:
filesToProcess <- list.files(path = "/Users/folderA/folderB/folderC/folderD/",
pattern = "*_*_*/folderE/*.txt")
But this doesn't seem to work.
Could someone please help me write this with regular expressions?
Thanks a lot!
The key here is to use argument recursive=TRUE so that you can search inside the folders that are in the original directory:
filesToProcess <- list.files(path = "/Users/folderA/folderB/folderC/folderD",
pattern = "txt", recursive = TRUE, full.names = TRUE)
The pattern has to correspond to the name of the files; it can't refer to the name of the folders (see ?list.files). That's why you need a second step where you narrow the result down to the specific folders you wanted. Note the use of the argument full.names=TRUE in the previous call, which allows us to keep the path of each file (NB: you also have to drop the final / of the path argument, or else it ends up doubled in the output and leads to an error when you try to load the files).
filesToProcess[grep("folderE", filesToProcess)]
A final note:
Your regular expression was flawed anyway: * means
The preceding item will be matched zero or more times.
What you wanted was . (followed by * to match any run of characters): see ?regexp
The period . matches any single character.
Although the subject refers to regular expressions it seems from the example that you really want to use globs. In that case try:
Sys.glob("/Users/folderA/folderB/folderC/folderD/*_*_*/folderE/*.txt")
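To make the difference between globs and regular expressions concrete, here is a Python sketch of a regex that expresses the same directory layout as the glob above, only more strictly. The date folder is assumed to have the form dd_mm_yyyy as in the example path, and the names DATED_TXT and keep_dated_txt are hypothetical.

```python
import re

# A dd_mm_yyyy folder, then folderE, then a .txt file directly inside it.
DATED_TXT = re.compile(r"/\d{2}_\d{2}_\d{4}/folderE/[^/]+\.txt$")

def keep_dated_txt(paths):
    """Keep only the paths that follow the <date>/folderE/<name>.txt layout."""
    return [p for p in paths if DATED_TXT.search(p)]
```

Here \d{2} plays the role that * plays in the glob, and the dot before txt is escaped because an unescaped . matches any single character.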