how to make sense of expression logic in ssis - regex

I am working on an SSIS project that uses a Foreach Loop container to unzip an archive which, once extracted, contains multiple text files in the same directory.
Each file has a different name.
I have two variables, of which variable 2 has an expression:
Variable 1
Name = ZipFileName
Value = sample.zip
Variable 2
Name = FileName
Value = *.*
Expression = REPLACE(@[User::ZipFileName],".zip",".txt")
I need clarification on the expression.
My understanding is that the expression replaces the .zip extension in the zip file's name with .txt when the archive is extracted. Is that correct? I would also like to know how it can change file names dynamically at run time, seeing as there are multiple files.
Thanks

From what I can see, the expression replaces .zip with .txt in [User::ZipFileName].
If the value of [User::ZipFileName] is somefile.zip, the output would be:
somefile.txt
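For readers who find a scripting language easier to follow, the same substitution can be mimicked outside SSIS. This is only a rough Python analogue, not SSIS code: derive_txt_name is a hypothetical helper, and the loop over zip names merely stands in for whatever assigns User::ZipFileName in the package.

```python
def derive_txt_name(zip_file_name: str) -> str:
    """Mimic REPLACE(<zip name>, ".zip", ".txt") on a plain string."""
    # str.replace swaps every occurrence of ".zip" in the string.
    return zip_file_name.replace(".zip", ".txt")


# Hypothetical zip names, one per loop iteration.
for zip_name in ["sample.zip", "somefile.zip"]:
    print(zip_name, "->", derive_txt_name(zip_name))
```

The result depends only on the current zip name, so a substitution like this yields one derived name per zip file, not one per extracted text file.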

Related

Batch rename files in Adobe Bridge

I want to Batch Rename a few thousand files. I have one part of the regular expression figured out just not the first part.
The filename is the employeeID_employeeName-sequence number. I would like to only keep the employeeID_employee name.
For example the file 123456_John Smith-00001.pdf should become 123456_John Smith.pdf
I was able to use Batch Rename with String Substitution and the find pattern [-\d?], but that only changes the filename to _John Smith.pdf
Please see batch rename example image
The expression that worked was by Wiktor Stribiżew in the comment above. In the String Substitution "Find" field I placed -\d+(.\w+)$ and left the "Replace with" field empty.
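To see why the first attempt stripped the employee ID as well, both patterns can be tried on the example name. This is a sketch using Python's re module, not Adobe Bridge. Two things are assumptions of the sketch: the dot is escaped so it matches a literal period, and the captured extension is put back with \1, because a plain Python substitution with an empty replacement would remove the extension too.

```python
import re

name = "123456_John Smith-00001.pdf"

# Character class: removes every hyphen, digit, or "?" anywhere in the name,
# which also wipes out the leading employee ID.
too_broad = re.sub(r"[-\d?]", "", name)

# Anchored pattern: only the "-<digits>" run directly before the extension.
# The extension is captured in group 1 and reinserted.
trimmed = re.sub(r"-\d+(\.\w+)$", r"\1", name)

print(too_broad)  # _John Smith.pdf
print(trimmed)    # 123456_John Smith.pdf
```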

Rename multiple files with different names to same name and different numbers

I have multiple pictures of trucks with random messy names and different formats (jpeg, jpg, png etc.) and I want to rename them to "truck1.jpeg", "truck2.jpg", "truck3.png" and so on. How do I do it using the rename command?
It's probably easier to use bash and mv, since AFAIK you need something like bash to generate the number sequence. In bash:
i=1
for x in *; do
    # quote the names so files containing spaces are handled correctly
    echo "$x" '->' "truck$i.${x##*.}"
    mv -- "$x" "truck$i.${x##*.}" && i=$((i+1))
done
The for x in * operates on all files whose names do not begin with a dot and are in the current directory. You can adjust the glob to be more exclusive, but this script will need modification if the files are in other directories. Again, probably easier to collect the files in one directory, or maybe put it in a script file and execute it in multiple directories using find ... -exec.
This uses i as a counter to generate the digits. The trick is the ${x##*.} expression which takes the file name and deletes everything up to the final dot. This allows you to preserve and reattach the file extension to the new name. You have to be careful to set i correctly or you will overwrite old truck1 files with new ones.
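The same counter-plus-extension idea can be sketched in Python, which makes it easy to preview the mapping before touching any file. plan_renames and the sample names are hypothetical. Note that sorted() orders by code point, which may differ from the shell's glob order, and that os.path.splitext returns an empty extension for names without a dot, whereas ${x##*.} returns the whole name.

```python
import os


def plan_renames(names, prefix="truck", start=1):
    """Pair each original name with prefix + counter + original extension."""
    plan = []
    counter = start
    for name in sorted(names):
        if name.startswith("."):
            continue  # skip dotfiles, as the shell glob * does
        ext = os.path.splitext(name)[1]  # includes the leading dot, or "" if none
        plan.append((name, f"{prefix}{counter}{ext}"))
        counter += 1
    return plan


# Preview only; swap print for os.rename(old, new) to apply the plan.
for old, new in plan_renames(["IMG_0042.jpeg", "my truck.png", "zz.jpg"]):
    print(old, "->", new)
```

Building the full plan first also makes it possible to check for name collisions with existing truck files before any rename happens.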

Is there a way to grab the names of the files that glob parses through?

I am using the glob module to parse through a bunch of text files. Here are the relevant lines of code:
for file in g.glob('*.TXT'):
    for col in csv.DictReader(open(file,'rU')):
It works fine, but is there a way to grab the names of the files that it iterates through? I'm thinking this is not possible since it just looks for any files with the suffix '.TXT'. But I just thought I would ask.
Since glob.glob() returns only a list of matching file names, it's not possible to fetch all the file names considered using just this function. That would be a job for os.listdir().
If you only want to keep track of the matched files, you can store the return value of glob() before iterating over it:
filenames = g.glob('*.TXT')
for filename in filenames:
    for col in csv.DictReader(open(filename,'rU')):
        ...
Also note that I changed the name file to filename, because that's more precise.
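The difference between the matched names and all the names glob had to consider can be shown side by side. This is a small sketch: matched_and_considered is a hypothetical helper and the file names in the demo are invented.

```python
import glob
import os
import tempfile


def matched_and_considered(directory, pattern="*.TXT"):
    """Return (names the pattern matched, every name in the directory)."""
    # glob.escape protects the directory part in case it contains glob characters.
    full_pattern = os.path.join(glob.escape(directory), pattern)
    matched = sorted(os.path.basename(p) for p in glob.glob(full_pattern))
    considered = sorted(os.listdir(directory))
    return matched, considered


# Demo in a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.TXT", "b.TXT", "notes.md"):
        open(os.path.join(tmp, name), "w").close()
    matched, considered = matched_and_considered(tmp)
    print("matched:   ", matched)
    print("considered:", considered)
```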

pattern matching a filename in R

This is probably real simple, but I can't seem to figure out how to do it.
I have an application in R (Shiny) where a user uploads to the application a *.zip file that contains all the components of an ESRI shapefile. I unpack these files into their own directory. This folder then, may or may not, contain a *.shp.xml file. At some point in my R code, I need to find the exact name of the *.shp file that has been unpacked, and distinguish it from the *.shp.xml file. How do I write the expression that will do that? I was thinking to use list.files, but I am unsure how to write the rest of the expression.
thanks!
With R regex patterns, "$" has a special meaning as the end of a character element, and the dots need to be escaped with \\, so:
shpfils <- list.files(path, pattern="\\.shp$")
This should isolate your file:
Sys.glob("*shp")
as compared to
Sys.glob("*shp*")
which should give both files, or
Sys.glob("*shp.xml")
which should give the .shp.xml file.
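The anchored regex and the glob patterns above can be compared on a fixed list of names. The sketch below uses Python's re and fnmatch modules rather than R, purely to illustrate the matching rules; the file names are hypothetical.

```python
import fnmatch
import re

names = ["parcels.shp", "parcels.shp.xml", "parcels.dbf", "parcels.shx"]

# Regex: "\." is a literal dot and "$" anchors the match at the end of the name.
by_regex = [n for n in names if re.search(r"\.shp$", n)]

# Glob: "*" matches any run of characters and the whole name must match.
by_glob = fnmatch.filter(names, "*shp")
both = fnmatch.filter(names, "*shp*")

print(by_regex)  # ['parcels.shp']
print(by_glob)   # ['parcels.shp']
print(both)      # ['parcels.shp', 'parcels.shp.xml']
```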

Finding multiple files from different folders using regular expressions

I'm trying to load multiple .txt files in R, from different folders.
I have problems writing the path and pattern using regular expressions.
My path has this structure:
'/Users/folderA/folderB/folderC/folderD/01_01_2012/folderE/file.txt'
So, the path is almost the same, except that the folder with the date name always changes.
I have tried to load it like this:
filesToProcess <- list.files(path = "/Users/folderA/folderB/folderC/folderD/",
                             pattern = "*_*_*/folderE/*.txt")
But this doesn't seem to work.
Could someone please help me write this with regular expressions?
Thanks a lot!
The key here is to use argument recursive=TRUE so that you can search inside the folders that are in the original directory:
filesToProcess <- list.files(path = "/Users/folderA/folderB/folderC/folderD",
                             pattern = "\\.txt$", recursive = TRUE, full.names = TRUE)
The pattern has to correspond to the names of the files; it can't refer to the names of the folders (see ?list.files). That's why you need a second step to narrow the results down to the specific folders you want. Note the use of the argument full.names=TRUE in the previous call, which lets us keep the path of each file (NB: you also have to drop the final / of the path argument, or else it ends up doubled in the output and leads to an error when you try to load the files).
filesToProcess[grep("folderE", filesToProcess)]
A final note:
Your regular expression was flawed anyway. In a regex, * means
"The preceding item will be matched zero or more times."
What you wanted was . (followed by * to match any run of characters); see ?regexp:
"The period . matches any single character."
Although the subject refers to regular expressions it seems from the example that you really want to use globs. In that case try:
Sys.glob("/Users/folderA/folderB/folderC/folderD/*_*_*/folderE/*.txt")
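The gap between the glob reading and the regex reading of the pattern can be checked on the example path. This sketch uses Python's fnmatch and re modules rather than R; the regex is one possible spelling, not the only one, and other_path is a hypothetical counterexample. Unlike shell-style globbing, fnmatch lets * cross "/" boundaries, so the regex uses [^/]* to stay inside a single folder name.

```python
import fnmatch
import re

path = "/Users/folderA/folderB/folderC/folderD/01_01_2012/folderE/file.txt"
other_path = "/Users/folderA/folderB/folderC/folderD/notes/folderE/file.txt"
glob_pattern = "/Users/folderA/folderB/folderC/folderD/*_*_*/folderE/*.txt"

# Read as a glob, each "*" stands for any run of characters.
glob_hit = fnmatch.fnmatchcase(path, glob_pattern)

# Regex spelling of the same idea: "[^/]*" is any run of non-slash characters,
# "\." is a literal dot and "$" anchors the end of the path.
regex = re.compile(r".*/[^/]*_[^/]*_[^/]*/folderE/[^/]*\.txt$")

print(glob_hit)                        # True
print(bool(regex.match(path)))         # True
print(bool(regex.match(other_path)))   # False
```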