I am curious and have not been able to find the answer to this question.
Is there a way to use a wildcard in a filename in Stata when using the insheet command?
For example, I have a .do file that I will use once a month. The name of the file will change every month(will include a date) but will have one part that will stay consistent(mmddyyyy_data.csv / data.csv stays consistent).
I haven't had any luck using insheet using "./*data.csv" Is there another way to achieve this?
Thanks!
I think this will not work because insheet only wants one dataset and the wildcard would allow you to return more than one.
If you're certain that there will only be one file of this type maybe you can use something like this:
local file : dir . files "*data.csv"
insheet using `file'
Related
I have a use case where I want to read the filename from a metadata table, I have written a pipeline function to read the metadata table, but I am not sure how can I pass this information to ReadFromText as it only takes string as input, Is it possible to assign this value to ReadFromText(). Please suggest some workarounds or ideas how to achieve this, Thanks
code: pipeline | 'Read from a File' >> ReadFromText(I want to pass the file path here?,
skip_header_lines=1)
Note: There will be various folders and files in storage, files are in csv format, but in my use case I can't directly pass the storage location or filename to file path in ReadFromText. I want to read it from metadata and pass the value. Hope I am clear, Thanks
I don't understand why you need to read the metadata. If you want to read all the files inside a folder, you can just provide a blob. This solution working in python, not sure about java.
p|readfromtext("./folder/*.csv")
"*" is the blob here, which allows pipeline to read all the patterns matching .csv. You can also add something at the starting.
What you want is textio.ReadAllFromText which reads from a PCollection instead of taking a string directly.
I am trying to append multiple Excel files into a large database by executing the following code:
cls
set more off
clear all
global route = "C:\Users\NICOLE\Desktop\CAR"
cd "$route"
tempfile buildDB
save `buildDB', emptyok
local filenames : dir "$route" files "*.xlsx"
display `filenames'
foreach f of local filenames {
import excel using `"`f'"' ,firstrow allstring clear
gen source = `"`f'"'
append using `buildDB'
save `"`buildDB'"', replace
}
save "C:\Users\NICOLE\Desktop\CAR\DB_EG-RAC.dta" ,replace
Stata manages to append all of the files, but it also displays the following message of error:
file C:\Users\NICOLE.xlsx not found r(601);
And I do not know how to solve it, because it does not let my code run as it should. Thanks!
We have deadlock here. On the face of it the filename in question is not one you write in your code, but could only be part of the result of
local filenames : dir "$route" files "*.xlsx"
But the file named isn't even in the same directory as that named. Moreover, you are adamant that the file doesn't exist and Stata according to your error report can't find it.
The question still remains: how does Stata get asked to open a file that supposedly doesn't exist?
My only guesses are feeble:
Code you are not showing is responsible.
You are running slightly different versions of this script in different places and getting confused. Can you replicate this error that you did get once all over again? Have you searched everywhere remotely possible on the C: drive for this file nicole.xlsx?
It is crucial to realise that we can test nothing here. The problem has not been presented reproducibly.
I'm trying to process a file in a Kettle transformation. The targeted file has a static name, let's say TARGETED.LOG and it's in a subdirectory which contains a date component (variable) in his name. So, the whole path name will be something like:
c:\username\kettleworkspace\report_[DDMMYYYY]\TARGETED.LOG.
Any advice?
Use the Get File Names step with the include subfolders option, and drop the resulting list of files in your Text File Input with the Accept filenames from previous step option.
Of course between these two step you would probably want to add some Filter step.
My directory contains files named as WM_PersonFile_22022018 , WM_PersonFile_23022018, WM_PersonFile_24022018 , WM_PersonFile_25022018 and these files come on a daily basis. I am using tFileList to iterate through the files
What should be my regex in my Filemask to pick up the most recent file? Should the Use Global Expressions as Filemask be unchecked?
I tried "*.txt" which is picking up all the files.
RegEx would help you to filter for the correct files.
Some other logic would get you the newest file. If you use tFileList, you might be able to sort after date and only take the first result.
Alternatively, if you also want to check the date in the filename is correct, you might need to add a little logic with a tMap, tAssert, tJava or tJavaRow.
I want to extract dates from the suffixes of files in a particular folder. The contents of such a folder look something like:
Packed_Folder_1_2016.06.10
Packed_Folder_1_2016.08.06
Packed_Folder_1_2015.09.03
packed_Folder_1_2015.01.08
... (so on and so forth, always in the same path just different suffixes)
There is no pattern to the dates. I need to make a VS form (2013) to read the name of the files and store the date differences.
Notice how the filenames always follow a pattern? It's always Packed_Folder_1_####.##.##, where the last part is a date.
So what you want to do is list the file names in the folder, and try to find a file that matches the pattern. You could use a regular expression to match the filename (it would be something like R"(Packed_Folder_1_\d{4}\.\d{2}\.\d{2})").
You are talking about Forms, so I am assuming you are able to use Visual C++. If that is the case, you can check FileSystemWatcher Class.
You instantiated it with a given path ( file or directory ), and it will trigger events based on some changes on the target (simple change, creation, rename - you can select which one). You could then update your reference, in case its change suits your needs.