Is there a way to solve Stata's r(601) while looping the append for imported Excel files?

I am trying to append multiple Excel files into a large database by executing the following code:
cls
set more off
clear all
global route = "C:\Users\NICOLE\Desktop\CAR"
cd "$route"
tempfile buildDB
save `buildDB', emptyok
local filenames : dir "$route" files "*.xlsx"
display `filenames'
foreach f of local filenames {
    import excel using `"`f'"', firstrow allstring clear
    gen source = `"`f'"'
    append using `buildDB'
    save `"`buildDB'"', replace
}
save "C:\Users\NICOLE\Desktop\CAR\DB_EG-RAC.dta", replace
Stata manages to append all of the files, but it also displays the following error message:
file C:\Users\NICOLE.xlsx not found r(601);
I do not know how to solve this, because it keeps my code from running as it should. Thanks!

We have deadlock here. On the face of it the filename in question is not one you write in your code, but could only be part of the result of
local filenames : dir "$route" files "*.xlsx"
But the file named in the error isn't even in the directory you name in your code. Moreover, you are adamant that the file doesn't exist, and, according to your error report, Stata can't find it.
The question still remains: how does Stata get asked to open a file that supposedly doesn't exist?
My only guesses are feeble:
Code you are not showing is responsible.
You are running slightly different versions of this script in different places and getting confused. Can you replicate this error, which you got once, all over again? Have you searched everywhere remotely possible on the C: drive for this file nicole.xlsx? (A quick search sketch follows below.)
It is crucial to realise that we can test nothing here. The problem has not been presented reproducibly.
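On that last point, here is a quick sketch, purely my own suggestion and not something tested on your machine, that walks a directory tree and reports any file named nicole.xlsx; the starting folder is an assumption, and you can widen it to the whole C: drive:
import os

target = "nicole.xlsx"                      # case-insensitive file name we are hunting for
start = r"C:\Users"                         # assumption: widen to r"C:\" for a full-drive search
for dirpath, dirnames, filenames in os.walk(start):
    for name in filenames:
        if name.lower() == target:
            print(os.path.join(dirpath, name))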

Related

Macro within a loop and include

In Stata, I'm trying to use the command include to run several regressions in one do file. The overall goal is to help in forecasting Natural Gas production.
I have a do file for each product and basin that I'm interested in. Within each do file I am running several versions of the regression based on data available at specific times (e.g. the first set of regressions is for product type 4, in basin 2, with information available in June 2020).
The regression command within the do file looks something like this:
include "gas\temp\NG var.doh"
foreach time in dec14 dec15 dec16 dec17 previous current {
    arima price_`perm4type' `perm4pvars' if tin(,`yq_`time'') , ar(1) ma()
}
I have the perm4pvars defined in the file NG var.doh like this:
local perm4pvars txfreeze PNGHH_`time' d.CPIE_`time' d.IFNRESPUOR_`time' POILWTI_`time'
When I run my do file, the time from my .doh file doesn't show up, so I get an error message: "PNGHH_ is an ambiguous abbreviation".
How can I get the time to show up in my regression command?
I did try doing this in my .doh file
foreach time in dec14 dec15 dec16 dec17 previous current {
    local perm4pvars txfreeze PNGHH_`time' d.CPIE_`time' d.IFNRESPUOR_`time' POILWTI_`time'
}
I got it to run, but only for time = current.
The relationship between the included .do file and the main .do file is not symmetric.
The point of include is that the included do file is read into the main do file. But the included do file knows nothing about any files it is included in, so definitions in the included do file may not usefully refer to definitions in the main .do file, or to any others, unless they in turn are included, or so I presume.
That explains why your reference to the local macro time in the included file doesn't do what you want. It's not illegal in any do file or Stata program to refer to a local macro that doesn't exist (meaning, one that is not visible from the file), but, as here, the consequences may not be what you want.
In my opinion your issue is more basic than the above answer suggests: you want to use locals from a loop outside of that loop, or, in the case where you got it to run for time = current, you are misunderstanding what the loop does (the loop in the .doh file redefines perm4pvars on every pass, so only the value from its last iteration, current, survives). It might be good to build a solid understanding of how loops work first, especially since you seem to be using several other loops that are not detailed in your question and that I can only assume are specified correctly.
Assuming the gas\temp\NG var.doh file only holds the line defining the local, i.e. local perm4pvars txfreeze PNGHH_`time' d.CPIE_`time' d.IFNRESPUOR_`time' POILWTI_`time', a way to get it working the way you want (or at least the way I think you want it, since you do not detail what your loop should achieve) is to move the include inside the loop, changing your code to:
foreach time in dec14 dec15 dec16 dec17 previous current {
    include "gas\temp\NG var.doh"
    arima price_`perm4type' `perm4pvars' if tin(,`yq_`time'') , ar(1) ma()
}
This way, the line in the included .do file can use the local time from the loop. If there is more in the .doh file, you would have to change your code a bit more to get it to work; however, I can't help you with that unless you give me more information about the structure of the code and what you want to achieve with it.

load data into power BI from relative path

I am trying to find a way to load an external data file from a relative path, so that when someone else opens my PBIX it will still work on their computer.
Many thanks.
Relative paths are *not* currently supported by Power BI.
To ease the pain, you can create a variable that contains the path where the files are located, and use that variable to determine the path of each table. That way, you only have to change a single place (that variable) and all the tables will automatically point to the new location.
Create a Blank Query, give it a name (e.g. dataFolderPath) and type in the path where your files are (e.g. C:\Users\augustoproiete\Desktop)
With the variable created, edit each of your tables in the Advanced Editor and concatenate your variable with the name of the file.
e.g. instead of "C:\Users\augustoproiete\Desktop\data.xlsx", change it to dataFolderPath & "\data.xlsx"
You can also vote/watch this feature request to be notified when it gets implemented:
Support relative path to excel/csv sources
You can also use the "Parameters" feature.
1. Create a new parameter like "PathExcelFiles" (screenshot not shown)
2. Edit your "Source" entry to use it (screenshot not shown)
Done!
I don't think this is possible yet.
Please add your support for this idea so the Microsoft Power BI team will be more likely to add this as a new feature.
I couldn't bear the fact that there is no way to use relative paths, but finally I had to...
So I tried to find a half-decent acceptable workaround.
Using a Python script, it is at least possible to get access to the user's %HOME% directory.
let
PySource = Python.Execute("from pathlib import Path#(lf)import pandas as pd#(lf)dataset = pd.DataFrame([[str(Path.home())]], columns = [1])"),
homeDir = Text.Trim(Lines.ToText(PySource{[Name="dataset"]}[Value][1])),
...
The same should be possible with an R script, but I didn't try it.
Does anybody know a better solution for getting the %HOME% directory inside "Power" Query? I would be glad to have one.
Then I created two scripts inside my working directory. install.bat:
@ECHO OFF
if exist "%HOME%\.pbiTemplatePath\filepath.txt" GOTO :ERROR
:: These are the key commands
mkdir "%HOME%\.pbiTemplatePath"
echo|set /p="%cd%" > "%HOME%\.pbiTemplatePath\filepath.txt"
GOTO :END
:: Just a little message box
:ERROR
SET msgboxTitle=There is already another working directory installed.
SET /p msgboxBody=<"%HOME%\.pbiTemplatePath\filepath.txt"
SET tmpmsgbox=%temp%\~tmpmsgbox.vbs
IF EXIST "%tmpmsgbox%" DEL /F /Q "%tmpmsgbox%"
ECHO msgbox "%msgboxBody%",0,"%msgboxTitle%">"%tmpmsgbox%"
WSCRIPT "%tmpmsgbox%"
:END
and uninstall_all.bat:
@ECHO OFF
if exist "%HOME%\.pbiTemplatePath\filepath.txt" RMDIR /S /Q "%HOME%\.pbiTemplatePath\"
So in "Power" BI I did this:
let
PySource = Python.Execute("from pathlib import Path#(lf)import pandas as pd#(lf)dataset = pd.DataFrame([[str(Path.home())]], columns = [1])"),
homeDir = Text.Trim(Lines.ToText(PySource{[Name="dataset"]}[Value][1])),
workingDirFile = Text.Combine({homeDir, ".PbiTemplatePath\filepath.txt"} , "\"),
workingDir = Text.Trim(Lines.ToText(Csv.Document(File.Contents(workingDirFile),[Delimiter=";", Columns=1, QuoteStyle=QuoteStyle.None])[Column1])),
...
Now my git repository contains a "Power" BI template file, some config files telling the template where to load the data from, and the install/uninstall scripts. install.bat has to be executed once after cloning, and nobody has to copy and paste any path.
I'd be glad about any suggestions for improvement. It's not the solution Gotham deserves... Gotham deserves a better one.
As mentioned by a few people, you can use a dataset parameter and reference that in your script. What I haven't seen mentioned is that you can change these values using an API call:
https://learn.microsoft.com/en-us/rest/api/power-bi/datasets/update-parameters
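A minimal sketch of such a call from Python, assuming you already have an Azure AD access token with permission to update the dataset; the dataset ID, parameter name, and new value below are placeholders:
import requests

access_token = "<azure-ad-access-token>"     # assumption: obtained elsewhere, e.g. via MSAL
dataset_id = "<your-dataset-id>"             # placeholder
url = ("https://api.powerbi.com/v1.0/myorg/datasets/"
       + dataset_id + "/Default.UpdateParameters")
payload = {"updateDetails": [
    {"name": "PathExcelFiles", "newValue": r"D:\shared\data"}    # placeholder parameter and value
]}
resp = requests.post(url, json=payload,
                     headers={"Authorization": "Bearer " + access_token})
resp.raise_for_status()                      # raises if the update was rejected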

GAMS: Filename cannot be used as valid UEL

I am trying to merge a large data set in GAMS. The file should consist of multiple gdx files with several names. The program merges the files as I would like it to; however, it replaces the names of the files being merged with File_1, File_2, File_3 and so on. I would like to see the name of the gdx file in the merged file (and so far the script I wrote has worked fine).
I'm receiving the following error for each line:
***Filename cannot be used as a valid UEL
Existing name: ImpactYesPGTNoLDViolation-D1-PG10-LDI-LDE0.001-LB0.0045-PDC0-D10
Replaced with File_1
Why does this happen? Could it be that the existing name is too long? I tried to find out more about this error but so far have not found any information on it. And is there any way to fix it? I need the information in the existing name in order to further process the output.
You are right: this name is too long to be used as a UEL (aka label). A UEL can only be up to 63 characters, and your example name is 64 characters, one over the limit. You can read more about this and other limitations for UELs here.
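If you need to keep working with those long run names, one hedged workaround sketch (my own idea, not something GAMS itself offers here) is to copy each gdx file to a short, unique name before merging and keep a mapping from the short names back to the original long names for later processing; the folder paths are placeholders:
import csv
import shutil
from pathlib import Path

src_dir = Path(r"C:\path\to\gdx")            # placeholder: folder holding the long-named .gdx files
dst_dir = Path(r"C:\path\to\gdx_short")      # placeholder: folder the merge will read instead
dst_dir.mkdir(parents=True, exist_ok=True)

with open(dst_dir / "name_map.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["short_name", "original_name"])
    for i, gdx in enumerate(sorted(src_dir.glob("*.gdx")), start=1):
        short = "run_%04d" % i               # well under the 63-character UEL limit
        shutil.copy(gdx, dst_dir / (short + ".gdx"))
        writer.writerow([short, gdx.stem])   # keep the long name for later processing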

Python in Knime: Downloading files and dynamically pressing them into workflow

I'm using Knime 3.1.2 on OSX and Linux for OPENMS analysis (Mass Spectrometry).
Currently, it uses static filename.mzML files manually put in a directory. It usually has more than one file passed in at a time ('Input Files' module, not 'Input File' module) using a ZipLoopStart.
I want these files to be downloaded dynamically and then passed into the workflow... but I'm not sure of the best way to do that.
Currently, I have a Python script that downloads .gz files (from AWS S3) and then unzips them. I already have variations that can unzip the files into memory using StringIO (and maybe pass them into the workflow from there as data?).
It can also download them to a directory... which maybe can then be used as the source? But I don't know how to tell the ZipLoop to wait and check the directory after the Python script is run.
I could also have the Python script run as a separate entity (outside of Knime) and then, once the directory is populated, call Knime... HOWEVER, there will always be a different number of files (maybe 1, maybe 3)... and I don't know how to make the 'Input Files' Knime node handle an unknown number of input files.
I hope this makes sense.
Thanks!
Thanks to Gábor for getting me on the right track, although I ended up taking a slightly different route after much experimentation.
===
Being new to Knime, I don't know if this is an efficient use of it or a complete kludge... but it does work.
So, part of the problem is some of the Knime-specific objects, one of which is called URIDataValue.
A Python Pandas dataframe is, apparently, interchangeable with Knime tables. However, I don't know if there's a way to import one of these URIDataValue objects into Python. So here's what I did...
1. I wrote a Python script that creates a Pandas dataframe and populates it with one column. Everything is a string, including the column header:
from pandas import DataFrame
# Create a one-column table of file URI strings
T = DataFrame(
    [
        ['file:///Users/.../copy/lfq_spikein_dilution_1.mzML'],
        ['file:///Users/.../copy/lfq_spikein_dilution_2.mzML'],
    ],
)
T.columns = ['URIDataValue']
#print T
output_table = T
That creates this dataframe (screenshot not shown):
Note: the column name and values are just strings. But it is (apparently) important that the column header be 'URIDataValue'... even though here it's just text. If the column name is not 'URIDataValue', the next node doesn't know what to do.
Next, the output_table from the 'Python Source' node is patched to a 'String to URI' node, which (apparently and magically) knows to change the entire column's string values to URIDataValues (presumably based on the name of the first column... I don't know that for sure).
Finally, the new table, with the correct data objects, goes to a 'URI to PORT' node... since apparently 'Port' objects and 'URI' objects are different.
This, then, matches the needed input to the ZipLoop... which is normally the output from a static (hard-coded) 'Input Files' node.
Now, to actually solve the question above, I just have to add the code to my 'Python Source' to download and unzip the S3 files, then annotate the dataframe with their locations, and go.
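For what it's worth, here is a rough sketch of what that extended 'Python Source' node could look like. It assumes a Python 3 environment with boto3 available, and the bucket name, key prefix, and download folder are placeholders:
import gzip
import shutil
from pathlib import Path

import boto3
from pandas import DataFrame

bucket = "my-ms-data-bucket"                 # placeholder
prefix = "runs/latest/"                      # placeholder
dest = Path("/Users/me/knime_input")         # placeholder download folder
dest.mkdir(parents=True, exist_ok=True)

s3 = boto3.client("s3")
rows = []
for obj in s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get("Contents", []):
    key = obj["Key"]
    if not key.endswith(".mzML.gz"):
        continue
    gz_path = dest / Path(key).name
    s3.download_file(bucket, key, str(gz_path))
    mzml_path = gz_path.with_suffix("")      # strip the trailing .gz
    with gzip.open(gz_path, "rb") as fin, open(mzml_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)        # unzip next to the download
    rows.append(["file://" + str(mzml_path)])

# Same shape as before: one string column named 'URIDataValue'
output_table = DataFrame(rows, columns=["URIDataValue"])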
I have no idea what I'm doing, but it worked.
There are multiple options to let things work:
Convert the files in memory to Binary Object cells using Python; later you can use those in KNIME. (This one I am not sure is supported, but as I remember it was demoed in one of the recent KNIME gatherings.)
Save the files to a temporary folder (Create Temp Dir) using Python and connect the Python node, via a flow variable connection, to a file reader node in KNIME (which should work in a loop: List Files, check the Iterate List of Files metanode).
Maybe there is already S3 Remote File Handling support in KNIME, so you can do the downloading, unzipping within KNIME. (Not that I know of, but it would be nice.)
I would go with option 2, but I am not so familiar with Python, so for you, probably option 1 is the best. (In case option 3 is supported, that is the best in my opinion.)

Python program to extend short urls that integrates with Stata

I have a dataset containing thousands of tweets. Some of those contain urls but most of them are in the classical shortened forms used in Twitter. I need something that gets the full urls so that I can check the presence of some particular websites. I have solved the problem in Python like this:
import urllib2
url_filename = 'C:\Users\Monica\Documents\Pythonfiles\urlstrial.txt'
url_filename2 = 'C:\Users\Monica\Documents\Pythonfiles\output_file.txt'
url_file = open(url_filename, 'r')
out = open(url_filename2, 'w')
for line in url_file:
    tco_url = line.strip('\n')
    req = urllib2.urlopen(tco_url)
    print >>out, req.url
url_file.close()
out.close()
This works, but it requires that I export my URLs from Stata to a .txt file and then re-import the full URLs. Is there some version of my Python script that would allow me to integrate the task into Stata using the shell command? I have quite a lot of different .dta files, and I would ideally like to avoid appending them all just to execute this task.
Thanks in advance for any answer!
Sure, this is possible without leaving Stata. I am using a Mac running OS X. The details might differ on your operating system, which I am guessing is Windows.
Python and Stata Method
Say we have the following trivial Python program, called hello.py:
#!/usr/bin/env python
import csv
data = [['name', 'message'], ['Monica', 'Hello World!']]
with open('data.csv', 'w') as wsock:
    wtr = csv.writer(wsock)
    for i in data:
        wtr.writerow(i)
wsock.close()
This "program" just writes some fake data to a file called data.csv in the script's working directory. Now make sure the script is executable: chmod 755 hello.py.
From within Stata, you can do the following:
! ./hello.py
* The above line called the Python program, which created a data.csv file.
insheet using data.csv, comma clear names case
list
+-----------------------+
| name message |
|-----------------------|
1. | Monica Hello World! |
+-----------------------+
This is a simple example. The full process for your case will be:
1. Write a file to disk with the URLs, using outsheet or some other command
2. Use ! to call the Python script (a hedged Python 3 sketch of such a script follows below)
3. Read the output into Stata using insheet or infile or some other command
4. Clean up by deleting files with capture erase my_file_on_disk.csv
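As an illustration of step 2 only (my own sketch, not part of the original answer), here is a Python 3 take on the script from the question; the file names urls.txt and long_urls.txt are placeholders matching whatever you outsheet and later insheet:
import urllib.request

# Read one short URL per line (written by outsheet) and write the expanded
# URLs to a file that insheet can read back.
with open("urls.txt") as infile, open("long_urls.txt", "w") as outfile:
    for line in infile:
        short_url = line.strip()
        if not short_url:
            continue
        with urllib.request.urlopen(short_url) as resp:
            outfile.write(resp.geturl() + "\n")   # geturl() returns the post-redirect URL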
Let me know if that is not clear. It works fine on *nix; as I said, Windows might be a little different. If I had a Windows box I would test it.
Pure Stata Solution (kind of a hack)
Also, I think what you want to accomplish can be done completely in Stata, but it's a hack. Here are two programs. The first simply opens a log file and makes a request for the url (which is the first argument). The second reads that log file and uses regular expressions to find the url that Stata was redirected to.
capture program drop geturl
program define geturl
    * pass short url as first argument (e.g. http://bit.ly/162VWRZ)
    capture erase temp_log.txt
    log using temp_log.txt
    copy `1' temp_web_file
end
The above program will not finish because the copy command will fail (intentionally). It also doesn't clean up after itself (intentionally). So I created the next program to read what happened (and get the URL redirect).
capture program drop longurl
program define longurl, rclass
    * find the url in the log file created by geturl
    capture log close
    loc long_url = ""
    file open urlfile using temp_log.txt , read
    file read urlfile line
    while r(eof) == 0 {
        if regexm("`line'", "server says file permanently redirected to (.+)") == 1 {
            loc long_url = regexs(1)
        }
        file read urlfile line
    }
    file close urlfile
    return local url "`long_url'"
end
You can use it like this:
geturl http://bit.ly/162VWRZ
longurl
di "The long url is: `r(url)'"
* The long url is: http://www.ciwati.it/2013/06/10/wdays/?utm_source=twitterfeed&
* > utm_medium=twitter
You should run them one after the other. Things might get ugly with this solution, but it does find the URL you are looking for. May I suggest another approach: contact the shortening service and ask nicely for some data?
If someone at Stata is reading this, it would be nice to have copy return HTTP response header information. Doing this entirely in Stata is a little out there. Personally, I would use Python for this sort of thing and use Stata for the analysis of the data once I had everything I needed.