PVLIB: How do I fix a ValueError when reading TMY3 files with tmy.py?

I am writing a program to evaluate the hourly energy output from PV in various cities in the US. For simplicity, I have the cities in a dictionary (tmy3_cities) so that I can loop through them. For the reading code, I followed the TMY to Power Tutorial from GitHub. Rather than showing the whole loop, I have only included the code that reads the file and shifts the time by 30 minutes.
The code taken from the tutorial works for all of the TMY3 files except for Houston, Atlanta, and Baltimore (HAB, for short). All of the TMY3 files were downloaded from NREL and renamed for my own use. The error I get when reading these three files is related to the datetime, and it essentially comes down to "ValueError: invalid literal for int() with base 10: '1'" after some traceback.
Rather than running into the same problem inside the loop, I fed each file to the reader individually, and sure enough, only the HAB TMY3 files raise the error.
Second, I downloaded the files again. This, obviously, had no impact.
In a lazy attempt to bypass the issue, I copied and pasted the date and time columns from working TMY3 files (e.g., Miami) into the non-working ones (i.e., HAB) via Excel.
I am not sure what else to do, as I am still fairly new to Python and coding in general.
# The dictionary below is not important to the problem, but is provided only for some clarification.
tmy3_cities = {'miami': 'TMY3Miami.csv',
               'houston': 'TMY3Houston.csv',
               'phoenix': 'TMY3Phoenix.csv',
               'atlanta': 'TMY3Atlanta.csv',
               'los_angeles': 'TMY3LosAngeles.csv',
               'las_vegas': 'TMY3LasVegas.csv',
               'san_francisco': 'TMY3SanFrancisco.csv',
               'baltimore': 'TMY3Baltimore.csv',
               'albuquerque': 'TMY3Albuquerque.csv',
               'seattle': 'TMY3Seattle.csv',
               'chicago': 'TMY3Chicago.csv',
               'denver': 'TMY3Denver.csv',
               'minneapolis': 'TMY3Minneapolis.csv',
               'helena': 'TMY3Helena.csv',
               'duluth': 'TMY3Duluth.csv',
               'fairbanks': 'TMY3Fairbanks.csv'}
# The code below was taken from the tutorial.
import os
import inspect
import pvlib

pvlib_abspath = os.path.dirname(os.path.abspath(inspect.getfile(pvlib)))
# The datapath line is the only part of the code that was modified (the file name).
datapath = os.path.join(pvlib_abspath, 'data', 'TMY3Atlanta.csv')
tmy_data, meta = pvlib.tmy.readtmy3(datapath, coerce_year=2015)
tmy_data.index.name = 'Time'
# TMY data seems to be given as hourly data with the time stamp at the end.
# Shift the index back 30 minutes for the calculation of sun positions.
tmy_data = tmy_data.shift(freq='-30Min')['2015']
print(tmy_data.head())
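For what it's worth, one way to see what readtmy3 is actually choking on in the HAB files (a diagnostic sketch, not part of the tutorial) is to read the raw CSV with pandas, skipping the one-line station header that TMY3 files carry, and inspect the date and time columns directly; the column names below assume the standard TMY3 header row.
import pandas as pd

# Skip the first line (station metadata); the second line holds the column names.
raw = pd.read_csv('TMY3Atlanta.csv', skiprows=1)
# 'Date (MM/DD/YYYY)' and 'Time (HH:MM)' assume the standard TMY3 layout.
print(raw[['Date (MM/DD/YYYY)', 'Time (HH:MM)']].head())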
I would expect each TMY3 file that is read to produce its own tmy_data DataFrame. Please comment if you'd like to see the whole loop.
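Since the eventual goal is to loop over the dictionary, here is a minimal sketch of that loop (not from the original post) that reports which files fail to parse, assuming the renamed CSV files sit in the same 'data' directory used above:
import os
import inspect
import pvlib

pvlib_abspath = os.path.dirname(os.path.abspath(inspect.getfile(pvlib)))
for city, filename in tmy3_cities.items():
    datapath = os.path.join(pvlib_abspath, 'data', filename)
    try:
        tmy_data, meta = pvlib.tmy.readtmy3(datapath, coerce_year=2015)
        print(city, 'read OK with', len(tmy_data), 'rows')
    except ValueError as err:
        print(city, 'failed:', err)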

Related

Concatenate monthly MODIS data

I downloaded daily MODIS Level 3 data for a few months from https://disc.gsfc.nasa.gov/datasets. The filenames are of the form MCD06COSP_M3_MODIS.A2006001.061.2020181145945, but the files do not contain any time dimension. Hence, when I use ncecat to concatenate the files, the date information is missing in the resulting file. I want to know how to add the time information to the combined dataset.
Your commands look correct; good job crafting them. I'm not sure why it's not working. Possibly the input files are in HDF4 format (do they have a .hdf suffix?) and your NCO build is not HDF4-enabled. Try downloading the files in netCDF3 or netCDF4 format and your commands above should work. If that's not what's wrong, then examine the output files at each step of your procedure, identify which step produces the unintended results, and narrow your question accordingly. Good luck.
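If the NCO route keeps failing, a workaround sketch (not part of the answer above) is to recover the date from the filename itself: the A2006001-style token encodes the year and the day of year, which Python can parse and then attach as a time coordinate with whatever netCDF tooling you prefer.
from datetime import datetime

filename = 'MCD06COSP_M3_MODIS.A2006001.061.2020181145945'
# The second dot-separated token, e.g. 'A2006001', is 'A' + year + day of year.
datecode = filename.split('.')[1].lstrip('A')
timestamp = datetime.strptime(datecode, '%Y%j')
print(timestamp)  # 2006-01-01 00:00:00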

Wildcard or equivalent to read in an Excel file

I have multiple Excel files imported on a daily basis; example code for one of the files is here:
Booked <- read_excel("./Source_Data/CONFIDENTIAL - MI8455 Future Change 20180717.xlsx", skip = 1, sheet = "Appendix 1 - Info Data")
Each day this file changes; the name and structure are always the same, the only difference being the date at the end of the file name.
Is there any way to have R search for the specific name starting with "CONFIDENTIAL - MI8455 Future Change" and import the data accordingly?
To get the path of the file, you can use this pattern:
(?'path'\.\/Source_Data\/CONFIDENTIAL - MI8455 Future Change \d+\.xlsx)
OK, through a lot of trial, error, and Google I found an answer, and I hope it helps someone else who is new to R and has come across the same problem.
First I needed to identify the file; in the end I used the list.files command:
MI8455 <- list.files(path = "G:/MY/FilE/PATH/MI8455", pattern = "^MI8455_Rate_Change_Report_1.*\\.xlsx$")
If, like myself, your files are in other folders/subfolders of the working directory, the first part of the code specifies where list.files should look. The pattern argument lets you describe the format of the name and specify the file type.
Next you can import using the read_excel function, but rather than specifying a file path you tell it to use the value that you created earlier:
Customer_2017 <- read_excel(MI8455, skip = 5, sheet = "Case Listing - Eml")

PDI - Check data types of field

I'm trying to create a transformation that reads CSV files and checks the data type of each field in the CSV.
For example: field A should be a one-character string and field B should be an integer/number.
What I want is to check/validate: if A is not a one-character string, set Status = Not Valid, and likewise if B is not an integer/number. Then every file with status Not Valid should be moved to an error folder.
I know I can use the Data Validator to do the check, but how do I move the files with that status? I can't find any step to do it.
You can read the files in a loop and add steps as below.
After the data validation, filter the rows with a negative result (not matched) -> an Add constants step with error = 1 -> a Set variables step for the error field with a default value of 0.
After the transformation finishes, add a Simple evaluation step in the parent job to check the value of the ERROR variable.
If it has the value 1, then move the files; otherwise continue as normal.
I hope this helps.
You can do the same as in this question. Once the files are read, use a Group by to get one flag per file. However, this time you cannot do it in one transformation; you should use a job.
Your use case is covered in the samples that were shipped with your PDI distribution. The sample is in the folder your-PDI/samples/jobs/run_all. Open Run all sample transformations.kjb and replace the Filter 2 of Get Files - Get all transformations.ktr with your logic, which includes a Group by so that you have one status per file and not one status per row.
In case you wonder why you need such complex logic for such a task, remember that PDI starts all the steps of a transformation at the same time. That is its great power, but it also means you do not know whether every row has been processed before you move the file.
Alternatively, you have the quick and dirty solution of your similar question: change the Filter rows to a type check, and the final Synchronize after merge to a Process Files/Move.
And a final piece of advice: instead of checking the type with a Data Validator, which is a good solution in itself, you may use a JavaScript step like the one there. It is more flexible if you need maintenance in the long run.
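Purely for illustration, outside of PDI the per-row check that the Data Validator or JavaScript step performs boils down to something like the Python sketch below. The field names A and B and the "Not Valid" to error-folder behaviour come from the question; the 'input' and 'error' folder names and the column order are assumptions, and this is not meant to replace the PDI job described above.
import csv
import os
import shutil

def file_is_valid(path):
    # Valid only if every row has a one-character string in A (first column)
    # and an integer in B (second column) - the column order is assumed.
    with open(path, newline='') as f:
        for row in csv.reader(f):
            a, b = row[0], row[1]
            if len(a) != 1:
                return False
            try:
                int(b)
            except ValueError:
                return False
    return True

for name in os.listdir('input'):          # 'input' folder name is a placeholder
    path = os.path.join('input', name)
    if not file_is_valid(path):
        shutil.move(path, os.path.join('error', name))   # Not Valid -> error folder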

Abaqus: total of each stress component

I have an Assembly which consists of only one Part. I'm trying to get the total of every stress component of the whole Assembly/Part within Python.
My problem with my current method is that it takes ages to sum up the stress of each element (see the code below). The report file gives me the totals within a second, so there must be a better way to get these values from the odb file.
I'm thankful for any hint!
odb = session.openOdb(name='C:/temp/Job-1.odb')
step_1 = odb.steps['Step-1']
stress_1 = step_1.frames[-1].fieldOutputs['S']
# Step-1
sum_Sxx_1 = sum_Syy_1 = sum_Szz_1 = 0
# numElemente and Instance are defined earlier in the full script
for el in range(numElemente):
    Stress = stress_1.getSubset(region=Instance.elements[el], position=CENTROID, elementType='C3D8R').values
    sum_Sxx_1 = sum_Sxx_1 + Stress[0].data[0]
    sum_Syy_1 = sum_Syy_1 + Stress[0].data[1]
    sum_Szz_1 = sum_Szz_1 + Stress[0].data[2]
Direct access to the values from Python is indeed very slow (I've experienced the same problem). You can write a report file with each value and then work with the text file from Python again: just read the file line by line, find the relevant lines, split them to get the stresses, add them to the running sums, and continue.
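An alternative that avoids both the per-element loop and the report file (a sketch, not part of the answer above): request the centroidal values for the whole field in a single getSubset call and sum them in Python. This assumes the same odb, step, and stress_1 objects as in the question.
# One getSubset call for the whole field instead of one call per element.
stress_c = stress_1.getSubset(position=CENTROID).values
sum_Sxx_1 = sum(v.data[0] for v in stress_c)
sum_Syy_1 = sum(v.data[1] for v in stress_c)
sum_Szz_1 = sum(v.data[2] for v in stress_c)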

Artificial Neural Network with large inputs & outputs

I've been following Dave Miller's ANN C++ Tutorial, and I've been having some problems getting it to function as expected.
You can view the code I'm working with here. It's an XCode project, but includes the main.cpp and data set file.
Previously, this program would only give outputs between -1 and 1, I presume due to the use of the tanh function. I've manipulated the data so that I can feed in my much larger values and still get valid outputs; I've simply done this by multiplying the input values by 0.0001 and multiplying the output values by 10000.
The training data I'm using is the included CSV file. The last column is the expected output; the rest are inputs. Am I using the wrong mathematical function for these data?
Would you say that this is actually learning? This whole thing has stressed me out so much; I understand the theory behind ANNs but just can't implement one from scratch myself.
The net's recent average error definitely gets smaller and smaller, which to me suggests that it is learning.
I'm sorry if I haven't explained myself very well; I'm very new to ANNs and this whole thing is very confusing to me. My university lecturers are useless when it comes to the practical side; they only teach us the theory.
I've been playing around with the eta and alpha values, along with the number of hidden layers.
You explained yourself quite well. If the net's recent average error is getting lower and lower, it probably means that the network is actually learning, but here is my suggestion for how to be completely sure.
Take your CSV file and split it into two files: one with about 10% of all the data and the other with all the rest.
Start with an untrained network, run your 10% file through the net, and for each line save the difference between the actual output and the expected result.
Then train the network with only the remaining 90% of the CSV file, and finally run the first 10% file through the net again and compare the differences from the first run with the latest ones.
You should find that the new results are much closer to the expected values than the first time, and that would be the final proof that your network is learning.
Does this make any sense? If not, please share some code or send me a link to the exercise you are running and I will try to explain it in code.
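A minimal sketch of the split-and-compare procedure described above, written in Python for brevity rather than the tutorial's C++. The train() and predict() functions are hypothetical placeholders for whatever interface your network exposes, and 'trainingData.csv' is an assumed file name; only the 90/10 split and the before/after error comparison matter here.
import csv
import random

def predict(inputs):
    # Placeholder: replace with a call into your network's feed-forward pass.
    return 0.0

def train(samples):
    # Placeholder: replace with your network's training loop over the samples.
    pass

# Every column but the last is an input; the last column is the expected output.
with open('trainingData.csv') as f:          # assumed file name
    rows = [[float(x) for x in r] for r in csv.reader(f) if r]

random.shuffle(rows)
cut = len(rows) // 10
holdout, training = rows[:cut], rows[cut:]   # ~10% held out, ~90% for training

def mean_abs_error(samples):
    # Average absolute difference between prediction and expected output.
    return sum(abs(predict(s[:-1]) - s[-1]) for s in samples) / len(samples)

before = mean_abs_error(holdout)             # error of the untrained network
train(training)                              # train only on the 90% portion
after = mean_abs_error(holdout)              # error on the same held-out 10%
print('holdout error before training:', before, 'after:', after)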