I have an assembly that consists of only one part. I'm trying to get the TOTAL of every stress component of THE WHOLE assembly/part in Python.
My problem with my current method is that it takes ages to sum up the stress of each element (see the code below). The report file gives me the totals within a second, so there must be a better way to get these values from the odb file.
Thanks for any hint!
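from abaqus import session
from abaqusConstants import CENTROID
odb = session.openOdb(name='C:/temp/Job-1.odb')
# Assumes the assembly holds exactly one part instance
Instance = list(odb.rootAssembly.instances.values())[0]
numElemente = len(Instance.elements)
step_1 = odb.steps['Step-1']
stress_1 = step_1.frames[-1].fieldOutputs['S']
# Step-1: sum the centroid stresses element by element (one getSubset call per element, hence slow)
sum_Sxx_1 = sum_Syy_1 = sum_Szz_1 = 0
for el in range(numElemente):
    Stress = stress_1.getSubset(region=Instance.elements[el], position=CENTROID, elementType='C3D8R').values
    sum_Sxx_1 = sum_Sxx_1 + Stress[0].data[0]
    sum_Syy_1 = sum_Syy_1 + Stress[0].data[1]
    sum_Szz_1 = sum_Szz_1 + Stress[0].data[2]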
Direct access to the values from Python is indeed very slow (I've run into the same problem). You can write a report file with every value and then process that text file in Python: read it line by line, find the relevant lines, split them to get the stresses, add them to the running sums, and continue.
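The text-file route described above could look roughly like this. It is a minimal sketch: the row layout (element label followed by S11, S22, S33) is an assumption, so check it against the header of your own .rpt file, and `sum_report_stresses` is a hypothetical helper name.

```python
def sum_report_stresses(lines):
    """Sum the S11, S22 and S33 columns of a report file, line by line.

    Assumed (hypothetical) data-row layout; adjust the indices to your file:
        <element label>  <S11>  <S22>  <S33>  ...
    Lines whose first field is not an integer label (headers, separators,
    blank lines, summary lines such as "Total") are skipped.
    """
    sxx = syy = szz = 0.0
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # not a data row
        sxx += float(fields[1])
        syy += float(fields[2])
        szz += float(fields[3])
    return sxx, syy, szz


if __name__ == "__main__":
    sample = [
        "  Element Label       S.S11       S.S22       S.S33",
        "-------------------------------------------------",
        "      1          1.5E+02    -2.0E+01     3.0E+00",
        "      2          5.0E+01     2.0E+01    -1.0E+00",
    ]
    print(sum_report_stresses(sample))  # (200.0, 0.0, 2.0)
```

With a real file you would pass the open file object, e.g. `with open('C:/temp/Job-1.rpt') as f: print(sum_report_stresses(f))`, where the .rpt path is hypothetical. If your report also lists an integration-point column, shift the column indices accordingly.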
I am writing a program to evaluate the hourly energy output from PV in various US cities. For simplicity, I keep the files in a dictionary (tmy3_cities) so that I can loop through them. To read the data, I followed the TMY to Power Tutorial on GitHub. Rather than showing the whole loop, I have included only the code that reads the data and shifts the time by 30 minutes.
The code taken from the tutorial works for all of the TMY3 files except those for Houston, Atlanta, and Baltimore (HAB, for simplicity). All of the TMY3 files were downloaded from NREL and renamed for my own use. The error I get when reading these three files is related to the datetime parsing; after some traceback, it comes down to "ValueError: invalid literal for int() with base 10: '1'".
To rule out the loop itself, I fed each file to the reader individually, and sure enough, only the HAB TMY3 files give errors.
Second, I downloaded the files again. This, obviously, had no impact.
In a lazy attempt to bypass the issue, I copied the date and time columns from working TMY3 files (e.g., Miami) into the non-working ones (i.e., HAB) in Excel.
I am not sure what else to do, as I am still fairly new to Python and coding in general.
#The dictionary below is not important to the problem, but is provided only for clarification.
tmy3_cities = {'miami': 'TMY3Miami.csv',
'houston': 'TMY3Houston.csv',
'phoenix': 'TMY3Phoenix.csv',
'atlanta': 'TMY3Atlanta.csv',
'los_angeles': 'TMY3LosAngeles.csv',
'las_vegas': 'TMY3LasVegas.csv',
'san_francisco': 'TMY3SanFrancisco.csv',
'baltimore': 'TMY3Baltimore.csv',
'albuquerque': 'TMY3Albuquerque.csv',
'seattle': 'TMY3Seattle.csv',
'chicago': 'TMY3Chicago.csv',
'denver': 'TMY3Denver.csv',
'minneapolis': 'TMY3Minneapolis.csv',
'helena': 'TMY3Helena.csv',
'duluth': 'TMY3Duluth.csv',
'fairbanks': 'TMY3Fairbanks.csv'}
#The code below was taken from the tutorial.
import inspect
import os
import pvlib
pvlib_abspath = os.path.dirname(os.path.abspath(inspect.getfile(pvlib)))
#This is the only section of the code that was modified.
datapath = os.path.join(pvlib_abspath, 'data', 'TMY3Atlanta.csv')
tmy_data, meta = pvlib.tmy.readtmy3(datapath, coerce_year=2015)
tmy_data.index.name = 'Time'
# TMY data seems to be given as hourly data with time stamp at the end
# Shift the index 30 Minutes back for calculation of sun positions
tmy_data = tmy_data.shift(freq='-30Min').loc['2015']
print(tmy_data.head())
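The comments in the tutorial code describe moving hour-ending timestamps back by 30 minutes so that sun positions are computed for the middle of each hour. For illustration only, here is the same arithmetic with the standard library; the helper name `to_interval_midpoints` is hypothetical, and the tutorial itself does this with pandas' `shift`.

```python
from datetime import datetime, timedelta


def to_interval_midpoints(timestamps, minutes=30):
    """Shift hour-ending timestamps back so each marks the middle of its hour."""
    offset = timedelta(minutes=minutes)
    return [ts - offset for ts in timestamps]


if __name__ == "__main__":
    hour_ending = [datetime(2015, 1, 1, h) for h in range(1, 4)]
    for ts in to_interval_midpoints(hour_ending):
        print(ts.isoformat())  # 2015-01-01T00:30:00, then 01:30:00, then 02:30:00
```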
I would expect each TMY3 file that is read to produce its own tmy_data DataFrame. Please comment if you'd like to see the whole loop.
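One way to get one result per city, and to see at a glance which files fail, is to store results in a dict keyed by city and catch the `ValueError` per file instead of letting it stop the loop. This is a minimal sketch; `read_all` and the `reader` callback are hypothetical names, and in real use the reader would wrap the tutorial code (the readtmy3 call plus the 30-minute shift) for one file name.

```python
def read_all(files, reader):
    """Read every file with `reader` and keep one result per city.

    `reader` is a hypothetical callback that takes a file name and returns
    the parsed data. Files that raise ValueError are collected in `failures`
    together with the error message, so the loop keeps going.
    """
    results, failures = {}, {}
    for city, filename in files.items():
        try:
            results[city] = reader(filename)
        except ValueError as err:
            failures[city] = str(err)
    return results, failures


if __name__ == "__main__":
    def fake_reader(filename):
        # Stand-in reader: pretend the Houston file has a bad time field.
        if "Houston" in filename:
            raise ValueError("invalid literal for int() with base 10: '1'")
        return "data from " + filename

    data, bad = read_all({"miami": "TMY3Miami.csv", "houston": "TMY3Houston.csv"}, fake_reader)
    print(sorted(data))  # ['miami']
    print(bad)
```

Printing the failures dict makes it easy to confirm that only the HAB files are affected before digging into their contents.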
I'm getting a "The remote Process is out of memory" in SAS DIS (Data Integration Studio):
Since it is possible that my approach is wrong, I'll explain the problem I'm working on and the solution I've decided on:
I have a large list of customer names that need cleaning. To do this, I use a .csv file containing regular expression patterns and their corresponding replacements (I use this approach because it is easier to add new patterns to the file and upload it to the server for the deployed job to read than to hardcode new rules and redeploy the job).
To get my data step to use the rules in the file, I load the patterns and their replacements into an array in the first iteration of the data step and then apply them to my names. Something like:
DATA &_OUTPUT;
    ARRAY rule_nums{1:&NOBS} _temporary_;
    /* First iteration only: compile every rule from the rules table */
    IF (_n_ = 1) THEN
        DO i = 1 TO &NOBS;
            SET WORK.CLEANING_RULES;
            rule_nums{i} = PRXPARSE(CATS('s/', rule_string_match, '/', rule_string_replace, '/i'));
        END;
    SET WORK.CUST_NAMES;
    customer_name_clean = customer_name;
    /* Apply every compiled rule, in order, to the current name */
    DO i = 1 TO &NOBS;
        customer_name_clean = PRXCHANGE(rule_nums{i}, 1, customer_name_clean);
    END;
RUN;
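For readers who want to see the compile-once, apply-in-order structure of the data step outside SAS, here is a rough Python analogue. It is a sketch under assumptions: the CSV is assumed to use the same column names as the SAS code, `load_rules` and `clean_name` are hypothetical names, and Python replacement strings refer to groups as `\1` rather than `$1`. It does not address the memory error itself, which the answer traces to the job log.

```python
import csv
import io
import re


def load_rules(csv_text):
    """Compile every (pattern, replacement) rule once, keeping file order.

    Assumes a two-column CSV whose header uses the column names from the
    SAS code: rule_string_match, rule_string_replace.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (re.compile(row["rule_string_match"], re.IGNORECASE), row["rule_string_replace"])
        for row in reader
    ]


def clean_name(name, rules):
    """Apply each rule in order, replacing only the first match
    (like PRXCHANGE with a times argument of 1)."""
    for pattern, replacement in rules:
        name = pattern.sub(replacement, name, count=1)
    return name


if __name__ == "__main__":
    rules_csv = (
        "rule_string_match,rule_string_replace\n"
        "^the\\s+,\n"
        "\\bltd\\.?$,Limited\n"
    )
    rules = load_rules(rules_csv)
    print(clean_name("The Acme Ltd.", rules))  # Acme Limited
```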
When I run this on ~10K rows or fewer, it always completes very quickly. If I try ~15K rows, it chokes for a very long time and eventually throws an "Out of memory" error.
To try to deal with this, I built a loop (using the SAS DIS Loop transformation) in which I first number the rows of my dataset and then apply the logic above in batches of 10,000 names at a time. After a very long time I got the same out-of-memory error, but when I checked my target table (Teradata), I noticed that the job had run and loaded the data for all but the last iteration. When I changed the batch size from 10,000 to 1,000, I saw exactly the same behaviour.
For testing purposes I've been working with only about 500K rows, but I will soon have to handle millions and am worried about how this will scale. For reference, the set of cleaning rules I'm applying is currently 20 rows but may grow to a few hundred.
Is it significantly less efficient to use a file with rules rather than hardcoding the regular expressions directly in my data step?
Is there any way to achieve this without having to loop?
Since my dataset gets overwritten on every loop iteration, how can there be an out-of-memory error for datasets that are 1,000 rows long (with about 3 columns)?
Ultimately, how do I solve this out-of-memory error?
Thanks!
The issue turned out to be that the log that the job was generating was too large. The possible solutions are to disable logging or to redirect the log to a location which can be periodically purged and/or has enough space.
I'm using the WEKA Explorer to run a 10-fold cross-validation, and I output the predictions to a CSV file. Because the 10-fold approach shuffles the order of the data, I do not know which specific instances are correctly or incorrectly classified.
That is, by looking at the CSV I cannot tell which specific instance each 1 or 0 belongs to. Is there any way to see the classification result for every specific instance in the test set of each fold? For example, it would be great if the CSV recorded the ID of the instance being classified.
One alternative would be to implement the 10 folds manually, i.e., create the 10 ARFF files and then run a 90/10 percentage split (preserving order) on each of them. This solution looks quite elaborate, laborious, and error-prone.
Thanks for your help!
To do that, you need to do the following for every fold:
double[] res = new double[testSet.numInstances()];
for (int j = 0; j < testSet.numInstances(); j++) {
    // Predicted class value (an index for nominal classes) of test instance j
    res[j] = classifier.classifyInstance(testSet.get(j));
}
Now the res array holds the classification result for every instance in the test set, and you can use this information however you want.
For example, you can print attributes of each instance (e.g. if the attributes are strings, you can print them, before applying any filter, with testSet.get(j).stringValue(PositionOfAttributeYouWantToPrint)) followed by the classification result.
Note that if the class is nominal, you can print the predicted label using:
testSet.classAttribute().value((int) res[j])
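The idea behind the answer, keeping an identifier next to every prediction in every fold, can be sketched independently of Weka. This is a minimal Python illustration, not Weka's API and not Weka's own fold assignment: `kfold_predictions` and the `train_and_predict` callback are hypothetical names, and the majority-class "classifier" is only a stand-in.

```python
import random


def kfold_predictions(ids, labels, k, train_and_predict, seed=1):
    """Run k-fold cross-validation and record (fold, id, actual, predicted).

    `train_and_predict(train_idx, test_idx)` is a hypothetical callback that
    trains on the training indices and returns one prediction per test index.
    Every record keeps the instance ID, so each prediction can be traced back
    to a specific row even though the folds are shuffled.
    """
    order = list(range(len(ids)))
    random.Random(seed).shuffle(order)
    records = []
    for fold in range(k):
        test_idx = order[fold::k]  # every k-th shuffled index forms this fold
        test_set = set(test_idx)
        train_idx = [i for i in order if i not in test_set]
        preds = train_and_predict(train_idx, test_idx)
        for i, p in zip(test_idx, preds):
            records.append((fold, ids[i], labels[i], p))
    return records


if __name__ == "__main__":
    ids = [f"inst{n}" for n in range(10)]
    labels = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]

    def majority_class(train_idx, test_idx):
        # Stand-in classifier: predict the most common label in the training folds.
        ones = sum(labels[i] for i in train_idx)
        guess = 1 if ones * 2 >= len(train_idx) else 0
        return [guess] * len(test_idx)

    for fold, inst_id, actual, predicted in kfold_predictions(ids, labels, 5, majority_class):
        print(fold, inst_id, actual, predicted)
```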
I've been following Dave Miller's ANN C++ Tutorial, and I've been having some problems getting it to function as expected.
You can view the code I'm working with here. It's an Xcode project, but it includes main.cpp and the data set file.
Previously, this program would only give outputs between -1 and 1, presumably because of the tanh function. I've manipulated the data so that I can feed in my much larger values and still get valid outputs: I simply multiply the input values by 0.0001 and the output values by 10000.
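The fixed scaling described above can be written out as follows, which also makes one consequence explicit: because tanh never leaves [-1, 1], no prediction can exceed ±10000 after multiplying by 10000, so any target above that in the CSV is unreachable. A sketch only; the constant and function names are hypothetical, and the factors are the ones from the question rather than a general recommendation.

```python
import math

# Fixed factors described in the question (not a general recommendation).
INPUT_SCALE = 0.0001   # raw inputs are multiplied by this before entering the net
OUTPUT_SCALE = 10000   # the net's tanh output is multiplied by this afterwards


def scale_inputs(values):
    """Shrink raw inputs by a fixed factor toward tanh's non-saturated range."""
    return [v * INPUT_SCALE for v in values]


def unscale_output(net_output):
    """Map a tanh output in [-1, 1] back to the original units."""
    return net_output * OUTPUT_SCALE


if __name__ == "__main__":
    # However large the raw activation, the unscaled prediction stays within +/-10000.
    for raw in (0.5, 3.0, 50.0):
        print(raw, unscale_output(math.tanh(raw)))
```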
The training data I'm using is the included CSV file. The last column is the expected output, the rest are inputs. Am I using the wrong mathematical function for these data?
Would you say that this is actually learning? This whole thing has stressed me out so much; I understand the theory behind ANNs but just can't implement one from scratch myself.
The net recent average error definitely gets smaller and smaller, which to me would say it is learning.
I'm sorry if I haven't explained myself very well; I'm very new to ANNs and this whole thing is very confusing to me. My university lecturers are useless when it comes to the practical side; they only teach us the theory.
I've been playing around with the eta and alpha values, along with the number of hidden layers.
You explained yourself quite well. If the net recent average error is getting lower and lower, it probably means the network is actually learning, but here is my suggestion for how to be completely sure.
Take your CSV file and split it into two files: one with about 10% of the data and the other with all the rest.
Start with an untrained network, run your 10% file through the net, and for each line save the difference between the actual output and the expected result.
Then train the network only on the 90% file, and finally run the 10% file through the net again and compare the differences from the first run with the latest ones.
You should find out that the new results are much closer to the expected values than the first time, and this would be the final proof that your network is learning.
Does this make any sense? If not, please share some code or send me a link to the exercise you are running, and I will try to explain it in code.
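The holdout check described in this answer (measure the error on a held-out 10% before training, train on the other 90%, measure again) can be sketched as below. To keep it self-contained, a tiny linear model trained by gradient descent stands in for the neural net, and the data are synthetic; the function names are hypothetical and none of this is the tutorial's code.

```python
import random


def split_holdout(rows, fraction=0.1, seed=0):
    """Shuffle the rows and split them into (training rows, holdout rows)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = max(1, int(len(rows) * fraction))
    return rows[cut:], rows[:cut]


def mean_abs_error(predict, rows):
    """Average |prediction - expected| over (input, expected) pairs."""
    return sum(abs(predict(x) - y) for x, y in rows) / len(rows)


def train_linear(rows, epochs=200, eta=0.05):
    """Stand-in model y = w*x + b trained by stochastic gradient descent.

    It only plays the role of the neural net from the question.
    """
    w = b = 0.0
    for _ in range(epochs):
        for x, y in rows:
            err = (w * x + b) - y
            w -= eta * err * x
            b -= eta * err
    return lambda x: w * x + b


if __name__ == "__main__":
    # Synthetic, noiseless data used only for the demonstration.
    rows = [(x / 50, 0.5 * (x / 50) + 0.2) for x in range(-50, 51)]
    train, holdout = split_holdout(rows)

    untrained = lambda x: 0.0  # stand-in for an untrained model
    before = mean_abs_error(untrained, holdout)
    after = mean_abs_error(train_linear(train), holdout)
    print(f"holdout error before training: {before:.4f}, after: {after:.4f}")
```

The key design point is that the holdout rows are never used for training, so a lower error on them afterwards indicates generalisation rather than memorisation.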
I've got a large do-file that calls several sub-do-files, all leading up to the estimation of a custom maximum likelihood model. That is, I have a main.do that looks like this:
version 12
set seed 42
do prepare_data
* some other stuff
do estimate_ml
and estimate_ml.do looks like this
* lots of other stuff
global cdf "normal"
program define customML
args lnf r noise
tempvar prob1l prob2l prob1r prob2r y1l y2l y1r y2r euL euR euDiff scale
quietly {
generate double `prob1l' = $ML_y2
generate double `prob2l' = $ML_y3
generate double `prob1r' = $ML_y4
generate double `prob2r' = $ML_y5
generate double `scale' = 1/100
generate double `y1l' = `scale'*((($ML_y10+$ML_y6)^(1-`r'))/(1-`r'))
generate double `y2l' = `scale'*((($ML_y10+$ML_y7)^(1-`r'))/(1-`r'))
generate double `y1r' = `scale'*((($ML_y10+$ML_y8)^(1-`r'))/(1-`r'))
generate double `y2r' = `scale'*((($ML_y10+$ML_y9)^(1-`r'))/(1-`r'))
generate double `euL' = (`prob1l'*`y1l')+(`prob2l'*`y2l')
generate double `euR' = (`prob1r'*`y1r')+(`prob2r'*`y2r')
generate double `euDiff' = (`euR'-`euL')/`noise'
replace `lnf' = ln($cdf( `euDiff')) if $ML_y1==1
replace `lnf' = ln($cdf(-`euDiff')) if $ML_y1==0
}
end
ml model lf customML ... , maximize technique(nr) difficult cluster(id)
ml display
To my great surprise, when I run the whole thing from top to bottom in Stata 12/SE I get different results for one of the coefficients reported by ml display each time I run it.
At first I thought this was a problem of running the same code on different computers, but the issue occurs even if I run the same code on the same machine multiple times. Then I thought it was a random number generator issue but, as you can see, I can reproduce the issue even when I fix the seed at the beginning of the main do-file. The same holds when I move the set seed command immediately above ml model. The only way to get the same results across multiple runs is to run everything above ml model once and then run only ml model and ml display repeatedly.
I know that the likelihood function is very flat in the direction of the parameter whose value changes across runs, so it's no surprise that it can change. But I don't understand why it would, given that there seems to be little in my do-files that isn't deterministic to begin with, and nothing that couldn't be made deterministic by fixing the seed.
I suspect a problem with sorting. By default, if two observations have the same value of the sort key, their relative order after sorting is random. Moreover, the random process that governs this ordering uses a different seed, so set seed does not control it. This is intentional, as it prevents users from accidentally seeing consistency where none exists, the logic being that it is better to be puzzled than to be overly confident.
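The difference between random tie-breaking and a stable sort can be illustrated outside Stata. In this Python sketch (function names hypothetical), `sort_random_ties` mimics the default behaviour described above by breaking ties with a separately seeded random number, while `sort_stable` keeps tied rows in their original order, which is what the stable option guarantees. Anything computed from row order afterwards, such as the first observation within a group, can therefore change from run to run under the first version but not under the second.

```python
import random


def sort_random_ties(rows, key, seed):
    """Sort by key; tied rows end up in a random order that depends on `seed`."""
    rng = random.Random(seed)
    decorated = [(key(r), rng.random(), r) for r in rows]
    decorated.sort(key=lambda t: (t[0], t[1]))
    return [r for _, _, r in decorated]


def sort_stable(rows, key):
    """Sort by key; tied rows keep their original relative order."""
    return sorted(rows, key=key)  # Python's built-in sort is guaranteed stable


if __name__ == "__main__":
    rows = [("id1", 2), ("id2", 1), ("id3", 2), ("id4", 1)]
    by_value = lambda r: r[1]
    for seed in (1, 2, 3):
        print("random ties, seed", seed, [r[0] for r in sort_random_ties(rows, by_value, seed)])
    print("stable:", [r[0] for r in sort_stable(rows, by_value)])  # ['id2', 'id4', 'id1', 'id3']
```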
As someone mentioned in the comments to this answer, adding the stable option to my sort command made the difference in my situation.