How to pass a coordinate to inference data - pymc3

I've been skimming through the ArviZ documentation and came across the eight-schools inference data.
import arviz as az
idata = az.load_arviz_data("centered_eight")
The inference data object also includes a school coordinate. How can I get that with my bambi models? I always end up with a dimension labelled like ..._dim_....
Thanks!
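For reference, in plain PyMC3 (which bambi builds on) named coordinates can be supplied when the model is created. The sketch below is illustrative, assuming PyMC3 >= 3.9, with made-up names:
import pymc3 as pm

schools = ['A', 'B', 'C']

with pm.Model(coords={'school': schools}) as model:
    # dims ties the variable to the named coordinate, so the resulting
    # InferenceData shows a 'school' dimension instead of 'theta_dim_0'
    theta = pm.Normal('theta', mu=0.0, sigma=1.0, dims='school')
    idata = pm.sample(return_inferencedata=True)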

Related

Correct approach to improve/retrain an offline model

I have a recommendation system that was trained using Behavior Cloning (BC) with offline data generated by a supervised learning model, converted to batch format using the approach described here. Currently, the model explores using an e-greedy strategy. I want to migrate from BC to MARWIL by changing the beta parameter.
There are a couple of ways to do that:
Convert the data used to train the BC algorithm plus the agent's new data, and retrain from scratch using MARWIL.
Convert the new data generated by the agent, put it together with the previously converted data used to train the BC algorithm via the input parameter, doing something similar to what is described here (see the configuration sketch after the questions below), and retrain from scratch using MARWIL.
Convert the new data generated by the agent, put it together with the previously converted data used to train the BC algorithm via the input parameter, doing something similar to what is described here, and continue training the restored BC agent using MARWIL.
Questions:
Following option 1:
Given that the new data slice would be very small compared with the previous one, would the model learn anything new?
When should we stop using the original data?
Following option 2:
Given that the new data slice would be very small compared with the previous one, would the model learn anything new?
When should we stop using the original data?
This approach works for trajectories associated with new episode ids, but will it extend the trajectories of episodes already present in the original batch?
Following option 3:
Given that the new data slice would be very small compared with the previous one, would the model learn anything new?
When should we stop using the original data?
This approach works for trajectories associated with new episode ids, but will it extend the trajectories of episodes already present in the original batch?
Retraining would update the networks' weights with the new data points, but how many iterations should we use to do that?
How can we prevent catastrophic forgetting?
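For options 2 and 3, the data mixing itself can be expressed directly through RLlib's input parameter. The sketch below is illustrative rather than a working setup for this system: the environment, paths, and sampling weights are placeholders, and the API shown is the MARWILTrainer interface from Ray 1.x.
from ray.rllib.agents.marwil import MARWILTrainer

config = {
    'env': 'CartPole-v0',             # placeholder environment
    # Mix the original BC batches with the agent's new batches; the values
    # are sampling probabilities, so the small new slice can be up-weighted.
    'input': {
        '/data/bc_batches/': 0.5,     # data originally used to train BC
        '/data/agent_batches/': 0.5,  # data generated by the deployed agent
    },
    'input_evaluation': [],           # skip off-policy estimation in this sketch
    'beta': 1.0,                      # beta=0 reproduces BC; beta>0 weights by advantage
}

trainer = MARWILTrainer(config=config)
for _ in range(100):                  # iteration count is problem-dependent
    results = trainer.train()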

How to prepare a multilevel, multivalued training dataset in Python

I am a beginner in machine learning. My academic project involves detecting human posture from accelerometer and gyroscope data, and I am stuck at the very beginning. My accelerometer data has x, y, z values and the gyro also has x, y, z values, stored in the files acc.csv and gyro.csv. I want to classify the 'standing', 'sitting', 'walking' and 'lying' positions. The idea is to train the machine using some supervised ML algorithm and then feed it a new acc + gyro dataset to identify what the subject is doing at present. I am facing the following problems:
Constructing a training dataset -- I think my activities will be the dependent variable, and the acc & gyro axis readings will be independent. If I want to combine them in a single matrix where each element again has its own set of acc and gyro values [something like a main matrix with sub-matrices], how can I do that? Or is there an alternative way to achieve the same thing?
How can I put the data of multiple activities, each with multiple readings, into a single training matrix?
I mean 10 walking samples, each with its own acc (x, y, z) and gyro (x, y, z), plus 10 standing samples, plus 10 sitting samples, and so on.
Each data file has a different number of records and timestamps; how can I bring them onto a common footing?
I know I am asking very basic things, but these are the confusing parts that nobody has clearly explained to me. I feel like I am standing in front of a big closed door; inside, very interesting things are happening in which I cannot participate at this moment with my limited knowledge. My mathematical background is high-school level only. Please help.
I have gone through some projects on activity recognition on GitHub, but they are way too complicated for a beginner like me.
import pandas as pd
import os
import warnings
from sklearn.utils import shuffle
warnings.filterwarnings('ignore')

os.listdir('../input/testtraindata/')
base_train_dir = '../input/testtraindata/Train_Set/'

# Train data: one sub-folder per activity, one CSV file per recording
train_data = pd.DataFrame(columns=['activity', 'ax', 'ay', 'az', 'gx', 'gy', 'gz'])
train_folders = os.listdir(base_train_dir)
for tf in train_folders:
    files = os.listdir(base_train_dir + tf)
    for f in files:
        df = pd.read_csv(base_train_dir + tf + '/' + f)
        train_data = pd.concat([train_data, df], axis=0)

train_data = shuffle(train_data)                 # randomize row order
train_data.reset_index(drop=True, inplace=True)
train_data.head()
[Screenshots: the dataset, and the problem in the Train_Set]
Surprisingly, if I remove the last 'gz' from
train_data = pd.DataFrame(columns=['activity','ax','ay','az','gx','gy','gz'])
everything works fine.
Do you have the data labeled? That is, x, y, z values mapped to a posture?
I have no clue about the values (I have not seen the dataset and know nothing about the positions, acc, or gyro), but I'm guessing you should arrange the data as a matrix with x, y, z as feature columns and a target column, 'position'.
If you need all 6 values (3 from one CSV and 3 from the other) to define the positions, you can make 6 feature columns plus the position.
Something like: x_1, y_1, z_1, x_2, y_2, z_2 + a position label (the 'position' column).
You can also make each position its own column with 0/1 as true/false:
'sitting', 'walking', etc., with 0 and 1 as the values in those columns.
Is the timestamp of any importance for the position? If it is not an important feature, I would just drop it. If it is important in some way, you might want to bin the timestamps.
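If the two files need to share a time base first, one illustrative approach (file and column names assumed, not from the thread) is to index both by timestamp, bin them to a common frequency, and join:
import pandas as pd

acc = pd.read_csv('acc.csv', parse_dates=['timestamp'], index_col='timestamp')
gyro = pd.read_csv('gyro.csv', parse_dates=['timestamp'], index_col='timestamp')

# Bin both streams to 10 Hz so rows line up despite differing record counts
acc = acc.resample('100ms').mean()
gyro = gyro.resample('100ms').mean()
merged = acc.join(gyro, how='inner', lsuffix='_acc', rsuffix='_gyro')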
Here is a beginner's guide from Medium in which you can see a bit of how to preprocess your data. It also shows one-hot encoding :)
https://medium.com/hugo-ferreiras-blog/dealing-with-categorical-features-in-machine-learning-1bb70f07262d
Also try googling 'preprocessing your data'; you will probably find the right recipe.

How can I calculate irradiance POA data for a single axis tracking PV system?

I’d like to use the pvlib library to calculate irradiance POA data for a single-axis tracker system.
From the documentation it appears this is possible by creating a pvlib.tracking.SingleAxisTracker instance (with the appropriate metadata) and then calling its get_irradiance method.
I've done so as follows:
HSAT = SingleAxisTracker(axis_tilt=0,
                         axis_azimuth=167.5,
                         max_angle=50,
                         backtrack=True,
                         gcr=0.387)
I then use the get_irradiance method of the HSAT instance I just created, expecting it to use the metadata I entered to calculate POA data for this horizontal single-axis tracker system:
hsat_poa = HSAT.get_irradiance(surface_tilt=0,
                               surface_azimuth=167.5,
                               solar_zenith=sz,
                               solar_azimuth=sa,
                               dni=dni,
                               ghi=ghi,
                               dhi=dhi,
                               airmass=None,
                               model='haydavies')
When I plot hsat_poa, however, I get what looks like POA data for a fixed-tilt system.
Looking at the source code, I noticed that the SingleAxisTracker.get_irradiance method ultimately calls the location.total_irrad() method, which only returns POA data for fixed-tilt systems.
Do I need to provide my own surface_tilt data for the HSAT system? I had assumed that pvlib models an HSAT system and would generate the surface_tilt values for me, based on the arguments provided when the SingleAxisTracker class is instantiated. But it appears that's not what happens.
So my question is: does pvlib require the tracker angle as an input in order to calculate POA data for single-axis tracker systems, or can it model the tracker angle itself, based on metadata like axis_tilt, max_angle, and backtrack?
It turns out pvlib.tracking.singleaxis() is the missing link. It determines the rotation angle of a single-axis tracker system:
tracker_data = pvlib.tracking.singleaxis(solar_position['apparent_zenith'],
                                         solar_position['azimuth'],
                                         axis_tilt=MOUNTING_TILT,
                                         axis_azimuth=MOUNTING_AZIMUTH,
                                         max_angle=MAX_ANGLE,
                                         backtrack=True,
                                         gcr=MOUNTING_GCR)
and then using tracker_data like so:
hsat_poa_model_tracker = HSAT.get_irradiance(surface_tilt=tracker_data['surface_tilt'],
                                             surface_azimuth=tracker_data['surface_azimuth'],
                                             solar_zenith=solar_position['apparent_zenith'],
                                             solar_azimuth=solar_position['azimuth'],
                                             dni=dni,
                                             ghi=ghi,
                                             dhi=dhi,
                                             airmass=None,
                                             model='haydavies')
will calculate POA data for a single-axis tracker.
Found the answer in this jupyter notebook:
http://nbviewer.jupyter.org/github/pvlib/pvlib-python/blob/master/docs/tutorials/tracking.ipynb
Can it model the tracker angle itself, based on metadata like axis_tilt, max_angle, and backtrack?
pvlib's ModelChain will do this. See the PV Power Forecast documentation for an example of using a ModelChain with a SingleAxisTracker.
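For completeness, a minimal sketch of that approach, assuming an older pvlib release (roughly 0.8/0.9) in which SingleAxisTracker and this ModelChain signature exist; the location, module parameters, and weather values are placeholders:
import pandas as pd
from pvlib.location import Location
from pvlib.modelchain import ModelChain
from pvlib.temperature import TEMPERATURE_MODEL_PARAMETERS
from pvlib.tracking import SingleAxisTracker

location = Location(latitude=40.0, longitude=-105.0, tz='Etc/GMT+7')
system = SingleAxisTracker(
    axis_tilt=0, axis_azimuth=167.5, max_angle=50, backtrack=True, gcr=0.387,
    module_parameters={'pdc0': 240.0, 'gamma_pdc': -0.004},  # PVWatts-style module
    inverter_parameters={'pdc0': 240.0},
    temperature_model_parameters=TEMPERATURE_MODEL_PARAMETERS['sapm']['open_rack_glass_glass'],
)
mc = ModelChain(system, location, aoi_model='no_loss', spectral_model='no_loss')

times = pd.date_range('2021-06-01 06:00', '2021-06-01 18:00', freq='1H', tz=location.tz)
weather = pd.DataFrame({'ghi': 500.0, 'dni': 700.0, 'dhi': 100.0}, index=times)

mc.run_model(weather)  # tracker angles and POA are computed internally
poa = mc.total_irrad   # mc.results.total_irrad in newer pvlib versions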

Simple prediction model for multiple features

I am new to prediction models. I am currently using Python 2.7 and sklearn. I would like to know a simple model for combining many features to predict one target.
To make it clearer: let's say I have 4 arrays of size 10: A, B, C, Y. I would like to use the values of A, B, and C to predict the values of Y.
Thank you
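For illustration, a minimal sketch of the usual scikit-learn pattern: stack A, B, and C as columns of a single feature matrix and fit any regressor against Y. The toy data here is made up:
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
A, B, C = rng.rand(10), rng.rand(10), rng.rand(10)
Y = 2 * A + 3 * B - C  # toy target, for illustration only

X = np.column_stack([A, B, C])  # shape (10, 3): one row per sample, one column per feature
model = LinearRegression().fit(X, Y)
predictions = model.predict(X)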

Quadratic featurizer: preprocessing with fit_transform

The following example is written in Python and is taken from the book Mastering Machine Learning.
Overview of the task:
training data is stored in column vectors X_train (features) and y_train (response variables)
data for testing purposes is respectively stored in X_test and y_test
now fit a model to the training data using polynomial regression (in this case quadratic)
The author's approach (imports and data initialization excluded):
quad_featurizer = PolynomialFeatures(degree=2)
X_train_quad = quad_featurizer.fit_transform(X_train)  # fit to the training data, then transform it
X_test_quad = quad_featurizer.transform(X_test)        # transform the test data with the already-fitted featurizer
regressor_quad = LinearRegression()
regressor_quad.fit(X_train_quad, y_train)
The author doesn't comment the code or say anything more about the methods used. Since the scikit-learn API documentation couldn't give me a satisfying answer either, I'd like to ask you:
Why would I use fit_transform and not just transform for preprocessing the training data? I mean, the actual fitting is done with the regressor_quad object, so fit_transform seems redundant, doesn't it?
Scikit-learn's featurizers must be adjusted to your specific dataset before they can transform it into new feature vectors. fit() performs that adjustment. Therefore you need to first call fit() and then transform(), or both at the same time via fit_transform().
In your example, PolynomialFeatures is used to project your training data into a new, higher-dimensional space. A vector (3, 6) becomes (1, 3, 6, 3*3, 3*6, 6*6). In fit(), PolynomialFeatures learns the size of your training vectors, and in transform() it creates the new training vectors from the old ones. So X_train_quad is a new matrix whose shape differs from X_train's. Afterwards the same is done with X_test, but by then PolynomialFeatures already knows the size of your vectors, so it doesn't have to be fit() again. LinearRegression is then trained on your new training data (X_train_quad) via its own fit() method, which is completely separate from PolynomialFeatures; its fit() has nothing to do with the fit() of PolynomialFeatures.
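A minimal, self-contained illustration of that split; the numbers reproduce the (3, 6) example above:
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_train = np.array([[3, 6], [2, 5]])
X_test = np.array([[1, 4]])

quad = PolynomialFeatures(degree=2)
X_train_quad = quad.fit_transform(X_train)  # fit: learn the input width; then expand
X_test_quad = quad.transform(X_test)        # reuse the fitted featurizer unchanged

print(X_train_quad[0])  # [ 1.  3.  6.  9. 18. 36.], i.e. (1, 3, 6, 3*3, 3*6, 6*6)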