I am using Amazon Machine Learning to create ML models for my applications. I have created a datasource and an ML model corresponding to that datasource. However, new data keeps getting added in my application, so I have to update the data file in S3 that the datasource uses. My question is: how can I update the datasource corresponding to that data file without changing the datasource ID, and how can I update the ML model corresponding to that datasource without changing the ML model ID?
I know there are methods in Boto3 to update a datasource or an ML model, but as far as I know they only update the names of those objects.
Any help would be appreciated.
You cannot do that. Amazon ML datasources are immutable, save for the human-readable name attribute. Instead, when you have new data, create a new datasource that points at the same data file(s) in S3, and then train a new ML model using that datasource.
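A minimal sketch of that recreate-and-retrain step with boto3's machinelearning client; the IDs, bucket paths, and model type below are placeholders, not values from the question:

    import uuid
    import boto3

    ml = boto3.client('machinelearning')

    # Fresh IDs for every retraining run; the old datasource and model keep theirs.
    run_id = uuid.uuid4().hex[:8]
    ds_id = 'ds-my-data-' + run_id
    model_id = 'ml-my-model-' + run_id

    # Point the new datasource at the updated file(s) in S3.
    ml.create_data_source_from_s3(
        DataSourceId=ds_id,
        DataSourceName='my-data-' + run_id,
        DataSpec={
            'DataLocationS3': 's3://my-bucket/data/',                  # placeholder
            'DataSchemaLocationS3': 's3://my-bucket/data.csv.schema',  # placeholder
        },
        ComputeStatistics=True,  # required for datasources used in training
    )

    # Train a new model against the new datasource.
    ml.create_ml_model(
        MLModelId=model_id,
        MLModelName='my-model-' + run_id,
        MLModelType='BINARY',  # or 'REGRESSION' / 'MULTICLASS', depending on your target
        TrainingDataSourceId=ds_id,
    )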
I have an existing AWS Amplify schema with data deployed to DynamoDB tables.
I want to change the AWS Amplify schema.
When I change the schema, how do I keep the data from my old tables and migrate it to the new tables created by AWS Amplify?
The answer to this depends on how much you are changing your schema. If you are just adding new attributes to your models, or taking attributes away, then you won't need to do anything. If you are renaming models or creating new ones, it gets trickier. My advice would be to add all of the new schema models you want without removing the old ones, then write a few migration scripts that use DynamoDB directly to migrate your data. Once all of the old data is migrated, you can delete your old models.
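A rough sketch of such a migration script using boto3; the table names and the pass-through item mapping are hypothetical and would need to match your Amplify-generated tables and any renamed fields:

    import boto3

    dynamodb = boto3.resource('dynamodb')
    old_table = dynamodb.Table('OldModel-abc123-dev')  # hypothetical Amplify table names
    new_table = dynamodb.Table('NewModel-abc123-dev')

    # Scan the old table page by page and copy every item into the new table.
    scan_kwargs = {}
    while True:
        page = old_table.scan(**scan_kwargs)
        with new_table.batch_writer() as batch:
            for item in page['Items']:
                # Rename or reshape attributes here if the new model changed them.
                batch.put_item(Item=item)
        if 'LastEvaluatedKey' not in page:
            break
        scan_kwargs['ExclusiveStartKey'] = page['LastEvaluatedKey']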
I have data from web users in Firestore.
I have inserted some of this data into Google BigQuery in order to run a machine learning model.
I have experience training machine learning models, but I don't have experience getting predictions for new data once the model is trained.
I have read that I can upload the trained model to Google Cloud Storage and then deploy it to AI Platform, but I don't know the process I have to follow. New data will keep being inserted into BigQuery; with this new data I want to make predictions and then put those predictions back into Firestore.
I think this could be done with Dataflow (Apache Beam) or Cloud Composer (Airflow), where I could automate the process and schedule it to run every week, but I don't have experience using these technologies. Can anyone recommend which technology would be better for this particular case, so I can look up how to use it?
One possibility could be to save the model in AI Platform or in Google Cloud Storage and, with Cloud Functions, call the saved model, make predictions, and save them to Firestore?
BigQuery ML supports external TensorFlow models.
TensorFlow model importing. This feature allows you to create BigQuery ML models from previously-trained TensorFlow models, then perform prediction in BigQuery ML. See the CREATE MODEL statement for importing TensorFlow models for more information.
So what you want to achieve is
Get a table in BigQuery
Build out a feature set for your model (select statements)
CREATE MODEL in BigQuery (rerun this to re-train)
Run the ML.PREDICT (or equivalent) to get predictions on new data
As new data arrives in BigQuery you can:
- retrain the model (externally or internally, depending on the type of algorithm you have)
- use the new rows in predictions
https://cloud.google.com/bigquery-ml/docs/bigqueryml-intro
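As a rough sketch, that loop can be driven from Python with the google-cloud-bigquery client; the project, dataset, table and column names and the logistic_reg model type are placeholders, not values from the question:

    from google.cloud import bigquery

    client = bigquery.Client()

    # Train (or retrain) the model -- rerun this query whenever new data has landed.
    train_sql = """
    CREATE OR REPLACE MODEL `my_project.my_dataset.my_model`
    OPTIONS (model_type = 'logistic_reg', input_label_cols = ['label']) AS
    SELECT feature_a, feature_b, label
    FROM `my_project.my_dataset.training_table`
    """
    client.query(train_sql).result()

    # Score the new rows with ML.PREDICT.
    predict_sql = """
    SELECT *
    FROM ML.PREDICT(
      MODEL `my_project.my_dataset.my_model`,
      (SELECT feature_a, feature_b FROM `my_project.my_dataset.new_rows`))
    """
    for row in client.query(predict_sql).result():
        print(dict(row))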
To do this you need two services:
One for the prediction, which serves your model
One for getting the prediction and storing the result in Firestore
Personally, I don't recommend storing your model in AI Platform today (a new release should happen by the end of the month, but today, no!). I wrote an article about hosting a TensorFlow model in Cloud Run. It should work with other frameworks, but I have only built a TensorFlow model, and that's what I used for my tests.
The best solution, if your new data is in BigQuery and your model is in TensorFlow, is to load your model into BigQuery. The prediction is free of charge; you only pay for the data in your query (I'm also writing an article on this, but I'm waiting for the new AI Platform release to provide a correct comparison between both solutions).
After getting the predictions (the result of BigQuery plus a call to Cloud Run, OR the result of BigQuery with the predict clause), you have to iterate over the results to store them in Firestore. I recommend a batched write to Firestore.
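A minimal sketch of that batched write with the google-cloud-firestore client; the collection name, document IDs and field name are placeholders, and note that a single batch accepts at most 500 writes:

    from google.cloud import firestore

    db = firestore.Client()

    def write_predictions(predictions):
        """predictions: iterable of (document_id, score) pairs -- placeholder shape."""
        batch = db.batch()
        count = 0
        for doc_id, score in predictions:
            batch.set(db.collection('predictions').document(doc_id), {'score': score})
            count += 1
            if count == 500:  # Firestore limit per batch
                batch.commit()
                batch = db.batch()
                count = 0
        if count:
            batch.commit()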
"I have read that I can upload this trained model in Google cloud storage"
If you want to do that, you can use Dataflow. You can write a pipeline that reads data from BigQuery and writes it to GCS.
(I am not sure I understand how you want your job to interact with AI platform and Firestore)
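For the BigQuery-to-GCS part specifically, a rough sketch with the Beam Python SDK; the project, query, and bucket paths are placeholders:

    import json
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner='DataflowRunner',
        project='my-project',            # placeholder project/region/bucket
        region='us-central1',
        temp_location='gs://my-bucket/tmp',
    )

    with beam.Pipeline(options=options) as p:
        (p
         | 'ReadFromBQ' >> beam.io.ReadFromBigQuery(
               query='SELECT * FROM `my-project.my_dataset.my_table`',
               use_standard_sql=True)
         | 'ToJson' >> beam.Map(lambda row: json.dumps(row, default=str))
         | 'WriteToGCS' >> beam.io.WriteToText('gs://my-bucket/export/rows',
                                               file_name_suffix='.json'))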
I am using Google Cloud Datastore (not NDB) for my project.
Python 2.7 and Django.
I want to create a new model, let's say a Tag model:
    class Tag(ndb.Model):
        name = ndb.StringProperty()
        feature = ndb.StringProperty(default='')
I have added properties to a model many times, but I have not yet created a new model.
My question is: when I changed a model schema in Django for another project using MySQL, I always ran manage.py migrate.
Do I have to execute the migration command for Datastore as well?
Or just defining the model is all I have to do?
Thanks in advance!
Unlike SQL databases like MySQL, Cloud Datastore doesn't require you to create kinds (similar to tables) in advance. Other than defining it in your code, no admin steps are required to create the kind.
When you write the first entity of that kind, it's created implicitly for you.
You can even query for kinds that don't exist yet without getting an error; you'll just get no entities back.
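For example, a minimal sketch with the google-cloud-datastore Python client (the kind name is taken from the question; the client setup is an assumption):

    from google.cloud import datastore

    client = datastore.Client()

    # 'Tag' has never been written, yet this query raises no error --
    # it simply returns no entities.
    query = client.query(kind='Tag')
    print(list(query.fetch()))  # []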
Of course you have to migrate, unless you are using the same database from the other project. Anyway, if you run migrate it will create the tables from your models, but if you are working with an existing database nothing is going to happen.
I have some models created in Amazon Machine Learning from an S3 CSV file.
After a lot of searching I didn't find a good way to retrain my models.
I would like to know if there is any option to retrain my models with new data, or if I need to create a new one each time.
Amazon ML provides a set of APIs (and SDKs) that let you programmatically build a pipeline that takes new data from S3 and generates the datasource and the ML models from it.
All the components, including datasources, ML models, evaluations etc., are immutable; if you want to retrain, you need to recreate them. This also allows you to roll back to a previous model if the performance of the new model is not better than that of the old model.
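For example, a sketch of evaluating a freshly trained model with boto3 before switching over to it; the IDs are placeholders and the evaluation datasource is assumed to hold held-out data:

    import time
    import boto3

    ml = boto3.client('machinelearning')

    # Evaluate the new model against a held-out evaluation datasource (placeholder IDs).
    ml.create_evaluation(
        EvaluationId='eval-new-model',
        MLModelId='ml-my-model-new',
        EvaluationDataSourceId='ds-evaluation-data',
    )

    # Poll until the evaluation finishes, then inspect its performance metric
    # (e.g. AUC for binary models, RMSE for regression).
    while True:
        evaluation = ml.get_evaluation(EvaluationId='eval-new-model')
        if evaluation['Status'] in ('COMPLETED', 'FAILED'):
            break
        time.sleep(30)

    print(evaluation.get('PerformanceMetrics'))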
I have created a datasource and trained a machine learning model in Amazon Machine Learning. The data resides in S3 and is used for creating the datasource. However, my application adds new data to S3 every second, so I need a way to generate the datasource and train the model periodically.
Is there a way in which I can achieve this?
Any help is appreciated.
Yes. You need to do a few things:
make sure your datasource points to a prefix in S3 (e.g. bucket/data/) rather than a single file (e.g. bucket/data/data.csv)
write a script that you run regularly to create a new model (you unfortunately can't update the model) against this data. Here's a sample script which does this using boto: https://github.com/mooreds/amazonmachinelearning-anintroduction/blob/master/updatemodel/updatemodel.py
tag your new model and make sure your clients find the model to use via its tags (see the sketch after this list)
delete your old models (mostly to avoid confusion)
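A sketch of the tag, lookup, and cleanup steps with boto3; the tag key/value and model IDs are placeholders:

    import boto3

    ml = boto3.client('machinelearning')

    # Tag the newly created model so clients can discover it.
    ml.add_tags(
        Tags=[{'Key': 'stage', 'Value': 'production'}],
        ResourceId='ml-my-model-new',       # placeholder model ID
        ResourceType='MLModel',
    )

    # Clients pick the model that carries the tag.
    current_model_id = None
    for model in ml.describe_ml_models()['Results']:
        tags = ml.describe_tags(ResourceId=model['MLModelId'],
                                ResourceType='MLModel')['Tags']
        if {'Key': 'stage', 'Value': 'production'} in tags:
            current_model_id = model['MLModelId']

    # Once clients have switched over, remove the old model.
    ml.delete_ml_model(MLModelId='ml-my-model-old')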