ML.NET and Model Builder Local or Remote - ml.net

When we use Model Builder and ML.NET to build models, are the computations done on the host machine, or does the data get sent to an Azure service that performs the computations to generate the model?

They're all done locally, although the Image Classification scenario has an option to do the training in Azure. You can check this in future updates by looking at the "Local ML" or "Azure ML" tag on each of the scenarios near the bottom.

Related

PowerBI report service - data flow questions

This is what I am trying to do: I have various SQL Server databases with data. I created views in all of them. All views will need to be imported, and I will specify their relationships. I want this to be refreshed nightly. I want to build various reports off the same data source.
Do I have to use the Power BI Desktop application to import data into the Power BI Report Service? [I have done this so far, and can then create new reports in the cloud on the existing data. It would make more sense to connect directly from the Power BI Report Service to my SQL servers.]
Once I have uploaded data using the desktop application (as I have done so far), how can I view the data model in the report service once it is in the cloud?
In order to get routinely refreshed data I need to set up a gateway. Is the local Power BI Desktop application still involved in this process, or could I [in theory] delete the local desktop application that pushed the data in initially?
For your questions:
You have two options: use Power BI Desktop to connect to the data via import or DirectQuery and then load it to the service, or use dataflows to create an import based on your views and then build reports on top of those. With dataflows you'll have to set up a refresh schedule, and then set another refresh schedule for the dataset(s) built on top of them.
If you import data, you will be limited to a dataset size of 1 GB for the workspace. You cannot use DirectQuery on dataflows (unless you have the enhanced compute engine with Power BI Premium). Once the dataset is loaded, you can create new reports on top of it in the service or in Desktop. If possible, it is recommended to use DirectQuery.
To see the data model, you can use Desktop to connect to the Power BI service dataset. This connects in 'Live Connection' mode and is limited to that one dataset; you can't add other sources to it (Excel, CSV, SQL, etc.). You can also use Analyze in Excel, an Excel plugin that can connect to the data model. You can create new reports in the service for existing data models as well.
Creating the report in Power BI Desktop does not use the gateway; you connect to your data sources as normal, and once you load the dataset to Power BI it matches the data sources in the file to the ones set up in the gateway admin settings. So you will still need Power BI Desktop to create reports, but the gateway is there for the refreshing; Desktop is not used in the refresh process. You could delete the workbook or the application, but if you have to make changes, what will you refer to? (You could download a copy of the report from the service.) It is easier to make changes in the desktop app than in the service, as there is a feature difference between dataset creation in Desktop and in the service.

How to automate predictions with a trained model in google cloud

I have data from web users in Firestore.
I have inserted some of this data into Google BigQuery in order to run a machine learning model.
I have experience training machine learning models, but I don't have experience obtaining predictions for new data once the model is trained.
I have read that I can upload the trained model to Google Cloud Storage and then deploy it to AI Platform, but I don't know the process I have to follow. New data is going to be inserted into BigQuery; with this new data I want to make predictions, and then take those predictions and put them back into Firestore.
I think it could be done with Dataflow (Apache Beam) or Cloud Composer (Airflow), where I can automate this process and schedule it to run every week, but I don't have experience using these technologies. Can anyone recommend which technology would be better for this particular case, so I can look up information on how to use it?
One possibility could be to save the model in AI Platform or in Google Cloud Storage, and then use Cloud Functions to call the saved model, make predictions, and save them in Firestore?
BigQuery ML supports external TensorFlow models.
TensorFlow model importing. This feature allows you to create BigQuery ML models from previously-trained TensorFlow models, then perform prediction in BigQuery ML. See the CREATE MODEL statement for importing TensorFlow models for more information.
So what you want to achieve is:
- Get a table in BigQuery
- Build out a feature set for your model (SELECT statements)
- CREATE MODEL in BigQuery (rerun this to re-train)
- Run ML.PREDICT (or equivalent) to get predictions on new data
As new data arrives in BigQuery you can:
- retrain the model (externally or internally, depending on the type of algorithm you have)
- use the new rows in predictions (see the sketch after the link below)
https://cloud.google.com/bigquery-ml/docs/bigqueryml-intro
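A minimal sketch of those steps from Python, using the google-cloud-bigquery client so the whole thing can later be scheduled (from Composer, a Cloud Function, or cron). The dataset, model name, GCS path, and source table below are placeholders, not names from the question:

```python
# Sketch: import a trained TensorFlow model into BigQuery ML and run ML.PREDICT.
# All object names (my_dataset, my_imported_model, my-bucket, new_user_rows) are assumptions.
from google.cloud import bigquery

client = bigquery.Client()  # uses the default GCP project and credentials

# Register the TensorFlow SavedModel (exported to GCS) as a BigQuery ML model.
create_model_sql = """
CREATE OR REPLACE MODEL `my_dataset.my_imported_model`
OPTIONS (MODEL_TYPE = 'TENSORFLOW',
         MODEL_PATH = 'gs://my-bucket/exported_model/*')
"""
client.query(create_model_sql).result()

# Score the new rows and fetch the predictions.
predict_sql = """
SELECT *
FROM ML.PREDICT(MODEL `my_dataset.my_imported_model`,
                (SELECT * FROM `my_dataset.new_user_rows`))
"""
for row in client.query(predict_sql).result():
    print(dict(row.items()))
```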
For doing this you need two services:
- One for the prediction, which serves your model
- One for getting the predictions and storing the results in Firestore
Personally, I don't recommend storing your model in AI Platform today (a new release should happen by the end of the month, but for now, no). I wrote an article on hosting a TensorFlow model in Cloud Run. It should work with other frameworks, but I had only built a TensorFlow model, and that's what I used for my tests.
The best solution, if your new data are in BigQuery and your model is in TensorFlow, is to load your model into BigQuery. The prediction is free of charge; you only pay for the data scanned by your query (I'm also writing an article on this, but I'm waiting for the new AI Platform release to provide a fair comparison between both solutions).
After getting the predictions (the result of BigQuery plus a call to Cloud Run, OR the result of BigQuery with the predict clause), you have to iterate over the results to store them in Firestore. I recommend a batched write to Firestore, as sketched below.
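A minimal sketch of that batched write, assuming the prediction rows carry a `user_id` and a `predicted_label` field and that a `predictions` collection is the target (these names are assumptions, not from the question):

```python
# Sketch: write prediction results back to Firestore in batches.
# Collection name and field names are assumptions.
from google.cloud import firestore

db = firestore.Client()

def save_predictions(rows):
    """rows: iterable of dicts, e.g. the output of the ML.PREDICT query."""
    batch = db.batch()
    pending = 0
    for row in rows:
        doc_ref = db.collection("predictions").document(str(row["user_id"]))
        batch.set(doc_ref, {"predicted_label": row["predicted_label"]})
        pending += 1
        if pending == 500:   # Firestore commits at most 500 writes per batch
            batch.commit()
            batch = db.batch()
            pending = 0
    if pending:
        batch.commit()
```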
"I have read that I can upload this trained model in Google cloud storage"
If you want to do this you can use Dataflow. You can write a pipeline that reads the data from BigQuery and writes it to GCS; a minimal sketch is below.
(I am not sure I understand how you want your job to interact with AI Platform and Firestore.)
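If Dataflow is the route taken, an Apache Beam (Python SDK) pipeline along these lines could work; the query, bucket, and output prefix are placeholders, and passing `--runner=DataflowRunner` (plus `--project`, `--region`, `--temp_location`) would run it on Dataflow instead of locally:

```python
# Sketch: a Beam pipeline that exports BigQuery rows to GCS as JSON lines.
# Table/query, bucket, and output prefix are assumptions, not names from the question.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run():
    options = PipelineOptions()  # picks up --runner/--project/--temp_location from the CLI
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromBQ" >> beam.io.ReadFromBigQuery(
                query="SELECT * FROM `my_dataset.new_user_rows`",
                use_standard_sql=True)          # rows arrive as Python dicts
            | "ToJson" >> beam.Map(lambda row: json.dumps(row, default=str))
            | "WriteToGCS" >> beam.io.WriteToText(
                "gs://my-bucket/exports/rows", file_name_suffix=".json")
        )

if __name__ == "__main__":
    run()
```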

Import VS Connect live for SQL Server analysis Services

After setting up my ETL, I'm now preparing the reporting phase. I want to know the difference between the Import and Connect Live methods for a SQL Server Analysis Services database, knowing that I'll be working with measures.
In my example, I need to create a measure which counts the emails with a failed status by department.
I have (after importing the cube):
- Fact Mailing Count
- Mail Status Dimension
- Message Template Dimension (having the application name, which is the same as the department name)
Live connect: This connects Power BI to the Analysis Services data model directly, so you will be building your model entirely in the Analysis Services project and deploying it frequently. Power BI will have a live connection to the model and pick up updates when necessary, i.e. when new data is processed or new measures or tables are created. The limitation here is that you cannot combine multiple sources of data; you have to rely on the SSAS model you have connected to.
Import: This imports the tables into Power BI, and you are allowed to create and manipulate your facts and dimensions as you wish inside Power BI (in live connect mode you have to do it in the Analysis Services model itself).
The main obstacle of import mode is that when you import large tables with a large amount of data, Power BI has a size limitation (1 GB).
Creating measures and calculated tables is allowed in both modes; the difference is on which side you create them.
A detailed comparison of both Live connect and Import:
https://community.powerbi.com/t5/Community-Blog/Power-BI-Live-connection-vs-Import-comparison-and-limitations/ba-p/84377

Framework selection for a new project?

Problem Context
We have a set of Excel reports which are generated from an Excel input provided by the user and then fed into SAS for further transformation. SAS pulls data from a Teradata database, and then there is a lot of manipulation of the input data and the data pulled from Teradata. Finally, a dataset is generated which can either be sent to the client as a report or be used to populate a Tableau dashboard. The database is also being migrated from Teradata to Google Cloud (BigQuery EDW), as the Teradata pulls from SAS used to take almost 6-7 hours.
Problem Statement
Now we need to automate this whole process by creating a front end for the user to upload the input files; from there the process should trigger, and in the end the user should receive the Excel file or Tableau dashboard as an attachment in an email.
Can you suggest what technologies should be used in the front end and middle tier to make this process feasible in the least possible time, with Google Cloud Platform as the backend?
Can an R Shiny front end be a solution, given that we need to communicate with a Google Cloud backend?
I have got suggestions from people that Django would be a good framework to accomplish this task. What are your views on this?

Sitecore Publishing Problems and determining item state

Can anyone explain what state the data should be in, in each database, for a healthy Sitecore instance?
For example:
We currently have an issue with publishing in a two-server setup.
Our staging server hosts the SQL instance and the authoring/staging instance of Sitecore.
We then have a second server that hosts just the production website for our corporate site.
When I look in the master database, the PublishQueue table is full of entries, while the same table in the web database is empty.
Is this correct?
No amount of hitting publish buttons is changing that at the moment.
How do I determine what the state of an item is in both the staging and production environments, without having to write an application on top of the Sitecore API, which I really don't have time for?
It is normal behavior for the Publish Queue of the web database to be blank. The reason is that changes are made on the master database, which adds an entry to the Publish Queue.
After publishing, the item will not be removed from the Publish Queue table. It is the job of the CleanupPublishQueue task to clean up the Publish Queue table.
In general, tables WILL be different between the two databases as they are used for different purposes. Your master database is generally connected to by authors and the publishing logic, while the web database is generally used as a holding place for the latest published version of content that should be visible.
In terms of debugging publishing, from the Sitecore desktop, you can swap between 'master' and 'web' databases in the lower right corner and use the Content Editor to examine any individual item. This is useful for spot checking individual items have been published successfully.
If an item is missing from 'web', or the wrong version is in 'web', you should examine the following:
Publishing Restrictions on the item: Is there a restriction applied to the item or version that prevents it from publishing at this time?
Workflow state: Is the item/version in the final approved workflow state? You can use the workbox to do a quick check for items needing approval.
Connection strings: Are your staging system's connection strings set up to connect to the correct 'web' database used by the production delivery server?
The database table [PublishQueue] is where all saves and other mutations are stored. This table is used by an incremental publish: Sitecore gets all the items from the PublishQueue table that were modified more recently than the last incremental publish date. The PublishQueue table is not used by a full publish.
So it is okay that this table contains a lot of records on the master. The web database has the same database schema (but not the same data; web contains only one version of an item, optimized for performance). The PublishQueue table on web being empty is normal.
To know the state of an item, compare the master version with the web version. Note that there can be more than one web database, and the master database does not know the state/version of the web database.