I trained a model on Google Cloud Vision (object detection) and I want to know whether I can add new datasets over time without having to reprocess the datasets that were already modeled.
To take Google's own example:
I have a dataset with roses, tulips, etc.
I have already created a model with those flowers.
Now I want to add a new dataset with just sunflowers, without deleting the model of the previous flowers.
How do I add the sunflowers?
To add new data to your dataset (see Importing images into a non-empty dataset):
Select the dataset from the Datasets page to go to its details page.
On the Dataset details page, select the Import tab.
Selecting the Import tab will take you to the Create dataset page.
You can then specify the Google Cloud Storage location of your .csv file and select Import to begin the image import process.
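For reference, each row of the import CSV for object detection names an image and one bounding box; it looks roughly like the lines below (the bucket path, label, and coordinates are invented here, and the exact column layout should be checked against the AutoML Vision docs):

TRAIN,gs://my-bucket/sunflower_001.jpg,sunflower,0.1,0.2,,,0.4,0.6,,
UNASSIGNED,gs://my-bucket/sunflower_002.jpg,sunflower,0.5,0.1,,,0.9,0.8,,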
But in your case, you will need to train a new model: if you resume training of your existing model, it will fail, because adding the sunflower label changes your dataset's labels.
A model with a different number of labels has a different underlying structure (e.g., the output layer has as many nodes as there are labels, so it would gain a node), so you can't resume a model's training with a dataset that has a different number of labels.
Note that you can add more data to your existing dataset and resume training but only if you add data for the already existing labels.
Let's suppose I have a table in BigQuery and I create a dataset on VertexAI based on it. I train my model. A while later, the data gets updated several times in BigQuery.
But can I simply go to my model and get redirected to the exact version of the data it was trained on?
Using time travel, I can still access the historical data in BigQuery. But I didn't manage to go to my model and figure out on which version of the data it was trained and look at that data.
When you create a dataset from BigQuery in Vertex AI, there is this statement:
The selected BigQuery table will be associated with your dataset. Making changes to the referenced BigQuery table will affect the dataset before training.
So there is no copy or clone of the table prepared automatically for you.
Another fact is that you usually don't need the whole base table to create the dataset; you probably subselect based on date or other WHERE conditions. Essentially the point here is that you filter your base table, and your new dataset is only a subselect of it.
The recommended way is to create a dataset where you will put your table sources; let's call it vertex_ai_dataset. In this dataset you will store all the tables that are part of a Vertex AI dataset. Make sure to version them, and do not update them.
So BASETABLE -> SELECT -> WRITE AS vertex_ai_dataset.dataset_for_model_v1 (use the latter in Vertex AI).
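In BigQuery that could look something like this (the table and column names are only placeholders):

CREATE TABLE vertex_ai_dataset.dataset_for_model_v1 AS
SELECT customer_id, feature_a, feature_b, label
FROM mydataset.basetable
WHERE event_date BETWEEN DATE '2023-01-01' AND DATE '2023-06-30';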
Another option is that whenever you issue a TRAIN action, you also SNAPSHOT the base table. But be aware that this needs to be maintained and cleaned up as well.
CREATE SNAPSHOT TABLE dataset_to_store_snapshots.mysnapshotname
CLONE dataset.basetable;
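If you don't want old snapshots to accumulate forever, you can also give each one an expiration when you create it, e.g.:

CREATE SNAPSHOT TABLE dataset_to_store_snapshots.mysnapshotname
CLONE dataset.basetable
OPTIONS (expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 90 DAY));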
Other parameters and a guide are here.
You could also automate this by observing the Vertex AI train event (it should be documented here) and using EventArc to start a Cloud Workflow that automatically creates a BigQuery table snapshot for you.
I am working on a Power BI project and need some advice on the best way to approach it. I am tasked with creating a dashboard for employee metrics pulled from an on-site SQL Server database. The managers here are going to have access to the Power BI cloud service, so I will end up publishing this to the cloud. There are 10 or so metrics that need to be shown on the dashboard, and we have 5000+ employees. My first thought was to dump all the metrics into a table and set the Power BI report to import the data, but that seems excessive and a waste of space, because not all of the managers need access to every employee; they may only want to see 1 or 2 employees' metrics on the dashboard.
My second thought is to create (if this is possible) a stored procedure that takes an employee id and outputs a dataset for Power BI to build a visual from. The dashboard would show a list of employees, and when a manager selects one, Power BI would call the stored procedure with that employee id and turn the returned dataset into a visual based on my measures. I guess I would set the Power BI report connection type to DirectQuery?
Here are my questions:
Is this possible? Is it possible to do what I am describing in my second plan? Is this how DirectQuery works?
If so, how does DirectQuery work with the Power BI cloud service?
What is the setup like? Do I just install and configure the Power BI data gateway as I would for importing data, and Power BI does the rest?
A couple of questions:
What is the frequency of data updates?
If it is a batch job, it is preferable to import the data from the source into the Power BI model and report on the imported data, because:
a) Performance would be quicker.
b) There would be no to-and-fro of data between the on-prem database and the cloud.
c) The source would not be hit constantly.
So is the ask to have row-level security (RLS), wherein the managers should see only the employees under them?
If so, it is much easier to implement RLS in an imported model than with DirectQuery.
Also, you won't be able to pass parameters to stored procedures, and you can't execute them in DirectQuery mode. You can, however, create table-valued functions, which let you use table variables and perform other, more complex operations in DirectQuery mode.
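As a rough sketch of the table-valued-function approach on the SQL Server side (all names here are hypothetical):

-- inline table-valued function returning one employee's metrics
CREATE FUNCTION dbo.fn_EmployeeMetrics (@EmployeeId INT)
RETURNS TABLE
AS
RETURN
    SELECT m.EmployeeId, m.MetricName, m.MetricValue, m.MetricDate
    FROM dbo.EmployeeMetric AS m
    WHERE m.EmployeeId = @EmployeeId;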
You can refer to this for additional details:
https://community.powerbi.com/t5/Desktop/Can-i-call-Stored-Procedure-with-Direct-Query/m-p/267141#:~:text=%40Pallavi%20you%20won't%20be,nature%20in%20Direct%20Query%20mode.
I am trying to build an object detection model using Google Cloud Vision. The model should draw bounding boxes around rice.
What I have done so far:
I have imported an image set of 15 images
I have used the Google Cloud tool to draw ~550 bounding boxes in 10 images
Where I am stuck:
I have built models before, and the dataset was automatically split into train, validation, and test sets. This time, however, Google Cloud is not splitting the dataset.
What I have tried:
Downloading the .csv with the labeled data and reimporting it into Google Cloud
Adding more labels beyond the one label I have right now
Deleting and recreating the data set
How can I get Google Cloud to split the data set?
Your problem is that Google Cloud Platform determined your train, test, and validation sets when you uploaded your images. Your test and validation images are likely your last 5 images, which are not available for training if you have not labeled them yet. If you label all of your images, or remove the unlabeled ones from the dataset, you should be able to train. See this SO answer for more info.
You can verify this by clicking the Export Data option and downloading a CSV of your dataset: you can see that the dataset categories are already defined, even for images that have not been labeled yet.
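The first column of that exported CSV is the assigned set, so you should see rows along these lines (the paths and coordinates here are invented for illustration):

TRAIN,gs://my-bucket/rice_01.jpg,rice,0.12,0.34,,,0.22,0.41,,
VALIDATION,gs://my-bucket/rice_11.jpg,rice,0.40,0.10,,,0.55,0.25,,
TEST,gs://my-bucket/rice_14.jpg,,,,,,,,,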
I am in the process of creating a dashboard in Power BI with multiple people. Currently I have 4 entities in a dataflow that feed a dataset, which is then visualized in reports. I recently added a column to one of my entities that I would like to show up in a report that has already been created. However, despite the column being added to the entity (it shows up when I create a new report), it isn't displayed in the older report. How can I get the new column to display in an already-created report?
You need to open the old report, go to the Query Editor, and refresh the preview for it to pick up the new column.
You may also have to go through the applied steps to make sure the column is not removed, for example by a step that reduces the columns via a selection. A new report sees the column because it gets the dataflow's table structure without any step history in the query. Note this is not just for dataflows but for most connection types where the structure can change, for example CSV, Excel, etc.
Check if the source dataset is set to private by the person who published the report. Changing this might grant you access to the source dataset.
I am new to Oracle APEX and trying to explore all the options in APEX (5.1). My question is about the Data Load wizard in Oracle APEX. I created a table with three columns and set it up as a Data Load Definition.
This is the process that I expect through the data loading wizard:
On the first page (Data Load Source), I created a radio page item; the value selected there should be assigned to the first column of the table.
I will upload a CSV file with two columns, which will be assigned to the second and third columns.
So, for every record in the CSV file, the static string selected in the page item needs to be inserted along with the file data.
I Googled this but didn't find a proper solution for the requirement. Any help would be appreciated.
My preferred approach for this sort of thing is to use a staging table as the target of the Data Load wizard; then add a process at the end that copies the rows from the staging table to the final table, setting the static column(s) at the same time; then delete the rows from the staging table.
Note: add a column SESSION_ID to the table with a trigger that sets it to v('SESSION') so that the process will only pick up rows for the current user session.
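A minimal sketch of that setup, with made-up table and item names (stg_load as the staging table, :P1_STATIC_VALUE as the radio page item):

-- trigger that stamps each staged row with the current APEX session
CREATE OR REPLACE TRIGGER stg_load_bi
  BEFORE INSERT ON stg_load
  FOR EACH ROW
BEGIN
  :NEW.session_id := v('SESSION');
END;
/

-- after-submit page process: copy the staged rows into the final table,
-- filling in the static column from the radio page item
INSERT INTO final_table (static_col, col2, col3)
SELECT :P1_STATIC_VALUE, col2, col3
FROM stg_load
WHERE session_id = v('SESSION');

DELETE FROM stg_load
WHERE session_id = v('SESSION');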