How to create a dataset from an already existing database which has multiple images in a folder - computer-vision

So I am trying to build a fingerprint recognition model. To train it, the dataset I have contains 500 folders, each holding 10 images. How do I load this dataset to pass it to a model?
I know how to load a dataset where one image represents one class, but here 10 images represent a single class.
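A common way to handle this, sketched below under the assumption that TensorFlow/Keras is used and that each of the 500 folders is named after its class (the dataset_root path is a placeholder):

    # Minimal sketch: folder names become class labels, so all 10 images
    # in a folder automatically share that folder's label.
    import tensorflow as tf

    train_ds = tf.keras.utils.image_dataset_from_directory(
        "dataset_root",        # hypothetical root holding the 500 folders
        labels="inferred",     # one class per sub-folder
        label_mode="int",
        image_size=(96, 96),   # assumed input size; match your model
        batch_size=32,
        validation_split=0.2,  # hold 20% of the images out for validation
        subset="training",
        seed=123,
    )

PyTorch's torchvision.datasets.ImageFolder follows the same folder-per-class convention, so the 10-images-per-class structure needs no special handling either way.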

Related

Google Vertex AI image AutoML classification when an important image feature is text inside the image

I'd like to do image classification. In my dataset, although image features are a strong component for this classification (colors, shapes, etc.), some categories of images will be hard to distinguish without interpreting the text inside the image.
I don't think Vertex AI/AutoML will use pre-trained models to facilitate classification when, in some cases, the only difference is the text. I know Google Vision/OCR is capable of such extraction. But is there a way to do image classification (Vertex AI/AutoML) using Google Cloud Vision extraction as an additional image feature?
Currently my project uses 3 models (no Google Cloud):
model 1: classifies an image using image features
model 2: classifies an image using only OCR + regex (same categories)
model 3: combines both models and decides when to use model 1 or model 2
I'd like to switch to Vertex AI, since the following would improve my project's quality:
AutoML classification seems very good for model 1
I need to use a tool to manage my datasets (Vertex AI managed dataset)
Vertex AI has interesting pipeline training features
If it is confirmed that AutoML won't perform well when some image categories differ only in their text, I would recreate a similar 3-tier model setup using Vertex AI custom training scripts. I can easily create model 1 with Vertex AI/AutoML. However, I have no idea if:
I can create model 2 with a Vertex AI custom training script that uses Google Cloud Vision/OCR to do image classification
I can create model 3 that would use models 1 and 2 created by Vertex AI.
Could you give me recommendations on how to achieve that using Google Cloud Platform?
For this purpose, I recommend the following:
1. Model 2:
Keep your images in GCS.
Use the Cloud Vision API's text detection (see "Detect text in images" in the documentation) to generate your dataset as text records like {"gcs":"gs://path_to_image/image_1","text":["text1"...]} (a sketch of this step appears below).
Use AutoML on this text dataset produced by the Vision API, or just run a regexp over it, or insert it into a BigQuery dataset and query it, and so on.
2. Model 3:
I would follow a similar approach, processing the images using the Cloud Vision API and generating a text dataset, but this time the images that don't have any text in them will produce records with an empty "text" field: {"gcs":"gs://path_to_image/image_2","text":[]}. Your own script can then split the data, producing a dataset for model 2 (records with text) and a dataset for model 1 (records without).
Note that your models 2 and 3 are not strictly classification problems. Model 2 is an OCR problem whose output you then post-process, and model 3 basically processes your data and separates it into the proper datasets.
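As a minimal sketch of the extraction step above, assuming the google-cloud-vision client library and hypothetical GCS paths:

    # Run Cloud Vision text detection over images in GCS and emit one JSON
    # record per image; "text" stays empty when no text is found.
    import json
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()

    def extract_record(gcs_uri):
        image = vision.Image()
        image.source.image_uri = gcs_uri
        response = client.text_detection(image=image)
        return {"gcs": gcs_uri,
                "text": [t.description for t in response.text_annotations]}

    with open("dataset.jsonl", "w") as f:
        for uri in ("gs://path_to_image/image_1", "gs://path_to_image/image_2"):
            f.write(json.dumps(extract_record(uri)) + "\n")

Records whose text list is empty can then be routed to model 1, and the rest to model 2.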
I hope this insight may help you.

Loading multiple keras models in django

I have 5 trained keras models with their weights saved.
To get predictions, I first create the model with the same architecture and then load the saved weights.
Now I want to get predictions from all these models in Django and return them as a JSON response.
Where should I load the models so that they are loaded only when the server starts?
The answer depends on the type of data you are using. For example, if it is an image classification task:
You first need to upload your images, e.g. through a simple HTML/js form.
After receiving the images, you need to pre-process them the same way you did when training your model. That means: if the model expects a 224x224x3 input shape, the uploaded image needs to be resized to that shape and then converted into a NumPy array with img_to_array.
Lastly, you pass this array to model.predict() and get your results.
There are multiple blogs walking through exactly this example; a sketch follows.
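A minimal sketch of these steps in a Django view, assuming a 224x224x3 model, a hypothetical build_model() helper that recreates the architecture, and a hypothetical weights path:

    # views.py: the model is loaded at module import time, i.e. once per
    # server process, not on every request.
    import numpy as np
    from django.http import JsonResponse
    from tensorflow.keras.preprocessing.image import img_to_array, load_img

    from .model_defs import build_model  # hypothetical helper

    model = build_model()
    model.load_weights("weights/model_1.h5")  # assumed path

    def predict(request):
        upload = request.FILES["image"]                 # from the HTML/js form
        img = load_img(upload, target_size=(224, 224))  # resize to input shape
        arr = img_to_array(img) / 255.0                 # same scaling as training
        arr = np.expand_dims(arr, axis=0)               # add the batch dimension
        preds = model.predict(arr)
        return JsonResponse({"predictions": preds.tolist()})

For the five models, the same pattern applies: load all of them at module level (or in an AppConfig.ready() hook) and collect their predict() outputs into one JSON response.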

Add new datasets on GCP object detection

I generated a model on Google Vision (object detection), and I wanted to know if I can add new datasets over time without having to reprocess the datasets that have already been modeled.
To take Google's own example:
I have a dataset with roses, tulips, etc.
I have already created a model with these flowers.
Now I want to add a new dataset with just sunflowers, without deleting the models of the previous flowers.
How do I add the sunflowers?
To add new data to your dataset (see Importing images into a non-empty dataset):
Select the dataset from the Datasets page to go to its details page.
On the Dataset details page, select the Import tab.
Selecting the Import tab will take you to the Create dataset page
You can then specify the Google Cloud Storage location of your .csv file and select Import to begin the image import process.
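For reference, a hypothetical import .csv (paths and values made up): a plain classification row lists the image and its label, while an object detection row can prefix an ML use and append relative bounding-box coordinates:

    gs://my-bucket/sunflowers/sunflower_01.jpg,sunflower
    TRAIN,gs://my-bucket/sunflowers/sunflower_02.jpg,sunflower,0.1,0.1,,,0.9,0.8,,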
But in your case, you will need to train a new model: if you resume training of your existing model, it will fail, because your dataset's labels will have changed by adding the sunflower label.
A model with a different number of labels has a different underlying structure (e.g. the output layer would have more nodes, because it has as many nodes as there are labels), so you can't resume a model's training with a dataset that has a different number of labels.
Note that you can add more data to your existing dataset and resume training but only if you add data for the already existing labels.

Google Cloud Vision - Data Set cannot be split into train, validation, test data

I am trying to build an object detection model using Google Cloud Vision. The model should draw bounding boxes around rice.
What I have done so far:
I have imported an image set of 15 images
I have used the Google Cloud tool to draw ~550 bounding boxes in 10 images
Where I am stuck:
I have built models before, and the data set was automatically split into train, validation and test set. This time however, Google Cloud is not splitting the data set.
What I have tried:
Downloading the .csv with the labeled data and reimporting it into Google Cloud
Adding more labels beyond the one label I have right now
Deleting and recreating the data set
How can I get Google Cloud to split the data set?
Your problem is that Google Cloud Platform determined your train, test and validation sets when you uploaded your images. Your test and validation images are likely your last 5 images, which are not available for training if you have not labeled them yet. If you label all of your images, or remove the unlabeled ones from the dataset, you should be able to train. See this SO answer for more info.
You can verify this by clicking the Export Data option and downloading a CSV of your dataset: you will see that the data set categories are already assigned, even for images that have not been labeled yet.
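For illustration only (paths and values are made up, and the exact column layout varies), such an export might look roughly like this, with set assignments present even on rows that carry no label or bounding box yet:

    TRAIN,gs://my-bucket/rice_01.jpg,rice,0.12,0.40,,,0.25,0.55,,
    VALIDATION,gs://my-bucket/rice_14.jpg
    TEST,gs://my-bucket/rice_15.jpg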

I have trouble loading data from multiple CSV files or Python code in Power BI

I am using Power BI to load some data. The data lives in a Data Lake Store, and Python can load it and do some initial transforms.
The problem, however, is that the data is split across multiple files by ID (1 - 10000). It is not possible to load all the data into memory.
I am looking for a way to have a button or something similar that acts as a selector for these IDs and then loads the data for the selected ID using a Python script, ideally by passing a parameter to the script.
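One possible approach, sketched under the assumption that each ID maps to one CSV file reachable from the machine running Power BI, and that a Power Query text parameter (here a hypothetical SelectedId) is spliced into the script's source before it runs:

    # Power BI "Python script" data source: only the selected ID's file is
    # read, so the full 10000-file dataset never has to fit in memory.
    import pandas as pd

    selected_id = 42  # placeholder; substitute the SelectedId parameter here
    df = pd.read_csv(f"C:/datalake_mount/data_{selected_id}.csv")
    # Power BI imports every pandas DataFrame the script defines (here: df).

Changing the parameter value and refreshing the query then reloads just that one ID's data.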