I'm trying to find a way to build an ML solution on AWS, preferably using managed services such as SageMaker rather than just EC2, for object detection in images using an image as input.
AWS Rekognition offers image comparison and object detection APIs, but they are not exactly what I'm looking for: the comparison works only with faces, not objects, and the object detection is too basic.
Alibaba Cloud has this functionality as a service (https://www.alibabacloud.com/product/imagesearch), but I would like to use something similar on AWS rather than Alibaba.
How would I go about building something like this?
Thank you.
Edited 03/08/2020 to add pointers for visual search.
Since you seem interested in both object detection (input an image, return bounding boxes with object classes) and visual search (input an image, return relevant images), let me give you pointers for both :)
For object detection you have 3 options:
Using the managed service Amazon Rekognition Custom Labels. The key benefits of this service are that (1) it doesn't require writing ML code, as the service runs AutoML internally to find the best model, (2) it is very flexible in terms of interaction (SDKs, console), data loading, and annotation, and (3) it can work even with small datasets (typically a few hundred images or fewer).
Using the SageMaker Object Detection built-in algorithm (documentation, demo). In this option, the model is also already written (an SSD architecture with a ResNet or VGG backbone) and you just need to choose or tune hyperparameters.
Using your own model on Amazon SageMaker. This could be your own code in Docker, or code from an ML framework in a SageMaker framework container. There are such containers for PyTorch, TensorFlow, MXNet, Chainer, and Scikit-learn. In terms of model code, I recommend considering GluonCV, a compact Python computer vision toolkit (based on an MXNet backend) that comes with many state-of-the-art models and tutorials for object detection (see the sketch after this list).
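To make the GluonCV option concrete, here is a minimal sketch of running one of its pre-trained detectors locally; the model name is one of GluonCV's published checkpoints and test.jpg is a placeholder path:

```python
# Minimal sketch: run a pre-trained GluonCV SSD detector on one image.
# Assumes `pip install mxnet gluoncv`; test.jpg is a placeholder path.
from gluoncv import model_zoo, data

net = model_zoo.get_model('ssd_512_resnet50_v1_voc', pretrained=True)
x, img = data.transforms.presets.ssd.load_test('test.jpg', short=512)
class_ids, scores, bboxes = net(x)  # per-box class indices, confidences, corner coordinates
```

The same model code can then be packaged into a SageMaker framework container for training or hosting.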
The task of visual search requires more customization, since you need to define (1) what counts as search relevance (e.g., visual similarity? object complementarity?) and (2) the collection to search over. If all you need is visual similarity, a popular option is to transform images into vectors with a pre-trained neural network and run kNN search between the query image and the collection of transformed images. Two tutorials show how to build such systems on AWS:
Blog post Visual Search on AWS (MXNet ResNet embeddings + SageMaker kNN)
Visual Search on MMS demo (MXNet ResNet embeddings + HNSW kNN on AWS Fargate)
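For intuition, here is a minimal local sketch of that embed-then-kNN idea, assuming GluonCV and scikit-learn are installed; all image paths are placeholders:

```python
# Minimal sketch of the "embed images, then kNN search" approach above.
# Assumes `pip install mxnet gluoncv scikit-learn`; image paths are placeholders.
import mxnet as mx
import numpy as np
from gluoncv import model_zoo
from gluoncv.data.transforms.presets.imagenet import transform_eval
from sklearn.neighbors import NearestNeighbors

net = model_zoo.get_model('resnet50_v1', pretrained=True)

def embed(path):
    img = transform_eval(mx.image.imread(path))     # resize/normalize to a 1x3x224x224 batch
    return net.features(img).asnumpy().reshape(-1)  # penultimate-layer features as a flat vector

collection = np.stack([embed(p) for p in ['cat1.jpg', 'cat2.jpg', 'dog1.jpg']])
index = NearestNeighbors(n_neighbors=2, metric='cosine').fit(collection)
dist, idx = index.kneighbors(embed('query.jpg').reshape(1, -1))  # nearest images in the collection
```

The tutorials above run the same pattern at scale, with SageMaker kNN or an HNSW index in place of scikit-learn.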
My task is to optimize search on a website. The search should return both images and text from a text query. I have already developed, trained, tested, and selected a machine learning model that transforms images and text into a feature vector (Python, based on OpenAI CLIP). This feature vector will be passed to Elasticsearch, which will be configured by another specialist.
The model will be used first to compute the feature vectors for all existing images and texts, and then whenever new content is added or existing content is changed.
There is a lot of existing content (approximately several tens of millions of images and texts combined). About 100-500 pieces of content are added or changed per day.
I haven't worked much with AWS, but in this case the model needs to be deployed to AWS somehow. I have the model and the entire project locally, and I can write an API app and build a Docker container.
The question is: what is the best way to deploy this application on AWS? Best in terms of speed and ease of implementation (for me as an AWS beginner), as well as cost, taking into account the number of requests the application will receive.
I've seen different possibilities, from simply deploying the application on EC2 (probably the easiest option) to using SageMaker. Also Kubernetes and ECS...
I'd recommend using a SageMaker Hosting endpoint if you need to run vectorization in near-real time at any time of day, or a SageMaker Training job if you can run vectorization in batches, for example once every few hours.
For both systems you can use the pre-defined framework containers and the SDK, to which you pass your Python code and optionally a requirements.txt, or you can create your own image.
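As an illustration of the framework-container route, here is a minimal sketch using the SageMaker Python SDK; the S3 path, role ARN, and instance type are placeholders, and inference.py stands for your own handler code:

```python
# Minimal sketch: deploy a PyTorch model (e.g., a CLIP vectorizer) to a
# SageMaker Hosting endpoint using a pre-built framework container.
from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    model_data='s3://my-bucket/clip-model.tar.gz',         # placeholder artifact location
    role='arn:aws:iam::123456789012:role/SageMakerRole',   # placeholder IAM role
    entry_point='inference.py',   # your code, implementing model_fn / predict_fn hooks
    source_dir='src',             # may also contain a requirements.txt
    framework_version='1.13',
    py_version='py39',
)
predictor = model.deploy(initial_instance_count=1, instance_type='ml.g4dn.xlarge')
```

SageMaker injects your source_dir into the matching pre-built PyTorch container, so you don't have to build or manage an image yourself.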
Currently, I need to implement searching for products by image in my app. While doing research, I was leaning toward AWS Rekognition. When the model predicts the image, I can pass the predicted label to my API to query products. This is what I plan to do. However, I also came across AWS visual search (using AWS SageMaker), which is way beyond my understanding. So, am I on the right track implementing this with the first option (AWS Rekognition)?
Amazon Rekognition is 'out-of-the-box' image recognition. It can label pictures, find faces, read text, etc. It supports custom labels, but it is not possible to modify the general recognition process.
Amazon SageMaker is a machine learning platform for building your own models. It is highly flexible, for everything from image recognition through to predictive analytics. However, it is quite complex and is usually used by Data Scientists.
Given your knowledge levels, Amazon Rekognition would be a better choice for you.
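For reference, here is a minimal sketch of the Rekognition labeling call you would start from; the bucket and object names are placeholders:

```python
# Minimal sketch: label an image with Amazon Rekognition via boto3.
# Bucket and object key are placeholders.
import boto3

rekognition = boto3.client('rekognition')
response = rekognition.detect_labels(
    Image={'S3Object': {'Bucket': 'my-bucket', 'Name': 'product.jpg'}},
    MaxLabels=10,
    MinConfidence=80,
)
for label in response['Labels']:
    print(label['Name'], round(label['Confidence'], 1))
```

Each returned label can then be passed to your product-query API, as you describe.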
We are going to build model-serving infrastructure. I am comparing Google AI Platform Prediction and KFServing, but I cannot find enough documentation about the features of Google AI Platform Prediction and how it is implemented.
It seems that gcloud ai-platform versions create can create a model version resource and start serving, which is the only entry point I can find.
I have three questions:
1. What is the relationship between Google AI Platform Prediction and KFServing?
2. How does gcloud ai-platform versions create work?
3. Does Google AI Platform Prediction provide all the features listed at https://www.kubeflow.org/docs/components/serving/overview/, such as canary rollout, explainers, and monitoring?
The document you shared contains extensive information about Google AI Platform Prediction. In summary, it is a hosted service on GCP where you don't need to manage the infrastructure: you just deploy your model, and a new REST endpoint becomes available for you to start sending predictions via the SDK or API.
Supports multiple frameworks:
TensorFlow
scikit-learn
XGBoost
PyTorch
Custom Docker containers (soon)
GPU support
Model versions
Online and Batch prediction
Logging and Monitoring
Multiple Regions
REST API
Answers to your questions:
With KFServing, you need to manage your own K8s/Kubeflow infrastructure.
Kubeflow supports two model serving systems that allow multi-framework model serving: KFServing and Seldon Core.
With AI Platform Prediction, you don't manage the infrastructure and don't need K8s/KF; you simply deploy your models and GCP takes care of the infra.
gcloud ai-platform versions create deploys one or more VMs in Google Cloud where, based on the settings (runtime version) and framework, all the dependencies are installed automatically, along with everything needed to load your model, so you get access to a REST API (see the sketch below).
Canary rollouts can be implemented using different models and versions; it comes down to how you route your predictions. Check the What-If Tool and model logging.
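To make the versions create step concrete, here is a minimal sketch of the equivalent API call with the google-api-python-client library; the project, model, bucket, and version values are all placeholders:

```python
# Minimal sketch: create an AI Platform model version programmatically,
# the API equivalent of `gcloud ai-platform versions create`.
# Project, model, bucket, and version names are placeholders.
from googleapiclient import discovery

ml = discovery.build('ml', 'v1')
request = ml.projects().models().versions().create(
    parent='projects/my-project/models/my_model',
    body={
        'name': 'v1',
        'deploymentUri': 'gs://my-bucket/model/',  # exported model artifacts
        'runtimeVersion': '2.1',
        'framework': 'TENSORFLOW',
        'pythonVersion': '3.7',
    },
)
response = request.execute()  # returns long-running operation metadata
```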
Google AI Platform can be used to manage the following stages in the ML workflow:
- Train an ML model on your data:
  - Train model
  - Evaluate model accuracy
  - Tune hyperparameters
- Deploy your trained model.
- Send prediction requests to your model:
  - Online prediction (see the sketch below)
  - Batch prediction (for TensorFlow only)
- Monitor the predictions on an ongoing basis.
- Manage your models and model versions.
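As a quick illustration of the online prediction stage, here is a minimal sketch, again with google-api-python-client; the model path and the instance payload are placeholders whose shape depends on your model:

```python
# Minimal sketch: online prediction against a deployed AI Platform version.
# Model path and instance payload are placeholders.
from googleapiclient import discovery

ml = discovery.build('ml', 'v1')
response = ml.projects().predict(
    name='projects/my-project/models/my_model/versions/v1',
    body={'instances': [[1.0, 2.0, 3.0]]},  # payload shape depends on your model
).execute()
print(response.get('predictions'))
```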
KFServing enables serverless inferencing on Kubernetes and provides performant, high abstraction interfaces for common machine learning (ML) frameworks like TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX to solve production model serving use cases.
I'm using AutoML Video Intelligence and it's very tedious; I was wondering if there is an easier way to create datasets for object tracking. Is there an easy way to get the time and position of the boxes?
I'm pretty sure you can find answers to these questions in the GCP documentation, in particular for the AutoML Video Intelligence product.
The object tracking process, at least, is nicely explained there, covering implementation with either the GCP console UI or HTTP calls to the Cloud AutoML REST API.
Furthermore, you can find an example showing how to handle video segment positioning in the relevant prediction requests.
You could also edit your question to add details about your use case so the solution can be addressed more precisely.
I am aware that it is better to use AWS Rekognition for this. However, it does not seem to work well with the images I have (which are sort of like small containers with labels on them); the text comes out misspelled and fragmented.
I am new to ML and SageMaker. From what I have seen, the use cases seem to be for prediction and image classification. I could not find one on training a model to detect text in an image. Is it possible to do this with SageMaker? I would appreciate it if someone pointed me in the right direction.
The different services provide different levels of abstraction for Optical Character Recognition (OCR), depending on which parts of the pipeline you are most comfortable working with and what you prefer to have abstracted away.
Here are a few options:
Rekognition provides out-of-the-box OCR with the DetectText feature. However, it seems you will need to perform some sort of pre-processing on your images in your current case to get better results. This can be done through any method of your choice (Lambda, EC2, etc.).
SageMaker is a tool that will enable you to easily train and deploy your own models (of any type). You have two primary options with SageMaker:
Do-it-yourself option: If you're looking to go the route of labeling your own data, gathering a sizable training set, and training your own OCR model, this is possible by training and deploying your own model via SageMaker.
Existing OCR algorithm: There are many algorithms out there that all have different potential tradeoffs for OCR. One example would be Tesseract. Using this, you can more closely couple your pre-processing step to the text detection.
Amazon Textract (In preview) is a purpose-built dedicated OCR service that may offer better performance depending on what your images look like and the settings you choose.
I would personally recommend looking into pre-processing for OCR to see if it improves Rekognition accuracy before moving onto the other options. Even if it doesn't improve Rekognition's accuracy, it will still be valuable for most of the other options!
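As a starting point, here is a minimal sketch of that pre-process-then-DetectText flow; the thresholding step and file name are illustrative placeholders to tune for your label images:

```python
# Minimal sketch: crude pre-processing, then Rekognition DetectText via boto3.
# The threshold value and file name are placeholders to adapt to your images.
import io

import boto3
from PIL import Image

img = Image.open('container_label.jpg').convert('L')   # grayscale
img = img.point(lambda p: 255 if p > 128 else 0)       # crude binarization
buf = io.BytesIO()
img.save(buf, format='PNG')

rekognition = boto3.client('rekognition')
response = rekognition.detect_text(Image={'Bytes': buf.getvalue()})
for det in response['TextDetections']:
    if det['Type'] == 'LINE':
        print(det['DetectedText'], det['Confidence'])
```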