How can the following be accomplished with the Google Vision API?
- Send the image to the Vision API.
- Request: 'features': [{'type': 'LABEL_DETECTION', 'maxResults': 10}]
- Receive the labels; the one I'm particularly interested in is "clock".
- Receive the boundingPoly so that I know the exact location of the clock within the image.
- Having received the boundingPoly, use it to create a dynamic AR marker to be tracked by the AR library.
Currently it doesn't look like the Google Vision API returns a boundingPoly for labels, hence the question of whether there is a way to solve this with the Vision API.
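For reference, here is roughly what the request above looks like from Python against the REST endpoint (a minimal sketch; the requests library, the file name, and the API key are placeholders/assumptions):

    import base64
    import requests

    API_KEY = "YOUR_API_KEY"  # hypothetical placeholder
    URL = "https://vision.googleapis.com/v1/images:annotate?key=" + API_KEY

    with open("photo.jpg", "rb") as f:  # hypothetical input image
        content = base64.b64encode(f.read()).decode("utf-8")

    body = {
        "requests": [{
            "image": {"content": content},
            "features": [{"type": "LABEL_DETECTION", "maxResults": 10}],
        }]
    }

    resp = requests.post(URL, json=body).json()
    for label in resp["responses"][0].get("labelAnnotations", []):
        print(label["description"], label["score"])

The labelAnnotations entries come back with only fields like description and score, which is exactly the problem: no boundingPoly.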
Currently, Label Detection does not provide this functionality. We are always looking at ways to enhance the API.
After two years, it's the same. I am facing similar challenges and am thinking of opting for other solutions. I think custom solutions like the TensorFlow Object Detection API or DarkNet YOLO will do this job very easily.
I am new to C++ and to the Point Cloud Library (PCL, https://pointclouds.org/). At the moment I am able to generate a viewer of the point cloud using pcl::visualization::PCLVisualizer, and I was wondering if it would be possible to save an image of the current viewer "view".
At the moment I just take a screenshot manually of what it looks like. However, since I will be processing many point clouds, I would like to have a way to convert this "viewer view" to an image.
Of course I posted the question only after researching online; however, I had not found the very simple solution that is already available in PCL.
You just need to use the function:
void pcl::visualization::PCLVisualizer::saveScreenshot ( const std::string & file )
See the PCL documentation for details.
I hope this will be helpful for someone else in the same situation.
I'm still very new to the world of machine learning and am looking for some guidance on how to continue a project I've been working on. Right now I'm trying to feed the Food-101 dataset into the Image Classification algorithm in SageMaker and later deploy the trained model onto an AWS DeepLens to have food-detection capabilities.

Unfortunately, the dataset comes with only the raw image files organized in subfolders, plus an .h5 file (I'm not sure whether I can feed this file type directly into SageMaker). From what I've gathered, neither of these is a suitable way to feed this dataset into SageMaker, so I was wondering if anyone could point me in the right direction on how to prepare the dataset properly, i.e. convert it to a .rec file or something else. Apologies if the scope of this question is very broad; I am still a beginner, I'm simply stuck, and any help you might be able to provide would be fantastic. Thanks!
If you want to use the built-in algorithm for image classification, you can use either the Image format or the RecordIO format; see https://docs.aws.amazon.com/sagemaker/latest/dg/image-classification.html#IC-inputoutput.
Image format is straightforward: just build a manifest file with the list of images. This could be an easy solution for you, since you already have images organized in folders.
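As an illustration, here is a minimal Python sketch that writes the tab-separated .lst manifest this format expects (index, numeric class label, relative image path), assuming Food-101's one-subfolder-per-class layout; the paths are hypothetical:

    import os

    def write_lst(root_dir, out_file):
        """Write an .lst manifest: index<TAB>class label<TAB>relative path."""
        classes = sorted(d for d in os.listdir(root_dir)
                         if os.path.isdir(os.path.join(root_dir, d)))
        idx = 0
        with open(out_file, "w") as f:
            for label, cls in enumerate(classes):
                for name in sorted(os.listdir(os.path.join(root_dir, cls))):
                    if name.lower().endswith((".jpg", ".jpeg", ".png")):
                        f.write("%d\t%d\t%s/%s\n" % (idx, label, cls, name))
                        idx += 1

    write_lst("food-101/images", "train.lst")

The same .lst file is also what the im2rec tool mentioned below consumes when building RecordIO files.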
RecordIO requires that you build files with the im2rec tool; see https://mxnet.incubator.apache.org/versions/master/faq/recordio.html.
Once your data set is ready, you should be able to adapt the sample notebooks available at https://github.com/awslabs/amazon-sagemaker-examples/tree/master/introduction_to_amazon_algorithms
I'm looking for advice for a personal project.

I'm attempting to create software for customized voice commands. The goal is to let the user (me) record some short audio clips (2-3 seconds) to define commands/macros. Then, when the user speaks the same phrase again, the command/macro will be executed.

The software must be able to detect a command in less than 1 second of processing time on a low-cost computer (a Raspberry Pi, for example).
I have already searched in two directions:

- Speech recognition (CMU Sphinx, Julius, Simon): there are good open-source solutions, but they often need large database files, and speech recognition is not really what I'm attempting to do; it could also consume too much power for such a small feature.
- Audio fingerprinting (Chromaprint, http://acoustid.org/chromaprint): this seems to be almost what I'm looking for. The principle is to create a fingerprint from raw audio data, then compare fingerprints to determine whether they are identical. However, this kind of software/library seems to be designed for song identification (like the famous smartphone apps). I've been trying to configure a good "comparator", but I think I'm going down the wrong path.
Do you know of any dedicated software, or a piece of code, that does something similar?
Any suggestion would be appreciated.
I had a more or less similar project in which I intended to send voice commands to a robot. Speech recognition software is too complicated for such a task. I used an FFT implementation in C++ to extract the Fourier components of the sampled voice, and then I created a histogram of the major frequencies (the frequencies at which the target voice command has the highest amplitudes). I tried two approaches:
1. Comparing the similarity between the histogram of the given voice command and those saved in memory, to identify the most probable command (sketched below).
2. Using a Support Vector Machine (SVM) to train a classifier to distinguish voice commands. I used LibSVM, and the results were considerably better than with the first approach. However, one problem with the SVM method is that you need a rather large data set for training. Another problem is that when an unknown voice is given, the classifier will output a command anyway (which is obviously a wrong detection). This can be avoided with the first approach, where I had a threshold on the similarity measure.
I hope this helps you implement your own voice-activated software.
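This is not my original code, but a minimal numpy sketch of the first approach; the bin count, normalization, and similarity measure here are assumptions:

    import numpy as np

    def freq_histogram(samples, rate, n_bins=32):
        """Bucket FFT magnitudes into coarse frequency bins, unit-normalized."""
        spectrum = np.abs(np.fft.rfft(samples))
        freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
        hist, _ = np.histogram(freqs, bins=n_bins, weights=spectrum)
        return hist / (np.linalg.norm(hist) + 1e-9)

    def similarity(h1, h2):
        """Cosine similarity; inputs are already unit vectors."""
        return float(np.dot(h1, h2))

    # Store freq_histogram() of each recorded command; at detection time,
    # compare the live recording against all of them and accept the best
    # match only if its similarity clears a threshold (rejects unknown input).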
Song fingerprinting is not a good idea for this task, because command timings can vary and fingerprinting expects an exact time match. However, it's very easy to implement matching with the DTW (dynamic time warping) algorithm on time series of features extracted with Sphinxbase, the CMU Sphinx base library. See the Wikipedia entry on DTW for details:
http://en.wikipedia.org/wiki/Dynamic_time_warping
http://cmusphinx.sourceforge.net/wiki/download
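As a sketch of the matching step (pure Python, requires 3.8+ for math.dist; the per-frame feature extraction, e.g. MFCC frames from Sphinxbase, is assumed to happen elsewhere):

    import math

    def dtw_distance(a, b):
        """DTW distance between two sequences of per-frame feature vectors."""
        n, m = len(a), len(b)
        INF = float("inf")
        d = [[INF] * (m + 1) for _ in range(n + 1)]
        d[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = math.dist(a[i - 1], b[j - 1])   # Euclidean frame distance
                d[i][j] = cost + min(d[i - 1][j],      # insertion
                                     d[i][j - 1],      # deletion
                                     d[i - 1][j - 1])  # match
        return d[n][m]

    # Pick the stored command template that minimizes dtw_distance, and
    # reject the result if even the best distance exceeds a threshold.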
I'm helping a professor with a satellite image analysis project. We need 800 images stitched together to cover a square area, each image at 8000x8000 resolution, from Google Maps. It is possible to download them one by one, but I believe there must be a way to write a script for batch processing.

I would like to ask how I can implement this using a shell or Python script, and how I can download images from a Google Maps URL.
Here is an example of the URL:
https://maps.google.com.au/maps/myplaces?ll=-33.071009,149.554911&spn=0.027691,0.066047&ctz=-660&t=k&z=15
However, I'm not able to work out a direct image download link from this.
Update:
Actually, I solved this problem; however, given Google's intentions, I will not post the method here.
Have you tried the Google Static Maps API?

You get 25,000 free requests, but you're limited to 640x640, so you'll need to make ~160 requests at a higher zoom level.

I suggest downloading the images as described in "Downloading a picture via urllib and python".
URL to start with: http://maps.googleapis.com/maps/api/staticmap?center=-33.071009,149.554911&zoom=15&size=640x640&sensor=false&maptype=satellite
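A minimal sketch of that batch download, using urllib from the standard library (the 4x4 grid and step size are hypothetical; the real step depends on zoom level and latitude):

    import urllib.request

    BASE = ("http://maps.googleapis.com/maps/api/staticmap"
            "?center={lat},{lng}&zoom=15&size=640x640"
            "&sensor=false&maptype=satellite")

    def fetch_tile(lat, lng, filename):
        """Download one 640x640 satellite tile centered at (lat, lng)."""
        urllib.request.urlretrieve(BASE.format(lat=lat, lng=lng), filename)

    lat0, lng0, step = -33.071009, 149.554911, 0.01
    for i in range(4):
        for j in range(4):
            fetch_tile(lat0 - i * step, lng0 + j * step, "tile_%d_%d.png" % (i, j))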
It's been a long time since I solved the problem; sorry for the delay.

I posted my code to GitHub; please star or fork it as you like :)

The idea is to use a virtual web browser at a very high resolution to load the Google Maps page, then capture the page. The drawback is that Google watermarks appear all over each image; the workaround is to oversample the resolution of each image and then use a stitching technique to combine them all.
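The repository has the actual implementation; purely as a sketch of the idea, here is a page capture with Selenium and headless Chrome (the window size and URL are assumptions):

    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument("--window-size=4000,4000")  # hypothetical high resolution

    driver = webdriver.Chrome(options=options)
    driver.get("https://maps.google.com.au/maps?ll=-33.071009,149.554911&t=k&z=15")
    driver.save_screenshot("capture.png")  # one raw tile, to be cropped and stitched
    driver.quit()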
I'm looking to create something loosely similar to the Google Image Charts API, whereby I can construct a query string and an image is returned.
For example:
http://chart.apis.google.com/chart?cht=p3&chs=550x250&chd=t:73,13,10,3,1&chco=80C65A,224499,FF0000&chl=Chocolate|Puff+Pastry|Cookies|Muffffffins|Gelato
I was wondering, what would the best way to achieve this be?
Does anybody have any info on how the Google Image Charts API works "under the hood"?
Are there any libraries that provide dynamic image generation already?
You can use a server-side script to read the query-string parameters, generate the image, and output the content with an image MIME type.
If you are on PHP, you can use an image library like GD to do this. More information here: http://us3.php.net/manual/en/book.image.php
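For comparison, the same pattern as a minimal Python sketch, using Pillow and the standard-library HTTP server (the chs size parameter mimics the Charts API; everything else here is hypothetical):

    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import urlparse, parse_qs
    from io import BytesIO
    from PIL import Image, ImageDraw

    class ChartHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            qs = parse_qs(urlparse(self.path).query)
            # chs=WIDTHxHEIGHT, defaulting to 300x200 if absent.
            w, h = (int(v) for v in qs.get("chs", ["300x200"])[0].split("x"))
            img = Image.new("RGB", (w, h), "white")
            # Stand-in for real chart drawing: just an outlined rectangle.
            ImageDraw.Draw(img).rectangle([10, 10, w - 10, h - 10], outline="black")
            buf = BytesIO()
            img.save(buf, "PNG")
            self.send_response(200)
            self.send_header("Content-Type", "image/png")
            self.end_headers()
            self.wfile.write(buf.getvalue())

    if __name__ == "__main__":
        HTTPServer(("", 8000), ChartHandler).serve_forever()

Request http://localhost:8000/?chs=550x250 and the browser renders the returned PNG, just as with the Charts API URL above.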