Is Product Scanner better than Barcode Scanner? - computer-vision

For example, Vivino scans a wine bottle and tells you its name.
But, seriously, scanning its barcode is faster and easier, with high accuracy. Why design an AI to identify the product by computer vision (object identification)? Is it really necessary, mate?

This is not really a question that is appropriate for this site; it is likely to attract opinion-based answers.
Still, the answer is not about which is better, but about a division of roles.
The product scanner does not replace the barcode scanner, at least not any time soon.
For example, there are products that carry no source marking and are difficult to mark in-store.
For such products, AI-based product scanners are a good fit for saving labor, reducing time, and standardizing and mechanizing the process.
In addition, even for products that already have barcodes, a product scanner can capture information about the specific physical item and display it as a supplement.
At first, every product has parts that seem useless or exaggerated.
However, as the market evaluates them, some of those parts will disappear and some will be refined into something essential.
That is always happening, and your question seems to describe one such case.

Specific topics on Tensorflow for CNN

I have a mini project for my new TensorFlow course this semester, with an open choice of topic. Since I have some background in Convolutional Neural Networks, I intend to use one for my project. My computer can only run the CPU version of TensorFlow.
However, as a newbie, I realize that there are a lot of topics, such as MNIST, CIFAR-10, etc., so I don't know which one to pick. I only have two weeks left. It would be great if the topic were not too complicated but also not too easy, to match my intermediate level.
In your experience, could you give me some advice about a specific topic I should pick for my project?
Moreover, it would be better if the topic let me supply my own data to test my trained model, because my professor said that is a plus point toward getting an A grade on the project.
Thanks in advance,
I think that to answer this question you need to properly evaluate the marking criteria for your project. However, I can give you a brief overview of what you've just mentioned.
MNIST: MNIST is an Optical Character Recognition task for the individual digits 0-9 in 28px-square images. It is considered the "Hello World" of CNNs. It's pretty basic and might be too simplistic for your requirements; that's hard to gauge without more information. Nonetheless, it runs quickly with CPU TensorFlow and the online tutorial is pretty good.
CIFAR-10: CIFAR is a much bigger dataset of objects and vehicles. The images are 32px square, so processing an individual image isn't too bad, but the dataset is large and your CPU might struggle with it; it takes a long time to train. You could try training on a reduced dataset, but I don't know how that would go. Again, it depends on your course requirements.
Flowers-Poets: There is the TensorFlow for Poets re-training example. Even if that tutorial itself is not suitable for your course, you could use its flowers dataset to build your own model.
Build-your-own-model: You could use tf.layers to build your own network and experiment with it; tf.layers is pretty easy to use (see the sketch below). Alternatively, you could look at the new Estimators API, which automates a lot of the training process for you. There are a number of tutorials (of varying quality) on the TensorFlow website.
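To show what that looks like, here is a minimal sketch of a small CNN for 28px-square grayscale images (MNIST-sized) using the TensorFlow 1.x tf.layers API; the filter counts and layer sizes are just illustrative defaults, not a recommendation:

```python
# Minimal CNN for 28x28 grayscale images (e.g. MNIST) with the
# TensorFlow 1.x tf.layers API; small enough to train on a CPU.
import tensorflow as tf

def cnn_logits(images):
    # images: float32 tensor of shape [batch, 28, 28, 1]
    x = tf.layers.conv2d(images, filters=32, kernel_size=5,
                         padding="same", activation=tf.nn.relu)
    x = tf.layers.max_pooling2d(x, pool_size=2, strides=2)
    x = tf.layers.conv2d(x, filters=64, kernel_size=5,
                         padding="same", activation=tf.nn.relu)
    x = tf.layers.max_pooling2d(x, pool_size=2, strides=2)
    x = tf.layers.flatten(x)
    x = tf.layers.dense(x, units=256, activation=tf.nn.relu)
    return tf.layers.dense(x, units=10)   # unnormalized class scores
```

You would pair the returned logits with tf.losses.sparse_softmax_cross_entropy and an optimizer, either in a plain training loop or inside an Estimator's model_fn.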
I hope that helps give you a run-down of what's out there. Other datasets to look at are PASCAL VOC and ImageNet (however, they are huge!). Models to experiment with might include VGG-16 and AlexNet.

Amazon Alexa: Certain utterances not working

I have two utterances:
SampleQuestion what's the {Currency} {Currencytwo} rate
SampleQuestion what is the {Currency} {Currencytwo} rate
The first one works (what's), while the second one doesn't (what is)
What could be a possible reason?
Voice recognition is very hard to test. What is and is not recognized can vary depending on the person speaking, background noise, etc. There are a few things to try when debugging your problem.
In the companion app, Alexa often shows "what it thought it heard". Check this to see what Alexa thinks it heard when it failed to recognize something.
You can type specific phrases into the simulator on the development page for your skill, which lets you test specific phrasings. However, because it bypasses the voice recognition layer, it is only good for debugging the specifics of your interaction model.
Alexa performs poorly when you have two slots that are not separated by static text. Consider whether you can re-phrase your utterance to put a separating word between the two slots, or ask for the values as two separate utterances; see the example below.
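For example, hypothetical rephrasings that put static text between the two slots might look like:
SampleQuestion what is the {Currency} to {Currencytwo} rate
SampleQuestion what is the rate from {Currency} to {Currencytwo}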
If either of your slots is a custom slot, consider what its content is. Alexa doesn't recognize things one word at a time; it looks at the entire sequence of sounds holistically, matches each possibility against what it heard, and picks the one with the highest confidence value. Since currencies are often foreign words, that might be perturbing things. Try cutting down your list and see if that improves things.

Audio Subtitle Transcription - C++

I'm working on a project that, among other video-related tasks, should eventually be able to extract the audio of a video, apply some kind of speech recognition to it, and produce a transcribed text of what's said in the video. Ideally it would output some kind of subtitle format, so that the text is linked to a certain point in the video.
I was thinking of using the Microsoft Speech API (aka SAPI), but from what I can see it is rather difficult to use. The very few examples I found for speech recognition (most are for text-to-speech, which is much easier) didn't perform very well (they don't recognize a thing). For example, this one: http://msdn.microsoft.com/en-us/library/ms717071%28v=vs.85%29.aspx
Some examples use something called grammar files, which are supposed to define the words the recognizer is waiting for, but since I haven't trained Windows Speech Recognition thoroughly, I think that might be skewing the results.
So my question is: what's the best tool for something like this? Could you suggest both paid and free options? The best "free" option (as it comes with Windows) is, I believe, SAPI; all the rest would presumably be paid, but if they are really good it might be worth it. Also, if you have any good tutorials for using SAPI (or another API) in a context similar to this, that would be great.
On the whole this is a big ask!
The issue with any speech recognition system is that it functions best after training. It needs context (what words to expect) and some kind of audio benchmark (what each voice sounds like). This might be possible in some cases, such as a TV series, if you were willing to churn through hours of speech -separated for each character- to train it. There's a lot of work there, though. For something like a film there's probably no hope of training a recogniser unless you can get hold of the actors.
Most film and TV production companies just hire media companies to transcribe the subtitles based on either direct transcription using a human operator, or converting the script. The fact that they still need humans in the loop for these huge operations suggests that automated systems just aren't up to it yet.
In video you have a plethora of things that make your life difficult, pretty much spanning huge swathes of current speech technology research:
-> Multiple speakers -> "Speaker identification" (can you tell characters apart? Also, subtitles normally use different coloured text for different speakers)
-> Multiple simultaneous speakers -> The "cocktail party problem" - can you separate the two voice components and transcribe both?
-> Background noise -> Can you pick the speech out from the soundtrack/foley/exploding helicopters?
The speech algorithm will need to be extremely robust, as different characters can have different genders/accents/emotions. From what I understand of the current state of recognition, you might be able to handle a single speaker after some training, but asking a single program to nail all of them might be tough!
--
There is no "subtitle" format that I'm aware of. I would suggest saving an image of the text using a font like Tiresias Screenfont that's specifically designed for legibility in these circumstances, and use a lookup table to cross-reference images against video timecode (remembering NTSC/PAL/Cinema use different timing formats).
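As a tiny sketch of that lookup-table idea (in Python; the timecodes and file names are made up for illustration):

```python
# Map (start, end) timecodes to pre-rendered subtitle images.
# Zero-padded HH:MM:SS:FF strings compare correctly as plain text.
subtitles = {
    ("00:00:01:00", "00:00:03:12"): "sub_0001.png",
    ("00:00:04:05", "00:00:06:20"): "sub_0002.png",
}

def image_for(timecode):
    """Return the subtitle image active at the given timecode, if any."""
    for (start, end), path in subtitles.items():
        if start <= timecode <= end:
            return path
    return None
```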
--
There are a bunch of proprietary speech recognition systems out there. If you want the best, you'll probably want to license a solution from one of the big players like Nuance. If you want to keep things free, the universities RWTH and CMU have put some solutions together. I have no idea how good they are or how well suited they are to the problem.
--
The only solution I can think of that is similar to what you're aiming at is the subtitling you get on news channels here in the UK: "live closed captioning". Since it's live, I assume they use some kind of speech recognition system trained to the reader (although it might not be trained, I'm not sure). It's got better over the past few years, but on the whole it's still pretty poor. The biggest thing it seems to struggle with is speed: dialogue is normally really fast, so live subtitling has the extra problem of getting everything done in time. Live closed captions quite frequently fall behind and have to skip a lot of content to catch up.
Whether you have to deal with this depends on whether you'll be subtitling "live" video or can pre-process it. To deal with all the additional complications above, I assume you'll need to pre-process it.
--
As much as I hate citing the big W, there's a goldmine of useful links here!
Good luck :)
This falls into the category of dictation, which is a very-large-vocabulary task. Products like Dragon NaturallySpeaking are amazingly good, and Dragon has a SAPI interface for developers. But it's not so simple a problem.
Normally a dictation product is meant for a single speaker, and the best products adapt automatically to that speaker, improving the underlying acoustic model. They also have sophisticated language modeling that constrains the problem at any given moment by limiting what is known as the perplexity of the vocabulary. That's a fancy way of saying the system is figuring out what you're talking about, and therefore what types of words and phrases are or are not likely to come next.
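Roughly speaking, the perplexity of a language model over a word sequence is the inverse probability it assigns to that sequence, normalized by length, so a lower perplexity means fewer words are effectively competing at each step:

$$\mathrm{PP}(w_1,\dots,w_N) = P(w_1,\dots,w_N)^{-1/N}$$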
It would be interesting, though, to apply a really good dictation system to your recordings and see how well it does. My suggestion for a paid system would be to get Dragon NaturallySpeaking from Nuance along with the developer API. I believe that provides a SAPI interface, which has the benefit of letting you swap in Microsoft's speech engine or any other ASR engine that supports SAPI. IBM would be another vendor to look at, but I don't think you will do much better than Dragon.
But it won't work well! After all the work of integrating the ASR engine, what you will probably find is that you get a pretty high error rate (maybe half). That would be due to a few major challenges in this task:
1) multiple speakers, which will degrade the acoustic model and adaptation.
2) background music and sound effects.
3) mixed speech - people talking over each other.
4) lack of a good language model for the task.
For 1), if you had a way of separating each actor onto a separate track, that would be ideal. But there's no reliable way of separating speakers automatically that would be good enough for a speech recognizer. If each speaker were at a distinctly different pitch, you could try pitch detection (there is some free software out there for that) and separate based on that, but this is a sophisticated and error-prone task. The best option would be hand-editing the speakers apart, but at that point you might as well just manually transcribe the speech! If you could get the actors on separate tracks, you would need to run the ASR with a different user profile per track.
For music (2), you'd either have to hope for the best or try to filter it out. Speech is more band-limited than music, so you could try a band-pass filter that attenuates everything except the voice band. You would want to experiment with the cutoffs, but I would guess 100 Hz to 2-3 kHz would keep the speech intelligible.
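As a rough sketch of that filtering idea (in Python with SciPy rather than the project's C++, using the guessed cutoffs above, and assuming a WAV input):

```python
# Band-pass a WAV file to attenuate music/effects outside the voice band.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

def bandpass_voice(in_path, out_path, low_hz=100.0, high_hz=3000.0):
    rate, audio = wavfile.read(in_path)
    audio = audio.astype(np.float64)
    if audio.ndim > 1:                 # mix stereo down to mono
        audio = audio.mean(axis=1)
    # 4th-order Butterworth band-pass, run forward and backward
    # (sosfiltfilt) so no phase delay is added to the speech.
    sos = butter(4, [low_hz, high_hz], btype="bandpass",
                 fs=rate, output="sos")
    filtered = sosfiltfilt(sos, audio)
    wavfile.write(out_path, rate, filtered.astype(np.int16))
```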
For (3), there's no solution. The ASR engine should return confidence scores, so the best I can suggest is to tag low-scoring segments and then go back and manually transcribe those bits of speech.
(4) is a sophisticated task for a speech scientist. Your best bet would be to search for an existing language model built for the topic of the movie. Actually, talk to Nuance or IBM; maybe they could point you in the right direction.
Hope this helps.

Fuzzy queries to database

I'm curious about how a feature found on many social sites today works.
For example, you enter a list of movies you like and the system suggests other movies you may like (based on the movies liked by other people who like the same movies as you). I think doing it the straight-SQL way (join my list of movies to a movies-users table, join that to a users-movies table, group by movie title, and count) would be impossible to implement on large datasets because of how heavy such a query is.
At the same time, we don't need an exact solution; an approximation would be enough. I wonder whether there is a way to implement something like a fuzzy query on a traditional RDBMS that would be fast to execute but allow some imprecision, or how such features are implemented in real systems.
That's collaborative filtering, or recommendation.
Unless you need something really complex, the Slope One predictor is one of the simpler ones; it's about 50 lines of Python. See Bryan O'Sullivan's "Collaborative filtering made easy" and the paper by Daniel Lemire et al. introducing "Slope One Predictors for Online Rating-Based Collaborative Filtering".
Slope One can also update a single user's data when it changes, whereas some other approaches need to reprocess the whole database just to update.
I used that Python code to predict counts for words that don't occur in documents, but I ran into memory issues and the like; I think I might write an out-of-core version, maybe using SQLite.
Also, the matrix it uses is symmetric (the two sides along the diagonal mirror each other), so only one triangle of the matrix needs to be stored.
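To make that concrete, here is a minimal weighted Slope One sketch in the spirit of that ~50-line Python version (my own rendering of the algorithm, not O'Sullivan's code):

```python
# Weighted Slope One: learn average rating differences between items,
# then predict a user's rating for an unseen item from those deviations.
from collections import defaultdict

def train(ratings):
    """ratings: {user: {item: rating}} -> (deviations, co-rating counts)"""
    diffs = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for i, ri in user_ratings.items():
            for j, rj in user_ratings.items():
                diffs[i][j] += ri - rj
                counts[i][j] += 1
    for i in diffs:
        for j in diffs[i]:
            diffs[i][j] /= counts[i][j]   # average deviation of i over j
    return diffs, counts

def predict(user_ratings, item, diffs, counts):
    """Predict `item` for one user, weighting by pair co-occurrence."""
    num = den = 0.0
    for j, rj in user_ratings.items():
        if j != item and j in diffs.get(item, {}):
            c = counts[item][j]
            num += (diffs[item][j] + rj) * c
            den += c
    return num / den if den else None
```

For the movie case, a rating could simply be 1.0 for "liked"; the approximate, incrementally updatable nature of the model is exactly the trade-off the question asks about.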
The term you are looking for is "collaborative filtering"
Read Programming Collective Intelligence, published by O'Reilly.
The simplest methods use Bayesian networks. There are libraries that can take care of most of the math for you.

Data mining/BI/Analytics/ML : Can a mathematically challenged person move into this field?

I have recently become interested in the field(s) of data mining and machine learning. The idea of going through huge datasets and trying to correlate hidden patterns and trends is fascinating. So far I have done the following:
Used Weka to load simple data sets and generate decision trees
Continuously read books, wikis, blogs, and SO on the same
Started playing around with SQL Server DM and Python APIs
Got an idea of the freely available data sets on the web (freedb, UN, etc.)
What is hindering me is that the minute I try to go beyond classification/association and into algorithms like Apriori, I get stuck, because understanding mathematical equations and logic is not (to put it modestly) one of my strong points.
So my question is: is there anybody in the data mining field (in the role of product owner or builder) who is not naturally a mathematician? If so, how would you approach understanding the field, since free tools like Weka and RapidMiner both expect some mathematical/statistical background?
P.S.: Excuse me if I made some mistakes in the question, like mixing up data mining and analytics when they are separate fields, as I am still getting my feet wet. I hope my core question is clear.
Well, being able to do some analysis of what the data mining models are showing is absolutely vital. However, these days all of the math and statistics are taken care of by the data mining models. You don't need to understand the math behind them (although it helps).
For example, you can look through the SQL Server Analysis Services Data Mining Algorithms and see that even the technical reference is how to use these implementations, not how to recreate them.
If you can understand the business cases and you can understand what the data mining is telling you, there's really no need to delve into the math behind it.
As for some of the free tools, I've never used them, so I can't speak to them. However, I'm a big fan of SSAS and those data mining models, which don't require an extensive mathematical background.
As Eric says, as long as you only intend to use the existing algorithms and APIs and make sense of their output, I don't see a problem with the required math/statistics skill set (though you'll need some basic prior knowledge).
Now, if you intend to do research, or if you want to improve or modify existing algorithms, or, why not, create your own, then math and statistics are a MUST. I just started doing some research in this area, and I'm still trying to fill my skills gap =)