Identify a fixed phrase in an mp3 or wav file

I would like to scan an mp3/wav file for the phrase "Congratulations, you have successfully completed...". This is from an IVR recording in which the phrase is pre-recorded and played back to the customer when they complete an action.
What I am not sure about is:
1) How do I get a digital representation of the phrase I am seeking?
2) How do I compare that digital representation against several thousand wav/mp3 files to see whether it occurs within each file?

Related

exiftool shows incorrect Duration for MP3. How does that happen?

I downloaded seven MP3 files from a website. exiftool says the Duration of the first one is two minutes.
I opened it in an audio editor and found that it is actually four minutes long.
I opened a (non-downloaded) MP3 file in the same editor; its duration was different from both two and four minutes.
I copied all the audio from the downloaded file and pasted it over the other file's audio. The editor shows the other file changing to four minutes.
exiftool then shows that the second file has a duration of four minutes.
The other six downloaded files behave the same way (with different numbers). The first one is the only one where the difference was approximately a factor of two (so it's not stereo vs. mono).
Is Duration an ID3 tag that can be falsified, as opposed to being measured from the actual audio?

There should be an (approx) in the exiftool output after the Duration value. Duration is not an embedded tag, it's a value that's calculated on the fly by exiftool. If you add the -G (-groupNames) option to your command, you'll see that it is part of the Composite tag group. If you check the listing there, you'll see the tags that exiftool uses to calculate the Duration. It's most likely the group that includes ID3Size and MPEG:AudioBitrate.
Exiftool doesn't read and parse the stream data, which the audio editor will do and get a more accurate result. Odds are there is something incorrect about the header for your file.
Related post on the exiftool forums.
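To see how a size-and-bitrate estimate can be off by a factor of two, here is a minimal Python sketch. It is not exiftool's actual code; the formula (audio bytes times eight, divided by the bit rate from the header) and all the numbers are assumptions chosen for illustration.

```python
def estimated_duration_seconds(file_size_bytes, id3_size_bytes, bitrate_bps):
    """Estimate duration from sizes and a single bit rate, without reading the stream."""
    audio_bits = (file_size_bytes - id3_size_bytes) * 8
    return audio_bits / bitrate_bps


# Hypothetical file: 240 s of audio that really averages 64 kbit/s, no ID3 tag.
file_size = 240 * 64_000 // 8  # 1,920,000 bytes

# If the header claims 128 kbit/s, the estimate is half the true length.
print(estimated_duration_seconds(file_size, 0, 128_000))  # 120.0

# With the true average bit rate, the estimate matches the audio editor.
print(estimated_duration_seconds(file_size, 0, 64_000))   # 240.0
```

An audio editor decodes every frame, so it is not misled by a header bit rate that does not match the rest of the stream.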

Google vision fails to identify numbers in a table

I am aiming to extract a table of text & numbers from a document using Google's Vision API. The results are far from satisfactory - Vision seems to completely miss the contents of 2 columns in my table.
The recognition rate improves when I manually erase the column border, but I cannot pre-process every file that I intend to process.
Cropping the column text or moving the column text to a new location doesn't seem to make a difference.
Increasing the brightness/contrast of the document seems to help a little, but not enough to be satisfactory.
I'm using the "Try-It" web interface at cloud.google.com/vision/docs/drag-and-drop to test all my experiments. It mimics the results of running my code on the document.
I'm uploading JPG images created from scanned PDF originals (converted in Photoshop).
I don't have any code to show, since the problem shows up just using the web tool.
Many of the numbers are single digits, but many are not.
The numbers missed are 1, 3, 4, 8, 500, 1, 16, 100, 10.
Other columns (which ARE read correctly) contain decimal numbers.
Are there any tricks or tips I haven't found that I could use?

Byte offset notation for a 900 MB XML file

I am building a search engine in C++ over a roughly 900 MB XML file that contains pages from Wikibooks. My objective is to parse the document with RapidXML so that the user can enter one word in the search bar and receive the actual XML documents that contain that word (link).
I need to figure out how to store an index entry for each token (that is, each word within each document) so that when the user wants to see the pages on which a certain word occurs, I can jump to that specific page.
I have been told to use a "file I/O offset" (storing where in the file a word is so that you can jump to it), and I am having a hard time understanding what to do.
Questions:
Do I use seekg and tellg from the istream library (to find the byte location at which each document page is stored)? And if so, how?
How do I return the actual document (which contains many occurrences of the searched word) back to the user?
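The "file I/O offset" idea can be sketched as follows. This is a minimal illustration in Python rather than C++ (`tell()` and `seek()` play the roles of `tellg` and `seekg`), and it assumes a simplified layout with one `<page>` element per line. `SAMPLE`, `build_index`, and `fetch_page` are hypothetical names, not part of RapidXML.

```python
import io
import re
from collections import defaultdict

# Tiny stand-in for the real dump; assumes one <page> element per line.
SAMPLE = (
    b"<mediawiki>\n"
    b"<page><title>Cooking</title><text>boil the water</text></page>\n"
    b"<page><title>Chemistry</title><text>water is a molecule</text></page>\n"
    b"</mediawiki>\n"
)


def build_index(stream):
    """Map each word to a list of (byte offset, length) pairs of pages containing it."""
    index = defaultdict(list)
    while True:
        offset = stream.tell()      # byte position where this line starts
        line = stream.readline()
        if not line:
            break                   # end of file
        if not line.startswith(b"<page>"):
            continue
        text = re.sub(rb"<[^>]+>", b" ", line)  # drop the tags, keep the words
        for word in sorted(set(re.findall(rb"[a-z0-9]+", text.lower()))):
            index[word.decode("ascii")].append((offset, len(line)))
    return index


def fetch_page(stream, offset, length):
    """Jump straight to a stored offset and read back exactly one page."""
    stream.seek(offset)
    return stream.read(length)


if __name__ == "__main__":
    stream = io.BytesIO(SAMPLE)
    index = build_index(stream)
    for offset, length in index["water"]:
        print(offset, fetch_page(stream, offset, length).decode("ascii").strip())
```

The index stores only offsets and lengths, so answering a query means one seek and one read per matching page instead of re-parsing the whole file. The stream should be opened in binary mode so that offsets are true byte positions.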

File Binary vs Text

Are there situations where I should prefer a binary file to a text file? I'm using C++ as the programming language.
For example, if I have to store a large amount of text, is it better to use a text file or a binary file?
Edit
For the moment there is no requirement for the file to be human-readable. Are there differences in performance, security, and so on?
Edit
Sorry for omitting the other requirements (thanks to Carey Gregory):
The records to save are ASCII-encoded.
The file must be encrypted (AES).
The machine can power off at any time, so I have to try to prevent errors.
I have to know whether the file was changed outside the program; I think I'll use a SHA-1 digest of the file.

As a general rule, define a text format, and use it. It's much
easier to develop and debug, and it's much easier to see what is
going wrong if it doesn't work.
If you find that the files are becoming too big, or taking too
much time to transfer over the wire, consider compressing them.
A compressed text file is often smaller than you can do with
binary. Or consider a less verbose text format; it's possible
to reliably transmit a text representation of your data with
a lot less characters than XML uses.
And finally, if you do end up having to use binary, try to choose
an existing format (e.g. Google's Protocol Buffers), or base your
format on an existing format. Just remember that:
Binary is a lot more work than text, since you practically
have to write all of the << operators again, including those
in the standard library.
Binary is a lot more difficult to debug, because you can't
easily see what you've actually done.
Concerning your last edit:
Once you've encrypted, the results will be binary. You can
use a text representation of the binary (base64 or some such),
but the results won't be any more readable than the binary, so
it's not worth the bother. If you're encrypting in process,
before writing to disk, you automatically lose all of the
advantages of text.
The issues concerning powering off mean that you cannot use
ofstream directly. You must open or create the file with the
necessary options for full transactional integrity (O_SYNC as
a flag to open under Unix). You must write each record as
a single write request to the system.
It's always a good idea to have a checksum, just in case. If
you're worried about security, SHA1 is a good choice. But keep
in mind that if someone has access to the file, and wants to
intentionally change it, they can recalculate the SHA1 and
insert the new value as well.
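The last two points can be sketched together. This is a minimal, Unix-oriented illustration written in Python for brevity; the same flags apply to open(2) called from C++. `append_record` and `file_sha1` are hypothetical helper names, and the `getattr` fallback is an assumption for platforms where `os.O_SYNC` is not defined (where the durability guarantee is then lost).

```python
import hashlib
import os


def append_record(path, record: bytes):
    """Append one record with a single write() on a descriptor opened with O_SYNC."""
    # O_SYNC asks the OS to return from write() only once the data is on disk.
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o600)
    try:
        written = os.write(fd, record)  # one system call per record
        if written != len(record):
            raise OSError("short write: record only partially stored")
    finally:
        os.close(fd)


def file_sha1(path):
    """Return the SHA-1 hex digest of the file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    append_record("records.log", b"first record\n")
    print(file_sha1("records.log"))
```

As noted above, a plain digest only detects accidental changes; anyone who can rewrite the file can also recompute the digest.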

All files are binary; the data within them is a binary representation of some information. If you have to store a large amount of text then the file will contain the binary representation of that text. The difference between a "binary file" and a "text file" is that creating the latter involves converting data to a text form before saving it. This is typically done so humans can read it.
The distinction between binary and text is usually made when storing data that is for computer consumption. Typically this data would not be text - it might be a list of numerical configuration values, for example: 1, 2, 3.
If you stored this in text format, your file could contain a list of human-readable numbers, and if you opened the file in Notepad you might see one number per line. But what you're actually saving here is not the binary values 1, 2, 3 - you're saving the string "1\n2\n3\n". Note that this string is 6 characters long, and the stored byte values (assuming ASCII) would actually be 49, 10, 50, 10, 51, 10!
If the same data were stored in binary format, you would store the numbers in the smallest useful space, and write the file as individual bytes that can often only be read by the code that created them. Opening this file in Notepad would likely display junk characters, because the data makes no sense as text. In this case you would be saving a byte array with actual values { 1, 2, 3 } - or even a single byte with the three values embedded. This could be much smaller than the human-readable equivalent.
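The two representations of 1, 2, 3 described above can be inspected directly. A small Python sketch (the variable names are only illustrative):

```python
values = [1, 2, 3]

# Text form: each number becomes its ASCII digit followed by a newline.
as_text = "".join(f"{v}\n" for v in values).encode("ascii")

# Binary form: each number is stored as one raw byte.
as_binary = bytes(values)

print(list(as_text))                 # [49, 10, 50, 10, 51, 10]
print(list(as_binary))               # [1, 2, 3]
print(len(as_text), len(as_binary))  # 6 3
```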

Binary files store a sequence of bytes, like all other files. You can store numeric values such as integers in 4 bytes each, characters in a single byte each, or even serialized class objects and anything else you want.
When you know how to read a binary file (i.e. you know what is stored in it) you can extract all the information from it. Text files, however, use text encodings like UTF-8, ANSI etc. and are intended to encode text characters to be processed by text editors.

Binary files are meant to be interpreted only by machines, whereas a text file can also be opened and understood by a human.
So it depends whether you want your file to be readable by a human or not.

It depends on a lot of factors. I can think of two right now:
Do you require the file to be readable by humans?
Is size a factor? A 10-digit number takes at least 10 bytes as text, but can take as little as four bytes as binary if it fits in a 32-bit integer (and smaller numbers can fit in two).
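The size difference in the second point can be checked with Python's `struct` module. The number below is an arbitrary example chosen to fit in an unsigned 32-bit integer; larger 10-digit values would need 8 bytes (format `"<Q"`).

```python
import struct

number = 4_000_000_000  # a 10-digit value that still fits in 32 bits

as_text = str(number).encode("ascii")   # one byte per digit
as_binary = struct.pack("<I", number)   # little-endian unsigned 32-bit integer

print(len(as_text), len(as_binary))     # 10 4

# The binary form round-trips, but only if the reader knows the layout.
assert struct.unpack("<I", as_binary)[0] == number
```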

All data is binary. You always need a machine to interpret it for you. Even if the data is serialized in a compact form like Protocol Buffers, Avro, Thrift etc., it is binary, and if it is plain text, it is still binary. If you want to read Protocol Buffers data in Notepad, there is a two-step process: decode, then read. With text, the decoding step is not needed. The same is true of encrypted data: first decrypt, then read. Humans cannot read binary (as some commenters mention). We still need Notepad to interpret and display the binary (the so-called text).

All data stored in a text file consists of human-readable graphic characters. Each line of data ends with a newline character.
In a binary file, data is stored in the same format as it is held in memory. There are no lines or newline characters. There is an end-of-file marker.
Moreover, binary files are more space-efficient, because values are stored in their raw binary form rather than as characters.

How to determine .mp3 bit rate without downloading it?

I have a list of .mp3 files on the web and I would like to get the highest-quality file.
For this purpose I am treating the quality of a file as its bit rate.
The bit rate itself should be found in the file's headers. If not, the length of the audio track could be used too. (Filesize / Track Length = Bit Rate)
These things would be easy if I had the files locally, but I would like to fetch this information over HTTP and determine which file has the highest quality.
Can I get an audio track's length out of the HTTP headers? If not, is it possible to fetch only the bytes that describe the length/bit rate instead of downloading the whole file?
I'm writing the code in Python, but the question is quite general so I'm not tagging it as a Python question.

Assuming that the remote server is behaving nicely, you could issue a HEAD request to the file and check the contents of the Content-Length header field. It doesn't give you track length or bit rate but you can get the size of the file.
EDIT: MP3s consist of multiple frames, each of which can have a different bit rate (VBR). Track length is calculated from the bit rate of each of these frames, rather than the length itself being stored. If you want the bit rate reliably, you'd need to get the whole file and read the bit rate of each of the frames. It may be possible to grab the first few KB of the file and read the bit rate from the first frame, but this is not always at the same point in the file (e.g. due to the position of the ID3 tag etc.).
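A minimal Python sketch of the "grab the first few KB" approach, under several assumptions: the server honours the HTTP Range header, the file is MPEG-1 Layer III, and any ID3v2 tag fits within the fetched prefix. `fetch_prefix`, `id3v2_size`, and `first_frame_bitrate_kbps` are hypothetical helper names. For a VBR file the result describes only the first frame, as explained above.

```python
import urllib.request

# Bit rates in kbit/s for MPEG-1 Layer III, indexed by the 4-bit header field.
MPEG1_LAYER3_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                     128, 160, 192, 224, 256, 320, None]


def fetch_prefix(url, nbytes=4096):
    """Request only the first nbytes of the file via an HTTP Range header."""
    request = urllib.request.Request(url, headers={"Range": f"bytes=0-{nbytes - 1}"})
    with urllib.request.urlopen(request) as response:
        return response.read(nbytes)  # cap the read if the server ignores Range


def id3v2_size(data: bytes) -> int:
    """Return the total size of a leading ID3v2 tag, or 0 if there is none."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    size = 0
    for byte in data[6:10]:           # four "synchsafe" bytes, 7 bits each
        size = (size << 7) | (byte & 0x7F)
    footer = 10 if data[5] & 0x10 else 0
    return 10 + size + footer


def first_frame_bitrate_kbps(data: bytes):
    """Find the first MPEG-1 Layer III frame header after the tag and decode its bit rate."""
    pos = id3v2_size(data)
    while pos + 4 <= len(data):
        b0, b1, b2 = data[pos], data[pos + 1], data[pos + 2]
        is_sync = b0 == 0xFF and (b1 & 0xE0) == 0xE0
        is_mpeg1_layer3 = (b1 & 0x1E) == 0x1A  # version bits 11, layer bits 01
        if is_sync and is_mpeg1_layer3:
            kbps = MPEG1_LAYER3_KBPS[b2 >> 4]
            if kbps is not None:
                return kbps
        pos += 1
    return None
```

With a real URL, `first_frame_bitrate_kbps(fetch_prefix(url))` would give a quick first guess for each candidate file; a HEAD request's Content-Length can then be combined with it for a rough length estimate.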
