Sound processing in C++ on Windows - a nudge in the right direction - c++

I want to write a simple sound editor with a very specific purpose: cutting and re-gluing an audio file (which will contain spoken prose) in such a way that each sentence is repeated N times. (This is for foreign language learning.)
I don't want to use an existing sound editor because I would like to tailor the GUI specifically for this narrow task, reducing the amount of motions and clicks to a minimum.
Unfortunately I don't have any experience whatsoever in working with sound. I was wondering about recommendations for C++ libraries/APIs on Windows that would enable me to:
read in an audio file (mp3 or wav)
select a portion from "here" to "here"
listen to it
append it to a new file
write the whole thing out as mp3 (or at least wav)
Also any general thoughts are very welcome (this is completely unknown territory for me so if you have had any stumbling blocks and mistakes you don't want others to repeat, please do share).

I have previously been quite happy about http://www.portaudio.com/, which is a nice platform independent wrapper to the sound hardware (low latency recording and playback).
For reading / writing mp3s I have used LAME http://lame.sourceforge.net which is also supported on pretty much all popular platforms.
You might also want to check out the source code of Audacity http://audacity.sourceforge.net/, which does what you want and a lot more.
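Once the file has been decoded to raw PCM samples (by LAME or another decoder), the "repeat each sentence N times" step is just copying ranges of a sample buffer. A minimal sketch, assuming mono 16-bit PCM already in memory; the names Segment and repeatSegments are made up for illustration and are not part of PortAudio or LAME:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A half-open range [begin, end) of sample indices marking one sentence.
struct Segment {
    std::size_t begin;
    std::size_t end;
};

// Returns a new buffer in which every segment of `samples` appears `repeats`
// times in a row. Assumes mono PCM; for interleaved stereo the segment
// bounds would have to be multiples of the channel count.
std::vector<std::int16_t> repeatSegments(const std::vector<std::int16_t>& samples,
                                         const std::vector<Segment>& segments,
                                         int repeats) {
    assert(repeats >= 0);
    std::vector<std::int16_t> out;
    for (const Segment& s : segments) {
        assert(s.begin <= s.end && s.end <= samples.size());
        const auto first = samples.begin() + static_cast<std::ptrdiff_t>(s.begin);
        const auto last = samples.begin() + static_cast<std::ptrdiff_t>(s.end);
        for (int i = 0; i < repeats; ++i) {
            out.insert(out.end(), first, last);  // append one copy of the sentence
        }
    }
    return out;
}
```

The resulting buffer can then be handed to the playback library for listening or to the encoder for writing out. A sample index converts to a time as index / sampleRate, so a selection "from here to here" in seconds maps directly onto a Segment.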

Related

How to package the assets of a game and allow only the engine to be able to read?

I'm developing an engine for a 2D game in C++ and for some days I've been looking for a way to protect the images and audio of my future game. I know there is no 100% protection and that someone would always be able to open these files, but I mean the regular user who just installed the game: I want to prevent them from modifying the sprites, changing the sounds, or overwriting the XML files with game map data.
I downloaded some games made in Unity and noticed that a .assets extension is used; Diablo 2 uses .ma0, .mpq and .data, FEZ uses .pak, and Super Meat Boy only a .tp file. In other words, you cannot open and edit any of these files in a text editor or unzip them with WinRAR; they offer a minimum level of protection. How is this done? Do I have to create my own binary file format, or is there a program that makes this easier?
You can't.
That "minimum level of protection" is even more minimal than you think. You can open those files in a hex editor and hack away at them. This activity is something that has been commonplace for many decades.
You can encrypt the data, but since the key must be stored in your application and the user has a copy of your application, that can be extracted and/or changed too.
You can add a digital signature to prevent people from modifying the assets then using them ("modding") but, again, this can be altered in your application.
You can obfuscate the assets by shipping them in a proprietary format, but this is usually done purely for functional reasons because, again, someone will reverse engineer them.
Once a thing is on someone else's computer, you have lost control of that thing.
There are actually multiple questions here, iirc:
How can users be prevented from reading game assets?
How can users be prevented from manipulating game assets?
What file format can be used to store game assets?
If you're using an existing engine, it probably has some support for this, and if it is sufficient for your purposes you only need to learn how to use it.
If you need to roll your own, you need to define your requirements clearly and pick a solution which fulfills them. For asset storage, a ZIP based format is probably easiest to handle, all languages have some form of support for that. To protect integrity, you should use cryptographic algorithms: digital signatures to detect tampering, and encryption to prevent reading. These will probably slow down the opening of assets a little bit, but in most cases this should be acceptable.
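For the roll-your-own case, a container format does not have to be complicated. Below is a minimal sketch of a made-up layout (the "PAK0" magic, the field order, and the function names are all assumptions for illustration, not any real game's format). It provides no protection beyond not being a text or ZIP file; signing or encryption would have to be layered on top with a real cryptographic library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Appends a 32-bit value in little-endian byte order.
void putU32(Bytes& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Reads a 32-bit little-endian value at `pos` and advances `pos` past it.
std::uint32_t getU32(const Bytes& in, std::size_t& pos) {
    assert(pos + 4 <= in.size());
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) v |= static_cast<std::uint32_t>(in[pos++]) << (8 * i);
    return v;
}

// Hypothetical layout: "PAK0", entry count, then for each entry:
// name length, name bytes, data length, data bytes.
Bytes packAssets(const std::map<std::string, Bytes>& assets) {
    Bytes out = {'P', 'A', 'K', '0'};
    putU32(out, static_cast<std::uint32_t>(assets.size()));
    for (const auto& entry : assets) {
        putU32(out, static_cast<std::uint32_t>(entry.first.size()));
        out.insert(out.end(), entry.first.begin(), entry.first.end());
        putU32(out, static_cast<std::uint32_t>(entry.second.size()));
        out.insert(out.end(), entry.second.begin(), entry.second.end());
    }
    return out;
}

// Inverse of packAssets. The asserts stand in for real error handling,
// which a loader reading untrusted files would need.
std::map<std::string, Bytes> unpackAssets(const Bytes& in) {
    assert(in.size() >= 8 && in[0] == 'P' && in[1] == 'A' && in[2] == 'K' && in[3] == '0');
    std::size_t pos = 4;
    const std::uint32_t count = getU32(in, pos);
    std::map<std::string, Bytes> assets;
    for (std::uint32_t i = 0; i < count; ++i) {
        const std::uint32_t nameLen = getU32(in, pos);
        assert(pos + nameLen <= in.size());
        std::string name(in.begin() + pos, in.begin() + pos + nameLen);
        pos += nameLen;
        const std::uint32_t dataLen = getU32(in, pos);
        assert(pos + dataLen <= in.size());
        assets[name] = Bytes(in.begin() + pos, in.begin() + pos + dataLen);
        pos += dataLen;
    }
    return assets;
}
```

Writing the packed bytes to disk and reading them back is then an ordinary binary file write and read. A production format would more likely keep an index of offsets at the start so that single assets can be loaded without scanning the whole file.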

adding "read aloud" feature to book app written in Cocos2D

I created a book app and used Cocos2D and physics engine (Chipmunk) to create it. I would like to add "read aloud" feature to it.
So far I found instructions/books and tutorials how to add read aloud feature when book is created with iBook Author (but I couldn't use iBook Author due to some limitations) using Epub3 and SMIL.
I also found a good tutorial from J. Shapiro how to make narrated book using AVSpeechSynthesizer. This helps, only that I would like to use recorded voice, rather than synthesized sound. I don't know if this approach can be modified to do so?
I also know how it can be done in Sprite Kit framework.
The only info that I couldn't find is how to add "read aloud" feature to the app written using Cocos2D. Could it be done within SimpleAudioEngine, or it can be combined with some other engine (possibly from Sprite Kit framework)?
I would appreciate very much if somebody can give me some references/pointers or tutorial links where to look for some answers how to add this feature.
Thanking you in advance.
I would like to use recorded voice, rather than synthesized sound
Good. Add your voice recording audio files (caf, wav or mp3 format) to the project. Play it back at the appropriate time using:
[[SimpleAudioEngine sharedEngine] playEffect:@"someVoiceRecordingFile.wav"];
Define what read aloud means to you because I find that a lot of terms, especially semi-vague ones like this, are used differently depending on who is using it.
When you say read aloud book, do you essentially mean a digital storybook that reads the story to you by simply playing narration audio? I've created dozens of these, and what you are asking has multiple steps depending on what features you are going for in your book. If you mean simply playing audio and that is it, then yes, you could do that in cocos2d using SimpleAudioEngine (as one option), but I assume you already knew that, which is why this question has a tad bit of vagueness to it. Either way, you probably wouldn't want to play narration as an effect but rather stream it. To do that along with background music, you'd stream background music via the left channel and narration via the right. You can easily add a method to SimpleAudioEngine to make this nice and neat. To get you started, something similar to this can be used to access the right channel:
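// Fetch the long audio source for the right channel and stop whatever it is playing.
CDLongAudioSource *sound = [[CDAudioManager sharedManager] audioSourceForChannel:kASC_Right];
if ([sound isPlaying])
{
    [sound stop];
}
// Load the narration file so it is ready to be played.
[sound load:fileName];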
Also use the proper settings and recommended formats for streaming audio such as aifc (or really all audio in general). Although I believe you can stream mp3 without it being decompressed first, the problem is with timing. If you are using highlighted text or looping audio then aifc is the better option. Personally I've never had a reason to use mp3. Wav with narration is something I'd avoid even if just for the file size increase. If the mp3 is decompressed even for streaming (which I'm not sure if it is off the top of my head) then you'd have a huge spike in memory that will be both highly unwanted and at times downright bad.
There are many other things that can go into it but those are the basic first steps. If you want to do things like highlighted text, per-word animations, etc then that will take more work of course and you'd need to be comfortable with cocos2d, SpriteKit, or whatever you decide to use. I'll be doing a tutorial series on it one day soon so I'll cover all of that stuff.
On the other hand, if you are talking about recording someone's voice and having it playback i.e. a mother recording herself reading the story so her child can hear her voice whenever they are using your app, then you'd simply record the audio like you would any other piece of audio, save it to the device, and play it back when the page is displayed in the proper reading mode (or whatever you personally call it). One place to look is the AVAudioRecorder that is part of the AVFoundation framework. Simply Google "iOS audio recording" for examples if you need them.

How do you code into an mp3 file?

I'm interested in creating a DRM-type program. I want to put code into an mp3 file that checks online to see if it is licensed out. The online aspect is not something I'm concerned with at the moment, but I've looked online for some resource on doing this and found nothing really useful. There are people talking about viruses, but I want this to be intentional and not malicious. Simply, when you play the mp3 it quickly checks online to see if the license actually allows it to play. If it does, it plays; if it doesn't, it gives an error and says you either need to log on or buy the song.
I'm pretty sure this is impossible. mp3 files can't contain executable code. They could contain byte data that could represent code, but the mp3 player would need to interpret it as such and execute it instead of trying to play it as music, which no player does. You would have to program your own player to do this, but since there's no way to constrain an mp3 file to only one player (again, because they can't contain code), you'd have to create your own competing standard. Further, there would be nothing preventing someone from converting this mp3 to a normal non-constrained mp3 and sharing it.
From a less technical standpoint, as a user (who despises DRM) it would irk me to no end if an mp3 was checking up on whether I was "allowed" to have it, and even more so if I was required to be online in order to listen to it, to the point that I would just delete it and never buy/download/steal anything in that format again.
(Does this count as an answer? It was too long for a comment.)
It's surely possible but very difficult. I'm not very experienced in coding, but using Discord I was able to turn an mp3 file into a txt file. The problem is that it turns into strange characters.

reading mp3 file for game development

I am currently creating a game. My game will use music from an mp3 file that the user sends in in order to make decisions on where to place things, how fast the level moves, etc. I am fairly new at this, and I have been reading information about mp3. Currently I have found all the frames in the mp3 file that I am using. I don't really know where to go from here. What I want to do is measure the frequencies of the sound wave of the music at certain times (like every second) and then, based on that frequency, do what I need to for the game. I don't know whether I should decode the mp3 (that looks like a lot of work and I don't want to do it if I don't have to) or whether I can just read the bytes in the frame and convert them without decoding anything. I am developing this in C#, using the game engine FlatRedBall. I am not using any libraries. I am also planning on selling this game, so I would like to avoid using other people's code if I can. Please, someone help me; I just need a direction to go from here. I know how to parse the header and calculate the frame length, I just don't know the next step in what I want to do...
Convert your music to .ogg format which is free and use free library to play it.
Note: I was going to post this as a comment but it quickly grew too big. :)
Writing your own MP3 encoder/decoder is probably going to take a good amount of effort; effort which would probably be better spent on your game itself. Therefore, if possible, I would by all means try to use an open source library.
That said, most good MP3 libraries are LGPL/GPL licensed. An LGPL library can generally be used in a commercial setting as long as you dynamically link to it; a GPL library imposes stricter requirements on the code that links to it. Also, the SDL Mixer library, as of version 1.2.12, supports MP3s and is under a more permissive zlib license, but since you mention C# I don't know if stable and up-to-date bindings are available. Also, since your project isn't written in SDL to begin with, it might be hard to integrate it.
Also, as @pro_metedor hinted, perhaps using a more open format could help with licensing issues. In general, OGG achieves better compression than MP3, which is a plus for things like download size, bandwidth/resource usage, etc.
Just shop around for a while, and try to be a little flexible. I'm sure you'll find something nice! :)
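Whichever decoder you end up with, the "measure the frequency" step operates on decoded PCM samples, not on the raw bytes of the MP3 frames. As a rough illustration of what that step could look like, here is a sketch that finds the strongest frequency in a block of samples with a naive discrete Fourier transform. It is written in C++ rather than C#, but the arithmetic carries over directly; the function name is made up, and a real project would more likely call an FFT library than use this O(n^2) loop.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the frequency in Hz of the strongest DFT bin in `samples`.
// Naive O(n^2) transform, meant only to show the idea.
double dominantFrequency(const std::vector<double>& samples, double sampleRate) {
    assert(samples.size() >= 2 && sampleRate > 0.0);
    const double pi = 3.14159265358979323846;
    const std::size_t n = samples.size();
    std::size_t bestBin = 1;
    double bestPower = -1.0;
    // Skip bin 0 (the DC offset) and stop at the Nyquist bin n / 2.
    for (std::size_t k = 1; k <= n / 2; ++k) {
        double re = 0.0;
        double im = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = 2.0 * pi * static_cast<double>(k) *
                                 static_cast<double>(t) / static_cast<double>(n);
            re += samples[t] * std::cos(angle);
            im -= samples[t] * std::sin(angle);
        }
        const double power = re * re + im * im;
        if (power > bestPower) {
            bestPower = power;
            bestBin = k;
        }
    }
    // Bin k corresponds to k * sampleRate / n Hz.
    return static_cast<double>(bestBin) * sampleRate / static_cast<double>(n);
}
```

The frequency resolution is sampleRate / n, so longer blocks resolve pitch more finely but respond more slowly to changes in the music. Calling this on one block per second of audio gives the kind of per-second reading described in the question.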

Extract and analyse sound from mp3 files

I have a set of mp3 files, some of which have extended periods of silence or periodic intervals of silence. How can I programmatically detect this?
I am looking for a library in C++, or preferably C#, that will allow me to examine the sound content of these files for the silences.
EDIT: I should elaborate what I am trying to achieve. I am capturing streaming sports commentary using VLC and saving it to mp3. When a game is delayed, or cancelled, the streaming commentary is replaced by a repetitive message saying commentary is not available. By looking for these periodic silences (or total silence), I can detect if there is no commentary and stop the streaming recording
For this reason I am reluctant to decompress the mp3, because it would mean my test for these silences would be very slow. Unless I can decode just the last 5 minutes of the file?
Thanks
Andrew
I'm not aware of a library that will detect silence directly in the MP3 encoded data, since it's not a trivial task to detect silence without first decompressing. Luckily, it's easy to find libraries that decode MP3 files and access them as PCM data, and it's trivial to detect silence in PCM data. Here is one such library for C# I found, but I'm sure there are tons: http://www.robburke.net/mle/mp3sharp/
Once you decode the data, you will have a list of PCM samples. In its most basic form, the algorithm you need to detect silence is simply to analyze small chunks (could be as little as .25s or as much as several seconds), and make sure that the absolute value of each sample in the chunk is below a threshold. The threshold value you use determines how 'quiet' the sound has to be to be considered silence, and the chunk size determines how long the volume needs to be below that threshold to be considered silence. (If you go with very short chunks, you will get lots of false positives due to samples near zero-crossings, but .25s or higher should be ok.) There are improvements to the basic approach, such as using hysteresis (which is basically using two thresholds, one for the transition to silence, and one for the transition from silence), and filtering.
Unfortunately, I don't know a library for C++ or C# that implements level detection offhand, and nothing immediately springs up on Google, but at least for the simple version it's pretty easy to code.
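The basic version of the chunk-and-threshold check can be sketched as follows. This assumes mono 16-bit PCM samples that have already been decoded; the function name and parameters are made up for illustration, and hysteresis and filtering are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Returns one flag per full chunk: true if every sample in that chunk has
// an absolute value below `threshold`. A trailing partial chunk is ignored.
std::vector<bool> silentChunks(const std::vector<std::int16_t>& samples,
                               std::size_t chunkSize,
                               int threshold) {
    assert(chunkSize > 0 && threshold >= 0);
    std::vector<bool> flags;
    for (std::size_t start = 0; start + chunkSize <= samples.size(); start += chunkSize) {
        bool silent = true;
        for (std::size_t i = start; i < start + chunkSize; ++i) {
            // Widen to int first so that the absolute value of -32768 fits.
            if (std::abs(static_cast<int>(samples[i])) >= threshold) {
                silent = false;
                break;
            }
        }
        flags.push_back(silent);
    }
    return flags;
}
```

At a 44.1 kHz sample rate, a .25s chunk is 11025 samples. A long run of consecutive true flags indicates an extended silence, and a regularly repeating pattern of true flags would match the periodic silences of a looping message.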
Edit: Also, this library seems interesting: http://naudio.codeplex.com/
Also, while not a true duplicate question, the answers here will be useful for you:
Detecting audio silence in WAV files using C#