Wowza Apple HLS event tracking

Hi, I am using Wowza Streaming Engine 4 to stream a SMIL file.
I am able to trace events when the file is played over Flash and gather information such as which file is played, the time, etc., in the onConnect() event.
Specifically, I want to find out which file from my SMIL file is being played.
But in the case of Apple HLS streaming, when I try to get the file name in the onHTTPSessionDestroy() method, e.g.
public void onHTTPSessionDestroy(IHTTPStreamerSession httpSession) {
    String streamName = httpSession.getStreamName();
}
I only get the name of the SMIL file, not the actual file played.
Is it possible to get the played file info in Wowza HLS streaming?

A new API was introduced in Wowza 4.1.1:
public void onHTTPStreamerRequest(IHTTPStreamerSession httpSession, IHTTPStreamerRequestContext reqContext)
In this callback we can check which bitrate is being played from the requested chunk name, e.g. media_w1577403587_b4000000_1.ts, where the value after _b is the bitrate.
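As a rough illustration (plain Java string handling, not Wowza API code), once you have the requested chunk name from the request context, the bitrate could be pulled out with a small helper like the one below; the class name and regex are my own:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extract the bitrate from an HLS chunk name such as
// "media_w1577403587_b4000000_1.ts", where the digits after "_b" are the bitrate.
public class ChunkNameParser {
    private static final Pattern BITRATE = Pattern.compile("_b(\\d+)");

    public static long parseBitrate(String chunkName) {
        Matcher m = BITRATE.matcher(chunkName);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    public static void main(String[] args) {
        // Prints 4000000 for the example chunk name above.
        System.out.println(parseBitrate("media_w1577403587_b4000000_1.ts"));
    }
}

Mapping that bitrate back to a particular rendition listed in your SMIL file is then up to your module logic.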

Related

Google Cloud Platform: Speech to Text Conversion of Large Media Files

I'm trying to extract text from an mp4 media file downloaded from YouTube. Since I'm using Google Cloud Platform, I thought I'd give Google Cloud Speech a try.
After all the installations and configurations, I copied the following code snippet to get started:
with io.open(file_name, 'rb') as audio_file:
    content = audio_file.read()

audio = types.RecognitionAudio(content=content)
config = types.RecognitionConfig(encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16, sample_rate_hertz=16000, language_code='en-US')
response = client.long_running_recognize(config, audio)
But I got the following error regarding file size:
InvalidArgument: 400 Inline audio exceeds duration limit. Please use a
GCS URI.
Then I read that I should use streams for large media files. So, I tried the following code snippet:
with io.open(file_name, 'rb') as audio_file:
    content = audio_file.read()

# In practice, stream should be a generator yielding chunks of audio data.
stream = [content]
requests = (types.StreamingRecognizeRequest(audio_content=chunk) for chunk in stream)
config = types.RecognitionConfig(encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16, sample_rate_hertz=16000, language_code='en-US')
streaming_config = types.StreamingRecognitionConfig(config=config)
responses = client.streaming_recognize(streaming_config, requests)
But still I got the following error:
InvalidArgument: 400 Invalid audio content: too long.
So, can anyone please suggest an approach to transcribe an mp4 file and extract the text? I don't have a requirement for very large media files; a file will be 10-15 minutes long at most. Thanks.
The error message means that the file is too big and you need to first copy the media file to Google Cloud Storage and then specify a Cloud Storage URI such as gs://bucket/path/mediafile.
The key to using a Cloud Storage URI is:
RecognitionAudio audio =
RecognitionAudio.newBuilder().setUri(gcsUri).build();
The following code shows how to specify a GCS URI for input. Google has a complete example on GitHub.
public static void syncRecognizeGcs(String gcsUri) throws Exception {
  // Instantiates a client with GOOGLE_APPLICATION_CREDENTIALS
  try (SpeechClient speech = SpeechClient.create()) {
    // Builds the request for remote FLAC file
    RecognitionConfig config =
        RecognitionConfig.newBuilder()
            .setEncoding(AudioEncoding.FLAC)
            .setLanguageCode("en-US")
            .setSampleRateHertz(16000)
            .build();
    RecognitionAudio audio = RecognitionAudio.newBuilder().setUri(gcsUri).build();

    // Use blocking call for getting audio transcript
    RecognizeResponse response = speech.recognize(config, audio);
    List<SpeechRecognitionResult> results = response.getResultsList();

    for (SpeechRecognitionResult result : results) {
      // There can be several alternative transcripts for a given chunk of speech. Just use the
      // first (most likely) one here.
      SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
      System.out.printf("Transcription: %s%n", alternative.getTranscript());
    }
  }
}
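Since the asker mentions files of 10-15 minutes, note that the synchronous recognize() call above only accepts roughly a minute of audio even from Cloud Storage. A sketch of the asynchronous variant with the same Java client is shown below; longRunningRecognizeAsync is the long-running method from the google-cloud-speech library, and the rest simply mirrors the example above:

public static void asyncRecognizeGcs(String gcsUri) throws Exception {
  try (SpeechClient speech = SpeechClient.create()) {
    RecognitionConfig config =
        RecognitionConfig.newBuilder()
            .setEncoding(AudioEncoding.FLAC)
            .setLanguageCode("en-US")
            .setSampleRateHertz(16000)
            .build();
    RecognitionAudio audio = RecognitionAudio.newBuilder().setUri(gcsUri).build();

    // Start the long-running operation and block until it completes.
    LongRunningRecognizeResponse response =
        speech.longRunningRecognizeAsync(config, audio).get();

    for (SpeechRecognitionResult result : response.getResultsList()) {
      // Print the most likely transcript for each portion of the audio.
      System.out.printf("Transcription: %s%n",
          result.getAlternativesList().get(0).getTranscript());
    }
  }
}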

Google Cloud Speech-to-Text (MP3 to text)

I am using the Google Cloud Platform Speech-to-Text API with a trial account. I am not able to get text from an audio file. I do not know what exact encoding and sampleRateHertz I should use for an MP3 file with a bit rate of 128 kbps. I tried various options, but I am not getting the transcription.
const speech = require('@google-cloud/speech');
const config = {
    encoding: 'LINEAR16', // AMR, AMR_WB, LINEAR16 (for wav)
    sampleRateHertz: 16000, // 16000 giving blank result.
    languageCode: 'en-US'
};
MP3 is now supported in beta:
MP3 Only available as beta. See RecognitionConfig reference for details.
https://cloud.google.com/speech-to-text/docs/encoding
MP3 MP3 audio. Support all standard MP3 bitrates (which range from 32-320 kbps). When using this encoding, sampleRateHertz can be optionally unset if not known.
https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig#AudioEncoding
You can find out the sample rate using a variety of tools such as iTunes. CD-quality audio uses a sample rate of 44100 Hertz. Read more here:
https://en.wikipedia.org/wiki/44,100_Hz
To use this in a Google SDK, you may need to use one of the beta SDKs that defines this. Here is the constant from the Go Beta SDK:
RecognitionConfig_MP3 RecognitionConfig_AudioEncoding = 8
https://godoc.org/google.golang.org/genproto/googleapis/cloud/speech/v1p1beta1
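For illustration, here is a minimal sketch using the Java beta client, assuming the com.google.cloud.speech.v1p1beta1 package from google-cloud-speech (the surface that defines the MP3 value); treat it as a sketch under those assumptions, not an official sample:

import com.google.cloud.speech.v1p1beta1.RecognitionAudio;
import com.google.cloud.speech.v1p1beta1.RecognitionConfig;
import com.google.cloud.speech.v1p1beta1.RecognizeResponse;
import com.google.cloud.speech.v1p1beta1.SpeechClient;
import com.google.cloud.speech.v1p1beta1.SpeechRecognitionResult;

public class Mp3RecognizeSketch {
  public static void recognizeMp3Gcs(String gcsUri) throws Exception {
    try (SpeechClient speech = SpeechClient.create()) {
      // MP3 is only available through the v1p1beta1 (beta) API surface.
      // sampleRateHertz can be left unset for MP3 if it is not known.
      RecognitionConfig config =
          RecognitionConfig.newBuilder()
              .setEncoding(RecognitionConfig.AudioEncoding.MP3)
              .setLanguageCode("en-US")
              .build();
      RecognitionAudio audio = RecognitionAudio.newBuilder().setUri(gcsUri).build();

      RecognizeResponse response = speech.recognize(config, audio);
      for (SpeechRecognitionResult result : response.getResultsList()) {
        System.out.printf("Transcription: %s%n",
            result.getAlternativesList().get(0).getTranscript());
      }
    }
  }
}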
According to the official documentation (https://cloud.google.com/speech-to-text/docs/encoding),
Only the following formats are supported:
FLAC
LINEAR16
MULAW
AMR
AMR_WB
OGG_OPUS
SPEEX_WITH_HEADER_BYTE
Anything else will be rejected.
Your best bet is to convert the MP3 file to either:
FLAC (.NET: How can I convert an mp3 or a wav file to .flac), or
WAV, and use LINEAR16 in that case. You can use NAudio (Converting mp3 data to wav data C#).
Honestly, it is annoying that Google does not support MP3 out of the box, unlike Amazon, IBM, and Microsoft, which do. It forces us to jump through hoops and also increases bandwidth usage, since FLAC and LINEAR16 are lossless and therefore much bigger to transmit.
I had the same issue and resolved it by converting it to FLAC.
Try converting your audio to FLAC and use
encoding: 'FLAC',
For conversion, you can use sox
ref: https://www.npmjs.com/package/sox
Now, the MP3 type for Speech-to-Text is only available in the speech_v1p1beta1 module; you must send your request through that module, and you will get what you want.
Use encoding: 'MP3'.
A Python example looks like this:
from google.cloud import speech_v1p1beta1 as speech
import io
import base64

client = speech.SpeechClient()
speech_file = "your mp3 file path"

with io.open(speech_file, "rb") as audio_file:
    content = audio_file.read()

audio = speech.RecognitionAudio(content=content)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.MP3,
    sample_rate_hertz=44100,
    language_code="en-US",
)

response = client.recognize(config=config, audio=audio)

# Each result is for a consecutive portion of the audio. Iterate through
# them to get the transcripts for the entire audio file.
print(response)
for result in response.results:
    # The first alternative is the most likely one for this portion.
    print(u"Transcript: {}".format(result.alternatives[0].transcript))

Telegram Bot API answerInlineQuery sends voice files as documents

I'm trying to implement inline mode in a Telegram bot that answers with a list of voice messages. I'm using nginx to serve the .ogg files encoded with OPUS and using them in voice_url in InlineQueryResultVoice. After doing some research, I discovered that this works OK for voice files larger than about 8 KB. However, smaller files are sent as documents and aren't playable in the Telegram client (though they are still valid .ogg audio). The links are also playable in a browser.
I use ffmpeg -i <mp3_file> -acodec libopus -b:a 48k -vbr on -compression_level 10 <ogg_file> to convert from mp3. I also tried opus-tools.
Example of what I send in a result array:
[{
    "type": "voice",
    "id": "m183802670825221348",
    "title": "Something",
    "voice_url": "https://<domain>/voice/m183802670825221348.ogg",
    "duration": 1
},
...]
What am I doing wrong?

Record video & audio to an MP4 file on a Windows PC

I am receiving video packets (H.264) and audio packets (AAC) separately from the network (somewhere), and I can get the NTP time of every packet (audio or video).
Now I want to write them to an MP4 file at 25 fps. How can I do this?
I have tried to do it using ffmpeg, but failed. Please help me, thanks.

Stream live audio live555

I am writing because I could not find the answer in previous topics. I am using live555 to stream live video (H.264) and audio (G.723), which are being recorded by a web camera. The video part is already done and works perfectly, but I have no clue about the audio task.
From what I have read, I have to create a ServerMediaSession to which I should add two subsessions: one for the video and one for the audio. For the video part I created a subclass of OnDemandServerMediaSubsession, a subclass of FramedSource, and an Encoder class, but for the audio part I do not know which classes I should base the implementation on.
The web camera records and delivers audio frames in G.723 format separately from the video. I would say the audio is raw: when I try to play it in VLC, it says it could not find any start code, so I suppose it is the raw audio stream recorded by the webcam.
I was wondering if someone could give me a hint.
For an audio stream, your override of OnDemandServerMediaSubsession::createNewRTPSink should create a SimpleRTPSink.
Something like:
RTPSink* YourAudioMediaSubsession::createNewRTPSink(Groupsock* rtpGroupsock,
                                                    unsigned char rtpPayloadTypeIfDynamic,
                                                    FramedSource* inputSource)
{
    return SimpleRTPSink::createNew(envir(), rtpGroupsock,
                                    4,          // static RTP payload type 4 = G.723
                                    frequency,
                                    "audio",
                                    "G723",
                                    channels);
}
The frequency and the number of channels should come from the inputSource.