Streaming multiple mp3 files to Icecast - icecast

I have a few thousand mp3 files on a web server that I need to stream to an Icecast 2.3.3 server instance running on the same server.
Each file is assigned to one or more categories. There are 7 categories in total. I'd like to have one mount per category.
I need to be able to add files to and remove files from categories. When a file is added or removed, I need to somehow merge the file into the category or reshuffle the files in the category, after which I assume I'll need to restart the mount.
My question is: Is there a source application I could use that runs as a service on Windows OS that can automate this kind of thing?
Alternatively I could write a program to shuffle and merge these files together as one big "category" mp3 file, but would like to know if there's another way.
Any advice is very much appreciated.

Since you're just dealing with MP3 files, SHOUTcast sc_trans might be a good option for you.
http://wiki.winamp.com/wiki/SHOUTcast_DNAS_Transcoder_2
You can configure it to use a playlist (which you can generate programmatically), or have it read the directory and just run with the files as-is. Note that sc_trans doesn't support mount points, so you will have to configure Icecast to accept a SHOUTcast-style connection. This works, but will require you to run multiple instances of Icecast. If you'd like to stream everything on a single port later on, you can set up a master Icecast instance which relays all of the streams from the others.
There are plenty of other choices out there depending on your needs. Tools like SAM DJ allow full control over playlists and advertisements but can be overkill depending on what you need to do.
I typically find myself working with a diverse set of inputs, so I use VLC for playback and then some custom software to encode the stream and send it off to the streaming server. This isn't difficult to do, and you can even have VLC do the encoding for you if you're crafty in configuring it.

I know it's old, and you most likely already found your solution. However, there may be more folks with this issue, so I'll throw in a few considerations for when you decide to write your own "shuffler" for MP3 files.
I would not use purely random selection for the task at hand: there is a real likelihood of titles being played multiple times in a row, and you don't want that.
Also, you most likely have your titles sorted in some way, say
Artist A - Title 1
Artist A - Title 2
...
Artist B - Title 1
...
You most likely aim for diversity when shuffling, so you don't want to play the same artist twice consecutively.
I would read all filenames into an array with indices 0...n-1.
Find the artist with the largest number of files; let that number be m.
Then find the smallest prime p that is larger than m and co-prime to n.
Then generate a pseudo-random number s in [0...n-1] just ONCE to find a starting song; this avoids playing the same starting sequence each time.
In a loop, play song s, then set
s := (s + p) mod n
This is guaranteed to play all songs, each exactly once, while avoiding multiple consecutive songs by the same artist.
Here's a little example with just 16 songs; capital letters are artists, lowercase letters are song titles.
Aa Ab Ac Ba Bb Bc Bd Ca Cb Da Db Dc Dd Ea Fa Fb
n 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
The artists B and D have the most songs (4 each), hence
m := 4
You search for a prime number larger than 4 that is co-prime to 16 = 2 * 2 * 2 * 2, and you find:
p := 5
You invoke the PRNG function once and obtain, say, 11, so the song at index 11 is the first one played (step s = 0). Then you loop:
Aa Ab Ac Ba Bb Bc Bd Ca Cb Da Db Dc Dd Ea Fa Fb
n 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
s 1 14 11 8 5 2 15 12 9 6 3 0 13 10 7 4
Here s gives the step at which each song is played; reading the songs off in step order, the played sequence is:
Dc Aa Bc Db Fb Bb Da Fa Ba Cb Ea Ac Ca Dd Ab Bd
No artist is repeated back to back, every song plays exactly once, and there is plenty of diversity.
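A minimal Python sketch of this co-prime stride shuffle (the filename pattern, the artist-extraction rule, and the prime search below are my own simplifying assumptions; adapt them to your naming scheme):

import random
from collections import Counter

def coprime_stride_order(files, artist_of):
    # Yield the files in a pseudo-shuffled order using a co-prime stride.
    n = len(files)
    m = max(Counter(artist_of(f) for f in files).values())  # largest artist block

    def is_prime(x):
        return x > 1 and all(x % d for d in range(2, int(x ** 0.5) + 1))

    p = m + 1
    while not (is_prime(p) and n % p != 0):  # prime, larger than m, co-prime to n
        p += 1

    s = random.randrange(n)                  # random starting song, chosen once
    for _ in range(n):
        yield files[s]
        s = (s + p) % n

# Example: artist is the part before " - " in "Artist - Title.mp3"
files = sorted(["A - 1.mp3", "A - 2.mp3", "B - 1.mp3", "B - 2.mp3", "C - 1.mp3"])
for f in coprime_stride_order(files, lambda name: name.split(" - ")[0]):
    print(f)

Re-run the shuffle (or just re-seed the starting index) whenever files are added to or removed from a category, since n and possibly m change.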

Related

Looping through csv files to append together in Stata

I have 50 csv files that I am trying to append together. They are all named mort followed by the year. They currently look like
mort70
year cause sex
1970 HA M
1970 HA F
mort71
year cause sex
1971 HA M
1971 ST M
I am currently using the following code:
local years "70 71 72 73 74 75 76 77 78"
local file "mort"
foreach file of local years {
clear
import "file_path/`file'"
keep year cause
append
}
This doesn't seem to work. Is there a better way to accomplish this?
This code is based on a series of guesses on what the syntax might be or might do. Unravelling all the guesses would take a long answer. Here is a start.
Your code defines a local macro file containing the text mort and then overwrites it in the loop. As you want the text mort always as the first element of each filename, followed by an indicator of year that differs around the loop, the filename is going to be assembled wrongly.
The first step is not to do that.
local years "70 71 72 73 74 75 76 77 78"
foreach file of local years {
clear
import "file_path/mort`file'"
keep year cause
append
}
Now your next problems include
There is no import command with that syntax.
You read in data and change it but never save the result.
A bare append isn't legal either.
At the frustrating early stage, where everything is new and you may not have a mentor, everything has to be looked up.

Increasing speed of a for loop through a List and dataframe

Currently I am dealing with a massive amount of data, originally in the form of lists produced by combination. I am running conditions on each list in a for loop. The problem is that this small for loop is taking hours with the data. I'm looking to optimize the speed by changing some functions or by vectorizing it.
I know one of the biggest no-nos is doing pandas or DataFrame operations inside for loops, but I need to sum up the columns and organize the data a little to get what I want. It seems unavoidable.
So you have a better understanding, each list looks something like this when it's thrown into a DataFrame:
Name Role Cost Value
0 Johnny Tsunami Driver 1000 39
1 Michael B. Jackson Pistol 2500 46
2 Bobby Zuko Pistol 3000 50
3 Greg Ritcher Lookout 200 25
Name Role Cost Value
4 Johnny Tsunami Driver 1000 39
5 Michael B. Jackson Pistol 2500 46
6 Bobby Zuko Pistol 3000 50
7 Appa Derren Lookout 250 30
This is the current loop, any ideas?
import itertools
import numpy as np
import pandas as pd

df2 = pd.DataFrame()  # accumulates the qualifying combinations
for element in itertools.product(*combine_list):
    combo = list(element)
    df = pd.DataFrame(np.array(combo).reshape(-1, 11))
    df[[2, 3]] = df[[2, 3]].apply(pd.to_numeric)
    if df[2].sum() <= 5000 and df[3].sum() > 190:
        df2 = pd.concat([df2, df], ignore_index=True)
A couple of things I've done have sliced off some time, but not enough:
* Changed df[2].sum() to df[2].values.sum() -- it's faster.
* Where the concat is in the if statement, I've tried using append and also collecting the DataFrames in a list; concat is usually about 2 seconds faster, or it ends up being about the same speed.
* Changed .apply(pd.to_numeric) to .astype(np.int64) -- it's faster as well.
I'm currently looking at PyPy and Cython as well, but I want to start here first before I go through that headache.
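One possible direction, sketched under the assumption that columns 2 and 3 of each combo hold the Cost and Value figures as clean integer strings (as in the sample above; combine_list is the question's own variable): compute the two sums on the NumPy array directly and only keep the combos that pass, building a single DataFrame once at the end instead of concatenating inside the loop.

import itertools
import numpy as np
import pandas as pd

kept = []  # qualifying combos; build one DataFrame at the very end
for element in itertools.product(*combine_list):
    rows = np.array(element).reshape(-1, 11)
    cost = rows[:, 2].astype(np.int64).sum()
    value = rows[:, 3].astype(np.int64).sum()
    if cost <= 5000 and value > 190:
        kept.append(rows)

df2 = pd.DataFrame(np.vstack(kept)) if kept else pd.DataFrame()

If the Cost and Value fields can be made numeric before the product is taken, you can go further and precompute per-item sums, so the loop only adds a handful of integers per combination.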

two way merge sort for 2 file c++

I was asked to apply a two-way merge sort to two files (files of records).
The algorithm's steps are as follows:
Sort phase
1) The records in the file to be sorted are divided into several groups.
Each group is called a run, and each run fits into main memory.
2) An internal sort is applied to each run.
3) The resulting sorted runs are distributed to two external files.
Merge phase:
1) One run from each of the external files created in the sort phase is merged into a larger run of sorted records.
2) The result is stored in a third file.
3) The data is distributed back into the first two files, and merging continues until all records are in one large run.
I was able to apply the sort phase only, so the current files are:
(suppose each run contains 3 keys only)
file 1:
50 95 110 | 40 120 153 | 22 80 140
file 2:
10 36 100 | 60 70 130
Here are the steps of the merge phase, as I would perform them on paper:
Merge Phase:
step1 :
file 3 :
10 36 50 95 100 110 | 40 60 70 120 130 153 | 22 80 140
file 1:
10 36 50 95 100 110 | 22 80 140
file 2 :
40 60 70 120 130 153
step 2 :
file 3 :
10 36 40 50 60 70 95 100 110 120 130 153 | 22 80 140
file 1 :
10 36 40 50 60 70 95 100 110 120 130 153
file 2:
22 80 140
step 3 :
file 3 :
10 22 36 40 50 60 70 80 95 100 110 120 130 140 153
One run remains, so the sort is complete.
Now I need to apply the merge phase: each key from one file is compared with the corresponding key from the other and the smaller one is output to file 3; then file 3 is redistributed into the two files, and merging and sorting continue until there is one sorted run.
How can I apply such an algorithm in C++? I'm a little confused about how to determine the size of each run at every step.
As commented by Amdt Jonasson, the program needs to keep track of the run sizes and the end of data for each file. In your example, the initial run size appears to be fixed at 3 elements. A merge of two runs of size 3 results in a single run of size 6, as shown in your steps. In this case, only a single run size and the end of data in each file need to be tracked.
If the sort is to be a stable sort (the original order preserved on equal keys) and the run size is variable, then an array of run counts for each file will be needed, or some way to denote the end of runs in the file, such as a special character sequence used as an end-of-run indicator in a text file.
If the sort is not required to be stable, then an out of order sequence (smaller key value after a larger key value) can be used to indicate the end of a run. The risk here is that two or more runs will appear to be a single run if the runs happen to be in order, which will lose stability and unbalance the run count on the files.
This is a two-way merge sort using 3 files. If you use a 4th file, then the merged runs can be written alternately to 2 output files, eliminating the need to split up the runs after each merge pass.
An alternative for doing a 2-way merge sort with 3 files is a polyphase merge sort, but it's complicated, probably beyond what would be expected for a class assignment, and more of a "legacy" algorithm from the days of tape-based sorts.
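To make the run bookkeeping concrete, here is a small sketch of one merge pass over two files whose runs have a known fixed size (written in Python for brevity; the lists stand in for the files, and the same loop structure ports directly to C++ file streams):

def merge_pass(file1, file2, run_size):
    # Merge corresponding runs of length run_size from file1 and file2
    # into runs of up to 2 * run_size, appended to file3.
    file3 = []
    i = j = 0
    while i < len(file1) or j < len(file2):
        run1 = file1[i:i + run_size]
        run2 = file2[j:j + run_size]
        i += len(run1)
        j += len(run2)
        a = b = 0
        while a < len(run1) and b < len(run2):
            if run1[a] <= run2[b]:
                file3.append(run1[a])
                a += 1
            else:
                file3.append(run2[b])
                b += 1
        file3.extend(run1[a:])
        file3.extend(run2[b:])
    return file3

# Example with the question's data and an initial run size of 3:
file1 = [50, 95, 110, 40, 120, 153, 22, 80, 140]
file2 = [10, 36, 100, 60, 70, 130]
print(merge_pass(file1, file2, 3))

After each pass, the runs (now of size 2 * run_size) are redistributed alternately back into file 1 and file 2, run_size is doubled, and the pass is repeated until a single run remains; with fixed-size runs the only state to track is run_size and the number of records in each file.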

Imagemagick C++ : Reducing memory usage

I have a use case where I want to cross-compare 2 sets of images to find the best-matching pairs.
However, the sets are quite big, and for performance purposes I don't want to open and close images all the time.
So my idea is:
std::map<int, Magick::Image> set1;
for(...) { set1[...] = Magick::Image(...); }

std::map<int, int> best;
for(...) {
    set2 = Magick::Image(...);
    // Compare with all of set1
    ...
    best[...] = set1[...]->first;
}
Obviously I don't need to store all of set 2, since I work image by image.
But in any case, set1 alone is so big that storing 32-bit images is too much. For reference: 15000 images at 300x300 is about 5 GB.
I thought about reducing the memory by downsampling the images to monochrome (it does not affect my use case). But how do I do it? Even if I extract a single color channel, ImageMagick still treats the new image as 32 bits, even though it is just one channel.
My final approach has been to write my own parser that reads the image color by color, converts it, and creates a bit vector; then I XOR the vectors and count bits. That works (using only about 170 MB).
However, it is not flexible. What if I want to use 2 bits, or 8 bits, at some point? Is it possible in any way using ImageMagick's own classes and just calling compare()?
Thanks!
I have a couple of suggestions - maybe something will give you an idea!
Suggestion 1
Maybe you could use a Perceptual Hash. Rather than holding all your images in memory, you calculate a hash one at a time for each image and then compare the distance between the hashes.
Some perceptual hashes are invariant to image scale (or you can scale all images to the same size before hashing) and most are invariant to image format.
Here is an article by Dr Neal Krawetz... Perceptual Hashing.
ImageMagick can also do Perceptual Hashing and is callable from PHP - see here.
I also wrote some code some time back for this sort of thing... code.
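As a rough illustration of the idea, here is a sketch in Python using the imagehash package (an assumption on my part that a Python helper is acceptable for the comparison step; the file lists are placeholders, and the principle is the same from C++): hash every image once, keep only the tiny hashes in memory, and pick the set1 image at the smallest Hamming distance.

from PIL import Image
import imagehash

# Hypothetical file lists; substitute your own two sets of paths.
set1_paths = ["set1/a.png", "set1/b.png"]
set2_paths = ["set2/1.png", "set2/2.png"]

# Hash every image in set1 once; each hash is only a few bytes.
set1_hashes = {p: imagehash.phash(Image.open(p)) for p in set1_paths}

best = {}
for p2 in set2_paths:
    h2 = imagehash.phash(Image.open(p2))
    # Subtracting two hashes gives the Hamming distance (smaller = more similar).
    best[p2] = min(set1_hashes, key=lambda p1: set1_hashes[p1] - h2)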
Suggestion 2
I understand that ImageMagick Version 7 is imminent - no idea who could tell you more - and that it supports true single-channel, grayscale images - as well as up to 32 channel multi-spectral images. I believe it can also act as a server - holding images in memory for subsequent use. Maybe that can help.
Suggestion 3
Maybe you can get some mileage out of GNU Parallel - it can keep all your CPU cores busy in parallel and also distribute work across a number of servers using ssh. There are plenty of tutorials and examples out there, but just to demonstrate comparing each item of a named set of images (a,b,c,d) with each of a numbered set of images (1,2), you could do this:
parallel -k echo {#} compare {1} {2} ::: a b c d ::: 1 2
Output
1 compare a 1
2 compare a 2
3 compare b 1
4 compare b 2
5 compare c 1
6 compare c 2
7 compare d 1
8 compare d 2
Obviously I have put echo in there so you can see the commands generated, but you can remove that and actually run compare.
So, your code might look more like this:
#!/bin/bash
# Create a bash function that GNU Parallel can call to compare two images
comparethem() {
    result=$(convert -metric rmse "$1" "$2" -compare -format "%[distortion]" info:)
    echo Job:$3 $1 vs $2 $result
}
export -f comparethem
# Next line effectively uses all cores in parallel to compare pairs of images
parallel comparethem {1} {2} {#} ::: set1/*.png ::: set2/*.png
Output
Job:3 set1/s1i1.png vs set2/s2i3.png 0.410088
Job:4 set1/s1i1.png vs set2/s2i4.png 0.408234
Job:6 set1/s1i2.png vs set2/s2i2.png 0.406902
Job:7 set1/s1i2.png vs set2/s2i3.png 0.408173
Job:8 set1/s1i2.png vs set2/s2i4.png 0.407242
Job:5 set1/s1i2.png vs set2/s2i1.png 0.408123
Job:2 set1/s1i1.png vs set2/s2i2.png 0.408835
Job:1 set1/s1i1.png vs set2/s2i1.png 0.408979
Job:9 set1/s1i3.png vs set2/s2i1.png 0.409011
Job:10 set1/s1i3.png vs set2/s2i2.png 0.407391
Job:11 set1/s1i3.png vs set2/s2i3.png 0.408614
Job:12 set1/s1i3.png vs set2/s2i4.png 0.408228
Suggestion 4
I wrote an answer a while back about using REDIS to cache images - that can also work in a distributed fashion amongst a small pool of servers. That answer is here.
Suggestion 5
You may find that you can get better performance by converting the second set of images to Magick Pixel Cache format so that they can be DMA'ed into memory rather than needing to be decoded and decompressed each time. So you would do this:
convert image.png image.mpc
which gives you these two files which ImageMagick can read really quickly.
-rw-r--r-- 1 mark staff 856 16 Jan 12:13 image.mpc
-rw------- 1 mark staff 80000 16 Jan 12:13 image.cache
Note that I am not suggesting you permanently store your images in MPC format as it is unique to ImageMagick and can change between releases. I am suggesting you generate a copy in that format just before you do your analysis runs each time.

django ORM merging data

I came across a problem for which I could not find an elegant solution...
We have an application that monitors audio-input and tries to assign matches based on acoustic fingerprints.
The application gets a sample every few seconds, then does a lookup and stores the timestamped result in the database.
The fingerprinting is not always accurate, so it happens that "wrong" items get assigned. So the data looks something like:
timestamp foreign_id my comment
--------------------------------------------------
12:00:00 17
12:00:10 17
12:00:20 17
12:00:30 17
12:00:40 723 wrong match
12:00:50 17
12:01:00 17
12:01:10 17
12:01:20 None no match
12:01:30 17
12:01:40 18
12:01:50 18
12:02:00 18
12:02:10 18
12:02:20 18
12:02:30 992 wrong match
12:02:40 18
12:02:50 18
So I'm looking for a way to "clean up" the data periodically.
Could anyone imagine a nice way to achieve this? In the given example, the entry with the foreign id of 723 should be corrected to 17, etc. And, if possible, with a threshold for how many entries back and forth should be taken into account.
Not sure if my question is clear enough this way, but any inputs welcome!
Check that a foreign id is in the database so many times, then check if those times are close together?
Why not just disregard the 'bad' data when using the data?
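One possible cleanup pass, sketched in plain Python outside the ORM (smooth() is a hypothetical helper, and the window size corresponds to the back-and-forth threshold mentioned in the question): replace an entry's foreign id with the majority value among its neighbours, but only when that value clearly dominates the neighbourhood.

from collections import Counter

def smooth(foreign_ids, window=3):
    # Return a cleaned copy where clear outliers are replaced by the value
    # that dominates the `window` neighbours on each side.
    cleaned = list(foreign_ids)
    for i, current in enumerate(foreign_ids):
        lo = max(0, i - window)
        neighbours = foreign_ids[lo:i] + foreign_ids[i + 1:i + 1 + window]
        counts = Counter(v for v in neighbours if v is not None)
        if not counts:
            continue
        winner, votes = counts.most_common(1)[0]
        if winner != current and votes >= len(neighbours) - 1:
            cleaned[i] = winner
    return cleaned

# With the sample data, 723 and 992 get corrected to 17 and 18,
# while the None near the 17/18 boundary is left untouched.
sample = [17, 17, 17, 17, 723, 17, 17, 17, None, 17, 18, 18, 18, 18, 18, 992, 18, 18]
print(smooth(sample))

In Django terms you would read the rows ordered by timestamp, run something like this over their foreign id values, and bulk-update only the rows that changed.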