Compute Median of Values Stored In Vector - C++?

I'm a programming student, and for a project I'm working on, one of the things I have to do is compute the median value of a vector of int values. I'm to do this using only the sort function from the STL and vector member functions such as .begin(), .end(), and .size().
I'm also supposed to make sure I find the median whether the vector has an odd number of values or an even number of values.
And I'm stuck. Below I have included my attempt. So where am I going wrong? I would appreciate it if you would be willing to give me some pointers or resources to get going in the right direction.
Code:
int CalcMHWScore(const vector<int>& hWScores)
{
    const int DIVISOR = 2;
    double median;
    sort(hWScores.begin(), hWScores.end());
    if ((hWScores.size() % DIVISOR) == 0)
    {
        median = ((hWScores.begin() + hWScores.size()) + (hWScores.begin() + (hWScores.size() + 1))) / DIVISOR);
    }
    else
    {
        median = ((hWScores.begin() + hWScores.size()) / DIVISOR)
    }
    return median;
}

There is no need to completely sort the vector: std::nth_element can do enough work to put the median in the correct position. See my answer to this question for an example.
Of course, that doesn't help if your teacher forbids using the right tool for the job.
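For an odd-length vector, a minimal sketch of that approach might look like this (the function name is illustrative; the even case needs the extra step shown in a later answer):
#include <algorithm>
#include <vector>

// Sketch: median of an odd-sized vector using std::nth_element, which rearranges
// the vector so the element at the middle index is the one a full sort would put there.
int medianOddSized(std::vector<int> v)
{
    std::vector<int>::size_type mid = v.size() / 2;
    std::nth_element(v.begin(), v.begin() + mid, v.end());
    return v[mid];
}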

You are doing an extra division and overall making it a bit more complex than it needs to be. Also, there's no need to create a DIVISOR when 2 is actually more meaningful in context.
double CalcMHWScore(vector<int> scores)
{
    size_t size = scores.size();
    if (size == 0)
    {
        return 0; // Undefined, really.
    }
    else
    {
        sort(scores.begin(), scores.end());
        if (size % 2 == 0)
        {
            return (scores[size / 2 - 1] + scores[size / 2]) / 2.0; // divide by 2.0 so the average isn't truncated
        }
        else
        {
            return scores[size / 2];
        }
    }
}

The accepted answer uses std::sort which does more work than we need it to. The answers that use std::nth_element don't handle the even size case correctly.
We can do a little better than just using std::sort. We don't need to sort the vector completely in order to find the median. We can use std::nth_element to find the middle element. Since the median of a vector with an even number of elements is the average of the middle two, we need to do a little more work to find the other middle element in that case. std::nth_element ensures that all elements preceding the middle are less than the middle. It doesn't guarantee their order beyond that so we need to use std::max_element to find the largest element preceding the middle element.
int CalcMHWScore(std::vector<int> hWScores) {
    assert(!hWScores.empty());
    const auto middleItr = hWScores.begin() + hWScores.size() / 2;
    std::nth_element(hWScores.begin(), middleItr, hWScores.end());
    if (hWScores.size() % 2 == 0) {
        const auto leftMiddleItr = std::max_element(hWScores.begin(), middleItr);
        return (*leftMiddleItr + *middleItr) / 2;
    } else {
        return *middleItr;
    }
}
You might want to consider returning a double because the median may be a fraction when the vector has an even size.
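For example, a double-returning variant of the code above might look like this sketch; only the return type and the divisor change:
double CalcMHWScore(std::vector<int> hWScores) {
    assert(!hWScores.empty());
    const auto middleItr = hWScores.begin() + hWScores.size() / 2;
    std::nth_element(hWScores.begin(), middleItr, hWScores.end());
    if (hWScores.size() % 2 == 0) {
        const auto leftMiddleItr = std::max_element(hWScores.begin(), middleItr);
        return (*leftMiddleItr + *middleItr) / 2.0;  // 2.0 keeps the fractional part
    } else {
        return *middleItr;
    }
}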

const int DIVISOR = 2;
Don't do this. It just makes your code more convoluted. You've probably read guidelines about not using magic numbers, but evenness vs. oddness of numbers is a fundamental property, so abstracting this out provides no benefit but hampers readability.
if ((hWScores.size() % DIVISOR) == 0)
{
median = ((hWScores.begin() + hWScores.size()) + (hWScores.begin() + (hWScores.size() + 1))) / DIVISOR);
You're taking an iterator to the end of the vector, taking another iterator that points one past the end of the vector, adding the iterators together (which isn't an operation that makes sense), and then dividing the resulting iterator (which also doesn't make sense). This is the more complicated case; I'll explain what to do for the odd-sized vector first and leave the even-sized case as an exercise for you.
}
else
{
median = ((hWScores.begin() + hWScores.size()) / DIVISOR)
Again, you're dividing an iterator. What you instead want to do is to increment an iterator to the beginning of the vector by hWScores.size() / 2 elements:
median = *(hWScores.begin() + hWScores.size() / 2);
And note that you have to dereference iterators to get values out of them. It'd be more straightforward if you used indices:
median = hWScores[hWScores.size() / 2];

I give below a sample program that is somewhat similar to the one in Max S.'s response. To help the OP advance his knowledge and understanding, I have made a number of changes. I have:
a) changed the call by const reference to call by value, since sort is going to want to change the order of the elements in your vector, (EDIT: I just saw that Rob Kennedy also said this while I was preparing my post)
b) replaced size_t with the more appropriate vector<int>::size_type (actually, a convenient synonym of the latter),
c) saved size/2 to an intermediate variable,
d) thrown an exception if the vector is empty, and
e) I have also introduced the conditional operator (? :).
Actually, all of these corrections are straight out of Chapter 4 of "Accelerated C++" by Koenig and Moo.
double median(vector<int> vec)
{
    typedef vector<int>::size_type vec_sz;
    vec_sz size = vec.size();
    if (size == 0)
        throw domain_error("median of an empty vector");
    sort(vec.begin(), vec.end());
    vec_sz mid = size/2;
    return size % 2 == 0 ? (vec[mid] + vec[mid-1]) / 2.0 : vec[mid]; // 2.0 avoids integer truncation
}

I'm not exactly sure what your restrictions on the user of member functions of vector are, but index access with [] or at() would make accessing elements simpler:
median = hWScores.at(hWScores.size() / 2);
You can also work with iterators like begin() + offset like you are currently doing, but then you need to first calculate the correct offset with size()/2 and add that to begin(), not the other way around. Also you need to dereference the resulting iterator to access the actual value at that point:
median = *(hWScores.begin() + hWScores.size()/2);

Related

Why finding median of 2 sorted arrays of different sizes takes O(log(min(n,m)))

Please consider this problem:
We have 2 sorted arrays of different sizes, A[n] and B[m];
I have implemented a classical algorithm that takes at most O(log(min(n,m))) steps.
Here's the approach:
Start by partitioning the two arrays into two halves (not two halves of each array, but two groups that together contain the same number of elements). The first half contains some of the first elements from the first and the second arrays, and the second half contains the rest (the last elements) from the first and the second arrays. Because the arrays can be of different sizes, this does not mean taking half of each array. Reach a condition such that every element in the first half is less than or equal to every element in the second half.
Please see the code below:
#include <algorithm>
#include <climits>
#include <stdexcept>
#include <vector>

double median(std::vector<int> V1, std::vector<int> V2)
{
    if (V1.size() > V2.size())
    {
        V1.swap(V2);
    }
    int s1 = V1.size();
    int s2 = V2.size();
    int low = 0;
    int high = s1;
    while (low <= high)
    {
        int px = (low + high) / 2;
        int py = (s1 + s2 + 1) / 2 - px;
        int maxLeftX = (px == 0) ? INT_MIN : V1[px - 1];
        int minRightX = (px == s1) ? INT_MAX : V1[px];
        int maxLeftY = (py == 0) ? INT_MIN : V2[py - 1];
        int minRightY = (py == s2) ? INT_MAX : V2[py];
        if (maxLeftX <= minRightY && maxLeftY <= minRightX)
        {
            if ((s1 + s2) % 2 == 0)
            {
                return (double(std::max(maxLeftX, maxLeftY)) + double(std::min(minRightX, minRightY))) / 2;
            }
            else
            {
                return std::max(maxLeftX, maxLeftY);
            }
        }
        else if (maxLeftX > minRightY)
        {
            high = px - 1;
        }
        else
        {
            low = px + 1;
        }
    }
    throw std::logic_error("median: inputs were not sorted"); // unreachable for sorted inputs
}
Although the approach is pretty straightforward and it works, I still cannot convince myself of its correctness. Furthermore, I can't understand why it takes O(log(min(n,m))) steps.
If anyone can briefly explain the correctness and why it takes O(log(min(n,m))) steps, that would be awesome. Even a link with a meaningful explanation would help.
Time complexity is quite straightforward: you binary search through the array with fewer elements to find a partition that enables you to find the median. You make O(log(#elements)) steps, and since #elements is exactly min(n, m), the complexity is O(log(min(n, m))).
There are exactly (n + m)/2 elements smaller than the median and the same number of elements greater. Let's think about them as two halves (let the median belong to whichever one you choose).
You can surely divide the smaller array into two subarrays such that one of them lies entirely in the first half and the other in the second half. However, you have no idea how many elements are in either of them.
Let's choose some x - your guess of the number of elements from the smaller array in the first half. It must be in the range from 0 to n. Then you know, since there are exactly (n + m)/2 elements smaller than the median, that you have to choose (n + m)/2 - x elements from the bigger array. Then you have to check whether that partition actually works.
To check whether the partition is good, you have to check that all the elements in the smaller half are smaller than all the elements in the greater half. You have to check if maxLeftX <= minRightY and if maxLeftY <= minRightX (then every element in the left half is smaller than every element in the right half).
If so, you've found the correct partition. You can now easily find your median (it's either max(maxLeftX, maxLeftY), min(minRightX, minRightY), or their sum divided by 2).
If not, you either took too many elements from the smaller array (the case when maxLeftX > minRightY), so next time you have to guess a smaller value for x, or too few of them, in which case you have to guess a greater value for x.
To get the best complexity always guess in the middle of a range of possible values that x may take.
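As a quick sanity check of the median function from the question above, a small driver like the following (the input values are just illustrative) should print the median of the merged sequence:
#include <iostream>
#include <vector>

// Hypothetical driver for the median(V1, V2) function shown above.
// The merged sorted sequence is {1, 2, 3, 4, 8, 9, 11}, so the expected median is 4.
int main()
{
    std::vector<int> a = {1, 3, 8};
    std::vector<int> b = {2, 4, 9, 11};
    std::cout << median(a, b) << '\n'; // prints 4
}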

Sorting Optimization

I'm currently following an algorithms class and thus decided it would be good practice to implement a few of the sorting algorithms and compare them.
I implemented merge sort and quick sort and then compared their run time, along with the std::sort:
My computer isn't the fastest but for 1000000 elements I get on average after 200 attempts:
std::sort -> 0.620342 seconds
quickSort -> 2.2692 seconds
mergeSort -> 2.19048 seconds
I would like to ask if possible for comments on how to improve and optimize the implementation of my code.
void quickSort(std::vector<int>& nums, int s, int e, std::function<bool(int,int)> comparator = defaultComparator){
    if(s >= e)
        return;
    int pivot;
    int a = s + (rand() % (e-s));
    int b = s + (rand() % (e-s));
    int c = s + (rand() % (e-s));
    //find median of the 3 random pivots
    int min = std::min(std::min(nums[a],nums[b]),nums[c]);
    int max = std::max(std::max(nums[a],nums[b]),nums[c]);
    if(nums[a] < max && nums[a] > min)
        pivot = a;
    else if(nums[b] < max && nums[b] > min)
        pivot = b;
    else
        pivot = c;
    int temp = nums[s];
    nums[s] = nums[pivot];
    nums[pivot] = temp;
    //partition
    int i = s + 1, j = s + 1;
    for(; j < e; j++){
        if(comparator(nums[j] , nums[s])){
            temp = nums[i];
            nums[i++] = nums[j];
            nums[j] = temp;
        }
    }
    temp = nums[i-1];
    nums[i-1] = nums[s];
    nums[s] = temp;
    //sort left and right of partition
    quickSort(nums,s,i-1,comparator);
    quickSort(nums,i,e,comparator);
}
Here s is the index of the first element, e the index of the element after the last. defaultComparator is just the following lambda function:
auto defaultComparator = [](int a, int b){ return a <= b; };
std::vector<int> mergeSort(std::vector<int>& nums, int s, int e, std::function<bool(int,int)> comparator = defaultComparator){
    std::vector<int> sorted(e-s);
    if(s == e)
        return sorted;
    int mid = (s+e)/2;
    if(s == mid){
        sorted[0] = nums[s];
        return sorted;
    }
    std::vector<int> left = mergeSort(nums, s, mid);
    std::vector<int> right = mergeSort(nums, mid, e);
    unsigned int i = 0, j = 0;
    unsigned int c = 0;
    while(i < left.size() || j < right.size()){
        if(i == left.size()){
            sorted[c++] = right[j++];
        }
        else if(j == right.size()){
            sorted[c++] = left[i++];
        }
        else{
            if(comparator(left[i],right[j]))
                sorted[c++] = left[i++];
            else
                sorted[c++] = right[j++];
        }
    }
    return sorted;
}
Thank you all
The first thing I see: you're passing a std::function<>, which involves an indirect call through type erasure, one of the most expensive calling strategies. Give it a try with simply a template parameter (which might be a function); the result will be direct, inlinable function calls. (A sketch of this change is at the end of this answer.)
Second thing, never do this result-in-local-container (vector<int> sorted;) when optimizing and an in-place variant exists. Do an in-place sort. The client should be aware that you are sorting their vector; if they wish, they can make a copy in advance. You take a non-const reference for a reason. [1]
Third, there's a cost associated with rand() and it's far from negligible. Unless you're sure you need the randomized variant of quicksort() (and its benefits regarding 'no too bad sequence'), use just the first element as pivot. Or the middle.
Use std::swap() to swap two elements. Chances are, it gets translated to xchg (on x86 / x64) or an equivalent, which is hard to beat. Whether the optimizer identifies your intent to swap at these places without being explicit can be verified from the assembly output.
The way you found the median of three elements is full of conditional moves / branches. It's simply nums[a] + nums[b] + nums[c] - max - min; but getting nums[...], min and max at the same time could also be optimized further.
Avoid i++ when aiming at speed. While most optimizers will usually create good code, there's a small chance that it's suboptimal. Be explicit when optimizing (++i after the swap), but _only_when_optimizing_.
But the most important one: valgrind/callgrind/kcachegrind. Profile, profile, profile. Only optimize what's really slow.
[1] There's an exception to this rule: const containers that you build from non-const ones. These are usually in-house types that are shared across multiple threads, hence it's better to keep them const and copy when modification is needed. In this case, you'll allocate a new container (either const or not) in your function, but you'll probably keep the const one for users' convenience in the API.
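A sketch of the template-comparator suggestion from the first point above. The function name quickSortT and the simplified Lomuto-style partition are illustrative, not the exact code under review:
#include <cstdlib>
#include <utility>
#include <vector>

// Sketch: the comparator is a deduced template parameter rather than std::function,
// so calls to it can be inlined by the compiler.
template <typename Compare>
void quickSortT(std::vector<int>& nums, int s, int e, Compare comparator)
{
    if (e - s < 2)
        return;
    std::swap(nums[s], nums[s + std::rand() % (e - s)]); // move a random pivot to the front
    int i = s + 1;
    for (int j = s + 1; j < e; ++j)
        if (comparator(nums[j], nums[s]))
            std::swap(nums[i++], nums[j]);
    std::swap(nums[s], nums[i - 1]);                     // put the pivot between the partitions
    quickSortT(nums, s, i - 1, comparator);
    quickSortT(nums, i, e, comparator);
}

// Usage: quickSortT(v, 0, static_cast<int>(v.size()), [](int a, int b){ return a < b; });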
For quick sort, use a Hoare-like partition scheme.
http://en.wikipedia.org/wiki/Quicksort#Hoare_partition_scheme
Median of 3 only needs 3 if / swap statements (effectively a bubble sort). No need for min or max check.
if(nums[a] > nums[b])
    std::swap(nums[a], nums[b]);
if(nums[b] > nums[c])
    std::swap(nums[b], nums[c]);
if(nums[a] > nums[b])
    std::swap(nums[a], nums[b]);
// use nums[b] as pivot value
For merge sort, use an entry function that does a one-time creation of a working vector, then pass that vector by reference to the actual merge sort function (see the sketch at the end of this answer). For top down merge sort, the indices determine the start, middle, and end of each sub-vector.
If using top down merge sort, the code can avoid copying data by alternating the direction of merge depending on the level of recursion. This can be done using two mutually recursive functions, the first one where the result ends up in the original vector, the second one where the result ends up in the working vector. The first one calls the second one twice, then merges from the working vector back to the original vector, and vice versa for the second one. For the second one, if the size == 1, then it needs to copy 1 element from the original vector to the working vector. An alternative to two functions is to pass a boolean for which direction to merge.
If using bottom up merge sort (which will be a bit faster), then each pass swaps vectors. The number of passes needed is determined up front and in the case of an odd number of passes, the first pass swaps in place, so that the data ends up in the original vector after all merge passes are done.
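A sketch of the entry-function idea: one working vector is created up front and reused by reference. The names are illustrative, and this version merges into the buffer and copies back rather than alternating merge direction, so it shows the allocation point rather than the full optimization:
#include <algorithm>
#include <vector>

// Sketch: recursive part sorts nums[s, e) in place, using buf as scratch space.
template <typename Compare>
void mergeSortImpl(std::vector<int>& nums, std::vector<int>& buf, int s, int e, Compare cmp)
{
    if (e - s < 2)
        return;
    int mid = s + (e - s) / 2;
    mergeSortImpl(nums, buf, s, mid, cmp);
    mergeSortImpl(nums, buf, mid, e, cmp);
    int i = s, j = mid, k = s;
    while (i < mid && j < e)                       // merge the two sorted halves into buf
        buf[k++] = cmp(nums[j], nums[i]) ? nums[j++] : nums[i++];
    while (i < mid) buf[k++] = nums[i++];
    while (j < e)   buf[k++] = nums[j++];
    std::copy(buf.begin() + s, buf.begin() + e, nums.begin() + s);
}

// Entry function: the working vector is created exactly once and reused by every merge.
template <typename Compare>
void mergeSort(std::vector<int>& nums, Compare cmp)
{
    std::vector<int> buf(nums.size());
    mergeSortImpl(nums, buf, 0, static_cast<int>(nums.size()), cmp);
}

// Usage: mergeSort(v, [](int a, int b){ return a < b; });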

Create a function that checks whether an array has two opposite elements or not for less than n^2 complexity. (C++)

Create a function that checks whether an array has two opposite elements or not for less than n^2 complexity. Let's work with numbers.
Obviously the easiest way would be:
bool opposite(int* arr, int n) // n - array length
{
    for(int i = 0; i < n; ++i)
    {
        for(int j = 0; j < n; ++j)
        {
            if(arr[i] == - arr[j])
                return true;
        }
    }
    return false;
}
I would like to ask if any of you guys can think of an algorithm that has a complexity less than n^2.
My first idea was the following:
1) sort array ( algorithm with worst case complexity: n.log(n) )
2) create two new arrays, filled with negative and positive numbers from the original array
( so far we've got -> n.log(n) + n + n = n.log(n))
3) ... compare somehow the two new arrays to determine if they have opposite numbers
I'm not entirely sure my ideas are correct, but I'm open to suggestions.
An important alternative solution is as follows. Sort the array. Create two pointers, one initially pointing to the front (smallest), one initially pointing to the back (largest). If the sum of the two pointed-to elements is zero, you're done. If it is larger than zero, then decrement the back pointer. If it is smaller than zero, then increment the front pointer. Continue until the two pointers meet.
This solution is often the one people are looking for; often they'll explicitly rule out hash tables and trees by saying you only have O(1) extra space.
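A sketch of that two-pointer approach, assuming we may sort a copy of the input (the function name is illustrative):
#include <algorithm>
#include <vector>

// Sketch: sort, then walk two pointers inward looking for a pair that sums to zero.
// O(n log n) for the sort, O(n) for the scan, O(1) extra space beyond the copy.
bool hasOppositePair(std::vector<int> arr)
{
    std::sort(arr.begin(), arr.end());
    std::size_t front = 0;
    std::size_t back = arr.size();
    while (front + 1 < back) {
        long long sum = static_cast<long long>(arr[front]) + arr[back - 1];
        if (sum == 0)
            return true;   // arr[front] and arr[back - 1] are opposites
        if (sum > 0)
            --back;        // the largest remaining element is too big
        else
            ++front;       // the smallest remaining element is too small
    }
    return false;
}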
I would use a std::unordered_set and check to see if the opposite of the number already exists in the set. If not, insert it into the set and check the next element.
std::vector<int> foo = {-10,12,13,14,10,-20,5,6,7,20,30,1,2,3,4,9,-30};
std::unordered_set<int> res;
for (auto e : foo)
{
    if (res.count(-e) > 0)
        std::cout << "opposite of " << e << " already exists\n";
    else
        res.insert(e);
}
Output:
opposite of 10 already exists
opposite of 20 already exists
opposite of -30 already exists
Note that you can simply add all of the elements to the unordered_set and, when adding x, check whether -x is already in the set. The complexity of this solution is O(n) (as @Hurkyl said, thanks).
UPDATE: A second idea is: sort the elements and then, for each element, check (using binary search) whether the opposite element exists.
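A sketch of that second idea (sort, then binary search for the negated value; here a zero only counts if it appears at least twice):
#include <algorithm>
#include <vector>

// Sketch: O(n log n) for the sort plus n binary searches of O(log n) each.
bool hasOpposite(std::vector<int> v)
{
    std::sort(v.begin(), v.end());
    for (std::size_t i = 0; i < v.size(); ++i) {
        auto range = std::equal_range(v.begin(), v.end(), -v[i]);
        std::size_t count = static_cast<std::size_t>(range.second - range.first);
        // for v[i] == 0 the range contains v[i] itself, so require a second zero
        if (v[i] == 0 ? count > 1 : count > 0)
            return true;
    }
    return false;
}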
You can do this in O(n log n) with a Red Black tree.
t := empty tree
for each e in A[1..n]
if (-e) is in t:
return true
insert e into t
return false
In C++, you wouldn't implement a Red Black tree for this purpose however. You'd use std::set, because it guarantees O(log n) search and insertion.
std::set<int> s;
for (auto e : A) {
    if (s.count(-e) > 0) {
        return true;
    }
    s.insert(e);
}
return false;
As Hurkyl mentioned, you could do better by just using std::unordered_set, which is a hashtable. This gives you O(1) search and insertion in the average case, but O(n) for both operations in the worst case. The total complexity of the solution in the average case would be O(n).

How to figure out "progress" while sorting?

I'm using stable_sort to sort a large vector.
The sorting takes on the order of a few seconds (say, 5-10 seconds), and I would like to display a progress bar to the user showing how much of the sorting is done so far.
But (even if I was to write my own sorting routine) how can I tell how much progress I have made, and how much more there is left to go?
I don't need it to be exact, but I need it to be "reasonable" (i.e. reasonably linear, not faked, and certainly not backtracking).
Standard library sort uses a user-supplied comparison function, so you can insert a comparison counter into it. The total number of comparisons for either quicksort/introsort or mergesort will be very close to N*log2(N) (where N is the number of elements in the vector). So that's what I'd export to a progress bar: number of comparisons / (N*log2(N)).
Since you're using mergesort, the comparison count will be a very precise measure of progress. It might be slightly non-linear if the implementation spends time permuting the vector between comparison runs, but I doubt your users will see the non-linearity (and anyway, we're all used to inaccurate non-linear progress bars :) ).
Quicksort/introsort would show more variance, depending on the nature of the data, but even in that case it's better than nothing, and you could always add a fudge factor on the basis of experience.
A simple counter in your compare class will cost you practically nothing. Personally I wouldn't even bother locking it (the locks would hurt performance); it's unlikely to get into an inconsistent state, and anyway the progress bar won't start radiating lizards just because it gets an inconsistent progress number.
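A minimal sketch of that counting-comparator idea (the names are illustrative, and the N*log2(N) figure is the heuristic estimate described above, not an exact count):
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Sketch: count comparisons inside the comparator and report
// comparisons / (N * log2(N)) as an approximate progress fraction.
int main()
{
    std::vector<int> v(1000000);
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] = static_cast<int>((i * 2654435761u) % 1000003u); // arbitrary scrambled data

    const double n = static_cast<double>(v.size());
    const double expected = n * std::log2(n);   // heuristic total comparison count
    long long comparisons = 0;                  // plain counter, no locking, as suggested above

    std::stable_sort(v.begin(), v.end(), [&](int a, int b) {
        if (++comparisons % 1000000 == 0)       // throttle the reporting
            std::printf("progress: %.0f%%\n", 100.0 * comparisons / expected);
        return a < b;
    });
}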
Split the vector into several equal sections, the quantity depending upon the granularity of progress reporting you desire. Sort each section separately. Then start merging with std::merge. You can report your progress after sorting each section, and after each merge. You'll need to experiment to determine what percentage the sorting of the sections should count for compared to the merging.
Edit:
I did some experiments of my own and found the merging to be insignificant compared to the sorting, and this is the function I came up with:
#include <algorithm>
#include <functional>
#include <iostream>
#include <numeric>
#include <vector>

template<typename It, typename Comp, typename Reporter>
void ReportSort(It ibegin, It iend, Comp cmp, Reporter report, double range_low=0.0, double range_high=1.0)
{
    double range_span = range_high - range_low;
    double range_mid = range_low + range_span/2.0;
    using namespace std;
    auto size = iend - ibegin;
    if (size < 32768) {
        stable_sort(ibegin, iend, cmp);
    } else {
        ReportSort(ibegin, ibegin + size/2, cmp, report, range_low, range_mid);
        report(range_mid);
        ReportSort(ibegin + size/2, iend, cmp, report, range_mid, range_high);
        inplace_merge(ibegin, ibegin + size/2, iend, cmp); // pass the comparator so custom orderings merge correctly
    }
}

int main()
{
    std::vector<int> v(100000000);
    std::iota(v.begin(), v.end(), 0);
    std::random_shuffle(v.begin(), v.end()); // note: removed in C++17; std::shuffle is the modern replacement
    std::cout << "starting...\n";
    double percent_done = 0.0;
    auto report = [&](double d) {
        if (d - percent_done >= 0.05) {
            percent_done += 0.05;
            std::cout << static_cast<int>(percent_done * 100) << "%\n";
        }
    };
    ReportSort(v.begin(), v.end(), std::less<int>(), report);
}
Stable sort is based on merge sort. If you wrote your own version of merge sort then (ignoring some speed-up tricks) you would see that it consists of log N passes. Each pass starts with 2^k sorted lists and produces 2^(k-1) lists, with the sort finished when it merges two lists into one. So you could use the value of k as an indication of progress.
If you were going to run experiments, you might instrument the comparison object to count the number of comparisons made and try and see if the number of comparisons made is some reasonably predictable multiple of n log n. Then you could keep track of progress by counting the number of comparisons done.
(Note that with the C++ stable sort, you have to hope that it finds enough store to hold a copy of the data. Otherwise the cost goes from N log N to perhaps N (log N)^2 and your predictions will be far too optimistic).
Select a small subset of indices and count inversions. You know its maximal value, and you know when you are done the value is zero. So, you can use this value as a "progressor". You can think of it as a measure of entropy.
Easiest way to do it: sort a small vector and extrapolate the time assuming O(n log n) complexity.
t(n) = C * n * log(n) ⇒ t(n1) / t(n2) = n1/n2 * log(n1)/log(n2)
If sorting 10 elements takes 1 μs, then 100 elements will take 1 μs * 100/10 * log(100)/log(10) = 20 μs.
Quicksort is basically
partition input using a pivot element
sort smallest part recursively
sort largest part using tail recursion
All the work is done in the partition step. You could do the outer partition directly and then report progress as the smallest part is done.
So there would be an additional step between 2 and 3 above.
Update progressor
Here is some code.
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <iterator>

template <typename RandomAccessIterator>
void sort_wReporting(RandomAccessIterator first, RandomAccessIterator last)
{
    double done = 0;
    double whole = static_cast<double>(std::distance(first, last));
    typedef typename std::iterator_traits<RandomAccessIterator>::value_type value_type;
    while (first != last && first + 1 != last)
    {
        auto d = std::distance(first, last);
        value_type pivot = *(first + std::rand() % d);
        auto iter = std::partition(first, last,
            [pivot](const value_type& x){ return x < pivot; });
        auto lower = std::distance(first, iter);
        auto upper = std::distance(iter, last);
        if (lower < upper)
        {
            std::sort(first, iter);
            done += lower;
            first = iter;
        }
        else
        {
            std::sort(iter, last);
            done += upper;
            last = iter;
        }
        std::cout << done / whole << std::endl;
    }
}
I spent almost a day figuring out how to display progress for shell sort, so I will leave my simple formula here. Given an array of colors, it will display the progress. It blends the colors from red to yellow and then to green. When the data is sorted, the last position of the array, which is blue, is used. For shell sort, the iterations each time it passes through the array are quite proportional, so the progress becomes pretty accurate.
(Code in Dart/Flutter)
List<Color> colors = [
Color(0xFFFF0000),
Color(0xFFFF5500),
Color(0xFFFFAA00),
Color(0xFFFFFF00),
Color(0xFFAAFF00),
Color(0xFF55FF00),
Color(0xFF00FF00),
Colors.blue,
];
[...]
style: TextStyle(
color: colors[(((pass - 1) * (colors.length - 1)) / (log(a.length) / log(2)).floor()).floor()]),
It is basically a cross-multiplication.
a means array. (log(a.length) / log(2)).floor() means rounding down the log2(N), where N means the number of items. I tested this with several combinations of array sizes, array numbers, and sizes for the array of colors, so I think it is good to go.

how to get median value from sorted map

I am using a std::map. Sometimes I will do an operation like finding the median value of all items, e.g.
if I add
1 "s"
2 "sdf"
3 "sdfb"
4 "njw"
5 "loo"
then the median is 3.
Is there some solution without iterating over half the items in the map?
I think you can solve the problem by using two std::maps: one for the smaller half of the items (mapL) and a second for the other half (mapU). When you have an insert operation, it will be one of these cases:
add item to mapU and move smallest element to mapL
add item to mapL and move greatest element to mapU
In case the maps have different sizes and you insert the element into the one with the smaller number of elements, you skip the move step.
The basic idea is that you keep your maps balanced so the maximum size difference is 1 element.
As far as I know, all the STL map operations work in O(log n) time. Accessing the smallest and greatest element in a map can be done using iterators.
When you have an n-th position query, just check the map sizes and return the greatest element in mapL or the smallest element in mapU.
The above usage scenario is for inserting only, but you can extend it to deleting items as well; you just have to keep track of which map holds the item, or try to delete from both.
Here is my code with sample usage:
#include <iostream>
#include <string>
#include <map>
using namespace std;

typedef pair<int,string> pis;
typedef map<int,string>::iterator itis;

map<int,string> Left;
map<int,string> Right;

itis get_last(map<int,string> &m){
    return (--m.end());
}

int add_element(int key, string val){
    if (Left.empty()){
        Left.insert(make_pair(key,val));
        return 1;
    }
    pis maxl = *get_last(Left);
    if (key <= maxl.first){
        Left.insert(make_pair(key,val));
        if (Left.size() > Right.size() + 1){
            itis to_rem = get_last(Left);
            pis cpy = *to_rem;
            Left.erase(to_rem);
            Right.insert(cpy);
        }
        return 1;
    } else {
        Right.insert(make_pair(key,val));
        if (Right.size() > Left.size()){
            itis to_rem = Right.begin();
            pis cpy = *to_rem;
            Right.erase(to_rem);
            Left.insert(cpy); // insert the saved copy, not the erased iterator
        }
        return 2;
    }
}

pis get_mid(){
    int size = Left.size() + Right.size();
    if (Left.size() >= size / 2){
        return *(get_last(Left));
    }
    return *(Right.begin());
}

int main(){
    Left.clear();
    Right.clear();
    int key;
    string val;
    while (cin >> key >> val){ // read until extraction fails instead of testing eof() before reading
        add_element(key,val);
        pis mid = get_mid();
        cout << "mid " << mid.first << " " << mid.second << endl;
    }
}
I think the answer is no. You cannot just jump to the N / 2 item past the beginning because a std::map uses bidirectional iterators. You must iterate through half of the items in the map. If you had access to the underlying Red/Black tree implementation that is typically used for the std::map, you might be able to get close like in Dani's answer. However, you don't have access to that as it is encapsulated as an implementation detail.
Try:
typedef std::map<int,std::string> Data;
Data data;
Data::iterator median = data.begin();
std::advance(median, data.size() / 2); // std::advance returns void, so advance a named iterator (or use std::next)
Works if the size() is odd. I'll let you work out how to do it when size() is even.
In a self-balancing binary tree (which std::map typically is), a good approximation would be the root.
For the exact value, just cache it together with a balance indicator: each time an item is added below the median, decrease the indicator, and increase it when an item is added above. When the indicator reaches 2/-2, move the median up/down one step and reset the indicator.
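One way to sketch that cached-median idea (insert-only and unique keys; this variant keeps an iterator on the lower median and nudges it on every insert instead of using the 2/-2 balance counter described above, and the names are illustrative):
#include <map>
#include <string>

// Sketch: maintain an iterator to the lower median across insertions.
// Assumes unique keys and no erasures.
class MedianMap {
public:
    void insert(int key, const std::string& value) {
        auto result = items_.emplace(key, value);
        if (!result.second)
            return;                              // duplicate key: nothing changed
        if (items_.size() == 1) {
            median_ = items_.begin();
            return;
        }
        if (items_.size() % 2 == 0) {            // size became even
            if (key < median_->first)
                --median_;                       // lower median moves one step down
        } else {                                 // size became odd
            if (key > median_->first)
                ++median_;                       // median moves one step up
        }
    }
    const std::pair<const int, std::string>& median() const { return *median_; }
private:
    std::map<int, std::string> items_;
    std::map<int, std::string>::iterator median_;
};

// Usage: after inserting {1,"s"}, {2,"sdf"}, {3,"sdfb"}, {4,"njw"}, {5,"loo"},
// median() refers to key 3, matching the example in the question.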
If you can switch data structures, store the items in a std::vector and sort it. That will enable accessing the middle item positionally without iterating. (It can be surprising but a sorted vector often out-performs a map, due to locality. For lookups by the sort key you can use binary search and it will have much the same performance as a map anyway. See Scott Meyer's Effective STL.)
If you know the map will be sorted, then get the element at floor(length / 2). If you're in a bit twiddly mood, try (length >> 1).
I know no way to get the median from a pure STL map quickly for big maps. If your map is small or you need the median rarely you should use the linear advance to n/2 anyway I think - for the sake of simplicity and being standard.
You can use the map to build a new container that offers a median: Jethro suggested using two maps; based on this, perhaps better would be a single map and a continuously updated median iterator. These methods suffer from the drawback that you have to reimplement every modifying operation, and in Jethro's case even the reading operations.
A custom-written container will also do what you want, probably most efficiently, but at the price of custom code. You could try, as was suggested, to modify an existing STL map implementation. You can also look for existing implementations.
There is a super efficient C implementation that offers most map functionality and also random access called Judy Arrays. These work for integer, string and byte array keys.
Since it sounds like insert and find are your two common operations while median is rare, the simplest approach is to use the map and std::advance( m.begin(), m.size()/2 ); as originally suggested by David Rodríguez. This is linear time, but easy to understand so I'd only consider another approach if profiling shows that the median calls are too expensive relative to the work your app is doing.
The nth_element() algorithm is there for you for this :) It implements the partition step of quicksort, and you don't need your vector (or array) to be sorted.
The time complexity is O(n) (while for sorting you need to pay O(n log n)).
For a sorted list, here it is in Java code, but I assume it's very easy to port to C++:
if (input.length % 2 != 0) {
    return input[((input.length + 1) / 2 - 1)];
} else {
    return 0.5d * (input[(input.length / 2 - 1)] + input[(input.length / 2 + 1) - 1]);
}