vector containing doubles [closed] - c++

I need to calculate the mean, median and s.d. of the values inside the vector. I can sort the vector to find out the median but is there an easier way to find the mean and standard deviation rather than adding stuff up?

You can find the median with std::nth_element. Contrary to (apparently) popular belief, this is normally faster than sorting and then finding the middle element -- it's typically O(N) (linear), whereas sorting is O(N log N).
To add the elements for the mean, you can use std::accumulate, something like:
double total = std::accumulate(std::begin(v), std::end(v), 0.0);
[Note: depending on how old your compiler is, you may need to use v.begin() and v.end() instead of begin(v) and end(v).]
Computing the variance has been covered in a previous question. The standard deviation is simply the square root of the variance.
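Putting those pieces together, a minimal sketch might look like this (assuming a non-empty vector; for an even-sized vector the median here is the upper of the two middle values rather than their average):

#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

// Median via std::nth_element; the vector is taken by value because
// nth_element reorders its elements.
double median(std::vector<double> v) {
    auto mid = v.begin() + v.size() / 2;
    std::nth_element(v.begin(), mid, v.end());
    return *mid;
}

double mean(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0) / v.size();
}

// Population standard deviation: square root of the mean squared deviation.
double stddev(const std::vector<double>& v) {
    const double m = mean(v);
    const double sq = std::accumulate(v.begin(), v.end(), 0.0,
        [m](double acc, double x) { return acc + (x - m) * (x - m); });
    return std::sqrt(sq / v.size());
}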

In order to find the mean, you're simply going to have to add the vector contents up. You can find the median without actually sorting the vector first, but an algorithm for calculating the median of an unsorted vector would almost certainly be much more complex than one for a sorted vector. Also, I'm pretty sure that if you measure the time to find the median on an unsorted vector, it will almost certainly exceed the combined time of sorting and extracting the median. (If you're doing it just for the technical challenge, I'll write one for you...)
Since you're probably going to have to sort the vector, you could calculate the mean whilst you're sorting.

EDIT: Didn't see the C++ tag!
If you are using a language that offers functional programming tools, you can foldl the vector with the + function and divide by its length to get the mean.
For the standard deviation, you can map each element with a lambda x -> (x - mean)^2 and fold the result with +.
It's not more computationally efficient, but it probably saves a lot of developer time!
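In C++ the same recipe is available through the standard fold algorithms; a minimal C++17 sketch (the function name is just illustrative):

#include <cmath>
#include <functional>
#include <numeric>
#include <vector>

// Fold with + for the mean, then map x -> (x - mean)^2 and fold again for the
// variance, mirroring the functional recipe above.
double stddev_functional(const std::vector<double>& v) {
    const double mean = std::reduce(v.begin(), v.end(), 0.0) / v.size();
    const double variance = std::transform_reduce(
        v.begin(), v.end(), 0.0, std::plus<>{},
        [mean](double x) { return (x - mean) * (x - mean); }) / v.size();
    return std::sqrt(variance);
}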

Related

2D String Matching: Baker-Bird Algorithm [closed]

I want to find a submatrix in a huge matrix, so I googled and found the Baker-Bird algorithm.
Unfortunately I cannot understand it very well, and tutorials about it are rare.
I cannot find any example code to study.
So what I want to ask is: is there some simple example code or pseudocode that I can study?
Thanks in advance.
OK, from studying the link Kent Munthe Caspersen gave (http://www.stringology.org/papers/Zdarek-PhD_thesis-2010.pdf, page 30 onwards), I understand how the Baker-Bird algorithm works.
For a submatrix to appear in a matrix, its columns must all match individually. You can scan down each column looking for matches, and then scan this post-processed matrix for rows indicating columns consecutively matching at the same spot.
Say we are looking for submatrices of the format
a c a
b b a
c a b
We search down each column for the column matches 'abc', 'cba' or 'aab', and in a new matrix we mark the ends of those complete matches in the corresponding cell -- for example with A, B or C. (What the algorithm in the paper actually does is construct a state machine that transitions to a new state based on the old state and the next letter, and then look for states that indicate a column was just matched. This is more complex, but more efficient, since each column only has to be scanned once instead of once per column pattern we are interested in.)
Once we have done this, we scan along each row looking for successive values indicating successive columns matched - in this case, we're looking for the string 'ABC' in a matrix row. If we find it, there was a sub-array match here.
Speedups come from the state machine approach described above, and also from the choice of string searching algorithm (of which there are numerous, with different time complexities: http://en.wikipedia.org/wiki/String_searching_algorithm ).
(Note that the entire algorithm can, of course, be flipped to do rows first and then columns; it's identical.)
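To make the simplified (non-state-machine) version concrete, here is a hedged sketch that marks column matches with the pattern-column index (0, 1, 2, ...) instead of letters, then scans each row for the sequence 0..c-1. It assumes the pattern columns are distinct and re-compares characters naively where the real algorithm would use an Aho-Corasick pass per column:

#include <string>
#include <utility>
#include <vector>

// text: R x C grid, pattern: r x c grid. Returns the bottom-right cell of
// every occurrence of pattern in text.
std::vector<std::pair<int, int>> bakerBirdNaive(
    const std::vector<std::string>& text,
    const std::vector<std::string>& pattern)
{
    const int R = text.size(), C = text[0].size();
    const int r = pattern.size(), c = pattern[0].size();

    // marks[i][j] = k if pattern column k matches the text column ending at
    // (i, j), else -1. (The paper computes this with one Aho-Corasick scan per
    // text column instead of re-comparing characters for every k.)
    std::vector<std::vector<int>> marks(R, std::vector<int>(C, -1));
    for (int j = 0; j < C; ++j)
        for (int i = r - 1; i < R; ++i)
            for (int k = 0; k < c; ++k) {
                bool ok = true;
                for (int t = 0; t < r && ok; ++t)
                    ok = (text[i - r + 1 + t][j] == pattern[t][k]);
                if (ok) { marks[i][j] = k; break; }  // assumes distinct pattern columns
            }

    // Row scan: marks 0,1,...,c-1 in consecutive cells of a row mean that all
    // pattern columns line up, i.e. a full submatrix match ends on this row.
    std::vector<std::pair<int, int>> hits;
    for (int i = r - 1; i < R; ++i)
        for (int j = 0; j + c <= C; ++j) {
            bool ok = true;
            for (int k = 0; k < c && ok; ++k)
                ok = (marks[i][j + k] == k);
            if (ok) hits.push_back({i, j + c - 1});
        }
    return hits;
}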
What about the example in this PhD thesis p.31-33:
http://www.stringology.org/papers/Zdarek-PhD_thesis-2010.pdf

How to determine what bin a float should be in? C++ [closed]

I have an array of floats, Float_t xbins[41], that defines 40 bins, i.e. ranges of floats.
E.g. y is in bin 7 if y > xbins[7] && !(y > xbins[8]).
How do I determine what bin a given float should belong to without having 40 if statements?
Please answer in C++ as I don't speak other languages.
If the array is sorted, then do a binary search to locate the correct bin. You'll need a combination of std::sort (if not sorted), then something like std::lower_bound, to locate. You'll need to ensure that operator< is implemented correctly for Float_t.
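For example, a minimal sketch using std::lower_bound with the bin convention from the question (bin i holds xbins[i] < y <= xbins[i+1]), assuming Float_t is a plain float:

#include <algorithm>

// Returns the bin index for y, or -1 if y lies outside (xbins[0], xbins[nbins]].
int findBin(float y, const float* xbins, int nbins /* 40 in the question */) {
    if (!(y > xbins[0]) || y > xbins[nbins]) return -1;
    // lower_bound finds the first bound >= y; the bin is the one just below it.
    const float* it = std::lower_bound(xbins, xbins + nbins + 1, y);
    return static_cast<int>(it - xbins) - 1;
}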
As it turns out, the bins are not uniformly spaced but have integer bounds, so probably the fastest method is an (inverse) lookup table, which here would apparently have about 100 entries. One basically needs two comparisons, for the lower and upper bounds.
If the array bounds are derived from a formula, it may be possible to write an inverse formula that outperforms the LUT method.
For the generic case, binary search is the way to go -- and even that can be improved a bit by doing linear interpolation (interpolation search) instead of splitting the range exactly in half. If the data is not pathological, the expected speed is O(log log n), compared to O(log n) for binary search.
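A rough sketch of the inverse lookup-table idea, assuming the 41 bounds are integers and that y lies inside (xbins[0], xbins[40]] (the table size and fix-up details depend on the actual bounds):

#include <algorithm>
#include <cmath>
#include <vector>

// Build: for every integer v between the first and last bound, record the bin
// whose range (xbins[i], xbins[i+1]] contains v. The entry for v == xbins[0]
// comes out as -1, which the query-time fix-up corrects.
std::vector<int> buildInverseLUT(const std::vector<float>& xbins) {
    const int lo = static_cast<int>(xbins.front());
    const int hi = static_cast<int>(xbins.back());
    std::vector<int> lut;
    lut.reserve(hi - lo + 1);
    for (int v = lo; v <= hi; ++v) {
        auto it = std::lower_bound(xbins.begin(), xbins.end(), static_cast<float>(v));
        lut.push_back(static_cast<int>(it - xbins.begin()) - 1);
    }
    return lut;
}

// Query: truncate y, read the table, then a single comparison against the next
// (integer) bound fixes up the case where y's fractional part crosses it.
int findBinLUT(float y, const std::vector<float>& xbins, const std::vector<int>& lut) {
    const int lo = static_cast<int>(xbins.front());
    int bin = lut[static_cast<int>(std::floor(y)) - lo];
    if (y > xbins[bin + 1]) ++bin;
    return bin;
}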

How to create a DAWG? [closed]

How can a DAWG be created? I have found that there are two ways: one is converting a trie into a DAWG, and the other is creating a new DAWG straight away. Which one is the easiest? Can you please elaborate on the two and provide some links?
One way to think about the DAWG is as a minimum-state DFA for all of the words in your word list. As a result, the traditional algorithm for constructing a DAWG is the following:
Start off by constructing a trie for the collection of words.
Add a new node to the trie with edges from itself to itself on all inputs.
For each missing letter transition in the trie, add a transition from that node to the new dead node.
(At this point, you now have a (probably non-minimum) DFA for the set of words.)
Minimize the DFA using the standard algorithm for DFA state minimization.
Once you have done this, you will be left with a DAWG for the set of words you are interested in.
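As a concrete illustration, for a finite word list the same minimal automaton can also be reached with a common shortcut: build the trie (step 1) and then merge structurally identical subtrees bottom-up, skipping the dead node and the general DFA minimization. A hedged sketch (names are illustrative):

#include <map>
#include <string>
#include <vector>

struct Node {
    bool terminal = false;
    int id = -1;                       // assigned once a node becomes canonical
    std::map<char, Node*> children;    // ordered, so signatures are deterministic
};

// Step 1 above: a plain trie over the word list.
Node* buildTrie(const std::vector<std::string>& words) {
    Node* root = new Node();
    for (const std::string& w : words) {
        Node* cur = root;
        for (char c : w) {
            Node*& child = cur->children[c];
            if (!child) child = new Node();
            cur = child;
        }
        cur->terminal = true;
    }
    return root;
}

// Bottom-up merge: two nodes are equivalent when they have the same terminal
// flag and the same labelled edges to already-canonicalized children; the
// signature string encodes exactly that. Duplicate nodes are simply abandoned
// here; a real implementation would free or pool them.
Node* merge(Node* node, std::map<std::string, Node*>& registry, int& nextId) {
    std::string sig(1, node->terminal ? '1' : '0');
    for (auto& [label, child] : node->children) {
        child = merge(child, registry, nextId);   // canonicalize children first
        sig += label;
        sig += std::to_string(child->id);
        sig += ',';
    }
    auto [it, inserted] = registry.emplace(sig, node);
    if (inserted) node->id = nextId++;            // first member of a new class
    return it->second;                            // canonical representative
}

// Usage sketch:
//   std::map<std::string, Node*> registry;
//   int nextId = 0;
//   Node* dawg = merge(buildTrie({"tap", "taps", "top", "tops"}), registry, nextId);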
The runtime of the construction described above is as follows. Constructing the initial DFA can be done by building a trie for all the original words (which takes time O(n), where n is the total number of characters in the input strings), then filling in the missing transitions (which takes time O(n|Σ|), where |Σ| is the number of different characters in your alphabet). From there, the minimization algorithm runs in time O(n²|Σ|). This means that the overall runtime for the algorithm is O(n²|Σ|).
To the best of my knowledge, there is no straightforward algorithm for incrementally constructing DAWGs. Typically, you would build a DAWG for a set of words only if you already had all the words in advance. Intuitively, this is true because inserting a new word that has some suffixes already present in the DAWG might require a lot of restructuring of the DAWG to make certain old accepting states not accepting and vice-versa. Theoretically speaking, this results because inserting a new word might dramatically change the equivalence classes of the DFA's distinguishability relation, which might require substantial changes to the DFA's structure.
Hope this helps!

Why does using std::vector<> instead of std::list<> cause an increase in code size? [closed]

In a project at work, a lot of std::list and std::vector are used. Since random inserts were seldom needed, I started to change the std::lists to std::vectors. But with every switch the resulting code size increased (not by a fixed amount, but roughly 1 kB on average). Given that std::vector was already used elsewhere, I don't see why switching a std::list to a std::vector should increase the code size. Any ideas why? The compiler used is g++.
Maybe you have added a vector of a new type (e.g. your original code used vector<int> and now you added a vector<string>: they are different types, so the code size will increase to include the instantiation of the new type).
Is this in debug mode? If yes, it could be inlined range checking code that increases the code size. Note that this is not so much necessary for lists, where you only need to check if the next node is null.
Okay, without further details we can only guess.
The vector-memory is contiguous (which is guaranteed by the standard), but list-memory is not. Therefore it might be that the compiler is able to vectorize and unroll your vector-based code better, which leads to bigger instructions and longer binary code.
std::vector contains more member functions and code than std::list (there is no random access in list, for instance), so more template code may get instantiated.

C++ Algorithm stability [closed]

How can I tell whether an algorithm is stable or not?
Also, how does this algorithm (Bucketsort) compare to Mergesort, Quicksort, Bubblesort, and Insertionsort?
At first glance it would seem that if your queues are FIFO, then it is stable. However, I think there is some context from class or other homework that would help you make a more solid determination.
From wikipedia:
Stability
Stable sorting algorithms maintain the relative order of records with equal keys. If all keys are different then this distinction is not necessary. But if there are equal keys, then a sorting algorithm is stable if whenever there are two records (let's say R and S) with the same key, and R appears before S in the original list, then R will always appear before S in the sorted list. When equal elements are indistinguishable, such as with integers, or more generally, any data where the entire element is the key, stability is not an issue.
http://en.wikipedia.org/wiki/Sorting_algorithm#Stability
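As a concrete illustration of that definition (not the Wikipedia example itself), here is a small sketch sorting pairs by their first component with std::stable_sort versus std::sort:

#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>

int main() {
    // Sort by the first component only; the second component records input order.
    std::vector<std::pair<int, char>> v{{3, 'a'}, {1, 'b'}, {3, 'c'}, {1, 'd'}};
    auto byKey = [](const auto& l, const auto& r) { return l.first < r.first; };

    auto stable = v;
    std::stable_sort(stable.begin(), stable.end(), byKey);
    // Guaranteed order: (1,'b') (1,'d') (3,'a') (3,'c') -- equal keys keep
    // their original relative order.

    auto unstable = v;
    std::sort(unstable.begin(), unstable.end(), byKey);
    // Keys are still sorted, but std::sort makes no promise about the relative
    // order of (1,'b') and (1,'d'), or of (3,'a') and (3,'c').

    for (const auto& p : stable) std::cout << '(' << p.first << ',' << p.second << ") ";
    std::cout << '\n';
}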
As far as comparing it to other algorithms goes, Wikipedia has a concise entry on that:
http://en.wikipedia.org/wiki/Bucket_sort#Comparison_with_other_sorting_algorithms
Also: https://stackoverflow.com/a/7341355/1416221