Date posted: 27-Jun-2015
Category: Education
Uploaded by: bansal-ashish
Medians & Order Statistics
Data Structures & Algorithms
Medians & Order Statistics
What are they?
The ith order statistic of a set of n elements is defined as the ith smallest element in the set
E.g., the minimum is the first order statistic, and the maximum is the nth (last) order statistic
The median is informally the “halfway” point of the set: there is one median (if n is odd) or two (if n is even)
Medians & Order Statistics
This chapter deals with finding a particular order statistic in a set
We know we can use a sorting algorithm to find an order statistic in O(n log n) time, by sorting the data first
There are faster algorithms, however, that don’t require a sort
Minimum and Maximum
A basic algorithm runs in O(n) time: just scan the set and keep track of the smallest element seen so far
This is the best we can do: to find the min (or max) we must examine every element, which takes n - 1 comparisons, so no algorithm can beat O(n)
Type FindMin(Type data[], int length)
{
    Type min = data[0];
    for ( int i = 1 ; i < length ; ++i )
        if ( data[i] < min )
            min = data[i];
    return min;
}
Simultaneous min and max
Sometimes it’s useful to come up with both at the same time
We can run separate algorithms, or modify the original to keep track of both
What about finding the second smallest element?
Write an algorithm to compute the second smallest element. How many comparisons
are required?
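For the simultaneous min and max case mentioned above, one possible sketch modifies the FindMin scan to keep track of both values in a single pass. The name FindMinMax and the typedef standing in for Type are assumptions made for illustration, not part of the original slides.

```cpp
#include <cassert>
#include <utility>

// Assumption: Type stands in for whatever element type the slides intend.
typedef int Type;

// Hypothetical helper: returns {smallest, largest} of data[0..length-1]
// in one pass. Requires length >= 1.
std::pair<Type, Type> FindMinMax(const Type data[], int length)
{
    assert(length > 0);
    Type min = data[0];
    Type max = data[0];
    for (int i = 1; i < length; ++i) {
        if (data[i] < min)
            min = data[i];          // new smallest; cannot also be the largest
        else if (max < data[i])
            max = data[i];          // new largest
    }
    return std::make_pair(min, max);
}
```

This version uses at most 2(n - 1) comparisons, the same bound as running two separate scans, but it reads the data only once.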
Selection in Expected Linear Time
A general solution would be more useful
It would seem like the type of problem that would be hard to solve: at least O(n log n)
In reality it can be accomplished in O(n) expected time, using a divide-and-conquer algorithm
Randomized Select
Recall the Randomized QuickSort algorithm
As in QuickSort, the idea is to partition the input recursively
Unlike QuickSort, Randomized Select only cares about one of the partitions: the partition containing the order statistic you’re looking for
The expected running time for this algorithm is O(n)
Randomized Select requires the Randomized Partition algorithm previously discussed
The Randomized Select Algorithm
RandomizedSelect(A, begin, end, i)
{
    if ( begin == end )
        return A[begin];
    q = RandomizedPartition(A, begin, end)
    k = q - begin + 1
    if ( i <= k )
        return RandomizedSelect(A, begin, q, i)
    else
        return RandomizedSelect(A, q+1, end, i-k)
}
The Randomized Select Algorithm
First, partition the array
This guarantees that every element in A[begin..q] is less than or equal to every element in A[q+1..end]
Then compute how many elements are in the subarray A[begin..q]
This is just k = q-begin+1 (since begin may be non-zero)
The k smallest elements of A[begin..end] are now in A[begin..q] (if the partition element itself lands at position q, k is also its order statistic)
The Randomized Select Algorithm
Because of the partitioning, we know which partition the order statistic must be in
If i <= k, then recursively search the left partition for order statistic i
If i > k, then recursively search the right partition for order statistic i - k
We already know that k values are no larger than the smallest element in this partition
We’re looking for the (i-k)th smallest element in that partition
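The pseudocode and walkthrough above can be sketched in C++ as follows. The slides do not show RandomizedPartition, so the version here is an assumption: a Hoare-style partition around a randomly chosen pivot, which returns q with begin <= q < end and is the form that matches a recursive call on A[begin..q]. The element type int and the use of std::rand are also illustrative choices.

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>

// Assumed form of RandomizedPartition (not shown in the slides).
// Requires begin < end. Returns q with begin <= q < end such that every
// element of A[begin..q] is <= every element of A[q+1..end].
int RandomizedPartition(int A[], int begin, int end)
{
    int r = begin + std::rand() % (end - begin + 1);  // random pivot index
    std::swap(A[begin], A[r]);
    int pivot = A[begin];
    int i = begin - 1;
    int j = end + 1;
    while (true) {
        do { --j; } while (pivot < A[j]);  // scan left for an element <= pivot
        do { ++i; } while (A[i] < pivot);  // scan right for an element >= pivot
        if (i < j)
            std::swap(A[i], A[j]);
        else
            return j;
    }
}

// Returns the i-th smallest element (1-based) of A[begin..end].
// Rearranges A as a side effect.
int RandomizedSelect(int A[], int begin, int end, int i)
{
    assert(begin <= end && i >= 1 && i <= end - begin + 1);
    if (begin == end)
        return A[begin];
    int q = RandomizedPartition(A, begin, end);
    int k = q - begin + 1;                 // size of the low partition
    if (i <= k)
        return RandomizedSelect(A, begin, q, i);
    else
        return RandomizedSelect(A, q + 1, end, i - k);
}
```

Note that i is 1-based here, as in the pseudocode, so RandomizedSelect(A, 0, n - 1, 1) returns the minimum of an n-element array.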
Analysis of Randomized Select
Worst case, O(n^2)
We could get unlucky and partition around the largest or smallest remaining element every time
This is unlikely, since the pivot is chosen at random
The average case is somewhat more complicated (see the formula on page 189) but amounts to O(n)
Generic Programming
Generic programming is “programming using types as parameters”
The idea of generic programming is to write code that is data-type independent
Many algorithms and data structures that we discuss will operate independently of data type
Generic programming provides a way of writing the code once, then specifying the data type to operate on later
Reference: The C++ Programming Language, by Bjarne Stroustrup
Generic Programming in C++
The principle of generic programming in C++ is implemented via templates
Templates provide a way to represent a wide range of general concepts and simple ways to combine them
Template Functions
The C++ compiler deduces the template arguments from the function arguments
Calling the FindMin template is the same as calling any other function:
    int some_min = FindMin(some_array, SIZE);
template <typename Type>
Type FindMin(Type data[], int length)
{
    Type min = data[0];
    for ( int i = 1 ; i < length ; ++i )
        if ( data[i] < min )
            min = data[i];
    return min;
}
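To illustrate the deduction described above, the sketch below calls the same FindMin template with two different element types. The helper functions SmallestInt and SmallestWord and the added length check are assumptions for illustration only.

```cpp
#include <cassert>
#include <string>

template <typename Type>
Type FindMin(Type data[], int length)
{
    assert(length > 0);               // added precondition check (assumption)
    Type min = data[0];
    for (int i = 1; i < length; ++i)
        if (data[i] < min)
            min = data[i];
    return min;
}

// Hypothetical helper: Type is deduced as int from the array argument.
int SmallestInt()
{
    int values[] = {8, 3, 5};
    return FindMin(values, 3);
}

// Hypothetical helper: Type is deduced as std::string, which is compared
// lexicographically by operator<.
std::string SmallestWord()
{
    std::string words[] = {"pear", "apple", "fig"};
    return FindMin(words, 3);
}
```

The only requirement FindMin places on Type is that it can be copied and compared with operator<.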
Template Functions
You are not limited to one template parameter
Multiple parameters are listed as a comma-separated list:
    template <typename T, typename U> …
Template parameters aren’t even limited to typenames:
    template <typename T, int i> …
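As a rough sketch of both forms, the example below defines one template with two type parameters and one with a type parameter plus an int parameter. The names ConvertTo, SumFirst, and DemoTemplateParameters are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>

// Two type parameters: T is given explicitly by the caller, U is deduced
// from the argument.
template <typename T, typename U>
T ConvertTo(U value)
{
    return static_cast<T>(value);
}

// A type parameter plus a non-type (int) parameter: sums the first Count
// elements of data. Count must be a compile-time constant.
template <typename T, int Count>
T SumFirst(const T data[])
{
    assert(data != nullptr);
    T total = T();                    // zero for built-in numeric types
    for (int i = 0; i < Count; ++i)
        total = total + data[i];
    return total;
}

// Hypothetical usage.
void DemoTemplateParameters()
{
    int values[] = {1, 2, 3, 4};
    assert(ConvertTo<int>(3.9) == 3);           // double truncated to int
    assert((SumFirst<int, 3>(values) == 6));    // 1 + 2 + 3
}
```

Because Count cannot be deduced from a pointer argument, it has to be written explicitly at the call site.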
Template Functions
There are rare occasions when the compiler cannot deduce the type of the template argument
E.g., when the argument is only used as a return type
In these cases, explicit specification can be used
template <typename T>
T* create()
{
    return new T;
}
…
SomeClass *p = create<SomeClass>();
Template Functions
Template functions can also be overloaded, both with other template functions and with non-template functions
This may be required in a number of situations:
  For some types, you can use a different (more efficient) algorithm
  Multiple type deductions can be made, such that the compiler can’t decide which version to use
template <typename T> T sqrt(T);
template <typename T> complex<T> sqrt(complex<T>);
double sqrt(double);
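To show how the compiler chooses among overloads like these, here is a small sketch with the same shape: a generic template, a more specialized template, and a non-template function. The name Describe and its return strings are invented for illustration and do not come from the slides.

```cpp
#include <cassert>
#include <string>

// Generic version: used when nothing more specific matches.
template <typename T>
std::string Describe(T) { return "generic"; }

// More specialized template: selected for any pointer argument.
template <typename T>
std::string Describe(T*) { return "pointer"; }

// Non-template overload: preferred when the argument is exactly a double.
std::string Describe(double) { return "double"; }

// Hypothetical usage showing which overload each call selects.
void DemoOverloads()
{
    int x = 0;
    assert(Describe(1) == "generic");     // T deduced as int, exact match
    assert(Describe(2.0) == "double");    // non-template wins on a tie
    assert(Describe(&x) == "pointer");    // T* is more specialized than T
}
```

When a template and a non-template function match equally well, the non-template function is chosen; between two templates, the more specialized one is chosen.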
Template Classes
C++ also provides for template classes
template <typename T>
class SomeArray {
public:
    SomeArray();
    T& ItemAt(int index);
    void SetItemAt(int index, const T& value);
    …
private:
    T m_data[SIZE];
};
Template Classes
Instantiating an object from a template class takes a little more work
You must specify the type
The resulting object can be used like any other object
SomeArray<double> array_of_doubles;
array_of_doubles.SetItemAt(0, 2.0);
double d = array_of_doubles.ItemAt(0);
Template Classes
As with function templates, class template parameters are not limited to generic types
Other data, such as an integer constant, can also be provided:
template <typename Type, int Storage>
class SomeArray {
public:
    SomeArray();
    Type& ItemAt(int index);
    …
private:
    Type m_data[Storage];
};
…
SomeArray<double, 1000> array_of_doubles;
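The slides only declare SomeArray, so the following is one possible way to complete it, given as a minimal sketch. The member bodies, the zero-initializing constructor, and the bounds check are all assumptions rather than the author's implementation.

```cpp
#include <cassert>

template <typename Type, int Storage>
class SomeArray {
public:
    // Assumption: value-initialize every slot (zero for built-in types).
    SomeArray() : m_data() {}

    // Returns a reference so callers can read or modify the element.
    Type& ItemAt(int index)
    {
        assert(index >= 0 && index < Storage);  // assumed bounds check
        return m_data[index];
    }

    void SetItemAt(int index, const Type& value)
    {
        ItemAt(index) = value;
    }

private:
    Type m_data[Storage];   // capacity is fixed at compile time
};
```

With this sketch, SomeArray<double, 1000> and SomeArray<double, 10> are distinct types, since the storage size is part of the template arguments.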
Some Cautions About Templates
Templates provide a convenient way of writing an algorithm or data structure only once
For each instantiated template, the compiler creates a separate piece of compiled code
E.g., SomeArray<double>, SomeArray<int>, and SomeArray<SomeClass> create three different implementations of SomeArray in memory
Templates are considered a primary contributor to code bloat because of this property
Care should be taken in template classes to only include methods that depend on the templated type
Other functionality can be moved to a standalone function, or to a non-template base class
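One way to apply this advice is sketched below: the index-checking logic, which does not depend on the element type, lives in a non-template base class, and only the storage and element access stay in the template. The class name SomeArrayBase and its members are hypothetical.

```cpp
#include <cassert>

// Non-template base: its code exists once, no matter how many
// SomeArray instantiations the program uses.
class SomeArrayBase {
public:
    int Capacity() const { return m_capacity; }
    bool IsValidIndex(int index) const
    {
        return index >= 0 && index < m_capacity;
    }

protected:
    explicit SomeArrayBase(int capacity) : m_capacity(capacity) {}

private:
    int m_capacity;
};

// Only the members that depend on Type remain in the template.
template <typename Type, int Storage>
class SomeArray : public SomeArrayBase {
public:
    SomeArray() : SomeArrayBase(Storage), m_data() {}

    Type& ItemAt(int index)
    {
        assert(IsValidIndex(index));
        return m_data[index];
    }

private:
    Type m_data[Storage];
};
```

The saving is small for a class this simple, but the same split pays off when the type-independent part is large.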