Algorithmic Methods of Inquiry: Design, Theory, Empirical, Testing, Education, Discovery
Abstraction in Programming Design
Abstraction means deliberately ignoring the details of a solution in order to concentrate on the overall features that are essential to the job at hand. Designing algorithms can often be complicated. Ignoring some of the complicating details and working on the solution at a higher level can help turn an overwhelmingly large problem into a series of smaller, individually manageable subproblems (divide and conquer).
In this course we will attempt to build data structures that will stand the test of time and use. That is, we will build into each structure the minimum set of behaviors needed to simulate the object of concern, and nothing more. The code to be designed and tested shall be as little as possible. Our data structure will be required to concentrate on only a single element at a time. No displays of data will take place in the data structure itself, other than for debugging purposes. All information regarding the performance of the structure must be unambiguously communicated to the calling program by way of return codes. Should we find that a feature might be desirable, we will first investigate whether the feature could be attained by combining the methods already available in the structure. Furthermore, if we can argue that a feature should not be included as a data structure behavior, then that feature, if needed, falls within the province of the calling program.
The Array
An ARRAY is an ordered set of a specific, finite number of same-type nodes, whose structural properties involve the relative positions of the nodes. The nodes are randomly (or directly) accessible by an integer index. The addresses of array members occur at contiguous locations in the virtual storage of the computer. An Array is by definition fixed in size: if you require a larger or smaller array you cannot alter an existing Array, but must create another and then populate it. Order is not of great importance in an Array. There are two basic operations for accessing an Array: extraction and storing. The number of elements in an Array is called its range. The first element in an Array occupies the zeroth slot; the maximum defined element occupies slot (N-1). You populate an Array by storing information into the first available free slot. To query or access a stored element you address it directly using its slot location index.
Further behaviors would require custom design. For example, to store an element into an already populated position may require that all elements on one side of the insertion point be shifted by one slot location to accommodate the change. Conversely, to remove an element without leaving a "gap" would require that all remaining elements be shifted by one slot location to close the empty position. To find an unordered element would require searching every element position.
While almost all programming languages provide us with an implementation of the Array, they may not provide all the behaviors that we need. Array implementations do commonly include the ability to query, store to, and extract from a particular location. But we may also want to store into the first available empty location, find the location of a given value, and provide the count of occupied slots. These functions we will have to program ourselves, as sketched below.
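One way these extra behaviors might be coded (a minimal sketch; the class and method names are illustrative, not the official lab API) follows the course's return-code convention:

```java
// Sketch of the three custom Array behaviors described above.
// Outcomes are reported to the caller through integer return codes.
public class ArrayDS {
    private final int[] slots;      // fixed-size backing storage
    private final boolean[] used;   // tracks which slots are occupied

    public ArrayDS(int range) {
        slots = new int[range];
        used = new boolean[range];
    }

    // Store value into the first available free slot.
    // Returns the slot index used, or -1 if the structure is fully populated.
    public int storeFirstFree(int value) {
        for (int i = 0; i < slots.length; i++) {
            if (!used[i]) { slots[i] = value; used[i] = true; return i; }
        }
        return -1; // boundary condition: structure is FULL
    }

    // Find the slot location of a given value.
    // Returns its index, or -1 if the value is not present.
    public int find(int value) {
        for (int i = 0; i < slots.length; i++) {
            if (used[i] && slots[i] == value) return i;
        }
        return -1;
    }

    // Count of occupied slots.
    public int count() {
        int n = 0;
        for (boolean b : used) if (b) n++;
        return n;
    }
}
```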
Minimal Design Considerations
Boundary conditions include the following: logic to handle the addition of the next element to a FULLY POPULATED data structure, and logic to handle the removal of an element from an UNPOPULATED data structure.
A note about your labs. All labs must conform to the following structure: Programming will adhere to Object Oriented Programming principles. We will have a User Interface that serves as the main program. The user interface instantiates the data structure that we are working with and calls on its methods to perform the required functions. The Data Structure will contain all the methods (routines) and properties (variables) that define its behaviors and its current state. Each program class instantiates the next level of data structure; for example, the user interface instantiates the Array class, and the Array class instantiates copies of the NodeElement class. By the time we study Hash Tables we will have a chain of dependent structures working for us: the User Interface program will call the methods of the Hash class, the Hash class will call the Double-Linked Queue class, the Double-Linked Queue class will call the Single-Linked Stack class, and the Single-Linked Stack will call the NodeElement class. Do Not Attempt to place all components into the same Class. You will be LOST.
To access a sample of User Interface source code for both JAVA and C++: arraytest.java, arraytest.cpp
To access the NodeElement data structure source code used by our labs, for both JAVA and C++: NodeElement.java, NodeElement.h, NodeElement.cpp
A STACK is a linear arrangement, built upon the foundation of an array, with the requirement that all insertions and deletions (and queries) are made at a single end of the structure, usually known as the TOP of the stack. The order in which elements can be retrieved from a stack is Last-In-First-Out (LIFO). Inserting a data item on the top of the stack is called "pushing" onto the stack, while removing an item from the top is called "popping" off the stack. Examining the contents of the top slot of the stack is called "peeking" or "querying the top".
The Stack can be thought of as an Array with certain additional constraints placed upon access to its elements. While any of an Array's nodes can be directly accessed, by definition you can only "see" the TOP node of a Stack.
The TOP property is all important to the Stack because it points to the only view that the user has of the Stack's contents.
Note: To save some coding effort you can consider making the Stack an extension of Array.
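A minimal sketch of such an array-based Stack (method names are illustrative; the labs report boundary conditions with return codes, while pop and peek here throw on the empty boundary condition for brevity):

```java
// All access goes through the TOP; no other slot is visible to the caller.
public class StackDS {
    private final int[] slots;
    private int top = -1; // index of the current TOP; -1 means empty

    public StackDS(int capacity) { slots = new int[capacity]; }

    // Push: returns 0 on success, -1 if the stack is fully populated.
    public int push(int value) {
        if (top == slots.length - 1) return -1; // FULL boundary condition
        slots[++top] = value;
        return 0;
    }

    // Pop: removes and returns the TOP value.
    public int pop() {
        if (top == -1) throw new IllegalStateException("stack empty");
        return slots[top--];
    }

    // Peek: examines the TOP without removing it.
    public int peek() {
        if (top == -1) throw new IllegalStateException("stack empty");
        return slots[top];
    }
}
```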
Types of Stacks: Simple Stack, Linked Stack, Dynamic Linked Stack
Where Stacks are used: cafeteria plates
An algorithm is RECURSIVE if it defines a procedure in terms of a simpler case of itself.
For a recursive definition to work, the definition in any given case must be well-defined, avoiding an infinite regress. That is: (1) every recursive call must simplify the computation in some way, and (2) there must be special termination conditions, or stopper cases, to handle the simplest computations directly, without the need to call an even simpler case of itself.
Sometimes the term inductive definition is used as a synonym, because a recursive definition, like mathematical induction, builds the general cases up from the simplest ones.
A recursive call can result in many further recursive calls, because each call divides the problem into new, smaller subproblems, or smaller cases of the same problem. The recursive function is said to converge to a solution.
For a recursive method to terminate, the problem must eventually be reduced to a stopping case. The stopping case returns a trivial result at the current level of the recursive procedure, requiring virtually no computation to produce an answer. The trivial result is usually a fundamental constant or a conventional mathematical identity.
For at least one argument, a recursive function must be defined in terms that do not involve calling the function itself. There must be a "way out" of the sequence of recursive calls. When it is reached, the method returns a result to its immediate caller. This caller, in turn, continues the computation at the point where it left off and returns another result to its own caller. This process continues, unwinding the sequence of recursive calls, until the result is percolated back up the calling chain to the original caller.
The difference between a circular definition and a recursive definition is that a recursive definition must have base cases. Base cases are those that satisfy the definition without being defined in terms of the definition itself, and all other cases comprising the definition must be successively "smaller".
Pitfalls
Infinite recursion happens either because the parameter values passed during a recursive call do not get simpler or because a special terminating "stopper" case is missing or never reached (i.e., the function provides no guarantee of convergence; it diverges or oscillates). Other pitfalls include a missing base case, excessive memory requirements, and excessive recomputation, as the sketch below illustrates.
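A small sketch of the excessive-recomputation pitfall (this is illustrative code, not the course's Fibonacci lab source), contrasting a naive recursive Fibonacci with a memoized version:

```java
public class FibDemo {
    // Naive: recomputes the same subproblems exponentially many times.
    static long fibNaive(int n) {
        if (n <= 1) return n;               // stopper cases
        return fibNaive(n - 1) + fibNaive(n - 2);
    }

    // Memoized: each subproblem is computed exactly once.
    static long fibMemo(int n, long[] memo) {
        if (n <= 1) return n;               // stopper cases
        if (memo[n] != 0) return memo[n];   // reuse a saved result
        memo[n] = fibMemo(n - 1, memo) + fibMemo(n - 2, memo);
        return memo[n];
    }

    public static void main(String[] args) {
        System.out.println(fibNaive(30));              // over a million calls
        System.out.println(fibMemo(30, new long[31])); // same answer, ~30 calls
    }
}
```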
Recursive function call implementation requirements:
Each call to the recursive function causes a new variable data save area to be allocated, and each reference to a variable in the function's data save area is local to the most recently allocated area. Each return, or unwinding, causes the current data save area to be released; the data area allocated immediately prior to the current area then becomes the new local data area.
Recursion vs. Iteration
The difference between a recursive function and an iterative function is that the iterative type calls for the explicit repetition of some process until a certain condition has been met. Recursion is generally more expensive in computing resources than non-recursive iterative methods: the run environment must store and recover all local and temporary variables and the "return control" addresses prior to calling and upon returning from each recursive operation. A stack is generally used for this purpose. The costs of the push and pop calls, as well as the handling of boundary conditions, can become expensive. In a non-recursive procedure the stacking of unnecessary variables can be eliminated. However, from a human programming point of view, the recursive solution may be the most natural and logical way of solving a problem.
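As a small sketch (not the recursetest lab code), here is the same factorial computed both ways; the recursive form mirrors the mathematical definition, while the iterative form avoids stacking a save area per call:

```java
public class FactorialDemo {
    // Recursive: the stopper case (n <= 1) ends the chain of calls.
    static long factRecursive(int n) {
        if (n <= 1) return 1;            // trivial stopping case
        return n * factRecursive(n - 1); // simpler case of itself
    }

    // Iterative: explicit repetition until the loop condition is met.
    static long factIterative(int n) {
        long result = 1;
        for (int i = 2; i <= n; i++) result *= i;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(factRecursive(10)); // 3628800
        System.out.println(factIterative(10)); // 3628800
    }
}
```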
To access the JAVA source code: recursetest, Factorial, Fibonacci, Sumtorial. To view a Tower of Hanoi algorithm demonstration: Launch
ALLOCATION is the acquisition of storage to accommodate the data structures and supporting code of a given task. DEALLOCATION is the releasing of previously allocated virtual storage.
Instantiation is an allocation of storage into which variables and program code are loaded at a unique location in virtual memory to support an instance (a copy) of that program. The instance is called an object. The instantiation process returns as its output a reference variable that serves as a pointer to the object created in virtual memory; the reference variable provides addressability to the instance. A special method in the instance, called a Constructor, is then invoked to request the initialization, if so desired, of the instance's state properties (a.k.a. variables). The operating system (OS) or Java Virtual Machine (JVM) services the request for instantiation.
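A minimal illustration in Java (the class name Widget is invented for this example):

```java
public class Widget {
    private int state; // a state property (variable)

    public Widget(int initial) { // Constructor
        this.state = initial;    // initialize the instance's state
    }

    public int getState() { return state; }

    public static void main(String[] args) {
        // `new` requests storage from the JVM; the returned reference
        // variable `w` provides addressability to the instance.
        Widget w = new Widget(42);
        System.out.println(w.getState()); // 42
    }
}
```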
Simple Queues
A QUEUE is a linear list (if linked members) or an array (if contiguous members) with the characteristic that all insertions are made at one end of the list and all deletions are made from the other end. Data is processed First-In-First-Out (FIFO). Like a stack, a queue can be thought of as an array with certain constraints placed upon access to its member elements.
BOUNDED QUEUES are of fixed length, such as the "pipe queue" and the "circular queue". In a pipe queue the items are inserted at one end and then migrate towards the other end. In a circular queue items are placed in the next sequential slot and do not move; instead, a set of pointers changes value, as the sketch below shows.
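A minimal circular queue sketch in Java (names are illustrative), where the head and tail pointers advance modulo the capacity while the items stay in place:

```java
public class CircularQueue {
    private final int[] slots;
    private int head = 0, tail = 0, count = 0;

    public CircularQueue(int capacity) { slots = new int[capacity]; }

    // Insert at the tail end; returns -1 if the queue is fully populated.
    public int enqueue(int value) {
        if (count == slots.length) return -1; // FULL boundary condition
        slots[tail] = value;
        tail = (tail + 1) % slots.length;     // pointer wraps around
        count++;
        return 0;
    }

    // Remove from the head end.
    public int dequeue() {
        if (count == 0) throw new IllegalStateException("queue empty");
        int value = slots[head];
        head = (head + 1) % slots.length;     // pointer wraps around
        count--;
        return value;
    }
}
```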
UNBOUNDED QUEUES can expand or contract, such as the "linked queue". A PRIORITY QUEUE has all the attributes of the standard QUEUE with the added capability that items can be inserted into (or removed from) the structure at any position (prioritizing).
LISTS
A LIST is an ordered set of elements. Order is imposed on the list by linkages from one node of the data structure to another. Lists differ from arrays in that they are not necessarily arranged in contiguous storage locations; the next element may appear anywhere in storage. Lists have a disadvantage compared to Arrays in that to access an element stored in a List you must traverse the structure by following the linkages, whereas the Array index permits direct random addressability. Lists have an advantage over Arrays when inserting or removing a node anywhere other than the structure's endpoints: only a simple adjustment of the linkage pointers is required.
A LINKED LIST is an ordered set of related elements that are logically connected, where each record is linked to the next by a single pointer.
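A minimal singly linked list sketch in Java (illustrative names), showing that insertion is only a pointer adjustment while lookup requires traversing the linkages:

```java
public class LinkedListDS {
    private static class Node {
        int info;
        Node next; // the single pointer to the next record
        Node(int info) { this.info = info; }
    }

    private Node head; // first node, or null if the list is empty

    // Insert a new node at the front: one pointer adjustment, no shifting.
    public void insertFront(int value) {
        Node n = new Node(value);
        n.next = head;
        head = n;
    }

    // Traverse the linkages to find a value; returns true if present.
    public boolean contains(int value) {
        for (Node cur = head; cur != null; cur = cur.next)
            if (cur.info == value) return true;
        return false;
    }
}
```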
Up until now we have studied LINEAR data structures (Array, Stack, Queue, List). With linear structures, data is arranged in a sequential manner: each structure has a logical beginning node and a logical ending node, and each interior node has a previous node and a following node.
You want to choose the best structure for your data on the basis of: logical representation of the type of data; the efficiency or cost of operations (CPU and time considerations); memory consumption; and ease of implementation and maintenance.
A TREE is a non-linear structure formed from a collection of nodes that allows us to organize data in a hierarchical (pyramid-like) manner. That is: there exists a primary node, and all others are connected by a single path, directly or indirectly, to that node. Each node in the tree may contain references both to a node closer to the primary node and to nodes further away from the primary.
Real-world tree applications occur in many places: genealogy, sports tournaments, corporate organizations, military ranking, operating system file-management structures, network routing, etc. Since trees are structurally more complex than linear representations, they have a more extensive nomenclature for referring to their various components: root, descendants, leaves, branches, etc.
Some general tree constraints: no node can be its own ancestor or its own descendant, and two nodes may not have the same child. (Either violation would introduce a loop during traversal operations.)
Binary Trees
A special case of the tree data structure is the BINARY TREE. A binary tree consists of a finite set of linked elements that is either empty or partitioned into at most 3 disjoint subsets. The first subset contains a single element called the ROOT of the tree. The other subsets are called the left and right subtrees of the original tree, and these subtrees are themselves binary trees.
A tree NODE represents a single informational element of the tree structure.
A node with no children is called a LEAF.
The LEVEL of a node in a tree is the node's distance (number of links) from the root.
The HEIGHT of a node in a tree is the node's distance (number of links) from its farthest descendant leaf.
The series of nodes connecting an ancestor to a descendant is called a PATH. Its length is known as the PATHLENGTH.
The number of CHILD links is always the number of nodes of a tree minus one.
Tree as a Recursive Data Structure
Definition of an Ordered Tree
A tree is said to be ordered if the following two requirements hold: (A) its left subtree contains only those values that are less than (or equal to) its root value, and (B) its right subtree contains only those values that are greater than its root value. Each subtree must itself be an ordered tree.
Visiting the nodes of a tree in a specific order is called a TRAVERSAL. Three traversal paths we will look at are: PRE-ORDER TRAVERSAL (also known as depth-first order) visits the root, then the left subtree in preorder, then the right subtree in preorder.
IN-ORDER TRAVERSAL (also known as symmetric order) visits the left subtree in inorder, then the root, then the right subtree in inorder.
POST-ORDER TRAVERSAL visits the left subtree in postorder, then the right subtree in postorder, followed by the root.
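All three traversals can be written as short recursive methods over a minimal node class (a sketch, not the lab's NodeElement):

```java
public class TreeNode {
    int info;
    TreeNode left, right;
    TreeNode(int info) { this.info = info; }

    static void preOrder(TreeNode t) {
        if (t == null) return;           // empty subtree: stopping case
        System.out.print(t.info + " ");  // root first
        preOrder(t.left);
        preOrder(t.right);
    }

    static void inOrder(TreeNode t) {
        if (t == null) return;
        inOrder(t.left);
        System.out.print(t.info + " ");  // root in the middle (symmetric order)
        inOrder(t.right);
    }

    static void postOrder(TreeNode t) {
        if (t == null) return;
        postOrder(t.left);
        postOrder(t.right);
        System.out.print(t.info + " ");  // root last
    }
}
```

Note that on an ordered tree, the in-order traversal visits the values in ascending order.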
A THREADED tree is one constructed so that the right links of all leaf nodes point to the next node to be visited during an in-order traversal. This provides a "short cut", an improvement in the number of node visits while traversing the tree. Programming demands for storage and computation are significantly reduced for a threaded tree.
A FIELD is a set of bytes that represents some abstract information. A RECORD is a collection of related fields.
A SORT is a process that rearranges the records of a file into a sequence that is ordered on one or more key(s). A sort is known as stable if the relative order of records with the same key is preserved by the sort.
A MERGE is the process of combining two or more sorted files into a single resultant sorted file.
Considerations in determining the efficiency of a particular sort: CPU time, memory, and external storage. Programs that require less wall clock time to perform a sort usually require more memory and more CPU cycles. Sorts that require less memory and CPU generally need more external storage to move data and more elapsed wall clock time. Sorts whose work can be broken up into disjoint, independent units can be dispatched on separate CPUs, where the processing can execute in parallel.
The speed classification or order of sorts
An algorithm is said to be of polynomial time if its running time is upper bounded by a polynomial expression in the size of the input for the algorithm, i.e., T(n) = O(n**k) for some constant k.
Exponential order sorts grow more rapidly in the processing time required as the number of items sorted (N) increases. An algorithm is said to be of exponential time if T(n) is bounded by O(2**(n**k)) for some constant k. That is, as the number of items to be sorted increases linearly, the time to perform the sort increases exponentially.
Sub-exponential time is used to express that the running time of some sort algorithm may grow faster than any polynomial but is still significantly smaller than an exponential. In this sense, problems that have sub-exponential time algorithms are somewhat more tractable than those that only have exponential algorithms.
Fastest to slowest: Logarithmic << Linear << Polynomial << Sub-Exponential < Exponential
Sort Complexity
The common sorting algorithms can be divided into two classes by the complexity of their algorithms. Algorithmic complexity is a non-trivial subject; suffice it to say that there is a direct correlation between the complexity of an algorithm and its relative efficiency.
Orders of Functions
Algorithmic complexity is generally written in a form known as Big-O notation, where the O represents the complexity of the algorithm and a value n represents the size of the set the algorithm is run against. Big-O notation is frequently used to describe a sort algorithm's use of computational resources, usually with regard to consumed CPU cycles and wall-clock time.
For example, O(n) means that an algorithm has linear complexity: it takes ten times longer to operate on a set of 100 items than it does on a set of 10 items (10 times the items, 10 times the time).
A complexity of O(log n) means that an algorithm has logarithmic complexity: it takes only about 1.5 times longer to operate on a set of 1,000 items than on a set of 100 items (log 1000 / log 100 = 3/2). This is extremely efficient.
If the complexity were O(n**2) (quadratic complexity), then it would take 100 times longer to operate on a set of 100 items than it does on a set of 10 items.
The class of sorts which run in O(n**2) time or less includes the bubble, insertion, selection, and shell sorts. The class which runs in O(n log n) time (polynomial order) includes the heap, merge, and quick sorts. A sort on the order of O(c**n) has geometric (exponential) complexity; a sort on the order of O(n!) has factorial complexity.
A Bubble Sort passes through the file sequentially several times. Each pass consists of comparing each element in the file with its successor neighbor (x[i] with x[i+1]) and interchanging the two elements if they are not in the proper order. Note that after the ith pass the last i elements are in their proper positions, so you can improve the sort time by reducing each scan to the remaining (n - i - 1) comparisons. The bubble sort is generally considered to be the most inefficient sorting algorithm in common usage. Under best-case conditions (the list is already sorted) the bubble sort can approach a linear O(n) level of complexity; the general case is an abysmal O(n**2).
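A bubble sort sketch in Java, including the shortened passes and the early exit that yields the O(n) best case:

```java
static void bubbleSort(int[] x) {
    for (int pass = 0; pass < x.length - 1; pass++) {
        boolean swapped = false;
        // only the remaining (n - pass - 1) neighbor pairs need checking
        for (int i = 0; i < x.length - pass - 1; i++) {
            if (x[i] > x[i + 1]) {  // out of order: interchange
                int tmp = x[i]; x[i] = x[i + 1]; x[i + 1] = tmp;
                swapped = true;
            }
        }
        if (!swapped) break; // no swaps: already sorted (the O(n) best case)
    }
}
```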
A Bi-directional Bubble Sort improves on the Bubble Sort by alternating the direction of successive passes through the file between the top and the bottom. In this way the smaller elements move quickly to the top of the order in the same way that the larger elements move to the bottom of the file. The sort saves some time by not resetting its pointers back to the beginning of the remaining unsorted items; each successive pass compares only the items left unsorted by the previous pass.
A Selection Sort works by scanning for the smallest item remaining in the unsorted part of the list, and then swapping it with the item in the next position to be filled in the sorted part of the list. The selection sort has a complexity of O(n**2). A relatively slow sort.
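A selection sort sketch in Java:

```java
static void selectionSort(int[] x) {
    for (int fill = 0; fill < x.length - 1; fill++) {
        int min = fill;
        // scan the unsorted remainder for its smallest item
        for (int i = fill + 1; i < x.length; i++)
            if (x[i] < x[min]) min = i;
        // swap it into the next position of the sorted part
        int tmp = x[fill]; x[fill] = x[min]; x[min] = tmp;
    }
}
```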
The Quick Sort, also called the partition exchange sort, first applies a partitioning method that moves a chosen element A (called the pivot) to its final position J. At this point all elements in positions 0 through J-1 are less than or equal to A, and all elements in positions J+1 through N are greater than or equal to A. The partitioning is then recursively reapplied to the newly formed partition to the left of the pivot and to the newly formed partition to the right of the pivot (not including the pivot in either sort), until all partitions are of size 1. At that point the file is sorted.
The efficiency of this algorithm is impacted by which element is chosen as the pivot point. The worst-case efficiency of the quick sort, O(n**2), occurs when the list is already sorted and the left-most element is chosen. Randomly choosing a pivot point rather than using the left-most element is recommended if the data to be sorted isn't random. As long as the pivot point is chosen randomly, the quick sort has an expected algorithmic complexity of O(n log n).
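A quick sort sketch in Java using the left-most element as the pivot (adequate when the input is random; otherwise choose the pivot randomly, per the note above):

```java
static void quickSort(int[] x, int lo, int hi) {
    if (lo >= hi) return;                   // partition of size <= 1
    int pivot = x[lo], i = lo, j = hi;      // x[lo] is the "hole"
    while (i < j) {
        while (i < j && x[j] >= pivot) j--; // scan down for a small element
        x[i] = x[j];                        // move it into the hole
        while (i < j && x[i] <= pivot) i++; // scan up for a large element
        x[j] = x[i];                        // hole moves up
    }
    x[i] = pivot;                           // pivot reaches its final position J
    quickSort(x, lo, i - 1);                // sort the left partition
    quickSort(x, i + 1, hi);                // sort the right partition
}
```

Call it as quickSort(x, 0, x.length - 1).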
The Heap Sort is an implementation of the general selection sort using the input array as a heap representing a descending priority queue. The pre-processing phase creates a heap of size N using the sift-up operation, and the selection phase redistributes the elements of the heap in order as it deletes elements from the priority queue using the sift-down operation. The Heap Sort has a complexity of O(n log n). In the average case the heap sort is not as efficient as the quick sort, requiring roughly twice as much time for random input.
INSERTION SORTS are similar to exchange sorts, but the essence of their algorithms is insertion into an existing sorted file. A Simple Insertion Sort is one that sorts a set of records by inserting each record into an already-sorted file, much as you would sort a hand of cards. Like the Bubble Sort, the Insertion Sort has a complexity of O(n**2). Although it has the same complexity, the insertion sort is a little over twice as efficient as the bubble sort.
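A simple insertion sort sketch in Java:

```java
static void insertionSort(int[] x) {
    for (int i = 1; i < x.length; i++) {
        int key = x[i], j = i - 1;
        while (j >= 0 && x[j] > key) {  // shift larger items right
            x[j + 1] = x[j];
            j--;
        }
        x[j + 1] = key;                 // insert into the opened slot
    }
}
```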
A Shell Sort sorts separate subfiles of the original file. These subfiles contain every Kth element of the original file. Each subfile is sorted by simple insertion, a new smaller K is chosen, and the file is again partitioned into a new set of subfiles and sorted again. This process is repeated with smaller values for K until K is set to 1. At this point the entire file is sorted. Invented by Donald Shell in 1959, the shell sort is the most efficient of the O(n**2) class of sorting algorithms. Of course, the shell sort is also the most complex of the O(n**2) algorithms.
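A shell sort sketch in Java; halving K each round is one common choice of gap sequence (Shell's original), though other sequences perform better:

```java
static void shellSort(int[] x) {
    for (int k = x.length / 2; k >= 1; k /= 2) {   // shrink K until it is 1
        // insertion-sort the subfiles of every Kth element
        for (int i = k; i < x.length; i++) {
            int key = x[i], j = i - k;
            while (j >= 0 && x[j] > key) {
                x[j + k] = x[j];
                j -= k;
            }
            x[j + k] = key;
        }
    }
}
```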
Parallel Execution SORTS are sorts whose work can be divided into disjoint units that can be dispatched simultaneously on separate processors (CPUs). While these sorts can be run on uniprocessors, they will not be high-performance there. The Odd-Even Transposition Sort and the Shear Sort are examples of parallel execution sorts.
MERGE SORTS involve combining two or more sorted files into a third sorted file. The merge sort sorts a file by dividing it into N single-element subfiles and repeatedly merging adjacent pairs of subfiles until only a single file of size N remains. The merge sort makes no more than O(n log n) comparisons (each merge pass involves fewer than N comparisons), but it does on average require twice as many assignments as the quick sort.
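A merge sort sketch in Java; call it as mergeSort(x, 0, x.length - 1):

```java
static void mergeSort(int[] x, int lo, int hi) {
    if (lo >= hi) return;                 // subfile of size 1
    int mid = (lo + hi) / 2;
    mergeSort(x, lo, mid);                // sort the left half
    mergeSort(x, mid + 1, hi);            // sort the right half
    // merge the two adjacent sorted subfiles into an auxiliary array
    int[] tmp = new int[hi - lo + 1];
    int i = lo, j = mid + 1, t = 0;
    while (i <= mid && j <= hi) tmp[t++] = (x[i] <= x[j]) ? x[i++] : x[j++];
    while (i <= mid) tmp[t++] = x[i++];
    while (j <= hi)  tmp[t++] = x[j++];
    System.arraycopy(tmp, 0, x, lo, tmp.length);
}
```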
To view a comparative Sort Algorithm demonstration Launch Here
The HASH Table is an associative array data structure that relates keys to data values. The primary operations it supports are efficient, rapid store and retrieval/lookup functions. It is employed where speed of access and quick retrieval of data are of primary importance. The data structure's elements are divided into (preferably) equal-sized categories, or buckets, to allow quick access to the elements. The hash function determines in which bucket an element belongs.
Two or more keys may hash to the same bucket (a collision). Thus the second part of the hashing search (or storage) requires a collision-resolution process.
Hash Key Computation
The arithmetic computation is usually simple to implement, but you must use caution to avoid various subtle pitfalls. If the resultant hash table is to hold N items or keys, then we must ensure that our hashing computation results in a number between [0, N-1]. The ideal hashing function should evenly distribute its output across the span of the table's keys, avoiding clumpiness due to collisions, and gaps. You must be familiar with the data and its variance to be able to generate a suitable hashing function and to predict the table size needed. Hashing uses a reasonable amount of both memory and computation time, striking a balance between the consumption of these two resources.
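An illustrative string-key hashing calculation in Java (a sketch, not necessarily the hashstringcalc pseudocode linked below): a polynomial rolling hash reduced modulo the table size, so the result always falls in [0, N-1]:

```java
// Assumes tableSize (N) is positive; a prime N helps spread the keys.
static int hash(String key, int tableSize) {
    int h = 0;
    for (int i = 0; i < key.length(); i++) {
        // fold in each character, keeping the value within [0, N-1]
        h = (31 * h + key.charAt(i)) % tableSize;
    }
    return h;
}
```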
Pseudocode for sample LinkListds: linklistds. Pseudocode for sample String Hashing Calculation: hashstringcalc. 100 random integers (1-1000) for input to the lab exercise: Click Here