Friday, 10 July 2015

Algorithm for dummies —— Sorting

As a code monkey, I am not very familiar with the fundamental algorithms. I decide to start learning those because of this wonderful book, which urges me to learn more about algorithms.

Bucket Sort

Sorting is the first algorithms introduced, since it exists everywhere. For instance, to sort 5 numbers 5, 3, 5, 2, 8, we have 8 5 5 3 2. How to do this effectively using computers? Very simple, a 1D array will do.

We initialize an array int a[11], with all zeros. The first number is 5, so we increment a[5] to 1. Doing this for all the numbers, we have a[2] = a[3] = a[8] = 1, a[5] = 2, while the rest are zeros. If we print out the nonzero elements in a, with its value indicating the number of times we print it, the result is "2 3 5 5 8".

This is a simplified version of bucket sort. It is analogous to have 11 buckets, indexed by 0-10. Every time we have a number, we put a flag in the corresponding bucket. We only need to count how many flags in each bucket. The purpose of bucket is to record the number of appearance.

Regarding complexity, we have to iterate m times to throw the flag in m buckets, and iterate n times to throw the flag for n numbers, and do the same again for checking, so the complexity is O(m+n+m+n), which is O(2*(m+n)). Usually we ignore the lesser constants and use upper case so is O(M+N).

The problem regarding bucket sort is that we can only print out the numbers, and imagine they are test scores, we cannot tell who gets what score. In order to do that, we need bubble sort.

Bubble Sort

The basic idea about bubble sort is to interchange two adjacent elements if their order is wrong. For instance, we want to sort 12 35 99 18 76 in descending order, then the smaller numbers are placed at the back. We start with 12, 35, and find 12 < 35, so we change them to 35 12 99 18 76. We then proceed in the same way and since 12 < 99, we have 35 99 12 18 76. At the end, we will have 12 at the last place. This process is analogous to how bubbles come out to the surface, so we call it bubble sort.

In each process, only the index of one number can be determined. In other words, the first time only determines the last number, and the second time only determines second last number. If we have n numbers, we have to place n - 1 numbers in the correct index. Each time, we start with the first number and compare two adjacent numbers. The core of bubble sort is double iteration, and it has O(N^2), which is high.

Quick Sort

Suppose we are sorting 6 1 2 7 9 3 4 5 10 8,we can randomly choose a reference number first, eg 6. We need to put all numbers smaller than 6 to its left while putting those larger than it to its right such as: 3 1 2 5 4 6 9 7 10 8.

For the original sequence, 6 is in the first, and we need to move it to position k such that on its left < 6 and its right > 6. To do this, we can start with two ends, first look for a smaller number from right to left, and then look for a larger number from left to right, then swap them. For the sequence above, we use two "soldiers" i, j. At first, i points to 6, and j points to 8. After searching, j points to 5 and i points to 7, then we swap i and j. Now the sequence is 6 1 2 5 9 3 4 7 10 8. Then the second iteration begins, and j stops after he finds 4. i stops after he finds 9 and then stops. They swap again. The sequence is 6 1 2 5 4 3 9 7 10 8. The inspection continues for the third iteration, and now j finds 3, whereas i could find any number until he meets j. The task is over. Just swap 6 and 3, and it becomes 3 1 2 5 4 6 9 7 10 8.

Next we need to deal with the sequence on the left and right of 6. Starting with left one, 3 1 2 5 4, and use 3 as reference, then we have 2 1 3 5 4. Subsequently, we deal with 2 1 and 5 4. and then 9 7 10 8, we get 1 2 3 4 5 6 7 8 9 10. In essence, every iteration deals with locating the reference number.

The reason why quick sort is quick, is because it jumps over numbers, instead of comparing adjacent ones like bubble sort. In the worst case, it is $O(N^2)$, but the average is $O(NlogN)$.

Example

Say we want to build a library, we need to buy books. Every book has a unique ISBN number. The librarian asks the students to report the ISBN of their favourite books. The best sellers will be reported multiple times and the repeated ISBN needs to be deleted. Then the books are sorted according to ISBN.

The first input is n, represent n students. The second input is all the ISBN numbers. The first output is k, number of books to buy. The second input is all the ISBN in ascending order.

There are two ways. 1. Delete the repetitions, and then sort. 2. Sort first and delete repetitions during output.

For method 1, we can use bucket sort, which has $O(M+N)$. For method 2, we can use bubble sort or quick sort, and then only print out one number of repetitions. Sorting has $O(N^2)$, and both input output have O(N). We drop 2N and the final one is $O(N^2)$.





No comments:

Post a Comment