|
Comments
Did you read today's front page stories & breaking news?
SYS-CON.TV
|
General Java Pointer Free Data Structures
Pointer Free Data Structures
Mar. 1, 1996 12:00 AM
Last month, we began looking at building data structures in Java. The idea for the article was inspired by the constant posts to comp.lang.java from people who were lost without pointers. The data structures which we introduced were useful, but were really more of a starting point for some more advanced data structures. The third data structure covered last month was a binary search tree. As you will remember, binary search trees are great for storing large amounts of information, as search time is much faster than for a linked-list. The problem with binary trees, however, is the worst case scenario. When the input stream is already in order (reverse or forward), then our binary tree becomes a simple linked list. This, of course, can bring our worst search time from O(lgN) all the way down to O(N) which is no good when you have to search large amounts of data. The term applied to trees which have become weighted to one side is unbalanced. Figure 1 shows an example of a very unbalanced tree, the result of an input stream of 1-2-3-4-5-6-7. A solution to our problem is obvious; we must make sure that we do not allow our trees to become unbalanced. This, however, is easier said than done. It obviously requires manipulation of a variety of links and if the utmost attention is not spent implementing your algorithms then you can end up with a terrible mess of nodes which will be no fun to untangle. Various computer scientists have come up with balanced tree algorithms. One of the more popular implementations is the Red-Black tree implementation. These algorithms are actually based on another, more complicated, type of balanced tree: 2-3-4 trees. Before I confuse anyone, let's fall into some theory. If you are already familiar with these topics then feel free to jump to the next section, where we begin our pointer-free implementations. If you are unfamiliar with these topics or if you are just a little rusty, then you will have a full understanding in a few minutes. Since Red-Black balanced trees are based on 2-3-4 balanced trees we will start our discussion there. Then we will discuss their disadvantages, and finally we will discuss Red-Black balanced trees. One of the main constraints of binary search trees is the fact that they can only connect to two unique nodes. 2-3-4 balanced trees - as the name implies - get around this by allowing each node to link to 2, 3, or 4 unique nodes. Additionally, each node is allowed to have 1, 2, or 3 keys. Figure 2 shows our tree from Figure 1 implemented as a 2-3-4 balanced tree. See how much better everything is now. If you were paying attention to some of my earlier comments, then you will remember when I stated that balanced trees were easy to discuss in theory, but were difficult to implement. This also applies to 2-3-4 trees. It is not hard to see that the methods used to manipulate these trees would be a mess of code that only the hard-core geeks could enjoy. A happy medium of binary search trees and 2-3-4 balanced trees has been developed, and this is what we will implement in this month's column. The "happy medium" comes in the form of something called a Red-Black balanced tree, which is basically a binary representation of a 2-3-4 balanced tree. The caveat here is that the node class uses an extra field to hold a color variable, and we represent different node structures (2, 3, or 4) by altering each node's color variable. Take a look at Figure 3. Here I show how we will represent 3 and 4-nodes by altering the color variable of different binary nodes. Figure 4 shows the tree from Figures 1 and 2 represented as a Red-Black balanced tree.
There are a few rules which Red-Black trees must conform to, and we will institute various checks throughout our methods to insure that these rules are conformed to at all times. The rules are: Let's take a break from theory now, and begin coding. If you are still a little unclear as to just how we will insure that the rules will be adhered to, don't fret; we will build two methods (rotate and split) which will take care of everything for us. First, let's build an object which will represent one node. The object will need to have fields for the key (for simplicity we are using ints), and for the color. To save space we are using just one byte for the color, which will take a value of 1 for red or 0 for black. Additionally, we will need to create links to the left, right, and parent nodes. Listing 1 shows the node class. The code is basic, and can be easily understood from the in line comments. It should be noted that all of the get methods are designed to throw an exception if they are asked for a null node. This exception is defined in Listing 2. As the first comment stated, RBNode will never be manipulated directly by the user. All interaction with the tree will be handled by creating a new instance of the RBTree class (Listing 3). RBTree implements two methods, rotate and split, which manage the balancing of our tree. Let's first take a look at how they would deal with the insertion of a 11 into a Red-Black balanced tree which already contains a few nodes. Once I demonstrate what they do, I will discuss how they do it, and why this is necessary. Note: If you feel that unnecessary time is spent transferring extra objects through the methods (grand, great, curr, child, etc...) then by all means rewrite the code. I simply felt that the code was much easier to understand this way. If anyone does implement the algorithm without the object passing then please do some benchmarks and let me know which is faster; I am always eager to see my code improved upon. Plus, you will get your results published in next month's edition! Figure 5 shows a Red-Black balanced tree waiting for insertion of the number 11. As soon as insert attaches the new node containing 11 to the end of the tree (figure 6), it calls split and passes references to the parent, grand, and great objects. It then checks to see if the parent is a red node; if it is, one to two rotations are preformed on the links. Having to preform two rotations is obviously not something that we want to do too often. Sacrificing too much speed on an insert is not something that we want regardless of how fast a search is. Red-Black trees are great for storing a dynamic data set. If it takes too long to add to that data set then we will not be too pleased. Split by itself does not really do too much work; rotate does the meat of the reorganizing. For our example, only the second call to rotate will happen. It will be passed by the int which has just been inserted, and also the great (great-grandparent) node. It should be noted that cases where two rotations are required are not likely to occur often. Rotate now uses the key passed into it to "rediscover" the locations of great's child and grandchild. The rotate method wraps itself up by preforming three "link restructures". As I feel closely tied to the idiom that ends "...show me and I understand", I hope that you now have a detailed understanding of how Red-Black trees stay balanced. Before I leave you with the final code listing I want to stress a few points regarding the function of our methods. First, it is important to note that 4-nodes are something which we try to keep to a minimum. If you take a look at the snippet from our insert method, you will notice that as we perform each insert, any 4-nodes which are encountered are immediately broken up:
//is the current node a "4 node"
Also, you should note the standard points where the node structure is manipulated. These points allow us to follow the set up rules for Red-Black trees which we defined earlier. The points, with their associated action are:
Related material and references: Reader Feedback: Page 1 of 1
Latest Cloud Developer Stories
Subscribe to the World's Most Powerful Newsletters
Subscribe to Our Rss Feeds & Get Your SYS-CON News Live!
|
SYS-CON Featured Whitepapers
Most Read This Week
Breaking Cloud Computing News |
|||||||||||||||||||||||||||||||||||||||||||||||||