Managing Hierarchical Data in SQL

SQL databases are not designed for hierarchical data since the data is stored in flat relational tables with no built in parent-child relationship methods (Oracle does have the connect_by function to help).  XML databases on the other hand are built for hierarchical data.

So with SQL databases we have a problem that requires a solution.  Thanks to the SQL gurus out there we have three models to resolve this issue.  In this post I will point you to these resources.

What is hierarchical data?  The best example of a hierarchical model is a company organizational chart; other examples are product categories, forum categories and site maps.  With a hierarchical model you have a tree like structure that needs to be traversed to get to the node you’re looking for.

Hierarchical Data

The Adjacency list model

This is the simplest method to implement, but it’s the slowest due to recursion.  So if you know the table will be large, use one of the other models. For a small table this would be the answer.

You simply add a parent column that links to the parent entry.  The root entry will be NULL because it does not link to any other node.

Node Parent Node
Home
Products Home
CD’s Products
LP’s Products
Artists Home
Genre Artists
R&B Genre
Rock Genre
About US Home
CAT_ID NAME CAT_PARENT
1 Home
2 product 2
3 CD’s 2
4 LP’s 2
5 Artists 1
6 Genre 5
7 R&B 6
8 Rock 6
9 About Us 1

Traversing the table

SELECT t1.name AS lev1, t2.name as lev2, t3.name as lev3, t4.name as lev4

FROM category t1

LEFT JOIN category  t2 ON t2.cat_parent = t1.cat_id

LEFT JOIN category  t3 ON t3.cat_parent = t2.cat_id

LEFT JOIN category  t4 ON t4.cat_parent = t3.cat_id

WHERE t1.name = ‘Home’;

LEV1 LEV2 LEV3 LEV4
Home Artists Genre R&B
Home Artists Genre Rock
Home About Us
Home product LP’s
Home product CD’s

Resources

Trees in Oracle (using connect by)

Adjacency list defined

Adjacency list model (Joe Celko )

Materialized Path model

The idea with the Materialized path model is to link each node in the hierarchy with its position in the tree.  This is done with a concatenated list of all the nodes ancestors.  This list is usually stored in a delimited string.  Note the “Linage” field below.

CAT_ID NAME CAT_PARENT Lineage
1 Home .
2 product 1 .1
3 CD’s 2 .1.2
4 LP’s 2 .1.2
5 Artists 1 .1
6 Genre 5 .1. 5
7 R&B 6 .1. 5.6
8 Rock 6 .1. 5.6
9 About Us 1 .1

Traversing the table

Select lpad(‘-‘,length(t1.lineage))||t1.name listing

From category t1, category t2

Where t1.lineage like t2.lineage ||’%’

And t2.name = ‘Home’;

Order by t1.lineage;

Listing
Home
-product
–CD’s
–LP’s
-Artists
–Genre
—R&B
—Rock
-About Us

Resources

More Trees & Hierarchies in SQL

Trees in SQL

Using Materialized Path to create a paths table

The Nested Set Model

This is the best model for selecting information out of a relational database but slow in adding and deleting. So if you have a static structure this would be your choice.

This model has been championed by Joe Celko and is detailed in his articles and book “Trees and Hierarchies in SQL for Smarties
“.

Nested Set Model - Pure Performance
Nested Set Model
CAT_ID NAME CAT_PARENT Left Right
1 Home 1 18
2 product 1 2 7
3 CD’s 2 3 4
4 LP’s 2 5 6
5 Artists 1 8 15
6 Genre 5 9 14
7 R&B 6 10 11
8 Rock 6 12 13
9 About Us 1 16 17

Traversing the table

(t2 is the parent)

SELECT  lpad(‘-‘,COUNT(t2.name)-1)||t1.name,t1.left_node

FROM category t1,category t2

WHERE t1.left_node BETWEEN t2.left_node AND t2.right_node

GROUP BY t1.left_node,t1.name

order by t1.left_node;

Listing LEFT_NODE
Home 1
-product 2
–CD’s 3
–LP’s 5
-Artists 8
–Genre 9
—R&B 10
—Rock 12
-About Us 16

Resources

Storing Hierarchical Data in a Database (using PHP)

Managing Hierarchical data in MySql

Sql for Smarities article (Joe Celko)

Nested Set Model defined

Definitions

Node – Abstract basic unit used to build linked data structures. Each node contains data and possibly links to other nodes.

Root – Node with no parents.

Leaf – Node with no children

Child – direct subordinate node.

Sibling – Node that shares the same parent.

Ancestors – Parent or other superior node.

Descendants – all subordinate nodes

Removing duplicate rows (SQL)

Oracle MySql DuplicatesAt least once in your career you will have to deal with duplicate rows causing havoc on you application.  This post will help you get through that dilema.  I have Oracle and MySql examples below.

ORACLE Specific

With Oracle with have the luxury of ROWID which uniquely identifies a row in an Oracle table. We will use this pseudo column to remove the duplicates.

Simple Method

This is the simplest method which removes the latest duplicate row added. If you want to remove the earliest you need to change  MAX to MIN and replace “less than”(<) with “greater than”(>).

DELETE FROM [TABLE] A
WHERE ROWID <  ( SELECT max(ROWID)
FROM [TABLE] B
WHERE A.[PRIMARY KEY FIELDS] = B.[PRIMARY KEY FIELDS]);

Another method with constraints

This is another method you can use if you have constraints. This method uses the Oracle function “Exists” which checks for the existence in a sub-query.

DELETE FROM [TABLE] A
WHERE CONTRAINT = [VARIABLE]
AND EXISTS ( SELECT ‘X’
FROM [TABLE] B
WHERE A.[PRIMARY KEY FIELDS] = B.[PRIMARY KEY FIELDS]
AND A.ROWID < B.ROWID);

MYSQL (These would also work in Oracle )

With MySQL  we do not have the tools that Oracle has to easily find the duplicates. But most MySql tables use an auto-increment ID field that helps us to identify the duplicates.

Simple Method

As with the Oracle method use the “greater than” sign to to keep the earliest row entered. We can change this  to pull the latest  by replacing “greater than” with “less than”.

DELETE A FROM [TABLE] as A, [TABLE] as B
WHERE A.[UNIQUE FIELD(S)] = B.[UNIQUE FIELD(S)]
AND A.ID > B.ID;

Without the ID field

Now it gets complicated in MySQL, we need to create a table with an unique ID then remove the duplicates. Once the dups have been remove  then  we can put the data back onto the original table.

Drop the dups table if it exists.

DROP TABLE IF EXISTS [TABLE]_dups;

Create the dups table.  This table will be the base table(table with dups) with the ID added.

CREATE TABLE [TABLE]_dups (
id INT(11) default NULL auto_increment,
[ALL TABLE COLUMNS],
PRIMARY KEY (id)
);

Insert into the dups table from the base table.

INSERT INTO [TABLE]_dups
SELECT NULL,[UNIQUE FIELD(S)]
FROM [TABLE];

Delete the dups using the SQL we used in the prior example

DELETE A FROM [TABLE]_dups as A, [TABLE]_dups as B
WHERE A.[UNIQUE FIELD(S)] = B.[UNIQUE FIELD(S)]
AND A.ID < B.ID;

Delete the Base table

DELETE FROM [TABLE];

Insert into the base table from the dups table.

INSERT INTO [TABLE]
SELECT [all columns less the ID field]
FROM [TABLE]_dups;

Remove the dups table

DROP TABLE [TABLE]_dups;

I hope this posts help you when you run into this problem… and we all run into this problem.