Removing duplicate rows (SQL)

Oracle MySql DuplicatesAt least once in your career you will have to deal with duplicate rows causing havoc on you application.  This post will help you get through that dilema.  I have Oracle and MySql examples below.

ORACLE Specific

With Oracle with have the luxury of ROWID which uniquely identifies a row in an Oracle table. We will use this pseudo column to remove the duplicates.

Simple Method

This is the simplest method which removes the latest duplicate row added. If you want to remove the earliest you need to change  MAX to MIN and replace “less than”(<) with “greater than”(>).

DELETE FROM [TABLE] A
WHERE ROWID <  ( SELECT max(ROWID)
FROM [TABLE] B
WHERE A.[PRIMARY KEY FIELDS] = B.[PRIMARY KEY FIELDS]);

Another method with constraints

This is another method you can use if you have constraints. This method uses the Oracle function “Exists” which checks for the existence in a sub-query.

DELETE FROM [TABLE] A
WHERE CONTRAINT = [VARIABLE]
AND EXISTS ( SELECT ‘X’
FROM [TABLE] B
WHERE A.[PRIMARY KEY FIELDS] = B.[PRIMARY KEY FIELDS]
AND A.ROWID < B.ROWID);

MYSQL (These would also work in Oracle )

With MySQL  we do not have the tools that Oracle has to easily find the duplicates. But most MySql tables use an auto-increment ID field that helps us to identify the duplicates.

Simple Method

As with the Oracle method use the “greater than” sign to to keep the earliest row entered. We can change this  to pull the latest  by replacing “greater than” with “less than”.

DELETE A FROM [TABLE] as A, [TABLE] as B
WHERE A.[UNIQUE FIELD(S)] = B.[UNIQUE FIELD(S)]
AND A.ID > B.ID;

Without the ID field

Now it gets complicated in MySQL, we need to create a table with an unique ID then remove the duplicates. Once the dups have been remove  then  we can put the data back onto the original table.

Drop the dups table if it exists.

DROP TABLE IF EXISTS [TABLE]_dups;

Create the dups table.  This table will be the base table(table with dups) with the ID added.

CREATE TABLE [TABLE]_dups (
id INT(11) default NULL auto_increment,
[ALL TABLE COLUMNS],
PRIMARY KEY (id)
);

Insert into the dups table from the base table.

INSERT INTO [TABLE]_dups
SELECT NULL,[UNIQUE FIELD(S)]
FROM [TABLE];

Delete the dups using the SQL we used in the prior example

DELETE A FROM [TABLE]_dups as A, [TABLE]_dups as B
WHERE A.[UNIQUE FIELD(S)] = B.[UNIQUE FIELD(S)]
AND A.ID < B.ID;

Delete the Base table

DELETE FROM [TABLE];

Insert into the base table from the dups table.

INSERT INTO [TABLE]
SELECT [all columns less the ID field]
FROM [TABLE]_dups;

Remove the dups table

DROP TABLE [TABLE]_dups;

I hope this posts help you when you run into this problem… and we all run into this problem.