DataWright Information Services

Consulting and Resources for Excel and Access




Deleting duplicate records in Access

  • The download file is available
    here.

Often when importing records you will need to deal with
duplicates. There are a number of ways to do this; one is to import
everything and then remove any duplicates, and the cleanest way to
do that is by using a subquery.

This example is based on part of the Order Details table from the
Northwind database. It contains unique records and duplicates
in a table called LineItems. This article describes three queries,
so that you can see the process behind building the final Delete
query.

Flagging the duplicate records

The first query has a subquery as an expression. This subquery is
used in the other two queries, so it will be described once only.
The SQL (written as a single line but broken up here for clarity)
looks like this:

(SELECT COUNT (*) FROM LineItems AS L 
WHERE (L.OrderID & L.Product & L.UnitPrice 
  = LineItems.OrderID & LineItems.Product & LineItems.UnitPrice) 
  AND (L.LineItemID <= LineItems.LineItemID)) AS Counter
  • The subquery is enclosed in parentheses and then given a
    field name (Counter). The field name is only required when you
    want to display the field; if it is hidden the field name /
    alias is unnecessary.
  • The subquery is based on the same table as the main query,
    but it too is given an alias (L) to distinguish the two.
  • The WHERE clause uses two criteria. The first compares
    however many fields are needed to distinguish duplicates (in
    this case three), building a concatenated key for the
    comparison. The second numbers all matching records in ID order.

The full SQL for the query looks like this:

SELECT LineItems.*, 
(SELECT COUNT (*) FROM LineItems AS L 
WHERE (L.OrderID & L.Product & L.UnitPrice 
  = LineItems.OrderID & LineItems.Product & LineItems.UnitPrice) 
  AND (L.LineItemID <= LineItems.LineItemID)) AS Counter
FROM LineItems;

The extra field displays the duplicate number for each record.

 The subquery expression shows the duplicate number for each record

Showing only duplicates

To display only the duplicates, two changes are made in the grid.
The first is to hide the subquery field and the second is to put >1
in its criteria. The resulting SQL looks like this:

SELECT LineItems.*
FROM LineItems
WHERE ((((SELECT COUNT (*) FROM LineItems AS L 
WHERE (L.OrderID & L.Product & L.UnitPrice 
  = LineItems.OrderID & LineItems.Product & LineItems.UnitPrice) 
  AND (L.LineItemID <= LineItems.LineItemID)))>1));

There are two structural changes in this query, which now
displays only the duplicate records.

  • There are now two WHERE clauses, both after the FROM clause
  • Access has gone berserk with the parentheses. The extras are
    not required but the query engine will add them back if you
    remove them, so I usually leave them as they are.
 The second query only displays duplicate records

Deleting duplicates

The final step is to change the first SELECT to DELETE. This
query will remove the duplicates and leave the originals (and any
unique imports) alone.

DELETE LineItems.*
FROM LineItems
WHERE ((((SELECT COUNT (*) FROM LineItems AS L 
WHERE (L.OrderID & L.Product & L.UnitPrice 
  = LineItems.OrderID & LineItems.Product & LineItems.UnitPrice) 
  AND (L.LineItemID <= LineItems.LineItemID)))>1));

It's not necessary to build the first two stages if you want to
delete duplicated records but this approach will help you to check
the results as you go, so that you can be confident of removing only
the unwanted duplicate records.

Run this query and the duplicates will be purged.