DataWright Information Services

Consulting and Resources for Excel and Access

Deleting duplicate records in Access

  • The download file is available here.

Often when importing records you will need to deal with duplicates. There are a number of ways to do this; one is to import everything and then remove any duplicates, and the cleanest way to do that is by using a subquery.

This example is based on part of the Order Details table from the Northwind database. It contains unique records and duplicates in a table called LineItems. This article describes three queries, so that you can see the process behind building the final Delete query.

Flagging the duplicate records

The first query has a subquery as an expression. This subquery is used in the other two queries, so it will be described once only. The SQL (written as a single line but broken up here for clarity) looks like this:

(SELECT COUNT (*) FROM LineItems AS L 
WHERE (L.OrderID & L.Product & L.UnitPrice 
  = LineItems.OrderID & LineItems.Product & LineItems.UnitPrice) 
  AND (L.LineItemID <= LineItems.LineItemID)) AS Counter
  • The subquery is enclosed in parentheses and then given a field name (Counter). The field name is only required when you want to display the field; if it is hidden the field name / alias is unnecessary.
  • The subquery is based on the same table as the main query, but it too is given an alias (L) to distinguish the two.
  • The WHERE clause uses two criteria. The first compares however many fields are needed to distinguish duplicates (in this case three), building a concatenated key for the comparison. The second numbers all matching records in ID order.

The full SQL for the query looks like this:

SELECT LineItems.*, 
(SELECT COUNT (*) FROM LineItems AS L 
WHERE (L.OrderID & L.Product & L.UnitPrice 
  = LineItems.OrderID & LineItems.Product & LineItems.UnitPrice) 
  AND (L.LineItemID <= LineItems.LineItemID)) AS Counter
FROM LineItems;

The extra field displays the duplicate number for each record.

 The subquery expression shows the duplicate number for each record

Showing only duplicates

To display only the duplicates, two changes are made in the grid. The first is to hide the subquery field and the second is to put >1 in its criteria. The resulting SQL looks like this:

SELECT LineItems.*
FROM LineItems
WHERE ((((SELECT COUNT (*) FROM LineItems AS L 
WHERE (L.OrderID & L.Product & L.UnitPrice 
  = LineItems.OrderID & LineItems.Product & LineItems.UnitPrice) 
  AND (L.LineItemID <= LineItems.LineItemID)))>1));

There are two structural changes in this query, which now displays only the duplicate records.

  • There are now two WHERE clauses, both after the FROM clause
  • Access has gone berserk with the parentheses. The extras are not required but the query engine will add them back if you remove them, so I usually leave them as they are.
 The second query only displays duplicate records

Deleting duplicates

The final step is to change the first SELECT to DELETE. This query will remove the duplicates and leave the originals (and any unique imports) alone.

DELETE LineItems.*
FROM LineItems
WHERE ((((SELECT COUNT (*) FROM LineItems AS L 
WHERE (L.OrderID & L.Product & L.UnitPrice 
  = LineItems.OrderID & LineItems.Product & LineItems.UnitPrice) 
  AND (L.LineItemID <= LineItems.LineItemID)))>1));

It's not necessary to build the first two stages if you want to delete duplicated records but this approach will help you to check the results as you go, so that you can be confident of removing only the unwanted duplicate records.

Run this query and the duplicates will be purged.