Skip to main content
    Skip to main content
    SQL
    DISTINCT
    Beginner

    SQL DISTINCT: Remove Duplicates the Right Way

    SQiLs TeamMarch 23, 2026
    SQL DISTINCT: Remove Duplicates the Right Way

    SQL DISTINCT is the keyword you reach for when a database table contains repeated values — the same country appearing on hundreds of invoices, the same composer credited on dozens of tracks. When you need a clean list of unique values, DISTINCT is the keyword that strips away the duplicates. It looks simple, but knowing when to use it (and when not to) separates beginners from confident SQL writers.

    In this guide you'll learn exactly how DISTINCT works, see it in action on the Chinook music-store database, understand how it compares to GROUP BY, and avoid the mistakes that trip up most learners.

    Table of Contents

    How DISTINCT Works

    Place DISTINCT right after SELECT and the database eliminates duplicate rows from the result set. If two or more rows are identical across every selected column, only one copy survives.

    Here's the classic example — listing every unique billing country from the Invoice table:

    SELECT DISTINCT BillingCountry
    FROM Invoice
    ORDER BY BillingCountry
    LIMIT 10;
    
    BillingCountry
    Argentina
    Australia
    Austria
    Belgium
    Brazil
    Canada
    Chile
    Czech Republic
    Denmark
    Finland

    Without DISTINCT, you would get 412 rows (one per invoice). With it, you get 24 — one per unique country.

    How the database processes DISTINCT

    1. FROM — reads all rows from the table
    2. WHERE — filters rows (if present)
    3. SELECT — picks the requested columns
    4. DISTINCT — removes duplicate rows from the result
    5. ORDER BY — sorts the final output

    Because DISTINCT runs after WHERE, you can combine them freely:

    SELECT DISTINCT BillingCity
    FROM Invoice
    WHERE BillingCountry = 'USA'
    ORDER BY BillingCity;
    
    BillingCity
    Boston
    Chicago
    Cupertino
    Fort Worth
    Madison
    Mountain View
    New York
    Orlando
    Redmond
    Reno

    Try it yourselfList all unique billing countries →

    DISTINCT on Multiple Columns

    When you select more than one column, DISTINCT removes rows where the entire combination of values is duplicated. Each unique pair (or triple, or more) gets one row.

    Unique city-country pairs

    SELECT DISTINCT
      BillingCity,
      BillingCountry
    FROM Invoice
    ORDER BY BillingCountry, BillingCity
    LIMIT 10;
    
    BillingCityBillingCountry
    Buenos AiresArgentina
    SidneyAustralia
    VienneAustria
    BrusselsBelgium
    BrasíliaBrazil
    Rio de JaneiroBrazil
    São José dos CamposBrazil
    São PauloBrazil
    EdmontonCanada
    HalifaxCanada

    Notice that Brazil appears multiple times — each row has a different city. DISTINCT looks at the full combination, not each column independently.

    Finding unique customer locations

    SELECT DISTINCT
      City,
      State,
      Country
    FROM Customer
    ORDER BY Country, State, City
    LIMIT 10;
    
    CityStateCountry
    Buenos AiresNULLArgentina
    SidneyNSWAustralia
    VienneNULLAustria
    BrusselsNULLBelgium
    BrasíliaDFBrazil
    Rio de JaneiroRJBrazil
    São José dos CamposSPBrazil
    São PauloSPBrazil
    EdmontonABCanada
    HalifaxNSCanada

    Try it yourselfFind unique customer locations →

    COUNT(DISTINCT column) — Counting Unique Values

    Often you don't need the list of unique values — you just need to know how many there are. Wrap the column in COUNT(DISTINCT ...):

    SELECT
      COUNT(*) AS TotalInvoices,
      COUNT(DISTINCT CustomerId) AS UniqueCustomers,
      COUNT(DISTINCT BillingCountry) AS UniqueCountries
    FROM Invoice;
    
    TotalInvoicesUniqueCustomersUniqueCountries
    4125924

    This tells you: 412 invoices were placed by 59 different customers across 24 countries — all in a single query.

    COUNT(DISTINCT) inside GROUP BY

    You can pair COUNT(DISTINCT) with GROUP BY for powerful breakdowns:

    SELECT
      BillingCountry,
      COUNT(*) AS TotalInvoices,
      COUNT(DISTINCT CustomerId) AS UniqueCustomers
    FROM Invoice
    GROUP BY BillingCountry
    ORDER BY UniqueCustomers DESC
    LIMIT 5;
    
    BillingCountryTotalInvoicesUniqueCustomers
    USA9113
    Canada568
    France355
    Brazil355
    Germany284

    The USA has 91 invoices but only 13 unique customers. COUNT(DISTINCT) reveals patterns that a plain COUNT(*) would hide.

    Try it yourselfCount tracks per album →

    DISTINCT vs GROUP BY

    This is the question every SQL learner asks: when should I use DISTINCT, and when should I use GROUP BY? The short answer: DISTINCT is for deduplication, GROUP BY is for aggregation.

    Side-by-side comparison

    Both of these queries return the same 24 countries:

    -- Using DISTINCT
    SELECT DISTINCT BillingCountry
    FROM Invoice
    ORDER BY BillingCountry;
    
    -- Using GROUP BY
    SELECT BillingCountry
    FROM Invoice
    GROUP BY BillingCountry
    ORDER BY BillingCountry;
    

    The results are identical, but the intent is different:

    • DISTINCT says: "Give me the unique values."
    • GROUP BY says: "Organize rows into groups so I can aggregate them."

    When to pick which

    ScenarioUse
    You need a clean list of unique valuesDISTINCT
    You need counts, sums, averages per groupGROUP BY
    You need unique values and aggregatesGROUP BY with aggregates
    You want to count unique values within groupsCOUNT(DISTINCT) inside GROUP BY

    The key rule

    If your query has no aggregate functions (COUNT, SUM, AVG, MIN, MAX), prefer DISTINCT — it signals your intent clearly. The moment you need an aggregate, switch to GROUP BY. For a deep dive into GROUP BY, see our complete GROUP BY and HAVING guide.

    Cheat Sheet

    SyntaxWhat It DoesExample
    SELECT DISTINCT colUnique values in one columnUnique countries
    SELECT DISTINCT col1, col2Unique combinations of columnsUnique city-country pairs
    COUNT(DISTINCT col)Number of unique valuesHow many distinct customers
    SELECT DISTINCT *Remove fully duplicate rowsDeduplicate an entire result set
    GROUP BY colGroup rows for aggregationRevenue per country

    Quick decision rule:

    • No aggregates needed → DISTINCT
    • Aggregates needed → GROUP BY
    • Count of unique values → COUNT(DISTINCT col) inside either

    Common Mistakes to Avoid

    1. Using DISTINCT when GROUP BY is needed

    DISTINCT cannot aggregate. If you need counts or totals alongside your unique values, you need GROUP BY.

    -- Wrong thinking: "I want distinct countries with their invoice counts"
    SELECT DISTINCT BillingCountry, COUNT(*)
    FROM Invoice;
    -- This errors or gives wrong results
    
    -- Right: use GROUP BY
    SELECT BillingCountry, COUNT(*) AS InvoiceCount
    FROM Invoice
    GROUP BY BillingCountry
    ORDER BY InvoiceCount DESC;
    

    2. Treating DISTINCT as a function

    DISTINCT is a keyword that applies to the entire row, not a single column. Parentheses after DISTINCT don't change this.

    -- These are identical — the parentheses do nothing
    SELECT DISTINCT(BillingCountry) FROM Invoice;
    SELECT DISTINCT BillingCountry FROM Invoice;
    
    -- This returns distinct PAIRS, not distinct countries
    SELECT DISTINCT BillingCountry, BillingCity FROM Invoice;
    

    The parentheses in DISTINCT(col) are just grouping parentheses, not a function call. This catches many beginners off guard.

    3. Using DISTINCT to hide a JOIN problem

    If a JOIN produces more rows than expected and you slap DISTINCT on it to "fix" the count, you're masking a real bug. The correct fix is to check your JOIN conditions.

    -- Suspicious: DISTINCT used to hide duplicate rows from a bad join
    SELECT DISTINCT c.FirstName, c.LastName
    FROM Customer c
    JOIN Invoice i ON i.CustomerId = c.CustomerId;
    
    -- Better: ask whether you actually need the JOIN
    SELECT FirstName, LastName
    FROM Customer;
    

    If you find yourself adding DISTINCT to "fix" unexpected duplicates, stop and examine your JOINs first.

    4. Forgetting that DISTINCT considers NULL as a single value

    In SQL, two NULLs are considered equal for DISTINCT purposes. So if a column has multiple NULL entries, DISTINCT will keep just one.

    SELECT DISTINCT Composer
    FROM Track
    ORDER BY Composer
    LIMIT 5;
    
    Composer
    NULL
    A. Jamal
    AC/DC
    Academy of St. Martin in the Fields & Sir Neville Marriner
    Academy of St. Martin in the Fields, John Birch, Sir Neville Marriner & Sylvia McNair

    The single NULL row represents all 978 tracks with no composer. Keep this in mind when counting — COUNT(DISTINCT Composer) will not count the NULLs.

    5. Using SELECT DISTINCT * without understanding the cost

    SELECT DISTINCT * compares every column in every row. On wide tables with many columns, this is computationally expensive and rarely what you actually want. Be specific about which columns need deduplication.

    -- Expensive and usually unnecessary
    SELECT DISTINCT * FROM Track;
    
    -- Better: specify the columns you care about
    SELECT DISTINCT Composer FROM Track;
    

    Practice Makes Perfect

    The best way to internalize DISTINCT is to write queries yourself. Here are exercises on the Chinook database that target these exact skills:

    ExerciseWhat You'll Practice
    Find Unique Track Composers →Basic DISTINCT on a single column
    Identify Unique Employee Cities →DISTINCT with a small result set
    List All Unique Billing Countries →DISTINCT on invoice data
    Discover Unique Customer States →DISTINCT with NULL values
    Unique Customer Locations →Multi-column DISTINCT
    Product Count by Category →COUNT with GROUP BY (compare to DISTINCT)

    Join 500+ SQL learners already practicing on SQiLs — Browse all exercises →

    Summary

    • DISTINCT removes duplicate rows from your query results — place it right after SELECT.
    • With multiple columns, DISTINCT eliminates rows where the entire combination is duplicated, not each column independently.
    • Use COUNT(DISTINCT col) to count unique values without listing them all.
    • DISTINCT is for deduplication; GROUP BY is for aggregation. If you need counts, sums, or averages, reach for GROUP BY.
    • Watch out for DISTINCT masking JOIN problems — if you need DISTINCT to "fix" unexpected duplicates, check your join conditions first.
    • NULLs count as one value for DISTINCT. COUNT(DISTINCT col) ignores NULLs entirely.

    Ready to practice? Start solving DISTINCT exercises on SQiLs →

    Frequently Asked Questions

    What is the difference between DISTINCT and GROUP BY in SQL?

    DISTINCT removes duplicate rows from the result set — it's purely for deduplication. GROUP BY organizes rows into groups so you can apply aggregate functions like COUNT, SUM, and AVG. If you only need unique values, use DISTINCT. If you need calculations per group, use GROUP BY. While SELECT DISTINCT col and SELECT col GROUP BY col return the same result, GROUP BY signals that aggregation is the goal.

    Does DISTINCT affect performance?

    Yes. DISTINCT forces the database to sort or hash all result rows to find duplicates, which adds processing time. On small result sets the cost is negligible. On large tables with many columns, it can be significant. The best practice is to be specific about which columns need deduplication and to ensure you actually need DISTINCT rather than using it as a crutch for a poorly written JOIN.

    Can I use DISTINCT with ORDER BY?

    Yes, but there's one rule: every column in ORDER BY must appear in the SELECT list (or be derived from it) when DISTINCT is used. This is because DISTINCT produces a set of unique rows, and the database can only sort by columns that exist in that set. Most databases enforce this strictly.

    How does DISTINCT handle NULL values?

    DISTINCT treats all NULL values as equal to each other. If a column contains multiple NULLs, DISTINCT will return only one NULL row. However, COUNT(DISTINCT col) does not count NULLs — it only counts non-NULL unique values. This means COUNT(DISTINCT col) can return a lower number than you expect if your data has NULLs.

    Related Posts