SQL subqueries vs JOINs — it's one of the most common questions once you move past the basics. You're staring at a query and wondering: should I use a subquery or a JOIN here? The answer isn't always obvious, and choosing wrong can mean a query that's harder to read, harder to maintain, or — in production — painfully slow.
This guide breaks down exactly when to reach for a subquery versus a JOIN, with side-by-side examples on the Chinook database so you can see both approaches and decide for yourself.
Table of Contents
- Subqueries Refresher
- JOINs Refresher
- Side-by-Side: Same Problem, Two Approaches
- When Subqueries Win
- When JOINs Win
- Performance: What Actually Matters
- Quick Comparison Cheat Sheet
- Common Mistakes to Avoid
- Practice Makes Perfect
- Summary
- Frequently Asked Questions
Subqueries Refresher
A subquery is a SELECT statement nested inside another query. It runs first, produces a result, and the outer query uses that result. Subqueries can appear in WHERE, FROM, or SELECT clauses.
Find tracks priced above the average
SELECT Name, UnitPrice
FROM Track
WHERE UnitPrice > (
SELECT AVG(UnitPrice)
FROM Track
)
ORDER BY UnitPrice DESC
LIMIT 5;
| Name | UnitPrice |
|---|---|
| War Pigs | 0.99 |
Here the inner query calculates the average price, and the outer query filters tracks above it. The subquery returns a single value — this is called a scalar subquery.
Subqueries can also return lists (for use with IN) or full result sets (as a derived table in FROM).
Try it yourself — Find tracks priced above average →
JOINs Refresher
A JOIN combines columns from two or more tables based on a related column. Unlike subqueries, JOINs produce a single flat result set where you can access columns from all joined tables simultaneously.
Get album titles with artist names
SELECT
ar.Name AS Artist,
al.Title AS Album
FROM Album al
JOIN Artist ar ON ar.ArtistId = al.ArtistId
WHERE ar.Name = 'Iron Maiden'
ORDER BY al.Title;
| Artist | Album |
|---|---|
| Iron Maiden | A Matter of Life and Death |
| Iron Maiden | Brave New World |
| Iron Maiden | Fear of the Dark |
| Iron Maiden | Killers |
| Iron Maiden | Live After Death |
The JOIN links albums to their artists through the shared ArtistId column. You need data from both tables in the output — this is a textbook JOIN use case. For a deeper dive on JOIN types, see our SQL JOINs guide.
Try it yourself — Get album titles and artist names for Iron Maiden →
Side-by-Side: Same Problem, Two Approaches
Many queries can be written with either a subquery or a JOIN. Let's solve the same problem both ways.
Question: Find customers who live in the same country as an employee.
Subquery approach
SELECT FirstName, LastName, Country
FROM Customer
WHERE Country IN (
SELECT DISTINCT Country
FROM Employee
)
ORDER BY Country, LastName
LIMIT 5;
| FirstName | LastName | Country |
|---|---|---|
| Mark | Philips | Canada |
| Jennifer | Peterson | Canada |
| Edward | Francis | Canada |
| Ellie | Sullivan | Canada |
| Roberto | Almeida | Canada |
JOIN approach
SELECT DISTINCT
c.FirstName, c.LastName, c.Country
FROM Customer c
JOIN Employee e ON e.Country = c.Country
ORDER BY c.Country, c.LastName
LIMIT 5;
Same result. The subquery reads more naturally — "find customers whose country is in the list of employee countries." The JOIN requires a DISTINCT because each customer might match multiple employees. In this case, the subquery is cleaner.
Try it yourself — Find customers in employee countries →
When Subqueries Win
Subqueries are the better choice in these scenarios:
1. Comparing against an aggregate
When you need to compare each row against a calculated value like an average, max, or count, a scalar subquery in WHERE is the cleanest approach.
SELECT Name, Milliseconds / 1000 AS Seconds
FROM Track
WHERE Milliseconds > (
SELECT AVG(Milliseconds)
FROM Track t2
JOIN Genre g ON g.GenreId = t2.GenreId
WHERE g.Name = 'Rock'
)
ORDER BY Milliseconds DESC
LIMIT 5;
| Name | Seconds |
|---|---|
| Occupation / Precipice | 5286 |
| Through a Looking Glass | 5088 |
| Greetings from Earth, Pt. 1 | 2960 |
| The Woman King | 2626 |
| A Day in the Life | 2611 |
A JOIN can't elegantly express "rows where this value exceeds the average of a different group." You'd need a cross join with a derived table — much noisier.
Try it yourself — Find tracks longer than the Rock average →
2. Existence checks with NOT EXISTS
NOT EXISTS is the most reliable way to find rows in one table that have no matching rows in another.
SELECT t.Name AS Track
FROM Track t
WHERE NOT EXISTS (
SELECT 1
FROM InvoiceLine il
WHERE il.TrackId = t.TrackId
)
ORDER BY t.Name
LIMIT 5;
| Track |
|---|
| 20 Flight Rock |
| A Cor Do Sangue |
| A Dança da Solidão |
| A Estrada |
| A Hora e a Vez do Cabelo Nascer |
These are tracks that have never been sold. You could use LEFT JOIN ... WHERE il.TrackId IS NULL, but NOT EXISTS makes the intent crystal clear and handles NULLs correctly in all cases.
Try it yourself — Identify tracks never sold →
3. Subqueries in SELECT for row-level calculations
When you need to add a computed column from another table without changing the row count, a correlated subquery in SELECT works well.
SELECT
c.FirstName || ' ' || c.LastName AS Customer,
(
SELECT COUNT(*)
FROM Invoice i
WHERE i.CustomerId = c.CustomerId
) AS InvoiceCount
FROM Customer c
ORDER BY InvoiceCount DESC
LIMIT 5;
| Customer | InvoiceCount |
|---|---|
| Luis Rojas | 7 |
| Helena Holý | 7 |
| Richard Cunningham | 7 |
| Hugh O'Brien | 7 |
| Frank Ralston | 7 |
Note: for large datasets, this pattern can be slow because the subquery runs once per row. Consider a JOIN with GROUP BY for better performance at scale.
When JOINs Win
1. You need columns from multiple tables
This is the most common scenario. When the output requires data from two or more tables, a JOIN is the only practical choice.
SELECT
c.FirstName || ' ' || c.LastName AS Customer,
e.FirstName || ' ' || e.LastName AS SupportRep,
e.Title
FROM Customer c
JOIN Employee e ON e.EmployeeId = c.SupportRepId
ORDER BY Customer
LIMIT 5;
| Customer | SupportRep | Title |
|---|---|---|
| Aaron Mitchell | Margaret Park | Sales Support Agent |
| Alexandre Rocha | Steve Johnson | Sales Support Agent |
| Astrid Gruber | Steve Johnson | Sales Support Agent |
| Bjørn Hansen | Jane Peacock | Sales Support Agent |
| Camille Bernard | Margaret Park | Sales Support Agent |
A subquery can't put columns from two tables side by side in a single result row (without correlated subqueries for each column, which is ugly and slow).
Try it yourself — List customers and their support reps →
2. Joining multiple tables in a chain
Real queries often chain three, four, or more tables. JOINs handle this naturally.
SELECT
ar.Name AS Artist,
al.Title AS Album,
t.Name AS Track,
g.Name AS Genre
FROM Track t
JOIN Album al ON al.AlbumId = t.AlbumId
JOIN Artist ar ON ar.ArtistId = al.ArtistId
JOIN Genre g ON g.GenreId = t.GenreId
WHERE ar.Name = 'Led Zeppelin'
ORDER BY al.Title, t.TrackId
LIMIT 5;
| Artist | Album | Track | Genre |
|---|---|---|---|
| Led Zeppelin | BBC Sessions [Disc 1] [Live] | Communication Breakdown | Rock |
| Led Zeppelin | BBC Sessions [Disc 1] [Live] | Dazed And Confused | Rock |
| Led Zeppelin | BBC Sessions [Disc 1] [Live] | The Girl I Love She Got Long Black Wavy Hair | Rock |
| Led Zeppelin | BBC Sessions [Disc 1] [Live] | What is and What Should Never Be | Rock |
| Led Zeppelin | BBC Sessions [Disc 1] [Live] | Communication Breakdown(2) | Rock |
Four tables, one clean query. Achieving this with nested subqueries would be a nightmare.
Try it yourself — Get detailed sales and track info →
3. Aggregating across related tables
When you need to GROUP BY and aggregate data from joined tables, JOINs combined with GROUP BY are the standard approach.
SELECT
g.Name AS Genre,
COUNT(*) AS TrackCount,
ROUND(AVG(t.Milliseconds) / 1000, 0) AS AvgSeconds
FROM Track t
JOIN Genre g ON g.GenreId = t.GenreId
GROUP BY g.Name
ORDER BY TrackCount DESC
LIMIT 5;
| Genre | TrackCount | AvgSeconds |
|---|---|---|
| Rock | 1297 | 284 |
| Latin | 579 | 232 |
| Metal | 374 | 310 |
| Alternative & Punk | 332 | 234 |
| Jazz | 130 | 291 |
Performance: What Actually Matters
The "subqueries are slow" advice is often repeated but oversimplified. Here's what actually matters:
Modern query optimizers rewrite subqueries. In PostgreSQL, MySQL 8+, and SQL Server, the optimizer often transforms a subquery into a JOIN internally. The execution plan may be identical regardless of how you wrote it.
Correlated subqueries are the real bottleneck. A correlated subquery (one that references the outer query) runs once per row. On a 1-million-row table, that's 1 million sub-executions. Rewriting it as a JOIN + GROUP BY runs it once.
-- Slow: correlated subquery (runs per row)
SELECT Name,
(SELECT COUNT(*) FROM Track WHERE Track.AlbumId = Album.AlbumId) AS Cnt
FROM Album;
-- Faster: JOIN + GROUP BY (single pass)
SELECT al.Title, COUNT(t.TrackId) AS Cnt
FROM Album al
LEFT JOIN Track t ON t.AlbumId = al.AlbumId
GROUP BY al.AlbumId, al.Title;
Use EXPLAIN to decide. Don't guess — check the query plan. If both approaches produce the same plan, choose the one that reads better.
Practical rules:
- Scalar subqueries (returning one value) in WHERE are almost always fine
INsubqueries on small result sets are fine; for large sets, considerEXISTSor a JOIN- Correlated subqueries in SELECT are the main performance risk — rewrite as a JOIN when the table is large
- When in doubt, write it the way that's clearest, then optimize if you measure a problem
Quick Comparison Cheat Sheet
| Scenario | Best Choice | Why |
|---|---|---|
| Need columns from multiple tables | JOIN | Only practical way |
| Filter against an aggregate | Subquery | Cleaner WHERE clause |
| Check for existence / non-existence | Subquery (EXISTS) | Clearest intent, NULL-safe |
| Chain 3+ tables | JOIN | Scales naturally |
| Derived table for intermediate result | Subquery in FROM | Encapsulates complex logic |
| Row-level computed column | Subquery in SELECT | Works but watch performance |
| Aggregate across joined tables | JOIN + GROUP BY | Standard, efficient pattern |
| Filter with IN (small list) | Subquery | Readable |
| Filter with IN (large list) | JOIN or EXISTS | Better performance |
Common Mistakes to Avoid
1. Using a correlated subquery when a JOIN works
Correlated subqueries re-execute for every row. If you're pulling data from another table for every row, a JOIN is almost always better.
-- Slow: correlated subquery
SELECT t.Name,
(SELECT g.Name FROM Genre g WHERE g.GenreId = t.GenreId) AS Genre
FROM Track t;
-- Better: simple JOIN
SELECT t.Name, g.Name AS Genre
FROM Track t
JOIN Genre g ON g.GenreId = t.GenreId;
2. Using IN with a subquery that can return NULL
NOT IN with a subquery that returns NULL gives an empty result — a classic trap.
-- Dangerous: if any CustomerId is NULL, returns no rows
SELECT Name FROM Track
WHERE TrackId NOT IN (
SELECT TrackId FROM InvoiceLine
);
-- Safe: NOT EXISTS handles NULLs correctly
SELECT t.Name FROM Track t
WHERE NOT EXISTS (
SELECT 1 FROM InvoiceLine il
WHERE il.TrackId = t.TrackId
);
3. Forgetting DISTINCT with JOINs that multiply rows
When you JOIN to a table with a one-to-many relationship and only need the "one" side, you'll get duplicates without DISTINCT or aggregation.
-- Wrong: each customer appears once per invoice
SELECT c.FirstName, c.Country
FROM Customer c
JOIN Invoice i ON i.CustomerId = c.CustomerId;
-- Right: use DISTINCT or GROUP BY
SELECT DISTINCT c.FirstName, c.Country
FROM Customer c
JOIN Invoice i ON i.CustomerId = c.CustomerId;
-- Or: use EXISTS (no duplicates by design)
SELECT c.FirstName, c.Country
FROM Customer c
WHERE EXISTS (
SELECT 1 FROM Invoice i WHERE i.CustomerId = c.CustomerId
);
4. Nesting subqueries too deeply
Three or more levels of nesting become unreadable. Use CTEs (WITH clauses) to flatten complex logic.
-- Hard to read: nested subqueries
SELECT * FROM Track WHERE GenreId IN (
SELECT GenreId FROM Genre WHERE Name IN (
SELECT DISTINCT g.Name FROM Genre g
JOIN Track t ON t.GenreId = g.GenreId
GROUP BY g.Name HAVING COUNT(*) > 100
)
);
-- Better: CTE
WITH BigGenres AS (
SELECT g.GenreId, g.Name
FROM Genre g
JOIN Track t ON t.GenreId = g.GenreId
GROUP BY g.GenreId, g.Name
HAVING COUNT(*) > 100
)
SELECT t.*
FROM Track t
JOIN BigGenres bg ON bg.GenreId = t.GenreId;
Practice Makes Perfect
The best way to develop intuition for subqueries vs JOINs is to solve problems both ways. These exercises give you hands-on practice:
| Exercise | What You'll Practice |
|---|---|
| Tracks Priced Above Average → | Scalar subquery in WHERE |
| Customers in Employee Countries → | Subquery with IN vs JOIN |
| Identify Tracks Never Sold → | NOT EXISTS subquery |
| Customer Names and Their Invoices → | Basic JOIN pattern |
| Detailed Sales and Track Info → | Multi-table JOIN chain |
| Tracks from the Largest Genre → | Subquery + GROUP BY combo |
Join 500+ SQL learners already practicing on SQiLs — Browse all exercises →
Summary
- JOINs are best when you need columns from multiple tables, chain several tables, or aggregate across relationships.
- Subqueries are best for filtering against aggregates, existence checks, and encapsulating intermediate logic.
- Many problems can be solved either way — modern optimizers often produce the same execution plan.
- Correlated subqueries are the real performance concern — rewrite them as JOINs when operating on large tables.
NOT EXISTSbeatsNOT INfor null-safety. Use CTEs instead of deeply nested subqueries.- When in doubt, write for clarity first, then optimize with EXPLAIN if performance matters.
Ready to put this into practice? Start solving exercises on SQiLs →
Frequently Asked Questions
Are subqueries slower than JOINs?
Not necessarily. Modern database optimizers (PostgreSQL, MySQL 8+, SQL Server) often rewrite subqueries into JOINs internally, producing identical execution plans. The main performance risk is correlated subqueries that run once per row. Scalar subqueries and IN/EXISTS subqueries on small sets are typically fine.
Can I always replace a subquery with a JOIN?
No. Some patterns are unique to subqueries — like comparing against an aggregate value (WHERE price > (SELECT AVG(price) ...)), or using NOT EXISTS for anti-joins. While workarounds exist, the subquery version is often clearer and just as fast.
When should I use EXISTS vs IN?
EXISTS is generally preferred when the subquery could return NULLs, especially with NOT EXISTS vs NOT IN. Performance-wise, EXISTS short-circuits (stops as soon as it finds one match), while IN evaluates the full list. For small lists, the difference is negligible; for large datasets, EXISTS usually wins.
What about CTEs — are they subqueries or JOINs?
CTEs (Common Table Expressions, using WITH) are syntactic sugar for subqueries in the FROM clause. They don't change performance — the optimizer treats them the same as inline subqueries. Their value is readability: they give names to intermediate steps and flatten deeply nested queries. For advanced CTE patterns like recursion, check out the window functions guide.



