Skip to main content

Databases Overview

This domain covers the full data persistence stack for Java backend engineers: relational SQL with indexes and transactions, connection pooling with HikariCP, NoSQL store selection using CAP theorem reasoning, and safe schema evolution with migration tools. These topics surface in every backend interview — from basic JOIN types through to isolation levels and production migration patterns.

Key Concepts at a Glance

  • SQL: a declarative language for querying relational databases; logical processing order is FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY.
  • JOIN: combines rows from two tables; INNER (both match), LEFT (all from left), RIGHT (all from right), FULL OUTER (all from both).
  • GROUP BY + HAVING: GROUP BY collapses rows into groups; HAVING filters groups by aggregates (COUNT, SUM, etc.).
  • Window function: computes aggregates/rankings over a partition of rows without collapsing them; OVER (PARTITION BY ... ORDER BY ...).
  • B-Tree index: the default index type; stores sorted values for O(log n) lookup; supports equality, range, and prefix queries.
  • Composite index: covers multiple columns in order; the left-prefix rule determines which queries can use it.
  • Covering index: includes all columns the query needs — enables Index Only Scan (no heap access).
  • EXPLAIN ANALYZE: shows the query execution plan; look for Seq Scan on large tables as a signal for a missing index.
  • ACID: Atomicity, Consistency, Isolation, Durability — the four guarantees of a relational transaction.
  • Isolation levels: READ COMMITTED (default in PostgreSQL), REPEATABLE READ (default in MySQL), SERIALIZABLE (strictest); each prevents different read anomalies.
  • MVCC: Multi-Version Concurrency Control — readers see a consistent snapshot without blocking writers; used by PostgreSQL.
  • Deadlock: two transactions each hold a lock the other needs; the database aborts one; application must catch and retry.
  • HikariCP: Spring Boot's default connection pool; maintains pre-established connections for reuse; key setting is maximum-pool-size.
  • Connection leak: a connection borrowed from the pool and never returned; detected with leak-detection-threshold; prevented with try-with-resources.
  • CAP theorem: distributed systems can guarantee at most two of Consistency, Availability, Partition Tolerance; real choice is CP vs AP.
  • BASE: Basically Available, Soft state, Eventually Consistent — the counterpart to ACID for AP NoSQL systems.
  • Redis: in-memory key-value store; use for caching, session storage, rate limiting, leaderboards.
  • MongoDB: document store with flexible JSON-like schema; use for variable-attribute or hierarchically nested data.
  • Cassandra: wide-column store designed for massive write throughput and linear horizontal scale; queries must use the partition key.
  • Flyway: SQL file-based schema migration tool; migrations named V{n}__{desc}.sql; never modify an applied migration.
  • Liquibase: structured changelog-based migration tool; YAML/XML format; built-in rollback support.
  • Expand-contract pattern: safe way to add a NOT NULL column in three steps: add nullable → backfill → set NOT NULL.
  • PostgreSQL: open-source RDBMS; native UUID type, JSONB, RETURNING, partial indexes, strict SQL compliance, process-per-connection model; default isolation READ COMMITTED.
  • MySQL: most deployed open-source RDBMS; no native UUID type, TINYINT(1) booleans, optional strict mode, thread-per-connection; default isolation REPEATABLE READ; excels in managed cloud offerings (Aurora).
  • H2: pure-Java embeddable database; in-memory or file-based; runs inside the JVM for dev/test; MODE=PostgreSQL covers ~95% of PostgreSQL syntax for local dev without Docker.
  • H2 compatibility mode: MODE=PostgreSQL or MODE=MySQL in the JDBC URL instructs H2 to accept database-specific DDL/DML syntax.
  • UUID in PostgreSQL: native 16-byte type; gen_random_uuid() built-in; no fragmentation issues.
  • UUID in MySQL: no native type; BINARY(16) with UUID_TO_BIN(uuid, 1) swap flag for time-ordered, compact storage.
  • PgBouncer: connection pooler for PostgreSQL; transaction-pooling mode multiplexes thousands of app connections onto a small number of DB processes; needed at scale.
  • B-Tree fragmentation (MySQL + UUID): random v4 UUIDs as InnoDB primary keys fragment the B-Tree index; fix with time-ordered UUIDs or a BIGINT surrogate key.

Quick-Reference Table

SQL

FeatureSyntax / RuleKey Note
Logical SELECT orderFROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BYAliases defined in SELECT are not available in WHERE
INNER JOINJOIN t2 ON t1.id = t2.fkExcludes non-matching rows from both sides
LEFT JOINLEFT JOIN t2 ON t1.id = t2.fkNULLs for unmatched right rows
GROUP BY + aggregateGROUP BY col HAVING COUNT(*) > nHAVING filters groups, not rows
Window functionSUM(x) OVER (PARTITION BY y ORDER BY z)No row collapse
CTEWITH name AS (SELECT ...)Reusable named subquery
NULL comparisonIS NULL / IS NOT NULL= NULL always evaluates to UNKNOWN
Parameterized queryWHERE id = ? or WHERE id = :idAlways use — prevents SQL injection

Indexes

TypeWhen to UsePitfall
B-Tree (default)Equality, range, prefix, sortWrite amplification on heavy writes
Composite (a, b)Both columns often in WHERELeft-prefix rule — b alone can't use it
Covering indexFrequent high-traffic queriesWider index = more write overhead
Partial indexSubset of rows (e.g., active=true)Only helps queries matching predicate
Functional indexLOWER(email) comparisonsMust match the expression exactly

Transactions & @Transactional

PropertySpring AnnotationDefault
Isolation level@Transactional(isolation = Isolation.X)Database default (READ_COMMITTED for PG)
Propagation@Transactional(propagation = Propagation.X)REQUIRED
Read-only hint@Transactional(readOnly = true)false
Rollback on checked ex@Transactional(rollbackFor = IOException.class)Only unchecked by default
Independent txPropagation.REQUIRES_NEW
Pessimistic lock@Lock(LockModeType.PESSIMISTIC_WRITE)SELECT ... FOR UPDATE
Optimistic lock@Version field on entityChecks version column on UPDATE

HikariCP

PropertyRecommended ValuePurpose
maximum-pool-size(cores × 2) + 1 ≈ 10Max connections in pool
minimum-idle5Warm connections at rest
connection-timeout30000 msMax wait before exception
max-lifetime1800000 msRecycle before DB kills
keepalive-time60000 msKeep idle connections alive
leak-detection-threshold2000 ms (dev only)Log connections held too long

NoSQL Decision Guide

Use CaseTechnology
Sub-ms reads, caching, sessionsRedis
Variable schema, nested documentsMongoDB
Massive writes, time-series, multi-DCCassandra / DynamoDB
Full-text search, log analyticsElasticsearch
Complex queries + ACIDPostgreSQL (relational)

Flyway Migration Naming

PrefixMeaningRe-runs?
V{n}__desc.sqlVersioned — runs onceNo
R__desc.sqlRepeatable — runs when checksum changesYes
U{n}__desc.sqlUndo (Flyway Teams)On rollback only

MySQL vs PostgreSQL vs H2 Quick Comparison

FeatureMySQLPostgreSQLH2
Native UUID typeNoYes (UUID)Yes (UUID)
Default isolationREPEATABLE READREAD COMMITTEDREAD COMMITTED
BOOLEAN storageTINYINT(1)True BOOLEANTrue BOOLEAN
RETURNING clauseNoYesYes
JSONB supportNo (JSON only)Yes (indexed)No
Strict modeOptionalAlwaysConfigurable
H2 compat modeMODE=MySQLMODE=PostgreSQL (~95%)N/A
Use forLegacy / AuroraNew projectsDev & tests only

UUID Primary Key Strategy

DatabaseEfficient Strategy
PostgreSQLUUID type + DEFAULT gen_random_uuid()
MySQLBINARY(16) + UUID_TO_BIN(UUID(), 1) swap flag
H2 (dev)UUID type (auto-compat with PostgreSQL mode)
JPA (all)@GeneratedValue(strategy = GenerationType.UUID) (Hibernate 6+)

Learning Path

  1. SQL Fundamentals — start here; JOIN types and GROUP BY are first-round interview questions.
  2. Indexes & Query Performance — EXPLAIN plan reading and composite index design are mid-level expectations.
  3. Transactions & ACID — isolation levels and @Transactional behavior are probed at every level.
  4. Connection Pooling — HikariCP configuration, pool sizing, and leak detection are Spring Boot production essentials.
  5. NoSQL Trade-offs — CAP theorem and "when to use Redis vs MongoDB" are common system design warm-up questions.
  6. Schema Migration — Flyway naming conventions and the expand-contract pattern signal production experience.
  7. MySQL, PostgreSQL & H2 — once you understand SQL and transactions, understand which database to run them on and why H2 is the right choice for local development.

Top 5 Interview Questions

Q1: What is the difference between WHERE and HAVING?
A: WHERE filters rows before GROUP BY runs — it cannot reference aggregate functions. HAVING filters after GROUP BY — it filters groups by aggregated values like COUNT(*) or SUM(). A common mistake is trying to filter on an aggregate in WHERE, which is a syntax error.

Q2: What are the SQL isolation levels and what anomaly does each prevent?
A: Four levels in ascending strictness: READ UNCOMMITTED prevents nothing (rarely used). READ COMMITTED prevents dirty reads (default in PostgreSQL). REPEATABLE READ additionally prevents non-repeatable reads (same row returning different values on two reads). SERIALIZABLE additionally prevents phantom reads (range queries returning different rows). Each higher level trades concurrency for correctness.

Q3: How does a composite index work and what is the left-prefix rule?
A: A composite index (user_id, status) sorts data first by user_id, then by status within each user group. A query using the index must include the leftmost column(s) in its filter — WHERE user_id = ? can use it, WHERE status = ? alone cannot because data is not globally sorted by status. Always put the most selective column or the column used in equality filters first.

Q4: What happens when all HikariCP connections are in use?
A: The requesting thread blocks and waits up to connectionTimeout milliseconds. If no connection is returned within that window, HikariCP throws SQLTimeoutException (wrapped as DataAccessException in Spring). A persistent queue of waiting threads (hikaricp.connections.pending > 0) signals that the pool is undersized for the load.

Q5: When would you reach for Redis instead of adding another SQL table?
A: Redis is the right choice when you need sub-millisecond access to data that changes frequently and doesn't require complex querying: caching DB query results (with TTL), storing HTTP sessions, rate-limiting counters (atomic INCR), and real-time leaderboards (sorted sets). Keep your relational database as the source of truth and use Redis as a performance layer in front of it.


All Notes in This Domain

NoteDescription
SQL FundamentalsSELECT, JOIN types, GROUP BY, window functions, CTEs, and Spring JDBC usage.
Indexes & Query PerformanceB-Tree indexes, composite + covering indexes, EXPLAIN plans, JPA index annotations.
Transactions & ACIDACID properties, isolation levels, MVCC, deadlocks, @Transactional propagation.
Connection PoolingHikariCP setup, pool sizing formula, monitoring with Actuator, leak detection.
NoSQL Trade-offsCAP theorem, BASE, Redis/MongoDB/Cassandra/Elasticsearch decision guide.
Schema MigrationFlyway vs Liquibase, versioned migrations, repeatable scripts, expand-contract pattern.
MySQL, PostgreSQL & H2Database comparison, UUID handling, H2 for development, migration path to PostgreSQL.
  • Spring Data — JPA, @Transactional, and repositories are the Spring abstraction over the SQL layer covered here.
  • System Design — database selection (relational vs NoSQL) and sharding are core system design decisions.
  • Databases Interview Prep — consolidated Q&A for rapid interview revision on all topics in this domain.