Spring Data Interview Questions

Consolidated Q&A for Spring Data JPA. Use for rapid revision before backend interviews. Questions cover the JPA/Hibernate stack, entity mapping, repositories, transactions, performance (N+1), projections, and caching.

How to Use This Page

  • Skim Beginner questions to solidify fundamentals
  • Intermediate questions are the core revision target for most roles (3–5 YOE)
  • Advanced questions signal senior-level depth (5+ YOE)

Beginner

Q: What is JPA and what is the role of Hibernate in Spring Boot?

JPA (Jakarta Persistence API) is a specification — a set of interfaces and annotations (@Entity, @Id, EntityManager) that define how Java objects map to relational tables. Hibernate is the most widely used implementation of that specification; it does the actual SQL generation, session management, and dirty checking. Spring Boot auto-configures Hibernate as the JPA provider when you add spring-boot-starter-data-jpa. Every @Entity annotation you write is a JPA annotation; every SQL that executes was generated by Hibernate.

Q: What is Spring Data JPA and how is it different from JPA?

Spring Data JPA is a Spring library that wraps the JPA API to provide zero-boilerplate repository interfaces (JpaRepository), automatic query generation from method names (findByStatus), and @Query annotations. JPA defines the raw persistence API — annotations plus EntityManager. Spring Data JPA is the convenience layer so you don't write EntityManager.createQuery(...) by hand.

Q: Is a Hibernate Session thread-safe?

No. A Session holds a first-level cache (in-memory identity map) and dirty-checking snapshots, all of which are mutable state. If two threads shared the same Session, they would corrupt each other's cache and generate wrong SQL. Each request/transaction must get its own Session. Spring handles this automatically: when a transaction starts, it opens a Session, binds it to the current thread through a ThreadLocal, and closes it when the transaction completes.

Q: What annotations are required to define a minimal JPA entity?

At minimum, @Entity on the class and @Id on the primary key field. @GeneratedValue is typically added to delegate ID generation to the database (e.g., strategy = GenerationType.IDENTITY for auto-increment). Everything else (@Table, @Column) is optional but recommended for explicit control.

Q: What is CrudRepository vs JpaRepository?

CrudRepository provides basic CRUD (save, findById, findAll, delete, count). PagingAndSortingRepository adds findAll(Sort) and findAll(Pageable) for sorting and pagination. JpaRepository builds on both and adds JPA-specific methods: flush(), saveAndFlush(), and deleteAllInBatch(); its findAll variants also return List instead of Iterable. In most Spring Boot applications you extend JpaRepository.

Q: What does @Transactional do?

It marks a method as a transactional boundary. Spring wraps the bean in an AOP proxy that starts a database transaction before the method executes. On a successful return, the proxy commits. If a RuntimeException (or Error) is thrown, the proxy rolls back; a checked exception does not trigger a rollback by default unless you configure rollbackFor. All database operations inside the method share that one transaction: they either all succeed or all fail.

Q: How do you enable caching in Spring Boot?

Add @EnableCaching to a configuration class and annotate service methods with @Cacheable (cache reads), @CacheEvict (invalidate on writes), or @CachePut (update cache on writes). Spring Boot auto-configures a CacheManager based on the classpath: Caffeine for in-memory, Redis for distributed.


Intermediate

Q: What is the difference between SessionFactory and Session? Which is thread-safe?

SessionFactory is a thread-safe singleton: it is created once at application startup, holds all ORM metadata and compiled SQL templates, and is shared across all threads. A Session is NOT thread-safe. It is a lightweight unit of work that wraps a JDBC connection and holds a first-level cache (identity map) plus dirty-checking snapshots. Each request or transaction must have its own Session. Spring binds a new Session to a ThreadLocal when a transaction starts and removes it when the transaction completes.

JPA equivalent: SessionFactory → EntityManagerFactory; Session → EntityManager.

Q: What are the four Hibernate entity states?

  • Transient: Object exists in memory but is not associated with any Session — no DB row, no tracking (new Product()).
  • Persistent: Associated with an open Session; any field change is automatically detected (dirty checking) and flushed as SQL at commit time.
  • Detached: Was previously persistent; the Session has since closed. The object still has an ID but is no longer tracked, so changes are NOT persisted automatically.
  • Removed: Scheduled for deletion; the DELETE SQL runs on the next flush.

Key implication: modifying a detached entity (e.g., after a @Transactional method returns) does nothing without explicitly calling repo.save(entity) inside a new transaction.

Q: What is dirty checking in Hibernate?

When an entity is loaded into a Session, Hibernate takes a snapshot of all its field values. Before flushing (before a commit or before executing a query), Hibernate compares each entity's current fields against the snapshot. If any field changed, Hibernate generates an UPDATE statement automatically — you don't need to call save(). This is called dirty checking. Implication: calling save() on an already-persistent entity inside a transaction is redundant — the save operation is a no-op for in-session entities.
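
The snapshot comparison described above can be sketched in plain Java. This is a toy model, not Hibernate's actual implementation: Product, ToySession, attach, and flush are hypothetical names invented for the illustration, and the "SQL" is just a string.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Toy model of snapshot-based dirty checking; NOT Hibernate's real code.
final class DirtyCheckingSketch {

    // Hypothetical mutable entity used only for this illustration.
    static final class Product {
        final long id;
        String name;

        Product(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Hypothetical stand-in for a Session: remembers the loaded state of each entity.
    static final class ToySession {
        // Product does not override equals/hashCode, so keys are compared by identity.
        private final Map<Product, String> snapshots = new LinkedHashMap<>();

        // Called when an entity becomes persistent: record its current field value.
        void attach(Product product) {
            snapshots.put(product, product.name);
        }

        // Compares current state with the snapshot and emits an UPDATE for each change.
        List<String> flush() {
            List<String> sql = new ArrayList<>();
            for (Map.Entry<Product, String> entry : snapshots.entrySet()) {
                Product product = entry.getKey();
                if (!Objects.equals(entry.getValue(), product.name)) {
                    sql.add("UPDATE product SET name = '" + product.name
                            + "' WHERE id = " + product.id);
                    entry.setValue(product.name); // the flushed state is the new baseline
                }
            }
            return sql;
        }
    }
}
```

Note that nothing in this sketch calls save(): changing the field and flushing is enough, which mirrors why save() on an already-persistent entity adds nothing.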

Q: What is the Hibernate first-level cache?

The first-level cache (identity map) is a per-Session in-memory store mapping (entityType, id) to the loaded entity instance. Loading the same entity twice in one transaction returns the cached Java object — no extra SQL. It's cleared when the session closes, so it does NOT survive between requests. deleteAllInBatch() bypasses the first-level cache and issues a direct DELETE, leaving the session holding stale references.
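
A minimal sketch of the identity-map idea, assuming a fake loader in place of the database. ToySession, find, and selectCount are hypothetical names for this illustration and do not correspond to Hibernate APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Toy model of a per-session identity map; NOT Hibernate's real code.
final class FirstLevelCacheSketch {

    static final class ToySession {
        // Key combines entity type and id, as in the description above.
        private final Map<String, Object> identityMap = new HashMap<>();
        private final LongFunction<Object> loader; // stands in for "run a SELECT"
        int selectCount = 0;

        ToySession(LongFunction<Object> loader) {
            this.loader = loader;
        }

        <T> T find(Class<T> type, long id) {
            String key = type.getName() + "#" + id;
            Object cached = identityMap.get(key);
            if (cached != null) {
                return type.cast(cached); // cache hit: same instance, no SQL
            }
            selectCount++; // cache miss: this is where the SELECT would run
            T loaded = type.cast(loader.apply(id));
            identityMap.put(key, loaded);
            return loaded;
        }
    }
}
```

Two lookups of the same id through one ToySession return the same Java object and count one select; a second ToySession starts with an empty map, which is the toy equivalent of the cache not surviving between requests.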

Q: What causes LazyInitializationException?

A LAZY-loaded association is accessed after the Hibernate Session has closed. In Spring Boot with @Transactional, the session is open only during the annotated method. Returning a managed entity to the controller and then accessing a LAZY field there (after the service method returned) throws this exception. Fixes: return a DTO or projection from the service so no LAZY field crosses the transaction boundary; or use JOIN FETCH/@EntityGraph to load the association inside the transaction.

Q: What are JPA cascade types and when should you use CascadeType.ALL?

Cascade types control which persistence operations propagate from a parent entity to child entities: PERSIST, MERGE, REMOVE, REFRESH, DETACH, and ALL (combines all of them). Use CascadeType.ALL on strongly-owned relationships, e.g., @OneToMany Order → OrderItems where items cannot exist without their order. Avoid CascadeType.ALL (especially REMOVE) on @ManyToOne or @ManyToMany associations: deleting a Product should not delete all Categories it belongs to.

Q: Explain @Transactional propagation — REQUIRED vs REQUIRES_NEW.

REQUIRED (the default) joins an existing transaction or creates a new one if none exists. Both the outer and inner methods share one transaction — if either throws, both are rolled back. REQUIRES_NEW suspends the current transaction, starts a fresh independent one, and commits it before returning to the outer transaction. Use REQUIRES_NEW when inner work must commit regardless of the outer outcome — audit logging is the classic example.
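
The audit-logging case might look like the sketch below. It assumes Spring's transaction support is on the classpath and a transaction manager is configured; AuditService, OrderService, record, and placeOrder are hypothetical names, and the actual persistence calls are left as comments. Both classes share one file only for brevity.

```java
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

@Service
class AuditService {

    // Suspends the caller's transaction and commits this work independently.
    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public void record(String event) {
        // persist an audit row here, e.g. through a repository (omitted)
    }
}

@Service
class OrderService {

    private final AuditService auditService;

    OrderService(AuditService auditService) {
        this.auditService = auditService;
    }

    // REQUIRED is the default propagation.
    @Transactional
    public void placeOrder(String orderRef) {
        // Runs in its own transaction, which has committed by the time this call returns.
        auditService.record("placeOrder attempted: " + orderRef);

        // Order writes go here (omitted). If they throw a RuntimeException,
        // only this outer transaction rolls back; the audit row stays committed.
    }
}
```

The audit method lives in a separate bean on purpose: the call has to pass through that bean's proxy for REQUIRES_NEW to take effect.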

Q: What is the self-invocation problem with @Transactional?

@Transactional is implemented through a Spring AOP proxy. When a bean calls one of its own methods as this.method(), the call bypasses the proxy — the annotation on the called method is silently ignored. The fix is to extract the target method into a separate bean and inject that bean, so the call flows through the proxy.
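
The bypass can be reproduced without Spring using a JDK dynamic proxy. This is only an analogy for how a Spring AOP proxy sits in front of a bean: the interceptor here records method names instead of managing transactions, and OrderService, OrderServiceImpl, and withInterceptor are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.List;

// Demonstrates why a call on "this" never reaches a proxy's interceptor.
final class SelfInvocationSketch {

    interface OrderService {
        void placeOrder();
        void writeAudit();
    }

    static final class OrderServiceImpl implements OrderService {
        @Override
        public void placeOrder() {
            // Equivalent to this.writeAudit(): a direct call on the target object,
            // so the proxy (and any advice attached to it) is never involved.
            writeAudit();
        }

        @Override
        public void writeAudit() {
            // real work omitted
        }
    }

    // Wraps the target in a proxy that records every call it intercepts.
    static OrderService withInterceptor(OrderService target, List<String> intercepted) {
        InvocationHandler handler = (proxy, method, args) -> {
            intercepted.add(method.getName()); // where transactional advice would run
            return method.invoke(target, args);
        };
        return (OrderService) Proxy.newProxyInstance(
                OrderService.class.getClassLoader(),
                new Class<?>[] { OrderService.class },
                handler);
    }
}
```

Calling placeOrder() on the proxy records only placeOrder; the nested writeAudit() call is invisible to the interceptor. Calling writeAudit() on the proxy directly is recorded, which is what happens once the method is moved to a separately injected bean.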

Q: What causes the N+1 query problem and how do you fix it?

When loading N entities with a LAZY association, accessing the association triggers one SQL query per entity — N additional queries on top of the initial load. Fix options: (1) JOIN FETCH in JPQL to load the association in a single query, (2) @EntityGraph to add fetch paths to derived query methods without JPQL, (3) @BatchSize on collections to group N individual queries into batches, (4) DTO projections to select only the needed columns in one query.
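
To make the statement counts concrete, the sketch below lists the SQL a fake data layer would issue under three of these strategies. It does not talk to a database or to Hibernate; the table names orders and order_item and the method names are assumptions for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Counts the statements each loading strategy would send; no real database involved.
final class NPlusOneSketch {

    // LAZY access in a loop: 1 query for the orders + 1 per order for its items.
    static List<String> loadLazily(List<Long> orderIds) {
        List<String> sql = new ArrayList<>();
        sql.add("SELECT * FROM orders");
        for (Long id : orderIds) {
            sql.add("SELECT * FROM order_item WHERE order_id = " + id);
        }
        return sql;
    }

    // Batch loading: 1 query for the orders + one IN (...) query per batch of ids.
    static List<String> loadInBatches(List<Long> orderIds, int batchSize) {
        List<String> sql = new ArrayList<>();
        sql.add("SELECT * FROM orders");
        for (int from = 0; from < orderIds.size(); from += batchSize) {
            List<Long> chunk = orderIds.subList(from, Math.min(from + batchSize, orderIds.size()));
            String inList = chunk.stream()
                    .map(String::valueOf)
                    .collect(Collectors.joining(", ", "(", ")"));
            sql.add("SELECT * FROM order_item WHERE order_id IN " + inList);
        }
        return sql;
    }

    // JOIN FETCH: a single statement that returns orders and items together.
    static List<String> loadWithJoinFetch() {
        List<String> sql = new ArrayList<>();
        sql.add("SELECT * FROM orders o JOIN order_item i ON i.order_id = o.id");
        return sql;
    }
}
```

For 100 orders this gives 101 statements with lazy access, 5 with a batch size of 25, and 1 with a join fetch.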

Q: What is the difference between a closed interface projection and a DTO projection?

A closed interface projection is an interface whose getter names match entity field names; Spring generates a proxy implementing it and generates SELECT for only those columns — no JPQL needed. A DTO projection uses a plain class or record with a constructor expression in JPQL (SELECT new ClassName(p.id, p.name) FROM Product p). Both are efficient and avoid over-fetching. DTOs are immutable, better for cross-module use, and not tracked by Hibernate. Interface projections require less code for simple cases.
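
A side-by-side sketch of the two styles, assuming Spring Boot 3 (jakarta.persistence) with Spring Data JPA on the classpath. The package com.example, the Product entity with a status field, and all type and method names are hypothetical; in a real project each type would be public in its own file.

```java
package com.example;

import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.GenerationType;
import jakarta.persistence.Id;
import java.util.List;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;

// Hypothetical entity assumed by both projections.
@Entity
class Product {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    private String name;
    private String status;

    protected Product() { } // no-arg constructor required by JPA
}

// Closed interface projection: getter names match entity fields.
interface ProductNameView {
    Long getId();
    String getName();
}

// DTO projection: a record populated through a JPQL constructor expression.
record ProductSummary(Long id, String name) { }

interface ProductRepository extends JpaRepository<Product, Long> {

    // Derived query; the return type tells Spring Data to select only id and name.
    List<ProductNameView> findByStatus(String status);

    // Constructor expression; the fully qualified class name is used to be safe.
    @Query("SELECT new com.example.ProductSummary(p.id, p.name) "
            + "FROM Product p WHERE p.status = :status")
    List<ProductSummary> findSummariesByStatus(@Param("status") String status);
}
```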

Q: What does @Transactional(readOnly = true) do?

It hints to Hibernate that no writes will occur. Hibernate skips its dirty-checking flush phase before commit (avoids scanning all tracked entities for changes), reducing CPU overhead on read-heavy operations. Some JDBC drivers or connection pools route read-only transactions to a read replica. The standard pattern: annotate the service class with @Transactional(readOnly = true) and override specific write methods with plain @Transactional.

Q: What is OSIV (Open Session in View) and why should you disable it in production?

OSIV is a Spring Boot pattern (enabled by default: spring.jpa.open-in-view=true) that keeps the Hibernate Session open from the start of an HTTP request all the way through view/JSON serialization. This prevents LazyInitializationException when controllers access LAZY associations. However, under high concurrency every thread holds a JDBC connection for its full request lifetime (including time spent serializing JSON or awaiting network I/O), which exhausts the connection pool. The correct fix is spring.jpa.open-in-view=false — then return DTOs or projections from services so no LAZY field is accessed outside a transaction.


Advanced

Q: How does Spring manage the Hibernate Session lifecycle inside @Transactional?

When a @Transactional method is called, Spring's AOP proxy invokes JpaTransactionManager, which calls EntityManagerFactory.createEntityManager() (internally sessionFactory.openSession()). The new Session is bound to the current thread via TransactionSynchronizationManager (a ThreadLocal-based registry). All Spring Data repository calls and EntityManager injections within that thread share this bound session. When the method exits, the proxy flushes dirty entities, commits (or rolls back), closes the session, and removes it from the ThreadLocal. This is why @PersistenceContext EntityManager em gives you a thread-local proxy — it's not a single EntityManager instance but a ThreadLocal-bound one managed by the container.

Follow-up: What happens if two @Transactional methods call each other on the same thread? A: The default REQUIRED propagation means the inner method joins the outer transaction — both share the same Session. The inner method does not open a new session or commit independently; everything commits (or rolls back) when the outermost @Transactional boundary exits.

Q: Why is combining JOIN FETCH on a collection with pagination a problem?

JOIN FETCH on a @OneToMany association multiplies result rows (each order appears once per item). Database-level LIMIT/OFFSET applied to those multiplied rows would paginate incorrectly. Hibernate detects this and falls back to in-memory pagination: it loads the entire joined result set into memory and slices it in Java. Hibernate logs a warning: HHH000104: firstResult/maxResults specified with collection fetch; applying in memory!. For paginated collection fetches, use @BatchSize or separate queries to load collections after paginating root entities.
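
The row multiplication is easy to see with a small simulation. The sketch below builds the rows a join between orders and items would return and then applies a row limit the way a database would; it is an illustration only, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Shows why LIMIT on joined rows does not mean "LIMIT on orders".
final class JoinFetchPaginationSketch {

    // One row per (order, item) pair, like the result of orders JOIN order_item.
    // Each row is {orderId, itemNumber}.
    static List<long[]> joinedRows(Map<Long, Integer> itemsPerOrder) {
        List<long[]> rows = new ArrayList<>();
        for (Map.Entry<Long, Integer> entry : itemsPerOrder.entrySet()) {
            for (int item = 1; item <= entry.getValue(); item++) {
                rows.add(new long[] { entry.getKey(), item });
            }
        }
        return rows;
    }

    // Applies a row-level limit and returns the distinct orders that survive it.
    static Set<Long> ordersAfterRowLimit(List<long[]> rows, int limit) {
        Set<Long> orderIds = new LinkedHashSet<>();
        for (long[] row : rows.subList(0, Math.min(limit, rows.size()))) {
            orderIds.add(row[0]);
        }
        return orderIds;
    }
}
```

With three orders holding 3, 3, and 1 items, the join yields 7 rows. A row limit of 2 intended as "first two orders" returns only order 1, and with just two of its three items, which is the incorrect page that the in-memory fallback exists to avoid.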

Follow-up: How do you paginate orders while also loading their items efficiently? A: Paginate Order without a collection join: Page<Order> findAll(Pageable pageable). Then use @BatchSize(size = 25) on Order.items — Hibernate batch-loads items for the current page of orders in a handful of IN (...) queries instead of N individual queries.

Q: When would you use TransactionTemplate instead of @Transactional?

TransactionTemplate provides programmatic, fine-grained control that annotations can't express. Use it when: each iteration of a loop needs its own transaction so failures are isolated (batch processing), when a method is not Spring-managed (e.g., a Quartz job or a framework callback), or when you need to set rollback programmatically based on a runtime condition: status.setRollbackOnly(). For all typical business service code, @Transactional is simpler and preferred.
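
A sketch of the per-iteration case, assuming spring-tx on the classpath and a configured PlatformTransactionManager. BatchImporter, importAll, and the rowWriter callback are hypothetical names, and treating a blank row as the rollback condition is an arbitrary choice for the example.

```java
import java.util.List;
import java.util.function.Consumer;
import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.support.TransactionTemplate;

// Hypothetical batch importer: every row gets its own transaction.
class BatchImporter {

    private final TransactionTemplate tx;

    BatchImporter(PlatformTransactionManager transactionManager) {
        this.tx = new TransactionTemplate(transactionManager);
    }

    // Returns the number of rows whose transaction failed with an exception.
    int importAll(List<String> rows, Consumer<String> rowWriter) {
        int failed = 0;
        for (String row : rows) {
            try {
                tx.executeWithoutResult(status -> {
                    if (row.isBlank()) {
                        // Runtime decision: mark this transaction for rollback.
                        status.setRollbackOnly();
                        return;
                    }
                    rowWriter.accept(row); // database writes for this row
                });
            } catch (RuntimeException e) {
                // Only this row's transaction was rolled back; keep going.
                failed++;
            }
        }
        return failed;
    }
}
```

Because each call to executeWithoutResult starts and completes its own transaction, a failure in one row does not undo rows that were already committed.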

Q: Why should you cache DTOs or projections rather than JPA entities in Redis?

JPA entities may contain Hibernate proxy objects for LAZY-loaded associations. During serialization to Redis (typically via Jackson), accessing a proxy field outside a Hibernate session throws LazyInitializationException. Even if the proxy is initialized before serialization, the serialized form is large and includes all mapped fields — including audit timestamps and internal state not needed by callers. DTOs are plain POJOs with no Hibernate dependencies, explicitly shaped for consumer needs, and compact to serialize. Always ensure any cached object is fully materialized before it is placed in a distributed cache.

Follow-up: How do you handle Redis cache key collisions in a multi-service environment? A: Use a unique key prefix per service or cache name. In RedisCacheManagerBuilderCustomizer, configure RedisCacheConfiguration.defaultCacheConfig().prefixCacheNameWith("myservice:"). All cache entries from that service are namespaced, preventing collision with other services writing to the same Redis instance.

Q: What is the difference between FetchType.EAGER and FetchType.LAZY in JPA, and which is better?

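EAGER loads the association immediately whenever the parent is loaded: with a JOIN in the same SQL for lookups by ID (EntityManager.find), and with additional SELECTs when the parent comes from a JPQL query that does not fetch the association explicitly. LAZY defers loading until the association is first accessed, if at all. Contrary to intuition, EAGER is generally worse: it always pays the loading cost even when the association is never used, and a JPQL query returning N parent entities, each with an EAGER @ManyToOne, can trigger up to N extra SELECTs (the N+1 problem) that cannot be switched off per query. LAZY is the better choice in most cases. Note that JPA defaults @ManyToOne and @OneToOne to EAGER, so set fetch = FetchType.LAZY explicitly, and pair it with explicit fetch strategies (JOIN FETCH, @EntityGraph, @BatchSize) when the association is actually needed.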


Further Reading