Vlad Mihalcea's Hypersistence - key takeaways
Data knowledge stack

Persistence Context or First Level Cache
*If transaction boundaries are not defined explicitly, each statement is executed within a separate implicit database transaction. This implicit transaction mode is known as auto-commit.
*The org.hibernate.Session API and the javax.persistence.EntityManager API represent a context for dealing with persistent data, the so-called Persistence Context.
*Persistent data has a state in relation to both the Persistence Context and the underlying database.
*The Persistence Context enqueues entity state transitions, which get translated to DML statements upon flushing.
*The Persistence Context holds at most one managed reference for each entity instance. No matter how many times we load an entity, the Persistence Context returns the same object reference. This behavior is known as the first-level cache.
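A minimal sketch of the first-level cache guarantee, assuming a hypothetical Post entity with a Long identifier:

```java
Post first = entityManager.find(Post.class, 1L);  // hits the database
Post second = entityManager.find(Post.class, 1L); // served from the Persistence Context
assert first == second; // at most one managed reference per entity instance
```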
*Every time an entity is loaded, Hibernate makes an additional copy of all entity property values. At flush time, every managed property is matched against this load-time snapshot.
*For managed entities, Hibernate can thus auto-detect incoming changes and schedule SQL updates on our behalf. This process is called dirty checking.
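A sketch of dirty checking in action, again assuming a hypothetical Post entity; note that no explicit save call is made:

```java
entityManager.getTransaction().begin();
Post post = entityManager.find(Post.class, 1L); // load-time snapshot taken here
post.setTitle("High-Performance Java Persistence");
// at commit, dirty checking compares the entity against its snapshot and
// schedules: UPDATE post SET title = ? WHERE id = ?
entityManager.getTransaction().commit();
```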
*As an alternative to the deep-comparison strategy, entities can mark their dirty properties themselves upon value change (self-dirty-checking). Hibernate can weave this mechanism into the bytecode either at compile time or at class-loading time.
Flushing
*Hibernate tries to defer flushing the Persistence Context until the last possible moment. This strategy is traditionally known as transactional write-behind.
*Write-back (also called write-behind): initially, writing is done only to the cache; the write to the backing store is postponed until the cache blocks containing the data are about to be modified or replaced by new content.
*At flush time, the entity state transitions are materialized into database DML statements.
*From time to time, the entity manager executes the SQL DML statements needed to synchronize the data store with the state of the objects held in memory. This process is called flushing.
*Every DML statement runs inside a database transaction. Based on the current transaction isolation level, locks may be acquired for the selected or modified table rows. Deferring locking statements can increase performance, which is why Hibernate defers flushing until the last possible moment: reducing lock contention leads to higher throughput.
*Hibernate parses HQL statements (selects, subselects, and joins) and triggers an auto-flush only when the tables explicitly referenced in the HQL overlap with pending entity state changes.
*Changes applied by database triggers or database-level cascades cannot be detected, so they do not trigger a flush even when they affect the tables an executed query reads. The same is true for native SQL queries, because Hibernate cannot parse SQL.
*It is advisable to execute queries at the beginning or towards the end of a transaction. Interleaving queries with pending entity state changes triggers premature flushes.
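A sketch, assuming a hypothetical Post entity with a title constructor, of how an interleaved query forces a premature flush:

```java
entityManager.getTransaction().begin();
entityManager.persist(new Post("First post"));
// the pending INSERT overlaps with the queried table, so the AUTO flush
// mode flushes here, before the SELECT executes
List<Post> posts = entityManager
    .createQuery("select p from Post p", Post.class)
    .getResultList();
entityManager.persist(new Post("Second post"));
entityManager.getTransaction().commit();
```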
*If we want to override Hibernate's default flush strategy, we can do so at the query level as well:
```java
assertEquals(product.getId(), session.createSQLQuery("select id from product")
    .setFlushMode(FlushMode.ALWAYS)
    .uniqueResult());
```
FlushMode.ALWAYS does not apply any optimization. Instead, you can declare an entity synchronization:
```java
assertEquals(product.getId(), session.createSQLQuery("select id from product")
    .addSynchronizedEntityClass(Product.class)
    .uniqueResult());
```
This way, Hibernate flushes only when the synchronized entity has pending changes.
*In JPA's AUTO flush mode, the Persistence Context is flushed before query execution and prior to transaction commit.
JPA to Hibernate flushing strategies:

| JPA FlushModeType | Hibernate FlushMode | Hibernate implementation details |
|---|---|---|
| AUTO | AUTO | The Session is sometimes flushed before query execution. |
| COMMIT | COMMIT | The Session is only flushed prior to a transaction commit. |
| | ALWAYS | The Session is always flushed before query execution. |
| | MANUAL | The Session can only be manually flushed. |
At flush time, Hibernate executes the pending DML statements in a fixed order, regardless of the order in which the entity state transitions were recorded:

1. inserts
2. updates
3. deletions of collection elements
4. inserts of collection elements
5. deletes
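Because inserts execute before deletes, reusing a unique value within a single flush can violate a constraint. A minimal sketch, assuming a hypothetical Account entity with a unique login column:

```java
entityManager.remove(oldAccount);   // the old row holds the unique login "alice"
entityManager.flush();              // force the DELETE to execute first
entityManager.persist(newAccount);  // the new row can now reuse the login "alice"
```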
Second Level Cache
*A proper caching solution has to span multiple Hibernate Sessions, and that is why Hibernate supports an additional second-level cache.
*In the second-level cache, every entity is stored as a CacheEntry object, and the hydrated state is used for creating the CacheEntry. Hydration is the process by which a JDBC ResultSet is transformed into an array of raw values.
*The hydrated state is saved in the currently running Persistence Context as an EntityEntry object, which encapsulates the loading-time entity snapshot. The hydrated state is then used by:
- the default dirty checking mechanism, which compares the current entity data against the loading-time snapshot.
- the second-level cache, whose cache entries are built from the loading-time entity snapshot.
*The second-level cache stores a disassembled hydrated state (the loading-time entity snapshot, not the whole entity graph); entries are disassembled prior to being stored in a CacheEntry. Each entity update therefore affects only one cache entry, not an entire object graph.
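A minimal sketch of enabling second-level caching for an entity, assuming a cache provider is configured and hibernate.cache.use_second_level_cache is set to true:

```java
import javax.persistence.Cacheable;
import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

@Entity
@Cacheable
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE) // disassembled state cached per entity
public class Post {

    @Id
    private Long id;

    private String title;
}
```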
Application Level Cache
*For long conversations, transaction boundaries should be pushed up to the application layer. To support repeatable reads across multiple user requests, state must be preserved at the application level. Hibernate offers two strategies for long conversations: the extended Persistence Context and detached objects.
*Concurrent updates spanning application-level transactions have to be isolated at the application level as well, which can be achieved through optimistic locking. Entities should be annotated with a @Version attribute.
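A sketch of optimistic locking through a version attribute, using a hypothetical Post entity:

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Post {

    @Id
    private Long id;

    private String title;

    @Version
    private int version; // checked on every UPDATE; a stale value
                         // raises an OptimisticLockException at flush time
}
```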
*Row-level locking is not feasible for application-level transactions, since row-level locks can only block concurrent modifications within the scope of a single database transaction.
*During application-level transactions, SQL projections always load the latest data from the database, whereas entity queries resolve against the first-level cache, ensuring repeatable reads.
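A sketch of the repeatable-read guarantee for entity queries, assuming a hypothetical Post entity:

```java
Post post = entityManager.find(Post.class, 1L);
// even if another transaction updates the row in the meantime,
// the entity query resolves to the already-managed instance:
Post same = entityManager
    .createQuery("select p from Post p where p.id = :id", Post.class)
    .setParameter("id", 1L)
    .getSingleResult();
assert post == same; // same reference, load-time state preserved
```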
Persist & Merge
*persist() is used for persisting new entities, i.e., when no equivalent record exists in the database yet.
*merge() is used for persisting detached entities.
*For managed entities, no save method is needed because Hibernate automatically synchronizes the entity state with the underlying database record.
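A minimal sketch contrasting the two operations, assuming a hypothetical Post entity and a previously detached detachedPost instance:

```java
// persist(): the entity is new, no equivalent database record exists
Post post = new Post();
post.setTitle("High-Performance Java Persistence");
entityManager.persist(post); // schedules an INSERT

// merge(): reattach a detached entity; the returned copy is the managed one
detachedPost.setTitle("Updated title");
Post managed = entityManager.merge(detachedPost); // state copied onto a managed instance
```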
*@MapsId can be used to make the id column serve as both primary key and foreign key by sharing the primary key of the parent entity. The @Id column uses the @MapsId parent association property for identity generation. This is an effective way to map a one-to-one association, and it reduces the footprint since no additional index has to be maintained.
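A sketch of a one-to-one mapping via @MapsId, assuming hypothetical Post and PostDetails entities:

```java
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.MapsId;
import javax.persistence.OneToOne;

@Entity
public class PostDetails {

    @Id
    private Long id; // shares the parent Post primary key

    @OneToOne(fetch = FetchType.LAZY)
    @MapsId // the id column acts as both PK and FK to post
    private Post post;
}
```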
JDBC Logging
*Datasource-proxy and P6Spy are JDBC logging frameworks. Datasource-proxy proxies a DataSource to intercept statement executions, while P6Spy can proxy either the DataSource or the Driver. In addition to logging, these frameworks provide cross-cutting features such as long-running query detection, and custom statement execution listeners can be written to detect N+1 query problems.
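A minimal sketch of wrapping a connection pool with datasource-proxy; originalDataSource is an assumption standing in for whatever DataSource the application already uses:

```java
import javax.sql.DataSource;

import net.ttddyy.dsproxy.listener.logging.SLF4JLogLevel;
import net.ttddyy.dsproxy.support.ProxyDataSourceBuilder;

DataSource dataSource = ProxyDataSourceBuilder
    .create(originalDataSource)          // the real connection pool
    .name("DATA_SOURCE_PROXY")
    .logQueryBySlf4j(SLF4JLogLevel.INFO) // log every statement via SLF4J
    .countQuery()                        // collect per-statement-type counts
    .build();
```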