September 10, 2007
Latches
<quote>
Latches
Latches are lightweight serialization devices used to coordinate multiuser access to shared data structures, objects, and files.
Latches are locks designed to be held for extremely short periods of time, for example, the time it takes to modify an in-memory data structure. They are used to protect certain memory structures, such as the database block buffer cache or the library cache in the shared pool. Latches are typically requested internally in a ‘willing to wait’ mode. This means that if the latch is not available, the requesting session will sleep for a short period of time and retry the operation later. Other latches may be requested in an ‘immediate’ mode, which is similar in concept to a SELECT FOR UPDATE NOWAIT, meaning that the process will go do something else, such as try to grab an equivalent sibling latch that may be free, rather than sit and wait for this latch to become available. Since many requestors may be waiting for a latch at the same time, you may see some processes waiting longer than others. Latches are assigned rather randomly, based on the luck of the draw, if you will. Whichever session asks for a latch right after it was released will get it. There is no line of latch waiters, just a mob of waiters constantly retrying.
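To make the two request modes concrete, here is a minimal C sketch: an immediate-mode get is answered at once, so a caller can walk a set of sibling (child) latches looking for a free one and fall back to a willing-to-wait retry only when all of them are busy. The names, the child-latch array, and the sleep interval are illustrative assumptions, not Oracle's actual code.

```c
/* Illustrative sketch only -- not Oracle's implementation. */
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define N_CHILDREN 4                       /* hypothetical set of sibling latches */

typedef atomic_flag latch_t;
static latch_t children[N_CHILDREN] = {
    ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT
};

/* "Immediate" mode: one attempt, and we are told the outcome right away. */
static bool latch_get_immediate(latch_t *l)
{
    return !atomic_flag_test_and_set(l);   /* true = we now hold the latch */
}

/* Try every sibling latch without waiting; only if all are busy do we fall
 * back to a "willing to wait" request (sleep briefly, retry) on one of them. */
static int get_any_child(void)
{
    for (int i = 0; i < N_CHILDREN; i++)
        if (latch_get_immediate(&children[i]))
            return i;

    while (!latch_get_immediate(&children[0])) {   /* willing to wait */
        struct timespec nap = { 0, 1000000 };      /* ~1 ms, then retry */
        nanosleep(&nap, NULL);
    }
    return 0;
}
```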
Oracle uses atomic instructions like ‘test and set’ and ‘compare and swap’ for operating on latches. Since the instructions to set and free latches are atomic, the operating system itself guarantees that only one process gets to test and set the latch even though many processes may be going for it simultaneously. Since the instruction is only one instruction, it can be quite fast. Latches are held for short periods of time and provide a mechanism for cleanup in case a latch holder ‘dies’ abnormally while holding it. This cleanup process would be performed by PMON.
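Here is a hedged C11 sketch of a compare-and-swap latch that records its holder's process ID, so that a monitor process (the role PMON plays for Oracle latches) can free a latch whose holder has died. The structure, function names, and cleanup check are assumptions made for illustration; Oracle's real latch structures are not exposed this way.

```c
/* Illustrative sketch only -- not Oracle's latch implementation. */
#include <stdatomic.h>
#include <stdbool.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct {
    _Atomic pid_t holder;    /* 0 = free, otherwise the PID holding the latch */
} latch_t;

static bool latch_try_get(latch_t *l)
{
    pid_t expected = 0;
    /* Compare-and-swap: in one indivisible step, succeed only if the latch
     * is free, and record ourselves as the holder. The hardware guarantees
     * that exactly one of many simultaneous callers wins. */
    return atomic_compare_exchange_strong(&l->holder, &expected, getpid());
}

static void latch_release(latch_t *l)
{
    atomic_store(&l->holder, (pid_t)0);
}

/* Cleanup pass, run by a monitor process: if the recorded holder no longer
 * exists, clear the latch so others can proceed. */
static void monitor_cleanup(latch_t *l)
{
    pid_t h = atomic_load(&l->holder);
    if (h != 0 && kill(h, 0) == -1 && errno == ESRCH)   /* holder is gone */
        atomic_compare_exchange_strong(&l->holder, &h, (pid_t)0);
}
```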
Enqueues, which we discussed earlier, are another, more sophisticated serialization device used when updating rows in a database table, for example. They differ from latches in that they allow the requestor to ‘queue up’ and wait for the resource. With a latch request, the requestor session is told right away whether or not it got the latch. With an enqueue lock, the requestor session will be blocked until it can actually attain it.
Note: Using SELECT FOR UPDATE NOWAIT or WAIT [n], you can optionally decide not to wait for an enqueue lock if your session would be blocked, but if you do block and wait, you will wait in a queue.
As such, an enqueue is not as fast as a latch can be, but it does provide functionality over and above what a latch can offer. Enqueues may be obtained at various levels, so you can have many share locks and locks with various degrees of shareability.
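As a rough analogy only (an ordinary mutex does not maintain the ordered waiter list or the multiple lock modes a real enqueue does), the difference between the two request styles looks like this in pthreads:

```c
/* Analogy only: contrasts "answer right away" with "block until granted". */
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t resource = PTHREAD_MUTEX_INITIALIZER;

/* Latch-style request: we are told immediately whether we got it. */
static bool latch_style_request(void)
{
    return pthread_mutex_trylock(&resource) == 0;
}

/* Enqueue-style request: we block and are woken only when the resource
 * has actually been granted to us. */
static void enqueue_style_request(void)
{
    pthread_mutex_lock(&resource);
}
```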
Latch ‘Spinning’
One thing I’d like to drive home with regard to latches is this: latches are a type of lock, locks are serialization devices, and serialization devices inhibit scalability. If your goal is to construct an application that scales well in an Oracle environment, you must look for approaches and solutions that minimize the amount of latching you need to perform.
Even seemingly simple activities, such as parsing a SQL statement, acquire and release hundreds or thousands of latches on the library cache and related structures in the shared pool. If we have a latch, then someone else might be waiting for it. When we go to get a latch, we may well have to wait for it ourselves.
Waiting for a latch can be an expensive operation. If the latch is not available immediately and we are willing to wait for it, as we likely are most of the time, then on a multi-CPU machine our session will spin, trying over and over, in a loop, to get the latch. The reasoning behind this is that context switching (i.e., getting ‘kicked off’ the CPU and having to get back on the CPU) is expensive. So, if the process cannot get a latch immediately, we’ll stay on the CPU and try again immediately rather than just going to sleep, giving up the CPU, and trying later when we’ll have to get scheduled back on the CPU. The hope is that the holder of the latch is busy processing on the other CPU (and since latches are designed to be held for very short periods of time, this is likely) and will give it up soon. If after spinning and constantly trying to get the latch, we still fail to obtain it, only then will our process sleep, or take itself off of the CPU, and let some other work take place.
The logic is to try to get the latch and, failing that, to increment the miss count, a statistic we can see in a statspack report or by querying the V$LATCH view directly. Once the process misses, it will loop some number of times (an undocumented parameter controls the number of times and is typically set to 2,000), attempting to get the latch over and over. If one of these get attempts succeeds, then it returns and we continue processing. If they all fail, the process will go to sleep for a short duration of time, after incrementing the sleep count for that latch. Upon waking up, the process begins all over again. This implies that the cost of getting a latch is not just the ‘test and set’-type operation that takes place, but also a considerable amount of CPU time spent trying to get the latch. Our system will appear to be very busy (with much CPU being consumed), but not much work is getting done.
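Here is a hedged C sketch of that spin-then-sleep loop, with miss and sleep counters modeled on the statistics V$LATCH exposes and the spin count set to the 2,000 mentioned above. Everything else (the names, the sleep interval, the test-and-set flag) is an illustrative assumption rather than Oracle's actual code.

```c
/* Illustrative sketch of the spin/miss/sleep logic -- not Oracle's code. */
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define SPIN_COUNT 2000             /* the undocumented spin parameter noted above */

typedef struct {
    atomic_flag taken;
    _Atomic long gets, misses, sleeps;     /* statistics, as reported in V$LATCH */
} latch_t;

static bool try_get(latch_t *l)
{
    return !atomic_flag_test_and_set(&l->taken);
}

/* Willing-to-wait get: burn CPU spinning first, give up the CPU only as a
 * last resort, then wake up and start all over again. */
static void latch_get(latch_t *l)
{
    if (try_get(l)) { l->gets++; return; }     /* got it on the first try */
    l->misses++;                               /* failed first attempt = a miss */

    for (;;) {
        for (int i = 0; i < SPIN_COUNT; i++)   /* spin: the holder is probably
                                                  busy on another CPU and will
                                                  release the latch very soon */
            if (try_get(l)) { l->gets++; return; }

        l->sleeps++;                           /* still no luck: sleep briefly */
        struct timespec nap = { 0, 1000000 };  /* ~1 ms off the CPU */
        nanosleep(&nap, NULL);                 /* then begin all over again */
    }
}
```

A latch instance would simply be declared with its flag cleared and its counters at zero, for example `static latch_t example_latch = { ATOMIC_FLAG_INIT };`.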
</quote>
Extracted from:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:409308200346402947