In this blog series, I’m describing how InnoDB locks data (tables and rows) in order to provide the illusion to clients that their queries are executed one after another, and how this was improved in recent releases. So far we have seen that access rights currently granted, and those waiting to be granted, are represented as record lock and table lock objects in memory, which we can inspect via performance_schema.data_locks. We have also learned that they form “queues”, conceptually one for each resource. We’ve left out a technical detail, though: the queue itself is a data structure which will be accessed from many (perhaps thousands of) threads in parallel – how can we ensure the integrity of the queue and fast parallel operation? It looks like, ironically, the Lock System itself needs some form of latching inside of it.

This post is about the change most important for the performance of the Lock System, “WL#10314: InnoDB: Lock-sys optimization: sharded lock_sys mutex”, which arrived in 8.0.21, several years after a proof of concept was proposed and implemented by Paweł Olchawa. The idea seems relatively simple to explain: let the threads which operate on lock queues for different resources operate in parallel, instead of latching the whole Lock System. For example, if a transaction needs to enqueue a waiting lock for a row in one table, this operation could be done in parallel to another transaction releasing a lock on a different resource.

Note that this is a low-level change: it concerns high-frequency “latching”, not the high-level, long-term “locking”. What we care about here is the data integrity of the queue itself, and how to coordinate operations on the queue object such as “enqueue”, “dequeue” and “iterate” – each such operation is very short in practice, but historically required latching the whole Lock System, so no other thread could touch any queue until we’d finished. So, this article is one level of abstraction lower than what we were dealing with in previous articles: the lock you want to enqueue may wait several seconds before being granted, but the mere operation of enqueuing it – this tiny, very short operation of allocating the lock object and appending it to the list – is what we are concerned about now.

One of the reasons it took so long to bring Paweł’s idea to reality should hopefully be clear after reading the previous articles in this series: the Lock System is a very complicated beast, and there were at least two places which tried to do something globally on the whole wait-graph, as opposed to locally within one queue – deadlock detection and CATS both performed a DFS. The changes described in previous articles moved these costly operations to a separate thread and made sure they do not have to latch the whole Lock System while they operate. All the other operations in our code base involve one or two lock queues.

I’ve already talked a lot about operations involving one queue, but not much about situations where we have to move locks across two queues. This happens when a page is reorganized due to B-tree splits or merges, or when a record is deleted (and, subsequently, purged) so that “the point on the axis” vanishes and its locks have to be “inherited” by the resulting gap (symmetrical problems occur when a new point splits a gap). We’ve already seen how requesting access to two resources can easily lead to deadlocks if you do not plan ahead to request them in some agreed-upon order.
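The same rule applies one level down, to the latches that protect the queues. Below is a minimal sketch – my own illustration with made-up names, not InnoDB’s actual code or API – of how the latches guarding two lock queues can always be taken in an agreed order, by ascending shard index, so that two threads moving locks between the same pair of queues can never deadlock on the latches themselves. (The “shards” it refers to are explained right after it.)

```cpp
#include <algorithm>
#include <array>
#include <mutex>
#include <utility>

// Illustration only (hypothetical names, not InnoDB's API): a fixed
// array of latches, each protecting a subset ("shard") of the lock queues.
constexpr std::size_t kShards = 512;
std::array<std::mutex, kShards> shard_latch;

// Latch the single shard covering one resource, run a short queue
// operation (enqueue / dequeue / iterate), then release.
template <typename Op>
void with_one_shard(std::size_t shard, Op &&op) {
  std::lock_guard<std::mutex> guard{shard_latch[shard]};
  std::forward<Op>(op)();
}

// For operations spanning two queues (e.g. moving locks during a page
// split, or inheriting them to a gap after a purge), latch both shards
// in the agreed order: ascending index. Both resources may map to the
// same shard, in which case one latch suffices.
template <typename Op>
void with_two_shards(std::size_t a, std::size_t b, Op &&op) {
  if (a == b) {
    with_one_shard(a, std::forward<Op>(op));
    return;
  }
  const std::size_t lo = std::min(a, b);
  const std::size_t hi = std::max(a, b);
  std::lock_guard<std::mutex> first{shard_latch[lo]};
  std::lock_guard<std::mutex> second{shard_latch[hi]};
  std::forward<Op>(op)();
}
```

Because every thread that needs two of these latches acquires them in the same ascending order, no cycle can form among threads waiting for them.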
Instead of having one latch per lock queue, we use a slightly different approach: we have a fixed number of “shards” into which the lock queues are “sharded”, and each shard has its own latch. The technical reason it is implemented like that is that a “lock queue” does not really exist explicitly in InnoDB’s code. The concept is a useful lie that I’ve used to explain what the Lock System tries to achieve, and you may find comments in our code talking about it as if it were real. But the actual implementation is that locks reside in a hash table, with one doubly-linked list of lock structures per bucket, where the hash is computed from the resource (for record locks, from the page the record belongs to). This means that the “lock queues” for many different resources which happen to be hashed to the same bucket are mixed together into a single list. This makes it impractical and unhelpful to associate anything with a “lock queue”. Instead, we could try to associate something with a “hash table bucket”, which is almost what we do – we just add an additional modulo 512 step to make the number of “shards” fixed and independent from the number of hash table buckets, which you can configure at run time. (There are 3 hash tables: for record locks, for predicate locks and for table locks – the last one uses the locked table’s id for hashing, and got its own separate 512 shards for latching.)
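To make the mapping concrete – resource to hash-table bucket, then an extra modulo to pick one of the 512 latch shards – here is a rough sketch; the struct name, the function names and the particular hash are hypothetical, chosen for illustration, and not the ones used in InnoDB.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names for illustration. Record locks are grouped by the
// page the record lives on, identified by tablespace id and page number.
struct PageId {
  uint32_t space_id;
  uint32_t page_no;
};

constexpr std::size_t kLatchShards = 512;  // fixed, unlike the bucket count

// Some hash of the resource (the real function differs; any reasonable
// mixing works for the purpose of this example).
inline uint64_t page_hash(const PageId &id) {
  return (static_cast<uint64_t>(id.space_id) << 32) ^ id.page_no;
}

// Bucket inside the run-time configurable hash table: one doubly-linked
// list of lock structures hangs off each bucket, mixing the "queues" of
// all resources that hash there.
inline std::size_t bucket_index(const PageId &id, std::size_t n_buckets) {
  return page_hash(id) % n_buckets;
}

// The latch shard guarding that bucket: the additional modulo keeps the
// number of latches at 512 no matter how many buckets are configured.
inline std::size_t shard_index(const PageId &id, std::size_t n_buckets) {
  return bucket_index(id, n_buckets) % kLatchShards;
}
```

One consequence is that all resources whose buckets are congruent modulo 512 share a latch; that is fine, because the goal is only to spread contention across many latches, not to give each queue a latch of its own.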