
Conflicting updates in concurrent Business Central sessions


After my previous posts on lock escalation, it feels like a good time to step down to a more practical plane and investigate some real applications of the theory. Locks are a mechanism that protects a SQL session from the negative impact of other concurrent sessions, so it makes sense to look at SQL locks with session interaction in mind. If only one user is working with the database at a time and no background tasks are executed in the job queue, all the locking and escalation simply doesn't matter. Locking becomes crucial as the volume of data and the number of concurrent sessions grow, increasing the probability of data access conflicts.


To illustrate locking issues with concurrent data access, I am considering a common pattern for bulk data upload: a process imports data from a file or an API into a buffer table before mapping the imported records to internal ERP entities. Let's assume, for the sake of the test, that we are developing a process that imports purchase invoices from an external data source into a buffer table to create invoices in BC. Based on this example, I am going to demonstrate a few simple scenarios with two BC sessions updating records in this buffer table.


So, first of all, I declare two sample buffer tables - one for document header and another one for document lines.


table 70101 "Purch. Doc. Header Buffer"
{
    DataClassification = CustomerContent;

    fields
    {
        field(1; "Document Type"; Enum "Purchase Document Type")
        {
            Caption = 'Document Type';
        }
        field(2; "Document No."; Code[20])
        {
            Caption = 'Document No.';
        }
        field(3; "Vendor No."; Code[20])
        {
            Caption = 'Vendor No.';
            TableRelation = Vendor;
        }
    }

    keys
    {
        key(PK; "Document Type", "Document No.")
        {
            Clustered = true;
        }
        key(VendorNo; "Vendor No.") { }
    }

    trigger OnDelete()
    var
        PurchDocLineBuffer: Record "Purch. Doc. Line Buffer";
    begin
        PurchDocLineBuffer.SetRange("Document Type", Rec."Document Type");
        PurchDocLineBuffer.SetRange("Document No.", Rec."Document No.");
        PurchDocLineBuffer.DeleteAll();
    end;
}

This was the header table. And the next one is for lines.


table 70102 "Purch. Doc. Line Buffer"
{
    DataClassification = CustomerContent;

    fields
    {
        field(1; "Document Type"; Enum "Purchase Document Type")
        {
            Caption = 'Document Type';
        }
        field(2; "Document No."; Code[20])
        {
            Caption = 'Document No.';
        }
        field(3; "Line No."; Integer)
        {
            Caption = 'Line No.';
        }
        field(4; "Item No."; Code[20])
        {
            Caption = 'Item No.';
            TableRelation = Item;
        }
        field(5; Description; Text[50])
        {
            Caption = 'Description';
        }
        field(6; Quantity; Decimal)
        {
            Caption = 'Quantity';
        }
    }

    keys
    {
        key(PK; "Document Type", "Document No.", "Line No.")
        {
            Clustered = true;
        }
    }
}

Tables are initialized with test data - two invoices. Document type = "Invoice" for both header records, document numbers are INVOICE_1 and INVOICE_2. All other values are assigned randomly - random vendors, items, and quantities. I created the tables to resemble a real-life practical scenario, but actually none of these values matter. What does matter, though, is the number of records in the Purch. Doc. Line Buffer. This number is the key parameter which impacts the locking behaviour and, through it, the mutual influence of the two sessions.


Invoice 1 in the subsequent tests has 3000 lines, invoice 2 - 6200 lines. Such monster invoices are not so rare for big retailers placing wholesale orders to replenish stock across their extensive chains of stores. The other side of this transaction would be the wholesaler sending the sales invoice to the retailer - with the same number of lines on the sales side.
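For reference, the initialization could look something like the sketch below. GetRandomVendorNo and GetRandomItemNo are hypothetical helpers standing in for the random value assignment, not part of the tables declared above.

```al
local procedure InitInvoice(DocumentNo: Code[20]; NoOfLines: Integer)
var
    PurchDocHeaderBuffer: Record "Purch. Doc. Header Buffer";
    PurchDocLineBuffer: Record "Purch. Doc. Line Buffer";
    LineNo: Integer;
begin
    PurchDocHeaderBuffer."Document Type" := PurchDocHeaderBuffer."Document Type"::Invoice;
    PurchDocHeaderBuffer."Document No." := DocumentNo;
    PurchDocHeaderBuffer."Vendor No." := GetRandomVendorNo();  // hypothetical helper
    PurchDocHeaderBuffer.Insert();

    for LineNo := 1 to NoOfLines do begin
        PurchDocLineBuffer.Init();
        PurchDocLineBuffer."Document Type" := PurchDocLineBuffer."Document Type"::Invoice;
        PurchDocLineBuffer."Document No." := DocumentNo;
        PurchDocLineBuffer."Line No." := LineNo * 10000;
        PurchDocLineBuffer."Item No." := GetRandomItemNo();  // hypothetical helper
        PurchDocLineBuffer.Quantity := Random(100);
        PurchDocLineBuffer.Insert();
    end;
end;

// InitInvoice('INVOICE_1', 3000);
// InitInvoice('INVOICE_2', 6200);
```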


I defined a codeunit which simply deletes invoice lines and takes a nap to simulate a long data processing transaction.

codeunit 70101 "Process Purch. Doc. Buffer"
{
    TableNo = "Purch. Doc. Header Buffer";

    trigger OnRun()
    begin
        Rec.DeleteAll(true);
        Sleep(7000);
    end;
}

All the while the application thread is suspended by the Sleep function, the SQL transaction, with all its locks, remains active. To see how the transactions interact, I run this codeunit in a background session and update a line record in the main thread.


action(DeleteAndUpdate)
{
    Caption = 'Delete and update';
    ApplicationArea = All;

    trigger OnAction()
    var
        PurchDocHeaderBuffer: Record "Purch. Doc. Header Buffer";
        PurchDocLineBuffer: Record "Purch. Doc. Line Buffer";
        StartTime: Time;
        SessionNo: Integer;
    begin
        PurchDocHeaderBuffer.SetRange(
            "Document Type",
            PurchDocHeaderBuffer."Document Type"::Invoice);
        PurchDocHeaderBuffer.SetRange("Document No.", 'INVOICE_1');
        StartSession(
            SessionNo, Codeunit::"Process Purch. Doc. Buffer",
            CompanyName(), PurchDocHeaderBuffer);

        Sleep(5000);
        StartTime := Time();
        PurchDocLineBuffer.SetRange("Document No.", 'INVOICE_2');
        PurchDocLineBuffer.FindFirst();
        PurchDocLineBuffer.Description := 'New description';
        PurchDocLineBuffer.Modify(true);

        Message(
            'Time to update the record: ' +
            Format(Time - StartTime) + ' ms.');
    end;
}

The foreground session also waits for 5 seconds before reading and updating the record. This pause is required to synchronize the threads and allow the background session enough time to delete the INVOICE_1 lines and move to the suspended state.


After updating the record, the UI trigger shows a message informing how long it took to read and modify the record. As you can see in the code, this time does not include the waiting period - only the database communication.



I execute the test and see that the update took 35 milliseconds. Note that the INVOICE_2 line is updated while the other session holds a lock on INVOICE_1, which is being deleted at the same time. So this test demonstrates that both sessions coexist without causing issues for each other.


Now I swap the invoice numbers so that the background session deletes the INVOICE_2, and the main user session updates INVOICE_1 - no other changes in the code - deploy and run the procedure again.



And all of a sudden, the update of a single record got stuck for 7 seconds. A similar message can be found in the Windows event log: this modification is now reported as a long-running query.



The time in the event log message applies only to the update statement, while the client application included the select statement in the measurement, plus the overhead of communication between the BC server (which is running in my Docker container on a VM) and the Azure SQL database. Because of this, the number in the event log is slightly lower than the time measured in the application. But this discrepancy still does not explain the delay. Nothing has changed (except swapping two invoices), yet the statement suddenly slowed down dramatically - the same simple update on a single record is running more than 200 times slower than in the first execution.


What happens in the background


The answer to this question lies in the background - in the session deleting the buffer lines for the massive INVOICE_2. And the answer is - lock escalation.


The sys.dm_tran_locks dynamic management view can give valuable insight into the locks acquired by active transactions. To get an overview of the active locks, I query this view from SQL Server Management Studio, joining the result with sys.objects to retrieve the table name.

SELECT
    lock.resource_type,
    lock.resource_description,
    obj.name,
    lock.request_mode,
    lock.request_status,
    lock.request_session_id
FROM sys.dm_tran_locks lock
    JOIN sys.objects obj ON lock.resource_associated_entity_id = obj.object_id

The session is holding an intent exclusive lock on the header table, from which it deletes one record, and an exclusive lock on the lines table. With 6200 lines being deleted in a single SQL statement, the lock gets escalated from the individual rows to the table level. (You can find more details on lock escalation in my previous post AL Updates and Locking.)


Note that the query above is applicable only to object-level locks. In the case of row or page locks, the resource_associated_entity_id column in sys.dm_tran_locks contains the HoBt ID instead of the object ID, and this ID must be joined with the sys.partitions view to retrieve the object name.
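A variant of the query adapted for row and page locks could look like this - a sketch along the same lines as the query above, going through sys.partitions to resolve the HoBt ID:

```sql
SELECT
    lck.resource_type,
    obj.name,
    lck.request_mode,
    lck.request_status,
    lck.request_session_id
FROM sys.dm_tran_locks lck
    JOIN sys.partitions part ON lck.resource_associated_entity_id = part.hobt_id
    JOIN sys.objects obj ON part.object_id = obj.object_id
WHERE lck.resource_type IN ('KEY', 'PAGE', 'RID')
```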


So the background session is now holding an exclusive lock on the whole table rather than on the set of rows being deleted; therefore the session which tries to update a single line of INVOICE_1 has to wait until the delete transaction is committed and the table lock is released. And this causes a significant performance degradation for a procedure which otherwise completes in a few milliseconds.


This process can be illustrated in the following simple diagrams.

The first diagram demonstrates the situation where Session 1 deletes the invoice INVOICE_1 and holds the lock until the end of the transaction. This invoice is relatively small (or at least, not big enough to trigger the lock escalation). Session 1 in this scenario does not interfere with Session 2, which updates its single line in the INVOICE_2 and completes in 35 milliseconds.

When Session 1 deletes the more massive INVOICE_2, the situation changes dramatically, because the number of individual locks acquired on the lines of the INVOICE_2 exceeds the escalation threshold, and Session 1 locks the whole table rather than lines of a single invoice.

Here, Session 2 can't place its own lock on the invoice line and is forced to wait until Session 1 wakes from slumber and releases the lock.


Lock timeout


So far, the two sessions updating and deleting data were clashing for around 7 seconds, which forced one of the sessions to wait for the other to finish. But what happens if the session holding the lock does not complete in time and continues blocking access to the table? Let's increase the transaction time and see the result.


In the next test, I increased the sleep period for the background session, keeping it dormant for 15 seconds instead of 7.

trigger OnRun()
begin
    Rec.DeleteAll(true);
    Sleep(15000);
end;

After running the same code with the new sleep time, I receive an error message reading "We can't save your changes right now" instead of my timer notification.



The Windows event log contains a NavCSideSQLLockTimeoutException error. It still classifies the query as a long-running SQL statement, but this time the statement execution was interrupted because it exceeded the lock timeout.



In the event log error information, we can see that the waiting transaction was forced to roll back after approximately 10 seconds - the default lock timeout for Business Central sessions.


In SaaS BC instances, we don't have control over this value, but for on-premises installations it can be overridden. The default 10-second threshold is stored in the database, in the locktimeoutperiod value of the $ndo$dbproperty table.
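On an on-premises database, the stored value can be inspected with a simple query (assuming the standard BC database layout):

```sql
SELECT [locktimeoutperiod] FROM [$ndo$dbproperty]
```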



If we want to change the timeout for our on-premises instance, we can override this value in the BC server settings. The SqlLockTimeoutOverride configuration key controls the lock timeout. Its default value 0 means that the setting from the database is used, whilst setting it to -1 disables the timeout, causing transactions to wait indefinitely. Any positive value is the actual timeout in seconds.
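For example, with the BC administration module loaded, the override could be set from PowerShell like this (the instance name BC240 and the 30-second value are just placeholders):

```powershell
Set-NAVServerConfiguration -ServerInstance BC240 -KeyName SqlLockTimeoutOverride -KeyValue 30
# The new value takes effect after the service instance is restarted
Restart-NAVServerInstance -ServerInstance BC240
```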


Lock escalation and inserts


All the experimenting and theorizing above applies to inserts as well as updates. While the database engine keeps exclusive locks at the row level, other transactions can insert new records. But as soon as the exclusive lock is escalated to the table level, it prevents other transactions from inserting new records into the Purch. Doc. Line Buffer table.


Practical takeaways


Now we have seen how multiple sessions can affect each other and cause overall system performance degradation when they update the same table, even if the datasets are strictly separated between sessions. So what is the practical outcome of this? How can we reduce the mutual impact of multiple sessions?

Keep transactions short


If possible, split your long process into smaller bits. Is it possible to establish intermediate checkpoints where the transaction can be committed without jeopardizing data consistency? Then maybe it's a good idea to do so. If a long routine can recover and resume processing in case of failure instead of rolling back to the very beginning, this also improves the overall user experience.
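Applied to the buffer tables above, a checkpoint per document could look like this sketch, where CreatePurchaseInvoice is a hypothetical mapping routine:

```al
procedure ProcessImportedInvoices()
var
    PurchDocHeaderBuffer: Record "Purch. Doc. Header Buffer";
begin
    if PurchDocHeaderBuffer.FindSet() then
        repeat
            CreatePurchaseInvoice(PurchDocHeaderBuffer);  // hypothetical mapping routine
            Commit();  // checkpoint: releases the locks and preserves completed documents on failure
        until PurchDocHeaderBuffer.Next() = 0;
end;
```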


Shift bulk updates to the end of the transaction


My tests above demonstrated a bad example of a transaction which deleted a large amount of data and then continued with long processing (or actually a sleep simulating long processing). Such bulk updates should be kept closer to the end of the transaction. Although this can pose another threat if multiple concurrent sessions are running massive updates - the risk that the long processing will be rolled back and wasted if the bulk update cannot acquire the necessary locks and runs into a timeout.


Split bulk updates into smaller chunks


One ModifyAll or DeleteAll statement is not necessarily better than two or three. Remember that the lock escalation threshold applies to each separate SQL statement within a transaction. A single ModifyAll updating 10 000 records will definitely cause a lock escalation. The same 10 000 records modified in 3 separate ModifyAll statements, although still executed in a single transaction, will be updated without escalation. In environments with a few concurrent sessions updating large amounts of data in the same tables, this may be the lesser evil.
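As an illustration, the deletion of a large invoice could be split into chunks like this - a sketch assuming line numbers are assigned in increments of 10 000, as in the test data above:

```al
local procedure DeleteLinesInChunks(DocumentNo: Code[20])
var
    PurchDocLineBuffer: Record "Purch. Doc. Line Buffer";
    FromLineNo: Integer;
    ChunkSpan: Integer;
begin
    ChunkSpan := 3000 * 10000;  // ~3000 rows per statement, below the escalation threshold
    PurchDocLineBuffer.SetRange("Document Type", PurchDocLineBuffer."Document Type"::Invoice);
    PurchDocLineBuffer.SetRange("Document No.", DocumentNo);
    FromLineNo := 0;
    repeat
        // each DeleteAll is a separate SQL statement with its own escalation counter
        PurchDocLineBuffer.SetRange("Line No.", FromLineNo + 1, FromLineNo + ChunkSpan);
        PurchDocLineBuffer.DeleteAll(true);
        FromLineNo += ChunkSpan;
        PurchDocLineBuffer.SetFilter("Line No.", '>%1', FromLineNo);
    until PurchDocLineBuffer.IsEmpty();
end;
```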

