
To sort, or not to sort, that is the question

  • adrogin
  • May 30

Back in the old Navision days, when clients' data was stored in the Navision native database instead of Azure SQL Server, and functions like FindSet and FindFirst did not exist in the AL language, every data read from a DB table required some thoughtful consideration. The Navision server had an option to connect to SQL Server, but there was always a good chance that the code we were writing would run on the native DB (and there are users out there who run it even to this day). The issue was that the native database did not have the smart query optimizer that helps SQL Server choose the best and most performant way to execute a query in every possible situation.

When developing for the Navision DB, developers had to take care of selecting the proper index which would be used to search the data. The index selected by the developer in the app code was pretty much all the query optimization available. So there was a function named SetCurrentKey, invoked before the Find statement. SetCurrentKey did exactly what its name implies - it selected the index for the query. And the rows in the resulting recordset were returned sorted by the selected key. But this was for the Navision DB. SQL Server, on the other hand, has been able to select the best index automatically probably since its Sybase years and does not need instructions on index choice. Yet the ordering forced by SetCurrentKey applied equally to the Navision native DB and the SQL Server database.

Now the Navision database is long gone (but never forgotten), and the SetCurrentKey function name remains as an artifact from the past: a misnomer often found in AL code for no reason other than historical legacy, causing confusion for newcomers - similar to the LockTable function, which also does something quite different from what its name suggests. But misnomer or not, SetCurrentKey is used in AL nowadays to add an ordering clause to the queries that Business Central sends to SQL Server, and the question of the performance impact of recordset ordering keeps reappearing time and again, sparking numerous online and offline discussions with strong arguments on both sides: for, as well as against, using SetCurrentKey in data reads. So does this function still make sense? Can it make our FindSet faster, or will it only make things worse?


Prologue: Why does this question even exist?

Before attempting to answer the question of whether SetCurrentKey can have a positive impact on query performance, let's see why we even ask this question in the first place. How can sorting improve performance? SQL Server can select the best index (and do many other smart query optimizations), so adding a sort operator to the query will only make it slower, won't it?

Well... not quite. Business Central always adds an ORDER BY clause to every query it sends to SQL Server, and if we do not request sorting explicitly, this simply means that the resulting recordset will be ordered by the primary key fields. So, if I select, let's say, item ledger entries and sort the result by the Item No., like this:

ItemLedgerEntry.SetCurrentKey("Item No.");
ItemLedgerEntry.FindSet();

SQL Server will receive a query similar to this:

SELECT <Some columns here> 
    FROM "default".dbo."CRONUS Ltd_$Item Ledger Entry$437dbf0e-84ff-417a-965d-ed2bb9650972" WITH(READUNCOMMITTED)
    ORDER BY [Item No_] ASC OPTION(FAST 50)

But if I drop the SetCurrentKey from the AL code, the ORDER BY clause will not disappear from the SQL query. It will be replaced by the following:

SELECT <Some columns here> 
    FROM "default".dbo."CRONUS Ltd_$Item Ledger Entry$437dbf0e-84ff-417a-965d-ed2bb9650972" WITH(READUNCOMMITTED)
    ORDER BY [Entry No_] ASC OPTION(FAST 50)

Ordering is always in the query, and removing the ORDER BY clause is beyond what AL can do. Hence the question: how does this ordering impact performance? Is it better to sort the result on the fields of an index, which presumably should be selected by the query optimizer, or should we leave it alone and let the result come back sorted on the PK?


Dataset

Below, I will show a few examples illustrating this problem. In all examples I will be selecting records from the Item Ledger Entry table with different variations of filters and ordering. In my Cronus database, the Item Ledger Entry table has around 1,700,000 records. These entries have been posted with randomly selected items and customers. There are 10,000 items and 10,000 customers in the DB, so on average, 170 entries are posted for each item and each customer.
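If you want to sanity-check a similar distribution in your own sandbox, a quick query against the table (hypothetical, assuming direct SQL access to a test database) could look like this:

SELECT "Item No_", COUNT(*) AS EntriesPerItem
    FROM "default".dbo."CRONUS Ltd_$Item Ledger Entry$437dbf0e-84ff-417a-965d-ed2bb9650972"
    GROUP BY "Item No_"
    ORDER BY EntriesPerItem DESC;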


Chapter 1, where sorting is always slow

One example of a query whose performance can never benefit from sorting is a query that does not have an index to seek on and therefore triggers a clustered index scan. For example, this one:

ItemLedgerEntry.SetLoadFields("Remaining Quantity");
ItemLedgerEntry.SetRange("Posting Date", 20250201D, 20250228D);
ItemLedgerEntry.FindSet();

Item Ledger Entry does not have an index on the Posting Date column that could enable seeks without another predicate on the Item No., so this code resolves to a clustered index scan which takes approximately 0.4 seconds.
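For reference, the generated SQL should look similar to this (a sketch following the pattern of the earlier examples; @0 and @1 are parameters carrying the date range):

SELECT <Some columns here> 
    FROM "default".dbo."CRONUS Ltd_$Item Ledger Entry$437dbf0e-84ff-417a-965d-ed2bb9650972" WITH(READUNCOMMITTED)
    WHERE ("Posting Date">=@0 AND "Posting Date"<=@1)
    ORDER BY [Entry No_] ASC OPTION(FAST 50)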


Now let's change the previous code example, adding SetCurrentKey before FindSet:

ItemLedgerEntry.SetLoadFields("Remaining Quantity");
ItemLedgerEntry.SetRange("Posting Date", 20250201D, 20250228D);
ItemLedgerEntry.SetCurrentKey("Posting Date");
ItemLedgerEntry.FindSet();
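
The ORDER BY clause in the generated query switches to the Posting Date field; a sketch, again following the earlier examples:

SELECT <Some columns here> 
    FROM "default".dbo."CRONUS Ltd_$Item Ledger Entry$437dbf0e-84ff-417a-965d-ed2bb9650972" WITH(READUNCOMMITTED)
    WHERE ("Posting Date">=@0 AND "Posting Date"<=@1)
    ORDER BY [Posting Date] ASC OPTION(FAST 50)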

SQL Server now has no other option but to run a sort operator after the index scan:


And the total query execution time goes up to 1 second. But I have also added a blocking operator to the execution plan: the database server cannot send any data to BC until the whole query is completed, so my BC session remains idle all this time, waiting for the query to finish. In the first case, by contrast, BC could receive the first block of data and start processing records as soon as the first 50 rows had been retrieved (remember that FAST 50 option in the SQL queries above?). So this SetCurrentKey is a double blow to the performance of the code: it results in a slower execution plan, and it blocks parallelism between SQL Server and Business Central.

So far so good: no index, no ordering. But this is a simple example and a pretty obvious conclusion. What about a case when an index exists?


Chapter 2, in which item ledger entries are selected

Now I will select some item ledger entries filtered on a field that has a good index on it - Item No., for instance. Here is the sample code that I will be running for different ranges of items in the filter.

ItemLedgerEntry.SetLoadFields("Remaining Quantity");
ItemLedgerEntry.SetRange("Item No.", '6028', '6029');
ItemLedgerEntry.FindSet();
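
The generated query follows the same pattern, now with a range predicate on the item number (a sketch; @0 and @1 carry the item filter):

SELECT <Some columns here> 
    FROM "default".dbo."CRONUS Ltd_$Item Ledger Entry$437dbf0e-84ff-417a-965d-ed2bb9650972" WITH(READUNCOMMITTED)
    WHERE ("Item No_">=@0 AND "Item No_"<=@1)
    ORDER BY [Entry No_] ASC OPTION(FAST 50)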

And this query produces the following execution plan.

Here, a nonclustered index can be used to find entries filtered by the item code, but since the query includes ordering by the primary key, the result must be sorted. The recordset is small, and sorting is fast (but as we remember, it is also a blocking operator that does not allow the optimizer to satisfy the FAST 50 option and send a partial result before the query execution completes).


Now, let's add SetCurrentKey on the Item No. to the code and see how it changes the execution plan.

ItemLedgerEntry.SetLoadFields("Remaining Quantity");
ItemLedgerEntry.SetRange("Item No.", '6028', '6029');
ItemLedgerEntry.SetCurrentKey("Item No.");
ItemLedgerEntry.FindSet();

The plan for this code looks like this:


Ok, this looks better. This is practically the same execution plan, but without the blocking Sort operator, because the index seek retrieves data in the order required by the query and no additional sorting is needed.

And if I run it a few times, adding one more item to the range in each iteration (which means roughly 170 more item ledger entries returned by the query each time), I can chart how the execution time depends on the number of items.

Curiously, selecting one item and 173 ledger entries is a tad slower than selecting two items and 343 records. This is not a glitch, but a consistent result. The difference is due to different compilation times for the two statements. In my tests, I reset both the Business Central and SQL Server data caches, as well as the procedure cache, before each run, so the results in the chart actually include statement compilation time in addition to query execution time. And the compilation of a statement with a strict equality predicate (one item) takes 10-15 ms longer than the respective statement with a range condition. But this does not change the fact that the code with SetCurrentKey is a little bit faster. Just around 10 ms better, but add to this the fact that it does not have the blocking sort operator and lets the client start processing data earlier, and we have a definite favorite. Right? Right?!
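(A side note on methodology: the SQL Server part of the cache reset mentioned above can be done with the commands below - a sketch for a test environment, never to be run on production.)

CHECKPOINT;              -- flush dirty pages to disk first
DBCC DROPCLEANBUFFERS;   -- clear the data (buffer) cache
DBCC FREEPROCCACHE;      -- clear the plan (procedure) cache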

Let's be patient and see what happens as we add more records to the result. Now I will throw more data at the test: 10 items in each iteration.

And as soon as I switch from 10 to 20 items, I see a strange twitch in the chart data and a different time trend after it. This means that at some point between 1700 and 3400 returned rows, the query optimizer decided that the previously selected plan was inefficient for the estimated number of rows and that it was time to change the plan. From this moment on, it runs a clustered index scan whose execution cost is constant, and the slow increase in the total time for this plan is only due to the increasing amount of data sent back to the client.
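One way to catch the moment the plan flips is to peek into the plan cache between runs. A sketch, assuming SSMS access to the test database (the LIKE filter is only an illustration):

SELECT st.text, qs.execution_count, qp.query_plan
    FROM sys.dm_exec_query_stats qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
    CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
    WHERE st.text LIKE '%Item Ledger Entry%';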

The version of the code with SetCurrentKey still sticks to the same execution plan, in which the combination of key lookups and nested loops grows ever more expensive. As the number of rows in the result grows, at some point the execution plan for this query also switches to the clustered index scan. But now this plan must include a sort operator, because the clustered index typically matches the primary key (and this is definitely true for the item ledger entry), while SetCurrentKey forces ordering on the Item No. field.

As I continue the test, now adding 100 items per iteration, the SetCurrentKey version falls further behind with each iteration.

Chapter 3, where SetCurrentKey enables warp speed


So far, in my code samples, I have used the FindSet function to select item ledger entries. And although I apply SetLoadFields to limit the number of selected columns, FindSet does not precisely respect this instruction, taking it as a recommendation to be considered rather than an imperative to be executed. So when I run the FindSet function, the actual SQL query includes a bunch of fields besides those that I pass in the SetLoadFields arguments. System fields like timestamp, $systemId, $systemCreatedBy, etc., are always included in FindSet results. Primary key fields are there too, as well as any fields that the BC runtime finds necessary to load. And all these added fields do not allow me to use the full strength of covering indexes. This is why we see all the key lookups and clustered index scans in the examples above, where a nonclustered index should suffice.
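To illustrate, with SetLoadFields("Remaining Quantity") the column list of the Chapter 2 query expands to something like this (a sketch - the exact set of extra columns depends on the BC runtime version):

SELECT "timestamp","Entry No_","Remaining Quantity","$systemId",
       "$systemCreatedAt","$systemCreatedBy","$systemModifiedAt","$systemModifiedBy"
    FROM "default".dbo."CRONUS Ltd_$Item Ledger Entry$437dbf0e-84ff-417a-965d-ed2bb9650972" WITH(READUNCOMMITTED)
    WHERE ("Item No_">=@0 AND "Item No_"<=@1)
    ORDER BY [Entry No_] ASC OPTION(FAST 50)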

A query object, on the other hand, is a different matter. An AL query respects the columns defined in its declaration: besides the fields explicitly defined in the object declaration, it only adds the primary key fields to the selection list. And since all non-primary keys in BC always include all fields of the primary key, we still have a chance to find a good covering index for our query. Like this Key17, which perfectly suits my needs.

key(Key17; "Item No.", Open, "Variant Code", Positive, "Lot No.", "Serial No.", "Package No.")
{
    IncludedFields = "Remaining Quantity";
}

So if I create a query object with the one column I want to select, "Remaining Quantity", and a filter on the "Item No.", it should be able to make good use of an existing index on the Item Ledger Entry table. Except for one troublesome nuance. The SQL query generated from an AL query object always orders the result, just like FindSet does. If no explicit ordering is defined, the result will be ordered on the primary key, so a sort operator will be included in the execution plan, and once again, we will end up with another clustered index scan instead of the expected seek on a nonclustered index. But if I enable sorting on the fields of the index which I want to use, this eliminates the need for additional sorting, and the query can be covered by the nonclustered index. So, to make this work, I need to set the OrderBy property in the query, specifying all the fields of the index which I intend to use, and all of these fields must also be included in the query data item.

query 51100 "Item Ledger Entries SCKT"
{
    QueryType = Normal;
    OrderBy = ascending(
        ItemNo, Open, VariantCode, Positive, LotNo, SerialNo, PackageNo);

    elements
    {
        dataitem(Item_Ledger_Entry; "Item Ledger Entry")
        {
            column(RemainingQuantity; "Remaining Quantity") { }
            column(ItemNo; "Item No.") { }
            column(Open; Open) { }
            column(VariantCode; "Variant Code") { }
            column(Positive; Positive) { }
            column(LotNo; "Lot No.") { }
            column(SerialNo; "Serial No.") { }
            column(PackageNo; "Package No.") { }
        }
    }
}

OK, we just installed a warp drive for AL. But like any complex machinery, it requires careful maintenance. I cannot simply include the Item No. field in the dataset and set the OrderBy property to ItemNo, because in this case BC would select the first index in the table that starts with this field. And guess what? Of course, this first index does not have Remaining Quantity in its included fields and does not work as a covering index for my query. So I have to include all the fields of this rather long index in the query definition and set ordering on all of them. This makes the object fragile, because any change in the index throws our FTL drive back to subluminal speed.

But for now at least, it is fast: nothing but a nonclustered index seek, and 16,000 records are selected in 0.015 seconds.
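For reference, the SQL generated from this query object should look roughly like this (a sketch; exact column aliases and parameterization may differ):

SELECT "Remaining Quantity","Item No_","Open","Variant Code","Positive",
       "Lot No_","Serial No_","Package No_","Entry No_"
    FROM "default".dbo."CRONUS Ltd_$Item Ledger Entry$437dbf0e-84ff-417a-965d-ed2bb9650972" WITH(READUNCOMMITTED)
    WHERE ("Item No_">=@0 AND "Item No_"<=@1)
    ORDER BY "Item No_" ASC,"Open" ASC,"Variant Code" ASC,"Positive" ASC,
        "Lot No_" ASC,"Serial No_" ASC,"Package No_" ASC OPTION(FAST 50)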


Epilogue


Now I hope you understand why the only correct answer to the question of whether we should use SetCurrentKey when selecting table records in AL is "it depends". There are cases when it can be the best solution to a performance problem and can elevate code performance to another level. There are other situations where it can cripple the code. And in most cases, it depends on the size of the table, the estimated number of records returned by the query, data distribution, the phase of the moon today, and a million other things. I showed a few examples with a little over 1.5 million records, but with ten million records in the table, the tipping points for plan selection would be different. And this is only a simple case without extensions (no joins), and I haven't even mentioned the impact of caching and parameter sniffing.

To me, the default option is not to use SetCurrentKey, because by and large, it is more likely to make things worse. And then enable ordering where it is necessary, after a good test.
