A Business Central table extension object, when compiled and deployed, translates into a new SQL table with the same primary key as its base table. From that moment on, queries sent from the BC server to the extended table include a join statement to collect data from both physical SQL tables into a single logical AL entity. Although an application can avoid the join by limiting the fields included in the query (I wrote about this in my previous post "Who benefits from partial records"), this table join becomes a major concern when it comes to extending the BC data model.
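As a minimal sketch of that idea, a read that loads only base-table fields through partial records lets the server query the base SQL table alone (the procedure and the field selection here are purely illustrative):
procedure ReadBaseFieldsOnly()
var
    DetCustLedgEntry: Record "Detailed Cust. Ledg. Entry";
begin
    // Only base-table fields are requested, so no extension companion table has to be joined in
    DetCustLedgEntry.SetLoadFields("Posting Date", "Amount (LCY)");
    if DetCustLedgEntry.FindSet() then
        repeat
            // process the partial record
        until DetCustLedgEntry.Next() = 0;
end;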
We know that joins can be quite painful for query performance. After all, the tug of war between data normalisation and de-normalisation has existed ever since Edgar Codd published his research paper on the relational model. But how bad is this effect really? I have heard different opinions and approaches stemming from them - from avoiding any extensions, even on setup tables which never store more than a few dozen records, to no restrictions at all, declaring multiple extensions on any table. I have always taken a more intermediate position, keeping extensions away from ledger tables (or, more generally, from any tables that can hold relatively large amounts of data and whose read performance is crucial). But it was always rather a matter of personal preference, because I had never run a thorough test to confirm my choice.
Now that Microsoft has announced upcoming changes in the extension data structure to improve performance, my urge to test table extensions received an additional impulse, so I spent some time running FindSet on a table while varying the number of extensions and the volume of data. Here are the results of the testing.
Test configuration
The base table that I used for the test is "Detailed Cust. Ledg. Entry". I have already used this table in some performance tests, so I just reused the same code from previous tests to generate a sufficient data volume. "Sufficient" here means adding 100 000 records on each iteration until the number of table records reaches 1 000 000.
After one million records, I cleaned the table, added another extension, and started a new series of 10 iterations, adding another set of 100 000 records in each.
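The generator itself is not the focus of this post, but a minimal sketch of such a batch insert could look like the procedure below (the procedure name and the field values are illustrative, not the exact code from the earlier post):
procedure InsertTestEntries(BatchSize: Integer)
var
    DetCustLedgEntry: Record "Detailed Cust. Ledg. Entry";
    EntryNo: Integer;
    i: Integer;
begin
    // Continue numbering after the last existing entry
    if DetCustLedgEntry.FindLast() then
        EntryNo := DetCustLedgEntry."Entry No.";
    for i := 1 to BatchSize do begin
        DetCustLedgEntry.Init();
        EntryNo += 1;
        DetCustLedgEntry."Entry No." := EntryNo;
        DetCustLedgEntry."Posting Date" := WorkDate();
        // Arbitrary random amount to keep the dataset non-uniform
        DetCustLedgEntry."Amount (LCY)" := Random(100000) / 100;
        DetCustLedgEntry.Insert();
    end;
end;
Each test iteration would then add a batch of 100 000 records this way before the next round of measurements.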
The extensions installed on each iteration have the same structure and are basically copy-pasted definitions of the same table extension with different IDs and name suffixes.
tableextension 50200 "Det. Cust. Ledg. Entry Ext1" extends "Detailed Cust. Ledg. Entry"
{
    fields
    {
        field(50200; "My Amount Ext1"; Decimal) { }
        field(50201; "My Code Ext1"; Code[20]) { }
        field(50202; "My Date Ext1"; Date) { }
    }
}
FindSet on the complete recordset
Now a few words about the timings measured in this test. The main objective is to find the impact of extensions on data read performance, so on each iteration I ran a FindSet selecting all records from the table. But considering Business Central's read optimisations, a simple FindSet will not yield a relevant performance measure. BC queries are optimised for client performance, and every SELECT statement is issued with the FAST(50) option, which returns a partial dataset to the client as soon as the first 50 records are retrieved from the table. This allows the client to start processing the first portion of the data while the server is still running the query. It often results in SQL Server generating a suboptimal execution plan for the SELECT, but on the other hand, it lessens the impact of the server-side execution time on overall system performance by allowing parallel data processing.
From the performance testing perspective, this means that measuring the time it takes to run a FindSet without any data processing is useless. Run time for a hollow FindSet will always be a few milliseconds, no matter the table size (except the smallest tables where it can be close to 0).
Therefore, in all of my tests, I run FindSet followed by an empty repeat..until loop iterating over all records, thus measuring the time it takes for the client to receive the complete recordset from the server.
// Measure the full read of the recordset 10 times
AvgRunTime := 0;
for I := 1 to 10 do begin
    // Invalidate the data cache so each run reads from SQL Server
    Database.SelectLatestVersion();
    StartTime := Time();
    DetCustLedgEntry.FindSet();
    repeat
    until DetCustLedgEntry.Next() = 0;
    RunTimes[I] := Time() - StartTime;
end;

// Take the average of the 10 runs as the final value
for I := 1 to 10 do
    AvgRunTime += RunTimes[I];
AvgRunTime := Round(AvgRunTime / 10, 1);
In order to reduce the influence of random fluctuations caused by other processes running at the same time, I repeated the measurement 10 times, taking the average of the 10 runs as the final value for each point.
The final dataset collected this way includes 110 points:
Number of extensions ranging from 0 to 10
Number of records in the table from 100 000 to 1 000 000
After collecting all the timings for all executions, I built the surface chart demonstrating the dependency of the FindSet performance on the number of extensions for different table sizes.
These numbers certainly should not be considered an absolute benchmark, since the specific timings depend on multiple factors (hardware specification being the most obvious one). But I hope that the chart reflects the impact that table extensions have on read performance in a consistent environment.
The blue ribbon in the lower part of the chart shows the relation between the number of extensions and the execution time for a relatively low number of records. It covers the 100 000-record case and, partially, the 200 000-record case for a table with 0 to 5 extensions. While the read time noticeably grows as the number of installed extensions increases, the growth is much more apparent on a larger table - the chart shows a much steeper rise on its right-hand side.
The same is true for the relation between the time and the number of records at a fixed number of table extensions - the performance toll of extending a table grows faster for a larger number of extensions. And this correlation is more obvious on large datasets.
The next chart shows three slices of the surface along the data size axis - the dependency of the run time on the number of records for 0, 5, and 10 extensions.
And here is another slice on a different axis - the run time for different numbers of extensions with a fixed table size of one million records.
From this perspective, some run time fluctuations become noticeable. Unexpected performance degradation at the nodes with 6 and 8 extensions spoils the picture to some extent, but the overall linear growth still holds: all the other points hit the same trendline almost precisely. In all of the tests I've done, adding one extension increases the execution time of the FindSet on the complete dataset by approximately 10%.
FindSet on a filtered recordset
While selecting all rows from a table is an interesting exercise that can give some consistent numbers for analysis, this kind of read is not the most common in practice. Usually, a FindSet retrieves some filtered subset of the table records. So in the second part of my performance testing session, I tried to reflect this approach and limited the scope of the FindSet to roughly 10% of the whole dataset.
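The exact filter is not essential for the test; as a hypothetical illustration, a range filter on the amount field like the one below would isolate roughly a tenth of the randomly generated rows:
procedure ReadFilteredSubset()
var
    DetCustLedgEntry: Record "Detailed Cust. Ledg. Entry";
begin
    // With randomly generated amounts, this range matches roughly 10% of the records;
    // the exact count varies between regenerated datasets
    DetCustLedgEntry.SetFilter("Amount (LCY)", '<=%1', 100);
    if DetCustLedgEntry.FindSet() then
        repeat
        until DetCustLedgEntry.Next() = 0;
end;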
Unfortunately, I can't call this second run successful because, as you can see in the second surface chart, the numbers appear to be very inconsistent, especially for high volumes of data.
I believe there are two possible explanations for this effect, and likely each played its role. First, selecting a smaller recordset takes much less time, and such a short test cycle is much more susceptible to fluctuations caused by random system processes. Since all tests were executed on a virtual server, these processes could be running on another VM sharing resources with mine.
The slice at a fixed number of extensions illustrates this kind of inconsistency: the time nearly doubles as the recordset grows from 700 000 to 800 000 records, and then drops again.
Another reason for this odd behaviour could be reflected in this chart, which shows the run time for different numbers of extensions on a 1 000 000-record dataset.
Unlike the previous chart, where a new portion of data is added to the existing dataset, each point in this picture is a completely new dataset, regenerated from scratch. Due to the random nature of the test data, the number of records satisfying the filtering criteria may differ between runs. It is never dramatically different and cannot explain the wild veering of the chart, but it could add a few percentage points to the deviations in the timing.
To conclude, table extensions do not have an immediate dramatic impact on application performance, but rather tend to increase the query run time linearly with each new extension added. In my tests, each extension added around 10% to the execution time of a FindSet plus iterating over the obtained recordset. This number, 10%, is approximately the ratio of the record size of my test extension to that of the base table being extended (Detailed Cust. Ledg. Entry). I doubt that this ratio can be used as a rule of thumb, but it is an interesting observation nonetheless.
The most important conclusion: as I wrote at the beginning of the post, in the latest release plan for Business Central 23, Microsoft announced changes in the extension data structure that should improve extension performance. While the nature of the changes is not yet publicly known, the numbers obtained above will not be applicable in BC 23. But they can be a good benchmark for a new test when the next release is out - if such tests are still relevant at all. We'll see in a few months.