db48x/doc/6-Performance.md
Christophe de Dinechin c736e6a388 doc: Record performance data about unit conversion
Signed-off-by: Christophe de Dinechin <christophe@dinechin.org>
2024-07-28 14:14:04 +02:00

Performance measurements

This section tracks performance measurements across releases.

NQueens (DM42)

Performance recorded for various releases on DM42, built with the small option (the only one that fits all releases). Every row uses the same NQueens benchmark. All times are in milliseconds, best of 5 runs, on USB power, with presumably no GC.

| Version | Time (ms) | PGM Size | QSPI Size | Note                    |
|---------|-----------|----------|-----------|-------------------------|
| 0.6.0   | 1183      | 409252   | 187516    | New table-free decimal  |
| 0.5.2   | 1310      | 711228   | 1548076   |                         |
| 0.5.1   |           |          |           |                         |
| 0.4.10+ | 1205      | 651108   |           | RPL stack runloop       |
| 0.4.10  | 1070      | 650116   |           | Focused optimizations   |
| 0.4.9+  | 1175      |          |           | Range-based type checks |
| 0.4.9+  | 1215      |          |           | Remove busy animation   |
| 0.4.9   | 1447      | 646028   | 1531868   | No LastArgs in progs    |
| 0.4.8   | 1401      | 633932   | 1531868   |                         |
| 0.4.7   | 1397      | 628188   | 1531868   |                         |
| 0.4.6   | 1380      | 629564   | 1531868   |                         |
| 0.4.5   | 1383      | 624572   | 1531868   |                         |
| 0.4.4   | 1377      | 624656   | 1531868   | Implements Undo/LastArg |
| 0.4.3S  | 1278      | 617300   | 1523164   | 0.4.3 build "small"     |
| 0.4.3   | 1049      | 717964   | 1524812   | Switch to -Os           |
| 0.4.2   | 1022      | 708756   | 1524284   |                         |
| 0.4.1   | 1024      | 687444   | 1522788   |                         |
| 0.4     | 998       | 656516   | 1521748   | Feature tests 7541edf   |
| 0.3.1   | 746       | 618884   | 1517620   | Faster busy 3f3ab4b     |
| 0.3     | 640       | 610820   | 1516900   | Busy anim 4ab3c97       |
| 0.2.4   | 522       | 597372   | 1514292   |                         |
| 0.2.3   | 526       | 594724   | 1514276   | Switching to -O2        |
| 0.2.2   | 723       | 540292   | 1512980   |                         |
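
The "best of 5 runs" methodology used for these numbers can be sketched on a desktop machine. This is only a Python illustration, not the RPL benchmark that runs on the calculator: the names `first_queens_solution` and `best_of` are hypothetical, the board size of 8 is an assumption, and the timings it prints say nothing about DM42 performance.

```python
import time


def first_queens_solution(n=8):
    """Return the first queen placement found by row-by-row backtracking."""
    cols = []  # cols[r] is the column of the queen placed on row r

    def safe(row, col):
        # A square is safe if no earlier queen shares its column or a diagonal.
        for r, c in enumerate(cols):
            if c == col or abs(c - col) == row - r:
                return False
        return True

    def place(row):
        if row == n:
            return True
        for col in range(n):
            if safe(row, col):
                cols.append(col)
                if place(row + 1):
                    return True
                cols.pop()  # backtrack
        return False

    return cols if place(0) else None


def best_of(runs, func):
    """Call func `runs` times and return the shortest wall-clock time in ms."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        func()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best


solution = first_queens_solution(8)
best_ms = best_of(5, first_queens_solution)
print(solution, f"best of 5: {best_ms:.3f} ms")
```

Taking the minimum rather than the mean discards runs that were slowed down by unrelated activity, which is presumably why the tables report the best run.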

NQueens (DM32)

Performance recorded for various releases on DM32, built with the fast option. Every row uses the same NQueens benchmark. All times are in milliseconds, best of 5 runs. There is no GC column, because a GC is harder to trigger given how much more memory the calculator has. Also, experimentally, the numbers for the USB and battery measurements are almost identical at the moment. As I understand it, there are plans for a USB overclock like on the DM42, but it is not there yet.

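| Version | Time (ms) | PGM Size | QSPI Size | Note                    |
|---------|-----------|----------|-----------|-------------------------|
| 0.6.0   | 1751      | 467260   | 187948    | New table-free decimal  |
| 0.5.2   | 1752      | 856228   | 1550436   |                         |
| 0.5.1   | 1746      |          |           |                         |
| 0.5.0   | 1723      |          |           |                         |
| 0.4.10+ | 1804      | 761252   |           | RPL stack runloop       |
| 0.4.10  | 1803      | 731052   |           | Focused optimizations   |
| 0.4.9   | 2156      | 772732   | 1534316   | No LastArg in progs     |
| 0.4.8   | 2201      | 749892   | 1534316   |                         |
| 0.4.7   | 2209      | 742868   | 1534316   |                         |
| 0.4.6   | 2204      | 743492   | 1534316   |                         |
| 0.4.5   | 2171      | 730092   | 1534316   |                         |
| 0.4.4   | 2170      | 730076   | 1534316   | Implements Undo/LastArg |
| 0.4.3   | 2081      | 718020   | 1527092   |                         |
| 0.4.2   | 2242      | 708756   | 1524284   |                         |
| 0.4.1   | 2152      | 687500   | 1522788   |                         |
| 0.4     |           |          |           | Feature tests 7541edf   |
| 0.3.1   |           |          |           |                         |
| 0.3     |           |          |           |                         |
| 0.2.4   |           |          |           |                         |
| 0.2.3   |           |          |           |                         |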

Collatz conjecture check

This test checks the tail recursion optimization in the RPL interpreter. The code can be found in the CBench program in the Demo.48S state. The HP48 cannot run the benchmark because it does not have integer arithmetic.

Timings on 0.4.10 are:

  • HP50G: 397.438s
  • DM32: 28.507s (14x faster)
  • DM42: 15.769s (25x faster)

| Version | DM32 ms | DM42 ms |
|---------|---------|---------|
| 0.6.0   | 26256   | 15355   |
| 0.5.2   | 26733   | 15695   |
| 0.4.10  | 28507   | 15769   |
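
The "14x" and "25x" figures quoted for 0.4.10 are the HP50G time divided by each calculator's time. A short Python check of that arithmetic, using only the timings listed above (the variable and function names are illustrative):

```python
# Timings on 0.4.10, in seconds, as listed above.
hp50g_s = 397.438
timings_s = {"DM32": 28.507, "DM42": 15.769}


def speedup(reference_s, measured_s):
    """How many times faster the measured run is than the reference run."""
    return reference_s / measured_s


speedups = {name: speedup(hp50g_s, t) for name, t in timings_s.items()}
for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x faster than HP50G")
```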

SumTest (decimal performance)

  • VP = Variable Precision
  • ID = Intel Decimal Library
  • HW = Hardware-accelerated (float or double types)

Variable Precision vs. Intel Decimal

For 100000 loops, the variable-precision implementation at 24 digits is roughly 11 to 12 times slower than the fixed-precision implementation at 34 digits (128 bits).

| Version      | DM32 ms | DM42 ms |
|--------------|---------|---------|
| 0.6.0 (VP24) | 2377390 | 1768510 |
| 0.5.2 (ID)   | 215421  | 143412  |

For 1000 loops, comparing variable-precision decimal at several precisions with the earlier Intel decimal:

| Version      | DM32 ms | DM42 ms |
|--------------|---------|---------|
| 0.6.4 (VP24) | 32346   | 23011   |
| 0.6.4 (VP12) | 13720   | 10548   |
| 0.6.4 (VP6)  | 6905    | 5623    |
| 0.5.2 (ID)   | 2154    | 1434    |
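
To read the 1000-loop numbers as slowdown factors relative to the Intel decimal library, divide each variable-precision time by the 0.5.2 (ID) time for the same calculator. The Python sketch below does that with the values from the table above; the dictionary layout and names are only illustrative.

```python
# Times in ms for 1000 loops, copied from the table above.
id_ms = {"DM32": 2154, "DM42": 1434}  # 0.5.2, Intel decimal library
vp_ms = {                             # 0.6.4, variable precision
    "VP24": {"DM32": 32346, "DM42": 23011},
    "VP12": {"DM32": 13720, "DM42": 10548},
    "VP6": {"DM32": 6905, "DM42": 5623},
}

# Slowdown factor of each variable-precision setting relative to ID.
slowdown = {
    config: {device: ms / id_ms[device] for device, ms in per_device.items()}
    for config, per_device in vp_ms.items()
}

for config, per_device in slowdown.items():
    cells = ", ".join(f"{dev} {ratio:.1f}x" for dev, ratio in per_device.items())
    print(f"{config}: {cells}")
```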

1000 loops in various implementations

Time in milliseconds for 1000 loops:

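| DM32 Version | HW7  | HW16 | VP6  | VP12  | VP24  | VP36  |
|--------------|------|------|------|-------|-------|-------|
| 0.6.4        | 1414 | 1719 | 6905 | 13720 | 32346 | 60259 |
| 0.6.2        |      |      | 7436 | 16017 | 34898 | 62012 |
| 0.6.0 (Note) |      |      |      |       | 23773 |       |

DM32, 0.5.2 with Intel decimal (ID): 2154

| DM42 Version | HW7  | HW16 | VP6  | VP12  | VP24  | VP36  |
|--------------|------|------|------|-------|-------|-------|
| 0.6.4        | 422  | 705  | 5623 | 10548 | 23811 | 42363 |
| 0.6.2        |      |      | 5842 | 10782 | 23714 | 42269 |
| 0.6.0 (Note) |      |      |      |       | 17685 |       |

DM42, 0.5.2 with Intel decimal (ID): 1434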

Note: Results for 0.6.0 with variable precision are artificially good because intermediate computations were not made with increased precision.

1M loops and iPhone results

1 million loops (tests performed with 0.7.1 while on battery):

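| Configuration | Time (ms) | Result                   |
|---------------|-----------|--------------------------|
| DM32 HW7      | 1748791   | 1'384'348.25             |
| DM32 HW16     | 2188113   | 1'395'612.15872'53834'6  |
| DM42 HW7      | 605102    | 1'384'348.25             |
| DM42 HW16     | 806730    | 1'395'612.15872'53834'6  |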

Drawing sin X with FunctionPlot

Configuration DM32 ms DM42 ms
HW7 1869-2000 1681-1744
HW16 1928-2067 1679-2060
ID 2332-5140
VP24 3683-6005 3377-3511
VP36 6567-10186 4434-4709
VP48 8377-10259 5964-6123

Crash at precision 3

Unit conversion benchmark

Unit conversion involves reading an external file, so it is a bit slow. The benchmark program is:

«
  «
    0 25 for i
      i 1_m/s * 1_km/yr convert drop
    next
  »
  TEVAL
»

For 5 runs on USB power:

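| Configuration          | DM32 ms   | DM42 ms   | HP48S     | HP50G     |
|------------------------|-----------|-----------|-----------|-----------|
| Units in file          | 1539-1616 | 9069-9238 |           |           |
| Units in memory        | 999-1047  | 3325-3503 | 6988-7816 | 2009-2012 |
| No autosimplify        | 688-723   | 2383-2585 |           |           |
| Commit no autosimplify | 691-722   | 2362-2568 |           |           |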
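
For readers unfamiliar with RPL, the arithmetic performed by the benchmark loop can be sketched in Python. This is an illustration of the conversion only, not of how the calculator does it: the calculator looks up unit definitions (from a file or from memory), whereas this sketch hard-codes one factor. The length of a year used here (365.25 days) is an assumption and may differ from the calculator's definition of `yr`, and the function name is hypothetical.

```python
# Assumption: one year is 365.25 days. The calculator's `yr` unit may be
# defined differently, so the exact values are illustrative only.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
METERS_PER_KM = 1000.0


def m_per_s_to_km_per_yr(speed_m_per_s):
    """Convert a speed from m/s to km/yr."""
    return speed_m_per_s * SECONDS_PER_YEAR / METERS_PER_KM


# The RPL loop `0 25 for i ... next` includes both bounds: 26 conversions.
results = [m_per_s_to_km_per_yr(i) for i in range(0, 26)]
print(len(results), results[1])
```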