This release focuses on heavy testing of the new variable-precision decimal stack. Adding tests also means finding bugs.

Another significant change is support for fixed-precision hardware-accelerated floating point, using the 32-bit and 64-bit IEEE 754 binary floating-point representations. Since this is a binary format, some decimal values cannot be represented exactly. For example, displaying 1.2 with a large number of decimals will show residue, because 1.2 does not have an exact (zero-terminated) representation in binary.

New features:

* plotting: Make refresh rate configurable
* menu: Add `/` key to `FractionsMenu`
* hwfp: Add support for hardware-accelerated floating-point
* menus: Add hardware floating-point flag to `MathModesMenu`
* ui: Allow multiple uses of `.` to insert DMS separators
* HMS: Editing of HMS values in HMS format

Bug fixes:

* stats: Fix crash on `variance` with single-column statistics
* algebraic: Clear error before evaluating the function
* functions: Correctly emit a type error for non-algebraics
* ui: Make sure we save stack if closing the editor
* logical: Fix mask for rotate left with 64-bit size
* logical: Make sure we save args for single-argument logicals
* flags: Update flags on `FlipFlag`, consume them from `BinaryToFlags`
* stack: Show multi-line objects correctly
* lists: Return `Bad argument value` for index with bad arguments
* lists: Return an empty list for tail of empty list
* arithmetic: `→Frac` should not error on integers
* power: Do not shut down during `WAIT` if on USB power

Improvements:

* menu: Shorten the labels `→QIter` and `→QPrec` to avoid scrolling
* stack: Avoid running same code twice on simulator
* ids: Add aliases for hardware floating point
* functions: Optimize abs and neg
* ui: Replace calls to `rt.insert` with calls to `insert`
* menu: Reorganize fractions menu
* dms: Do the DMS conversion using fractions
* list: Adjust multi-line rendering
* copyright: Update copyright to 2024
* text: Return null text when indexing past end of text

Testing:

* tests: Increase the delay for help to draw
* tests: Add tests for hardware-accelerated floating-point
* tests: Add shifts and rotate tests
* tests: Check flag functions
* tests: Test DMS and HMS operations
* tests: Add test for `integrate` using decimal values
* tests: Test multi-line stack display
* tests: Add tests for `GETI`
* tests: Min and max commands
* tests: Repair last regression test
* tests: Check behaviour of 0^0
* tests: Avoid string overflow in case of very long message

Signed-off-by: Christophe de Dinechin <christophe@dinechin.org>
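The binary residue described in the release notes for 1.2 can be reproduced on any machine with IEEE 754 hardware. The following is a minimal Python sketch, not calculator code: the helper name `as_float32` is an assumption made for illustration, and the 32-bit value is obtained by round-tripping through `struct`.

```python
import struct
from decimal import Decimal


def as_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit IEEE 754 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


x64 = 1.2              # stored as a 64-bit binary float
x32 = as_float32(1.2)  # same input, rounded to 32 bits

# Decimal(float) converts the stored binary value exactly, with no rounding,
# so any digits after "1.2" are the residue of the binary representation.
exact64 = Decimal(x64)
exact32 = Decimal(x32)

print(f"64-bit: {exact64}")
print(f"32-bit: {exact32}")
print("exactly 1.2?", exact64 == Decimal("1.2"), exact32 == Decimal("1.2"))
```

Values such as 0.5 or 0.25, which are sums of powers of two, survive the same round trip unchanged, which is why only some decimal inputs show residue.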
Performance measurements
This section tracks some performance measurements across releases.
NQueens (DM42)
Performance recording for various releases on DM42 with the `small` option (which is the only one that fits all releases). This is for the same NQueens benchmark, all times in milliseconds, best of 5 runs, on USB power, with presumably no GC.
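The "best of 5 runs, in milliseconds" methodology can be sketched on a desktop as follows. This is only an illustration under stated assumptions: `count_queens` is a hypothetical Python stand-in workload, not the RPL NQueens program that was timed on the calculator, and `best_of` is a helper name introduced here.

```python
import time


def count_queens(n: int) -> int:
    """Count all N-Queens solutions using bitmask backtracking."""
    full = (1 << n) - 1

    def place(cols: int, diag1: int, diag2: int) -> int:
        if cols == full:          # every column holds a queen
            return 1
        total = 0
        free = full & ~(cols | diag1 | diag2)
        while free:
            bit = free & -free    # lowest free square in this row
            free -= bit
            total += place(cols | bit,
                           ((diag1 | bit) << 1) & full,
                           (diag2 | bit) >> 1)
        return total

    return place(0, 0, 0)


def best_of(runs: int, func, *args) -> float:
    """Return the shortest wall-clock time, in milliseconds, over `runs` calls."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        func(*args)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best


print(f"solutions for 8 queens: {count_queens(8)}")
print(f"best of 5: {best_of(5, count_queens, 8):.1f} ms")
```

Taking the minimum rather than the mean discards runs that were slowed down by unrelated activity, which matches the intent of excluding runs where a garbage collection happened.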
Version | Time | PGM Size | QSPI Size | Note |
---|---|---|---|---|
0.6.0 | 1183 | 409252 | 187516 | New table-free decimal |
0.5.2 | 1310 | 711228 | 1548076 | |
0.5.1 | | | | |
0.4.10+ | 1205 | 651108 | | RPL stack runloop |
0.4.10 | 1070 | 650116 | | Focused optimizations |
0.4.9+ | 1175 | | | Range-based type checks |
0.4.9+ | 1215 | | | Remove busy animation |
0.4.9 | 1447 | 646028 | 1531868 | No LastArgs in progs |
0.4.8 | 1401 | 633932 | 1531868 | |
0.4.7 | 1397 | 628188 | 1531868 | |
0.4.6 | 1380 | 629564 | 1531868 | |
0.4.5 | 1383 | 624572 | 1531868 | |
0.4.4 | 1377 | 624656 | 1531868 | Implements Undo/LastArg |
0.4.3S | 1278 | 617300 | 1523164 | 0.4.3 build "small" |
0.4.3 | 1049 | 717964 | 1524812 | Switch to -Os |
0.4.2 | 1022 | 708756 | 1524284 | |
0.4.1 | 1024 | 687444 | 1522788 | |
0.4 | 998 | 656516 | 1521748 | Feature tests 7541edf |
0.3.1 | 746 | 618884 | 1517620 | Faster busy 3f3ab4b |
0.3 | 640 | 610820 | 1516900 | Busy anim 4ab3c97 |
0.2.4 | 522 | 597372 | 1514292 | |
0.2.3 | 526 | 594724 | 1514276 | Switching to -O2 |
0.2.2 | 723 | 540292 | 1512980 |
NQueens (DM32)
Performance recording for various releases on DM32 with the `fast` build option. This is for the same NQueens benchmark, all times in milliseconds, best of 5 runs. There is no GC column, because it's harder to trigger given how much more memory the calculator has. Also, experimentally, the numbers for the USB and battery measurements are almost identical at the moment. As I understand it, there are plans for a USB overclock like on the DM42, but at the moment it is not there.
Version | Time | PGM Size | QSPI Size | Note |
---|---|---|---|---|
0.6.0 | 1751 | 467260 | 187948 | New table-free decimal |
0.5.2 | 1752 | 856228 | 1550436 | |
0.5.1 | 1746 | | | |
0.5.0 | 1723 | | | |
0.4.10+ | 1804 | 761252 | | RPL stack runloop |
0.4.10 | 1803 | 731052 | | Focused optimizations |
0.4.9 | 2156 | 772732 | 1534316 | No LastArg in progs |
0.4.8 | 2201 | 749892 | 1534316 | |
0.4.7 | 2209 | 742868 | 1534316 | |
0.4.6 | 2204 | 743492 | 1534316 | |
0.4.5 | 2171 | 730092 | 1534316 | |
0.4.4 | 2170 | 730076 | 1534316 | Implements Undo/LastArg |
0.4.3 | 2081 | 718020 | 1527092 | |
0.4.2 | 2242 | 708756 | 1524284 | |
0.4.1 | 2152 | 687500 | 1522788 | |
0.4 | | | | Feature tests 7541edf |
0.3.1 | | | | |
0.3 | | | | |
0.2.4 | | | | |
0.2.3 | | | | |
Collatz conjecture check
This test checks the tail recursion optimization in the RPL interpreter.
The code can be found in the `CBench` program in the `Demo.48S` state.
The HP48 cannot run the benchmark because it does not have integer arithmetic.
Timings on 0.4.10 are:
- HP50G: 397.438s
- DM32: 28.507s (14x faster)
- DM42: 15.769s (25x faster)
Version | DM32 ms | DM42 ms |
---|---|---|
0.6.0 | 26256 | 15355 |
0.5.2 | 26733 | 15695 |
0.4.10 | 28507 | 15769 |
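The speedup factors quoted for 0.4.10, and the change between releases in the table above, follow directly from the timings. A small Python sketch that recomputes them from the published numbers (the variable names are illustrative only):

```python
# Collatz timings on 0.4.10, in seconds, as quoted in the list above.
timings_s = {"HP50G": 397.438, "DM32": 28.507, "DM42": 15.769}

baseline = timings_s["HP50G"]
speedups = {name: baseline / t
            for name, t in timings_s.items() if name != "HP50G"}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x faster than the HP50G")

# Per-release timings in milliseconds, copied from the table.
by_version_ms = {
    "0.6.0": {"DM32": 26256, "DM42": 15355},
    "0.4.10": {"DM32": 28507, "DM42": 15769},
}

# Relative improvement from 0.4.10 to 0.6.0, in percent.
improvement_pct = {
    device: 100.0 * (by_version_ms["0.4.10"][device] - by_version_ms["0.6.0"][device])
    / by_version_ms["0.4.10"][device]
    for device in ("DM32", "DM42")
}

for device, pct in improvement_pct.items():
    print(f"{device}: {pct:.1f}% faster in 0.6.0 than in 0.4.10")
```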
SumTest (decimal performance)
VP = Variable Precision
ID = Intel Decimal Library
HW = Hardware-accelerated (`float` or `double` types)
For 100000 loops, we see that the variable-precision implementation at 24 digits is roughly 11 to 12 times slower than the fixed-precision implementation at 34 digits (128 bits).
Version | DM32 ms | DM42 ms |
---|---|---|
0.6.0 (VP24) | 2377390 | 1768510 |
0.5.2 (ID) | 215421 | 143412 |
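The slowdown between the two rows can be recomputed from the table. The sketch below only divides the published timings; the dictionary names are illustrative.

```python
# SumTest timings for 100000 loops, in milliseconds, copied from the table.
vp24_ms = {"DM32": 2377390, "DM42": 1768510}  # 0.6.0, variable precision, 24 digits
id_ms = {"DM32": 215421, "DM42": 143412}      # 0.5.2, Intel Decimal Library

# How many times longer the variable-precision run takes on each device.
slowdown = {device: vp24_ms[device] / id_ms[device] for device in vp24_ms}

for device, ratio in slowdown.items():
    print(f"{device}: VP24 takes {ratio:.1f}x as long as ID")
```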
For 1000 loops, comparing variable-precision decimal with the earlier Intel decimal library:
Version | DM32 ms | DM42 ms |
---|---|---|
0.6.4 (VP24) | 32346 | 23011 |
0.6.4 (VP12) | 13720 | 10548 |
0.6.4 (VP6) | 6905 | 5623 |
0.5.2 (ID) | 2154 | 1434 |
Time in milliseconds for 1000 loops:
DM32 Version | HW7 | HW16 | VP6 | VP12 | VP24 | VP36 |
---|---|---|---|---|---|---|
0.6.4 | 1414 | 1719 | 6905 | 13720 | 32346 | 60259 |
0.6.2 | | | 7436 | 16017 | 34898 | 62012 |
0.6.0 (Note) | | | | | 23773 | |
0.5.2 (ID) | 2154 |
DM42 Version | HW7 | HW16 | VP6 | VP12 | VP24 | VP36 |
---|---|---|---|---|---|---|
0.6.4 | 422 | 705 | 5623 | 10548 | 23811 | 42363 |
0.6.2 | | | 5842 | 10782 | 23714 | 42269 |
0.6.0 (Note) | | | | | 17685 | |
0.5.2 (ID) | 1434 |
Note: Results for 0.6.0 with variable precision are artificially good because intermediate computations were not made with increased precision.
Drawing `sin X` with `FunctionPlot`
Configuration | DM32 ms | DM42 ms |
---|---|---|
HW7 | 1869-2000 | 1681-1744 |
HW16 | 1928-2067 | 1679-2060 |
ID | 2332-5140 | |
VP24 | 3683-6005 | 3377-3511 |
VP36 | 6567-10186 | 4434-4709 |
VP48 | 8377-10259 | 5964-6123 |
Crash at precision 3