_float## 3 times slower than double# ?

guig2000
New Member

Posts: 4

_float## 3 times slower than double# ? Mar 22, 2023 22:16:11 GMT

Post by guig2000 on Mar 22, 2023 22:16:11 GMT

Hello,

the last time I programmed anything was years ago, and the last time I programmed in BASIC was maybe 30 years ago.

So I discovered that we can program like in the old days on modern computers thanks to QB64 (thanks a lot to the devs!) and other software.

So I'm very happy with that, and I tried to build a Mandelbrot fractal explorer. But I noticed that my CPU (Ryzen 1600X) computes about 45 million iterations per second using _float and more than 120 million using double. Since _float uses 10 bytes (despite 32 bytes being reserved for it) and double uses 8 bytes, I feel that there may be an issue somewhere.

Is this expected behavior, a known issue, or something else?

Last Edit: Mar 22, 2023 22:17:08 GMT by guig2000

bplus
Global Moderator

b = b + ...

Posts: 1,043

_float## 3 times slower than double# ? Mar 22, 2023 22:26:28 GMT

Post by bplus on Mar 22, 2023 22:26:28 GMT

Hello guig2000 , welcome!

Please show the code you are using to time the iterations. There are a few changes between old QB, like 4.5, and the current QB64, which runs compiled!
_Float might run slower because it tries for more accuracy, i.e. it carries more decimal digits, but I wouldn't think that the difference would be so great.

Update (running QB64pe v 3.6 just downloaded today!) on old HP Laptop Windows 10-64 ssd
Times rounded to the nearest hundredth

_Title "Tiny Mandelbrot" 'b+ 2021-07-5 mod for timed test between _Float and Default Single

' Default single 4.69, 4.74, 4.74, 4.75
_Define A-Z As DOUBLE  ' 4.83, 4.84, 4.85, 4.88
'_Define A-Z As _FLOAT  ' 4.92, 4.92, 4.94, 4.94

Screen _NewImage(800, 600, 12)
t# = Timer(.001)
For round = 1 To 100
    For y = -35 To 35
        For x = -5 To 69
            m = 0: r = 0
            For k = 0 To 111
                j = r ^ 2 - m ^ 2 - 2 + x / 25
                m = 2 * r * m + y / 25
                r = j
                l = k And 15
                If j ^ 2 + m ^ 2 > 11 Then k = 112
            Next
            PSet (x + 18, y + 40), l
        Next
    Next
Next
Locate 10, 10
Print Timer(.001) - t#
Sleep

Nothing very shocking here? The more precision the slower it goes.

Last Edit: Mar 22, 2023 22:55:02 GMT by bplus

Cheers! to the stars of Basic where ever they shine!

guig2000
New Member

Posts: 4

_float## 3 times slower than double# ? Mar 22, 2023 23:51:54 GMT

Post by guig2000 on Mar 22, 2023 23:51:54 GMT

OK, here is my loop:

Dim Shared As Double invZOOM, Xd, Yd, DeltaShift, DeltaZOOM 'qbasic: Double allows zooming up to about 10^13 before rounding artifacts appear
'Dim Shared As _Float invZOOM, Xd, Yd, DeltaShift, DeltaZOOM 'qb64: _Float allows zooming up to about 10^16 without artifacts
Dim Shared As _Unsigned Integer MaxIt, SCwd, SChg 'qb64

Input "screen width? ", SCwd 'qb64
Input "screen height? ", SChg 'qb64
Screen _NewImage(SCwd, SChg, 32) 'qb64

Dim As _Unsigned Integer PixelX, PixelY, It, Ox, Oy 'qb64 _Unsigned must be deleted on MSdos QuickBasic
Dim As _Unsigned Long C 'qb64
Dim As Double kx, ky, X0, Y0, X, Y, Xc, Yc ', Ox, Oy 'qbasic
'Dim As _Float kx, ky, X0, Y0, X, Y, Xc, Yc ', Ox, Oy 'qb64
Ox = SCwd / 2 'origin point on x axis
Oy = SChg / 2 'origin point on y axis

For PixelY = 16 To SChg - 1
    ' Convert pixel coordinates (PixelX, PixelY) to complex ones (X0, Y0), applying zoom and shift. Doing most of this in the Y loop saves some calculation time.
    kx = invZOOM * 4 / SCwd 'scale on X axis
    ky = -kx '-invZOOM * 2 / SChg 'scale on y axis
    Y0 = ky * (PixelY - Oy) + Yd 'simplification of (-Oy + Yd / ky) * ky + PixelY * ky
    For PixelX = 0 To SCwd - 1
        X0 = kx * (PixelX - Ox) - Xd 'simplification of (-Ox - Xd / kx) * kx + PixelX * kx
        Rem mandelbrot speed divergence calculation
        X = X0
        Y = Y0
        For It = 0 To MaxIt
            Xc = X * X
            Yc = Y * Y
            If Xc + Yc > 4 Then It = It + 1: Exit For
            Y = 2 * X * Y + Y0
            X = Xc - Yc + X0
        Next It
        It = It - 1
        ntIterations = ntIterations + It
        ' Compute the color to plot
        'If It < 10 Then c = 0 Else c = (255 * (It - 10) / (MaxIt - 10))'qbasic
        If It < MaxIt Then
            C = &HFFFFFF * It * invMaxIt
        Else
            C = &HFFFFFF '767
            ' C = CLng(8388607 * Log(It) / Log(MaxIt))
        End If

        'R = Int(C / 65536)
        'G = Int((C Mod 65536) / 256)
        'B = Int(C Mod 256)
  
        PSet (PixelX, PixelY), &HFF000000 + C 'plot color C with alpha channel = 255
  
Next PixelX, PixelY

The complete code is in the attachments.

SaveImage.bi (1.94 KB), ftfileop.bi (5.32 KB), FractNav.bas (7.04 KB)

I saw that there is this fork you mention, Phoenix Edition, but I have not yet checked whether I should use it instead of the main branch.

Last Edit: Mar 22, 2023 23:58:07 GMT by guig2000

guig2000
New Member

Posts: 4

_float## 3 times slower than double# ? Mar 23, 2023 0:53:28 GMT

Post by guig2000 on Mar 23, 2023 0:53:28 GMT

OK, I downloaded QB64PE, which is indeed under much more active development. I tested my code in it, and it works there as well.

guig2000
New Member

Posts: 4

_float## 3 times slower than double# ? Mar 23, 2023 13:03:30 GMT

Post by guig2000 on Mar 23, 2023 13:03:30 GMT

So I looked into why the speed gap differs so much between your test code and my code, and here is what I found, at least on my computer:
1°) 90% of the time spent by the tiny test code seems to go into the four squaring (^ 2) operations in the loop.
2°) QB64 non-PE is much slower at calculating ^ 2 in single precision.
3°) QB64 runs at about the same speed as QB64PE in the other cases.
4°) When x ^ 2 is replaced by x * x, the code becomes about 10 times faster. It is then nearly two times slower with _float than with double, while single is slightly faster than double.
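
A minimal sketch of how point 4°) could be checked in isolation, outside the Mandelbrot loop. This is a hypothetical micro-benchmark, not part of the attached program: the iteration count N, the test value and all variable names are assumptions chosen for illustration, and the timings will depend on the machine and on the compiler optimization settings.

```basic
' Hypothetical micro-benchmark (not from FractNav.bas):
' times x ^ 2 against x * x for a Double variable.
Const N = 20000000 ' iteration count, chosen arbitrarily
Dim As Double x, s, t0, tPow, tMul
Dim As Long i

x = 1.0000001

' Version 1: square with the power operator
s = 0
t0 = Timer(.001)
For i = 1 To N
    s = s + x ^ 2
Next
tPow = Timer(.001) - t0
Print "x ^ 2 :"; tPow; "s   (sum ="; s; ")"

' Version 2: square with a plain multiplication
s = 0
t0 = Timer(.001)
For i = 1 To N
    s = s + x * x
Next
tMul = Timer(.001) - t0
Print "x * x :"; tMul; "s   (sum ="; s; ")"
```

Printing the sum keeps the loop result in use, so the work cannot simply be dropped. Changing the Dim line for x and s to Single or _Float allows the same comparison for the other types.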

My results:
Original test code:
qb64:
'single 7.743 7.788 7.465
' DOUBLE 4.857 4.794 4.852
'_float 4.931 4.974 4.944
qb64pe:
'single 5.337 5.389 5.370
' DOUBLE 5.280 5.282 5.288
'_float 5.443 5.553 5.503

Test with
j = r ^ 2 - m ^ 2 - 2 + x / 25
and
If j ^ 2 + m ^ 2 > 11 Then k = 112
replaced by
j = r * r - m * m - 2 + x / 25
and
If j * j + m * m > 11 Then k = 112

qb64:
single 0.297 0.290 0.281 0.280
double 0.360 0.376 0.392 0.393 0.390
_float 0.652 0.645 0.633
qb64pe:
single 0.292 0.294 0.325 0.303
double 0.376 0.370 0.360 0.378 0.354
_float 0.652 0.636 0.635 0.628

Same, but with the k loop limit raised from 111 to 2222 and the exit value from 112 to 2223
qb64:
single 4.025 3.962 3.903
double 5.125 5.072 5.068 5.061
_float 9.109 9.171 9.293

qb64pe:
single 3.996 3.927 3.947
double 5.12 4.979 5.004 4.981
_float 9.054 9.16 9.09

So yes, going from the 8-byte double to the 10-byte _float gives a gain in precision (a 64-bit significand instead of 53 bits, i.e. a relative rounding error of about 2^-64 instead of 2^-53), but the cost is high. It's much more cost-effective to replace single with double.
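
A small sketch of how the precision side of this trade-off could be checked directly. This is a hypothetical test, not part of the attached program, and all names are assumptions. It counts how many times eps can be halved before 1 + eps / 2 rounds back to 1, which gives the gap between 1 and the next representable value; the worst-case relative rounding error is half that gap.

```basic
' Hypothetical check (not from FractNav.bas):
' finds the smallest power of two eps for which 1 + eps is still different from 1.
Dim As Double epsD, tD
Dim As _Float epsF, tF
Dim As Long bitsD, bitsF

epsD = 1
Do
    tD = 1 + epsD / 2 ' stored in a Double so the sum is rounded to Double
    If tD = 1 Then Exit Do
    epsD = epsD / 2
    bitsD = bitsD + 1
Loop

epsF = 1
Do
    tF = 1 + epsF / 2 ' stored in a _Float so the sum is rounded to _Float
    If tF = 1 Then Exit Do
    epsF = epsF / 2
    bitsF = bitsF + 1
Loop

Print "Double: gap after 1 is 2 ^ -" + LTrim$(Str$(bitsD))
Print "_Float: gap after 1 is 2 ^ -" + LTrim$(Str$(bitsF))
```

If _Float is backed by the 80-bit x87 extended format, as its 10-byte size suggests, the two exponents should differ by 11. On a platform where the compiler maps it to a different type, the result may be different, so the output is worth checking rather than assuming.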

Last Edit: Mar 23, 2023 13:04:18 GMT by guig2000
