Force imagecreate to allocate at desired margin
For optimization purposes, is there any way to force ImageCreate to return memory aligned to a 256-byte boundary?
Windows usually returns blocks aligned to 8 bytes.
In a loop of repeated allocate/destroy, Windows will occasionally return a block whose address happens to satisfy any desired mask: 7, 15, 31, even 63.
The loop may succeed quickly or repeat up to a hundred thousand times. I tried this on QB64 with REDIM of two arrays, a dummy one and the target one: resize both by a random amount (rnd*4096), then REDIM the target to the size needed.
Here in FB everything is much easier: I just make a byte array, find the desired offset with a mask, and then access the area through pointers of the desired operand size. The exception is image creation. If I already have allocated memory, why not just bind ScreenPtr to a desired address?
Or maybe the same is possible with ScreenRes?
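The byte-array approach described above could look roughly like this. It is only a sketch: the names AlignUp, ALIGN_BYTES and PAYLOAD_BYTES are made up for illustration, and the alignment is assumed to be a power of two.

```freebasic
' Round an address up to the next multiple of alignment.
' alignment must be a power of two (16, 32, 256, ...).
Function AlignUp(ByVal p As Any Ptr, ByVal alignment As UInteger) As Any Ptr
    Dim As UInteger mask = alignment - 1
    Return Cast(Any Ptr, (Cast(UInteger, p) + mask) And (Not mask))
End Function

Const ALIGN_BYTES = 32      ' 32 bytes = 256 bits
Const PAYLOAD_BYTES = 4096  ' usable bytes wanted

' Over-allocate by ALIGN_BYTES - 1 bytes so an aligned start always fits.
Dim As UByte raw(0 To PAYLOAD_BYTES + ALIGN_BYTES - 2)
Dim As ULongInt Ptr p = AlignUp(@raw(0), ALIGN_BYTES)

' Use the aligned area with the operand size of choice (here 64-bit words).
For i As Long = 0 To PAYLOAD_BYTES \ 8 - 1
    p[i] = i
Next

Print "raw start: "; Hex(Cast(UInteger, @raw(0)))
Print "aligned  : "; Hex(Cast(UInteger, p))
Print "remainder: "; Cast(UInteger, p) Mod ALIGN_BYTES
```

The cost is at most ALIGN_BYTES - 1 wasted bytes, and there is no need to loop over allocations hoping for a lucky address.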
Re: Force imagecreate to allocate at desired margin
I found a solution, though. 256 bytes is too much; 256 bits is the way to go.
Calling ScreenRes repeatedly with different resolutions makes no difference to ScreenPtr, which is always on a 16-byte boundary, i.e. 128 bits.
That is quite enough. Only 256-bit writes could be slowed down by this, and I write in UInt64 units. So ImageCreate becomes unnecessary in my case if I use extra pages and flip them.
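One way to check what alignment a given pointer actually has is sketched below. AlignmentOf is a hypothetical helper, not an FB built-in, and the 4096-byte cap is an arbitrary choice.

```freebasic
' Largest power of two (capped at limit) that divides the address held in p.
Function AlignmentOf(ByVal p As Any Ptr, ByVal limit As UInteger = 4096) As UInteger
    Dim As UInteger addr = Cast(UInteger, p)
    Dim As UInteger a = 1
    ' Double a while the corresponding address bit is still zero.
    While a < limit AndAlso (addr And a) = 0
        a Shl= 1
    Wend
    Return a
End Function

' Demonstrated on a heap block so the sketch runs without a graphics window.
Dim As Any Ptr blk = Allocate(1000)
Print "Allocate block is aligned to"; AlignmentOf(blk); " bytes"
Deallocate(blk)

' After ScreenRes the same check applies: Print AlignmentOf(ScreenPtr)
```

A result of 16 corresponds to the 128-bit alignment mentioned above.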
Re: Force imagecreate to allocate at desired margin
It would help if you described what you're trying to achieve. Working with off-screen buffers? Paging?
Re: Force imagecreate to allocate at desired margin
I have an image that is converted from a low-res colour bitmap to a 4bpp version. That image is then run through look-up tables to produce RGB output, heavily post-processed, and stored in a new array. Then I need to scale it up to the main window; the scaling must be of the "nearest neighbour" type, with a factor of 1x, 2x, 3x or 4x.
The scaler I wrote simply does multiple writes through a pointer, step by step: for example, it writes 16 identical pixels for each source pixel.
It is fast enough, but unaligned data may cause some slowdown, so all my arrays are aligned.
While comparing write speed to the main screen against some other memory location, I found that the speeds differ by something like 34000 fps vs 26000 fps. ScreenLock makes no difference.
So I conclude that ScreenRes gives me a location in video RAM accessed over PCIe. It cannot be otherwise.
To make things faster I tried to change the alignment of the memory ScreenRes allocates, to no avail: it is always 128-bit aligned.
But GPU-Z shows that my card has a 256-bit bus. So this is either a hard-coded FB setting somewhere or driver behaviour.
As far as I can tell, QB64 Phoenix Edition has hardware images that scale at lightning speed, though it is unclear which scaling mode is used.
Maybe FB has a similar approach?
The speed loss does not look large, but that is true only for small images; when it comes to rendering at Full HD or 4K, things are different.
PS. I have already downloaded the FB help files. I see screensync, screencopy, putimage and other things there, but nothing related to scaling in VRAM.
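For reference, an integer-factor nearest-neighbour scaler of the kind described could be sketched as follows. This is not the poster's code: ScaleNearest and its parameters are made up for illustration, the source is assumed to be tightly packed 32-bit pixels, and it expands each row once and then copies it down with memcpy instead of writing every pixel individually.

```freebasic
#include "crt.bi"

' Nearest-neighbour upscale by an integer factor.
' src: srcW x srcH tightly packed 32-bit pixels.
' dst: (srcW*factor) x (srcH*factor) pixels, rows dstPitch bytes apart.
Sub ScaleNearest(ByVal src As ULong Ptr, ByVal srcW As Long, ByVal srcH As Long, _
                 ByVal dst As UByte Ptr, ByVal dstPitch As Long, ByVal factor As Long)
    For y As Long = 0 To srcH - 1
        Dim As ULong Ptr srcRow = src + y * srcW
        Dim As ULong Ptr firstRow = Cast(ULong Ptr, dst + (y * factor) * dstPitch)
        ' Expand one source row horizontally.
        Dim As ULong Ptr d = firstRow
        For x As Long = 0 To srcW - 1
            Dim As ULong c = srcRow[x]
            For k As Long = 1 To factor
                *d = c
                d += 1
            Next
        Next
        ' Replicate the expanded row vertically.
        For r As Long = 1 To factor - 1
            memcpy(Cast(UByte Ptr, firstRow) + r * dstPitch, firstRow, srcW * factor * 4)
        Next
    Next
End Sub

' 2x2 source scaled 3x into a 6x6 destination.
Dim As ULong src(0 To 3) = {1, 2, 3, 4}
Dim As ULong dst(0 To 35)
ScaleNearest(@src(0), 2, 2, Cast(UByte Ptr, @dst(0)), 6 * 4, 3)

For y As Long = 0 To 5
    For x As Long = 0 To 5
        Print dst(y * 6 + x);
    Next
    Print
Next
```

To draw straight to the screen, dst would be ScreenPtr and dstPitch the pitch reported by ScreenInfo.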
Re: Force imagecreate to allocate at desired margin
Because there isn't any. The 'screen' is just a memory buffer, and how it is rendered depends on the driver used (DirectX, OpenGL, none).
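A minimal sketch of treating the screen as a plain buffer, assuming a 32-bit mode. GFX_NULL is used here only so the example runs without opening a window; the variable names are arbitrary.

```freebasic
#include "fbgfx.bi"

' GFX_NULL builds the screen buffer without a visible window.
' Drop the flag to get a normal window.
ScreenRes 320, 200, 32, 1, fb.GFX_NULL

Dim As Long w, h, depth, bypp, pitch
ScreenInfo w, h, depth, bypp, pitch

Dim As UByte Ptr scr = ScreenPtr
If scr = 0 Then End 1    ' no graphics mode available

ScreenLock
For y As Long = 0 To h - 1
    ' Rows are pitch bytes apart, which may be more than w * bypp.
    Dim As ULong Ptr row = Cast(ULong Ptr, scr + y * pitch)
    For x As Long = 0 To w - 1
        row[x] = RGB(x And 255, y And 255, 0)
    Next
Next
ScreenUnlock
```

Whatever happens to that buffer afterwards (blit, texture upload, nothing at all) is up to the driver, which is why there is no VRAM-side scaling to hook into.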
Re: Force imagecreate to allocate at desired margin
@Emulog
maybe this thread about array alignment will be of some use to you viewtopic.php?p=283671#p283671
Re: Force imagecreate to allocate at desired margin
Thanks.
In that topic I found the same conclusion: single-dimension arrays with pointer access are a good solution.
I also found that direct offset calculation is somewhat faster than the usual a=data(index,index,index) for 2D or 3D arrays.
Just do var=*(arrptr+(index1*n1)+(index2*n2)+index3), where n1 and n2 are the integer strides of the first two dimensions.
Also, if the data in the array is laid out for speed, with the least-used indexes first and the most-used last, fewer cache misses occur.
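A small sketch of the offset calculation, with made-up dimensions NI, NJ and NK. FB stores arrays with the last index contiguous, so the strides are NJ*NK for the first index and NK for the second.

```freebasic
Const NI = 4, NJ = 5, NK = 6

Dim As Long grid(0 To NI - 1, 0 To NJ - 1, 0 To NK - 1)
Dim As Long Ptr flat = @grid(0, 0, 0)

' Fill through normal indexing so each element holds a unique value.
For i As Long = 0 To NI - 1
    For j As Long = 0 To NJ - 1
        For k As Long = 0 To NK - 1
            grid(i, j, k) = i * 100 + j * 10 + k
        Next
    Next
Next

' Read back through the pointer. The row offset is computed once per (i, j);
' the inner loop over k then touches consecutive addresses.
Dim As Long mismatches = 0
For i As Long = 0 To NI - 1
    For j As Long = 0 To NJ - 1
        Dim As Long Ptr row = flat + (i * NJ + j) * NK
        For k As Long = 0 To NK - 1
            If row[k] <> grid(i, j, k) Then mismatches += 1
        Next
    Next
Next

Print "mismatches:"; mismatches
Print "grid(2,3,4) ="; grid(2, 3, 4); "  via pointer ="; *(flat + 2 * (NJ * NK) + 3 * NK + 4)
```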
Re: Force imagecreate to allocate at desired margin
If you like, you can create an image from first principles straight into a plain array (a ulong array here).
imagecreate is not needed, but I use it here as a comparison.
You could probably align this array to a 16- or 32-byte boundary.
I have done it in the past, but I'll have to dig out some old code for the alignment.
Anyway, here is createimage. It can be called at any time; unlike imagecreate, it does not need a gfx screen to be set up first.
Code:
'#cmdline "-exx"
sub createimage(w as long,h as long,clr as ulong=rgba(255,0,255,255),p() as ulong)
#define pad(n) iif(n mod 4=0,n,n + 4-(n mod 4)) 'multiple of 4
redim p(pad(w)*h+8)
p(0)=7 'header type, always 7
p(1)=4 'bytes per pixel
p(2)=w 'width
p(3)=h 'height
p(4)=pad(w)*4 'pitch in bytes -- rows padded to a multiple of 16 bytes
for a as long=5 to 7
p(a)=0 'reserved
next
for a as long=8 to ubound(p)
p(a)=clr 'colour
next
end sub
redim as ulong b()
createimage(203,200,,b())
screen 19,32
'AS a comparison
dim as ulong ptr q=imagecreate(203,200)
line @b(0),(0,0)-(202,199),rgb(255,255,255),b
line @b(0),(25,25)-(175,175),rgb(0,200,0),bf
circle @b(0),(100,100),75,rgb(255,0,0)
draw string @b(0),(50,100),"array image"
line q,(0,0)-(202,199),rgb(255,255,255),b
line q,(25,25)-(175,175),rgb(0,200,0),bf
circle q,(100,100),75,rgb(255,0,0)
draw string q,(50,100),"pointer image"
put(10,20),@b(0),trans
put(220,20),q,trans
dim as long x,y,pitch,size
imageinfo q,x,y,,pitch,,size
locate 20
print "width","height","pitch","size"
print x,y,pitch,size,"pointer"
imageinfo @b(0),x,y,,pitch,,size
print x,y,pitch,size,"array"
print
sleep
Re: Force imagecreate to allocate at desired margin
Just a comment on -exx: it's good to use while working on and debugging your program, but once it's debugged you should probably remove it, as it can dramatically slow down performance.
Re: Force imagecreate to allocate at desired margin
Method 2: image to an aligned array.
Tested with 64-bit, 32-bit and gas64.
Slightly better with 64 bits.
Code:
#include "crt.bi"
'#cmdline "-exx"
dim as ulong a(15)=_
{Rgb(0,0,0),_
Rgb(170,0,0),_
Rgb(0,170,0),_
Rgb(170,170,0),_
Rgb(0,0,170),_
Rgb(170,0,170),_
Rgb(0,85,170),_
Rgb(170,170,170),_
Rgb(85,85,85),_
Rgb(255,85,85),_
Rgb(85,255,85),_
Rgb(255,255,85),_
Rgb(85,85,255),_
Rgb(255,85,255),_
Rgb(85,255,255),_
Rgb(255,255,255)}
#define rd( c ) ( ( c ) Shr 16 And 255 )
#define gr( c ) ( ( c ) Shr 8 And 255 )
#define bl( c ) ( ( c ) And 255 )
#define al( c ) ( ( c ) Shr 24 )
#define Offset (256\8) '<< ------------------------------- offset value in bytes (256 bits)
#macro align(s,n)
scope
dim as long f
do
redim as integer b(f) 'get some granularity
redim s(n+f)
if cast(uinteger,@s(0)) mod Offset <>0 then f+=1:continue do
''if cast(uinteger,@s(0)) mod 2*Offset =0 then f+=1:continue do ' other conditions if needed
var i=cast(uinteger,@s(0))
redim preserve s(n)
if cast(uinteger,@s(0))<> i then f+=1:continue do
exit do
loop
puts "loops " +str(f+1)
end scope
#endmacro
Screen 19
dim as long counter
Var i=Imagecreate(300,16*20)
Dim As Long size
Imageinfo i,,,,,,size
Dim As Ubyte u(any) 'create an array
align(u,size) 'align the array
memcpy(@u(0),i,size) 'transfer image to array
'draw stuff to image
For n As Long=0 To 15*20 Step 20
Line @u(0),(0,n)-(300,n+20),n\20,bf
draw string(340,n),"rgb("+ str(rd(a(n\20)))+","+str(gr(a(n\20)))+","+str(bl(a(n\20)))+")"
Next
Put(20,0),@u(0),pset
locate 25
print "@start of data ";@u(0), iif(cast(uinteger,@u(0)) Mod Offset=0,"aligned "+str(8*Offset)+" bits","??")
Sleep