Storing data efficiently

General FreeBASIC programming questions.
neil
Posts: 586
Joined: Mar 17, 2022 23:26

Re: Storing data efficiently

Post by neil »

Here's other methods for storing data. Fixed length encoding and Huffman variable length encoding.

4 letters ABCD in string 24 Bytes
string = "ACCACBCBCDCDCCDCCBCCBCBC"

fixed length encoding
no need to count fixed length for all letters
binary1 code = "00"
binary2 code = "01"
binary3 code = "10"
binary4 code = "11"
001010001001100110111011101011101001101001100110 = 48 Bits
48 Bits / 8 = 6 Bytes
24 Bytes = 6 Bytes to store

Huffman variable length encoding
count letters in string
Frequency A = 2
Frequency B = 5
Frequency C = 14
Frequency D = 3

code these by frequency of letters
most frequent letters use smaller binary code
binary1 code = "101"
binary2 code = "11
binary3 code = "0"
binary4 code = "100"
101001010110110100010000100001100110110 = 39 Bits
39 Bits / 8 = 4,875 rounded up = 5 Bytes
24 Bytes = 5 Bytes to store
Last edited by neil on Mar 29, 2022 3:42, edited 1 time in total.
badidea
Posts: 2591
Joined: May 24, 2007 22:10
Location: The Netherlands

Re: Storing data efficiently

Post by badidea »

To store all values in range "AAAAAAAA" to "ZZZZZZZZ", you need 38 bits or 5 bytes.

Code: Select all

print log(26^8) / log(2)
print (log(26^8) / log(2)) / 8
neil
Posts: 586
Joined: Mar 17, 2022 23:26

Re: Storing data efficiently

Post by neil »

Thanks for the info. I am trying to learn how to read a Byte at bit level. Is a bit on or off. I found the bit operator from the FreeBasic Manual. Also another method is a = 128: if (a and 128) = 128 THEN Print "MSB is on": Here's the FreeBasic method.
Which method would be best to use ? Also found I could not use UByte because it requires a negative number.

Code: Select all

DIM AS Byte a,b,n
'128 64 32 16 8 4 2 1 add up the bits that are on for a decimal number
'a = 255 turn all bits on

a = 170
'bit 0 = lsb
'bit 7 = msb

for n = 0 to 7
b = bit(a, n)
print "bit ";n;" = ";b
next
neil
Posts: 586
Joined: Mar 17, 2022 23:26

Re: Storing data efficiently

Post by neil »

Here my demo of an encoder - decoder using fixed binary codes and Huffman variable binary codes. It's It's only been tested using a 24 byte string in the code. So far I can encode and decode fixed binary codes and Huffman variable binary codes.

Code: Select all

'small scale test of encoder - decoder method using fixed  length codes and variable length huffman codes.
'This is only a test of reducing data size. Its only been tested with a 24 Byte string in code below.

Dim As UByte a,b,c,d,i
DIM As string sa,sb,s1,s2,s3,s4,s5

'24 byte string
sa = "ACCACBCBCDCDCCDCCBCCBCBC"

print "String ";sa;" =";len(sa);" Bytes"

'fixed length binary codes
for i = 1 to 24
sb = mid(sa,i,1)
if sb = "A" Then s1 = "00":s5 = s5 + s1
if sb = "B" Then s2 = "01":s5 = s5 + s2
if sb = "C" Then s3 = "10":s5 = s5 + s3
if sb = "D" Then s4 = "11":s5 = s5 + s4
next

print s5;" =";len(s5); " Bits"
print "Using Fixed length Binary encoding"
i = len(s5) / 8
print "(Number of bits / 8) = ";i;" Bytes to store";len(sa);" Bytes"
print

sb =""
Print "Decoded fixed length Binary codes"
Print "Decoded string = ";
'fixed length decoding
for i = 1 to 48 step 2
sb = mid(s5,i,2)
if sb ="00" Then print "A";
if sb ="01" Then print "B";
if sb ="10" Then print "C";
if sb ="11" Then print "D";
next
'End fixed length decoding
print
print

'Huffman variable length Binary encoding
print "Using Huffman variable length Binary encoding"

sb ="":s1 ="":s2 ="":s3 ="":s4 ="":s5 =""

'variable length Binary codes
s1 = "101"
s2 = "11"
s3 = "0"
s4 = "100"

for i = 1 to 24
sb = mid(sa,i,1)
if sb = "A" Then a += 1
if sb = "B" Then b += 1
if sb = "C" Then c += 1
if sb = "D" Then d += 1
next
Print "String = ";sa
Print
Print "Frequency A = ";a
Print "Frequency B = ";b
Print "Frequency C = ";c
Print "Frequency D = ";d
Print
sb =""
for i = 1 to 24
sb = mid(sa,i,1)
if sb = "A" THEN s5 = s5 + s1
if sb = "B" THEN s5 = s5 + s2
if sb = "C" THEN s5 = s5 + s3
if sb = "D" THEN s5 = s5 + s4
next
print s5;" =";len(s5); " Bits"
i = len(s5) / 8
print "(Number of bits / 8) = ";i;" Bytes to store";len(sa);" Bytes"


sb = "":s1 ="":s2 ="":s3 ="":c = 0

print
Print "Decoded Huffman variable length Binary codes"

'decodes Huffman variable length Binary codes
Print "Decoded string = ";
for i = 1 to 39
sb = mid(s5,i,1)
c += 1
if sb = "0" and c = 1 THEN sb ="":c = 0:Print "C";
if sb = "1" and c = 1 Then s1 = sb
if c = 2 and s1 = "1" and sb = "1" Then c = 0: Print "B";
if c = 2 and s1 = "1" THEN s2 = sb
if c = 3 and s1 = "1" and s2 = "0" and sb ="0" THEN c = 0: Print "D";
if c = 3 and s1 = "1" and s2 = "0" and sb ="1" THEN c = 0: Print "A";
next
print
DamageX
Posts: 130
Joined: Nov 21, 2009 8:42

Re: Storing data efficiently

Post by DamageX »

neil wrote: Mar 28, 2022 0:31 Here my demo of an encoder - decoder using fixed binary codes and Huffman variable binary codes. It's It's only been tested using a 24 byte string in the code. So far I can encode and decode fixed binary codes and Huffman variable binary codes.
Nicely done. Some suggestions for you, if you want to get deeper into this topic:

1) I see that you are using strings to hold binary data as zero and one ASCII characters, do you know about the other bit manipulation operators SHL, SHR, OR, XOR?
2) Your Huffman encoder and decoder use a predetermined assignment of codes to symbols a.k.a. "static Huffman table". The next useful feature to add would be a routine that counts how many times each symbol appears, and based on those counts it then automatically assigns codes with the appropriate length. That way you get a table which is customized for the data.
3) If you really want to go nuts, then try to implement Range coding. This is in many ways similar to Huffman except a code doesn't need to have a whole number of bits.
neil
Posts: 586
Joined: Mar 17, 2022 23:26

Re: Storing data efficiently

Post by neil »

Here's a demo of Arithmetic Compression. I got the math formula from wikipedia.org. It only encodes and decodes a string of 6 characters. Most of the examples I have seen does not seem encode or decode more than a few characters.

Code: Select all

'Arithmetic coding demo math formula from https://en.wikipedia.org/wiki/Arithmetic_coding

Dim As Single nb,lb,j,k,l,m,n
Dim Shared As Single e,ub,ua
Dim As UByte a,b,d
Dim Shared As string s1,s2
Declare Sub Decode ()

'Character Values
A = 0:B = 1:D = 3

s1 ="DABDDB"
Print "Original String to encode ";s1
s1 = "":s2 = ""
'lower bound
lb = (6^5* D)+3*(6^4* A)+(3*1)*(6^3* B)+(3*1*2)*(6^2* D)+(3*1*2*3)*(6^1* D)+(3*1*2*3*3)*(6^0* B)

'upper bound
'''''''''''D   A   B   D   D   B
nb = lb + (3 * 1 * 2 * 3 * 3 * 2) 'frequency of characters

' 25110 separate 251 and 10 
s1 = str(nb)
s2 = left(s1,3)
ub = val(s2)
s2 = right(s1,2)
ua = val(s2)

print "Value to store 1 Byte = ";ub 


' convert 25110 to 25100
' 25110 = 251 * 10^2 = 25100 
ub = ub * ua^2

print "Encoded number ";ub

j = (ub - 6^5 * 3) / 3
k = (j - 6^4 * 0) / 1
l = (k - 6^3 * 1) / 2
m = (l - 6^2 * 3) / 3
n = (m - 6^1 * 3) / 3

Print "Original string recovered from encoded number ";

e = ub / 6^5
decode
e = j / 6^4
decode
e = k / 6^3
decode
e = l / 6^2
decode
e = m / 6^1
decode
e = n / 6^0
decode
Print
Print "Done"
sleep

Sub Decode ()
if e >= 3 and e < 6 Then Print "D"; 'range  3 - 5.9
if e < 1  Then Print "A"; 'range 0 - 0.999
if e >= 1 and e < 3 Then Print "B"; 'range 1 - 2.9
End Sub
dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Storing data efficiently

Post by dodicat »

damageX wrote
"The next useful feature to add would be a routine that counts how many times each symbol appears"
I can maybe help with that bit:

Code: Select all

#include "file.bi"

Type counter
    As Long tally,ascii
End Type

 Sub Quicksort(array() As counter,begin As Long=0,Finish As Long=255)
 Dim As Long i=begin,j=finish
 Dim As counter x =array(((I+J)\2))
   While I <= J
    While array(I).tally > X.tally:I+=1:Wend
    While array(J).tally < X.tally:J-=1:Wend
     If I<=J Then Swap array(I),array(J): I+=1:J-=1
   Wend
 If J >begin Then Quicksort(array(),begin,J)
 If I <Finish Then Quicksort(array(),I,Finish)
 End Sub
    

Function tally(somestring As String,partstring As String) As long
    Dim As Long i,j,ln,lnp,count,num
    ln=Len(somestring)
    lnp=Len(partstring)
    count=0
    i=-1
    Do
        i+=1
        If somestring[i] <> partstring[0] Then Continue Do
        If somestring[i] = partstring[0] Then
            For j=0 To lnp-1 
                If somestring[j+i]<>partstring[j] Then Continue Do
            Next j
        End If
        count+=1
        i=i+lnp-1
        skip:
    Loop Until i>=ln-1 
    Return count
End Function

Sub savefile(filename As String,p As String) '' unused here
    Dim As Long n=Freefile
    If Open (filename For Binary Access Write As #n)=0 Then
        Put #n,,p
        Close
    Else
        Print "Unable to save " + filename:Sleep:End
    End If
End Sub

Function loadfile(file As String) As String
    Dim As Long  f=Freefile
    If Open (file For Binary Access Read As #f)=0 Then
        Dim As String text
        If Lof(f) > 0 Then
            text = String(Lof(f), 0)
            Get #f, , text
        End If
        Close #f
        Return text
        Else:Print file;" not found":sleep:end:End If
    End Function
    
   
Sub gettally(s As String,t() As counter)
    For n As Long=0 To 255
        t(n).tally=tally(s,Chr(n))
        t(n).ascii=n
    Next
End Sub


Dim As String filename="c:\windows\system32\kernel32.dll"'<<--------------  your file

Dim As String s=loadfile(filename)

Dim As counter t(0 To 255)

gettally(s,t())
quicksort(t())
Dim As Long check
Print "ascii code","frequency"
For n As Long=0 To 255
    Print t(n).ascii,t(n).tally
    check+=t(n).tally
Next
Print "___________________________"
Print "tallycheck",check,"Filelength = ";Filelen(filename)

Sleep

 
neil
Posts: 586
Joined: Mar 17, 2022 23:26

Re: Storing data efficiently

Post by neil »

Thanks dodicat. Nice symbol counter.
Post Reply