Here's other methods for storing data. Fixed length encoding and Huffman variable length encoding.
4 letters ABCD in string 24 Bytes
string = "ACCACBCBCDCDCCDCCBCCBCBC"
fixed length encoding
no need to count fixed length for all letters
binary1 code = "00"
binary2 code = "01"
binary3 code = "10"
binary4 code = "11"
001010001001100110111011101011101001101001100110 = 48 Bits
48 Bits / 8 = 6 Bytes
24 Bytes = 6 Bytes to store
Huffman variable length encoding
count letters in string
Frequency A = 2
Frequency B = 5
Frequency C = 14
Frequency D = 3
code these by frequency of letters
most frequent letters use smaller binary code
binary1 code = "101"
binary2 code = "11
binary3 code = "0"
binary4 code = "100"
101001010110110100010000100001100110110 = 39 Bits
39 Bits / 8 = 4,875 rounded up = 5 Bytes
24 Bytes = 5 Bytes to store
Storing data efficiently
Re: Storing data efficiently
Last edited by neil on Mar 29, 2022 3:42, edited 1 time in total.
Re: Storing data efficiently
To store all values in range "AAAAAAAA" to "ZZZZZZZZ", you need 38 bits or 5 bytes.
Code: Select all
print log(26^8) / log(2)
print (log(26^8) / log(2)) / 8
Re: Storing data efficiently
Thanks for the info. I am trying to learn how to read a Byte at bit level. Is a bit on or off. I found the bit operator from the FreeBasic Manual. Also another method is a = 128: if (a and 128) = 128 THEN Print "MSB is on": Here's the FreeBasic method.
Which method would be best to use ? Also found I could not use UByte because it requires a negative number.
Which method would be best to use ? Also found I could not use UByte because it requires a negative number.
Code: Select all
DIM AS Byte a,b,n
'128 64 32 16 8 4 2 1 add up the bits that are on for a decimal number
'a = 255 turn all bits on
a = 170
'bit 0 = lsb
'bit 7 = msb
for n = 0 to 7
b = bit(a, n)
print "bit ";n;" = ";b
next
Re: Storing data efficiently
Here my demo of an encoder - decoder using fixed binary codes and Huffman variable binary codes. It's It's only been tested using a 24 byte string in the code. So far I can encode and decode fixed binary codes and Huffman variable binary codes.
Code: Select all
'small scale test of encoder - decoder method using fixed length codes and variable length huffman codes.
'This is only a test of reducing data size. Its only been tested with a 24 Byte string in code below.
Dim As UByte a,b,c,d,i
DIM As string sa,sb,s1,s2,s3,s4,s5
'24 byte string
sa = "ACCACBCBCDCDCCDCCBCCBCBC"
print "String ";sa;" =";len(sa);" Bytes"
'fixed length binary codes
for i = 1 to 24
sb = mid(sa,i,1)
if sb = "A" Then s1 = "00":s5 = s5 + s1
if sb = "B" Then s2 = "01":s5 = s5 + s2
if sb = "C" Then s3 = "10":s5 = s5 + s3
if sb = "D" Then s4 = "11":s5 = s5 + s4
next
print s5;" =";len(s5); " Bits"
print "Using Fixed length Binary encoding"
i = len(s5) / 8
print "(Number of bits / 8) = ";i;" Bytes to store";len(sa);" Bytes"
print
sb =""
Print "Decoded fixed length Binary codes"
Print "Decoded string = ";
'fixed length decoding
for i = 1 to 48 step 2
sb = mid(s5,i,2)
if sb ="00" Then print "A";
if sb ="01" Then print "B";
if sb ="10" Then print "C";
if sb ="11" Then print "D";
next
'End fixed length decoding
print
print
'Huffman variable length Binary encoding
print "Using Huffman variable length Binary encoding"
sb ="":s1 ="":s2 ="":s3 ="":s4 ="":s5 =""
'variable length Binary codes
s1 = "101"
s2 = "11"
s3 = "0"
s4 = "100"
for i = 1 to 24
sb = mid(sa,i,1)
if sb = "A" Then a += 1
if sb = "B" Then b += 1
if sb = "C" Then c += 1
if sb = "D" Then d += 1
next
Print "String = ";sa
Print
Print "Frequency A = ";a
Print "Frequency B = ";b
Print "Frequency C = ";c
Print "Frequency D = ";d
Print
sb =""
for i = 1 to 24
sb = mid(sa,i,1)
if sb = "A" THEN s5 = s5 + s1
if sb = "B" THEN s5 = s5 + s2
if sb = "C" THEN s5 = s5 + s3
if sb = "D" THEN s5 = s5 + s4
next
print s5;" =";len(s5); " Bits"
i = len(s5) / 8
print "(Number of bits / 8) = ";i;" Bytes to store";len(sa);" Bytes"
sb = "":s1 ="":s2 ="":s3 ="":c = 0
print
Print "Decoded Huffman variable length Binary codes"
'decodes Huffman variable length Binary codes
Print "Decoded string = ";
for i = 1 to 39
sb = mid(s5,i,1)
c += 1
if sb = "0" and c = 1 THEN sb ="":c = 0:Print "C";
if sb = "1" and c = 1 Then s1 = sb
if c = 2 and s1 = "1" and sb = "1" Then c = 0: Print "B";
if c = 2 and s1 = "1" THEN s2 = sb
if c = 3 and s1 = "1" and s2 = "0" and sb ="0" THEN c = 0: Print "D";
if c = 3 and s1 = "1" and s2 = "0" and sb ="1" THEN c = 0: Print "A";
next
print
Re: Storing data efficiently
Nicely done. Some suggestions for you, if you want to get deeper into this topic:
1) I see that you are using strings to hold binary data as zero and one ASCII characters, do you know about the other bit manipulation operators SHL, SHR, OR, XOR?
2) Your Huffman encoder and decoder use a predetermined assignment of codes to symbols a.k.a. "static Huffman table". The next useful feature to add would be a routine that counts how many times each symbol appears, and based on those counts it then automatically assigns codes with the appropriate length. That way you get a table which is customized for the data.
3) If you really want to go nuts, then try to implement Range coding. This is in many ways similar to Huffman except a code doesn't need to have a whole number of bits.
Re: Storing data efficiently
Here's a demo of Arithmetic Compression. I got the math formula from wikipedia.org. It only encodes and decodes a string of 6 characters. Most of the examples I have seen does not seem encode or decode more than a few characters.
Code: Select all
'Arithmetic coding demo math formula from https://en.wikipedia.org/wiki/Arithmetic_coding
Dim As Single nb,lb,j,k,l,m,n
Dim Shared As Single e,ub,ua
Dim As UByte a,b,d
Dim Shared As string s1,s2
Declare Sub Decode ()
'Character Values
A = 0:B = 1:D = 3
s1 ="DABDDB"
Print "Original String to encode ";s1
s1 = "":s2 = ""
'lower bound
lb = (6^5* D)+3*(6^4* A)+(3*1)*(6^3* B)+(3*1*2)*(6^2* D)+(3*1*2*3)*(6^1* D)+(3*1*2*3*3)*(6^0* B)
'upper bound
'''''''''''D A B D D B
nb = lb + (3 * 1 * 2 * 3 * 3 * 2) 'frequency of characters
' 25110 separate 251 and 10
s1 = str(nb)
s2 = left(s1,3)
ub = val(s2)
s2 = right(s1,2)
ua = val(s2)
print "Value to store 1 Byte = ";ub
' convert 25110 to 25100
' 25110 = 251 * 10^2 = 25100
ub = ub * ua^2
print "Encoded number ";ub
j = (ub - 6^5 * 3) / 3
k = (j - 6^4 * 0) / 1
l = (k - 6^3 * 1) / 2
m = (l - 6^2 * 3) / 3
n = (m - 6^1 * 3) / 3
Print "Original string recovered from encoded number ";
e = ub / 6^5
decode
e = j / 6^4
decode
e = k / 6^3
decode
e = l / 6^2
decode
e = m / 6^1
decode
e = n / 6^0
decode
Print
Print "Done"
sleep
Sub Decode ()
if e >= 3 and e < 6 Then Print "D"; 'range 3 - 5.9
if e < 1 Then Print "A"; 'range 0 - 0.999
if e >= 1 and e < 3 Then Print "B"; 'range 1 - 2.9
End Sub
Re: Storing data efficiently
damageX wrote
"The next useful feature to add would be a routine that counts how many times each symbol appears"
I can maybe help with that bit:
"The next useful feature to add would be a routine that counts how many times each symbol appears"
I can maybe help with that bit:
Code: Select all
#include "file.bi"
Type counter
As Long tally,ascii
End Type
Sub Quicksort(array() As counter,begin As Long=0,Finish As Long=255)
Dim As Long i=begin,j=finish
Dim As counter x =array(((I+J)\2))
While I <= J
While array(I).tally > X.tally:I+=1:Wend
While array(J).tally < X.tally:J-=1:Wend
If I<=J Then Swap array(I),array(J): I+=1:J-=1
Wend
If J >begin Then Quicksort(array(),begin,J)
If I <Finish Then Quicksort(array(),I,Finish)
End Sub
Function tally(somestring As String,partstring As String) As long
Dim As Long i,j,ln,lnp,count,num
ln=Len(somestring)
lnp=Len(partstring)
count=0
i=-1
Do
i+=1
If somestring[i] <> partstring[0] Then Continue Do
If somestring[i] = partstring[0] Then
For j=0 To lnp-1
If somestring[j+i]<>partstring[j] Then Continue Do
Next j
End If
count+=1
i=i+lnp-1
skip:
Loop Until i>=ln-1
Return count
End Function
Sub savefile(filename As String,p As String) '' unused here
Dim As Long n=Freefile
If Open (filename For Binary Access Write As #n)=0 Then
Put #n,,p
Close
Else
Print "Unable to save " + filename:Sleep:End
End If
End Sub
Function loadfile(file As String) As String
Dim As Long f=Freefile
If Open (file For Binary Access Read As #f)=0 Then
Dim As String text
If Lof(f) > 0 Then
text = String(Lof(f), 0)
Get #f, , text
End If
Close #f
Return text
Else:Print file;" not found":sleep:end:End If
End Function
Sub gettally(s As String,t() As counter)
For n As Long=0 To 255
t(n).tally=tally(s,Chr(n))
t(n).ascii=n
Next
End Sub
Dim As String filename="c:\windows\system32\kernel32.dll"'<<-------------- your file
Dim As String s=loadfile(filename)
Dim As counter t(0 To 255)
gettally(s,t())
quicksort(t())
Dim As Long check
Print "ascii code","frequency"
For n As Long=0 To 255
Print t(n).ascii,t(n).tally
check+=t(n).tally
Next
Print "___________________________"
Print "tallycheck",check,"Filelength = ";Filelen(filename)
Sleep
Re: Storing data efficiently
Thanks dodicat. Nice symbol counter.