Asc 195

dafhi · Post by **dafhi** » Mar 01, 2023 7:47

seeing 195 at each odd-numbered position

dim as string r
r = "àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß`abcdefghijklmnopqrstuvwxyz{|}~0@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_"

for i as long = 0 to 127
  ? r[i]; " ";
Next

SARG · Post by **SARG** » Mar 01, 2023 9:37

These characters are coded in unicode with 2 bytes so the odd numbers are 195.....
On my system no problem (only the values of the right column) therefore it depends of the used environment.

Code: Select all

U+00E0	à	195 160
U+00E1	á	195 161
U+00E2	â	195 162
U+00E3	ã	195 163
U+00E4	ä	195 164
U+00E5	å	195 165
U+00E6	æ	195 166
U+00E7	ç	195 167
U+00E8	è	195 168
U+00E9	é	195 169

dafhi · Post by **dafhi** » Mar 01, 2023 18:37

i'll look into the other string types. thanks SARG!

dafhi · Post by **dafhi** » Mar 02, 2023 23:14

another something interesting

if i say

Code: Select all

dim as string * 256 a

and then fill each char with corresponding index,
then

Code: Select all

dim as string  b = a

the printed strings show different.

if i instead initialize 'a'

Code: Select all

dim as string a = space(256)

followed by the fill char as index
they print the same

dodicat · Post by **dodicat** » Mar 02, 2023 23:53

Hi dafhi.
fixed length strings behave differently from var length strings.
For instance a fixed length string will stop at a chr(0).
IMO it shouldn't.
A workaround is to make the fixed length string into a var length string using one of the string functions.

Code: Select all

dim as string * 256 a
for n as long=0 to 255
    a[n]=n
next
print "a="
print a
print
dim as string b=a
print "b="
print b
print
b=mid(a,1)
print "b="
print b
sleep

But something which doesn't look clean generally isn't, so fixed length strings are buggy IMHO.

dafhi · Post by **dafhi** » Mar 03, 2023 1:00

i didn't know one could change fixed to var.

interesting solution.

i'm working on a base128 thing. seeing inconsistency across text editors at ctrl-c ctrl-v
chugging along ..

dodicat · Post by **dodicat** » Mar 03, 2023 11:46

Hi dafhi.
Here is a method to change the base of a file (to base 128 default) to any base from 2 to 128.
https://www.mediafire.com/file/igg0dosd ... e.zip/file
The idea was (a few years ago) to make a small file forum readable by changing it's base thus using characters in ascii and not extended ascii.
If you think of a file as just being a number to base 256.
If you want to change it to base 10 for example then when asked your ascii start, using 48 with base 10 will produce a readable number at base 10 fully representing the file.
To test this you have an option of converting back.
The base change is mathematical, not any form of encoding, so it will take a few seconds to slog it out using the gmp library (included).
I have kept everything 32 bits here.
The test file here is wall.bmp which is quite small.
Large files will take much longer to process.

srvaldez · Post by **srvaldez** » Mar 03, 2023 15:53

I think that for binary text encoding it's best to restrict to printable characters, in the web you will find references to many bases, but to me base91 makes good sense
here's some code in PureBasic for base91 encode/decode https://www.purebasic.fr/english/viewto ... 92#p525992
here's a BaseNcodingPascal https://github.com/Xor-el/BaseNcodingPascal
online http://kvanttt.github.io/BaseNcoding/

dafhi · Post by **dafhi** » Mar 04, 2023 2:00

oh thanks guys .. i've been having a heck of a time.
i like base128 for size efficiency over 64, although i'm not certain there are 128 forum friendly characters.

atm i'm trying to figure out how to discover the friendlies algorithmically.

i have a converter that works correctly with internal string / byte array but when i save to "forum.txt"
as formatted data statements, when i manually copy over to my .bas, and read the Data ".." into string,
some of the characters get converted to uni. i suppose i could paste my source. i'll clean it up some. (might take a while)

dafhi · Post by **dafhi** » Mar 04, 2023 2:29

'main' starts around line 500.
text128. r string is char table

Code: Select all

'#include "text128.bas"
/' -- text128.bas (incomplete - 2023 March 3 - by dafhi
'/


'' i kind of know what to expect, but others might want to leave this
'' commented out
'#define im_good




'#include "bitwalker32.bas"
/' --- bitwalker32 - 2023 Mar 1 - by dafhi
  bound-safe bit-aligned read & write util
'/


'#include "boilerplate.bas"
/' ------ boilerplate.bas - 2023 Mar 1  '/
'
'' replaces int()
#define flo(x) (((x)*2.0-0.5)shr 1) '' http://www.freebasic.net/forum/viewtopic.php?p=118633
#define ceil(x) (-((-(x)*2.0-0.5)shr 1))

#undef int
#define int     as Integer

#define min( a, b)        iif( (a)<(b), (a), (b) )
#define max( a, b)        iif( (a)>(b), (a), (b) )

const q34 = chr(34)       '' double quote
const qqcr = chr(34) + chr(10) '' carriage return

union suspicion_suppressor   ' suspicious ptr tango .. i forget who suggested this soludtion
  as any ptr          a
  as ubyte ptr        ub
  as ushort ptr       usho
  as ulong ptr        ul
  as ulongint ptr     uli
  As Single Ptr       s
  as double ptr       d
End Union

sub change_filename_if_exists( byref filename_out as string, filenum int )
  if lof(filenum) > 0 then filename_out = "-" + filename_out
end sub

function ascii_pos( valu as ubyte, pos_curr int, pos0 as ubyte ptr, posE as ubyte ptr ) int
  var p = pos0 + pos_curr
  while p <= posE
    if *p = valu then return p - pos0
    p += 1
  wend
  return posE + 1 - pos0
end function
'
' ----- boilerplate.bas

' ---- bitwalker32.bas continued ..
'
  namespace bitwalker32 '' 2023 Mar 1 - by dafhi
  
type base_literal   as ulong

const as byte         len_base_literal = len(base_literal) * 8 '' Feb 28
 
 '' -- main type
 '
type t
  declare constructor( as long = 1, as byte = 8 )
  
  as byte           unit_len
  as longint        bit_pos
  as any ptr        p_data
  
  declare property  cbits as longint
  declare sub       calc_bytes_req( as long, as byte )
  
  as longint        _bit_ubound
  as long           _byte_ubound
end type

constructor t( c_elems as long, c_bits as byte )
  if c_elems < 1 or c_bits < 1 then exit constructor
  calc_bytes_req c_elems, c_bits
end constructor

sub t.calc_bytes_req( cElems as long, unit_len as byte)
  var k = cElems * unit_len
  
  if k < 1 orelse unit_len > len_base_literal orelse k > 1e9 then 
    ? "  bit_walker32.t.calc_bytes_req"
    ? "unit len: 1 to "; len_base_literal
    ? "elem count * unit len: 1 to 1e9 (1 Bil)"
    sleep 250
    exit sub
  endif
  
  _bit_ubound = k - 1
  _byte_ubound = _bit_ubound \ 8
  
  this.unit_len = unit_len
  bit_pos = 0
end sub

property t.cbits as longint
  return _bit_ubound + 1
end property

' --- internal read / write support - Feb 28
'
sub _wri( byref des as base_literal, shifted_val as base_literal, shifted_mask as base_literal )
  des = (des and (-1 xor shifted_mask)) or (shifted_val and shifted_mask)
end sub

dim as byte         _bit_offset
dim as long         _byte_pos
dim as base_literal _mask

dim as suspicion_suppressor _sp

sub _bounds_check( byref v as t, bit_pos as longint, unit_len as byte )
  var pos0 = max( bit_pos, 0)
  var posE = min( bit_pos + unit_len - 1, v._bit_ubound)
  _byte_pos = pos0 \ 8                 '' integer divide
  _bit_offset = pos0 - _byte_pos * 8
  _mask = 2 ^ max(posE + 1 - pos0, 0) - 1
  _sp.a = v.p_data + _byte_pos          '' "any" ptr
end sub
' -------

  '' read functions
  
function rea( byref v as t ) as base_literal '' Mar 1 - old name: r
  _bounds_check v, v.bit_pos, v.unit_len
  v.bit_pos += v.unit_len
    return ((*_sp.ul shr _bit_offset) or _
  (_sp.ul[1] shl (len_base_literal - _bit_offset))) and _mask
end function

function probe( byref v as t, bit_pos as longint, read_len as byte) as base_literal
  swap bit_pos, v.bit_pos
  swap read_len, v.unit_len
  var ret_val = rea(v)
  swap bit_pos, v.bit_pos
  swap read_len, v.unit_len
  return ret_val
end function

  '' write subs
  
sub wri( byref v as t, valu as base_literal) '' Mar 1 - old name: w
  _bounds_check v, v.bit_pos, v.unit_len
  
  '' lo 4 bytes
  _wri *_sp.ul, valu shl _bit_offset, _mask shl _bit_offset
  
  '' hi 4 bytes
  _wri _sp.ul[1], valu shr (len_base_literal - _bit_offset), _
    _mask shr (len_base_literal - _bit_offset)
  
  v.bit_pos += v.unit_len
end sub

sub doodle( byref v as t, valu as ulongint, bit_pos as longint, write_len as byte )
  swap bit_pos, v.bit_pos
  swap write_len, v.unit_len
  wri v, valu
  swap bit_pos, v.bit_pos
  swap write_len, v.unit_len
end sub

end namespace ' --- bit_walker32
  
  
' -- text128.bas continued ..
'  
  namespace text128 ' -- bin / text converter March 3 - by dafhi

type encode_info
  declare     constructor( as string = "" )
  as string   original_name
  as string   encoded
  as string   formatted
  as string   err_msg
  as ubyte    file(any)
end type

constructor encode_info( original_filename as string )
  original_name = original_filename
end constructor
  
dim as byte         backref(255)

dim as string       r      '' forum-friendly ascii table attempt

dim as long         init_index

sub _init_loop( lo as ubyte, hi int = -1 )
  if init_index > 127 then exit sub
  if len(r) = 0 then r = space(128)
  var _step = 1
  if hi < 0 then hi = lo
  if hi < lo then _step = -1
  for i as long = lo to hi step _step
    if init_index < 128 then
      r[ init_index ] = i
      init_index += 1
    endif
  next
end sub

sub _fill_string_and_backref
  if init_index > 127 then exit sub
    
  _init_loop 35, 38 '' 39, freebasic comment symbol
'  _init_loop 48, 255
  _init_loop 48, 126
  _init_loop 161, 255
'  _init_loop 254
'  _init_loop 174, 193
'  _init_loop 255, 177
'  _init_loop 195, 255
'  ? init_index
    for i int = 0 to 127
  backref( r[i] ) = i
'  ? r[i]; " "; i; "  ";
  Next: ?
End Sub


' ------- raw encode / decode subs
'
sub encode( p_src as any ptr, cbytes_in int, byref result as string )
  
  _fill_string_and_backref
  
  var c_expanded = (cbytes_in * 8 + 6) \ 7 '' integer divide
  
  result = space( c_expanded )
  
  dim as bitwalker32.t  walk_src = type( c_expanded, 7 )
  walk_src.p_data = p_src
  
  for i int = 0 to c_expanded - 1
    var k = bitwalker32.rea( walk_src )
    result[i] = r[k]
  next

end sub

sub decode( p_src as any ptr, cbytes_in int, des() as ubyte )
  _fill_string_and_backref
  
  dim as bitwalker32.t  walker_dest = type( cbytes_in, 7 )
  
  redim des( cbytes_in * 7 \ 8 - 1 )
  walker_dest.p_data = @des(0)
  
  dim as suspicion_suppressor src = type( p_src )
  
  for i int = 0 to cbytes_in - 1
    var k = backref( *src.ub )
    bitwalker32.wri walker_dest, k
    src.a += 1
  next
end sub
'
' ---------- raw encode / decode


sub encode_str_from_file( filename_in as string, byref ret as encode_info )
  
  ret.err_msg = ""
  
  open filename_in for input as #1
  
  '' this sub's pretty safe
  
  '' simple size checks ..
  
  var ilen = lof(1)
  if ilen < 1 then
    ret.err_msg = "encode_str_from_file, no data: " + filename_in
  endif
    
  if ilen > 50000 then
    ret.err_msg = "encode_str_from_file: input file must be < 50K"
  endif
  
  '' close file and exit if error
  
  if len(ret.err_msg) then
    print ret.err_msg
    close #1: sleep 250: exit sub
  endif
  
  '' no error .. toss entire file into array (current safety 50K bytes)
  
  redim ret.file( ilen - 1 )
  get #1,, ret.file()
  
  close #1
  
  '' my raw subs seem stable
  encode @ret.file(0), ilen, ret.encoded
  
  '' fill out remaining type data
  ret.original_name = filename_in
  ret.err_msg = ""
 
end sub


' ------------ support
'
function _build_from_data_statements( byref ret as encode_info ) as string

  '' This sub was difficult to conceptualize.
  
    '' safety checkpoint #1

  dim as long len_formatted = len(ret.formatted)
    if len_formatted > 1e6 orelse len_formatted < 1 then _
  ret.err_msg = "_build_from_data_statements .. string size out of range: " + str(len_formatted): _
  sleep 250: return ""
  
  '' recognize boundary of formatted-data string
  
  var ptrE = @ret.formatted[len_formatted - 1]
  var ptr0 = @ret.formatted[0]
  
  '' find first two double quotes
  
  var i = ascii_pos( 34, 0, ptr0, ptrE )
  var j = ascii_pos( 34, i+1, ptr0, ptrE )
  
  '' first string: filename
  
  var filename_out = mid( ret.formatted, i+2, j-i - 1 )
  
  '' encode string is formatted-data string ..
  '' .. minus data statements, quote marks, filename
  ret.encoded = ""
  
  var length_checkpoint = 0 '' added safety
  
  do
  
    '' tricky.  if i was smarter i could maybe
    '' do with less if statements
    
    i = ascii_pos( 34, j+1, ptr0, ptrE )
    j = ascii_pos( 34, i+1, ptr0, ptrE )
    if i = j then exit do
    
    var ilen = max(j-i - 1, 1)
    length_checkpoint += ilen
    
    '' 'unlikely' but i can't be too sure at this point
    
    if length_checkpoint > len_formatted then
      ret.err_msg = "_build_from_data_statements: build grew too large"
      return ""
    endif
    
    '' build string
    ret.encoded += mid( ret.formatted, i+2, ilen )
  loop
  
  '' laptop didn't explode?
  
  ret.err_msg = ""
  
  return filename_out
  
end function
'
' ------- support


sub data_statements_from_file( filename_in as string, byref ret as encode_info, string_width as ubyte = 65 )

  '' sub was somewhat easy to conceptualize.
  
  '' first i make raw encode from previously-examined sub
  
  encode_str_from_file filename_in, ret

  '' oddly enough, i can pass a string param 2 (above)
  
  '' error check
  
    if len(ret.err_msg) then _
  print ret.err_msg: sleep 250: exit sub

  '' FreeBASIC statement:  Data ".."
  
  const qData = "Data " + q34  '' double quote
  
  '' first entry - filename
  
  ret.formatted = qData + filename_in + qqcr
  
  '' break up encode and slap on Data statements
  
  var pos_read = 0
  while pos_read < len(ret.encoded)
    var pos_next = min(pos_read + string_width, len(ret.encoded))
    dim as string q = space( 65 )
    q = mid(ret.encoded, pos_read+1, pos_next - pos_read)
'    ? len(q)
    ret.formatted += qData + _
      q + qqcr
    pos_read = pos_next
  wend
  
  #ifdef im_good
  #else
    print ret.formatted
    print "length: "; len(ret.encoded)
    print
    print q34 + "im_good" + q34 + " undefined, so i printed would-be contents here."
  #endif
  
end sub

function reconstruct_from_encode( byref ret as encode_info ) as string
  
  var filename_out = ret.original_name'_filename_from_string( name_plus_encode )
  
  var f = freefile
  open filename_out for input as f
  change_filename_if_exists filename_out, f
  close f

  ret.err_msg = ""
  
    if len(ret.encoded) > 50000 then _
  ret.err_msg = "file_from_encode_str: input string > 50K"
  
    if len(ret.err_msg) then _
  print ret.err_msg:sleep 250: return ""  
  
  decode @ret.encoded[0], len(ret.encoded), ret.file()
'  ? len(ret.encoded)
'  ? ubound(ret.file)

    if ubound(ret.file) > 50000 then _
  ret.err_msg = "file_from_encode_str: output string > 50K"
  
    if len(ret.err_msg) then _
  print ret.err_msg:sleep 250: return ""  
  
  'open filename_out for output as #1
  '  put #1,, ret.file()
  'close #1
  
  ret.original_name = filename_out
  print
  print "suggested filename: "; filename_out; ".  data restored to <UDT>encode_info.file()"
  return filename_out
  
end function

end namespace ' --- text128


'' problematic zone .. some of the chars are read as unicode,
'' which messes up decode
sub read_fb_data( byref ret as text128.encode_info )
  read ret.original_name
  ret.encoded = ""
  dim as string q
  do
    read q
    ret.encoded += q
  loop until q = ""
end sub

'const qEnd = "<EOF>" '' Prettification

'' visual inspection of byte vals
sub first_few_vals( p0 as any ptr, ilen int = 5, mesg as string)
  ? mesg
  dim as ubyte ptr a = p0
  for j as ubyte ptr = a to a + ilen - 1
    ? (*j); " ";
  next: ?
'  ? qEnd
end sub


' -- codez


'' all-ascii-vals string
dim as string source = space( 256*1 )
for i int = 0 to len(source) - 1
  source[i] = i
next


var myfile = ""

'' i feed udt into my encoder
dim as text128.encode_info   encode_info = type(myfile)

#if 0 '' visual source ascii inspection
  first_few_vals @source[0], 50, "original"
#endif


text128.encode @source[0], len(source), encode_info.encoded


'' set to 0 to have it write "forum.txt"

#if 1 '' Data statement read into string

  #if 1 '' visual encoded text inspection
    first_few_vals @encode_info.encoded[0], 50, "encoded"
  '  first_few_vals @encode_info.encoded[0], len(encode_info.encoded), "encoded"
  #endif

  read_fb_data encode_info

  #if 1
    ?
    first_few_vals @encode_info.encoded[0], 50, "-- concat attempt from data statements --"
  #endif
  
#else  '' create freebasic data statements

  open "forum.txt" for output as #1
  put #1,, source
  close #1
  
  myfile = "forum.txt"
  var string_width = 65
  text128.data_statements_from_file myfile, encode_info

  open "forum.txt" for output as #1
  put #1,, encode_info.formatted
  close #1

#endif

Data "forum.txt"
Data "#%4DlL$&3<P|\$o2:Jl4M°06AXªl=or9HfÆÆ$P4=Ot@]À0v@V¤\·°³7D]²xM¡ryGd"
Data "À¶§oS;Kk#1>`4}Nr:MvP·>RyHi%Av¢U¢V§f1WBY©dÃÀÃ7¦\°r=W´ºE`·¢Y±¤y©c¾°"
Data "uGsZIgÅ¾³¡c;jÌÌ$8T¾Ln19JpD}°q8GfÊ4^Pu?U¤`Ç>´xFcÀº·ÁS|Mq:Q¨¢·¡T¡V"
Data "«vaW¥[¯rAgB»¨b½°yWÅZ¬iËÌ1H¦¾¯p7Gj8e^³wEcÄÊEÂ¶~S¡Z»Èaº§a½´«©Å½®o7K"
Data "zheÁµ}S¥jIÉÄ¼o;[ÌhÈÃ»sKÌËÊÉÉÍ;"

base128 might be an unrealistic goal. i will look into 91 or 94 but wont give up on 128 yet

made quick ascii generator w/ forum-formatted own output as comments

Code: Select all

const q34 = chr(34)
const q10 = chr(10)

const qdata = "Data " + q34

dim as string build_string

for i as long = 0 to 255 step 16
  dim as string * 16 a
  for j as long = 0 to 15
    a[j] = i + j
  next
  build_string += qData + a + q34 + " " + str(i+15) + q10
next

open "forum.txt" for output as #1
put #1,, build_string
close #1

/'
Data "" 15
Data "" 31
Data " !"#$%&'()*+,-./" 47
Data "0123456789:;<=>?" 63
Data "@ABCDEFGHIJKLMNO" 79
Data "PQRSTUVWXYZ[\]^_" 95
Data "`abcdefghijklmno" 111
Data "pqrstuvwxyz{|}~" 127
Data "€‚ƒ„…†‡ˆ‰Š‹ŒŽ" 143
Data "‘’“”•–—˜™š›œžŸ" 159
Data " ¡¢£¤¥¦§¨©ª«¬®¯" 175
Data "°±²³´µ¶·¸¹º»¼½¾¿" 191
Data "ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ" 207
Data "ÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß" 223
Data "àáâãäåæçèéêëìíîï" 239
Data "ðñòóôõö÷øùúûüýþÿ" 255
'/

manual labor

Code: Select all

dim as string z0 = " !"   ' 2 (Len)
dim as string z1 = "#$%&" ' 4
dim as string z2 = "()*+,-./" ' 8
dim as string z3 = "0123456789:;<=>?" ' 16
dim as string z4 = "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_" ' 32
dim as string z5 = "`abcdefghijklmno" ' 16
dim as string z6 = "pqrstuvwxyz{|}~" ' 15
dim as string z7 = "€‚ƒ„…†‡ˆ‰Š‹ŒŽ" ' 13
dim as string z8 = "‘’“”•–—˜™š›œžŸ" ' 14
dim as string z9 = " ¡¢£¤¥¦§¨©ª«¬®¯" ' 15
dim as string zA = "°±²³´µ¶·¸¹º»¼½¾¿"
dim as string zB = "ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ"
dim as string zC = "ÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß"
dim as string zD = "àáâãäåæçèéêëìíîï"
dim as string zE = "ðñòóôõö÷øùúûüýþÿ"

dodicat · Post by **dodicat** » Mar 04, 2023 9:52

There have been discussions about this base 128 previously.
viewtopic.php?p=293560&hilit=128#p293560
Then going back to the ForumBlobPost from the link.
If I remember correctly Google was the best browser for base 128, Firefox added a character which completely upset the applecart, and Microsoft Edge didn't work at all.
Members leopardpm and MrSwiss and others had a great confabulations in ForumBlobPost, there were a couple of arguments of course, but it was just a typical freebasic discussion of those days.
If I remember, another trick was to copy the blob characters while in the "reply with quote" mode.

srvaldez · Post by **srvaldez** » Mar 04, 2023 21:16

@dafhi
here's a go implementation of base91, maybe that will help in your implementation https://github.com/bproctor/base91

dafhi · Post by **dafhi** » Mar 05, 2023 0:52

MrSwiss's base128 seems to generate a table i could use

Asc 195

Asc 195

Re: Asc 195

Re: Asc 195

Re: Asc 195

Re: Asc 195

Re: Asc 195

Re: Asc 195

Re: Asc 195

Re: Asc 195

Re: Asc 195

Re: Asc 195

Re: Asc 195

Re: Asc 195