Getting the true unicode code
Hi. I am new to this, coming from AutoIt scripting and looking for more power and speed. I am trying to get the Unicode code for characters so I can build my Base64 converter in FreeBASIC and compare speeds.
The issue I am having is that when I ask Asc for the Unicode number of a character, it gives me totally incorrect numbers. Asc reports the € symbol as 128. I was on the wiki page for UTF trying to add up the binary and it was way off, so I did some searching and found that the true number for € is actually 8364.
What function or command do I need to actually get a Unicode number? The help file says Asc does, but obviously it doesn't. I cannot manually build the Unicode binary if the number Asc gives me is not accurate.
Thanks in advance.
Getting the real unicode code for characters
I just made a post but it's missing; it must have errored out, because it is not showing up on my profile.
Basically I am asking how you get the REAL Unicode code for a character, because Asc is giving incorrect numbers: € shows as 128 when really it is 8364. I have verified this online and with AutoIt's AscW function. So far FreeBASIC only shows me wrong numbers, even though its documentation says Asc will give Unicode numbers.
I need this so I can manually generate and decode the UTF-8 bytes, in 8-bit binary, for my Base64 converter.
If you could help me out I would appreciate it; I am new to this, coming only from AutoIt scripting.
Thanks
Re: Getting the real unicode code for characters
Hi Morthawt, welcome to the forum :)
Morthawt wrote: I just made a post but it's missing and must have errored out because it is not showing up on my profile.
Due to spam problems we decided a while back to vet all posts from newly registered members, in order to prevent spam getting onto the forum. Sometimes this can result in a short delay until a moderator appears and sees the new posts.
I've approved your posts and combined them into one thread for you. And I've removed you from the Newly Registered group so any future posts will sail through.
Re: Getting the true unicode code
Hello Morthawt, welcome to the forum!
I guess you mean neither UTF-8 nor UTF-32, but UTF-16. So I recommend checking FreeBASIC's WSTRING type and the related statements.
Re: Getting the true unicode code
Well, what I need to do is take a massive string of data (or eventually, once I learn how, file data), read it character by character and get each character's Unicode number.
This shows 128
print asc("€", 1)
Also, so does this:
dim a as string
a = "€"
print WStr(a)
print asc(a, 1)
It is supposed to say 8364, so that I can encode that number with the UTF-8 binary scheme. I cannot encode the character as 128, because it is not in fact 128 at all. I even checked with the AutoIt converter program I made, and it returns the correct numbers. This baffles me, because the help file for Asc says it will give Unicode numbers.
What am I doing wrong? How can I get the correct 8364 for the € character?
Re: Getting the true unicode code
Try this code, with the source file saved as UTF-8, UTF-16 or UTF-32 with a BOM (fbc needs the BOM to recognize the encoding):
Code:
dim a as wstring * 32
a = "€"
print asc( a )
Re: Getting the true unicode code
I put that in but it still prints 128. Surely there has to be a way to get the Unicode value for a character? If there isn't, I cannot go any further with my converter program in this language, because I need to produce a UTF-8 binary stream, which means I need the REAL Unicode number to do the conversion to binary correctly.
Re: Getting the true unicode code
It has to be saved as a Unicode file; then it works for me.
It should also work when the source file is saved using the ANSI code page encoding, provided the same code page is used when saving the file and when running the program. I couldn't get that to work yet, though; there might be a bug in the ANSI code page -> Unicode conversion.
Re: Getting the true unicode code
OK, when I save the .bas file as UTF-8, the € changes to â‚¬, and when I type € in manually it comes up with 128 again. I am confused. In my AutoIt version I have an interface where the typed-in string can be processed as ANSI or UTF-8 input/output. I am trying to make the same thing in FB, for now with plain strings, because I have no idea how to deal with files or GUIs yet. Does the .bas file have to be saved as UTF-8 for this to work? And if it does, will everything be forced to be interpreted as UTF-8, so that ANSI strings get mangled because some combinations of ANSI characters are taken to be a single Unicode character?
Re: Getting the true unicode code
With the .bas file saved as UTF-8, the € in the code is altered in "a", as you can see. But when I take the actual Unicode character in "b", convert it to a wide string and get its code, it still comes back as 128. So "a" gives the right number when it is supplied with the bytes that make up the Unicode character, while the real character, even after being converted to wide format with WStr, still gives the wrong number, 128.
dim a as Wstring * 32
dim b as String
a = "â‚¬"
b = "€"
print Asc(a)
print Asc(Wstr(b))
Re: Getting the true unicode code
If you're using FBEdit, that could be the problem, as it can't handle Unicode: "â‚¬" is how a Unicode "€" looks in an editor that can't display Unicode. Try opening the code in Notepad++ or any other editor with Unicode support.
Re: Getting the true unicode code
I think the Win32 version of FB is lacking an internal call to setlocale(), which is the same issue seen on Linux before. (FB uses the CRT mbstowcs() function to do the conversion, not MultiByteToWideChar(), and the CRT locale must be switched from the default "C" to the system codepage first)
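This could be a work-around to fix run-time conversions:
Code:
#include once "crt/locale.bi"

' switch the C runtime from the default "C" locale to the system code page
' (0 is LC_ALL in the Windows CRT)
setlocale( 0, "" )

dim s as string
dim a as wstring * 32

s = "€"
print s[0]   ' raw byte value (128 for "€" in code page 1252)

a = s        ' run-time ANSI -> Unicode conversion, affected by setlocale()
print a[0]   ' should now print 8364

a = "€"      ' compile-time conversion, not affected by setlocale()
print a[0]
Of course that won't fix any compile-time conversions, until fbc itself is fixed.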
Re: Getting the true unicode code
OK, so are you basically saying I have stumbled onto a bug in FreeBASIC? If so, how do we go about getting it resolved?
Edit: by the way, a WAsc to go with the existing WChr would be perfect; that way you could get either the TRUE Unicode value or the ANSI value. There is a WChr, but for some reason there is no WAsc.
Re: Getting the true unicode code
I'll try and see whether my fix theory is correct sometime during the week. There are no plans to make a new FB release anytime soon, but maybe I can upload a preview/snapshot build.
By the way, there are multiple asc() functions, overloaded for different parameter types, and the proper one should be chosen based on the argument types. wchr() differs from chr() in its return type, not its parameter types. Overloading on return type is currently not supported (the proper chr() would have to be chosen from context, which can be ambiguous), so it has to use a different name.
Re: Getting the true unicode code
OK, I will wait and see until then. This is just too frustrating for a new guy to get to grips with. I thought it would just work: when I saw there was no WAsc and the Asc help said it gives Unicode values, I figured that was all I needed. I guess not, if it gives 128 for a Unicode character with a much higher value.
I am just going to put this out of my mind and check the thread until there is a reply, because otherwise, knowing me, I will get so frustrated with failure that I will likely give up the whole idea.