Proper Case / Title Case

Munair · Post by **Munair** » Nov 20, 2018 22:56

A propercase or titlecase function may have been proposed before, but a forum search did not show (recent) results. So I forged a small function that returns the proper case of a string, which means, only first letters of words are in upper case. VB has this functionality; FB does not, as far as I know.

I deliberately avoided the use of string functions (except for lcase) as they are notoriously slow. Instead, the function simply iterates through the string and turns the first letter of every word it finds into a capital.

Code: Select all

function PCase(byval s as string) as string	
	dim b as boolean = true
	
	' first, make all lower case
	s = lcase(s)
	
	' iterate through string and make ucase after word boundaries 
	for i as integer = 0 to len(s) - 1
		select case s[i]
			case 32 to 47, 58 to 64, 91 to 96, 123 to 126
				' characters that (arguably) mark a word boundary
				b = true
			case 97 to 122
				' lower case
				if b then
					' previous char was word boundary
					s[i] -= 32
					' reset flag
					b = false
				end if
		end select
	next
	' result
	return s
end function

dim s as string

s = "WHAT A CRAZY NIGHT OF PROGRAMMING IT WAS!"
print pcase(s)
s = "WHAT/A[CRAZY]{NIGHT}(OF)<PROGRAMMING>?IT~WAS!"
print pcase(s)
s = "WHAT A CRAZY PROGRAMMING-NIGHT IT WAS!"
print pcase(s)
sleep
end

jj2007 · Post by **jj2007** » Nov 21, 2018 1:07

Nice idea! Check https://writing.stackexchange.com/quest ... title-case for more inspiration ;-)

One more test string: s = "OECD and the UN are organisations that I consider very powerful"

sancho3 · Post by **sancho3** » Nov 21, 2018 1:30

If you change the parameter to byref then you avoid making an internal copy of the string. In fact you can change both the parameter and the return to byref.
I have tested the speed of lcase in the past and it is very fast. So I am not sure it is worth changing at all, but here is your code without the lcase, OR-ing the ASCII value with 32 instead to produce a lowercase letter.
At the end of the day a person is not likely to be wanting to Title Case an entire document so this function is likely to be only used on a sentence. Hardly worth jumping through hoops for speed gains.

Code: Select all

function PCase(byref s as string) Byref as string   
   dim b as boolean = true
   
   ' iterate through string and make ucase after word boundaries
   for i as integer = 0 to len(s) - 1
      select case s[i]
         case 32 to 47, 58 to 64, 91 to 96, 123 to 126
            ' characters that (arguably) mark a word boundary
            b = true
         case 97 to 122, 65 To 90
            ' any letter: force lower case first
            s[i] or= 32
            if b then
               ' previous char was word boundary
               s[i] -= 32
               ' reset flag
               b = False
            end if
      end select
   next
   ' result
   return s
end function

dim s as string

s = "WHAT A CRAZY NIGHT OF PROGRAMMING IT WAS!"
print pcase(s)
s = "WHAT/A[CRAZY]{NIGHT}(OF)<PROGRAMMING>?IT~WAS!"
print pcase(s)
s = "WHAT A CRAZY PROGRAMMING-NIGHT IT WAS!"
print pcase(s)
sleep
end
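
As a rough illustration of the OR trick used above (a sketch only; ToLowerAscii and ToUpperAscii are made-up helper names, not FB built-ins): in ASCII, "A".."Z" and "a".."z" differ only in the bit with value 32, so setting that bit lowercases a letter and clearing it uppercases one. The range check matters because the same operation corrupts non-letters, which is presumably why the select case only applies it to 65 to 90 and 97 to 122.

```freebasic
' Hypothetical helpers, assuming plain ASCII text.
function ToLowerAscii(byval c as ubyte) as ubyte
	' only touch "A".."Z"; OR-ing other characters would corrupt them
	if c >= 65 and c <= 90 then c or= 32
	return c
end function

function ToUpperAscii(byval c as ubyte) as ubyte
	' clearing bit 5 is the same as subtracting 32 for "a".."z"
	if c >= 97 and c <= 122 then c and= &hDF
	return c
end function

print chr(ToLowerAscii(asc("Q")))   ' q
print chr(ToUpperAscii(asc("q")))   ' Q
print chr(asc("@") or 32)           ' unguarded: "@" (64) turns into "`" (96)
```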

jj2007 · Post by **jj2007** » Nov 21, 2018 2:37

sancho3 wrote:At the end of the day a person is not likely to be wanting to Title Case an entire document so this function is likely to be only used on a sentence. Hardly worth jumping through hoops for speed gains.

Exactly. One more test case:

Code: Select all

s="World bank and OECD are organisations that - as a rule - i consider very powerful"

Expected, using NoTitleCase=".a.are.as.and.by.or.the.that." (more):

Code: Select all

World bank and OECD are organisations that - as a rule - i consider very powerful
World Bank and OECD are Organisations that - as a Rule - I Consider Very Powerful

Munair · Post by **Munair** » Nov 21, 2018 7:04

sancho3 wrote:If you change the parameter to byref then you avoid making an internal copy of the string. In fact you can change both the parameter and the return to byref.

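The BYVAL was chosen on purpose, as a copy of the string is likely to be made somewhere anyway, and it's not a big deal with small title strings. With BYREF the function modifies the caller's string in place, so the following code gives undesired results: both t and s end up converted and the original text in s is lost.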

Code: Select all

dim as string s, t
s = "WHAT A CRAZY NIGHT OF PROGRAMMING IT WAS!"
t = pcase(s)
print t
print s

sancho3 wrote:I have tested the speed of lcase in the past and it is very fast.

Indeed. It probably isn't worth the or-ing and the additional testing of capital letters.

Munair · Post by **Munair** » Nov 21, 2018 7:12

However, the number of string copies can be limited with a static local variable that holds the result and is returned by reference:

Code: Select all

function PCase(byref s as const string) byref as string	
	dim b as boolean = true
	static c as string
	
	' first, make all lower case
	c = lcase(s)
	
	' iterate through string and make ucase after word boundaries 
	for i as integer = 0 to len(c) - 1
		select case c[i]
			case 32 to 47, 58 to 64, 91 to 96, 123 to 126
				' characters that (arguably) mark a word boundary
				b = true
			case 97 to 122
				' lower case
				if b then
					' previous char was word boundary
					c[i] -= 32
					' reset flag
					b = false
				end if
		end select
	next
	' result
	return c
end function

Munair · Post by **Munair** » Nov 21, 2018 7:17

jj2007 wrote:One more test case:
s="World bank and OECD are organisations that - as a rule - i consider very powerful"
Expected, using NoTitleCase=".a.are.as.and.by.or.the.that." (more: http://masm32.com/board/index.php?topic ... 8#msg82258)

ProperCase / TitleCase functions will probably never satisfy all rules / habits. In English rules may be different than in Dutch or German.

jj2007 · Post by **jj2007** » Nov 21, 2018 10:34

Munair wrote:ProperCase / TitleCase functions will probably never satisfy all rules / habits. In English rules may be different than in Dutch or German.

Exactly, I saw many conflicting views on the web. In my implementation, the NoTitleCase=".a.are.as.and.by.or.the.that." string that serves to keep an eye on exceptions could even be an array of strings with each element caring for a particular language. The strings themselves are not particularly long anyway, it's mostly frequently used very short words.
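
For FB readers, the exception-string idea could be sketched roughly like this. This is not the linked implementation, only a guess at the behaviour shown in the test case earlier in the thread; TitleCaseEx and IsAsciiLetter are made-up names, and the sketch assumes plain ASCII and that a listed word is left alone even when it is the first word.

```freebasic
' Hypothetical helper: true for "A".."Z" and "a".."z"
function IsAsciiLetter(byval c as ubyte) as boolean
	select case c
	case 65 to 90, 97 to 122
		return true
	end select
	return false
end function

' Hypothetical sketch: upper-case the first letter of every word that is
' not listed in a dot-delimited exception string such as ".a.and.the."
function TitleCaseEx(byval s as string, byref skipList as const string) as string
	dim as integer i = 0
	dim as integer n = len(s)
	do while i < n
		if IsAsciiLetter(s[i]) then
			' find the end of the word that starts at i
			dim as integer wordStart = i
			do while i < n andalso IsAsciiLetter(s[i])
				i += 1
			loop
			' look the word up in lower case, wrapped in dots
			dim as string lookup = "." & lcase(mid(s, wordStart + 1, i - wordStart)) & "."
			if instr(skipList, lookup) = 0 then
				' not an exception: upper-case the first letter, keep the rest
				if s[wordStart] >= 97 then s[wordStart] -= 32
			end if
		else
			i += 1
		end if
	loop
	return s
end function

dim s as string
s = "World bank and OECD are organisations that - as a rule - i consider very powerful"
print TitleCaseEx(s, ".a.are.as.and.by.or.the.that.")
' World Bank and OECD are Organisations that - as a Rule - I Consider Very Powerful
```

A per-language array would then just mean passing a different exception string as the second argument.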

marcov · Post by **marcov** » Nov 21, 2018 11:26

Munair wrote: , only first letters of words are in upper case.

A reasonable first order approximation. .... If you are German :-)

Munair · Post by **Munair** » Nov 21, 2018 12:13

marcov wrote:
Munair wrote: , only first letters of words are in upper case.
A reasonable first order approximation. .... If you are German :-)

Nouns only then.

Munair · Post by **Munair** » Feb 17, 2022 17:49

Would it be something for FB to support PCase / TCase? I recently added them to SharpBASIC and thought that both could be useful:

- TCase that does not touch uppercase so that names like FreeBASIC in a title are preserved.
- PCase that converts any letter to lowercase if it is not the first letter, so FreeBASIC becomes Freebasic.
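
In FB terms the difference between the two could look roughly like the sketch below. It is not the SharpBASIC implementation; CaseWords is a made-up helper, and the word-boundary ranges are simply reused from the function at the top of the thread.

```freebasic
' Hypothetical shared helper: keepUpper = true gives the TCase behaviour,
' keepUpper = false gives the PCase behaviour described above.
function CaseWords(byval s as string, byval keepUpper as boolean) as string
	dim atBoundary as boolean = true
	for i as integer = 0 to len(s) - 1
		select case s[i]
		case 32 to 47, 58 to 64, 91 to 96, 123 to 126
			' characters that (arguably) mark a word boundary
			atBoundary = true
		case 97 to 122
			' lower case: capitalize only at the start of a word
			if atBoundary then s[i] -= 32
			atBoundary = false
		case 65 to 90
			' upper case inside a word: PCase lowers it, TCase keeps it
			if atBoundary = false andalso keepUpper = false then s[i] += 32
			atBoundary = false
		end select
	next
	return s
end function

function TCase(byval s as string) as string
	return CaseWords(s, true)
end function

function PCase(byval s as string) as string
	return CaseWords(s, false)
end function

print TCase("the freeBASIC manual")   ' The FreeBASIC Manual
print PCase("the freeBASIC manual")   ' The Freebasic Manual
```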
