fb wiki pages (wakka format)

Forum for discussion about the documentation project.
AGS
Posts: 1284
Joined: Sep 25, 2007 0:26
Location: the Netherlands

Post by AGS »

I am trying to put a program together to translate the wakka pages of
the manual to xhtml. The tags and links (** // [[target description]] etc.) are
straightforward to translate but I am wondering whether the format
of the pages is 'standard'.
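As a rough illustration of why the inline tags are straightforward, here is a minimal Python sketch that handles only the three markers mentioned above. The function name, the choice of <b>/<i>, and the ".html" link target naming are assumptions made for the example, not something the wiki or fbdoc prescribes.

```python
import html
import re

def wakka_inline_to_xhtml(text):
    """Translate a small, illustrative subset of inline Wakka markup to XHTML."""
    # Escape &, < and > first so the result stays well-formed.
    out = html.escape(text, quote=False)

    def link(match):
        target = match.group(1)
        label = match.group(2) or target  # no description: reuse the target
        # Assumption: every wiki page becomes "<PageName>.html".
        return '<a href="%s.html">%s</a>' % (target, label)

    # [[Target optional description]]
    out = re.sub(r"\[\[(\S+?)(?:\s+(.+?))?\]\]", link, out)
    # **bold**
    out = re.sub(r"\*\*(.+?)\*\*", r"<b>\1</b>", out)
    # //italic//, skipping the "//" that follows the colon in a URL
    out = re.sub(r"(?<!:)//(.+?)(?<!:)//", r"<i>\1</i>", out)
    return out
```

Calling wakka_inline_to_xhtml("**bold** and //italic//") gives "<b>bold</b> and <i>italic</i>". Block-level markup (headers, lists, code sections) is not covered by this sketch.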

Have pages with a certain prefix been written in a certain format?
Take, for example, the pages that start with keypg. Have all those
pages been written using the same format? Perhaps not every page
contains the same number or type of sections, but is the set of possible
sections standard? And is the set of sections you may use on a given
type of page restricted, in the sense that if you use a section that is
not allowed on a page with a certain prefix, someone will edit the page
to make it 'correct' again?

At first glance it seems not all wiki pages follow the same 'tight' format
as pages starting with keypg. But that's only at first glance.

I'd like to know whether the prefix of a file determines its contents
in a straightforward (predictable) manner.
Something like:
on a page with a title that starts with ... the following section
types and tag types are allowed (or mandatory): ...... while the following
section types and tag types are disallowed: .......
My goal is to create an epub version of the manual. For that I need the
wiki pages in xhtml format (the epub standard supports the use of
cascading style sheets so I can reuse the style sheet as used in the
compiled html version of the manual).
TJF
Posts: 3809
Joined: Dec 06, 2009 22:27
Location: N47°, E15°

Re: fb wiki pages (wakka format)

Post by TJF »

nemored generated an .epub version of the German documentation. Find it here.

Perhaps he added some hints about the generation process in a README.txt (I haven't downloaded it yet).
AGS
Posts: 1284
Joined: Sep 25, 2007 0:26
Location: the Netherlands

Re: fb wiki pages (wakka format)

Post by AGS »

TJF wrote: nemored generated an .epub version of the german doku. Find it here.

Perhaps he added some hints about the generation process in a README.txt (I didn't download it yet).
I checked out the .epub from the German site. It looks as if it was generated from the content of the
German website. At the top of every page there is a line that looks like this:
<meta content="FreeBASIC-Portal.de CMS" name="generator" />
Apart from that, the word 'Calibre' occurs many times in the xhtml pages
(as a tag attribute: <p class="calibre3"> or <tr class="calibre8">).
Calibre is an e-book management tool you can use to create epub books.
I am guessing that's what was used to create the epub.

Downloading the html pages from the online fb wiki looks to be a good idea (using
wakka is a bit cumbersome). Those html pages can be used to create the epub.
The stylesheet from the online wiki can be reused.

There is one page in the online wiki where all the pages are listed. It's a matter of downloading
the files on that page to get the entire manual.
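A minimal sketch of the first half of that job, assuming the index page has already been fetched and is held as a string: collect the href of every link, in page order, without duplicates. The class and function names are made up for the example, and the sketch does no downloading and no filtering of navigation links.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order, without duplicates."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href not in self.links:
                self.links.append(href)

def collect_links(index_html):
    """Return the list of link targets found in the given HTML text."""
    parser = LinkCollector()
    parser.feed(index_html)
    parser.close()
    return parser.links
```

The returned list could then drive the actual downloads. The standard library parser is tolerant of sloppy markup, which helps given that not every page is valid xhtml.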

The xhtml files need some conversion/cleaning up (change links etc...). And not all pages are written in
correct xhtml. I am cheating to get all of this done though: I am using peg/leg by Ian Piumarta
(http://piumarta.com/software/peg/peg.1.html). I will definitely do a fb port of leg
(the generated scannerless parser is nice and hackable).

Thanks for the download tip. I will be looking at that epub for sure (already have actually).
And I will, of course, be striving to create a nicer epub than the one in the German language ;)
TJF
Posts: 3809
Joined: Dec 06, 2009 22:27
Location: N47°, E15°

Re: fb wiki pages (wakka format)

Post by TJF »

AGS wrote: Downloading the html pages from the online fb wiki looks to be a good idea (using
wakka is a bit cumbersome). Those html pages can be used to create the epub.
The stylesheet from the online wiki can be reused.
I made an html book for devhelp a while back. I extracted the html pages from the chm file (using kchmviewer, if I remember right). That way you don't need to download more than 400 pages.
dkl
Site Admin
Posts: 3235
Joined: Jul 28, 2005 14:45
Location: Germany

Re: fb wiki pages (wakka format)

Post by dkl »

As far as I know there are no special rules regarding wakka format and page names. The main peculiarity is that Wikka (the wiki software) has been extended to recognize {{fbdoc ...}} tags, which are usually used for headers/sections etc., allowing the fbdoc tool to generate html files, a page index and the TOC tree for the FB-manual.chm.
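To get a feel for which {{fbdoc ...}} tags a page uses, a converter could pull them out before handling the rest of the markup. The Python sketch below assumes the tags carry name="value" attributes (for example item="title" value="ABS"); that attribute layout and the function name are assumptions for illustration, so check them against real .wakka pages.

```python
import re

# Assumed tag shape: {{fbdoc name="value" name="value" ...}}
FBDOC_TAG = re.compile(r"\{\{fbdoc\s+(.*?)\}\}")
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')

def find_fbdoc_tags(wakka_text):
    """Return one dict of attributes per {{fbdoc ...}} tag, in page order."""
    return [dict(ATTRIBUTE.findall(match.group(1)))
            for match in FBDOC_TAG.finditer(wakka_text)]
```

Running this over every page with a given prefix and comparing the results would show how uniform the section structure really is.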

The fbdoc tool also has "txt" as another backend (for printing to paper), and I think more backends can be added, such as PDF, which I think was attempted in the past, though I don't know how hard that is.
AGS
Posts: 1284
Joined: Sep 25, 2007 0:26
Location: the Netherlands

Re: fb wiki pages (wakka format)

Post by AGS »

The problem with book-like formats (like PDF and epub) is the
necessity to lay out the document in a page-like fashion. So
you get page 1, page 2, page 3 etc... up to page n. The wiki
is a different beast as it's a collection of html pages linked together
by means of an index. There is no first page, third page, last page
etc....

epub is essentially nothing but a bunch of html files + an index. It
is a bit like chm. Producing a chm should be much like
producing an epub, with the difference that epub requires you
to give a list of all the pages in the document.
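That list of pages lives in the epub's package document (the .opf file): a manifest names every file and a spine gives the reading order. The Python sketch below builds a bare-bones EPUB 2 style package document from an ordered list of page file names. The function name, the default identifier and the "pageN" ids are made up for the example. A complete epub also needs the mimetype file, META-INF/container.xml and a table of contents, none of which are shown here.

```python
from xml.sax.saxutils import escape, quoteattr

def build_opf(title, pages, book_id="fb-manual-draft"):
    """Build a minimal OPF package document listing the pages in reading order."""
    manifest = []
    spine = []
    for number, href in enumerate(pages, start=1):
        item_id = "page%d" % number  # hypothetical id scheme
        manifest.append(
            '    <item id="%s" href=%s media-type="application/xhtml+xml"/>'
            % (item_id, quoteattr(href)))
        spine.append('    <itemref idref="%s"/>' % item_id)
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<package xmlns="http://www.idpf.org/2007/opf" version="2.0"'
        ' unique-identifier="bookid">',
        '  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">',
        '    <dc:title>%s</dc:title>' % escape(title),
        '    <dc:language>en</dc:language>',
        '    <dc:identifier id="bookid">%s</dc:identifier>' % escape(book_id),
        '  </metadata>',
        '  <manifest>',
    ] + manifest + [
        '  </manifest>',
        '  <spine>',
    ] + spine + [
        '  </spine>',
        '</package>',
    ]
    return "\n".join(lines) + "\n"
```

The order of the pages argument is the order a reader will page through, so this is exactly where the sequencing of the wiki pages has to be decided.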

On the upside epub accepts html pages so no translation is needed.
And an epub can be read on a mobile device.

On the downside epub does not come with a search facility like
chm does (referring to the index tab of the chm browser). It largely
depends upon the mobile device used whether there is a proper
index or not. When I use the chm manual I almost always use
the index tab to look up a word.

PDF is a completely different kind of thing. Again you need to sequence
the pages. But the concept of a page is different. A page has a fixed
length depending on the format used (usually either A4 or US Letter).

To convert a html page to a pdf page might require breaking up
the html file into several pieces. There are few free libraries that can lay
out pdf in such a way that this breaking of pages is done correctly
(hyphenation is an issue as well: line breaking should be done correctly).

A PDF has a similar index to epub and lacks the search facility of the
compiled html browser (as does epub). It's that search facility that's
key to the ease of using the chm manual.

When I have a proper sequencing of the pages (cannot be done
automagically) creating a pdf version of the manual becomes somewhat easier.
Html tags translate directly to calls to routines in pdflib and
pdflib can do the hyphenation and splitting of the pages as well.
It will no doubt lead to pages with lots of whitespace on them.

I have three main reasons for wanting to create an epub version of the manual.
First of all, I am seeing other projects releasing manuals in epub/pdf format.

A prime example of this is Python. Python comes with a manual in pdf, html, epub and
compiled html format (Python uses Sphinx to create the manual;
Sphinx uses LaTeX to create the pdf file).

Secondly, I have my doubts about the future of compiled html. It is somewhat of a deprecated
format. If a format comes along with equally good searching facilities (the index tab of the chm
browser is, at least on Windows, unchallenged) then the days of compiled html are numbered
(the next epub standard could provide just such a facility).

Thirdly, the companies behind the epub format are looking to expand the epub standard.
It's clear they want to replace the pdf format with epub. Even Adobe, which has released a
pdf reader, is a member of the IDPF (http://idpf.org/). The next pdf version would have been an
xml based format but Adobe has never released that next version and instead seems to have
opted to support the epub standard. epub seems to be THE format for electronic books.

The thinking here is that because epub will replace both pdf and chm future releases
of freebasic might need to come with only one manual: an epub one.
As the online wiki is already in html format it makes sense to use those pages to create an epub.

@TJF
I have seen your devhelp effort. devhelp seems to have the same indexing facility as the
chm browser. The only problem with devhelp is that I have not found a windows binary for it yet.
Creating binaries for a gtk+ related project like devhelp tends to be hard (it has a webkit
dependency which will make compilation that much harder), which might make devhelp a
one-platform solution. Given a devhelp binary I can see myself using a manual in devhelp format
instead of one in compiled html format. But you'd have to deliver either the devhelp binary or
a link to such a binary with the manual (otherwise you run the risk of win32 users having
to search for a devhelp binary themselves).

I thought about decompiling the chm file and using those html files as starting point for
the creation of an epub file. But those html files differ from the ones that are created by the
wakka parser (the php program that turns .wakka files into .html pages).

I am hoping to release an epub version of the manual soon (end of this week or end of next
week). A pdf version would be nice to have, but it all depends upon the ease of use of
pdflib and on coming up with a proper translation of the html tags and the cascading style sheet
that comes with the online wiki. For the creation of the epub I can simply reuse the
online wiki pages (copy most of the content/change wiki pages a tiny bit).
More work is needed for the creation of a pdf version of the manual (the wiki pages need
to be parsed much more precisely in order to generate appropriate calls to pdflib functions).
marcov
Posts: 3455
Joined: Jun 16, 2005 9:45
Location: Netherlands

Re: fb wiki pages (wakka format)

Post by marcov »

I don't think epub will replace chm and pdf.

CHM, simply because it is Microsoft's primary format and Windows versions refresh only slowly; PDF, because it is closely related to PostScript and exact printing, which is based on totally different principles from HTML/epub.

Moreover, html and xhtml are very large, unwieldy standards, and some formats built on them (like valid epubs) only support a subset.

For the rest, I have some doubts that html (or something derived from it) is really suitable as a master format. Partly for the reasons you already state (a focus on hyperlinking, not a story with a head and a tail), and partly because the result of a transformation usually doesn't look very good in output formats that differ in principle from html.

In general, you want a format that is as abstract as possible, so that you have a chance if a new format emerges. Further, I would keep strongly templated content (like function/sub descriptions) and more narrative content apart, so that you can make better use of what you know about the template in the various transformations.

Btw, if there are people interested in testing with the *nix based html compiler (chmcmd), just yell. I also have a small util for decompiling chms (both are in Pascal, though).