Difference between revisions of "Article Format"

From openZIM
(Created page with '== Article Text == Articles to be parsed and shown directly by the ZIM reader are stored as HTML body, without any layout except the formattings used in the article text (headlin…')
 
(Adapt article description format to new namespace usage.)
 
(4 intermediate revisions by one other user not shown)

Latest revision as of 11:35, 15 December 2020

All content stored in a ZIM archive must be placed in the C namespace.

Entries are of two kinds:

  • Articles: HTML content intended to be displayed to the user
  • Resources: any other kind of file, mainly intended to be included in articles or other resources (CSS, images, JavaScript, ...)

Article Entries

An article's content is a full HTML page, including any <head> element and any links to CSS, scripts and fonts.

An article's content must be UTF-8 encoded.

ZIM contents are addressed by the entry's path, without the namespace. References in an article's HTML code (<a href=""></a>, <img src="">, etc.) must be valid and usable by a standard web browser (i.e. URL-encoded following the RFC 1738 rules).
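As an illustration of the encoding rule, the sketch below percent-encodes an entry path before it is written into an href or src attribute. It relies on Python's standard urllib.parse.quote; the helper name encode_entry_path and the sample path are hypothetical and not part of any ZIM tooling.

```python
from urllib.parse import quote, unquote


def encode_entry_path(path: str) -> str:
    """Percent-encode an entry path for use in an href/src attribute.

    The path is encoded as UTF-8 and "/" is left untouched so that
    directory separators survive. (Hypothetical helper, for illustration.)
    """
    return quote(path, safe="/")


# Hypothetical entry path containing a space and a non-ASCII letter.
href = encode_entry_path("wiki/Café au lait.html")
print(href)  # wiki/Caf%C3%A9%20au%20lait.html

# Decoding the reference gives back the original entry path.
assert unquote(href) == "wiki/Café au lait.html"
```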

Absolute URLs, i.e. those with a leading slash (/), are forbidden because they prevent the ZIM contents from being served under an arbitrary HTTP sub-hierarchy. URLs must consequently be relative.

URLs that include the namespace (C/foo.html) are also forbidden, as the namespace may be hidden by libzim. URLs must not climb above the top of the entry hierarchy (../C/bar.png). ../ remains usable when the entry is in a sub-directory.
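The rules on relative URLs can be sketched as follows, assuming entry paths are given without the namespace (for example wiki/Paris.html). Both function names are hypothetical. The first builds a relative link between two entries; the second resolves an internal link and rejects the two forbidden cases that can be detected syntactically: a leading slash, and a ../ that climbs above the top level. External URLs are out of scope for this sketch.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def relative_link(from_entry: str, to_entry: str) -> str:
    """Return a relative URL path leading from one entry to another."""
    # Anchor both paths at a fake root so that the result does not
    # depend on the current working directory.
    start = posixpath.join("/", posixpath.dirname(from_entry))
    return posixpath.relpath(posixpath.join("/", to_entry), start)


def resolve_link(from_entry: str, href: str) -> str:
    """Resolve an internal href against from_entry.

    Returns the target entry path, or raises ValueError for an absolute
    URL or a link that climbs above the top of the hierarchy.
    """
    path = unquote(urlsplit(href).path)
    if path.startswith("/"):
        raise ValueError(f"absolute URL is forbidden: {href!r}")
    # Start from the directory that contains the linking entry.
    parts = [p for p in posixpath.dirname(from_entry).split("/") if p]
    for segment in path.split("/"):
        if segment in ("", "."):
            continue
        if segment == "..":
            if not parts:
                raise ValueError(f"link climbs above the top level: {href!r}")
            parts.pop()
        else:
            parts.append(segment)
    return "/".join(parts)


print(relative_link("wiki/Paris.html", "images/paris.png"))     # ../images/paris.png
print(resolve_link("wiki/Paris.html", "../images/paris.png"))   # images/paris.png
```

A namespace-prefixed link such as C/foo.html looks like an ordinary relative path, so this sketch cannot tell it apart from a link into a sub-directory named C; catching it would require knowing which entries actually exist.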

Resource Entries

There are no strong constraints on resource entries:

  • The MIME type must be set correctly.
  • We advise encoding textual content as UTF-8, but this cannot be enforced; it depends on the resource and how it is used.
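One possible way to pick a MIME type when preparing resources is to guess it from the file extension, as in the minimal sketch below. It uses Python's standard mimetypes module; the helper name guess_mimetype and the fallback value are assumptions made for illustration, and extension-based guessing is only a heuristic, so the result should be checked for unusual file types.

```python
import mimetypes


def guess_mimetype(path: str, default: str = "application/octet-stream") -> str:
    """Guess a MIME type from the extension of an entry path.

    Falls back to `default` when the extension is unknown.
    (Hypothetical helper, for illustration.)
    """
    mimetype, _encoding = mimetypes.guess_type(path)
    return mimetype or default


for name in ("style.css", "logo.png", "index.html"):
    print(name, guess_mimetype(name))
```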