Date: March 2008
Drivers: Neil Williams <codehelp@debian.org>,
Joerg Jaspert <joerg@debian.org>,
Thomas Viehmann <tv@beamnet.de>,
Mark Hymers <mhy@debian.org>,
Frank Lichtenheld <djpig@debian.org>
URL: http://dep.debian.net/deps/dep4/
Source: https://salsa.debian.org/dep-team/deps/-/blob/master/web/deps/dep4.mdwn
Abstract: This document provides an overview of the TDeb format, TDeb
design and usage. This specification should be considered as a work in
progress.
Source: http://svn.debian.org/viewsvn/dep/web/deps/dep4.mdwn?view=markup
Version 0.0.3
- TDeb Specification
- Format of binary translation packages (tdeb)
- Source format
- TDeb contents
- TDeb uploads
- TDeb resources.
- TDeb Architectures
- TDebs and LINGUAS
- Resolution of corner cases
- TDebs and package managers
- TDebs and debconf
- TDebs and multiple templates files
- Tdebs and usr/share/doc
- Lintian support
- TDeb maintainers
- TDeb implementation
- Changes
TDeb Specification
This is where the Draft TDeb Specification, created at the ftp-master/i18n meeting in Extremadura, will be developed and improved.
Motivation
- Updates to translations should not require source NMUs.
- Translation data should not be distributed in architecture-dependent packages.
- Translators should have a common interface for getting updates into Debian (possibly with automated TDeb generation after i18n team review).
Copyright © 2008
- Neil Williams <codehelp@debian.org>
- Joerg Jaspert <joerg@debian.org>
- Thomas Viehmann <tv@beamnet.de>
- Mark Hymers <mhy@debian.org>
- Frank Lichtenheld <djpig@debian.org>
Partially based on dpkg man pages, © by the original authors.
This document is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
For more details, on Debian GNU/Linux systems, see the file /usr/share/common-licenses/GPL-2 for the full license.
Format of binary translation packages (tdeb)
Summary
The tdeb binary package format is a variation of the deb binary package format. It has the same structure as deb, but the (single) data member is replaced by bzip2-compressed members for each LOCALE_ROOT supported.
Locale-root members
The new locale root data members are designed to support easier management of the translations, including allowing users to only install the translations that are needed for one particular installation.
e.g. a standard .deb contains debian-binary, control.tar.gz and data.tar.gz whereas a typical TDeb could contain:
$ ar -t ../pilot-qof-tdeb_0.1.7-1_all.tdeb
debian-binary
control.tar.gz
t.de.tar.bz2
t.en.tar.bz2
t.fr.tar.bz2
t.pt.tar.bz2
t.ru.tar.bz2
t.sv.tar.bz2
t.vi.tar.bz2
t.pt.tar.bz2 would contain translations for pt and pt_BR:
./usr/share/locale/pt/LC_MESSAGES/pilot-qof.mo
./usr/share/locale/pt_BR/LC_MESSAGES/pilot-qof.mo
This allows later tools to extract only the requested translations from the TDeb upon installation.
TDebs are based on the .deb format. The only difference is a small change in the organisation of the data.tar.gz, but it simplifies various stages of handling the resulting packages in the repository, in upload rules and in other support tools.
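The selection step described above might be sketched as follows. This is a minimal Python illustration and not part of dpkg: the helper names `locale_root` and `members_to_extract` are hypothetical, and deriving the locale root by cutting the locale name at the first `_`, `.` or `@` is an assumption drawn from the pt/pt_BR example.

```python
import re


def locale_root(locale: str) -> str:
    """Return the assumed locale root of a locale name, e.g. 'pt_BR.UTF-8' -> 'pt'."""
    # Assumption: the root is everything before the territory, codeset or modifier.
    return re.split(r"[_.@]", locale, maxsplit=1)[0].lower()


def members_to_extract(members: list[str], wanted_locales: list[str]) -> list[str]:
    """Pick the t.<root>.tar.bz2 members that cover the wanted locales."""
    wanted = {f"t.{locale_root(loc)}.tar.bz2" for loc in wanted_locales}
    # Keep archive order so a tool could extract in a single pass.
    return [name for name in members if name in wanted]


members = [
    "debian-binary", "control.tar.gz",
    "t.de.tar.bz2", "t.en.tar.bz2", "t.fr.tar.bz2", "t.pt.tar.bz2",
    "t.ru.tar.bz2", "t.sv.tar.bz2", "t.vi.tar.bz2",
]
print(members_to_extract(members, ["pt_BR.UTF-8", "de_DE"]))
# prints ['t.de.tar.bz2', 't.pt.tar.bz2']
```

A system configured for pt_BR would unpack only t.pt.tar.bz2, which carries both the pt and pt_BR catalogues.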
Use of the .tdeb suffix
Various file-based tools exist to handle .deb files. It is easier for such tools to tell a .deb from a .tdeb reliably by the filename than to add new code that detects the absence of data.tar.gz and works out how to handle the t.$root.tar.bz2 members. The suffix also makes it easier to manage TDebs in various repository situations. Although closely related to the .deb format, the .tdeb format is sufficiently different to merit a subtle change to the suffix, in a similar manner to .udeb.
Format specification
The file is an ar archive with a magic number of !<arch>.
The first member is named debian-binary and contains a series of lines, separated by newlines. Currently only one line is present, the format version number, 2.0 at the time the original dpkg manual page was written. Programs which read new-format archives should be prepared for the minor number to be increased and new lines to be present, and should ignore these if this is the case.
If the major number has changed, an incompatible change has been made and the program should stop. If it has not, then the program should be able to safely continue, unless it encounters an unexpected member in the archive (except at the end), as described below.
The second required member is named control.tar.bz2. It is a tar archive compressed with bzip2 which contains the package control information, as a series of plain files, of which the file control is mandatory and contains the core control information. The control tarball may optionally contain an entry for '.', the current directory.
The members following the control.tar.bz2 are named t.${LOCALE_ROOT}.tar.bz2. Each contains the filesystem archive for the locale root, as a tar archive compressed with bzip2.
LOCALE_ROOT must match the regular expression [a-z]{2,3}
These members must occur in this exact order. Current implementations should ignore any additional members after the t.${LOCALE_ROOT}.tar.bz2 members. Further members may be defined in the future, and (if possible) will be placed after these. Any additional members that may need to be inserted before t.${LOCALE_ROOT}.tar.bz2 and which should be safely ignored by older programs, will have names starting with an underscore, '_'.
Those new members which will not be able to be safely ignored will be inserted before the t.${LOCALE_ROOT}.tar.bz2 members with names starting with something other than underscores, or will (more likely) cause the major version number to be increased.
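The member-ordering rules above can be sketched as a small reader. This is an illustration under stated assumptions, not the dpkg implementation: the function names `read_ar`, `check_format_version` and `locale_roots` are hypothetical, only the common ar header layout (60-byte headers, payloads padded to even offsets) is handled, and an archive without any locale-root member is accepted because the draft does not say otherwise.

```python
import re

AR_MAGIC = b"!<arch>\n"
LOCALE_MEMBER = re.compile(r"^t\.([a-z]{2,3})\.tar\.bz2$")


def read_ar(data: bytes) -> list[tuple[str, bytes]]:
    """Return (name, payload) pairs from an ar archive held in memory."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = []
    pos = len(AR_MAGIC)
    while pos < len(data):
        header = data[pos:pos + 60]
        if len(header) < 60 or header[58:60] != b"`\n":
            raise ValueError("corrupt ar member header")
        # Some ar variants end short names with '/'; accept both spellings.
        name = header[0:16].decode("ascii").strip().rstrip("/")
        size = int(header[48:58].decode("ascii").strip())
        start = pos + 60
        members.append((name, data[start:start + size]))
        pos = start + size + (size % 2)  # payloads are padded to even offsets
    return members


def check_format_version(payload: bytes) -> None:
    """Stop on a changed major number; ignore minor bumps and extra lines."""
    first_line = payload.decode("ascii").split("\n", 1)[0]
    if first_line.split(".", 1)[0] != "2":
        raise ValueError(f"unsupported format version {first_line!r}")


def locale_roots(names: list[str]) -> list[str]:
    """Validate member order and return the locale roots in archive order."""
    if not names or names[0] != "debian-binary":
        raise ValueError("first member must be debian-binary")
    seen_control = False
    roots: list[str] = []
    for name in names[1:]:
        match = LOCALE_MEMBER.match(name)
        if match and seen_control:
            roots.append(match.group(1))
        elif roots:
            break  # members after the t.* run are ignored
        elif name.startswith("_"):
            continue  # reserved for members that are safe to ignore
        elif name == "control.tar.bz2" and not seen_control:
            seen_control = True
        else:
            raise ValueError(f"unexpected member {name!r}")
    if not seen_control:
        raise ValueError("missing control.tar.bz2")
    return roots
```

A caller would run `read_ar` on the file contents, pass the first payload to `check_format_version`, and then hand the member names to `locale_roots` before unpacking anything.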
Source format
+t1.diff.gz
TDebs will use a source format for translation updates that will not cause any changes in the package binaries. The foo_1.2.3-4+t1.diff.gz will be created for changes made by translators and tools will need to apply the translation diff after applying the .diff.gz prepared (and signed) by the Debian maintainer.
The +t[0-9] update will need to be built from the source package but may only contain changes to the translated content. No changes will be allowed in the package binaries or untranslated content.
Translation updates are source-package based and are denoted by the +t[0-9] suffix, where 0 is assumed to be the original upload by the Debian maintainer.
e.g. for a non-native package foo:
source version 1.2.3-4,
the first TDeb update would be foo_1.2.3-4+t1
the changes from -4 to -4+t1 will be in foo_1.2.3-4+t1.diff.gz
BinNMU versions are not affected as it is source based.
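The versioning scheme in this example can be expressed as a short sketch. The helper names below are hypothetical, and the code assumes that any trailing `+t<digits>` is a translation revision; an upstream version that happens to end the same way would be misread, which a real tool would have to rule out.

```python
import re

# Trailing "+t<N>" marks a translation revision; no suffix means +t0.
T_SUFFIX = re.compile(r"^(?P<base>.+?)\+t(?P<rev>[0-9]+)$")


def split_translation_revision(version: str) -> tuple[str, int]:
    """Split '1.2.3-4+t1' into ('1.2.3-4', 1); a plain version has revision 0."""
    match = T_SUFFIX.match(version)
    if match is None:
        return version, 0
    return match.group("base"), int(match.group("rev"))


def next_translation_version(version: str) -> str:
    """Return the version string of the next translation update."""
    base, rev = split_translation_revision(version)
    return f"{base}+t{rev + 1}"


def translation_diff_name(source: str, version: str) -> str:
    """Name of the diff carrying the changes for a translation update."""
    return f"{source}_{version}.diff.gz"


print(split_translation_revision("1.2.3-4"))     # prints ('1.2.3-4', 0)
print(next_translation_version("1.2.3-4"))       # prints 1.2.3-4+t1
print(next_translation_version("1.2.3-4+t1"))    # prints 1.2.3-4+t2
print(translation_diff_name("foo", "1.2.3-4+t1"))  # prints foo_1.2.3-4+t1.diff.gz
```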
The +t1.diff.gz needs dpkg support, which is being implemented.
New translations and translation fixes are currently tracked in the BTS. Tdeb uploads shall be able to close those bugs. Using a changelog might be the easiest way.
During the transition, those bugs will remain. After the transition, those bugs will go away so there should be no need for a closure method. We'll need to rely on i18n.debian.org for translation tracking after Squeeze.
TDeb contents
What goes into a TDeb?
(With the exception of debconf templates, untranslated content remains in the original package).
- Translations from upstream - /usr/share/locale/*/LC_MESSAGES/*.mo
- Other localisation files from upstream - /usr/share/locale/*/LC_*/*
Translated content, including:
- Translated manpages
- Translated info documents (if supported by info)
- Translated documentation, with the proviso that a package with both debconf templates and large amounts of translated documentation would create two tdebs: one minimal tdeb for debconf and one for the rest.
- Debconf templates file
- Not the config or other related scripts. The regular deb will also need to contain an untranslated copy of the templates file. See "TDebs and debconf" below.
TDeb uploads
Initial uploads - +t0
The initial TDeb will be generated by the maintainer, effectively +t0, containing whatever translations are currently supported. The TDeb is uploaded alongside the binary and .dsc. It is up to the maintainer to incorporate any +t1.diff.gz containing updated or new translations that may exist already into each new Debian version.
If the new version has changed translated strings then those will only be available in English until the +t1 TDeb can be prepared.
Maintainers are advised to always seek translation updates prior to the upload of the initial TDeb. If maintainers implement a string freeze and wait for translation updates before uploading, the chances of a +t1.diff.gz being required by the time of the maintainer's next release are lower.
See also Timeline.
In Squeeze+1, maintainers will create TDebs from debian/rules using debhelper calls, and will upload a TDeb each time they would currently upload any package that contains /usr/share/locale/LC_*/ etc. Those TDebs are, effectively, +t0; only updates by translators start the +t1 sequence.
Maintainer uploads (non-native package example):
foo_1.2.3-4_amd64.deb
foo-tdeb_1.2.3-4_all.tdeb
foo-bar_1.2.3-4_amd64.deb
foo_1.2.3-4.diff.gz
foo_1.2.3.orig.tar.gz
foo_1.2.3-4.dsc
foo_1.2.3-4_amd64.changes
Maintainer uploads (native package example):
foo_1.2.3_amd64.deb
foo-tdeb_1.2.3_all.tdeb
foo-bar_1.2.3_amd64.deb
foo_1.2.3.tar.gz
foo_1.2.3.dsc
foo_1.2.3_amd64.changes
The foo-tdeb package will be listed in the .changes anyway so existing tools will simply add it to the list of files to be uploaded to ftp-master or wherever. foo-tdeb_1.2.3-4_all.tdeb is, effectively, foo-tdeb_1.2.3-4+t0_all.tdeb
When the maintainer makes a new release, foo_1.2.3-5, which incorporates the TDeb changes, it is done in a similar manner to how an NMU is included. All files matching foo*1.2.3-4* are removed by dak when the new version is uploaded. The updated translations now exist in foo-tdeb_1.2.3-5_all.tdeb - uploaded by the maintainer and there is no +t1.diff.gz or +t1_all.tdeb until the package translations need to be touched again.
Translator updates
Updates to translations will update the existing TDeb, creating +t2.diff.gz and +t3.diff.gz etc. All supported languages go into the existing TDeb, organised by locale root.
Unless a package falls into the corner case of needing separate TDebs for debconf templates and for large amounts of translated documentation, each source package should expect to have only one TDeb for all binary packages and all locales.
Translation teams can work together to make uploads in a coordinated manner - similar to the current method of requesting deadlines for i18n bugs, a nominated person can collate the various translations prior to a deadline chosen by the teams themselves, according to the needs of that particular package.
Translator updates of TDebs do not necessarily need to use typical package building tools like 'dpkg-buildpackage'. All that is needed is to put the .mo files into the relevant directory hierarchy (or use dh_gentdeb) and then call dpkg-deb --tdeb -b:
dpkg-deb --tdeb -b debian/pilot-qof-tdeb ../pilot-qof-tdeb_0.1.7-1_all.tdeb
This means that translators can build updated TDebs without needing the full dependency chain needed for a source rebuild - only dpkg (at a version that includes the TDeb support) is strictly necessary.
Translator update uploads would contain:
foo-tdeb_1.2.3-4+t1_all.tdeb
foo_1.2.3-4+t1.diff.gz
foo_1.2.3-4+t1.dsc
foo_1.2.3-4+t1_all.changes
The key point is that a +t1 revision can happen during a release freeze without touching the source, without changing any of the binaries. Once the release is out and unstable is accessible again, the maintainer adds +t1.diff.gz to their next upload.
dpkg source formats
Format 3.0 should not be any more difficult than 1.0 or anything that follows. 3.0 has to deal with incorporating patches and changes from the Debian Bug Tracking System, so +t1.diff.gz is no different.
What matters is that the maintainer gets the +t1.diff.gz and applies it onto the source package prior to the next upload. It's no different to how the same maintainer would handle a patch or new translations file sent to the BTS.
TDeb resources.
Packages and patches
The main changes to support TDebs will be concentrated in the archive tools and central packaging tools (dpkg, apt, debhelper).
Test packages are available via Emdebian:
- http://www.emdebian.org/toolchains/search.php?package=emdebian-tdeb&arch=&distro=unstable
- http://packages.debian.org/emdebian-tdeb
- http://buildd.emdebian.org/svn/browser/current/host/trunk/emdebian-grip/trunk/tdeb
- (SVN is regularly updated)
Patches for current tools are handled in repositories for the relevant tools:
- http://git.debian.org/?p=users/codehelp/debhelper.git;a=summary
- http://git.debian.org/?p=users/codehelp/dpkg.git;a=summary
TDeb Architectures
TDebs are architecture-independent
TDebs must only be used for Architecture-independent data. There will be NO support for Architecture-dependent TDebs outside Emdebian.
Any translation system that does not use gettext can choose to use TDebs as long as the translation files are architecture-independent.
TDebs and LINGUAS
Avoiding changes to the source package
Many packages using autotools use the LINGUAS support of gettext but this requires changes within the source of the package - sometimes po/LINGUAS but more commonly configure.ac|in. Changing configure.ac and regenerating the autotools build system completely undermines the objective of TDebs being able to be used independently of maintainer uploads and NMUs. Existing TDeb support ignores the LINGUAS method, therefore:
If a $lang.po file exists in a recognisable po directory (${top_srcdir}/po/ or ${top_srcdir}/po-*/), TDeb handlers will process that .po file even if it is not listed in LINGUAS. If the PO file is valid, the generated .mo file will be included in the TDeb.
Packages will no longer be able to have unactivated or unused translations. (This is a debhelper / other packaging tool implementation problem, not a dpkg one)
As a result of this requirement, the debhelper tdeb tool (dh_gentdeb) handles finding the translations, preparing the binary translation files and moving the translations to suitable directories within the package build.
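The discovery rule above, which looks in po/ and po-*/ and disregards LINGUAS, might look like the following sketch. It is not the dh_gentdeb implementation; `find_po_files` is a hypothetical helper, and checking that each PO file is valid (for instance with msgfmt) is left out.

```python
from pathlib import Path


def find_po_files(top_srcdir: str) -> list[Path]:
    """Return every *.po file in po/ and po-*/ below the source root.

    LINGUAS is deliberately never read: a catalogue is picked up
    whether or not the build system lists its language.
    """
    top = Path(top_srcdir)
    po_dirs = [top / "po"] + sorted(top.glob("po-*"))
    found: list[Path] = []
    for po_dir in po_dirs:
        if po_dir.is_dir():
            found.extend(sorted(po_dir.glob("*.po")))
    return found


# Example use from the top of an unpacked source tree:
for po_file in find_po_files("."):
    print(po_file)
```

The language code for each catalogue is simply the file name without its .po extension, available as `po_file.stem`.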
TDebs and binary packages
The filesystem contents of TDebs and their associated binary packages must be mutually exclusive, so that dpkg doesn't need any special replace handling. We will still need some Replaces for the transition, but that can be handled like any other Replaces.
Migrating packages to TDeb support
Maintainers will need to make a variety of changes to support TDebs:
- Replaces: add the recommended $src-tdeb package name with Replaces: $binaries (<< $srcversion), where $srcversion is a fixed string for the version prior to TDebs, e.g. Replaces: apt (<< 0.7.19), apt-utils (<< 0.7.19)
- Remove translated content from all *.install files in debian/
- Remove any lines in debian/rules that handle translated content
- Ensure that dh_gentdeb is called in debian/rules (CDBS will be patched to implement this support automatically).
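As a small illustration of the Replaces step in this list, the field can be generated from the list of binary packages. The helper name `tdeb_replaces` is hypothetical; the output format follows the apt example given above.

```python
def tdeb_replaces(binaries: list[str], srcversion: str) -> str:
    """Build the Replaces field for the $src-tdeb package.

    Every binary package built from the source is listed with the same
    fixed version bound, as in the apt example.
    """
    clauses = [f"{binary} (<< {srcversion})" for binary in binaries]
    return "Replaces: " + ", ".join(clauses)


print(tdeb_replaces(["apt", "apt-utils"], "0.7.19"))
# prints Replaces: apt (<< 0.7.19), apt-utils (<< 0.7.19)
```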
Resolution of corner cases
TDeb documentation duplication
Basing the TDeb on the source package means that the TDeb could include large amounts of translated documentation. This results in a corner case where a package with debconf templates and a large amount of translated documentation would result in the docs being installed merely to obtain the translated templates. In order to resolve this, each source package may have one or more tdebs. If a source package has translations, it must have a tdeb named after the source package (suffixed with -tdeb) and all debconf templates must be placed in it. Such a package should place all architecture independent documentation (even in the native language) into a tdeb. If a package contains documentation which is not always required (for example API documentation or user documentation), the source package may provide additional ${source}-${foo}-tdeb_$version_all.tdeb files.
If tdebs are revised by the translation teams, the suffix +t[0-9]+ must be used and all tdebs for the source package must be revised at the same time.
TDebs and package managers
Package managers can find out whether a package has a base tdeb by examining the Packages file for Translation-Version: [0-9]+. In the case of Translation-Version: 0, the tdeb name and version is the same as the source file with -tdeb appended.
In the case of Translation-Version: 1 or higher, the tdeb name is ${source}-tdeb_$version+t[0-9]+_all.tdeb. Additional tdebs are referenced in the Packages file in the following way: Additional-Translations: ${source}-api-tdeb, ${source}-user-tdeb
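A package manager could derive the tdeb file names from these fields roughly as follows. This is a sketch with hypothetical helper names; it assumes that Translation-Version: N corresponds to the +tN suffix and that file names follow the `${source}-${foo}-tdeb_$version_all.tdeb` pattern used elsewhere in this document.

```python
def base_tdeb_filename(source: str, version: str, translation_version: int) -> str:
    """File name of the base tdeb for a source package.

    Translation-Version: 0 means the maintainer's own tdeb (no suffix);
    N >= 1 is assumed to map to the +tN translator update.
    """
    if translation_version < 0:
        raise ValueError("Translation-Version must not be negative")
    suffix = "" if translation_version == 0 else f"+t{translation_version}"
    return f"{source}-tdeb_{version}{suffix}_all.tdeb"


def parse_additional_translations(value: str) -> list[str]:
    """Split an Additional-Translations field value into package names."""
    return [name.strip() for name in value.split(",") if name.strip()]


print(base_tdeb_filename("foo", "1.2.3-4", 0))  # prints foo-tdeb_1.2.3-4_all.tdeb
print(base_tdeb_filename("foo", "1.2.3-4", 1))  # prints foo-tdeb_1.2.3-4+t1_all.tdeb
print(parse_additional_translations("foo-api-tdeb, foo-user-tdeb"))
# prints ['foo-api-tdeb', 'foo-user-tdeb']
```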
In cases where a base tdeb is present, package managers must call dpkg with the tdeb and the deb in the same invocation in order to ensure that all debconf templates can be extracted before the config script is run.
There is no need to unpack in order to obtain the debconf templates - the tdeb merely has to be locatable by debconf, which will call apt-extracttemplates and load the translated debconf strings into memory. See "TDebs and debconf" below.
TDebs and debconf
apt-extracttemplates is used by debconf's dpkg-preconfigure to extract templates from the not-yet-extracted .debs right after download. This needs to take tdebs into account. Note that the templates are per-binary while tdebs are per-source. Also, the .deb should have non-translated templates.
TDebs and multiple templates files
If a source package builds multiple binaries that use debconf, the debian/ directory will contain foo.templates and bar.templates. The TDeb will retain all templates files under the original names. apt-extracttemplates and po-debconf will need to work together to ensure that all templates files are available to debconf so that debconf can selectively load only the templates files required.
Tdebs and usr/share/doc
A tdeb needs usr/share/doc/copyright and changelog.Debian and dpkg will create the necessary files, just as with a normal .deb.
Lintian support
PO translations
- No source changes - The Tdeb packages should not add messages not related to a message of the original source package. How to check this? If there is a POT file, then it is possible to do the comparison with the gettext msg* tools. POT file will not be in the tdeb, only in the main source package. When a PO file is modified, lintian can get the POT file of the same directory from the source package.
- msgfmt warnings - Modification of upstream PO files should be avoided. A warning could be produced.
- File naming rules
- Location of PO files in the source package (+t1.diff.gz)
- Location of mo files in the binary packages (tdeb)
- Location of manpages in the binary packages (tdeb). (current check can be reused)
- Name of the manpages in the binary packages (tdeb). An English manpage shall remain, with the same name, in the original binary package.
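The "no source changes" check in the list above could be approximated as shown below. This is a deliberately simplified stand-in for the gettext msg* tools, not a lintian check: `msgids` and `unknown_messages` are hypothetical helpers, and the parser ignores msgctxt, plural forms and escape sequences.

```python
def msgids(po_text: str) -> set[str]:
    """Collect the msgid strings of a PO or POT file (simplified parser)."""
    ids: set[str] = set()
    parts = None  # quoted fragments of the msgid currently being read
    for raw in po_text.splitlines():
        line = raw.strip()
        if line.startswith("msgid "):
            parts = [line[len("msgid "):].strip()]
        elif parts is not None and line.startswith('"'):
            parts.append(line)  # continuation line of a multi-line msgid
        elif parts is not None:
            # Any other line ends the msgid; strip the surrounding quotes.
            ids.add("".join(part[1:-1] for part in parts))
            parts = None
    if parts is not None:
        ids.add("".join(part[1:-1] for part in parts))
    return ids


def unknown_messages(pot_text: str, po_text: str) -> set[str]:
    """Return msgids present in the PO file but absent from the POT file."""
    return msgids(po_text) - msgids(pot_text)


pot = 'msgid ""\nmsgstr ""\n\nmsgid "Hello"\nmsgstr ""\n'
po = pot + '\nmsgid "Extra"\nmsgstr "Zusatz"\n'
print(unknown_messages(pot, po))  # prints {'Extra'}
```

A non-empty result would indicate that the translation update introduces messages unrelated to the original source package.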
TDeb maintainers
Rather than allow repeat uploads of the same change in multiple languages, coordinate builds of tdebs to make a single upload with as many changes as possible at one time. Translation-Maintainers: in debian/control and Localisation Assistants.
TDeb implementation
Incorporation of the tdiff in the next source package
A process will be needed to help maintainers include the tdiff when they prepare a new source package (a kind of NMU acknowledgement?). This could be automated so that the +t1.diff.gz is applied automatically if it exists. A problem remains with maintainers who don't check apt-get source first. A possible method is to modify uscan and uupdate.
When the maintainer prepares a new package, the maintainer applies the tdiff and "acknowledges the new translations". (The tdiff is quite likely not to apply if the upstream source has changed.)
The i18n infrastructure can check that this acknowledgement is really performed (e.g. merge the old translations in the new one and check if the translation statistics changed)
Automation in uscan should be possible
This issue can be postponed until tdebs appear for non-native packages (squeeze+1)
L10N Infrastructure
i18n.debian.net gathers the translation material from the packages. It needs to support tdebs too (tdiff).
i18n.debian.net can check that translation material from the tdiff was merged into new versions of the source package
i18n.debian.net needs to help "Localisation Assistants" in gathering the new translations before the preparation of a new tdeb
Timeline
Sequence
- Archive and tools support (Squeeze)
- Debconf translation will form the first TDebs (Squeeze + 1)
- Native packages with program translations next
- Non-native packages with Debian maintainers who are also the upstream
What needs to be done for Squeeze?
- tdeb binary file definition - (ratification and review)
- tdeb source file definition - (development and testing)
- dpkg-deb and dpkg --install support - (partially implemented in git)
There will be no TDebs in Squeeze.
What needs to be done for Squeeze + 1?
- debhelper support for both tdebs explicitly, and also marking files into classes in general (partially implemented via dh_gentdeb in git)
- provide a patch to cdbs for running dh_gentdeb in the right place. (Done - only remains for the patch to be filed and applied).
- apt/aptitude support for pulling in and removing tdebs
- lintian support
- debdiff support
- devscripts support (debc)
- dak support (run away, run away) run faster
First generation of TDebs:
- packages using debconf
- native packages using gettext (optional)
What will be done for Squeeze + 2 or later?
- dpkg class support - (make it easier to selectively install translations for specific locale roots).
- support for packages using non-gettext translations. (Packages using non-gettext mechanisms include OOo, mozilla, Qt or Java properties, menus, desktop.)
- any remaining debconf packages not yet using TDebs
- remaining native packages using gettext
- non-native packages with a Debian maintainer on upstream team
- starting support for non-gettext packages
Changes
2009-03-08 - [Neil Williams] * Convert to DEP.
2009-03-19 - [Neil Williams] * Add a table of contents via ikiwiki
2009-04-14 - [Neil Williams] * Tweak some of the links to become active. * Update the URL * Fold in the results of the discussions so far on -devel.