14 August 2018

For the last ten months we tried to revive the translations for the Akeeba software by self-hosting a translation environment using Weblate. Unfortunately, this experiment did not yield the intended results and ran into several problems. After a lot of consideration we decided to discontinue translations and only provide English language files, as we had originally announced in February 2017.

Why we used Weblate instead of Transifex, CrowdIn etc

This is a short recap of what we already discussed in our announcement of February 2017. Our software's languages consist of several hundred files with nearly 10,000 language keys and several hundred thousand words. Simply put, we have produced a lot of software which is self-documenting and tries to communicate its intentions to the users with clarity. Unfortunately, this puts it at the higher pricing tiers or even the “enterprise” segment of most translation services. These figures are multiplied by the number of languages the software is being translated into. Based on the pricing structure of CrowdIn and Transifex, that puts us at between $4,000 and over $10,000 per year simply to host the translation files with them, a price that would only increase steeply as we add more language strings and translations into other languages.

Despite producing a lot of software we remain a very small company. There are two full-time developers who also do support and a part-time support person. We also have two subcontractors we use whenever we need User Experience, front-end development and graphics work that we (the developers) obviously cannot handle ourselves. Even at the lowest end of the pricing spectrum, hosted translations cost way too much for us. To put things in perspective, it costs about $4,000 per year to distribute all of our software and provide an update server. Also, our annual budget for sponsoring community Joomla and WordPress events is 10,000 Euros, or around $12,000. We can't stop distributing our software and it doesn't make community or business sense to stop sponsoring community events.

You might wonder, if our software is Open Source why don't we just host our languages with either of the aforementioned services for free? It turns out that their free language file hosting for Open Source software is only valid when you do not (directly?) make money out of it. Even though 95% of the downloads of the translated software would be non-paying users, the 5% who would pay became a sticking point. Transifex had contacted us to let us know that if we sell services around our software we can't use their translation services for free. That's why we had to stop using them.

Between February 2017 and October 2017 we were looking for more affordable alternatives. Every translation service we found that was affordable didn't support our language file format (standard Joomla! INI translation files). The same was true for most self-hosted translation software. The only solution which was both affordable and supported Joomla! INI translation files was Weblate.

How things went south: the technical part

Weblate is a Python application using the Django framework. This is not a big problem. There was a learning curve setting up the server (an Ubuntu Server instance on Linode) to serve a Django web application, but at the end of the day that was only marginally different from setting up any production server, something we already know how to do. The problems we faced were with Weblate itself.

The first thing we bumped into was that Weblate is very clearly oriented towards Python applications which use a single .po file for the entire application. Having a project with dozens of translation files, like Akeeba Subscriptions, was quite “fun” to set up. There was a false start and a few lost hours but we managed to work around that.

Since Weblate is deeply integrated with a Git workflow we also had to host all translations in a single Git repository. Otherwise we'd have to deal with merge conflicts every time we updated our code while someone had updated a translation. This was a bit more complicated to set up since it involved changes throughout all of our code repositories. It took another few false starts and lots of automation scripts to make it work but that was a one-off process.

Permissions are important for translations. In an ideal world you can tell the translation system who has the right to translate which project and language. Not in Weblate. By default it comes with global privileges, i.e. you can tell it that someone can translate everything. You can opt to have per project, but not per language, privileges. In other words, when someone wanted to be for example our French translator they'd get permission to translate the software in every conceivable language. An upgrade ended up wiping out the per project permissions so we ended up with everyone being able to translate everything in every language.

The permissions in Weblate make sense for the enterprise world but not for Open Source software. A translator only has enough privileges to suggest translations. A manager has to approve them. But the manager has far-reaching privileges which are unsuitable for our use case. For example, the manager would have the ability to see how the Git repository is configured, which is a security issue in our use case. So we had to customise the permissions, an ill-documented operation which requires assigning privileges with misleading names. After a lot of trial and error we made it. The aforementioned update killed that assignment and we had to do it from scratch again.

Language codes are important when translating software. For example, there's more than one German. There's de-DE (Germany) and there's de-AT (Austria) among other options. They are mostly but not exactly the same. Unfortunately, Weblate only offers the generic de (German) option. We had to go through the list and curate it to what Joomla and WordPress expect; we used Joomla's official translations as a guideline. The aforementioned update screwed this up and we had to do it all over again.

You'd normally expect that when you add, say, a German translator for Akeeba Backup, creating the German language once gives you the option to translate all files to German. Not with Weblate! You have to go through each and every language file and add a translation to that language. On projects with plenty of files, like Akeeba Subscriptions or SocialLogin, this was very tedious and error-prone. We had to create even more automation, which kinda worked.

The big problem was with how Weblate was handling translations. Empty translations, where there is a language key but no translation content, would result in Weblate throwing an error and preventing further translation of the entire language file. Unfortunately this happened a lot, for example when a translator entered and accepted an empty translation thinking they could come back to it a bit later. Working around it took another round of automation, and the problem put off some of the original translators.
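To illustrate the kind of check that catches these before they block a file, here is a minimal Python sketch. It is not the automation we actually ran; the function name and the simplified KEY="value" line pattern are assumptions made for the example, and real Joomla! INI files have more edge cases than this handles.

```python
import re

# Simplified pattern for a Joomla-style INI line such as KEY="value".
# The exact grammar of real files is richer; this is an assumption.
LINE_RE = re.compile(r'^\s*([A-Z0-9_.\-]+)\s*=\s*"(.*)"\s*$')


def find_empty_keys(ini_text):
    """Return the keys whose quoted value is empty or whitespace-only."""
    empty = []
    for line in ini_text.splitlines():
        if line.lstrip().startswith(";"):
            continue  # skip comment lines
        match = LINE_RE.match(line)
        if match and not match.group(2).strip():
            empty.append(match.group(1))
    return empty
```

Running something like this over every language file before Weblate picks it up would at least tell you which keys to fill in or remove.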

In theory, Weblate is also able to export the translation to a file and import it from a file. Unfortunately, this was very buggy in the case of Joomla INI translation files. Double quotes inside the translation were replaced with the literal "_QQ_" which has not been required for the last 6 or so years. That would have been a minor inconvenience but for two related bugs. The surrounding double quotes of the translation string, required by the INI format and Joomla!, were also converted to the literal "_QQ_" in the translation environment on import! When you saved those tainted translations the internal "_QQ_" were converted to the asinine "_QQ_"_QQ_"_QQ_" which caused everything to break. Lots of time was lost trying to automate a fix for that. This bug was only recently fixed in Weblate but it was too late since we can't upgrade to the fixed release (more on that later).
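For the curious, the internal part of that damage can be undone with plain string replacement. The sketch below is only an illustration under assumptions: the function name is made up, it works on the value between the outer quotes, and it assumes a backslash-escaped quote is the form you want in the repaired file. It does not attempt to fix the surrounding quotes that were turned into "_QQ_", since those are ambiguous with a value that legitimately starts or ends with a quote.

```python
# The doubly corrupted marker must be handled before the plain one,
# because the plain marker is a substring of it.
TRIPLE_MARKER = '"_QQ_"_QQ_"_QQ_"'
SINGLE_MARKER = '"_QQ_"'


def repair_value(value):
    """Replace legacy or corrupted quote markers in a value with an escaped quote."""
    escaped_quote = '\\"'  # backslash followed by a double quote
    return value.replace(TRIPLE_MARKER, escaped_quote).replace(SINGLE_MARKER, escaped_quote)
```

The order of the two replacements matters: doing the short marker first would leave stray _QQ_ fragments behind.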

Remember how we said Weblate is very integrated with Git? This is done by creating hooks on GitHub to tell Weblate there is a new commit so that Weblate can pull from the repository and update its languages. When you translate something, Weblate commits it to the working copy of the repository it keeps on the server and periodically pushes those commits to GitHub. But what happens when there is a new commit on the GitHub repository while Weblate has not yet committed its changes to the working copy? Why, a merge conflict of course! One that you have to log into the server and resolve manually. We ended up telling Git to automatically stash before pulling, but that didn't solve the problem; it only made it less frequent.

And then we come to the really major issue which put the final nail in the coffin of this experiment: updates. If you're using Joomla! or WordPress you're used to automatic or semi-automatic updates. You click a button, stuff is downloaded, installed and you go on your merry way. It takes, what, a minute? Less? Not with Weblate. First you need to download the new release and MANUALLY compare its example configuration file with your live configuration file. Since the application and framework configuration share the same file, it is guaranteed that every minor version upgrade introduces an incompatible change in the configuration file of several hundred lines, which you have to track down and fix yourself. Which works most of the time, but not always. Because sometimes something changes in the Django framework and you get 500 errors which you need to track down, search for, figure out what change took place, figure out what Django option you have to provide and where to put it in the configuration. It's exactly as fun as it sounds: not at all.

But that's not it. Afterwards you need to replace the application files, remember to remove all the .pyc files (because Python stores the compiled bytecode on the disk, not in memory, unlike PHP's OPcache) and run the dreaded migrations. Migrations are supposed to be automatic updates of the database structure and data. They are supposed to be generated automatically by Django itself. There has not been A SINGLE VERSION where the damned migrations worked as they should on the first try. There were always failures. Resolving them was frustrating and counter-intuitive. We ended up having a secondary local replica server to test things... although half of the time what worked on the replica failed on live or vice versa.
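The .pyc cleanup step, at least, is easy to script. Here is a minimal sketch using only the Python standard library; the function name is hypothetical and the directory you pass in is whatever your own installation uses.

```python
from pathlib import Path


def remove_stale_bytecode(root):
    """Delete every .pyc file below root and return how many were removed."""
    # Collect the paths first so we never delete while still walking the tree.
    stale = list(Path(root).rglob("*.pyc"))
    for pyc in stale:
        pyc.unlink()
    return len(stale)
```

You would call it with the application directory right after replacing the files and before running the migrations.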

One very ill-documented issue is that at some point we had to convert the database to UTF8MB4. Of course this required a week of detective work because the actual error was that a certain user's full name (written with French accents) was invalid when running a migration.

And finally, upgrades often require version specific instructions. These typically involve things like "run this specific migration after you do X and Y arbitrary changes in settings.py" or "oh, BTW, if you had custom permissions you can kiss them goodbye". Grunt. If you ever thought that updating Joomla or WordPress was hard I have news for you: you have it easy!

This all brings us to the latest upgrade we tried to perform: Weblate 2.20 to Weblate 3.0.1. The migrations fail at some point because of a database error. A foreign key is not removed before trying to change data, causing a failure to convert something. This brings the whole show to a stall. You can't upgrade from 2.20 to 3.0.1. The only option is starting over, setting up all projects from scratch, trying to import users from the old version and give them permissions again.

How things went south: the community part

Between the three of us we speak a meager three languages fluently: English, Greek and Italian. The latter is not an option for translation since Davide already has his hands full with code. Paying for translators is out of the question considering the very limited resources we can afford to throw into an ancillary and low priority part of our operations. Therefore we need volunteer translators.

Many people expressed their interest and most of them did provide a few translations. However, out of 140 people expressing their interest, fewer than 20 have logged into the translation site in the last four months. Don't get me wrong. All of the translators are volunteers, they've done a terrific job when they had the time but they are only human. When a million things demand your attention and time it's very reasonable that translating this random software for free is somewhere at the bottom of your list. I can relate to that. As a student I was volunteering Greek translations to a few FOSS projects. As soon as I started working I had to drop that since my free time was limited and I'd much rather use it to decompress and do a number of things which didn't require me puzzling over a dictionary in front of a screen.

Between that and a number of drive-by contributions the translation status for most languages and software is abysmal. Realistically speaking, translations below 95% are barely usable if you don't speak English. If you discount English which is the source language and Greek which is maintained by yours truly we only have 2-4 languages for each software which are truly usable. When you take into account which languages people visiting our site are speaking, per Google Analytics, we don't serve usable languages to the majority of non-English speakers.

Side note: Yes, we showed all languages over 60% on the translations download pages. This was done with the secret hope that people would be more likely to pick up maintaining a language that was at least partially translated versus having to start from scratch. The harsh reality is that nobody volunteered for these languages, making the whole point moot. And yes, most projects have 2x to 3x as many languages as you would see on the language download pages but they were too sparsely translated to even list. The really fun thing about these translations is that if we were to host our languages on Transifex or CrowdIn they'd count as full translations, increasing our cost to the several tens of thousands of dollars per year. Ouch!

People need German, Russian, French and Spanish translations - in that order. We are lucky if we have one of them fully usable and another one mostly usable per software. Some software is only translated to languages that are spoken by less than 0.5% of our visitors. So much effort for so little result.

Besides that, there is always the issue of how to address translation mistakes. When someone contacts us with a translation issue in a language we do not speak we have no sensible way to address it. Languages are complicated and software translations are even more nuanced. Even a native speaker may have difficulty deciding what the best translation is. For example, do you translate the word “site” or “browser”? Are speakers of the language likely to understand the official translated word for that term, chosen by professors inside the insulated vacuum of academia, or is it best to use a more colloquial expression? What about the phrases which don't translate well, such as “security exception”? If your exact translation causes a visual issue because the translation is 5 times longer than the original and there's no sensible solution to the layout, do you go for a shorter, more obscure string or do you paraphrase? Is your paraphrasing likely to cause misunderstandings? You might laugh at that but we know of a case where the French translation in Akeeba Backup was causing French users to think that the Site Transfer Wizard feature would do something it wasn't supposed to, and we kept seeing the same invalid bug report. These are issues which cannot be solved unless your translators are developers with a strong language background and experience in UX design. This is an impossible ask for volunteer translators.

So where does that leave us?

It is very clear now that self-hosted translations are a dead end for technical reasons. Using a hosted service is a dead end for financial / business reasons. Volunteer translations are a dead end because people don't have the free time to translate software that changes at a fast pace into the languages that matter to the users of the software.

The only reasonable result is to call it quits and stop offering “official” translations.

If you want you can always translate our software to your own language and make the unofficial translations available. Just remember that the license of the language files, just like everything in Akeeba software, is GNU GPL v3 or later (with the exception of Kickstart which is GPL v2 or later). As long as you respect the license and make it clear it's an unofficial translation you are free to do so.

Last time we faced the same dilemma and ended up discontinuing translations there were several suggestions coming to us over our contact form. In an effort to save your time let us recap what won't work:

  • Using a translation service. The whole point is that they are either way overpriced or don't support the language file format we use. No, converting between formats is not an option - the reasons are complicated but trust us when we say that we have tried going down that road before.
  • Using a self-hosted translation solution. Weblate did sound ideal until we ran into all the practical problems. Other self-hosted solutions don't even support Joomla INI files. If you think you can manage the translation server yourself, fine, we can give you write access to the translations GitHub repository but keep in mind that there will be no pay and you get a heck of a lot of responsibility.
  • Paying for translations for the specific, popular languages we discussed above. This should stand to reason, being an additional cost to hosting the language files themselves.
  • Manually managing translations. Even discounting the community issues I described, manually managing the translations is a daunting task. We can't afford to commit developer time into it. If you are serious about undertaking this task (about 5 hours every week) get in contact and we can arrange it. Do keep in mind that it's a very serious part-time job which comes with no pay and lots of responsibility.

Regrettably, emails with content like "why don't you use X" will not be replied to unless you explicitly state your experience using it to translate software which uses Joomla! INI files and it doesn't cost an arm and a leg. Thank you, we can type “translation software” in Google and read the results ourselves. Between that and recommendations we ended up with Weblate and the results were explained above. We appreciate your willingness to help but this doesn't help us much. Your experience with cost-effective alternatives, if any, will be of great help though. If you have that experience and can share it with us we will be very happy to discuss it with you.