i18n and pathauto

The internationalization module (commonly known as the i18n module) and the pathauto module are both great modules, but when combined they do create a minor gotcha.

If you're the type to run statistics on your site, or otherwise look into how people are using your site, and have a fairly standard setup of i18n and pathauto, you'll discover that each node gets a path for each language you're using, regardless of what language the node really is.

First off, it's important to realise that while 'da/some-english-title' just seems wrong, it's really as designed. The language code in the paths isn't the language code of the content, but the requested language.

Why do we need the requested language in the path? It becomes quite clear when looking at pages that actually use the requested language, like the default node listing provided by the node module. The node listing takes the requested language into account, and if so configured, only shows nodes in the requested language.

That in itself doesn't prescribe the use of language prefixes, as the browser provides the user's language preferences, but prefixes do become necessary when we factor search engines into the equation.

If there's no language code in the path, search engines only index the default language, as there's really no way for them to know how many and which languages a given page is available in. (The server could return a 300 "Multiple Choices" response with a document listing the options, but that also requires each language version to have its own URI, which brings us back to something like language prefixes.) So the only way to get a search engine to index multiple language versions of a page is to give it a different URI for each language.

And what goes for overview pages like 'node' really goes for individual node pages too. While the language of the node is pretty distinct, there's no way for Drupal to know whether that's the case for the rest of the page too. You could have (and probably have, if you're using i18n) blocks that change depending on the language, and if you don't ensure that the user visiting your site gets the same language as the search engine that sent them there, you could have a situation where part of what they searched for isn't on the page anymore.

So i18n's default way of doing things makes a lot of sense.

But keeping that in mind, you (like me) might still come to the conclusion that you don't want each node 'duplicated' under every language prefix, and would rather have it exist solely under the right language. That's quite possible too:

The i18n module is smart enough to ignore aliases that already contain a language prefix, so the solution is to alter your pathauto patterns to include the prefix. Pathauto used to come with support for the language codes, but the code has been removed by the maintainer, as he doesn't use the i18n module and can't guarantee that it works. As it really belongs in the i18n module anyway, that makes sense, but at the time of writing, it hasn't made its way into the latest stable release of i18n yet.

Someone has made a patch though, and it's easy to transform that into a module. I've done that and attached the file, with one minor change: when no language is set for the node/term, it will not insert a prefix. This allows for language neutral content to still be available under all language prefixes.

Attachment: i18npathauto-5.x-0.1.tar.gz (601 bytes)

Comments

Great solution

Sounds like a great solution, but didn't you forget to attach the module?

No, I forgot to give

No, I forgot to give anonymous users permission to see attachments. Fixed now.

What does it do exactly?

I've had a confusing time attempting to figure out how to use i18n with pathauto. I installed and enabled this module, and I'm not sure exactly what it does. Can you explain to me in simple terms what the module does and how I should use it?
Thanks,
Alex

It adds tokens for language

It adds tokens for language for pathauto to use, which allows one to circumvent the i18n module a bit.

Before this module, my pathauto for blog nodes was: [yyyy]/[mm]/[dd]/[title-raw], which makes pathauto make an alias like /2008/01/07/i18n-and-pathauto. However, when Drupal creates a link to that alias, i18n will step in, notice that there's no language prefix, and rewrite the path to include the language prefix. So if the current language is Danish, it'll change it to /da/2008/01/07/i18n-and-pathauto, and if the language is English, it'll make it /en/2008/01/07/i18n-and-pathauto.

Which will most likely cause havoc when spiders start trawling your site, as you end up with each node duplicated across two paths.

The links will work fine: when /da/2008/01/07/i18n-and-pathauto is requested, i18n will strip the language and look for an alias named /2008/01/07/i18n-and-pathauto. It will find it and the page will be displayed. But it's still in English.
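To make the lookup concrete, here is a small Python model of it. It is only a sketch of the behaviour described above, not i18n's actual code: the function name, the language list, the alias table and the node id are all made up for the illustration.

```python
# Hypothetical model of the request handling described above.
# None of these names come from the i18n module.
LANGUAGES = {"en", "da"}

# Alias table as pathauto would have built it WITHOUT a language token.
# The node id is invented for the example.
ALIASES = {"2008/01/07/i18n-and-pathauto": "node/42"}


def resolve(request_path):
    """Split off a leading language prefix, then look up the remaining alias."""
    path = request_path.strip("/")
    first, _, rest = path.partition("/")
    if first in LANGUAGES:
        # The prefix only selects the requested language; it is not part
        # of the alias that gets looked up.
        return first, ALIASES.get(rest)
    return None, ALIASES.get(path)


# Both prefixes resolve to the same node, whatever language it is written in.
print(resolve("/da/2008/01/07/i18n-and-pathauto"))
print(resolve("/en/2008/01/07/i18n-and-pathauto"))
```

In this model the prefix never affects which node is found, which is why the English post also turns up under /da.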

Using this module, I've changed the pathauto setting to: [lang]/[yyyy]/[mm]/[dd]/[title-raw]. This will make pathauto generate an alias like /en/2008/01/07/i18n-and-pathauto (note the prefix), and when i18n does its thing, it'll notice that there's already a language prefix, and not change the link no matter what the language. And so the /da/2008/01/07/i18n-and-pathauto will never appear.
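The link rewriting can be modelled in a few lines of Python. Again this is a sketch under assumptions, not the module's code: rewrite_link and the language list are hypothetical names, and the only rule implemented is the one described above (add the current language unless the alias already starts with a known language code).

```python
# Hypothetical model of how outgoing links get their language prefix.
LANGUAGES = ("en", "da")


def rewrite_link(alias, current_language):
    """Prefix the alias with the current language unless it already has a prefix."""
    path = alias.strip("/")
    first_segment = path.split("/", 1)[0]
    if first_segment in LANGUAGES:
        # Alias already carries a language prefix: leave it alone.
        return "/" + path
    return "/" + current_language + "/" + path


# Old pattern: the link follows whatever language the visitor is browsing in.
print(rewrite_link("/2008/01/07/i18n-and-pathauto", "da"))
# New pattern: the alias keeps its own prefix regardless of the current language.
print(rewrite_link("/en/2008/01/07/i18n-and-pathauto", "da"))
```

With the prefixed alias, the second call yields the same link for every visitor, so only one path per node is ever exposed to spiders.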

If there's no language set for a node, the [lang] token will be an empty string, which will be stripped by pathauto, with the result that language neutral nodes won't get a language prefix alias, and fall back to the default behaviour, being available under all prefixes.
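The effect of the empty token can be shown with a toy pattern expander in Python. This is an assumption-laden sketch, not pathauto's implementation: the expand function and its regular expression are invented here, and "stripping" is modelled simply as dropping empty path segments.

```python
import re


def expand(pattern, tokens):
    """Replace [token] placeholders, then drop any empty path segments."""
    filled = re.sub(
        r"\[([a-z-]+)\]",
        lambda match: tokens.get(match.group(1), ""),
        pattern,
    )
    # An empty [lang] leaves a leading slash behind; removing empty
    # segments models pathauto stripping it.
    return "/".join(segment for segment in filled.split("/") if segment)


pattern = "[lang]/[yyyy]/[mm]/[dd]/[title-raw]"
date_and_title = {"yyyy": "2008", "mm": "01", "dd": "07", "title-raw": "i18n-and-pathauto"}

# Node with a language: the alias gets the prefix.
print(expand(pattern, {"lang": "en", **date_and_title}))
# Language neutral node: no prefix, so it stays reachable under all prefixes.
print(expand(pattern, {"lang": "", **date_and_title}))
```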

Many thanks

Thank you for the thorough explanation. This is going to help relieve some of my i18n woes (and will definitely make things less confusing for my client).

What about Drupal 6?

Thanks for your interesting post.
I am actually working on a multilingual website with Drupal 6.13, i18n and pathauto. I noticed that the Token module now gives us the token [language] for the current node.

I tried the following in pathauto:
[language]/blog/[yyyy]/[mm]/[dd]/[title-raw]

You said that i18n is able to notice that the URL alias already contains the language. Was that with Drupal 5? Did you have to do something special in the settings?

On my installation, I get:
www.mysite.com/en/en/blog/2009/07/29/my-page-title

I really don't know what to do to solve this problem!

Has anybody got a tip?