Chapter 7: Making your site global -- taking advantage of free translation at AltaVista

by Richard Seltzer, seltzer@samizdat.com, www.samizdat.com

Copyright ©1998 Richard Seltzer


This is the seventh chapter of a book entitled The Social Web. Permission is granted to make and distribute complete verbatim electronic copies of this item for non-commercial purposes provided the copyright information and this permission notice are preserved on all copies. All other rights reserved. To correspond with the author, send email to seltzer@samizdat.com. Comments welcome.

My Internet: a Personal View of Internet Business Opportunities by Richard Seltzer, on CD, includes four books, 162 articles, and 49 newsletter issues that will inspire you and provide the practical information you need to build your own personal Web site or Internet-based business, helping you to become a player in this new business environment.



Update: Jan. 2001. AltaVista just added translation from English to Japanese, Korean, and Chinese, and from each of those languages to English. Apparently (at least with the settings on my PC), you can't copy and paste Japanese, Korean, or Chinese text into the translation box. But you can paste English into the box to translate into those languages, and you can enter the URL of a Web page in one of those languages to get an English translation. (Given my level of ignorance of Asian languages, when I stumble upon or am pointed to a site with unfamiliar characters, I connect to Babelfish, enter the URL, and test to see whether it is Japanese, Chinese, or Korean.)

The Internet is a global communications network; but, until now, its potential as a global business environment has been limited by language barriers.

In the early days, before the Internet became open to business, English dominated. Many users were technical people, who, regardless of their native language, had grown accustomed to reading computer manuals in English.

Then, with the commercial expansion of the Internet, tens of millions of non-technical people from all walks of life connected. Many of these newcomers do not understand English, are uncomfortable using it, or as a matter of cultural pride would prefer to be addressed in their native tongue. As a result, businesses of all kinds are rushing to cater to the needs of this growing audience -- providing local-language content, along with selling services and products on-line. This means that an increasing proportion of the content on the Internet is difficult for English-speaking people to decipher. It also means that an increasing number of non-English-speaking Web users cannot understand the content of Web pages written in English. The Web has therefore been fragmenting into local-language entities, and costs have been escalating for companies that want to reach a global audience: they have to have their pages translated into multiple languages, and they have to update those translated versions every time they change the content of a page.

The AltaVista Search site (www.altavista.com) adapted to this multi-lingual environment by partnering with companies around the world to set up mirror sites, which provide local content, instructions, and help in local languages, and which have the server located in the target country for fast response time. Currently, they have mirror sites in Australia, Denmark, France, Germany, India, Italy, the Netherlands, Sweden, and the United Kingdom. AltaVista also lets users limit their searches to pages in a particular country (with the command domain: followed by the country identifier, e.g., domain:de for Germany) and lets them limit searches to pages in particular languages using a pull-down menu on the search form (Chinese, Czech, Danish, Dutch, Estonian, Finnish, French, German, Greek, Hebrew, Hungarian, Icelandic, Italian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, and Swedish). This approach helped global companies with multi-lingual capabilities to navigate through the increasingly language-fragmented Web, but ordinary users and small companies found themselves trapped on their respective language islands.

Using the language menu to pick a particular language, and performing a search in Advanced Search for *, you can see that the AltaVista index includes over 10 million pages in German, 7 million in French, another 7 million in Spanish, 4 million in Italian, and 3 million in Russian. Yes, English still predominates, with over 82 million pages, but thousands of non-English pages are added every day -- including pages with information that could be valuable to you if you could both find it and make sense of it.

Yes, automatic translation software is available for some languages, but it can be awkward cutting-and-pasting Web page text into the software and then bouncing back to the Web to follow links and explore further. You lose the convenience and speed of information access that makes the Web so compelling.

For business communications, automatic translation can provide a good first approximation, and is very useful. But it is not good enough for Web site owners to rely on it as a means of providing their content in multiple languages. If the translation sits on their site, it reflects on their brand; and they need to go out of their way to avoid the glaring gaffes and cultural blunders that automation often generates. That means that they have to pay high prices and cope with long delay times for human experts to polish and correct the output of automated translation. And they also have to provide added disk space for the translated text and develop internal procedures to help manage the increased complexity of a multi-lingual Web site.

But even if individual users had their own translation software and even if many businesses could afford to maintain multi-lingual sites, language would still be a barrier to search. With all the millions of pages out there, how could you find those pages that have information that is potentially valuable to you but that are only available in non-English languages? And how could non-English speaking people find pages at English-language sites, even if those sites could (and would want to) provide automatic translations on the fly?

Amazingly, a free service at the AltaVista Search site goes a long way toward providing practical solutions for these problems, helping to make the Web truly global. This is one of those developments -- like the initial launch of AltaVista -- that change the direction of the Web. The availability of free instantaneous translation helps break down language barriers, opening new markets and business opportunities and fostering international understanding.


AltaVista translation from a user perspective

At Babelfish (http://babelfish.altavista.com), you can enter a URL, or type or cut-and-paste any text into the box, and you can translate from English to French, German, Italian, Spanish, or Portuguese; or from any of those languages to English; also French to German, German to French, and Russian to English. Unless the server is extraordinarily busy, you get the results almost immediately. And unless the text is idiomatic or laden with slang, you are likely to get remarkably good translations.

This service uses automated translation software from Systran. Hence it has the strengths and the limitations of automated translation. Don't expect perfection. Don't expect it to understand and correct misspellings and grammar. Don't expect artistic and colloquial translations of poetry and rap lyrics. Do expect quick and useful renderings of business-related information.

You can get some laughs from checking how it handles tricky figures of speech that would challenge a UN interpreter, or by translating from English to another language and back multiple times and watching the ever-increasing distortions of meaning. Or you can use this as an aid to help you smoothly navigate through foreign pages.

Because the software runs on Digital's powerful Alpha computer systems at the AltaVista site, you get very fast results (though response-time may slow somewhat when tens of thousands of users make requests at the same moment).

And the translation service is intimately tied into the AltaVista search service, making translation part of your normal Web-navigating experience. Whenever you do a search, matches in your results list that are in any of the languages now covered come with a "translate" link. Clicking on that takes you to a page where you select the language you want to translate it to. Clicking on "translate" again then provides you with the page itself -- with all its graphic look and feel, including all its hyperlinks -- with the text in the language of your choice. From there you can continue to explore as you normally do in the Web environment.

This development seems natural for AltaVista, which is based on a massive, language-independent index. Search engines that are built around the syntax of any particular language lock themselves out of the rest of the world. But AltaVista understands nothing about any language. It just captures all the text it finds and treats it all equally. (Within a couple of weeks of when the original AltaVista Search site went on-line, the developers got email from people in Korea who had typed in queries using their Korean keyboards and had gotten good results pointing to Korean pages.)

Just recently (October 2000), they added a "world keyboard" to this service. Just click on the keyboard icon, and a virtual keyboard appears in a Java window and you get a new, associated translation form. The keyboard has a dropdown menu of language choices: world (English characters plus a few special characters and accented letters used in other European languages), English (useful for people whose keyboards are already set to non-English languages), French (for translating French to English and French to German), German (for German to English and German to French), Spanish (for Spanish to English), Italian (for Italian to English), Portuguese (for Portuguese to English), and Russian (for Russian to English). When you select a language, the characters on the keyboard change accordingly. Then you can type in the translation box by clicking on the keys of the virtual keyboard or by hitting the equivalent keys on your real keyboard; either way the character you see in the translation box will be a character from the language you have chosen.

This innovation makes it much easier to enter foreign text for translation. In the past, you had to have your PC's keyboard set to enter characters from that language, or had to cut and paste into the translation box text that already had the appropriate non-English characters, or you had to enter English "equivalents" of foreign characters, which sometimes led to ambiguity or confusion.

Unfortunately, the words that you type with the virtual keyboard are only of use in that search box. This is a Java application, which makes it impossible to copy and paste the text from the box to anywhere else. (This is unlike the main Babelfish translation page, where you can copy text -- complete with accents and non-English characters -- from the results box to any other document on your PC.)


Limitations

There are limitations.

Today the service only provides translations between English and five European languages, from Russian to English, French to German, and German to French. It doesn't handle other language pairs (like English to Russian) and doesn't include such major languages as Arabic, Japanese, and Chinese.

Also, because of performance issues, the size of the text it will translate is limited to 5 Kbytes, which is about 800 words or two double-spaced typed pages. If a document is longer than that limit, only the beginning will be translated; then you will encounter the words "TRANSLATION ENDS HERE" and the balance of the target Web page will appear in the original language. If the balance of such a large document is important to you, you can cut-and-paste additional chunks of text from the original into the form at AltaVista's translation page, one piece at a time, by hand. Admittedly, that's awkward, but it can solve your immediate problem, and it prevents one person's "need" to translate an entire book from slowing performance for millions of other people with less demanding requirements.
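If you have the text in a file, the cutting into pieces can be done mechanically before you start pasting. Here is a minimal Python sketch of that idea. The function name is mine, and it assumes that "5 Kbytes" means 5 x 1024 bytes and that the text is measured as UTF-8; neither detail comes from AltaVista, so treat both as adjustable guesses.

```python
def split_for_translation(text, limit_bytes=5 * 1024, encoding="utf-8"):
    """Split text into pieces that each stay within limit_bytes when encoded.

    Pieces break only between words. Runs of whitespace (including
    paragraph breaks) are collapsed to single spaces.
    limit_bytes and encoding are assumptions, not documented values.
    """
    chunks = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate.encode(encoding)) <= limit_bytes:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single word longer than the limit is kept whole.
            current = word
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "word " * 3000
    for number, piece in enumerate(split_for_translation(sample), start=1):
        print(number, len(piece.encode("utf-8")), "bytes")
```

Each piece can then be pasted into the translation form in turn, in the same order as the original.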

Also, keep in mind that this service only translates plain text. Words embedded in graphics remain unchanged. And words that appear in Java applets or inside frames or inside databases do not get translated when you submit a URL for translation. And if you submit for translation a URL that is behind a firewall or on the other side of a password-protected registration page, AltaVista won't be able to fetch and translate the text. But you can cut-and-paste text from an applet or from a database query or from inside a frame or from a page on your company's intranet. In fact, you can cut-and-paste text from any source at all -- from newsgroups or forums or chat sessions or your email or your own personal files that reside on your personal computer. Or you can simply type in whatever text you like.


What can you do with this?

Here are a few suggestions:

o multi-lingual email correspondence -- Type your messages in the form at the AltaVista translation page. Then cut-and-paste the translated text into the email you are sending. And when you receive messages in a language that you don't understand but that AltaVista can handle, cut-and-paste the email into that same form.

o newsgroup, forum, and chat participation -- Read and submit to newsgroups, forums, and even chat in foreign languages, once again by cutting-and-pasting text into the translation form. For convenience, you might want to keep the AltaVista translation page open in a separate window.

o travel -- Check local-language Web pages from places where you intend to travel, learning about accommodations and entertainment and events.

o news -- Read local-language news stories on the Web, getting a foreign perspective on events, and perhaps greater detail than that offered by global news sources.

o games -- Play on-line strategy games (like Diplomacy), with participants all over the world, who do not have a common language, but who can use the translation capability at AltaVista to go beyond that barrier.

o research -- If you suspect that information that you need (for business, school, or whatever) is available in another language, enter your query words in the translation form; then cut-and-paste the translated text into the query box (adding English language command words, e.g., link: or host:, if needed). The translated words will have the appropriate accent marks, even if you are unable to generate those accents with your keyboard and software. In addition, you could limit your search to the target language using language tags.

o language study -- If you are reading a book in a foreign language, you might want to keep the AltaVista translation form on-line as you do so and type in unfamiliar words, as an alternative to looking them up in a dictionary. You might also benefit from experimenting with automatic translation back and forth to and from the language you are studying, probing to find the limitations, where human knowledge and experience are essential for understanding. Those are the aspects of language that you should focus your efforts on. Automatic translation will gradually transform language study, just as the ready availability of calculators transformed the study of mathematics.

o distance education -- Already today, over 5 million people a year take courses at a distance, and many of those are delivered over the Internet. Many of these people reside outside the US and are taking courses at US institutions. The ability to rapidly and readily translate messages for email and forums should make it easier for students who are not fluent in English to actively participate in courses delivered in English, and for English-speaking students to participate in courses delivered in foreign languages as well.

o K-12 education -- Arrange partnerships with schools and classes in other countries, using the AltaVista translation capability to break through language barriers, so kids with no language in common can carry on dialogues with one another through email and forums. This could be part of social studies programs intended to foster international understanding. It also could be part of after-school club activity or built into model UN exercises.

o bilingual education and English as a second language -- This service could prove important to non-English-speaking students in predominantly English-speaking schools and to the teachers and administrators who serve them.
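For the research suggestion in the list above, the query you paste into the search box can be assembled mechanically from the translated words plus the English command words. This is a small Python sketch; the function name is hypothetical, the translated words are whatever you copied from the translation form, and putting multi-word phrases in double quotes is my assumption about how the search form treats phrases.

```python
def build_query(translated_terms, country=None, host=None):
    """Combine translated search words with domain: and host: command words.

    translated_terms: words or phrases copied from the translation form.
    country: country identifier for the domain: command, e.g. "de".
    host: host name for the host: command.
    """
    parts = []
    for term in translated_terms:
        term = " ".join(term.split())
        if not term:
            continue
        # Assumption: a phrase is kept together by double quotes.
        parts.append('"%s"' % term if " " in term else term)
    if country:
        parts.append("domain:" + country)
    if host:
        parts.append("host:" + host)
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query(["nouvelles", "magasin"], country="fr"))
```

The command words stay in English no matter which language the search words are in.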


Making your Web site available in multiple languages

Today, some Web sites are required by law or by the charters of their organizations to provide all their content in more than one language. For instance, this is true of government sites in Canada and, in many instances, of the vast and complex realm of the United Nations. In other cases, while not officially required, multi-lingual capability is highly desirable, both from a practical standpoint -- potentially opening new markets for businesses -- and as a matter of respect for the culture and heritage of people in the target audience.

So how can companies and organizations take advantage of the AltaVista translation capability, getting maximum benefit at minimum expense?

First, make sure that your pages are in a format that can be translated. If much of your content is plain text, then you are in good shape. But if you are using sophisticated techniques that create pages dynamically on the fly or are using frames or the text is generated from databases or appears in Java applets, then you have locked yourself out from taking full advantage of this new capability. Perhaps you should consider creating a plain text version of your pages that will be translatable (and also be indexable by search engines like AltaVista).
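A rough first check of a page can be automated. The Python sketch below uses the standard html.parser module to look for frame and applet tags and to measure how much plain text a page contains. The class and function names are mine, the 5 x 1024 byte limit is my reading of "5 Kbytes", and counting the title as part of the text is a guess; it cannot tell whether a page was generated from a database.

```python
from html.parser import HTMLParser


class TranslatabilityCheck(HTMLParser):
    """Collect plain text and note tags that hide text from translation."""

    # Tags for frames and Java applets (iframe is included as a kind of frame).
    BLOCKERS = {"frame", "frameset", "iframe", "applet"}
    # Content of these tags is not readable text.
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.blockers = []
        self.text_parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCKERS:
            self.blockers.append(tag)
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def check_page(html, limit_bytes=5 * 1024):
    """Return the blocking tags found, the text size, and whether it fits."""
    parser = TranslatabilityCheck()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts)
    size = len(text.encode("utf-8"))
    return {
        "blockers": sorted(set(parser.blockers)),
        "text_bytes": size,
        "fits_limit": size <= limit_bytes,
    }


if __name__ == "__main__":
    page = "<html><body><p>Hello world</p><applet code='A.class'></applet></body></html>"
    print(check_page(page))
```

An empty "blockers" list and a true "fits_limit" suggest the page is a reasonable candidate for translation by URL; anything else is a hint to look more closely.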

If your pages do have translatable text, you could use AltaVista to translate them and save the resulting pages, even large pages created by cutting-and-pasting chunks, at your site; then offer visitors the choice of which language they would like to see. But in that case, you are vulnerable to the vagaries of automatic translation, and a horrendous blunder caused by the inability of the software to understand a colloquial phrase might damage your company's reputation among the very people you are trying to open your site to.

Also, in that case, you take on a significant maintenance burden -- having to change your translated pages every time you change the originals; and additional overhead in terms of disk space and Web site complexity.

But the underlying technology of AltaVista makes possible a very interesting alternative. Every search at AltaVista generates a unique URL, which a user can bookmark and an information provider can cut-and-paste into Web pages, making hyperlinks that automatically generate particular AltaVista searches, providing up-to-date results whenever you want them. That same capability applies to the translation service.

If you do an AltaVista search which yields a particular page in the match list and then click on the word "translate" next to that match, you arrive at the AltaVista translation page with the URL of the target page already in the form. Check the "location" of that translation page -- it is not just http://babelfish.altavista.digital.com. Rather, you see a unique URL that encodes the contents of the translation form -- with the URL you are interested in already entered. You can bookmark that page and get back to it whenever you want. And you can cut-and-paste that massive, complex URL and make a hyperlink from your own Web page to there.

In other words, if you have a Web page with less than 5 Kbytes of text -- small enough so you can feel confident that in most circumstances AltaVista will translate the whole thing -- you can make a hyperlink from that page that will take a visitor to AltaVista's translation page with the URL of your page already in the box. So you could have a link at the top of your page that tells visitors (perhaps in more than one language), "for a rough translation of this page, click here." Once at the AltaVista translation page, the visitor then chooses the target language and gets the translation, created on-the-fly, at no cost and no hassle to you. A simple explanation at your site can set user expectations appropriately. You are not responsible for the quality of the translation. You are providing this link as a convenience.
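If you have many pages, you may prefer to generate these links rather than copy each one by hand. The Python sketch below shows the general shape. The endpoint and the parameter name used in the example are placeholders, not the real AltaVista values: copy the actual address and parameter from the "location" your browser shows on the translation page.

```python
from html import escape
from urllib.parse import urlencode


def translation_link(page_url, endpoint, url_param,
                     label="For a rough translation of this page, click here."):
    """Build an HTML hyperlink to a translation form with page_url filled in.

    endpoint and url_param must be copied from the real translation page;
    this function does not know them.
    """
    # urlencode escapes the page URL so it survives inside another URL.
    href = endpoint + "?" + urlencode({url_param: page_url})
    return '<A HREF="%s">%s</A>' % (escape(href, quote=True), escape(label))


if __name__ == "__main__":
    # Placeholder endpoint and parameter name, for illustration only.
    print(translation_link("http://www.samizdat.com/",
                           endpoint="http://translate.example/form",
                           url_param="url"))
```

The label is an ordinary argument, so the same function can produce the link text in each language you want to offer.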

If your pages are in English, this technique would open your site to visitors who do not understand English, but do read French, German, Spanish, Italian, or Portuguese.

But how will those people find you in the first place? They'll never translate your pages if they don't know you exist.

Once again, a practical solution is readily available. AltaVista recognizes "key word" metatags. If you really want to open your site to foreign visitors, make those keywords foreign words. Use keyword metatags on all your pages to provide translations of the words and phrases that potential visitors to your site are likely to search for. A keyword metatag goes in the header of your document thus:

<HTML>
<HEAD>
<TITLE>Blankety blank Web Site</TITLE>
<META name="keywords" content="nouvelles magasin">
</HEAD>
<BODY>
etc.

First decide what words and phrases are most important. Then use AltaVista to translate them into the target languages and cut-and-paste the translation results into your keyword metatag (with all the accents). Once you have completed your page, go to AltaVista, click on ADD/REMOVE URL, and enter the URL of that particular page. The new text for that page, including the foreign words that you just entered in your keyword metatag, should be in the AltaVista index in a day or two. Someone searching for those foreign words will then be able to find your pages and, using your translation link to AltaVista, will be able to read the complete text in their native language.
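Once you have collected the translated words, the metatag itself can be produced by a few lines of code. This Python sketch is one way to do it; the function name is mine, and separating the words with spaces simply follows the example tag shown earlier (the separator is a parameter in case you prefer commas).

```python
from html import escape


def keywords_meta(phrases, separator=" "):
    """Build a keywords META tag from translated words and phrases.

    phrases: text copied from the translation results, accents included.
    Duplicates (ignoring case) and empty entries are dropped.
    """
    seen = set()
    unique = []
    for phrase in phrases:
        cleaned = " ".join(phrase.split())
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    content = separator.join(unique)
    # escape() protects quotes and ampersands; accented letters pass through.
    return '<META name="keywords" content="%s">' % escape(content, quote=True)


if __name__ == "__main__":
    print(keywords_meta(["nouvelles", "magasin"]))
```

The resulting line goes in the header of the page, between the TITLE and the closing HEAD tag.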

In other words, with minimal effort, you can go a long way toward making your Web site and your Web-based business truly global.


Keep it simple

You also could, as described above, create hyperlinks that take visitors to the AltaVista translation page with the URL for one of your pages already entered in the translation form. But the typical Web user would be mystified if suddenly transported to that translation page without some explanation. And the translation service will handle only a limited amount of text (about 5 KB when traffic is heavy), leaving the visitor with a Web page that's only partially translated.

So instead of using that fancy approach, I decided to post a clear and simple explanation of how users can take advantage of the translation service--that empowers them to get the translations they need, while leaving the responsibility in their hands. I then used the translation service to create versions of that document in French, German, Italian, and Portuguese. At the top of my home page (www.samizdat.com), I now have:

These phrases (with the appropriate accent marks, all captured and cut and pasted from the translation service) connect with hyperlinks to the matching documents. Over time, I plan to add those same words and links to all the pages where automatic translation would be helpful (not including, for instance, documents that consist of poetry and lists of titles of books). Here is the full text of the English version of that explanation, which you're welcome to use at your own site. You could also simply link to any or all of the translation pages listed above.

"How to translate into French, German, Spanish, Portuguese, Italian, Chinese, Japanese, or Korean

"To translate foreign language text, first connect to AltaVista search's automatic translation service. In a separate window, connect to the page you want to translate (a Web page, a word processing document, an email message, or a newsgroup item).

"On the target page, click the left mouse button in the left margin beside the starting point in the text, and drag your cursor down over a couple of paragraphs (about a third of a typed page) to select them. Then, in the toolbar, click EDIT, then COPY to save the selected text to your Clipboard.

"Next, bring up the translation page. Position your cursor over the translation form. Click the right mouse button and then PASTE. The selected text should now appear in the form. Below the form, click on the down arrow to select the language pair you want (such as English to French). Then click on TRANSLATE. The translated text should appear in a second or two.

"To save the translated text in a file, click and drag (as above) to select the text; and click EDIT and COPY to place the text on your Clipboard. Then open a document in your word processor and paste the text. Return to the original document and select the next piece of text. Return to the translation page, click NEW TRANSLATION, paste the text in the new form, and proceed as before. Keep doing this as many times as necessary to translate and save the entire text.

"The results should be useful, but they'll be far from perfect. (If you're reading this text in a language other than English, you can judge for yourself how good or bad it is.)"


The rest of The Social Web by Richard Seltzer

Related article Expanded translation capabilities at AltaVista's Babelfish site


Web Business Boot Camp: Hands-on Internet lessons for managers, entrepreneurs, and professionals by Richard Seltzer (Wiley, 2002). No-nonsense guide targets activities that anyone can perform to achieve online business success.

This site is Published by B&R Samizdat Express, 33 Gould St., West Roxbury, MA 02132. (617) 469-2269. seltzer@samizdat.com

