The Alta Vista Revolution

by Richard Seltzer, B&R Samizdat Express


From Internet-on-a-Disk #15, January/February 1996

Permission is granted to make and distribute verbatim electronic copies of this article for non-commercial purposes provided this permission notice is preserved on all copies. All other rights reserved. To correspond with the author, send email to: seltzer@samizdat.com

NB -- Richard Seltzer (together with Eric and Deb Ray) later wrote a book entitled The AltaVista Search Revolution, published by Osborne/McGraw-Hill in 1997 and re-issued in a second edition in 1999. That book is still in print today (2002).

Please visit our online store at http://store.yahoo.com/samizdat


The Internet is a strange new environment, where you can expect the unexpected and need to be prepared for major changes to occur overnight. It's much more like Oz than Kansas.

Historical examples of such changes include the Netscape Navigator and Yahoo.

The Netscape Navigator is now so commonly used as an Internet browser that it's easy to lose sight of how revolutionary it was when it first appeared around September of 1994. At that time it was virtually impossible to use the Web from home. With a 14.4 modem, the browsers of that time simply timed out before you got anything useful. It was a frustrating experience. I could get everything I wanted using my connection from the office over the corporate network. And there were many users at other corporations and educational and research institutions. But the home market simply didn't exist. To really connect to the Web from home, I would need to get an ISDN line and buy about $1000 worth of hardware to make that work with my PC. Or I'd need some other expensive high-speed solution. Then the Netscape Navigator suddenly appeared. It was six times faster than anything else at the time. It was like getting a six times faster Internet connection for free. All of a sudden, it was easy to use the Web from home, and home use started to grow at a phenomenal rate, opening up a wide variety of business opportunities.

Around that same time -- early fall of 1994 -- the number of Web sites had increased to the point that it was becoming very difficult to find what you wanted when you wanted it. This problem had led to the development of the electronic mall as a business concept. The idea was to host or link many separate Web sites in an organized fashion -- to impose order on the disorder of the Web at large. The idea was to create the ideal "on-ramp" -- to invite people to come into this particular site because here they could easily see -- organized like a mall -- all the kinds of things that they might be interested in; and as in a mall, the visitor could be attracted to wander into this or that other store because of its proximity to the one they were looking for. That looked like a good business model. And then a couple of students from Stanford put together their Yahoo site. This didn't involve any fancy technology, no search engines that would go out and actively check what's on the Web. Basically, their service was just an outgrowth of the lists of interesting sites which they had compiled for their own use. They made it available for others, and made it easy for people to submit info about their own Web sites to be added to the database. All of a sudden it was very easy to find a Web site that you wanted when you wanted it. Yahoo grew at an astronomical rate -- both in terms of the sites it listed and in terms of the number of users. This simple and effective solution made the "electronic mall" obsolete overnight.

Now Alta Vista, a new free service from Digital Equipment (http://www.altavista.digital.com/ ), breaks the mold. It is now possible not just to find a Web site you might be interested in, but to search the full text of nearly every document on the Web and in newsgroups.

That means you can locate a document on a computer in China or anywhere else on the Web probably far quicker than you could locate it on your own hard drive. This makes it easy for you to identify resources and for others to find you. At the same time, this capability changes the whole concept of a Web site. People no longer have to navigate by way of your "home page". You no longer have the ability to control the context and experience of the user who visits your site. Rather people will find the documents they want directly -- diving straight into files you buried deep in subdirectories. Suddenly you have to rethink how you structure and present your information.

We introduced that concept in our last issue, a week before Alta Vista went public. ("Who Controls the Context? -- Search Engines and the Fate of Carefully Constructed Web Sites" http://www.samizdat.com/context.html ). Now it's time to take a closer look at this new capability and its implications.

**DESIGNING FOR ALTA VISTA

If you have a Web site, search for it using Alta Vista. And if you aren't delighted with what you find, go back and redesign your pages, because you can expect that many of the people who might want to visit your site will be using this tool.

First, are the titles of your pages appropriate and useful? If people get to those pages by hyperlinks from your home page, the title is insignificant; hence you might be using some internal shorthand for your own convenience. But the title of each and every page is the first thing that users of Alta Vista will see, and will be an important criterion in their decision to go to your pages or other similar ones.

Second, what are the first three lines on each of those pages? As a default, that's what an Alta Vista user is shown, together with your page titles. Ideally, those lines should provide a clear and crisp picture of the contents of that page, so people interested in that subject matter will know that they want to look at the whole thing.
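One way to check these first two points before the spiders arrive is to look at your own pages the way a hit list would show them. What follows is only a rough Python sketch under stated assumptions: the language, the function name `preview`, and the 70-character line width are all choices made for illustration, and the way Alta Vista actually builds its excerpts is not described here. It simply pulls out the HTML title and the first three lines of visible text.

```python
import textwrap
from html.parser import HTMLParser


class PreviewParser(HTMLParser):
    """Collect the <title> text and the visible text of a page."""

    SKIPPED = {"script", "style"}  # content a reader never sees

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif not self._skip_depth:
            self.text_parts.append(data)


def preview(html, max_lines=3, width=70):
    """Return (title, lines): an approximation of a search-hit excerpt.

    Assumption: a "line" is `width` characters of visible text; the real
    excerpt rules of any search service may differ.
    """
    parser = PreviewParser()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title_parts).split())
    words = " ".join(parser.text_parts).split()
    lines = textwrap.wrap(" ".join(words), width=width)[:max_lines]
    return title, lines


if __name__ == "__main__":
    page = (
        "<html><head><title>pr0196a</title></head>"
        "<body><h1>XYZ Company ships new widget</h1>"
        "<p>West Roxbury, January 1996.</p></body></html>"
    )
    title, lines = preview(page)
    print("TITLE:", title)
    for line in lines:
        print("  ", line)
```

A title like the internal shorthand in the demo page is exactly the kind of thing this makes visible: it is what a stranger would see first.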

Next, can you be found in the ways your potential visitors are likely to search? For instance, you may have all your company's press releases online. But the phrase "press release" probably never appears on any of those pages. In other words, someone searching for "XYZ Company" AND "press release" might find nothing. So simply add that phrase to those pages, and soon (not immediately; after the Alta Vista "spiders" have visited there again) people will be able to find them. Likewise, consider any and all key words and phrases that might logically occur to people looking for the kind of information you provide and add those words to your documents -- perhaps as an extra line at the bottom.

Also, remember that many people will be looking for other people and searching by their full name. If individuals are mentioned on your pages, make sure that the full names appear -- first name followed immediately by last name -- somewhere on those pages. And if the name is a common one, be sure that other terms immediately associated with this particular "John Smith" appear with the name.

If you have a page that is particularly important to you and you want to make sure that people searching for that particular piece of information or product/service will find you, be sure the key words appear in your HTML title and the first couple of lines of text. That's the best way to come out high in the list of search hits.
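The checklist in this section can be turned into a small audit. The Python sketch below is an illustration only: the function name `audit_phrases`, the 200-character definition of "the first couple of lines", and the simple case-insensitive substring match are assumptions, not a description of how any search engine ranks pages. For each phrase a visitor might type, it reports whether the phrase appears in the title, in the opening text, or anywhere at all.

```python
def _normalize(text):
    """Lower-case and collapse whitespace so line breaks don't hide a phrase."""
    return " ".join(text.lower().split())


def audit_phrases(title, body_text, phrases, lead_chars=200):
    """Report where each search phrase appears on a page.

    title      -- the page's HTML title, as plain text
    body_text  -- the page's visible text, as plain text
    phrases    -- phrases a visitor is likely to search for
    lead_chars -- assumed size of "the first couple of lines" (hypothetical)
    """
    title_n = _normalize(title)
    body_n = _normalize(body_text)
    lead_n = body_n[:lead_chars]
    report = {}
    for phrase in phrases:
        p = _normalize(phrase)
        report[phrase] = {
            "in_title": p in title_n,
            "in_lead": p in lead_n,
            "anywhere": p in title_n or p in body_n,
        }
    return report


if __name__ == "__main__":
    result = audit_phrases(
        title="XYZ Company News",
        body_text="XYZ Company today announced a new widget.\nContact: John Smith.",
        phrases=["XYZ Company", "press release", "John Smith"],
    )
    for phrase, found in result.items():
        print(phrase, found)
```

Any phrase that comes back with "anywhere" false is a search your page cannot answer until you add the words and the spiders come by again.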

**USING ALTA VISTA AND FINDING THE LIZARD

I first approached Alta Vista as a researcher using it as a tool to learn more about my own Web site and how I could improve it.

A simple search for link:samizdat.com gave me a list of nearly 150 Web pages that have hypertext links to my site. I had had no idea that so many people valued what I was doing. And it was a pleasure to check out those other sites and see what we have in common. Because I thought it would be of interest to my viewers, I created a page with a list of hyperlinks to all the sites with hyperlinks to my own (simply by saving the source of the pages the Alta Vista search created): http://www.samizdat.com/links.html . And to encourage more sites to create such links, I offered to continually update this list, asking Web masters to contact me directly.

Then I began searching for the topics that are important to me -- subjects I cover at my Web site. I was delighted to discover that not only could I find far more Web pages with Alta Vista than with any other search engine, but, also, a set of simple but powerful commands made it easy to refine my searches and home in on my particular needs.

For example, I searched for "The Lizard of Oz," the title of a book I wrote and self-published 22 years ago and which is now available at my Web site (http://www.samizdat.com/liz1.html ). I discovered that a play with that title will be performed in a town in Pennsylvania this February. At first I thought this might be the children's play which I had adapted from my own book. But inquiring further, I discovered that this was a story adapted from the Wizard of Oz by other writers. I also found another, different play with the same title being performed in Tasmania; and half a dozen works of art, all originating in Australia, which is often referred to as "Oz."

Some writers who would like to make their information freely available over the Internet hesitate to do so because they worry that others might plagiarize their work -- not just make it more widely available, but change the name of the author, or lift chunks and appropriate the work piecemeal in another context. Now, in a matter of seconds, and at no cost, you can search the entire Web not just for titles, but for any chunk of text. And the fact that such detection is so easy should be a powerful deterrent for any would-be plagiarizer.

Try Alta Vista and let your imagination run free. And be sure that your kids will do the same. Don't be so naive as to presume that simple-minded censoring schemes will keep them away from certain kinds of information if they actively want to find such information.

And keep in mind that with a free tool as powerful as this, and with only a beginner's Internet knowledge, you could set yourself up in business finding information of all kinds on the Internet for clients on request.

**WHAT'S NEXT?

I love Alta Vista. It is perfect for today's Internet. But keep in mind that it's designed for the static text-based Web of today. It cannot handle graphics, voice or video. It doesn't tell you anything about the contents of databases connected to the Internet, or information stored on the Internet in forms other than Web pages or newsgroups. It gets nothing from sites that require registration or sites with dynamic interactive applications.

It represents a revolutionary advance. But there will be more revolutions.

We can expect that more and more of the content available over the Internet will be stored in databases. The provider of the information won't have to go to the expense of converting it to .html (the Web format) or restructuring it in the form of Web pages. Rather, in response to queries, the information will be converted on the fly from a variety of formats to .html and possibly other formats as well, for immediate one-time delivery to the user.

And more and more content will include audio and video, sometimes combined with text, and sometimes combined with CD ROMs and telephones and television -- a glorious mind-bending mixture of media far beyond the reach of today's Alta Vista.

So Alta Vista is revolutionary. It is today's answer. But already the question has changed.

It's like Alice in the land through the Looking Glass.

I'm not an engineer. I'm just a writer. And I have no advance knowledge of technology that can do multimedia searches or that could probe multiple heterogeneous databases linked to the Internet. It's hard for me to imagine how a multi-media search could work without the use of keyword tags applied by human observers. And it's hard for me to imagine that the owners of databases would want to allow search engines to probe their guts and pump out their data to index it for retrieval.

But on the other hand, there are a lot of creative engineers out there who want to make a buck and who just love a challenge like that.

My guess is that there will be solutions to both these problems within two years.

And my guess is that the solution will be of the "agent" variety, rather than the massive Alta Vista-style spider/search engine.

In other words, the individual user will launch a program which the user has given a very specific search task -- like playing "Fish" on a mega-scale -- find me all XYZs. And like giving a bloodhound a piece of clothing to sniff, you will give your agent samples of the kinds of things you are looking for -- and these samples might be text, graphics, sounds, video, etc. The agent will then independently poke around everywhere -- might use a search engine like Alta Vista for the text stuff, might log in and register as a user at publicly available database sites (or even log in with a password at database sites which require membership and of which the user is a member), and then come back -- perhaps a day or two later -- with a set of clues and pointers to stuff that's close, or the thing itself, if there's an exact match.

Anyway, such are my speculations.

**NEAT TRICKS

Users are sure to find more applications for this great tool than the designers ever intended. Please send us email (seltzer@samizdat.com ) to tell us about your experiences, and we'll share the best ideas in Internet-on-a-Disk and at our Web site.

Here's one brilliant trick which was recently posted to newsgroups, and which the author gave us permission to include here. Be sure to check out his Web site next time you go surfing: http://www.mcs.net/~jorn/html/hyper.html

**Subject: WWW/Netnews: A nifty trick courtesy Altavista

From: jorn@MCS.COM (Jorn Barger)

Newsgroups: alt.culture.usenet,alt.culture.www, comp.infosystems.www.authoring.misc

Date: 10 Jan 1996 23:35:55 -0600

Digital's phenomenal new WWWeb search engine at <URL:http://www.altavista.digital.com/ > has such sophisticated capabilities that I've been able to add a new link to my home page, offering most of my *news postings* from the last month (but self-updating/ never out of date) via:

<URL:http://www.altavista.digital.com/cgi-bin/query?pg=q&what=news&fmt=&q=from%3Ajorn.mcs>

(You can set up whatever query you like, there, submit it, and clip the URL of the returned page...)

Be sure to read their very clear help pages for more handy tricks, like collecting virtually *all* the pages that link to any one of yours...
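The trick in the posting above works because the whole search is carried in the URL itself, so a saved URL reruns the search each time it is followed. As a rough illustration, the Python sketch below rebuilds that URL from its parts and reads the search terms back out of it. The function names are hypothetical, the meaning of the `pg` and `fmt` fields is not explained in the posting (they are copied as given), and reading `what=news` as "search newsgroups" is an inference from the posting. The code only assembles and parses strings; it does not contact any server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base address copied from the URL in the posting above.
ALTAVISTA_QUERY = "http://www.altavista.digital.com/cgi-bin/query"


def build_query_url(query, what="news"):
    """Build a bookmarkable search URL in the shape shown in the posting.

    The field names and their order (pg, what, fmt, q) are taken from the
    posted URL; no other fields or values are assumed.
    """
    params = [("pg", "q"), ("what", what), ("fmt", ""), ("q", query)]
    return ALTAVISTA_QUERY + "?" + urlencode(params)


def read_query(url):
    """Recover the search terms from a saved query URL."""
    fields = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return fields["q"][0]


if __name__ == "__main__":
    url = build_query_url("from:jorn.mcs")
    print(url)
    print(read_query(url))
```

The colon in the search terms has to be percent-encoded as `%3A` inside a URL, which is why the posted link looks slightly different from what you would type into the search box.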


This site is Published by B&R Samizdat Express, 33 Gould St., West Roxbury, MA 02132. (617) 469-2269. seltzer@samizdat.com
 
 


Buy Richard's book Web Business Bootcamp (published by Wiley) http://www.amazon.com/exec/obidos/ASIN/0471164194/brsamizdatexpres
