Internet Search Articles by Richard Seltzer, from Cobb Newsletters

Articles which appeared in Internet Search Advantage (formerly known as Power Searching with AltaVista), a monthly newsletter published by The Cobb Group, a division of Ziff-Davis. Reprinted with permission from Internet Search Advantage, ZD Journals. http://www.zdjournals.com

Finding and Being Found for Jobs

The Internet is more than just an electronic mega-library. It's a way of connecting people. AltaVista Search is important not only for finding nuggets of information but also for helping people find one another. It can be especially useful in job searches. Recruiters and headhunters who have positions to fill want to find qualified and interested applicants and, at the same time, want to be found by those applicants. Meanwhile, job seekers want to find openings that match their skills and aspirations and want to be found by recruiters and headhunters who have such positions. Both groups can benefit from using AltaVista Search and from posting their information on the Web in a way that maximizes their prospects.

Why AltaVista Search?

If you're looking for jobs or job applicants, your first thought might be to go to a Web site devoted to jobs. But, according to Electronic Recruiting News, there are already more than 3,500 job-related Web sites, and the number is growing rapidly. That raises some hard questions: Where do I start? How many of these sites do I visit? Which ones are most likely to get me the results I want? If you hope AltaVista Search will let you look through all those sites at once, you're out of luck. Job-related Web sites typically store their information in databases, even when that information is just text, like resumes and job postings. To get at it, you must learn the procedures and the query language of each site, fill in forms, and enter query after query. And because the information is locked up in databases rather than presented on plain Web pages, it's inaccessible to Web crawlers; hence, AltaVista Search and other search engines never index it. Yet AltaVista Search is still useful when it comes to finding jobs.

Actually, I discovered the job-matching power of AltaVista Search by accident. My wife was recently looking for a job. Newspaper ads and job-focused Web sites got her nowhere. Then she got a call from a headhunter whose AltaVista search found her resume at our little Web site. Within a couple of weeks, my wife had the job--thanks to a plain-text resume posted in free Web space that we get with our Internet-access account.

In fact, it would be far easier for headhunters to use AltaVista Search to find the people they want than to go from job site to job site, coping with the different formats and search procedures at each one. Today, more than 300,000 resumes are posted on the Web as plain HTML documents and indexed by AltaVista Search. That means anyone in the world who uses AltaVista Search to look for someone with certain credentials and skills is likely to find useful matches very quickly.

Finding Job Candidates and Resumes

If you're a recruiter or a headhunter, there are several ways you can use AltaVista Search to find job candidates, depending on your needs and your personal style. First, go to the Simple Search page and enter resume. When you click the Submit button, AltaVista Search will return about 900,000 matches. Some are resumes, some are resume-preparation services, and some are just random pages on which the word resume happens to appear (including pages soliciting job applicants and asking them to submit their resumes). Next, click one of the LiveTopics links (Way-Cool Topics Map, Tables, or Text-Only if your browser doesn't support Java). You'll see the top 20 categories of pages where the word resume appears, together with many subcategories. As you can see in Figure A, you can scan the lists and select or deselect subcategories to focus on what you're looking for. For instance, you can choose engineer, analyst, consultant, or pascal as a term that you definitely want to appear in the resumes you'll look at.

Having made those preliminary choices, resubmit your query; then try LiveTopics again to see the next level of categories and subcategories. Continue to refine your search, letting LiveTopics prompt you and guide your search through the maze until you arrive at a manageable number of matches. Then begin to look at the Web pages themselves.

If you prefer, you can go straight to the Advanced Search page and enter resume in the Selection Criteria field. In the Results Ranking Criteria field, enter all the words that are relevant to the experience and qualifications you're looking for.

For instance, if I needed to hire a Webmaster, I would enter the words webmaster html perl java in the Results Ranking Criteria field. And if I wanted someone local who wouldn't have to relocate, I would add the words massachusetts ma mass for all the variants of the state name.

In this case, I get 80,000 matches. But the ones that have all or most of the ranking terms I selected will appear at the top of the list. So I don't worry about the number of matches. Rather, I scan the descriptions in the list, which helps me do a quick rough sort between pages with real resumes of job seekers and pages advertising jobs of the very kind I'm trying to fill. I check the likely Web pages starting from the top. And if I'm not seeing exactly what I want, I fine-tune my search by adding to or subtracting from the words in the Results Ranking Criteria field and resubmitting the search.
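To make the ranking idea concrete, here is a minimal Python sketch of how a Selection Criteria filter plus a count of matching ranking terms pushes the best-matching pages to the top. It is a hypothetical model for illustration only, not AltaVista Search's actual ranking algorithm; the function names, page URLs, and page texts are made-up placeholders.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase the text and return the set of alphanumeric words in it."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def rank_pages(pages: dict[str, str], selection: str,
               ranking_terms: list[str]) -> list[tuple[str, int]]:
    """Hypothetical model: keep only pages containing the selection word,
    then score each one by how many ranking terms it contains."""
    scored = []
    for url, text in pages.items():
        words = tokenize(text)
        if selection.lower() not in words:
            continue  # page fails the Selection Criteria filter
        score = sum(1 for term in ranking_terms if term.lower() in words)
        scored.append((url, score))
    # Most ranking terms first; ties broken alphabetically by URL.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored

# Made-up example pages (placeholder URLs).
pages = {
    "a.example/resume": "Resume: webmaster skilled in HTML, Perl and Java. Boston, MA.",
    "b.example/resume": "Resume: Java developer, Texas.",
    "c.example/jobs": "We are hiring a webmaster. Send your resume.",
    "d.example/about": "Webmaster HTML Perl Java Massachusetts",  # no "resume"
}
terms = ["webmaster", "html", "perl", "java", "massachusetts", "ma", "mass"]

for url, score in rank_pages(pages, "resume", terms):
    print(score, url)
```

The resume at a.example scores 5 and lands on top, while the job ad at c.example/jobs still scores 1. That is why a quick scan of the descriptions is still needed to separate real resumes from postings for the very job you're trying to fill.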

Finding Job Openings

If you're looking for a new job, you follow a procedure similar to the one I described for finding job candidates. You can use several approaches to produce valuable results. Perhaps you want to get an overview of the jobs that are open and the credentials that employers are looking for. In this case, you don't want to start with a job title or category. That could be too limiting and could prevent you from seeing the very opportunity that would be perfect for you.

As I noted above, to search for applicants, I'd go straight to the Advanced Search page and try to find the resumes that match my criteria. By contrast, to search for a job for myself, I'd cast a very wide net to find information. I'd try to avoid guessing what kinds of jobs are available and which ones might be good for me. I'd take my time and make full use of LiveTopics to get a broad view of the market and gradually narrow my search, learning all I could about the jobs marketplace in the process.

I'd start with a simple search and enter job career, which yields about a million matches. Many of these are pages devoted to job- and career-related services and resources, which is the kind of information I expect to find at this stage. I may well want to explore these matches, or even refine my search so that I get only those pages; I might look for the jobs themselves only after I feel sufficiently well informed.

Next, I'd go to LiveTopics. There, I might want to focus on one of the top 20 categories--for instance, internships (if I were in college and looking for summer or term-time employment). Or I might know very well what I want to do. In that case, I'd go to the Advanced Search page and enter all the relevant terms. For instance, I'm primarily a writer and want to be involved in writing about the Internet. Also, I think of myself as an "Internet evangelist," and I'm particularly interested in companies that are savvy enough to use that term in a job posting. So I'd enter job OR career in the Selection Criteria field and writ* internet evangelist in the Results Ranking Criteria field.

This search yields 2,000 matches, most of which use the word evangelist not in its religious sense but in the sense of someone who enthusiastically embraces new technology and helps spread the word about the Internet, and which point to potential jobs in this field.

Keep in mind that the vast majority of jobs posted on Web pages today are technical. And, of course, there's no guarantee that what's out there will meet your very particular needs. But the demographics of Web users and the variety and range of companies that actively use the Web are both changing rapidly; so visit the Web periodically and check to see if what's out there suits you.

If you have a clear idea of what you're looking for and have constructed a well-targeted search that should take you to only those openings you'd be interested in, but you still haven't found the job you want, bookmark your search results. Then periodically click on that bookmark to connect to AltaVista Search, launch the same search, and get updated results. If you're using an advanced search, you can also enter a start date and an end date (under the Results Ranking Criteria field) to see just the matches you haven't seen before. (Actually, you only need to enter the start date. The default end date is the current date.)
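The start-date/end-date logic is simple enough to sketch. The Python below is a hypothetical illustration: it assumes each saved result carries a single date (the article doesn't say which date AltaVista Search used), and the function name, URLs, and dates are invented.

```python
from datetime import date
from typing import Optional

def matches_in_range(results: list[tuple[str, date]], start: date,
                     end: Optional[date] = None) -> list[str]:
    """Return URLs whose date falls between start and end, inclusive.
    As with the Advanced Search dates, end defaults to today."""
    if end is None:
        end = date.today()
    return [url for url, page_date in results if start <= page_date <= end]

# Hypothetical saved results: (URL, date) pairs.
results = [
    ("a.example/job1", date(1997, 1, 10)),
    ("b.example/job2", date(1997, 3, 2)),
    ("c.example/job3", date(1997, 5, 20)),
]
print(matches_in_range(results, start=date(1997, 3, 1)))
```

Supplying only the start date shows everything new since your last visit, which is exactly the case the default end date is designed for.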

Posting "Findable" Resumes and Job Openings

Being found by the right job applicants or recruiters is even better than finding them, because you have a significant advantage: you don't have to get their attention; you already have it. To increase the likelihood of being found, take note of the following suggestions. Create a Web page that clearly states the most important facts about yourself, and put the most important "fact" words in the HTML title and in the first few lines of text. Put yourself in the shoes of the ideal person from whom you want to hear, and make sure you use the words that such a person is likely to search for. Don't do anything fancy. AltaVista Search indexes plain text; graphics are irrelevant to being found. As noted earlier, AltaVista Search can't index information stored in databases, so don't put your job postings in a database. Also, don't put important information in a frame, since AltaVista Search doesn't index information inside frames. In addition, AltaVista Search doesn't index pages that require visitors to register or pages that can be reached only by filling out a form. Keep your job-related pages simple, and you'll be in great shape. Lastly, go to the bottom of the AltaVista Search home page and click ADD URL. Enter the URL for each new page you create (one at a time), and your information should be in the index by the next day.
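As a sketch of these suggestions, the Python function below writes out a deliberately plain HTML page with the key words in the title and the opening line of text, and nothing a crawler would skip (no frames, forms, or images). The function name and the sample resume details are hypothetical; the META description and keywords tags are an optional extra, in line with the article's later advice to put names in <META> tags.

```python
from html import escape

def findable_page(title: str, key_facts: list[str], body_text: str) -> str:
    """Build a plain HTML page: key words in the <title>, in META tags,
    and in the first line of visible text. No frames, forms, or images."""
    facts = "; ".join(key_facts)
    return "\n".join([
        "<html>",
        "<head>",
        f"<title>{escape(title)}</title>",
        f'<meta name="description" content="{escape(facts)}">',
        f'<meta name="keywords" content="{escape(", ".join(key_facts))}">',
        "</head>",
        "<body>",
        f"<p>{escape(facts)}</p>",          # most important facts first
        f"<pre>{escape(body_text)}</pre>",  # plain-text resume follows
        "</body>",
        "</html>",
    ])

# Hypothetical resume details.
page = findable_page(
    "Resume: Webmaster (HTML, Perl, Java), Massachusetts",
    ["Webmaster", "HTML", "Perl", "Java", "Boston, Massachusetts"],
    "Experience: built and maintained a company Web site.",
)
print(page)
```

The `escape` calls keep characters such as `&` or `<` in your text from breaking the markup.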

As you try these techniques, please let us know about your successes and your frustrations. Send us your tips, the creative approaches you've tried, and your questions.

Flypaper: Using AltaVista Search In Reverse

When old friends whom I hadn't been in touch with for 10 to 30 years started sending me E-mail messages--about a half dozen of these friends each month--at first I thought it amazing that all those people would be looking for me and that AltaVista Search made it so easy for people to find other people on the Internet. Then it gradually dawned on me: why should they look for me? Just like me, they probably each have a hundred or more people they once were close to (old roommates, business associates, and so on) but lost touch with. Why, out of all those others, should they actively come looking for me? With a few quick queries, I soon figured out that they weren't looking for me at all. They were looking for themselves. They had gone to AltaVista Search and done what most people do: they had entered their own names as queries. They found themselves at my Web site because the content there includes my writing, which mentions many of my old friends' names, typically in the acknowledgments at the end of my books. Searching for themselves, they chanced upon me, and, delighted at that unexpected discovery, they sent me E-mail messages.

So when the number of long-lost friends starts to decline, I should create a new page at my Web site where I mention acquaintances I haven't mentioned elsewhere and would like to get back in touch with. In other words, instead of trying systematically to find other people on the Internet, I'll set things up to make it easy for those people to find themselves in documents at my Web site. I call this method of finding people the flypaper approach.

It's a neat reversal of my expectations. I connect with the people I want to find by making their names and their subjects of interest findable at my site. The same approach can also work well in the world of business if you're trying to connect with potential customers or employers.

Flypaper for Business Contacts

If you want to connect with a particular person at a particular company, and your phone calls and E-mail messages are going unanswered, create a Web page that mentions that person and/or company. Say all the good things you've been meaning to say about how you could benefit from working together, and so forth. Put the person's name and the company's name in your Web page's HTML <META> tags and in the first line of text, so AltaVista Search's ranking algorithm will place your page high in the list of matches. You don't need to link any of your other Web pages to this one. Just go to AltaVista Search, click Add URL at the bottom of the page, and enter the URL of this specific page. The next day, your page will be in AltaVista Search's index, ready to be found by your target the next time he or she uses AltaVista Search to search for himself or herself. When that happens, the odds are good that the person will get in touch with you, and your position in the ensuing dialogue will be greatly improved, because the person contacted you instead of vice versa. Of course, this approach isn't guaranteed to work, but it's certainly worth a try.

As more and more companies and people come onto the Internet, the odds increase that this approach will lead to the kinds of business contacts you want. The same approach can also be effective in recruiting particular individuals to come to work for you, as well as in getting particular employers to come looking for you.

Flypaper for Writers and Publishers

The flypaper approach changes how publishers and writers can and should find one another. In the traditional mode, acquisition editors spend years building contacts with just the right people so they'll be able to find talented writers and sign up promising books. In the new mode, these same editors can find writers and books--the new products that their companies depend upon--far more quickly by scouting the Internet. The individual writer, too, has far more opportunity than in the past. Instead of submitting a manuscript to publishers (typically one at a time), the writer can post the material on the Web and publicize it free of charge (making sure that it's indexed by AltaVista Search). This approach increases the chances that an editor will find the writer. Even if that doesn't happen, the work doesn't just gather dust. Rather, it reaches an audience, perhaps leading to correspondence and acquaintance with like-minded people, and the writer's work can improve from the feedback online.

I've experienced several recent instances when being found by people using AltaVista Search has led to interesting business opportunities. For instance, Ebooks Multimedia in San Francisco, publisher of interactive CD-ROMs for children, was looking for content that it could turn into a product. Using search engines, Ebooks found my book The Lizard of Oz at my Web site. I self-published this book 22 years ago, and it had simply been gathering dust. Within a week of Ebooks contacting me, we had a signed contract, and Ebooks is now at work on the project.

A little later, a movie producer in Iceland looking for new material found my never-produced screenplay Spit and Polish. That discovery isn't likely to lead anywhere, but it's an opportunity I would never have dreamed of pursuing actively myself. And just a couple of weeks ago, I received an E-mail message from a woman who found a stage play of mine, Amythos, at my Web site. She works for a professional theater in Spokane, Washington, and is interested in producing this play, which I wrote 25 years ago and which has never been staged.

In all those cases, instead of my having to identify prospects, write query letters, and submit manuscripts--which takes time, effort, and money--people who were looking for that kind of material found me. And because they made the first contact, the conversation started at a different level. They had a particular need and had already determined that my work might fill that need.

The most dramatic instance of the flypaper approach came totally unexpectedly; it was a kind of opportunity I would never have dreamed of. A Garry Trudeau fan was looking for a copy of Bull Tales, Trudeau's first book, published when he was an undergraduate at Yale. This fan found the book mentioned at my Web site, where I have a list of every book I've read over the last 39 years, and he sent me E-mail to find out if I still had a copy. He also noticed at my site that my daughter (now a sophomore at Sarah Lawrence) has an interest in acting. It turned out that he is the writer/producer of several popular TV shows, and my daughter was in Los Angeles that summer, acting in a movie written and produced by my sister, Sallie. After a few friendly E-mail messages, I wound up trading my copy of the book for an audition for my daughter, for a possible part in an episode of one of those TV shows. Nothing immediate resulted, but my daughter learned a lot from the auditioning experience and made contacts that could prove important in the future.

These experiences taught me that you shouldn't presume that other people won't be interested in your material. Don't presume that you know the markets and the potential buyers for all of your material. By making lots of material available on the Web and making sure that AltaVista Search indexes your material, you open up possibilities that you probably never dreamed of.

General Flypaper: The First Step in Building a Web Audience

The basic idea is that people using search engines look first for themselves and then for the subjects nearest and dearest to them. Hence, you can use a targeted approach or a general approach to attract the people you want to get in touch with. With the targeted approach, to reach a particular person and/or particular company, you create a simple Web page that mentions that person or company. With the general approach, you post material that mentions many people and subjects and let interested people find it. For example, as I mentioned, my Web site contains a list of every book I've read for the last 39 years. It's just a list. When I posted it, I doubted that anyone would be interested in it. But AltaVista Search draws lots of traffic to my site. I've received E-mail messages from authors, agents, and publishers who found the list while looking either for themselves or for books they've been involved with.

I've also received lots of satisfying correspondence from people who love to read. In particular, I got E-mail from Dean Rink, a producer for PBS who was getting ready for a lengthy stay in Antarctica as part of the Live from Antarctica 2 program. He was planning on doing a lot of reading there and was looking for recommendations of good books when he stumbled across my list. Like me, for many years, he had been keeping a list of the books he reads. We ended up swapping lists of favorites and reactions to particular authors. I posted the correspondence at my site, and others joined in. As a result, I've discovered powerful and fascinating books that I otherwise would probably never have heard of.

If you work for a school, create a Web page that lists all alumni, their years of graduation, and other public information about them. Click Add URL on the AltaVista Search page, and you'll soon receive E-mail messages from some of them. As you begin to draw an audience to your site with flypaper of this kind, give those visitors reasons to come back; they'll become a loyal audience, and you'll be creating a new online community. Offer to add their E-mail addresses and other relevant information if they link to your site. Add a letters-to-the-editor page with selected correspondence, or create a forum and set up regularly scheduled chat sessions with a host and prearranged speakers.

Building Interactive Web Sites Without Interactive Software

Recently, while I was mentioning flypaper in a presentation about AltaVista Search to a group of educators at the NERCOMP conference in Sturbridge, Massachusetts, it dawned on me that this approach can also turn static text pages into opportunities for online interaction. Before the Web, the Internet was very interactive, consisting largely of E-mail messages and newsgroups. The early Web itself was an anomaly, connecting people to documents rather than people to people. That is changing now with new and better ways for people to interact, rather than just read, at Web sites (chat, forums, and so on). But, thanks to AltaVista Search, static Web pages, too, can become interactive. Instead of carefully polishing every text you write and getting official approval for every document you wish to publish on the Web, post your work-in-progress. Post your reactions to what you have read and unpolished notes from meetings. If a thought is important to you, it may well be of importance to others. And if AltaVista Search indexes the full text of your pages, interested people will find those pages and, by extension, you.

Posting a document as a work-in-progress begs for comment. Promising to post in the same place the most interesting and relevant reactions (sent by E-mail) provides further encouragement to open up a dialogue. It takes no special software to get a discussion going--just interesting and provocative content and the willingness to talk about your work before you completely finish it. What was just an article or a memo becomes a seed for discussion by a spontaneous community of people interested in learning about and understanding the same subject, sharing experiences and insights. This kind of experience is what formal education sometimes strives for, but very rarely achieves.

So get to work. Set out your flypaper. Then, send me an E-mail message to tell me what you catch and how you benefit from it. This new approach will take you to new territory. We can gain a lot by sharing our insights with one another.

Finding What You Want Even When You Don't Know What You Want: The LiveTopics Bonus

ALERT: LiveTopics is a powerful feature that was once offered through AltaVista. They later renamed it "Refine." It is no longer available. The project, code named "Cow9," was a collaborative effort between researchers at Digital Equipment Corporation and François Bourdoncle of Ecole des Mines de Paris, www.ensmp.fr. The underlying technology has enormous potential. This article will give you a sense of how it can be used.

Anyone who has searched unsuccessfully for a particular fact at the library will appreciate that AltaVista Search is great at finding rare information. With AltaVista Search, the more specific and unique the desired information, the more narrowly you can focus your query and the quicker you'll get the answer you want. Before AltaVista Search, using Internet search engines was often a time-consuming chore. If your query was too general, it could take forever to narrow your search or sort through all the matches to get what you really wanted. In those days, you'd have been better off going to the library.

In the early days of AltaVista Search, determining whether a query was specific or general was often a matter of personal style. Different people looking for the same answer would approach a search from totally different perspectives. Those who thought in specific terms were able to get immediate useful results from AltaVista Search. But for those who thought in terms of categories and generalizations, every search was a lengthy battle.

With the introduction of LiveTopics beta, however, your AltaVista Search queries can be a breeze, regardless of your search style. AltaVista Search is just as handy whether you frame your queries in general categories or narrow them to the minutest detail. This flexibility is possible because the same underlying index supports different approaches, ways of thinking, and associations. In this article, we'll look at a few examples of LiveTopics in action.

LiveTopics at Work

When I purchased a new laptop running Windows 95, I wanted to be able to plug in my existing printer, a Canon BJ200. To do so, I needed the Windows 95 driver for that particular printer. But when I bought my printer, Windows 95 didn't exist, so I needed to find out where I could obtain the driver. I could contact Canon or Microsoft, or maybe an office-supplies or computer/software store. Tracking down the driver that way could take hours, days, or even weeks, and I needed to use my printer with my laptop right away. So I went to AltaVista Search, and on the Simple Search page, I entered +bj200 +driver*. The first item on the list of matches was a page from which I could download the driver I needed. I had my printer running within five minutes. In this situation, I used a search style that comes naturally to me. I thought in terms of specifics: the model number BJ200, being a combination of letters and numbers, was rare, and a search for that term matched only that product.
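The +bj200 +driver* query works because every term marked with + must appear on a page, and the trailing asterisk matches words that begin with driver. Here is a simplified Python model of that matching, offered as an illustration of the query syntax rather than AltaVista Search's actual code; the function names are hypothetical, and it treats the asterisk as a plain prefix match, ignoring the length limits on wildcards.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase the text and return the set of alphanumeric words in it."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def term_found(term: str, words: set[str]) -> bool:
    """A trailing '*' is treated here as a simple prefix match."""
    if term.endswith("*"):
        stem = term[:-1]
        return any(word.startswith(stem) for word in words)
    return term in words

def evaluate(query: str, text: str) -> tuple[bool, int]:
    """Return (passes, score). A page fails if any +term is missing;
    score counts how many query terms, required or not, it contains."""
    words = tokenize(text)
    passes, score = True, 0
    for raw in query.lower().split():
        required = raw.startswith("+")
        term = raw.lstrip("+")
        found = term_found(term, words)
        if required and not found:
            passes = False
        if found:
            score += 1
    return passes, score

print(evaluate("+bj200 +driver*", "Download the Canon BJ200 drivers for Windows 95"))
# (True, 2)
print(evaluate("+bj200 +driver*", "Printer drivers for many models"))
# (False, 1)
```

Because bj200 is such a rare token, requiring it eliminates nearly every page on the Web, which is why the specific approach can be so fast.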

If you were facing the same problem, you might take a different route, using LiveTopics, and find the same driver at a different Web page. If you think in terms of categories, perhaps the first query term that comes to mind is printer. When your search yields half a million hits, as you can see in Figure A, click Tables.

Figure A: When you get 500,000 hits from the search term printer, choose the LiveTopics topics map.

AltaVista Search then provides a list of 20 categories, each with subcategories. These categories aren't based on human prejudgment, like the Dewey decimal system in a library. Rather, they're statistically generated on the fly, based on the words that appear on the same pages as your query word(s).
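The article doesn't describe how LiveTopics computed its categories, but the basic statistical idea, counting which words turn up on the same pages as the query word, can be sketched in a few lines of Python. This is a deliberately crude, hypothetical illustration: LiveTopics went further and organized words into categories and subcategories, and the function name, stopword list, and sample pages here are invented.

```python
import re
from collections import Counter

# A tiny, illustrative stopword list (an assumption, not LiveTopics' own).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "or", "the", "to", "with"}

def topic_words(pages: list[str], query_word: str,
                top_n: int = 20) -> list[tuple[str, int]]:
    """Count how many query-matching pages each other word appears on,
    and return the top_n words, most frequent first."""
    query_word = query_word.lower()
    counts = Counter()
    for text in pages:
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        if query_word not in words:
            continue  # page doesn't match the query at all
        counts.update(words - STOPWORDS - {query_word})
    # Sort by count (descending), then alphabetically, for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

pages = [
    "Canon inkjet printer drivers for Windows",
    "HP LaserJet printer drivers",
    "Epson inkjet printer ink cartridges",
    "Recipe for chocolate cake",
]
print(topic_words(pages, "printer", top_n=3))
# [('drivers', 2), ('inkjet', 2), ('canon', 1)]
```

Even this toy version surfaces drivers and inkjet, the same kinds of terms the printer walkthrough picks out of the LiveTopics lists.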

Under the topic Windows, check the term drivers. AltaVista Search will automatically add that term to the query box at the top of the page as +drivers. Under the topic Laserjet, click on inkjet to add it to your list of search terms. As you can see, the topic inkjet includes the names of several vendors, but not Canon. So, go to the query box and type +Canon (as a shortcut). Now, submit the new query, shown in Figure B, and the second item on the list will take you to a shareware site with the Canon BJ200 driver.

By using a different approach, you can get an equally satisfactory result.

Using LiveTopics with Advanced Searches

As another example, consider cooking. I've never been able to use cookbooks, because of my style of thinking. I simply don't know what the different categories of dishes mean. So for me, AltaVista Search is an enormous boon. Let me show you what I mean. I can go to the Advanced Search page and enter recipe in the Selection Criteria field. Then, in the Results Ranking Criteria field, I simply list all the ingredients that happen to be in my refrigerator. AltaVista Search will return a list of recipes. At the top of the page, it will cite several recipes that include all or most of the ingredients I listed. Further down the page, AltaVista Search will include those recipes with fewer of the listed ingredients. With LiveTopics, there's an alternative approach: I can search for recipe, and when AltaVista Search returns some 300,000 matches, I can choose the Way-cool Topics Map! and Topic Words, as shown in Figure C.

Now, I can choose one of the major topics, like beer (which will provide recipes for home-brewing) or chocolate, and submit that search. Then I can go to LiveTopics again to drill down further--learning something about cooking terminology and the choices AltaVista Search offers at each stage.

Or I could start the search with a known category, such as +recipe +casserole or +recipe +dessert, and then see how that piece of the cooking world is categorized and inter-related. In either case, at some point I'll decide that I'm close enough and start checking individual pages in the match list or, armed with a new vocabulary and tantalizing ideas, I might turn to a traditional cookbook and use it much more effectively than before. Regardless of your search style, LiveTopics provides a handy way to broaden a search to uncover new and valuable information.

Another Simple Search With LiveTopics

If you're into video games, you might want to search for a specific game, such as Master of Orion from Microprose. Let's start with a simple search. Type master* of orion. (The first word of the game's name is "Master," but add the asterisk after it so you'll also catch pages where someone mistakenly calls the game "Masters of Orion.")

In response to this query, AltaVista Search returns more than 400 hits. To find out if there might be software patches for the game, add +patch* to the original query. As you can see, there are several good sources from which you can download free patches.

Next, let's look for some tips on strategy. Type +"master* of orion" strategy tip* tactics advice guide reference. You'll then get about 100 useful hits. Now, with the latest and greatest version of the software and all the game-playing advice you need, how about a search for other Master of Orion addicts out on the Internet? To match skills with another player, refine your search.

While this search query isn't very well-constructed and yields some useless matches, it does provide the names of some sites for people who like to play the game. That's all you need to test your mettle against some of the best players online. If you want other video games that are similar to Master of Orion, but you don't quite know how to categorize them, you'll need to frame your search differently. In this case, your task is to gradually broaden your search, then narrow it again to find a game you haven't tried yet. Specifically, do a search for master* of orion, as we explained earlier. When AltaVista Search returns the results, choose the Way-cool Topics Map! and click on the Topic Words tab. You'll see among the listed topics lots of games that are similar to Master of Orion. If one of the names in the list strikes your fancy, you can do a separate search for it. Alternatively, you can select a couple of the major topics, like multiplayer and wargame, then delete master of orion from the query box and launch a new search. This last approach provides more general results, but not so broad and amorphous as searching for videogam* (you'd use the asterisk to catch videogame, videogames, and video-gaming). Figure E shows the result of this final search.

In the first page of results, I found the Galaxy G Multiplayer Wargame Official Site, where I can sign up for a game with as many as 80 other simultaneous players. I also found other sites with reviews of games of this kind. All this information is right up my alley. You're bound to find some equally useful information in your searches!


Thanks to AltaVista Search LiveTopics, there are many paths through cyberspace. Some take you right to the destination quickly and simply. Some give you a broad overview of the territory, so you can decide which way to go. And some take you through fun and interesting locations, giving you opportunities to stumble upon unexpected wonders--information, like-minded people, and new kinds of online interaction--that match your tastes and interests, but that you may not have known to look for. As you try these techniques, please let me know about your successes and your frustrations. Send me your tips--the creative approaches you've tried--and your questions.

Taking "snapshots" with LiveTopics: Watch out for mirages

ALERT: LiveTopics is a powerful feature that was once offered through AltaVista. AltaVista later renamed it "Refine." It is no longer available. The project, code-named "Cow9," was a collaborative effort between researchers at Digital Equipment Corporation and François Bourdoncle of Ecole des Mines de Paris, www.ensmp.fr. The underlying technology has enormous potential. This article will give you a sense of how it can be used.

To simplify finding anything on the Internet, LiveTopics (then in beta) generates categories on the fly and provides a graphical view of those categories. As a result, you get more than just pointers to nuggets of information--you get information about information, an overview of what material is available, and some clues as to how pieces of data relate to one another. The results appear in a flash and, in that moment of enlightenment, it's tempting to jump to sweeping conclusions. But watch out! The quality of your results depends entirely on the quality of your query, and if you aren't careful, you could wind up eating a very large slice of humble pie.

The LiveTopics terms are based on statistics, not on some profound understanding of the meaning of words. Fallacies can arise from the particular words you use, the typography of your query, or even the language you're using. But, if you're careful, the results can be truly amazing.

The Entire Web in One Glance

If you need to explain the Internet to someone who is unfamiliar with it, or if you simply want to wow a friend with your vast knowledge, go to AltaVista Search and enter the query +*. You'll receive more than 30 million matches--every page in the AltaVista Search index, which is about as close as you can get to the entire public Web. Figure A shows the result of this query. The asterisk (*) is a wildcard, standing for any unknown letter or letters. And the plus sign (+), in this particular instance, forces AltaVista Search to make an exception to its general rule that at least three characters must precede a wildcard, which can stand for up to five characters. (The developers simply couldn't resist the temptation to show what this system can do.)
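The wildcard rule above is easy to misremember, so here is a minimal Python sketch of it as this article states it. It assumes the wildcard may appear only at the end of a word, needs at least three literal characters in front of it, and stands for zero to five more characters, with a bare * (the +* query) as the one exception. The function name wildcard_matches is hypothetical; this illustrates the stated rule and is not AltaVista's code.

```python
import re


def wildcard_matches(pattern: str, word: str) -> bool:
    """Sketch of the trailing-wildcard rule as described in the article.

    Assumptions: the * may only end a term, needs at least three literal
    characters before it, and stands for zero to five further characters.
    A bare "*" (the +* query) is the special case that matches everything.
    """
    if pattern == "*":
        return True
    if not pattern.endswith("*"):
        # No wildcard: the word must match exactly.
        return pattern == word
    prefix = pattern[:-1]
    if len(prefix) < 3 or "*" in prefix:
        raise ValueError("a wildcard needs at least three preceding characters")
    # The literal prefix followed by zero to five more characters.
    return re.fullmatch(re.escape(prefix) + r".{0,5}", word) is not None
```

Under these assumptions, god* matches goddard (four extra letters) but not goddesses (six), which is one reason a wildcard query can pull in irrelevant pages.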

Figure A: Using the search query +*, you'll get 30 million hits.

Now, click LiveTopics' Way-cool Topics Map! link. You'll see 20 words in rectangular boxes that are connected with lines. Those are the top 20 categories of information on the Web--you're looking at a picture of the entire Web.

Keep in mind that these categories are created statistically, live, on-the-fly, and based on the information in the AltaVista Search index. There's no human bias. No one is making assumptions about the content. This isn't a list of categories, like a library's Dewey decimal system, based on human judgment and subject to obsolescence. This is a view of today's Web content based on the information currently in the AltaVista Search index. These are the significant words (not articles like the or auxiliary words) that happen to appear most frequently on the Web. A line connecting terms means the documents containing the one word often contain the connected word as well.
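To make the statistical idea concrete, here is a toy Python sketch. It counts how many documents contain each significant word, keeps the most frequent words as topics, and draws a line between two topics when at least a couple of documents contain both. The stop-word list, the thresholds, and the name topic_map are assumptions for illustration; the actual LiveTopics algorithm isn't described in this article and may well work differently.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list; a real system would use a much longer one.
STOP_WORDS = {"the", "a", "an", "and", "at", "of", "to", "in", "is", "for", "on"}


def topic_map(documents, top_n=20, min_shared=2):
    """Return the top_n most common significant words and the links between them.

    A word's frequency is the number of documents that contain it. Two top
    words are linked when at least min_shared documents contain both.
    """
    doc_words = [
        {w for w in doc.lower().split() if w not in STOP_WORDS}
        for doc in documents
    ]
    doc_freq = Counter(word for words in doc_words for word in words)
    top_terms = [word for word, _ in doc_freq.most_common(top_n)]

    edges = []
    for a, b in combinations(top_terms, 2):
        shared = sum(1 for words in doc_words if a in words and b in words)
        if shared >= min_shared:
            edges.append((a, b, shared))
    return top_terms, edges
```

Fed a handful of pages about schools and a few about Web servers, this sketch produces two separate clusters of linked terms, much like the clusters you can see in the Topics Map.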

Now, click the Topic Words tab. You'll see a list of the 20 terms--the words in bold--and, next to each term, words that appear most frequently on pages that also contain the main term.

So what's the Internet about (at this broadest level of abstraction)? Lots of the main terms relate to education (school, students, learn, faculty) and research--one of the original intentions for the Internet. You also see terms relating to the underlying technology and how it's deployed (Internet, web, server, online). Business is there as well, with the words opportunities, programs, businesses, career, employment, assistance, and job, which seem to indicate lots of use for hiring and finding jobs. And professionals appears as well, with words relating to membership in professional associations.

Under the topic viewed, you see netscape and navigator, and also microsoft and explorer, perhaps indicating some rough parity between the two major Web browsers and the companies that provide them. What you don't see is also telling. For instance, the word sex doesn't appear, nor does anything related to shopping, online sales, or electronic commerce.

Note also that all the terms and words are in English. That's not because of a limitation of LiveTopics: the AltaVista Search index contains all text, regardless of language, and LiveTopics uses that entire index. This is simply testimony to the dominance of the English language on the Web today.

You might want to perform this test every month or two to get a sense of how the Web is evolving and how quickly it's doing so. This technique is a good reality check. Why believe the opinions of reporters when you can look at the entire phenomenon, with current data, any time you want, in just a few seconds?

An Example

Let's try an example just for fun. In a Simple Search field, enter the query God. You might be tempted to try god* to capture gods as well as god. But if you take that approach, you get about 100,000 irrelevant results--such as goddard--where god is just the first syllable. You might also be tempted to enter the query with all letters in lowercase to be sure to capture all instances of either god or God. But that, too, yields about 100,000 extra results, many of which are instances where the word is used in a non-religious sense.
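The choice between god and God turns on how AltaVista Search treats capitalization. As described here, a query typed entirely in lowercase matches any capitalization, while a query containing capitals matches only that exact form. The Python sketch below encodes that rule as stated; the function name is hypothetical, and the real matcher may have handled cases this ignores, such as accents and punctuation.

```python
def case_rule_matches(query_word: str, page_word: str) -> bool:
    """Sketch of the capitalization rule implied by the article.

    An all-lowercase query word matches any capitalization on the page;
    a query word containing capitals matches only that exact capitalization.
    """
    if query_word.islower():
        return query_word == page_word.lower()
    return query_word == page_word


print(case_rule_matches("god", "GOD"))  # True
print(case_rule_matches("God", "god"))  # False
```

So a lowercase god also catches GOD and God, while God skips the lowercase, often non-religious, uses of the word.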

Since we're trying for a broad approximation, God is most likely to give us the best results. This query yields more than 800,000 matches. Click the Way-cool Topics Map! link and let's see our search results.

The terms all interconnect and seem to relate to Judeo-Christian religion, with a heavy emphasis on Christian. It's interesting to see that the words most frequently associated with God include truth, love, glory, and blessed. Now click the Topic Words tab. At the next level, as you can see in Figure B, the topic Revelation includes several terms related to Islam (islam, muslims, muslim), so our first impression, that these topics all related to Judeo-Christian religion, was inaccurate.

Once again, all the words are in English. On the one hand, that's to be expected because of the strong English-language bias of the Internet today. But remember, you stacked the deck by using an English word as your query. If you wanted to capture content in other languages, you'd have to add Dieu, Gott, and so forth to your query. Adding Yahweh, Allah, and Buddha would also broaden the coverage of non-Christian religions.

But even using this crude one-word query approximation, we'll probably see major changes in our findings over time, as the audience on the Internet expands and becomes much more diverse and international.

Portraits of Countries

Combining the domain: command with LiveTopics lets you create portraits of Web activity in particular countries. For instance, domain:au matches pages on Web servers in Australia; domain:uk matches the United Kingdom; domain:fr matches France; domain:co matches Colombia; and so forth. These domain-name suffixes are part of the basic naming structure of the Internet. Suppose your company is interested in doing business in Italy. A search for domain:it will yield about 800,000 matches. The Topic Words screen shows that 19 of the 20 categories are in Italian. (Local language is beginning to be very important.) The other category, Italian, appears to be in English. And in the Topic Graph view, the pages fall into two large clusters. If you understand Italian and have the time and inclination, you might be able to derive some useful conclusions about the underlying cause of that divide.
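A domain: query selects pages by the ending of the Web server's host name. The Python sketch below shows that idea under a simplifying assumption, namely that the command compares only the last label or labels of the host name. The helper in_domain is hypothetical, and the example.com style addresses are placeholders. Note how a company hosted at a .com address is invisible to domain:co, which is exactly the trap discussed next.

```python
from urllib.parse import urlsplit


def in_domain(url: str, suffix: str) -> bool:
    """Return True if the URL's host name ends with the given domain suffix.

    A sketch of what a query like domain:it selects, assuming only the
    trailing label(s) of the host name are compared.
    """
    host = (urlsplit(url).hostname or "").lower()
    suffix = suffix.lower().lstrip(".")
    return host == suffix or host.endswith("." + suffix)


print(in_domain("http://www.ensmp.fr/", "fr"))     # True
print(in_domain("http://www.example.com/", "co"))  # False: .com is not .co
```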

These kinds of results make it very tempting to read more into the statistics than is warranted. For instance, you might want to check the number of matches for a range of countries, record the results, and do the same queries at fixed intervals to track the relative growth of the use of the Web in these various countries.

But several factors make such an approach highly questionable. First, the Web site of a company in a given country doesn't need to use that country's domain suffix in its Web address. Some companies go out of their way to obtain .com (commercial) domain names that carry no indication of the countries in which they operate. And some companies in countries with poor Internet access have their pages hosted on Web servers in other countries. For example, Internet lines to Colombia are few and slow, and Colombia's infrastructure is U.S.-centric. When you send E-mail from one part of Bogota (the capital) to another, the message has to pass through systems in California. So most Colombian companies doing business on the Internet today do so by way of Web servers in the U.S. That means a search for domain:co would likely yield misleading and incomplete results.

If you do a search for domain:zw (Zimbabwe) and see only six pages in the results list, it would be unwise to jump to the conclusion that companies there don't use the Internet. Rather, it's likely that many have non-country specific domain names or are hosted elsewhere. That kind of result should be the beginning of a more detailed and probing series of searches, rather than the conclusion.

Also, recent reports indicate that, in response to the growing difficulty in reserving domain names that match your company or product name, the country of Tonga (the Friendly Islands in the South Pacific), with the suffix .to, is going into the business of selling domain names to any and all comers. Tonga hopes to become for domain names what Delaware is for incorporation and what Panama is for ship registration. If Tonga is successful, it will further blur the focus of country portraits in LiveTopics. But, for the moment, that's not a factor.

Beware of the Counter

When AltaVista Search first came out, some ambitious, creative, and enthusiastic reporters went wild over the statistics. Publications like The New York Times and The Wall Street Journal ran articles based on the numbers of matches that AltaVista Search found for various queries, rather than on the information it led them to. For instance, they compared the number of matches for Jesus Christ versus John Lennon as if it were a competition and drew broad and amusing conclusions from the results. Someone even wrote a program that queried AltaVista Search each day for Windows 95 and automatically generated a graph of the counts, day by day, to document the spread of that operating system. But the counts that AltaVista Search provides are only rough approximations. The main intent of the service is to help you find information on the Internet, not to provide statistics about the Internet and its content. The ranking algorithm that determines which items appear high on the results list requires a rough count, so such a count is generated. But it's not meant to be precise. The vast majority of users look only at the first screen or two of results, and striving for scientific accuracy in the count would simply slow down the system, making it far less useful for everyone.

Now, go to the Advanced Search page. Click the down arrow next to Standard Form, then click As A Count Only. If you enter your query now, you'll get a count and only a count (no list of results), and it will be more accurate than what you see on the Simple Search page. Because the system doesn't have to provide a list of results, it can use the extra cycles to provide a better count. But the number is still only an estimate. If you submit the same query several times in a row, or on several days in a row, you'll likely get different counts. The different numbers don't mean that the underlying index has changed or that something is broken. They simply reflect the fact that the counting function was designed only to provide approximations. The smaller the numbers, the more accurate they're likely to be; once counts run above 10,000, the approximations can vary considerably. Even so, rough counts can be very helpful for quick comparisons, such as checking which variant spelling of a new term (one that doesn't yet appear in dictionaries) is most common. But if the numbers are close, don't lean too hard on them. For example, don't advertise that your Web site has a dozen more hyperlinks to it than your competitors' sites. Don't try to read too much into numbers that were never meant to be precise.
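If you do compare counts, one cautious habit is to run each query several times and look at the spread. The Python sketch below treats two queries as clearly different only when the ranges of their repeated counts don't overlap. The function name and the rule itself are assumptions (a rule of thumb, not a statistical test), and the numbers in the example are invented for illustration.

```python
def clearly_different(counts_a, counts_b):
    """Return True only if the repeated counts for two queries don't overlap.

    Each list holds the counts reported for one query over several runs.
    Overlapping ranges mean the difference may be nothing but estimation noise.
    """
    low_a, high_a = min(counts_a), max(counts_a)
    low_b, high_b = min(counts_b), max(counts_b)
    return high_a < low_b or high_b < low_a


# Invented counts for two spellings of a new term, each queried three times.
spelling_one = [41_000, 38_500, 44_200]
spelling_two = [12_300, 15_100, 13_800]
print(clearly_different(spelling_one, spelling_two))  # True: the ranges don't overlap
```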


Basically, the more you know about AltaVista Search and about the subject you're researching, the better you'll be able to avoid jumping to false conclusions. This isn't the Holy Grail of information--but LiveTopics can be a very effective tool if you take the time to learn how to use it. As you try these techniques, please let us know about your successes and your frustrations.

The Limits of Accuracy


Did you know that you can check your own site, your partners' sites, and your competitors' sites using AltaVista Search? That you can use LiveTopics to view and compare your findings? And that you can hone your skills at recognizing and interpreting subtle differences among your search results? When you search the Web with AltaVista Search, you're not searching the target sites directly. Rather, you're looking through the AltaVista Search index. In many cases, the index will be complete and current enough to validate your conclusions. But you should make every effort to make sure that's the case.

Take a good look

Before writing any competitive/comparative report based on AltaVista Search research, be sure to take a good look at the sites themselves. It's important to understand the limits of the measuring instrument you're using.

First, check for completeness. When you search for host:yoursite.com or host:competitor.com, AltaVista Search returns the number of matches from the index. If the number is relatively small (fewer than 1,000), the count should be pretty accurate. Perhaps one of your main initial conclusions is that your own site or your competitor's site has relatively few Web pages. If so, you should go to the site in question. Follow all the local links you find on the home page, all the links you find on those pages, and so on, taking notes as you go along. Notice if there are significantly more pages at the site than AltaVista Search says there are. If so, you'll need to look more closely to determine why.
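The manual check described above amounts to a breadth-first walk of the site's local links. The Python sketch below does that walk over a hand-built map of links (the kind of notes you'd take while browsing) and reports the pages the index doesn't list. The page names, the link map, and the helper names are hypothetical; fetching and parsing real pages is left out to keep the idea visible.

```python
from collections import deque


def reachable_pages(links, home):
    """Follow local links breadth-first from the home page.

    links maps each page to the local pages it links to.
    """
    seen = {home}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def missing_from_index(links, home, indexed):
    """Pages you can reach by browsing that the index doesn't list."""
    return sorted(reachable_pages(links, home) - set(indexed))


site_links = {
    "/": ["/about.html", "/books/"],
    "/about.html": ["/"],
    "/books/": ["/books/a.html", "/books/b.html"],
}
print(missing_from_index(site_links, "/", indexed=["/", "/about.html"]))
# ['/books/', '/books/a.html', '/books/b.html']
```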

Second, make sure the information is current. Perhaps one of your preliminary conclusions is that information at a target site is stale or inaccurate. You checked the dates shown with each entry in the list of matches, and they were all several months old and, in some cases, a year old. Or, perhaps you used Advanced Search and limited your query not only to the target site but also by range of dates and found lots of old pages. If that's the case, choose several pages with the oldest dates and click to connect to them directly. Then, with your browser, check each page's Document Info. Are the dates of the actual pages significantly more recent than the dates listed in the AltaVista Search index?
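The date comparison above is easy to script once you have both dates in hand. This Python sketch assumes you've copied the date from the AltaVista Search results list and the page's Last-Modified date (the HTTP header value that a browser's Document Info window typically reports). The function name, the seven-day tolerance, and the example dates are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def index_copy_is_stale(index_date, last_modified_header, tolerance_days=7):
    """Return True if the live page is notably newer than the indexed copy.

    index_date: the date shown in the results list, as a datetime.
    last_modified_header: the page's Last-Modified value in the standard
    HTTP date format, e.g. "Tue, 15 Apr 1997 10:00:00 GMT".
    Dates without a time zone are assumed to be UTC.
    """
    page_date = parsedate_to_datetime(last_modified_header)
    if page_date.tzinfo is None:
        page_date = page_date.replace(tzinfo=timezone.utc)
    if index_date.tzinfo is None:
        index_date = index_date.replace(tzinfo=timezone.utc)
    return page_date - index_date > timedelta(days=tolerance_days)


# Hypothetical case: indexed in January, but the page changed in April.
print(index_copy_is_stale(datetime(1997, 1, 10), "Tue, 15 Apr 1997 10:00:00 GMT"))  # True
```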

Keep in mind that while there are tens of millions of pages in the AltaVista Search index and while Scooter, the AltaVista Search spider, is continually gathering new and updated information with a thousand threads going simultaneously, the index is never absolutely complete and current. It's an excellent approximation, but it's not perfect. If the AltaVista Search index is your measuring instrument, avoid drawing conclusions that are more precise and subtle than is appropriate, given the instrument's accuracy.

What You Can Do With What You May Find

So what can you do if the AltaVista Search index indicates that a competitor's or partner's site has only three pages, and you find that it actually has at least a dozen? Jot down the URLs of the missing pages. Go to the AltaVista Search page and, at the bottom of the page, click Add URL. Then enter and submit the URLs of each missing page. By the next day, the new information should be in the index, and you can take another close look and revise your analysis. You can do the same thing to add new versions of pages to the AltaVista Search index, substituting today's content for information that may have been gathered six months ago.

You don't need special authority to add URLs to AltaVista Search's index. When you enter an address, the crawler immediately fetches the page in question, and the index is updated overnight based on the information found. By adding URLs, you're helping to improve the index for the benefit of all.


If the number of new or updated pages is more than about a dozen, the fix may not be quick and simple. Sooner or later, you'll get a message that too many pages from that site have been entered. Because of abuses by a few users, the developers of AltaVista Search have imposed limits. Some of these abuses have been simply malicious. Some individuals find spamming an index (automatically feeding useless information into it) a technical challenge as well as a compulsion. Like people who create viruses, they do it just to prove they can, with no concern for the effect of their actions. In other cases, businesses have tried a variety of tricks that they mistakenly believed would give them an unfair advantage by making their pages come out higher on results lists than their competitors' pages.

If you need to add several dozen URLs to the index, you might want to enter only a dozen each day. If you need to add hundreds or thousands of URLs, the best you can do (under today's limits) is to enter the URLs for the home page and for pages that are entry points to significant branches of the target Web site. When the crawler fetches a page, it sends the full text to the indexer and simultaneously captures the list of hyperlinked URLs for later exploration. Eventually, the crawler will fetch and add those pages as well.

You should be aware that some sites contain features that prevent AltaVista Search from indexing their contents. In those cases, the data you gather for comparative purposes may be woefully incomplete. If the barrier is at your own site, you may be able to convince your Webmaster to change the basic design of the pages or to create a duplicate set of plain, indexable pages. If the barrier is at a partner's site, you may want to encourage the Webmaster to allow AltaVista Search to index the site, so that people will be aware that it exists and that it contains valuable business information. If it's at a competitor's site, you should simply give up on the idea of comparative analysis based on AltaVista Search and be content with the knowledge that your competitors are missing important traffic.


The indexing barriers you'll encounter include registration, frames, databases, dynamic pages, and text in Acrobat or PostScript. Some sites require users to complete a form before they can advance to the real content. In some cases, the purpose is simply to capture information about users because access to the site is free. In other cases, the site owner is charging membership/subscription fees to get to the content, and passwords are necessary to get in.

Web crawlers like Scooter are dumb robots. Because they can't fill out forms or supply passwords, they can't get past these barriers. Normally, such sites will allow crawlers to index their home pages but none of the remaining content. If you can reach the other pages at the site directly by entering their URLs, you can add those pages to the AltaVista Search index manually using Add URL. But if all the pages are password-protected, you can't index them. That's a tradeoff that the site owner is making (consciously or unconsciously). The owner gets user information and/or subscription fees but misses the traffic that would otherwise find the site through a search engine like AltaVista Search.
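Here is why forms stop a crawler cold. A crawler discovers new pages by pulling URLs out of ordinary hyperlinks, while a form's target is reached only when someone fills in the fields and presses the button. The Python sketch below, built on the standard library's html.parser, collects <a href> links the way a simple crawler might and merely notes the form targets it can't use. The class name and sample page are hypothetical, and a real crawler would also need to resolve relative URLs, respect robots exclusion, and so on.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the URLs a simple crawler could follow on its own.

    Ordinary <a href> links go into self.links. <form action> targets go into
    self.forms, because reaching them requires filling in fields, which a
    crawler cannot do.
    """

    def __init__(self):
        super().__init__()
        self.links = []
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form" and attrs.get("action"):
            self.forms.append(attrs["action"])  # reachable only by a person


page = ('<p><a href="/books/">Books</a></p>'
        '<form action="/cgi-bin/login" method="post"><input name="user"></form>')
collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['/books/']
print(collector.forms)  # ['/cgi-bin/login']
```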

Some commercial sites use frames, where certain information (typically the company logo, site-navigation buttons, and/or banner ads) remains constant around the outside of the page, while the real content appears in a rectangular box. Unfortunately, Web crawlers see only the information on the outside of the frame, not the content inside the box. Since some browsers can't display frames, some sites provide both frames and non-frames versions of their pages. AltaVista Search can index the non-frames version.

On some sites, the information that appears inside the frame actually consists of plain HTML pages, each with a separate URL. You can add those pages to the index using Add URL. Other sites have much of their content stored in databases, rather than in HTML pages. Once again, crawlers can't fill out forms and hence are stopped in their tracks by this approach to presenting information. In some cases, the database is essential and, by entering queries, the user can generate unique reports. In other cases, the underlying information is actually plain text, such as resumes and job listings, which could just as well be presented as ordinary text. You can add a duplicate of that information, presented as plain text Web pages, to the index.

Some of the more advanced commercial sites today offer various kinds of "dynamic" pages, where users are presented with new material each time they visit a site. (Sometimes the material is based on a user profile or on "cookies," small pieces of data that the site stores in the visitor's browser so it can recognize what the person did on his or her last visit.) Such a site could potentially serve up an infinite number of unique Web pages. (This explains why it's impossible to answer the question, how many pages are there on the Web?) It's basically useless to index such pages, since you'll only rarely, and randomly, see the same page twice. If such a site has core content that can be presented in static pages, it may be wise to do so and to make sure AltaVista Search indexes those pages.

Also, while Acrobat and PostScript formats give the information provider precise control of the page layout, AltaVista Search can't decode and index information in those formats. (However, tools exist for indexing such material on an intranet, using a commercial version of AltaVista Search.) 


If you use AltaVista Search to compare Web sites, make sure the sites' index information is complete, current, and comparable. And if you can improve the information in the index by using Add URL, or if you can persuade the Webmaster to alter pages in order to make them indexable, do so. Improve the accuracy of your search instrument before making your measurements and doing your analysis.

The Power of the Add/Remove URL Link

Here's my current concern. How do I, at no cost, build a complete index of my Web site with custom search forms, index password-protected pages, and keep the index clean? Well, that's what I want to talk about this month. At my little sandbox Web site where I test new ideas, I have more than 600 text-only documents, some of which are complete books (http://www.samizdat.com). It was through my site that I discovered the importance of using search engines to drive traffic to a Web site (only about 12 percent of my visitors come by way of my home page; others go straight to the document they want, thanks largely to search engines). That was also how I discovered the "flypaper" principle for catching the attention of potential visitors. (I discussed this principle in the article "Flypaper: Using AltaVista Search in Reverse to Let People Find You," which appeared in the August issue of Power Searching with AltaVista.)

Six months ago, a number of people who frequented my site began asking me to add some indexing software. They praised the site as content-rich but complained that it was difficult to find what they wanted, because there was simply too much information to browse through.

Finally, it occurred to me that there was no need to invest money and time in new software. All I had to do was take advantage of the capabilities of the free public AltaVista Search service.

At the point I realized this, every page at my Web site was accounted for in the AltaVista Search index. Whenever I added new pages or made significant changes to old ones, I clicked the Add/Remove URL link on the new AltaVista Search page, as shown in Figure A, and separately added each page on my Web site. Then, the AltaVista Search spider, Scooter, fetched those pages immediately and added them to the AltaVista Search index by the next day.

I did this because I knew how important AltaVista Search indexing is in letting the right people know about my Web site. By providing lots of useful free content on my site, I've been able to build my traffic to more than 1,000 page retrievals per day, without spending a penny on promotion or advertising.

Because I indexed all my pages, the search command

+host:samizdat.com

will yield a list of every page at my Web site. And adding search terms to that command will limit the search to my site. In other words, +host:samizdat.com +chat will provide a list of all the documents on my site that mention chat: both the transcripts of my weekly chat sessions and the articles in which I discuss business opportunities related to chat. I also know I can bookmark the results of any search at AltaVista Search, and that I can make a hyperlink to the unique URL of a search result from a Web page at my site. So I can search for +host:samizdat.com and use my right mouse button to copy the resultant URL onto my clipboard. Then, I can paste that address onto my home page, making it a hyperlink like this one:

<a href="http://altavista.digital.com/cgi-bin/query?pg=q&c9usej=on&what=web&Kl=XX&q=%2Bhost%3Asamizdat.com&search.x=71&search.y=16">Click here to use AltaVista to search this Web site</a>

Next, I included the following explanation: "To find anything at this site, connect to AltaVista Search and enter in the query box +host:samizdat.com followed by the words or phrases you want to look for, or simply click here and enter your query (+host:samizdat.com will already be in the box)." Most visitors are content with such an informal low-tech approach. One time, though, I got E-mail from Jorn Barger (http://www.mcs.net/~jorn), who explained that I could use forms to better accomplish this task. I replied that what he was suggesting was beyond my limited HTML abilities. So, he was kind enough to write the code for me. Here it is:

<FORM name=mfrm method=GET action="http://www.altavista.digital.com/cgi-bin/query">
<INPUT NAME=q size=65 maxlength=800 VALUE="+host:samizdat.com"><br>
<INPUT TYPE=hidden NAME=act VALUE=search>
<INPUT TYPE=hidden NAME=pg VALUE=q>
<INPUT TYPE=hidden NAME=text VALUE=yes>
<INPUT TYPE=hidden NAME=what VALUE=web>
in language:
<SELECT NAME=kl><OPTION VALUE=en SELECTED>English
<OPTION VALUE=fr>French<OPTION VALUE=pt>Portuguese</SELECT>
<INPUT TYPE=submit VALUE=Submit>
</FORM>

To use this form on your own site, all you have to do is replace samizdat.com with the domain name of your site. If you're reading this article at The Cobb Group Web site, there's no need to retype the code. Simply save it as an .htm file and use your HTML authoring tools to copy and paste it into your Web page. If you're reading this article in print, you can also go to my Web site at http://www.samizdat.com and copy and paste the code from there.

Since I have French and Portuguese translations of some of the items at my site, this code takes advantage of the new language-tags capability at AltaVista Search and gives users a choice of languages. If your site uses English only, you can delete those three language-related lines (beginning with "in language" and ending with "Portuguese").

If you'd like to restrict the search to a directory at your site rather than have it include your entire site, use the URL: command instead. For instance, the command


will limit the search to that particular directory. You also might want to create a series of forms tailored for people who are unfamiliar or uncomfortable with the search syntax at AltaVista Search. Simply edit the item VALUE="+host:samizdat.com", adding query terms within the quotation marks, just as you would use them at the AltaVista Search site. For instance,

VALUE="+host:samizdat.com internet"

will create a form with those search terms pre-loaded, inviting people to add whatever further refinements they might like and making those refinements very easy to submit. (By the way, this form connects to AltaVista Search in text-only mode, so you see none of the logos or ads, and the response time is amazingly fast.)
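When a visitor presses Submit, the browser turns the form's fields into a query URL much like the bookmarked link shown earlier. The Python sketch below imitates that step with the standard urllib.parse.urlencode function, so you can see how +host:samizdat.com internet becomes %2Bhost%3Asamizdat.com+internet. The field names and values are copied from the form above; the function name is hypothetical, the AltaVista address no longer works, and browsers encode a few characters slightly differently than Python does.

```python
from urllib.parse import urlencode

FORM_ACTION = "http://www.altavista.digital.com/cgi-bin/query"


def form_submission_url(query: str, language: str = "en") -> str:
    """Approximate the URL a browser builds when the GET form is submitted.

    Fields are sent in the order they appear in the form. urlencode turns
    "+" into %2B, ":" into %3A, and spaces into "+".
    """
    fields = [
        ("q", query),
        ("act", "search"),
        ("pg", "q"),
        ("text", "yes"),
        ("what", "web"),
        ("kl", language),
    ]
    return FORM_ACTION + "?" + urlencode(fields)


print(form_submission_url("+host:samizdat.com internet"))
# http://www.altavista.digital.com/cgi-bin/query?q=%2Bhost%3Asamizdat.com+internet&act=search&pg=q&text=yes&what=web&kl=en
```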


You may wonder why I use the Add/Remove URL link at all. You may ask, "Doesn't the AltaVista Search spider follow all the links that it finds and index every page on the Web? And if I add the URL of my home page (to alert the spider that I have a new site), isn't that enough for all my pages to get indexed?" Yes, AltaVista Search does go from link to link. And yes, it eventually finds the pages that are linked, but not immediately and not every day. Even though the spider crawls continuously, with more than a thousand threads working simultaneously to capture one page after another, it still takes a while to go everywhere and get everything. The developers at AltaVista Search keep improving the spider's techniques, and they keep upgrading and expanding the equipment at the AltaVista Search site. But at the same time, the total content on the Web keeps growing, making it increasingly difficult to find and index all of it.

If you have new material and you want people to start finding you right away, or if you want to use AltaVista Search as a complete index of your Web site, the only way to make sure that the new pages are added promptly is for you to add the new pages individually by hand. It takes diligence and discipline to keep the information about your site up-to-date, but, in my case at least, the benefit is well worth the effort.

Unfortunately, my approach of adding a URL at AltaVista Search for each new or modified page is appropriate only if you have a relatively small site. If you try to add more than a dozen pages per day by hand, AltaVista Search responds that you've added too many pages, and it stops accepting your input. You can then return and add another dozen pages the next day, and a dozen the next.
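If you have more pages than the daily limit allows, you can at least plan the submissions. The Python sketch below puts the home page and other entry points first (so the spider can find the rest by following links), drops duplicates, and splits the remaining URLs into daily batches. The batch size of 12 simply mirrors the "about a dozen" described here; the exact limit isn't stated and may change, and the function name and URLs are hypothetical.

```python
def plan_submissions(urls, entry_points, per_day=12):
    """Split URLs into daily batches for submission by hand.

    Entry points come first, duplicates are dropped, and the order is
    otherwise preserved.
    """
    ordered = list(dict.fromkeys([*entry_points, *urls]))
    return [ordered[i:i + per_day] for i in range(0, len(ordered), per_day)]


pages = [f"http://www.example.com/doc{n}.html" for n in range(30)]
for day, batch in enumerate(plan_submissions(pages, ["http://www.example.com/"]), start=1):
    print(f"Day {day}: {len(batch)} URLs, starting with {batch[0]}")
```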

If you have a large Web site, or if you don't have your own domain name and hence are part of a very large domain (like Geocities) from which many people submit several URLs, you might run into a dead end at your first URL submission. The folks who run AltaVista Search are trying to prevent "index spamming," that is, attempts to stuff the index with bogus information, either to promote someone's business or simply as a malicious hack. That's why they had to set a limit. But if you can stay below the limit, it's a simple, no-cost way to make it easy for visitors to find anything at your site.

Benefits of Add/Remove URL

Have you performed an AltaVista search and found that some items in the list of matches no longer exist? Instead of cursing or sending a nasty feedback message to the AltaVista Search folks, you should click Add/Remove URL and enter the URL of the dead page. The spider will immediately determine if the page is actually gone or if there was only an intermittent network problem. If you receive an error message indicating that the page no longer exists, AltaVista Search will remove that item from the index very soon, usually by the next day. Over time, Scooter will revisit all the pages in the AltaVista Search index, and pages that have vanished will be eliminated from the index. But that doesn't happen instantaneously. So there's always a residue of old information buried among all the good stuff in the index.
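The spider's decision after re-fetching a dead link boils down to reading the result. The Python sketch below encodes one plausible policy: HTTP 404 (Not Found) or 410 (Gone) means the page is really gone, while timeouts and 5xx server errors may be the intermittent network trouble mentioned above and deserve another try. The function and the policy are assumptions for illustration, not AltaVista's actual rules.

```python
def classify_fetch(status=None, timed_out=False):
    """Decide what to do with an index entry after re-fetching its page.

    status is the HTTP status code, or None if no response arrived.
    """
    if timed_out or status is None:
        return "retry"          # possibly an intermittent network problem
    if status in (404, 410):
        return "remove"         # Not Found / Gone: the page has vanished
    if 500 <= status < 600:
        return "retry"          # server trouble, which may be temporary
    if 200 <= status < 300:
        return "keep"           # the page is alive
    return "check by hand"      # redirects, access denied, and other cases


print(classify_fetch(404))             # remove
print(classify_fetch(timed_out=True))  # retry
```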

Remember that AltaVista Search is a public service, provided for free for the benefit of the Internet community. And consider it your responsibility, as a good Internet citizen, to take a few moments to alert AltaVista Search when you discover that a page has vanished. If we all do that, we'll improve the accuracy of the information in the index, for the benefit of us all.

And don't forget the obvious: Use Add/Remove URL to fix your own typos and other mistakes. Sooner or later, we all end up posting a page that contains an embarrassing mistake. Fortunately, Web technology makes it easy to post a corrected page (unlike having to throw away tens of thousands of copies of a printed brochure). But if the mistake appears in the HTML title, in the first couple of text lines, or in a description META tag, it could be perpetuated at AltaVista Search for weeks after you've corrected the page, unless you use Add/Remove URL to update the copy of that page in the index.

Workaround for Subscription-only Information

If your Web pages consist of content that you sell by subscription and hence are protected by password (like the Power Searching with AltaVista pages at The Cobb Group), you face a bit of a dilemma. You want to protect the information, because that's the source of your revenue. But at the same time, you'd like people to know that you have this information. You'd like your information to be completely indexed at the AltaVista Search site to attract potential subscribers. But once you password-protect your pages, you also prevent the AltaVista Search spider (and other search-engine spiders) from ever retrieving and indexing your information. If you have a small business, you might be willing to try some workarounds using the Add/Remove URL link. (Large companies would probably prefer more direct, though perhaps costly, solutions.) For instance, you can copy your password-protected page to a new, unprotected URL and add that URL at the AltaVista Search page. Once the spider has fetched the copy, you can delete its content and replace it with a pointer to your home page, where you can set up a form that visitors can use to see free samples/excerpts or to sign up for a subscription. Then, all the words on your site will be indexed, but you'll retain control of the content. This works only until the next time the spider visits the page and finds the new content. You should be able to make it work on a more permanent basis by using the Robots Exclusion Standard to deter Web spiders from retrieving that page again. (For details on how to do that, click Add/Remove URL and scroll down to the paragraph on the Robots Exclusion Standard.)
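Under the Robots Exclusion Standard, a site tells well-behaved spiders what to skip through a plain-text robots.txt file at the top level of the site. The Python sketch below uses the standard library's urllib.robotparser to confirm that a rule really blocks the page you want left alone. The /samples/issue1.html path and the example.com host are hypothetical, and Scooter is assumed to honor a "User-agent: *" rule like any compliant crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask all compliant spiders to stay away from
# the page whose content was swapped for a pointer to the home page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /samples/issue1.html
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "Scooter", "http://www.example.com/samples/issue1.html"))  # False
print(is_allowed(ROBOTS_TXT, "Scooter", "http://www.example.com/index.html"))  # True
```

A robots.txt rule only asks spiders not to retrieve a page; it does not password-protect anything, so the real subscription content still belongs behind the password.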

Use this workaround selectively and judiciously. It makes good sense if you have real content that you don't want to give away for free. But the same method can be abused by someone trying to draw people to a site using a bait-and-switch strategy, where the person pretends to have content that isn't really there. That strategy borders on "index spamming," and the folks who run AltaVista Search take a very dim view of such behavior. As they say at Add/Remove URL, "Left unchecked, this behavior [will] make Web indexes worthless. [AltaVista Search] will disallow URL submissions from those who spam the index. In extreme cases, [AltaVista Search] will exclude all their pages from the index."

Finding the Right Tool for the Right Job

I've been getting a lot of E-mail lately, all very interesting. Based on the E-mail responses to my articles here in Power Searching with AltaVista, I think we need to take a look at what AltaVista is very good at and at those instances where it makes sense to get your answers another way. AltaVista Search can be useful in a great many circumstances, but it doesn't offer the best solution for every problem in the Internet world. Just as you can cut meat with a fork, sometimes you're better off using a knife. Likewise, while a complex and careful search in AltaVista Search may get you the answer to your question, in some cases you can find a simpler or better solution.

AltaVista Search's Strengths

AltaVista's unique strengths include:

Complementary Web Tools for AltaVista Search

If a search on AltaVista Search doesn't readily get you what you want, you have many alternatives. For instance, suppose you want to see a list of all the colleges in California. For that, go to Yahoo! at http://www.yahoo.com. From its home page, shown in Figure A, click Universities under Education, then click United States, then Private by State and/or Public by State, and finally California.

Once you've chosen the name of a particular college from that list, use AltaVista Search to find information about the college and its related links. (To find hyperlinks to its site, use link:; to get a picture of everything that's at the college's site, use host:, click Refine, and then choose the graphic view. Think of these search methods as complementary rather than competing, and get the most out of the abilities of both.) Likewise, you may want the phone number or address of an individual or a business. If that person or business has a Web site and a distinctive name, AltaVista Search could be a quick and easy way to get what you want. If not, try one of the white pages/yellow pages directories on the Net. Of course, you could use AltaVista Search to find such directories. A simple search for

directory "find people" "white pages"

will give you an enormous number of matches, as you can see in Figure B.

My favorite people-search directory is Switchboard, shown in Figure C and located at http://www.switchboard.com. Switchboard includes listings gleaned from phone books as well as entries provided directly by individuals and companies. The details you gather from such a directory can help you make effective AltaVista Search queries for further details. (Tidbits like the phone number and Zip code can help quickly focus a search that otherwise will yield tens of thousands of matches.)

There are specialized directories for all kinds of useful information. For instance, Figure D shows Four11, located at http://www.four11.com. This site is handy for finding someone's E-mail address. Also, Liszt, located at http://www.liszt.com, lets you find E-mail distribution lists you might want to check out. You can search Liszt's database of more than 70,000 lists or navigate your way through its categorized menus (as you would at Yahoo!). As you can see in Figure E, a search on AltaVista Search for

+directory +"whatever kind of thing you are looking for"

may well point you to a new specialized directory that will suit your needs. (In Figure E, our "directory" is Liszt and we're searching for direct-marketing information.)

Also, the folks at Internic, the organization that originally managed the distribution of domain names, maintain and direct you to some very useful directories. If you're trying to figure out who runs a particular Web site or if you have a host name in the form of a number and want to know what conventional URL that number translates to, go to http://www.internic.net and click on Internic's Directory and Database Services. One of its directories, Webfinder, is particularly helpful if you're looking for the home page of a particular company.

If you're looking only for newsgroups, you should check the newsgroups-only search engine, DejaNews, at http://www.dejanews.com. As you can see in Figure F, not only can you search the contents of newsgroup items (as you can at AltaVista Search), but you also can post your own items to newsgroups, even if your Internet service provider doesn't offer USENET newsgroup services.

As a general rule, be flexible and creative, and take advantage of the variety of search and directory capabilities on the Web. Try to find the right combination of search tools for the job. There are times when you should use AltaVista Search to find specialized directories. Then, you can use those directories to get additional facts that will help you accomplish more focused and effective searches at AltaVista Search.

Follow-up on Job Hunting

My article on job hunting ("Finding Jobs and Being Found for Jobs" in the July issue) has generated lots of correspondence. (In case you'd like to point friends to that article, it's also available at the AltaVista Search site at http://www.altavista.digital.com/av/content/get_a_job.htm). I heard from folks in India and England, and from

My advice in that article was twofold. First, I suggested that when looking for a job, you use Advanced Search, entering "job OR career" in the search field and, in the Ranking field, words describing the kind of job you want and where you want it. I also suggested that if you were trying to recruit someone, you should enter resume in the search field and, in the Ranking field, words describing the qualifications you're looking for. My second piece of advice was to post your resume on the Web "flypaper" style, loading the HTML title with the words recruiters are likely to search for. Begin the first line of text with resume and then describe the kind of job you're looking for. Finally, go to AltaVista Search and add the URL for that page.
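To make the "flypaper" idea concrete, here is a small Python sketch that writes a bare-bones resume page with the key terms in the two places AltaVista Search weighs most, the HTML title and the first line of text. The helper name flypaper_resume, the "Resume:" prefix, and the sample job description are all hypothetical; adapt the wording to the terms recruiters in your field actually search for.

```python
from html import escape


def flypaper_resume(job_terms, summary):
    """Build a minimal resume page with search terms up front (hypothetical helper).

    job_terms: words recruiters are likely to search for.
    summary:   one-line description of the job you want.
    """
    title = "Resume: " + " ".join(job_terms)
    first_line = "Resume: " + summary
    return (
        "<html>\n<head>\n"
        f"<title>{escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"<p>{escape(first_line)}</p>\n"
        "<!-- rest of the resume goes here -->\n"
        "</body>\n</html>\n"
    )


page = flypaper_resume(
    ["technical", "writer", "Boston", "software"],
    "technical writer seeking a software documentation job in the Boston area",
)
print(page)
```

Save the output as an .html file in your Web space, then submit its URL with Add/Remove URL so that the page is indexed promptly.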

Some of the people responding didn't have their own Web pages. Understandably, they were confused by the "flypaper" advice. My suggestion to them was to supplement their AltaVista searches with The Top 100 Electronic Recruiters Web site (from The Internet Business Network at http://www.interbiznet.com/eeri). This site provides a simple way to submit resumes to or search job postings at the top 100 job-related Web sites.

Putting People Together: Not Just Information

While writing the book The AltaVista Search Revolution, I heard from Bernadette Price in Portland, Oregon. She had searched AltaVista for Gowganda, the name of the tiny town in northern Ontario where she grew up. She used the search results not just to glean information but as a starting point for a unique social adventure. AltaVista Search connected her with the Auld Reekie Lodge, a tourist spot in the town, and from there she was able to get in touch with an old friend, Gertrude Trudel. Then the two of them put together "The Great Gowganda Get-Together" by using all the tricks they could come up with on the Internet, including AltaVista Search, various directories, and a personal Web page, plus phone, snail mail, and the news media. The event finally took place over Labor Day weekend. Here's what she had to say about it.

From: Bernadette Price
Date: Thu, 11 Sep 1997 14:19:15 -0700
Subject: Re: how did the Labor Day get together go?

Oh, Richard, it was wonderful, to say the least! I've been back a week now, and have been too busy catching up on work, etc., to update the Great Gowganda Get-Together home page, but will hopefully have time this weekend. The best part is that just about EVERYONE came "home" again--it was amazing! So many of us hadn't seen each other since we were children, over three decades ago. We had close to 500 people come that weekend ... and this is a town with a year-round population of under 100!

The Toronto Star was fabulous! They gave the story FRONT PAGE coverage on July 28, and then they arranged for a reporter to do a follow-up at the reunion, complete with a photo of me and my friend Gertie Trudel. That piece ran the next day, the Sunday of the Labor Day weekend. Gertie had done so much work from her end and made about 200 phone calls, and so I arranged with Northern Telephone and the Telephone Pioneers of America group there to donate a used computer to her in thanks for all her community effort. Finally, Gertie will be online! It's been AWFUL trying to keep up on the planning with her through expensive phone calls and the oh-so-slow snail mail! She and her husband have no idea how wonderful computers and the Internet are (and wait until they're online and discover AltaVista! :-), so they haven't bothered looking into getting one before this. I bet they will wonder how it was possible to live without one before!

We got so much "local" coverage as well--lots of those "dinky" little papers up in northern Ontario featured us on the front page as well--the angle, of course, being on this "newfangled modern technology" bringing about the reunion. The press coverage isn't over, either. There's a piece that will run in Net-Life Magazine that goes out to 140,000 Internet subscribers in Canada who use Sympatico, and then I got a call from Toronto Computes! magazine and they are doing a piece for their October issue. So is Northern Telephone in their publication, plus they are giving Gertie a one-year free Internet subscription to go with the used computer she got.

I've been too busy to update the web page, but I have tons of photos and lots of text to recap the unbelievably great time we all had, and I hope to do so this weekend.

By all means, feel free to use whatever you like in your article for the Cobb newsletter. Any chance you could include the URL for the Great Gowganda Get-Together home page? http://www.geocities.com/Yosemite/Rapids/2297.

Thanks for your interest in how the reunion came out. I would have let you know in due course anyway, as soon as I got head above water! :-) Sheesh, one thing about vacations is that you really need another one as soon as you get back!

Best regards, Bernadette


When you use AltaVista Search, remember it's just one tool among many available on the Internet. Don't neglect all the old ways you've used for years to find information and contact people. You don't have to give up one way of doing things to use another. Rather, let yourself expand the range of what you can do by creatively using the new capability in concert with all the rest. If your project really matters to you, put some work into it like Bernadette did, and then send us E-mail to let us know about it.

Information X-rays: Refine as a Diagnostic Tool

ALERT: LiveTopics is a powerful feature that was once offered through AltaVista, which later renamed it "Refine." It is no longer available. The project, code-named "Cow9," was a collaborative effort between researchers at Digital Equipment Corporation and François Bourdoncle of Ecole des Mines de Paris (www.ensmp.fr). The underlying technology has enormous potential. This article will give you a sense of how it can be used.

If you're interested in a fast-changing field, you may want to use AltaVista Search not just to find Web pages and nuggets of information but to get a quick overview of the field: what's important, what's changing, and which are the major companies and/or schools. To simplify finding anything on the Internet, AltaVista Search can generate categories on the fly and provide a graphical view of those categories. As a result, you get more than just pointers to nuggets of information--you get information about information, a picture of how pieces of information relate to one another. Keep in mind that these categories are created statistically, live and on the fly, from the information in the AltaVista Search index. There is no human bias. No one is making assumptions about the content. This isn't like the Dewey Decimal system, subject to obsolescence as the world rapidly changes. This is a view of today's Web content based on the information currently in the AltaVista Search index. And that set of information is enormous enough to yield very interesting and unexpected results.

How to Refine and Print the Results

The AltaVista Search site continues to grow and change in response to the needs of users and because of new opportunities opened by technology. A few months ago, AltaVista changed the look and feel of the LiveTopics search features. You no longer see labels like "Way-cool Topics Map!" Rather, when you first submit a query (from either the Simple or Advanced Search), you have a choice of clicking on Search to get the standard list of matching Web pages or on Refine to see a set of categories based on words found on the same pages as your query words. These categories can be displayed either in list form (20 items, with subcategories listed next to each) or in graph form, as shown in Figure A. The subcategories become visible when you move your cursor over a given term. If you prefer to go straight to the graphic view or to the list of categories, you can pre-select that choice in your personal preferences (just as you can choose to go straight to Advanced instead of Simple, as your default setting).
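Refine's categories were derived statistically from words that turn up on the same pages as your query words. The exact method isn't described here, so the Python sketch below is only a toy stand-in for the idea, not AltaVista's algorithm: for a handful of hypothetical pages, it counts how many pages containing the query word also contain each other word, and reports the most frequent companions.

```python
import re
from collections import Counter

# A few very common words that would otherwise dominate the counts.
STOPWORDS = {"a", "and", "at", "for", "in", "is", "of", "on", "the", "to"}


def cooccurring_terms(pages, query, top_n=5):
    """Rank words by how many query-containing pages they also appear on."""
    counts = Counter()
    for text in pages:
        words = set(re.findall(r"[a-z]+", text.lower()))
        if query in words:
            # Each word counts at most once per page.
            counts.update(words - {query} - STOPWORDS)
    return counts.most_common(top_n)


pages = [
    "Biochemistry graduate program and protein research",
    "Protein structure research in biochemistry",
    "Biochemistry jobs at a biotech company",
    "Chess club news",  # no query word, so it is ignored
]
print(cooccurring_terms(pages, "biochemistry", top_n=2))
```

Here protein and research surface as the strongest companions of biochemistry, each appearing on two of the three matching pages. A real system working over millions of pages would also need to group related words into categories and subcategories, which this toy does not attempt.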

Once you see the graphic view, you may well be tempted to print it and show it to a colleague, save it for comparison with another such picture, or build a set of data for spotting long-term trends. Unfortunately, because the image is created in a Java applet, the normal technique of choosing Print from your Web browser doesn't work. The printout includes everything but the image you wanted.

As a workaround, you could use the Print Screen key on your PC to capture whatever is on the screen, then send it to a printer and/or paste it from your clipboard into another application, like PowerPoint. That approach, however, tends to drop some of the detail, sometimes making the chart difficult to read--not very useful for showing your boss or presenting at a meeting.

There are numerous graphics software packages that can improve the visual quality of your results. I use Paint Shop Pro to capture the image and then save it in GIF format for use in my Web-based presentations, or in BMP format for use with PowerPoint. To capture a graph, first open Paint Shop Pro, go to the Capture menu, and select New. The Web page will appear on the screen. Just drag the mouse cursor across the image you want to capture. When you release the mouse button, you'll automatically return to Paint Shop Pro with the Web page image appearing in the work area.

Once you have a satisfactory way of saving and printing these images, you may want to go to Advanced Search and run the same query, limiting the search to a range of dates, and generate a series of images that show how the Web content in this subject area has changed over time. And you can go back to AltaVista Search later to generate fresh images to add to your collection and perhaps help you recognize important trends.

With these images, what you don't see is often just as important as what you do see. What's missing? What is the true range? Often, the value comes from the unexpected--the set of categories and subcategories that you wouldn't have thought of on your own. I think of these images as X-rays rather than snapshots, because it takes experience and insight to interpret them. X-rays are diagnostic tools, a starting point for research both on the Internet and by traditional methods. X-rays represent a means, rather than an end. The objective isn't to get answers to questions, but rather to help you decide what questions to ask and to help you avoid missing something significant.

Fast-changing Fields of Study

In many cases, you should use a Refine search not to narrow your query, but to expand it and get a "10,000-foot view" of a particular subject. For instance, suppose you're a biochemistry major and are interested in graduate school or in getting a job in that field. Enter the query biochemistry and click Refine. Check the list of categories and the graphic view. Choose one or more categories or subcategories of particular interest. Add those to your query by clicking in the box beside each category, then clicking Refine again. Continue doing this until the X-ray begins to more closely resemble the range of your personal interests. Just getting a breakdown of all the information on the Web about that subject, seeing the relationships, and knowing what the categories are could be of value to you. To focus the X-ray further, choose a company or university that appears prominently in the list of categories, or choose one that you know by reputation and add that company or university to the query. For instance, you might add this to the end of your query:

+host:harvard.edu

The command +host: means that you now want results limited to that one domain name; in this case, you only want to track information on biochemistry jobs found at Harvard University Web sites. Save or print the results, then substitute another university or company, and so on.

In the November 1997 article "Understanding the Limits of Accuracy," we discussed the need to understand the limits of accuracy before making precise judgments. That advice is very important in this case. Some Web sites use techniques that prevent the AltaVista Search spider from making a full index of their content. For instance, they may require a registration/password process, may appear only in frames, or may even draw content on the fly from databases. If that's the case for one or more of the sites you're examining, you won't be comparing apples to apples. So if accuracy is important to you, go directly to the individual sites and see if you encounter any of the telltale signs of a barrier to indexing.

As a further test, enter the target site by way of its home page and navigate through internal links to pages that are particularly interesting. Note the directories in which these pages appear or the URLs of individual pages. Then go back to AltaVista Search and do queries for those particular directories or pages by entering url: followed by the Web address, as shown in Figure B. If you get zero results, that means the directory or page hasn't been indexed by AltaVista Search, perhaps because it was only recently added to the site and the spider hasn't found it yet. Hence, the X-rays you obtained of the site are incomplete and perhaps misleading.
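The test above is mostly bookkeeping: for each interesting page, note its directory and build a url: query for both the directory and the page. A Python sketch of that bookkeeping follows. It assumes that the url: command accepts a host name followed by a path, without the http:// prefix, and the example.edu addresses are hypothetical.

```python
from urllib.parse import urlsplit


def url_queries(page_urls):
    """Return url: queries for each page and for the directory holding it."""
    queries = []
    for page in page_urls:
        parts = urlsplit(page)
        # Everything up to the last slash is the page's directory.
        directory = parts.path.rsplit("/", 1)[0] + "/"
        for path in (directory, parts.path):
            query = "url:" + parts.netloc + path
            if query not in queries:  # skip directories already listed
                queries.append(query)
    return queries


for q in url_queries([
    "http://www.example.edu/biochem/jobs.html",
    "http://www.example.edu/biochem/faculty.html",
]):
    print(q)
```

Run each resulting query by hand; a zero count flags a directory or page that the spider hasn't reached yet.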

If accuracy is essential and time is no object, you could click Add/Remove URL on the AltaVista Search home page and enter the URL for each important page that isn't yet in the AltaVista Search index. Anyone can do this. You don't need special privileges or any official relationship with the site in question. The spider will immediately fetch the page and add it to the index, usually by the next day. Then you can go back and get a fresh X-ray, based on the more complete information.

X-rays of Companies

If you're in business, you can use that same approach to get X-rays of the Web sites of competitors, suppliers, partners, and major customers, as well as your own Web site. You can make comparisons and look for trends, in each case using +host: followed by the company's domain name as the query.

If the company in question makes extensive use of the Web for marketing (and nearly all medium- to large-size companies do nowadays), you'll get a candid, unrehearsed view of what's important to them. What's the range of topics they cover? Do words like customer or satisfaction appear in the top 20 or in the first set of subcategories? Does the content present a clear, coherent picture, with all the various categories interlinked? Or is the information mostly scattered and unconnected?

You can also search a subset of these same companies' sites. For instance, entering

+host:digital.com +internet

yields all the pages at Digital Equipment that mention internet, as shown in Figure C.

To get a broader view of a company's visibility and range of activity, use as the query the company name itself (in quotation marks if it consists of more than one word; or if the company is well-known under more than one form of its name, use Advanced Search and enter each name--in quotation marks--separated by the word OR). That will give you a picture of all the indexed pages on the public Web that mention the company by name.
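The quoting-and-OR rule above is easy to get slightly wrong by hand when a company goes by several names. The Python sketch below assembles the Advanced Search query string. The helper company_query is hypothetical, and it quotes only multiword names, on the assumption that quoting a single word makes no difference.

```python
def company_query(*names):
    """Build an Advanced Search query matching any of a company's names."""
    terms = []
    for name in names:
        name = " ".join(name.split())  # collapse stray whitespace
        # Multiword names are quoted so that they match as a phrase.
        terms.append(f'"{name}"' if " " in name else name)
    return " OR ".join(terms)


print(company_query("Digital Equipment", "DEC"))
# "Digital Equipment" OR DEC
```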

For an X-ray view of pages that have hypertext links to a given site but that aren't at the site itself (an indication of the types and range of content at sites that consider the target site useful and valuable), use a combination of the link: and host: commands. For instance, Figure D shows the results of the search

+link:digital.com -host:digital.com

Use a combination of the link: and host: commands to search pages with hypertext links to a Web site (while excluding the site itself).
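Both site X-ray queries in this section follow a fixed pattern, so they can be generated for a whole list of competitors or partners at once. The two Python helpers below are hypothetical conveniences that simply reproduce the query formats shown above; example.com stands in for any other domain.

```python
def site_topic_query(domain, *words):
    """Pages on one site that mention every given word."""
    return " ".join([f"+host:{domain}"] + [f"+{w}" for w in words])


def inbound_link_query(domain):
    """Pages elsewhere on the Web that link to the site, excluding the site itself."""
    return f"+link:{domain} -host:{domain}"


for domain in ["digital.com", "example.com"]:
    print(site_topic_query(domain, "internet"))
    print(inbound_link_query(domain))
```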

First Cut at Market Segmentation and Analysis

If your field of business is new and rapidly changing, like the Internet, AltaVista Search X-rays may be a good way to get a first approximation of the market segments and the key areas of interest. For instance, try doing separate queries for intranet, electronic commerce, and isp* (the common abbreviation for Internet service provider). Using traditional techniques, such as questionnaires and focus groups, it would take a market research firm months to gather and assemble data on any of these topic areas. By the time you received your high-priced report, the marketplace would have changed again. Properly done (taking into account all the issues of accuracy and completeness described above and in previous articles), the X-ray approach can provide useful information almost immediately, at no cost, and help you better focus and speed up professional market research efforts.

X-rays of Individuals

What works for universities and companies also works for individuals who are active and well-known on the Internet. For the fun of it, try getting X-rays of yourself, your friends, people you're dating or might want to date, people you're considering hiring or working for. Use the person's name as the query term or use the host: command if the person has his or her own Web site with its own domain name, or use the url: command with the directory address of the person's personal Web pages. If the name is rare, just enter it in quotation marks, like "richard seltzer", in either the Simple or Advanced Search. I use lowercase to catch all instances of a name. If it's a common name, go to the Advanced Search, enter the name in the query field, and in the Ranking field, enter a series of words that are likely to distinguish the person you're interested in from others who have the same name. To be sure that you catch all instances of the name, you can use the NEAR command in the Advanced Search. For instance, richard NEAR seltzer catches all instances where those words appear within ten words of one another, in any order. Hence, it catches Seltzer, Richard, and Richard Warren Seltzer--names that would be missed by entering "richard seltzer", which only matches instances where those two words appear next to one another in exactly that order.
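The difference between a quoted phrase and NEAR is easy to see with a toy matcher. The Python sketch below is a simplification, not AltaVista's implementation: it splits text into lowercase words, treats a phrase as adjacent words in order, and treats NEAR as two words at most ten positions apart in either order. Exactly how AltaVista counted its ten-word window is an assumption here.

```python
import re


def tokens(text):
    """Lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z]+", text.lower())


def phrase_match(text, phrase):
    """True if the phrase's words appear next to one another, in order."""
    words, target = tokens(text), tokens(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def near_match(text, a, b, window=10):
    """True if words a and b occur within `window` words of each other, any order."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


for sample in ["Richard Seltzer", "Seltzer, Richard", "Richard Warren Seltzer"]:
    print(f"{sample!r:26} phrase={phrase_match(sample, 'richard seltzer')} "
          f"near={near_match(sample, 'richard', 'seltzer')}")
```

Only the first sample matches the quoted phrase, while all three match NEAR.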

The results can be both amusing and illuminating. Because virtually everything of significance that I've written over the last 30 years is at my Web site and has been indexed by AltaVista Search, I can see the instantaneous results of a statistical analysis of all those words. In an X-ray of myself, I see the word irrational in the top 20 categories, and in an X-ray of my Web site (http://www.samizdat.com), the word gratifying is in the top 20. I see the pieces of my life--here's my son who plays chess; here are characters from books I've written; here are my business contacts--some are separate islands of activity and others are interrelated in interesting ways.

The next time you plan a guest list for a party or business meeting that includes people who are active on the Internet, do graphic-style searches of all prospective attendees. Print these information X-rays so you can see them side-by-side as you're deciding who to invite and where to seat them. Then, for the event itself, post these pictures on the wall as a conversation starter and have AltaVista Search open on a nearby PC for folks to follow up and play with what they find out about one another.

Questions and answers

The following questions are based on recent correspondence about my articles. They deal with matters that many people find confusing.

Bogus Research Results

Q A new publication, Content Creator's Newsletter, produced by RealNetworks (the creators of RealAudio and RealVideo), claims these products have about 90 percent marketshare and bases this claim on results from AltaVista and HotBot ("Search Engines Reveal that 88-94% of Internet Streaming Media URLs use RealAudio and RealVideo"). Is this claim accurate? Did Content Creator's Newsletter use the search engine correctly? (A marketshare number that high sounds absolutely incredible.)

A This looks like a good example of the misuse of search engines for market research. For AltaVista, Content Creator's Newsletter says its criteria were

(search using "link:*.XXX", advanced search, precise count)

In other words, it used Advanced Search and selected Give Me Only A Precise Count Of Matches using the link: command (which counts the pages that have hyperlinks to particular pages or sites) with the following search criteria:

RealNetworks-(link:*.ra OR link:*.ram OR link:*.rm OR link:*.rpm)
Netshow-(link:*.asx OR link:*.asf)
Vxtreme-(link:*.ivy OR link:*.ivx)

Content Creator's Newsletter notes that RealNetworks includes RealAudio, RealVideo, and RealFlash. The results it obtained by doing this search on November 5, 1997, indicated the following:

Media type      URLs      Marketshare
RealNetworks    512,856   93.8%
Netshow           9,570    1.7%
VDO              10,341    1.9%
Vivo              3,736    0.7%
Xing              4,148    0.8%
Vxtreme           6,321    1.2%
Vosaic            2,804    0.5%
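Marketshare in that table is simply each URL count divided by the total of all the listed counts. The short Python sketch below redoes that division. It reproduces most of the printed percentages but gives about 93.3% rather than 93.8% for RealNetworks, so the printed figures don't quite add up either. More important, no amount of careful arithmetic can rescue counts that don't measure what they claim to, as the rest of this answer explains.

```python
# URL counts as printed in Content Creator's Newsletter (search of November 5, 1997).
COUNTS = {
    "RealNetworks": 512_856,
    "Netshow": 9_570,
    "VDO": 10_341,
    "Vivo": 3_736,
    "Xing": 4_148,
    "Vxtreme": 6_321,
    "Vosaic": 2_804,
}


def shares(counts):
    """Each entry's percentage of the combined total."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


for name, pct in shares(COUNTS).items():
    print(f"{name:<13}{COUNTS[name]:>9,}{pct:7.1f}%")
```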

First, a minor detail--Content Creator's Newsletter indicates that instead of going directly to the AltaVista Search site, located at http://www.altavista.digital.com, it went to the site of a company named AltaVista Technology, located at http://www.altavista.com. That site redirects you to the real AltaVista search site, but garners advertising revenue from the confusion.

Unfortunately, no command at AltaVista Search can tell you the number of files with a particular file extension. (The closest thing is domain:, which works for domain endings such as edu, com, and net, and for the two-letter codes that designate countries.) The query Content Creator's Newsletter constructed with link: simply finds pages that have hyperlinks to URLs containing .ra and so forth anywhere in them, not just at the end. It looks like Content Creator's Newsletter made the mistake of using the precise count feature without ever getting a list of results it could check. If Content Creator's Newsletter had checked--clicking to list items and then viewing the pages' source code to see what hyperlinks are on those pages--it would have quickly realized the command wasn't returning the desired results.

Similarly, the alternative command url:*.ra wouldn't have told Content Creator's Newsletter the number of files on the Web with that extension. Once again, AltaVista would show all pages that have .ra as an element of the URL--not necessarily at the end of the URL. Just looking at the URLs of the files in the list of matches would make that clear. For information about how to subscribe to this new newsletter, visit http://www.real.com/mailinglist/index.html.
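The gap between ".ra somewhere in the URL" and "a file that ends in .ra" can be shown with two tiny Python predicates. This is a simplified stand-in, not AltaVista's actual URL matching, and both example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def mentions(url, fragment):
    """Loose test: the fragment appears anywhere in the URL."""
    return fragment in url.lower()


def has_extension(url, ext):
    """Strict test: the URL's path actually ends with the extension."""
    return urlsplit(url).path.lower().endswith(ext)


urls = [
    "http://www.example.com/audio/interview.ra",           # a real .ra file
    "http://www.example.com/travel/camera.rankings.html",  # merely contains ".ra"
]
for url in urls:
    print(url, mentions(url, ".ra"), has_extension(url, ".ra"))
```

A count based on the loose test includes both URLs; only the strict test singles out the actual RealAudio file.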

Getting Indexed at AltaVista

Q When I edit my Web page, I don't know how to update it in AltaVista, other than by simply uploading with FTP. In my eagerness, I tried going to Add/Remove URL on AltaVista, but I'm afraid I've just messed things up. My page used to come up as number 6 or 7 when I typed +folsom +homes. Now only one of my subsidiary pages appears under the heading TEST. My index page has all but disappeared.

A First, the main criteria for ranking are the HTML title and the first couple of text lines. On each of your pages, make sure you have the most important terms in those two places (the words you expect people to search for). After you've uploaded your edited pages to your Web site, go to AltaVista and use Add URL for each page.

Return to AltaVista after a day or two, search for the terms most important to you (terms you've included in your HTML titles and first lines of text), and see how you rank. You also can search for host: followed by your domain name (if you have your own), or search for url: followed by the directory in which your pages reside (if you don't) to see which of your pages are already included in the index.

A Problem

Q I've tried to properly handle the META tags, and for a while, everything was working fine. Then, after looking at document sources on other pages, I decided to add comments like


I then went to Add URL, and I've had nothing but problems ever since.

A Comments shouldn't make any difference--they're not indexed. As for META tags, if you put what you need in the HTML title and the first couple of text lines, META tags should be unnecessary.

Offending Scooter

Q One thing you suggested really surprised me. You said when I finished editing, I should go to Add URL for each page. I've never done that because when I go to AltaVista's Add URL page, it says, "Please submit only one URL. Our crawler, Scooter, will eventually explore your entire site by following links." I want to be really clear before I start because I certainly don't want to offend Scooter.

A If you want your pages in the index by the next day, use Add URL for each page. If you don't mind waiting several weeks, just enter the home page.

Purging URLs in AltaVista

Q When I was first composing my subsidiary Web pages, I used TEST as my title for pages I was testing. I never added their URLs, and I've since changed the titles. (No, I haven't added their URLs yet, because I thought I wasn't supposed to.) Yet somehow, when I type +homes +"El Dorado Hills", I see TEST on the second page, and then something about school scores in one of my subsidiary pages or other information I wrote on different pages. For instance, I had a page on climate, a page on recreation, a page on school scores, and so on. But my index page has failed to show up like it once did. So how do I remove all those TEST pages? Do I go to Remove URL?

A Once again, use Add URL and enter the URL of each old, dead TEST page. Scooter will immediately try to fetch the pages. If the pages no longer exist, Scooter will get the message Error 404 and remove the pages from the index, usually by the next day.

Updating your URLs

Q Now that I've worked with AltaVista as an information provider instead of a researcher, I find that I don't like the search index nearly as much. My job is to list companies in the index, but I've had no success. Is there something I don't know about the AltaVista schedule? Why does AltaVista insist that it updates and has the most current and relevant listings, when in fact it doesn't?

A The information in the AltaVista index is as current as you want to make it. At any time, you can go to the AltaVista site, click Add URL, and simply type the URL of a new or revised page of yours; that info will be in the index usually by the next day. I do that for all of my pages.

AltaVista now has more than 90 million pages in its index, and its crawlers visit about 10 million pages a day. So eventually your pages will be found automatically--soon if there are many links from other pages to yours (because that's how the crawler learns about pages--from links), later if there are few links, and perhaps never if there are no links. If being indexed is important to you, simply take charge and use Add URL.

Keywords and Translation

Thanks to AltaVista's new free translation service, people who can't read English will be able to read your Web pages. But will they be able to find your pages? Have you tried this service yet? If not, conduct any search, click Translate next to one of the matches, and choose the language you want the document translated to. Unless traffic to the site is extraordinarily heavy, you should get the translated page with all its graphics and format intact in a matter of seconds. AltaVista will translate from English to French, Spanish, Italian, Portuguese, and German, and vice versa. The limit on how much text it will translate is variable right now, depending on traffic, but should normally be in the range of 20 KB. You can go straight to the translation page at


There you can enter a URL or type or cut and paste into the box any text you want translated. That means you can translate email messages, newsgroup items, text that you have on your hard drive, or the balance of Web pages that were too long to translate in a single gulp. Unless the text is idiomatic or laden with slang, you're likely to get remarkably good translations. The translation service is especially good for business communications. Of course, words that are embedded in graphics remain unchanged--only plain text is translated.

The AltaVista folks have partnered with SYSTRAN, a company that specializes in automatic translation. But the fact that the AltaVista index is truly global makes an enormous difference. The underlying index understands nothing about any language, and that ignorance is a source of tremendous strength. Search engines that are built around the syntax of a given language lock themselves out of the rest of the world. AltaVista just captures all the text it finds. (In fact, within a few weeks after AltaVista first went live to the public, the developers were surprised to get email from people in Korea who had used their Korean keyboards to enter queries and had gotten results pointing to Korean pages. There are some problems with Asian languages, for which a variety of encodings of the same language are in use. But you often can get good results in any language, because AltaVista captures and matches the underlying character codes.)

Language tags at AltaVista are based on recently developed software that analyzes the content of the index to determine which pages are in which language, allowing you to limit your search to pages written in a particular language. And the domain: command allows you to limit a search to pages in a particular country (for instance, domain:fr for pages in France), based on the country codes on the end of Web addresses for many, but not all, pages hosted outside the U.S. Using those tools, you can find pages in foreign countries and in foreign languages.
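The domain: restriction relies on the last part of a page's host name. As a rough Python sketch, you can see which pages such a filter would catch and which it would miss. Two assumptions apply: a two-letter final label is treated as a country code, and the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def country_code(url: str) -> Optional[str]:
    """Return the two-letter code at the end of the host name, if there is one.
    Hosts ending in com, net, org, edu, and so on return None."""
    host = urlsplit(url).hostname or ""  # hostname is already lowercased
    last = host.rsplit(".", 1)[-1]
    return last if len(last) == 2 and last.isalpha() else None


print(country_code("http://www.example.fr/index.html"))   # fr
print(country_code("http://www.example.com/index.html"))  # None
```

A French company hosted under a .com address returns None. That is why country codes identify many, but not all, pages hosted outside the U.S.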

Free Translation

Before the new translation service arrived on the scene, there were only roundabout solutions for dealing with the problem of non-English pages on the Web. Now, however, there's no barrier to your reading pages in different languages, and at the same time, there's no barrier to prevent people who speak those languages from reading English-language content at your site.

But how will they find you? They aren't going to translate your page if they don't know it exists. And how will they know it exists if all the words on your page are in English? Apparently, they'd have to do an English-language search to get your page as a match. So what is the benefit to you as a Web site owner? Actually, there are two simple ways to plug into this new capability and make your Web-based business far more global than it was before.

First, use keyword META tags on all your pages to provide translations of the keywords and phrases that potential visitors to your site are likely to search for. A keyword META tag goes in the header of your document, like this:

<TITLE>The World's Greatest Web Site</TITLE>
<META name="keywords" content="cabbages kings oranges apples">

Normally, you'd use a keyword META tag for synonyms of important terms that appear in the document itself--words that people might search for. Many Webmasters misunderstand the purpose of this feature. Misled by the term keyword, they think of AltaVista as if it were a database, as if keywords were the only words that AltaVista indexed. In fact, AltaVista indexes every plain-text word on your static Web pages--unless blocked by forms or frames.

Other Webmasters insert keyword META tags in hopes of improving their rankings on results lists for certain searches. But a simpler and more effective way to do that is to put your most important words and phrases in your pages' HTML titles and first couple lines of text.

A much more effective use of this feature is for translations of your most important terms. Go to the AltaVista translation page and enter these terms one word at a time or one phrase at a time, and submit them for translation into each of the five available languages (if all five are important to you). Cut and paste the translation results to build your keyword META tag. (By using this method, you get the complete words, with accent marks, despite the fact your keyboard isn't set up to type with such accents.)
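If you have many terms, you can assemble the tag with a small script once you've collected the translations. The Python sketch below is an illustration, not part of AltaVista. The function name is hypothetical, and the translations are typed in by hand here; in practice you'd paste them from the translation service, as described above. The script escapes characters that would break the HTML attribute and leaves accented letters intact.

```python
from html import escape


def keyword_meta(translations: dict[str, list[str]]) -> str:
    """Build a keyword META tag listing each English term followed by its
    translations, skipping duplicates and escaping HTML special characters."""
    words: list[str] = []
    for term, foreign in translations.items():
        for word in [term, *foreign]:
            if word not in words:
                words.append(word)
    content = escape(" ".join(words), quote=True)
    return f'<META name="keywords" content="{content}">'


# French, Spanish, German, Italian, and Portuguese for "apples".
print(keyword_meta({"apples": ["pommes", "manzanas", "Äpfel", "mele", "maçãs"]}))
```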

Once you've made these changes on all your pages, go to AltaVista, click Add/Remove URL, and enter the URL for each page. The new material will be available in the index within a day or two. Then someone searching in French, German, Italian, Spanish, or Portuguese will be able to find your pages with AltaVista and immediately translate them to their native language on the fly.

Next, as a reminder to people who find your pages this way, and to alert people who may stumble upon your pages in other ways (for instance, hyperlinks and banner ads), you could put a note at the top of each page, with a hyperlink pointing people to the AltaVista translation service. In fact, because every query at AltaVista generates a unique URL, you could

Then visitors to your site who click on that link will go straight to that page with your URL already in the form, and all they'll have to do is select the language they want the page translated to.

By doing this, you avoid the cost of translation and the labor of continually translating pages as you change them. You also avoid the overhead of the extra disk space needed to maintain your site in multiple languages. If your pages are relatively small, this could be an enormous benefit to you, opening foreign markets at no cost.

The AltaVista Translation Assistant for Web Research

The "beta" wallpaper has now gone away, but the AltaVista Search site is continually improving its new translation service, which translates English to French, German, Italian, Spanish, or Portuguese, and vice versa, and is now known as the AltaVista Translation Assistant. Don't be surprised by changes in the look and feel. As AltaVista gathers data about usage (from tens of millions of hits) and gleans good ideas from an enormous volume of user feedback, it's making changes. You can now access the service by going to the Translation Assistant (Babelfish) page at


or by clicking the Translations tab, which now appears above the query form in both Simple and Advanced Search.

Three Translation Assistant Methods

Accessing the Babelfish page is simple: just click the Translate link to the right of a search result. The Babelfish page will automatically display the URL of the page you want to translate, as shown in Figure A. Click the down arrow next to Translate From and select a language pair. Then click Translate. You'll then see the Web page you selected in the target language you chose. The formatting (including graphics) will be the same as what you'd normally see at that page, but the Translation Assistant will translate the words that appear in plain text (as opposed to words buried in Java applets or graphic images). Also, all the hyperlinks will be in place, with their associated words translated. When you click any of these hyperlinks, you'll return to the AltaVista translation page with the new URL already entered in the form. To go to that new page and have it automatically translated, just click Translate. If the target page isn't in the language you selected, you'll see the page in its original language. For example, you might translate a page from French to English, click a hyperlink in the translated page, and ask for a French-to-English translation of the new page, when in fact the new page was written in English. In that case, AltaVista ignores the translation command, and you see the page in the original English. That means you can click any of those links and continue to navigate the Web as you normally do.

You can also go straight to the translation page (by entering the URL or clicking the tab), type a URL, select a language, and click Translate. Note that you can enter the URL for any publicly accessible Web page, regardless of whether AltaVista has indexed that page.

Finally, the translation page lets you translate strings of text. Just go straight to the translation page, type or copy and paste any text in a language that the service handles, and click Translate.

The methods of accessing this service make it very handy for a wide variety of uses, such as Web research, email, and participation in newsgroups, forums, and chat. In this article, we'll discuss Web research, and we'll discuss other applications in subsequent articles.

Web Research: Finding Pages to Translate

The foreign language content on the Web is growing at a rapid rate, making this service ever more useful when you're navigating the Web for research purposes and don't want to miss important information simply because you don't understand the language it's written in. Under normal circumstances, the pages in your results list are very likely to be in the same language as the language you used in your query. And you're likely to remain blissfully ignorant that additional pages actually exist but are written in other languages. In other words, if you search for English words and phrases, you'll get only pointers to pages written in English (unless the information provider is particularly savvy and has included translations of keywords and phrases in the text of a page or in a keyword meta tag, as discussed in last month's issue). But sometimes, the same word is used in more than one language. (This is particularly common with technical words.) And sometimes you're looking for a company name or trade name that remains constant across languages.

You also might want to target foreign language pages. Perhaps you know enough of the target language to, with the help of a dictionary, come up with the right words and phrases to enter your query in that other language. But in that case, you're likely to run into the limitations of your PC in representing the characters of that foreign language, as shown in Figure B. If you're like most of us, you can't easily enter the special accent marks needed for French, German, Spanish, and Portuguese. Fortunately, the type-it-in mode in the AltaVista Translation Assistant can solve both those problems.

After you've submitted your query in your native language and gleaned all the relevant research results, open a second browser window, so that you have one window for the AltaVista query form (either Simple or Advanced Search) and one for the AltaVista translation page. Enter the first word or phrase of your query in the translation form, select a target language, and click Translate. Then copy and paste the translated word or phrase, complete with all the right accents, into the query box. Do that for each separate word or phrase in your query, adding all the necessary query punctuation (such as +), special commands (such as anchor:), and, in Advanced Search, operators (such as AND). Then submit your query. Repeat the process for each language you feel is likely to be important for your research.

Remember that every query submitted to AltaVista generates a unique URL. That means if you're working on a continuing research project and you're likely to want to resubmit the same query in the future, you can bookmark the results page. Then, whenever you want, you can click that bookmark and thereby launch that exact query, generating fresh results. If these translated queries you're constructing could be useful to you or your colleagues in the future, bookmark them, so you don't have to repeat all this work.
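One way to picture why a bookmark can replay a search is that the query text is encoded into the results page's URL. The Python sketch below illustrates that idea. The endpoint and the q parameter name are hypothetical placeholders, not AltaVista's actual URL format. It shows that the same query always yields the same URL, accents included, and that the query can be recovered from it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint and parameter name, for illustration only.
SEARCH_BASE = "https://search.example.com/query"


def query_url(query: str) -> str:
    """Encode a query (punctuation and accents included) into a URL."""
    return f"{SEARCH_BASE}?{urlencode({'q': query})}"


def query_from_url(url: str) -> str:
    """Recover the original query text from a bookmarked URL."""
    return parse_qs(urlsplit(url).query)["q"][0]


french = '+"économie numérique" +rapport'
bookmark = query_url(french)
print(bookmark)
print(query_from_url(bookmark))
```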

Mechanics of Translation

When using the AltaVista Translation Assistant, keep in mind the limitations of machine translation. If the material you want to translate is poetry or highly colloquial, or laden with technical jargon, the results are likely to be disappointing (and might turn out to be humorous). But for news and business communications--straightforward, grammatical, correctly spelled text--the translations tend to be very good.

Also, keep in mind there's a limit to the size of text that AltaVista will translate. This length is variable. It now seems to range from a minimum of about 5 KB (about the size of the text at the average Web page) to a maximum of about 20 KB. The AltaVista folks are trying to balance usage for maximum benefit to the widest possible audience; so the length is likely to vary based on demand at the time you happen to connect and as they, over time, further adjust the underlying algorithms.

That means if the page you want translated happens to have lots of text, you will at some point see ***TRANSLATION ENDS HERE***; and after that point, you'll see the continuation in the original language. Because of the demand factor, those words might appear at a different place in the text the next time you try to translate the same page.

If you need more of that particular page translated, copy the additional text with your browser and then click ***TRANSLATION ENDS HERE***, a hyperlink that takes you back to the Translation Assistant page, with the form blank. Then paste the new text into the form. You can do this in a variety of ways. First, place your cursor in the Translate text box. Then right-click with your mouse and click Paste; or, with Navigator, click Edit and click Paste. Then click Translate. Next you'll see the translated text (plain text, without the formatting and the graphics of the original Web page), followed by the original text you submitted. (Note: The same length limitations apply in this mode as when you submit a URL for translation.)

If the document is particularly long, you might have to go through this process many times to translate the whole thing. In that case, you might want to run two or more browsers. Keep the translation form page (babelfish) in one browser window, and the Web page you're translating in the other browser window. Then you can easily move each chunk to the translation form, without losing track of where you are in the text. (To clear the form for your new text, click New Translation.)
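If you translate long documents often, you can prepare the chunks in advance. This Python sketch is a helper built on assumptions, not part of the translation service. It splits text at paragraph breaks, falling back to word breaks for a very long paragraph, so that each chunk stays under a size limit. The 5 KB default reflects the low end of the range mentioned above. Because the real limit varies with demand, treat it as an adjustable guess. The function names are hypothetical.

```python
def _split_words(paragraph: str, limit: int) -> list[str]:
    """Break one over-long paragraph into pieces at spaces. A single word
    longer than the limit becomes its own oversized piece."""
    pieces: list[str] = []
    current = ""
    for word in paragraph.split():
        candidate = f"{current} {word}" if current else word
        if current and len(candidate.encode("utf-8")) > limit:
            pieces.append(current)
            current = word
        else:
            current = candidate
    if current:
        pieces.append(current)
    return pieces


def chunk_text(text: str, limit: int = 5 * 1024) -> list[str]:
    """Split text into chunks of at most `limit` bytes (UTF-8), breaking at
    blank lines between paragraphs whenever possible."""
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        if not paragraph.strip():
            continue  # skip empty paragraphs
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate.encode("utf-8")) <= limit:
            current = candidate  # the paragraph fits in the current chunk
            continue
        if current:
            chunks.append(current)  # close the full chunk
        if len(paragraph.encode("utf-8")) <= limit:
            current = paragraph
        else:
            pieces = _split_words(paragraph, limit)  # paragraph alone is too big
            chunks.extend(pieces[:-1])
            current = pieces[-1]
    if current:
        chunks.append(current)
    return chunks
```

Paste each chunk into the translation form in turn, and paste each translated result into your word-processor document.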

If you'd like to keep a complete copy of the translation so you can print, edit, and/or reread it in its entirety, open your word processor, create a new document, and copy and paste each successive chunk of translated text into that document.

If the page has frames, you'll see the message:

Frame Error--The page you requested is composed of frames. We don't at present translate a framed document as it's likely to take longer to download than your patience will allow. Below is the list of pages this frame is composed of. We'll translate individual sub pages of a frame (if they aren't frames of course).

Then try the suggested sub pages. But if that doesn't work, or if the problem is different (a Java applet, for example), highlight and copy the target text and paste it into the AltaVista translation form. Note that the copy-and-paste technique won't work for text that's embedded in graphics. In that instance, the copy function on your PC and in your browser won't recognize that there's any text on the page you could copy.

If you submit for translation a URL that's behind a firewall or on the other side of a password-protected registration page, AltaVista won't be able to fetch and translate the text. But if you're authorized to access that page, you can open it in one browser window and copy and paste the text from there into the AltaVista translation form in another browser window.

A Pragmatic Approach to Translation

Since the AltaVista Translation Assistant at http://babelfish.altavista.digital.com became available in December, information providers have been scrambling to find ways to take advantage of this new service to expand their reach to Spanish, French, Italian, Portuguese, and German audiences.

Because the potential audience for any Web site is global, it often makes sense to provide your content in more than one language. Additional languages can help open new markets, both by making your content understandable to more people and also by showing respect for the culture and heritage of people in your target audience. Often readers who can understand English, but for whom English is a second language, will go out of their way to connect to text in their native tongue. And some Web sites are even required by law or by the charters of their organizations to provide all their content in more than one language. For instance, this is true of government sites in Canada and sites that support many organizations of the United Nations.

First, you need to make sure your pages are in a format that can be translated. If much of your content is plain text, you're in good shape. But if you're using sophisticated techniques that create pages dynamically on the fly, if you're using frames, or if the text is generated from databases or appears in Java applets, you've locked yourself out of taking full advantage of this new capability. Keep in mind that the same factors that lock your pages out of the automatic translation service also lock them out of the index of search engines like AltaVista. Hence, it might be worth your while to create plain-text versions of your pages that will then be both translatable and findable by search engines.

If your pages have translatable text, you could use AltaVista to translate them and save the resulting pages. You could even translate large pages by repeatedly cutting and pasting chunks of text and assembling the pieces into polished pages at your site. Then you could offer visitors the choice of which language they'd like to see. But by doing so, you'd make yourself vulnerable to the vagaries of automatic translation, and a horrendous blunder caused by the software's inability to understand a colloquial phrase might damage your company's reputation among the very people you're trying to open your site to. You'd also take on a significant maintenance burden (having to change your translated pages every time you change the originals), plus additional overhead in terms of disk space and Web site complexity.

You also could create hyperlinks that take visitors to the AltaVista translation page with the URL for one of your pages already entered in the translation form (as I suggested in the April article "Keywords and Translation"). But the typical Web user would be mystified if suddenly transported to that translation page without some explanation. And the translation service will handle only a limited amount of text (about 5 KB when traffic is heavy), leaving the visitor with a Web page that's only partially translated.

Instead, I decided to post a clear and simple explanation of how users can take advantage of the translation service, an approach that empowers them to get the translations they need while leaving the responsibility in their hands. I then used the translation service to create versions of that document in French, German, Italian, and Portuguese. At the top of my home page (http://www.samizdat.com), I now have:

These phrases (with the appropriate accent marks, all captured and cut and pasted from the translation service) connect with hyperlinks to the matching documents. Over time, I plan to add those same words and links to all the pages where automatic translation would be helpful (not including, for instance, documents that consist of poetry and lists of titles of books). Here is the full text of that explanation, which you're welcome to use at your own site, if you wish:

Translate Into French, German, Spanish, Portuguese, or Italian

To translate foreign language text, first connect to AltaVista Search's automatic translation service. In a separate window, connect to the page you want to translate (a Web page, a word processing document, an email message, or a newsgroup item).

On the target page, click the left mouse button in the left margin beside the starting point in the text, and drag your cursor down over a couple of paragraphs (about a third of a typed page) to select them. Then, in the toolbar, click EDIT, then COPY to save the selected text to your Clipboard.

Next, bring up the translation page. Position your cursor over the translation form. Click the right mouse button and then PASTE. The selected text should now appear in the form. Below the form, click on the down arrow to select the language pair you want (such as English to French). Then click on TRANSLATE. The translated text should appear in a second or two.

To save the translated text in a file, click and drag (as above) to select the text; and click EDIT and COPY to place the text on your Clipboard. Then open a document in your word processor and paste the text. Return to the original document and select the next piece of text. Return to the translation page, click NEW TRANSLATION, paste the text in the new form, and proceed as before. Keep doing this as many times as necessary to translate and save the entire text.

The results should be useful, but they'll be far from perfect. (If you're reading this text in a language other than English, you can judge for yourself how good or bad it is.)

The Impact of Search Engines on the Legal Profession, Part 1: Search

(based on a speech delivered to the Boston Bar Association, February, 1998)

Imagine that when you wake up in the morning, there's a strange new gadget beside your bed. You pick it up, push a couple of buttons, and all of a sudden you can see things you never were able to see before: ultraviolet and infrared, and you can even see through some materials. You go out on the street and discover that many of the people you meet have the same kinds of gadgets. And with this gadget, some people who don't know any better and haven't tried to protect themselves look stark naked, because you can see right through their clothes. Imagine that all the laws are the same today as they were yesterday. Nevertheless, your profession would be transformed, because people's expectations would have changed: what you expect of them and what they expect of you, and how you define privacy, discovery, due diligence, and so on.

Full-text search engines, like AltaVista, are making that kind of change in the legal profession today.

Here are a few examples of the kinds of searches you can and should do (because some of your competitors are using them already, and because very soon your clients and the judges you face will presume you know how).

Finding Trademarked Names

You can use AltaVista as a preliminary check to see if a name you'd like to trademark is already in use, or to see if others are infringing on a trademark or service mark of a client. For this kind of search, it's important to understand how AltaVista handles uppercase and lowercase letters and punctuation.

If you type a word in all lowercase, AltaVista will search for both lowercase and uppercase. But if any letter is in uppercase, it looks only for that. This comes in handy when you're searching for trademarked names, which very often include unique capitalization to distinguish them from ordinary words. For instance, if I search for eXcursion (with the X capitalized), as shown in Figure A, I get matches for a trademarked product made by Digital Equipment--and only that product. The unique capitalization makes my query rare, meaning that I get precisely the results I want on the first try.
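The capitalization rule can be stated as a tiny function. This Python sketch models the behavior as described here; it is not AltaVista's implementation, and the function name is hypothetical. A query word with no capital letters matches a document word in any case, while a word containing a capital must match exactly.

```python
def word_matches(query_word: str, doc_word: str) -> bool:
    """Model of the case rule: an all-lowercase query word ignores case;
    any uppercase letter in the query word demands an exact match."""
    if query_word == query_word.lower():
        return query_word == doc_word.lower()
    return query_word == doc_word


print(word_matches("excursion", "eXcursion"))  # True
print(word_matches("eXcursion", "eXcursion"))  # True
print(word_matches("eXcursion", "Excursion"))  # False
```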

At AltaVista, all punctuation is handled in the same way. It doesn't matter whether you type a period, a comma, a slash, an underscore, or a hyphen. This means you don't need to try to guess all the ways that people posting on the Web or in newsgroups may have written a product or service name that includes punctuation. For example, Digital Equipment used to make a family of computers called the PDP-11. There were several models, including the PDP-11/20 and the PDP-11/70. Many people would forget whether the model name used a hyphen, an underscore, a forward slash, or a backslash. If AltaVista matched punctuation literally (a period to a period and a slash to a slash), then to search for such a trade name you'd have to go to Advanced Search and enter a lengthy query that joins the variants with OR, such as

PDP-11/20 OR PDP_11/20 OR PDP.11-20

You'd have to try to imagine every way that people might have misspelled the model name and include all those variations in the string. But since all punctuation is treated equally, all you have to do is enter PDP-11/20 in either Simple or Advanced Search, and you'll get matches for all the possible variations of punctuation.
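Here's a toy model of why a single query covers all the punctuation variants. It's a Python sketch under one stated assumption, not AltaVista's actual code: every punctuation mark is assumed to act simply as a word separator. Both the query and the document are broken into words, and the query matches if its words appear consecutively in the document. Case handling is left out for brevity, and the function names are hypothetical.

```python
import re


def words(text: str) -> list[str]:
    """Split text into words, treating every punctuation mark (including
    the underscore) as a separator."""
    return [w for w in re.split(r"[\W_]+", text) if w]


def matches(query: str, document: str) -> bool:
    """True if the query's words appear consecutively, in order, in the document."""
    q, d = words(query), words(document)
    return bool(q) and any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


for doc in ["the PDP_11/20 console", "a PDP.11-20 manual", r"PDP-11\20 specs", "the PDP-11/70"]:
    print(doc, matches("PDP-11/20", doc))
```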

Keep in mind that AltaVista generates a unique URL for each search you perform. So if you want to perform the same trademark search periodically, you can simply click on your bookmark to go straight to AltaVista and launch the same search, getting fresh results. Also, in Advanced Search, you can set the time frame, limiting your search to a certain range of dates. So, for instance, if you wanted to recheck once a week or once a month for trademark infringements, you could bookmark a series of searches and then alter the range of dates to show only pages that had been posted since the last time you looked.

Detecting Plagiarism

Every day, the AltaVista crawlers fetch more than ten million documents, following the trail of the hyperlinks they find and continually refreshing the index, which now contains over 100 million documents. If something has been posted publicly on the Web and other Web sites have hyperlinks to that site, the chances are excellent that the content has already been indexed or will be indexed soon.

AltaVista indexes every word in each of those documents (with one minor exception: for extremely long documents, only the first 100 KB are indexed). This means that you can search not only for single words, but for phrases, sentences, and even paragraphs. If you place a series of words inside quotation marks, AltaVista (in both Simple and Advanced Search) looks for instances of all those words appearing in the same document in exactly that order. It doesn't matter if a given word is extremely common; "to be or not to be" works just fine. Hence, you can periodically test with sample elements of text to see if your work or the work of a client has been reused, with or without permission, anywhere on the Web.
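To make those periodic checks routine, you could pull a few sentences out of a document and wrap each in quotation marks, ready to paste into the query box. The helper below is a hypothetical Python sketch. The minimum length and sampling interval are arbitrary choices, meant to favor phrases distinctive enough to be rare.

```python
import re


def sample_phrase_queries(text: str, min_words: int = 8, every: int = 5) -> list[str]:
    """Pick every `every`-th sentence that has at least `min_words` words and
    return it as a quoted phrase query."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    long_enough = [s for s in sentences if len(s.split()) >= min_words]
    queries = []
    for sentence in long_enough[::every]:
        # Drop any quotation marks inside the sentence and the final punctuation,
        # so the query is a single clean phrase.
        phrase = sentence.replace('"', "").rstrip(".!?")
        queries.append(f'"{phrase}"')
    return queries


sample = "Hi there. The quick brown fox jumps over the lazy dog today. Bye."
print(sample_phrase_queries(sample, every=1))
```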

Also, you might consider "tagging" or marking Web pages that you believe others are likely to copy (because of the content or the layout). If you're trying to protect a Java applet, there's no way to search for the content/code of that applet. But you can search for the applet's name. Someone who copied an applet might very well use the same name, not realizing that it could be uncovered by a search. At AltaVista, in either Simple or Advanced Search, just enter applet: followed by the name of the file. The search query applet:* will yield a list of all the applets that AltaVista has indexed.

Similarly, AltaVista doesn't index graphical images, but it does index the names of images. And people who "borrow" images often keep the same name for the file. So a search for image: followed by the name of the file (if it is a unique/rare name) could very well find instances of copying.

Detecting "In-lining"

One way that some people "borrow" an image on the Internet is through a practice known as "in-lining." Instead of copying the image file and posting it on their own site, they include the complete address of your image as part of the structure of their Web page. They haven't "taken" your image in the literal sense of the word; it exists only on your machine, not theirs. But ordinary users looking at the infringer's page would see your work out of context and think that it was theirs, and a photo that you intended for one purpose might be used in ways you didn't anticipate.

Also, keep in mind that many Internet service providers who offer Web-hosting services charge their customers based on traffic (above some limit). When someone "in-lines" your image, every time their page is accessed, your image comes up--adding to the traffic at your site, but not at the infringer's site.

If you see a sudden rise in traffic at your site or notice a strange pattern in the statistics for your site with an image getting many more hits than the page it is embedded in, you should do an AltaVista search (either Simple or Advanced) using the link: command followed by the complete address of the image in question. That's likely to uncover the infringer.
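The "strange pattern" described above can be checked mechanically. Normally an image gets roughly one hit each time its page is viewed, so an image far outrunning its page is a warning sign. This Python sketch assumes you have already tallied hits per path from your server statistics, that each image belongs to a single page, and that the 3x threshold and www.example.com are placeholders to adjust.

```python
def suspicious_images(hits: dict[str, int], embedded_in: dict[str, str],
                      ratio: float = 3.0) -> list[str]:
    """Return images whose hit count exceeds `ratio` times the hits of the
    page they are embedded in. `ratio` is an illustrative threshold."""
    flagged = []
    for image, page in embedded_in.items():
        image_hits = hits.get(image, 0)
        page_hits = hits.get(page, 0)
        if image_hits > ratio * max(page_hits, 1):  # avoid comparing against zero
            flagged.append(image)
    return sorted(flagged)


def link_queries(images: list[str], site: str = "http://www.example.com") -> list[str]:
    """Turn flagged image paths into link: queries (site is a placeholder)."""
    return [f"link:{site}{image}" for image in images]
```

For example, a photo with 5,200 hits on a page with only 400 would be flagged, yielding the query link:http://www.example.com/photos/harbor.jpg.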

This is a new kind of problem, and the law in this area is not clearly defined. But if the behavior is clearly damaging to you or your client, you should take action to stop it. In most instances, a strongly worded letter from a lawyer should suffice.

Note of warning

Keep in mind that AltaVista wasn't specifically designed for this type of work. It is intended to provide useful results for tens of millions of ordinary users. If traffic is particularly heavy, the search engine will continue to provide results very quickly, but it might occasionally skip part of its index, truncating the search to balance the load. In most instances, that won't matter. Whether there were 50,000 or 100,000 matches to your query is normally irrelevant, since you'll check only the first few and they'll probably have the information you need. If you need to be sure that your search is exhaustive, as may well be the case in the legal profession, you would be well advised to do critical searches at off-hours (that is, when people on the West Coast aren't at work) and to do the same search on more than one occasion.
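When you repeat a critical search, it helps to combine the runs rather than trust any single one. A minimal Python sketch, assuming you have saved the list of URLs returned by each run (copied by hand; no programmatic interface is implied): the union is your best evidence, and URLs that show up in some runs but not others hint that a run was truncated.

```python
def merge_runs(runs: list[list[str]]) -> list[str]:
    """Union of URLs returned by the same query on different occasions,
    keeping the order in which each URL was first seen."""
    seen: dict[str, None] = {}
    for run in runs:
        for url in run:
            seen.setdefault(url, None)
    return list(seen)


def only_in_some_runs(runs: list[list[str]]) -> list[str]:
    """URLs missing from at least one run: a sign that a run was truncated."""
    sets = [set(run) for run in runs]
    everything = set().union(*sets)
    common = set.intersection(*sets) if sets else set()
    return sorted(everything - common)
```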

As a rule, consider AltaVista an excellent source of positive evidence. You'll find information that otherwise was unattainable. But don't put too much weight on the fact that a given query produced zero matches. Just because you didn't find it doesn't mean that it doesn't exist. You might have made an error in your query, there might have been some traffic-related transient error that led to a particular match not being found, or the page(s) in question might not yet be in the index.

Impact of Search Engines on the Legal Profession Part 2: Finding People and Redefining Discovery

(based on a speech delivered to the Boston Bar Association, February, 1998)

Internet search engines are a supplement to, not a substitute for, the databases you have come to rely on in your profession. In general, if you are a lawyer, trying to search for information about specific cases on the Internet at large is likely to be time-consuming and futile, because that kind of information is not typically made available for free or posted in plain HTML pages. The services that focus on that kind of information charge subscription fees and/or store their information in databases, which means the information is not indexed by public search sites. If you want information about a specific case, you are best off using the services you have always subscribed to, such as Lexis-Nexis http://www.lexis.com. They are in the business of gathering, categorizing, and organizing legal information.

There are exceptions. For instance, the Legal Information Institute at Cornell University http://www.law.cornell.edu/index.html makes lots of good material available for free (including Supreme Court decisions), and much of it is indexed at AltaVista. But it would probably be simpler to go straight to that site and to some of the law-specific resources referenced there, rather than do general searches through the entire AltaVista index.

Lawcrawler, http://www.lawcrawler.com, provides a middle ground. Their site searches a subset of the AltaVista index, including only pages that are directly related to law. They also have links to a number of other law-related sites.

But there are many instances when the information a lawyer needs is not specifically law-related. You may want to do background research on a specific subject or locate a person. In that case, using AltaVista to search the vast unstructured reaches of the Internet may help you uncover information that previously was inaccessible.

Finding People and Being Found

If you need to track down people -- missing heirs, witnesses, potential members of a class action suit -- you should probably start with the on-line telephone directories, such as Switchboard http://www.switchboard.com and Infospace http://www.infospace.com. Then try the on-line email directories like http://www.four11.com. (Keep in mind, too, that some of these services, such as Infospace, allow you to do reverse lookup -- to enter a telephone number and find the name and address of the individual.)

If the name is rare (an unusual name or an unusual spelling), also try AltaVista. First, in Simple Search, try the name in quotation marks (a phrase: match these exact words in this exact order) and in lower case (match either upper or lower case), for example "richard seltzer". If that doesn't work, then go to Advanced Search and try

firstname NEAR lastname

e.g., richard NEAR seltzer

That matches all instances of those two words within 10 words of one another and in any order. In other words, that would catch all the likely variants of a name such as

Seltzer, Richard

Richard W. Seltzer

Richard Warren Seltzer
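The NEAR rule can be sketched in a few lines of Python. This is an illustration, not AltaVista's implementation: it assumes "within 10 words" means the two words' positions differ by at most 10, and it ignores capitalization, as a lowercase query does.

```python
import re


def near(text: str, first: str, second: str, distance: int = 10) -> bool:
    """True if `first` and `second` occur within `distance` words of each
    other, in either order, ignoring capitalization."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    a = [i for i, w in enumerate(words) if w == first.lower()]
    b = [i for i, w in enumerate(words) if w == second.lower()]
    # Any pair of occurrences close enough, regardless of which comes first
    return any(abs(i - j) <= distance for i in a for j in b)
```

All three variants listed above satisfy near(text, "richard", "seltzer"), while a page where the two words are 21 positions apart does not.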

If those techniques do not work for you and if you are not in an immediate rush, consider trying the "flypaper" approach (described in my article Flypaper: using AltaVista Search in reverse to let people find you). Basically, create a Web page tailored to your search need. The HTML title of the page should consist of the name of the person you are trying to find, followed by anything known to be near and dear to this person (that is, terms this person might conceivably search for). The first line of text should be the same as the HTML title. Those are the two factors that make the most difference in how a page is ranked in a list of matches.

After that, simply state why you want to find this person and how this person should get in touch with you. You don't need to have links to this page from anywhere. Simply post it on the Web, go to AltaVista, click on ADD URL, and at the bottom of the next screen enter the complete URL of the page you just created. Within a day or two that page should be in the index. You might also want to use ADD URL at other popular search sites, such as Excite and HotBot. Then if this person ever does a search for him/herself, he or she will find your page. And, in fact, many people do use search engines to look for themselves.
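The page structure described above is simple enough to generate. This Python sketch is illustrative: the function name, the example person, and the contact address are hypothetical. It puts the name plus related terms in the HTML title, repeats exactly the same text as the first line of the body, and then gives the reason and contact details.

```python
from html import escape


def flypaper_page(name: str, interests: list[str], reason: str, contact: str) -> str:
    """Build a bare-bones 'flypaper' page: the title and the first line of
    text are identical, and both lead with the person's name."""
    headline = " ".join([name] + interests)
    return f"""<html>
<head><title>{escape(headline)}</title></head>
<body>
<h1>{escape(headline)}</h1>
<p>{escape(reason)}</p>
<p>Please contact: {escape(contact)}</p>
</body>
</html>
"""


page = flypaper_page("Jonathan Q. Example", ["sailing", "Gloucester"],
                     "We are trying to reach you regarding an estate matter.",
                     "someone@example.com")
```

Save the result as an .html file, post it, and submit its URL with ADD URL as described above.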

This is not a sure-fire method, but it will work sometimes, uncovering individuals who otherwise would never be found.

Similarly, if you are trying to find witnesses or possible members of a class action suit, create Web pages calculated to maximize the chances that the people you want to find will find you -- putting likely search words in the HTML title and the first line of text, and clearly explaining who you are looking for and why.

And, of course, the same technique can and should be used to market your legal practice -- setting up your Web page in such a way that it will be likely to be found by people in your specialty in your locale. If you have access to lists of prospects -- perhaps members of an industry association related to a specialty of yours -- consider posting such a list on the Web. At the top of the document indicate what these people have in common and say, briefly, how you can help them, with links to your other Web-based content. Then perhaps offer to add email addresses and URLs to the list, if people want to send them in. And perhaps offer to host at your site discussions among these folks on topics of common interest (by way of forum and/or chat-style software).

Once again, use ADD URL to submit this specific page to the major search engines. Then people on that list, when they look for themselves, may very well find your page and be motivated to contact you.

Access to Unstructured Information

The legal responsibilities of citizens, corporations, governments, and lawyers depend in part on what is considered common practice. Often you are required to perform due diligence: in notifying the public that a change of laws or regulations or zoning codes is pending or that certain property is to be sold at auction, in doing background checks on individuals, or in trying to locate individuals. The availability of search engines and the ease with which they can be used changes the norm and hence changes the level of your responsibility. If an eight-year-old could have found a certain piece of information in under five minutes, your failure to find it for the benefit of your client or at the request of the court could be grounds for claiming that you were negligent, and in some cases might even lead to charges of malpractice. Similarly, if you are legally required to keep certain information confidential and certain records private, it is important that you understand what can be found over the Internet and how, so you can take appropriate safeguards.

Keep in mind that not only does AltaVista Search run on the public Internet as a free service, but also there is a product sold by Digital Equipment which can run on private corporate intranets. The intranet version has been designed based on experience at the public site, which acts as an enormous test-bed for search technology. It has the same look and feel as the public service, but it is in many ways far more powerful. While the public site only indexes static HTML pages, the intranet version can handle over 200 different document types -- including Acrobat, PostScript, PowerPoint, and Word. With this software and a related software developer's kit, a large corporation could centrally index and make findable virtually all of its documents, including the text contents of its databases. For instance, Digital runs this on its corporate intranet and has indexed over a million documents.

Consider the implications of this capability for corporate law. Remember the IBM anti-trust case of the 1980s. In those days, for "discovery" IBM delivered truckload after truckload of paper documents, literally filling buildings with them, and it took long years for the court to try to sort through them all and make sense of them. Today, all major corporations have company-wide networks, and if they were to use software of this kind, it might be possible to find all the information relevant to a given case in a matter of hours, rather than years. This hasn't happened yet, but it's easy to imagine that a technology-savvy judge could order a company to install software of this kind, index all its documents, and give the court access to them on-line. This could be considered an essential element of the discovery process, and by balking at such a request a company would be demonstrating its unwillingness to cooperate with the court.

Protecting Trademarks and Service Marks

Many trademarks and service marks are made-up words or ordinary words spelled or joined or capitalized in unique ways. Often the intent is to come up with a word or phrase that is unique, but that still conveys the general concept or feeling of your product or service. Once you have laid claim to that mark (having had experts do appropriate searches of existing marks to make sure that no one who in any way competes with you is already using the mark, and having applied to register your mark), it is your responsibility to use due diligence in protecting it.

Fortunately, AltaVista makes the Web-related part of this due diligence relatively easy. If a term is spelled uniquely, a simple search for that term should turn up all instances of it. The list of matches is likely to be relatively small (unless your company is very well known, like Coca-Cola or Xerox), and most of the matches are likely to be of interest to you. A few may be simply typos or instances from totally unrelated industries, but most will probably be the kind of generic misuse that you will want to squelch, with a note like that quoted above.

AltaVista's methods for handling capitalization and punctuation also make it particularly effective for searching for trademarks. If you type a word or phrase in all lower case, it will search for both lower case and upper case. But if any letter is in upper case, it looks for that particular letter in upper case and only in upper case. Many brand names have unusual spellings, such as using all capital letters, like LISTSERV (or putting capital letters in the middle of words). If I search at AltaVista for LISTSERV, I get exactly what I want, because every page in the results list should have a word that exactly matches that unique spelling (unless the page has been edited since it was entered in the AltaVista index).
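Here is the capitalization rule above expressed as a small Python function. It follows the rule letter by letter exactly as stated: a capital letter in the query must be matched by a capital, while a lowercase letter matches either case. Whether the engine applied the rule per letter or treated any capitalized term as a whole-word exact match is an assumption here; for an all-capitals mark like LISTSERV the two readings agree.

```python
def term_matches(query_term: str, word: str) -> bool:
    """Apply the stated capitalization rule to one query term and one word."""
    if len(query_term) != len(word):
        return False
    for q, w in zip(query_term, word):
        if q.isupper():
            if w != q:          # a capital in the query must match a capital
                return False
        elif w.lower() != q:    # lowercase (or non-letter) matches either case
            return False
    return True
```

So term_matches("listserv", "LISTSERV") is true, while term_matches("LISTSERV", "listserv") is false, which is why the capitalized query returns only pages using the exact trademark spelling.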

Keep in mind that AltaVista works the reverse of how you normally get information at a library. If you ask a reference librarian a question that has many possible answers, the librarian will be happy to pull a book off the shelf and give you an answer, and you walk away a happy customer. But if you ask for something that is rare and difficult to find, that librarian is going to start tearing her hair out, and it's going to take a long time and be very painful to get that answer. At AltaVista, the rarer, the more unique, the harder to find something is, the easier it is to find. Because the term is rare and because the index includes the full text of so many Web pages, the likelihood is very great that AltaVista will come back with just a few matches, and nearly all of them will be exactly what you want. The rarer it is, the easier it is. If it's rare, you're not going to have to worry about how to refine your search. It's just going to be there at the top of your list. And a bizarre spelling or unusual use of capital letters is a good way to make a search unique.

Once you find a site that you believe is misusing your trademark or service mark, you can do a search of that entire site, using the query host:domainname. That will give you a sense of how widespread the abuse is at that particular site, and an opportunity to check out all the variants there before drafting your note of complaint.

AltaVista does not cover every page on the Web. That is an impossible goal, because (1) the Web is expanding so quickly, and (2) some Web sites and pages are constructed in such a way as to preclude indexing. For instance, when a Web crawler arrives at a site that generates dynamic pages, that looks to it like division by zero, as if there were an infinite number of pages ahead. Also, content inside frames, in non-HTML formats (like PostScript and Acrobat), behind firewalls, and behind registration forms is not in the index. But the index today includes over 150 million Web pages, and it continues to grow at a rapid rate, with the crawlers visiting and revisiting 10 million pages a day. So while you can't expect to catch everything, a search at AltaVista (which can take just a few seconds) would probably be considered "due diligence" with regard to Web content. If you are going to do this same search regularly, then use Advanced Search and limit the query by date, so you only see material added to the Web since the last time you looked.
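Keeping track of "the last time you looked" is easy to forget when you run several watch queries. A minimal Python sketch, assuming you keep a small JSON file mapping each query to the date it was last run (the file and function names are hypothetical): it tells you which date range to enter in Advanced Search and records today's run.

```python
import json
from datetime import date
from pathlib import Path
from typing import Dict, Optional, Tuple


def next_window(state: Dict[str, str], query: str,
                today: date) -> Tuple[Optional[date], date]:
    """Return the (from, to) dates to enter for a repeated query and record
    today's run. `from` is None the first time (search all dates)."""
    last = state.get(query)
    start = date.fromisoformat(last) if last else None
    state[query] = today.isoformat()
    return start, today


def load_state(path: Path) -> Dict[str, str]:
    return json.loads(path.read_text()) if path.exists() else {}


def save_state(path: Path, state: Dict[str, str]) -> None:
    path.write_text(json.dumps(state, indent=2))
```

Typical use is load_state, then next_window for each watch query, then save_state, so each check covers only the period since the previous one.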

With more and more companies waking up to this fact, at one time or another, you will probably receive notes like the one I just got from L-Soft. When that happens, you will want to thoroughly search all your Web pages for the offending words and phrases. In my case, since all my pages are indexed at AltaVista (I submit the URL -- "add page" -- at AltaVista for each and every page I add or change), I can use AltaVista myself. A simple search for +host:samizdat.com +LISTSERV gives me all instances of pages at my Web site where that trademarked term appears.
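Because a page may have been edited since AltaVista last indexed it, a local scan of the files you publish is a useful cross-check on the +host: search. This Python sketch assumes your pages are stored as .html/.htm files under one directory (a hypothetical layout). By default it matches the term as a whole word with exact capitalization, like a capitalized AltaVista query; passing case_sensitive=False also catches lowercase generic uses such as "listserv".

```python
import re
from pathlib import Path
from typing import List


def pages_containing(root: Path, term: str, case_sensitive: bool = True) -> List[Path]:
    """List .html/.htm files under `root` that contain `term` as a whole word."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # Whole-word match: no letter or digit immediately before or after the term
    pattern = re.compile(r"(?<![A-Za-z0-9])" + re.escape(term) + r"(?![A-Za-z0-9])", flags)
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in {".html", ".htm"}:
            text = path.read_text(encoding="utf-8", errors="ignore")
            if pattern.search(text):
                hits.append(path)
    return hits
```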

info@samizdat.com