By Richard Seltzer, seltzer@samizdat.com, www.samizdat.com
Reprinted with permission from Internet Search Advantage, ZD Journals. http://www.zdjournals.com
ALERT: LiveTopics is a powerful feature that was once offered through AltaVista. It was later renamed "Refine" and is no longer available. The project, code-named "Cow9," was a collaborative effort between researchers at Digital Equipment Corporation and François Bourdoncle of Ecole des Mines de Paris (www.ensmp.fr). The underlying technology has enormous potential. This article will give you a sense of how it can be used. If this topic interests you, you should also check the slides beginning at www.samizdat.com/script/lt1.htm.
If you're interested in a fast-changing field, you may want to use AltaVista Search to get a quick overview of that field, not just to find Web pages and nuggets of information--for example, to see what's important, what's changing, and which companies and schools are the major players. To simplify finding anything on the Internet, AltaVista Search can generate categories on the fly and provide a graphical view of those categories. As a result, you get more than just pointers to nuggets of information--you get information about information, a picture of how pieces of information relate to one another. Keep in mind that these categories are created statistically, live, on the fly, based on the information in the AltaVista Search index. There is no human bias. No one is making assumptions about the content. This isn't like the Dewey Decimal system, subject to obsolescence as the world rapidly changes. This is a view of today's Web content based on the information currently in the AltaVista Search index. And that set of information is large enough to yield very interesting and unexpected results.
The AltaVista Search site (http://www.altavista.digital.com) continues to grow and change in response to the needs of users and because of new opportunities opened by technology. A few months ago, AltaVista changed the look and feel of the LiveTopics search features. Now, you don't see terms like Way-cool Topics Map! Rather, when you first submit a query (from either the Simple or Advanced Search), you have a choice of clicking on Search to get the standard list of matching Web pages or on Refine to see a set of categories based on words found on the same pages as your query words. These categories can be displayed in either list form (20 items, with subcategories listed next to each) or in graph form, as shown in Figure A. In the graph, the subcategories become visible when you move your cursor over a given term. If you prefer to go straight to the graphic view or to the list of categories, you can pre-select that choice in your personal preferences (just as you can choose to go straight to Advanced instead of Simple as your default setting).
Figure A: Refine displays categories as either a list or a graph.
Once you see the graphic view, you may well be tempted to print it and show it to a colleague, save it for comparison with another such picture, or build a set of data for spotting long-term trends. Unfortunately, because the image is created in a Java applet, the normal technique of choosing Print from your Web browser doesn't work. The printout includes everything but the image you wanted.
As a workaround, you could use the Print Screen option on your PC to capture whatever is on the screen and send it to a printer and/or to put that image into your clipboard for pasting into another application, like PowerPoint. That approach, however, tends to drop some of the detail, sometimes making the chart difficult to read--not very useful for showing your boss or presenting at a meeting.
There are numerous graphics software packages that can improve the visual quality of your results. I use Paint Shop Pro to capture the image and then save it in GIF format for use in my Web-based presentations, or in BMP format for use with PowerPoint. To capture a graph, first open Paint Shop Pro, go to the Capture menu, and select New. The Web page will appear on the screen. Just drag the mouse cursor across the image you want to capture. When you release the mouse button, you'll automatically return to Paint Shop Pro with the Web page image appearing in the work area.
Once you have a satisfactory way of saving and printing these images, you may want to go to Advanced Search and run the same query, limiting the search to a range of dates, and generate a series of images that show how the Web content in this subject area has changed over time. And you can go back to AltaVista Search later to generate fresh images to add to your collection and perhaps help you recognize important trends.
With these images, what you don't see is often just as important as what you do see. What's missing? What is the true range? Often, the value comes from the unexpected--the set of categories and subcategories that you wouldn't have thought of on your own. I think of these images as X-rays rather than snapshots, because it takes experience and insight to interpret them. X-rays are diagnostic tools, a starting point for research both on the Internet and by traditional methods. X-rays represent a means, rather than an end. The objective isn't to get answers to questions, but rather to help you decide what questions to ask and to help you avoid missing something significant.
In many cases, you should use a Refine search not to narrow your query, but to expand it and get a "10,000-foot view" of a particular subject. For instance, suppose you're a biochemistry major and are interested in graduate school or in getting a job in that field. Enter the query biochemistry and click Refine. Check the list of categories and the graphic view. Choose one or more categories or subcategories of particular interest. Add those to your query by clicking in the box beside each category, then clicking Refine again. Continue doing this until the X-ray begins to more closely resemble the range of your personal interests. Just getting a breakdown of all the information on the Web about that subject, seeing the relationships, and knowing what the categories are could be of value to you. To focus the X-ray further, choose a company or university that appears prominently in the list of categories, or choose one that you know by reputation and add that company or university to the query. For instance, you might add this to the end of your query:
+host:harvard.edu
The +host: command limits results to that one domain name; in this case, you see only the biochemistry information found at Harvard University Web sites. Save or print the results, then substitute another university or company, and so on.
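The substitution step above can be sketched in a few lines of Python. This is only an illustration that builds the query strings to type into the search form; the function name host_query is made up here, and example.edu is a placeholder host, not a site mentioned in this article.

```python
def host_query(topic, host):
    """Return a query that limits `topic` to pages at one host."""
    return f"{topic} +host:{host}"


# harvard.edu comes from the article; example.edu is a placeholder.
hosts = ["harvard.edu", "example.edu"]

queries = [host_query("biochemistry", h) for h in hosts]
for q in queries:
    print(q)
```

Running the same topic against each host in turn gives one X-ray per institution, which you can then save or print for side-by-side comparison.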
In the November 1997 article "Understanding the Limits of Accuracy," we discussed the need to understand the limits of accuracy before making precise judgments. That advice is very important in this case. Some Web sites use techniques that prevent the AltaVista Search spider from making a full index of their content. For instance, they may require a registration/password process, may appear only in frames, or may even draw content on the fly from databases. If that's the case for one or more of the sites you're examining, you won't be comparing apples to apples. So if accuracy is important to you, go directly to the individual sites and see if you encounter any of the telltale signs of a barrier to indexing.
As a further test, enter the target site by way of its home page and navigate through internal links to pages that are particularly interesting. Note the directories in which these pages appear or the URLs of individual pages. Then go back to AltaVista Search and do queries for those particular directories or pages by entering url: followed by the Web address, as shown in Figure B. If you get zero results, that means the directory or page hasn't been indexed by AltaVista Search, perhaps because it was only recently added to the site and the spider hasn't found it yet. Hence, the X-rays you obtained of the site are incomplete and perhaps misleading.
Figure B: You can test the accuracy of your searches by testing individual Web sites with the url: command.
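If you have a long list of pages to check, a short Python sketch like the following can build the url: queries and keep track of which pages came back empty. The function names, the www.example.com addresses, and the match counts are all placeholders invented for illustration; the counts stand in for numbers you would note by hand after running each query.

```python
def url_query(address):
    """Return the url: query for one page or directory address."""
    return f"url:{address}"


def not_indexed(match_counts):
    """Return the addresses whose url: query produced zero matches."""
    return [addr for addr, count in match_counts.items() if count == 0]


# Placeholder data: address -> number of matches you observed.
observed = {
    "www.example.com/products/": 12,
    "www.example.com/products/new.htm": 0,
}

for addr in observed:
    print(url_query(addr))
print("Missing from the index:", not_indexed(observed))
```

Any address reported as missing is a sign that an X-ray of that site is incomplete.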
If accuracy is essential and time is no object, you could click Add/Remove URL on the AltaVista Search home page and enter the URL for each important page that isn't yet in the AltaVista Search index. Anyone can do this. You don't need special privileges or any official relationship with the site in question. The spider will immediately fetch the page and add it to the index, usually by the next day. Then you can go back and get a fresh X-ray, based on the more complete information.
If you're in business, you can use that same approach to get X-rays of the Web sites of competitors, suppliers, partners, and major customers, as well as your own Web site. You can make comparisons and look for trends, in each case using the query format +host:.
If the company in question makes extensive use of the Web for marketing (and nearly all medium- to large-size companies do nowadays), you'll get a candid, unrehearsed view of what's important to them. What's the range of topics they cover? Do words like customer or satisfaction appear in the top 20 or in the first set of subcategories? Does the content present a clear, coherent picture, with all the various categories interlinked? Or is the information mostly scattered and unconnected?
You can also search a subset of these same companies' sites. For instance, entering
+host:digital.com +internet
yields all the pages at Digital Equipment that mention internet, as shown in Figure C.
Figure C: You can search a subset of a company's site with the host: command.
To get a broader view of a company's visibility and range of activity, use as the query the company name itself (in quotation marks if it consists of more than one word; or if the company is well-known under more than one form of its name, use Advanced Search and enter each name--in quotation marks--separated by the word OR). That will give you a picture of all the indexed pages on the public Web that mention the company by name.
For an X-ray view of pages that have hypertext links to a given site but that aren't at the site itself (an indication of the types and range of content at sites that consider the target site useful and valuable), use a combination of the link: and host: commands. For instance, Figure D shows the results of the search
+link:digital.com -host:digital.com
Figure D: Use a combination of the link: and host: commands to search pages with hypertext links to a Web site (while excluding the site itself).
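The four kinds of company X-ray described in this section (the whole site, a subset of the site, mentions of the company name, and pages that link to the site) can be gathered into one small helper. This is a sketch in Python that only builds the query strings; the function name xray_queries and the dictionary keys are invented for illustration. For simplicity it puts every name in quotation marks, even a one-word name.

```python
def xray_queries(host, topic, names):
    """Build the company X-ray queries described in this section.

    host  -- the company's domain name, e.g. "digital.com"
    topic -- a word used to narrow the site search, e.g. "internet"
    names -- one or more forms of the company name
    """
    # Each name goes in quotation marks; several names are joined
    # with OR, which is meant for the Advanced Search form.
    mentions = " OR ".join(f'"{name}"' for name in names)
    return {
        "whole_site": f"+host:{host}",
        "site_subset": f"+host:{host} +{topic}",
        "mentions": mentions,
        "inbound_links": f"+link:{host} -host:{host}",
    }


queries = xray_queries(
    "digital.com",
    "internet",
    ["digital equipment", "digital equipment corporation"],
)
for label, query in queries.items():
    print(f"{label}: {query}")
```

Keeping the four queries together makes it easier to run the same set for each competitor, supplier, or partner and compare the results.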
If your field of business is new and rapidly changing, like the Internet, AltaVista Search X-rays may be a good way to get a first approximation of the market segments and the key areas of interest. For instance, try doing separate queries for intranet, electronic commerce, and isp* (the common abbreviation for Internet service provider). Using traditional techniques, such as questionnaires and focus groups, it would take a market research firm months to gather and assemble data on any of these topic areas. By the time you received your high-priced report, the marketplace would have changed again. Properly done (taking into account all the issues of accuracy and completeness described above and in previous articles), the X-ray approach can provide useful information almost immediately, at no cost, and help you better focus and speed up professional market research efforts.
What works for universities and companies also works for individuals who are active and well-known on the Internet. For the fun of it, try getting X-rays of yourself, your friends, people you're dating or might want to date, people you're considering hiring or working for. Use the person's name as the query term, or use the host: command if the person has his or her own Web site with its own domain name, or use the url: command with the directory address of the person's personal Web pages. If the name is rare, just enter it in quotation marks, like "richard seltzer", in either the Simple or Advanced Search. I use lowercase to catch all instances of a name. If it's a common name, go to the Advanced Search, enter the name in the query field, and in the Ranking field, enter a series of words that are likely to distinguish the person you're interested in from others who have the same name. To be sure that you catch all instances of the name, you can use the NEAR command in the Advanced Search. For instance, richard NEAR seltzer catches all instances where those words appear within ten words of one another, in any order. Hence, it catches "Seltzer, Richard" and "Richard Warren Seltzer"--names that would be missed by entering "richard seltzer", which only matches instances where those two words appear next to one another in exactly that order.
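The difference between a quoted phrase and NEAR can be illustrated with a small Python sketch. It is a rough model of the behavior described above, not AltaVista's actual matching code: the function names are invented, words are split on anything that isn't a letter or digit, and "within ten words" is taken to mean that the word positions differ by at most ten.

```python
import re


def words(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_match(text, first, second):
    """True if `first` is immediately followed by `second`."""
    w = words(text)
    return any(a == first and b == second for a, b in zip(w, w[1:]))


def near_match(text, first, second, window=10):
    """True if both words occur within `window` words, in any order."""
    w = words(text)
    first_positions = [i for i, x in enumerate(w) if x == first]
    second_positions = [i for i, x in enumerate(w) if x == second]
    return any(
        abs(i - j) <= window
        for i in first_positions
        for j in second_positions
    )


for sample in ["Richard Seltzer", "Seltzer, Richard", "Richard Warren Seltzer"]:
    print(
        sample,
        "| phrase:", phrase_match(sample, "richard", "seltzer"),
        "| near:", near_match(sample, "richard", "seltzer"),
    )
```

In this model all three samples satisfy NEAR, but only the first one matches the quoted phrase.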
The results can be both amusing and illuminating. Because virtually everything of significance that I've written over the last 30 years is at my Web site and has been indexed by AltaVista Search, I can see the instantaneous results of a statistical analysis of all those words. In an X-ray of myself, I see the word irrational in the top 20 categories, and in an X-ray of my Web site (http://www.samizdat.com), the word gratifying is in the top 20. I see the pieces of my life--here's my son who plays chess; here are characters from books I've written; here are my business contacts--some are separate islands of activity and others are interrelated in interesting ways.
The next time you plan a party list or business meeting that includes people who are active on the Internet, do graphic-style searches of all prospective attendees. Print these information X-rays so you can see them side-by-side as you're deciding who to invite and where to seat them. Then for the event itself, post these pictures on the wall as a conversation starter and have AltaVista Search open on a nearby PC for folks to follow up and play with what they find out about one another.
As you try these techniques, please let us know about your successes and failures. Send us your tips, the creative approaches you've tried, and your questions. Let's share and learn from one another. You can reach the author directly at seltzer@samizdat.com.
Can we help you build an Internet business? Richard Seltzer is an independent Internet writer/speaker/consultant. For details, send email to seltzer@samizdat.com.