Questions and answers

By Richard Seltzer, seltzer@samizdat.com, www.samizdat.com


Reprinted with permission from Internet Search Advantage, ZD Journals. http://www.zdjournals.com




The following questions are based on recent correspondence about my articles. They deal with matters that many people find confusing, and many of you should find the answers interesting. Please feel free to send your questions to seltzer@samizdat.com.

Bogus research results

Q A new publication, Content Creator's Newsletter, produced by RealNetworks (the creators of RealAudio and RealVideo), claims these products have about 90 percent marketshare and bases this claim on results from AltaVista and HotBot ("Search Engines Reveal that 88-94% of Internet Streaming Media URLs use RealAudio and RealVideo"). Is this claim accurate? Did Content Creator's Newsletter use the search engine correctly? (A marketshare number that high sounds absolutely incredible.)

A This looks like a good example of the misuse of search engines for market research. For AltaVista, Content Creator's Newsletter says its criteria were

"www.altavista.com 
(search using "link:*.XXX", advanced search, precise count)."






In other words, it used Advanced Search, selected Give Me Only A Precise Count Of Matches, and ran the link: command (which finds the pages that have hyperlinks to particular pages or sites) with the following search criteria:

RealNetworks: (link:*.ra OR link:*.ram OR link:*.rm OR link:*.rpm)
Netshow: (link:*.asx OR link:*.asf)
VDO: link:*.vdo
Vivo: link:*.viv
Xing: link:*.xdm
Vxtreme: (link:*.ivy OR link:*.ivx)
Vosaic: link:*.vos

Content Creator's Newsletter notes that RealNetworks includes RealAudio, RealVideo, and RealFlash. The results it obtained by doing this search on November 5, 1997, indicated the following:

Media type      URLs       Marketshare
RealNetworks    512,856    93.8%
Netshow         9,570      1.7%
VDO             10,341     1.9%
Vivo            3,736      0.7%
Xing            4,148      0.8%
Vxtreme         6,321      1.2%
Vosaic          2,804      0.5%
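The marketshare column is simply each count divided by the total of all counts. The following is a minimal Python sketch of that arithmetic, using the URL counts as printed in the table; the function name market_shares is an illustrative choice, not something from the newsletter. Recomputed this way, the RealNetworks share comes to about 93.3 percent rather than 93.8 percent, so the published percentages appear to have been rounded or derived slightly differently.

```python
# URL counts as reported in the table (search run on November 5, 1997).
counts = {
    "RealNetworks": 512_856,
    "Netshow": 9_570,
    "VDO": 10_341,
    "Vivo": 3_736,
    "Xing": 4_148,
    "Vxtreme": 6_321,
    "Vosaic": 2_804,
}


def market_shares(counts):
    """Return each entry's share of the total count, as a percentage."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


shares = market_shares(counts)
for name, pct in shares.items():
    # Print name, count, and recomputed share in aligned columns.
    print(f"{name:<13}{counts[name]:>9,}{pct:>7.1f}%")
```

Note that this only checks the division. It says nothing about whether the counts themselves measure what the newsletter claims.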

First, a minor detail--Content Creator's Newsletter indicates that instead of going directly to the AltaVista Search site, located at http://www.altavista.digital.com, it went to the site of a company named AltaVista Technology, located at http://www.altavista.com. That site redirects you to the real AltaVista search site, but garners advertising revenue from the confusion.

Unfortunately, no command at AltaVista Search can tell you the number of files with a particular file extension. (The domain: command works only for domain suffixes such as edu, com, and net, and for the two-letter suffixes that designate countries.) The query Content Creator's Newsletter constructed with link: simply finds files that have hyperlinks to pages containing .ra and so forth anywhere in their filenames. It looks like Content Creator's Newsletter made the mistake of using the precise count feature without ever getting a list of results it could check. If Content Creator's Newsletter had checked--clicking to list items and then viewing the pages' source code to see what hyperlinks are on those pages--it would have quickly realized the command wasn't returning the desired results.
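The difference between ".ra appears somewhere in the URL" and ".ra is the file extension" can be illustrated with a short Python sketch. It does not reproduce AltaVista's matching; it only contrasts a loose substring test with a strict extension test. The function names and the sample URLs are hypothetical, chosen for illustration.

```python
import posixpath
from urllib.parse import urlparse


def contains_token(url, token=".ra"):
    """Loose match: the token appears anywhere in the URL."""
    return token in url.lower()


def has_extension(url, ext=".ra"):
    """Strict match: the last component of the URL path ends with ext."""
    path = urlparse(url).path
    return posixpath.splitext(path)[1].lower() == ext


# Hypothetical URLs, for illustration only.
urls = [
    "http://example.com/audio/welcome.ra",       # a real .ra file
    "http://example.com/photos/zebra.raw.html",  # ".ra" inside the filename
    "http://www.rain.example.org/index.html",    # ".ra" inside the host name
]

loose = [u for u in urls if contains_token(u)]
strict = [u for u in urls if has_extension(u)]

print(len(loose), "URLs contain .ra somewhere")
print(len(strict), "URL actually ends in .ra")
```

A count based on the loose test includes pages that have nothing to do with RealAudio, which is why listing the matches and inspecting them matters before trusting a total.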

Similarly, the alternative command url:*.ra wouldn't have told Content Creator's Newsletter the number of files on the Web with that extension. Once again, AltaVista would show all pages that have .ra as an element of the URL--not necessarily at the end of the URL. Just looking at the URLs of the files in the list of matches would make that clear.

For information about how to subscribe to this new newsletter, visit http://www.real.com/mailinglist/index.html

Getting indexed at AltaVista

Q When I edit my Web page, I don't know how to update it in AltaVista, other than by simply uploading with FTP. In my eagerness, I tried going to Add/Remove URL on AltaVista, but I'm afraid I've just messed things up. My page used to come up as number 6 or 7 when I typed +folsom +homes. Now only one of my subsidiary pages appears under the heading TEST. My index page has all but disappeared.

A First, the main criteria for ranking are the HTML title and the first couple of text lines. On each of your pages, make sure you have the most important terms in those two places (the words you expect people to search for). After you've uploaded your edited pages to your Web site, go to AltaVista and use Add URL for each page.

Return to AltaVista after a day or two, search for the terms most important to you (terms you've included in your HTML titles and first lines of text), and see how you rank. You also can search for host: followed by your domain name (if you have your own), or search for url: followed by the directory in which your pages reside (if you don't) to see which of your pages are already included in the index.
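Before submitting a page, the advice about putting key terms in the HTML title and the opening text can be checked locally. The sketch below uses Python's standard html.parser module. The class and function names, the sample page, and the 30-word window standing in for "the first couple of text lines" are all assumptions made for illustration; the sketch does not model how AltaVista actually ranks pages.

```python
from html.parser import HTMLParser


class TitleAndLeadText(HTMLParser):
    """Collect the <title> text and the words found inside <body>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.in_body = False
        self.title = ""
        self.body_words = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
        elif tag == "body":
            self.in_body = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data
        elif self.in_body:
            self.body_words.extend(data.split())


def term_report(html, terms, lead_words=30):
    """Map each term to (found in title, found in first lead_words body words)."""
    parser = TitleAndLeadText()
    parser.feed(html)
    title = parser.title.lower()
    lead = " ".join(parser.body_words[:lead_words]).lower()
    return {t: (t.lower() in title, t.lower() in lead) for t in terms}


# Hypothetical page, for illustration only.
page = """<html><head><title>Folsom Homes for Sale</title></head>
<body><h1>Homes in Folsom, California</h1>
<p>Listings, school scores, and climate information.</p></body></html>"""

report = term_report(page, ["folsom", "homes", "recreation"])
for term, (in_title, in_lead) in report.items():
    print(term, "title:", in_title, "lead text:", in_lead)
```

Any term that shows False in both positions is a candidate for rewording the title or the opening lines before using Add URL.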

A problem with comments

Q I've tried to properly handle the META tags, and for a while, everything was working fine. Then, after looking at document sources on other pages, I decided to add comments like

<!--LISTING="homes/folsom/amriv.jpg"-->






I then went to Add URL, and I've had nothing but problems ever since.

A Comments shouldn't make any difference--they're not indexed. As for META tags, if you put what you need in the HTML title and the first couple of text lines, META tags should be unnecessary.

Offending Scooter

Q One thing you suggested really surprised me. You said when I finished editing, I should go to Add URL for each page. I've never done that because when I go to AltaVista's Add URL page, it says, "Please submit only one URL. Our crawler, Scooter, will eventually explore your entire site by following links." I want to be really clear before I start because I certainly don't want to offend Scooter.

A If you want your pages in the index by the next day, use Add URL for each page. If you don't mind waiting several weeks, just enter the home page.

Purging URLs in AltaVista

Q When I was first composing my subsidiary Web pages, I used TEST as my title for pages I was testing. I never added their URLs, and I've since changed the titles. (No, I haven't added their URLs yet, because I thought I wasn't supposed to.) Yet somehow, when I type +homes +"El Dorado Hills", I see TEST on the second page, and then something about school scores in one of my subsidiary pages or other information I wrote on different pages. For instance, I had a page on climate, a page on recreation, a page on school scores, and so on. But my index page has failed to show up like it once did. So how do I remove all those TEST pages? Do I go to Remove URL?

A Once again, use Add URL and enter the URL of each old, dead TEST page. Scooter will immediately try to fetch the pages. If the pages no longer exist, Scooter will get the message Error 404 and remove the pages from the index, usually by the next day.

Updating your URLs

Q Now that I've worked with AltaVista as an information provider instead of a researcher, I find that I don't like the search index nearly as much. My job is to list companies in the index, but I've had no success. Is there something I don't know about the AltaVista schedule? Why does AltaVista insist that it updates and has the most current and relevant listings, when in fact it doesn't?

A The information in the AltaVista index is as current as you want to make it. At any time, you can go to the AltaVista site, click Add URL, and simply type the URL of a new or revised page of yours; that information will usually be in the index by the next day. I do that for all of my pages.

AltaVista now has more than 90 million pages in its index, and its crawlers visit about 10 million pages a day. So eventually your pages will be found automatically--soon if there are many links from other pages to yours (because that's how the crawler learns about pages--from links), later if there are few links, and perhaps never if there are no links. If being indexed is important to you, simply take charge and use Add URL.



Richard Seltzer is an independent Internet writer/speaker/consultant. Send email to seltzer@samizdat.com.

