By Richard Seltzer, seltzer@samizdat.com, www.samizdat.com
Reprinted with permission from Internet Search Advantage, ZD Journals. http://www.zdjournals.com
Here's my current concern. How do I, at no cost, build a complete index of my Web site with custom search forms, index password-protected pages, and keep the index clean? Well, that's what I want to talk about this month. At my little sandbox Web site where I test new ideas, I have more than 600 text-only documents, some of which are complete books (http://www.samizdat.com). It was through my site that I discovered the importance of using search engines to drive traffic to a Web site (only about 12 percent of my visitors come by way of my home page; others go straight to the document they want, thanks largely to search engines). That was also how I discovered the "flypaper" principle for catching the attention of potential visitors. (I discussed this principle in the article "Flypaper: Using AltaVista Search in Reverse to Let People Find You," which appeared in the August issue of Power Searching with AltaVista.)
Six months ago, a number of people who frequented my site began asking me to add some indexing software. They praised the site as content-rich but complained that it was difficult to find what they wanted, because there was simply too much information to browse through.
Finally, it occurred to me that there was no need to invest money and time in new software. All I had to do was take advantage of the capabilities of the free public AltaVista Search service.
At the point I realized this, every page at my Web site was accounted for in the AltaVista Search index. Whenever I added new pages or made significant changes to old ones, I clicked the Add/Remove URL link on the new AltaVista Search page, as shown in Figure A, and separately added each page on my Web site. Then, the AltaVista Search spider, Scooter, fetched those pages immediately and added them to the AltaVista Search index by the next day.
Figure A: On the new AltaVista Search page, you'll see the Add/Remove URL link.
I did this because I knew how important AltaVista Search indexing is in letting the right people know about my Web site. By providing lots of useful free content on my site, I've been able to build my traffic to more than 1,000 page retrievals per day, without spending a penny on promotion or advertising.
Because I indexed all my pages, the search command
+host:samizdat.com
will yield a list of every page at my Web site. And adding search terms to that command will limit the search to my site. In other words, +host:samizdat.com +chat will provide a list of all the documents on my site that mention chat--both the transcripts of my weekly chat sessions and the articles in which I discuss business opportunities related to chat. I also know I can bookmark the results of any search at AltaVista Search, and that I can make a hyperlink to the unique URL of a search result from a Web page at my site. So I can search for +host:samizdat.com and use my right mouse button to copy the resultant URL onto my clipboard. Then, I can paste that address onto my home page, making it a hyperlink with the anchor
"click here":
<A HREF="http://altavista.digital.com/cgi-bin/query?pg=q&c9usej=on&what=web&Kl=XX&q=%2Bhost%3Asamizdat.com&search.x=71&search.y=16">Click here to use AltaVista to search this Web site</A>
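The percent-encoded address in that hyperlink does not have to be copied from the browser; it can also be built programmatically. The following is a minimal Python sketch, assuming the base address and the pg, what, and q parameters shown in the link above. The function name is hypothetical, and the service may not accept these parameters today, so treat the result as illustrative only.

```python
from urllib.parse import urlencode

# Base address taken from the hyperlink in the article (assumption: the
# service may no longer answer at this address or accept these parameters).
BASE_URL = "http://altavista.digital.com/cgi-bin/query"


def build_site_search_url(host, terms=()):
    """Return a query URL restricted to `host`, with optional extra terms.

    `build_site_search_url` is a hypothetical helper, not part of any
    AltaVista tool.
    """
    query = " ".join([f"+host:{host}", *terms])
    # urlencode percent-encodes "+" as %2B and ":" as %3A,
    # and writes each space between terms as "+".
    params = {"pg": "q", "what": "web", "q": query}
    return f"{BASE_URL}?{urlencode(params)}"


print(build_site_search_url("samizdat.com"))
print(build_site_search_url("samizdat.com", ["+chat"]))
```

The first call reproduces the q=%2Bhost%3Asamizdat.com portion of the link above; the second adds the +chat term from the earlier example.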
Next, I included the following explanation: "To find anything at this site, connect to AltaVista Search and enter in the query box +host:samizdat.com followed by the words or phrases you want to look for, or simply click here and enter your query (+host:samizdat.com will already be in the box)." Most visitors are content with such an informal low-tech approach. One time, though, I got E-mail from Jorn Barger (http://www.mcs.net/~jorn), who explained that I could use forms to better accomplish this task. I replied that what he was suggesting was beyond my limited HTML abilities. So, he was kind enough to write the code for me. Here it is:
<FORM name=mfrm method=GET action="http://www.altavista.digital.com/cgi-bin/query">
<INPUT NAME=q size=65 maxlength=800 wrap=virtual VALUE="+host:samizdat.com"><br>
<INPUT TYPE=hidden NAME=act VALUE=search>
<INPUT TYPE=hidden NAME=pg VALUE=q>
<INPUT TYPE=hidden NAME=text VALUE=yes>
<INPUT TYPE=hidden NAME=what VALUE=web>
in language: <SELECT NAME=kl><OPTION VALUE=en SELECTED>English
<OPTION VALUE=fr>French
<OPTION VALUE=pt>Portuguese</SELECT>
<INPUT TYPE=submit VALUE=Submit>
</form>
To use this form, all you have to do is replace samizdat.com with the domain name of your site. If you're reading this article at The Cobb Group Web site, there's no need for you to retype the code template. Simply save this page as an htm file and use your HTML authoring tools to copy and paste the code into your Web page. If you're reading this article in paper form, you can also go to my Web site at http://www.samizdat.com and copy and paste the code from there.
Since I have French and Portuguese translations of some of the items at my site, this code takes advantage of the new language-tags capability at AltaVista Search and gives users a choice of languages. If your site uses English only, you can delete those three language-related lines (beginning with "in language" and ending with "Portuguese").
If you'd like to restrict the search to a directory at your site rather than have it include your entire site, use the url: command instead of host:. For instance, the command
+url:www.samizdat.com/commun
will limit the search to that particular directory. You also might want to create a series of forms tailored for people who are unfamiliar or uncomfortable with the search syntax at AltaVista Search. Simply edit the item VALUE="+host:samizdat.com", adding query terms within the quotation marks, just as you would use them at the AltaVista Search site. For instance,
VALUE="+host:samizdat.com internet"
will create a form with those search terms pre-loaded, inviting people to add whatever further refinements they might like and making those refinements very easy to submit. (By the way, this form connects to AltaVista Search in text-only mode, so you see none of the logos or ads, and the response time is amazingly fast.)
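If you want several such tailored forms, you could generate them from one template rather than editing each by hand. Here is a minimal Python sketch, assuming the form fields shown earlier in this article (the language selector is left out for brevity). The template and function names are hypothetical.

```python
from html import escape

# Template modeled on the form in this article; the language selector is
# omitted. FORM_TEMPLATE and make_search_form are hypothetical names.
FORM_TEMPLATE = """<FORM name=mfrm method=GET action="http://www.altavista.digital.com/cgi-bin/query">
<INPUT NAME=q size=65 maxlength=800 VALUE="{query}"><br>
<INPUT TYPE=hidden NAME=act VALUE=search>
<INPUT TYPE=hidden NAME=pg VALUE=q>
<INPUT TYPE=hidden NAME=text VALUE=yes>
<INPUT TYPE=hidden NAME=what VALUE=web>
<INPUT TYPE=submit VALUE=Submit>
</FORM>"""


def make_search_form(scope, preset_terms=""):
    """Return form HTML whose query box is pre-loaded with scope and terms.

    `scope` is a restriction such as "+host:samizdat.com" or
    "+url:www.samizdat.com/commun".
    """
    query = f"{scope} {preset_terms}".strip()
    # Escape quotation marks and angle brackets so that a quoted phrase
    # in the preset terms cannot break the VALUE attribute.
    return FORM_TEMPLATE.format(query=escape(query, quote=True))


print(make_search_form("+host:samizdat.com", "internet"))
print(make_search_form("+url:www.samizdat.com/commun"))
```

Escaping matters once a preset query contains a quoted phrase, because a bare quotation mark would end the VALUE attribute early.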
You may wonder why I use the Add/Remove URL link. If so, you may ask, "Doesn't the AltaVista Search spider follow all the links that it finds and index every page on the Web? And, if I add the URL of my home page (to alert the spider that I have a new site), isn't that enough for all my pages to get indexed?" Yes, AltaVista Search does go from link to link. Yes, it does find all pages, but not immediately and not every day. Even when the spider continuously crawls, with more than a thousand threads working simultaneously to capture one page after another, it still takes the spider a while to go everywhere and get everything. The developers at AltaVista Search keep improving the spider's techniques, and they keep upgrading and expanding the equipment at the AltaVista Search site. But at the same time, the total content on the Web keeps growing, making it increasingly difficult to find and index all of it.
If you have new material and you want people to start finding you right away, or if you want to use AltaVista Search as a complete index of your Web site, the only way to make sure that the new pages are added promptly is for you to add the new pages individually by hand. It takes diligence and discipline to keep the information about your site up-to-date, but, in my case at least, the benefit is well worth the effort.
Unfortunately, my approach of adding a URL at AltaVista Search for each new or modified page is appropriate only if you have a relatively small site. If you try to add more than a dozen pages per day by hand, AltaVista Search responds that you've added too many pages, and it stops accepting your input. You can then return and add another dozen pages the next day, and a dozen the next.
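If you have a backlog of pages to add, it can help to plan the daily batches in advance. The following Python sketch assumes a limit of 12 submissions per day, taken from the "dozen" mentioned above; the actual limit may differ. The function name, page addresses, and start date are hypothetical.

```python
from datetime import date, timedelta

DAILY_LIMIT = 12  # assumption: "a dozen" pages per day, as described above


def plan_submissions(urls, start, per_day=DAILY_LIMIT):
    """Map each submission date to the URLs to add by hand on that day."""
    schedule = {}
    for offset, i in enumerate(range(0, len(urls), per_day)):
        schedule[start + timedelta(days=offset)] = urls[i:i + per_day]
    return schedule


# Hypothetical page addresses and an arbitrary start date, for illustration.
pages = [f"http://www.samizdat.com/page{n}.html" for n in range(30)]
for day, batch in plan_submissions(pages, date(2000, 1, 1)).items():
    print(day, len(batch), "pages")
```

Thirty pages spread over three days this way: twelve, twelve, and six.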
If you have a large Web site, or if you don't have your own domain name and hence are part of a very large domain (like Geocities) from which many people submit several URLs, you might run into a dead end at your first URL submission. The folks who run AltaVista Search are trying to prevent "index spamming," that is, attempts to stuff the index with bogus information intended to promote someone's business or to serve as simply malicious hacks. That's why the folks at AltaVista Search had to set a limit. But if you can stay below the limit, it's a no-cost simple way to make it easy for visitors to find anything at your site.
Have you performed an AltaVista search and found that some items in the list of matches no longer exist? Instead of cursing or sending a nasty feedback message to the AltaVista Search folks, you should click Add/Remove URL and enter the URL of the dead page. The spider will immediately determine if the page is actually gone or if there was only an intermittent network problem. If you receive an error message indicating that the page no longer exists, AltaVista Search will remove that item from the index very soon, usually by the next day. Over time, Scooter will revisit all the pages in the AltaVista Search index, and pages that have vanished will be eliminated from the index. But that doesn't happen instantaneously. So there's always a residue of old information buried among all the good stuff in the index.
Remember that AltaVista Search is a public service, provided for free for the benefit of the Internet community. And consider it your responsibility, as a good Internet citizen, to take a few moments to alert AltaVista Search when you discover that a page has vanished. If we all do that, we'll improve the accuracy of the information in the index, for the benefit of us all.
And don't forget the obvious: Use Add/Remove URL to fix your own typos and other mistakes. Sooner or later, we all end up posting a page that contains an embarrassing mistake. Fortunately, Web technology makes it easy to post a corrected page (unlike having to throw away tens of thousands of copies of a printed brochure). But if the mistake appears in the HTML title, in the first couple of text lines, or in a description META tag, it could be perpetuated at AltaVista Search for weeks after you've corrected the page, unless you use Add/Remove URL to update the copy of that page in the index.
If your Web pages consist of content that you sell by subscription and hence are protected by password (like the Power Searching with AltaVista pages at The Cobb Group), you face a bit of a dilemma. You want to protect the information, because that's the source of your revenue. But at the same time, you'd like people to know that you have this information, and you'd like it to be completely indexed at the AltaVista Search site to attract potential subscribers. Once you password-protect your pages, though, you prevent the AltaVista Search spider (and other search-engine spiders) from ever retrieving and indexing them.
If you have a small business, you might be willing to try some workarounds using the Add/Remove URL link. (Large companies would probably prefer more direct, though perhaps costly, solutions.) For instance, you can copy your password-protected page to a new, unprotected URL and add that URL at the AltaVista Search page. Once the spider has fetched the copy, you can delete its content and replace it with a pointer to your home page, where you can set up a form that visitors can use to see free samples/excerpts or to sign up for a subscription. Then, all the words on the page will be indexed, but you'll retain control of the content.
That will be a temporary solution until the next time the spider visits your page and finds the new content. You should be able to make this work on a more permanent basis by using the Robots Exclusion Standard to deter Web spiders from retrieving that page again. (To see details on how to do that, click Add/Remove URL, scroll down to the paragraph shown in Figure B, and follow the Robots Exclusion Standard link.)
Figure B: Use the Robots Exclusion Standard link to deter the spider from retrieving a page again.
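Under the Robots Exclusion Standard, you place a robots.txt file at the root of your site listing the paths that spiders should stay out of. The Python sketch below uses the standard library's urllib.robotparser to check what such a file would tell a spider. The /indexed-copies/ directory and the page addresses are hypothetical, and the assumption that the spider identifies itself as "Scooter" is based only on the name used in this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask all spiders to stay out of the directory
# that holds the copies you submitted by hand.
ROBOTS_TXT = """\
User-agent: *
Disallow: /indexed-copies/
"""


def spider_may_fetch(robots_txt, user_agent, url):
    """Report whether robots_txt permits user_agent to retrieve url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "Scooter" as the user-agent name is an assumption.
print(spider_may_fetch(ROBOTS_TXT, "Scooter",
                       "http://www.samizdat.com/indexed-copies/article.html"))
print(spider_may_fetch(ROBOTS_TXT, "Scooter",
                       "http://www.samizdat.com/index.html"))
```

Keep in mind that the standard is advisory: it deters well-behaved spiders but does not protect content the way a password does.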
Use this workaround selectively and judiciously. It makes good sense if you have real content that you don't want to give away for free. But the same method can be abused by someone trying to draw people to a site using a bait-and-switch strategy, where the person pretends to have content that isn't really there. That strategy borders on "index spamming," and the folks who run AltaVista Search take a very dim view of such behavior. As they say at Add/Remove URL, "Left unchecked, this behavior [will] make Web indexes worthless. [AltaVista Search] will disallow URL submissions from those who spam the index. In extreme cases, [AltaVista Search] will exclude all their pages from the index."
As you try these techniques, please let us know about your successes and failures. Send us your tips, the creative approaches you've tried, and your questions. You can reach the author directly at seltzer@samizdat.com.
Richard Seltzer is an independent Internet writer/speaker/consultant. For details, send email to seltzer@samizdat.com.