BUSINESS ON THE WORLD WIDE WEB --

November 21, 1996 -- Impact of Search Engines on Internet Business


Transcript of the live chat session that took place Thursday, November 21, 1996. These sessions are normally scheduled for 12 noon-1 PM US Eastern Time (Standard Time = GMT -5, Daylight Savings Time = GMT -4) on Thursdays.

To connect to the chat room, go to www.samizdat.com/chat-intro.html

Since the chat itself happens at a rapid pace, it's often difficult to note interesting facts and URLs as they appear live. Here's a place to take a more leisurely look. I've rearranged some of the pieces to try to capture the various threads of discussion (which sometimes get lost in the rush of live chat).

Please send email with your follow-on questions and comments, and suggestions for topics we should focus on in future sessions. So long as the volume of email responses is manageable, I'll post the most pertinent ones here for all to see.

These sessions are hosted by Richard Seltzer. If you would like to receive email reminders of our chat sessions, simply send a blank email message to businessonthewebchats-subscribe@yahoogroups.com or go to http://groups.yahoo.com/group/businessonthewebchats and sign up there.

This is one of the longest-running chat programs on the Web. (Please let us know if you know of ones that are older.) We've been doing this since June 1996.

In parallel with the chat program, Richard does a 5-minute weekly radio segment on "The Computer Report," which is heard live on WCAP in Lowell, Mass., and is syndicated on WBNW in Boston and WPLM in Plymouth, Mass. He often bases his five-minute segment on a recent chat discussion. He also often turns that same segment into a weekly article for posting at this site and for distribution to other sites through iSyndicate www.isyndicate.com.

For transcripts of previous sessions and a list of future topics, see www.samizdat.com/chat.html.

For an article on how to make "business chat" work (based on this experience), see www.samizdat.com/events.html.

For articles on topics related to this one, check our newsletter, Internet-on-a-Disk www.samizdat.com/ioad.html


Threads (reconstructed after the fact): Followup

INTRODUCTIONS

Richard Seltzer (199.3.129.189) - 11:53am -- The scheduled chat for noon to 1 PM is on Business on the WWW. If you are here for that discussion, please identify yourself.

We're here to share experiences about doing business on the Internet -- particularly the World Wide Web. What works? What doesn't work? Why? What are the trends that matter? How can you/should you adapt to the Internet culture and environment? I work for the Internet Business Group at Digital Equipment in Littleton, MA. In that capacity, I end up talking to people from large companies about how they can use the Web for business. I also have my own personal Web page -- which is content rich and no frills -- which I do for practically nothing and draws a fair amount of traffic and attention.

In a chat session like this things can get pretty frantic. It's sometimes difficult to follow the threads of conversation. And there's no time to write down interesting URLs and facts. So last week, I took a copy of the raw transcript and edited it to make the threads clearer and posted it at my own little Web site so anyone could take a look. You can see it at http://www.samizdat.com/chat18.html. I plan to do the same today. Barring technical difficulties, I hope to have a transcript up later today. I'll post it at the same site, naming this one /chat19.html.

Today, we plan to continue our discussion on the impact of search engines on Internet business. My book The AltaVista Search Revolution was just published by Osborne/McGraw-Hill so this topic is foremost in my mind right now. But, as always, we welcome any questions and insights related to Internet business.

Richard Seltzer (199.3.129.189) - 11:58am -- If you just connected and are here for the discussion on Business on the World Wide Web, please identify yourself and let us know about your interests.

barbara (199.93.126.34) - 11:58am -- Hi! I'm here.

Richard Seltzer (199.3.129.189) - 11:59am -- Hello, Barbara. What's uppermost on your mind today?

bill_h (192.135.44.202) - 12:00pm -- Hello all.

Richard Seltzer (199.3.129.189) - 12:01pm -- Hello, bill_h -- Do you have particular questions/concerns with regard to search engines, like AltaVista?

Audrey (204.166.232.87) - 12:01pm -- Greetings, folks . . .

Richard Seltzer (199.3.129.189) - 12:02pm -- Welcome, Audrey -- Do you have a Web site/page? And does much of your traffic come by way of search engines?

Audrey (204.166.232.87) - 12:04pm -- Hi, Richard - I work for a web site developer and I don't see much traffic coming in thru search engines but I use them all the time.

Bob Fleischer, http://www.tiac.net/users/rjf/ (192.208.46.249) -- Hi! I'm with Digital's Network and System Integration Services.

Tate (205.230.10.45) - 12:06pm -- Hello

anonymous (206.225.193.24) - 12:06pm -- Sorry I am late.

Linn & John (199.186.173.215) - 12:07pm -- Hi! We don't know too much about business on the Internet, but we'd like to lurk and learn.

TheForth (192.58.206.22) - 12:08pm -- Hi, TheForth is with you now... :-)


UPDATE ON ALTAVISTA CAPABILITIES

Richard Seltzer (199.3.129.189) - 12:00pm -- By the way, I just learned of two recent changes at the AltaVista site which could be helpful to you. 1) In the past, you could submit a new page from ADD URL, and AltaVista would fetch the page immediately and add it to the index within a day or two. But it only accepted new pages. If the URL was already in the index, you got a message that you'd have to wait until the crawler got to your site again. Now that limitation has gone away -- you can submit the URLs of specific updated pages for quick entry in the index.

The second change is -- If you have deleted a page and want the information deleted from the index, now you can simply submit the URL of the deleted page. The crawler will immediately try to fetch it, and when it gets the error message that indicates the page does not exist at that site, it will pass along instructions to the indexer to delete that information.

bill_h (192.135.44.202) - 12:06pm -- I use AltaVista once or twice a day. I have noticed that maybe 10% of the URLs give errors, i.e., "The requested URL ... was not found on this server." If you can ADD a URL, can you signal AltaVista to fix the reference? If not, how long before AltaVista updates the reference and fixes itself?

Audrey (204.166.232.87) - 12:08pm -- Good point, Bill - I've noticed that too. What about multiple listings of the same site? Why does Alta Vista allow that?

Richard Seltzer (199.3.129.189) - 12:11pm -- bill_h -- Good question. Actually, a new feature was just added to AltaVista. You can submit the URL of a dead/non-existent page. When you do so, the crawler immediately tries to fetch the page. When it gets the error message that such a page does not exist, it passes that information along to the indexer and within a day or two the old information should be purged from the index. Anyone (not just the information provider) can submit such a URL. Also, when the crawler (Scooter) in its normal travels finds that a page previously in the index no longer exists, the same process is triggered. Unfortunately, pages seem to come and go pretty quickly on the Web these days, so it's not unusual to have the experience you reported.

bill_h (192.135.44.202) - 12:10pm -- Looks like you answered my question. To fix a bad reference, submit an ADD URL with the bad reference, which will trigger an update.
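
As an illustration of the resubmission process discussed in this thread, here is a minimal sketch in Python. It is not AltaVista's actual code. The names resubmit_url and fake_fetch, the example.com addresses, and the use of HTTP status codes 200 and 404 to stand for "page fetched" and "page does not exist" are all assumptions made for the example.

```python
from typing import Callable, Dict, Tuple


def resubmit_url(
    index: Dict[str, str],
    url: str,
    fetch: Callable[[str], Tuple[int, str]],
) -> str:
    """Re-fetch a submitted URL, then refresh or purge its index entry.

    `fetch` is a hypothetical stand-in for the crawler; it returns
    (status_code, page_text).
    """
    status, body = fetch(url)
    if status == 404:
        # The page no longer exists: drop whatever the index held for it.
        index.pop(url, None)
        return "purged"
    if status == 200:
        # The page exists: store the freshly fetched text.
        index[url] = body
        return "updated"
    # Any other outcome (for example a server error) leaves the index alone.
    return "unchanged"


def fake_fetch(url: str) -> Tuple[int, str]:
    """Simulated Web: only one page still exists (assumption for the demo)."""
    pages = {"http://example.com/live.html": "fresh text"}
    if url in pages:
        return 200, pages[url]
    return 404, ""


index = {
    "http://example.com/live.html": "stale text",
    "http://example.com/gone.html": "old text",
}
print(resubmit_url(index, "http://example.com/gone.html", fake_fetch))
print(resubmit_url(index, "http://example.com/live.html", fake_fetch))
```

The same function covers both changes mentioned above: an updated page is refreshed, and a deleted page is removed.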

Richard Seltzer (199.3.129.189) - 12:13pm -- Audrey -- There is no way to get multiple listings of the same site, unless you have multiple copies of the same page on your server or have mirror sites. And when AltaVista recognizes that two pages are nearly identical, it lists them together for your convenience. Remember it gets its information straight from a site, not from a form that someone fills out.


COOKIES AND SEARCH ENGINES

Todd (192.208.46.249) - 12:04pm -- Greetings. Richard - I was rather confused by the discussion of cookies in last week's transcript. Could you try to clarify their relevance to search engines.

Richard Seltzer (199.3.129.189) - 12:08pm -- Welcome Todd, Audrey, and Bob -- Yes, Todd, we seemed to get a bit off the track last week on the discussion of cookies. The point I was trying to make was that AltaVista and other search engines today cannot and do not fully index "dynamic pages". In other words, if a Web site is set up to recognize a user and provide pages tailored to that user's interests or based on where that user has been before at that site, that practice locks out indexing. I believe that in some cases that kind of dynamic personalization is linked to "cookies." Is that clearer?

JK (207.152.136.144) - 12:09pm -- Hi, On cookies...would you mind explaining what they are? They seem to be a hot topic lately.

Jacques (was anonymous) (206.225.193.24) - 12:10pm -- I was surprised to find a cookie from Boston Com in my browser files after last week...

bill_h (192.135.44.202) - 12:20pm -- My understanding of 'cookies' is that the Web Server delivers a message to the browser (your machine) which includes a unique string of data. This string gets appended to a default 'cookie' file. After this, anytime you access a Web Server, the 'cookie' file is sent to the server (maybe requested by the Server). This gives the server a unique 'signal' that it is your Browser (machine) making the request. As a result, all future requests from your machine can be identified. Deleting or renaming the 'cookie' file removes this unique tag, but the next access to the same Web Server will load another 'cookie'.

Richard Seltzer (199.3.129.189) - 12:28pm -- bill_h -- Thanks for the clear explanation of cookies. The relationship to this discussion is that some sites use cookies to find out if you have been to their site before and to learn what pages you have seen before and based on that information provide you with different dynamic material. Search engines today can't deal with dynamic material. It does no good to index a page that was tailored for a particular individual.
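
To make the cookie exchange described in this thread concrete, here is a minimal sketch using Python's standard http.cookies module. It only simulates the headers; no network is involved. The function name server_response, the cookie name "visitor", and the page texts are hypothetical, chosen for illustration.

```python
import uuid
from http.cookies import SimpleCookie


def server_response(cookie_header, seen):
    """Return (page_text, set_cookie_value_or_None) for one request.

    `cookie_header` is what the browser sent in its Cookie header ("" if
    nothing). `seen` maps visitor ids to visit counts (the server's memory).
    """
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    if "visitor" in jar and jar["visitor"].value in seen:
        # Known browser: tailor the page to its history.
        visitor_id = jar["visitor"].value
        seen[visitor_id] += 1
        return f"Welcome back (visit {seen[visitor_id]})", None
    # Unknown browser: issue a fresh unique string for it to store.
    visitor_id = uuid.uuid4().hex
    seen[visitor_id] = 1
    out = SimpleCookie()
    out["visitor"] = visitor_id
    return "Welcome, new visitor", out["visitor"].OutputString()


seen = {}
page, cookie = server_response("", seen)      # first visit, no cookie yet
print(page)
page, _ = server_response(cookie, seen)       # browser sends the cookie back
print(page)
page, _ = server_response("", seen)           # a client that keeps no cookies
print(page)
```

The last call shows the indexing problem raised above: a client that sends no cookie, as a crawler typically would, only ever sees the page served to a new visitor, so the tailored pages never reach the index.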


HOW TO CHANGE INTERNET PROVIDER WITH MINIMUM ADDRESS-CHANGE DISRUPTION

Jacques (206.225.193.24) - 12:14pm -- I need advice on strategy: I am planning to change Internet provider. I have created an access to my web page through an AOL account which provides a link to our web site. The idea is to be able to change the main site's address with minimum disruption. Good or bad idea?

Bob Fleischer, http://www.tiac.net/users/rjf/ (192.208.46.249) -- Jacques -- is your existing access to your own domain name, or through an AOL-owned domain? In the former case, of course, you can move the name to a new provider. In the latter case, you can maintain a page giving a redirection (manual, automatic, or timed via the META tag) to the new location -- but this requires keeping a minimal account with the old provider!

Richard Seltzer (199.3.129.189) - 12:21pm -- Jacques -- I'm not quite sure why you need/want that AOL connection. The best way that I know of to maintain continuity is to own your own domain name. Then when you switch providers, you can maintain the same (alias) address. There might be a day or two disruption when switching, but that should be it. What's the AOL thing about? Please explain further.

Jacques (206.225.193.24) - 12:27pm -- I think I understand Richard's question. If my URL was some proprietary name like "Jacques.com" I could just switch access provider... Unfortunately my pages are on a site that came with an ISP which was provided by an ex-client. In order to provide continuity I also provide access through a page that comes free as part of my AOL subscription. I'll keep the AOL subscription, and when I change the ISP I'll modify the link in the AOL page so as to give my new URL.

Richard Seltzer (199.3.129.189) - 12:32pm -- Jacques -- Okay. Makes sense. But if you are serious, you should get your own domain name soon. Also, keep in mind that with AltaVista you can check to see what pages on the Web have hyperlinks to your old address and then you can send email to the Webmasters requesting that they update the links. Simply enter as your query: link:theoldurl
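
The redirection page mentioned above could look something like the following sketch, which uses Python to generate the HTML. The function name redirect_page and the example.com address are hypothetical. A delay of 0 gives an automatic redirect, a larger delay gives a timed one, and the visible link serves as the manual fallback for browsers that ignore the META tag.

```python
from html import escape


def redirect_page(new_url, delay_seconds=5):
    """Build a small HTML page that forwards visitors to `new_url`."""
    safe = escape(new_url, quote=True)  # keep quotes and & from breaking the HTML
    return (
        "<html><head>\n"
        f'<meta http-equiv="refresh" content="{delay_seconds}; url={safe}">\n'
        "<title>This page has moved</title>\n"
        "</head><body>\n"
        f'<p>This page has moved to <a href="{safe}">{safe}</a>.</p>\n'
        "</body></html>\n"
    )


# Example address only; substitute the real new location.
print(redirect_page("http://www.example.com/new-home/", delay_seconds=5))
```

The output would be saved as the page at the old address, which is why a minimal account with the old provider still has to be kept.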


HOW TO GET A DOMAIN NAME

Todd (192.208.46.249) - 12:38pm -- Richard - how does one get a domain name? How much does it cost? Will most ISPs host my domain or do I need to maintain a server?

Carol J. Snyder(snyderinfo*Web Design) (199.3.134.192) -- Todd-- "Do you need your own server?"...The answer really is "What are YOUR needs? Your business needs,etc.?" What is your purpose in getting a web site? How busy do you anticipate it will be? I generally advise against getting your own web server (for a small business or individual) ..UNLESS you have the technical know-how and want to be tied down 24 hrs/7 days a week. I recommend letting ISPs host your site..Again only you will know the cost-effectiveness of this..(Most ISPs are hosting for $19-$29 a month)

Richard Seltzer (199.3.129.189) - 12:48pm -- Todd -- check with your Internet access provider; that's the simplest route to get a domain name. There's a company that gives these things out for a fee (I don't remember the name of the company or the price, but it's on the order of about $50 per year). Your access provider would certainly know. And you'll also need to deal with your access provider if you are running your pages hosted on their server and now want to operate using an "alias" (your new domain name). It's relatively quick and painless to do the switchover. Then you'll want to do a link: search at AltaVista to find out who is linked to your old address and let them know about your new one.

Larry (199.232.57.94) - 12:46pm -- Todd- Most ISPs will handle the paperwork etc. to get you set up with your own domain name. They'll charge you a nominal one-time setup fee ($25-50). InterNIC (the keeper of all domain name info) charges you $100 for the 1st 2 years, and $50/year thereafter for your domain name. (This is in addition to whatever your ISP has charged you.) You don't need to have your own domain name server. See http://rs.internic.net for details. If you want a *.us domain, the procedure is slightly different.
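
Using only the figures Larry quotes above ($100 for the first two years, $50 per year thereafter, plus a one-time ISP setup fee of $25-50), the total cost of holding a name can be worked out with a short sketch like this one. The function name domain_cost is hypothetical, and monthly hosting charges are deliberately left out.

```python
def domain_cost(years, isp_setup_fee=0):
    """Total registration cost in dollars over `years` years.

    Assumes the quoted 1996 schedule: $100 covers the first two years,
    then $50 for each additional year, plus a one-time ISP setup fee.
    """
    if years < 1:
        raise ValueError("years must be at least 1")
    registration = 100 + 50 * max(0, years - 2)
    return registration + isp_setup_fee


print(domain_cost(2, isp_setup_fee=25))   # cheapest case for the first two years
print(domain_cost(5, isp_setup_fee=50))   # five years with the highest setup fee
```

For two years with a $25 setup fee this comes to $125; for five years with a $50 setup fee, 100 + 3 x 50 + 50 = $300.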

Carol J. Snyder(snyderinfo*Web Design) (199.3.134.192) -- Re: setting up your own domain name-- It's really easy, and you can do it yourself by registering at http://rs.internic.net as Larry said.. Then you just have to coordinate with your ISP to get on their system..(Admittedly, sometimes it's better to let the ISPs handle the paperwork.)


HOW MUCH OF THE WEB IS INDEXED?

Bob Fleischer, http://www.tiac.net/users/rjf/ (192.208.46.249) -- Are there any estimates of what percentage of the material on the Web is inaccessible to robots/crawlers, either due to robot exclusion, or because it's a database behind a search form, or because a password is required? (Alternatively, what percentage *is* indexed?)

Richard Seltzer (199.3.129.189) - 12:18pm -- Bob -- Excellent question to which I don't think anyone has an answer right now. All we know is that over 30 million pages are accessible and indexed by AltaVista. As you know, there are many pages on corporate intranets, behind firewalls, that the public AltaVista Search service does not get to. Just within Digital, at last count I heard that there were over 900,000 pages on over 1,100 servers. And that's just one corporation.

Tate (205.230.10.45) - 12:18pm -- Following from PC MAG: Sites Indexed: AltaVista 30M, Excite 50M, HotBot 54M, Infoseek 1.5M, Lycos 66M, Magellan 6M, Open Text 20M, WebCrawler 2M, Yahoo! .37M Where M=Million.

Carol J. Snyder(snyderinfo*Web Design) (199.3.134.192) -- Tate: Just curious...what issue of PC Mag?

Richard Seltzer (199.3.129.189) - 12:24pm -- Tate -- I'd take all those numbers with a large grain of salt. Different sites count pages differently. Different sites go to different lengths to purge duplicates and old information etc. (The harder you work at making your index clean and accurate, the lower the total number you get.) It's like hit counts -- useful for comparing a site's performance today vs. performance in the past, but not much good for comparing one site to another.

Tate -- The Search Engine data were from PCMAG Dec. 3, '96


HOW MANY ALTAVISTA CLONES ARE THERE?

bill_h (192.135.44.202) - 12:29pm -- I understand that AltaVista software can be purchased, so that customers can own their own AltaVista search machine. Are there any public numbers on how many AltaVista 'clones' exist?

Bob Fleischer, http://www.tiac.net/users/rjf/ (192.208.46.249) -- AltaVista Search software most certainly can be purchased -- quite a few businesses and organizations already run *internal* versions to search their internal Webs.

bill_h (192.135.44.202) - 12:36pm -- Bob - Thanks. Any numbers on how many AltaVista 'clones' are out there? I assume that AltaVista can be tailored so that a customer can search only for info of interest vs the AltaVista 'index the world' technique. Correct?

Richard Seltzer (199.3.129.189) - 12:38pm -- bill_h -- There is a personal edition of AltaVista which runs on Windows 95 and NT 4.0, which you can download (beta) from the AltaVista software site http://www.altavista.software.digital.com. That lets you find things on your own hard disk and local LAN in the same format, with the same commands as the public AltaVista search service. There is also a version that is for sale for indexing a corporate intranet. Check the AltaVista software site for details.

Richard Seltzer (199.3.129.189) - 12:45pm -- bill_h -- At the AltaVista site there's a list of about four "clones" -- one of which I remember is LawCrawler. That one uses the AltaVista technology and provides tailored, narrowed searches for a particular area of interest. You might want to check out the others for creative ideas. They are also franchising mirror sites on other continents -- one is running in Australia already -- to reduce traffic tie-ups on trans-oceanic lines, and the folks running the mirror sites are doing so on a commercial basis with advertising. (The main AltaVista site deliberately does not run advertising).


LOOKING FOR AN INDIVIDUAL

jamie (131.123.120.203) - 12:30pm -- I am looking for chris montecalvo, he goes to the art institute. He was a high school friend of mine, and we lost touch. I am in Ohio, and if anyone knows his E-mail address, please, please let me know!

Carol J. Snyder(snyderinfo*Web Design) (199.3.134.192) -- Jamie--Maybe you could try the Yahoo search under Reference and People and Email addresses

Richard Seltzer (199.3.129.189) - 12:36pm -- Jamie -- The quickest way to find an email address is at http://www.four11.com/

Bob Fleischer, http://www.tiac.net/users/rjf/ (192.208.46.249) -- Jamie -- there are also searchable phone-book databases on the Web, some of which include personal as well as business listings (white- as well as yellow-pages). http://superpages.gte.net/ is one. http://bigyellow.com/ is another.


MAKING A PERSON'S NAME FINDABLE

barbara (199.93.126.34) - 12:02pm -- In order to promote a name of a person on Alta Vista via their web page, does his/her name have to be in a special location?

Richard Seltzer (199.3.129.189) - 12:05pm -- Barbara -- First, what do you mean by "promote"? If you mean make it so somebody looking for that name will find your Web site, that actually depends on how rare that name is. If it is very rare, putting the name anywhere on any page (within the first 100 Kbytes of text of that page) should be sufficient. If it is a common name, and you want searches for that name to get a results list with your page near the top -- include the name in your HTML title and in the first few lines of text. Is that what you meant?

barbara (199.93.126.34) - 12:12pm -- Richard, yes. Although I have a friend who now has a web site, his name however doesn't appear on the home page, but on one of the pages that you drill down to. His web site title, "EXIT 42," comes up first when you check alta vista, but when I try his name, "Jon Leslie," 3 names come up, none of which is him.

Richard Seltzer (199.3.129.189) - 12:16pm -- Barbara -- Are you sure that his name appears in exactly that form, "Jon Leslie," on that page of his? Two possibilities -- 1) in Advanced Search enter the query jon* NEAR leslie to get all possible variants of the name; 2) enter url:thepageyouknowthenameison to double-check that that page is in the AltaVista index. If it isn't in the index, for any reason, click on ADD URL and enter that URL and the page should then be added to the index within a day or two.

Tate (205.230.10.45) - 12:21pm -- To ensure success with names, always include common misspellings, i.e., Radisson is correct but also use Raddison, Raddisson, etc.

Richard Seltzer (199.3.129.189) - 12:26pm -- Tate -- Right, it's good to include alternate spellings. You can also use the wildcard * (in AltaVista you can only use that after the first three characters, and use it to represent up to 5 missing characters); hence radis* would get radisson and radison. I tend to always use lower case since that matches both upper and lower case.
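
The wildcard rule described above (at least three leading characters, then up to five missing characters) can be imitated with a regular expression. This is only a sketch of the rule as stated in the chat, not AltaVista's implementation. The function name wildcard_matches is hypothetical, and the sketch assumes a single trailing * and a lower-case query matched without regard to case.

```python
import re


def wildcard_matches(term, word):
    """Check `word` against a query like 'radis*'.

    Assumptions for this sketch: exactly one '*', at the end of the term,
    preceded by at least three characters, standing for 0 to 5 characters.
    """
    if term.count("*") != 1 or not term.endswith("*"):
        raise ValueError("expected a single trailing '*'")
    prefix = term[:-1]
    if len(prefix) < 3:
        raise ValueError("'*' needs at least three characters before it")
    pattern = re.escape(prefix) + r"\w{0,5}"
    # Lower-case queries are treated as matching both cases.
    return re.fullmatch(pattern, word, re.IGNORECASE) is not None


for candidate in ["Radisson", "radison", "Raddison"]:
    print(candidate, wildcard_matches("radis*", candidate))
```

Under these assumptions radis* finds Radisson and radison but not the misspelling Raddison, which does not begin with "radis". A misspelling that differs within the prefix still has to be listed separately, or searched for with a shorter prefix such as rad*.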

Tate (205.230.10.45) - 12:29pm -- Richard - Understand that user can find with wild cards. However, they may be intimidating to or ignored by many users.

Richard Seltzer (199.3.129.189) - 12:35pm -- Tate -- Yes, many users never bother to look at the Help files. Keep in mind for AltaVista that you have two ways to search -- Simple and Advanced. If you want to enter a series of alternate spellings in Simple Search, just enter the words with spaces between. In Advanced search enter the words with OR between. And if you want to enter an entire name use quotation marks like "Richard Seltzer" to indicate the words must both appear and must appear in that order. If you don't know the order of the words (might be Seltzer, Richard) and don't know if there might be initials or a middle name, then use Advanced search and enter the NEAR command, such as Richard NEAR Seltzer, which catches any instance of those two words appearing within 10 words of one another. Hope that helps.
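
As a rough illustration of the NEAR command described above, the following sketch checks whether two words occur within 10 words of one another, in either order. It is not how AltaVista works internally. The function name near is hypothetical, and treating "within 10 words" as a difference of at most 10 word positions is an assumption.

```python
import re


def near(text, first, second, window=10):
    """True if `first` and `second` occur within `window` words of each other."""
    words = re.findall(r"\w+", text.lower())
    positions_first = [i for i, w in enumerate(words) if w == first.lower()]
    positions_second = [i for i, w in enumerate(words) if w == second.lower()]
    # Order does not matter, so compare absolute distances.
    return any(
        abs(i - j) <= window
        for i in positions_first
        for j in positions_second
    )


print(near("Seltzer, Richard W. wrote the book", "richard", "seltzer"))
print(near("Richard " + "filler " * 11 + "Seltzer", "richard", "seltzer"))
```

The first call succeeds even though the name is reversed and followed by an initial, which is the case an exact phrase in quotation marks would miss. The second fails because the two words are 12 positions apart.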


ALTAVISTA AS TOOL FOR FIGHTING SPAM?

Jacques (206.225.193.24) - 12:35pm -- Can AltaVista be used as a tool to fight spam and other anonymous emailers?

Richard Seltzer (199.3.129.189) - 12:42pm -- Jacques -- There are several varieties of spam. AltaVista can't really help in fighting the email variety. But it can help with newsgroups. If you have seen a "spam-type" message in a favorite newsgroup of yours you can do a newsgroup search at AltaVista to check to see what this sender has posted and where. Use the from: command to limit it to the sender. You might also search by a characteristic phrase in the text. (The more subtle spammers may use different Subjects/titles in different groups.) Identifying who is doing it and to whom and how often is the first step in discouraging that kind of behavior.

Tate (205.230.10.45) - 12:45pm -- We practice spam control by returning E-mail with WP-97 attached. Does the body good to send an 84M file to the spammer. We also enroll them for 15 or 20 newsgroups so their E-mail box won't be empty :-)

barbara (199.93.126.34) - 12:45pm -- What is Spam?

bill_h (192.135.44.202) - 12:52pm -- SPAM, What happens when you throw a can of Spam into a fan, also unwanted e-mail messages, typically advertising some unneeded service.

Jacques (206.225.193.24) - 12:48pm -- Tate & Richard: To me the issue is to identify the spammers. There seems to be more and more sophistication in the methods used to hide their identities. Hence my interest in using AltaVista.


WILL SEARCH ENGINES BECOME LESS AND LESS USEFUL?

Tate (205.230.10.45) - 12:35pm -- What is your opinion on the idea that as the Internet grows the search engines will become less and less useful and that one will eventually need a "Web Compass" type program to organize the data from even more niche locators?

Carol J. Snyder(snyderinfo*Web Design) (199.3.134.192) -- As the Internet grows, the search engines will become less and less useful---maybe...maybe not.. If we push for "standards" across searching, then we wouldn't end up with 20 different ways of searching...I still argue that the Information Specialists have the advantage because they've been searching databases for 20 years already.. (e.g. DIALOG, BRS, Lexis/Nexis....librarians)

Tate (205.230.10.45) - 12:42pm -- Carol - not only 20 different ways of searching but 20 different ways of submitting site information!

Richard Seltzer (199.3.129.189) - 12:52pm -- Carol -- Regarding the long-term usefulness of search engines. First, the fact that there are 21 million accesses to AltaVista a day makes it pretty attractive to do whatever you can to be indexed by such a site -- which means avoiding the use of technology that gets in the way of being indexed. Second, the search technology is improving. I hear that the AltaVista folks are close to being able to index PostScript and Acrobat files. Third, it should be possible to add information that is in databases to the indexer. There's no way right now, but I believe that's a business rather than a technology issue. Since there's extra work involved, it would make sense to charge the information provider for the service of adding that information to the general index.


HOW TO MAKE MONEY ON THE INTERNET

Tappy (145.17.100.1) - 12:17pm -- Has anyone determined a way to make money on the internet?

Richard Seltzer (199.3.129.189) - 12:22pm -- Tappy -- the quick and simple answer is -- you make money on the Internet by building and serving an audience. Check the articles on my Web site http://www.samizdat.com for some ideas on how to do that.

Bob Fleischer, http://www.tiac.net/users/rjf/ (192.208.46.249) -- Tappy -- "how to make money on the Internet" is a very open-ended question. The Web is mostly "information", but that is very broad. Back when Sears had their mail-order catalog, they probably didn't think of themselves as making money in the publishing business.


LOOKING FOR A HOST FOR WEB PAGES

Jack N (206.196.132.51) - 12:22pm -- I am having a web page built and I will be looking for someone to host it. Could you please tell me what to look for, questions to ask, and fair pricing? Thanks.

Carol J. Snyder(snyderinfo*Web Design) (199.3.134.192) -- JackN: etal. I'd suggest asking these questions when finding an ISP to host web site: 1)What's their "up" time? 2)Will you be listed in THEIR business or personal web pages? 3)Are you allowed to put "special programs" (e.g. FrontPage extensions) on the server? 4) Do they charge for putting these extra programs on your site? That's just a start...


SEARCHING FOR NON-ENGLISH CHARACTERS WITH ALTAVISTA

bill_h (192.135.44.202) - 12:48pm -- If I wanted to enter an AltaVista search for words using non-English characters, where would I go to find out how to enter the search string from my Win95 PC and could AltaVista handle it?

Richard Seltzer (199.3.129.189) - 12:55pm -- bill_h -- AltaVista is independent of language. It indexes everything. If your PC is set up so you can enter characters with French or German or Spanish character/accents, just type that way in the query box and AltaVista will search only for documents in which those words appear with those accents. (Asian languages present different problems because of conflicting codes, lack of uniformity. But if all were to adopt UNICODE, AltaVista would work for those languages as well.)


PLUG FOR THE BOOK -- THE ALTAVISTA SEARCH REVOLUTION

Carol J. Snyder(snyderinfo*Web Design) (199.3.134.192) -- Richard: Plz put in a plug for your book on AltaVista...where do we get it? Is any of it online?

Richard Seltzer (199.3.129.189) - 12:57pm -- Carol, thanks for the prompt. Many of your questions could be answered by my book, The AltaVista Search Revolution, just published by Osborne/McGraw-Hill. It's available in many bookstores (including Quantum in Cambridge, Mass.) and can also be ordered on-line from the AltaVista site http://altavista.digital.com/


WRAPUP

Richard Seltzer (199.3.129.189) - 12:58pm -- All -- I can't believe the hour is almost over. Seems like people still have questions on this topic. Please let me know if you'd like us to continue on this next time -- not next week, which we'll skip because of Thanksgiving, but the Thursday after that.

Jacques (206.225.193.24) - 12:59pm -- I'd love to have a discussion on spam on turkey day

Richard Seltzer (199.3.129.189) - 12:59pm -- As usual, I'll post the transcript of today's session (edited to reconstruct threads) at my Web site, probably tonight. Check at http://www.samizdat.com/index.html#chat Also, please send me email with followup questions and comments and I'll add them to the transcript. seltzer@samizdat.com

Richard Seltzer (199.3.129.189) - 1:00pm -- Also, please send me email regarding your preferences for future topics -- including whether we should go with this one for one more time. seltzer@samizdat.com

Richard Seltzer (199.3.129.189) - 1:00pm -- Also, before signing off please let everyone know your email and URL addresses for followup purposes.

barbara (199.93.126.34) - 1:01pm -- Search engines is a topic I would like to continue. Bye.

Carol J. Snyder(snyderinfo*Web Design) (199.3.134.192) -- cjsnyder@snyderinfo.com http://www.snyderinfo.com Let's continue the details of search engines.

Carol J. Snyder(snyderinfo*Web Design) (199.3.134.192) -- Invitation to all to join the Boston Internet Group at tonight's meeting at MIT (see http://bcs1.ziplink.net/Groups/isig/). Topic: Hi-speed access to the Internet

Richard Seltzer (199.3.129.189) -- Thanks to all for your participation.


FOLLOWUP

HOW TO BUY THE BOOK -- Eileen Moyer

From: Eileen Moyer <emoyer@mcp.edu> Date: Thu, 14 Nov 1996 17:40:08 -0500

Richard - How can I quickly buy your book? Is it available in local bookstores or online?

Eileen Moyer, Asst. Librarian, Massachusetts College of Pharmacy & Allied Health Sciences, Boston, MA

REPLY --It's available at many bookstores (including Quantum Books in Cambridge, Mass.) and also online at the AltaVista site (http://altavista.digital.com)


IMPORTANCE OF DATA BASES, THE FUTURE OF FREE AVAILABILITY OF SEARCH ENGINES, AND INTELLIGENT AGENTS -- Eileen Moyer

From: Eileen Moyer <emoyer@mcp.edu> Date: Fri, 15 Nov 1996 11:48:01 -0500

Thanks for the info about your book. I plan to check it out at Quantum at lunch today.

I started reading yesterday's chat at 12:45 and didn't have time to catch up, but printed out most of it and read it this am. As a librarian I am discouraged by some of the remarks. Do without databases??? I refer to one of your opening remarks "You don't need traditional databases (in many instances)." I disagree. Although much information is available and we all hope that more government info will be disseminated over the web, traditional databases report on primary research and are traditionally costly because of the value added by the vendor, i.e., indexing, abstracting, etc. If you want to know about your competition you might not exclude a web search, but you had better have done a search on several of the business databases through a vendor such as Knight-Ridder or STN. If a doctor or pharmacist is looking for information on treatment of a disease state or a drug monograph s/he had best be looking on a proprietary database. I only wish that we keep the information on the web in context. It does not replace, merely supplement.

Another issue is the free availability of these search engines - I doubt that they will remain free for long. I know that InfoSeek bowed out of its pay portion, but I think this is just a free ride for now until someone decides how to squeeze money out of us. Also, the information is just not worth paying for. One of my colleagues subscribed to Individual, Inc. and was not impressed with the coverage it gave her. She gleaned very little that she did not already get from other sources - not free sources either. I do not single out Individual, Inc.

Just a note about full-text search engines. I believe that Open Text was the first search engine/database which indexed full-text of web pages. They used that "to be or not to be" example on their homepage last year.

Yes, I would like to see you continue this theme. You could open it to Intelligent Agents which Search Engines are moving towards, although I believe Firefly was a guest on the chat a while ago.

BTW, I am co-chair of a program committee for the Special Libraries Association, Boston Chapter, which will be presenting a program next March for our group tentatively entitled "Information Delivery on the Internet: At Your Service? A Look at the Latest Developments and Integration in Search Engines, Intelligent Agents and Webcasting." My committee and I are therefore interested in your topic.

Eileen Moyer, Asst. Librarian, Massachusetts College of Pharmacy & Allied Health Sciences, Boston, MA

REPLY FROM RICHARD SELTZER

Interesting. I meant the phrase "don't need traditional databases" from the perspective of an information provider. Instead of investing in having human beings make numerous judgements to categorize information and put it in predetermined fields, you can store your information in flat files -- without even a directory structure -- and retrieve what you want when you want it.

I believe that you are thinking in terms of existing proprietary databases, which have been built through years of work and are now well-populated with useful information. I'm not suggesting that they will go away. I am suggesting that that approach to storing and retrieving information will become less and less useful because of its enormous cost and lack of flexibility.

It will be interesting to see how today's database-centric commercial information services migrate to the new mode or come up with creative combinations of new and old. (It would, for instance, be possible to feed the full contents of an existing database into an AltaVista-type index.)
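As a rough illustration of the full-text approach described above, here is a minimal Python sketch of an inverted index built over unstructured text records. It is not a description of how AltaVista works internally; the record ids, sample texts and function names are all hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map every word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the sorted ids of documents containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)

# Hypothetical records, stored as plain text with no predefined fields.
records = {
    "rec1": "Aspirin dosage and drug interactions",
    "rec2": "Competitor pricing for database software",
    "rec3": "Drug monograph: ibuprofen dosage",
}
index = build_index(records)
print(search(index, "drug dosage"))
```

Nobody had to decide in advance which words are "key": every word in every record is searchable, which is the flexibility argument in a nutshell.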

> Another issue is the free availability of these search engines - I doubt that they will remain free for long.

I, on the other hand, would be very surprised if anyone successfully charged for such a search service. Digital has derived tremendous benefit from providing AltaVista as a search service. (About five months after it went online, focus groups indicated that among people on the Internet the AltaVista name was far better known than the Digital name, despite the tens of millions of dollars and over thirty years of advertising and marketing that went into trying to establish the Digital brand. And not a penny had been spent to brand AltaVista at that point. People remembered it and thought highly of it because of the valuable free service. The company then decided to use the AltaVista name as the brand name for a whole range of related Internet software products.)

And as long as AltaVista Search is free, it will be virtually impossible for anyone else to charge for a competing service.

> Just a note about full-text search engines. I believe that Open Text was the first search engine/database which indexed full-text of web pages. They used that "to be or not to be" example on their homepage last year.

I first saw Open Text at Internet World in Boston in the fall of 1995, about a month before AltaVista went public. Yes, they made very broad claims about having indexed every single word of every single page. But when I did a search of my site at their booth, it turned out they had only indexed about half a dozen of the over 300 pages at my site. And searches on a variety of topics provided hit-or-miss results -- sometimes excellent, and sometimes not having indexed the pages with the answers (or not able to retrieve the info from their index.) As soon as AltaVista came on line, everything at my little site was available. That to me was the first real instance of "full text" search.

I'm sure they've made improvements since. And I have no knowledge of the internal workings of their system. I suggest trying the same search on both machines -- using all the available tools to fine-tune your search -- and see which you prefer.

> Yes, I would like to see you continue this theme. You could open it to Intelligent Agents which Search Engines are moving towards, although I believe Firefly was a guest on the chat a while ago.

Yes, I'll be interested in what will happen with regard to Intelligent Agents. I suspect that they will take off in parallel with electronic commerce, because an important application will be for price comparisons, and buying agents and selling agents.

Today, for information retrieval, agents don't make much sense, since virtually all the public text of the Internet is now available from a single system: AltaVista. Why build/use an agent to look at 30 million pages for you when all you need to do is look in one place?

REPLY TO REPLY FROM EILEEN MOYER

Richard - I bought the book and will look at it tonight.

> I am suggesting that that approach to storing and retrieving information will become less and less useful because of its enormous cost and lack of flexibility.

I hope not, and I disagree about the lack of flexibility.

At a recent seminar by two business librarians I was led to believe that Digital would not be averse to someone buying AltaVista because it is not making any money - Yes, the PR power has been tremendous.

I believe that last year/early this year OpenText was a possible contender and did bill themselves as full text indexing of the pages they scanned - but AltaVista eclipsed them quickly and no one hears of them anymore.

Don't you think that AltaVista's lack of indexing gifs and jpegs is something they should address? It isn't perfect, but I can bang at Lycos and get a gif of a skull.

Does this mean that you will definitely be discussing the search engines next thurs? I will alert my committee members.

Eileen Moyer, Asst. Librarian, Massachusetts College of Pharmacy & Allied Health Sciences, Boston, MA

REPLY TO REPLY TO REPLY FROM RICHARD SELTZER

From: Richard Seltzer <seltzer@samizdat.com> To: Eileen Moyer <emoyer@mcp.edu>

> > I am suggesting that that approach to storing and retrieving information will become less and less useful because of its enormous cost and lack of flexibility.

> I hope not and disagree about the lack of flexibility

By lack of flexibility, I mean that with traditional databases I need to decide today how I am going to access information five years from now. I need to categorize and fill in fixed fields and decide what words are "key".

> At a recent seminar by two business librarians I was led to believe that Digital would not be averse to someone buying AltaVista because it is not making any money - Yes, the PR power has been tremendous.

That is not at all the case. It was never intended to make money. It was a research project, an experiment that became wildly popular and hence spawned business opportunities.

They are licensing mirror sites in Australia, Europe and elsewhere to make access easier for folks on the other side of slow trans-oceanic lines. And those mirror sites will be financed by advertising.

And Digital has gone through the preliminary steps necessary to spin the AltaVista software business off as a separate company (retaining 80% ownership, for full control).

Perhaps one or the other of those moves spawned the rumor you are referring to.

> I believe that last year/early this year OpenText was a possible contender and did bill themselves as full text indexing of the pages they scanned - but AltaVista eclipsed them quickly and no one hears of them anymore.

> Don't you think that AltaVista's lack of indexing gifs and jpegs is something they should address? It isn't perfect, but I can bang at Lycos and get a gif of a skull.

It may be simply a matter of familiarity with the query language. With AltaVista I can search image:skull.gif and if there are gifs with that name, I'll get matches.

The names of images are indexed, as is any alternate (ALT) text associated with them. (I'm sure that's all Lycos is doing at this point.)
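To make concrete what "image names and ALT text" means, here is a small Python sketch that pulls those two pieces of information out of a page's IMG tags. It only shows what data is available to an indexer; it is not AltaVista's or Lycos's actual code, and the class and function names are made up for the example.

```python
from html.parser import HTMLParser
import posixpath

class ImageCollector(HTMLParser):
    """Collect (file name, ALT text) pairs from the IMG tags of a page."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        alt = attributes.get("alt") or ""
        # Keep only the file name, e.g. "pics/skull.gif" -> "skull.gif".
        self.images.append((posixpath.basename(src), alt))

def collect_images(html):
    """Return the list of (file name, ALT text) pairs found in the HTML."""
    parser = ImageCollector()
    parser.feed(html)
    parser.close()
    return parser.images

page = '<P><IMG SRC="pics/skull.gif" ALT="A skull"><img src="logo.jpg">'
print(collect_images(page))
```

A query such as image:skull.gif can then be answered from the file names alone, without the engine ever looking at the picture itself.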

By the way, you can also search for java applets by name, with applet:name

But, yes, there's lots of room for improvement and the researchers are having a ball taking on those challenges -- such as indexing PostScript and Acrobat, dealing with Asian languages, adding material from databases to the index, etc.

> Does this mean that you will definitely be discussing the search engines next thurs? I will alert my committee members.

Yes, from all the interest that's been shown, that's what we should do.

Please spread the word.

Richard


DEFINITION OF COOKIES

From: "Michael S. Hart" <hart@prairienet.org> Date: Thu, 14 Nov 1996 22:40:48 -0600

The following is an excerpt from a message that was sent out over the Project Gutenberg Email List Gutnberg@postoffice.cso.uiuc.edu. It is included here with permission (because of the many questions about cookies).

This is "Ask Dr. Internet" for November, 1996

Q0.

Cookies

A0.

This First Part Is Perhaps Best Considered as Cookie Propaganda

A cookie is a unique piece of information generated on the fly and sent to you from up-to-date WWW servers that identifies your request of a particular Web page.

That cookie might then be used by the server to recognize you just AS YOU when you contact it again later.

This could be useful for purposes such as a commerce-on-line service, to let the server track your orders just as YOURS while others are visiting the same cyberstore, each one receiving his own unique cookie.

The cookie is a small, inoffensive text file saved in your browser's directory or folder and stored in RAM while your browser is running. Most of the information in a cookie is pretty mundane stuff, but some Web sites use cookies to store personal preferences, such as items already ordered or any other kind of preference you could have set meanwhile.

If you want to see what information is stored in your current cookie file, use a text editor or a word processor to open a file called cookies.txt or MagicCookie in your browser's folder or directory.

***However, Many People Hate Dealing With Cookies***

***So Much So That Some People Still Consider Cookies As A Virus***

THWARTING COOKIES

For computer users who dislike the idea that Web site operators can track their repeat visits through "cookie" technology, there are several ways to block the software from collecting or relaying that information. PrivNet's Internet Fast Forward <http://www.privnet.com/> prevents the browser from sending cookies. The program can also block those annoying little ad banners, eliminating the time it takes to download them. Anonymizer <http://www.anonymizer.com/> functions more as a proxy service -- the information is not given out unless the user grants permission. "Surfing feels anonymous, like reading a newspaper," says Anonymizer's creator, "but it's not. What Netscape needs is a feature saying, `Look, I never want to see another cookie again.'" (Scientific American Oct 96 p50)

HTTP COOKIES

Preliminary Specification - Use with caution

Cookies are a general mechanism which server side connections (such as CGI scripts) can use to both store and retrieve information on the client side of the connection. The addition of a simple, persistent, client-side state significantly extends the capabilities of Web-based client/server applications.

OVERVIEW

A server, when returning an HTTP object to a client, may also send a piece of state information which the client will store. Included in that state object is a description of the range of URLs for which that state is valid. Any future HTTP requests made by the client which fall in that range will include a transmittal of the current value of the state object from the client back to the server. The state object is called a cookie, for no compelling reason.

This simple mechanism provides a powerful new tool which enables a host of new types of applications to be written for web-based environments. Shopping applications can now store information about the currently selected items, for fee services can send back registration information and free the client from retyping a user-id on next connection, sites can store per-user preferences on the client, and have the client supply those preferences every time that site is connected to.

A cookie is introduced to the client by including a Set-Cookie header as part of an HTTP response; typically this will be generated by a CGI script.

Syntax of the Set-Cookie HTTP Response Header

This is the format a CGI script would use to add to the HTTP headers a new piece of data which is to be stored by the client for later retrieval.

Set-Cookie: NAME=VALUE; expires=DATE; path=PATH; domain=DOMAIN_NAME; secure

NAME=VALUE

This string is a sequence of characters excluding semi-colon, comma and white space. If there is a need to place such data in the name or value, some encoding method such as URL style %XX encoding is recommended, though no encoding is defined or required.

This is the only required attribute on the Set-Cookie header.

expires=DATE

The expires attribute specifies a date string that defines the valid lifetime of that cookie. Once the expiration date has been reached, the cookie will no longer be stored or given out.

The date string is formatted as:

Wdy, DD-Mon-YYYY HH:MM:SS GMT

This is based on RFC 822, RFC 850, RFC 1036, and RFC 1123, with the variations that the only legal time zone is GMT and the separators between the elements of the date must be dashes.

expires is an optional attribute. If not specified, the cookie will expire when the user's session ends.

Note: There is a bug in Netscape Navigator version 1.1 and earlier. Only cookies whose path attribute is set explicitly to "/" will be properly saved between sessions if they have an expires attribute.
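For readers who want to produce the date string described above, here is a minimal Python sketch. The function name is hypothetical, and the English day and month abbreviations are spelled out by hand so the result does not depend on the locale of the machine running it.

```python
import time

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def format_cookie_expires(timestamp):
    """Format seconds since the Unix epoch as Wdy, DD-Mon-YYYY HH:MM:SS GMT."""
    t = time.gmtime(timestamp)  # always GMT, the only legal time zone here
    return "%s, %02d-%s-%04d %02d:%02d:%02d GMT" % (
        DAYS[t.tm_wday], t.tm_mday, MONTHS[t.tm_mon - 1], t.tm_year,
        t.tm_hour, t.tm_min, t.tm_sec)

# Example: a cookie that should expire one week from now.
one_week = 7 * 24 * 60 * 60
print("expires=" + format_cookie_expires(time.time() + one_week))
```

Note the dashes between day, month and year, which are the variation from the RFC date formats that the specification calls out.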

domain=DOMAIN_NAME

When searching the cookie list for valid cookies, a comparison of the domain attributes of the cookie is made with the Internet domain name of the host from which the URL will be fetched. If there is a tail match, then the cookie will go through path matching to see if it should be sent. "Tail matching" means that the domain attribute is matched against the tail of the fully qualified domain name of the host. A domain attribute of "acme.com" would match host names "anvil.acme.com" as well as "shipping.crate.acme.com".

Only hosts within the specified domain can set a cookie for a domain and domains must have at least two (2) or three (3) periods in them to prevent domains of the form: ".com", ".edu", and "va.us". Any domain that falls within one of the seven special top level domains listed below requires only two periods. Any other domain requires at least three. The seven special top level domains are: "COM", "EDU", "NET", "ORG", "GOV", "MIL", and "INT".

The default value of domain is the host name of the server which generated the cookie response.
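The domain rules above can be sketched in a few lines of Python. This is one literal reading of the text, not a browser's actual implementation, and the function names are hypothetical. Note that, read literally, the attribute "acme.com" from the example contains only one period; the period-count check is therefore demonstrated with the leading-period form ".acme.com", which is an assumption.

```python
SPECIAL_TLDS = {"com", "edu", "net", "org", "gov", "mil", "int"}

def tail_match(host, domain):
    """True if the domain attribute matches the tail of the host name."""
    return host.lower().endswith(domain.lower())

def has_enough_periods(domain):
    """Apply the period-count rule: two periods for the seven special
    top-level domains, three for everything else."""
    tld = domain.lower().rsplit(".", 1)[-1]
    required = 2 if tld in SPECIAL_TLDS else 3
    return domain.count(".") >= required

print(tail_match("anvil.acme.com", "acme.com"))
print(tail_match("shipping.crate.acme.com", "acme.com"))
print(has_enough_periods(".acme.com"))
print(has_enough_periods(".com"))
```

The period count is what stops a server from setting a cookie for all of ".com" or "va.us" at once.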

path=PATH

The path attribute is used to specify the subset of URLs in a domain for which the cookie is valid. If a cookie has already passed domain matching, then the pathname component of the URL is compared with the path attribute, and if there is a match, the cookie is considered valid and is sent along with the URL request. The path "/foo" would match "/foobar" and "/foo/bar.html". The path "/" is the most general path.

If the path is not specified, it is assumed to be the same path as the document being described by the header which contains the cookie.
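Path matching as described above amounts to a simple prefix comparison. A minimal Python sketch (the function name is hypothetical):

```python
def path_match(cookie_path, request_path):
    """True if the cookie's path attribute is a prefix of the request path."""
    return request_path.startswith(cookie_path)

# The examples from the text: "/foo" matches both of these.
print(path_match("/foo", "/foobar"))
print(path_match("/foo", "/foo/bar.html"))
# "/" is the most general path and matches every request path.
print(path_match("/", "/anything/at/all.html"))
```

Because the comparison is a plain prefix test, "/foo" also covers "/foobar", which may be broader than a site author expects.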

secure

If a cookie is marked secure, it will only be transmitted if the communications channel with the host is a secure one. Currently this means that secure cookies will only be sent to HTTPS (HTTP over SSL) servers.

If secure is not specified, a cookie is considered safe to be sent in the clear over unsecured channels.
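Putting the attributes together, here is a minimal Python sketch of how a client might take apart the value of a Set-Cookie header. The function name and the dictionary layout are assumptions made for the example; real browsers handle many more edge cases.

```python
def parse_set_cookie(header_value):
    """Split a Set-Cookie value into its NAME=VALUE pair and attributes."""
    parts = [p.strip() for p in header_value.split(";") if p.strip()]
    if not parts:
        raise ValueError("empty Set-Cookie value")
    # The first element is the only required one: NAME=VALUE.
    name, _, value = parts[0].partition("=")
    cookie = {"name": name, "value": value, "secure": False}
    for part in parts[1:]:
        key, separator, attribute_value = part.partition("=")
        key = key.strip().lower()
        if key == "secure" and not separator:
            cookie["secure"] = True
        elif key in ("expires", "path", "domain"):
            cookie[key] = attribute_value.strip()
    return cookie

print(parse_set_cookie("CUSTOMER=WILE_E_COYOTE; path=/; secure"))
```

Splitting on semicolons rather than commas matters, because the expires date itself contains a comma.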

Syntax of the Cookie HTTP Request Header

When requesting a URL from an HTTP server, the browser will match the URL against all cookies and if any of them match, a line containing the name/value pairs of all matching cookies will be included in the HTTP request. Here is the format of that line:

Cookie: NAME1=OPAQUE_STRING1; NAME2=OPAQUE_STRING2 ...

Here are some sample exchanges which are designed to illustrate the use of cookies.

First Example transaction sequence:

Client requests a document, and receives in the response:

Set-Cookie: CUSTOMER=WILE_E_COYOTE; path=/; expires=Wednesday, 09-Nov-99 23:12:40 GMT

When client requests a URL in path "/" on this server, it sends:

Cookie: CUSTOMER=WILE_E_COYOTE

Client requests a document, and receives in the response:

Set-Cookie: PART_NUMBER=ROCKET_LAUNCHER_0001; path=/

When client requests a URL in path "/" on this server, it sends:

Cookie: CUSTOMER=WILE_E_COYOTE; PART_NUMBER=ROCKET_LAUNCHER_0001

Client receives:

Set-Cookie: SHIPPING=FEDEX; path=/foo

When client requests a URL in path "/" on this server, it sends:

Cookie: CUSTOMER=WILE_E_COYOTE; PART_NUMBER=ROCKET_LAUNCHER_0001

When client requests a URL in path "/foo" on this server, it sends:

Cookie: CUSTOMER=WILE_E_COYOTE; PART_NUMBER=ROCKET_LAUNCHER_0001; SHIPPING=FEDEX
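The first example transaction sequence above can be replayed with a toy client-side cookie store. This Python sketch handles only names, values and paths (no domains, expiry or secure flag), and it sends cookies in the order they were received, which is an assumption chosen to reproduce the example; the class and method names are hypothetical.

```python
class CookieJar:
    """A toy cookie store keyed by (name, path)."""

    def __init__(self):
        # Python dictionaries keep insertion order.
        self._cookies = {}

    def set_cookie(self, name, value, path="/"):
        """Store or overwrite the cookie with this name and path."""
        self._cookies[(name, path)] = value

    def cookie_header(self, request_path):
        """Build the Cookie request header for a path, or None if no match."""
        pairs = ["%s=%s" % (name, value)
                 for (name, path), value in self._cookies.items()
                 if request_path.startswith(path)]
        if not pairs:
            return None
        return "Cookie: " + "; ".join(pairs)

jar = CookieJar()
jar.set_cookie("CUSTOMER", "WILE_E_COYOTE", "/")
jar.set_cookie("PART_NUMBER", "ROCKET_LAUNCHER_0001", "/")
jar.set_cookie("SHIPPING", "FEDEX", "/foo")
print(jar.cookie_header("/"))
print(jar.cookie_header("/foo"))
```

The SHIPPING cookie is only sent for requests under "/foo", while the two cookies with path "/" accompany every request to the server.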

For more about cookies, and there is plenty, try:

You can reach the Dr. Internet Web Sites as follows: http://promo.net/drnet/

This page includes their disclaimer, has links to the Project Gutenberg sites, to all the issues and, for each issue, to ALL the questions/answers, plus links to a page to submit questions.


META TAGS -- Cristen Hewett

From: Cristen Hewett <cristen@outreach.com> Date: Fri, 15 Nov 1996 10:53:39 -0500

I'm interested to know how often the "robot," or search engine, updates the site. I added META tags about 2-3 months ago, and places like Lycos still haven't reflected the change.

Cristen Hewett cristen@outreach.com http://www.outreach.com

REPLY FROM RICHARD SELTZER

AltaVista's crawler (Scooter) runs continuously, but the Web is a big place. How often it will return to your site is not predictable (it follows an expanding trail from one URL to another, with 1000 threads scouring the Web simultaneously). If you added the tags two months ago, it is very likely that AltaVista has picked them up. Have you checked recently?

(I have no direct knowledge of how Lycos does things, but my impression is that it takes a long long time for their crawler to do its full circuit).

Remember, if you have a new page, you can click on "Add URL" at the AltaVista site and enter the URL of that page. The crawler will immediately (within a couple seconds) retrieve the text of your page, and it should be in the index in about a day.

If you made a change to an existing page and wish to update the info in the AltaVista index, that simple procedure doesn't work. You'll get a message that that page is already in the index. If it is critical to you that the changed/updated info be available from the AltaVista index, I'd suggest a work-around. You could create a duplicate of the page in question and give it a new file name and submit that new URL. The new page would get indexed right away and if all the links are identical, people going to that page from AltaVista would be able to go everywhere you want them to. (The disadvantage of that is the old information also remains in the index until the crawler takes a new look at your whole site. But at least the new info is available as well.)

I hope this helps.


ADVICE ON MAKING A NEW SITE FINDABLE -- Mika

From: Mika <mika@easybase.com> Date: Fri, 15 Nov 1996 08:30:04 -0200

I just read your post to the I-sales digest, and was wondering if you could help me. (I won't be offended if you're not interested).

I have a new Web site (see URL below) to promote a just-released product. I have a list of targets and different ways that I intend to promote my site. I'm not a newbie to Internet Marketing, but what do you do when you do All The Right Things, but it still doesn't work?

I submitted my URL manually to the top 7 or so search engines and indexes - I went back two weeks later and it was as if I hadn't submitted anything at all! Alta Vista, who had a listing, only had one page listed, not my whole site.

I'm using meta-tags and carefully chosen titles. Yet if I do a search for "database", outside of Yahoo, my pages don't show up. I'm stumped, I can't figure out how to make my pages shine above the rest.

Any ideas would be appreciated! (sorry if I'm a pest)

Mika, easyBASE - The First Database For Dummies http://www.easybase.com

REPLY FROM RICHARD SELTZER

Check an article of mine about how to promote a Web site over the Internet (doing the free stuff). http://www.samizdat.com/public.html

When you submit a URL to AltaVista, it immediately retrieves that particular page, and adds that text to the index in a day or two. It also adds that URL to the list of URLs the crawler (Scooter) will visit as it moves around the Web. Yes, Scooter has 1000 simultaneous threads (so it's like 1000 crawlers working at once). But the Internet is enormous -- over 30 million pages, so it can take a few weeks, working continuously, for it to visit every single page. And it will be random when in its cycle it visits the URL you submitted and hence sniffs out and follows all the links from there.

If you have pages other than your main page that it is crucial for you to have indexed quickly, submit those pages as separate URLs.

> I'm using meta-tags and carefully chosen titles. Yet if I do a search for "database", outside of Yahoo, my pages don't show up. I'm stumped, I can't figure out how to make my pages shine above the rest.

Keep in mind that "database" is an awful term to depend upon. It is simply too common on the Internet. An AltaVista search for that word alone would probably result in hundreds of thousands, if not millions, of matches.

To make your pages "shine above the rest" you need to have a clearer, more specific idea of what business you are in. What niche? What subset of database business? Not just a category -- but what words would immediately come to the mind of your ideal customer? Make sure those words appear in the HTML titles and first lines of text of the pages you want them to find. (Metatags are an additional route -- but you are far better off clearly stating what you are up to in the text of the page itself.)
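One way to check what a page actually offers a search engine is to extract its HTML title and META tags and read them as a stranger would. Here is a minimal Python sketch; the class name, function name and sample keywords are hypothetical, and this says nothing about how any particular engine weighs these fields.

```python
from html.parser import HTMLParser

class PageSummary(HTMLParser):
    """Collect the TITLE text and the named META tags of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attributes.get("name"):
            name = attributes["name"].lower()
            self.meta[name] = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def summarize(html):
    """Return (title, dictionary of META name -> content)."""
    parser = PageSummary()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.meta

page = ('<HTML><HEAD><TITLE>easyBASE - The First Database For Dummies</TITLE>'
        '<META NAME="keywords" CONTENT="database, non-technical"></HEAD>'
        '<BODY>text</BODY></HTML>')
print(summarize(page))
```

If the words your ideal customer would type do not show up in that output or in the first lines of the page text, that is the place to start editing.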

I hope this helps.

REPLY TO REPLY

From: Mika <mika@easybase.com> Date: Sat, 16 Nov 1996 05:47:28 -0200

Thank you so much for responding! I will check out your book.

My problem is not so much that *I* don't know what our niche is, but how to second guess the user. We are aiming for non-technical people - not the kind of people who do power searches, and I'm not counting on them to do qualifying searches (but then I assume that they would be quick enough to give up searching for "database" on AltaVista and go to Yahoo).

Thanks again. Check out our site when you have a chance.

-Mika


THE IMPORTANCE OF SEARCH ENGINES -- Richard Seltzer

From: Richard Seltzer <seltzer@samizdat.com> Date: Wed, 20 Nov 1996 09:48:36 -0500 (EST)

To: John Audette - Moderator <isales@mmgco.com>

I agree entirely about the importance of search engines -- especially for small operations. Before AltaVista came along, small Web sites were in deep trouble. What had been an "even playing field" was becoming dominated by sites with clout in the traditional media. Then with full-text search it became easy to find any Web page -- based solely on content, not on publicity. Now it's becoming increasingly important to clearly state -- in text -- what your site and your pages are about and to put the most important words in the HTML title and in the first few lines of text -- to do everything you can to ensure that people who want your kind of information and/or products see your site high on the list of search results. The better you understand how search engines like AltaVista work, the better you can do that.

We've been talking about these issues for the last two weeks and will be talking about them again at the regular weekly chat session I host at the Boston Globe's Web site. Transcripts of the last two sessions are available at http://www.samizdat.com/chat18.html and /chat17.html

Richard Seltzer

REPLY FROM TOM HUKINS

From: Tom Hukins <tom@eborcom.com> Date: Thu, 21 Nov 1996 18:01:18 +0000

AltaVista has done a great job, but I do have a few criticisms of AltaVista and the other major engines.

Search engines are inadequate. This isn't a new opinion; they just don't help you find the information you want. Sure, you can do a full text search, you can search for whatever you want, but you're always restricted to the front-end querying system and the way in which data is displayed for every engine. In fact you're also forced to use the searching algorithm which has been set up by the engine's designers.

Because search engines are inadequate we have seen "intelligent" agents come along. These things run off round the net swallowing up bandwidth repeating an operation which has already been performed many times, namely grabbing web pages, storing them and looking for information.

A truly intelligent search engine would allow users access to the data via some sort of programming language. Macros could be implemented so the system is accessible to everyone (such as the 'link:' macro on AltaVista) and complex operations could easily be performed. The retrieval of arbitrary web pages by anyone with a $50 piece of software would become unnecessary, relieving the bandwidth for more effective uses.

The problem is that bandwidth is taken up regardless of how important it is deemed to be. This allows search engines not to make optimal use of the data they have gathered and intelligent agent users to slow down the net for the rest of us.

A similar problem is that every search engine I know of, when visiting your site, grabs the robots.txt file then the file it wants if permitted. If the engine comes back 5 minutes later it grabs the robots.txt file again then the next file it wants. Can't search engines use 'If-Modified-Since' and free up more bandwidth?
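For reference, a conditional request of the kind suggested here might look like the following Python sketch using the standard urllib library. The function names and the example URL are hypothetical; the point is only that a robot which remembers when it last fetched robots.txt can ask the server to send the file again only if it has changed.

```python
import urllib.error
import urllib.request
from email.utils import formatdate

def build_conditional_request(url, last_fetch_timestamp):
    """Build a GET request carrying an If-Modified-Since header."""
    stamp = formatdate(last_fetch_timestamp, usegmt=True)  # HTTP date in GMT
    return urllib.request.Request(url, headers={"If-Modified-Since": stamp})

def fetch_if_modified(url, last_fetch_timestamp):
    """Return the new body, or None if the server answers 304 Not Modified."""
    request = build_conditional_request(url, last_fetch_timestamp)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as error:
        if error.code == 304:
            return None  # unchanged since the last visit: nothing to download
        raise

# Hypothetical usage (performs a real network request):
# body = fetch_if_modified("http://www.example.com/robots.txt", 848000000)
```

When the file is unchanged, the server's reply is a short status line and headers instead of the whole file, which is where the bandwidth saving comes from.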

I agree with Richard that search engines can be extremely useful tools - but we should not be satisfied. They cause immense wastage of bandwidth and we cannot search them effectively. If the search engine market were not an oligopoly we might see some _real_ innovation.

Tom eBORcOM --- http://www.eborcom.com/ --- eborcom@eborcom.com

REPLY TO REPLY -- Richard Seltzer

Well, you may be underestimating the power of those search algorithms. Have you checked the help files? Have you ever used Advanced Search at AltaVista? Yes, you are forced to use the algorithms set up by the designers -- but it's extremely powerful and fast and should become a de facto standard.

Actually, AltaVista obviates the need for a plain old information-seeking agent. Why send a robot to look at the entire Web to track down one particular kind of information, when a complete index of all the words on the Web is sitting on one site? Seems like a ridiculous waste of time and bandwidth.

> A truly intelligent search engine would allow users access to the data via some sort of programming language...

But there's no need for such agents on today's Internet. (I suspect that will change over time; as electronic commerce becomes more prevalent, there will be a role for buying agents and selling agents. But for just seeking information, the personal agent doesn't make sense.)

As for "macros", AltaVista has quite a few. Yes, link:yourdomainname will tell you all the Web pages that have hypertext links to your site. Others include -- host: url: applet: image: domain: anchor: text: and for newsgroup searches from: to: date: subject: text:

> The problem is that bandwidth is taken up regardless of how important it is deemed to be. This allows search engines not to make optimal use of the data they have gathered and intelligent agent users to slow down the net for the rest of us.

I agree that info-gathering agents are a waste of bandwidth and probably a nuisance to the sites they visit. But the optimal use of search agents depends on the users. Most users of AltaVista just type in a word or two in the Simple Search query box. Often that gives them what they want. But they very very rarely bother to look at the help files or use Advanced Search. If they don't find what they want right away, they are probably inclined to just connect to another search engine and then another, each time typing in the same word or two, when what they should do is fine-tune their search, using the commands that are available and using the iterative capability of AltaVista (your old query stays in the box, so you add to/build on it).

> A similar problem is that every search engine I know of, when visiting your site, grabs the robots.txt file then the file it wants if permitted. If the engine comes back 5 minutes later it grabs the robots.txt file again then the next file it wants. Can't search engines use 'If-Modified-Since' and free up more bandwidth?

AltaVista does time its visits according to the records it keeps of how frequently a page tends to be updated. And it also, politely, has algorithms to avoid taxing the capacity of smaller sites. It measures the time it takes to retrieve a page from a given site. If it's slow (perhaps the system is busy, perhaps they have a slow server or a slow connection), then AltaVista waits a while before asking for the next page there (the delay depending directly on the response time).
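The idea of waiting longer before revisiting a slow site can be sketched as follows in Python. The multiplier and the minimum delay are invented numbers for illustration, not AltaVista's actual settings, and the function names are hypothetical.

```python
import time

def timed(fetch, url):
    """Call fetch(url) and return (result, elapsed seconds)."""
    start = time.monotonic()
    result = fetch(url)
    return result, time.monotonic() - start

def next_delay(response_seconds, factor=10.0, minimum=1.0):
    """Seconds to wait before the next request to the same site.

    The delay grows in direct proportion to the measured response time;
    both factor and minimum are assumptions made for this sketch.
    """
    return max(minimum, factor * response_seconds)

# A fast site is revisited sooner than a slow one.
print(next_delay(0.05))
print(next_delay(3.0))
```

A crawler built this way automatically backs off from a busy or underpowered server without needing to know anything else about it.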

I agree that agents, today, are an enormous waste of bandwidth. But search engines, used properly by people who make the effort to learn to use them well, go a long way toward reducing bandwidth waste, and helping people get exactly what they want quickly (pointing them right to the page that has the answer, rather than just to the front door of a site where the answer may be buried many clicks away).

So whatever search engine you use, invest a little time in learning it (you have no problem taking a few minutes to learn a new piece of software). (And if it's AltaVista you are using you might also want to check out the book -- The AltaVista Search Revolution -- which is now in bookstores, and which you can also buy online at the AltaVista site.) (And you might also want to check the transcripts of the recent chat sessions where we've been discussing these kinds of issues: http://www.samizdat.com/index.html#chat)

Richard Seltzer

REPLY TO REPLY TO REPLY -- Tom Hukins

From: Tom Hukins <tom@eborcom.com> Date: Fri, 22 Nov 1996 13:58:04 +0000

>Well, you may be underestimating the power of those search algorithms. Have you checked the help files? Have you ever used Advanced Search at AltaVista? Yes, you are forced to use the algorithms set up by the designers -- but it's extremely powerful and fast and should become a de facto standard.

The Advanced Search facility isn't bad. The documentation does not tell you how to do the following, however:

1. Search only the META keywords

2. Perform a search excluding any text which may be in the same <FONT COLOR> as the background.

3. Specify whether you wish to search <!-- commented text --> and the weighting that should be given to this relative to normal text.

Here are 3 examples of things you just cannot do with AltaVista that could be very useful. I'm sure there are more, I thought up these examples in 2 minutes. If I want to do a search like this I have to write my own intelligent agent and then search the data it retrieves.

I cannot weight the proximity of words in AltaVista. Yes, there is the NEAR command, but I can't say things like "fairly near" and "very near".
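To show what a graded notion of nearness could look like, here is a small Python sketch that measures the smallest distance in words between two terms and turns it into a weight. It is purely illustrative: the scoring formula and the scale value are made up, and nothing here reflects how AltaVista's NEAR operator is implemented.

```python
import re

def min_word_distance(text, first, second):
    """Smallest distance in words between two terms, or None if either is absent."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    if not first_positions or not second_positions:
        return None
    return min(abs(a - b) for a in first_positions for b in second_positions)

def proximity_weight(distance, scale=10):
    """Map a distance to a weight: 1.0 for adjacent-or-same, falling toward 0."""
    if distance is None:
        return 0.0
    return 1.0 / (1.0 + distance / scale)

sentence = "the quick brown fox jumps over the lazy dog"
print(min_word_distance(sentence, "quick", "fox"))
print(proximity_weight(min_word_distance(sentence, "fox", "dog")))
```

With a weight like this, "very near" and "fairly near" become thresholds on a number rather than a single yes-or-no test.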

>Actually, AltaVista obviates the need for a plain old information-seeking agent. Why send a robot to look at the entire Web to track down one particular kind of information, when a complete index of all the words on the Web is sitting on one site? Seems like a ridiculous waste of time and bandwidth.

It doesn't. I hope the examples above show the need to provide a front-end to AltaVista which allows users to program and store macros which can operate on the HTML.

>I agree that info-gathering agents are a waste of bandwidth and probably a nuisance to the sites they visit. But the optimal use of search agents depends on the users. Most users of AltaVista just type in a word or two in the Simple Search query box. Often that gives them what they want. But they very rarely bother to look at the help files or use Advanced Search. If they don't find what they want right away, they are probably inclined to just connect to another search engine and then another, each time typing in the same word or two. What they should do is fine-tune their search, using the commands that are available and using the iterative capability of AltaVista (your old query stays in the box, so you add to/build on it).

It would not surprise me that most users do not use search engines as effectively as they could. I don't have access to the data to show this, but I assume you do. Educating users is important - how many search engines do you know which say "Your search may be ineffective, please read the documentation" if you type "dog" in the simple search box? How are people supposed to know that they could use search engines more effectively if they aren't told by the people who run the search engines? Surely the maintainers of search engines must take some of the responsibility for the lack of education of web users and therefore they are partly responsible for the over-use of "intelligent" agents. I will admit that AltaVista is better than most as it sends out useful tips with each search, but it could do better.

>I agree that agents, today, are an enormous waste of bandwidth. But search engines, used properly by people who make the effort to learn to use them well, go a long way toward reducing bandwidth waste, and helping people get exactly what they want quickly (pointing them right to the page that has the answer, rather than just to the front door of a site where the answer may be buried many clicks away).

Search engines do go a long way. I hope I have shown how they could go further.

Tom eBORcOM --- http://www.eborcom.com/ --- eborcom@eborcom.com


MESSAGE FROM FAN -- Gina Zuccala

From: Sales <sales@northeast.net> Date: Thu, 21 Nov 1996 20:43:48 -0500

Hello. I am the owner of Northeast Internet Services in Long Island and I wanted to let you know that your discussion today about search engines was very interesting. Sorry I missed it. We have been experiencing the same frustrations that were discussed today. I appreciate all the tips from today's discussion. I look forward to sitting in on your next discussion.

Gina Zuccala, President, Northeast Internet Services, Inc. http://www.northeast.net


MESSAGE FROM EXCITE -- Ted Resnick

From: Ted Resnick <tedres@excite.com> Date: Thu, 21 Nov 1996 18:42:54 -0800

I found your site several weeks ago, but have not had the opportunity to sit in on a chat. If you have a schedule of future topics, please let me know. If you plan on continuing your topic of search engines, please let me know as well, and I will do my best to attend. I would welcome the opportunity to add another perspective for your audience. In the meantime, have a happy Thanksgiving!

Ted Resnick -- Online Marketing Manager -- http://www.excite.com/


SUGGESTIONS FOR FUTURE TOPICS -- Tracy Marks

From: Tracy Marks <tmar@tiac.net> Date: Fri, 22 Nov 1996 04:34:15 -0500

I have been visiting your site regularly and reading your transcripts from your Thursday afternoon chats although I can't attend them. In regard to topics you might address in the future, I'm interested in: personalizing Web pages and sites (cookies, Each-to-Each, etc.)

On a related note, I'm also interested in the following:

a) drawing visitors to a site via low-cost interactive means - uses of chat, irc, message boards, survey etc.

b) how to put one's business online without spending much money - esp. for one-person businesses

c) difficulties and suggestions for LOCAL businesses using the web and creating sites - those dependent on local customers and clients (psychotherapists, beauticians, clothing shops etc.)

How about more focus on the little guys and those with limited budgets? My total budget for advertising is about $500 a year and I'm just starting to plan my site.

Tracy Marks, M.A. tmar@tiac.net tracy@marks.net http://www.ezref.com/states/ma.htm

p.s. thanks for all your services!


ADVICE ON HOW TO BUILD LOW-COST, MULTI-PURPOSE WEB SITE -- Tracy Marks

From: Tracy Marks <tmar@tiac.net> Date: Sat, 23 Nov 1996 01:09:50 -0500

I enjoyed the transcript of your recent chat session, and have a question about creating a multi-purpose web site so that it gets a lot of listings in search engines. My issue is: I have not one, but four different and unrelated part-time professions - all local. Among other things, I am a psychotherapist and author of numerous self-help books as well as a part-time trainer and consultant regarding Windows software, and a nature photographer.

Recently, I purchased my domain name (Windweaver) and am wondering how to create a site that caters to three or four professions. My vision is three separate entranceways, with a few loose links in between, and also a variety of links and information pertaining to Boston, to attract local people.

But my question is: Will I be able to get a variety of separate listings for each of my entranceways, so that I can keep them separate and have entry pages on each that only pertain to one particular specialty? I don't have the funds to pay an additional $600 or so a month per additional domain, address and web space, and would prefer to divide up one domain into 3 or 4 segments, of 2-4 MB each.

Given the above, is there a particular strategy you'd advise in setting up my site so that it comes up frequently in the search engines and directories under four different addresses (all with the same domain)?

Thanks for all the services you are providing and for (hopefully) responding to my questions....

Tracy Marks, M.A. tmar@tiac.net tracy@marks.net http://www.ezref.com/states/ma.htm

REPLY FROM RICHARD SELTZER

From: Richard Seltzer <seltzer@samizdat.com> Date: Sat, 23 Nov 1996 08:49:31 -0500 (EST)

> Recently, I purchased my domain name (Windweaver)

good name.

> and am wondering how to create a site that caters to three or four professions.

First, in a world of full-text search engines like AltaVista, home pages do not have much significance. People who submit queries get sent to the particular page with the particular information they want, regardless of where it is in your site. You should design your site with the idea that any/every page can be an entry point. You should have some brief information on each page that lets people know where they are (what the site is all about), and navigation buttons to lead them to wherever else they may want to go at your site. In your case, probably you'd want at least four navigation buttons on each page, leading them to index pages for each of the four areas. And each of the index pages should list all the documents in that area. That way, from any page the user could get to any other page in just two or three clicks.
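The click count claimed for this layout can be checked with a small model. This is a sketch under stated assumptions: the section names and file names are hypothetical, and the model assumes every page carries a link to each of the four index pages while each index page lists all the documents in its own area.

```python
from collections import deque


def build_links(sections):
    """Every page links to every index page; each index page lists its documents."""
    index_pages = [f"{name}/index.html" for name in sections]
    links = {}
    for name, documents in sections.items():
        index = f"{name}/index.html"
        links[index] = set(index_pages) | {f"{name}/{doc}" for doc in documents}
        for doc in documents:
            links[f"{name}/{doc}"] = set(index_pages)
    return links


def max_clicks(links):
    """Return the largest number of clicks needed between any two pages."""
    worst = 0
    for start in links:
        # Breadth-first search gives the fewest clicks from start to each page.
        distance = {start: 0}
        queue = deque([start])
        while queue:
            page = queue.popleft()
            for target in links[page]:
                if target not in distance:
                    distance[target] = distance[page] + 1
                    queue.append(target)
        worst = max(worst, max(distance.values()))
    return worst


# Hypothetical site with four areas, made up for illustration.
SITE = {
    "therapy": ["approach.html", "contact.html"],
    "books": ["titles.html"],
    "training": ["windows.html"],
    "photos": ["boston.html", "birds.html"],
}
print(max_clicks(build_links(SITE)))
```

In this model the worst case is a document in one area reaching a document in another: one click to the other area's index page, one more to the document.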

> But my question is: Will I be able to get a variety of separate listings for each of my entranceways, so that I can keep them separate and have entry pages on each that only pertain to one particular specialty?

There's no need to get fancy or register separate domain names. You don't even have to use separate directories, though you might for your own convenience. Remember you don't have four entranceways. Every page is an entranceway. You simply provide four different categories of information.

> Given the above, is there a particular strategy you'd advise in setting up my site so that it comes up frequently in the search engines and directories under four different addresses (all with the same domain)?

What you suggest is no problem at all with AltaVista, since it gives no priority at all to home pages. It treats every page equally.

I'd suggest that you submit the four index pages (the more text, as opposed to graphics the better) to AltaVista (clicking on ADD URL) as soon as the pages are public.

For Yahoo, you'll have to go through the lengthy process of figuring out the right categories to be filed under. And the other search engines are all different.

Take a look at my article on how to publicize Web sites over the Internet.

Hope this helps.

Richard Seltzer


FUTURE SESSIONS -- WE'LL SKIP THANKSGIVING, AND CONTINUE TO TALK ABOUT THE IMPACT OF SEARCH ENGINES ON INTERNET BUSINESS DECEMBER 5 -- Richard Seltzer

From: Richard Seltzer <seltzer@samizdat.com> Date: Fri, 22 Nov 1996 13:55:14 -0500 (EST)

We'll skip next week because of Thanksgiving and have our next session on Thursday, Dec. 5.

Judging from the very active discussion yesterday, focusing on the same topic for several sessions in a row makes interest grow and the discussion become much more useful and informative (we seem to attract more well-informed people as we go along). Hence on Dec. 5 we'll once again focus on "The impact of search engines on Internet business." (My main area of experience there is AltaVista. I understand from a recent email that someone from Excite plans to join us next time.)

Richard Seltzer


POTOMAC KNOWLEDGE WAY -- Michael Horwatt

From: Michael Horwatt <michaelhorwatt@horwattassocpc.com> Date: Wed, 31 Dec 1969 19:00:00 -0500

Thanks for the heads up on the transcript. I found your observation about pursuing the same topic useful. Next month - God willing - a colleague and I will launch a project for the Potomac Knowledge Way. You might check out knowledgeway on the web. It's a promising organization that I expect will have a pervasive impact regionally and nationally. Based on your comment, I plan to test that idea immediately: keeping the same topic for a period of time. Unlike you, we plan to start in an E-mail format. As moderators, we want to learn the ropes, get a better idea about preparation time, logistics and scheduling before jumping to the chat format. I admire your grit for taking on such a demanding commitment. Once I get a breather, and a chance to look at your material, I'll follow up with suggestions for collaboration. For now, I have to prepare my next newsletter, part of a series on legal issues related to defamation claims that arise from electronic communications.

Michael Horwatt, Michael Horwatt & Associates P.C., Member: Virginia State Bar, District of Columbia Bar


Two Questions: Determining the Source of Web Traffic and Cookies -- Nancy Enright

From: Nancy Enright <Nancy_Enright.IDEA@mailgw.idea.com> Date: 26 Nov 96 19:03:19 EDT

1. Source of web site traffic.

During the 11/21/96 chat, you asked Audrey:

>Do you have a Web site/page? And does much of your traffic come by way of search engines?

How can you tell which traffic is coming via search engines? Can you tell which keywords were used? I used to poll all of the people who sent me email about my Oceanfront Vacation Rental in Maine, in order to fine tune my web page. But most people could not remember which keywords or the search engine that they had used, so it was not very helpful. In addition, many people did not respond, so I stopped asking. I did not want to scare off potential customers.

2. Cookies

I recently had a bad experience with cookies - the slimy type. I am a developer for Microsoft NT. On their home page, Microsoft announced that they were the top rated site on the top 100 most popular web site list. Their home page had a URL to click to see the list, which I did. When I reached the top 100 site, I skimmed the screen and saw only 1 URL to click on the first screen. Since I was in a hurry, I did not read the text, I just clicked it, expecting to see Microsoft as the first entry. But instead, all I saw was a list of sex related URLs. I read down the list looking for Microsoft, wondering why in the world they would brag about being on this list. I quickly went back to the previous page. Then I read it more closely and learned that the list of the top 100 non-X-rated sites was a few screens down and that what I had clicked on was the list of the top 100 XXX sites. I did not visit any of these sites, but the next day I received email advertising a sex web page; the first such advertising I have ever received, and I hope the last.

Back on 8/12/96, in order to try to prevent cookie attacks, I followed instructions that I heard on the radio show "On Computers". I deleted my cookie.txt file for both Internet Explorer and Netscape, created empty files and made them write protected. Apparently this did not do the trick. Both cookie.txt files are still dated 8/12/96, so I do not think this sex page left anything on my machine, but clearly it made a record of my visit to the top 100 xxx page. I just learned that both Internet Explorer 3.0 and Netscape 3.0 have options to prompt me before accepting any cookies. If I had had that option turned on, would this have prevented my email address from being recorded by this top 100 page? I just turned on this option in both browsers, but I do not know if either one is working. Do you know of a web page which uses cookies that is safe to visit? (No sex pages, thank you!) (Oops.. I just did a search of my hard drive for cookie.txt and found a third one that was not write protected - not sure if that was the problem. I have multiple installs of Netscape, so I have to figure out which one my shortcut is pointing to. Can the cookie files be named something other than cookie.txt?)
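The hard-drive search described above, looking for every cookie file and checking whether it is write protected, can be sketched as a short script. The function name find_cookie_files is hypothetical, and the two file names it looks for are assumptions; a browser may store cookies under another name or in another form, in which case this sketch would miss them.

```python
import os
import stat


def find_cookie_files(root, names=("cookie.txt", "cookies.txt")):
    """Yield (path, write_protected) for each cookie file found under root."""
    wanted = {name.lower() for name in names}
    for folder, _subfolders, files in os.walk(root):
        for filename in files:
            if filename.lower() in wanted:
                path = os.path.join(folder, filename)
                mode = os.stat(path).st_mode
                # Treat the file as write protected when the owner write bit is off.
                yield path, not (mode & stat.S_IWUSR)


if __name__ == "__main__":
    # Search from the current directory; pass a drive root to search a whole disk.
    for path, protected in find_cookie_files("."):
        status = "write protected" if protected else "WRITABLE"
        print(f"{status}: {path}")
```

Any file reported as writable is one that a browser installation could still be updating.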

As another approach to this privacy problem, I followed some of the links in the "Definition of Cookies" section of the Nov. 21 chat transcript. I tried http://www.privnet.com, but Privnet was just bought out by Pretty Good Privacy, Inc., and their web page states to come back in December to see if their free products will still be free. I also tried www.anonymizer.com, but their FAQ was unavailable. Do you have to start all of your web surfing from their page in order to stay anonymous? Their start page came up very slowly and it is a pain not to be able to use my favorites list.

Thanks for all of your hard work,

Nancy Enright nenright@idea.com http://www.vacation-inc.com/rentals/enright.html

Reply

1) I keep counters on all my pages. The first symptom that lots of your traffic comes by search engines is when pages other than your home page start getting more hits than your home page. To double-check that, do an AltaVista link: search for the URLs of pages with higher than expected hit rates and see if perchance one or more high-traffic sites have linked to those particular pages. If that isn't the case, then put an email link on the high-traffic pages, asking readers to drop you a quick note saying how they found your page (what search engine and what search).
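The first symptom described above, inner pages outdrawing the home page, is easy to check once the counter totals are collected in one place. The sketch below assumes the totals are already available as a mapping from page to hit count; the function name, the page names, and the numbers are all hypothetical.

```python
def likely_search_entry_pages(hits, home_page="/index.html"):
    """Return pages with more hits than the home page, busiest first."""
    home_hits = hits.get(home_page, 0)
    busier = [(page, count) for page, count in hits.items()
              if page != home_page and count > home_hits]
    return sorted(busier, key=lambda item: item[1], reverse=True)


# Made-up counter totals for illustration.
SAMPLE_HITS = {
    "/index.html": 120,
    "/books-read.html": 450,
    "/about.html": 30,
    "/chat.html": 200,
}
for page, count in likely_search_entry_pages(SAMPLE_HITS):
    print(page, count)
```

The pages it returns are the candidates for the link: search and for the request asking readers how they found the page.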

In my case, I have lots of "flytrap" pages -- ones like my list of every book I've read for the last 38 years, which comes up on results lists for searches for about a thousand different authors. That results in lots of interesting email from folks who are very willing to say what search engine they were using (in my case it's about 90% AltaVista).
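On the question of which keywords were used: a different technique from the counters and email described in this reply is to read the referring URL, when a browser supplies one and the server records it. The sketch below is an assumption-laden illustration, not something this reply describes: it assumes the referring URL is already in hand, that the search engine carries the query in a parameter named q (engines differ), and the example host is made up.

```python
from urllib.parse import parse_qs, urlsplit


def search_terms(referrer, param="q"):
    """Return (host, [keywords]) if the referring URL carries a query, else None."""
    parts = urlsplit(referrer)
    values = parse_qs(parts.query).get(param)
    if not values:
        return None
    return parts.netloc, values[0].split()


# Hypothetical referring URL, made up for illustration.
print(search_terms("http://search.example.com/query?q=maine+oceanfront+rental"))
```

A referring URL without that parameter, such as an ordinary link from another site, yields None, so it also separates search traffic from link traffic.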

That's bad news about privnet. I really like Internet Fast Forward, and my beta version is due to time out soon...

Here's some cookie stuff from issue #18 of my Internet-on-a-Disk http://www.samizdat.com/news18.html:

NSClean http://www.simtel.net/pub/simtelnet/win3/inet/nsdemo2.zip ftp://ftp.simtel.net/pub/simtelnet/win3/inet/nsdemo2.zip

This is a demo of commercial software that can clean your hard disk of information that is automatically saved by the Netscape Navigator and that you may not (for privacy reasons) want to pass along to the Web sites you visit or to others who have access to your computer. The demo shows you your "cookie" file, your bookmark file, your cache directory contents, newsgroups you visit, newsgroup messages you read, as well as the Netscape history database which logs every file and picture you look at. The NSClean product lets you control the contents of those files. (Current version works with Windows 3.x with Netscape installed.)

Center for Democracy and Technology http://www.13x.com/cgi-bin/cdt/snoop.pl

Discussion regarding civil liberties and privacy implications of software that gathers info about Web site visitors, without consent. They run a Java program that captures your email address and automatically sends you an email message to demonstrate that it can be done.

Andy's Netscape HTTP Cookie Notes http://www.illuminatus.com/cookie

Shows what cookies can tell a Web site about you. (This further confirmed for me that Internet Fast Forward effectively blocks out cookie info.)

The Anonymizer http://www.anonymizer.com

A simple, no-cost way to block your identity from the Web sites you visit (without having to download any software). By visiting the Anonymizer Web site before visiting other sites, you are assigned an anonymous identity, which is all that is revealed to those other sites.

Also there's more stuff on Cookies in the followup in the chat18 transcript -- http://www.samizdat.com/chat18.html#followup

I hope that's some help.

Best wishes.

Richard Seltzer

PS -- I really appreciate your contributions to the chat sessions. Hope you can make it on Dec. 5.


Previous transcripts and schedule of upcoming chats -- www.samizdat.com/chat.html


The full text of Richard Seltzer's books The Social Web, Take Charge of Your Web Site, Shop Online the Lazy Way, and The Way of the Web, plus more than a hundred related articles are available on CD ROM My Internet: a Personal View of Internet Business Opportunities.
 

a library for the price of a book.

This site is Published by B&R Samizdat Express, 33 Gould St., West Roxbury, MA 02132. (617) 469-2269. seltzer@samizdat.com
 

