BUSINESS ON THE WORLD WIDE WEB --

November 14, 1996 -- Impact of Search Engines on Internet Business


Transcript of the live chat session that took place Thursday, November 14, 1996.

These sessions are scheduled for noon-1 PM US Eastern Time (GMT -4) every Thursday.

These sessions are hosted by Richard Seltzer. If you would like to receive email reminders of our chat sessions, simply send a blank email message to businessonthewebchats-subscribe@yahoogroups.com or go to http://groups.yahoo.com/group/businessonthewebchats and sign up there.

For transcripts of other previous sessions and a list of future topics, click here.

For an article on how to make "business chat" work (based on this experience), click here.

Since the chat itself happens at a rapid pace, it's often difficult to note interesting facts in particular URLs as they appear on-line. Here's a place to take a more leisurely look. I've rearranged some of the pieces to try to capture the various threads of discussion (which sometimes get lost in the rush of live chat).

Please send email with your follow-on questions and comments, and suggestions for topics we should focus on in future sessions. So long as the volume of email responses is manageable, I'll post the most pertinent ones here for all to see.


Threads (reconstructed after the fact):


INTRODUCTIONS

Richard Seltzer (199.3.129.189) - 11:56am -- The scheduled chat is on Business on the WWW. If you are here for that discussion, please identify yourself.

Richard Seltzer (199.3.129.189) - 11:57am -- We're here to share experiences about doing business on the Internet -- particularly the World Wide Web. What works? What doesn't work? Why? What are the trends that matter? How can you/should you adapt to the Internet culture and environment? I work for the Internet Business Group at Digital Equipment in Littleton, MA. In that capacity, I end up talking to people from large companies about how they can use the Web for business. I also have my own personal Web page -- which is content rich and no frills -- which I do for practically nothing and draws a fair amount of traffic and attention.

Richard Seltzer (199.3.129.189) - 12:00pm -- Today, we're going to focus on the impact of search engines on Internet business, including, but not limited to, Web page/Web site design. My book The AltaVista Search Revolution was just published by Osborne/McGraw-Hill, so I'm talking about this topic often and people seem to have lots of questions.

Richard Seltzer (199.3.129.189) - 12:07pm -- We're here to talk about Business on the WWW, and in particular about the impact of search engines (like AltaVista) on Internet Business. We also welcome other Internet-business questions, particularly followups to previous topics. By the way, I post the transcripts of these sessions and add followup messages; check http://www.samizdat.com/index.html#chat For instance, I just added about half a dozen followup messages to the transcript about WebTV and other low-cost Web-access devices -- http://www.samizdat.com/chat14.html Take a look.

Al (153.35.98.134) - 11:57am -- Is the scheduled chat over?

Richard Seltzer (199.3.129.189) - 11:58am -- Al -- The scheduled chat on Business on the WWW is about to begin -- officially at noon. I'm posting preliminary info now. Are you here for Business on the WWW?

Al (153.35.98.134) - 11:59am -- I'll be back in 10.

G. Benett (192.146.145.220) - 12:17pm -- Hi, Gordon Benett here, editor of web-based Intranet Design Magazine(sm), published biweekly at http://www.innergy.com/.

Richard Seltzer (199.3.129.189) - 12:19pm -- Welcome Barbara, and Gordon, and anonymous (I'm not typing fast enough).

Bob Fleischer (192.208.46.249) - 12:00pm -- Bob Fleischer, Digital's System Integration, Nashua, NH -- I've been working in the "search" business for about 25 years -- it seems that only very recently has useful search come to the masses, partly because the masses are much bigger, and partly because people have access to more text (on the Internet/Web) and can maintain more text on their desktops (when a gigabyte is a "starter" disk!).


Implications of Full-Text Search

Richard Seltzer (199.3.129.189) - 12:02pm -- Welcome, Bob -- Yes, the search question is broader than just the Web or newsgroups or even the Internet. How do you find any electronically stored information anywhere? Brian Reid at Digital's Network Systems Lab brags that he has saved every email he has sent or received for about 20 years, and now with AltaVista he has a tool to retrieve any of it whenever he wants -- making that otherwise massive haystack of information extremely valuable to him.

Richard Seltzer (199.3.129.189) - 12:04pm -- Bob -- To me the biggest mind-shift change from full-text search/AltaVista comes from the notion that now you do not have to organize information in order to find it later. You don't need to guess now the categories that will be important to you in the future. You don't need to construct neat little branching tree directories and make careful time-consuming choices about how to file information. You don't need traditional databases (in many instances). It's a whole new way of dealing with information.

Al (153.35.98.134) - 12:06pm -- From Al - explain briefly the revolutionizing nature of Alta V and how it facilitates this new method for info retrieval.

Richard Seltzer (199.3.129.189) - 12:12pm -- Al -- I'll try to deal with your questions one at a time. First, the revolutionizing nature of AltaVista has to do with its full-text indexing. Absolutely every word is held in the index -- not just key words. That makes it possible to do searches for phrases like "to be or not to be" where none of the individual words would be "key" by anyone's definition, but altogether they are important. Once you have a huge set of text to index in this manner -- with the ability to get results in a few seconds despite the enormous size -- you can search for anything and get useful results even though the original set of information is totally disordered -- complex like "spaghetti" is the way one of the developers, Jeremy Dion, put it. Imagine being able to find not just a needle in a haystack, but any piece of straw in a haystack and without disturbing/reorganizing the stack.
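
As a rough illustration of the full-text idea described above, here is a minimal Python sketch of a positional index in which every word is kept (no stop-word list), so a phrase made entirely of common words can still be found. It is not AltaVista's implementation; the function names and the two sample documents are hypothetical.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into words; no words are dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(docs):
    """Map every word to {doc_id: [positions]}, a positional inverted index."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index


def phrase_search(index, phrase):
    """Return sorted ids of documents that contain the phrase's words consecutively."""
    words = tokenize(phrase)
    if not words:
        return []
    hits = []
    for doc_id, starts in index.get(words[0], {}).items():
        for start in starts:
            # Every following word must sit at the next position in the same document.
            if all(start + i in index.get(w, {}).get(doc_id, [])
                   for i, w in enumerate(words)):
                hits.append(doc_id)
                break
    return sorted(hits)


# Hypothetical sample documents, stored with no categories or hierarchy.
docs = {
    "hamlet.txt": "To be, or not to be, that is the question.",
    "memo.txt": "The report is not to be released; it has to be reviewed.",
}
index = build_index(docs)
print(phrase_search(index, "to be or not to be"))
```

The documents are never sorted or categorized; the index alone is enough to locate the phrase, which is the point being made about the haystack.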

Bob Fleischer (192.208.46.249) - 12:10pm -- While I agree that it is great to be able to retrieve information without organizing it (or, almost as difficult, knowing its organization) -- I've been using that to push text search for years -- I see it as a tool highly complementary to traditional hierarchies and categories. Depending upon the data, the user population, and the task, you may want to have both ways to find info.

Richard Seltzer (199.3.129.189) - 12:19pm -- Bob -- Interesting. Yes, one tool doesn't solve all needs. There's a synergy and coexistence. I'm inclined to wax enthusiastic about full-text indexing because it's a new choice, a new tool that's available to us and many people aren't yet aware of the implications. But, yes, it doesn't replace so much as complement more familiar ways of doing things -- while making it possible to accomplish things that were impossible before. And thanks for that pointer with regard to industry specific searches. Does anyone else have examples of other industries or areas of inquiry with good targeted search capabilities?

G. Benett (192.146.145.220) - 12:21pm -- Richard, full-text indexing *is* powerful, but don't all major search engines today support it? Is AltaVista doing more, or doing the same thing better?

Richard Seltzer (199.3.129.189) - 12:26pm -- Gordon -- Others are following the AltaVista lead. But before AltaVista came along, the only ones out there were indexing keywords, or excluding a fixed set of "common" words to save space and speed response. The only way to truly tell what a search engine has in its index is to try specific queries with several different ones and see the results. The AltaVista researchers/developers were really interested in testing the limits of the technology. Business goals came into play after it was all together. They made no compromises -- their aim was to index everything completely and provide as rapid response as possible at the same time.


Search and Bookmarks

Bob Fleischer (192.208.46.249) - 12:06pm -- One thing I've noticed is that I "bookmark" things far less often -- usually once I've found an interesting site or page, I can remember enough about it to find it again, if I ever have need to, using a search engine such as AltaVista. (Finding things on huge bookmark lists can be a problem, too!)

Richard Seltzer (199.3.129.189) - 12:09pm -- Bob -- I agree about bookmarks -- too many is hard to manage, and it's easier to do an AltaVista search. On the other hand, if I come up with a particular good and complex query that focuses on something I'm interested in checking regularly, I can add the AltaVista query results to my bookmarks, and then readily relaunch that search whenever I want.


Robot Exclusion

Richard Seltzer (199.3.129.189) - 12:16pm -- Regarding why to exclude Web crawlers. Several reasons: 1) work in progress -- you have pages that aren't finished, that you want a limited set of people to look at and comment on but that you don't want generally indexed. Exclude the robot from that particular file or directory. (Keep in mind that not linking to a page from anywhere else is not enough. Be sure to shut off the "directory indexing feature" in your server software, if it has that feature. Otherwise, crawlers can find the complete contents of your site, even if there are no links to particular pages.)
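
For readers who have not seen a robot exclusion file, the sketch below shows what excluding a file and a directory might look like, checked with Python's standard urllib.robotparser module. The paths, the host www.example.com, and the crawler name "ExampleCrawler" are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all crawlers out of a drafts directory
# and away from one unfinished page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
Disallow: /unfinished.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(path, agent="ExampleCrawler"):
    """Return True if a well-behaved crawler is allowed to fetch the path."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


for path in ("/index.html", "/drafts/plan.html", "/unfinished.html"):
    print(path, may_fetch(path))
```

Note that the file only asks crawlers to stay away; it does not protect the pages from anyone who knows or discovers the URL.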

Databases not indexed

Richard Seltzer (199.3.129.189) - 12:42pm -- By the way, AltaVista (today) also does not index databases on the Web -- because the crawler ("Scooter") just fetches pages, it can't submit queries. This too influences entrepreneurial behavior. Or even the behavior of individuals. For instance, if you are looking for a job, simply posting your resume on your Web site and submitting that URL to AltaVista so it's added soon, could make you far more visible to headhunters and potential employers than going through all the rigamarole of submitting information to dozens of separate job search databases on the Web.

Cookies

Richard Seltzer (199.3.129.189) - 12:23pm -- All -- by the way, one of the interesting implications of having a search engine like AltaVista around as a free service is that it changes the behavior of Internet entrepreneurs by shifting the cost/benefit balance for a variety of activities. For example -- "cookies." I personally hate "cookies". I believe that a Web site should and could ask visitors to voluntarily provide information about themselves, but shouldn't do it with technology like "cookies." It so happens that AltaVista does not index pages with "cookies"; hence there is a pretty good incentive for not using cookies if you want your pages found.

G. Benett (192.146.145.220) - 12:28pm -- Richard, I have to agree with you about cookies. The latest browsers can be configured to prompt on cookie requests. Try it sometime and you'll be amazed how many little messages are being written to your disk! Sun's Java Workshop site, for instance, tosses 10-12 cookies for each page load. In prompt mode you have to hit Cancel to decline each one, which gets old *very* quickly.

Richard Seltzer (199.3.129.189) - 12:32pm -- Gordon -- Yes, I'm using Netscape Gold 3.0 and I get those annoying cookie messages all the time -- or at least I did until I installed Internet Fast Forward (from http://www.privnet.com) which eliminates cookies and also zaps banner ads. Love it.

John Fricker (editor www.Program.com) (206.101.74.69) - 12:30pm -- G. Benett, some servers attach a cookie to every header unless a cookie already exists. The server is only trying to store one cookie but it tries for every element on a page. Each graphic that is.

John Fricker (editor www.Program.com) (206.101.74.69) - 12:32pm -- Richard, how does AV deny cataloging to pages with cookies?

Richard Seltzer (199.3.129.189) - 12:36pm -- John -- The cookie exclusion was a side effect rather than a deliberate choice on the part of the designers. With a page with cookies attached the text indexed would not match what a user would see when linking to that site through the AltaVista Search engine. (Every visit is unique.)

John Fricker (editor www.Program.com) (206.101.74.69) - 12:40pm -- Richard, would it be accurate to say that AV doesn't catalog pages that change given the state in the cookie but that AV will catalog pages that contain cookies?

Bob Fleischer (192.208.46.249) - 12:43pm -- I appreciate the distaste for cookies (the web kind :-), but it seems to me that an "index everything" crawler weakens its claim the more it arbitrarily chooses not to index pages. Unfortunately a lot of useful information is behind those cookies, and it isn't all (or even mostly, as far as I see) customized, never-twice-the-same info. Except for honoring the robot exclusion file, I would want an "index everything" crawler to really index everything, to be as aggressive as possible.

Richard Seltzer (199.3.129.189) - 12:47pm -- John -- Good question. I need to be more precise. 1) AltaVista doesn't index dynamic pages -- pages that change very often, for instance to fit the profile of an individual user 2) it also doesn't index pages that use cookies in the URL -- pages that in handing off a cookie make a change in the URL -- because then the URL is unique and wouldn't do another user any good. But if neither of those are the case, a page with cookies will be indexed. Thanks for that one.

G. Benett (192.146.145.220) - 12:36pm -- John F. - that makes sense ... what would be the value of branding the same client twelve times? Maybe you know the answer to this cookie question: if I make the file "cookies.txt" read-only, I can turn off prompting and still load pages smoothly. Of course nothing is being written to local storage. Have I found a cheap cookie cutter, or am I kidding myself?

John Fricker (editor www.Program.com) (206.101.74.69) - 12:39pm -- G. Benett, that will eliminate persistent cookies between sessions. Netscape at least caches current cookies in memory until exit. If cookies.txt is read only all cookies from that session would be lost.

Richard Seltzer (199.3.129.189) - 12:51pm -- Bob -- Yes, regarding inclusiveness in the index. As I tried to clarify in a previous note -- it's the "unique URL" (because of a cookie) and the dynamic page that AltaVista does not cope with (because technically it would be virtually impossible to do so). But given that fact, a Webmaster could provide alternative pages/text that is static and cookie free, in order that the site be fully indexed.

John Fricker (editor www.Program.com) (206.101.74.69) - 12:53pm -- Richard, ahhh. I see you are using the term "cookie" more broadly than most. An HTTP cookie is specifically an extension to the HTTP header. What you refer to as a "cookie in a URL" serves a similar function to an HTTP cookie in that it provides state information to the server, yet to call it a "cookie" is misleading. Dynamic URLs (those ugly incomprehensible paths) are something special themselves.
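
The distinction drawn here, state carried in an HTTP header versus state carried in the URL, can be sketched with Python's standard library. The cookie name, the session parameter, and the example.com addresses below are hypothetical and are not taken from any site mentioned in the chat.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlsplit, parse_qs

# 1) State carried in an HTTP header: the page's URL is the same for every visitor.
cookie = SimpleCookie()
cookie["visitor_id"] = "12345"      # hypothetical cookie name and value
header_line = cookie.output()       # the header line a server would send
print(header_line)

# 2) State carried in the URL itself: each visitor is handed a different address,
#    so a copy of that address stored in an index is of little use to anyone else.
url_a = "http://www.example.com/catalog?session=a1b2c3"
url_b = "http://www.example.com/catalog?session=z9y8x7"


def session_of(url):
    """Pull the (hypothetical) session token out of a URL's query string."""
    return parse_qs(urlsplit(url).query).get("session", [None])[0]


print(session_of(url_a), session_of(url_b))
```

In the first case a crawler and a later visitor can request the same address; in the second, the address the crawler saw belongs to the crawler's own session.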

Dave from MI (208.130.91.27) - 12:51pm -- In case anyone was wondering: Cookie Definition The main purpose of cookies is to identify users and possibly prepare customized Web pages for them. When you enter a Web site using cookies, you may be asked to fill out a form providing such information as your name and interests. This information is packaged into a cookie and sent to your Web browser which stores it for later use. The next time you go to the same Web site, the server will request the cookie and will therefore have all the information you previously entered. The server can use this information to present you with custom Web pages. So, for example, instead of seeing just a generic welcome page you might see a welcome page with your name on it. The name cookie derives from UNIX objects called magic cookies. These are tokens that are attached to a user or program and change depending on the areas entered by the user or program. Cookies are also sometimes called persistent cookies because they typically stay in the browser for long periods of time.

Richard Seltzer (199.3.129.189) - 12:55pm -- Dave -- That definition sounds good except for the bit about your having to fill out a form. That's a good/voluntary way to do it. What I object to is the passing back and forth of information about me the user between the browser and the server which happens automatically without me having registered in the first place. And that's what commonly happens today. There are some neat sites that demonstrate what servers learn about you without your consent. I talked about it in my newsletter Internet-on-a-Disk issue #18 http://www.samizdat.com/news18.html#curious

John Fricker (editor www.Program.com) (206.101.74.69) - 12:56pm -- Dave, your definition of cookie is close. Cookies stored on the client side are limited in size and typically are not large. Typically they are an ID number that provides for lookup into a database that may or may not be attached to specific user provided information.

G. Benett (192.146.145.220) - 12:55pm -- It seems to me that cookies are an attempt to reintroduce state memory into the stateless HTTP protocol, whereas dynamic URLs are an attempt to prevent bookmarking (to enforce a prescribed path through a site).

John Fricker (editor www.Program.com) (206.101.74.69) - 12:58pm -- Very little can be learned by a web server about who you actually are. There is more information on this page about who I am than available to the web server.


Meta Search Engines

Dave from MI (208.130.91.27) - 12:32pm -- Richard: How do you feel about sites such as "http://www.metacrawler.com" which uses many search engines and gleans the top ten or so sites? Is this in your opinion an effective way to search the Internet?

Bob Fleischer (192.208.46.249) - 12:36pm -- AltaVista isn't perfect (dare I say that, Richard :-) and it helps to know the strengths and weaknesses of several search engines. There are also other tools available for those really tricky search tasks. One of my favorites is MetaCrawler, http://metacrawler.cs.washington.edu:8080/, which is not a search engine itself but uses other search engines (including AltaVista) and gathers, collates, and compares the results. It is much slower than a direct search engine, but sometimes the quality of the searches seems much higher. (I tend to use AltaVista first, and if AltaVista returns a lot of stuff, but the density of relevant stuff is low (lots of irrelevant hits to wade through), MetaCrawler often can be helpful. Conversely, I'll use it as a last resort when even AltaVista can't find much.)

Richard Seltzer (199.3.129.189) - 12:39pm -- I haven't tried metacrawler (I should). But my gut feel is that when you get used to a search engine, like AltaVista, and understand how to use the range of commands there to fine tune your searches, you can get all you want and get it quickly. A site that submits a single query to more than one search engine would have to make compromises -- since they all speak different query languages -- meaning you would not have as precise a control over the results.

Bob Fleischer (192.208.46.249) - 12:55pm -- Some research in text retrieval conducted at the University of Massachusetts (at a center partially funded by Digital -- pat on the back) found that not only did different search algorithms return different documents (and in the case of ranked lists, different documents were at the top), but that combining the results from different algorithms produced a list with generally higher quality. Remember, ranking is important: if a given search engine lists the document you want on the 20th continuation page, there's a good chance that you'll give up before you get to it. Different engines, even different engines of the same generic type, do rank things very differently. (I sometimes look at Infoseek Ultra, http://ultra.infoseek.com/, just because its rankings are very different from AltaVista. I can't say they're better, but different sometimes helps me find things.)
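
One simple way to combine ranked lists from several engines is to give each result a score based on its position in each list and add the scores up. The Python sketch below uses reciprocal-rank scoring, chosen here only for illustration; nothing in the chat says that MetaCrawler or the University of Massachusetts research used this particular method, and the result URLs are invented.

```python
from collections import defaultdict


def merge_rankings(ranked_lists):
    """Merge ranked result lists: each list contributes 1/rank to a result's score."""
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / rank
    # Highest combined score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical top-three results from two different engines for the same query.
engine_a = ["http://a.example/", "http://b.example/", "http://c.example/"]
engine_b = ["http://b.example/", "http://d.example/", "http://a.example/"]

merged = merge_rankings([engine_a, engine_b])
print(merged)
```

A page that both engines rank fairly high ends up above a page that only one engine likes, which is one plausible reason a combined list can feel higher in quality.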

Richard Seltzer (199.3.129.189) - 12:56pm -- Bob -- Interesting point, and yes I should check metacrawler.


Repetitions of words and results ranking

G. Benett (192.146.145.220) - 12:42pm -- Richard - does AV have provisions for filtering hit-enhancing text, such as the word "intranet" appended 1000x at the bottom of a page?

Richard Seltzer (199.3.129.189) - 12:48pm -- Gordon -- Yes, it's really simple. AltaVista gives higher rank to a page that has the query word twice rather than once. But anything more than two times makes no difference at all in the ranking. All that extra repetition that people might want to throw in for the sake of making themselves more "visible" is just a waste of time.
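
Taking the rule exactly as stated above (a second occurrence helps, anything beyond that does not), a toy scoring function might look like the following Python sketch. It is only an illustration of the stated rule, not AltaVista's ranking code, and the function name and sample pages are hypothetical.

```python
import re


def capped_score(page_text, query_words, cap=2):
    """Count each query word on the page, but never credit more than `cap` occurrences."""
    words = re.findall(r"[a-z0-9]+", page_text.lower())
    return sum(min(words.count(q.lower()), cap) for q in query_words)


honest = "Intranet design tips for your intranet team"
stuffed = "Intranet design tips " + "intranet " * 1000

print(capped_score(honest, ["intranet"]))
print(capped_score(stuffed, ["intranet"]))
```

Under this rule the page with the word appended a thousand times scores the same as the page that uses it twice.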


Price of AltaVista Software

Dave from MI (208.130.91.27) - 12:25pm -- Dave Schafer, MI What could I expect to pay for a full-text, website indexing package, like AltaVista for my company's internet website?

Richard Seltzer (199.3.129.189) - 12:28pm -- Dave -- Check http://altavista.software.digital.com If you can't find the answer you want there (and it should be there), please send me email richard.seltzer@ljo.dec.com and I'll followup.


How to publicize a Web site in the right places?

barbara (199.93.126.34) - 12:12pm -- Hi! I'm here. I want to know how to publicize my website in the right places. I work for an electronic commerce company and I've listed the site in a lot of search engines, but it seems if the person doesn't know the URL or company name, it's hard to locate us. If someone is looking for a company selling electronic commerce, where would they look? What would they type in to find what they need in the search engines?

barbara (199.93.126.34) - 12:23pm -- Any chance we can talk about how a user goes about finding a certain site as opposed to the merits of search engines?

barbara (199.93.126.34) - 12:26pm -- I guess I'm a little frustrated. It's not much use trying to publicize a web site if you're not sure how the outside world can actually find you.

Richard Seltzer (199.3.129.189) - 12:31pm -- Barbara -- Sorry for the delay in getting to your questions. First, a full-text search engine like AltaVista is great for getting an answer to a particular question, for retrieving a specific piece of information. It will take you right to the page with the answer (not just the front door of a Web site.) If it is a site that you are looking for -- you want to find a place with lots of different kinds of information about a given subject or category of products where you can browse, I'd suggest using Yahoo -- which is organized by categories (by hand) and points you to sites rather than pages. The two approaches are complementary.

Bob Fleischer (192.208.46.249) - 12:22pm -- Once you've built a web page and had it indexed by a search engine, one interesting thing to do is to use the text of your web page (or an excerpt) itself as a query, and see what other pages you get.


Industry-targeted search engines

Bob Fleischer (192.208.46.249) - 12:13pm -- An interesting kind of business site on the web is one that provides targeted search in a given industry or topic. Verity is fond of pointing to Xilinx, http://www.xilinx.com/, (since they use Verity's products), which provides a search site for the entire programmable logic industry.

Techniques for refining searches

Bob Fleischer (192.208.46.249) - 12:28pm -- One of the most important techniques in searching is iterative refinement -- doing a search, seeing what you got back, modifying the search, repeat, repeat, until satisfied. I like AltaVista especially because it makes this process rather easy. First off, AltaVista responds fast, so repeated searches are not painful! Secondly, it gives your query right back to you, so you can modify it without re-typing. Thirdly, features like the "+", "-", string quoting, and wildcard make even simple search rather powerful and facilitate refinement.
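
To make the "+", "-", quoting, and wildcard features concrete, here is a small Python sketch of how such a query could be interpreted against a piece of text. The exact semantics are assumptions made for the example ("+" means required, "-" means excluded, quotes mark a phrase, a trailing "*" matches any word ending); consult the search engine's own help for its real rules.

```python
import re


def parse_query(query):
    """Split a query into (sign, term) pairs; sign is '+', '-' or ''."""
    terms = []
    for sign, term in re.findall(r'([+-]?)("[^"]*"|\S+)', query):
        terms.append((sign, term.strip('"').lower()))
    return terms


def term_matches(term, text):
    """Match a word, a quoted phrase, or a trailing-* prefix on word boundaries."""
    if term.endswith("*"):
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text.lower()) is not None


def matches(query, text):
    """Apply the assumed rules: no excluded term, all required terms, else any optional term."""
    parsed = parse_query(query)
    required = [t for s, t in parsed if s == "+"]
    excluded = [t for s, t in parsed if s == "-"]
    optional = [t for s, t in parsed if s == ""]
    if any(term_matches(t, text) for t in excluded):
        return False
    if not all(term_matches(t, text) for t in required):
        return False
    return bool(required) or any(term_matches(t, text) for t in optional)


text = "An intranet search engine builds an index of every word."
print(matches('+intranet "search engine"', text))
print(matches('+intranet -index', text))
```

Iterative refinement then amounts to editing the query string, adding a "-" term to drop noise or a "+" term to narrow the results, and running it again.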

Richard Seltzer (199.3.129.189) - 12:34pm -- Bob -- You might add to your list the separation in Advanced Search mode between the query itself and the ranking words. You can fine tune your main search in the query, and then keep adding more words to the ranking to try to get the best targeted results to bubble to the top of the list.


Search engine user stats

barbara (199.93.126.34) - 12:51pm -- Is there any feedback on what percentage of users use Alta Vista as opposed to another search engine?

Richard Seltzer (199.3.129.189) - 12:52pm -- Barbara -- It's hard to put it in percentages. But I believe that AltaVista at 21 million hits per day is number one today.


Metatags and results ranking

John Fricker (editor www.Program.com) (206.101.74.69) - 12:45pm -- AV uses the meta KEYWORD and meta DESCRIPTION tags. Do any other search engines use the same or similar tags?

G. Benett (192.146.145.220) - 12:47pm -- Having recently updated the search registration of my webzine, I can attest that most commercial engines use META tags -- AltaVista, Infoseek, Infoseek Ultra, HotBot, Inktomi and Lycos do.

Dave from MI (208.130.91.27) - 12:58pm -- Richard: We recently submitted our site to AV (www.MercyHealth.com). We have a buffer page which prompts for the correct browser. The first text of that buffer page contains a statement like "If your browser is not Netscape ...". How can I have the text that appears as a result of the search be more relevant (i.e., We are a health services organization...)?

G. Benett (192.146.145.220) - 1:01pm -- Dave S. - post that question on Intranet Exchange and I'll answer it. Bye all.

Richard Seltzer (199.3.129.189) - 1:02pm -- Dave -- To make the descriptive text more relevant use a META tag. Details can be found in the on-line help. (Also in the book.) By the way, remember my book The AltaVista Search Revolution is now in book stores in most places (from Osborne/McGraw-Hill). You can now also order it from the AltaVista Web site http://altavista.digital.com/

Bob Fleischer (192.208.46.249) - 1:00pm -- To increase your ranking, use the meta tags, use relevant terms in your title, use all your important terms near the top of the document. (It will depend on the search engine, of course -- I'm sure Richard's book has lots of information for AltaVista.) Also, make sure you understand what terms people will be using to find you -- it's a guessing game, will they guess the words and names that you use?
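
As a sketch of what the META tags discussed here look like and how an indexer might read them, the Python example below parses a made-up page with the standard html.parser module. The page content and the class name are hypothetical, and how any particular search engine actually uses these tags is not shown.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the content of the description and keywords META tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.meta[name] = attrs.get("content") or ""


# Hypothetical page whose visible text starts with a browser prompt.
PAGE = """<html><head>
<title>Example Health Services</title>
<meta name="description" content="We are a health services organization.">
<meta name="keywords" content="health, hospital, clinic">
</head><body>If your browser is not Netscape ...</body></html>"""

extractor = MetaExtractor()
extractor.feed(PAGE)
print(extractor.meta["description"])
print(extractor.meta["keywords"])
```

With a description tag in place, an engine that honors it has something better to display than the first line of visible text on the page.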

LocalCity - Andrea (204.191.105.104) - 1:02pm -- Hello Bob, I have used Meta tags. They are not being used.?


Last-Minute unanswered questions

D.Jewell (38.233.242.63) - 12:56pm -- What kinds of elements/designs in a Web page increase the chances of it coming to the top of a search list? I'm looking at the revision of an existing site and we're focusing on our content as it relates to our customers/prospects, but what are the "search engine" considerations?

LocalCity - Andrea (204.191.105.104) - 1:00pm -- Hello everyone, I have a question regarding Lycos. I have a real problem in getting them to spider my sites. Sometimes they do it, sometimes they don't. I have designed about 12 sites and Lycos will spider only some of them.


Wrapup

Richard Seltzer (199.3.129.189) - 12:58pm -- All -- time is running out and, as usual, it feels like we're just warming up. Please, before signing off, post your email and URL (even those of you who have been reading without posting) so we can followup. Also please send email to me at seltzer@samizdat.com with your followup questions and comments which I'll include with the transcripts.

G. Benett (192.146.145.220) - 12:59pm -- Gordon Benett, Editor, Intranet Design Magazine(sm). URL http://www.innergy.com/, email: editor@innergy.com. Thanks Richard.

tom dadakis DadaCom tomdadak@ix.netcom.com (198.211.91.66) - 1:00pm -- Joined the discussion too late to add anything.

John Fricker (editor www.Program.com) (206.101.74.69) - 1:01pm -- jfricker@vertexgroup.com, http://www.program.com and http://www.vertexgroup.com. Thanks Richard!

barbara (199.93.126.34) - 1:00pm -- Richard, will this topic continue next week?

Richard Seltzer (199.3.129.189) - 12:59pm -- All -- Also, please let me know if you would like us to continue with this subject next week, or send suggested alternatives. I've seen a flurry of questions here in the last few minutes. I'll try to provide some answers in the transcript, but it feels like this could and should continue. For the transcript look at http://www.samizdat.com/index.html#chat later today (probably tonight.)

D.Jewell (38.233.242.63) - 1:03pm -- Dick Jewell, New Media Marketing Manager, OpenLink Software. URL http://www.openlinksw.com, e-mail: djewell@openlinksw.com. Yes, please continue next week and thanks Richard!

Richard Seltzer (199.3.129.189) - 1:03pm -- Thanks to all. Please send email to seltzer@samizdat.com with followup questions etc. and check the transcript.


Followup

How Often Does a Robot Visit a Site?

From: Cristen Hewett <cristen@outreach.com> Date: Fri, 15 Nov 1996 10:53:39 -0500

I'm interested to know how often the "robot" or search engine, updates the site. I added META tags about 2-3 months ago, and places like Lycos still haven't reflected the change.

Cristen Hewett http://www.outreach.com

Reply

AltaVista's crawler (Scooter) runs continuously, but the Web is a big place. How often it will return to your site is not predictable (it follows an expanding trail from one URL to another, with 1000 threads scouring the Web simultaneously). If you added the tags two months ago, it is very likely that AltaVista has picked them up. Have you checked recently?

(I have no direct knowledge of how Lycos does things, but my impression is that it takes a long long time for their crawler to do its full circuit).

Remember, if you have a new page or have added significant new material to an old page, you can click on "Add URL" at the AltaVista site and enter the URL of that page. The crawler will immediately (within a couple seconds) retrieve the text of your page, and it should be in the index in a day or two.

I hope this helps.

Richard Seltzer


Do Without Databases???

From: Eileen Moyer <emoyer@mcp.edu> Date: Fri, 15 Nov 1996 11:48:01 -0500

Thanks for the info about your book. I plan to check it out at Quantum at lunch today.

I started reading yesterday's chat at 12:45 and didn't have time to catch up, but printed out most of it and read it this am. As a librarian I am discouraged by some of the remarks. Do without databases??? I refer to one of your opening remarks "You don't need traditional databases (in many instances)." I disagree. Although much information is available and we all hope that more government info will be disseminated over the web, traditional databases report on primary research and are traditionally costly because of the value added by the vendor, i.e., indexing, abstracting, etc. If you want to know about your competition you might not exclude a web search, but you had better have done a search on several of the business databases through a vendor such as Knight-Ridder or STN. If a doctor or pharmacist is looking for information on treatment of a disease state or a drug monograph s/he had best be looking on a proprietary database. I only wish that we keep the information on the web in context. It does not replace, merely supplement.

Another issue is the free availability of these search engines - I doubt that they will remain free for long. I know that InfoSeek bowed out of its pay portion, but I think this is just a free ride for now until someone decides how to squeeze money out of us. Also, the information is just not worth paying for. One of my colleagues subscribed to Individual, Inc. and was not impressed with the coverage it gave her. She gleaned very little that she did not already get from other sources - not free sources either. I do not single out Individual, Inc.

Just a note about full-text search engines. I believe that Open Text was the first search engine/database which indexed full-text of web pages. They used that "to be or not to be" example on their homepage last year.

Yes, I would like to see you continue this theme. You could open it to Intelligent Agents which Search Engines are moving towards, although I believe Firefly was a guest on the chat a while ago.

BTW, I am co-chair of a program committee for the Special Libraries Association, Boston Chapter which will be presenting a program next March for our group tentatively entitled Information Delivery on the Internet: At Your Service? A Look at the Latest Developments and Integration in Search Engines, Intelligent Agents and Webcasting

My committee and I are therefore interested in your topic.

Eileen Moyer, Asst. Librarian, Massachusetts College of Pharmacy & Allied Health Sciences, Boston, MA

Reply

Interesting. I meant the phrase "don't need traditional databases" from the perspective of an information provider. Instead of investing in having human beings make numerous judgements to categorize information and put it in predetermined fields, you can store your information in flat files -- without even a directory structure -- and retrieve what you want when you want it.

I believe that you are thinking in terms of existing proprietary databases, which have been built through years of work and are now well-populated with useful information. I'm not suggesting that they will go away. I am suggesting that that approach to storing and retrieving information will become less and less useful because of its enormous cost and lack of flexibility.

It will be interesting to see how today's database-centric commercial information services migrate to the new mode or come up with creative combinations of new and old. (It would, for instance, be possible to feed the full contents of an existing database into an AltaVista-type index.)

> Another issue is the free availability of these search engines - I doubt that they will remain free for long.

I, on the other hand, would be very surprised if anyone successfully charged for such a search service. Digital has derived tremendous benefit from providing AltaVista as a search service. About five months after it went online, focus groups indicated that among people on the Internet the AltaVista name was far better known than the Digital name, despite the tens of millions of dollars and over thirty years of advertising and marketing that went into trying to establish the Digital brand. And not a penny had been spent to brand AltaVista at that point. People remembered it and thought highly of it because of the value of the free service. The company then decided to use the AltaVista name as the brand name for a whole range of related Internet software products.

And as long as AltaVista Search is free, it will be virtually impossible for anyone else to charge for a competing service.

> Just a note about full-text search engines. I believe that Open Text was the first search engine/database which indexed full-text of web pages.

I first saw Open Text at Internet World in Boston in the fall of 1995, about a month before AltaVista went public. Yes, they made very broad claims about having indexed every single word of every single page. But when I did a search of my site at their booth, it turned out they had only indexed about half a dozen of the over 300 pages at my site. And searches on a variety of topics provided hit-or-miss results -- sometimes excellent, and sometimes not having indexed the pages with the answers (or not able to retrieve the info from their index.) As soon as AltaVista came on line, everything at my little site was available. That to me was the first real instance of "full text" search.

I'm sure they've made improvements since. And I have no knowledge of the internal workings of their system. I suggest trying the same search on both machines -- using all the available tools to fine-tune your search -- and seeing which you prefer.

> Yes, I would like to see you continue this theme. You could open it to Intelligent Agents which Search Engines are moving towards ...

Yes, I'm interested in what will happen with regard to Intelligent Agents. I suspect that they will take off in parallel with electronic commerce, because an important application will be for price comparisons, and buying agents and selling agents.

Today, agents don't make much sense for information retrieval, since virtually all the public text of the Internet is now available from a single system -- AltaVista. Why build/use an agent to look at 30 million pages for you when all you need to do is look in one place?

> BTW, I am co-chair of a program committee for the Special Libraries Association, Boston Chapter which will be presenting a program next March for our group tentatively entitled Information Delivery on the Internet: At Your Service? A Look at the Latest Developments and Integration in Search Engines, Intelligent Agents and Webcasting. My committee and I are therefore interested in your topic.

Sounds like an interesting topic. (By the way, how do you define "webcasting"?) I hope you'll be able to join us next Thursday. And please spread the word to your committee and others who might be interested.

Richard Seltzer

Reply to Reply

From: Eileen Moyer <emoyer@mcp.edu> Date: Fri, 15 Nov 1996 17:16:24 -0500

Richard - I bought the book and will look at it tonight.

> I am suggesting that that approach to storing and retrieving information will become less and less useful because of its enormous cost and lack of flexibility.

I hope not and disagree about the lack of flexibility.

At a recent seminar by two business librarians I was led to believe that Digital would not be averse to someone buying AltaVista because it is not making any money - Yes, the PR power has been tremendous.

I believe that last year/early this year OpenText was a possible contender and did bill themselves as full text indexing of the pages they scanned - but AltaVista eclipsed them quickly and no one hears of them anymore.

Don't you think that AltaVista's lack of indexing gifs and jpegs is something they should address? It isn't perfect, but I can bang at Lycos and get a gif of a skull.

Does this mean that you will definitely be discussing the search engines next thurs? I will alert my committee members.

I'm not familiar - yet - with Webcasting; it was a term my co-chair had found and we will pursue for the seminar.

Thanks

Eileen Moyer

Reply to Reply to Reply

> > I am suggesting that that approach to storing and retrieving information will become less and less useful because of its enormous cost and lack of flexibility.

> I hope not and disagree about the lack of flexibility.

By lack of flexibility, I mean that with traditional databases I need to decide today how I am going to access information five years from now. I need to categorize and fill in fixed fields and decide what words are "key".
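As a rough illustration of the alternative, here is a minimal sketch of full-text retrieval over plain text, with no fields or keywords chosen in advance. It is a toy written for this note, not a description of how AltaVista or any commercial service works; the file names and sample sentences are made up.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map every lowercased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical flat files: plain text, no categories, no fixed fields.
documents = {
    "memo1.txt": "Quarterly sales report for the Boston office",
    "memo2.txt": "Notes on search engines and full-text indexing",
    "memo3.txt": "Boston chapter program on search engines and intelligent agents",
}

index = build_index(documents)
hits = search(index, "Boston search")
print(sorted(hits))
```

Because every word is indexed, a question nobody anticipated when the text was written can still be asked later. The price is that nothing tells the index which words matter most, which is the value a human indexer adds.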

> At a recent seminar by two business librarians I was led to believe that Digital would not be averse to someone buying AltaVista because it is not making any money - Yes, the PR power has been tremendous.

That is not at all the case. It wasn't intended to make money directly. It was a research project, an experiment that became wildly popular and hence spawned business opportunities.

They are licensing mirror sites in Australia, Europe and elsewhere to make access easier for folks on the other side of slow trans-oceanic lines. And those mirror sites will be financed by advertising.

And Digital has gone through the preliminary steps necessary to spin the AltaVista software business off as a separate company, if and when they choose to (retaining 80% ownership, I believe).

Perhaps one or the other of those moves spawned the rumor you are referring to.

> I believe that last year/early this year OpenText was a possible contender and did bill themselves as full text indexing of the pages they scanned - but AltaVista eclipsed them quickly and no one hears of them anymore.

> Don't you think that AltaVista's lack of indexing gifs and jpegs is something they should address? It isn't perfect, but I can bang at Lycos and get a gif of a skull.

It may be simply a matter of familiarity with the query language. With AltaVista I can search image:skull.gif and if there are gifs with that name, I'll get matches.

The names of images are indexed, as is any alternate (ALT) text associated with them. (That's probably what Lycos is doing.)

By the way, you can also search for Java applets by name, with applet:name.
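To make the idea concrete, here is a small sketch of how an indexer could turn image names, ALT text, and applet names into searchable terms. It is only an illustration under my own assumptions: the class name, the sample page, and the choice to read the applet name from the CODE attribute are hypothetical, and this is not AltaVista's actual code.

```python
import posixpath
from html.parser import HTMLParser


class MediaTermExtractor(HTMLParser):
    """Collect index terms from <img> and <applet> tags in a page."""

    def __init__(self):
        super().__init__()
        self.terms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            src = attrs.get("src")
            if src:
                # Index the file name only, so image:skull.gif can match.
                self.terms.append("image:" + posixpath.basename(src).lower())
            alt = attrs.get("alt")
            if alt:
                # ALT text is indexed as ordinary words.
                self.terms.extend(word.lower() for word in alt.split())
        elif tag == "applet":
            code = attrs.get("code")
            if code:
                # Assumption: the applet name comes from the CODE attribute.
                self.terms.append("applet:" + posixpath.basename(code).lower())


# Made-up sample page.
page = (
    '<p>Anatomy</p>'
    '<img src="/pics/Skull.gif" alt="Human skull">'
    '<applet code="Clock.class"></applet>'
)

parser = MediaTermExtractor()
parser.feed(page)
print(parser.terms)
```

Note that the picture itself is never examined. A search can only find an image through the words attached to it, which is why a descriptive file name and ALT text matter.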

But, yes, there's lots of room for improvement and the researchers are having a ball taking on those challenges -- such as indexing PostScript and Acrobat, dealing with Asian languages, adding material from databases to the index, etc.

> Does this mean that you will definitely be discussing the search engines next thurs? I will alert my committee members.

Yes, from all the interest that's been shown, that's what we should do.

Please spread the word.

> I'm not familiar - yet - with Webcasting; it was a term my co-chair had found and we will pursue for the seminar.

Interesting -- it could mean something along the lines of "flypaper" (in the transcript for 11/7/96).

Richard


Advice on Promoting a Web Site

From: Mika <mika@easybase.com> Date: Fri, 15 Nov 1996 08:30:04 -0200

I just read your post to the I-sales digest, and was wondering if you could help me. (I won't be offended if you're not interested).

I have a new Web site (see URL below) to promote a just-released product. I have a list of targets and different ways that I intend to promote my site. I'm not a newbie to Internet Marketing, but what do you do when you do All The Right Things, but it still doesn't work?

I submitted my URL manually to the top 7 or so search engines and indexes - I went back two weeks later and it was as if I hadn't submitted anything at all! AltaVista, which did have a listing, had only one page listed, not my whole site.

I'm using meta-tags and carefully chosen titles. Yet if I do a search for "database", outside of Yahoo, my pages don't show up. I'm stumped, I can't figure out how to make my pages shine above the rest.

Any ideas would be appreciated! (sorry if I'm a pest)

Mika easyBASE - The First Database For Dummies http://www.easybase.com

Reply

Check an article of mine about how to promote a Web site over the Internet (doing the free stuff) http://www.samizdat.com/public.html

When you submit a URL to AltaVista, it immediately retrieves that particular page, and adds that text to the index in a day or two. It also adds that URL to the list of URLs the crawler (Scooter) will visit as it moves around the Web. Yes, Scooter has 1000 simultaneous threads (so it's like 1000 crawlers working at once). But the Internet is enormous -- over 30 million pages -- so it can take a few weeks, working continuously, for it to visit every single page. And the point in its cycle at which it reaches the URL you submitted, and hence sniffs out and follows all the links from there, is effectively random.

If you have pages other than your main page that it is crucial for you to have indexed quickly, submit those pages as separate URLs.
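Here is a toy model of why that helps. It is a sketch under simplifying assumptions, not Scooter's real scheduling: submitted URLs are fetched at once and then placed at the end of the crawler's queue (the real position is unpredictable), and the sites and link graph are invented.

```python
from collections import deque


def crawl_order(link_graph, frontier, submitted):
    """Return the order in which pages get indexed in this toy model.

    Submitted URLs are fetched immediately. Their links are followed
    only when the crawler reaches them in its own queue.
    """
    indexed = list(dict.fromkeys(submitted))  # fetched right away, in order
    seen = set(indexed) | set(frontier)
    queue = deque(frontier)
    # Assumption: submitted URLs join the end of the existing queue.
    queue.extend(url for url in indexed if url not in frontier)
    while queue:
        url = queue.popleft()
        if url not in indexed:
            indexed.append(url)
        for link in link_graph.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return indexed


# Hypothetical link graph: page -> pages it links to.
link_graph = {
    "other.example/a": ["other.example/b"],
    "other.example/b": [],
    "mysite.example/": ["mysite.example/product", "mysite.example/order"],
    "mysite.example/product": [],
    "mysite.example/order": [],
}
frontier = ["other.example/a"]  # work the crawler already has queued

home_only = crawl_order(link_graph, frontier, ["mysite.example/"])
both = crawl_order(
    link_graph, frontier, ["mysite.example/", "mysite.example/product"]
)
print(home_only.index("mysite.example/product"))
print(both.index("mysite.example/product"))
```

In the first run the product page waits until the crawler works through its backlog and follows the link from the home page. In the second run it is fetched as soon as it is submitted. With tens of millions of pages in the backlog instead of two, that difference can be weeks.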

> I'm using meta-tags and carefully chosen titles. Yet if I do a search for "database", outside of Yahoo, my pages don't show up. I'm stumped, I can't figure out how to make my pages shine above the rest.

Keep in mind that "database" is an awful term to depend upon. It is simply too common on the Internet. An AltaVista search for that word alone would probably result in hundreds of thousands, if not millions, of matches.

To make your pages "shine above the rest" you need to have a clearer, more specific idea of what business you are in. What niche? What subset of the database business? Not just a category -- but what words would immediately come to the mind of your ideal customer? Make sure those words appear in the HTML titles and first lines of text of the pages you want them to find. (Metatags are an additional route -- but you are far better off clearly stating what you are up to in the text of the page itself.)
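One way to check your own pages is sketched below: it reports whether each phrase you care about shows up in the HTML title and in the first words of visible text. The page, the phrases, and the 25-word cutoff are all made up for illustration. This says nothing about how any search engine actually ranks pages, and the toy parser makes no attempt to skip scripts or style sheets.

```python
from html.parser import HTMLParser


class TitleAndTextExtractor(HTMLParser):
    """Separate the <title> text from the rest of the page text."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.body_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data
        elif data.strip():
            self.body_text.append(data.strip())


def phrase_report(html, phrases, opening_words=25):
    """For each phrase, say whether it is in the title and the opening text."""
    parser = TitleAndTextExtractor()
    parser.feed(html)
    title = parser.title.lower()
    words = " ".join(parser.body_text).split()
    opening = " ".join(words[:opening_words]).lower()
    return {
        phrase: {
            "in_title": phrase.lower() in title,
            "in_opening_text": phrase.lower() in opening,
        }
        for phrase in phrases
    }


# Hypothetical page and customer phrases.
page = (
    "<html><head><title>Acme Recipe Filer: a recipe database for home cooks"
    "</title></head><body><h1>Welcome</h1>"
    "<p>Store and search family recipes without any programming.</p>"
    "</body></html>"
)
report = phrase_report(page, ["recipe database", "family recipes", "meal planner"])
for phrase, found in report.items():
    print(phrase, found)
```

A phrase that appears in neither place, like "meal planner" in this sample, is one that a customer searching with those words is unlikely to find you by.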

I hope this helps.

Richard Seltzer


Previous transcripts and schedule of upcoming chats -- www.samizdat.com/chat.html

To connect to the chat room, go to www.samizdat.com/chat-intro.html

The full text of Richard Seltzer's books The Social Web, Take Charge of Your Web Site, Shop Online the Lazy Way, and The Way of the Web, plus more than a hundred related articles are available on CD ROM My Internet: a Personal View of Internet Business Opportunities.

Web Business Boot Camp: Hands-on Internet lessons for managers, entrepreneurs, and professionals by Richard Seltzer (Wiley, 2002). No-nonsense guide targets activities that anyone can perform to achieve online business success. Reviews.

a library for the price of a book.

This site is Published by B&R Samizdat Express, 33 Gould St., West Roxbury, MA 02132. (617) 469-2269. seltzer@samizdat.com


Return to B&R Samizdat Express




Internet Business Showcase: