For a library for the price of a book, visit our online store at http://store.yahoo.com/samizdat
Back in 1995, in the early days of the Web, I delighted in the ability to put an entire book in a single document. With plain-vanilla static HTML and no graphics, very large, textually rich pages loaded quickly. And with internal links it was easy to put a detailed table of contents at the top of a large "page," from which you could click to any chapter; to link footnote numbers in the text to the list of footnotes at the end; and to link from there back to where you had been in the text. But then along came search engines.
Thanks to the search engines, people could now find your pages even if you did nothing to advertise or market them. That was a very valuable and important free service. So we began to pay close attention to how they worked and how they ranked the pages they indexed, because changes in the ways we designed our pages could make a big difference in traffic.
Back in 1996, when I was at Digital and researching my book The AltaVista Search Engine, I was surprised and annoyed to discover that AltaVista indexed only the first 100K of each Web page it found. It would pick up links from the rest of the text, but only the beginning of a large page was fully indexed. On the one hand, AltaVista gave extra value to large, text-heavy pages, as opposed to short ones that consisted of just a few sentences and graphics and links. But on the other hand, it set this arbitrary limit. Knowing that, I went back and broke up my biggest, most useful pages into series of shorter ones -- watching that 100K limit. So instead of presenting books as single documents, I broke them into chapters. The result was less useful to my visitors, but the difference in terms of search-engine-generated traffic was important. Later, without making this information public, AltaVista changed that limit to 67K, so people like me, who had redesigned pages to comply with the arbitrary limit, were unintentionally penalized.
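The kind of splitting described above can be sketched in a few lines of Python. This is only an illustration, not the procedure I actually followed: it assumes that chapters begin on lines starting with "Chapter " and that "100K" means 100 * 1024 bytes, and the function names are made up for the example.

```python
# Assumption: "100K" is read as 100 * 1024 bytes; the exact figure
# AltaVista used internally is not known here.
INDEX_LIMIT_BYTES = 100 * 1024


def split_into_chapters(book_text, marker="Chapter "):
    """Split text into chunks, starting a new chunk at each line that begins with `marker`."""
    chapters = []
    current = []
    for line in book_text.splitlines(keepends=True):
        # A marker line closes the chunk collected so far (if any).
        if line.startswith(marker) and current:
            chapters.append("".join(current))
            current = []
        current.append(line)
    if current:
        chapters.append("".join(current))
    return chapters


def pages_over_limit(pages, limit=INDEX_LIMIT_BYTES):
    """Return the positions of pages whose UTF-8 size exceeds `limit` bytes."""
    return [i for i, page in enumerate(pages) if len(page.encode("utf-8")) > limit]
```

Calling pages_over_limit(chapters, limit=67 * 1024) after the limit changed would show which chapter pages had quietly slipped over the new threshold.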
On the plus side -- at least from my perspective -- the fact that search engines indexed every word on every page meant that you could create Web pages designed to be found by particular people or particular sets of people -- as a research tool or a marketing ploy. All you had to do was include the people's names (if they were unique) and the names of organizations and activities that you knew they were interested in and likely to search for. I started writing articles about that technique, which I call "flypaper", back in 1996 http://www.samizdat.com/search.html#fly ; and I recently wrote another article describing the amazing results that flypaper brought in my research into the Sergei Solovieff mystery: a variant of the Spanish Prisoner Scam http://www.samizdat.com/solovieff.html
Also, back in the early days of the Web, "anchors" were insignificant. The anchor is the set of words that is highlighted in an HTML page to indicate that it is associated with a hyperlink. Click on the anchor words, and you go to another page. Many sites would use something innocuous and meaningless like "click here" for all their links. I preferred to write out the complete URL and have that as the anchor, so visitors could know where they were going when they clicked, and also so they might remember the address itself for future reference (and for that reason, I kept my URLs as short and simple and logical as possible).
Then along came Google, which treated anchors in an unusual way. Like other search engines, Google filled its index by sending out crawlers which followed trails of links. But Google also paid special attention to anchor text, and "remembered" the anchor text associated with particular links, and used that text in its ranking algorithm. They even added to their index pages that they had not yet crawled, but only knew of indirectly through the anchors that linked to them. So if hundreds of different Web sites all had anchors that read "tyrannosaurus rex" and all those anchors linked to the same page, that page would likely come up very high on a search for "tyrannosaurus rex" regardless of what was on that page or whether Google had ever visited it. Over time, Google covered more and more pages directly and added an algorithm for estimating the relevance of a page to the text in the anchors pointing to it. But still they include many pages in their index that they know of only indirectly by anchors. And because of this practice of theirs, my practice of using the URL itself as anchor text means that my links to useful and important pages at other sites probably give those sites less benefit than if I used catchy phrases as anchors instead.
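To make the mechanism concrete, here is a toy sketch in Python of an index built from anchor text alone. It is not Google's algorithm, whose details are not public; the function names and the (source, anchor, target) input format are assumptions made for the illustration, and real ranking surely weighs many other signals.

```python
from collections import Counter, defaultdict


def build_anchor_index(links):
    """Map each lowercased anchor phrase to a Counter of the target URLs it points to.

    `links` is an iterable of (source_url, anchor_text, target_url) tuples.
    The target pages themselves are never fetched or read.
    """
    index = defaultdict(Counter)
    for _source, anchor, target in links:
        index[anchor.strip().lower()][target] += 1
    return index


def search_by_anchor(index, query):
    """Rank target URLs by how many anchors with exactly this phrase point to them."""
    counts = index.get(query.strip().lower(), Counter())
    return [url for url, _count in counts.most_common()]
```

In this toy model, a link whose anchor is the URL itself adds to the count only for a search on that URL, which is why such a link does nothing for a search on "tyrannosaurus rex".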
Worst of all, in terms of distortion of Web site design and encouragement of non-productive practices and business models, has been the concept of "popularity." Google and other search engines decided to define "popularity" in terms of links to a page from pages at other sites and decided to give that "popularity" lots of weight for ranking in search engine results. Before, links to your pages from other sites were helpful because visitors might click on them, giving you extra traffic. Now such links were even more important -- not for the clicks, but rather for the unwarranted interpretation that the search engines gave to them. Pages with lots of links to them got high ranking on searches, and hence got lots of traffic -- even if no one ever clicked on those links.
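As a rough illustration of "popularity" defined this way, the Python sketch below simply counts links that arrive from pages on other hosts. It is a deliberate simplification and an assumption on my part, not any search engine's actual formula; the function name and the input format are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def link_popularity(links):
    """Count, for each target page, the links arriving from pages on other hosts.

    `links` is an iterable of (source_url, target_url) pairs. Whether anyone
    ever clicks on a link plays no part in the count.
    """
    popularity = Counter()
    for source, target in links:
        # Links within the same host are treated as internal and ignored.
        if urlsplit(source).hostname != urlsplit(target).hostname:
            popularity[target] += 1
    return popularity
```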
This mechanism gave a boost to businesses that helped mediate link exchanges among sites that had never heard of one another and might have nothing at all to do with one another. It also led to the creation of many Web pages that consisted of nothing but links -- useless links to other sites with content that had nothing in common with the linking site. Eventually, search engines like Google caught on to this practice (which they had unwittingly encouraged) and figured out how to estimate the relevance of links. Hence, those links no longer provide the traffic boost they used to; but those useless pages and link exchange programs linger on.
The value of link-based "popularity" also meant that if you were going to create a set of sites -- either by yourself or by recruiting others to run them for you -- you would be better off buying a separate domain name for each, rather than running each in a separate directory of the same site or a separate sub-domain of the same domain. If you used separate domain names, Google and other search engines using a similar algorithm would interpret the numerous links from one of your sites to another as links between independent sites, and hence would give you a big boost in the rankings for "popularity". For instance, Webseed built a business with a couple thousand volunteer-run Web sites, each with its own domain name. The links among these sites made these sites very "popular" by Google's algorithm, which led to substantial traffic, and (in the days when banner advertising was viable) helped generate revenue. From Webseed's final messages to its volunteer Webmasters (of which I was one), I gather that, eventually, Google caught on, and figured out how to discount incestuous linking (at least in the case of Webseed); hence, Webseed's traffic dropped precipitously, and their business model collapsed.
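The difference between directories, sub-domains, and separate domain names can be seen in a small Python sketch. It assumes, purely for illustration, that a search engine decides "which site is this?" by looking at the last two labels of the hostname; that rule is my own naive stand-in (it mishandles names like example.co.uk), and the function names are hypothetical.

```python
from urllib.parse import urlsplit


def site_key(url):
    """Naive stand-in for 'which site is this?': the last two labels of the hostname."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def cross_site_links(links):
    """Keep only the (source, target) pairs whose two ends have different site keys."""
    return [(s, t) for s, t in links if site_key(s) != site_key(t)]
```

Under that rule, links between directories or sub-domains of one domain drop out as internal, while the same links between separately registered domain names are kept and counted as if they came from independent sites.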
So why should we care? Millions of Web sites and Web-based businesses are dependent on one another. What one business does in pursuing its own best interests can affect other businesses in unintended ways. For instance, a company sending out a spam email with a subject line intended to fool people into opening it immediately trains the recipients to doubt any future message sent with a similar subject line. The more spam messages sent, the more words and phrases become "tainted", limiting more and more the vocabulary available for legitimate communication. We're all drinking from the same waterhole, and when one person pollutes it, we all suffer.
The people who make the rules and formulate the algorithms for the major search engines should take into account that their decisions affect more than the internal working of their systems and more than the satisfaction of their visitors. Those decisions can make an enormous difference in the traffic to the sites that they index (and that they don't index), bringing some companies sudden success and destroying others. Those decisions can also lead to strange, unintended distortions in Web page design, as companies, in their struggle to survive, do their best to understand the underlying mechanisms of search engines and make changes intended to boost their search-engine-generated traffic.
But while standards get publicly aired and debated by bodies with representatives of the interested parties, the details of search engine design and ranking algorithms remain shrouded as proprietary trade secrets; and the designers can make changes whenever they please, without telling anyone beforehand or even afterward. And those secret decisions can have enormous repercussions throughout the Web.
We have here a case where private business interests can collide with the good of the overall community, a case where the normal rules governing "trade secrets" need to be modified. That could happen by the search engines themselves recognizing their responsibility and sharing such information in ways they never have before, and seeking input and feedback from affected companies and individuals. They could do that publicly and individually as a way to enhance their image, or privately through participation, say, in the World Wide Web Consortium, where design changes might be openly discussed, without full disclosure to the public -- giving an opportunity for experts to probe and seek to understand the business and technical implications and the possible unintended consequences, without giving away crucial proprietary information.
This site is published by B&R Samizdat Express, 33 Gould St., West Roxbury, MA 02132. (617) 469-2269. seltzer@samizdat.com
Buy Richard's book Web Business Bootcamp (published by Wiley) http://www.amazon.com/exec/obidos/ASIN/0471164194/brsamizdatexpres