The impact of search engines on the legal profession

Part 1: What to search for and how

(based on a speech delivered to the Boston Bar Association, February, 1998)

By Richard Seltzer, seltzer@samizdat.com, www.samizdat.com


Reprinted with permission from Internet Search Advantage, ZD Journals. http://www.zdjournals.com




Imagine that you wake up one morning and find a strange new gadget beside your bed. You pick it up, push a couple of buttons, and suddenly you can see things you were never able to see before -- ultraviolet and infrared -- and you can even see through some materials. You go out on the street and discover that many of the people you meet have the same kind of gadget. Seen through it, people who don't know any better and haven't tried to protect themselves look stark naked, because you can see right through their clothes. Now imagine that all the laws are the same today as they were yesterday. Your profession would nevertheless be transformed, because people's expectations would have changed: what you expect of them, what they expect of you, and how you define privacy, discovery, due diligence, and so on.

Full-text search engines, like AltaVista, are making that kind of change in the legal profession today.

Here are a few examples of the kinds of searches you can and should do, both because some of your competitors are already doing them and because very soon your clients and the judges you face will presume you know how.

Finding trademarked names

You can use AltaVista as a preliminary check to see if a name you'd like to trademark is already in use, or to see if others are infringing on a trademark or service mark of a client. For this kind of search, it's important to understand how AltaVista handles uppercase and lowercase letters and punctuation.

If you type a word in all lowercase, AltaVista will search for both lowercase and uppercase. But if any letter is in uppercase, it looks only for that exact capitalization. This comes in handy when you're searching for trademarked names, which very often include unique capitalization to distinguish them from ordinary words. For instance, if I search for eXcursion (with the X capitalized), as shown in Figure A, I get matches for a trademarked product made by Digital Equipment--and only that product. The unique capitalization makes my query rare, meaning that I get precisely the results I want on the first try.
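The capitalization rule can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not AltaVista's actual code; the function name case_rule_matches and the sample word list are assumptions made for the example.

```python
def case_rule_matches(query_word: str, document_word: str) -> bool:
    """Apply the capitalization rule described in the text (illustrative only)."""
    if query_word == query_word.lower():
        # All-lowercase query: accept any capitalization in the document.
        return query_word == document_word.lower()
    # Query contains an uppercase letter: require exactly that capitalization.
    return query_word == document_word


# Hypothetical words as they might appear on different Web pages.
words = ["excursion", "Excursion", "EXCURSION", "eXcursion"]

lower_hits = [w for w in words if case_rule_matches("excursion", w)]
exact_hits = [w for w in words if case_rule_matches("eXcursion", w)]

print(lower_hits)  # all four spellings
print(exact_hits)  # only the trademark-style spelling
```

The all-lowercase query casts the wide net; the mixed-case query isolates the trademarked form.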

At AltaVista, all punctuation is handled in the same way. It doesn't matter whether you type a period, a comma, a slash, an underscore, or a hyphen. This means you don't need to guess all the ways that people posting on the Web or in newsgroups may have mistyped a product or service name that includes punctuation. For example, Digital Equipment used to make a family of computers called the PDP-11. There were several models, including the PDP-11/20 and PDP-11/70. Many people would forget whether the model name used a hyphen, an underscore, a forward slash, or a backslash. If AltaVista matched punctuation literally--a period to a period and a slash to a slash--then to search for such a trade name you'd have to go to Advanced Search and enter a lengthy string of alternatives, such as

PDP-11/20 OR PDP_11/20 OR PDP.11-20

You'd have to try to imagine every way that people might have misspelled the model name and include all those variations in the string. But since all punctuation is treated equally, all you have to do is enter PDP-11/20 in either Simple or Advanced Search, and you'll get matches for all the possible variations of punctuation.
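One way to picture "all punctuation is treated equally" is to imagine every punctuation mark being replaced by the same separator before comparison. The Python sketch below makes that assumption explicit; it is not a description of AltaVista's internals, and the function name normalize_punctuation is invented for the example.

```python
import re


def normalize_punctuation(text: str) -> str:
    """Replace each run of non-alphanumeric characters with a single space.

    Assumption for illustration: any punctuation mark (and any whitespace)
    acts as an interchangeable separator between word parts.
    """
    return re.sub(r"[^0-9A-Za-z]+", " ", text).strip()


# Spellings of the same model name that people might post.
variants = ["PDP-11/20", "PDP_11/20", "PDP.11-20", "PDP-11\\20"]

normalized = {normalize_punctuation(v) for v in variants}
print(normalized)  # a single normalized form covers every variant
```

Because all four spellings collapse to the same form, one query stands in for the whole list of alternatives.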

Keep in mind that AltaVista generates a unique URL for each search you perform. So if you want to perform the same trademark search periodically, you can bookmark the results page and later simply click on that bookmark to go straight to AltaVista and launch the same search, getting fresh results. Also, in Advanced Search, you can set the time frame, limiting your search to a certain range of dates. So, for instance, if you wanted to recheck once a week or once a month for trademark infringements, you could bookmark a series of searches and then alter the range of dates to show only pages that had been posted since the last time you looked.
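The idea of a saved search with a moving date window might look something like the following Python sketch. Everything about the URL here is a placeholder: the host search.example.com and the parameter names q, from, and to are hypothetical and are not AltaVista's real URL format. The point is only that the query stays fixed while the date range advances from one check to the next.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, used purely for illustration.
BASE_URL = "https://search.example.com/query"


def recheck_url(query: str, last_checked: date, today: date) -> str:
    """Build a bookmarkable URL limited to pages dated since the last check."""
    params = {
        "q": query,
        "from": last_checked.isoformat(),
        "to": today.isoformat(),
    }
    return BASE_URL + "?" + urlencode(params)


# A weekly recheck: the window starts seven days before the current date.
today = date(1998, 2, 15)
url = recheck_url("eXcursion", today - timedelta(days=7), today)
print(url)
```

Each time you recheck, only the two dates change; the query itself is reused as is.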

Detecting plagiarism

Every day, the AltaVista crawlers fetch more than ten million documents, following the trail of the hyperlinks they find and continually refreshing the index, which now contains over 100 million documents. If something has been posted publicly on the Web and other Web sites have hyperlinks to that site, the chances are excellent that the content has already been indexed or will be indexed soon.

AltaVista indexes every word in each of those documents (with one minor exception: for extremely long documents, only the first 100 KB are indexed). This means that you can search not only for single words, but for phrases, sentences, and even paragraphs. If you place a series of words inside quotation marks, AltaVista (in both Simple and Advanced Search) looks for instances of all those words appearing in the same document in exactly that order. It doesn't matter if a given word is extremely common--"to be or not to be" works just fine. Hence, you can periodically test with sample elements of text to see if your work or the work of a client has been reused, with or without permission, anywhere on the Web.
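To make the phrase rule concrete, here is a minimal Python sketch of matching a quoted phrase against a page: the words must appear next to one another and in the same order. It is a simplified stand-in, not the search engine's algorithm, and the sample pages are invented.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(document: str, phrase: str) -> bool:
    """Return True if the phrase's words occur consecutively and in order."""
    doc_words = tokenize(document)
    phrase_words = tokenize(phrase)
    n = len(phrase_words)
    if n == 0:
        return False
    return any(doc_words[i:i + n] == phrase_words
               for i in range(len(doc_words) - n + 1))


# Hypothetical pages: one reuses the phrase, one merely shares its words.
suspect_page = "As the prince said: to be or not to be. That is the question!"
unrelated_page = "To not be late, be sure to leave early or take a cab."

found_in_suspect = contains_phrase(suspect_page, "to be or not to be")
found_in_unrelated = contains_phrase(unrelated_page, "to be or not to be")
print(found_in_suspect, found_in_unrelated)
```

The second page contains every word of the phrase, but not in sequence, so it does not match. That is why a distinctive sentence from a client's text makes a good test query.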

Also, you might consider "tagging" or marking Web pages that you believe others are likely to copy (because of the content or the layout). If you're trying to protect a Java applet, there's no way to search for the content/code of that applet. But you can search for the applet's name. Someone who copied an applet might very well use the same name, not realizing that it could be uncovered by a search. At AltaVista, in either Simple or Advanced Search, just enter applet: followed by the name of the file. The search query applet:* will yield a list of all the applets on the Web.

Similarly, AltaVista doesn't index graphical images, but it does index the names of images. And people who "borrow" images often keep the same name for the file. So a search for image: followed by the name of the file (if it is a unique/rare name) could very well find instances of copying.
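If you have many files to watch, you could generate the applet: and image: queries from a list rather than typing each one. The Python sketch below does only that; the file names are hypothetical, and the queries it prints would still be pasted into the search box by hand.

```python
def field_query(field: str, filename: str) -> str:
    """Join a field keyword and a file name into one query string."""
    return f"{field}:{filename}"


# Hypothetical file names from a client's site; rare names work best.
protected_files = {
    "applet": ["TickerTape.class"],
    "image": ["client_logo_v2.gif", "harbor-sunrise-97.jpg"],
}

queries = [field_query(field, name)
           for field, names in protected_files.items()
           for name in names]

for q in queries:
    print(q)
```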

Detecting "in-lining"

One way that some people "borrow" an image on the Internet is through a practice known as "in-lining." Instead of copying the image file and posting it on their own site, they include the complete address of your image as part of the structure of their Web page. They haven't "taken" your image in the literal sense of the word--it exists only on your machine, not theirs. But ordinary users viewing the infringer's page would see your work out of context and assume that it belonged to the infringer, and a photo that you intended for one purpose might be used in ways you didn't anticipate.

Also, keep in mind that many Internet service providers who offer Web-hosting services charge their customers based on traffic (above some limit). When someone "in-lines" your image, every time their page is accessed, your image comes up--adding to the traffic at your site, but not at the infringer's site.

If you see a sudden rise in traffic at your site, or notice a strange pattern in your site's statistics, such as an image getting many more hits than the page it is embedded in, you should do an AltaVista search (either Simple or Advanced) using the link: command followed by the complete address of the image in question. That's likely to uncover the infringer.
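The check on site statistics can be roughed out in Python as follows. The hit counts, the example.com addresses, and the 2-to-1 threshold are all assumptions for the sake of illustration; real server statistics would need to be read from your own logs. The sketch flags any image requested far more often than the pages on your site that embed it, and produces the matching link: query.

```python
def suspicious_images(page_hits, image_hits, embedded_in, ratio=2.0):
    """Return link: queries for images hit far more often than their pages."""
    queries = []
    for image_url, hits in image_hits.items():
        # Total hits for the pages on our own site that embed this image.
        own_page_hits = sum(page_hits.get(page, 0)
                            for page in embedded_in.get(image_url, []))
        # max(..., 1) avoids flagging on a zero baseline by multiplication alone.
        if hits > ratio * max(own_page_hits, 1):
            queries.append("link:" + image_url)
    return queries


# Hypothetical statistics for one reporting period.
page_hits = {"/index.html": 120, "/gallery.html": 40}
image_hits = {
    "http://www.example.com/img/logo.gif": 130,
    "http://www.example.com/img/harbor.jpg": 900,
}
embedded_in = {
    "http://www.example.com/img/logo.gif": ["/index.html"],
    "http://www.example.com/img/harbor.jpg": ["/gallery.html"],
}

flagged = suspicious_images(page_hits, image_hits, embedded_in)
print(flagged)
```

Here the logo's traffic tracks its page, while the harbor photo draws far more requests than its gallery page could explain, so only the photo is flagged.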

This is a new kind of problem and the law in this area is not clearly defined, but if the behavior is clearly damaging to you or your client, you should take action to stop it. And in most instances, a strongly worded letter from a lawyer should suffice.

Note of warning

Keep in mind that AltaVista wasn't specifically designed for this type of work. It is intended to provide useful results for tens of millions of ordinary users. If traffic is particularly heavy, the search engine will continue to provide results very quickly, but it might occasionally truncate the search to balance the load, without checking every page in its index. In most instances, that makes no difference: whether your query had 50,000 matches or 100,000, you'll check only the first few, and they'll probably have the information you need. But if you need to be sure that your search is exhaustive--which could certainly be the case in the legal profession--you would be well advised to do critical searches at off-hours (that is, when people on the West Coast aren't at work) and to run the same search on more than one occasion.
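When you do run the same search on several occasions, it helps to combine the results and note which matches showed up only some of the time. The Python sketch below assumes you have already saved the matching addresses from each run; the run labels and addresses are invented.

```python
def merge_runs(runs):
    """Union the URLs from several runs, recording which runs found each one."""
    seen = {}
    for run_label, urls in runs.items():
        for url in urls:
            seen.setdefault(url, []).append(run_label)
    return seen


# Hypothetical results from the same query run at two off-hours times.
runs = {
    "Mon 23:00": ["http://a.example/page1", "http://b.example/copy"],
    "Wed 06:00": ["http://a.example/page1", "http://c.example/mirror"],
}

merged = merge_runs(runs)
# Matches that at least one run failed to return.
missed_somewhere = sorted(url for url, labels in merged.items()
                          if len(labels) < len(runs))

print(len(merged), "distinct matches")
print(missed_somewhere)
```

Any address that appears in one run but not another is a reminder that a single search may not have been exhaustive.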

As a rule, consider AltaVista an excellent source of positive evidence. You'll find information that was otherwise unattainable. But don't put too much weight on the fact that a given query produced zero matches. Just because you didn't find it doesn't mean that it doesn't exist. You might have made an error in your query, some transient traffic-related error might have kept a particular match from being found, or the page(s) in question might not yet be in the index.

Notes

I'll continue this discussion in the next issue of Internet Search Advantage. Please send me your questions, tips, and the creative approaches you've tried. You can reach me at seltzer@samizdat.com.



Richard Seltzer is an independent Internet writer/speaker/consultant.

