One problem still haunts the peer-to-peer (P2P) world: how
the hell do you find anything? Traditional
search engines are impractical because by the time a P2P
network has been spidered, the makeup of the network and
the content will probably have changed. Real-time keyword
searches are too slow. Yahoo-style index pages don't cut
it. We need some crazy new ideas. Enter Scoundrel...
Searching P2P networks sucks. With
systems such as Napster and Gnutella, you put in keywords, or partial
or full titles or artist names, and you get a big list of crap back. There are
always tons of redundant results to sift through, and you have to decide which
file to download. Once you finally decide on what to snarf, the file transfer
bombs out halfway through, so you have to try again. This all assumes that
you know exactly what you're searching for. Usually there is no
cross-indexing.
Then there's the "catch as catch can" problem. Because P2P nodes connect to
and disconnect from the network at unpredictable times, the
availability of resources is constantly in flux. You might be able to find
something one day, but the next day the same thing might not be available. If
you really want to find something, you may need to search for it over and over
again.
Some P2P systems barely have any way to search at all. To discover what's on
Freenet, for instance, you
generally have to look through big text files full of "keys" (Freenet's version
of hyperlinks).
So what can be done about this?
The Scoundrel Project has developed a system for automating the process of
searching, re-searching, and downloading things from P2P networks. The core
idea is to have what the author of Scoundrel calls a "linkless index." This
index, rather than being an index of what actually is on the network, is
an index of what theoretically, or ideally, should be on the network.
It's "linkless" because the actual index entries do not directly point to
resources on the network.
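To make the idea concrete, here is a minimal Python sketch of what one entry in a linkless index might hold, assuming a music index like the one Scoundrel uses. The names (`IndexEntry`, `queries`) are hypothetical and are not taken from Scoundrel's code. The point is that an entry stores only descriptive metadata (artist, album, tracks) and no host, path, or URL, so the only thing it can produce is a set of search terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """A hypothetical linkless-index entry: it describes a work but holds
    no hostname, file path, or URL pointing at a copy on the network."""

    artist: str
    album: str
    tracks: tuple[str, ...] = ()

    def queries(self) -> list[str]:
        """Turn the description into keyword searches, one per track.

        With no track listing, fall back to searching for the album title."""
        if not self.tracks:
            return [f"{self.artist} {self.album}"]
        return [f"{self.artist} {track}" for track in self.tracks]


entry = IndexEntry("Some Artist", "Some Album", ("First Song", "Second Song"))
print(entry.queries())  # ['Some Artist First Song', 'Some Artist Second Song']
```

Since nothing in an entry refers to a particular peer, nothing in it goes stale when that peer logs off.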
The user can browse this index and select things that he or she would like to
retrieve, regardless of whether or not the files are actually available on the
network at that particular time. Later on, an agent (a bot, or what have you)
retrieves the files. The agent does all the shit work of searching,
re-searching, selecting the right file, downloading, retrying, etc., all in the
background, or while you're away doing something fun. After all, that's what
computers should be doing -- tedious drudge work.
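The agent's drudge work can be sketched as a loop over a wish list. This is a minimal Python illustration, not Scoundrel's actual code: `search` and `download` are hypothetical callables standing in for whatever a real P2P client provides, and the rule for choosing among redundant results (try the largest copy first, on the guess that it is least likely to be truncated) is an assumption made for the example. Anything that turns up nothing, or whose transfers keep bombing out, stays on the list for the next pass, which is one way to cope with the "catch as catch can" problem.

```python
import time
from typing import Callable

# A search hit as (peer, filename, size_in_bytes); an assumed shape.
Hit = tuple[str, str, int]


def run_agent(
    wanted: list[str],
    search: Callable[[str], list[Hit]],
    download: Callable[[Hit], bool],
    passes: int = 3,
    tries_per_pass: int = 2,
    pause_seconds: float = 0.0,
) -> tuple[list[str], list[str]]:
    """Search, pick a copy, download, and retry, returning
    (fetched, still_wanted). Queries that fail in one pass are searched
    for again in the next, since peers come and go."""
    fetched: list[str] = []
    pending = list(wanted)
    for attempt in range(passes):
        still_pending: list[str] = []
        for query in pending:
            # Assumed heuristic: among redundant hits, try the biggest first.
            hits = sorted(search(query), key=lambda h: h[2], reverse=True)
            # any() stops at the first download that succeeds.
            if any(download(hit) for hit in hits[:tries_per_pass]):
                fetched.append(query)
            else:
                still_pending.append(query)
        pending = still_pending
        if not pending:
            break
        if attempt < passes - 1:
            # A real agent might wait minutes or hours between passes.
            time.sleep(pause_seconds)
    return fetched, pending
```

Keeping the network behind two plain callables means the same loop could, in principle, sit on top of any P2P client.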
This idea has some interesting ramifications. Because the index is
disconnected from the actual files on the network, no checks need to be
performed to ensure that the index accurately reflects the contents of the
network. Thus the index can be more complete, can be maintained independently,
and can be kept up to date more conveniently. And with a big, comprehensive index, it
will be easier to have good cross-referencing.
What's more, the user is free to browse the index at high speeds, selecting
things willy-nilly like a kid in a candy store, without waiting for a search to
complete or a download to finish, and without being disappointed when a real-time search
turns up an empty result set. Everything is ultra-responsive, and it's a better
user experience.
Another implication of this strategy is that the network doesn't have to be
fast. Some P2P systems are blecherously slow right now. This will certainly
change as the software develops and the systems get more users, but at the
moment the wait to get a file can be maddening. But who cares if you're not
doing all the waiting yourself?
There've also been strange and evil rumblings from the Digital Millennium Copyright Act (DMCA)
people about how innocent hyperlinks to net resources containing copyrighted
material will be considered some kind of horrible copyright infringement in
themselves, punishable by hanging and whatnot. This could put an ugly chill on
the whole Internet. A linkless index steps around this issue quite handily.
Although this linkless index strategy can be used for all kinds of data, it is
obviously well suited for digital music trading (e.g., MP3s), which seems to be
the focus of many of the P2P projects right now. So to build its
proof-of-concept application, the Scoundrel Project decided to use an existing,
highly developed database for its linkless index: Amazon.com. Amazon's music
index is huge, cross-indexed, chock-full of user reviews, and has all sorts of
handy features which make it great for browsing music titles.
Here's how the Scoundrel program works: You fire it up and configure it to
know about several OpenNap
servers -- the open source clone of the Napster system. Currently, Scoundrel
only works with OpenNap. Next, you use Scoundrel's built-in Web browser to
navigate Amazon's music section. Scoundrel watches as you browse, and when you
visit the description page for a CD, Scoundrel automatically picks up the
title, artist, and track listings. You are given an opportunity to review the
list of stuff that Scoundrel has created, and to modify and delete items. When
you are ready to have Scoundrel go to work for you, you hit the "get'em"
button, and it crawls the various OpenNap servers looking for MP3s of the music
you want. Then you can minimize Scoundrel and play some NetHack or whatever, or continue to browse
Amazon for even more goodies.
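How Scoundrel actually crawls its OpenNap servers isn't documented here, but the general shape of that step can be sketched. The Python below is a hypothetical illustration, assuming a `query_server(server, query)` function that returns (filename, size) hits from one server; that function, the server names, and the choice of (filename, size) as the deduplication key are all assumptions. Asking every server concurrently keeps one slow or dead server from holding up the rest, and merging on (filename, size) collapses the identical copies that show up on several servers at once.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical per-server search: (server, query) -> [(filename, size), ...]
QueryFn = Callable[[str, str], list[tuple[str, int]]]


def search_all(
    servers: list[str], query: str, query_server: QueryFn
) -> dict[tuple[str, int], list[str]]:
    """Ask every server at once and merge the hits.

    Returns a map from (filename, size) to the servers offering that copy,
    so duplicates across servers collapse into a single entry."""

    def safe_query(server: str) -> list[tuple[str, int]]:
        try:
            return query_server(server, query)
        except OSError:
            # An unreachable server simply contributes nothing this round.
            return []

    merged: dict[tuple[str, int], list[str]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        # pool.map yields results in the same order as `servers`.
        for server, hits in zip(servers, pool.map(safe_query, servers)):
            for hit in hits:
                merged.setdefault(hit, []).append(server)
    return merged
```

Knowing which servers carry the same copy also gives a retry loop somewhere else to go when one transfer fails.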
This works surprisingly well. As a test, I chose a few CDs from Amazon's "Top
Sellers" list, set Scoundrel loose, and went to breakfast. When I came back
there were at least two complete CDs in MP3 form on my hard drive, and several
partial CDs. And Scoundrel was still out busting ass for me. Anyone who has
ever spent all night on Napster trying to put together an entire track list of
MP3s knows how cool that is.
Scoundrel isn't perfect, and neither is the linkless index idea. Even though
there are copious widgets and screens ostensibly indicating what the program
is doing, it's hard to figure out. Sometimes it seems to just hang, and
sometimes it doesn't seem to search for all of the things you tell it to. But
after all, it is just a proof of concept. The author calls it a "technology
preview." And while a comprehensive index of recorded music already exists, and
there are other databases for things such as movies (e.g., IMDb), how do we deal with P2P resources that
aren't already in a nice tidy index somewhere? And how would you
create an index for that stuff?
Another drawback is that Scoundrel runs only on Windows. It's an open source
project under the GNU GPL,
but it's written in some sick language like Delphi (which may be excusable
considering that it's only a prototype).
Despite these concerns, I give a big warm Beaujolais to the Scoundrel Project!
There is one last intriguing thing to say about Scoundrel. The author of the
program is a mystery man. He remains anonymous to this day. On March 1st,
just after releasing the latest incarnation of Scoundrel, he posted a message
on the Scoundrel home page announcing that he is abandoning all work on the
project, and will never be heard from again, although he hopes that others will
continue work on the project. This is from the Scoundrel web page:
Well, so much for what Scoundrel has and has not done. As of today, March
1st, 2001, I will no longer be able to continue development on Scoundrel. I'll
be disappearing from the face of the earth and will not be reachable. I will
not go into the reasons behind this.
Could it be that the big media companies got to him too? Is the RIAA playing hardball behind the
scenes? Will we ever know?
In the meantime, give Scoundrel a whirl.
Check it out yourself
quadratic@pigdog.org