July 22, 2008

How to protect yourself from DNS hacks

A couple weeks ago, Dan Kaminsky found a flaw in DNS.  Without getting into details, this flaw enables a malicious attacker to fool your web browser into connecting to the wrong computer to get your web pages.  So when you type www.facebook.com into your browser, you might actually go to Joe Hacker's site, even though your browser says http://www.facebook.com/ in its address bar just like it should.

Dan, being a "good guy", tried to keep the details of this hack quiet for long enough for network operators to patch their systems and close the loophole.  He wanted everybody running a DNS server to do this before the "bad guys" figured out what the bug is and started to take advantage of it.  He was hoping for 30 days of time to prepare, but somebody spilled the beans after 13 days, and now the hackers are off and running.

It's a jungle out there

You might be asking, So what? What are the dangers of being directed to the wrong website?  Of course, you could read incorrect news and that's not great.  More likely you're going to have your password stolen for whatever site you log into.

The obvious targets are sites like PayPal or banks, but they're actually safe from such attacks if you use your browser properly.  Any financial site will use a secure connection.  You can tell because of the https:// at the beginning of their address.  These sites use a digital certificate that your browser checks to verify their authenticity.  All this happens independently of the DNS system.

But you can still connect to a hacked site with https.  Your browser will probably warn you saying something about a certificate not matching.  More often than not these errors occur because of a lazy sysadmin or something.  But right now, I strongly advise you to take all HTTPS warnings seriously.

Protect yourself

If you want to be sure you're safe, manually connect your machine to OpenDNS, as Dan recommends.  We know they're patched and can take the traffic.  I'll give you the steps to do this on Windows:

1. Start menu
2. Control Panel
3. Network connections  (might have to switch to "classic view")
4. Select the one you're actually using.  It's likely called "local area connection".
5. Click Properties on the status dialog
6. Scroll down in the list of checkboxes, and select "Internet Protocol (TCP/IP)" so that it's highlighted.  (Leave it checked!)
7. Click Properties
8. In the first General tab, change the second radio-button from "Obtain DNS server address automatically" to "Use the following DNS server addresses:"
9. For Preferred DNS server, enter: 208.67.222.222
10. For Alternate DNS server, enter: 208.67.220.220

That's mostly it, but to be safe, you should reboot, restart your browsers, and/or:

11. (Windows key+R).  In the dialog type  "ipconfig /flushdns" (without the quotes) and hit okay.
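
If you'd rather script it than click through dialogs, here is a rough sketch of the same change.  It only builds the command lines and prints them; it doesn't run anything.  I'm assuming the usual XP-era "netsh interface ip" syntax and a connection named "Local Area Connection", so check both against your own machine before pasting the output into an administrator command prompt.  The function name is just something I made up for illustration.

```python
# Sketch only: builds the command lines for the steps above, executes nothing.
# Assumptions: XP-era "netsh interface ip" syntax, and a connection named
# "Local Area Connection" (yours may be called something else).

OPENDNS_PRIMARY = "208.67.222.222"
OPENDNS_SECONDARY = "208.67.220.220"


def opendns_commands(connection="Local Area Connection"):
    """Return the commands that mirror steps 8-11, in order."""
    return [
        # Steps 8-9: switch to a fixed preferred DNS server.
        f'netsh interface ip set dns "{connection}" static {OPENDNS_PRIMARY}',
        # Step 10: add the alternate DNS server as the second entry.
        f'netsh interface ip add dns "{connection}" {OPENDNS_SECONDARY} index=2',
        # Step 11: throw away any cached (possibly poisoned) answers.
        "ipconfig /flushdns",
    ]


if __name__ == "__main__":
    for line in opendns_commands():
        print(line)
```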

XMPP PubSub: a great complement to Atom/RSS

I spent the day yesterday at XMPP Summit #5 alongside OSCON in Portland.  It was a great chance to catch up with old friends and meet a few new ones.  But my favorite part was the break-out discussion of XMPP PubSub as it relates to micro-blogging.  We discussed what hopefully will emerge as a standard way to associate an existing Atom/RSS feed with an XMPP PubSub Node.  First some background on the relevant technologies.  Feel free to skip ahead if you understand this stuff.

PubSub 101: Push vs Pull

PubSub is short for "publish subscribe" which is a common design pattern describing a way to distribute information to interested parties.  The publisher tells a server about new information, and the server fans the information out to everybody who has registered interest in that topic or channel.  Data consumers find out about the new information very quickly, with relatively little load on the whole system, since the pubsub mechanism provides a means to "push" data to them. 
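
To make the pattern concrete, here is a toy in-memory version.  It is a sketch of the general publish-subscribe idea only, not of XMPP or XEP-0060; the Broker class and its method names are mine, invented for illustration.

```python
# Toy publish-subscribe broker: the publisher tells the broker once,
# and the broker fans the item out to everyone who registered interest.
from collections import defaultdict


class Broker:
    def __init__(self):
        # topic name -> list of callbacks to invoke on each new item
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register interest in a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, item):
        """Push the item to every subscriber; return how many were notified."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(item)
        return len(callbacks)


broker = Broker()
inbox_a, inbox_b = [], []
broker.subscribe("leopd", inbox_a.append)
broker.subscribe("leopd", inbox_b.append)
delivered = broker.publish("leopd", "Heading to Broadway for a bite")
```

Notice that the consumers never ask anything.  They register once, and new items simply show up in their inboxes.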

By contrast, almost all of the web today uses a "pull model" where a data consumer only finds out about new information when it gets around to checking if there is something new.  This data distribution model is significantly simpler because the server only needs to keep track of the content, not who is interested in knowing about it.  Modern networks are optimized for this kind of query-based traffic where data consumers (clients, web browsers) initiate connections to servers, such that it's often impossible for servers to initiate connections to clients because of firewalls or NAT.

The downside of the pull model is that the only way a data consumer can find out if anything is new on the server is to "check back frequently" or "poll" the server for changes.  If you want to know within 15 minutes if anything new has been posted, you have to ask the server at least every 15 minutes "anything new?"  No.  "How about now?"  No.  "Got anything yet?"  No.  Multiply this by potentially millions of interested data consumers and you end up spending a lot of network bandwidth and server resources doing very little.  Even worse, the problem scales horribly.  If clients want to know about changes within 5 minutes instead of 15, that puts 3 times the load on the server.  Want to know within a few seconds?  Forget it -- the servers would crash.  There's an intrinsic delay in distributing information in this model, and reducing that delay is very expensive.
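
Here is the back-of-the-envelope arithmetic as a few lines of code.  The one million clients and the 5-second interval are numbers I picked for illustration; the 15- and 5-minute intervals are the ones from the paragraph above.

```python
# Polling load: every client asks "anything new?" once per interval,
# whether or not anything has changed.
SECONDS_PER_DAY = 24 * 60 * 60


def polls_per_day(clients, interval_seconds):
    """Total requests the server answers per day."""
    return clients * SECONDS_PER_DAY // interval_seconds


clients = 1_000_000                               # assumed audience size
every_15_min = polls_per_day(clients, 15 * 60)    # 96 polls per client per day
every_5_min = polls_per_day(clients, 5 * 60)      # 288 polls per client per day
every_5_sec = polls_per_day(clients, 5)           # 17,280 polls per client per day

print(every_5_min / every_15_min)   # 3.0, the "3 times the load" above
print(every_5_sec / every_15_min)   # 180.0
```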

XMPP as an alternative to polling

XMPP is the protocol used for Instant Messaging by Google Talk and Jabber and a large number of small servers.  In order to deliver instant messages, XMPP systems maintain persistent connections between all machines that allow packets of data to be pushed with very low latency -- IM messages are typically delivered within a second of sending them.  So it's natural to want to use this infrastructure to deliver other data more efficiently than through polling HTTP.

The XMPP PubSub spec known as XEP-0060 describes how to do exactly this at the protocol level.  But for a variety of reasons, this standard has not gained wide adoption.  IMHO the biggest reason is that there isn't a very pressing need.  The current system is horribly inefficient, but it works.  Moreover, it puts the burden of inefficiency squarely on the shoulders of the information publishers.  Popular publishers are just expected to shell out for the necessary hardware to meet the demands of their readers, and with advertising they can typically recoup the necessary investment.

Another way to state that is that pubsub hasn't found its niche yet.  IMHO this is partly because the mechanism is so useful it can be applied to almost anything.  Not just breaking news, but everything from e-mail mailing lists to doorbell chimes get used as examples of how XMPP pubsub technology could be applied.  Not wanting to exclude any of these potentially interesting uses, the protocol remains very generic.

Micro-blogging, Atom and Yesterday's Realization

One place where the current HTTP model breaks down is micro-blogging, which is the generic term for services like Twitter or Facebook's updates.  Here, the payload of actual content is very small, so the overhead of checking far outweighs the "useful data" which is delivered.  Also, because the information is social (i.e. "Heading to Broadway for a bite -- wanna come?") consumers demand it be delivered quickly.  Nonetheless, current micro-blogging services still rely on polling clients, and their servers suffer as a result.

Yesterday, a group of us including Blaine Cook, Anders Conbere, Evan Prodromou, and XEP-0060 co-author Ralph Meijer were discussing how to apply XMPP PubSub to micro-blogging.  This was likely obvious to many there already, but during the discussion I had a realization.  We aren't solving this problem from whole cloth.  RSS and Atom feeds already describe all the information we need.  We just need to find a way to substitute XMPP for the assumed transport HTTP.

So we discussed mechanisms for mapping an Atom URL to an XMPP PubSub Node.  (We pretty much ignored RSS because RSS isn't as cool for reasons I really don't understand.)  We talked about putting a link-rel tag in the feed to point to the XMPP PubSub node.  This would look something like 

   <link rel="alternate" type="xmpp/pubsub" href="xmpp:twitter.com?;node=users/leopd" />

Even better, the URL for these nodes should be guessable from the URL for the HTTP feed.   The above node would be the normal place to look for the pubsub version of http://twitter.com/leopd.  Even though it's not as generic and robust to have a standard mapping like this, I think it's an important way to speed adoption of a new standard.  The code to do a bit of string manipulation is vastly easier than fetching the URL and looking for a link-rel tag.  And developers are intrinsically lazy (for good reasons!) so making things easier for them means they'll succeed a lot more.
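
Here is a sketch of both approaches side by side.  Nothing here is standardized: the "users/" node prefix and the "xmpp/pubsub" link type are just what the example above uses, the function names are mine, and the sample feed is made up so the snippet runs without touching the network.

```python
# Two ways to get from an HTTP feed to a PubSub node (both hypothetical).
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def guess_pubsub_node(feed_url):
    """String manipulation only: http://host/user -> xmpp:host?;node=users/user"""
    parts = urlparse(feed_url)
    user = parts.path.strip("/")
    return f"xmpp:{parts.netloc}?;node=users/{user}"


def find_pubsub_link(atom_xml):
    """The longer route: parse a fetched Atom feed and look for the link tag."""
    root = ET.fromstring(atom_xml)
    for link in root.iter(ATOM_NS + "link"):
        if link.get("type") == "xmpp/pubsub":
            return link.get("href")
    return None


# Stand-in for a feed you would otherwise have to fetch over HTTP first.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>leopd</title>
  <link rel="alternate" type="xmpp/pubsub" href="xmpp:twitter.com?;node=users/leopd" />
</feed>"""

guessed = guess_pubsub_node("http://twitter.com/leopd")
discovered = find_pubsub_link(SAMPLE_FEED)
```

The guess needs no network round trip at all, which is the whole appeal.  The link tag is still the fallback for feeds that don't follow the convention.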

Ever pragmatic, Blaine pointed out that we should use HTTP for things it is good at, and not re-invent them in XMPP.  I wholeheartedly agree.  Re-transmission is a key example.  What happens if a client is offline when a new post happens, and so never hears about it?  Answer: The clients should fetch the historic archive of the feed over HTTP.  These feeds exist today -- no need to improve on them.  If all the posts have sequence numbers on them, then it's easy to figure out if you've missed one.  So all the posts from a user should have sequence numbers.  I don't think this is standard in Atom feeds today.
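
A sketch of how a client might use sequence numbers to notice a gap and backfill it.  Everything here is hypothetical: Atom has no standard sequence number that I know of, and fake_http_archive is a stand-in for fetching the existing feed archive over HTTP.

```python
# Gap detection with per-user sequence numbers (hypothetical scheme).

def missing_sequence_numbers(last_seen, just_received):
    """Sequence numbers we never heard about between two pushes."""
    return list(range(last_seen + 1, just_received))


class FeedReader:
    def __init__(self, fetch_archive):
        self.last_seen = 0
        self.posts = {}                      # sequence number -> post text
        self._fetch_archive = fetch_archive  # callable: [seq, ...] -> {seq: text}

    def on_push(self, seq, text):
        """Handle a post pushed over XMPP; backfill anything missed over HTTP."""
        missed = missing_sequence_numbers(self.last_seen, seq)
        if missed:
            self.posts.update(self._fetch_archive(missed))
        self.posts[seq] = text
        self.last_seen = max(self.last_seen, seq)


# Stand-in for the historic archive that already exists over HTTP.
ARCHIVE = {1: "first", 2: "second", 3: "third", 4: "fourth"}


def fake_http_archive(seqs):
    return {seq: ARCHIVE[seq] for seq in seqs}


reader = FeedReader(fake_http_archive)
reader.on_push(1, "first")
# ...client goes offline and misses posts 2 and 3...
reader.on_push(4, "fourth")
```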

The story unfolds...

There's a lot more to be worked out and standardized here.  And clearly many more people need to voice their opinions before we can reach consensus.  Sadly I can't be down in Portland today to continue the discussion, so this post will have to take my place as I return to my regular daily commitments.  If you'd like to stay tuned as the story unfolds, you'll have to poll this site, as I can't yet give you a PubSub node to subscribe to for updates.  If I could it would probably be something like xmpp:embracingchaos.com?;node=xmpp -- try it.  By the time you read this, it might be working!

July 19, 2008

Why Evolution Runs Backwards in the Refrigerator

Evolution-like processes exist in many places beyond genetic adaptation of biological species.  We see similar processes in a great many aspects of modern life, generally running many orders of magnitude faster.  Much of economics and business is governed by processes that select for the most successful product or business model or manufacturing process or organizational structure.  Successful practices thrive and out-compete ones which are less effective at meeting human needs and desires.  Warfare has very obvious parallels.  In computer science, user interfaces, programming languages and system architectures all evolve by analogous processes.  Similar effects can be found in governments, religions, cell phone design or city planning, just to name a few more.  The basic idea that human choices lead to faster propagation and increased presence of BETTER STUFF can be seen almost everywhere.  Except in our refrigerators.

Open your fridge.  If you've lived with that fridge for a while, there's a good chance it looks something like mine does.  Shelf upon shelf of half-used bottles and jars of long-lasting meta-foods.  Condiments, salad dressings, jellies, beverages, chutneys, nut butters, salsas, pickled vegetables, etc.  We expect our fridges to be full of food, so this doesn't in itself challenge the evolutionary principle of selection.  But taking an inventory shows that there is a strong bias towards foods we don't actually like.  In fact, the typical selection process in our refrigerators tends to concentrate foods we don't like, running backwards from what intuitively should be evolution towards a selection of our favorite foodstuffs.  For a couple of very understandable reasons, that just doesn't happen.

Consider salad dressings.  Most of us like to have some choices when we're topping our raw vegetables.  So when we're at the store, we don't just buy the one salad dressing we like, but will often try a new variety.  There's a documented psychological principle called Variety Seeking that encourages diversity in buying because people want to explore different choices.  But what happens when we buy a variety we don't particularly enjoy?  Like that orange blossom vinaigrette or the honey mustard that's just a bit too thick and sweet.  We try it once, form an opinion, and the next time we have salad we go for the old-reliable Goddess dressing.  So it lingers.  But we don't throw it away.  Because there's nothing WRONG with it.  Besides, one day when we have guests over they might prefer a syrupy honey-mustard dressing.  Or maybe we could dip chicken knuckles into it or something.  Plus the combination of preservatives, low temperature and food that doesn't promote bacterial growth in the first place means it can stay edible for years.  So the continued presence of these rejects provides some small marginal benefit of choice.  The only real alternative is throwing them away  (which makes us feel guilty) since there's no secondary market for used condiments.

Beyond choice, they do provide marginal benefit in terms of ballast for heat capacity.  Refrigerators run more efficiently when they're full since there's a larger thermal mass which is more stable.  But this assumes the fridge has ample space for the food that is being cycled through and consumed.  In many households the need to find space for food you're actually going to eat creates a selection pressure to remove such undesirable foods.  But the door of the fridge is a niche environment that isn't very well suited to large, short-lived main courses and thus things like eleven different varieties of mustard tend to thrive.

What's the take-home lesson here?  How do we fight this scourge on our palates?  Actually I don't think it's that big of a problem.  When we need space in the fridge, we find it.  But otherwise we collect things like Mang Thomas All Purpose Sauce, and pickled cherry peppers.   If clutter bothers you, resist the temptation to try something new and stick with something you know you'll use.  Heck, get a really big bottle.  Or look for similar reverse-evolutionary processes in your medicine cabinet, liquor shelf, or office supplies, and be conscious that you have the power to change things.  Or just accept that sometimes human nature tends to concentrate our surroundings with things we don't actually like.

July 09, 2008

Recovering a RAID Array after Lightning

The EVMS RAID 5 array in my Linux fileserver crashed recently due to a lightning storm, and I thought I'd lost everything.  But with some luck and intuition I was able to recover all my files.  I'll tell you how I did it, so hopefully others who run into similar problems can recover their data too.  But first, a little background.

Last week Seattle had some crazy electrical storms.  In recent years' storms, my block has done better than most with respect to power failures, making me think we're either lucky or in a particularly robust section of the grid.  So I was a little surprised to find my whole house offline on Wednesday morning.  After a bit of debugging I figured out that the small UPS that runs all my networking gear got toasted, and for some reason the file server was down.

I left it alone for several days, and when I got around to turning it back on, I was happy that the whole stack through the samba server came up by itself.  (It doesn't always!)  But when I started looking around I quickly realized things were amiss.  The media/video directory normally has 4 subdirectories: movies, episodic TV, imake and other.  But today it listed:

    leo@elephant:/raid/shares/media/video$ ls
    dpisndic TV  hmakd  movies  nther

WTF!?  A few bits had been scrambled in the directory names.  This sounds really bad.  Moreover, even though the first couple levels of the directory hierarchy were there, no files were to be found.  Definitely a problem.

Step 1: As soon as you suspect your RAID array has a problem, stop writing to it until you know what's going on.  Writing changes can make things worse.  Stop the bleeding.   

I unmounted the drive from my Mac, not trusting Finder or Spotlight to refrain from sprinkling damaging meta-files over the array.  Once I remembered how to ssh into the box, I stopped the samba daemon,

    leo@elephant:/$ sudo /etc/init.d/samba stop

unmounted the filesystem

    leo@elephant:/$ sudo umount /raid

and changed fstab so it would be read-only when it comes back, and that it wouldn't come back without me asking.

    leo@elephant:/$ sudo vi /etc/fstab

changing

    /dev/evms/teraraid500 /raid ext3 defaults  0 0

to

    /dev/evms/teraraid500 /raid ext3 ro,noauto  0 0

I tried poking around in EVMS by running

    leo@elephant:/$ evmsn

But it hung during initialization with a blue dialog saying "Discovering segments..."  I'm thinking EVMS can't help me.  After a bit of googling I thought I should try e2fsck or some such.  First, I tried to mount it again read-only to see what's there.

    mount: wrong fs type, bad option, bad superblock on /dev/evms/teraraid500,
           missing codepage or other error
           In some cases useful info is found in syslog - try
           dmesg | tail  or so

Bad superblock.  Uh oh.  Well this guy managed to recover a drive with a bad superblock.  Lots of things were pushing me in this direction -- fix the filesystem.  But I realized that was a mistake.

Step 2: Do not make changes at the filesystem level until you're confident that the RAID array is working properly.  You set up RAID for a reason.  You've still got a chance to recover everything, but if you start making changes to it in a broken state, you're almost certainly going to make things worse.

Me to self: Think about it.  EVMS is confused.  Linux is confused.  Ext2 and ext3 are messed up complaining about bad superblocks.  The problem was caused by lightning.  When the drive was mounted there were weird bit-level corruptions in the data that were still there.  Maybe one of the drives in the array got its data scrambled, but didn't get fragged badly enough to go offline.  RAID 5 is designed to survive total loss of a single drive.  But if a drive gets corrupted while staying online, who knows what will happen.  So I came up with this plan:

Step 3: Try physically disconnecting the drives in your array, one at a time.  If only one of them is scrambled, disconnecting it should restore all the data in the array.
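
Why would pulling a drive help?  Here is a toy model of RAID 5 parity for a three-drive array like mine.  It is a sketch of the XOR idea only, not of how EVMS or the Linux md driver lays out real stripes, and the 8-byte "blocks" are made up for illustration.

```python
# Toy RAID 5 stripe: two data blocks plus one parity block (their XOR).

def xor_blocks(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


drive1 = b"episodic"                  # data block on drive 1
drive2 = b"movies  "                  # data block on drive 2
parity = xor_blocks(drive1, drive2)   # parity block on drive 3

# Lightning flips one bit on drive 1, but the drive stays online.
scrambled = bytes([drive1[0] ^ 0x01]) + drive1[1:]

# With all three drives present, the array has no reason to doubt drive 1,
# so a normal read just hands back the scrambled block.
read_with_bad_drive_online = scrambled          # b"dpisodic"

# Pull drive 1 and the array runs degraded: its block is rebuilt
# from the two healthy drives instead.
read_with_bad_drive_removed = xor_blocks(drive2, parity)
```

A single flipped bit turns "e" into "d", which is exactly the kind of damage in my directory listing.  And the rebuild only works if the drive you pull is the scrambled one, hence trying them one at a time.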

Having followed my own advice, it's easy for me to tell the drives in my array apart since each drive in the RAID array is from a different manufacturer (which makes array failure due to manufacturing defects far less likely). 

This plan actually worked perfectly!  Removing a drive caused a bit of a hassle in getting the machine back up, because when I booted, it couldn't find the /boot partition, complaining:

     * Starting Enterprise Volume Management System...
    [42949392.340000] raid5: raid level 5 set md1 active with 2 out of 3 devices, algorithm 0

    * Checking all filesystems...
    fsck.ext3: No such file or directory while trying to open /dev/sdd5
    /dev/sdd5:
    The superblock could not be read or does not describe a correct ext2 filesystem. 
    If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else),
    then the superblock is corrupt, and you might try running e2fsck with an alternate superblock:
        e2fsck -b 8193 <device>

Notice the complaint about the superblock again -- don't trust it, and don't do what it says!  What really happened was that the boot drive letter had been changed from /dev/sdd to /dev/sdc, so I had to change /etc/fstab to mount /boot from  /dev/sdc5 instead of /dev/sdd5.  In my system, I boot off a non-RAID disk attached to the mobo, which for some annoying reason gets the last drive letter after all the drives on the SATA card.

But once I got past this, it quickly turned out that the Samsung drive was the culprit.  With it removed, the software RAID kicked in and plugged all the holes.  Everything in the array looked completely normal again.  All the directories.  All the files.  Hooray!

July 06, 2008

Externalities of the Colombian Hostage Rescue

This last week there was a lot of news coverage of a "daring hostage rescue in Colombia."  Fifteen people were freed from the FARC.  Many had been held captive for years, including politician Ingrid Betancourt, and three Americans.  The press has been celebrating the victory along several lines.  How wonderful it is for these people to be set free after years of captivity.  How the US military helped plan and support the operation.  How the guerrillas were fooled into giving the hostages up without firing a single shot.  (Aren't we smart!  Aren't they stupid?)

But there's a dark side to this rescue that I haven't seen anybody discuss.  The reason the guerrillas allowed those hostages to get on that helicopter without firing a shot is that they thought it was operated by a humanitarian group.  It's true that the operation relied on intercepted communications and a spy in the FARC's command structure.  But it also relied on having a military helicopter painted white and its crew claiming to be apolitical.  The press even describes the acting lessons the soldiers took to pretend to be NGO workers.  Oh those foolish rebels who fell for such a simple trick by trusting aid workers.  What dupes!

Now look at this from another angle.  Imagine you really are an NGO worker, trying to provide some kind of support service to remote Colombia.  How does knowledge of an operation like this make you feel?  Scared, probably.  From now on, rebels are going to doubt the legitimacy of all NGO workers.  They might think you're in the Colombian military trying to take advantage of them again.  They might even start shooting down Red Cross helicopters.  The negative externality of this rescue is that all legitimate humanitarian work in the area has just gotten a lot more difficult and dangerous.

So as Santos brags that this rescue "will go down in history for its audaciousness and effectiveness" he ignores the fact that he just cashed in a bunch of good will to make this happen.  This stuff doesn't grow easily like coca plants.  I'm glad those people have their lives back, but I am in no way convinced it was worth the sacrifice.  What's going to happen next time there's a public health crisis in the area?  The moral calculus is undoubtedly complex.  But ask yourself, would you trade the freedom of a dozen captives (including three Americans) for risking the well-being of many thousands of needy individuals?  How about for the lives of a half dozen International Red Cross workers murdered by suspicious rebels?

July 02, 2008

Google launches web chat client for iPhone

Wonder what I've been up to at work lately?  Here's a tiny glimpse.

Google just launched another way to access the Google Talk network.  It's a web-based instant messaging chat client optimized for the iPhone browser.  It's not my primary project or my secondary or tertiary, but I did write a blog post about it and made sure the whole thing got out the door today.

If you have an iPhone, try it out at www.google.com/talk.  Warning: Non-iPhone browsers will be directed away.

June 15, 2008

Shoulder Surgery

A bit over a week ago I had surgery to keep my arm from falling off.  It's happened at least a half dozen times in the last couple of years -- while snowboarding, rock climbing or climbing Mt Rainier.  Then the attachment became really weak and it would come off for no good reason at all -- just taking off a backpack or even reaching for a glass of water.  While I was wiping my ass was definitely the worst.  Thank god for awesome roommates.

Anyway, after a long process of finding a kick-ass orthopedic surgeon who specializes in shoulders and figuring out how to get insurance to pay for it, I finally went under the knife to have the old bolts tightened.  10 days later and I can finally type again.  Technically it was a Bankart repair which I'll leave you to research if you care, but in my case involved drilling some tiny holes in my bones and tying some connective tissue back into place.  You might be able to follow along on this video he took while poking around arthroscopically before performing the actual repairs:

(my favorite part is when he pulls out the hedge trimmer attachment to get a clearer view.)

Anyway, now I'm left with a few nice clean cuts and one extremely weak arm.  Funny things I've noticed include that washing my hands is often quite painful.  I figured out this is because pushing your hands together requires using internal rotation, which uses the subscapularis muscle, which he had to cut through to get a clean shot at the problem.  Pushing light switches with the wounded wing has also nearly reduced me to tears.  But it's getting better every day.  I think in another month I'll put my new cadillac sling on the shelf next to the others, and then a month after that I should be biking, and another month and I'll be swimming.  And shortly thereafter, I'll be biking through Vietnam.  w00t!

April 22, 2008

Greening up the Home Office

It was pretty late at night at my friend Miller's birthday party last week.  She had asked everybody to do something good for the world in lieu of birthday presents.  The awake were discussing options as I was dozing off.  I overheard somebody say "If you've got an old Linux box that you're using as a firewall drawing 400 watts continuously, consider spending $30 on a dedicated router."  I thought about the headless Pentium 3 box in my office closet which is running the IP Cop Linux firewall distro.  I thought about the four matching ethernet cards I'd put in it and the rainbow of color-coded cat-5 coming off it: red for untrusted outside world, green for safe, orange for servers and blue for wifi.  I thought about all the time I'd spent configuring the thing perfectly and routing cables throughout the house and I thought, yeah it draws a lot of power, but I NEED all that.

When I sobered up the next afternoon it occurred to me that I'd pulled my file server off the orange DMZ network for performance and simplicity, and that the other server box had long since been virtualized into the file server.  I moved my local public wifi off the blue network onto the red to make its security brain-dead simple.  So despite all the pretty color-coded cables and corresponding hubs, all I really had was a big loud NAT box with a few key port holes in it.  And since I've switched from Outlook to Gmail, I never even RAS into my home XP boxes any more.  And since I do all my personal development on EC2 or some other host, I never use my home dev servers any more.  So in fact, I don't need to tunnel home for anything.  Cloud computing.  For real.  All this stuff I used to need I don't any more.  I could replace that old Linux box with a cheap low-power firewall.
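
For a sense of scale, here is the arithmetic on that 400 watts figure from the party.  The 10 watt draw for a dedicated router is my own guess for illustration, not a measurement, and I haven't measured what my Pentium 3 box really pulls either.

```python
# Energy used by a box that is powered on all year.
HOURS_PER_YEAR = 24 * 365


def kwh_per_year(watts):
    """Kilowatt-hours consumed by a constant load over one year."""
    return watts * HOURS_PER_YEAR / 1000


old_firewall = kwh_per_year(400)   # figure quoted at the party: 3504.0 kWh
small_router = kwh_per_year(10)    # assumed draw for a dedicated router
savings = old_firewall - small_router

print(f"{old_firewall:.0f} kWh vs {small_router:.0f} kWh per year")
```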

But that got me thinking.  There's this li'l XP box sitting next to the printer that I have configured never to go to sleep because otherwise I can't print from my laptops.  Print servers are similarly small and low-power and sometimes come in the same box as the firewall.  Then my eye turned to the terabyte file server in the corner and next thing you know I've got an Apple Time Capsule in the mail to replace all three permanently powered-on PCs in my house.

Happy BEarthday, Miller!

March 11, 2008

Homework assignments: Count words not pages

As I draw my graduate educational experience to a close (tonight is the last class of my MBA!), I’d like to send an open suggestion to all educators who ask their students to produce written assignments.  Let’s assign essays with a required word count instead of a page count.  I’m guessing the page count is a throw-back to days when some students hand-wrote their assignments.  This was true for me in high school 20 years ago.  But today, turning in an essay written in long-hand is unthinkable.  Professional writers and editors usually consider the length of a document by the number of words, although "column inches" is still common in newspapers.

All modern word processors make it trivial to change font size, margins and spacing, making it possible to fit almost any number of words onto a page, from tens to thousands. But instructors are probably expecting 250 - 400 words per page.  Some of my instructors have gone so far as to specify that essays should be “6 pages, 1 inch margins, 12pt Times Roman font, double spaced.”  Wouldn’t it be easier to just say “2,000 words”?  Every modern word processor has a word count function.
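
The conversion is simple enough to write down.  This sketch uses the 250 - 400 words per page range from above; the function names are mine, and the word counter is the crude split-on-whitespace kind, which may differ slightly from what your word processor reports.

```python
# Page counts vs. word counts.

def word_count(text):
    """Count whitespace-separated words."""
    return len(text.split())


def expected_words(pages, low_per_page=250, high_per_page=400):
    """Range of word counts an instructor probably means by a page count."""
    return pages * low_per_page, pages * high_per_page


low, high = expected_words(6)   # (1500, 2400)
print(low <= 2000 <= high)      # True: "2,000 words" sits inside the 6-page range
```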

Aside from being simpler, it allows students to focus on writing great content rather than getting the content to fit on the page.  I've had to turn in essays with really ugly formatting because it was the only way to fit all of my ideas into the specified page count.  And educators who aren't completely explicit open themselves up to students gaming their assignments.  In college we had a phrase called the “Courier Transform” (rhymes with Fourier Transform) which one would apply to a paper that didn’t meet the necessary minimum page count for an assignment.   By switching to a fixed-width font, we would boost our content to meet the required page count.

February 27, 2008

Why Amazon Kindle might succeed where others have failed

Amazon has a history of facilitating disruptive change.  First by selling books online, they demonstrated the advantages of a well-run online store.  Then with music, movies and just about everything else, they have shown that centralizing inventory and customer experience allows for reduced costs and an improved experience over a traditional distributed retail model.  Today, Amazon Web Services is starting to disrupt IT operations similarly by providing a higher quality service at lower cost than most companies can manage themselves.  They achieve these scale economies through centralization.  With Kindle Amazon is attempting another disruptive change, this time in the way people read books.  Lower distribution costs give electronic “e-books” an intrinsic advantage over physical books, hinting that e-books are inevitable.  But will Kindle be able to “cross the chasm” and become a mass-market device?  Amazon’s complementary assets, scale and technology all make it likely that Kindle will succeed.

Several startup companies have sold e-book readers in the past, but none successfully.  Sony is the only other large company to have tried.   Assurance that a risky new technology is backed by a company that won't disappear is important for mass-market adoption, giving Sony and Amazon an advantage.  This is especially important for devices that consume media, as the device’s utility dwindles without new content.  Amazon is especially well positioned to offer media for Kindle through its complementary assets.

Amazon’s established relationships with book publishers are extremely valuable to Kindle.  Book publishers control e-book content.  Amazon’s history of selling physical books has earned them the trust of almost every publishing house, ensuring easy access to electronic versions of books. In addition to existing e-books, Amazon’s scale gives them leverage to encourage publishers to release electronic versions of books.

Beyond that, Amazon has rare technology to make electronic versions of books available with far less work on the publishers’ parts.  Amazon has spent years scanning physical books to enable a feature called “Search Inside This Book” on their website.  Along with Google, they have one of the only large archives of scanned physical books in the world.  This enables selling e-books for books that publishers don’t even have original electronic copies of, with rights negotiations as the only remaining barrier.

Innovators have been cobbling together their own e-book readers out of laptops and PDF files for years.  Early-adopters look for concrete advantages like the ability to search books.  Med-students give Kindle rave reviews for this capability.   The easy availability and portability of dozens of books appeal to the small segment of truly voracious readers.  Kindle seems to serve these early segments well.  To cross the chasm into the mass market of the early majority, Kindle must make the experience simple and reliable.  Kindle’s wireless data connection sets it apart from all previous e-book readers.  By leveraging Sprint’s nation-wide 3G cellular data network, Kindle can load content without the operator even owning a computer.  Thus Kindle dodges the inevitable complexity that arises anytime a PC is involved.  This, along with Amazon’s well-established customer service, promises to make Kindle much easier for the early majority to accept.

Kindle seems well positioned for acceptance by the mass market.  If successful, Amazon will need to balance publishers’ need for DRM against consumers’ desire for open content.  The music industry has exposed these issues but certainly not solved them.

[This is another recycled homework assignment.  Something to keep y'all entertained while I'm in New Zealand!]