Amazon S3 via Ruby

I've been exploring various aspects of cloud computing lately and opened an Amazon S3 account to try it out. Initially, I used the S3Fox add-on for Firefox to create a couple of buckets and upload a small test file. Then I started exploring S3's SOAP APIs.

Normally I'd use Java, but since I've been learning Ruby I figured I'd try that approach. It turned out to be quicker, smaller and easier than Java would have been. Why? Ruby has easy-to-use SOAP and crypto modules that come built in. Yes, Java has crypto in its API, but it doesn't have built-in, turnkey SOAP client handling. There are well-known JARs for SOAP functionality, and they're not difficult to use, but they aren't as simple and turnkey as Ruby's.

Here I describe this simple little program. It may help others to get started with S3, Amazon APIs, or the Ruby SOAP or crypto modules.

First I wrote a short Ruby module for the common stuff needed for Amazon S3. It contains constants for the API WSDL URL, the account ID, etc. S3 requires a specially formatted timestamp and an HMAC-SHA1 signature, computed with your secret key, on every function call, so the module also has a few methods to do this dirty work. Check it out:
S3util.rb

In researching this online, I found several different ways of producing these signatures, all of which claimed to work, but only one of which actually worked for me.
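For anyone who can't open the file, here is a minimal sketch of the timestamp and signature step. The module and method names are made up for this example (they are not necessarily what S3util.rb uses), and it assumes the S3 SOAP signing rule as I understand it: HMAC-SHA1 over the string "AmazonS3" + operation name + timestamp, keyed with the secret key, then Base64-encoded.

```ruby
require 'openssl'
require 'time'

# Hypothetical helper module for illustration only.
module S3SignSketch
  # ISO 8601 UTC timestamp with milliseconds, e.g. "2009-04-10T19:28:43.000Z".
  def self.timestamp(time = Time.now)
    time.getutc.iso8601(3)
  end

  # Assumed signing rule: Base64(HMAC-SHA1(secret_key, "AmazonS3" + operation + timestamp)).
  def self.signature(secret_key, operation, timestamp)
    data = "AmazonS3#{operation}#{timestamp}"
    digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new('sha1'), secret_key, data)
    [digest].pack('m0') # Base64 without line breaks
  end
end

# The secret key below is a placeholder, not a real credential.
ts = S3SignSketch.timestamp(Time.utc(2009, 4, 10, 19, 28, 43))
puts ts # prints 2009-04-10T19:28:43.000Z
puts S3SignSketch.signature('example-secret-key', 'ListAllMyBuckets', ts)
```

The timestamp passed in the SOAP call has to be the exact string that was signed, so it makes sense to compute it once and reuse it for both.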

Also, Amazon's documentation was inadequate. Perhaps I missed something, but if so it wasn't easy to find: I spent a couple of hours digging through it. For example:

  • It doesn't mention that you have to pass the parameters as a single parameter which is a Ruby hash containing the individual parameters. When passing them directly didn't work, I read the WSDL, fished out the parameter names & structure and tried this. Didn't expect it to work, but it did - lucky me :)
  • It doesn't describe exactly how to form the signature, nor does it mention that you have to Base64-encode it. An hour or two of online research, and another hour or two of using the Ruby crypto stuff in different ways, produced the super-secret formula for getting this to work.

The next part is using the Ruby SOAP module to call the S3 SOAP APIs. I chose the simplest operations, "ListAllMyBuckets" and "ListBucket(name)". Here's the code:
listBuckets.rb
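
To illustrate the single-hash calling convention mentioned in the list above, here is a rough sketch of building the parameters for a ListBucket call. The helper name is hypothetical, the key names are what I believe the WSDL elements are called (check the WSDL to be sure), and the values are placeholders.

```ruby
# Hypothetical helper: collects the ListBucket parameters into one hash.
# Key names are assumed to match the element names in the S3 WSDL.
def list_bucket_params(bucket, access_key_id, timestamp, signature)
  {
    'Bucket'         => bucket,
    'AWSAccessKeyId' => access_key_id,
    'Timestamp'      => timestamp,
    'Signature'      => signature
  }
end

params = list_bucket_params('example-bucket', 'EXAMPLE-KEY-ID',
                            '2009-04-10T19:28:43.000Z', 'placeholder-signature')

# The whole hash is then passed as the one and only argument to the
# operation method on the WSDL-generated driver, instead of passing the
# values as separate positional arguments.
puts params.keys.join(', ') # prints Bucket, AWSAccessKeyId, Timestamp, Signature
```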

    Example session: (the warnings come from Ruby SOAP::WSDLDriverFactory reading the WSDL)

    S:\ruby\AmazonS3>ruby listBuckets.rb
    USAGE: listBucket [-list] [bucketname]
    S:\ruby\AmazonS3>ruby listBuckets.rb -list
    ignored attr: {}abstract
    warning: peer certificate won't be verified in this SSL session
    S3 Buckets for this account
    Owner: mrclem3
    Bucket: mclements.net-data
    Bucket: mclements.net-dnd
    S:\ruby\AmazonS3>ruby listBuckets.rb mclements.net-dnd
    ignored attr: {}abstract
    warning: peer certificate won't be verified in this SSL session
    S3 bucket: mclements.net-dnd
    Name: mclements.net-dnd
    Keys: 1000
    Partial?: false
    Contents:
    Key: AbilScoreAdj.dat
    Date: 2009-04-10T19:28:43.000Z
    Size: 2905
    Owner: mrclem3
    Class: STANDARD
    Key: build.xml
    Date: 2009-04-13T18:22:08.000Z
    Size: 1237
    Owner: mrclem3
    Class: STANDARD
    S:\ruby\AmazonS3>

Two things are interesting about this code. First, it shows how easy it is to make SOAP calls from Ruby. Second, you can see in the Ruby code that handling the structure of the SOAP reply is problematic.

I don't know if this is a problem in Amazon S3 or in the Ruby SOAP package, but the structure of the reply object is different depending on whether there is exactly one bucket or more than one. According to the WSDL, the reply should have an array (collection, whatever) of buckets. But if there is only one, in Ruby it has to be accessed as a simple child of the reply, not as a container. If you try to access it as a container, it fails. But if there is more than one, then they must be accessed as a container. I suspect this is a problem in the Ruby SOAP module, but who knows?

My solution was to print the multiples (as this is the most common case). If there is only one, this raises a Ruby exception. I catch this exception and print the single bucket. Optimize for the common case and handle the special case: kludgy, but it works.
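
A possible alternative to the rescue approach, which is not what listBuckets.rb does, is to normalize the reply before iterating. This is only a sketch: it assumes the multi-bucket case arrives as a plain Ruby Array, and it uses a Struct as a stand-in for the real SOAP reply objects.

```ruby
# Stand-in for a bucket entry in the SOAP reply (the real objects come
# from the SOAP library and are not reproduced here).
Bucket = Struct.new(:name)

# Wrap a lone element in an array so callers can always iterate.
# Array(value) is avoided on purpose: it would call to_a on the element,
# which for a Struct returns its field values rather than [element].
def as_list(value)
  return [] if value.nil?
  value.is_a?(Array) ? value : [value]
end

one  = Bucket.new('only-bucket')
many = [Bucket.new('first'), Bucket.new('second')]

as_list(one).each  { |b| puts "Bucket: #{b.name}" }
as_list(many).each { |b| puts "Bucket: #{b.name}" }
```

The same printing code then handles both shapes, at the cost of one type check per call.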