Wednesday, December 2, 2009

Amazon SimpleDB

Introduction to SimpleDB:
SimpleDB is a cloud
based web service provided by Amazon and it is designed to store relatively small amounts of data and optimized for fast data access and flexibility in how that data is expressed.

It is a web service for running queries on structured data in real time. This service works in close conjunction with Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2), collectively providing the ability to store, process and query data sets in the cloud.

To minimize costs in utilizing AWS services, large objects or files should be stored in Amazon S3, while the pointers and the meta-data associated with those files can be stored in Amazon SimpleDB.

Highlights of SimpleDB:
Amazon SimpleDB is easy to use and provides the core functionality of a database - real-time lookup and simple querying of structured data - without the operational complexity.

Simple DB is not relational i.e., no concept of a JOIN on multiple tables. As a result, tables (called Domains) are created like non-normalized reporting tables. Very flat tables holding columns of support data.

Also, Simple DB offers only one data type: string. All values saved in the database are handled as strings. So while storing values of type numeric or date time, the users need to format the values that can be sorted lexicographically.

Using a relational database is a no-brainer when user got a big organization behind them with entities mapped to scaling, indexing, backups, and so on. But when he is on his own, without any support services, and he just needs a database, then the SimpleDB is a handy one.

SimpleDB is trivial to setup and use, no schema required, insert data on the fly with no upfront preparation, and it will scale with no work on user’s part.

Focal Points in SimpleDB 
  •  Schema-less: It is an attribute-value store, where the users no need to define a schema before using database.
  • No joins: In relational theory the goal is to minimize update and deletion anomalies by normalizing data into separate tables related by keys. Users, then join those tables together when they need to retrieve the data. In SimpleDB there is no concept of joins. For 1:1 relationships this works out great. For many-to-many to relationships life is not so simple.
  • Query process: In a RDBMS users can select which columns need to be returned using a query which is not in case of SimpleDB. On querying SimpleDB it just returns record ID, not the values of the record. User need to make another trip to database, to get the required record contents.
  • No sorting: Records are not returned in a sorted order and also values for multi-value attribute fields are not returned in sorted order.
  • Limited Query Resultset:A SimpleDB query returns only 250 results at a time. When the users need to display more results, they need travel through the result set using a token mechanism.
  • Scaling Accomplished:Testing retrieving 10 record ids from 3 different database sizes. For 1K record database it took an average of 141 msecs to retrieve the 10 record ids. For 100K record database it took 266 msecs on average.For 1000K record database it took an average of 433 msecs to retrieve the 10 record ids. Inspite of relatively fast, it is relatively consistent.

Comparison of terminologies
SimpleDB
RDBMS
Domains
Tables
Items
Records or Rows
Attribute name
column name
Attribute value
column value



Experimenting with SimpleDB:

My earlier experiments with SimpleDB started with the initial version of CSS lab’s CloudBuddy Analytics, a tool for analyzing Amazon S3 buckets utilization. As per the original architecture, the tool used SQlite database and I am able to tweak the code to make use of SimpleDB in less time.


Structure of SimpleDB:

The developer documentation for SimpleDB states that attributes may have multiple values, but that attributes are uniquely identified in an item by their name/value combination.


Here are some observations on SimpleDB experimentation
  1. Not a relational database
  2. You need to create your own unique row identifiers,because SimpleDB doesn’t have a concept of auto-increment. To overcome I recommend UUID’s, which seems to be working for me as well.
  3. No joins in the database. If needed, more effort need to be put in, which can be expensive.
  4. The De-normalization of data is recommended.
  5. No Schema: You can add new columns (new row attributes) anytime you want.
        Data is automatically replicated across Amazon’s huge SimpleDB cloud. But they only guarantee something called “Eventually Consistent”, which means data which is “put” into the system is not guaranteed to be available in the next “get”.

        RDBMS Table:
        Column
        column 1
        column 2
        column 3
        Row 1
        value 1
        value 2
        value 3
        Row 2
        value 1
        value 2
        value 3
        Row 3
        value 1
        value 2
        value 3
        Row 4
        value 1
        value 2
        value 3


        SimpleDB Table:
        Domain1
        Attr name1
        Attr Name 2
        Attr name 3
        Itemname1
        Attr value 1
        Attr value 2
        Attr value 3
        Itemname2
        Attr value 1
        Attr value 2
        Attr value 3
        Itemname3
        Attr value 1
        Attr value 2
        Attr value 3
        Itemname4
        Attr value 1
        Attr value 2
        Attr value 3


        Domain: Within the database, you create Domains, these are similar to traditional database Tables. One difference though, because as the SimpleDB model is non-relational, there is no command to JOIN (INNER JOIN or OUTER JOIN for example) one Domain to another and produce a result set.

        Item: A Simple DB Item is a row of data inside a Domain.

        Attributes: Columns of the Domain are called Attributes.

        Attribute
        Maximum
        Domains
        100 active domains(per Account)
        Each domains size
        10GB
        Attributes per item
        256 attributes
        Size per attribute
        1024 characters



        Data Types

        Since all values (data) in SimpleDB are of UTF-8 strings data type, care must be taken when an Attribute (column) is used in a WHERE or ORDER BY type SELECT statement.

        For example:

          1. Dates should be entered in the format of YYYY-MM-DD which will help in proper lexicographical comparisons.  
          2. Numbers needed to be handled specially, if they are to be sorted or range selected by a process called Zero Padding.For Zero Padding, add zeros to the front of each number until all are of the same length.

          For example if you had two numbers, 18 and 9, pad the 9 to become 09. Now on a sort, the nine will list before 18 as we would expect in Ascending order. You have to trim the leading zeros.

            Charges for SimpleDB:

            • Amazon SimpleDB users pay no charges on the first 25 Machine Hours, 1 GB of Data Transfer, and 1 GB of Storage that they consume every month. That implies “Data transferred between Amazon SimpleDB and other Amazon Web Services within the same region is free of charge (i.e., $0.00 per GB).”

            whatsoever hitting the SimpleDB instance resides on the Amazon cloud (in the same region), the users need not pay for data transfer.


            So approximately 20 Lakh Hits( GET or SELECT) API requests can be completed per month without incurring any usage charges.

            For more details
              
            Working with Amazon SimpleDB:


            1. Provides SOAP and (what passes at Amazon for) REST interfaces to the API
            2. REST requests all use HTTP GET, specifying the API method with a query param
            3. Requests specify the database, record, attributes, and modifiers with query params
            4. Record creation, updating, and deletion is tomic, at the level of individual attributes
            5. All data is considered to be UTF-8 strings
            6. Automatically indexes data, details unknown 
            7. Queries:  
            1. Limited to 5 seconds running time. Queries that take longer “will likely”   return a time-out error.
            2. Defined with HTTP query parameters.
            3. Composed of Boolean and set operations with some obvious comparison operators (=, !=,=, etc.). 
                8.As all values are UTF-8 strings, there are no sorting options.
                9. Responses are XML 


                Examples:

                Note: For starters its always better working with sample programs which supports languages C#, Perl, Java, PHP


                Now we are going to explore one example from above sample programs for methods necessary to create, manipulate, and work with Amazon Simple DB using perl


                This example creates a new Domain (Table).

                Two use statements will be needed to reference Simple DB:

                use Amazon.SimpleDB;
                use Amazon.SimpleDB.Model;


                Next, anytime we interact with Simple DB, an Access Key and private Secret key are passed.
                my accessKeyId = "myAccessKey";
                my secretAccessKey = "mySecretKey";

                Now a new instance of Simple DB is invoked using our keys.
                my $service = Amazon::SimpleDB::Client->new( accessKeyId, secretAccessKey);

                You can now  follow the amazon Getting Started Guide for several operations on database.




                Tools and Sample Codes:


                For Beginners, there is one good free management console called SDB tool (New Open Source SimpleDB Firefox Plug-in) for interacting with SimpleDB. The tool provides a visual interface to Amazon SimpleDB in the form of a Firefox plug-in for querying and updating your Simple DB database domains. Click here to download it directly.


                All interactions with Simple DB can be done though code. Several languages are supported, including Java, C#, Perl, Python, PHP, and VB.
                The Sample Code libraries for various languages are as follows:

                SimpleDB Libraries
                 



                In Addition there are several other tools for SimpleDB like SQL backend converter for SimpleDB interface in PHP, with this client class it is possible to use the SimpleDB programming interface on a MySql database. It can be used to implement and test SimpleDB code without a SimpleDB database access.

                The short Perl script tool amazon-simpledb-cli also provides a simple command line interface to Amazon SimpleDB.

                There is now a Scratchpad for Amazon SimpleDB which is just a small set of HTML/JS pages that you save locally and run in a browser.


                Documentation:

                There are several documents on the Amazon SimpleDB web site that will assist in programming Simple DB.

                First is the getting started guide located at http://docs.amazonwebservices.com/AmazonSimpleDB/latest/GettingStartedGuide/. The guide contains an introduction to the web service and examples of creating domains, entering data, and selecting rows from Simple DB.

                Also on the web site is the Developers Guide located at http://docs.amazonwebservices.com/AmazonSimpleDB/latest/DeveloperGuide/. The Developers Guide provides API, SOAP, and REST explanations.

                A Code and Samples library page located at http://developer.amazonwebservices.com/connect/kbcategory.jspa?categoryID=189 contains an interesting assortment of Simple DB applications such as “Simple DB Wrapper for iPhone” and a “Microsoft Excel Plug-in”.


                Conclusion:

                The application demanding complex OLAP style database never entertains SimpleDB. But, for the applications with simple easy structure and scalability, SimpleDB is recommended.