Showing posts with label Amazon. Show all posts
Showing posts with label Amazon. Show all posts

Wednesday, December 2, 2009

Amazon SimpleDB

Introduction to SimpleDB:
SimpleDB is a cloud
based web service provided by Amazon and it is designed to store relatively small amounts of data and optimized for fast data access and flexibility in how that data is expressed.

It is a web service for running queries on structured data in real time. This service works in close conjunction with Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2), collectively providing the ability to store, process and query data sets in the cloud.

To minimize costs in utilizing AWS services, large objects or files should be stored in Amazon S3, while the pointers and the meta-data associated with those files can be stored in Amazon SimpleDB.

Highlights of SimpleDB:
Amazon SimpleDB is easy to use and provides the core functionality of a database - real-time lookup and simple querying of structured data - without the operational complexity.

Simple DB is not relational i.e., no concept of a JOIN on multiple tables. As a result, tables (called Domains) are created like non-normalized reporting tables. Very flat tables holding columns of support data.

Also, Simple DB offers only one data type: string. All values saved in the database are handled as strings. So while storing values of type numeric or date time, the users need to format the values that can be sorted lexicographically.

Using a relational database is a no-brainer when user got a big organization behind them with entities mapped to scaling, indexing, backups, and so on. But when he is on his own, without any support services, and he just needs a database, then the SimpleDB is a handy one.

SimpleDB is trivial to setup and use, no schema required, insert data on the fly with no upfront preparation, and it will scale with no work on user’s part.

Focal Points in SimpleDB 
  •  Schema-less: It is an attribute-value store, where the users no need to define a schema before using database.
  • No joins: In relational theory the goal is to minimize update and deletion anomalies by normalizing data into separate tables related by keys. Users, then join those tables together when they need to retrieve the data. In SimpleDB there is no concept of joins. For 1:1 relationships this works out great. For many-to-many to relationships life is not so simple.
  • Query process: In a RDBMS users can select which columns need to be returned using a query which is not in case of SimpleDB. On querying SimpleDB it just returns record ID, not the values of the record. User need to make another trip to database, to get the required record contents.
  • No sorting: Records are not returned in a sorted order and also values for multi-value attribute fields are not returned in sorted order.
  • Limited Query Resultset:A SimpleDB query returns only 250 results at a time. When the users need to display more results, they need travel through the result set using a token mechanism.
  • Scaling Accomplished:Testing retrieving 10 record ids from 3 different database sizes. For 1K record database it took an average of 141 msecs to retrieve the 10 record ids. For 100K record database it took 266 msecs on average.For 1000K record database it took an average of 433 msecs to retrieve the 10 record ids. Inspite of relatively fast, it is relatively consistent.

Comparison of terminologies
SimpleDB
RDBMS
Domains
Tables
Items
Records or Rows
Attribute name
column name
Attribute value
column value



Experimenting with SimpleDB:

My earlier experiments with SimpleDB started with the initial version of CSS lab’s CloudBuddy Analytics, a tool for analyzing Amazon S3 buckets utilization. As per the original architecture, the tool used SQlite database and I am able to tweak the code to make use of SimpleDB in less time.


Structure of SimpleDB:

The developer documentation for SimpleDB states that attributes may have multiple values, but that attributes are uniquely identified in an item by their name/value combination.


Here are some observations on SimpleDB experimentation
  1. Not a relational database
  2. You need to create your own unique row identifiers,because SimpleDB doesn’t have a concept of auto-increment. To overcome I recommend UUID’s, which seems to be working for me as well.
  3. No joins in the database. If needed, more effort need to be put in, which can be expensive.
  4. The De-normalization of data is recommended.
  5. No Schema: You can add new columns (new row attributes) anytime you want.
        Data is automatically replicated across Amazon’s huge SimpleDB cloud. But they only guarantee something called “Eventually Consistent”, which means data which is “put” into the system is not guaranteed to be available in the next “get”.

        RDBMS Table:
        Column
        column 1
        column 2
        column 3
        Row 1
        value 1
        value 2
        value 3
        Row 2
        value 1
        value 2
        value 3
        Row 3
        value 1
        value 2
        value 3
        Row 4
        value 1
        value 2
        value 3


        SimpleDB Table:
        Domain1
        Attr name1
        Attr Name 2
        Attr name 3
        Itemname1
        Attr value 1
        Attr value 2
        Attr value 3
        Itemname2
        Attr value 1
        Attr value 2
        Attr value 3
        Itemname3
        Attr value 1
        Attr value 2
        Attr value 3
        Itemname4
        Attr value 1
        Attr value 2
        Attr value 3


        Domain: Within the database, you create Domains, these are similar to traditional database Tables. One difference though, because as the SimpleDB model is non-relational, there is no command to JOIN (INNER JOIN or OUTER JOIN for example) one Domain to another and produce a result set.

        Item: A Simple DB Item is a row of data inside a Domain.

        Attributes: Columns of the Domain are called Attributes.

        Attribute
        Maximum
        Domains
        100 active domains(per Account)
        Each domains size
        10GB
        Attributes per item
        256 attributes
        Size per attribute
        1024 characters



        Data Types

        Since all values (data) in SimpleDB are of UTF-8 strings data type, care must be taken when an Attribute (column) is used in a WHERE or ORDER BY type SELECT statement.

        For example:

          1. Dates should be entered in the format of YYYY-MM-DD which will help in proper lexicographical comparisons.  
          2. Numbers needed to be handled specially, if they are to be sorted or range selected by a process called Zero Padding.For Zero Padding, add zeros to the front of each number until all are of the same length.

          For example if you had two numbers, 18 and 9, pad the 9 to become 09. Now on a sort, the nine will list before 18 as we would expect in Ascending order. You have to trim the leading zeros.

            Charges for SimpleDB:

            • Amazon SimpleDB users pay no charges on the first 25 Machine Hours, 1 GB of Data Transfer, and 1 GB of Storage that they consume every month. That implies “Data transferred between Amazon SimpleDB and other Amazon Web Services within the same region is free of charge (i.e., $0.00 per GB).”

            whatsoever hitting the SimpleDB instance resides on the Amazon cloud (in the same region), the users need not pay for data transfer.


            So approximately 20 Lakh Hits( GET or SELECT) API requests can be completed per month without incurring any usage charges.

            For more details
              
            Working with Amazon SimpleDB:


            1. Provides SOAP and (what passes at Amazon for) REST interfaces to the API
            2. REST requests all use HTTP GET, specifying the API method with a query param
            3. Requests specify the database, record, attributes, and modifiers with query params
            4. Record creation, updating, and deletion is tomic, at the level of individual attributes
            5. All data is considered to be UTF-8 strings
            6. Automatically indexes data, details unknown 
            7. Queries:  
            1. Limited to 5 seconds running time. Queries that take longer “will likely”   return a time-out error.
            2. Defined with HTTP query parameters.
            3. Composed of Boolean and set operations with some obvious comparison operators (=, !=,=, etc.). 
                8.As all values are UTF-8 strings, there are no sorting options.
                9. Responses are XML 


                Examples:

                Note: For starters its always better working with sample programs which supports languages C#, Perl, Java, PHP


                Now we are going to explore one example from above sample programs for methods necessary to create, manipulate, and work with Amazon Simple DB using perl


                This example creates a new Domain (Table).

                Two use statements will be needed to reference Simple DB:

                use Amazon.SimpleDB;
                use Amazon.SimpleDB.Model;


                Next, anytime we interact with Simple DB, an Access Key and private Secret key are passed.
                my accessKeyId = "myAccessKey";
                my secretAccessKey = "mySecretKey";

                Now a new instance of Simple DB is invoked using our keys.
                my $service = Amazon::SimpleDB::Client->new( accessKeyId, secretAccessKey);

                You can now  follow the amazon Getting Started Guide for several operations on database.




                Tools and Sample Codes:


                For Beginners, there is one good free management console called SDB tool (New Open Source SimpleDB Firefox Plug-in) for interacting with SimpleDB. The tool provides a visual interface to Amazon SimpleDB in the form of a Firefox plug-in for querying and updating your Simple DB database domains. Click here to download it directly.


                All interactions with Simple DB can be done though code. Several languages are supported, including Java, C#, Perl, Python, PHP, and VB.
                The Sample Code libraries for various languages are as follows:

                SimpleDB Libraries
                 



                In Addition there are several other tools for SimpleDB like SQL backend converter for SimpleDB interface in PHP, with this client class it is possible to use the SimpleDB programming interface on a MySql database. It can be used to implement and test SimpleDB code without a SimpleDB database access.

                The short Perl script tool amazon-simpledb-cli also provides a simple command line interface to Amazon SimpleDB.

                There is now a Scratchpad for Amazon SimpleDB which is just a small set of HTML/JS pages that you save locally and run in a browser.


                Documentation:

                There are several documents on the Amazon SimpleDB web site that will assist in programming Simple DB.

                First is the getting started guide located at http://docs.amazonwebservices.com/AmazonSimpleDB/latest/GettingStartedGuide/. The guide contains an introduction to the web service and examples of creating domains, entering data, and selecting rows from Simple DB.

                Also on the web site is the Developers Guide located at http://docs.amazonwebservices.com/AmazonSimpleDB/latest/DeveloperGuide/. The Developers Guide provides API, SOAP, and REST explanations.

                A Code and Samples library page located at http://developer.amazonwebservices.com/connect/kbcategory.jspa?categoryID=189 contains an interesting assortment of Simple DB applications such as “Simple DB Wrapper for iPhone” and a “Microsoft Excel Plug-in”.


                Conclusion:

                The application demanding complex OLAP style database never entertains SimpleDB. But, for the applications with simple easy structure and scalability, SimpleDB is recommended.


                Wednesday, November 11, 2009

                AWS Workshop, Chennai

                With the onset of the Monsoons in Chennai there were Cloud of a different kind looming, and the Met Department was evangelizing them; there was also another Cloud Evangelist in Chennai, Jinesh Varia evangelizing Clouds of different kind, the Amazon Clouds.

                Event and Filling Gaps



                The event kicked off with Bobby Varghese from CSS Corp doing a keynote for the workshop. Bobby spoke about CSS adoption of the AWS and how they have come up with products and use cases based on AWS.

                The real clouds were holding back Jinesh from reaching the venue. So Ezhilaran Babaraj(Ezhil), from CSS Labs, and Lakshmanan Narayan(Lux), from Vembu, had to fill up for the latency. Both these session's were to be midway sessions to Jinesh's overview on AWS, that would have given mileage to these session for which AWS is the basis. Nevertheless, considering the audience were not real novices to AWS, these sessions were well taken.

                Ezhil, explained at large the initiative at CSS Labs towards Cloud enablement, and specifically about the CloudBuddy API, and the Plugin Framework which will help extend Windows based applications to Cloud, and extending CloudBuddy itself repectively. The demo of the plugin framework, and APIs, of CloudBuddy, was not possible thanks to BSNL's connectivity issues.

                Lux, who chose not use the mic, started off on an overview of Vembu Technologies. And then showcased Vembu Home BETA with its Adobe Air based UI. And how it helps the user backup on Desktop as well as the Cloud.

                Jinesh Arrives



                As Lux, was finishing his talk, Jinesh arrived. And there was a coffee break.

                Post coffee! Jinesh was quick to take the stage. He, started off with an introduction to Cloud Computing taking the analogy of a Belgian Beer Company which in its early days had to generate its own power(electricity), and now that power comes from a grid, and the beers tastes no different. And that power generating machine is now in the mueseum. An ominus sign for our inhouse data centers.

                Having introduced the crowd to Cloud Computing he jumped to what he was here to do, showcase Amazon Web Services. Jinesh, did well to get the some temporary Keys to access Amazon Web Services, which were distributed to the audeience, but the connectivity issues rendered them useless.

                Jinesh's articulation of the Amazon Web Services did manage to fill whatever gap, that were left due to (dis)connectivity. The "Overview of AWS" introduced the audiences to nuances of creating AWS accounts, about the access credentials, usage of those accounts, billing- pay as you go model, the API, tools, and the Architecture of AWS. The Architecture Diagram built was a Jigsaw of various Amazon's Web Service offerings fitting into one service, to run an application on the cloud.

                Specifics



                The overview naturally spilled over to specifics. Lack of connectivity meant more talking than doing, also this meant the start of a marathon session on AWS specifics.

                Amazon S3(Simple Storage Service) was the first of the talks on specific Amazon Web Services. The S3 session also included coverage on CloudFront, CD based on S3. S3 session was followed by, EC2(Elastic Compute Cloud) which is the computing face of AWS, this session also included the failover support services for EC2- CloudWatch, Autoscaling, and Elastic LoadBalancing. Apart from which the persistent storage, Elastic Block Storage, on EC2 was explained.

                Each of these sessions Features, Terminology, Concepts,In Action, Tools and API, Pricing and Typical Use Cases for each of the Amazon Web Sservices.

                Finally Jinesh, talked on the "Best Practices" and "Migration to the Cloud".



                And finished it with an exercise on building Cloud Enabled application using various AWS offering.

                The Unconference



                This was where all the action was expected. But due to Jinesh's marathon sessions there was only a little time distributed equally among the presenters. What follows is what happened.

                Bosky from Hover who spoke about the Key-Value based systems, and distributed environments.

                Kiran from MarketSimplified spoke about their SaaS application and how they use AWS to host it.

                Senthil from RailsFactory spoke about "What Jinesh did not mention?"

                Murthy and Sam from CSS Corp explained about whats and whys of Hybrid Cloud, and also presented a Demo of "scaling out" to public cloud.

                There were experiences shared by XLSoft and Anantara Solutions.

                The unconference's finale was a video "Cloud Cloud Maybe" compile by Vembu



                Adios



                All in all it was great event. With a good attendance, good lunch, and with a pinch of togetherness as an AWS community. The basic motive of this whole exercise were two things, Building an engaging community around AWS, and to see if Cloud is real. About the latter there really was no doubt. With usages like Animoto, TimesMachine, and our Payroll Processing, it surely is the future of computing. But doubts shall remain if former will stand, as it took Jinesh's presence for this kind of event to happen.

                To sum it up in Jinesh's words "Keep this engagement going".