Wednesday, December 2, 2009

Amazon SimpleDB

Introduction to SimpleDB:
SimpleDB is a cloud-based web service from Amazon. It is designed to store relatively small amounts of data and is optimized for fast data access and for flexibility in how that data is expressed.

It is a web service for running queries on structured data in real time. This service works in close conjunction with Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2), collectively providing the ability to store, process and query data sets in the cloud.

To minimize costs in utilizing AWS services, large objects or files should be stored in Amazon S3, while the pointers and the meta-data associated with those files can be stored in Amazon SimpleDB.

Highlights of SimpleDB:
Amazon SimpleDB is easy to use and provides the core functionality of a database - real-time lookup and simple querying of structured data - without the operational complexity.

SimpleDB is not relational: there is no concept of a JOIN across multiple tables. As a result, tables (called Domains) are built like denormalized reporting tables, very flat and holding columns of supporting data.

Also, SimpleDB offers only one data type: string. All values saved in the database are handled as strings. So when storing numeric or date-time values, users need to format them so that they sort correctly in lexicographical order.

Using a relational database is a no-brainer when the user has a big organization behind them, with teams dedicated to scaling, indexing, backups, and so on. But for someone working on their own, without any support services, who just needs a database, SimpleDB is a handy choice.

SimpleDB is trivial to set up and use: no schema is required, data can be inserted on the fly with no upfront preparation, and it scales with no work on the user's part.

Focal Points in SimpleDB
  • Schema-less: It is an attribute-value store, so users do not need to define a schema before using the database.
  • No joins: In relational theory the goal is to minimize update and deletion anomalies by normalizing data into separate tables related by keys. Users then join those tables together when they need to retrieve the data. In SimpleDB there is no concept of joins. For 1:1 relationships this works out great. For many-to-many relationships life is not so simple.
  • Query process: In an RDBMS, users can choose which columns a query returns; this is not the case in SimpleDB. A SimpleDB query returns only the record IDs, not the values of the records. Users need to make another trip to the database to get the required record contents.
  • No sorting: Records are not returned in sorted order, and the values of multi-value attribute fields are not returned in sorted order either.
  • Limited query result set: A SimpleDB query returns only 250 results at a time. When users need to display more results, they have to page through the result set using a token mechanism.
  • Scaling accomplished: A test retrieved 10 record IDs from 3 different database sizes. For a 1K-record database it took an average of 141 msecs to retrieve the 10 record IDs. For a 100K-record database it took 266 msecs on average. For a 1000K-record database it took an average of 433 msecs. Besides being relatively fast, it is relatively consistent.
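
The token mechanism mentioned under the result-set limit can be pictured with the small Python sketch below. It does not talk to SimpleDB: `fetch_page` is a hypothetical stand-in for a query call that returns one page of item names plus a token for the next page (or None on the last page), and the page size of 250 is simply the figure quoted above.

```python
from typing import Callable, Iterator, List, Optional, Tuple

PAGE_SIZE = 250  # per-query result limit quoted in the text

Page = Tuple[List[str], Optional[str]]


def iterate_results(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[str]:
    """Yield every item name, following the continuation token page by page."""
    token = None
    while True:
        items, token = fetch_page(token)
        yield from items
        if token is None:  # no token means this was the last page
            break


def make_fake_fetch(all_items: List[str]) -> Callable[[Optional[str]], Page]:
    """Hypothetical in-memory stand-in for a SimpleDB query (an assumption)."""
    def fetch_page(token: Optional[str]) -> Page:
        start = int(token) if token else 0
        end = start + PAGE_SIZE
        next_token = str(end) if end < len(all_items) else None
        return all_items[start:end], next_token
    return fetch_page


names = [f"item{i:04d}" for i in range(600)]
collected = list(iterate_results(make_fake_fetch(names)))
```

Here 600 made-up item names are collected in three round trips; a real client would pass the token returned by the service back in the next request instead of the fake offset used here.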

Comparison of terminologies
SimpleDB        | RDBMS
Domains         | Tables
Items           | Records or Rows
Attribute name  | Column name
Attribute value | Column value



Experimenting with SimpleDB:

My earlier experiments with SimpleDB started with the initial version of CSS Labs' CloudBuddy Analytics, a tool for analyzing Amazon S3 bucket utilization. The original architecture used an SQLite database, and I was able to tweak the code to use SimpleDB in very little time.


Structure of SimpleDB:

The developer documentation for SimpleDB states that attributes may have multiple values, but that attributes are uniquely identified in an item by their name/value combination.
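
To make the name/value rule concrete, here is a minimal Python sketch that models a single item in memory. It is only an illustration of the idea, not SimpleDB's API: the `Item` type and `put_attribute` helper are hypothetical names, and the "color" attribute is made up.

```python
from collections import defaultdict
from typing import DefaultDict, Set

# One item modelled as: attribute name -> set of string values.
Item = DefaultDict[str, Set[str]]


def put_attribute(item: Item, name: str, value: str) -> None:
    """Add a name/value pair; an identical pair is stored only once."""
    item[name].add(value)


item: Item = defaultdict(set)
put_attribute(item, "color", "red")
put_attribute(item, "color", "blue")   # second value for the same attribute
put_attribute(item, "color", "red")    # duplicate name/value pair, no effect
pair_count = sum(len(values) for values in item.values())
```

The attribute "color" ends up with two values, and the repeated "color"/"red" pair does not create a third entry.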


Here are some observations from experimenting with SimpleDB:
  1. It is not a relational database.
  2. You need to create your own unique row identifiers, because SimpleDB has no concept of auto-increment. To overcome this I recommend UUIDs, which have been working well for me.
  3. There are no joins in the database. If joins are needed, they take extra effort in the application, which can be expensive.
  4. Denormalization of data is recommended.
  5. No schema: You can add new columns (new row attributes) any time you want.
  6. Data is automatically replicated across Amazon's huge SimpleDB cloud. But Amazon only guarantees something called "Eventually Consistent", which means data that is "put" into the system is not guaranteed to be available in the next "get".
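
As a small illustration of observations 2 and 4, the Python sketch below generates a UUID to use as the item name and keeps everything about the record in one flat set of attributes. The attribute names and values are invented for the example.

```python
import uuid


def new_item_name() -> str:
    """Return a random UUID string to use as the item name (row identifier)."""
    return str(uuid.uuid4())


# A denormalized record: everything needed to display it lives in one item.
item_name = new_item_name()
attributes = {
    "bucket": "example-bucket",   # hypothetical values
    "owner": "alice",
    "size_bytes": "0000012345",   # numbers are stored as padded strings
}
```

Because the identifier is generated on the client, no round trip to the database is needed to obtain the next free ID.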

        RDBMS Table:
        Column | column 1 | column 2 | column 3
        Row 1  | value 1  | value 2  | value 3
        Row 2  | value 1  | value 2  | value 3
        Row 3  | value 1  | value 2  | value 3
        Row 4  | value 1  | value 2  | value 3


        SimpleDB Table:
        Domain1   | Attr name 1  | Attr name 2  | Attr name 3
        Itemname1 | Attr value 1 | Attr value 2 | Attr value 3
        Itemname2 | Attr value 1 | Attr value 2 | Attr value 3
        Itemname3 | Attr value 1 | Attr value 2 | Attr value 3
        Itemname4 | Attr value 1 | Attr value 2 | Attr value 3


        Domain: Within the database you create Domains, which are similar to traditional database tables. One difference, though: because the SimpleDB model is non-relational, there is no command to JOIN (INNER JOIN or OUTER JOIN, for example) one Domain to another and produce a result set.

        Item: A Simple DB Item is a row of data inside a Domain.

        Attributes: Columns of the Domain are called Attributes.

        Attribute           | Maximum
        Domains             | 100 active domains (per account)
        Size of each domain | 10 GB
        Attributes per item | 256 attributes
        Size per attribute  | 1024 characters
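
A client can check an item against the per-item limits listed above before sending it. The Python sketch below is one possible way to do that; the function name is hypothetical, and counting every name/value pair towards the 256-attribute limit is an assumption made for the example.

```python
from typing import Dict, List

MAX_ATTRIBUTES_PER_ITEM = 256   # limit listed above
MAX_VALUE_LENGTH = 1024         # characters, limit listed above


def limit_violations(item: Dict[str, List[str]]) -> List[str]:
    """Return human-readable problems; an empty list means the item fits."""
    problems = []
    # Assumption: each name/value pair counts as one attribute.
    pair_count = sum(len(values) for values in item.values())
    if pair_count > MAX_ATTRIBUTES_PER_ITEM:
        problems.append(f"{pair_count} attributes exceeds {MAX_ATTRIBUTES_PER_ITEM}")
    for name, values in item.items():
        for value in values:
            if len(value) > MAX_VALUE_LENGTH:
                problems.append(f"value of '{name}' is {len(value)} characters long")
    return problems


ok_item = {"title": ["report.pdf"], "tag": ["finance", "2009"]}
bad_item = {"notes": ["x" * 2000]}
```

Calling `limit_violations(ok_item)` returns an empty list, while `bad_item` is reported because its single value is longer than 1024 characters.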



        Data Types

        Since all values (data) in SimpleDB are UTF-8 strings, care must be taken when an Attribute (column) is used in the WHERE or ORDER BY clause of a SELECT statement.

        For example:

          1. Dates should be entered in the format YYYY-MM-DD, which makes lexicographical comparisons come out in the right order.
          2. Numbers need special handling, called Zero Padding, if they are to be sorted or selected by range. To zero-pad, add zeros to the front of each number until all numbers are the same length.

          For example, if you have two numbers, 18 and 9, pad the 9 to become 09. Now, on an ascending sort, the nine will list before 18, as we would expect. You have to trim the leading zeros when you read the value back.
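
The padding and date rules can be sketched in a few lines of Python. The helper names and the default width of 10 digits are assumptions for the example; pick a width that is large enough for the biggest number you expect to store.

```python
from datetime import date

PAD_WIDTH = 10  # assumed fixed width; must cover your largest number


def encode_number(n: int, width: int = PAD_WIDTH) -> str:
    """Zero-pad a non-negative integer so string order matches numeric order."""
    if n < 0:
        raise ValueError("negative numbers need an offset before padding")
    return str(n).zfill(width)


def decode_number(s: str) -> int:
    """Convert a padded string back to an integer (drops the leading zeros)."""
    return int(s)


def encode_date(d: date) -> str:
    """Format a date as YYYY-MM-DD, which sorts chronologically as a string."""
    return d.isoformat()


plain = sorted(["18", "9"])                              # ['18', '9']: wrong order
padded = sorted(encode_number(n, 2) for n in (18, 9))    # ['09', '18']
dates = sorted(encode_date(d) for d in (date(2009, 12, 2), date(2009, 4, 3)))
```

Sorting the unpadded strings puts "18" before "9", while the padded strings sort as 09, 18; `decode_number("09")` gives back 9.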

            Charges for SimpleDB:

            • Amazon SimpleDB users pay no charges on the first 25 Machine Hours, 1 GB of Data Transfer, and 1 GB of Storage that they consume every month. In addition, “Data transferred between Amazon SimpleDB and other Amazon Web Services within the same region is free of charge (i.e., $0.00 per GB).”

            In other words, if whatever is hitting SimpleDB also resides on the Amazon cloud (in the same region), users need not pay for data transfer.


            So approximately 20 lakh (2 million) GET or SELECT API requests can be completed per month without incurring any usage charges.

            Working with Amazon SimpleDB:


            1. Provides SOAP and (what passes at Amazon for) REST interfaces to the API.
            2. REST requests all use HTTP GET, specifying the API method with a query param.
            3. Requests specify the database, record, attributes, and modifiers with query params.
            4. Record creation, updating, and deletion is atomic, at the level of individual attributes.
            5. All data is considered to be UTF-8 strings.
            6. Data is indexed automatically; the details are unknown.
            7. Queries:
               1. Limited to 5 seconds running time. Queries that take longer “will likely” return a time-out error.
               2. Defined with HTTP query parameters.
               3. Composed of Boolean and set operations with some obvious comparison operators (=, !=, <, <=, >, >=, etc.).
            8. As all values are UTF-8 strings, there are no sorting options.
            9. Responses are XML.
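
To show what "everything is a query param" looks like in practice, here is a Python sketch that only builds a signed GET URL and does not send it. It assumes AWS Signature Version 2 with HMAC-SHA256, the endpoint sdb.amazonaws.com and API version 2009-04-15; the keys, domain name and timestamp are placeholders, and `signed_url` is a name made up for this example. Check the Developer Guide before relying on any of these details.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

HOST = "sdb.amazonaws.com"   # assumed endpoint


def signed_url(params: dict, access_key: str, secret_key: str, timestamp: str) -> str:
    """Build a signed GET URL; every request detail travels as a query parameter."""
    query = dict(params)
    query.update({
        "AWSAccessKeyId": access_key,
        "SignatureMethod": "HmacSHA256",
        "SignatureVersion": "2",
        "Timestamp": timestamp,
        "Version": "2009-04-15",  # assumed API version
    })
    # Canonical query string: parameters sorted by name, RFC 3986 percent-encoded.
    canonical = "&".join(
        f"{quote(k, safe='-_.~')}={quote(str(v), safe='-_.~')}"
        for k, v in sorted(query.items())
    )
    string_to_sign = f"GET\n{HOST}\n/\n{canonical}"
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha256).digest()
    signature = quote(base64.b64encode(digest).decode(), safe="-_.~")
    return f"https://{HOST}/?{canonical}&Signature={signature}"


url = signed_url(
    {"Action": "CreateDomain", "DomainName": "MyDomain"},
    access_key="myAccessKey",
    secret_key="mySecretKey",
    timestamp="2009-12-02T12:00:00Z",
)
```

The API method (Action), the domain and the credentials all appear in the query string, and the XML response would come back from an ordinary HTTP GET on the resulting URL.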


                Examples:

                Note: For starters, it is always better to begin with the sample programs, which are available for C#, Perl, Java, and PHP.


                Now we are going to explore one example from the above sample programs, covering the methods necessary to create, manipulate, and work with Amazon SimpleDB using Perl.


                This example creates a new Domain (Table).

                Two use statements will be needed to reference SimpleDB:

                use Amazon::SimpleDB::Client;
                use Amazon::SimpleDB::Model::CreateDomainRequest;


                Next, any time we interact with SimpleDB, an Access Key and a private Secret Key are passed.
                my $accessKeyId = "myAccessKey";          # placeholder: your AWS access key
                my $secretAccessKey = "mySecretKey";      # placeholder: your AWS secret key

                Now a new instance of the SimpleDB client is created using our keys.
                my $service = Amazon::SimpleDB::Client->new($accessKeyId, $secretAccessKey);

                You can now follow the Amazon Getting Started Guide for several operations on the database.




                Tools and Sample Codes:


                For beginners, there is a good free management console called SDB Tool, a new open source SimpleDB Firefox plug-in for interacting with SimpleDB. The tool provides a visual interface to Amazon SimpleDB for querying and updating your SimpleDB domains.


                All interactions with SimpleDB can be done through code. Several languages are supported, including Java, C#, Perl, Python, PHP, and VB.
                The Sample Code libraries for various languages are as follows:

                SimpleDB Libraries
                 



                In addition, there are several other tools for SimpleDB, such as an SQL backend converter for the SimpleDB interface in PHP. With this client class it is possible to use the SimpleDB programming interface on a MySQL database, so it can be used to implement and test SimpleDB code without access to a SimpleDB database.

                The short Perl script tool amazon-simpledb-cli also provides a simple command line interface to Amazon SimpleDB.

                There is now a Scratchpad for Amazon SimpleDB which is just a small set of HTML/JS pages that you save locally and run in a browser.


                Documentation:

                There are several documents on the Amazon SimpleDB web site that will assist in programming Simple DB.

                First is the getting started guide located at http://docs.amazonwebservices.com/AmazonSimpleDB/latest/GettingStartedGuide/. The guide contains an introduction to the web service and examples of creating domains, entering data, and selecting rows from Simple DB.

                Also on the web site is the Developers Guide located at http://docs.amazonwebservices.com/AmazonSimpleDB/latest/DeveloperGuide/. The Developers Guide provides API, SOAP, and REST explanations.

                A Code and Samples library page located at http://developer.amazonwebservices.com/connect/kbcategory.jspa?categoryID=189 contains an interesting assortment of Simple DB applications such as “Simple DB Wrapper for iPhone” and a “Microsoft Excel Plug-in”.


                Conclusion:

                Applications that demand a complex OLAP-style database are not a good fit for SimpleDB. But for applications with a simple structure that need easy scalability, SimpleDB is recommended.


                Tuesday, November 17, 2009

                Hybridfox: Cross of Elasticfox and Imagination

                We now have a Eucalyptus private cloud installed and running on our premises, and it remained kind of an artifact in our data center for some time. So I wondered why no one had written anything about how to make Elasticfox work with Eucalyptus.
                But there were quite a few pointers to which version would be ideally suited for use with Eucalyptus, like this one, thanks Ajmf. I took the cue from there, enabled debugging on Elasticfox, and used Firebug to dig deeper. And I came up with Hybridfox. Yeah, and it works.

                What is Hybridfox?


                Hybridfox is an attempt to get the best of both worlds of popular Cloud Computing environments, Amazon EC2 (public) and Eucalyptus (private). The idea is to use one tool, Hybridfox, which is itself a modified or extended Elasticfox, to switch seamlessly between your Amazon account and your Eucalyptus account in order to manage your Cloud "Computing" environment.

                What can Hybridfox do?


                Hybridfox can help you do everything that you could possibly do with Elasticfox, on the Eucalyptus computing environment:

                • Manage Images

                • Raise and Stop Instances

                • Manage Instances

                • Manage Elastic IPs

                • Manage Security Groups

                • Manage Keypairs

                • Manage Elastic Block Storage


                Why this Project?

                There is something about the Elasticfox development that restricts it to the EC2 environment only. But Manoj (the maintainer of Elasticfox) has done well to keep it open source, so that people like us could take it further, and hence this project.

                Moreover, I am kind of a beginner with JavaScript, and with a little bit of digging I found ways to extend it to Eucalyptus in my own limited way. It would be nice if the community got involved and extended this a little further.

                Caveat: Hybridfox is an extension of an earlier version of Elasticfox, 1.6.x.

                How the hell?

                Oh yes! This is the more important part, right? Those who are familiar with Eucalyptus will know that there is a eucarc file that gets downloaded when you download the certificates. When you "cat" this file you see some environment variables specific to your Eucalyptus instance; make a note of EC2_URL, EC2_ACCESS_KEY and EC2_SECRET_KEY.
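
If you would rather not read the values off the terminal, a few lines of Python can pull the three variables out of the file. This is only a sketch: it assumes the eucarc file contains shell lines of the form `export NAME=value` (optionally quoted), and the sample content below, including the address and keys, is made up for illustration.

```python
import re
from typing import Dict

WANTED = ("EC2_URL", "EC2_ACCESS_KEY", "EC2_SECRET_KEY")

# Matches lines such as:  export EC2_URL=http://host/path
EXPORT_LINE = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def read_eucarc(text: str) -> Dict[str, str]:
    """Return the three settings Hybridfox asks for, with shell quotes stripped."""
    found = {}
    for line in text.splitlines():
        match = EXPORT_LINE.match(line)
        if match and match.group(1) in WANTED:
            found[match.group(1)] = match.group(2).strip().strip("'\"")
    return found


# Made-up sample content; a real eucarc file will differ.
sample = """\
export S3_URL=http://192.168.1.10:8773/services/Walrus
export EC2_URL=http://192.168.1.10:8773/services/Eucalyptus
export EC2_ACCESS_KEY='EXAMPLEACCESSKEY'
export EC2_SECRET_KEY='EXAMPLESECRETKEY'
"""
settings = read_eucarc(sample)
```

With a real file you would pass `open("eucarc").read()` instead of the sample string; the resulting dictionary holds the Endpoint URL and the two keys that the Regions and Credentials dialogs ask for.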

                Once you have installed the xpi file, do the following steps.

                1. Define a Region: Click on Regions. In the popup dialog, specify a logical name, say "Eucalyptus" or "MyEucalyptus" or whatever suits you, and then give EC2_URL as the Endpoint URL.

                2. Define Credentials: Click on Credentials. In the popup dialog, specify a logical name, say "EucaAcc1" or whatever suits you, and give EC2_ACCESS_KEY and EC2_SECRET_KEY as the AWS Access Key and AWS Secret Access Key respectively.

                3. Define Key Pairs: Click on the KeyPairs tab, then on the "create a new keypair" icon. The popup dialog prompts "Please, provide a key pair name"; enter a name such as "eucakey" or whatever suits you. It then prompts for the location in which to save the id file.

                4. Define Security Groups: Click on SecurityGroups. In the popup dialog, specify a group name, say "Eucalyptus" or "EucaGroup" or whatever suits you, enter the description, and click the Create button.

                5. Image: Click on Image, then right-click an AMI ID to launch instance(s).

                6. Launch a new instance: After you right-click and choose to launch instance(s), a popup dialog lets you select or enter the AKI ID, the AMI ID, the minimum and maximum number of instances, and the Security Group to launch into.

                7. Manage Instances: Click on Instances to view the details of the newly launched instances.

                8. ElasticIPs: Click on ElasticIPs to associate an IP address with an instance ID.

                9. Volume and Snapshot: Click on Volume and Snapshot to create a volume for the instance, specifying the size in GB.

                Now select the Region and Credentials accordingly, and you will be good to go.

                Note: You could download the hybridfox from here and also feel free to contribute.

                This screen just shows the list of images that are registered with our Eucalyptus Cloud.

                Show me!

                [Screenshots of Hybridfox, images not included: Regions, Credentials, KeyPairs, Security Groups, Image, Launch New Instance, Manage Instance, Elastic IPs, Volume and Snapshots]


                Doesn't Elasticfox work with Eucalyptus?


                Yes, I have heard that with Eucalyptus 1.6.1, Elasticfox 1.7.x will work out of the box. I haven't tried that out, but that is the claim. Having said that, Hybridfox will need to be more focused on supporting all the features of Eucalyptus without breaking the EC2 functionality.

                Wednesday, November 11, 2009

                AWS Workshop, Chennai

                With the onset of the monsoon in Chennai, clouds of one kind were looming, and the Met Department was evangelizing them. There was also another cloud evangelist in Chennai: Jinesh Varia, evangelizing clouds of a different kind, the Amazon clouds.

                Event and Filling Gaps



                The event kicked off with Bobby Varghese from CSS Corp delivering the keynote for the workshop. Bobby spoke about CSS's adoption of AWS and how they have come up with products and use cases based on AWS.

                The real clouds were holding Jinesh back from reaching the venue. So Ezhilaran Babaraj (Ezhil), from CSS Labs, and Lakshmanan Narayan (Lux), from Vembu, had to fill in during the delay. Both sessions were planned as midway sessions after Jinesh's overview of AWS, which would have given them more mileage since AWS is their basis. Nevertheless, as the audience were not real novices to AWS, the sessions were well received.

                Ezhil explained at length the initiative at CSS Labs towards Cloud enablement, and specifically the CloudBuddy API and the Plugin Framework, which help extend Windows-based applications to the Cloud and extend CloudBuddy itself, respectively. The demo of CloudBuddy's plugin framework and APIs was not possible, thanks to BSNL's connectivity issues.

                Lux, who chose not to use the mic, started off with an overview of Vembu Technologies. He then showcased Vembu Home BETA with its Adobe AIR based UI, and how it helps the user back up to the desktop as well as to the Cloud.

                Jinesh Arrives



                As Lux was finishing his talk, Jinesh arrived. And there was a coffee break.

                Post coffee, Jinesh was quick to take the stage. He started off with an introduction to Cloud Computing, using the analogy of a Belgian beer company which in its early days had to generate its own power (electricity); now that power comes from a grid, and the beer tastes no different. That power-generating machine is now in a museum. An ominous sign for our in-house data centers.

                Having introduced the crowd to Cloud Computing, he jumped to what he was here to do: showcase Amazon Web Services. Jinesh did well to get some temporary keys to access Amazon Web Services, which were distributed to the audience, but the connectivity issues rendered them useless.

                Jinesh's articulation of Amazon Web Services did manage to fill whatever gaps were left by the (dis)connectivity. The "Overview of AWS" introduced the audience to the nuances of creating AWS accounts, the access credentials, usage of those accounts, the pay-as-you-go billing model, the API, tools, and the architecture of AWS. The architecture diagram was built up as a jigsaw of Amazon's various web service offerings fitting together to run an application on the cloud.

                Specifics



                The overview naturally spilled over into specifics. Lack of connectivity meant more talking than doing, and it also meant the start of a marathon session on AWS specifics.

                Amazon S3 (Simple Storage Service) was the first of the talks on specific Amazon Web Services. The S3 session also covered CloudFront, the CDN based on S3. The S3 session was followed by EC2 (Elastic Compute Cloud), the computing face of AWS; this session also covered the failover support services for EC2: CloudWatch, Auto Scaling, and Elastic Load Balancing. The persistent storage on EC2, Elastic Block Storage, was also explained.

                Each of these sessions covered Features, Terminology, Concepts, In Action, Tools and API, Pricing, and Typical Use Cases for the respective Amazon Web Service.

                Finally, Jinesh talked about "Best Practices" and "Migration to the Cloud".



                And he finished with an exercise on building a Cloud-enabled application using various AWS offerings.

                The Unconference



                This was where all the action was expected. But due to Jinesh's marathon sessions there was only a little time left, distributed equally among the presenters. What follows is what happened.

                Bosky from Hover spoke about key-value based systems and distributed environments.

                Kiran from MarketSimplified spoke about their SaaS application and how they use AWS to host it.

                Senthil from RailsFactory spoke about "What Jinesh did not mention?"

                Murthy and Sam from CSS Corp explained the whats and whys of Hybrid Cloud, and also presented a demo of "scaling out" to the public cloud.

                There were experiences shared by XLSoft and Anantara Solutions.

                The unconference's finale was a video, "Cloud Cloud Maybe", compiled by Vembu.



                Adios



                All in all it was a great event, with good attendance, a good lunch, and a pinch of togetherness as an AWS community. The basic motive of this whole exercise was twofold: building an engaging community around AWS, and seeing whether Cloud is real. About the latter there really was no doubt. With usages like Animoto, TimesMachine, and our Payroll Processing, it surely is the future of computing. But doubts remain about whether the former will stand, as it took Jinesh's presence for this kind of event to happen.

                To sum it up in Jinesh's words "Keep this engagement going".

                Friday, October 30, 2009

                CloudBuddy has Support for viewing Jungle Disk files

                Here I am at CSS Labs, developing components for our very own CloudBuddy… Wonder what CloudBuddy is? Take a look at www.mycloudbuddy.com and I promise, you would want to have a go at it.

                And what exactly am I going to write here? After thinking for quite some time, I decided that I could tell you about what CloudBuddy can do with Jungle Disk files…

                Here we go… This one is interesting… Jungle Disk Support

                I was going through the users’ feedback on CloudBuddy… One Jungle Disk user had this problem of not being able to view the files that he had uploaded (using Jungle Disk) into S3 using other tools and he had raised an issue on whether this can be handled in CloudBuddy.

                And I thought – Why not??

                The Jungle Disk website briefed me on what Jungle Disk is all about, with its encrypted storage mechanism and stuff… And there I found that Jungle Disk gives us its decryption API… A very good gesture indeed…

                So, I decided to have a look at the Jungle Disk Decryption API. C# Code, of course… I was just going through the code and believe me, it was Greek and Latin to me the first time I looked into it…

                Of course, going through code written by someone else is always a tough task… I was searching for answers on how I was going to make it… And the solution was right in front of me… Jungle Disk had given a sample application that uses its API!

                And that was all that I needed… And the way CloudBuddy was programmed, it made my life all the simpler… All I had to do was build a file structure that both the Jungle Disk API and CloudBuddy can understand… Done!! That’s it… CloudBuddy has built-in support for viewing and downloading Jungle Disk files… :-) who else has??

                Jungle Disk users, if you wish to try out CloudBuddy to view or download your Jungle Disk files, read on for the instructions…

                JUNGLE DISK:

                Jungle Disk creates only one bucket per location (US / EUROPE) per user. Hence a user can have at most 2 buckets.

                The 2 buckets are named by (possibly) prefixing “jd2-” and suffixing “-us” (or “-eu”) to the MD5 hash value of the access key that Amazon provides to the particular user.

                For example, the 2 possible buckets for a user will look like this :

                ▪ jd2 - MD5 Hash value of your AWS Access Key - us
                ▪ jd2 - MD5 Hash value of your AWS Access Key - eu

                jd2 – Jungle Disk Version 2
                us – Location United States
                eu – Location Europe
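
The naming scheme can be tried out with a short Python sketch. Keep in mind that the scheme itself is described above only as a possibility; using the lowercase hexadecimal form of the MD5 digest is a further assumption, and the access key below is a placeholder.

```python
import hashlib


def jungle_disk_buckets(access_key: str) -> dict:
    """Return the candidate US and EU bucket names for one AWS access key."""
    # Assumption: the hash is written as a lowercase hexadecimal string.
    digest = hashlib.md5(access_key.encode("utf-8")).hexdigest()
    return {region: f"jd2-{digest}-{region}" for region in ("us", "eu")}


buckets = jungle_disk_buckets("myAccessKey")  # placeholder key
```

Comparing the two generated names with the bucket list of your S3 account is a quick way to see whether the assumed scheme matches what Jungle Disk actually created.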


                Jungle Disk considers either or both of the above buckets as Top-Level Buckets and allows us to create infinite number of Sub-Buckets (or Folders, to be more precise) under the Top-Level Buckets.

                One important feature about Jungle Disk is that it provides the option of encryption of sub-buckets (folders) and files.

                It gives us the option of whether or not a Sub-Bucket (Folder) must be password protected.

                Jungle Disk organizes a Sub-Bucket's contents differently depending on whether the Sub-Bucket is password protected or not. And that's where CloudBuddy can make a difference.

                Here’s what CloudBuddy can do with your Jungle Disk files / folders…

                Cool… Isn’t it?... Try CloudBuddy… You'll love it:-)

                Friday, April 3, 2009

                Eucalyptus: Clouding privately

                In our constant pursuit of new technologies, we stumbled upon Eucalyptus and thought long and hard about installing it. Driven by our leaning towards the cloud, Eucalyptus really clouded our minds, and we decided, "no pain, no gain". And here we are, having successfully installed Eucalyptus, and here is how we did it. We have the Cloud Controller, the Cluster Controller and the Node Controller all up and running. But the document is still a draft version.

                Why Draft?
                We are not basking in the clouds after having done it; Eucalyptus did not allow us to do so. We have some post-installation pains and are trying to solve them. We shall release the final version once we figure out a way. Keep watching this blog post.

                Download the Eucalyptus Installation Document from here.