Wednesday 5 September 2012

AWS Management Console Improvements to EC2 Tab

AWS recently made some improvements to the EC2 tab of the AWS Management Console. It is now easier to access the AWS Marketplace and to configure attached storage (EBS volumes and ephemeral storage) for EC2 instances.
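
The storage configuration screen is essentially a console front end for EC2 block device mappings. As a point of reference, here is a hedged boto (Python AWS SDK) sketch of the same thing done programmatically; the AMI ID, sizes, and device names are illustrative placeholders:

    import boto.ec2
    from boto.ec2.blockdevicemapping import BlockDeviceMapping, BlockDeviceType

    conn = boto.ec2.connect_to_region('us-east-1')

    mapping = BlockDeviceMapping()
    # Root EBS volume, grown to 50 GB and deleted when the instance terminates.
    mapping['/dev/sda1'] = BlockDeviceType(size=50, delete_on_termination=True)
    # First ephemeral (instance store) volume.
    mapping['/dev/sdb'] = BlockDeviceType(ephemeral_name='ephemeral0')

    conn.run_instances('ami-1b814f72', instance_type='m1.small',
                       block_device_map=mapping)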

Read on for a good post by Jeff.

Marketplace Access

This one is really simple, but definitely worth covering. You can now access the AWS Marketplace from the Launch Instances Wizard:


AWS Marketplace

After you enter your search terms and click the Go button, the Marketplace results page will open in a new tab. Here's what happens when I search for wordpress:

AWS Week in Review - August 4th to August 12th, 2012

Let's take a quick look at what happened in AWS-land last week:
Monday, August 6
Wednesday, August 8
Thursday, August 9

SOURCE

AWS Week in Review - July 30th to August 3rd, 2012

Let's take a quick look at what happened in AWS-land last week:
Monday, July 30
Tuesday, July 31
Wednesday, August 1
Friday, August 3

SOURCE

AWS Week in Review - August 13th to August 19th, 2012

Let's take a quick look at what happened in AWS-land last week:
Monday, August 13
Thursday, August 16
Sunday, August 19

SOURCE

AWS Week in Review - August 20th to August 26th, 2012

Let's take a quick look at what happened in AWS-land last week:
Monday, August 20
Tuesday, August 21
Wednesday, August 22
Thursday, August 23
Friday, August 24

SOURCE

AWS Week in Review - August 27th to September 2nd, 2012

Let's take a quick look at what happened in AWS-land last week:

Monday, August 27
Friday, August 31

SOURCE


AWS Report with Michael Kellen of Sage Bionetworks


Michael Kellen, Director of Technology at Sage Bionetworks, talks about how Sage Bionetworks, a non-profit organization that helps medical researchers analyze data, is using AWS to save costs, allowing it to focus on its core business.

 Really interesting use of Amazon S3, Elastic Beanstalk, and the Simple Workflow Service. 





Monday 3 September 2012

Amazon S3 - Cross Origin Resource Sharing Support

GREAT NEWS!!!


AWS has announced support for Cross-Origin Resource Sharing (CORS) in Amazon S3.
You can now easily build web applications that use JavaScript and HTML5 to interact with resources in Amazon S3, enabling you to implement HTML5 drag and drop uploads to Amazon S3, show upload progress, or update content. Until now, you needed to run a custom proxy server between your web application and Amazon S3 to support these capabilities. A custom proxy server was required because web browsers limit the way web pages loaded from one site (e.g., mywebsite.com) can interact with content from another location (e.g., a location in Amazon S3 like assets.mywebsite.com.s3.amazonaws.com). Amazon S3’s support for CORS replaces the need for this custom proxy server by instructing the web browser to selectively enable these cross-site interactions.
Configuring your bucket for CORS is easy. To get started, open the Amazon S3 Management Console, and follow these simple steps:

1) Right click on your Amazon S3 bucket and open the “Properties” pane.
2) Under the “Permissions” tab, click the “Add CORS configuration” button to add a new CORS configuration. You can then specify the websites (e.g., "mywebsite.com”) that should have access to your bucket, and the specific HTTP request methods (e.g., “GET”) you wish to allow.
3) Click Save.
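If you would rather configure the bucket programmatically, here is a minimal sketch using boto, the Python AWS SDK; the bucket name and origin are placeholders:

    import boto
    from boto.s3.cors import CORSConfiguration

    conn = boto.connect_s3()  # credentials come from your boto config/environment
    bucket = conn.get_bucket('my-example-bucket')

    cors = CORSConfiguration()
    # Allow cross-origin GETs from one site; browsers may cache the
    # preflight response for up to 3000 seconds.
    cors.add_rule(allowed_method=['GET'],
                  allowed_origin=['http://mywebsite.com'],
                  max_age_seconds=3000)
    bucket.set_cors(cors)

    print(bucket.get_cors_xml())  # inspect the stored CORS document
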
For more information on using CORS with Amazon S3, review the Amazon S3 Developer Guide.



Sunday 2 September 2012

CLOUD INFOGRAPHIC: GOOGLE DRIVE VS THE COMPETITION


Google's cloud service is serious about taking on the competition. 

This infographic compares Google Drive to its top three competitors: Dropbox, Microsoft's SkyDrive, and Apple's iCloud. Besides comparing size, pricing, and platform support, we also take a look at a brief history of these cloud services.

There are hundreds of cloud services available today, a fact which does not make it any easier for users to select a cloud storage provider that fits their needs.

Google is no longer the only kid on the cloud computing block. In fact, Apple, Microsoft, Amazon and even private parties are entering into the cloud storage industry, and rather aggressively. The race is on for these cloud storage services to bring in the most customers and attract the most attention, but who is the best of the best? Answering this question is not as simple as one might hope. In fact, there’s a lot more to selecting the best cloud storage company than meets the eye. 

This is because cloud storage is customized to users based on their distinct needs, which means that while a certain cloud storage service may be perfect for one individual, it might not work well for another.



Cloud Computing & the Public Sector?




Source: AMD and Redshift Research

Cloud Infographic: The World And Big Data

Thursday 30 August 2012

Going to the Cloud - Cloud in Education

A very good Infographic that looks at how schools and colleges are adopting 'the cloud' and how Adobe, IBM, Microsoft, and Google are responding with their respective cloud suites for educators.

Going to the Cloud

From: OnlineColleges.net

SOURCE


 

Wednesday 29 August 2012

What's new in VMware vSphere 5.1

Today VMware announced vSphere 5.1. This posting will give an overview of the most interesting new features.

vSphere 5.1 will be available on September 11, 2012!


Some highlights are as follows:

  • Paul Maritz steps down as CEO after leading the company for 4 years. His successor is Pat Gelsinger.
  • VMware is focused on building the architecture for Cloud Computing which is called the Software Defined Datacenter
  • vCloud Suite is announced, consisting of:
    • vSphere
    • vCloud Director
    • vCloud Networking and Security
    • Site Recovery Manager
    • vCenter Operations Manager
    • vFabric Application Director
    • vCloud APIs
    • vCloud Connector
    • vCenter Orchestrator

  • vSphere 5.1 is announced
  • vCloud Director 5.1 is announced
  • vCloud Networking and Security 5.1 is announced
  • vCenter Site Recovery Manager 5.1 is announced
  • vRAM is no more! VMware will use a per-CPU pricing model
  • Cloud Ops, a new operating model for IT
  • Monster VMs will get bigger: 64 virtual CPUs and 1 million IOPS...per VM
  • Enhanced vMotion: Live migration without the need for shared storage!
  • New virtualized storage options
  • Create secure and logical networks using the new vCloud Networking & Security suite and VXLAN
  • vSphere 5.1 contains a full-featured, browser-based vSphere Client: the Web Client
  • The vCloud Director interface is now vSphere Web Client style
  • The vSphere Web Client now offers great extensibility options for 3rd party vendors
  • Use vFabric Application Director for deploying complex applications
  • Existing vSphere Enterprise Plus customers will get a free upgrade to the vCloud Suite
  • VMware recently acquired Nicira, a company that virtualizes networking


  More detailed information on all of these announcements follows below:

    VMware has changed the features in the vSphere editions. The features below are all now available in the Standard Edition as well.


    SURPRISE!! VMware Will Join OpenStack

    Never say never. VMware is about to join the OpenStack Foundation, a group initially backed by other industry giants as a counterweight to VMware’s server virtualization dominance. Intel and NEC are also on deck to join as Gold OSF members.


    OpenStackLogo



    Just in time for VMworld, VMware is about to join the OpenStack Foundation as a Gold member, along with Intel and NEC, according to a post on the OpenStack Foundation Wiki. The applications for membership are on the agenda of the August 28 OpenStack Foundation meeting.



    A year ago, a VMware-OpenStack hookup would have been seen as unlikely. When Rackspace and NASA launched the OpenStack Project more than two years ago, it was seen as a competitive response to VMware’s server virtualization dominance inside company data centers and to Amazon’s heft in public cloud computing. Many tech companies including but not limited to Rackspace, IBM, Hewlett-Packard, Citrix, Red Hat and Microsoft saw VMware as a threat and were bound and determined to keep the company from extending its virtualization lock into the cloud.



    But, things change. VMware’s surprise acquisition of Nicira and DynamicOps last month, showed there might be a thaw in the air. For one thing, Nicira is an OpenStack player. By bringing Nicira and DynamicOps into the fold, VMware appeared to be much more willing to work with non-VMware-centric infrastructure, as GigaOM’s Derrick Harris reported at the time.

    This is a symbolic coup for OpenStack and its biggest boost since IBM and Red Hat officially joined as Platinum members in April. And it’s especially important since Citrix, a virtualization rival to VMware, undercut its own OpenStack participation last April by pushing CloudStack as an alternative open source cloud stack.



    OpenStack Gold members, which include Cloudscaling, Dell, MorphLabs, Cisco Systems, and NetApp, pay a fee pegged at 0.25 percent of their revenue — at least $50,000 but capped at $200,000 according to the foundation wiki. (VMware’s fee will be $66,666, according to the application, submitted by VMware CTO Steve Herrod, which is linked on the wiki post.) Platinum members — AT&T, Canonical, HP, Rackspace, IBM, Nebula, Red Hat, and SUSE – pay $500,000 per year with a 3-year minimum commitment.


    SOURCE

    Original Source : Gigaom.com


    Introduction to Virtualisation - VMware


    This video webcast is designed to help those with little to no virtualization experience understand why virtualization and VMware are so important to driving down both capital and operational costs.


    Introduction to Virtualisation - VMware


    View the slides here


    SOURCE : Infoworld Newsletter

     

    Tuesday 28 August 2012

    AWS Cost Allocation For Customer Bills

    A good new feature from AWS to help customers keep control over costs, and a well-put blog post by Jeff...



    Growth Challenges


    You probably know how it goes when you put AWS to work for your company. You start small -- one Amazon S3 bucket for some backups, or one Amazon EC2 instance hosting a single web site or web application. Things work out well and before you know it, word of your success spreads to your team, and they start using it too. At some point the entire company jumps on board, and you become yet another AWS success story.

    As your usage of AWS grows, you stop charging it to your personal credit card and create an account for your company. You use IAM to control access to the AWS resources created and referenced by each of the applications.

    There's just one catch -- with all of those departments, developers, and applications making use of AWS from a single account, allocating costs to projects and to budgets is difficult because we didn't give you the necessary information. Some of our customers have told us that this cost allocation process can consume several hours of their time each month.

    Cost Allocation Via Tagging


    Extending the existing EC2 tagging system (keys and values), we are launching a new cost allocation system to make it easy for you to tag your AWS resources and to access billing data that is broken down by tag (or tags).

    With this release you can tag the following types of AWS resources for cost allocation purposes:
    • S3 buckets
    • EC2 Instances
    • EBS volumes
    • Reserved Instances
    • Spot Instance requests
    • VPN connections
    • Amazon RDS DB Instances
    • AWS CloudFormation Stacks
    Here's all that you need to do:
    1. Decide on Your Tagging Model - Typically, the key name identifies some axis that you care about and the key values identify the points along the axis. You could have a tag named Department, with values like Sales, Marketing, Development, QA, Engineering, and so forth. You could choose to align this with your existing accounting system. You can use multiple tags for cost allocation purposes, each of which represents an additional dimension of usage. If each department runs several AWS-powered applications (or stores lots of data in S3), you could add an Application tag, with the values representing all of the applications that are running on behalf of the department. You can use the tags to create your own custom hierarchy.
    2. Tag Your Resources - Apply the agreed-upon tags to your existing resources, and arrange to apply them to newly created resources as they appear. You can add up to ten tags per resource. You can do this from the AWS Management Console, the service APIs, the command line, or through Auto Scaling (see the short boto sketch after this list):

      AWS Cost Allocation For Customer Bills
      You can use CloudFormation to provision a set of related AWS resources and easily tag them.
    3. Tell AWS Which Tags Matter - Now you need to log in to the AWS Portal, sign up for billing reports, and tell the AWS billing system which tag keys are meaningful for cost allocation purposes by using the Manage Cost Allocation Report option:

      AWS Cost Allocation For Customer Bills - Manage Report
      AWS Cost Allocation For Customer Bills - Select Tags
      You can choose to include certain tags and to exclude others.
    4. Access Billing Data - The estimated billing data is generated multiple times per day and the month-end charges are generated within three days of the end of the month. You can access this data by enabling programmatic access and arranging for it to be delivered to your S3 bucket.
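
    As a reference point for step 2, here is a minimal boto sketch that applies a Department tag and an Application tag to an existing instance; the instance ID and tag values are placeholders:

        import boto.ec2

        conn = boto.ec2.connect_to_region('us-east-1')
        # Up to ten tags per resource; the keys should match the tagging
        # model you settled on in step 1.
        conn.create_tags(['i-1234abcd'],
                         {'Department': 'Marketing', 'Application': 'Blog'})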

    Data Processing


    The Cost Allocation Report will contain one additional column for each of the tag keys that you selected in step 3. The corresponding tag value (if any) will be included in the appropriate column of the data:

    AWS Cost Allocation For Customer Bills
    In the Cost Allocation Report above, the relevant keys were Owner, Stack, Cost Center, Application, and Project. The column will be blank if the AWS resource doesn't happen to have a value for the key. Data transfer and request charges are also included for tagged resources. In effect, these charges inherit the tags from the associated resource.

    Once you have this data, you can feed it in to your own accounting system or you can slice and dice it any way you'd like for reporting or visualization purposes. For example, you could create a pivot table and aggregate the data along one or more dimensions:

    AWS Cost Allocation For Customer Bills
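
    Once the report is delivered to your bucket, rolling it up is straightforward. Here is a hypothetical sketch that totals estimated cost by a Department tag; tag columns appear in the report as user:TagName, and the file and column names here are illustrative:

        import csv
        from collections import defaultdict

        totals = defaultdict(float)
        with open('aws-cost-allocation-2012-08.csv') as f:
            for row in csv.DictReader(f):
                dept = row.get('user:Department') or 'untagged'
                try:
                    totals[dept] += float(row['TotalCost'])
                except (KeyError, ValueError):
                    continue  # skip summary rows without a numeric cost

        for dept, cost in sorted(totals.items()):
            print('%-15s $%.2f' % (dept, cost))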
     

     
     

    Identifying Workloads for the Cloud

    Superb information from RightScale... read on...


    Identifying workloads to move to the cloud can be tricky. You have dozens or hundreds of apps running in your organization, and now that you’ve seen the operational efficiencies and agility available to you in the cloud, you’re tempted to move as many of them to the cloud as quickly as possible. As you’ll see in the examples below, cloud computing is indeed a good fit for many common workloads.

    I firmly believe that infrastructure-as-a-service (IaaS) cloud is for every organization, but not for every application. The reality is that some applications just aren’t a good fit for the ephemeral and dynamic environment of the cloud. Still others have very specific environmental requirements that make them ill-suited. Read on as I explore more about what you should consider before earmarking a workload for the cloud.


    3 Quick Criteria for a Good Fit

    While each application is unique, and it’s important to apply your own lens when evaluating your cloud strategy, there are some rules of thumb that should help identify applications that are winning choices for cloud:

    Unpredictable load or potential for explosive growth: Whenever your app is public facing, it has the potential to be wildly popular. Social games, eCommerce sites, blogs and software-as-a-service (SaaS) products fall into this category. If you release the next Farmville™ and your traffic spikes, you can scale up and down in the cloud according to demand, avoiding a “success disaster” and never over-provisioning your infrastructure.

    Partial utilization: When traffic fluctuates – say with daily cycles of playing or shopping, or with occasional, compute-intensive batch processing – you can spin up extra servers in the cloud during the peaks and spin them down afterwards.

    Easy parallelization: Applications like media streaming can be scaled horizontally and are generally a good use case for the cloud, because they scale out rather than up.
    Finally, keep in mind the ideal of cloud computing as a way of using multiple resource pools – public cloud, private cloud, hybrid, your internal data center – not choosing one over the others. RightScale lets you see and manage all of them through one interface with a single set of tools and best practices.

    3 Ideal Cloud Workloads

    INFOGRAPHIC: Is The Future Of Cloud Computing Open Source? A Few Things To Consider


    Companies are embracing cloud computing solutions because of their flexibility, scalability and cost-effectiveness, and those who have successfully integrated the cloud into their infrastructure have found it quite economical. They can expand and contract, adding and removing services as required, which gives them a lot of control over the resources being used and the funds being spent on those resources. This highly controllable environment not only cuts the cost of services, but also saves funds that would otherwise be spent on the company's infrastructure.


    Replacement of Personal Computers with Personal Clouds


    Cloud computing is not only becoming popular in business, but also among individual consumers. With the passage of time, personal computers are being replaced by personal clouds, and more and more companies are offering personal cloud services. People prefer to store their images, videos and documents online, both as a backup and to make them secure. Storing data on personal clouds makes it available anytime, anywhere. You just need a computing device and an Internet connection, and you can access all your photos, videos and documents.


    Stability, Scalability and Reliability of Open-Source Software


    Open-source software is becoming popular on an enterprise level because of its stability, scalability and reliability. Companies love to use open-source technologies because they are highly customizable, secure, reliable and accountable. With proprietary software, we are highly dependent on the software company for its development and support. But for open-source, we can find huge support from developers across the world, and we can tweak it according to our needs. Just hire a team of developers, and there you go.



    Lessons Learned from Linux and Android


    Monday 27 August 2012

    Getting Started with IAM Roles for EC2 Instances



    AWS Identity and Access Management (IAM) helps you securely control access to Amazon Web Services and your account resources. IAM can also keep your account credentials private. With IAM, you can create multiple IAM users under the umbrella of your AWS account or enable temporary access through identity federation with your corporate directory. In some cases, you can also enable access to resources across AWS accounts.

    Without IAM, however, you must either create multiple AWS accounts—each with its own billing and subscriptions to AWS products—or your employees must share the security credentials of a single AWS account. In addition, without IAM, you cannot control the tasks a particular user or system can do and what AWS resources they might use.


     
     
    AWS has recently launched IAM Roles for EC2 Instances. A role is an entity that has a set of permissions that can be assumed by another entity. Use roles to enable applications running on your Amazon EC2 instances to securely access your AWS resources. You grant a specific set of permissions to a role, use the role to launch an EC2 instance, and let EC2 automatically handle AWS credential management for your applications that run on Amazon EC2. Use AWS Identity and Access Management (IAM) to create a role and to grant permissions to the role.
     
     
    IAM roles for Amazon EC2 provide:
      
    • AWS access keys for applications running on Amazon EC2 instances
    • Automatic rotation of the AWS access keys on the Amazon EC2 instance
    • Granular permissions for applications running on Amazon EC2 instances that make requests to your AWS services
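
    Here is a hedged boto sketch of the workflow: create a role (boto's default trust policy lets EC2 assume it), attach a permissions policy, wrap the role in an instance profile, and launch an instance with that profile. The role, profile, policy, and AMI names are all illustrative:

        import json
        import boto
        import boto.ec2

        iam = boto.connect_iam()
        iam.create_role('s3-reader')  # default trust policy allows EC2 to assume the role

        policy = {'Statement': [{'Effect': 'Allow',
                                 'Action': ['s3:Get*', 's3:List*'],
                                 'Resource': '*'}]}
        iam.put_role_policy('s3-reader', 's3-read-only', json.dumps(policy))

        # EC2 instances reference roles through instance profiles.
        iam.create_instance_profile('s3-reader-profile')
        iam.add_role_to_instance_profile('s3-reader-profile', 's3-reader')

        ec2 = boto.ec2.connect_to_region('us-east-1')
        ec2.run_instances('ami-1b814f72', instance_type='t1.micro',
                          instance_profile_name='s3-reader-profile')

    Code running on that instance can then call AWS as usual; boto picks up the temporary credentials from the instance metadata service and rotates them automatically.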
      
    The video below demonstrates the basic workflow:


    Create new role AWS IAM Workflow


     

     
     
    For more help, refer to the AWS documentation for IAM here.
     
    For other AWS documentation, please refer to the quick links provided in the blog's right-side panel.
     
     

    Infographic : Evolution of Computer Languages

    All the cloud applications you use on the Internet today are written in a specific computer language. What you see as a nice icon on the front end looks like a bunch of code on the back end. It’s interesting to see where computer languages started and how they have evolved over time. There are now many computer languages to choose from and billions of lines of code. Check out the infographic below to see the computer language timeline and read some fun facts about code along the way.







    SOURCE




    Infographic: Demystifying AWS - Revealing Behind the scenes usage

    Amazon Web Services (AWS) is the biggest public cloud around, yet what goes on behind the scenes remains a mystery.

    Read on for a good infographic from the Newvem blog!


    "For heavy users, such as enterprise level CIOs, AWS’s “Reserved Instances” are a cost effective model to scale their cloud activity and benefit from the full service offering that Amazon provides.


    The infographic is based on analysis made by our Reserved Instance Decision Making Tool. This advanced analytics tool can help enterprise CIOs to capture the added value and benefit by:
    • Ensuring that reserved instances meet cost and performance expectations.
    • Identifying consistent On-Demand usage that can be shifted to Reserved Instances.
    • Tracking Reserved Instance expiration dates and recommending actions for renewal and scaling up and down."



    SOURCE






     

    Friday 24 August 2012

    AWS New Whitepaper: Mapping and GeoSpatial Analysis in the Cloud Using ArcGIS

    Great new whitepaper by Jinesh Varia...


    Esri is one of the leaders in the Geographic Information Systems (GIS) industry and one of the largest privately held software companies focused on mapping and geospatial applications in the world with offices in more than 100 countries. Both public and private sector organizations use Esri technology to analyze and manage their geographic information and make better decisions – uses range from planning cities and improving the quality of life for residents, to site selection, customer analytics, and streamlining logistics.

    Esri and AWS have been working together since 2008 to bring the power of GIS to the masses. The AWS Partner Team recently attended the 2012 Esri International User Conference, with more than 14,000 attendees, 300 exhibitors and a large number of ecosystem partners. A cloud computing theme dominated the conference.
    Esri and AWS have co-authored a whitepaper, "Mapping and GeoSpatial Analysis Using ArcGIS", to provide guidance for users who are interested in performing spatial analysis using their data with complementary datasets.

    The paper discusses how users can publish and analyze imagery data (such as satellite imagery, or aerial imagery) and create and publish tile cache map services from spatially referenced data (such as data with x/y points, lines, polygons) in AWS using ArcGIS.

    Download PDF: Mapping and GeoSpatial Analysis Using ArcGIS

    The paper focuses on imagery because that has been the most challenging data type to manage in the cloud, but the approaches discussed are general enough to apply to any type of data.

    It not only provides architecture guidance on how to scale ArcGIS servers in the cloud but also provides step-by-step guidance on publishing map services in the cloud.

    For more information on GeoApps in the AWS Cloud, see the presentation -
    The Cloud as a Platform for Geo below:
    GeoApps in the AWS Cloud - Jinesh Varia from Amazon Web Services


    SOURCE
     

    Wednesday 22 August 2012

    Automating Linux Installation and configuration with Kickstart



    If you work for an IT support company, you regularly have to install operating systems like CentOS, Fedora, and Red Hat on servers, desktop computers, or even virtual machines.
    This guide explains how to automate the install process using a simple Kickstart file.
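
    As a taste of what such a file looks like, here is a minimal, illustrative ks.cfg for a CentOS 6-style network install; the mirror URL and root password are placeholders, and the guide covers the full set of options:

        install
        url --url=http://mirror.example.com/centos/6/os/x86_64
        lang en_US.UTF-8
        keyboard us
        network --bootproto=dhcp
        rootpw changeme
        timezone --utc UTC
        bootloader --location=mbr
        clearpart --all --initlabel
        autopart
        reboot

        %packages
        @core
        %end

    Point the installer at the file from the boot prompt (for example, linux ks=http://yourserver/ks.cfg) and the installation runs unattended.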

    Read the very well-explained guide here.

    Tuesday 21 August 2012

    Deploy a .NET Application to AWS Elastic Beanstalk with Amazon RDS Using Visual Studio

    In this video, AWS walks you through deploying an application to AWS Elastic Beanstalk (link: http://aws.amazon.com/elasticbeanstalk/), configuring an Amazon RDS for SQL Server DB instance (link: http://aws.amazon.com/rds/), and managing your configuration, all from the confines of Visual Studio. The AWS Toolkit for Visual Studio streamlines your development, deployment, and testing inside your familiar IDE.
    To learn more about AWS Elastic Beanstalk and Amazon RDS, visit the AWS Elastic Beanstalk Developer Guide at http://docs.amazonwebservices.com/elasticbeanstalk/latest/dg/create_deploy_NE....



    SOURCE



    Amazon CloudSearch - Start Searching in One Hour for Less Than $100 / Month


    An extract from AWS Evangelist Jeff Barr's CloudSearch blog post, with more information about how you can start searching in an hour for less than $100 a month...

    Continuing along in our quest to give you the tools that you need to build ridiculously powerful web sites and applications in no time flat at the lowest possible cost, I'd like to introduce you to Amazon CloudSearch. If you have ever searched Amazon.com, you've already used the technology that underlies CloudSearch. You can now have a very powerful and scalable search system (indexing and retrieval) up and running in less than an hour.

    You, sitting in your corporate cubicle, your coffee shop, or your dorm room, now have access to search technology at a very affordable price. You can start to take advantage of many years of Amazon R&D in the search space for just $0.12 per hour (I'll talk about pricing in depth later).


    What is Search?

    Search plays a major role in many web sites and other types of online applications. The basic model is seemingly simple. Think of your set of documents or your data collection as a book or a catalog, composed of a number of pages. You know that you can find the desired content quickly and efficiently by simply consulting the index.

    Search does the same thing by indexing each document in a way that facilitates rapid retrieval. You enter some terms into a search box and the site responds (rather quickly if you use CloudSearch) with a list of pages that match the search terms.

    As is the case with many things, this simple model masks a lot of complexity and might raise a lot of questions in your mind. For example:
    1. How efficient is the search? Did the search engine simply iterate through every page, looking for matches, or is there some sort of index?
    2. The search results were returned in the form of an ordered list. What factor(s) determined which documents were returned, and in what order (commonly known as ranking)? How are the results grouped?
    3. How forgiving or expansive was the search? Did a search for "dogs" return results for "dog?" Did it return results for "golden retriever," or "pet?"
    4. What kinds of complex searches or queries can be used? Does a search for "dog training" return the expected results? Can you search for "dog" in the Title field and "training" in the Description?
    5. How scalable is the search? What if there are millions or billions of pages? What if there are thousands of searches per hour? Is there enough storage space?
    6. What happens when new pages are added to the collection, or old pages are removed? How does this affect the search results?
    7. How can you efficiently navigate through and explore search results? Can you group and filter the search results in ways that take advantage of multiple named fields (often known as a faceted search)?
    Needless to say, things can get very complex very quickly. Even if you can write code to do some or all of this yourself, you still need to worry about the operational aspects. We know that scaling a search system is non-trivial. There are lots of moving parts, all of which must be designed, implemented, instantiated, scaled, monitored, and maintained. As you scale, algorithmic complexity often comes in to play; you soon learn that algorithms and techniques which were practical at the beginning aren't always practical at scale.


    What is Amazon CloudSearch?

    Amazon CloudSearch is a fully managed search service in the cloud. You can set it up and start processing queries in less than an hour, with automatic scaling for data and search traffic, all for less than $100 per month.

    CloudSearch hides all of the complexity and all of the search infrastructure from you. You simply provide it with a set of documents and decide how you would like to incorporate search into your application.

    You don't have to write your own indexing, query parsing, query processing, results handling, or any of that other stuff. You don't need to worry about running out of disk space or processing power, and you don't need to keep rewriting your code to add more features.

    With CloudSearch, you can focus on your application layer. You upload your documents, CloudSearch indexes them, and you can build a search experience that is custom-tailored to the needs of your customers.


    How Does it Work?

    The Amazon CloudSearch model is really simple, but don't confuse simple with simplistic -- there's a lot going on behind the scenes!

    Here's all you need to do to get started (you can perform these operations from the AWS Management Console, the CloudSearch command line tools, or through the CloudSearch APIs):
    1. Create and configure a Search Domain. This is a data container and a related set of services. It exists within a particular Availability Zone of a single AWS Region (initially US East).
    2. Upload your documents. Documents can be uploaded as JSON or XML that conforms to our Search Document Format (SDF). Uploaded documents will typically be searchable within seconds. You can, if you'd like, send data over an HTTPS connection to protect it while it is in transit.
    3. Perform searches.
    There are plenty of options and goodies, but that's all it takes to get started.
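
    To make those three steps concrete, here is a hedged Python sketch of the two HTTP endpoints a search domain exposes: the document service for batch uploads in SDF, and the search service for queries. The domain endpoints below are placeholders (the console shows your real ones), and the requests library stands in for any HTTP client:

        import json
        import requests

        DOC = 'doc-mydomain-abc123.us-east-1.cloudsearch.amazonaws.com'
        SEARCH = 'search-mydomain-abc123.us-east-1.cloudsearch.amazonaws.com'

        # Step 2: upload a batch of documents in Search Document Format (SDF).
        batch = [{'type': 'add', 'id': 'movie_1', 'version': 1, 'lang': 'en',
                  'fields': {'title': 'Star Wars', 'genre': 'scifi'}}]
        r = requests.post('https://%s/2011-02-01/documents/batch' % DOC,
                          data=json.dumps(batch),
                          headers={'Content-Type': 'application/json'})
        print(r.json())

        # Step 3: run a simple free-text search and ask for the title field back.
        r = requests.get('https://%s/2011-02-01/search' % SEARCH,
                         params={'q': 'star wars', 'return-fields': 'title'})
        print(r.json())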

    Amazon CloudSearch applies data updates continuously, so newly changed data becomes searchable in near real-time. Your index is stored in RAM to keep throughput high and to speed up document updates. You can also tell CloudSearch to re-index your documents; you'll need to do this after changing certain configuration options, such as stemming (converting variations of a word to a base word, such as "dogs" to "dog") or stop words (very common words that you don't want to index).
    Amazon CloudSearch has a number of advanced search capabilities including faceting and fielded search:

    Faceting allows you to categorize your results into sub-groups, which can be used as the basis for another search. You could search for "umbrellas" and use a facet to group the results by price, such as $1-$10, $10-$20, $20-$50, and so forth. CloudSearch will even return document counts for each sub-group.
    Fielded searching allows you to search on a particular attribute of a document. You could locate movies in a particular genre or actor, or products within a certain price range.
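
    Continuing the sketch above, a faceted, fielded query might look like the following; treat the parameter names as illustrative of the 2011-02-01 search API:

        # Boolean/fielded query ('bq') restricted to the title field, plus a
        # facet on 'genre'; the response includes per-genre match counts.
        r = requests.get('https://%s/2011-02-01/search' % SEARCH,
                         params={'bq': "(field title 'star')",
                                 'facet': 'genre'})
        print(r.json())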

     
    Search Scaling
    Behind the scenes, CloudSearch stores data and processes searches using search instances. Each instance has a finite amount of CPU power and RAM. As your data expands, CloudSearch will automatically launch additional search instances and/or scale to larger instance types. As your search traffic expands beyond the capacity of a single instance, CloudSearch will automatically launch additional instances and replicate the data to the new instance. If you have a lot of data and a high request rate, CloudSearch will automatically scale in both dimensions for you.

    Amazon CloudSearch will automatically scale your search fleet up to a maximum of 50 search instances. We'll be increasing this limit over time; if you have an immediate need for more than 50 instances, please feel free to contact us and we'll be happy to help.

    The net-net of all of this automation is that you don't need to worry about having enough storage capacity or processing power. CloudSearch will take care of it for you, and you'll pay only for what you use.

    Pricing Model

    The Amazon CloudSearch pricing model is straightforward:

    You'll be billed based on the number of running search instances. There are three search instance sizes (Small, Large, and Extra Large) at prices ranging from $0.12 to $0.68 per hour (these are US East Region prices, since that's where we are launching CloudSearch). At the low end, a single small instance running around the clock works out to about $0.12 x 24 x 30 = $86.40 per month, which is where the "less than $100 per month" figure comes from.

    There's a modest charge for each batch of uploaded data. If you change configuration options and need to re-index your data, you will be billed $0.98 for each Gigabyte of data in the search domain.
    There's no charge for in-bound data transfer, data transfer out is billed at the usual AWS rates, and you can transfer data to and from your Amazon EC2 instances in the Region at no charge.

    Advanced Searching

    Like the other Amazon Web Services, CloudSearch allows you to get started with a modest effort and to add richness and complexity over time. You can easily implement advanced features such as faceted search, free text search, Boolean search expressions, customized relevance ranking, field-based sorting and searching, and text processing options such as stopwords, synonyms, and stemming.

    CloudSearch Programming

    You can interact with CloudSearch through the AWS Management Console, a complete set of Amazon CloudSearch APIs, and a set of command line tools. You can easily create, configure, and populate a search domain through the AWS Management Console.
    Here's a tour, starting with the welcome screen:

    Amazon CloudSearch
     
    You start by creating a new Search Domain:

    Amazon CloudSearch
     
    You can then load some sample data. It can come from local files, an Amazon S3 bucket, or several other sources:

    Amazon CloudSearch
     
    Here's how you choose an S3 bucket (and an optional prefix to limit which documents will be indexed):

    Amazon CloudSearch
     
    You can also configure your initial set of index fields:

    Amazon CloudSearch
     
    You can also create access policies for the CloudSearch APIs:

    Amazon CloudSearch
     
    Your search domain will be initialized and ready to use within twenty minutes:

    Amazon CloudSearch
     
    Processing your documents is the final step in the initialization process:

    Amazon CloudSearch
     
    After your documents have been processed you can perform some test searches from the console:

    Amazon CloudSearch
     
    The CloudSearch console also provides you with full control over a number of indexing options including stopwords, stemming, and synonyms:



     
    CloudSearch in Action
    Some of our early customers have already deployed some applications powered by CloudSearch. Here's a sampling:
    • Search Technologies has used CloudSearch to index Wikipedia (see the demo).
    • NewsRight is using CloudSearch to deliver search for news content, usage and rights information to over 1,000 publications.
    • ex.fm is using CloudSearch to power their social music discovery website.
    • CarDomain is powering search on their social networking website for car enthusiasts.
    • Sage Bionetworks is powering search on their data-driven collaborative biological research website.
    • SmugMug is using CloudSearch to deliver search on their website for over a billion photos.

    SOURCE