Elasticity and quick provisioning are hallmarks of any good cloud platform. Cloud customers have gotten used to rapidly acquiring right-sized resources that fit a given workload. No longer do developers have to build the biggest (physical) server possible just to avoid requests to resize later on. Rather, provision for what you need now, and adjust the capacity as the usage dictates. But how do you know when it’s time to size up?
The CenturyLink Cloud engineering team just released a monitoring and alert service (alongside our powerful server UI redesign) that gives you the data you need! We designed this feature with three things in mind:
- Offer a simple, straightforward toolset that users can understand and take advantage of quickly.
- Deliver reliable, accurate statistics that reflect the current state of a server.
- Provide multiple ways to identify that an alert was fired.
Together, these three principles kept us focused on delivering a service that met market need. Let’s take a look at how the new monitoring and alert service applies each principles.
It’s easy to get lost in a sea of rarely-used options offered by a monitoring platform. Instead, we focused on ease of setup, a common theme in the CenturyLink Cloud. Users only have to follow two steps.
First, access the Alerts item in the top level navigation menu. This takes you to a list of all the alert policies for your account. Policies can measure CPU, memory, or storage consumption of a server. Creating a policy is as simple as providing a friendly name for the alert, indicating the measure and usage threshold, choosing a duration that the chosen threshold must be exceeded before an alert fires, and a list of the alert’s email recipients.
Once a policy (or polices) are created, simply apply it to one or many servers. The server’s Settings page now has a tab for Alerts where users can quickly add or more policies to the server. To aid usability, we show you a preview of the policy’s core parameters as you select it. This keeps policy names crisp, and prevents incorrect assignment of policies.
Immediately after applying a policy, the platform compares a server’s consumption to the policy’s trigger. Furthermore, you can update policies in a central location and instantly impact all of the servers attached to that policy. Simple, easy – and elegantly powerful!
What’s more, you will easily see when a server has alert policies attached. In our new user interface (available to all users as a public beta!), there are three ways you’ll identify that a server has an alert policy. First, we put an indicator on the monitoring chart that displays the alert level. Secondly, all of a server’s policies are listed in the summary pane. Finally, all policy activities are logged and available in the server’s audit trail.
Monitoring and alerting features exist to deliver proactive, timely, accurate statistics about a virtual machine. It does no good to find out that a server was running hot yesterday. False alarms are counterproductive as well.
In the CenturyLink Cloud monitoring and alerting service, we capture near-real time statistics about each server and show both current and aggregate perspectives. There’s the current consumption highlighted on the left, and the aggregated consumption available on the chart. You’re able to look at a long term aggregation, or even jump down to the average consumption on an hourly basis.
Because the CenturyLink Cloud runs a highly tuned virtualized environment, you may see a difference between what a virtual server shows for consumption, and the value we show in the Control Portal. The Control Portal identifies what the hypervisor itself thinks the utilization is, and this is MORE accurate because the hypervisor can intelligently add horsepower to servers under stress. So, keep this in mind and don’t worry if a server appears slightly stressed to you, but the platform itself doesn’t completely agree!
Finally, it’s important to be able to consume alerting information in multiple ways. We offer three wildly different but extremely complementary mechanisms. By default, a policy must have an email recipient for any alerts. So even if you aren’t logged into the Control Portal, you can instantly find out, in real-time, if an alert condition has been met for the threshold period. Additionally, Control Portal clearly displays when a server is in an alerting stage. If you’re on the server’s details page itself, you’ll see a warning as well as the utilization indicator turned to red. But even better, we highlight the offending server at different levels in the UI - in the left side navigation, the server’s group, and the group’s data center! This means that you can easily see where you have servers experiencing alerts from anywhere in the interface.
The final option is to configure a webhook. Recall that the CenturyLink Cloud offers webhook capabilities which push notifications to an external endpoint of your choosing whenever certain platform conditions occur. We’ve added a new webhook for “alert notification” that will send a data-rich message to any endpoint. For example, you could configure the webhook to feed into your support system so that the two environments (cloud and on-premises) are automatically integrated.
Alerts aren’t helpful if you don’t know they are occurring! So, we’ve built in a host of ways to send notifications and quickly see relevant information.
We’re excited to ship this new capability, and have other plans for building upon these services. Don’t hesitate to provide feedback or feature suggestions by accessing the “feedback” link within the Control Portal!
Last year, we made 12 predictions about what would happen in the cloud space in 2013. As the year comes to a close, it’s only fair for us to assess our hits and misses to see how well we did.
Recap and Scorecard
PREDICTION #1: 2013 will be the year of cloud management software.
REALITY: Hit. We saw this come true on multiple fronts. First, cloud management providers Enstratius and ServiceMesh were acquired by Dell and CSC, respectively. Tier 3 – known for the sophisticated management software that runs our IaaS – was acquired by CenturyLink. On top of this, Gartner estimates that a new vendor enters the cloud management space every month, and nearly every cloud provider is constantly beefing up their own management offerings. This shows the strategic value of comprehensive management capabilities in a cloud portfolio. Customer adoption of these platforms is also on the rise and Gartner sees 60% of Global 2000 enterprises using cloud management technology (up from 30% in 2013).
PREDICTION #2: While the largest cloud providers duke it out on price and scale, smaller cloud providers see that enterprise adoption really depends on tight integration with existing tools and processes.
REALITY: Mixed. Of course, cloud prices definitely declined in 2013 and massive scale continued to be a key selling point. Hybrid cloud picked up momentum this year as more companies looked to establish an IT landscape that leveraged on-premises assets while taking advantage of cloud scale. In order to maximize the efficiency of hybrid scenarios, companies need consistency in processes and tools. While cloud management platforms have helped with this a bit, there wasn’t a wholesale move by cloud providers to seamlessly integrate their core offerings with established products.
PREDICTION #3: Enterprises move from pilots to projects, and architecture takes a front seat.
REALITY: Hit. There’s been much less gnashing of teeth on “should I use the cloud” this year, and much more discussion about how to capitalize on the cloud. We’ve seen our customers move to more substantial solutions and ask for more sophisticated capabilities, such as self-service networking. Throughout the industry, we’re seeing more enterprise-class case studies where customers are putting mission critical workloads in the cloud. However, outages still occur on any cloud, and providers are publishing guidelines on how to properly architect for high availability. The recent AWS conference was full of sessions on architecture best practices, and developers are hungry for information about how those best practices are applied.
PREDICTION #5: Standalone, public PaaS offerings will be slow to gain enterprise adoption.
REALITY: Hit. In 2013 we saw renewed discussion on what PaaS actually is and what it SHOULD be. Longtime PaaS providers Microsoft and Google added IaaS products to their portfolio, while smaller firms like Apprenda saw success in private PaaS. Our sister company, AppFog, has launched over 100,000 apps, including some impressive enterprise deployments. Former Tier 3 colleague Adron Hall asked whether PaaS was still “a thing” or whether new container technologies like Docker were going to replace it. However, as some like our own Jared Wray and Red Hat’s Krish Subramanian have said, PaaS is about more than JUST application containers. A rich PaaS also includes the orchestration, management, and services that make it a valuable platform for web applications of any type. Either way, PaaS is still in its infancy and will continue to morph as customer scenarios take shape.
PREDICTION #6: Public goes private.
REALITY: Mixed. There were hints of this in 2013 as Amazon won a bid to win a private cloud for the CIA (and for you too if you have half a billion sitting around!), Microsoft offered a “pack” for making on-premises environments resemble their public cloud, and platforms like OpenStack gained traction as a private cloud alternative. We continued to make advances in supporting private scenarios by adding self-service site-to-site VPN capabilities to an already-robust set of connectivity options. I gave this a “mixed” score because as a whole, public cloud providers don’t yet (and may never) make it simple to run their stack in a private data center for mainstream enterprises.
PREDICTION #7: Cloud providers embrace alternate costing models.
REALITY: Hit. 2013 saw some changes to how cloud customers paid for resources. We modified our pricing to decouple some components while still making it easy to provision exactly the amount of CPU, memory and storage that you need for a given server. Google and Microsoft both launched their IaaS clouds with “per minute” pricing for compute resources. Cloud providers have yet to move to a “pay for consumption instead of allocation” model for things like storage, but overall we’ve seen a maturation of pricing considerations in 2013.
PREDICTION #8: While portability will increase at the application and hypervisor layer, middleware and environment metadata will remain more proprietary.
REALITY: Mixed. We might have been too pessimistic last year! DevOps tools have flourished in 2013 and platform adapters have made it possible to move workloads between clouds without a massive re-architecture effort. To be sure, code portability is still MUCH simpler than environment portability. Each cloud provider has their own value-added services that rarely transfer easily to other locations, and no clear IaaS standard has emerged. However, platforms like OpenStack are attempting to make cloud portability a reality, and the increasing prevalence of public APIs makes it possible for tools like Pivotal’s BOSH or Chef to orchestrate deployments in diverse provider environments.
PREDICTION #9: Global expansion takes center stage.
REALITY: Hit. One of the first questions we hear from prospective customers is “where are your data centers?” This year, almost all of the leading cloud providers expanded their footprint around the globe. For our part, we added data centers in Canada, the UK, and Germany. Now, as part of CenturyLink, we have major expansion plans in 2014.
PREDICTION #10: IaaS providers who don’t court developers get left behind.
REALITY: Hit. In 2013, Stephen O’Grady wrote that developers are the “new kingmakers” and this was reinforced by Gartner analyst Lydia Leong who wrote that IT operations no longer has a monopoly on cloud procurement. Developers are now running the show – bringing in vendors that meet their unique criteria. Consequently, a new crop of developer-centric cloud providers has popped up. While they don’t offer managed services or sophisticated resource management, they DO help developers get going quickly in the cloud. We wooed developers with new self-service capabilities, API improvements, and with new features like Autoscale and webhooks. Developers will continue to be a focus for us at CenturyLink and we plan on continuing our regular Open Source contributions!
PREDICTION #11: Clouds that cannot be remotely managed through an API will fall behind.
REALITY: Hit. APIs are the gateway to modern services and allow ecosystems to flourish. Consider the vibrant crop of cloud management platforms discussed in prediction #1. And that is just one small example. The vast majority of clouds listed in Gartner’s 2013 Magic Quadrant for Cloud Infrastructure have public, comprehensive APIs that developers can use to consume the cloud in whatever way they want. In 2013, we started an effort to replace our existing API with an even more expansive offering that offers complete parity with our industry leading Control Portal user interface. That effort will continue into the next year. When complete, a new host of capabilities will be accessible for CenturyLink, our partners, and mostly important, our customers.
PREDICTION #12: Usability and self-service become table stakes for cloud providers.
REALITY: Mixed. In 2013, we seemed to hit the point where “clouds that aren’t really clouds” struggled as the market began to demand more. Customers expected more and more self-service capabilities, and Tier 3 – along with most every other major provider – focused heavily on that in 2013. Platform usability was a lesser focus this year. While new clouds from Microsoft and Google included relatively straightforward user experiences, few providers made any massive visual improvements. While the CenturyLink Cloud continues to be lauded for an easy to use, powerful interface, we haven’t stood still. A major redesign is underway that will surface more data, simplify activities, and improve performance.
2013 was an important year in the maturation of the cloud industry. New vendors were introduced, popular platforms were acquired, and consumption of cloud services skyrocketed. What will happen in 2014? Stay tuned for our predictions!
“Getting a little bit of the right information just ahead of when it’s needed is a lot more valuable than all the information in the world a month or a day later.” That quote – found in the book The Two Second Advantage by Vivek Ranadive and Kevin Maney – highlights a new reality where responsiveness can be a competitive advantage. Smart companies are building a responsive IT infrastructure where data isn’t just hoarded in massive repositories, but analyzed quickly and acted upon. How can you know more, faster and have better situational awareness?
With an increasing amount of critical IT systems running in the cloud, there’s a need to know what’s happening and act on it. This month, CenturyLink Cloud introduced Webhooks, making us among the first public IaaS cloud providers to send real-time notifications to a web service endpoint. For this initial release, customers can set up Webhooks for events within accounts, users, and servers.
When To Use This?
Webhooks are relatively new idea, although already used by diverse web properties like Wordpress and Zoho. Let’s look at three different scenarios where CenturyLink Cloud Webhooks can lead to better decisions.
Scenario #1 – Data Synchronization
Polling is an inefficient way to retrieve data from an external system, but it remains a popular choice. When you poll a system for changes, you’re effectively asking “do you have anything new for me?” Many times, the answer is “no.” With push-based notifications, the only time you are contacted is when something relevant happens. For example, some customers synchronize CenturyLink Cloud data with their internal support or configuration management systems. They do this for auditing purposes, or to give support staff an accurate picture of cloud deployments. The issue? Staying in sync requires an aggressive polling frequency that needless encumbers systems. Webhooks provide a better alternative.
In the scenario visualized below, as soon as a new server is created in the CenturyLink Cloud cloud, an event fires and a message is sent to an endpoint specified by the customer. That listener service then updates the appropriate internal system. Within seconds, systems are completely synchronized!
Scenario #2 – Anomaly Detection
People love the cloud because of the self-service capabilities and freedom to instantly create and delete servers at will. One downside of this freedom – for service providers anyway – is fraudulent signups. CenturyLink Cloud resellers actively monitor new accounts, but the sheer volume of manual analysis can be daunting. What if resellers could programmatically monitor specific sequences of events and then use that data to flag an account as “suspect” and deserving of special attention? Again, we turn to Webhooks to help react faster.
It’s great that developers can quickly bring gobs of new cloud machines online. But rapid provisioning can occur within the wrong sub-account or under unusual circumstances. In both of these examples, consider using a complex event processing solution that monitors streams of Webhook events and detects aggregate patterns that reveal more than any single event can.
Scenario #3 – Compliance Monitoring
Cloud and governance don’t have to be at odds with each other – and in fact, these two ideas go hand-and-hand when it comes to IT as a service. CenturyLink Cloud already provides customers with many ways to do this today through sophisticated account management capabilities. But we often get customers requesting a “corner case” scenario – like preventing a certain user from being added to an account, or making sure that database servers aren’t given a public IP address. Webhooks are a way for us to programmatically empower customers to support unique scenarios, in self-service fashion. Via Webhooks, users compare events to previous ones using a data repository. This way, customers can immediately find out if a server was changed inappropriately, a user was added to an account, or the contact information was changed. If an out-of-compliance change is made, the customer can respond almost instantly!
It’s very simple to configure Webhooks in the CenturyLink Cloud cloud. Simply visit the API section of the Control Portal and choose Webhooks. Here, users can browse the list of available Webhooks, then specify the “target” URL to receive a JSON-encoded message. Each Webhook is configured with an HTTPS URL, and includes an optional capability to send events that occur within sub-accounts.
For more details on how to create a Webhook listener service, take a look at our Webhook FAQ article in the Knowledge Base. This is an innovative and exciting capability for the platform and we can’t wait to see how customers use it to create more responsive systems and processes!
Does running your application in the cloud mean that it’s suddenly able to survive any problem that arises? Alas, no. Even while some foundational services of distributed systems are built for high availability, a high performing cloud application needs to be explicitly architected for fault-tolerance. In this multi-part blog series, we will walk through the various application layers and example how to build a resilient system in the CenturyLink Cloud cloud. Over the course of the next few posts, we will define what’s needed to build a complete, highly available system. The reference architecture below illustrates the components needed for a fictitious eCommerce web application.
In this first post, we look at a core aspect of every software system: storage. What type of storage is typically offered by cloud vendors?
- Temporary VM storage. Some cloud providers offer gobs of storage with each VM instance, but with the caveat that the storage isn’t durable and does not survive server shutdown or server failure. While this type of cheap and easy accessible block storage is useful in some situations, it’s not as familiar to enterprise IT staff who are used to storage that’s durable by default.
- Persistent VM storage. This sort of block storage is attached to a VM as durable volumes. It can survive reboots, resets, and can even get detached from one server and reattached to another. Multiple servers cannot access the same volume, but this is ideal for database servers and other server types that need reliable, durable, high-performing storage.
- Object storage. What happens if you want to share data between consumers? Object storage offers HTTP access to a highly available repository that can hold virtually any file type. This is a great option for storing business documents, software, server backups, media files, and more. It is also a useful alternative for secure file transfer.
At CenturyLink Cloud, we offer customers two options: persistent block storage and object storage.
Provisioning Persistent Block Storage
Each virtual server launched in the CenturyLink Cloud cloud is backed by one or more persistent storage volumes. Product details:
- Block storage volumes can be of any size up to 1 TB apiece. Why does this matter? Instead of over-provisioning durable storage – which can happen with cloud providers that offer fixed “instance sizes” – CenturyLink Cloud volumes can be any size you want. Only pay for what you need, and resize the drive as necessary.
- The volumes are attached as ISCSI or NFS and offer at least 2500 IOPS. Why does this matter? Run IO-intensive workloads with confidence and get reliable performance thanks to an architecture that minimizes latency and network hops.
- Block storage is backed by SANs using RAID 10 which provides the best combination of write performance and data protection. Why does this matter? We’ve architected highly available storage for you. Data is striped across drives and mirrored within RAID sets. This means you that you won’t lose your data even if multiple underlying disks failed.
- We take daily snapshots of each storage volume automatically. Standard storage volumes have 5 days of rolling backups, and Premium storage volumes have 14 days of rolling backups with the 5 most recent ones replicated to a remote data center. Why does this matter? This gives you a built-in disaster recovery solution! While it may not be the only DR strategy you employ, it provides a baseline RPO/RTO to build around.
The way that CenturyLink Cloud has architected its block storage means that you do not need to specifically architect for highly available storage unless you are doing multi-site replication.
For our reference solution, provisioning persistent block storage is easy. Our web servers – based on Windows Server 2012 Data Center Edition – have 42 GB of durable storage built-in, and I’ve added another 100 GB volume to store the web root directory and server logs.
There are multiple ways to layout the disks for a database server, and in this case, we’re splitting the databases and transaction logs onto separate persistent volumes.
Here, we have running servers backed by reliable, high-performing storage. What happens if you find out later that you need more storage? The CenturyLink Cloud platform makes it easily to instantly add more capacity to existing volumes, or add entirely new volumes that are immediately accessible within the virtual machine.
Provisioning Object Storage
Object Storage is a relatively recent addition to the CenturyLink Cloud cloud and gives customers the chance to store diverse digital assets in a highly available, secure shared repository. Some details:
- Object Storage has multiple levels of redundancy built in. Within a given data center, your data is replicated across multiple machines, and all data is instantly replicated to a sister cluster located within the same country. Why does this matter? Customers can trust that data added to Object Storage will be readily available even when faced with unlikely node or data center failures.
- Store objects up to 5 GB in size. Why does this matter? Object Storage is a great fit for large files that need to be shared. For example, use Object Storage for media files used by a public website. Or upload massive marketing proofs to share with a graphics design partner.
- The Object Storage API is Amazon S3-compliant. Why does this matter? CenturyLink Cloud customers can use any of the popular tools built for interacting with Amazon S3 storage.
Customers of CenturyLink Cloud Object Storage do not need to explicitly architect for high availability since the service itself takes care of it.
In our reference solution, Object Storage is the place where website content like product images are stored. We added a new Object Storage “bucket” for all the website content.
Once the bucket was created and permissions applied, we used the popular S3 Browser tool to add CSS files and images to bucket.
A highly available system in the cloud is often a combination of vendor-provided and customer-architected components. In this first post of the blog series, we saw how the CenturyLink Cloud platform natively provides the core high availability needed by cloud systems. Block storage is inherently fault tolerant and the customer doesn’t have to explicitly ask for persistent storage. Object storage provides easy shared access to binary objects and is geo-redundant by default.
Storage provides the foundation for a software system, and at this point we have the necessary pieces to configure the next major component: the database!
Elasticity is a core tenet of cloud computing. Cloud has become so popular simply because resources can be adjusted up or down, based on business need, instantly. Manually resizing cloud environments is still MUCH easier than altering physical hardware. But human action is still required, adding human cost to cloud.
A few cloud vendors have attempted to automate this process through “auto scaling” – services that expand and reduce the size environments based on user-defined parameters. However, this capability by and large automates the addition and removal of virtual machines to an existing resource pool. In engineering terms, this is “horizontal scaling” – adding capacity across multiple virtual machines. This approach is useful for consumer applications (think Netflix scaling up for Saturday night), but the enterprise scenario is much different, as we found out in our market research when developing this feature.
While we always recommend that our customers build highly available cloud systems with no single points of failure, there is value is sizing those resources up and down (i.e. “vertical scaling”) instead of only being able to add or remove entire servers. Having multiple servers is key for fault tolerance, but some workloads can benefit from additional server capacity, not just more servers!
This month, CenturyLink Cloud introduced our new Autoscale service. The initial release is focused on vertical scaling of CPU resources, with more vertical scaling (and, yes, horizontal scaling!) on the roadmap. Today, you can now add and subtract CPUs from cloud servers based on user-defined utilization limits. Capacity is added instantly without a reboot and capacity is removed only during user-defined windows of time, to prevent a reboot from occurring during prime usage hours.