{"id":1082,"date":"2015-12-03T14:41:29","date_gmt":"2015-12-03T14:41:29","guid":{"rendered":"https:\/\/gosqeng.test\/?p=1082"},"modified":"2019-11-28T11:40:38","modified_gmt":"2019-11-28T11:40:38","slug":"an-ops-team-walks-into-a-bar","status":"publish","type":"post","link":"https:\/\/www.gosquared.com\/blog\/an-ops-team-walks-into-a-bar","title":{"rendered":"An Ops team walks into a bar&#8230;"},"content":{"rendered":"<p>Last night was the <a href=\"http:\/\/www.passioncapital.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">Passion Capital<\/a> Christmas Party, so (naturally) the GoSquared team was in attendance.<\/p>\n<p>This meant that our whole ops team, who make sure that everything stays running, were out at a party. And <a href=\"https:\/\/en.wikipedia.org\/wiki\/Sod%27s_law\" target=\"_blank\" rel=\"noopener noreferrer\">Sod&#8217;s law<\/a> dictates that this is the exact time at which things are likely to go horribly wrong.<\/p>\n<blockquote class=\"twitter-tweet\" lang=\"en\">\n<p lang=\"en\" dir=\"ltr\"><a href=\"https:\/\/twitter.com\/hashtag\/wbypassionxmas?src=hash\">#wbypassionxmas<\/a> party (before it gets ugly and messy <a href=\"https:\/\/twitter.com\/hashtag\/fullhouse?src=hash\">#fullhouse<\/a> <a href=\"https:\/\/t.co\/YMpFG0bbDb\">pic.twitter.com\/YMpFG0bbDb<\/a><\/p>\n<p>&mdash; Eileen Burbidge (@eileentso) <a href=\"https:\/\/twitter.com\/eileentso\/status\/672156608361848832\">December 2, 2015<\/a><\/p><\/blockquote>\n<p><script async src=\"\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<p>(featuring the GoSquared team somewhere among the crowd)<\/p>\n<p>So how do we keep on top of things here at GoSquared? How do we monitor that everything is working as it should, and how do we respond when things go wrong? 
And how do we manage that with the whole team at a party?<\/p>\n<p>Here are a few of the tools we use:<\/p>\n<h3>Monitoring and Alerting<\/h3>\n<p>Our two main sources of metrics and alarms are <a href=\"https:\/\/www.serverdensity.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">Server Density<\/a> and <a href=\"https:\/\/aws.amazon.com\/cloudwatch\/\" target=\"_blank\" rel=\"noopener noreferrer\">Amazon CloudWatch<\/a>. Together they keep an eye on all of the metrics from our EC2 instances (CPU, memory, disk space, networking), as well as our load balancers (HTTP request count, response codes, latency, healthy\/unhealthy instances) and databases (connection count, query rates etc.).<\/p>\n<p>Along with all these metrics we have hundreds of alarms, which will trigger and alert us as soon as something doesn&#8217;t look right. If request latency is too high, or a database is running out of disk space, or we&#8217;re sending too many 5xx HTTP status codes, an alarm will trigger and we&#8217;ll know about it.<\/p>\n<p>Alerts are sent to the team via <a href=\"https:\/\/www.pagerduty.com\" target=\"_blank\" rel=\"noopener noreferrer\">PagerDuty<\/a>. One team member is &#8220;on call&#8221; at any given time, and PagerDuty takes care of alerting that person via push notification, SMS or phone call depending on severity, and of escalating the alert to another team member if the on-call person is unable to respond. We also have a dedicated <a href=\"https:\/\/slack.com\" target=\"_blank\" rel=\"noopener noreferrer\">Slack<\/a> channel set up with the PagerDuty integration and <a href=\"https:\/\/hubot.github.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">Hubot<\/a>, so anyone on the team can see when something&#8217;s up.<\/p>\n<h3>Responding<\/h3>\n<p>It&#8217;s all well and good having lots of metrics and monitoring, but if an alarm goes off and you&#8217;re not actually able to deal with the issue, it&#8217;s pretty useless. 
There&#8217;s plenty we can do purely from our phones to address any issues.<\/p>\n<p>The <a href=\"https:\/\/aws.amazon.com\/console\/mobile\/\" target=\"_blank\" rel=\"noopener noreferrer\">AWS Mobile Console App<\/a> enables us to very quickly perform simple tasks on our AWS resources. Whether that&#8217;s checking that a particular CloudWatch metric is returning to normal levels, or modifying the throughput on one of our DynamoDB tables, or rebooting a crashed database instance, or manually upscaling one of our Auto Scaling groups, all of these tasks can be accomplished in seconds from the mobile app.<\/p>\n<p>For more complex issues we can use SSH to access any of our EC2 instances to check memory usage or whether a process is stuck, and restart services or perform other maintenance as necessary. We have a VPN set up to allow us access to instances inside our <a href=\"https:\/\/aws.amazon.com\/vpc\/\" target=\"_blank\" rel=\"noopener noreferrer\">Virtual Private Cloud<\/a>, and internal DNS which allows us to address instances without having to memorise or look up IP addresses. Panic&#8217;s excellent <a href=\"https:\/\/panic.com\/prompt\/\" target=\"_blank\" rel=\"noopener noreferrer\">Prompt<\/a> app means we can do all this from our phones without having to waste valuable time grabbing a laptop and finding a WiFi connection.<\/p>\n<h3>Putting it all together<\/h3>\n<p>It takes a lot of different components and services to make all this possible, but the combination of effective monitoring, effective alerting, and the ability to respond to and resolve issues quickly and effortlessly means that when something <em>does<\/em> go wrong (because it always will), we&#8217;re <a href=\"https:\/\/twitter.com\/floopily\/status\/672154763501375489\" target=\"_blank\" rel=\"noopener noreferrer\">able to deal with it<\/a>, and get back to the party.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Last night was the Passion Capital Christmas Party, so (naturally) 
the GoSquared team was in attendance. This meant that our&#8230;<\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1452],"tags":[],"class_list":["post-1082","post","type-post","status-publish","format-standard","hentry","category-engineering"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v18.6 (Yoast SEO v19.0) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>An Ops team walks into a bar...<\/title>\n<meta name=\"description\" content=\"How the GoSquared team makes sure everything stays running, even when everyone&#039;s at a party.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.gosquared.com\/blog\/an-ops-team-walks-into-a-bar\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"An Ops team walks into a bar...\" \/>\n<meta property=\"og:description\" content=\"How the GoSquared team makes sure everything stays running, even when everyone&#039;s at a party.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.gosquared.com\/blog\/an-ops-team-walks-into-a-bar\" \/>\n<meta property=\"og:site_name\" content=\"GoSquared Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/GoSquared\" \/>\n<meta property=\"article:published_time\" content=\"2015-12-03T14:41:29+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2019-11-28T11:40:38+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@floopily\" \/>\n<meta name=\"twitter:site\" content=\"@GoSquared\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"JT\" 
\/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.gosquared.com\/blog\/#organization\",\"name\":\"GoSquared\",\"url\":\"https:\/\/www.gosquared.com\/blog\/\",\"sameAs\":[\"https:\/\/instagram.com\/gosquaredteam\",\"https:\/\/www.linkedin.com\/company\/go-squared-ltd.\",\"https:\/\/www.facebook.com\/GoSquared\",\"https:\/\/twitter.com\/GoSquared\"],\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.gosquared.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.gosquared.com\/blog\/wp-content\/uploads\/2015\/07\/gosquared.png\",\"contentUrl\":\"https:\/\/www.gosquared.com\/blog\/wp-content\/uploads\/2015\/07\/gosquared.png\",\"width\":1270,\"height\":250,\"caption\":\"GoSquared\"},\"image\":{\"@id\":\"https:\/\/www.gosquared.com\/blog\/#\/schema\/logo\/image\/\"}},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.gosquared.com\/blog\/#website\",\"url\":\"https:\/\/www.gosquared.com\/blog\/\",\"name\":\"GoSquared Blog\",\"description\":\"Turn visitors into customers.\",\"publisher\":{\"@id\":\"https:\/\/www.gosquared.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.gosquared.com\/blog\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.gosquared.com\/blog\/an-ops-team-walks-into-a-bar#webpage\",\"url\":\"https:\/\/www.gosquared.com\/blog\/an-ops-team-walks-into-a-bar\",\"name\":\"An Ops team walks into a 
bar...\",\"isPartOf\":{\"@id\":\"https:\/\/www.gosquared.com\/blog\/#website\"},\"datePublished\":\"2015-12-03T14:41:29+00:00\",\"dateModified\":\"2019-11-28T11:40:38+00:00\",\"description\":\"How the GoSquared team makes sure everything stays running, even when everyone's at a party.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.gosquared.com\/blog\/an-ops-team-walks-into-a-bar#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.gosquared.com\/blog\/an-ops-team-walks-into-a-bar\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.gosquared.com\/blog\/an-ops-team-walks-into-a-bar#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.gosquared.com\/blog\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"An Ops team walks into a bar&#8230;\"}]},{\"@type\":\"Article\",\"@id\":\"https:\/\/www.gosquared.com\/blog\/an-ops-team-walks-into-a-bar#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.gosquared.com\/blog\/an-ops-team-walks-into-a-bar#webpage\"},\"author\":{\"@id\":\"https:\/\/www.gosquared.com\/blog\/#\/schema\/person\/bfcd35bf2eba92ecbeea67937cd23eef\"},\"headline\":\"An Ops team walks into a 
bar&#8230;\",\"datePublished\":\"2015-12-03T14:41:29+00:00\",\"dateModified\":\"2019-11-28T11:40:38+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.gosquared.com\/blog\/an-ops-team-walks-into-a-bar#webpage\"},\"wordCount\":599,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.gosquared.com\/blog\/#organization\"},\"articleSection\":[\"Engineering\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.gosquared.com\/blog\/an-ops-team-walks-into-a-bar#respond\"]}]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.gosquared.com\/blog\/#\/schema\/person\/bfcd35bf2eba92ecbeea67937cd23eef\",\"name\":\"JT\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.gosquared.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/772e026206b900a5ba17ebbe63e34a4c8a9103524cf0ba3accfa38b14d7d03ba?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/772e026206b900a5ba17ebbe63e34a4c8a9103524cf0ba3accfa38b14d7d03ba?s=96&d=mm&r=g\",\"caption\":\"JT\"},\"description\":\"JT is a co-founder and the lead front-end engineer at GoSquared. He's responsible for the shiniest of the shiny projects we work on.\",\"sameAs\":[\"https:\/\/twitter.com\/floopily\"],\"url\":\"https:\/\/www.gosquared.com\/blog\/author\/jt\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"An Ops team walks into a bar...","description":"How the GoSquared team makes sure everything stays running, even when everyone's at a party.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.gosquared.com\/blog\/an-ops-team-walks-into-a-bar","og_locale":"en_US","og_type":"article","og_title":"An Ops team walks into a bar...","og_description":"How the GoSquared team makes sure everything stays running, even when everyone's at a party.","og_url":"https:\/\/www.gosquared.com\/blog\/an-ops-team-walks-into-a-bar","og_site_name":"GoSquared Blog","article_publisher":"https:\/\/www.facebook.com\/GoSquared","article_published_time":"2015-12-03T14:41:29+00:00","article_modified_time":"2019-11-28T11:40:38+00:00","twitter_card":"summary_large_image","twitter_creator":"@floopily","twitter_site":"@GoSquared","twitter_misc":{"Written by":"JT","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Organization","@id":"https:\/\/www.gosquared.com\/blog\/#organization","name":"GoSquared","url":"https:\/\/www.gosquared.com\/blog\/","sameAs":["https:\/\/instagram.com\/gosquaredteam","https:\/\/www.linkedin.com\/company\/go-squared-ltd.","https:\/\/www.facebook.com\/GoSquared","https:\/\/twitter.com\/GoSquared"],"logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.gosquared.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.gosquared.com\/blog\/wp-content\/uploads\/2015\/07\/gosquared.png","contentUrl":"https:\/\/www.gosquared.com\/blog\/wp-content\/uploads\/2015\/07\/gosquared.png","width":1270,"height":250,"caption":"GoSquared"},"image":{"@id":"https:\/\/www.gosquared.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"WebSite","@id":"https:\/\/www.gosquared.com\/blog\/#website","url":"https:\/\/www.gosquared.com\/blog\/","name":"GoSquared Blog","description":"Turn visitors into customers.","publisher":{"@id":"https:\/\/www.gosquared.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.gosquared.com\/blog\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.gosquared.com\/blog\/an-ops-team-walks-into-a-bar#webpage","url":"https:\/\/www.gosquared.com\/blog\/an-ops-team-walks-into-a-bar","name":"An Ops team walks into a bar...","isPartOf":{"@id":"https:\/\/www.gosquared.com\/blog\/#website"},"datePublished":"2015-12-03T14:41:29+00:00","dateModified":"2019-11-28T11:40:38+00:00","description":"How the GoSquared team makes sure everything stays running, even when everyone's at a 
party.","breadcrumb":{"@id":"https:\/\/www.gosquared.com\/blog\/an-ops-team-walks-into-a-bar#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.gosquared.com\/blog\/an-ops-team-walks-into-a-bar"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.gosquared.com\/blog\/an-ops-team-walks-into-a-bar#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.gosquared.com\/blog"},{"@type":"ListItem","position":2,"name":"An Ops team walks into a bar&#8230;"}]},{"@type":"Article","@id":"https:\/\/www.gosquared.com\/blog\/an-ops-team-walks-into-a-bar#article","isPartOf":{"@id":"https:\/\/www.gosquared.com\/blog\/an-ops-team-walks-into-a-bar#webpage"},"author":{"@id":"https:\/\/www.gosquared.com\/blog\/#\/schema\/person\/bfcd35bf2eba92ecbeea67937cd23eef"},"headline":"An Ops team walks into a bar&#8230;","datePublished":"2015-12-03T14:41:29+00:00","dateModified":"2019-11-28T11:40:38+00:00","mainEntityOfPage":{"@id":"https:\/\/www.gosquared.com\/blog\/an-ops-team-walks-into-a-bar#webpage"},"wordCount":599,"commentCount":0,"publisher":{"@id":"https:\/\/www.gosquared.com\/blog\/#organization"},"articleSection":["Engineering"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.gosquared.com\/blog\/an-ops-team-walks-into-a-bar#respond"]}]},{"@type":"Person","@id":"https:\/\/www.gosquared.com\/blog\/#\/schema\/person\/bfcd35bf2eba92ecbeea67937cd23eef","name":"JT","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.gosquared.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/772e026206b900a5ba17ebbe63e34a4c8a9103524cf0ba3accfa38b14d7d03ba?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/772e026206b900a5ba17ebbe63e34a4c8a9103524cf0ba3accfa38b14d7d03ba?s=96&d=mm&r=g","caption":"JT"},"description":"JT is a co-founder and the lead front-end engineer at GoSquared. 
He's responsible for the shiniest of the shiny projects we work on.","sameAs":["https:\/\/twitter.com\/floopily"],"url":"https:\/\/www.gosquared.com\/blog\/author\/jt"}]}},"wps_subtitle":"How we make sure everything stays running, even when everyone\u2019s at a party.","_links":{"self":[{"href":"https:\/\/www.gosquared.com\/blog\/wp-json\/wp\/v2\/posts\/1082","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.gosquared.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.gosquared.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.gosquared.com\/blog\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.gosquared.com\/blog\/wp-json\/wp\/v2\/comments?post=1082"}],"version-history":[{"count":0,"href":"https:\/\/www.gosquared.com\/blog\/wp-json\/wp\/v2\/posts\/1082\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.gosquared.com\/blog\/wp-json\/wp\/v2\/media?parent=1082"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.gosquared.com\/blog\/wp-json\/wp\/v2\/categories?post=1082"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.gosquared.com\/blog\/wp-json\/wp\/v2\/tags?post=1082"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}