Saturday, 3 June 2017

Ich bin ein Data Nerd—Get Ready for FutureStack: Berlin!

New Relic

Hot on the heels of our wildly successful FutureStack: London event, New Relic is headed to Germany! FutureStack: Berlin, the Central European edition of our global user conference, will be held on 22 June, 2017, at andel’s by Vienna House Berlin.

The theme of this year’s conference is “Digital Intelligence at Cloud Scale.” FutureStack: Berlin is your opportunity to attend fascinating talks from industry thought leaders, gain new perspectives on digital intelligence for the modern technology stack, hear real-world success stories from top brands, and learn practical knowledge for building and running a modern digital business.

Berlin has seen a tremendous amount of investment in startups and innovation—€2.4 billion in venture capital in 2015, according to Ernst & Young. It is home to some of the most innovative companies in Europe, such as Rocket Internet, Zalando, SoundCloud, and Delivery Hero. Many larger enterprises, like German rail company Deutsche Bahn and Siemens AG, also have a significant presence in the city. With all of this energy and momentum, we believe that Berlin is the perfect setting for the German edition of our FutureStack event series.

Great lineup of speakers

We are excited to announce the following special customer and partner speakers at FutureStack: Berlin, representing some of the most innovative enterprises in Central Europe:

Oliver Feige, Director-IT, CHECK24 Vergleichsportal Shopping GmbH
Rob Ginda, Head of Online Shops Web Development, 21sportsgroup
Hans Gruber, CIO, Verivox
Kai Hannemann, Member of the Management Board and CIO, Lotto24
Jan Kunzmann, System Owner, erasys GmbH
Gerald Madlmayr, Performance Manager, 1 Media
Faiz Parkar, Product Marketing, Pivotal Software
Albert Stepanyan, Chief Technical Officer, Allianz X GmbH


Lew Cirne, CEO and Founder of New Relic, will deliver a keynote presentation on the company’s vision for digital intelligence and showcase New Relic’s latest innovations. Attendees will learn how to gain deep visibility and analytic insight into their customer experience, application performance, and dynamic infrastructure—and explore how to use that knowledge to drive business results.

In addition to Lew, many New Relic engineering and product management leaders will present on a variety of tech topics:

Alberto Gomez, Principal Product Manager, Cloud Infrastructure, and Ramon Guiu, Principal Product Manager, will give a presentation on full-stack visibility with New Relic Infrastructure.
Lee Atchison, Senior Director of Strategic Architecture, will detail best practices in architecting for scale.
Todd Etchieson, VP of Product Management, Client-Side Analytics, will present “Events, Metrics, Traces, and Why You Need All Three.”
Matthew Flaming, VP of Engineering, and Beth Long, Software Engineer, will discuss how New Relic’s reliability practices have evolved in response to the pressures of rapid growth at massive scale.
Nadya Duke Boone, Senior Product Manager, will explain how New Relic enables organisations to get complete visibility across their dynamic infrastructure.


Hands-on training

Training has become one of the most popular parts of our FutureStack conferences, and FutureStack: Berlin will feature dedicated, laptop-open instruction sessions to boost your New Relic product skills, led by experienced technical trainers.

The training sessions at our Berlin event will include “End-to-End With New Relic: Full Stack Monitoring of Your Digital Experience” and “Insight into Insights: A Masterclass in Analytics.”

Closing party and reception

After a full day of engaging talks, illuminating insights, and empowering training, be sure to join your fellow delegates for an evening of great food and drink, interactive games, and networking opportunities. We’ll have local specialities like currywurst and, of course, plenty of beer!

Donation for admission

In lieu of a registration fee for FutureStack: Berlin, we are asking attendees to make an optional donation in support of the social work organization Gangway, which helps adults and youths living on the streets of Berlin.

Register today!

If you are a leader of a digital business, I am confident that you will get tremendous value from attending FutureStack: Berlin. Our FutureStack: London event sold out, and we are seeing similar demand in Berlin. We hope you will visit www.futurestack.com to register before it’s too late!

Follow @futurestack and the #futurestack hashtag on Twitter to stay updated.

Want to see what you’d be missing if you didn’t attend? Check out this recap of FutureStack: London:

 

Note: Event dates, speakers, and schedules are subject to change without notice.

Friday, 2 June 2017

EQUITY ALERT: Rosen Law Firm Announces Filing of Securities Class Action Lawsuit Against Rackspace Hosting, Inc ... - GlobeNewswire (press release)

rackspace hosting - Google News

GlobeNewswire (press release)
NEW YORK, June 02, 2017 (GLOBE NEWSWIRE) -- Rosen Law Firm, a global investor rights law firm, announces the filing of a class action lawsuit on behalf of purchasers of Rackspace Hosting, Inc. securities (NYSE:RAX) from November 11, 2014 through ...


Rackspace c-suite shakeup continues with new executives - San Antonio Business Journal

rackspace hosting - Google News

San Antonio Business Journal
The executive suite inside Rackspace Hosting Inc. got another shakeup this week as the company's leadership team is reorganized. Rackspace's chief operating officer and chief financial officer resigned and have been replaced, the latter with an interim ...
Rackspace continues executive shakeup with COO departure - mySanAntonio.com
Rackspace Names David Meredith President, Private Cloud and Managed Hosting - Marketwired (press release)
Rackspace Confirms Departure of COO Roenigk - TheStreet.com


Thursday, 1 June 2017

How New Relic Limits API Overloading—And You Can, Too!

New Relic

APIs are an increasingly popular way for businesses to interact with customers and partners without having to build specific features for every interaction. However, as more and more customers take advantage of these APIs, companies are being forced to grapple with how to appropriately limit API usage to prevent intentional or inadvertent overloading.

This post addresses some of the ways New Relic deals with the issue internally, and offers a deep dive into how we dealt with a particularly puzzling API usage issue. We hope it provides insight into how you can deal with API overuse in your own shop. (Yes, it’s long, but there are pictures! You should at least look at the pictures!)

New Relic’s API approach

We originally introduced the New Relic REST API to help our customers access data directly from our products. Since then we’ve made a lot of progress in protecting ourselves from overloading of all our APIs.

Specifically, we have developed a number of techniques to limit resource usage and to mitigate the effects of API overloading (note that the techniques discussed here do not apply to the ingest pipeline through which we receive data from our customers).

For example, we separated the pool of hardware for the New Relic APM API from that of its UI. APM uses the Unicorn web server, which spawns a number of worker processes, each of which can handle exactly one request at a time. This simple ‘one request per process at a time’ concurrency model isn’t new or fancy, but it is tried and true, and offers some important benefits, including strong fault isolation (failures or crashes in one worker process are guaranteed to be scoped to only that process).

Unicorn is completely single-threaded; a single process pulls a request from a socket, generates a response, writes it back to the socket, and then waits for the next request to arrive. So, to be able to process N requests concurrently, we need to run at least N Unicorn worker processes.

To serve static files without tying up a Unicorn worker, we use nginx as a reverse proxy in front of Unicorn. So the normal flow for requests looks like this:

request flow diagram

With this approach, what prevents API users from consuming all available worker time is that API requests are served by a dedicated pool of hardware, so we always have reserved capacity for UI requests.
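The following toy model in Python shows why one-request-per-worker concurrency makes a dedicated API pool worthwhile. It assumes that every request arrives at time zero and is handed, in order, to whichever single-threaded worker frees up first. Real Unicorn workers accept connections from a shared socket, so this is a simplification, and the request durations are invented.

```python
import heapq
from typing import List


def finish_times(durations: List[float], workers: int) -> List[float]:
    """Finish time of each request when `workers` single-threaded processes
    each handle exactly one request at a time (all arrivals at t=0, FIFO)."""
    free_at = [0.0] * workers           # time at which each worker becomes idle
    heapq.heapify(free_at)
    finished = []
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest-available worker takes the request
        end = start + duration
        heapq.heappush(free_at, end)
        finished.append(end)
    return finished


# Four slow API requests (30 s each) arrive just ahead of two fast UI requests.
api_requests = [30.0] * 4
ui_requests = [0.2] * 2

# Shared pool of 4 workers: the UI requests queue behind the API requests.
shared = finish_times(api_requests + ui_requests, workers=4)

# Dedicated pools: the UI requests get their own workers and never wait.
dedicated_ui = finish_times(ui_requests, workers=2)

print("UI finish times, shared pool:   ", shared[-2:])
print("UI finish times, dedicated pool:", dedicated_ui)
```

The dedicated pool guarantees that no amount of API traffic can occupy the UI's workers. On its own, though, it does nothing to protect the downstream services that both pools share.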

Of course, no system is foolproof. Despite the separation of API and UI worker pools, a heavy API user can still break the UI by impacting shared downstream dependencies. Here’s a slightly expanded version of what a single API request might look like:

request flow diagram

Both the API hosts and the UI hosts rely on our Metric Timeslice Service (MTS), which provides access to timeslice data for consumption across all New Relic products (MTS also has its own techniques to prevent overloads). This means that if a single API user sends sufficiently expensive MTS queries through the API, it can degrade MTS, and thus all of MTS’ consumers, including APM’s UI.

That’s why we created API Overload Protection.

This is an internal, resource-limiting tool we built to keep track of Unicorn worker time used per account. If that time becomes excessive, the tool will automatically restrict that account for a period of time.

We designed API Overload Protection to operate without the cooperation of the application to which it controls access, in order to guard against malfunctions of the application. This means it ultimately gets timing measurements from nginx.

The rough sequence goes something like this:

1. A request from the client arrives at nginx.
2. Some Lua code running in the nginx process (called the “bouncer”) consults a Redis database to see whether the API key associated with the request has exceeded its worker time budget. If so, the request is terminated; if not, it continues.
3. Nginx passes the request along to Unicorn, which generates a response and sends it back.
4. Nginx notes the amount of time that Unicorn (referred to as the “upstream” server in nginx parlance) took to generate the response and records it by sending a specially formatted UDP packet to the API Overload Protection tool.
5. Nginx sends the response back to the client.
6. At some point later, the tool updates its internal counter that tracks the total amount of worker time consumed by the API key in question.
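The bouncer check and the tool's running total can be sketched together as a per-key budget over a sliding time window. This is a minimal Python sketch, not New Relic's implementation, which the post describes as Lua inside nginx plus a separate tool backed by Redis. Here an in-memory dictionary stands in for Redis, and the class name, budget, and window length are all hypothetical.

```python
from collections import defaultdict, deque


class WorkerTimeBudget:
    """Hypothetical per-API-key budget of Unicorn worker seconds over a sliding window."""

    def __init__(self, budget_seconds, window_seconds):
        self.budget_seconds = budget_seconds
        self.window_seconds = window_seconds
        # api_key -> deque of (report_time, worker_seconds), oldest first
        self._reports = defaultdict(deque)

    def _expire(self, api_key, now):
        reports = self._reports[api_key]
        # Drop reports that have aged out of the window.
        while reports and reports[0][0] <= now - self.window_seconds:
            reports.popleft()

    def used(self, api_key, now):
        """Worker seconds charged to `api_key` within the current window."""
        self._expire(api_key, now)
        return sum(seconds for _, seconds in self._reports[api_key])

    def record(self, api_key, worker_seconds, now):
        """Tool side: add a reported upstream time to the key's running total."""
        self._reports[api_key].append((now, worker_seconds))

    def allow(self, api_key, now):
        """Bouncer side: admit the request only if the key is still under budget."""
        return self.used(api_key, now) < self.budget_seconds


# Illustrative numbers: a budget of 60 worker-seconds per hour.
limiter = WorkerTimeBudget(budget_seconds=60.0, window_seconds=3600.0)
limiter.record("key-123", 30.0, now=0.0)
limiter.record("key-123", 35.0, now=20.0)
print(limiter.allow("key-123", now=30.0))    # False: 65 s used, over budget
print(limiter.allow("key-123", now=3601.0))  # True: the first report has aged out
```

In this sketch, a restriction lifts as old reports age out of the window. A fixed penalty period after the budget is exceeded would be an equally reasonable reading of "restrict that account for a period of time."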

Here’s how that all looks in diagram form:

request flow diagram
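The timing report that nginx sends to the tool can be sketched as a fire-and-forget UDP datagram. The post does not describe the packet's format, so the plain-text `<api_key> <seconds>` payload below is purely hypothetical, as are the function names. In the real system the sender is Lua code inside nginx, not Python. UDP presumably suits this job because the sender never waits for an acknowledgement, so a slow or unavailable tool cannot delay client responses. The trade-off is that an occasional lost report goes uncounted.

```python
import socket
from typing import Tuple


def encode_report(api_key: str, upstream_seconds: float) -> bytes:
    """Hypothetical wire format: ASCII '<api_key> <seconds>'."""
    return f"{api_key} {upstream_seconds:.3f}".encode("ascii")


def decode_report(payload: bytes) -> Tuple[str, float]:
    api_key, seconds = payload.decode("ascii").split(" ")
    return api_key, float(seconds)


def send_report(sock: socket.socket, tool_addr: Tuple[str, int],
                api_key: str, upstream_seconds: float) -> None:
    """Fire and forget: no reply is expected, so the request path never waits."""
    sock.sendto(encode_report(api_key, upstream_seconds), tool_addr)


# Loopback demo: one socket stands in for the tool, the other for nginx.
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tool, \
        socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as proxy:
    tool.bind(("127.0.0.1", 0))  # let the OS pick a free port
    tool.settimeout(2.0)
    send_report(proxy, tool.getsockname(), "key-123", 4.668)
    payload, _ = tool.recvfrom(1024)
    print(decode_report(payload))
```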

Investigating a mystery from our case files

This approach has worked well for New Relic, enabling us to satisfy our customers’ needs and improve our overall API use. But edge cases still inevitably cause issues. While every case is different, we’d like to share an internal investigation of one such situation in hopes it can help other companies track down their own API overloading issues.

While investigating a recent case where it seemed like the API Overload Protection tool should have cut off API access for a customer but did not, we noticed a strange discrepancy: New Relic Insights transaction data from the New Relic APM API indicated that the customer had used far more than its allotment before being blocked. The difference was stark during a particular one-hour period: the tool believed the account had used only a reasonable amount of Unicorn worker time, but the Insights transaction data indicated that the total had been much, much larger! If the tool had been getting accurate information about the account’s usage, it would have blocked the account much sooner; as it was, the impact lasted more than 700% longer than it needed to.

Because we had seen similar discrepancies in past incidents, we decided to investigate.

Faced with a discrepancy between Insights transaction data and the tool’s internal accounting, we looked for a tie-breaker: the nginx access logs. In our nginx access logs, we record several relevant attributes about each request:

The client-facing response time (the elapsed time between when nginx first saw the request and when it wrote a response to the client)
The client-facing response code (e.g., 200, 404)
The upstream response time (how much time nginx spent waiting for Unicorn to return a response)
The upstream response code (the response code that Unicorn handed back to nginx)

Using some clunky heuristics, such as examining request timestamps and URLs, we were able to match request records from the nginx access logs of one particular host to the corresponding transaction events that we exported from Insights.
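The matching step can be sketched as pairing each access-log record with the nearest-in-time transaction event for the same URL. This Python sketch only illustrates that kind of heuristic. The record types, the 60-second tolerance, the greedy nearest-match rule, and the sample data are all assumptions, and timestamps are assumed to be already parsed into seconds since the epoch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LogRecord:
    timestamp: float       # seconds since the epoch, from the nginx access log
    url: str
    client_seconds: float  # client-facing response time reported by nginx


@dataclass(frozen=True)
class TxnEvent:
    timestamp: float       # seconds since the epoch, from the exported Insights event
    url: str
    duration: float        # duration reported by the Ruby agent inside Unicorn


def match_records(logs: List[LogRecord], events: List[TxnEvent],
                  tolerance: float = 60.0) -> List[Tuple[LogRecord, TxnEvent]]:
    """Greedily pair each log record with the closest unused event for the same URL."""
    unused = list(events)
    pairs = []
    for record in sorted(logs, key=lambda r: r.timestamp):
        candidates = [e for e in unused
                      if e.url == record.url
                      and abs(e.timestamp - record.timestamp) <= tolerance]
        if not candidates:
            continue  # no plausible partner; leave this record out of the plot
        best = min(candidates, key=lambda e: abs(e.timestamp - record.timestamp))
        unused.remove(best)  # each event may be matched only once
        pairs.append((record, best))
    return pairs


logs = [LogRecord(100.0, "/v2/applications/1/metrics.json", 4.668),
        LogRecord(200.0, "/v2/applications/1/metrics.json", 1.0)]
events = [TxnEvent(190.0, "/v2/applications/1/metrics.json", 1.1),
          TxnEvent(105.0, "/v2/applications/1/metrics.json", 29.5)]
# (client-facing seconds, agent-reported seconds) points for a scatter plot
points = [(r.client_seconds, e.duration) for r, e in match_records(logs, events)]
print(points)
```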

Then we generated a scatter plot showing the relationship between the client-facing response time and the response time reported by our own Ruby agent from within the Unicorn process.

scatter plot


Notice anything weird?

The client-facing (nginx) and upstream (Unicorn) response times should be tightly correlated, because most of the “work” that nginx does for any given request is waiting for Unicorn to return a response. This tight correlation holds for faster requests, but the graph shows a distinct “knee” at around 4.67 seconds. This was the first clue that something was amiss with the response time data in nginx (and thus in the API Overload Protection tool).

When we examined the requests with client-facing response times of 4 to 5 seconds in the nginx access log, we found a clear pattern. Many of them looked like this:

05/Feb/2017:06:49:02 -0800||404|4.668|-|-|127.0.0.1:8100|287|0|-|GET|https|/v2/applications/33591900/metrics.json|-|api.newrelic.com|NewRelic Exporter

The third and fourth fields in the log line shown above are the client-facing response code and response time. The fifth and sixth fields are the upstream response code and response time. There are several strange things about this:

This request shouldn’t have generated a 404 response. The URL is correct and leads to a real application.
The corresponding transaction event in Insights had a reported duration of almost 30 seconds, much greater than the 4.668 seconds reported as the client-facing response time by nginx.
Nginx lists the upstream response code and upstream response time as “-”, suggesting that it hadn’t yet received a response from Unicorn at the time that it wrote out this log line.

Building a repro case

This is where the assembled clues should, in retrospect, have been enough to tell us what was happening, but we were still scratching our heads.

We spent several hours trying to put together a repro case and chasing down dead ends related to esoteric Linux kernel network settings. The most straightforward explanation would have been that nginx was giving up on waiting for Unicorn before Unicorn could write back a response over the local socket, but this wasn’t supported by the evidence. Nginx was configured to wait much longer than 5 seconds for Unicorn to generate a response, and empirical testing confirmed that, in general, New Relic APM API requests lasting longer than 5 seconds did not show this same symptom.

In desperation, we used Linux’s tail and awk commands to watch the nginx access log for requests matching this pattern. We were hoping to catch a live one in the act, so that we could capture a packet trace of the loopback interface to better understand what was happening on the communication channel between nginx and Unicorn.
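The tail-and-awk filter can be approximated in Python. Only the positions of the client-facing and upstream status and time fields come from the post. Reading the path and user-agent from the other positions is an inference from the single sample line above and may not match the real log format. The 4-to-5-second range mirrors the pattern described above.

```python
from typing import NamedTuple, Optional

SAMPLE = ("05/Feb/2017:06:49:02 -0800||404|4.668|-|-|127.0.0.1:8100|287|0|-|GET|https|"
          "/v2/applications/33591900/metrics.json|-|api.newrelic.com|NewRelic Exporter")


class Entry(NamedTuple):
    client_status: str
    client_seconds: float
    upstream_status: str   # "-" when nginx logged the line before Unicorn answered
    upstream_seconds: str  # kept as text because it may also be "-"
    path: str              # position inferred from the sample line
    user_agent: str        # position inferred from the sample line


def parse(line: str) -> Optional[Entry]:
    """Split one pipe-delimited access-log line; return None if it doesn't fit."""
    fields = line.rstrip("\n").split("|")
    if len(fields) < 16:
        return None
    try:
        client_seconds = float(fields[3])
    except ValueError:
        return None
    return Entry(fields[2], client_seconds, fields[4], fields[5], fields[12], fields[15])


def is_suspect(entry: Entry, low: float = 4.0, high: float = 5.0) -> bool:
    """Client-facing time within [low, high] seconds but no upstream response recorded."""
    return low <= entry.client_seconds <= high and entry.upstream_status == "-"


entry = parse(SAMPLE)
print(entry is not None and is_suspect(entry))
```

To watch a live log, the same two functions could sit in a loop over `sys.stdin`, fed by something like `tail -F access.log | python3 watch_suspects.py`. The script name is hypothetical.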

Luckily, this yielded an unexpected clue. The occurrence of this symptom was rare, but seemed to mostly affect requests where the user-agent was listed as some kind of mobile device.

What’s unique about mobile devices?

Mobile devices often have much spottier network connectivity than do desktops. This means that they’re more likely to hit network timeouts due to the link between the client and nginx being slow. Could that be related to our issue?

Looking back at the access log lines from the original incident, we saw that 100% of them listed a user-agent of “New Relic Exporter.” Searching for that string on GitHub, we found the source for a publicly available tool for exporting data from New Relic into another storage system. Examining the source, we saw that the outbound requests it was making to the New Relic API had a client-side timeout of 5 seconds set.

Bingo! That lined up nicely with the knee in the response time graph noted above.

What happens when the client sets its own timeout?

Armed with the theory that the issue might be caused by clients setting a request timeout, we were able to easily try a test by just curl-ing a long-running request endpoint and then Ctrl-C-ing curl before it returned, while watching the nginx access logs at the same time.
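The curl-and-Ctrl-C experiment can also be scripted. This Python sketch leaves nginx out entirely: a local HTTP server stands in for a Unicorn worker handling a slow request, and a client with a short timeout stands in for the exporter. It demonstrates only the premise of the experiment, namely that the backend keeps working after the client has given up. The one-second job and the 0.2-second timeout are arbitrary.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

work_finished = threading.Event()


class SlowHandler(BaseHTTPRequestHandler):
    """Stands in for a Unicorn worker handling a long-running API request."""

    def do_GET(self):
        time.sleep(1.0)          # the "expensive query"
        work_finished.set()      # the work completes whether or not the client waited
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"done\n")
        except OSError:
            pass                 # the client already hung up

    def log_message(self, *args):
        pass                     # keep the demo quiet


def client_gives_up_early(client_timeout=0.2):
    work_finished.clear()
    server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_address[1]}/slow"
    # Bypass any proxy settings from the environment for this loopback request.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    timed_out = False
    try:
        opener.open(url, timeout=client_timeout)
    except OSError:              # socket timeout, possibly wrapped in URLError
        timed_out = True
    finished = work_finished.wait(timeout=5)
    server.shutdown()
    server.server_close()
    return timed_out, finished


print(client_gives_up_early())
```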

Here’s a diagram that shows the sequence of events in this scenario:

request flow diagram

In short: nginx gives up when the client closes the connection, thereby triggering the Lua code that writes the response duration to the API Overload Protection tool early, while Unicorn is still busy processing the request. Because of another bug in the API Overload Protection tool’s Lua code, the response time that gets reported in this scenario is the default period rather than the amount of time that has elapsed so far for the request.

This problem disproportionately affects long-running requests, because those requests are more likely to hit a client-side timeout.

Wait … so why 404, exactly?

The 404 error code is somewhat of a strange choice of response code to be reported in this scenario. It turns out that nginx actually has its own special, non-standard response code for this scenario: 499 (client closed connection). So, why did we see 404s in the access log rather than 499s?

This looks to be a quirk of how nginx handles the post_action configuration directive (which we use for running Lua code after the request completes) when the client disconnects.

How can we fix it?

There are several possible solutions to this problem, but we chose to set the proxy_ignore_client_abort setting in the nginx config file used for APM’s API servers.

This setting tells nginx not to close its connection to the upstream server (Unicorn in our case) when the client closes its connection, so nginx keeps waiting for Unicorn’s response. As a result, the post-processing work associated with a request happens only after Unicorn has actually sent a response, rather than as soon as the client gives up.

This will ensure that the API Overload Protection tool gets a more accurate accounting of the amount of worker time consumed by each account, even when clients set an aggressive timeout value.
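A back-of-the-envelope model shows why clients with aggressive timeouts skew the accounting, and why waiting for the upstream corrects it. This Python sketch assumes that, without the fix, the tool is told the time elapsed when the client disconnects, capped at the cutoff nginx sees (about 4.67 seconds, matching the knee in the scatter plot). According to the post, the Lua bug actually reported a default value in that case, and this model does not try to reproduce that. The request durations are invented.

```python
from typing import Iterable


def reported_worker_seconds(durations: Iterable[float], client_cutoff: float,
                            wait_for_upstream: bool) -> float:
    """Total worker time the overload tool hears about for a batch of requests.

    durations: real Unicorn processing time per request, in seconds.
    client_cutoff: seconds after which the client disconnects, as seen by nginx.
    wait_for_upstream: True models proxy_ignore_client_abort being on, so each
        report goes out only after Unicorn finishes; False models the old
        behavior, where the report goes out when the client disconnects.
    """
    if wait_for_upstream:
        return sum(durations)
    return sum(min(d, client_cutoff) for d in durations)


# Mostly quick requests plus a few ~30 s ones, like the exporter's.
durations = [0.5] * 20 + [30.0] * 5
actual = reported_worker_seconds(durations, 4.668, wait_for_upstream=True)
before_fix = reported_worker_seconds(durations, 4.668, wait_for_upstream=False)
print(f"actual {actual:.1f} s, reported before the fix {before_fix:.1f} s")
```

Only the five long requests are undercounted in this model. That is why the problem hits long-running requests hardest.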

Hopefully, this discussion of New Relic’s approach to limiting API overloading can help you establish or update your own processes to deal with this issue. And even though this particular edge-case investigation may not be directly transferable to your own situation, we hope the processes we used to identify and correct the situation will help illuminate a path to resolving your own issues.

One thing is for sure: the rise of the API is going to continue, so finding ways to limit overloading is likely to become an even more important issue over time.

 

Wednesday, 31 May 2017

3 Enterprise Cloud Takeaways from Mary Meeker’s 2017 Internet Trends Report

New Relic

The 2017 version of venture capitalist Mary Meeker’s highly influential “Internet Trends” report is out, and as always it’s required reading for just about anyone who works in the technology industry.

New Relic has covered these reports in the past, and Meeker’s latest insights are once again too important to ignore. But while you may want to check out what the Kleiner Perkins Caufield & Byers partner has to say about growing internet activity in China and India, slowing smartphone growth, big changes in media and advertising, and how healthcare is reaching a “digital inflection point,” we’re going to focus on a topic near and dear to our hearts here at New Relic: her thoughts about how cloud computing is transforming the enterprise.

Don’t miss:

This Week in Modern Software: Mary Meeker’s State of the Internet 2016
This Week in Modern Software: Meeker’s State of the Internet 2015

The cloud is accelerating change across enterprises

Meeker makes three key points about the cloud:

Cloud adoption is reaching new heights and creating new opportunities.
Customer expectations for enterprise software now mirror those for consumer apps.
As more apps move to the cloud, that creates more security vulnerabilities.

Let’s look at each one a bit more closely.

1. Cloud adoption is reaching new heights and creating new opportunities

Meeker uses a 2017 RightScale survey to note that while Amazon Web Services (AWS) holds a wide lead as the most popular cloud platform used by enterprise companies, competitors like Microsoft Azure and Google Cloud Platform are also rising in popularity—with many companies experimenting with or planning to use these services in the future.

Given that growth, it’s no surprise that Meeker’s report cites IDC numbers that indicate spending on cloud infrastructure grew 37% to $36 billion in 2014, and is approaching the levels of spending on traditional data centers.

The rise of the cloud is also paving the way for infrastructure innovation, Meeker says, citing new methods of software delivery like APIs and browser extensions, and the ability of containers and microservices to simplify the software development process and reduce the complexity of managing apps.

2. Customer expectations for enterprise software now mirror those for consumer apps

Historically, enterprise apps have not been known for their great user experiences. But now enterprise apps must compete for usage and mindshare against highly sophisticated consumer applications delivered via the cloud.

Enterprise users are consumers too, and their experience with fast, reliable, frequently updated consumer apps is helping to spark enterprise software’s move from on-premise software to SaaS and ultimately mobile-first smart apps, Meeker says. Notably, enterprise app success is increasingly measured by the same metrics used for consumer apps, such as Daily Active Users (DAUs), Monthly Active Users (MAUs), and Net Promoter Scores (NPS). Fortunately, she adds, cloud-enabled enterprise apps are both cheaper to build and easier to adopt.

Not surprisingly, the trend toward “consumer quality” enterprise apps is boosting the importance of design, and many companies are responding by boosting their designer-to-developer ratios.

3. As more apps move to the cloud, that creates more security vulnerabilities

Meeker uses a 2015 Bain study to note that the concerns of cloud users are slowly shifting away from data security and “cost uncertainty” toward issues like vendor lock-in and compliance. But she also acknowledges that more cloud apps can lead to more vulnerabilities, and that cyber-threats are rising in severity.

New Relic is all in on the cloud

As a cloud-native SaaS vendor, New Relic totally agrees that the cloud changes everything. From our flexible cloud pricing option to helping customers like REI and MLBAM embrace the cloud, we’re all about delivering Digital Intelligence at Cloud Scale (in fact, that’s the theme for our FutureStack user conferences around the world).

We are happy to say that the specific trends Meeker cites align perfectly with our approach. The growth of the cloud in the enterprise is central to the products we create, and we’ve long discussed the importance of using the cloud and other modern software techniques to empower better design and user experiences in enterprise software—something not every company can honestly claim. And, of course, we take cloud security seriously, too!

From our standpoint, Meeker’s recognition of the cloud’s growth and importance is further confirmation that the future of enterprise software is in the cloud.

 

Rackspace sued for failing to disclose Vodafone contract loss - mySanAntonio.com

rackspace hosting - Google News

mySanAntonio.com
Rackspace Hosting has been hit with a securities class action lawsuit from a pension fund that claims the company “intentionally concealed the impending loss” of a contract and the impact it would have on the company's growth. The City of Warwick ...

Tuesday, 30 May 2017

How Historic Swiss Retailer Migros Ensures E-Commerce Excellence Every Day

New Relic

When Gottlieb Duttweiler founded Migros in 1925 in Zürich, Switzerland, he sold just six essential items (coffee, rice, sugar, pasta, coconut fat, and soap) from five trucks that rolled from one local village to the next. But he did such a good job of it that the fledgling company was soon expanding rapidly.

Today, Migros has become Switzerland’s largest retailer, selling everything from fresh produce to home appliances, in both brick-and-mortar stores and online. With more than 100,000 employees, it is also the country’s largest private employer as well as a cooperative business boasting more than two million members.

From its humble beginnings, Migros has constantly evolved and adapted to meet the challenges of each successive era. Today, as e-commerce continues to grow in importance, this means harnessing the power of the New Relic Digital Intelligence Platform to provide the best possible digital customer experience.

Time to streamline and simplify

According to Alain Petignat, head of online development and operations for Migros, the company has a complicated application environment, incorporating more than 300 customer-facing applications, some developed by third parties and some custom-built in house. Most of these applications run on-premise, with some components—image processing, for example—hosted in the cloud.

The result is a complex mixture of applications, microservices, technologies, and infrastructure, maintained by various teams working according to their own processes, preferences, and schedules. To manage that complexity, explains Alain, Migros needed a way to “really streamline and simplify our deployment, operations, and monitoring processes.”

Insights you can rely on

Moving to Pivotal Cloud Foundry and Puppet helped the company standardize its deployment process. But it was only when one of the company’s software partners recommended New Relic that Migros took the next step. “When [our partner] showed us how it helps them monitor performance, we knew it would be the right tool for us as well,” says Alain.

Since deploying New Relic to monitor customer-facing applications as well as the company’s full technology stack and infrastructure, multiple Migros teams (including those focused on operational performance, web analytics, and user experience) have come to rely heavily on the data and insights New Relic provides.

“With New Relic, we can see the impact of deployments on our performance and infrastructure,” says Alain. The way content renders on different pages, for example, can be monitored via New Relic Browser, helping Migros optimize the experience for online shoppers.

And when Migros decided to consolidate its physical infrastructure, reducing its nine data centers to just four, New Relic proved equally valuable: “New Relic helped ease our concerns because we used it to catch any issues created by the consolidation before they impacted our customers,” Alain explains.

Though running Migros today is a little more complicated than it was back when Herr Duttweiler dispatched his first truck full of coffee and rice, the company’s core mission of promoting the health of its customers by selling “responsible” products at low prices remains unchanged. And with New Relic helping out behind the scenes, look for that success to continue for decades to come.

Learn more

To learn more about how the New Relic Digital Intelligence Platform helps the “orange giant” stay Switzerland’s favorite retail shop, read the full Migros customer case study: Migros Ensures the Health of Its Hybrid Environment to Deliver Value for Its 2 Million Member Retail Cooperative

Read the case study in German: Migros: Wertschaffung für Konsumgenossenschaft mit 2 Millionen Mitgliedern durch gesunde Hybridumgebung (Migros: Creating Value for a Consumer Cooperative with 2 Million Members Through a Healthy Hybrid Environment)

Read the case study in French: Migros garantit l’intégrité de son environnement hybride pour aider les deux millions de membres de sa coopérative de distribution (Migros Ensures the Integrity of Its Hybrid Environment to Help the Two Million Members of Its Retail Cooperative)

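Read the case study in Italian: La cooperativa Migros garantisce le prestazioni del proprio ambiente ibrido per fornire valore aggiunto a 2 milioni di soci (The Migros Cooperative Ensures the Performance of Its Hybrid Environment to Deliver Added Value to 2 Million Members)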

 
