I wrote this post on server monitoring tools for Raygun a couple of months ago. You can check out the original here.
You work on your software’s performance. But let’s face it: production is where the rubber meets the road. If your application is slow or it fails, then nothing else matters.
Are you monitoring your applications in production? Do you see errors and performance problems as they happen? Or do you only see them after users complain? Worse yet, do you never hear about them?
What tools do you have in place for tracking performance issues? Can you follow them back to their source?
Today, we’re talking about server monitoring tools. Without software that monitors your critical applications in real time, you’re flying blind, and you may be losing customers. After all, when users encounter an unresponsive application, they leave.
When we talk about application performance monitoring (APM), we’re talking about a critical core competency. Mean time to recovery (MTTR), the average time it takes to restore service after a failure, is a crucial metric for web service providers, and keeping it under control is impossible without the proper tools.
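MTTR is commonly computed as total recovery time divided by the number of incidents. Here’s a minimal Python sketch, assuming each incident is logged as a (detected, recovered) timestamp pair. Definitions differ on whether the clock starts at the failure or at its detection, and the incident data below is invented for illustration.

```python
from datetime import datetime, timedelta

def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average the time between detection and recovery across incidents."""
    if not incidents:
        raise ValueError("MTTR is undefined without incidents")
    total = sum((recovered - detected for detected, recovered in incidents), timedelta())
    return total / len(incidents)

# Hypothetical incident log: (detected, recovered)
incidents = [
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 9, 45)),    # 45 minutes
    (datetime(2024, 1, 9, 14, 0), datetime(2024, 1, 9, 15, 15)),  # 75 minutes
]
print(mean_time_to_recovery(incidents))  # 1:00:00
```

Shaving time off detection (the start of each interval) lowers MTTR just as much as faster fixes do, which is why real-time monitoring matters here.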
There are plenty of server-monitoring tool packages out there, and their APM offerings vary widely. Different packages have tradeoffs, and selecting the right product can be difficult.
So which one suits your infrastructure best? To help you decide, we’ve assembled a list of your five best options. We’ll compare their features, pros, and cons. Hopefully, you’ll leave knowing which one is right for your company.
1. Raygun’s Integrated Platform
Raygun offers an integrated solution that shows a complete picture of your system’s health in one place. When you integrate Raygun’s platform into your application, you can monitor your software stack from inside your server all the way out to each of your individual users.
Application Performance Management Tools
Raygun’s APM gives you real-time and trend data about server performance and also about user experience. You can see your page load data in APM flame charts, too. The charts are navigable timelines that you can use to identify performance bottlenecks. At the same time, you’ve also got stack traces for page loads and other activities. These are made available as call trees, similar to those found in a traditional profiler.
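To show what a call tree aggregates, here’s a minimal Python sketch that merges sampled call stacks into a tree of sample counts, the way a traditional sampling profiler builds its call-tree view. It’s a generic technique, not Raygun’s implementation, and the function names and sample stacks are hypothetical.

```python
from collections import defaultdict

def _new_node():
    # Each node counts the samples that passed through it and holds its callees.
    return {"count": 0, "children": defaultdict(_new_node)}

def build_call_tree(stacks):
    """Merge sampled call stacks (outermost frame first) into one tree of sample counts."""
    root = _new_node()
    for stack in stacks:
        node = root
        node["count"] += 1
        for frame in stack:
            node = node["children"][frame]
            node["count"] += 1
    return root

def print_tree(node, name="(all samples)", depth=0):
    """Print the tree with the busiest callees first, like a profiler's call-tree view."""
    print(f"{'  ' * depth}{name}: {node['count']}")
    for child_name, child in sorted(node["children"].items(), key=lambda item: -item[1]["count"]):
        print_tree(child, child_name, depth + 1)

# Hypothetical samples taken while serving requests
samples = [
    ["handle_request", "load_user", "db_query"],
    ["handle_request", "load_user", "db_query"],
    ["handle_request", "render_page"],
]
tree = build_call_tree(samples)
print_tree(tree)
```

Here `load_user` shows up in two of the three samples, which points at it (and `db_query` beneath it) as the likely bottleneck.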
That’s not all. Raygun’s APM boasts tight integration with GitHub. It links call traces directly to source files in the repository. Because it can cross-reference performance issues directly back to code, you’ll spend less time isolating and repairing problems. Presently, the APM tools only support GitHub, but interfaces for GitLab and Bitbucket will be available soon.
In addition, Raygun’s APM offers a built-in detection system. It creates actionable issues, in real time, for problems that impact end-user experience. Each issue then contains a workflow that DevOps team members can use to resolve those issues.
APM works with .NET servers. Support for other languages, such as Java and Ruby, is coming soon.
Crash Reporting
Raygun’s crash reporting integrates into server software. It captures complete diagnostics for errors. That includes stack traces, methods/functions, class names, OS versions, and other relevant details. And the console also groups multiple occurrences of the same issue. That grouping makes it easier for you to report and resolve common bugs.
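The post doesn’t describe Raygun’s grouping rules. As a generic illustration, one common approach is to fingerprint each error by its type plus its top stack frames while ignoring the message, which often contains variable data such as IDs. A minimal Python sketch; the field names and the three-frame cutoff are assumptions.

```python
import hashlib
from collections import Counter

def fingerprint(error_type: str, frames: list[str]) -> str:
    """Hash the error type plus the top three stack frames into a grouping key."""
    key = error_type + "|" + "|".join(frames[:3])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]

def group_errors(errors):
    """Count occurrences per fingerprint so repeated crashes collapse into one issue."""
    return Counter(fingerprint(e["type"], e["frames"]) for e in errors)

# Hypothetical crash reports; the messages differ but are not part of the key.
errors = [
    {"type": "NullReferenceException", "frames": ["OrderService.Save", "OrderController.Post"], "message": "user 17"},
    {"type": "NullReferenceException", "frames": ["OrderService.Save", "OrderController.Post"], "message": "user 42"},
    {"type": "TimeoutException", "frames": ["PaymentClient.Charge", "CheckoutController.Post"], "message": "gateway"},
]
groups = group_errors(errors)
print(len(groups), "distinct issues")
```

Choosing what goes into the fingerprint is the real design decision: too little and unrelated bugs merge, too much and one bug splinters into many issues.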
You can access crash information via a robust reporting system. This system is customizable, letting you filter by date, software version, error group, class name, method, and more. Crash reporting also supports a long list of languages and frameworks: Angular, React, Ember, .NET, Android, Java, PHP, and many more.
Real User Monitoring
“Real user monitoring” is exactly what its name implies: Raygun lets you monitor all user sessions. Both performance and session duration data are available, and you can break them down by user agent.
Raygun displays session information in waterfall graphs with “hotspots” highlighted, emphasizing opportunities for improvement. And the configurable dashboard displays complete information for every session. That means you’ll have crash reports, session duration stats, and information about slow page and component loading, making it easier to isolate bugs.
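To illustrate what a per-user-agent breakdown computes, here’s a minimal Python sketch that averages one session metric per user agent. The session records and field names are made up; a real RUM product collects and aggregates this data for you.

```python
from collections import defaultdict
from statistics import mean

def mean_by_user_agent(sessions, metric):
    """Average one session metric (e.g. load time or duration) per user agent."""
    values = defaultdict(list)
    for session in sessions:
        values[session["user_agent"]].append(session[metric])
    return {agent: mean(vals) for agent, vals in values.items()}

# Hypothetical session records
sessions = [
    {"user_agent": "Chrome", "load_ms": 800, "duration_s": 300},
    {"user_agent": "Chrome", "load_ms": 1200, "duration_s": 100},
    {"user_agent": "Safari", "load_ms": 2500, "duration_s": 60},
]
print(mean_by_user_agent(sessions, "load_ms"))
print(mean_by_user_agent(sessions, "duration_s"))
```

In this made-up data, Safari users see much slower loads and shorter sessions, the kind of pattern a breakdown by user agent is meant to surface.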
2. Monitis
Monitis is a SaaS offering that will have your monitoring up and running in minutes. It provides you with custom plans based on the number of nodes in your network and the type of monitoring you want.
Monitis primarily focuses on traditional network and server operating system monitoring. Application monitoring is possible via log scraping and their API. You can use the API to report statistics to the Monitis console, but adding this integration means you’ll need to write code for it.
Server Monitoring Tools from Monitis
Monitis has native agents for monitoring Linux and Windows servers. The agents can report on memory, storage, network, and agents, and they can do it as often as once per minute.
The system can also monitor log files for errors and specific message text. While the agent will monitor system logs, application logs require an extra logging “plugin.”
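Log monitoring of this kind boils down to matching each new line against patterns for error levels or specific message text. Here’s a minimal Python sketch of that technique; it isn’t how the Monitis agent or plugin is configured, and the pattern names and log lines are hypothetical.

```python
import re

# Hypothetical watch list: an error log level and one specific message text.
PATTERNS = {
    "error": re.compile(r"\b(ERROR|FATAL)\b"),
    "disk_full": re.compile(r"No space left on device"),
}

def scan_log(lines):
    """Yield (line_number, pattern_name, line) for every line matching a watched pattern."""
    for number, line in enumerate(lines, start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                yield number, name, line.rstrip("\n")

log = [
    "2024-01-03 09:00:01 INFO  request served in 120 ms\n",
    "2024-01-03 09:00:02 ERROR write failed: No space left on device\n",
]
for hit in scan_log(log):
    print(hit)
```

A real agent would also track its position in the file so it only scans lines appended since the last check.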
Application Performance Monitoring
Monitis can monitor Java-based web applications via JMeter scripts. And the server monitoring tools will execute those scripts at 15-minute intervals. But there’s a potential downside here: there’s no support for continuous application monitoring.
If you need support for platforms and languages other than Java, though, don’t worry. Monitis has proprietary SDKs for Perl, Python, PHP, Ruby, and C#. With these interfaces, you can publish statistics to the monitoring system for graphing and alerts. Naturally, defining those statistics requires development effort.
Monitis’ real user monitoring tracks page views, page load times, page build performance, and other user statistics. But it only supports browser clients. If you’re looking for mobile application support, you’ll have to look elsewhere.
3. Zabbix
Zabbix is an open-source monitoring platform that you can download and install yourself. If you’d like, the Zabbix team can even consult with you to create a turnkey solution for your needs. They also have a cloud-based SaaS offering in beta, but it doesn’t have commercial support yet.
Zabbix Server Monitoring
Zabbix supports a wide variety of server infrastructures. Depending on your network topology, it will configure itself via “auto-discovery.” You’ll find this capability useful for server hardware and network infrastructure. But most server platforms require additional configuration. Something else to keep in mind—Zabbix has an operating system agent that can be configured for some application monitoring.
The Zabbix server tools detect problems via log scraping and network monitors. They can also run checks you define yourself, like automated web server requests or IP address pings.
Zabbix Application Performance Monitoring
Similar to Monitis, Zabbix provides an API for adding monitoring and metrics to your application. Unlike Monitis’ SDKs, though, the Zabbix API is an HTTP-based (JSON-RPC) interface on the Zabbix server. Because any language that can make HTTP requests can use it, it’s platform- and language-independent, which is good. However, the burden is on the client to define monitoring criteria and implement the data flows.
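The Zabbix API follows JSON-RPC 2.0 over HTTP, so calling it takes nothing more than a JSON body and an HTTP POST. Here’s a minimal Python sketch that builds, but doesn’t send, a `host.get` request. The server URL and token are placeholders, and the Bearer-token authentication is an assumption that depends on your Zabbix version (older versions expect an `auth` field in the request body instead).

```python
import json
import urllib.request

# Assumed URL: the path depends on how the Zabbix frontend was installed.
ZABBIX_API_URL = "http://zabbix.example.com/zabbix/api_jsonrpc.php"

def build_payload(method: str, params: dict, request_id: int = 1) -> bytes:
    """Serialize a JSON-RPC 2.0 request body."""
    return json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    ).encode("utf-8")

def build_http_request(method: str, params: dict, token: str) -> urllib.request.Request:
    """Wrap the payload in an HTTP POST (not sent here).

    Assumption: a Zabbix version that accepts an API token in a Bearer
    Authorization header.
    """
    return urllib.request.Request(
        ZABBIX_API_URL,
        data=build_payload(method, params),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )

request = build_http_request("host.get", {"output": ["hostid", "host"]}, token="YOUR-API-TOKEN")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```

Everything beyond the envelope, such as which metrics to collect and when to push or pull them, is the client-side work the paragraph above describes.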
Zabbix can also monitor Java applications via JMX with its Java Gateway. But there’s no native support for other platforms or languages.
Rather than tracking real users, Zabbix can emulate one via user-defined web requests. You define the requests and the response criteria, which may include download speed, response time, HTTP response code, and the presence of a specific string in the response. You can schedule the requests to run at predefined intervals.
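Here’s a minimal Python sketch of how one emulated request might be judged against criteria like these. It isn’t Zabbix’s web-scenario configuration format. Fetching the page, scheduling the check, and the download-speed criterion are left out, and the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Response:
    status: int        # HTTP status code
    elapsed_ms: float  # time to receive the full response
    body: str

@dataclass
class Criteria:
    expected_status: int = 200
    max_elapsed_ms: float = 1000.0
    required_text: Optional[str] = None

def evaluate(response: Response, criteria: Criteria) -> list:
    """Return the failed criteria; an empty list means the check passed."""
    failures = []
    if response.status != criteria.expected_status:
        failures.append(f"status {response.status}, expected {criteria.expected_status}")
    if response.elapsed_ms > criteria.max_elapsed_ms:
        failures.append(f"took {response.elapsed_ms:.0f} ms, limit {criteria.max_elapsed_ms:.0f} ms")
    if criteria.required_text is not None and criteria.required_text not in response.body:
        failures.append(f"missing text {criteria.required_text!r}")
    return failures

login_page = Criteria(required_text="Sign in")
print(evaluate(Response(200, 350.0, "<h1>Sign in</h1>"), login_page))
print(evaluate(Response(503, 4200.0, "Service Unavailable"), login_page))
```

Returning every failed criterion, rather than stopping at the first, gives an alert enough detail to tell a slow page from a broken one.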
With Zabbix, keep in mind that there’s no explicit support for mobile clients. Zabbix can only track mobile clients that make web or REST requests. It can’t collect performance characteristics for different browsers.
4. New Relic
New Relic is a cloud-based APM and server monitoring platform.
APM with New Relic
With New Relic’s APM tools, you’ll have automatic instrumentation for Java, Node.js, and several other languages besides. But if you’re a .NET shop, beware—it’s not supported.
New Relic monitors both web and mobile users, but it supplies those capabilities in two distinct modules. That means separate installations, configurations, and billing. Frankly, if you’re looking for simplicity, New Relic might not be the way to go: the platform is split into six different products, each with its own license and cost.
New Relic’s Server Monitoring Tools
The server monitoring tools log exceptions to a dashboard alongside graphs of errors and error rates. However, exceptions appear only as stack traces, with no links into your source code repository.
New Relic server monitoring requires an agent that publishes data to its systems. New Relic provides agents for major Linux distributions and recent Microsoft Windows Server versions.
One potential drawback to keep in mind: if the agent can’t be installed on a system, that system can’t be monitored. And because New Relic handles each product and language differently, you might be adding complexity to your deployments.
5. Datadog
Datadog is another SaaS monitoring service for applications and infrastructure. It supports performance monitoring for web applications, but its primary focus is system monitoring.
Datadog’s Application Performance Monitoring
Datadog monitors servers via an open-source agent. For Linux and Windows, Datadog packages the agent with a “trace agent” that adds APM capabilities. As with New Relic, platforms that can’t run the agent are limited to log scraping.
In addition to installing the agent, you’ll need to instrument your applications to enable tracing. Datadog officially supports only Golang, Java, Python, and Ruby; libraries exist for several other languages, but they aren’t officially supported.
You can use the management console to trace instrumented applications running on systems with Datadog’s agent. Note that Datadog doesn’t support continuous tracing: the system stores only periodic samples of application activity.
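Sampling can be time-based or probabilistic. As an illustration of the probabilistic kind, a swapped-in technique rather than a description of Datadog’s sampler, here’s a minimal Python sketch of head-based sampling keyed on the trace ID. The function name, trace IDs, and 10% rate are hypothetical.

```python
import hashlib

def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Keep about sample_rate of all traces; the same trace ID always gets the same answer."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # pseudo-uniform in [0, 2**64)
    return bucket < sample_rate * 2**64

kept = sum(keep_trace(f"trace-{i}", 0.1) for i in range(10_000))
print(f"kept {kept} of 10000 traces")
```

Hashing the trace ID, rather than rolling a random number per service, lets every service in a request make the same keep-or-drop decision, so sampled traces stay complete. The tradeoff the post points out still applies: anything that happens in a dropped trace is invisible.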
Datadog’s Server Monitoring Software
Datadog’s initial focus was infrastructure monitoring. So unsurprisingly, the platform supports plenty of infrastructure integrations. However, as I mentioned above, it can only monitor server hosts where its agent is installed.
With that, let’s talk about some downsides to Datadog. It places a strong emphasis on metrics and real-time monitoring, but its reporting capabilities are limited compared with other products. There are no real-time reports, only historical ones, and the options for customizing them are narrow.
Another thing to keep in mind is that Datadog is a complicated system with a steep learning curve. You might need to invest considerable time and effort before you can use it to its full potential.
Which provider is right for you?
Selecting the right server monitoring tools is important, but you already knew that. What you needed was a rundown of the best tools available to you, complete with their pros and cons. As you saw, each platform has its advantages and disadvantages. But one thing’s certain: if you’re not monitoring your servers and using APM, you’re falling behind your competitors.