Syntervision is a team of problem solvers. We love going into an organization, finding the problem areas, and resolving the issues. Below are some examples of our success stories.

Application Performance Case Study

Observations

Syntervision was approached in November of 2013 to address application performance issues that were plaguing a Midwest healthcare system. The company complained of long load times, frequent outages, and high utilization of databases in two of the applications.

 

HTTP and SQL errors

During our investigation, we discovered HTTP and SQL errors that were negatively affecting both applications. Many pages were returning HTTP 404 (Not Found) and HTTP 501 (Not Implemented) errors. In addition, SQL was reporting invalid objects, and the SQL patient lookup was failing to convert with Setv4charge, which impeded patient data entry.

TCP

The next issue we discovered involved the TCP configuration. Neither the window nor the buffer was properly optimized. The window was much smaller than expected and could not be sized appropriately. Most servers were also reporting TCP retransmissions at rates in excess of 1%, and at least three network trunks were dropping packets.
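
As an illustration of the retransmission figure above, the rate can be expressed as retransmitted segments divided by total segments sent. The sketch below is a minimal example and not the tooling used in the engagement; the function names and the counter values are hypothetical, and the 1% default simply mirrors the figure quoted in this case study.

```python
def retransmission_rate(segments_sent: int, segments_retransmitted: int) -> float:
    """Return retransmitted segments as a percentage of all segments sent."""
    if segments_sent <= 0:
        raise ValueError("segments_sent must be positive")
    return 100.0 * segments_retransmitted / segments_sent


def exceeds_threshold(rate_percent: float, threshold_percent: float = 1.0) -> bool:
    # 1% mirrors the figure in the case study; it is not a universal limit.
    return rate_percent > threshold_percent


# Hypothetical counters, e.g. read from a server's TCP statistics.
rate = retransmission_rate(segments_sent=250_000, segments_retransmitted=3_750)
print(f"{rate:.2f}% retransmitted, over threshold: {exceeds_threshold(rate)}")
```

In practice the two counters would be sampled at intervals so that the rate reflects a recent window rather than the totals since boot.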

 

DNS

DNS (Domain Name System) was the next area where we identified an issue. DNS lookups were returning inconsistent results, and the DNS sub-domain appeared to change in a round-robin fashion.
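
One simple way to spot this kind of inconsistency is to resolve the same name several times and compare the answers. The sketch below is an assumption about how such a check could look, not the method used in the engagement. The helper names and the host name `app.example.internal` are hypothetical, and the demonstration uses a fake resolver so it runs without network access.

```python
import socket
from collections import Counter
from typing import Callable, FrozenSet


def resolve_ipv4(hostname: str) -> FrozenSet[str]:
    """Resolve a hostname to the set of IPv4 addresses the resolver returns."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return frozenset(info[4][0] for info in infos)


def distinct_answers(
    hostname: str,
    attempts: int = 5,
    resolver: Callable[[str], FrozenSet[str]] = resolve_ipv4,
) -> Counter:
    """Resolve the same name repeatedly and count each distinct answer set."""
    return Counter(resolver(hostname) for _ in range(attempts))


def is_consistent(answers: Counter) -> bool:
    # Sets ignore ordering, so a plain rotation of the same addresses
    # still counts as consistent; only differing address sets are flagged.
    return len(answers) <= 1


# Demonstration with a fake resolver (no network needed).
fake = iter([frozenset({"10.0.0.1"}), frozenset({"10.0.0.2"}), frozenset({"10.0.0.1"})])
answers = distinct_answers("app.example.internal", attempts=3, resolver=lambda _: next(fake))
print("consistent:", is_consistent(answers))
```

Local resolver caches can hide differences between upstream servers, so a real check would usually query each DNS server directly.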

 

Network Utilization

Our final observation concerned network utilization. One of the application databases reported high utilization around 7 pm every night, which was early for any scheduled activities. This utilization resulted in page load time spikes in both applications.

 

Recommendations

After identifying these issues, we analyzed application performance on a weekly basis to determine the root causes and provide recommendations to fix them. To resolve the application and database communication errors, we suggested connecting the application and database servers to the same switches. Resolving the switch packet drops, which also affect other applications, will be vital to the applications’ success. To ensure the continued success of the applications, we also advised deploying remote synthetic testing for each application. Implementing synthetic testing, along with adding a BUR network, would help with data loss. Finally, we offered to provide the customer with additional instrumentation access.

 

Next Steps

After receiving our recommendations, the client agreed that we should:
  • Investigate TCP buffer/window inefficiency
  • Perform remote captures and synthetic transactions
  • Further investigate HTTP/SQL errors
  • Verify resolution of packet drops/retransmissions
  • Identify additional network related packet drops
  • Continue infrastructure and application investigation
  • Provide recommendations related to architecture
  • Investigate inconsistent HTTP page load times

 

Results

We are happy to report that once these next steps were completed, the application issues were resolved and the client had a way to monitor the applications to help prevent similar issues.
“This is truly incredible. I have never seen any software so easy to use and maneuver.” – Director of Networks and Telecommunications, Large Midwest Health System 

 

Application Performance Management Basics

To understand and manage application performance and end user experience, it is important to understand key concepts such as load, events, incidents, impact, and availability, per standard ITIL definitions. Once these elements are understood, it becomes clear what the service can provide as well as where and how end users can access it.

 

Load: the amount of data an application is engineered to handle
Event: a change within the application, positive or negative
Incident: an event that is not part of standard operation and causes a disruption of service
Impact: a measurement of the effect of the incident on the application or end user experience
Availability: the portion of time the application is actually available for use
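
As a small worked example of the availability definition above, availability can be expressed as the share of a measurement period during which the application was usable. This is a minimal sketch; the function name and the downtime figure are hypothetical.

```python
def availability_percent(total_minutes: float, downtime_minutes: float) -> float:
    """Return the percentage of the period in which the application was available."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime_minutes must be between 0 and total_minutes")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


# Hypothetical figures: a 30-day month with 90 minutes of outage.
minutes_in_month = 30 * 24 * 60
print(f"{availability_percent(minutes_in_month, 90):.3f}%")
```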

 

If proactive monitoring of the application and end user experience is needed, it is important to answer four questions before beginning.

Expectation: What is the expected performance of the application?

Create a baseline

Actual: How is the load performing?

If the load approaches the engineered maximum, it’s at risk of degrading

Degraded: What range of performance is considered degraded, but still usable?

The load/performance relationship is key to determining whether the results are expected or related to an incident that needs to be resolved.

Down: What is the point in the application performance that is considered unusable?

The load/performance relationship is key to determining whether the results are expected or related to an incident that needs to be resolved.
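
The four questions above can be turned into a simple classification once a baseline exists. The sketch below is one possible way to do that, assuming response time is the performance measure. The thresholds, the 80% load margin, and all names are hypothetical and would come from a site's own baseline and load testing.

```python
from enum import Enum


class State(Enum):
    EXPECTED = "expected"
    DEGRADED = "degraded"
    DOWN = "down"


def classify(response_ms: float, degraded_ms: float, down_ms: float) -> State:
    """Map a measured response time onto the expected/degraded/down ranges."""
    if degraded_ms >= down_ms:
        raise ValueError("degraded_ms must be lower than down_ms")
    if response_ms >= down_ms:
        return State.DOWN
    if response_ms >= degraded_ms:
        return State.DEGRADED
    return State.EXPECTED


def load_at_risk(current_load: float, engineered_max: float, margin: float = 0.8) -> bool:
    # The 80% margin is an illustrative assumption, not a standard value.
    return current_load >= margin * engineered_max


# Hypothetical thresholds: degraded at 500 ms, unusable at 2000 ms.
for sample_ms in (150, 750, 2500):
    print(sample_ms, classify(sample_ms, degraded_ms=500, down_ms=2000).value)
print("load at risk:", load_at_risk(current_load=850, engineered_max=1000))
```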

 

 

Questions one and two can be answered in part by two of our complementary services: active and passive monitoring, two methods of measuring end user experience and application performance. Active monitoring, or synthetic monitoring, simulates the expected traffic on the application. Passive monitoring, or traffic analysis, analyzes the traffic that is actually on the application in near real time. We advise using these services together for the best results.

Before questions three and four can be answered, it is necessary to understand how application performance degrades. Degradation does not occur in a linear fashion; therefore, neither should the analytic model. The model shows system health/performance ranging from an up-and-running application to a down or unusable one. As the load increases (blue line), the application can become degraded. When the load crosses the threshold from degraded to completely down, the response time (red line) of the application increases exponentially. If load testing has been performed correctly, a static threshold can be used.
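
To illustrate why a linear model falls short, the sketch below uses a textbook single-server queueing approximation in which response time equals service time divided by (1 - utilization). This formula is swapped in purely for illustration and is an assumption; it is not the model described above, and the 0.1 s service time is hypothetical. It does show the same general behavior: response time stays almost flat at low load and climbs steeply as load nears capacity.

```python
def response_time(service_time_s: float, utilization: float) -> float:
    """Approximate response time for a single-server queue at a given utilization."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in the range [0, 1)")
    return service_time_s / (1.0 - utilization)


# Hypothetical service time of 0.1 s per request.
for utilization in (0.5, 0.8, 0.9, 0.95):
    seconds = response_time(0.1, utilization)
    print(f"utilization {utilization:.0%}: {seconds:.2f} s")
```

Moving from 50% to 80% utilization adds far less delay than moving from 90% to 95%, which is why a threshold derived from load testing is more useful than a straight-line estimate.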

The impact of an event is exactly what it sounds like: how the application or end user experience is affected by an event or incident. Putting the impact in the context of the service provides the information needed to determine what is creating the impact and where to start looking for the problem. The impact of an event or incident in turn affects the availability of the application, and vice versa. Measuring the availability of the service and of the paths each end user utilizes shows whether a component becoming unavailable is creating impact.

 

Active Monitoring (Synthetic)

Synthetic checks allow the application or service to be checked continuously. The data collected makes it easy to determine when and from where performance results change. While both active and passive monitoring are incredibly useful, synthetic monitoring has major benefits over passive monitoring alone:

  • The tests are identical every time, so the expected results are the same as well
  • Tests run 24×7, revealing the moment performance begins to degrade
  • The location of a performance impact is easy to determine, whether it affects the overall service or a specific location
  • Both the overall end-to-end simulated transaction performance and the performance of each step can be understood
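
A synthetic transaction of this kind can be sketched as a fixed list of steps, each timed individually and in total. The example below is a minimal illustration rather than a description of any particular product. The step names are hypothetical, and the `time.sleep` calls stand in for real actions such as HTTP requests.

```python
import time
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_synthetic_transaction(steps: List[Step]) -> Tuple[Dict[str, float], float]:
    """Run each step in order and return per-step timings plus the total, in seconds."""
    timings: Dict[str, float] = {}
    for name, action in steps:
        start = time.perf_counter()
        action()
        timings[name] = time.perf_counter() - start
    return timings, sum(timings.values())


# Hypothetical steps; real checks would perform requests against the application.
demo_steps: List[Step] = [
    ("login", lambda: time.sleep(0.01)),
    ("patient_search", lambda: time.sleep(0.02)),
]
per_step, total = run_synthetic_transaction(demo_steps)
for name, seconds in per_step.items():
    print(f"{name}: {seconds * 1000:.1f} ms")
print(f"total: {total * 1000:.1f} ms")
```

Because the steps never change, a shift in any single timing points to the service or the test location rather than to a difference in user behavior.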

Passive Monitoring (Traffic Analysis)

There is no better way to know what the end user experience is or how well the application is performing than analyzing the transactions at the most basic level. The real-time experiences of users are discovered and analyzed in several ways:

  • IP address: performance of the application for each user community based on location
  • Transport level: how application traffic is behaving
  • Application level: what transactions are being performed, who is performing them, and how well they are performing.
  • Transaction data: which transactions are being used the most and their performance
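
The transaction-level view above can be illustrated by grouping observed transactions by name and reporting how often each occurs and how it performs. The sketch below assumes the traffic has already been decoded into simple records; the record layout, transaction names, and timings are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical decoded records: (transaction name, response time in seconds).
observed: List[Tuple[str, float]] = [
    ("patient_lookup", 0.40),
    ("patient_lookup", 0.60),
    ("charge_entry", 1.20),
    ("patient_lookup", 0.50),
]


def summarize(records: List[Tuple[str, float]]) -> List[Tuple[str, int, float]]:
    """Return (name, count, mean response time) per transaction, most used first."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for name, seconds in records:
        grouped[name].append(seconds)
    summary = [(name, len(times), mean(times)) for name, times in grouped.items()]
    return sorted(summary, key=lambda row: row[1], reverse=True)


for name, count, avg_seconds in summarize(observed):
    print(f"{name}: {count} transactions, mean {avg_seconds:.2f} s")
```

The same grouping could be keyed on source subnet instead of transaction name to compare user communities by location.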

 

Key Areas for Monitoring

When performing application monitoring, there are a few key areas from which to monitor.

  • Monitoring should always start closest to the end user. This could be referred to as a peering point or as close to the edge of the application as possible.
  • Active monitoring should be used where end users originate in order to measure performance from each of their locations.
  • Passive monitoring involves VMs (virtual machines) and next-generation virtual networks that encrypt data. The traffic data will normally come from a host managing the VM.