It's Inevitable. Your Database Will Fail

January 23, 2018 | Database

Databases fail. No one can promise 100% uptime; it’s impossible. Whether a database is large or small, on-premises or cloud-based, it has the potential to fail. The cause could be transactional errors, system crashes, out-of-memory errors, or out-of-disk-space errors. Sometimes databases fail suddenly, and sometimes they just can’t cope with growing demand and “fail slowly” over a period of time.

The list of reasons that cause database failure is long and includes:

Application code changes
Workload change as the user base grows or shifts
Hardware failure or change (Spectre and Meltdown patches, take a bow)
Database version upgrades
Configuration changes made to accommodate a new architecture or improve performance
Configuration assumptions made for an old workload that no longer hold

What happens when a database fails?

Data loss
Loss of productivity
Other systems can be negatively affected
Poor user experience as entire systems fail slowly

What can you do?

Be prepared. Failures are caused by changes, some that you control and others that you don’t. It's not about preparing for the apocalypse; it's about keeping your application performing at its best every day. Optimization is an ongoing process that should never stop. If you're making changes that could result in failure, we suggest you:

Set up monitoring on all essential systems
Test in stages
Do gradual rollouts
Have a plan to roll back if changes cause problems
Back up and snapshot systems regularly
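
As a rough illustration of how gradual rollouts and a rollback plan fit together, here is a minimal Python sketch. The function staged_rollout and its callbacks (apply_change, is_healthy, roll_back) are hypothetical names chosen for this example, not part of any SolarWinds product; in practice the health check would read from your monitoring system.

```python
from typing import Callable, Sequence


def staged_rollout(
    stages: Sequence[int],
    apply_change: Callable[[int], None],
    is_healthy: Callable[[], bool],
    roll_back: Callable[[], None],
) -> bool:
    """Apply a change in growing stages; roll back on the first failed check.

    `stages` is a sequence of rollout percentages (for example 1, 10, 50, 100).
    All three callbacks are assumptions supplied by the caller.
    """
    for percent in stages:
        apply_change(percent)      # expose the change to this share of traffic
        if not is_healthy():       # consult monitoring before going further
            roll_back()            # undo the change as soon as a check fails
            return False
    return True


# Example: the second health check fails, so the rollout stops and rolls back.
log = []
checks = iter([True, False])
completed = staged_rollout(
    [1, 10, 50, 100],
    apply_change=log.append,
    is_healthy=lambda: next(checks),
    roll_back=lambda: log.append("rollback"),
)
```

The point of the design is that every stage is gated by monitoring, so a bad change reaches only a small share of users before it is reverted.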

If you expect things outside of your control to change and potentially cause failure, we suggest you:

Back up and archive regularly and consistently
Test backups
Practice strategies by introducing controlled failures
Set up alerts
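
A backup is only useful if it can be read back. The sketch below shows one way to test a backup, using SQLite from the Python standard library purely as a stand-in for your own database. The helper name backup_and_verify and the table name are assumptions for illustration; a real test would restore into a scratch server and run checks suited to your data.

```python
import sqlite3


def backup_and_verify(source: sqlite3.Connection, backup_path: str, table: str) -> bool:
    """Copy `source` to `backup_path`, then check that the copy is usable.

    `table` must be a trusted table name, because it is interpolated into SQL.
    """
    target = sqlite3.connect(backup_path)
    try:
        source.backup(target)  # online backup API in the sqlite3 module
        # Ask SQLite to validate the internal structure of the copy.
        status = target.execute("PRAGMA integrity_check").fetchone()[0]
        # Compare row counts as a simple sanity check on the contents.
        src_rows = source.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        dst_rows = target.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        return status == "ok" and src_rows == dst_rows
    finally:
        target.close()


# Example with an in-memory database standing in for production data.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
src.executemany("INSERT INTO orders (total) VALUES (?)", [(9.99,), (20.0,)])
src.commit()
backup_ok = backup_and_verify(src, ":memory:", "orders")
```

Running a check like this on a schedule, and alerting when it returns False, covers both the "test backups" and "set up alerts" suggestions.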

Real-world example: One of our long-standing customers, a high-profile online retailer, recently shared their story. A software developer made a change to a structure that caused the system to slow down. During a surge of holiday season shopping, the system crashed. They were notified immediately and used SolarWinds® Database Performance Monitor (DPM) to compare the environment and pinpoint the change among thousands of queries. They rectified the issue and were back up and running in minutes.

Another example from our own environment: At SolarWinds, we are constantly ingesting metrics data from our customers’ monitored environments. If our database inserts become too slow, our data pipeline backs up and other systems are affected. The end result becomes visible to users as delayed data in dashboards. Catching up becomes increasingly difficult the longer the performance degradation drags on. The monitoring that we have in place enables us to identify the root cause of a problem before it blows up and has cascading effects.
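
To make the slow-insert scenario concrete, here is a small Python sketch of a rolling-average latency alert. It is not how SolarWinds DPM works internally; the class name LatencyAlert, the threshold, and the window size are all assumptions picked for illustration.

```python
from collections import deque


class LatencyAlert:
    """Signal when the average of the last `window` samples exceeds a threshold."""

    def __init__(self, threshold_ms: float, window: int = 5) -> None:
        self.threshold_ms = threshold_ms
        self.samples = deque(maxlen=window)  # keeps only the newest samples

    def record(self, latency_ms: float) -> bool:
        """Store one insert latency and return True if an alert should fire."""
        self.samples.append(latency_ms)
        # Wait for a full window so a single slow insert does not page anyone.
        if len(self.samples) < self.samples.maxlen:
            return False
        average = sum(self.samples) / len(self.samples)
        return average > self.threshold_ms


# Example: insert latency creeps upward until the rolling average crosses 50 ms.
alert = LatencyAlert(threshold_ms=50, window=3)
fired = [alert.record(ms) for ms in [10, 20, 30, 90, 120]]
```

Averaging over a window trades a little detection speed for fewer false alarms, which suits "fail slowly" degradations better than a per-sample threshold.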

Conclusion

Be ready. Things are going to break, and you need to be prepared for the sake of your users, your business, and your team. Invest in the necessary tools for high availability, monitoring, and backups. Start a free trial here to see how SolarWinds DPM can help you today!

SolarWinds

We're Geekbuilt™.

Developed by network and systems engineers who know what it takes to manage today's dynamic IT environments, SolarWinds has a deep connection to the IT community.

The result? IT management products that are effective, accessible, and easy to use.
