Adopting a Distributed Tracing Architecture
June 21, 2018
Monitoring and Observability
In a distributed tracing architecture, we first need to define the microservices that operate within it. We also need to distinguish “component” behavior from “user” behavior and experience: the terms sound similar, but they describe quite different concepts.
Think of the many microservices that make up a whole infrastructure. Each one keeps a trace of what it is doing (its behavior) and passes it to the next microservice, and so on, so that a request can still be followed from end to end without getting lost in the middle.
Let's also look at user behavior: once an application is divided into microservices, each of them can adapt its behavior to the habits of the user and so improve their experience of using the application.
COMPONENT AND USER BEHAVIOR
Imagine this as a self-learning platform: microservices learn from user habits and change their behavior accordingly.
Let’s imagine an e-commerce website with its own engine. It will be composed of microservices such as product display, product suggestions, payment management, delivery options, and so on. Each of these microservices learns from the users browsing the site. For example, the microservice that proposes suggestions learns from a user's input that they are not interested in eBooks and prefer traditional books. It passes this information to the next microservice, which composes a showcase of traditional books. If the user chooses to pay by PayPal rather than by credit card, the payment microservice will set PayPal as the default option next time and will not display the credit card options. Last, once the user has decided where and how the item should be delivered (mail or courier), the related microservices are activated with the default address and the last used delivery method.
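The hand-off described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `UserContext` record, the three service functions, and the sample catalog are hypothetical names invented for this example, and in a real system each function would be a separate service reached over the network.

```python
from dataclasses import dataclass, field


@dataclass
class UserContext:
    """Hypothetical record handed from one microservice to the next."""
    user_id: str
    preferred_format: str = "ebook"
    payment_method: str = "credit_card"
    visited: list = field(default_factory=list)  # services that handled the request


def suggestion_service(ctx, clicked_format):
    # Learn the preferred format from what the user actually clicked on.
    ctx.preferred_format = clicked_format
    ctx.visited.append("suggestions")
    return ctx


def showcase_service(ctx, catalog):
    # Build the showcase using only the preference received from upstream.
    ctx.visited.append("showcase")
    return [item for item in catalog if item["format"] == ctx.preferred_format]


def payment_service(ctx, chosen_method):
    # Remember the chosen method so it can be the default next time.
    ctx.payment_method = chosen_method
    ctx.visited.append("payment")
    return ctx


catalog = [
    {"title": "Dune", "format": "ebook"},
    {"title": "Dune", "format": "paperback"},
]

ctx = UserContext(user_id="u-1")
ctx = suggestion_service(ctx, "paperback")
showcase = showcase_service(ctx, catalog)
ctx = payment_service(ctx, "paypal")

print(showcase)
print(ctx.payment_method, ctx.visited)
```

The `visited` list is a crude stand-in for the trace itself: it shows which services touched the request and in which order.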
TRACING METHODOLOGY
To achieve this user experience, every microservice must get information from the previous one and send its own information to the next one: this is microservice behavior. The architecture has another benefit too: every action is more agile than in a monolith, because the system automatically uses only the services it requires. It does not need to parse and query all the options it could offer. A SQL query is a good illustration: in the previous example, the database is split into many small tables instead of a few large ones, so each query runs only against the smaller table assigned to that particular microservice.
Distributed tracing relies on traces, of course, but also on spans. A trace follows a single request as it moves sequentially through the system: each microservice receives the trace from the “previous” module and actively passes it on to the next one. A span is one part of that trace and keeps detailed information about a single activity performed within a module, such as when it started and how long it took.
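To make the relationship between a trace and its spans concrete, here is a minimal sketch in plain Python. It is not the API of any particular tracing library: the `Span` class and the `start_trace` and `start_child` helpers are hypothetical names, and the only assumption is that every span carries the identifier of the trace it belongs to and, when it has one, the identifier of its parent span.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One timed activity inside a trace (hypothetical minimal model)."""
    trace_id: str                    # shared by every span of the same request
    name: str                        # e.g. the service or operation name
    parent_id: Optional[str] = None  # None for the first (root) span
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    start: float = field(default_factory=time.perf_counter)
    end: Optional[float] = None

    def finish(self):
        self.end = time.perf_counter()

    @property
    def duration(self):
        # Duration in seconds, or None while the span is still open.
        return None if self.end is None else self.end - self.start


def start_trace(name):
    # The first service to see the request creates the trace identifier.
    return Span(trace_id=uuid.uuid4().hex, name=name)


def start_child(parent, name):
    # The next service reuses the trace identifier it received from upstream.
    return Span(trace_id=parent.trace_id, name=name, parent_id=parent.span_id)


root = start_trace("checkout-request")
child = start_child(root, "payment-service")
child.finish()
root.finish()

print(child.trace_id == root.trace_id)  # both spans belong to the same trace
```

Passing `trace_id` and `span_id` from one service to the next is what allows the individual spans to be stitched back into a single trace afterwards.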
USER’S EXPERIENCE
Let's look at the user’s experience. Segmenting an application into a multitude of services makes troubleshooting simpler. Developers can better identify which service is responsible for a slow response to the user, and then decide whether to upgrade that service, keep working on it, or recode it. Anomalies and poor performance are spotted and resolved more quickly. Tracing matters even more in large architectures, such as microservices working in different sessions or, worse, on different hosts.
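As a rough illustration of how trace data points to the service behind a slow response, the sketch below adds up span durations per service and picks the largest total. The span list and the service names are invented sample data, and summing durations is just one simple heuristic, not a standard method.

```python
from collections import defaultdict

# Hypothetical finished spans: (service name, duration in milliseconds).
spans = [
    ("product-display", 40),
    ("suggestions", 35),
    ("payment", 420),
    ("delivery", 60),
    ("payment", 380),
]


def total_time_per_service(spans):
    # Sum the time spent in each service across all recorded spans.
    totals = defaultdict(int)
    for service, duration_ms in spans:
        totals[service] += duration_ms
    return dict(totals)


def slowest_service(spans):
    # Return the service with the largest accumulated duration.
    totals = total_time_per_service(spans)
    return max(totals, key=totals.get)


print(slowest_service(spans))  # prints "payment"
```

With this sample data, the payment service accounts for 800 ms in total, so it would be the first candidate for upgrading or recoding.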
Finally, we can think of distributed tracing as a “logging” method that helps deliver a better user experience in less time, whether it is based on component behavior or on the behavior revealed by the user's choices.