Recently we have seen a great expansion of web systems, along with the emergence of hybrid applications and of applications that increasingly rely on software as a service (SaaS).
This growth brings a new challenge to DevOps, infrastructure, and development teams: monitoring networks alone can no longer guarantee data security. This is where observability comes in.
Observability helps track and identify problems using software tools. Today, we are going to show you how to implement it using the Domain-Oriented Observability method.
This will make the various calls to logging services and analytics frameworks in your systems less technical and verbose. Check it out!
What is observability in IT?
IT observability (often just "observability") is the ability to monitor, track, analyze, and diagnose a system using software tools.
With it, you can constantly monitor and observe a system in order to better understand how it behaves, especially in cloud architectures.
This concept is widely applied by DevOps, infrastructure, and development teams, as it is well established in software engineering that it benefits the software and makes problem-solving easier.
Domain-oriented observability in practice: values
Large applications focused on high-level metrics analysis, such as Mixpanel, believe in the concept of "value moments": the events in a given product that are important to instrument.
These value moments vary by product. For example, software aimed at electronic signature solutions, such as 1Doc, may consider the signing of a contract a "value moment".
However, the moment of value that makes sense for your business does not necessarily make sense for users.
That's because the value of your business is made up of the balance between two forces: intention and expectation.
If the intention is to facilitate the process of signing a contract, and that's exactly what your users expect, then you've reached the right balance.
However, a mismatch between these two forces means a loss of opportunity and, consequently, of value.
Thanks to high-level metrics, this mismatch is not a lost cause: with them, it is possible to recover and maintain the value of your business according to the value moments identified by your product analysts.
From there, your role as a developer is to check the technical feasibility and implement the capture of these metrics so that the business team can work with the data.
How to implement domain-oriented observability?
From now on, let's move to the practical part and learn how to implement domain-oriented observability. To better understand this implementation, let's imagine a small task management system.
This system registers scheduled tasks and executes them on schedule. However, due to user needs, it may sometimes be necessary to run one of these tasks early, manually.
To meet this need for "early execution", the structure below was created:
TaskManager: the class responsible for executing a given task based on its code ─ the "use case" class;
TaskRetriever: the class responsible for abstracting the retrieval of tasks from the database and returning domain objects ─ the "repository" class;
Task: the class that represents a "task" in the system ─ the "domain entity".
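The article only shows TaskManager in full and merely describes its collaborators. A minimal sketch of what they might look like follows; the field and method names here are this rewrite's assumptions, not the original code:

```java
// Checked exception signaling that a task's processing was interrupted (hypothetical sketch).
class TaskInterruptedException extends Exception {
}

// Domain entity: a scheduled task identified by a numeric code (hypothetical sketch).
class Task {
    private final Integer code;

    public Task(Integer code) {
        this.code = code;
    }

    public Integer getCode() {
        return code;
    }

    // Runs the task's work; a real implementation could throw if interrupted.
    public void startProcess() throws TaskInterruptedException {
        // domain processing would happen here
    }
}

// "Repository" abstraction that retrieves tasks from the database as domain objects.
interface TaskRetriever {
    Task retrieveByCode(Integer taskCode);
}
```

These sketches are only here so the later snippets have something concrete to lean on.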
See the example below:

public class TaskManager {

    private static final boolean TASK_PROCESSED = true;
    private static final boolean TASK_NOT_PROCESSED = false;

    private final TaskRetriever taskRetriever;

    public TaskManager(TaskRetriever taskRetriever) {
        this.taskRetriever = taskRetriever;
    }

    public boolean executeTaskByCode(Integer taskCode) {
        Task task = taskRetriever.retrieveByCode(taskCode);
        if (task == null) {
            return TASK_NOT_PROCESSED;
        }
        try {
            task.startProcess();
            return TASK_PROCESSED;
        } catch (TaskInterruptedException e) {
            return TASK_NOT_PROCESSED;
        }
    }
}

The code above may not be the best example, but it expresses the domain logic well.
Now, let's apply observability in our executeTaskByCode method.
To do this, let's imagine two libraries in our project:
Log: a generic logging library, useful for developers' troubleshooting activities;
Analytics: a generic event library that turns users' interactions with a given feature into metrics.
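Since these are generic placeholders, here is a minimal hypothetical sketch of the static facades that this article's snippets assume; the method names are illustrative, not a real library's API:

```java
// Hypothetical logging facade with printf-style formatting.
final class Log {
    static void info(String template, Object... args) {
        System.out.println("[INFO]  " + String.format(template, args));
    }

    static void warn(String template, Object... args) {
        System.out.println("[WARN]  " + String.format(template, args));
    }

    static void error(Exception e, String message) {
        System.out.println("[ERROR] " + message + " (" + e.getClass().getSimpleName() + ")");
    }
}

// Hypothetical analytics facade recording product events.
final class Analytics {
    static int recordedEvents = 0; // simple counter so callers can see events were recorded

    static void registerEvent(String eventName, Object subject) {
        recordedEvents++;
        System.out.println("[EVENT] " + eventName);
    }
}
```

In a real project these would be replaced by an actual logging framework and an analytics SDK.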
public boolean executeTaskByCode(Integer taskCode) {
    Task task = taskRetriever.retrieveByCode(taskCode);
    if (task == null) {
        Log.warn("Task %d does not exist, so its process was not started.", taskCode);
        return TASK_NOT_PROCESSED;
    }
    try {
        Log.info("Process of task %d was started.", taskCode);
        Analytics.registerEvent("task_started", task);
        task.startProcess();
        Log.info("Process of task %d finished.", taskCode);
        Analytics.registerEvent("task_finished", task);
        return TASK_PROCESSED;
    } catch (TaskInterruptedException e) {
        Log.error(e, String.format("Process of task %d was interrupted.", taskCode));
        Analytics.registerEvent("task_interrupted", task);
        return TASK_NOT_PROCESSED;
    }
}

Now, in addition to executing the business rule previously expressed by the code, we are also dealing with several logging calls and analytics about the use of this feature.
Looking at this not from the standpoint of observability instrumentation but from a purely technical one, the maintainability of this code has clearly dropped.
Firstly, if this implementation is crucial to the business, it should be guaranteed by unit tests.
Furthermore, the business rule, which was clearly expressed before, is now obscured by the use of these libraries.
Scenarios like this are common in the most diverse systems and, in general, "observability-oriented code" and "domain-oriented code" do not seem to sit well together.
So, is there a solution? Let's understand better below.
Solution for the observability case
Thinking about the readability of the code, we instinctively end up creating small methods that abstract this confusing content out of executeTaskByCode, isolating the domain-focused code from the analytics-focused code.
However, in this case, the observability we introduced is a business requirement, so even though it is "analytics-oriented code", it is still "domain-oriented code". The image below illustrates this: in other words, not all domain-oriented code is observability-oriented, and not all observability-oriented code is domain-oriented, but in some cases, such as ours, there is an intersection between the two.
Finally, we also strongly recommend extracting the "magic" strings, as it makes reading more pleasant and makes it easier to understand what each one represents.
Perhaps introducing some enums would also be valid to abstract the "tracked events", such as task_started and task_finished, but we will not delve into this subject, as it is not the focus.
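For illustration only, such an enum might look like the hypothetical sketch below; the article deliberately does not go deeper into this:

```java
// Hypothetical enum gathering the "magic" analytics event-name strings in one place.
enum TaskEvent {
    TASK_STARTED("task_started"),
    TASK_FINISHED("task_finished"),
    TASK_INTERRUPTED("task_interrupted");

    private final String eventName;

    TaskEvent(String eventName) {
        this.eventName = eventName;
    }

    // The raw string that would be sent to the analytics library.
    String eventName() {
        return eventName;
    }
}
```

The snippets below keep plain String constants instead, to stay close to the article's flow.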
public boolean executeTaskByCode(Integer taskCode) {
    Task task = taskRetriever.retrieveByCode(taskCode);
    if (task == null) {
        recordTaskNotFound(taskCode);
        return TASK_NOT_PROCESSED;
    }
    try {
        recordTaskStarted(task);
        task.startProcess();
        recordTaskFinished(task);
        return TASK_PROCESSED;
    } catch (TaskInterruptedException e) {
        recordTaskInterrupted(task, e);
        return TASK_NOT_PROCESSED;
    }
}

private void recordTaskNotFound(Integer taskCode) {
    Log.warn(TASK_NOT_FOUND_MESSAGE, taskCode);
}

private void recordTaskStarted(Task task) {
    Log.info(TASK_STARTED_MESSAGE, task.getCode());
    Analytics.registerEvent(TASK_STARTED, task);
}

private void recordTaskFinished(Task task) {
    Log.info(TASK_FINISHED_MESSAGE, task.getCode());
    Analytics.registerEvent(TASK_FINISHED, task);
}

private void recordTaskInterrupted(Task task, TaskInterruptedException e) {
    Log.error(e, String.format(TASK_INTERRUPTED_MESSAGE, task.getCode()));
    Analytics.registerEvent(TASK_INTERRUPTED, task);
}

This is a good start, with the domain code well written again ─ that is, of course, if you consider that your "domain code" is just the executeTaskByCode method. Observing our class, it does not take long to notice that we have made a trade.
If we extract from the original method several other metrics methods that do not fit the main purpose of the TaskManager class, we are just "sweeping the problem under the carpet".
When something like this happens, it usually tells us that a new class is trying to emerge.
Therefore, perhaps the simplest solution is to split this class in two: one to deal with metrics and another to process tasks.
In other words, our proposal is to create a new class specifically responsible for the application's analytics and logs, as shown in the drawing below.
This is also a good solution because segregating the original responsibilities and encapsulating the metrics functions in a new class, combined with the dependency injection it introduces, favors designing for the testability of TaskManager, which holds our domain rules.
We can reinforce this idea further by remembering that Java is an object-oriented language, and that the testability of a class using static methods is reduced when those methods modify state external to themselves ─ which logging libraries generally do.
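The article shows the resulting TaskManager but never the probe class itself. Below is a hypothetical sketch of such a probe; the class and method names (TaskMetrics, recordTaskStarted, and so on) are this rewrite's assumptions, and minimal stand-ins for Log, Analytics, and Task are restated so the sketch is self-contained:

```java
// Minimal stand-ins so this sketch compiles on its own (hypothetical):
final class Log {
    static void info(String t, Object... a)  { System.out.println("[INFO]  " + String.format(t, a)); }
    static void warn(String t, Object... a)  { System.out.println("[WARN]  " + String.format(t, a)); }
    static void error(Exception e, String m) { System.out.println("[ERROR] " + m); }
}

final class Analytics {
    static int recordedEvents = 0;
    static void registerEvent(String name, Object subject) { recordedEvents++; }
}

class Task {
    private final Integer code;
    Task(Integer code) { this.code = code; }
    Integer getCode()  { return code; }
}

// The extracted Domain Probe: every log/analytics call for task execution now lives
// behind intent-revealing, instance-level (and therefore mockable) methods.
class TaskMetrics {

    private static final String TASK_NOT_FOUND_MESSAGE    = "Task %d does not exist, so its process was not started.";
    private static final String TASK_STARTED_MESSAGE      = "Process of task %d was started.";
    private static final String TASK_FINISHED_MESSAGE     = "Process of task %d finished.";
    private static final String TASK_INTERRUPTED_MESSAGE  = "Process of task %d was interrupted.";

    void recordTaskNotFound(Integer taskCode) {
        Log.warn(TASK_NOT_FOUND_MESSAGE, taskCode);
    }

    void recordTaskStarted(Task task) {
        Log.info(TASK_STARTED_MESSAGE, task.getCode());
        Analytics.registerEvent("task_started", task);
    }

    void recordTaskFinished(Task task) {
        Log.info(TASK_FINISHED_MESSAGE, task.getCode());
        Analytics.registerEvent("task_finished", task);
    }

    void recordTaskInterrupted(Task task, Exception cause) {
        Log.error(cause, String.format(TASK_INTERRUPTED_MESSAGE, task.getCode()));
        Analytics.registerEvent("task_interrupted", task);
    }
}
```

Because the probe's methods are instance methods, they can be mocked and verified in tests, unlike static library calls.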
This way, the result of our TaskManager would be the following:

public class TaskManager {

    private static final boolean TASK_PROCESSED = true;
    private static final boolean TASK_NOT_PROCESSED = false;

    private final TaskRetriever taskRetriever;
    private final TaskMetrics taskMetrics;

    public TaskManager(TaskRetriever taskRetriever, TaskMetrics taskMetrics) {
        this.taskRetriever = taskRetriever;
        this.taskMetrics = taskMetrics;
    }

    public boolean executeTaskByCode(Integer taskCode) {
        Task task = taskRetriever.retrieveByCode(taskCode);
        if (task == null) {
            taskMetrics.recordTaskNotFound(taskCode);
            return TASK_NOT_PROCESSED;
        }
        try {
            taskMetrics.recordTaskStarted(task);
            task.startProcess();
            taskMetrics.recordTaskFinished(task);
            return TASK_PROCESSED;
        } catch (TaskInterruptedException e) {
            taskMetrics.recordTaskInterrupted(task, e);
            return TASK_NOT_PROCESSED;
        }
    }
}

The process of segregating the TaskManager class and encapsulating the metrics is called Domain-Oriented Observability, and the new class generated is our much-coveted Domain Probe.
The name of this design pattern, "Domain Probe", could not be more appropriate, since our new class literally acts as a probe inside a class that previously lacked metrics instrumentation.
How to test domain-oriented observability?
Before actually testing observability, let's go back to the first version of our class, and try to imagine a test scenario.
public class TaskManager {

    private static final boolean TASK_PROCESSED = true;
    private static final boolean TASK_NOT_PROCESSED = false;

    private final TaskRetriever taskRetriever;

    public TaskManager(TaskRetriever taskRetriever) {
        this.taskRetriever = taskRetriever;
    }

    public boolean executeTaskByCode(Integer taskCode) {
        Task task = taskRetriever.retrieveByCode(taskCode);
        if (task == null) {
            return TASK_NOT_PROCESSED;
        }
        try {
            task.startProcess();
            return TASK_PROCESSED;
        } catch (TaskInterruptedException e) {
            return TASK_NOT_PROCESSED;
        }
    }
}

If you are used to doing this kind of analysis, you will notice some scenarios:
Either there is no task with the given code, and the method returns FALSE;
Or there is a task and its processing completes, returning TRUE;
Or there is a task and its processing is interrupted, returning FALSE.
For simplicity, we will use only the third scenario as an example. Below, we can see how this test class could be implemented.
public class TaskManagerTest {

    private static final Integer TASK_CODE = 1;

    private TaskManager taskManager;
    private TaskRetriever taskRetriever;

    @BeforeEach
    public void setUp() {
        this.taskRetriever = Mockito.mock(TaskRetriever.class);
        this.taskManager = new TaskManager(taskRetriever);
    }

    @Test
    public void shouldReturnFalse_inCaseOfProcessingError_whenTaskWithGivenCodeExists() throws TaskInterruptedException {
        Task task = createTaskWithEmbeddedException();
        when(taskRetriever.retrieveByCode(TASK_CODE)).thenReturn(task);

        boolean wasExecuted = taskManager.executeTaskByCode(TASK_CODE);

        assertFalse(wasExecuted);
    }

    private Task createTaskWithEmbeddedException() throws TaskInterruptedException {
        Task task = Mockito.spy(new Task(TASK_CODE));
        doThrow(new TaskInterruptedException()).when(task).startProcess();
        return task;
    }
}

Following the GWT (Given ─ When ─ Then) naming pattern, we can express our business rule in the test.
However, it is worth mentioning that the original Portuguese article "Brazilianizes" GWT (Given ─ When ─ Then), transforming it into "DCQ" (Should ─ Case ─ When).
Thus, in DCQ: "Should return false" is equivalent to "Then returns false"; "In case a processing error occurs" is equivalent to "When a processing error occurs"; and "When there is a task with the given code" represents the same as "Given an existing task with the given code".
From this, when we re-implement our observability, our TaskManager class looks like this again:

public class TaskManager {

    private static final boolean TASK_PROCESSED = true;
    private static final boolean TASK_NOT_PROCESSED = false;

    private final TaskRetriever taskRetriever;
    private final TaskMetrics taskMetrics;

    public TaskManager(TaskRetriever taskRetriever, TaskMetrics taskMetrics) {
        this.taskRetriever = taskRetriever;
        this.taskMetrics = taskMetrics;
    }

    public boolean executeTaskByCode(Integer taskCode) {
        Task task = taskRetriever.retrieveByCode(taskCode);
        if (task == null) {
            taskMetrics.recordTaskNotFound(taskCode);
            return TASK_NOT_PROCESSED;
        }
        try {
            taskMetrics.recordTaskStarted(task);
            task.startProcess();
            taskMetrics.recordTaskFinished(task);
            return TASK_PROCESSED;
        } catch (TaskInterruptedException e) {
            taskMetrics.recordTaskInterrupted(task, e);
            return TASK_NOT_PROCESSED;
        }
    }
}

It is important to note that no behavior was changed by this increase in observability.
Therefore, the test written earlier continues to fulfill its role, even though it is now outdated.
At most, a compilation error would occur, which would already serve as a warning in the tests that this class now has a new dependency.
Since this is an extension of our original business rule, it is only fair to extend the tests as well, ensuring the correct invocations of our instrumenter.
See the following example:

public class TaskManagerTest {

    private static final Integer TASK_CODE = 1;

    private TaskManager taskManager;
    private TaskRetriever taskRetriever;
    private TaskMetrics taskMetrics;

    @BeforeEach
    public void setUp() {
        this.taskRetriever = Mockito.mock(TaskRetriever.class);
        this.taskMetrics = Mockito.mock(TaskMetrics.class);
        this.taskManager = new TaskManager(taskRetriever, taskMetrics);
    }

    @Test
    public void shouldReturnFalse_inCaseOfProcessingError_whenTaskWithGivenCodeExists() throws TaskInterruptedException {
        Task task = createTaskWithEmbeddedException();
        when(taskRetriever.retrieveByCode(TASK_CODE)).thenReturn(task);

        boolean wasExecuted = taskManager.executeTaskByCode(TASK_CODE);

        Mockito.verify(taskMetrics, times(1)).recordTaskStarted(any());
        Mockito.verify(taskMetrics, times(1)).recordTaskInterrupted(any(), any());
        Mockito.verifyNoMoreInteractions(taskMetrics);
        assertFalse(wasExecuted);
    }

    private Task createTaskWithEmbeddedException() throws TaskInterruptedException {
        Task task = Mockito.spy(new Task(TASK_CODE));
        doThrow(new TaskInterruptedException()).when(task).startProcess();
        return task;
    }
}

Taking advantage of the instrumenter dependency inside our TaskManager, we can also inject a mock and check the number of invocations of each of its methods.
In the test above, we verify that the methods recordTaskStarted and recordTaskInterrupted were invoked, and then we ensure that no further interactions happen with our instrumentation class.
So, if a new metric appears, a refactoring happens, or the business rule changes, we will have tests that guarantee what the business expects, or expected.
Author's opinion
This article is, in large part, a rereading of the Domain-Oriented Observability study written by Pete Hodgson in 2019. It also includes the views of several other authors on the subject, as well as the personal opinion of the author, Felipe Luan Cipriani, a tech writer invited by the Softplan group.
When I read the landmark article "Domain-Oriented Observability" for the first time, I was not struck by anything revealing, as I already knew the method.
However, after a few conversations with close colleagues and a few more attempts to grasp the article in its entirety, I realized how much I had underestimated it.
Domain Probe is not about encapsulation, segregation, or dependency injection ─ although these are all elements that make it up ─ but about the importance of metrics and their relevance to the business.
And while the Domain Probe design pattern has similarities with a Facade, it is concerned with the essence of every system: the domain.
Therefore, it has its value. This is an essential design pattern to know and to apply wherever a domain contains metrics tooling that was not designed to be easy to read, interpret, or maintain.
After all, developers spend more time reading code than writing it.
Furthermore, this is a design pattern with great flexibility in terms of granularity.
In other words, you can create anything from one Domain Probe per domain class, the more "specific" approach, to a single "generic" Domain Probe. There is no wrong approach, just different approaches.
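To illustrate the "generic" end of that spectrum, here is a minimal hypothetical sketch, not taken from the original article: a single probe parameterized by domain name and event name, trading the intent-revealing methods of the specific probe for flexibility.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical "generic" Domain Probe: one class shared by many domain classes,
// with a single parameterized entry point instead of one method per event.
class GenericDomainProbe {

    private final String domainName;
    private final List<String> recorded = new ArrayList<>(); // stands in for real log/analytics calls

    GenericDomainProbe(String domainName) {
        this.domainName = domainName;
    }

    // Records any event of this domain, e.g. record("task_started").
    void record(String eventName) {
        recorded.add(domainName + "." + eventName);
    }

    List<String> recordedEvents() {
        return recorded;
    }
}
```

The trade-off is readability: callers now carry the event names themselves, which is exactly what the intent-revealing methods of a specific probe avoid.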
Another way of implementing Domain-Oriented Observability is through events.
In this scenario, the design pattern at play is the Observer, and its approach is equally interesting, worth an article dedicated just to it.
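As a small taste of that alternative, and as a hypothetical sketch not covered by the original article, the domain class could simply publish events and let observers handle the instrumentation:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event-based variation: the domain publishes events and registered
// observers (loggers, analytics adapters, etc.) react to them (the Observer pattern).
interface TaskObserver {
    void onTaskEvent(String eventName, Integer taskCode);
}

class TaskEventPublisher {

    private final List<TaskObserver> observers = new ArrayList<>();

    void subscribe(TaskObserver observer) {
        observers.add(observer);
    }

    // The domain class only publishes; instrumentation concerns stay in the observers.
    void publish(String eventName, Integer taskCode) {
        for (TaskObserver observer : observers) {
            observer.onTaskEvent(eventName, taskCode);
        }
    }
}
```

Here the domain no longer depends on a probe at all; each observability concern subscribes independently.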
Finally, I thank you, dear reader, for your time and interest.