Business Monitoring with Centreon

It's been a long time since we last talked about this interesting topic: having an operational view of the business, control over it, and a way to know the status of the services that keep the company running. I wanted to write a post to share the possibilities available to us.

As we well know, monitoring systems such as Centreon let us monitor our infrastructure. Comprehensive monitoring, together with in-depth analysis, lets us identify every critical point of the infrastructure behind a service. These are the services we offer to our own users, customers or suppliers.

So that's the idea: take your infrastructure monitoring up a level, to one aimed not at technical people but at managers or executives who need to know in real time how their business is doing. Web dashboards that show why an operational service might be affected and let you drill down into the guts to understand why things work; panels that let you run 'what happens if…' simulations; views that show the SLA being delivered for each business service…

Everything we'll look at in this post is 100% open source, although Centreon and other vendors may offer something similar as paid products. In these older posts we already covered the technical part: how to set it up.

Today is all about a practical example, and a simple one: my company, “Open Services IT”, which provides IT services. Knowing what the company needs in order to operate, we will relate the monitored services to one another to create the different dependencies.

Let's start with this first panel, where the person in charge can see how the business is doing. In this case, for Open Services IT to be productive and functional, it needs the following:

  • Technicians must be able to serve customers and meet any needs they have. This is the business service we will call 'Customer Service'.
  • The administration department must be able to invoice; otherwise, we don't eat. This is our 'Invoicing' business service.
  • We also have an important item called 'Business Continuity', which covers any service we provide so that the company can keep working in the face of a disaster, or anticipate any circumstance that would prevent it from operating.
  • And finally, the last important service, which I won't bore you with here, is that the 'Domotic' (home automation) environment works. Without it the company would not open, we would lose control of certain automations, salaries would not be paid… As I said, don't pay much attention to this item.

In addition, we can show the SLA of each service on the interface itself, as a percentage or as time: how long the service has been OK (working perfectly), in Warning (at risk), or Critical (the service may be affected).
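To make the "% or time" idea concrete, here is a minimal sketch of how the share of time spent in each state could be calculated. It is not how Centreon or any specific tool computes its SLA; the function name `sla_report` and the figures for the month are made up for illustration.

```python
from datetime import timedelta


def sla_report(durations):
    """Return, per state, the percentage of the period spent in that state."""
    total = sum(durations.values(), timedelta())
    if total == timedelta():
        raise ValueError("the period has no recorded time")
    return {state: round(100 * spent / total, 2)
            for state, spent in durations.items()}


# Hypothetical 30-day month for the 'Customer Service' business service.
month = {
    "OK": timedelta(days=29, hours=18),
    "WARNING": timedelta(hours=5),
    "CRITICAL": timedelta(hours=1),
}

report = sla_report(month)
print(report)  # {'OK': 99.17, 'WARNING': 0.69, 'CRITICAL': 0.14}
```

The same dictionary holds both views: the `timedelta` values are the "time" figure and the returned percentages are the "%" figure.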

The person in charge can navigate between the different panels and check the SLA that each service is delivering. In this example we see the dependencies of “Customer Service”, which is composed of the following:

  • The incident system must work, so that the customer or technician can manage incidents, log time…
  • Through the 'Reporting Service', the technician or customer can see in real time the status of the infrastructure we manage, and access hour-usage reports, invoices…
  • We have a system that lets technicians meet with customers; obviously, if it stops, 'Customer Service' may be affected. We also use it for remote sessions to connect to customers' workstations…
  • We deliver applications and desktops to our users centrally, so that any employee can work from anywhere. If this doesn't work, nobody has apps, tools…
  • Obviously, technicians and customers need to communicate by email. Here we check everything needed for email to work: that the servers are OK, that we are not on spam lists…
  • As with email, so with telephony: technicians need to be able to communicate with customers (and vice versa). If the telephone system goes down, there is no switchboard, and calls neither come in nor go out…
  • We have a Wiki environment where technicians consult the KB or document incidents so as not to waste time in the future. It is necessary for the technicians to do their work well.
  • For exchanging information with customers and suppliers we have a system that must work; without it they could not access the documents we hold about them, invoices, temporary file exchanges…
  • And of course, the Internet must work! Without Internet, technicians are nothing 😉

As we can see, the environment is 100% customizable and fully corporate; we can, of course, add any link to it (to products…)… And if we keep going down a level, we find the same thing again: all the dependencies something needs in order to work. In the case of the 'Reporting Service', this is what we need in order to know it is functional:

On the one hand, it must work internally:

  • First, the product offering the service must itself work. In this case it is based on Grafana, so the machine(s) offering the service must be healthy, along with whatever it needs to run (ports, processes, DB…).
  • Our beloved 'Active Directory' must work for authentication and permissions to work within Grafana itself.
  • The virtualization service must work; without it no virtual machines run, and our Grafana is virtualized.
  • The internal network must work; if internal communications go down, systems are affected and cannot communicate with each other.
  • And other critical infrastructure services, such as DNS, without which there would be no name resolution, or the equally critical NTP service.

On the other hand, since it is a public service for customers, we also check that certain dependencies are met:

  • The public site must be operational: not only that it responds, but also that the port is open, that the certificate has not expired, that the domain itself has not expired, or that it gets a secure rating on SSL Labs (what do I know)…
  • Obviously, if the Internet goes down (on whichever WANs we have), the Reporting Service may not be accessible…
  • Likewise, if we have a public load balancer (in this case we use NetScaler), it must work and do its job.
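The dependency tree described above can be sketched in a few lines of Python. This is only an illustration: it assumes the simplest aggregation rule, where a business process takes the worst state among its dependencies, and the `Node` class is a hypothetical helper rather than part of Centreon or any business-process add-on. Real tools usually offer other rules as well (for example, "at least N of M must be OK").

```python
from dataclasses import dataclass, field

# Higher number = worse state.
SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}


@dataclass
class Node:
    name: str
    state: str = "OK"  # only meaningful for leaves (monitored checks)
    children: list = field(default_factory=list)

    def status(self):
        """A leaf reports its own state; a process reports its worst child."""
        if not self.children:
            return self.state
        return max((child.status() for child in self.children),
                   key=SEVERITY.get)


# Internal and public dependencies of the Reporting Service, as listed above.
reporting = Node("Reporting Service", children=[
    Node("Grafana"),
    Node("Active Directory"),
    Node("Virtualization"),
    Node("Internal network"),
    Node("DNS"),
    Node("NTP"),
    Node("Public site", state="WARNING"),  # e.g. certificate close to expiry
    Node("Internet"),
    Node("NetScaler"),
])

# Only part of the Customer Service tree, to keep the sketch short.
customer_service = Node("Customer Service", children=[
    reporting,
    Node("Email"),
    Node("Telephony"),
])

print(customer_service.status())  # prints WARNING
```

A single warning on the public site bubbles up through the Reporting Service to 'Customer Service', which is the behaviour the panels show when you drill down.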

And to give the quickest example I could: if we click on Grafana in the previous panel, we see the machines that offer this service. As I said, this example is pretty straightforward, but other services allow for more specific and interesting drill-downs. Long story short, we see the machine as it stands, with integrations and visualizations of its resource usage…

Business Impact Analysis

We can also have a Business Impact Analysis, which quickly answers any 'what if' question. For example, we can manually mark something as down and see which services are affected. This way we can anticipate problems and know what happens if we unplug a cable, if a certificate expires, if we turn off a machine…

We access this impact analysis from the Home of our business monitoring. If you look at the first image of the post, at the bottom right there are some links to different views; one of them leads here.

The simulation can be carried out based on the current state of the platform, or forcing everything to OK if necessary.

We can navigate the trees of the business processes we have defined until we find what we want to take down.

To follow the example of the post: what happens if the Grafana port or process goes down? What would it affect, and how?

Well, we can see how 'Customer Service' is affected, since the Reporting Service would be down…
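The same what-if question can be reproduced with a small sketch. Everything here is an assumption made for illustration: the `DEPENDS_ON` table is a cut-down, partly invented slice of the tree, the `what_if` function is hypothetical, and the "worst state wins" rule is just the simplest possible aggregation. The `start_from_ok` flag mirrors the choice between simulating on the current state of the platform or forcing everything to OK first.

```python
# Higher number = worse state.
SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}

# Hypothetical slice of the tree: each business process and what it depends on.
DEPENDS_ON = {
    "Customer Service": ["Reporting Service", "Email", "Telephony"],
    "Reporting Service": ["Grafana port", "Grafana process", "Active Directory"],
    "Invoicing": ["Invoicing application"],  # invented leaf, for contrast
}


def status(name, leaf_states):
    """Worst state among dependencies; leaves default to OK."""
    children = DEPENDS_ON.get(name)
    if not children:
        return leaf_states.get(name, "OK")
    return max((status(child, leaf_states) for child in children),
               key=SEVERITY.get)


def what_if(current, forced, start_from_ok=False):
    """Simulate forced leaf states on top of the current state (or all OK)."""
    base = {} if start_from_ok else dict(current)
    base.update(forced)
    return {process: status(process, base) for process in DEPENDS_ON}


# Current state of the platform: telephony is degraded.
current = {"Telephony": "WARNING"}

before = what_if(current, {})
after = what_if(current, {"Grafana port": "CRITICAL"})

print(before["Customer Service"])  # prints WARNING
print(after["Customer Service"])   # prints CRITICAL
print(after["Invoicing"])          # prints OK
```

Taking down the Grafana port turns the Reporting Service and 'Customer Service' critical while 'Invoicing' stays OK, which is exactly the kind of answer the impact analysis gives before anyone touches a cable.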

Now imagine this with every process in your company: knowing how to act, and knowing in real time the SLA (Service Level Agreement) we are delivering to our customers, users or suppliers. Simple navigation interfaces for any non-technical profile in the company. Bear in mind that a full exercise is very hard to do in a post, but think about your own dependency tree and how you could visualize its status in real time.

As usual, I hope you find it of interest. Thank you very much for sharing on social networks if you do; we will continue with similar posts. Let's exploit the data and simplify its delivery!

Author

nheobug@bujarra.com
Author of the blog Bujarra.com. Whatever you need, do not hesitate to contact me; I will try to help whenever I can. Sharing is living ;) . Enjoy the documents!!!