This analysis aims to address the long processing times of SQL queries, especially at times of day when the system is under heavier load. One of the main goals was to improve performance during peak hours, when the system experiences the highest load. To conduct this research, seven different methods were used to analyze the slow processing of queries.
Methods used to analyze performance problems within ElasticSearch.
Setting up the ElasticSearch cluster in the development environment in accordance with the parameters and number of nodes present in the production environment.
The goal is to test database load and query execution time by recreating the same conditions. This approach is possible in situations where you have the necessary resources but still cannot reproduce the problems in the current test environment.
Here, a 3-node ElasticSearch cluster is set up in the development environment.
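A minimal sketch of what the per-node configuration of such a 3-node development cluster could look like, assuming ElasticSearch 7.x style settings; the cluster and node names below are placeholders, not values from the original setup:

# elasticsearch.yml on the first node (repeat with node.name es-node-2 / es-node-3 on the others)
cluster.name: dev-cluster
node.name: es-node-1
network.host: 0.0.0.0
discovery.seed_hosts: ["es-node-1", "es-node-2", "es-node-3"]
cluster.initial_master_nodes: ["es-node-1", "es-node-2", "es-node-3"]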
It is also recommended to set up the Kibana tool for easier access to data visualization and development tools. To reproduce the problems occurring in the production environment, it is worth migrating the production database, with the option of anonymizing the data. When dealing with large databases, it is best to split the migration into multiple steps and the indexes into parts.
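One way to perform such a step-by-step migration, sketched here under the assumption that reindex-from-remote is used and that the production cluster is reachable under the hypothetical address prod-es:9200, is to copy the data index by index:

# the remote host must also be listed in reindex.remote.whitelist in elasticsearch.yml
# "orders-2023" is a hypothetical index name; repeat the call for each index or part
curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "remote": { "host": "http://prod-es:9200" },
    "index": "orders-2023"
  },
  "dest": { "index": "orders-2023" }
}'

Anonymization can then be applied to the copied indexes before the tests, depending on the data involved.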
It is also worth integrating the development cluster with Prometheus in order to visualize the data in Grafana. We begin our performance tests in the development environment with an analysis of the queries that cause problems during peak hours.
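A common way to wire this up, used here as an assumption rather than as the article's exact setup, is to run the community elasticsearch_exporter next to the cluster and point Prometheus at it:

# run the exporter (it listens on port 9114 by default)
docker run -d -p 9114:9114 quay.io/prometheuscommunity/elasticsearch-exporter:latest --es.uri=http://localhost:9200

# prometheus.yml scrape configuration for the exporter
scrape_configs:
  - job_name: elasticsearch
    static_configs:
      - targets: ['localhost:9114']

Grafana then uses this Prometheus instance as a data source for the cluster dashboards referred to later in the text.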
With this method, you can collect all activity information within a specific time frame. The statistics collected from processing the problematic queries, which can be visualized in Grafana, may not necessarily reflect the problems occurring in the production environment, even when all of the tasks are run simultaneously. In such a situation, further tests are necessary.
The tests should start with the time frame in which the problems occurred most often.
To simulate an increased number of queries, you can parse the problematic queries and embed them in automated system scripts (e.g. bash scripts), and then send them to the ES database using curl.
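A minimal sketch of such a load-generating script, assuming the slow queries have been saved as JSON files and that the host, index name, and repeat count are placeholders:

#!/bin/bash
# replay each saved query several times in parallel against the dev cluster
HOST="http://localhost:9200"
INDEX="my-index"          # hypothetical index name
REPEATS=50

for query_file in queries/*.json; do
  for i in $(seq 1 "$REPEATS"); do
    curl -s -o /dev/null -w "%{time_total}\n" \
      -X POST "$HOST/$INDEX/_search" \
      -H 'Content-Type: application/json' \
      -d @"$query_file" &
  done
  wait   # finish this batch before replaying the next query file
done

While the script runs, the cluster metrics in Grafana show how the nodes behave under the simulated load.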
We can continue our analysis of the ES cluster by looking at some key elements that influence its performance:
The distribution of shards can be checked as follows:
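For example, with the standard cat shards API (the host and port below are assumptions):

curl -X GET "localhost:9200/_cat/shards?v"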
Example of a result:
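An illustrative output (the index and node names are made up, not taken from the analyzed cluster):

index     shard prirep state   docs  store ip        node
my-index  0     p      STARTED 14337 12mb  10.0.0.11 es-node-1
my-index  0     r      STARTED 14337 12mb  10.0.0.12 es-node-2
my-index  1     p      STARTED 14074 11mb  10.0.0.13 es-node-3
my-index  1     r      STARTED 14074 11mb  10.0.0.11 es-node-1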
To improve database performance, it is also necessary to change the parameter responsible for refreshing indexes. In this example, we changed the index refresh interval dynamically (without restarting the cluster) from 1s to 30s, as below:
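A sketch of such a dynamic change through the index settings API (the index name is a placeholder):

curl -X PUT "localhost:9200/my-index/_settings" \
  -H 'Content-Type: application/json' \
  -d '{ "index": { "refresh_interval": "30s" } }'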
As a result of the change, indexing performance improved.
However, in this case, it did not translate into an improvement in query execution time.
Example of statistics in Grafana after implementing the indexing change.
The next element that can improve the efficiency of the ES cluster is changing the GC parameters.
The Garbage Collector automatically detects when an object is no longer needed and removes it, freeing up space for objects that are still in use by the process.
By tuning the Garbage Collector parameters, you can expect fewer GC runs over a given period of time.
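The article does not list the exact flags, so the snippet below is only a sketch of where such GC options are typically set, assuming an ElasticSearch 7.x installation where custom JVM flags are placed in files under config/jvm.options.d/; the values are examples, not the ones used in the article:

# config/jvm.options.d/gc.options
-Xms8g
-Xmx8g
-XX:+UseG1GC
-XX:G1ReservePercent=25
-XX:InitiatingHeapOccupancyPercent=30

After restarting the nodes with the new options, the GC behavior can be compared against the earlier period in Grafana.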
GC statistics and query time in Grafana after implementing the changes.
The next element that can improve the efficiency of the ES cluster is the addition of new nodes.
Adding another node to the cluster improves overall performance by providing additional resources that relieve the other nodes. The configuration is done by changing the number of replicas. Shards are spread evenly, and the additional node automatically starts to process data.
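Before the replica change described next, the new node itself has to join the existing cluster; a minimal sketch of its elasticsearch.yml, with node names and cluster name assumed to match the earlier example:

cluster.name: dev-cluster                       # must match the existing cluster
node.name: es-node-4
network.host: 0.0.0.0
discovery.seed_hosts: ["es-node-1", "es-node-2", "es-node-3"]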
The setup is done by changing the number of replicas in elasticsearch.yml, as below:
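The original snippet is not reproduced here; a commonly used alternative, sketched below, is to change the setting dynamically through the index settings API, since in recent ElasticSearch versions index-level settings such as the replica count are applied per index rather than in elasticsearch.yml (the index name and value are placeholders):

# increase the replica count so that shards are also allocated to the new node
curl -X PUT "localhost:9200/my-index/_settings" \
  -H 'Content-Type: application/json' \
  -d '{ "index": { "number_of_replicas": 2 } }'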
Other possible solutions that you can implement after adding another node: