Observability: Data pipeline: Beats => Redis => Logstash => Elasticsearch

In the Elastic Stack architecture, we usually use the following diagram to represent the data pipeline:

As shown in the figure above, we usually use Kafka or Redis as a message queue to buffer the data. In my previous article:

I detailed how to use Kafka as a message queue to import data into Elasticsearch. In today's article, I will introduce how to use Redis as a message queue to import data. In practice, using Redis as a message queue is even more common than using Kafka. Redis is known for its flexibility, performance, and extensive language support. It serves as a database and cache, as well as a message broker. In ELK-based data pipelines, Redis can be placed between Beats and Logstash as a buffer layer, giving downstream components a better chance to successfully process and index the data.
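To make the buffering idea concrete, here is a minimal sketch of the Redis list semantics involved, tried directly in redis-cli (the key name apache is just an example): a producer appends entries with RPUSH, a consumer blocks on BLPOP, and the list absorbs any backlog in between.

127.0.0.1:6379> RPUSH apache "log line 1" "log line 2"
(integer) 2
127.0.0.1:6379> LLEN apache
(integer) 2
127.0.0.1:6379> BLPOP apache 5
1) "apache"
2) "log line 1"

This is exactly the pattern Filebeat and Logstash will use later in this article.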


In today's exercise, I will use the following configuration:

  • Filebeat - collects the logs and forwards them to Redis
  • Redis - brokers the data flow and queues it
  • Logstash - subscribes to Redis, processes the data and sends it to Elasticsearch
  • Elasticsearch - indexes and stores the data
  • Kibana - analyzes the data

The setup in today's article is very similar to that in my previous article " How to use Elasticsearch, Logstash and Kibana to manage Apache logs ". We just add the Redis part in front of Logstash. You can also refer to that article for deployment details. We will use Filebeat to collect and import the Apache logs.

Installation

Elasticsearch

I installed Elasticsearch according to the article " How to install Elasticsearch on Linux, MacOS and Windows ". But in order to allow my Elasticsearch to be accessed by other virtual machines, I configured the Elasticsearch configuration file config/elasticsearch.yml as follows:
config/elasticsearch.yml

network.host: 0.0.0.0
discovery.type: single-node

Here, we configure network.host to 0.0.0.0 so that our Elasticsearch, deployed on macOS, can be accessed by Logstash and Filebeat on the Ubuntu machine. This setting makes Elasticsearch bind to all network interfaces. Because binding to a non-loopback address puts Elasticsearch into production mode, we also set discovery.type to single-node so that it skips the cluster bootstrap checks. After completing the configuration of elasticsearch.yml, we restart Elasticsearch.

We can view the results through the curl command:

curl http://192.168.0.3:9200
$ curl http://192.168.0.3:9200
{
  "name" : "liuxg",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "kjYtLn_3QC6D4qh6FdMyNQ",
  "version" : {
    "number" : "7.13.0",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "5ca8591c6fcdb1260ce95b08a8e023559635c6f3",
    "build_date" : "2021-05-19T22:22:26.081971330Z",
    "build_snapshot" : false,
    "lucene_version" : "8.8.2",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Kibana

We install Kibana according to the article " How to install Kibana in the Elastic stack on Linux, MacOS and Windows ". In order to access the Elasticsearch we installed above, we need to make corresponding adjustments to the default Kibana configuration. Let's modify the config/kibana.yml file:
config/kibana.yml

server.host: "192.168.0.3"

Please note that in the above, I set my local IP address as Kibana's server address. You need to modify it according to your own IP address. We do this so that Filebeat can access Kibana for setup (generating the corresponding dashboards, etc.; for details, please read " Introduction to Beats Tutorial (2) "). After configuring the kibana.yml file, we restart Kibana. In the browser, we enter the corresponding IP:5601 to check whether the installation is correct:
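Besides the browser, we can also hit Kibana's status API from the command line (replace the IP with your own; this assumes a default setup with security disabled):

curl http://192.168.0.3:5601/api/status

A JSON document with an overall state of "green" indicates a healthy instance.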

If you can see the above output, it means that our Kibana installation is correct.

Filebeat

Let's follow the instructions given by Kibana to install Filebeat. In our exercise, we will use an Apache server to produce logs and Filebeat to collect them.

Above it shows the installation instructions for each platform. For Ubuntu OS, we choose the DEB platform. We follow the instructions above to install:

liuxg@liuxgu:~/beats$ sudo dpkg -i filebeat-7.13.0-amd64.deb
[sudo] password for liuxg: 
(Reading database ... 368789 files and directories currently installed.)
Preparing to unpack filebeat-7.13.0-amd64.deb ...
Unpacking filebeat (7.13.0) over (7.13.0) ...
Setting up filebeat (7.13.0) ...
Processing triggers for systemd (245.4-4ubuntu3.6) ...

The above shows that our installation was successful. Let's not start Filebeat yet.


Apache

In today's web server design, we will use a combination of Node.js + Apache. You can, of course, build your own web server in whatever language and way you prefer:

Let's install Node.js and Apache on Ubuntu OS.

sudo apt-get update
sudo apt-get install apache2 nodejs

Next, we need to proxy all requests coming in on port 80 through to the locally running Node.js process serving the application. For this, we need to enable the mod_proxy and mod_proxy_http modules on the Apache server:

sudo a2enmod proxy
sudo a2enmod proxy_http

Now the exciting part begins: we need to configure the Apache server to proxy requests to the Node.js application. For this, we will configure a VirtualHost. We first enter the directory /etc/apache2/sites-available:

$ pwd
/etc/apache2/sites-available
liuxg@liuxgu:/etc/apache2/sites-available$ ls
000-default.conf  default-ssl.conf

Let's first create a conf file of our own. In my case, I created a file called liuxg.conf with the following content:

liuxg.conf

<VirtualHost *:80>
	# The ServerName directive sets the request scheme, hostname and port that
	# the server uses to identify itself. This is used when creating
	# redirection URLs. In the context of virtual hosts, the ServerName
	# specifies what hostname must appear in the request's Host: header to
	# match this virtual host. For the default virtual host (this file) this
	# value is not decisive as it is used as a last resort host regardless.
	# However, you must set it for any further virtual host explicitly.
	ServerName www.liuxg.com
	ServerAlias www.liuxg.com

	ProxyRequests Off
	ProxyPreserveHost On
	ProxyVia Full

	<Proxy *>
		Require all granted
	</Proxy>

	ProxyPass / http://127.0.0.1:3000/
	ProxyPassReverse / http://127.0.0.1:3000/

	# Available loglevels: trace8, ..., trace1, debug, info, notice, warn,
	# error, crit, alert, emerg.
	# It is also possible to configure the loglevel for particular
	# modules, e.g.
	#LogLevel info ssl:warn

	ErrorLog ${APACHE_LOG_DIR}/error.log
	CustomLog ${APACHE_LOG_DIR}/access.log combined

	# For most configuration files from conf-available/, which are
	# enabled or disabled at a global level, it is possible to
	# include a line for only one particular virtual host. For example the
	# following line enables the CGI configuration for this host only
	# after it has been globally disabled with "a2disconf".
	#Include conf-available/serve-cgi-bin.conf
</VirtualHost>

# vim: syntax=apache ts=4 sw=4 sts=4 sr noet

In the above, we configured the following settings:

ProxyPass / http://127.0.0.1:3000/
ProxyPassReverse / http://127.0.0.1:3000/

Please pay attention to port 80 defined in the VirtualHost. Through the above configuration, we map all requests arriving at 127.0.0.1:80 to 127.0.0.1:3000. I defined the ServerName above as www.liuxg.com. If we do not have our own domain name, we can define the resolution of this domain name in /etc/hosts:

liuxg@liuxgu:/etc$ pwd
/etc
liuxg@liuxgu:/etc$ cat hosts
127.0.0.1	localhost
127.0.0.1	liuxg
127.0.0.1	liuxg.com
192.168.0.4 ubuntu

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Next, we must enable this new site configuration and disable the default site configuration.

sudo a2ensite liuxg.conf
sudo a2dissite 000-default.conf
$ sudo a2ensite liuxg.conf
Enabling site liuxg.
To activate the new configuration, you need to run:
  systemctl reload apache2
liuxg@liuxgu:/etc/apache2/sites-available$ sudo a2dissite 000-default.conf
Site 000-default disabled.
To activate the new configuration, you need to run:
  systemctl reload apache2
liuxg@liuxgu:/etc/apache2/sites-available$ systemctl reload apache2
==== AUTHENTICATING FOR org.freedesktop.systemd1.manage-units ===
Authentication is required to reload 'apache2.service'.
Authenticating as: liuxg,,, (liuxg)
Password: 
==== AUTHENTICATION COMPLETE ===

After modifying our above configuration, we need to restart the Apache service:

sudo service apache2 restart

We can use the following command to check whether Apache is running normally:

systemctl status apache2
$ systemctl status apache2
● apache2.service - The Apache HTTP Server
     Loaded: loaded (/lib/systemd/system/apache2.service; enabled; vendor preset:>
     Active: active (running) since Tue 2021-06-01 07:30:10 CST; 1h 17min ago
       Docs: https://httpd.apache.org/docs/2.4/
    Process: 878 ExecStart=/usr/sbin/apachectl start (code=exited, status=0/SUCCE>
   Main PID: 1058 (apache2)
      Tasks: 55 (limit: 18984)
     Memory: 18.6M
     CGroup: /system.slice/apache2.service
             ├─1058 /usr/sbin/apache2 -k start
             ├─1060 /usr/sbin/apache2 -k start
             └─1061 /usr/sbin/apache2 -k start

If you see the active status above, it means that the Apache service is running. We can also enter localhost:80 in the browser address bar to check whether Apache is working properly. Below, we try to access port 80 of the Ubuntu machine from the browser on the Mac:

Next, let's use a Node.js project for testing. First, we download the following project:

git clone https://github.com/contentful/the-example-app.nodejs

After downloading the above project, we go into the root directory of the project:

liuxg@liuxgu:~/nodejs/the-example-app.nodejs$ pwd
/home/liuxg/nodejs/the-example-app.nodejs
liuxg@liuxgu:~/nodejs/the-example-app.nodejs$ ls
Dockerfile  app.json      helpers.js         package.json  test
LICENSE     bin           i18n               public        variables.env
README.md   cypress.json  lib                routes        views
app.js      handlers      package-lock.json  services

Type the following command:

npm install

After the installation is complete, then type the following command to run:

npm run start:dev

In this way, in the browser on macOS, we can check whether the web server is running normally:

The above indicates that our Node.js server is running normally.
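To generate some entries in the Apache access log for the rest of the exercise, we can also fire a batch of requests at port 80 from a terminal. This is a simple sketch: replace 192.168.0.4 with your Ubuntu machine's IP, and note that the Host header matches the ServerName we configured above:

# send 20 requests through the Apache proxy; each one is recorded in access.log
for i in $(seq 1 20); do curl -s -H "Host: www.liuxg.com" http://192.168.0.4/ > /dev/null; done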

We can find the Apache log files at the following location:

liuxg@liuxgu:/var/log/apache2$ pwd
/var/log/apache2
liuxg@liuxgu:/var/log/apache2$ ls
access.log        access.log.3.gz  error.log.1      error.log.4.gz
access.log.1      access.log.4.gz  error.log.10.gz  error.log.5.gz
access.log.10.gz  access.log.5.gz  error.log.11.gz  error.log.6.gz
access.log.11.gz  access.log.6.gz  error.log.12.gz  error.log.7.gz
access.log.12.gz  access.log.7.gz  error.log.13.gz  error.log.8.gz
access.log.13.gz  access.log.8.gz  error.log.14.gz  error.log.9.gz
access.log.14.gz  access.log.9.gz  error.log.2.gz   other_vhosts_access.log
access.log.2.gz   error.log        error.log.3.gz

Above we can see access.log, error.log and other files. The content of access.log is as follows:

We can do more operations on the web pages served by the Node.js server, and you will see more logs.

Logstash

We install Logstash on Ubuntu OS.

Logstash is an open source tool that can collect, parse and store logs for future use, and can perform quick log analysis. Logstash can be used to aggregate logs from multiple sources (such as a cluster of Docker instances) and parse them from text lines into a structured format such as JSON. In the Elastic Stack, Logstash uses Elasticsearch to store and index logs.

Logstash requires Java 8 or Java 11:

sudo apt-get install default-jre

Verify that Java is installed:

java -version

If the output of the previous command is similar to the following, then you will know that you are heading in the right direction:

openjdk version "11.0.6" 2020-01-14
OpenJDK Runtime Environment (build 11.0.6+10-post-Ubuntu-1ubuntu118.04.1)
OpenJDK 64-Bit Server VM (build 11.0.6+10-post-Ubuntu-1ubuntu118.04.1, mixed mode, sharing)

Install Logstash using the following command:

curl -L -O https://artifacts.elastic.co/downloads/logstash/logstash-7.13.0-amd64.deb
sudo dpkg -i logstash-7.13.0-amd64.deb

Above, we installed version 7.13.0 to match our Elasticsearch. You can modify the version in the commands above to match your own Elasticsearch before downloading.

Redis

Last but not least, our final installation step: Redis. We use the following command to install Redis on Ubuntu OS:

sudo apt install redis-server

After the installation is complete, we use the following command to start Redis:

sudo service redis start
$ sudo service redis start
liuxg@liuxgu:~$ service redis status
● redis-server.service - Advanced key-value store
     Loaded: loaded (/lib/systemd/system/redis-server.service; enabled; vendor pr>
     Active: active (running) since Tue 2021-06-01 10:32:36 CST; 55s ago
       Docs: http://redis.io/documentation,
             man:redis-server(1)
   Main PID: 42700 (redis-server)
      Tasks: 4 (limit: 18984)
     Memory: 2.6M
     CGroup: /system.slice/redis-server.service
             └─42700 /usr/bin/redis-server 127.0.0.1:6379

Jun 01 10:32:36 liuxgu systemd[1]: Starting Advanced key-value store...
Jun 01 10:32:36 liuxgu systemd[1]: redis-server.service: Can't open PID file /run>
Jun 01 10:32:36 liuxgu systemd[1]: Started Advanced key-value store.

From the above we can see that the redis service has been successfully started.

To make sure everything works as expected, we run the Redis CLI on the terminal:

$ redis-cli
127.0.0.1:6379> ping
PONG

Configuring Logstash

Logstash pipeline configuration files use Logstash's own configuration syntax and are located in /etc/logstash/conf.d. A configuration consists of three parts: input, filter and output.

Let's create a configuration file called apache.conf and set up our Apache pipeline in it:

sudo vi /etc/logstash/conf.d/apache.conf

We enter the following content into apache.conf:

/etc/logstash/conf.d/apache.conf

input {
  redis {
    host => "localhost"
    key => "apache"
    data_type => "list"
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  geoip {
    source => "clientip"
  }
}

output {
  stdout {
    codec => rubydebug
  }

  elasticsearch {
    hosts => ["mac:9200"]
  }
}

As you can see, we use the Logstash Redis input plugin to define the Redis host and the specific Redis key we want Logstash to pull from. data_type is set to list, which means that Logstash will use the BLPOP operation to pull data from the Redis list. Note that you need to modify the hosts setting in the elasticsearch output according to your own Elasticsearch address.
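Before wiring Filebeat in, we can test this half of the pipeline by hand: push a hand-crafted event onto the apache list with redis-cli and watch Logstash consume it. This is just a sketch; the JSON wrapper mimics what Beats would send (the redis input decodes JSON by default), and the message field holds a sample combined-format log line:

redis-cli RPUSH apache '{"message":"8.8.8.8 - - [01/Jun/2021:10:00:00 +0800] \"GET / HTTP/1.1\" 200 1024 \"-\" \"curl/7.68.0\""}'

If everything is wired correctly, the parsed event, complete with clientip, response and geoip fields, shows up on Logstash's console output and in Elasticsearch.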

Save the file. We use the following command to restart Logstash:

$ sudo service logstash restart
[sudo] password for liuxg: 
liuxg@liuxgu:/etc/logstash/conf.d$ service logstash status
● logstash.service - logstash
     Loaded: loaded (/etc/systemd/system/logstash.service; disabled; vendor prese>
     Active: active (running) since Tue 2021-06-01 10:22:37 CST; 4s ago
   Main PID: 40425 (java)
      Tasks: 24 (limit: 18984)
     Memory: 445.4M
     CGroup: /system.slice/logstash.service
             └─40425 /usr/share/logstash/jdk/bin/java -Xms1g -Xmx1g -XX:+UseConcM>

Jun 01 10:22:37 liuxgu systemd[1]: Started logstash.
Jun 01 10:22:37 liuxgu logstash[40425]: Using bundled JDK: /usr/share/logstash/jdk
Jun 01 10:22:37 liuxgu logstash[40425]: OpenJDK 64-Bit Server VM warning: Option >
lines 1-12/12 (END)

If you see the above information, it means that our Logstash is running successfully.

Start the data pipeline

In the above, we have completed the installation of all the required components. Now it is time to start the data pipeline. So far, I have successfully configured and started the following components:

  • Elasticsearch
  • Kibana
  • Logstash
  • Redis

Before we do this, let us put the Redis CLI into monitoring mode in a second terminal, so that we can see all the Redis operations taking place. Just enter the monitor command:

monitor
$ redis-cli
127.0.0.1:6379> ping
PONG
127.0.0.1:6379> monitor
OK

Now, all you will see is an OK message:

Let's review the data pipeline introduced at the beginning of the article. The only component not yet configured is Filebeat. When we installed Filebeat earlier, it was connected directly to Elasticsearch. We now need to modify it so that it reads the Apache access log and sends it to Redis. We need to modify the filebeat.yml file:

/etc/filebeat/filebeat.yml

First, we configure the input section to collect the Apache access log:

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/apache2/access.log

This adds the collection of access.log at the beginning of the file.

Next, add the Redis output at the end of filebeat.yml:

output.redis:
  hosts: ["localhost"]
  key: "apache"
  db: 0
  timeout: 5
  data_type: "list"

In this way, Filebeat's input is chained to Redis. In the input section, we tell Filebeat which logs to collect: the Apache access log. In the output section, we tell Filebeat to forward the data to our local Redis server, under the key "apache". data_type is set to list, which means that Filebeat will use the RPUSH operation to push log events onto the Redis list. Since Filebeat supports only one output at a time, make sure the default output.elasticsearch section is commented out.
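Before starting Filebeat, we can sanity-check the edited configuration with Filebeat's built-in test subcommand; it prints Config OK when the YAML parses cleanly:

sudo filebeat test config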

Save the file and start Filebeat with the following command:

$ sudo service filebeat start
liuxg@liuxgu:~$ service filebeat status
● filebeat.service - Filebeat sends log files to Logstash or directly to Elastics>
     Loaded: loaded (/lib/systemd/system/filebeat.service; disabled; vendor prese>
     Active: active (running) since Tue 2021-06-01 10:55:56 CST; 12s ago
       Docs: https://www.elastic.co/beats/filebeat
   Main PID: 47511 (filebeat)
      Tasks: 13 (limit: 18984)
     Memory: 160.2M
     CGroup: /system.slice/filebeat.service
             └─47511 /usr/share/filebeat/bin/filebeat --environment systemd -c /e>

Jun 01 10:55:57 liuxgu filebeat[47511]: 2021-06-01T10:55:57.534+0800        INFO >
Jun 01 10:55:57 liuxgu filebeat[47511]: 2021-06-01T10:55:57.535+0800        INFO >
Jun 01 10:55:57 liuxgu filebeat[47511]: 2021-06-01T10:55:57.535+0800        INFO >
Jun 01 10:55:57 liuxgu filebeat[47511]: 2021-06-01T10:55:57.538+0800        ERROR

At this time, we switch to the Redis CLI terminal running monitor, and we can see output like the one shown below:

This should be the collected access.log Apache log information.
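We can also gauge whether Logstash is keeping up with Filebeat by checking the length of the apache list from the Redis CLI; if Logstash drains the queue as fast as Filebeat fills it, the number stays near zero:

redis-cli LLEN apache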

In Logstash, I deliberately added console output (the stdout plugin with the rubydebug codec). We can open a terminal and type the following command to see it:

journalctl -u logstash

It shows that our Logstash output is correct.

Let's go to Kibana to check the latest index changes:

GET _cat/indices
green  open .kibana_task_manager_7.13.0_001 d7rhl5y2TW2i-OFS4hiS6Q 1 0    10 767   1.3mb   1.3mb
green  open .apm-custom-link                kkg4g6i5Qg6VQ4fph2LYzQ 1 0     0   0    208b    208b
green  open .kibana-event-log-7.13.0-000001 a37rAENuReSIs_zBL0TJDg 1 0    11   0  44.2kb  44.2kb
green  open .apm-agent-configuration        zPQxIZxWR3SX-DCD4NvpXA 1 0     0   0    208b    208b
green  open .kibana_7.13.0_001              yr4gx8DeQMq9C2iOS6C9UA 1 0  2340  21   2.7mb   2.7mb
green  open kibana_sample_data_logs         UpNaMIsAQg6XePEefnvB7w 1 0 14074   0   9.8mb   9.8mb
yellow open logstash-2021.06.01-000001      Ajm2UXxrRS2zF82yy3pcXQ 1 1  2981   0  1012kb  1012kb
green  open .async-search                   oKAmnYWfS4qsTLgZAUR-vQ 1 0    72   1 447.2kb 447.2kb
yellow open metricbeat-7.13.0               2-usjsWIRRuqhYEr6SlyWQ 1 1    90   0   2.9mb   2.9mb
green  open .tasks                          p6Ln5UHdTWamRZ3vJAr_4g 1 0    14   0  20.8kb  20.8kb

From the above, we can see the newest index, logstash-2021.06.01-000001. We can view the documents in it with the following command:

GET logstash-2021.06.01-000001/_search

From the above output, we can see that the logs here match the content of /var/log/apache2/access.log.
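As a final check, we can run a more targeted query, for example returning only the requests that received an HTTP 200 response (the response field is produced by the COMBINEDAPACHELOG grok pattern above):

GET logstash-2021.06.01-000001/_search
{
  "query": {
    "match": {
      "response": "200"
    }
  }
}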