June 17, 2019

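Building an EFK Stack

Overview

Let's build an EFK stack to collect and manage the logs of our in-house servers in one place.

Server resources are tight, so a multi-node cluster is out of reach. Still, the log volume is modest and an EFK stack for log collection runs fine on a single node, so let's adopt it and make the most of what it offers.

Log server specs

I created a virtual machine on adavis to serve as the log collection server.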

Server type	Hostname	OS	CPU	Memory	Service network IP	Internal network IP
Virtual machine	log-server	Debian stretch	4 Cores	8G	<secret>	<secret> (/24)

Fluentd, Elasticsearch, and Kibana are set up on this server.

Architecture

Diagram:

+----------+--------+------------+
| odom     | syslog | Fluent Bit |---+
+----------+--------+------------+   |
                                     |
                                     |
+----------+--------+------------+   |
| redmine  | syslog | Fluent Bit |---+           +-------------------log-server-----------------+
+----------+--------+------------+   |           |                                              |
                                     |           |                                              |
+----------+--------+------------+   |     +-----+---+     +---------------+     +--------+     |
| lebron   | syslog | Fluent Bit |---+---->| Fluentd |---> | Elasticsearch |---> | Kibana |     |
+----------+--------+------------+   |     +---------+     +---------------+     +--------+     |
                                     |           |                                              |
               *                     |           |                                              |
               *                     |           +----------------------------------------------+
               *                     |
                                     |
+----------+--------+------------+   |
| pvdi     | syslog | Fluent Bit |---+
+----------+--------+------------+

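Fluent Bit on each server (odom, redmine, lebron, and so on) collects syslog and application logs.
Fluent Bit forwards what it collects to log-server (the aggregator), where Fluentd gathers all logs centrally.
Fluentd writes the logs to its output, ES.
Kibana visualizes the log data stored in ES.

However, some of our servers, such as odom and lebron, run an old release (Debian Lenny), so Fluent Bit would have to be compiled from source on them.

So for now, rsyslog on each server is set up for remote forwarding as shown below, and Fluentd collects syslog only (application logs excluded).

(Later I plan to compile Fluent Bit on each server so that application logs are collected as well.)

Revised diagram: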

+----------+--------+
| odom     | syslog |---+
+----------+--------+   |
                        |
                        |
+----------+--------+   |
| redmine  | syslog |---+           +-------------------log-server-----------------+
+----------+--------+   |           |                                              |
                        |           |                                              |
+----------+--------+   |     +-----+---+     +---------------+     +--------+     |
| lebron   | syslog |---+---->| Fluentd |---> | Elasticsearch |---> | Kibana |     |
+----------+--------+   |     +---------+     +---------------+     +--------+     |
                        |           |                                              |
               *        |           |                                              |
               *        |           +----------------------------------------------+
               *        |
                        |
+----------+--------+   |
| pvdi     | syslog |---+
+----------+--------+

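Installing the EFK Stack

Some settings are mandatory when installing EFK. I already covered what they mean in the "installation EFK" post while installing the stack, so the explanations are omitted here.

1.) Installing Fluentd

Install Fluentd: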

root@log-server:~# curl -L https://toolbelt.treasuredata.com/sh/install-debian-stretch-td-agent3.sh | sh

Check the status:

root@log-server:~# systemctl status td-agent
● td-agent.service - td-agent: Fluentd based data collector for Treasure Data
   Loaded: loaded (/lib/systemd/system/td-agent.service; disabled; vendor preset: enabled)
   Active: active (running) since Wed 2019-06-12 17:38:21 KST; 5min ago
     Docs: https://docs.treasuredata.com/articles/td-agent
  Process: 3358 ExecStart=/opt/td-agent/embedded/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/
 Main PID: 3363 (fluentd)
    Tasks: 11 (limit: 4915)
   CGroup: /system.slice/td-agent.service
           ├─3363 /opt/td-agent/embedded/bin/ruby /opt/td-agent/embedded/bin/fluentd --log /var/log/td-agent/td-agen
           └─3368 /opt/td-agent/embedded/bin/ruby -Eascii-8bit:ascii-8bit /opt/td-agent/embedded/bin/fluentd --log /

 6월 12 17:38:20 log-server systemd[1]: Starting td-agent: Fluentd based data collector for Treasure Data...
 6월 12 17:38:21 log-server systemd[1]: Started td-agent: Fluentd based data collector for Treasure Data.

Set up NTP:

root@log-server:~# apt install ntp

Set the file descriptor limit:

root@log-server:~# ulimit -n
1024

root@log-server:~# tail /etc/security/limits.conf
...
root soft nofile 65536
root hard nofile 65536
* soft nofile 65536
* hard nofile 65536

root@log-server:~# reboot

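A reboot (or at least a fresh login session) is needed for the change to take effect.
The number of file descriptors a process may open was 1024; this raises it to 65536.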
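
The entries in limits.conf are applied when a login session starts, so the value printed by `ulimit -n` in a shell is not necessarily the one a running daemon has. As a rough sketch (Linux only, and the function name is my own), the effective limit of any process can be read from /proc:

```shell
# Print the soft "Max open files" limit of a running process by reading
# /proc/<pid>/limits (Linux only). nofile_soft_limit is a name made up
# for this sketch.
nofile_soft_limit() {
  pid="$1"
  # Line format: "Max open files   <soft>   <hard>   files"
  awk '/^Max open files/ { print $4 }' "/proc/${pid}/limits"
}

# Example: the limit of the current shell.
nofile_soft_limit "$$"
```

To check the Fluentd process instead, pass its PID, for example the Main PID shown by `systemctl status td-agent`.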

Optimize network kernel parameters:

root@log-server:~# tail -n 15 /etc/sysctl.conf
...
net.core.somaxconn = 1024
net.core.netdev_max_backlog = 5000
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_wmem = 4096 12582912 16777216
net.ipv4.tcp_rmem = 4096 12582912 16777216
net.ipv4.tcp_max_syn_backlog = 8096
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 10240 65535

root@log-server:~# sysctl -p

Enable the service at boot:

root@log-server:~# systemctl enable td-agent

td-agent v3.4.0 installation complete!

2.) Installing Elasticsearch

Import the PGP key:

root@log-server:~# wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
OK

Add the source list:

root@log-server:~# echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
deb https://artifacts.elastic.co/packages/7.x/apt stable main

Install elasticsearch:

root@log-server:~# apt update && apt install elasticsearch

Enable the service at boot:

root@log-server:~# systemctl enable elasticsearch

Set the maximum number of file descriptors for the elasticsearch user:

root@log-server:~# vi /usr/lib/systemd/system/elasticsearch.service
...
LimitNOFILE=65535

Configure elasticsearch not to use swap memory:

root@log-server:~# cat /etc/elasticsearch/elasticsearch.yml
...
bootstrap.memory_lock: true

root@log-server:~# ulimit -l unlimited

root@log-server:~# tail /usr/lib/systemd/system/elasticsearch.service
...
[Service]
LimitMEMLOCK=infinity
...

root@log-server:~# systemctl daemon-reload
root@log-server:~# systemctl restart elasticsearch

root@log-server:~# curl -X GET localhost:9200/_nodes?filter_path=**.mlockall
{"nodes":{"sAQ9SlscTP2vn5ALyqgIAQ":{"process":{"mlockall":true}}}}

The response should show "mlockall":true.
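
For a scripted check, the same response can be tested instead of read by eye. This is only a sketch: mlockall_ok is a helper name I made up, and it does a plain text match rather than real JSON parsing.

```shell
# Succeed only if a `_nodes?filter_path=**.mlockall` response (read from
# stdin) contains at least one "mlockall":true and no "mlockall":false.
# mlockall_ok is a hypothetical helper name.
mlockall_ok() {
  resp=$(cat)
  printf '%s' "$resp" | grep -q '"mlockall" *: *true' || return 1
  ! printf '%s' "$resp" | grep -q '"mlockall" *: *false'
}

# Sample response copied from the output above.
sample='{"nodes":{"sAQ9SlscTP2vn5ALyqgIAQ":{"process":{"mlockall":true}}}}'
if printf '%s' "$sample" | mlockall_ok; then
  echo "memory lock active"
else
  echo "memory lock NOT active"
fi
```

Against the live node it would be fed from curl, e.g. `curl -s 'localhost:9200/_nodes?filter_path=**.mlockall' | mlockall_ok`, with the URL quoted so the shell does not try to expand `?` and `**`.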

By default Elasticsearch stores its indices in an mmapfs directory, and the default limit on mmap counts is too low, which can lead to out-of-memory exceptions.

So raise the value as follows:

root@log-server:~# cat /usr/lib/sysctl.d/elasticsearch.conf
vm.max_map_count=262144

root@log-server:~# sysctl -p /usr/lib/sysctl.d/elasticsearch.conf

Set the Java heap size:

root@log-server:~# grep Xm /etc/elasticsearch/jvm.options
-Xms4g
-Xmx4g

log-server has 8G of memory in total, and the recommendation is to give half of it to the Java heap.
The other half should be left for the file system cache that Lucene relies on.
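
The "half of memory" rule is simple arithmetic. A small sketch follows, where suggest_heap_mb is a name I made up and the input is in KiB, the unit used by the MemTotal line of /proc/meminfo:

```shell
# Suggest a JVM heap size in MiB: half of total memory, given in KiB.
# suggest_heap_mb is a hypothetical helper name.
suggest_heap_mb() {
  total_kb="$1"
  echo $(( total_kb / 2 / 1024 ))
}

# A nominal 8 GiB machine is 8388608 KiB.
echo "-Xms$(suggest_heap_mb 8388608)m"   # prints: -Xms4096m
echo "-Xmx$(suggest_heap_mb 8388608)m"   # prints: -Xmx4096m
```

On a real machine MemTotal is somewhat below the nominal size, so `suggest_heap_mb "$(awk '/^MemTotal/ {print $2}' /proc/meminfo)"` gives a slightly smaller number than 4096. I simply rounded to 4g in jvm.options.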

Set the cluster name:

root@log-server:~# cat /etc/elasticsearch/elasticsearch.yml |grep cluster.name
cluster.name: orchard-cluster

root@log-server:~# curl -X GET http://localhost:9200/
{
  "name" : "log-server",
  "cluster_name" : "orchard-cluster",
  "cluster_uuid" : "_kcvsCznTkmEa-l997rQRQ",
  "version" : {
    "number" : "7.1.1",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "7a013de",
    "build_date" : "2019-05-23T14:04:00.380842Z",
    "build_snapshot" : false,
    "lucene_version" : "8.0.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Elasticsearch v7.1.1 installation complete!

X-Pack is installed together with Elasticsearch.

In short, X-Pack is a plugin bundle that packages security, alerting, monitoring, reporting, and graph features.

3.) Installing Kibana

Import the GPG key:

root@log-server:~# wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
OK

Add the source list:

root@log-server:~# echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
deb https://artifacts.elastic.co/packages/7.x/apt stable main

Install Kibana:

root@log-server:~# apt update && apt install kibana

root@log-server:~# systemctl enable kibana

Network settings:

root@log-server:~# grep server.host /etc/kibana/kibana.yml
server.host: "0.0.0.0"

root@log-server:~# systemctl restart kibana

Kibana 7.1.1 installation complete!

Configuring Fluentd

Configure Fluentd to receive syslog and store it in ES:

root@log-server:~# cat /etc/td-agent/td-agent.conf
<source>
  @type syslog             # parse incoming syslog messages
  port 5140                # listen on UDP port 5140
  bind log-server          # internal IP
  tag syslog
</source>

<match syslog.**>
  @type copy
  <store>
    @type stdout
  </store>
  <store>
    @type elasticsearch
    host 127.0.0.1
    port 9200
    logstash_format true                         # write to Logstash-style indices ({prefix}-{date}) that Kibana can match with a pattern (default: false)
    logstash_prefix fluentd-syslog               # index name prefix (default: logstash)
    logstash_dateformat %Y%m%d                   # index name is {prefix}-{dateformat}, e.g. fluentd-syslog-20190613
    include_tag_key true                         # add the tag to each record
    tag_key @logname                             # key under which the tag is stored

    buffer_type file                             # write chunks to files (default: memory)
    buffer_path /var/log/td-agent/buffer/syslog/ # where chunks are stored
    flush_interval 1s                            # how often chunks are flushed to the queue (a chunk moves to the queue when it hits the chunk limit below or when this interval elapses)
    buffer_chunk_limit 256m                      # maximum size of each chunk; events are appended until the limit is reached, then the chunk moves to the queue (default: memory 8M / file 256M)
    buffer_queue_limit 256                       # maximum number of chunks that can wait in the queue before going to the output (default: 256)
    retry_wait 5                                 # if a queued chunk cannot be written because the destination has a problem, wait retry_wait seconds and try again; each further failure doubles the wait (5s, 10s, 20s, 40s ...)
    buffer_queue_full_action drop_oldest_chunk   # policy for a full queue; this one discards the oldest chunk
  </store>
</match>

To walk through the configuration: the <source> section sets up how syslog is received.

@type syslog : the syslog input plugin, which parses syslog messages and turns them into JSON records.
port 5140 : the port on which syslog is received. UDP port 5140 is opened, and every log arriving on it is collected.
bind : the bind address, here the server's internal IP. It is given as a hostname because log-server is defined in /etc/hosts.
tag : an arbitrary identifier for this input. The <match syslog.**> section below uses it to pick up the input and route it to the output.

The <match> section routes input to output.

@type copy : copies each event to every <store> listed. The stdout store is for debugging and will be removed later.
<store> @type stdout : a <store> section defines one output; @type stdout prints events to standard output, which ends up in td-agent.log here. Used for debugging.
<store> @type elasticsearch : sends the collected logs to ES.
host : the IP of the ES server. Fluentd and ES run on the same server here, hence 127.0.0.1.
port : the port of the ES server.
logstash_format true : writes to Logstash-style daily indices named {prefix}-{date}, which makes them easy to match with a Kibana index pattern. The default is false.
logstash_prefix fluentd-syslog : the prefix of the index names created in ES.
logstash_dateformat %Y%m%d : the date format appended to the index name.
include_tag_key : adds the tag to the JSON record as a value, with the name given by tag_key as its key. ex) {"@logname": "syslog"}
buffer_type file : data is buffered in chunks before it is sent to the output; this keeps the buffer in files rather than in memory.
buffer_path : where the buffer is stored.
flush_interval : how often buffered data is flushed toward the output.
buffer_chunk_limit : the size limit of a chunk. When a chunk fills up it is flushed to the output; even if it is not full, whatever it holds is flushed once flush_interval elapses.
buffer_queue_limit : chunks wait in a queue before going to the output; this is the number of chunks the queue can hold.
retry_wait : if a queued chunk cannot be written because the destination has a problem, Fluentd waits retry_wait seconds and tries again. Each further failure doubles the wait. (5s, 10s, 20s, 40s ...)
buffer_queue_full_action : the policy applied when data cannot be sent to the output and the queue fills up. drop_oldest_chunk discards the oldest chunk.
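
The doubling behind retry_wait is easy to see by printing the sequence. This is a sketch of the arithmetic only: retry_schedule is a name I made up, and it ignores any randomization or upper bound that Fluentd itself may apply to the interval.

```shell
# Print the wait (in seconds) before each of the first N retries,
# starting from retry_wait and doubling after every failure.
# retry_schedule is a hypothetical helper name.
retry_schedule() {
  wait_s="$1"   # initial retry_wait in seconds
  count="$2"    # number of retries to list
  out=""
  i=0
  while [ "$i" -lt "$count" ]; do
    out="${out}${out:+ }${wait_s}"
    wait_s=$(( wait_s * 2 ))
    i=$(( i + 1 ))
  done
  echo "$out"
}

retry_schedule 5 4   # prints: 5 10 20 40
```

With retry_wait 5, a destination that stays down for a few minutes is retried only a handful of times, which is why the queue limits above matter.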

Fluentd now has port 5140 open and is ready to receive logs.

flush_interval is set to 1 second for now to verify that things work; once verified it will be changed to 1 hour.

Syslog to Fluentd

Configure syslog on each server to forward to Fluentd (the aggregator).

I collect logs from 21 servers in total, and all 21 are configured identically as follows:

root@log-client:~# cat /etc/rsyslog.d/remote.conf
*.* @log-server:5140

log-server : Internal IP
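
In rsyslog's forwarding syntax a single @ means UDP and @@ means TCP, which is why the rule above matches the UDP listener on the Fluentd side. With 21 servers to configure, the line can be generated by a small helper. This is a sketch, and forward_rule is a name I made up:

```shell
# Build an rsyslog forwarding rule. A single "@" forwards over UDP,
# "@@" forwards over TCP. forward_rule is a hypothetical helper name.
forward_rule() {
  host="$1"; port="$2"; proto="$3"
  case "$proto" in
    udp) prefix="@" ;;
    tcp) prefix="@@" ;;
    *) echo "unknown protocol: $proto" >&2; return 1 ;;
  esac
  echo "*.* ${prefix}${host}:${port}"
}

forward_rule log-server 5140 udp   # prints: *.* @log-server:5140
```

Switching to tcp here would only work if the Fluentd source were changed to listen on TCP as well.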

Restart rsyslog:

root@log-client:~# systemctl restart rsyslog

Write a syslog entry to trigger an event in Fluentd:

root@log-client:~# logger 하이루

Check on log-server:

root@log-server:~# tail -f /var/log/td-agent/td-agent.log
...
2019-06-14 17:22:29.000000000 +0900 syslog.user.notice: {"host":"log-client","ident":"orchard","message":"하이루"}

Fluentd received the log.
With the INPUT side verified, let's check that the OUTPUT side works too.

If the log has been stored in Elasticsearch, the OUTPUT, then it was sent correctly.

First I listed the indices:

root@log-server:~# curl -XGET http://localhost:9200/_cat/indices?v
...
health status index                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   fluentd-syslog-20190614         wY-06fX_RomprMkcecxowQ   1   1         64            0     32.4kb         32.4kb

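An index named in the fluentd-syslog-20190614 format was created, as dictated by the Fluentd settings.
- Fluentd settings (logstash_prefix fluentd-syslog, logstash_dateformat %Y%m%d)
health is yellow because this is a single node and there is no other node to hold the replica.
- For an explanation of index health, see: https://brunch.co.kr/@alden/43
- Setting replicas to 0 would turn the state green. The defaults in ES 7.x are 1 shard and 1 replica (before 7.0 the default was 5 shards), which matches the pri 1 / rep 1 columns above.
- Index state and settings changes are covered in detail further below.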
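
To know in advance which index a given day's logs land in, the name can be rebuilt from the prefix and the date format. This is a sketch that needs GNU date; index_name is a name I made up, and it formats the date in UTC on the assumption that the plugin does the same, so around midnight local time the result may differ from what you expect.

```shell
# Build the daily index name the way logstash_prefix and
# logstash_dateformat combine: {prefix}-{date formatted with dateformat}.
# Requires GNU date (-d). index_name is a hypothetical helper name.
index_name() {
  prefix="$1"; day="$2"; fmt="${3:-%Y%m%d}"
  echo "${prefix}-$(date -u -d "$day" +"$fmt")"
}

index_name fluentd-syslog 2019-06-14   # prints: fluentd-syslog-20190614
```

This is handy for building the URL of a specific day's index when querying or deleting it with curl.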

To view this index in Kibana, open the Kibana dashboard and create an index pattern as follows.

Click Management
Click Create index pattern
Enter fluentd-syslog* (the indices collected by Fluentd and stored in ES)
Click Next step
Time Filter field name : select @timestamp
Click Create index pattern

Now the collected logs can be browsed in the Discover tab, and the 하이루 log shows up there.

Querying ES directly shows the document format:

root@log-server:~# curl -XGET http://localhost:9200/fluentd-syslog-20190614/_doc/G0QUVWsBAYTatQjE4KEr?pretty
{
  "_index" : "fluentd-syslog-20190614",
  "_type" : "_doc",
  "_id" : "G0QUVWsBAYTatQjE4KEr",
  "_version" : 1,
  "_seq_no" : 63,
  "_primary_term" : 2,
  "found" : true,
  "_source" : {
    "host" : "log-client",
    "ident" : "orchard",
    "message" : "하이루",
    "@timestamp" : "2019-06-14T17:22:29.000000000+09:00",
    "@logname" : "syslog.user.notice"
  }
}

Index Template

An index template stores mappings and settings in advance so that they are applied automatically whenever a matching index is created.

For example, in my case a new index is created automatically once a day, like this:

fluentd-syslog-20190614
fluentd-syslog-20190615
fluentd-syslog-20190616
fluentd-syslog-20190617

All of these indices have 1 replica (with 1 primary shard, the ES 7.x default), so their index health is yellow.

Even if I set fluentd-syslog-20190614 to 0 replicas to turn it green, the fluentd-syslog-20190615 index created automatically the next day again gets the default 1 replica and shows up as yellow.
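
For reference, changing the replica count of a single existing index is one call to the update index settings API. A sketch follows: replica_settings and set_replicas are names I made up, and set_replicas is only defined here, not run.

```shell
# Print the settings body that sets the replica count of an existing index.
# replica_settings is a hypothetical helper name.
replica_settings() {
  printf '{"index":{"number_of_replicas":%d}}\n' "$1"
}

# Hypothetical helper: apply the body to one index through the
# update index settings API. Defined here but not called.
set_replicas() {
  index="$1"; replicas="$2"
  curl -s -XPUT "http://localhost:9200/${index}/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(replica_settings "$replicas")"
}

replica_settings 0   # prints: {"index":{"number_of_replicas":0}}
```

Running `set_replicas fluentd-syslog-20190614 0` would fix that one index, but it has to be repeated for every new daily index, which is exactly the chore a template removes.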

For this reason, let's create an index template with shard 1, replica 0 and have it apply to every index whose name matches the fluentd-syslog* pattern.

shard 1, replica 0 is chosen because this is a single-node setup: there is no other node across which shards and replicas could be distributed.

For an explanation of shards and replicas, see the link below.

http://guruble.com/elasticsearch-2-shard-replica/

First, create the template:

root@log-server:~# curl -XPUT http://localhost:9200/_template/fluentd-syslog-temp -H 'Content-Type: application/json' -d'
 {
 "index_patterns": [
     "fluentd-syslog*"
 ],
 "settings": {
     "number_of_shards": 1,
     "number_of_replicas" : 0
 }}'

Check the template:

root@log-server:~# curl -XGET http://localhost:9200/_template/fluentd-syslog-temp?pretty
{
  "fluentd-syslog-temp" : {
    "order" : 0,
    "index_patterns" : [
      "fluentd-syslog*"
    ],
    "settings" : {
      "index" : {
        "number_of_shards" : "1",
        "number_of_replicas" : "0"
      }
    },
    "mappings" : { },
    "aliases" : { }
  }
}

From now on, indices matching the fluentd-syslog* pattern will be created with shard 1, replica 0.

To verify, delete all fluentd-syslog* indices:

root@log-server:~# curl -XDELETE http://localhost:9200/fluentd-syslog-20190614?pretty
{
  "acknowledged" : true
}
root@log-server:~# curl -XDELETE http://localhost:9200/fluentd-syslog-20190615?pretty
{
  "acknowledged" : true
}
root@log-server:~# curl -XDELETE http://localhost:9200/fluentd-syslog-20190616?pretty
{
  "acknowledged" : true
}
root@log-server:~# curl -XDELETE http://localhost:9200/fluentd-syslog-20190617?pretty
{
  "acknowledged" : true
}

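All the fluentd-syslog indices created automatically from the 14th through the 17th are now deleted.

Now generate an event to create a new index and check its shard and replica settings.

First, generate a log on log-client:

orchard@log-client:~$ logger 하이루2

Then check the indices on log-server: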

root@log-server:~# curl -XGET http://localhost:9200/_cat/indices?v
health status index                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   fluentd-syslog-20190617         VwAc5IJ_Sm287h265WS10w   1   0          1            0       171b           171b
...

A new index (fluentd-syslog-20190617) was created, and its health is green.

Let's check the exact index settings:

root@log-server:~# curl -XGET http://localhost:9200/fluentd-syslog-20190617/_settings/?pretty
{
  "fluentd-syslog-20190617" : {
    "settings" : {
      "index" : {
        "creation_date" : "1560745247119",
        "number_of_shards" : "1",
        "number_of_replicas" : "0",
        "uuid" : "VwAc5IJ_Sm287h265WS10w",
        "version" : {
          "created" : "7010199"
        },
        "provided_name" : "fluentd-syslog-20190617"
      }
    }
  }
}

shard : 1
replica : 0
The template was applied correctly.

Index Lifecycle

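Index lifecycle management (ILM) automates how indices are managed over time.

For example, instead of keeping log data forever, you define a lifecycle policy that keeps it for only 90 days; once an index is older than 90 days it can be rolled over, shrunk to fewer shards, moved to another server, or deleted.

The index lifecycle has four phases.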

Hot : the index is actively being updated and queried.
Warm : the index is no longer being updated, but is still being queried.
Cold : the index is no longer being updated and is seldom queried. The information still needs to be searchable, but it’s okay if those queries are slower.
Delete : the index is no longer needed and can safely be deleted.
Reference : https://www.elastic.co/guide/en/elasticsearch/reference/7.1/index-lifecycle-management.html

On a single node, the delete policy is about all that can be put to use.

In my case, logs older than 90 days will be deleted.

Create the index lifecycle policy:

root@log-server:~# curl -XPUT http://localhost:9200/_ilm/policy/fluentd-syslog-ilm -H 'Content-Type: application/json' -d'
 {
     "policy": {
         "phases": {
             "hot": {
                 "min_age": "0ms",
                 "actions": {
                     "set_priority": {
                         "priority": 0
                     }
                 }
             },
             "delete": {
                 "min_age": "90d",
                 "actions": {
                     "delete": {}
                 }
             }
         }
     }
 }'

The hot phase has to be enabled.
A delete phase is added with min_age : 90d.
So each index is kept for 90 days and deleted automatically after that.
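
Since these daily indices are never rolled over, min_age is counted from the creation time of each index. The date on which an index becomes eligible for deletion can therefore be worked out by hand. A sketch follows that needs GNU date; delete_eligible_on is a name I made up.

```shell
# Earliest date on which an index becomes eligible for the delete phase,
# given its creation date and min_age in days. Requires GNU date (-d).
# delete_eligible_on is a hypothetical helper name.
delete_eligible_on() {
  created="$1"; days="$2"
  date -u -d "${created} +${days} days" +%Y-%m-%d
}

delete_eligible_on 2019-06-17 90   # prints: 2019-09-15
```

The actual deletion happens a little later than that moment, because ILM checks policies periodically rather than continuously.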

Applying a lifecycle policy to indices goes through an index template: attach the policy to a template, and every index matching that template gets the policy.

So let's attach the lifecycle policy created above (fluentd-syslog-ilm) to the index template (fluentd-syslog-temp). I did this from the Kibana dashboard as follows.

Click the Management tab
Click Index Lifecycle Policies
Click Actions on the fluentd-syslog-ilm row
Click Add policy to index template
Select fluentd-syslog-temp -> Add policy
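
The same result should be reachable without Kibana by putting index.lifecycle.name into the template settings. I did not do it this way, so treat the following as a sketch: template_body and put_template are names I made up, and put_template is only defined, not run. Because PUT replaces the whole template, the shard and replica settings have to be repeated in the body.

```shell
# Print an index template body that also attaches an ILM policy by name.
# template_body is a hypothetical helper name.
template_body() {
  policy="$1"
  cat <<EOF
{
  "index_patterns": ["fluentd-syslog*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "index.lifecycle.name": "${policy}"
  }
}
EOF
}

# Hypothetical helper: overwrite the template through the REST API.
# Defined here but not called.
put_template() {
  curl -s -XPUT "http://localhost:9200/_template/fluentd-syslog-temp" \
    -H 'Content-Type: application/json' \
    -d "$(template_body "$1")"
}

template_body fluentd-syslog-ilm
```

Calling `put_template fluentd-syslog-ilm` would send that body to the local node.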

To verify, delete the index again:

root@log-server:~# curl -XDELETE http://localhost:9200/fluentd-syslog-20190617?pretty

Generate a new log:

orchard@log-client:~$ logger 하이루23

Check the index settings:

root@log-server:~# curl -XGET http://localhost:9200/fluentd-syslog-20190617/_settings/?pretty
{
  "fluentd-syslog-20190617" : {
    "settings" : {
      "index" : {
        "lifecycle" : {
          "name" : "fluentd-syslog-ilm"
        },
        "number_of_shards" : "1",
        "provided_name" : "fluentd-syslog-20190617",
        "creation_date" : "1560753820382",
        "priority" : "0",
        "number_of_replicas" : "0",
        "uuid" : "a70z4QePRG6h5k9Kyu8rvQ",
        "version" : {
          "created" : "7010199"
        }
      }
    }
  }
}

The "lifecycle" : { "name" : "fluentd-syslog-ilm" } policy is now attached to this index.

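Adding Authentication

I felt the Kibana dashboard needed authentication. As it stands anyone can access it, which is bad for security.

X-Pack was installed together with ES. X-Pack bundles security, alerting, monitoring, reporting, and related features; some of them are paid and some are free.

Which features are available depends on the ES license tier. There are four tiers: open source, basic, gold, and platinum.

X-Pack as a whole used to require a license, but as of version 6.3 its source code was opened, and the basic tier can be used for free without registering a license.

For the details of the ES licensing policy, see the link below.

https://www.elastic.co/kr/subscriptions

Stack Monitoring is also in the basic tier and therefore free: open the Kibana dashboard, click the Stack Monitoring tab, and enable it.

It lets you monitor Kibana and ES node status, CPU, memory, indices, disk, document counts, and more.

For security, basic authentication is in the basic tier and free to use, while LDAP, SSO, and the like are gold or platinum and need a license.

Let's apply basic authentication, which the basic license covers.

ES configuration

First, edit the ES configuration file: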

root@log-server:~# tail -n1 /etc/elasticsearch/elasticsearch.yml
xpack.security.enabled: true

root@log-server:~# systemctl restart elasticsearch

In ES 7.1 this defaults to false; I changed it to true.

ES ships with six built-in users.

elastic : A built-in superuser. See Built-in roles.
kibana : The user Kibana uses to connect and communicate with Elasticsearch.
logstash_system : The user Logstash uses when storing monitoring information in Elasticsearch.
beats_system : The user the Beats use when storing monitoring information in Elasticsearch.
apm_system : The user the APM server uses when storing monitoring information in Elasticsearch.
remote_monitoring_user : The user Metricbeat uses when collecting and storing monitoring information in Elasticsearch. It has the remote_monitoring_agent and remote_monitoring_collector built-in roles.
Reference : https://www.elastic.co/guide/en/elastic-stack-overview/current/built-in-users.html#built-in-user-explanation

Set the passwords of these built-in users:

root@log-server:~# /usr/share/elasticsearch/bin/elasticsearch-setup-passwords interactive

Failed to determine the health of the cluster running at http://127.0.0.1:9200
Unexpected response code [503] from calling GET http://127.0.0.1:9200/_cluster/health?pretty
Cause: master_not_discovered_exception

It is recommended that you resolve the issues with your cluster before running elasticsearch-setup-passwords.
It is very likely that the password changes will fail when run against an unhealthy cluster.

Do you want to continue with the password setup process [y/N]y

Initiating the setup of passwords for reserved users elastic,apm_system,kibana,logstash_system,beats_system,remote_monitoring_user.
You will be prompted to enter passwords as the process progresses.
Please confirm that you would like to continue [y/N]y


Enter password for [elastic]:
Reenter password for [elastic]:
Enter password for [apm_system]:
Reenter password for [apm_system]:
Enter password for [kibana]:
Reenter password for [kibana]:
Enter password for [logstash_system]:
Reenter password for [logstash_system]:
Enter password for [beats_system]:
Reenter password for [beats_system]:
Enter password for [remote_monitoring_user]:
Reenter password for [remote_monitoring_user]:

Unexpected response code [503] from calling PUT http://127.0.0.1:9200/_security/user/apm_system/_password?pretty
Cause: Cluster state has not been recovered yet, cannot write to the security index

Possible next steps:
* Try running this tool again.
* Try running with the --verbose parameter for additional messages.
* Check the elasticsearch logs for additional error details.
* Use the change password API manually.

ERROR: Failed to set password for user [apm_system].

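It fails with an error. Why?
The fix turned out to be activating the trial license and trying again.
Basic authentication is supposed to be available under the basic license, so does the trial license really have to be switched on once before the security features can be used?

I had the basic license (the default free license), and I upgraded it to Platinum as follows.

First, stop elasticsearch:

root@log-server:~# systemctl stop elasticsearch

Turn the x-pack security setting off and start it again: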

root@log-server:~# vi /etc/elasticsearch/elasticsearch.yml
...
#xpack.security.enabled : true

root@log-server:~# systemctl start elasticsearch

Next, open the Kibana dashboard and activate the 30-day trial license.

Open the Kibana dashboard
Click the Management tab
Activate the trial license

Then I turned x-pack security back on as follows:

root@log-server:~# vi /etc/elasticsearch/elasticsearch.yml
...
xpack.security.enabled : true

root@log-server:~# systemctl restart elasticsearch

Now I set the passwords of the built-in ES accounts again:

root@log-server:~# /usr/share/elasticsearch/bin/elasticsearch-setup-passwords auto
Initiating the setup of passwords for reserved users elastic,apm_system,kibana,logstash_system,beats_system,remote_monitoring_user.
The passwords will be randomly generated and printed to the console.
Please confirm that you would like to continue [y/N]y


Changed password for user apm_system
PASSWORD apm_system = <PASSWD>

Changed password for user kibana
PASSWORD kibana = <PASSWD>

Changed password for user logstash_system
PASSWORD logstash_system = <PASSWD>

Changed password for user beats_system
PASSWORD beats_system = <PASSWD>

Changed password for user remote_monitoring_user
PASSWORD remote_monitoring_user = <PASSWD>

Changed password for user elastic
PASSWORD elastic = <PASSWD>

elasticsearch-setup-passwords takes either auto or interactive as its argument:
auto generates random passwords automatically,
interactive lets you enter each password yourself.
I used auto here, so the passwords are shown as <PASSWD>.

Fluentd configuration

With security enabled in ES, Fluentd has to authenticate before it can store log data in ES.

ES uses RBAC, so I will first create a role, then create a fluentd user, and assign the role to that user.

Fluentd needs permission to manage index templates and to create and delete indices and documents.

Privileges in a role fall into two groups.

Cluster privileges : Manage the actions this role can perform against your cluster.
- Reference : https://www.elastic.co/guide/en/elastic-stack-overview/7.1/security-privileges.html#privileges-list-cluster
Index privileges : Control access to the data in your cluster.
- Reference : https://www.elastic.co/guide/en/elastic-stack-overview/7.1/security-privileges.html#privileges-list-indices

I will reuse for Fluentd the role that Logstash uses to store logs.

The following privileges are needed.

cluster privileges : manage_index_templates, monitor
index privileges : write, delete, create_index

If index lifecycle management (ILM) is used, the following privileges are needed in addition.

(I use ILM, so these are included in the role as well.)

cluster privileges : manage_ilm
index privileges : manage, manage_ilm

Create a role named fluentd_writer:

root@log-server:~# curl -XPUT -u elastic http://localhost:9200/_xpack/security/role/fluentd_writer -H 'Content-Type: application/json' -d'
{
  "cluster": ["manage_index_templates", "monitor", "manage_ilm"],
  "indices": [
    {
      "names": [ "fluentd-syslog*" ],
      "privileges": ["write","delete","create_index","manage","manage_ilm"]
    }
  ]
}'

Enter host password for user 'elastic':
{"role":{"created":true}}

Create a user named fluentd and assign it the fluentd_writer role:

root@log-server:~# curl -XPUT -u elastic http://localhost:9200/_xpack/security/user/fluentd -H 'Content-Type: application/json' -d'
{
  "password" : "<PASSWD>",
  "roles" : [ "fluentd_writer"],
  "full_name" : "Fluentd User"
 }'

Enter host password for user 'elastic':
{"created":true}

The password is shown as <PASSWD>.

Fluentd can now store logs in ES with the privileges of the fluentd user.

Put the fluentd user's credentials into the Fluentd configuration file:

root@log-server:~# cat /etc/td-agent/td-agent.conf
<source>
  @type syslog
  port 5140
  bind log-server
  tag syslog
</source>

<match syslog.**>
  @type copy
  <store>
    @type stdout
  </store>
  <store>
    @type elasticsearch
    host 127.0.0.1
    port 9200
    user fluentd           <<<< added
    password <PASSWD>      <<<< added
    ...

Add user and password to the <store> section.
I would rather not store the password in plain text. Is there no other way?
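
One option I have not tried here: Fluentd evaluates embedded Ruby inside double-quoted config values, so the password can be read from an environment variable instead of being written into td-agent.conf. The sketch below only prints what the two lines would look like. ES_PASSWORD is a variable name I made up, es_auth_lines is a made-up helper, and the variable would still have to be supplied to the td-agent service by some means, so the secret merely moves to a different, more tightly protected place.

```shell
# Print the authentication lines for the elasticsearch <store>, reading the
# password from an environment variable instead of embedding it.
# ES_PASSWORD and es_auth_lines are hypothetical names for this sketch.
es_auth_lines() {
  user="$1"
  echo "    user ${user}"
  echo "    password \"#{ENV['ES_PASSWORD']}\""
}

es_auth_lines fluentd
```

Whichever way the password is stored, restricting read access on the file that holds it is still worth doing.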

The final configuration is as follows:

root@log-server:/var/log/td-agent/buffer/syslog# cat /etc/td-agent/td-agent.conf
<source>
  @type syslog
  port 5140
  bind log-server
  tag syslog
</source>

<match syslog.**>
  @type elasticsearch
  host 127.0.0.1
  port 9200
  user fluentd
  password <PASSWD>

  logstash_format true
  logstash_prefix fluentd-syslog
  logstash_dateformat %Y%m%d
  include_tag_key true
  tag_key @logname

  buffer_type file
  buffer_path /var/log/td-agent/buffer/syslog/
  flush_interval 1h
  buffer_chunk_limit 256m
  buffer_queue_limit 256
  retry_wait 5
  buffer_queue_full_action drop_oldest_chunk
</match>

The stdout store used for debugging is removed, and flush_interval is changed to 1h.
Fluentd now authenticates as the fluentd user when storing logs in ES.

Restart Fluentd:

root@log-server:~# systemctl restart td-agent

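Logs are stored in ES as expected.

Kibana configuration

Kibana accesses ES with the kibana account, so it has to be told the kibana user's password that was changed above.

The ES username and password can simply be put in kibana.yml.

But keeping them in plain text is bad for security, so I decided to use kibana-keystore.

First, create the keystore: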

root@log-server:~# sudo -u kibana /usr/share/kibana/bin/kibana-keystore create
Created Kibana keystore in /var/lib/kibana/kibana.keystore

Add the kibana user and password that will access ES to the keystore:

root@log-server:~# sudo -u kibana /usr/share/kibana/bin/kibana-keystore add elasticsearch.username
Enter value for elasticsearch.username: ******

root@log-server:~# sudo -u kibana /usr/share/kibana/bin/kibana-keystore add elasticsearch.password
Enter value for elasticsearch.password: ********************

I entered kibana for elasticsearch.username,
and the kibana account's password for elasticsearch.password.
After a restart, Kibana can access ES with the credentials in the keystore.

Restart Kibana, open a web browser, and go to the Kibana dashboard.

root@log-server:~# systemctl restart kibana

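A login screen now appears, and logging in with the admin account elastic works.
Since I would rather not use the elastic user, I created an orchard account and gave it the superuser role.

Now let's happily collect and visualize logs. The EFK setup is complete!

Oh, and finally, I set up a REDIRECT from port 80 to port 5601: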

root@log-server:/etc/td-agent# vi /etc/sysctl.conf
net.ipv4.ip_forward=1

root@log-server:~# sysctl -p

root@log-server:~# iptables -A PREROUTING -t nat -p tcp --dport 80 -j REDIRECT --to-port 5601

root@log-server:~# cat /etc/network/interfaces
auto ens8
iface ens8 inet static
        address <secret>
        gateway <secret>
        post-up iptables -A PREROUTING -t nat -p tcp --dport 80 -j REDIRECT --to-port 5601
...

post-up runs the command after the interface has been brought up.

Posted by 전광석

Filed under: OSS

Tags: EFK, elasticsearch, fluentd, kibana

iOrchard

Virtualization, Cloud Computing Expert Group