h2. Enabling centralized logging on OpenShift
h4. How
There's an Ansible playbook in the openshift-ansible repository that enables logging: https://github.com/openshift/openshift-ansible
h4. Minishift
It is not possible to enable centralized logging on a cluster set up using Minishift.
* Running Ansible locally and targeting the Minishift cluster is very problematic (it requires RHEL on the local machine, tools are missing, and assumptions about the local machine don't hold).
* Running Ansible within the Minishift VM is also not possible because that VM lacks lots of tools (Ansible, Python, Git etc.) and it is super hard to install them.
h4. oc cluster up
Not possible either.
Targeting the local cluster with Ansible sounds good, but it runs into problems similar to the Minishift case.
h4. Remote cluster that's created using the {{cluster_create}} pipeline
I was able to enable logging on "apb-testing" cluster Pavel gave me. Access to this cluster requires Red Hat VPN.
Steps:
1. SSH into the master node: {code} ssh hadmin@apb-testing.skunkhenry.com {code} Your key should already be in the authorized keys for the hadmin user. The OpenShift console for this cluster is at https://apb-testing.skunkhenry.com:8443/console
2. Clone openshift-ansible {code} git clone https://github.com/openshift/openshift-ansible.git {code}
3. Check out the relevant tag of that repo: {code}
# Find the correct tag from https://github.com/openshift/openshift-ansible/releases
# that matches the OpenShift version reported by 'oc version'.
git checkout openshift-ansible-3.9.33-1
{code}
4. There was a bug in that version of the Ansible role; it is fixed in master. Apply that fix manually, following https://github.com/openshift/openshift-ansible/blob/1cb319e8030961f77d751f4be115fe5ddba89bda/roles/openshift_logging_elasticsearch/handlers/main.yml#L8
5. Log in as system admin: {code} oc login -u system:admin {code}
6. Enable logging: {code}
ansible-playbook -i /home/hadmin/.config/openshift/hosts \
  ./playbooks/openshift-logging/config.yml \
  -e openshift_logging_install_logging=true \
  -e openshift_logging_es_allow_external=True \
  -e openshift_logging_es_hostname=elasticsearch.example.com
{code}
The cluster was initially created with the {{cluster_create}} pipeline, which stores the Ansible inventory on the master node at /home/hadmin/.config/openshift/hosts
7. Wait until the Ansible playbook completes and all pods are up in OpenShift's {{logging}} project.
8. Update the route for ElasticSearch in the {{logging}} project. Using the UI, change it from elasticsearch.example.com to something you like, such as es.apb-testing.skunkhenry.com
h2. Disabling OpenShift centralized logging
{code}
ansible-playbook -i /home/hadmin/.config/openshift/hosts \
  ./playbooks/openshift-logging/config.yml \
  -e openshift_logging_install_logging=false
{code}
h2. How it works
* A Fluentd instance is created per node, using a DaemonSet with a node selector.
* Fluentd collects logs from all pods and sends them to ElasticSearch.
* Logs are pushed to different indices:
** Operation logs: pushed to "operation.*" indices. These are Kubernetes infra logs such as container creation, deployments, project creation etc.
** Project logs: pushed to "project.*" indices. These are logs from the user pods, such as the audit logs of the sync service.
We are interested in project logs in our use cases.
Project logs are pushed to indices that have the project name in their name. For example, "project.datasync.c360bfe5-bbfb-11e8-87ae-fa163e4c9c9e.2018.09.20". The format is "project.<project name>.<project uid>.<yyyy.mm.dd>". This means all logs from all pods within a single project go to the same index. However, the pod and service information is still available in the document itself.
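As a quick illustration of the naming scheme (a Python sketch for this document, not anything from OpenShift itself; it assumes project names contain no dots):

```python
def parse_project_index(index_name):
    """Split a "project.<name>.<uid>.<yyyy.mm.dd>" index name into its parts.

    Assumes the project name itself contains no dots.
    """
    parts = index_name.split(".")
    if parts[0] != "project" or len(parts) != 6:
        raise ValueError("not a project index: %s" % index_name)
    return {
        "project": parts[1],
        "uid": parts[2],
        "date": "%s-%s-%s" % (parts[3], parts[4], parts[5]),  # yyyy-mm-dd
    }

# The index from the sample document below:
info = parse_project_index(
    "project.datasync.c360bfe5-bbfb-11e8-87ae-fa163e4c9c9e.2018.09.20")
# info["project"] == "datasync", info["date"] == "2018-09-20"
```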
Sample document: {code:json}
{
  "_index": "project.datasync.c360bfe5-bbfb-11e8-87ae-fa163e4c9c9e.2018.09.20",
  "_type": "com.redhat.viaq.common",
  "_id": "MmNhZTZlZTctZWU5ZS00YzFkLWJjNDQtNjQwYmVhZjc3OTFh",
  "_score": null,
  "_source": {
    "level": "30",
    "msg": "request completed",
    "pid": 19,
    "hostname": "172.16.72.62",
    "req": {
      "id": 8453,
      "method": "GET",
      "url": "/healthz",
      "headers": {
        "host": "10.128.1.223:8000",
        "user-agent": "kube-probe/1.9",
        "accept-encoding": "gzip",
        "connection": "close"
      },
      "remoteAddress": "::ffff:10.128.0.1",
      "remotePort": 52172
    },
    "res": {
      "statusCode": 200,
      "header": "HTTP/1.1 200 OK\r\nX-Powered-By: Express\r\nAccess-Control-Allow-Origin: *\r\nContent-Type: application/json; charset=utf-8\r\nContent-Length: 53\r\nETag: W/\"35-3EKymgknC0UZgUjhN7E3BXc98h8\"\r\nDate: Thu, 20 Sep 2018 10:30:32 GMT\r\nConnection: close\r\n\r\n"
    },
    "responseTime": 6,
    "v": 1,
    "docker": {
      "container_id": "dffd0934be27113027208ddf4aed233b162fedb3bc758f6cd8956980aa90982f"
    },
    "kubernetes": {
      "container_name": "data-sync-server",
      "namespace_name": "datasync",
      "pod_name": "data-sync-server-2-2nk7p",
      "pod_id": "df4a7a42-bbfc-11e8-87ae-fa163e4c9c9e",
      "labels": {
        "app": "data-sync",
        "deployment": "data-sync-server-2",
        "deploymentconfig": "data-sync-server",
        "service": "data-sync-server"
      },
      "host": "172.16.72.62",
      "master_url": "https://kubernetes.default.svc.cluster.local",
      "namespace_id": "c360bfe5-bbfb-11e8-87ae-fa163e4c9c9e"
    },
    "message": "{\"level\":30,\"time\":1537439432240,\"msg\":\"request completed\",\"pid\":19,\"hostname\":\"data-sync-server-2-2nk7p\",\"req\":{\"id\":8453,\"method\":\"GET\",\"url\":\"/healthz\",\"headers\":{\"host\":\"10.128.1.223:8000\",\"user-agent\":\"kube-probe/1.9\",\"accept-encoding\":\"gzip\",\"connection\":\"close\"},\"remoteAddress\":\"::ffff:10.128.0.1\",\"remotePort\":52172},\"res\":{\"statusCode\":200,\"header\":\"HTTP/1.1 200 OK\\r\\nX-Powered-By: Express\\r\\nAccess-Control-Allow-Origin: *\\r\\nContent-Type: application/json; charset=utf-8\\r\\nContent-Length: 53\\r\\nETag: W/\\\"35-3EKymgknC0UZgUjhN7E3BXc98h8\\\"\\r\\nDate: Thu, 20 Sep 2018 10:30:32 GMT\\r\\nConnection: close\\r\\n\\r\\n\"},\"responseTime\":6,\"v\":1}\n",
    "pipeline_metadata": {
      "collector": {
        "ipaddr4": "10.128.0.7",
        "ipaddr6": "fe80::e06d:e7ff:fec1:3d8c",
        "inputname": "fluent-plugin-systemd",
        "name": "fluentd",
        "received_at": "2018-09-20T10:30:32.801089+00:00",
        "version": "0.12.42 1.6.0"
      }
    },
    "@timestamp": "2018-09-20T10:30:32.240454+00:00",
    "viaq_msg_id": "MmNhZTZlZTctZWU5ZS00YzFkLWJjNDQtNjQwYmVhZjc3OTFh"
  },
  "fields": {
    "pipeline_metadata.collector.received_at": [ 1537439432801 ],
    "@timestamp": [ 1537439432240 ]
  },
  "sort": [ 1537439432240 ]
}
{code}
h2. Problem: logs from pods are treated as strings, even when they're JSON
By default, OpenShift logging is set up in a way that causes ElasticSearch to treat log messages as strings (https://github.com/openshift/origin-aggregated-logging/blob/master/elasticsearch/index_templates/com.redhat.viaq-openshift-project.template.json#L46). But in our case, the sync audit logs are logged as JSON and we want them treated as objects, so that we can do search/aggregation on their properties.
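To see why this matters: the "message" field in the sample document above is a JSON document serialized into a string. With the default string mapping, the inner fields are invisible to ElasticSearch. A minimal Python sketch (using a shortened version of the sample message) of what we want ElasticSearch to do instead:

```python
import json

# The "message" field as stored by Fluentd: JSON serialized into a string.
# With the default string mapping, ElasticSearch sees only this raw string
# and cannot search or aggregate on e.g. res.statusCode.
message = '{"level":30,"msg":"request completed","res":{"statusCode":200}}'

# Parsed as an object, the inner fields become individually addressable:
parsed = json.loads(message)
print(parsed["res"]["statusCode"])  # 200
```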
There are 2 possible places to change this behavior.
h4. Fluentd
Well, not really possible here:
* There's a default index template that matches every index: https://github.com/openshift/origin-aggregated-logging/blob/master/elasticsearch/index_templates/com.redhat.viaq-openshift-project.template.json#L46
* So, there needs to be a change in ElasticSearch anyway.
* Plus, Fluentd config is set up in a complex way: https://github.com/openshift/openshift-ansible/blob/f1ae5deec6f9f5b6e6f63e88b2d5682ea40234c6/roles/openshift_logging_fluentd/templates/fluent.conf.j2#L42
h4. ElasticSearch
We can change the ElasticSearch default index template set up here: https://github.com/openshift/origin-aggregated-logging/blob/master/elasticsearch/index_templates/com.redhat.viaq-openshift-project.template.json However, we can't merge such a change into that repo. Instead, we can simply call the ElasticSearch API to override the default index template.
ElasticSearch supports multiple index templates and the index template we create will only override the "message" field related mapping (https://www.elastic.co/guide/en/elasticsearch/reference/2.4/indices-templates.html#multiple-templates).
First, log in: {code}
oc login  # with user "admin"
token=$(oc whoami -t)
{code}
Check if our named template exists already: {code} curl -k -H "Authorization: Bearer $token" -H "X-Proxy-Remote-User: $(oc whoami)" -H "X-Forwarded-For: 127.0.0.1" https://es.apb-testing.skunkhenry.com/_template/aerogear_data_sync_log_template {code}
Create or update the index template: {code}
curl -k -H "Authorization: Bearer $token" \
  -H "X-Proxy-Remote-User: $(oc whoami)" \
  -H "X-Forwarded-For: 127.0.0.1" \
  -XPUT https://es.apb-testing.skunkhenry.com/_template/aerogear_data_sync_log_template -d '
{
  "template": "project.datasync.*",
  "order": 100,
  "dynamic_templates": [
    {
      "message_field": {
        "mapping": { "type": "object" },
        "match": "message"
      }
    }
  ],
  "mappings": {
    "message": {
      "enabled": true,
      "properties": {
        "tag": { "type": "string", "index": "not_analyzed" },
        "requestId": { "type": "integer" },
        "operationType": { "type": "string", "index": "not_analyzed" },
        "parentTypeName": { "type": "string", "index": "not_analyzed" },
        "path": { "type": "string", "index": "analyzed" },
        "success": { "type": "boolean" },
        "dataSourceType": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}
'
{code}
Here:
* {{"template" : "project.datasync.*"}} says this template will only match indices for that pattern.
* {{"properties": {...}}} defines the mapping for the relevant fields in the message data. This is required: fields that are not defined here won't be available as separate fields on the ElasticSearch document.
* {{"order" : 100}} tells ElasticSearch to apply this template after the template defined by OpenShift logging, which has an order of 10 (https://github.com/openshift/origin-aggregated-logging/blob/master/elasticsearch/index_templates/com.redhat.viaq-openshift-project.template.json#L936)
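The precedence behavior can be sketched like this (a deliberately simplified Python model of how ElasticSearch combines multiple matching templates, not its actual implementation): all templates whose pattern matches the new index are applied in ascending order, so a higher-order template overrides a lower-order one.

```python
from fnmatch import fnmatch

def effective_settings(index_name, templates):
    """Merge the settings of every template matching index_name.

    Templates are applied in ascending "order", so settings from a
    higher-order template win (simplified model, for illustration only).
    """
    merged = {}
    for tpl in sorted(templates, key=lambda t: t["order"]):
        if fnmatch(index_name, tpl["template"]):
            merged.update(tpl["settings"])
    return merged

templates = [
    # OpenShift's default template: matches every project index, order 10.
    {"template": "project.*", "order": 10, "settings": {"message": "string"}},
    # Our override: matches only datasync indices, order 100.
    {"template": "project.datasync.*", "order": 100, "settings": {"message": "object"}},
]

effective_settings("project.datasync.abc.2018.09.21", templates)
# -> {"message": "object"}  (our override wins for datasync indices)
effective_settings("project.other.abc.2018.09.21", templates)
# -> {"message": "string"}  (other projects keep the default mapping)
```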
Delete the index template (if necessary): {code} curl -k -H "Authorization: Bearer $token" -H "X-Proxy-Remote-User: $(oc whoami)" -H "X-Forwarded-For: 127.0.0.1" -XDELETE https://es.apb-testing.skunkhenry.com/_template/aerogear_data_sync_log_template {code}
h4. How the index template works
After we create the index template above, when a new document arrives, ElasticSearch maps all fields except "message" according to the default template; the "message" field is mapped using the mapping we overrode.
But there's a problem: changing ElasticSearch index templates doesn't affect the mapping definitions of existing indices; only new indices will use our template. So even though a new document written to the "project.datasync.c360bfe5-bbfb-11e8-87ae-fa163e4c9c9e.2018.09.20" index matches our template's pattern, it won't use the new mapping. The solution is to wait until midnight: Fluentd switches to a new index name each day, and that new index will pick up the mapping we defined.
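Since Fluentd derives the index name from the current date, the first index that will pick up a template change made today can be computed like this (a sketch; the index name format is taken from the examples above):

```python
from datetime import date, timedelta

def next_index_name(project, uid, today):
    """Name of tomorrow's index -- the first index created after a template
    change made today, and therefore the first to use the new mapping."""
    tomorrow = today + timedelta(days=1)
    return "project.%s.%s.%s" % (project, uid, tomorrow.strftime("%Y.%m.%d"))

next_index_name("datasync", "c360bfe5-bbfb-11e8-87ae-fa163e4c9c9e", date(2018, 9, 20))
# -> "project.datasync.c360bfe5-bbfb-11e8-87ae-fa163e4c9c9e.2018.09.21"
```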
How to update mappings for the existing indices is still to be documented (possibly by reindexing with an overwrite/copy).
Problem: Stackoverflow problem (TBD)