h2. Enabling centralized logging on OpenShift
h4. How
There's an Ansible playbook in the openshift-ansible repository that enables logging: https://github.com/openshift/openshift-ansible
h4. Minishift
It is not possible to enable it on a cluster set up using Minishift.
* Running Ansible locally and targeting the Minishift cluster is very problematic (it needs RHEL on the local machine, tools are missing, and assumptions about the local machine don't hold).
* Running Ansible within the Minishift VM is also not possible, because that VM lacks many tools (Ansible, Python, Git etc.) and it is very hard to install them.
h4. oc cluster up
Not possible either.
Targeting the local cluster with Ansible sounds good, but it runs into problems similar to the Minishift case.
h4. Remote cluster created using the {{cluster_create}} pipeline
I was able to enable logging on the "apb-testing" cluster Pavel gave me. Access to this cluster requires the Red Hat VPN.
Steps:
1. SSH into the master node:
{code}
ssh hadmin@apb-testing.skunkhenry.com
{code}
Your key should already be in the authorized keys of the hadmin user. The OpenShift console for this cluster is at https://apb-testing.skunkhenry.com:8443/console
2. Clone openshift-ansible:
{code}
git clone https://github.com/openshift/openshift-ansible.git
{code}
3. Check out the relevant tag of that repo:
{code}
# find the correct tag at https://github.com/openshift/openshift-ansible/releases
# that matches the OpenShift version reported by 'oc version'
git checkout openshift-ansible-3.9.33-1
{code}
4. There was a bug in that version of the Ansible role; it is fixed in master. Apply that fix manually according to https://github.com/openshift/openshift-ansible/blob/1cb319e8030961f77d751f4be115fe5ddba89bda/roles/openshift_logging_elasticsearch/handlers/main.yml#L8
5. Log in as system admin:
{code}
oc login -u system:admin
{code}
6. Enable logging:
{code}
ansible-playbook -i /home/hadmin/.config/openshift/hosts ./playbooks/openshift-logging/config.yml \
  -e openshift_logging_install_logging=true \
  -e openshift_logging_es_allow_external=True \
  -e openshift_logging_es_hostname=elasticsearch.example.com
{code}
The cluster is initially created with the {{cluster_create}} pipeline, which stores the Ansible inventory on the master node at /home/hadmin/.config/openshift/hosts
7. Wait until the Ansible playbook completes and all pods are up in OpenShift's {{logging}} project.
8. Update the ElasticSearch route in the {{logging}} project. Change it using the UI from elasticsearch.example.com to something you like, such as es.apb-testing.skunkhenry.com
h2. Disabling OpenShift centralized logging
{code}
ansible-playbook -i /home/hadmin/.config/openshift/hosts ./playbooks/openshift-logging/config.yml \
  -e openshift_logging_install_logging=false
{code}
h2. How it works
* A Fluentd instance is created per node, using a DaemonSet with a node selector.
* Fluentd collects logs from all pods and sends them to ElasticSearch.
* Logs are pushed to different indices:
** Operation logs: pushed to "operation.*" indices. These are Kubernetes infra logs such as container creation, deployments, project creation etc.
** Project logs: pushed to "project.*" indices. These are logs from the user pods, such as the audit logs of the sync service.
We are interested in project logs in our use cases.
Project logs are pushed to indices that include the project name in their name, for example "project.datasync.c360bfe5-bbfb-11e8-87ae-fa163e4c9c9e.2018.09.20". The format is "project.<project name>.<project uid>.<yyyy.mm.dd>". This means all logs from all pods within a single project go to the same index. The pod and the service a log entry came from are, however, recorded in the document itself.
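The naming scheme can be sketched in shell. The values below are the sample ones from this document; on a live cluster the project UID would come from the project's metadata (e.g. via {{oc get project}}), so treat the hard-coded values here as placeholders.

```shell
# Sketch: how Fluentd composes the per-day index name
#   project.<project name>.<project uid>.<yyyy.mm.dd>
# Sample values from this document; on a real cluster the UID comes from the project metadata.
project_name="datasync"
project_uid="c360bfe5-bbfb-11e8-87ae-fa163e4c9c9e"
day="2018.09.20"   # Fluentd uses the current date, e.g. $(date -u +%Y.%m.%d)
index="project.${project_name}.${project_uid}.${day}"
echo "${index}"
```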
Sample document: {code:json}
"_index": "project.datasync.c360bfe5-bbfb-11e8-87ae-fa163e4c9c9e.2018.09.20", "_type": "com.redhat.viaq.common", "_id": "MmNhZTZlZTctZWU5ZS00YzFkLWJjNDQtNjQwYmVhZjc3OTFh", "_score": null, "_source": { "level": "30", "msg": "request completed", "pid": 19, "hostname": "172.16.72.62", "req": { "id": 8453, "method": "GET", "url": "/healthz", "headers": { "host": "10.128.1.223:8000", "user-agent": "kube-probe/1.9", "accept-encoding": "gzip", "connection": "close" }, "remoteAddress": "::ffff:10.128.0.1", "remotePort": 52172 }, "res": { "statusCode": 200, "header": "HTTP/1.1 200 OK\r\nX-Powered-By: Express\r\nAccess-Control-Allow-Origin: *\r\nContent-Type: application/json; charset=utf-8\r\nContent-Length: 53\r\nETag: W/\"35-3EKymgknC0UZgUjhN7E3BXc98h8\"\r\nDate: Thu, 20 Sep 2018 10:30:32 GMT\r\nConnection: close\r\n\r\n" }, "responseTime": 6, "v": 1, "docker": { "container_id": "dffd0934be27113027208ddf4aed233b162fedb3bc758f6cd8956980aa90982f" }, "kubernetes": { "container_name": "data-sync-server", "namespace_name": "datasync", "pod_name": "data-sync-server-2-2nk7p", "pod_id": "df4a7a42-bbfc-11e8-87ae-fa163e4c9c9e", "labels": { "app": "data-sync", "deployment": "data-sync-server-2", "deploymentconfig": "data-sync-server", "service": "data-sync-server" }, "host": "172.16.72.62", "master_url": "https://kubernetes.default.svc.cluster.local", "namespace_id": "c360bfe5-bbfb-11e8-87ae-fa163e4c9c9e" }, "message": "{\"level\":30,\"time\":1537439432240,\"msg\":\"request completed\",\"pid\":19,\"hostname\":\"data-sync-server-2-2nk7p\",\"req\":{\"id\":8453,\"method\":\"GET\",\"url\":\"/healthz\",\"headers\":{\"host\":\"10.128.1.223:8000\",\"user-agent\":\"kube-probe/1.9\",\"accept-encoding\":\"gzip\",\"connection\":\"close\"},\"remoteAddress\":\"::ffff:10.128.0.1\",\"remotePort\":52172},\"res\":{\"statusCode\":200,\"header\":\"HTTP/1.1 200 OK\\r\\nX-Powered-By: Express\\r\\nAccess-Control-Allow-Origin: *\\r\\nContent-Type: application/json; charset=utf-8\\r\\nContent-Length: 
53\\r\\nETag: W/\\\"35-3EKymgknC0UZgUjhN7E3BXc98h8\\\"\\r\\nDate: Thu, 20 Sep 2018 10:30:32 GMT\\r\\nConnection: close\\r\\n\\r\\n\"},\"responseTime\":6,\"v\":1}\n", "pipeline_metadata": { "collector": { "ipaddr4": "10.128.0.7", "ipaddr6": "fe80::e06d:e7ff:fec1:3d8c", "inputname": "fluent-plugin-systemd", "name": "fluentd", "received_at": "2018-09-20T10:30:32.801089+00:00", "version": "0.12.42 1.6.0" } }, "@timestamp": "2018-09-20T10:30:32.240454+00:00", "viaq_msg_id": "MmNhZTZlZTctZWU5ZS00YzFkLWJjNDQtNjQwYmVhZjc3OTFh" }, "fields": { "pipeline_metadata.collector.received_at": [ 1537439432801 ], "@timestamp": [ 1537439432240 ] }, "sort": [ 1537439432240 ] } {code}
h2. Problem: logs from pods are treated as strings, even when they're JSON
By default, OpenShift logging is set up in a way that causes ElasticSearch to treat log messages as strings (https://github.com/openshift/origin-aggregated-logging/blob/master/elasticsearch/index_templates/com.redhat.viaq-openshift-project.template.json#L46). But in our case, the sync audit logs are JSON messages and we want them treated as objects, so that we can search and aggregate on their properties.
There are 2 possible places to change this behavior.
h4. Fluentd
Well, not really possible here:
* There's a default index template that matches every index: https://github.com/openshift/origin-aggregated-logging/blob/master/elasticsearch/index_templates/com.redhat.viaq-openshift-project.template.json#L46
* So, there needs to be a change in ElasticSearch anyway.
* Plus, the Fluentd config is set up in a complex way: https://github.com/openshift/openshift-ansible/blob/f1ae5deec6f9f5b6e6f63e88b2d5682ea40234c6/roles/openshift_logging_fluentd/templates/fluent.conf.j2#L42
h4. ElasticSearch
We can change the default ElasticSearch index template set up here: https://github.com/openshift/origin-aggregated-logging/blob/master/elasticsearch/index_templates/com.redhat.viaq-openshift-project.template.json However, we can't merge such a change into that repo. Instead, we can simply call the ElasticSearch API to override the default index template.
ElasticSearch supports multiple index templates and the index template we create will only override the "message" field related mapping (https://www.elastic.co/guide/en/elasticsearch/reference/2.4/indices-templates.html#multiple-templates).
First, login:
{code}
oc login   # with user "admin"
token=$(oc whoami -t)
{code}
Check if our named template already exists:
{code}
curl -k -H "Authorization: Bearer $token" -H "X-Proxy-Remote-User: $(oc whoami)" -H "X-Forwarded-For: 127.0.0.1" \
  https://es.apb-testing.skunkhenry.com/_template/aerogear_data_sync_log_template
{code}
Create or update the index template:
{code}
curl -k -H "Authorization: Bearer $token" -H "X-Proxy-Remote-User: $(oc whoami)" -H "X-Forwarded-For: 127.0.0.1" \
  -XPUT https://es.apb-testing.skunkhenry.com/_template/aerogear_data_sync_log_template -d '
{
  "template": "project.datasync.*",
  "order": 100,
  "dynamic_templates": [
    {
      "message_field": {
        "mapping": { "type": "object" },
        "match": "message"
      }
    }
  ],
  "mappings": {
    "message": {
      "enabled": true,
      "properties": {
        "tag": {"type": "string", "index": "not_analyzed"},
        "requestId": {"type": "integer"},
        "operationType": {"type": "string", "index": "not_analyzed"},
        "parentTypeName": {"type": "string", "index": "not_analyzed"},
        "path": {"type": "string", "index": "analyzed"},
        "success": {"type": "boolean"},
        "dataSourceType": {"type": "string", "index": "not_analyzed"}
      }
    }
  }
}
'
{code}
Here:
* {{"template" : "project.datasync.*"}} says this template only matches indices with that pattern
* {{"properties": {...}}} defines the mapping for the relevant fields in the message data. This is required: fields that are not defined here won't be available as separate fields on the ElasticSearch document.
* {{"order" : 100}} tells ElasticSearch to apply this template after the one defined by OpenShift logging, which has an order of 10 (https://github.com/openshift/origin-aggregated-logging/blob/master/elasticsearch/index_templates/com.redhat.viaq-openshift-project.template.json#L936)
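Before sending the PUT, it can help to sanity-check the template body locally. This is a sketch with an abbreviated field list and an arbitrary file path; it only verifies the body parses as JSON, not that ElasticSearch accepts it.

```shell
# Write an (abbreviated) index template body to a local file and check it is
# valid JSON before PUTting it to ElasticSearch. The file path is arbitrary.
cat > /tmp/aerogear_data_sync_log_template.json <<'EOF'
{
  "template": "project.datasync.*",
  "order": 100,
  "dynamic_templates": [
    { "message_field": { "mapping": { "type": "object" }, "match": "message" } }
  ],
  "mappings": {
    "message": {
      "enabled": true,
      "properties": {
        "tag": {"type": "string", "index": "not_analyzed"},
        "requestId": {"type": "integer"},
        "operationType": {"type": "string", "index": "not_analyzed"},
        "success": {"type": "boolean"}
      }
    }
  }
}
EOF
python3 -m json.tool < /tmp/aerogear_data_sync_log_template.json > /dev/null && echo "template body is valid JSON"
```

curl can then read the body from the file with {{-d @/tmp/aerogear_data_sync_log_template.json}} instead of an inline string.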
Delete the index template (if necessary):
{code}
curl -k -H "Authorization: Bearer $token" -H "X-Proxy-Remote-User: $(oc whoami)" -H "X-Forwarded-For: 127.0.0.1" \
  -XDELETE https://es.apb-testing.skunkhenry.com/_template/aerogear_data_sync_log_template
{code}
h4. How the index template works
After we create the index template above, when a document arrives, ElasticSearch will map all fields except "message" according to the default template; the "message" field will be mapped using the mapping we overrode.
But there's a problem: changing ElasticSearch index templates won't affect the mapping definitions of existing indices. Only new indices will use our template. So even though a new document written to the "project.datasync.c360bfe5-bbfb-11e8-87ae-fa163e4c9c9e.2018.09.20" index matches our pattern, it won't use the new mapping. The solution is to wait until midnight: Fluentd switches to a new index name nightly, and the new index will pick up the mapping we defined.
If we want to update mappings for the existing indices, we need to reindex them into new indices, because ElasticSearch doesn't allow mapping changes on existing indices.
However, Fluentd will still write to the old index name, and I could find no way to tell Fluentd to use a new one (without a big hassle). It simply uses project.<OpenShift project name>.<OpenShift project UID>.<yyyy.mm.dd>, and since the project name and UID are unchanged, it keeps using the same index name.
This is an OK scenario when the project is created after enabling OpenShift logging and applying the ElasticSearch index template override.
For existing projects, we should just wait until midnight. Previous indices can also be converted to the new mapping, since no new data will arrive in them. The procedure is to reindex each previous index into a new index with a similar name and then delete the old one. Kibana works with index patterns (like "project.datasync.*"), which is why the new index names should be similar to the old ones and still match the index pattern.
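The reindex-then-delete procedure can be sketched as a dry run. The ES URL and index names below are placeholders taken from this document's examples, and the curl commands are only echoed here so the destructive delete can be reviewed before running for real (remove the {{echo}} and add the auth headers from the earlier steps to execute them).

```shell
# Dry-run sketch of migrating a previous index to a new name that still matches
# the Kibana index pattern (project.datasync.*). URL and names are placeholders.
ES_URL="https://es.apb-testing.skunkhenry.com"
OLD_INDEX="project.datasync.c360bfe5-bbfb-11e8-87ae-fa163e4c9c9e.2018.09.20"
NEW_INDEX="${OLD_INDEX}.reindexed"   # still matches project.datasync.*

# 1. Reindex old -> new; the new index picks up the overridden template
echo curl -k -XPOST "${ES_URL}/_reindex" \
  -d "{\"source\":{\"index\":\"${OLD_INDEX}\"},\"dest\":{\"index\":\"${NEW_INDEX}\"}}"

# 2. After verifying document counts match, delete the old index
echo curl -k -XDELETE "${ES_URL}/${OLD_INDEX}"
```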
h4. Problem: index template too narrow/wide
In the index template above, we define a pattern to match indices: {{"template" : "project.datasync.*"}}
If this is too wide, like {{"template" : "project.*"}}, it will match the indices of all projects/pods and apply the "message" mapping to all of them. This is not ideal. I can't be 100% sure of the consequences, but here are some guesses:
* I think ElasticSearch is smart enough not to apply the mapping when "message" is not JSON, but this might cause problems eventually. I haven't tried this with non-JSON log messages yet.
* If a log message is JSON but irrelevant to us (say, from a 3rd-party project), we would make ElasticSearch parse that JSON even though it isn't needed.
* Sparsely indexed fields that have values in some documents (e.g. the "operationType" field) but not in others are bad for performance.
On the other hand, if the match pattern is too narrow, like {{"template" : "project.datasync.*"}}, there's no guarantee that the index name will match it. As described earlier, Fluentd uses the project name in the index name, and the project name is completely up to the user provisioning the sync service. If a user has to create our ElasticSearch index template manually, we could provide instructions to use the same name. If we're going to automate this, we need to do it in a smart way, so that the project name can be fetched and used in the ElasticSearch index template.
I asked this SO question to see if it is possible to make an index template match documents rather than indices, by a field value (e.g. documents from the sync audit logs always have {{"service": "data-sync-server"}}): https://stackoverflow.com/questions/52424902/elasticsearch-template-matcing-based-on-field-value
h4. Additional operations
Getting mappings for an index:
{code}
curl -k -H "Authorization: Bearer $token" -H "X-Proxy-Remote-User: $(oc whoami)" -H "X-Forwarded-For: 127.0.0.1" \
  https://es.apb-testing.skunkhenry.com/project.datasync.*/_mapping

# OR, more specific:
curl -k -H "Authorization: Bearer $token" -H "X-Proxy-Remote-User: $(oc whoami)" -H "X-Forwarded-For: 127.0.0.1" \
  https://es.apb-testing.skunkhenry.com/project.datasync.c360bfe5-bbfb-11e8-87ae-fa163e4c9c9e.2018.09.20/_mapping
{code}
Reindexing:
{code}
curl -k -H "Authorization: Bearer $token" -H "X-Proxy-Remote-User: $(oc whoami)" -H "X-Forwarded-For: 127.0.0.1" \
  -XPOST https://es.apb-testing.skunkhenry.com/_reindex?pretty -d '
{
  "source": { "index": "project.datasync.c360bfe5-bbfb-11e8-87ae-fa163e4c9c9e.2018.09.20" },
  "dest": { "index": "project.datasync.ali09" }
}
'
{code}
Kill the Fluentd instances:
{code}
oc patch ds logging-fluentd -p '{"spec":{"template":{"spec":{"nodeSelector":{"nonexistlabel":"true"}}}}}'
{code}
Restart the Fluentd instances:
{code}
oc patch ds logging-fluentd -p '{"spec":{"template":{"spec":{"nodeSelector":{"logging-infra-fluentd":"true"}}}}}'
{code}
h2. References
- https://docs.openshift.com/container-platform/3.9/install_config/aggregate_logging.html
- https://developers.redhat.com/blog/2018/01/22/openshift-structured-application-logs/
- https://github.com/openshift/openshift-ansible/blob/f1ae5deec6f9f5b6e6f63e88b2d5682ea40234c6/roles/openshift_logging_elasticsearch/templates/elasticsearch.yml.j2#L26
- https://github.com/openshift/origin-aggregated-logging/blob/master/elasticsearch/index_templates/com.redhat.viaq-openshift-project.template.json
- https://www.elastic.co/guide/en/elasticsearch/reference/2.4/indices-templates.html#multiple-templates
- https://stackoverflow.com/questions/52424902/elasticsearch-template-matcing-based-on-field-value