Ingesting Logs into OCI Logging Analytics (via Agent Based Deployment)

Logs are often voluminous and can be challenging to navigate, but they can be a gold mine of valuable data that helps administrators troubleshoot and identify issues or trends in operational activities.

To overcome the burden of manually eyeballing millions (or even billions) of log records, bringing that data into OCI Logging Analytics (which is part of the Observability & Management portfolio) allows administrators to get quick insights, reduce the time to isolate issues, minimise downtime and prevent impact to end users.

OCI Logging Analytics leverages machine learning under the covers. You don’t need to be a data scientist, as the correlation, clustering and anomaly modelling are built in to the platform and ready to use. It also supports both Oracle and heterogeneous (non-Oracle, third-party) log sources. For details on the out-of-the-box log sources we support, please see:

https://docs.oracle.com/en-us/iaas/logging-analytics/doc/oracle-defined-sources.html

NOTE:
We will continue to grow the out-of-the-box log parsers and sources as demand grows. Alternatively, if you have a bespoke log parser or source that we have not yet defined, you can create your own custom one.

For details please visit:
https://docs.oracle.com/en-us/iaas/logging-analytics/doc/administration-guide.html

To bring logs into OCI Logging Analytics, the Management Agent software OS owner “mgmt_agent” needs read access to the logs owned by another OS user (eg. oracle).

In this example I will show you how to achieve this and enable log ingestion for the Oracle Database Alert Log source into OCI Logging Analytics.

1 – PREREQUISITES

  1. Install Management Agent
  2. Deploy the Plug-in for Logging Analytics to Management Agent
  3. Ensure you have the correct IAM Policies set for Logging Analytics

2 – INSTALL AND VERIFY ACL PACKAGE

  1. Verify ACL package is installed

By default, the diag directory for each Oracle database instance is not readable by other users or groups:

$ ls -ld /u01/app/database/diag/rdbms/*
drwxr-x---. 3 oracle oinstall 34 Nov 17  2020 /u01/app/database/diag/rdbms/db19c
drwxr-x---. 3 oracle oinstall 34 Oct 11  2019 /u01/app/database/diag/rdbms/emrep
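
The step above asks you to verify the ACL package, so here is a minimal sketch of one way to check it. It assumes an RPM-based host such as Oracle Linux, where the setfacl and getfacl tools normally come from the "acl" package. The function name missing_tools is my own and purely illustrative.

```shell
# Print the name of every tool in the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools setfacl getfacl)
if [ -z "$missing" ]; then
    echo "ACL tools found: setfacl and getfacl are on PATH"
else
    # Assumption: an RPM-based distribution where the package is named "acl".
    echo "Missing ACL tools:" $missing
    echo "Check the package with:   rpm -q acl"
    echo "Install it with:          sudo yum install -y acl"
fi
```

If both tools are reported as found, you can move straight on to the setfacl step.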

3 – SET AND VERIFY FILE ACCESS PERMISSIONS USING SETFACL

  1. When using setfacl, you need to consider the appropriate permissions for your file access.

    Please see below on how you would use setfacl to configure file permissions to enable the mgmt_agent OS user to access the files for log ingestion.
  • The -R option: apply the change recursively to existing subdirectories and files
  • The -m option: modify the access control list
  • For traversal access on subdirectories, “r” alone is not enough: you need “r-x”, because the execute bit is what allows a directory to be entered
  • For new files generated later, you need to specify the default permission “d:u:mgmt_agent:r-x”
$ sudo setfacl -Rm u:mgmt_agent:r-x,d:u:mgmt_agent:r-x /u01/app/database/diag/rdbms/db19c
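
If you have several database homes to open up, it can help to build the ACL specification once and review the resulting commands before running them. The following is a small dry-run sketch: acl_spec is a hypothetical helper name, and the two directories are simply the ones from the listing earlier in this post.

```shell
# Build the "access entry + default entry" ACL spec used in the command above.
#   $1 = OS user, $2 = permissions (for example r-x)
acl_spec() {
    printf 'u:%s:%s,d:u:%s:%s\n' "$1" "$2" "$1" "$2"
}

# Dry run: only print the commands so they can be reviewed first.
for dir in /u01/app/database/diag/rdbms/db19c /u01/app/database/diag/rdbms/emrep; do
    echo "sudo setfacl -Rm $(acl_spec mgmt_agent r-x) $dir"
done
```

Once the printed commands look right, they can be pasted into the shell or the echo can be removed.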

2. Verify that the appropriate permissions are set for mgmt_agent OS user to access files

$ sudo getfacl /u01/app/database/diag/rdbms/db19c
getfacl: Removing leading '/' from absolute path names
# file: u01/app/database/diag/rdbms/db19c
# owner: oracle
# group: oinstall
user::rwx
user:mgmt_agent:r-x
group::r-x
mask::r-x
other::---
default:user::rwx
default:user:mgmt_agent:r-x
default:group::r-x
default:mask::r-x
default:other::---

$ sudo getfacl /u01/app/database/diag/rdbms/db19c/db19c
getfacl: Removing leading '/' from absolute path names
# file: u01/app/database/diag/rdbms/db19c/db19c
# owner: oracle
# group: oinstall
user::rwx
user:mgmt_agent:r-x
group::r-x
mask::r-x
other::---
default:user::rwx
default:user:mgmt_agent:r-x
default:group::r-x
default:mask::r-x
default:other::---

$ sudo getfacl /u01/app/database/diag/rdbms/db19c/db19c/trace
getfacl: Removing leading '/' from absolute path names
# file: u01/app/database/diag/rdbms/db19c/db19c/trace
# owner: oracle
# group: oinstall
user::rwx
user:mgmt_agent:r-x
group::r-x
mask::r-x
other::---
default:user::rwx
default:user:mgmt_agent:r-x
default:group::r-x
default:mask::r-x
default:other::---

$ sudo getfacl /u01/app/database/diag/rdbms/db19c/db19c/trace/alert_*.log
getfacl: Removing leading '/' from absolute path names
# file: u01/app/database/diag/rdbms/db19c/db19c/trace/alert_db19c.log
# owner: oracle
# group: oinstall
user::rw-
user:mgmt_agent:r-x
group::r--
mask::r-x
other::---
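
Reading through several getfacl listings by eye gets tedious, so here is a rough sketch of how the same check could be scripted. The helper names has_acl_entry and has_default_acl_entry are my own. The sketch only looks for the entry text and does not evaluate the mask, so an entry that getfacl marks with "#effective:" still counts as present.

```shell
# Succeed if getfacl output on stdin has an access entry for the given user.
#   $1 = OS user, $2 = permissions (for example r-x)
has_acl_entry() {
    grep -q "^user:$1:$2"
}

# Succeed if getfacl output on stdin has a default entry for the given user.
has_default_acl_entry() {
    grep -q "^default:user:$1:$2"
}

# Sample input copied from the directory listing above.
sample='user::rwx
user:mgmt_agent:r-x
group::r-x
mask::r-x
other::---
default:user::rwx
default:user:mgmt_agent:r-x'

if printf '%s\n' "$sample" | has_acl_entry mgmt_agent r-x; then
    echo "access entry present"
fi
if printf '%s\n' "$sample" | has_default_acl_entry mgmt_agent r-x; then
    echo "default entry present"
fi
```

On a real host you would pipe the live output instead, for example: sudo getfacl /u01/app/database/diag/rdbms/db19c | has_acl_entry mgmt_agent r-x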


3. Confirm that you can log on as the mgmt_agent OS user and can access and read the files.

$ sudo -u mgmt_agent /bin/bash

bash-4.2$ id
uid=985(mgmt_agent) gid=980(mgmt_agent) groups=980(mgmt_agent),1000(opc) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
bash-4.2$ ls -ld /u01/app/database/diag/rdbms/db19c
drwxr-x---+ 3 oracle oinstall 34 Nov 17  2020 /u01/app/database/diag/rdbms/db19c
bash-4.2$ ls -ld /u01/app/database/diag/rdbms/db19c/db19c
drwxr-x---+ 16 oracle oinstall 4096 Nov 17  2020 /u01/app/database/diag/rdbms/db19c/db19c
bash-4.2$ ls -ld /u01/app/database/diag/rdbms/db19c/db19c/trace
drwxr-x---+ 2 oracle oinstall 28672 Jul 13 05:00 /u01/app/database/diag/rdbms/db19c/db19c/trace
bash-4.2$ ls -ld /u01/app/database/diag/rdbms/db19c/db19c/trace/alert_*.log
-rw-r-x---+ 1 oracle oinstall 606027 Jul 13 13:48 
bash-4.2$ tail /u01/app/database/diag/rdbms/db19c/db19c/trace/alert_db19c.log
2021-07-13T13:15:25.412008+00:00
Thread 1 cannot allocate new log, sequence 79
Checkpoint not complete
  Current log# 3 seq# 78 mem# 0: /u01/app/database/oradata/DB19C/redo03.log
2021-07-13T13:16:27.199618+00:00
Thread 1 advanced to log sequence 79 (LGWR switch),  current SCN: 34375030
  Current log# 1 seq# 79 mem# 0: /u01/app/database/oradata/DB19C/redo01.log
2021-07-13T13:48:28.031231+00:00
Thread 1 advanced to log sequence 80 (LGWR switch),  current SCN: 34375038
  Current log# 2 seq# 80 mem# 0: /u01/app/database/oradata/DB19C/redo02.log

bash-4.2$ exit
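
If the tail above fails with "Permission denied", a common cause is a parent directory that the mgmt_agent user cannot traverse. The sketch below is one way to find it: it walks the path from the root and prints the first directory without execute permission for the current user. The function name first_blocked_dir is hypothetical, and the sketch assumes the path contains no wildcard characters.

```shell
# Print the first directory on the way to $1 that the current user cannot
# traverse; print nothing if every existing directory is traversable.
first_blocked_dir() {
    path=$1
    current=""
    # Split the absolute path on "/" and test each prefix in turn.
    old_ifs=$IFS
    IFS=/
    set -- $path
    IFS=$old_ifs
    for part in "$@"; do
        [ -n "$part" ] || continue
        current="$current/$part"
        if [ -d "$current" ] && [ ! -x "$current" ]; then
            printf '%s\n' "$current"
            return 0
        fi
    done
}

blocked=$(first_blocked_dir /u01/app/database/diag/rdbms/db19c/db19c/trace/alert_db19c.log)
if [ -n "$blocked" ]; then
    echo "Cannot traverse: $blocked"
else
    echo "No blocked directory found for the current user"
fi
```

To test the access of mgmt_agent rather than your own, save the snippet to a file and run it from the "sudo -u mgmt_agent /bin/bash" session shown above.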

$ sudo systemctl restart mgmt_agent
$ sudo systemctl status mgmt_agent
● mgmt_agent.service - mgmt_agent
   Loaded: loaded (/etc/systemd/system/mgmt_agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2021-07-13 13:54:24 GMT; 5min ago
  Process: 26388 ExecStop=/opt/oracle/mgmt_agent/agent_inst/bin/agentcore stop sysd (code=exited, status=0/SUCCESS)
  Process: 26508 ExecStart=/opt/oracle/mgmt_agent/agent_inst/bin/agentcore start sysd (code=exited, status=0/SUCCESS)
 Main PID: 26586 (wrapper)
    Tasks: 79
   Memory: 435.9M
   CGroup: /system.slice/mgmt_agent.service
           ├─26586 /opt/oracle/mgmt_agent/agent_inst/bin/./wrapper /opt/oracle/mgmt_agent/agent_inst/bin/../config/wrapper.conf wrapper.syslog.ident=mgmt_agent wrapper.pidfile=/opt/oracle/mgmt_a...
           └─26605 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.292.b10-1.el7_9.x86_64/jre/bin/java -Dorg.tanukisoftware.wrapper.WrapperSimpleApp.maxStartMainWait=5 -Djava.security.egd=file:///dev/./ur...

Jul 13 13:54:12 xxxdemo.domain.com systemd[1]: Starting mgmt_agent...
Jul 13 13:54:12 xxxdemo.domain.com agentcore[26508]: Starting mgmt_agent...
Jul 13 13:54:18 xxxdemo.domain.com agentcore[26508]: Waiting for mgmt_agent.........
Jul 13 13:54:24 xxxdemo.domain.com agentcore[26508]: .....running: PID:26586
Jul 13 13:54:24 xxxdemo.domain.com systemd[1]: Started mgmt_agent.

4 – CREATE ENTITY IN OCI LOGGING ANALYTICS

  1. In OCI Console, navigate to:

    OBSERVABILITY & MANAGEMENT > LOGGING ANALYTICS > ADMINISTRATION > CREATE ENTITY
  2. Complete the fields for creating the entity and then click on “Create”:
  • Name (Name of the Database)
  • Management Agent Compartment (Compartment where your Agent is permitted to upload to)
  • Properties for the database configuration


3. Review that the Entity has been created

5 – ASSOCIATE ENTITY WITH LOG SOURCE

  1. From the menu, choose “SOURCES”

(HINT: OBSERVABILITY & MANAGEMENT > LOGGING ANALYTICS > ADMINISTRATION > SOURCES)

2. Search for “Database Alert” and drill into the “Database Alert Logs”

3. Click on the “Unassociated Entities”

4. Check the box for your database entity and click on “Check Association”

5. Then choose the Log Group Compartment where the ingested logs will be placed and specify the Log Group in which to store them.
Then click on “Submit”

NOTE: If you don’t have a Log Group created, click “Create Log Group” to create.

6. Return to “Associated Entities” to confirm the Database Entity is now associated.


7. Wait until the association is completed and the status shows as “Success”

8. Navigate to the “Agent Collection Warnings” page and validate that there are no errors or issues reported.

9. Navigate from the menu to Log Explorer

10. Change the time period to “Last 7 Days” and filter on the Log Source for “Database Alert Logs” to confirm logs are now getting ingested into OCI Logging Analytics.

Filter:
'Log Source' = 'Database Alert Logs' | stats count as logrecords by 'Log Source' | sort -logrecords

6 – UNCOVER POTENTIAL ISSUES IN LOGS

1. Navigate to:
Visualisations and click on the Cluster icon

2. This view shows common log patterns grouped together into what are known as “Clusters”. Now navigate to the “Potential Issues” tab

3. The following Potential Issues are listed for you to resolve.

OCI Observability & Management Platform (O&M) – Agent Based Monitoring

There are various ways you can bring telemetry and operational data into OCI Observability & Management (O&M) to proactively monitor and gain operational insights into your IT fleet.

Examples of ways you can do this are:

  • Service Connector Hub – Route and move data from one OCI service to another OCI service (eg. OCI Logging to Logging Analytics)
  • API Call – Collect data from files stored on Object Storage or Upload Log data on demand
  • Agent Based – Deployment of Agent on Host

If you have targets you want to monitor on-premises or in the cloud (OCI, AWS, Azure etc.) and you have access to the VM or Compute instance (ie. you can SSH or Remote Desktop to the host), then an Agent-based method will allow you to collect that data and bring it into a unified platform in O&M.

In this example we will show how you can deploy the Agent-based method (on Linux OS) so you can leverage the O&M services including:

  • Logging Analytics
  • DB Management
  • Operations Insights
  • Java Management Service

1 – NETWORK COMMUNICATION (For External Targets to OCI)

NOTE: The additional network communication setup is not required if the targets you are monitoring are within your OCI tenancy account.

2 – ADDITIONAL PRE-REQUISITES

For the setup of compartments, IAM groups and policies, please also check that the following tasks have been completed:
https://docs.oracle.com/en-us/iaas/management-agents/doc/perform-prerequisites-deploying-management-agents.html

NOTE: You may need to contact your OCI administrator to grant you the appropriate permissions.

3 – DOWNLOAD AND CREATE KEY

  1. From OCI Console navigate to:

OBSERVABILITY & MANAGEMENT > MANAGEMENT AGENTS > DOWNLOADS AND KEYS > CREATE KEY

2. Specify details and Click on CREATE

  • Key Name (eg. oci-reg-key)
  • Compartment (eg. shared_resources)

3. Review Key and Download Key to File (eg. oci-reg-key.txt)

NOTE: Your Key File will be in the format of <Key Name>.txt. Copy it to your target host.

4. Download Agent by clicking on the Agent for your OS (eg. Agent for LINUX) and copy to your target host

Alternatively you can download the agent file using wget:
wget https://objectstorage.<oci-region>.oraclecloud.com/n/idtskf8cjzhp/b/installer/o/Linux-x86_64/latest/oracle.mgmt_agent.rpm 

Example:
wget https://objectstorage.ap-sydney-1.oraclecloud.com/n/idtskf8cjzhp/b/installer/o/Linux-x86_64/latest/oracle.mgmt_agent.rpm 
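
If you install agents in more than one region, the URL pattern above is easy to wrap in a small helper. This is only a convenience sketch: agent_rpm_url is a name I made up, and the URL is exactly the pattern quoted above with the region substituted in.

```shell
# Print the agent RPM download URL for a given OCI region identifier.
agent_rpm_url() {
    printf 'https://objectstorage.%s.oraclecloud.com/n/idtskf8cjzhp/b/installer/o/Linux-x86_64/latest/oracle.mgmt_agent.rpm\n' "$1"
}

# Dry run: print the download command for the region used in the example.
echo "wget $(agent_rpm_url ap-sydney-1)"
```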

4 – INSTALL AGENT

1. Login to the host and locate the downloaded agent file oracle.mgmt_agent.rpm

$ sudo rpm -ivh oracle.mgmt_agent.rpm
Preparing...                          ################################# [100%]
Checking pre-requisites
        Checking if any previous agent service exists
        Checking if OS has systemd or initd
        Checking available disk space for agent install
        Checking if /opt/oracle/mgmt_agent directory exists
        Checking if 'mgmt_agent' user exists
        Checking Java version
                JAVA_HOME is not set or not readable to root
                Trying default path /usr/bin/java
                Java version: 1.8.0_271 found at /usr/bin/java
Updating / installing...
   1:oracle.mgmt_agent-201113.1621-1  ################################# [100%]

Executing install
        Unpacking software zip
        Copying files to destination dir (/opt/oracle/mgmt_agent)
        Initializing software from template
        Creating 'mgmt_agent' daemon
        Agent Install Logs: /opt/oracle/mgmt_agent/installer-logs/installer.log.0

        Setup agent using input response file (run as any user with 'sudo' privileges)
        Usage:
                sudo /opt/oracle/mgmt_agent/agent_inst/bin/setup.sh opts=[FULL_PATH_TO_INPUT.RSP]

Agent install successful


2. Verify that the agent has been installed.

$ rpm -qa|grep mgmt_agent
oracle.mgmt_agent-201113.1621-1.x86_64

3. Copy the downloaded key file (eg. oci-reg-key.txt) to use as the response file

$ cp oci-reg-key.txt /tmp/input.rsp
$ chmod 755 /tmp/input.rsp

4. Update the parameter CredentialWalletPassword with your own password in the input.rsp file and then save file.

CredentialWalletPassword = YourP8ssW0rd123!
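
If you would rather script this edit than open the file in an editor, something along these lines could work. It is a sketch under assumptions: set_rsp_param is a hypothetical helper, and it assumes the response file holds one "Name = value" pair per line. A line that is already present for the parameter is replaced, otherwise the parameter is appended.

```shell
# Set "key = value" in a response file, replacing an existing line for the
# same key or appending one if the key is not present.
#   $1 = file, $2 = parameter name, $3 = value
set_rsp_param() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    found=0
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            "$key"\ =*|"$key"=*)
                printf '%s = %s\n' "$key" "$value"
                found=1
                ;;
            *)
                printf '%s\n' "$line"
                ;;
        esac
    done < "$file" > "$tmp"
    [ "$found" -eq 1 ] || printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demo on a throwaway file (placeholder values, not a real key).
demo=$(mktemp)
printf 'ManagementAgentInstallKey = example-key\nCredentialWalletPassword = \n' > "$demo"
set_rsp_param "$demo" CredentialWalletPassword 'ExamplePassw0rd!'
cat "$demo"
rm -f "$demo"
```

Against the real file the call would be: set_rsp_param /tmp/input.rsp CredentialWalletPassword 'your own password'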

5. Then execute the setup script to set up the agent using the response file

$ sudo /opt/oracle/mgmt_agent/agent_inst/bin/setup.sh opts=/tmp/input.rsp

6. When completed, check status of agent on host

For Oracle Linux 6: sudo /sbin/initctl status mgmt_agent
For Oracle Linux 7 or later: sudo systemctl status mgmt_agent

$ sudo systemctl status mgmt_agent
● mgmt_agent.service - mgmt_agent
   Loaded: loaded (/etc/systemd/system/mgmt_agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2020-12-03 05:20:43 GMT; 6min ago
  Process: 3072 ExecStart=/opt/oracle/mgmt_agent/agent_inst/bin/agentcore start sysd (code=exited, status=0/SUCCESS)
 Main PID: 3148 (wrapper)
   Memory: 248.5M
   CGroup: /system.slice/mgmt_agent.service
           ├─3148 /opt/oracle/mgmt_agent/agent_inst/bin/./wrapper /opt/oracle/mgmt_agent/agent_inst/bin/../config/wrapper.conf wrapper.syslog.ident=mgmt_agent wrapper.pidfile=/opt/oracle/mgmt_agent/agent_inst/bin/../log/mgmt_agent.pid wrapper.daemonize=TRU...
           └─3163 /usr/java/jre1.8.0_271-amd64/bin/java -Dorg.tanukisoftware.wrapper.WrapperSimpleApp.maxStartMainWait=5 -Djava.security.egd=file:///dev/./urandom -XX:+HeapDumpOnOutOfMemoryError -Xmx512m -Djava.library.path=../../201113.1621/lib -classpath...

Dec 03 05:20:31 oma-host systemd[1]: Starting mgmt_agent...
Dec 03 05:20:31 oma-host agentcore[3072]: Starting mgmt_agent...
Dec 03 05:20:38 oma-host agentcore[3072]: Waiting for mgmt_agent.........
Dec 03 05:20:43 oma-host systemd[1]: Started mgmt_agent.
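
As the log lines above show, the agent takes a few seconds to come up, so a script that checks the status straight after setup may be too early. Here is a minimal retry sketch. The helper name wait_until is my own, and the systemctl call in the comment assumes a systemd host (Oracle Linux 7 or later).

```shell
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Succeeds as soon as the command succeeds, fails if every attempt fails.
wait_until() {
    tries=$1 delay=$2
    shift 2
    i=1
    while [ "$i" -le "$tries" ]; do
        "$@" && return 0
        [ "$i" -lt "$tries" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# On the agent host this could be used as:
#   wait_until 12 5 systemctl is-active --quiet mgmt_agent && echo "agent is running"
# Harmless demonstration that runs anywhere:
if wait_until 3 0 true; then
    echo "check passed"
fi
```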

5 – VERIFY AGENT IN CONSOLE AND DEPLOY PLUGIN

  1. In OCI Console, navigate to:
    OBSERVABILITY & MANAGEMENT > MANAGEMENT AGENTS > AGENTS

    Then click on the link to drill into the Agent (eg. Agent (snoopy))

2. Click on the Deploy Plug-Ins button

3. Choose the Plug-ins to deploy for your agent.

NOTE: If the plug-in is greyed out, then the plug-in is already enabled.

Now you should be ready to configure your service.

For further details please visit:
https://docs.oracle.com/en-us/iaas/Content/services.htm

#DaysOfArm (14 of X)

This is my 14th #DaysOfArm article that tracks some of the experiences that I’ve had so far. And just to recap from the first post (here) on June 12 2021.

It’s been just over 2 weeks since the launch of Ampere Arm deployed in Oracle Cloud Infrastructure (OCI). Check this article out to learn more (here). And it’s been about one week since I started looking into the new architecture and deployment, since I started provisioning the VM.Standard.A1.Flex Compute Shape on OCI and since I started migrating a specific application that has many different variations to it to test it all out.

This is my next learning, where I’ve successfully deployed openrouteservice, an open-source routing and directions API, on 4 OCPUs with 24 GB of RAM in an Always Free Tier tenancy.

#DaysOfArm (13 of X)

This is my 13th #DaysOfArm article that tracks some of the experiences that I’ve had so far. And just to recap from the first post (here) on June 12 2021.

It’s been just over 2 weeks since the launch of Ampere Arm deployed in Oracle Cloud Infrastructure (OCI). Check this article out to learn more (here). And it’s been about one week since I started looking into the new architecture and deployment, since I started provisioning the VM.Standard.A1.Flex Compute Shape on OCI and since I started migrating a specific application that has many different variations to it to test it all out.

My next learning is another retrospective on the OCI Arcade deployment: the full stack is now deployed on 1 OCPU with 6 GB of RAM in an Always Free Tier tenancy.

#DaysOfArm (12 of X)

This is my 12th #DaysOfArm article that tracks some of the experiences that I’ve had so far. And just to recap from the first post (here) on June 12 2021.

It’s been just over 2 weeks since the launch of Ampere Arm deployed in Oracle Cloud Infrastructure (OCI). Check this article out to learn more (here). And it’s been about one week since I started looking into the new architecture and deployment, since I started provisioning the VM.Standard.A1.Flex Compute Shape on OCI and since I started migrating a specific application that has many different variations to it to test it all out.

This is my next learning, where I’ve successfully deployed Pelias, an open-source geocoding API, on 4 OCPUs with 24 GB of RAM in an Always Free Tier tenancy.

#DaysOfArm (11 of X)

This is my 11th #DaysOfArm article that tracks some of the experiences that I’ve had so far. And just to recap from the first post (here) on June 12 2021.

It’s been just over 2 weeks since the launch of Ampere Arm deployed in Oracle Cloud Infrastructure (OCI). Check this article out to learn more (here). And it’s been about one week since I started looking into the new architecture and deployment, since I started provisioning the VM.Standard.A1.Flex Compute Shape on OCI and since I started migrating a specific application that has many different variations to it to test it all out.

This is my next learning, which focuses on Arm’s availability in our cloud.

Using OCI Burstable Instance

With the work that I’ve been doing with Open Street Map (here), I’ve been provisioning Pelias (here) – an open-source implementation of geocoding. This architecture is not small (consisting of 10+ docker images, and potentially 100+GB of raw geo data) especially if you are looking to geocode the whole world. The workload (or pipeline) had 4 main stages – download, prepare, import and query.

  • Download – to get the raw data sources
  • Prepare – to get the raw data into a format that can be easily imported
  • Import – to import the data into Elasticsearch (which is the backend)
  • Query – to accept geocode queries

Each of these stages has different performance characteristics and requires different resources. The main thing that I’m looking at here is the use of compute. The need for compute during the prepare and import stages is significantly different from the download and query stages. I’m also not confident about when or how much compute I need.

And this is why I configured a burstable instance.

Here’s a couple of things to know …

  • There is a baseline utilisation OCPU. Consider this as the minimum compute you want. For my scenario, it was primarily how much compute I needed for the download and query stages.
  • There is a full utilisation OCPU, which can be 2x or 8x the baseline utilisation (in the terms of the documentation, the baseline utilisation can be either 12.5% or 50% of the full utilisation OCPU). For my scenario, it was primarily the prepare and import stages that needed the additional compute.
  • Whether the instance bursts is determined from its CPU utilisation metrics.
  • The average CPU utilisation for the month needs to stay within the baseline utilisation OCPU.
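
To make the percentages above concrete, here is a small arithmetic sketch. The function names are mine, and the 4 OCPU figure is just an example shape size, not a recommendation.

```shell
# Baseline OCPU for a given full OCPU count and baseline percentage.
#   $1 = full utilisation OCPUs, $2 = baseline percentage (12.5 or 50)
baseline_ocpu() {
    awk -v ocpus="$1" -v pct="$2" 'BEGIN { printf "%.2f\n", ocpus * pct / 100 }'
}

# How many times the baseline the instance can burst to.
burst_factor() {
    awk -v pct="$1" 'BEGIN { printf "%g\n", 100 / pct }'
}

for pct in 12.5 50; do
    echo "4 OCPU at ${pct}% baseline: $(baseline_ocpu 4 "$pct") OCPU baseline, bursts up to $(burst_factor "$pct")x"
done
```

So a 4 OCPU instance with a 12.5% baseline runs at half an OCPU most of the time and can burst to 8x that when the prepare and import stages need it.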

Burstable instance billing is predictable. It doesn’t come with bill shock.

You can find out more about Oracle Cloud Infrastructure burstable instances (here). If you want to try this out yourself or work on your own application, sign-up (here) for the free Oracle Cloud Trial. I’d be interested to hear your experiences and learn from others as well. Leave a comment or contact me at jason.lowe@oracle.com if you want to collaborate.

Using OCI Bastions with PuTTY

Recently, Oracle rolled out the OCI Bastions service, which is designed to simplify the process of accessing instances which do not have a public IP address. They are really easy to use, with simple commands to allow access to these internal hosts… if you are using a Unix shell. Unfortunately I suffer from being quite wedded to various tools, and as a Windows user, I tend to use PuTTY to access hosts via SSH, so this blog post will detail both the OCI Bastion service in a little more detail, as well as how I continued to resist changing my old habits, and set up connections using the OCI Bastion service using a number of components of the PuTTY suite of tools.

#DaysOfArm (10 of X)

This is my tenth #DaysOfArm article that tracks some of the experiences that I’ve had so far. And just to recap from the first post (here) on June 12 2021.

It’s been just over 2 weeks since the launch of Ampere Arm deployed in Oracle Cloud Infrastructure (OCI). Check this article out to learn more (here). And it’s been about one week since I started looking into the new architecture and deployment, since I started provisioning the VM.Standard.A1.Flex Compute Shape on OCI and since I started migrating a specific application that has many different variations to it to test it all out.

This is my next learning, which focuses on something deeper in the hardware stack: vectors.

#DaysOfArm (9 of X)

This is my ninth #DaysOfArm article that tracks some of the experiences that I’ve had so far. And just to recap from the first post (here) on June 12 2021.

It’s been just over 2 weeks since the launch of Ampere Arm deployed in Oracle Cloud Infrastructure (OCI). Check this article out to learn more (here). And it’s been about one week since I started looking into the new architecture and deployment, since I started provisioning the VM.Standard.A1.Flex Compute Shape on OCI and since I started migrating a specific application that has many different variations to it to test it all out.

This is my next learning, which focuses on NodeJS and Python.
