Simple, Secure Log Retention using OCI Services

Between the Oracle Cloud Infrastructure (OCI) Audit Service recording administrative operations and Oracle Identity Cloud Service (IDCS) tracking access and user management events, Oracle Cloud provides fairly comprehensive tracking of security events. Recently, however, I have been in conversations with a number of organisations seeking longer-term retention of audit events for several different reasons, including governance, compliance and forensic requirements. The OCI Audit service supports requesting bulk export of audit events to Object Storage, which allows you to manage the retention and archiving of those events yourself, so I started wondering if I could do the same for the IDCS access events. A bit of testing and some simple coding later, I had events being pulled periodically from the IDCS Audit Events API and sent to Object Storage for retention.

In this blog post, I will provide this code as a sample, and discuss some of the techniques and technologies that are available in Oracle Cloud Infrastructure to enable simple, but highly secure and cost-effective, automation of cross-service tasks such as this.

Simple architecture sketch to begin with:

Now, does this need so many components? Not at all. You could simply use crontab in a compute instance, store your authentication details in a configuration file and shuttle the events between IDCS and Object Storage that way. I haven’t done that because OCI makes it very easy for services to interact with one another, and using these native cloud services rather than maintaining my own instance provides a lot of advantages, including allowing me to offload my infrastructure resiliency to the cloud backplane, and making sure I am not storing sensitive configuration information anywhere. Plus, it is cheap! All of these services are incredibly lightweight, and have pricing which reflects that.

First things first: the GitHub repository with the code for the function and some deployment scripts is available at:

https://github.com/CallanHP/idcs-log-to-obj-store

This code is provided as an example only, with no warranty, support or guarantees. This blog will discuss the role of each of the components, and the required configuration to allow for the approach to be adapted to fit this or other orchestration requirements within OCI.

At the core of this little orchestration is Functions, which gives me a serverless runtime for my code. Instructions for setting up Functions and your development environment are available in the OCI Functions documentation. Functions is ideal for these short-running simple use cases, as it provides a lot of natural redundancy and pre-integration with other OCI services. Functions has a bit of delay when it has a cold start, so it isn’t ideal for latency-sensitive operations (use something that doesn’t scale down to zero, such as K8s, for that), but most infrastructure orchestration and automation tasks don’t really care about a brief start-up delay.

One of the main things to be aware of when using Functions for automation tasks is that, at the time of writing, a function will time out after 30 seconds by default (you can increase this to 120), so you need to be cognizant of this when you are writing your code, and add some sanity checks before you kick off a potentially long-running task. To avoid any potential issues with timeouts in my code, I added a ‘maxEvents’ parameter which can be set as a safety valve, then wrote the code so that any single execution would only ever archive that many events. If the function was invoked during a particularly busy period, and more events than ‘maxEvents’ had occurred since the last archive, that was fine; I assumed that the archiving would eventually catch up on subsequent invocations.
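To make the safety valve concrete, here is a minimal sketch of the idea. It is an illustration rather than the code from the repository: the helper name `takeBatch` and the event shape (an ISO 8601 `timestamp` field, with events sorted oldest-first) are assumptions of mine.

```javascript
// Hypothetical helper: cap the number of events archived in one invocation.
// Assumes `events` is sorted oldest-first and each event carries an
// ISO 8601 `timestamp` string.
function takeBatch(events, maxEvents) {
  const batch = events.slice(0, maxEvents);
  return {
    batch,
    // The next run resumes from the last archived event, so any backlog
    // is worked through over subsequent invocations.
    nextStart: batch.length > 0 ? batch[batch.length - 1].timestamp : null,
    backlog: events.length - batch.length,
  };
}

const sampleEvents = [
  { id: 'a', timestamp: '2020-06-01T00:00:05Z' },
  { id: 'b', timestamp: '2020-06-01T00:00:09Z' },
  { id: 'c', timestamp: '2020-06-01T00:01:30Z' },
];
const sampleResult = takeBatch(sampleEvents, 2);
console.log(sampleResult.batch.length, sampleResult.nextStart, sampleResult.backlog);
```

One thing to watch with a cut-off like this is events that share the boundary timestamp, which need care so that they are neither skipped nor archived twice on the next run.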

Functions allows for very easy dynamic configuration, and the repo contains a simple script to set the configuration values. It would be easy to adapt the function to work against a different bucket or IDCS instance, or to use different secret values, just by modifying the configuration of the function in the console.
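Function configuration values are made available to the running code as environment variables, so reading them can be as simple as the sketch below. The key names (`IDCS_URL`, `IDCS_CLIENT_ID`, `SECRET_OCID`, `BUCKET_NAME`, `MAX_EVENTS`) and the default of 1000 are hypothetical and are not necessarily the ones used in the repository.

```javascript
// Hypothetical configuration keys; the repository's actual names may differ.
const REQUIRED_KEYS = ['IDCS_URL', 'IDCS_CLIENT_ID', 'SECRET_OCID', 'BUCKET_NAME'];

function readConfig(env) {
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing configuration: ${missing.join(', ')}`);
  }
  // Configuration values always arrive as strings, so parse numbers explicitly.
  const maxEvents = Number.parseInt(env.MAX_EVENTS || '1000', 10);
  if (!Number.isInteger(maxEvents) || maxEvents <= 0) {
    throw new Error('MAX_EVENTS must be a positive integer');
  }
  return {
    idcsUrl: env.IDCS_URL,
    clientId: env.IDCS_CLIENT_ID,
    secretOcid: env.SECRET_OCID,
    bucketName: env.BUCKET_NAME,
    maxEvents,
  };
}
```

Inside the function this would be called as `readConfig(process.env)`. Failing fast on a missing value keeps a misconfigured function from running halfway through an archive.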

The first and simplest requirement was the creation of an Object Storage bucket for holding the logs. Here, we are simply using Oracle-managed encryption keys. If you wanted to use your own, we will be creating an encryption key in a Vault anyway in order to use Secrets, so there isn’t much overhead; just make sure you add a policy to your environment to allow the Object Storage service to use the key.

Since Functions are stateless, some thought was given to the naming scheme for the logs written to Object Storage in order to help persist some state, while also making future retrieval of the logs easy. As Object Storage lets you filter by prefix, it made sense to prefix the stored logs with the start/end timestamps. This way, if logs for a particular time need to be retrieved, they can be easily filtered in the console, and we can retrieve the last end time to determine the new start time for the next execution.
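As a rough illustration of how object names can carry state, the sketch below builds a name from the start and end timestamps and recovers the most recent end time from a list of existing names. The exact format (`<start>_<end>_idcs-audit.json`) and both helper names are assumptions for the example, not necessarily the scheme used in the repository.

```javascript
// Hypothetical naming scheme: "<start>_<end>_idcs-audit.json", with both
// timestamps in ISO 8601 UTC so that names sort chronologically.
const SUFFIX = '_idcs-audit.json';

function buildObjectName(start, end) {
  return `${start.toISOString()}_${end.toISOString()}${SUFFIX}`;
}

// Given the object names in the bucket, recover the most recent end time,
// which becomes the start time for the next execution.
function lastEndTime(objectNames) {
  const ends = objectNames
    .filter((name) => name.endsWith(SUFFIX))
    .map((name) => name.slice(0, -SUFFIX.length).split('_')[1])
    .filter(Boolean)
    .sort(); // fixed-width ISO 8601 strings sort lexicographically by time
  return ends.length > 0 ? new Date(ends[ends.length - 1]) : null;
}
```

Because the start timestamp leads the name, listing the bucket with a prefix such as `2020-06-01` would return only the archives that begin on that day.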

In order to securely work with Identity Cloud (IDCS), I decided to use public/private key authentication, but I didn’t want to store my private key in a configuration file, or in clear-text in the function configuration, so I opted to use OCI Secrets. OCI Secrets lets you encrypt sensitive data using a key managed by you, and stored in a hardware security module. You can then write policies around that sensitive data, to allow other services to obtain it at runtime via an OCI API call. What this means is that you never need to persist sensitive configuration data or decryption keys in your application; you can instead load the information into memory at runtime.

To use Secrets, you can create a new Vault in the tenancy, create a master encryption key, and then create a new Secret. Secrets are stored base64 encoded in OCI, and as PEM-encoded RSA Private keys are already in base64, after generating an RSA key, you can simply strip out the PEM headers and paste the private key into the Secrets UI, before cleaning up the local copy (the headers are re-added in my code).
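Re-adding the headers at runtime might look something like the sketch below, which uses only the Node.js `crypto` module. The helper name `toPem` is hypothetical, and the PKCS#1 `RSA PRIVATE KEY` label is an assumption; a PKCS#8 key would use `PRIVATE KEY` instead.

```javascript
const crypto = require('crypto');

// Re-wrap a bare base64 key body (headers stripped, as stored in the Secret)
// into PEM. The default PKCS#1 label is an assumption about the key format.
function toPem(base64Body, label = 'RSA PRIVATE KEY') {
  const compact = base64Body.replace(/\s+/g, '');
  const lines = compact.match(/.{1,64}/g) || [];
  return [`-----BEGIN ${label}-----`, ...lines, `-----END ${label}-----`, ''].join('\n');
}

// Illustration only: generate a throwaway key, strip the headers as you would
// before pasting it into the Secrets UI, then rebuild it as the function would.
const { privateKey: demoKeyPem } = crypto.generateKeyPairSync('rsa', {
  modulusLength: 2048,
  publicKeyEncoding: { type: 'spki', format: 'pem' },
  privateKeyEncoding: { type: 'pkcs1', format: 'pem' },
});
const demoBody = demoKeyPem
  .replace(/-----(BEGIN|END) RSA PRIVATE KEY-----/g, '')
  .replace(/\s+/g, '');
const rebuiltKey = crypto.createPrivateKey(toPem(demoBody));
console.log(rebuiltKey.asymmetricKeyType);
```

Parsing the rebuilt PEM with `crypto.createPrivateKey` is a cheap way to confirm the secret was pasted correctly before attempting to sign anything with it.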

An IDCS application needs to be created to allow that key to be used to access the IDCS APIs. We need a Confidential Application in IDCS which supports the Client Credentials flow type and has Audit Administrator access in order to retrieve the Audit Events, and we need to upload the public certificate associated with the stored private key to that application.
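For context, key-based client authentication in OAuth is typically done by presenting a signed JWT as a client assertion. The sketch below shows the general shape of that, using only the Node.js `crypto` module. It is not taken from the repository, and the specifics are assumptions to check against the IDCS documentation: the claim set, using the certificate alias as `kid`, the audience value passed by the caller, the five-minute lifetime and the `urn:opc:idm:__myscopes__` scope.

```javascript
const crypto = require('crypto');

// base64url encoding without padding, as used by JWTs.
const b64url = (input) => Buffer.from(input).toString('base64')
  .replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');

// Build an RS256-signed JWT to present as an OAuth client assertion.
// The claim values and `kid` (certificate alias) are assumptions.
function buildClientAssertion({ clientId, keyAlias, audience, privateKeyPem, now = Date.now() }) {
  const header = { alg: 'RS256', typ: 'JWT', kid: keyAlias };
  const iat = Math.floor(now / 1000);
  const claims = { iss: clientId, sub: clientId, aud: audience, iat, exp: iat + 300 };
  const signingInput = `${b64url(JSON.stringify(header))}.${b64url(JSON.stringify(claims))}`;
  const signature = crypto.sign('RSA-SHA256', Buffer.from(signingInput), privateKeyPem);
  return `${signingInput}.${b64url(signature)}`;
}

// Form-encoded body for a client credentials token request.
function buildTokenRequestBody(assertion) {
  return new URLSearchParams({
    grant_type: 'client_credentials',
    client_assertion_type: 'urn:ietf:params:oauth:client-assertion-type:jwt-bearer',
    client_assertion: assertion,
    scope: 'urn:opc:idm:__myscopes__', // assumed IDCS scope
  }).toString();
}
```

The resulting body would be POSTed to the token endpoint of the IDCS instance, and the returned access token used as a bearer token when calling the Audit Events API.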

In order to allow the Function to use the stored key, we need to create a dynamic group for the function, and policies on that group. The reason for this is to take advantage of OCI’s capability to perform runtime service authentication, which keeps our sensitive authentication information nice and safe by simply using the context in which the service is running to authenticate it.

To include our function in a dynamic group, we can create the following matching rule:

ALL {resource.type = 'fnfunc', resource.id = 'ocid1.fnfunc.oc1…..'}

This selects the function with the associated OCID (using the ALL when selecting by OCID isn’t actually needed, but at some point this definition might need to be expanded, such as to all functions with a particular tag, or part of a larger function ‘app’) and allows policies to be written which reference this function. This assumes you can get the OCID of your function, so while this might read like a prerequisites section, you will actually need to deploy the function first.

For this scenario, the function needs to be authorised to do three things: retrieve the IDCS key from Secrets; check the last time the logs were archived (so it can use that as a start time); and write the logs back to Object Storage.

OCI gives us the ability to create very fine-grained authorisation policies, so we can apply the principle of least privilege and write policies which only grant these capabilities to the dynamic group:

allow dynamic-group log-retention-fn to read secret-bundles in compartment id ocid1.compartment.oc1.. where target.secret.id='ocid1.vaultsecret.oc1….'

allow dynamic-group log-retention-fn to manage objects in compartment id ocid1.compartment.oc1.. where all{target.bucket.name='idcs_log_archive', request.permission='OBJECT_CREATE'}

allow dynamic-group log-retention-fn to inspect objects in compartment id ocid1.compartment.oc1.. where target.bucket.name='idcs_log_archive'

It is worth noting the heavy use of Conditions in these policies, to restrict the operation of the function as tightly as possible. This code only does a tiny handful of things, and even if the risk of compromise is low, whitelisting only those functions the code needs to perform is good security practice.

Once all of these configuration items have been completed (the creation of the bucket, the IDCS client application, the private key stored in Secrets, the dynamic groups and policies for the function, and the function deployment and configuration itself), the end-to-end functionality can be tested using a simple fn invoke.

I am not going to get into the details of debugging a function – there is a huge section of the documentation on approaches for that, and you should combine it with copious logging. I was also able to test locally, as I added code to fall back on local config files when not running in the OCI Functions context (see funcImpl.js, lines 72 to 92) – I am not going to get too picky about your development approach, but I strongly recommend being able to test pretty extensively locally to avoid continually pushing non-functional code to the OCI registry. Anyway, tangent over, I am going to assume your code is working at this point.

Using fn invoke allows for on-demand log retention, but in order for this to be useful, we need to automate that. For this scenario we are going to use a very simple service in a somewhat unintended way in order to obtain regular scheduled invocations of the function. That service is the Health Check service – which is part of the OCI monitoring suite, and is designed to regularly access some sort of HTTP endpoint on an application to assess that application’s health.

We can take advantage of this regular polling by simply exposing an HTTP GET endpoint which invokes our function, and the easiest way to do this with a Function is through the API Gateway service. After setting up an API Gateway, and making sure it is able to access functions with yet another dynamic group and policy, we can simply create a new API Gateway deployment with a route to the log retention function.

Once this has been set up, the API Gateway service will give you a long auto-generated URL endpoint which might be a bit opaque for external developers, but is perfectly good for our Health Check. When creating a Health Check, you can just paste the hostname of that URL in the target, and select a Vantage point (I chose one in the same region, but it doesn’t really matter), as well as provide the path to your Functions API endpoint.

Then you can set the frequency of invocation – I went with every 15 minutes, since that seemed a reasonable roll-over time for the logs.

Now, every 15 minutes, the Health Check invokes the API endpoint, which invokes the Function, which triggers a copy of the latest Audit Events to Object Storage. Success.

(As a bonus, Object store allows for prefix filtering, so filtering logs by date is really easy)

Unfortunately using Health Checks like this involves exposing the endpoint publicly on the internet, though there are plenty of mitigations we can put in place to minimise risks associated with that. Firstly, you can set ingress rules on the API subnet to limit access to just your vantage points. Secondly, you could have your Health Check send some sort of magic value as a header, then drop invocations which lack that value either in an API Gateway Authentication policy or simply in your code (configure the header value in the function config). Thirdly, and present in the code, is to assess the time that has elapsed since last log backup and simply terminate execution if it is being invoked too frequently. Given that the function only performs internal operations, takes no payload or arguments and returns nothing of value, the only real concern about abuse is continual polling of the endpoint, which the above mitigate fairly well.
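The second and third mitigations can be combined into a single guard at the top of the function, along the lines of the sketch below. The header name `x-archive-token`, the `minIntervalMs` setting and both helper names are hypothetical, and the sketch assumes header names have already been lower-cased.

```javascript
const crypto = require('crypto');

// Constant-time comparison of the shared header value.
function tokenMatches(received, expected) {
  const a = Buffer.from(String(received || ''));
  const b = Buffer.from(String(expected));
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}

// Decide whether this invocation should do any work.
// `lastEnd` and `now` are Date objects (or millisecond timestamps).
function shouldRun({ headers, expectedToken, lastEnd, now, minIntervalMs }) {
  if (!tokenMatches(headers['x-archive-token'], expectedToken)) {
    return { run: false, reason: 'missing or invalid token' };
  }
  if (lastEnd && now - lastEnd < minIntervalMs) {
    return { run: false, reason: 'invoked too soon after last archive' };
  }
  return { run: true };
}
```

Returning early in both cases means repeated polling of the public endpoint costs little more than a function start and, at most, an object listing.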

There are ways to use internal schedulers which avoid this issue (for instance, scheduled jobs in OKE or scheduled integrations in OIC), but the use of health checks like this has the smallest overhead, both in terms of complexity and billing, that I could come up with.

OCI provides a wide range of lightweight services which provide useful capabilities individually, and enable some quite complex scenarios to be completed when combined. While there may be slightly more straightforward ways to solve this problem of log retention, combining these OCI native services provides massive resiliency and security benefits which would either add significant complexity to the solution or be otherwise unobtainable. In this example, we leveraged an HSM-backed encrypted key with authentication based upon runtime context, used a highly resilient runtime and set up a redundant scheduler with nothing more than a handful of clicks in a UI, all without adding any significant cost to the solution. All of the configuration is also handled externally to the code, so changing any of the connected services doesn’t require a redeployment, and can be performed either from the console or the CLI.

This example used Functions with API Gateway, Secrets in Vault and the Health Check services to perform periodic pulls of log events from IDCS and store them in Object Storage, but this is just one potential use case for the capabilities of OCI orchestration using Functions. The OCI Events service, for instance, can be configured to invoke a function on almost any backplane event, from an object being dropped into Object Storage to an ‘SSL cert expiring soon’ alert from Cloud Guard. Your Function could then perform some processing task on the object, trigger a cert renewal, publish an alert to Slack, or any other response you could think to build, leveraging the wide range of other OCI services to provide the security and functional requirements you need for the orchestration.
