Secure Inter-Service Communication in OCI

Oracle Cloud Infrastructure provides a ton of useful services for automating and orchestrating behaviours in your cloud environment, and while they are often pretty handy on their own, leveraging them together gives almost complete flexibility in what you can achieve. Want to trigger a backup using a command in Slack, then have a message sent back when it completes? Sure! Want to periodically poll a log API and archive the results? Easy. Oracle Cloud Infrastructure provides a number of inbuilt capabilities, as well as the ability to jump into arbitrary code to build elaborate automation flows. This blog post focuses on the security constructs around this: how services can be authorised to invoke one another, and how they authenticate themselves while avoiding storing sensitive data in insecure ways. It is intended as an overview of the concepts, and will be referenced in more concrete ways in future.

To begin with, it probably helps to understand how authorisation works in OCI. By default, no user or resource can do anything, unless a policy is written which explicitly allows it. Policies in OCI typically take the following form:

allow group network_admins to manage network-family in compartment network_compartment

This policy allows a certain group of users (network_admins) to manage a set of resources (network-family) in a location (network_compartment). If you are administering an OCI environment, you will probably be writing a lot of these, allowing particular classes of admins to manage certain resources, project teams the ability to use managed resources, etc.

Policies are written to authorise groups to access resources, so when looking at inter-service communication we need to introduce a different type of group: the excitingly named ‘dynamic group’. Dynamic groups in OCI are defined by membership rules, which allow instances or resources to be considered part of a group when policies are evaluated.

A dynamic group membership rule might look like:

ALL{resource.type='fnfunc', resource.compartment.id='ocid1.compartment.oc1….. '}

This rule covers all functions (resource.type='fnfunc') in a compartment, and could be used with a policy such as:

allow dynamic-group function-group to manage objects in compartment id ocid1.compartment.oc1….. where target.bucket.name='fn-log-bucket'

This combination would allow all of the functions to use the fn-log-bucket to write logs or similar. While dynamic groups are very intuitive for instances, unfortunately the potential values of the resource.type directive for non-instance principals (resource and service principals) don’t seem to be listed centrally at time of writing. Instead, each service tends to provide examples in the policies that it requires or supports as part of its initial configuration, for example API Gateway.

Inter-service communication is authorised through policies written against dynamic groups, and how you define these allows for very fine-grained access for your services. I recommend reviewing the policy reference guide in the OCI documentation and using tightly scoped conditions so that you can provide your services with least-privilege access.

If you have experience administering an OCI tenancy, the above policies ought to be fairly familiar, just with slightly different principal types and hopefully a far more restrictive implementation of least privilege. However, unlike users, these dynamic group members don’t log in to the console with a username and password, or have API keys specifically associated with them. So how do these principals (compute instances and instances of running services) actually authenticate themselves?

One of the very powerful capabilities of OCI is that the runtime context of an instance or resource can be used to identify it: each instance or resource has an OCID, which is associated with a compartment, and so on. When you are working with pre-built service integrations, such as an API Gateway invoking a serverless function, Object Storage using encryption keys from a Vault, or the Events service pushing an event to a Notification topic, all of the authentication is taken care of by the backplane (and if you have explored setting some of these up, you might have discovered that they are also subject to the policies above). However, this is not the case when you want to invoke other services from your own code, taking advantage of the runtime context rather than having to store API keys in a config file somewhere.

In this case, it helps to be aware of how the OCI backplane injects authentication information into running instances, so that you can leverage this in your code. When you are working within a compute instance, there is a metadata service available to the instance, running on 169.254.169.254. This metadata service provides REST endpoints, which use the control plane’s identification of the instance to allow access to identity information, including a private key endpoint which can be used to authenticate connections to other services (which is why the recommended security configuration for Compute instances is to restrict access to this endpoint unless you need it). This key is short lived, and rotated multiple times per day, so it is expected that your code will query the metadata service as required.

The key is available from [http://]169.254.169.254/opc/v2/identity/key.pem, and the corresponding certificate from [http://]169.254.169.254/opc/v2/identity/cert.pem. In most cases, it will be easier to use an SDK or the OCI CLI to handle obtaining these values and signing the requests for you, but you can also use normal HTTP libraries from within your code to access these endpoints and sign the requests yourself.
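As a rough illustration of both routes, here is a minimal Python sketch. The helper names (`build_metadata_request`, `fetch_metadata`, `instance_principal_signer`) are my own, not part of any Oracle library. It assumes the v2 metadata endpoints expect an `Authorization: Bearer Oracle` header, and that the OCI Python SDK (`oci`) is installed if you take the SDK route.

```python
import urllib.request

# Link-local address of the instance metadata service (v2 endpoints).
METADATA_BASE = "http://169.254.169.254/opc/v2"


def build_metadata_request(path: str) -> urllib.request.Request:
    """Build a request for a metadata endpoint such as 'identity/cert.pem'."""
    url = f"{METADATA_BASE}/{path.lstrip('/')}"
    # Assumption: v2 endpoints reject requests that lack this header.
    return urllib.request.Request(url, headers={"Authorization": "Bearer Oracle"})


def fetch_metadata(path: str, timeout: float = 2.0) -> bytes:
    """Fetch a metadata document; only works from inside a compute instance."""
    with urllib.request.urlopen(build_metadata_request(path), timeout=timeout) as resp:
        return resp.read()


def instance_principal_signer():
    """Let the SDK fetch the key and certificate and handle rotation for us."""
    # Imported lazily so the helpers above work without the SDK installed.
    import oci

    return oci.auth.signers.InstancePrincipalsSecurityTokenSigner()
```

On an instance, `fetch_metadata("identity/cert.pem")` would return the PEM-encoded certificate, while the signer returned by `instance_principal_signer()` can be passed to any SDK client as `signer=...` together with an empty `config={}`. Because the key rotates, fetch it when you need it rather than caching it on disk.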

When your code is running as a serverless function using Functions, the backplane injects authentication information into the runtime when it instantiates the function. The private key is available in the ‘OCI_RESOURCE_PRINCIPAL_PRIVATE_PEM’ environment variable, and the key identifier is available in the ‘OCI_RESOURCE_PRINCIPAL_RPST’ variable. Again, it will be easier to use an SDK to handle this for you, but you can also use the values from these environment variables to manually sign your request if no SDK exists for the language you are developing in. Just be aware that the key identifier obtained from the RPST environment variable needs to be prepended with ‘ST$’.
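To make that a little more concrete, here is a hedged Python sketch of both approaches inside a function. The helper names and the `LOG_BUCKET` constant are hypothetical (the bucket name is borrowed from the example policy earlier). It assumes the OCI Python SDK is packaged with the function, and that the RPST variable may hold either the token itself or a path to a file containing it, so it handles both.

```python
import os

LOG_BUCKET = "fn-log-bucket"  # bucket name from the example policy above


def resource_principal_key_id(env=os.environ) -> str:
    """Return the keyId for manual request signing: 'ST$' + session token."""
    value = env["OCI_RESOURCE_PRINCIPAL_RPST"]
    # Assumption: the variable may contain the token or a path to a token file.
    if os.path.isfile(value):
        with open(value) as token_file:
            value = token_file.read()
    return "ST$" + value.strip()


def resource_principal_client():
    """Build an Object Storage client authenticated as the running function."""
    # Imported lazily so the other helpers work without the SDK installed.
    import oci

    signer = oci.auth.signers.get_resource_principals_signer()
    return oci.object_storage.ObjectStorageClient(config={}, signer=signer)


def write_log(client, namespace: str, object_name: str, message: str) -> None:
    """Write a log line as an object; the policy must allow this bucket."""
    client.put_object(namespace, LOG_BUCKET, object_name, message.encode("utf-8"))
```

Inside the function handler you would call something like `write_log(resource_principal_client(), namespace, "run-1.log", "backup complete")`, where `namespace` is your tenancy's Object Storage namespace. No key material appears anywhere in the function's own configuration.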

Using either of the above mechanisms to sign OCI API requests allows the OCI security controls to identify the instance or function and use dynamic group matching rules to authorise API requests. It is worth noting that an individual instance can only be in 5 dynamic groups simultaneously. I find it simplest, from both a configuration and a governance perspective, to use tightly defined dynamic groups (using tags or individual instance IDs) and more policies, rather than try to do anything too complex with broad group definitions.

Bonus Section: Secure access to PaaS or external services

Not every service in OCI supports dynamic groups and policy definitions for access, usually because it is designed to be interacted with by end users or in a specific (usually non-API) manner. An example of this is the database services, which authenticate SQL*Net access using user credentials such as a private key or password. In this case, you can leverage the OCI Secrets service, which allows sensitive configuration data to be stored encrypted with keys managed in a hardware security module. The OCI Secrets service does support access via policy definitions and dynamic groups, which means that if you have code which requires access to a database, you can write a narrow policy such as the following (where the target secret contains the database credentials):

allow dynamic-group custom-resource-principals to read secret-bundles in compartment id ocid1.compartment.oc1…. where target.secret.id='ocid1.vaultsecret.oc1.aaaaaa '

This enables your code to use one of the above mechanisms to obtain authentication information at runtime, invoke the Secrets service to obtain the database credentials, then leverage them to access the database, with no sensitive information stored in an unencrypted form anywhere. The database credentials are loaded into application memory rather than persisted in a config file, and the configuration simply contains the opaque secret identifier. There is a nice write-up of the steps involved in this in a blog by Todd Sharp here.
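The flow above might look something like this in Python. Treat it as a sketch under stated assumptions rather than a definitive implementation: the function names are hypothetical, it assumes the code runs as a resource principal (swap in the instance principal signer on a compute instance), and it assumes the secret was stored as base64-encoded text, which is how the Secrets service returns bundle content.

```python
import base64


def decode_secret_content(b64_content: str) -> str:
    """Decode the base64 payload returned in a secret bundle."""
    return base64.b64decode(b64_content).decode("utf-8")


def fetch_secret(secret_id: str) -> str:
    """Read a secret's current value using the runtime's own identity."""
    # Imported lazily so decode_secret_content works without the SDK installed.
    import oci

    # Assumption: running inside Functions or another resource principal context.
    signer = oci.auth.signers.get_resource_principals_signer()
    client = oci.secrets.SecretsClient(config={}, signer=signer)
    bundle = client.get_secret_bundle(secret_id).data
    # The value lives only in memory; nothing is written to disk.
    return decode_secret_content(bundle.secret_bundle_content.content)
```

The only thing your configuration needs to carry is the secret's OCID, which you pass to `fetch_secret` at startup (or on each invocation) before opening the database connection.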

The same mechanism can be used for any service, Oracle Cloud or otherwise, that needs authentication information. Get those passwords and keys out of your config files!
