Process Excel files with OIC + Oracle Functions

In this blog post I will explore how we can extend the native capabilities of Oracle Integration (OIC) with Oracle Functions to process Excel files.

Although OIC can handle a number of file formats natively, .xlsx or .xls files need a bit of extra love.

The inspiration for this blog comes of the back of several customer enquiries into this subject.

The simple solution for most customers is to convert the Excel file formats to CSV and subsequently process them with OIC. I will use this approach here too but with a little bit of help from other OCI services such as Oracle Functions, an API Gateway and Object Storage.

A video of this demo is available here.

The Solution

As OIC does not natively support Excel files you need to write a bit of extra code to handle them. You can either store this code within OIC as a JavaScript library or use something like Oracle Functions as a container for you code. I have chosen to use this approach here and will rely on Oracle Functions to do the file conversion for me.

With this approach in mind, i can think of two ways to implement this pattern:

The All in one Approach:

Above: a high-level design that captures a single integration flow that will convert and process excel files
  1. The flow begins with an Excel file that resides on an FTP server (this can be any FTP/SFTP server but for this demo I will use the file server embedded within OIC). I have chosen to use an FTP server as my starting point for this flow as i still see the predominant use of file servers over cloud storage alternatives in most integration flows. In this step, OIC will use the FTP adaptor and its native abilities to list and download my Excel file(s).
  2. The Excel file(s) will then be pushed to OCI Object Storage by OIC. Refer to my previous blog if you need a refresher on how to write files to Object Storage with Oracle Integration. I am using Object Storage here as a transient point for these files because I want to provide an easy and secure way for my file conversion function (which is hosted in OCI) to access these Excel files. It is also possible to achieve this without Object Storage … as the function code can be altered to simply get the file directly from OIC’s file server via FTP rather then retrieve it from Object Storage.
  3. Next, OIC will invoke the Excel-to-CSV conversion function, which is exposed via the OCI API Gateway (not depicted in the above image). In this call OIC will pass the Object Storage location of the Excel file as input parameter to the function and will receive a response with the location and name of the newly generated CSV file.
  4. The function itself will hold logic to convert both .xlsx and .xls files. It will store these files in another Object Store Bucket, but honestly this is just a personal preference. Once the function has finished processing it can optionally cleanup the temporary Excel file(s).
  5. In the final step, OIC will pick-up, read (stage file action) and processes the newly generated CSV file as required.

The Decoupled Approach:

Above: a high-level design that captures a decoupled approach where one integration is responsible for the file conversion and another for processing the converted files

The second approach, depicted above is quite obvious, all we are doing here is separating the file conversion process from the file processing operation via two integrations. The second integration can be either scheduled or driven by the OCI Event service (see here). The description of the required steps are the same so i wont re-iterate them here.

Implementation

In this section, I will show you an implementation of the All In One flow, hoping that this will give you enough information to take this approach and customise it for your specific requirements.

I will also provide you with the code required to implement an Excel-to-CSV conversion integration so you can quickly get up and running and test this in your environment.

With this in mind, to implement this flow you will need to:

  1. Create a function that can convert Excel files to CSV
  2. Expose this function via the OCI API Gateway
  3. Create two Object Storage Buckets
  4. Create an OIC connection to an FTP Server that will host your Excel files
  5. Create an OIC connection to your API Gateway
  6. Create your OIC Integration flow

Lets explore each of these steps in a little more detail:

  1. Create a function that can convert Excel files to CSV

In this step we will implement and deploy the code that will be responsible for the file conversion process. As mentioned I am using Oracle Functions as the runtime container for my code. If you are not familiar with the concept of Oracle Functions, please check out this documentation.

There are a number of ways to get started with Oracle Functions, the simplest of which is to use the OCI Cloud Shell (a web browser-based terminal accessible from the Oracle Cloud Console). You can find detailed steps on this approach here

It is obvious but yet important for me to mention that the code you define in this section is entirely up to you and the usecase you are trying to solve.

I have reused a piece of code created by my friend and colleague Steven Lawton that already does the required Excel to CSV file conversion and modified it to work with my desired OIC flow. You can find the Oracle Function code here. Please note that this code is intended for demonstration purposes only and you should make it your own before you implemented into your environment.

If you had a chance to checkout the Python function you will notice that:

  • it processes a single file at a time
  • it handles both xls and xlsx files
  • its expecting a couple of input parameters, namely:
    • File/Object Name
    • Source Object Storage Bucket
    • Target Object Storage Bucket
    • Object Storage Namespace
  • it returns the location (Object Storage Bucket) and name of the newly converted CSV file
  • it cleans up/ removes the Excel file from the Source Object Storage Bucket

Note the above function uses internal /tmp storage to do the file conversion. Temp space in Oracle Functions is limited to a 1/4 of the overall function memory size. Meaning that if you specified 1GB for the function memory size the max file size that function can process will be 256MB (which is well above the current OIC REST payload limit).

The other thing to call out here, is that in this function we are not catering for Excel files with multiple worksheets. If you have such a requirement, you will need to modify the function appropriately (e.g. pass in the desired worksheet as an input parameter & convert only that worksheet).

2. Expose this function via the OCI API Gateway

The next thing you will need to do is expose the Excel-to-CSV conversion function as an API via the OCI API Gateway. This will allow us to easily call the function via OIC. Refer to the Oracle docs here on the steps required to do this.

Note: you will also need to ensure that you have added the required policies needed to allow API Gateway to access your function.

3. Create two Object Storage Buckets

In this flow i am using two Object Storage buckets to store my transient files. One bucket is used to stage the Excel files and and other is used as the temporary location for the converted CSV files. As mentioned previously, you can use a single Object Storage bucket if you prefer.

If you need help with the steps required to create your Object Storage bucket(s), please see here. Once the bucket(s) are created, ensure that a policy exists that allows the conversion function to read and write to the appropriate Object Storage Buckets. Refer to this link for more information on the required policies.

4. Create an OIC connection to an FTP Server that will host your Excel files

With the OCI infrastructure pieces now in place, lets dive into Oracle Integration.

The first thing we want to do is create an FTP connection to the internal file server (or if you prefer: any other FTP/SFTP server). This will be the origin location for our Excel files. I’ll refer you to the OIC FTP adaptor documentation for the steps required to create this connection as there is nothing specific to call out for this usecase.

5. Create an OIC connection to your API Gateway

The other connection we want to create in OIC is the one to the OCI API Gateway.

To create a connection to the OCI API Gateway, use the REST adaptor and set the Connection URL to the hostname of your public API Gateway.

I will leave security policies aside for the minute, as this is a large topic in its own right.

6. Create your OIC Integration

To demonstrate the capability for us to process an Excel with OIC I have created the following flow:

Note: I have made the iar for this available here.

Also please note that you can click on most of the below images to enlarge them

As you can see, this is a schedule based integration that does the following:

  1. Assigns some variables related to Object Storage use:

2. Does an FTP List looking for a file name pattern of *.xls*:

3. Iterates over each file returned by the FTP List step.

4. Downloads the File from the FTP server

5. Once the file is retrieved we send it to our Source Object storage bucket:

Note that the request payload and media type used below

6. Invoke the API which exposes our Excel2CSV function. We do this via the REST connection we setup to the API Gateway. Ensure to specify the endpoints relative URI based on your implementation. See the configured request/response samples below:

Request Payload:

{
  "file_name" : "SourceAPFile-20201116.xlsx",
  "source_bucket" : "stv_bucket_OIC",
  "target_bucket" : "stv_bucket_OSS",
  "namespace" : "sdc90vkxb5rj"
}

Response Payload

{
  "fileconvertion" : "SUCCESS",
  "target_bucket" : "stv_bucket_OSS",
  "target_filename" : "SourceAPFile-20201116.csv"
}

7. The getCSVfile step depicted here is responsible for retrieving the now converted CSV file from Object storage (as a binary file). As you can see from the below mapping, it retrieves the file based on the response of the Excel2CSV conversion function.

8. The ReadCSVfile step is responsible for reading the binary file and converting it to a native payload type for OIC. See the Github repository for a sample file if you are looking to construct this schema manually or just import the IAR.

9. The PartyTime step in this flow simply logs the contents of the file out in an attempt to show that we have read the payload as expected. Replace this step with whatever you logic you require to achieve your desired usecase.

10. Finally the deleteCSVfileofObjectStorage step is a note, to remind you to cleanup the converted CSV file should you want to. You can see an example of implementing an Object Storage delete operation in my blog here.

Summary

In this blog i wanted to demonstrate an approach that allows customers to process Excel files with Oracle Integration. Although native capability to process Excel files is not available in OIC today, we can see that we can easily extend the capabilities of Oracle Integration with other OCI services to meet this requirement.

In my opinion every integration product will have certain limitation, what becomes important at that point is what can you do to extend or compliment your integration platform. In Oracle Integration we have always been believers of extending the core functionality of our product through native JavaScript functionality or complementary OCI services such as Oracle Functions.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s