Policy Enforcement in Service Mesh – Istio / Envoy

This lab follows on from the previous steps required to provision and curate a Kubernetes cluster. Please review them before proceeding. If you are in doubt, feel free to contact me directly via https://www.linkedin.com/in/citurria/

Testing BookInfo app with a single review version

First, let’s test the BookInfo app with a single review version.

  • Go back to your Windows Remote Desktop browser.
  • Open a new PuTTY session pointing to your assigned Cluster.
  • Elevate your privileges:

    sudo bash

  • Open and quickly analyse the first simple routing rule:

    less /tmp/mgt/env/envSvcMesh/meshDemo/route-rule-all-v1.yaml

It sets a route rule for each of the 4 microservices (productpage, reviews, ratings and details), sending all traffic to the instance of that microservice labelled as version 1. This ensures that we always see the same version of the reviews page.
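For readers without the lab files at hand, here is a minimal sketch of what one entry in such a file may look like. It assumes the pre-1.0 Istio RouteRule schema (config.istio.io/v1alpha2) that istioctl create accepted at the time. The metadata name is a hypothetical choice, and the lab's route-rule-all-v1.yaml would presumably hold an equivalent block for each of the four services, so treat the file itself as the authority.

```yaml
# Sketch only: assumes the pre-1.0 Istio RouteRule schema (config.istio.io/v1alpha2).
# The metadata name is hypothetical; the lab's file may differ.
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: reviews-default
spec:
  destination:
    name: reviews        # service the rule applies to
  precedence: 1          # low priority, so more specific rules can override it
  route:
  - labels:
      version: v1        # send all traffic to pods labelled version=v1
```

Because the rule names only one destination version, no weights are needed: every request to reviews lands on v1.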

  • Apply this rule:

    istioctl create -f /tmp/mgt/env/envSvcMesh/meshDemo/route-rule-all-v1.yaml

  • Open a new browser tab and click on the “Simple Bookstore App” bookmark

  • As expected, notice that the Book Reviews come without “ratings”. Refresh the page multiple times to make sure this behaviour is consistent.

  • Delete the route rule and notice the different behaviour: the load balancer now invokes all 3 versions of the reviews service, so the result alternates every time we refresh the landing page.

    istioctl delete -f /tmp/mgt/env/envSvcMesh/meshDemo/route-rule-all-v1.yaml

Testing BookInfo app with simple user-based Routing policies

Now, let’s test a variation of the rule that routes traffic to version 2, but only when logged in as a specific user.

  • First, let’s apply rule 1 again, so that version 1 of the reviews page is always invoked.

    istioctl create -f /tmp/mgt/env/envSvcMesh/meshDemo/route-rule-all-v1.yaml

  • Now, let’s inspect route-rule-reviews-test-v2.yaml:

    It will route to version 2 of the reviews page only when the request carries a login cookie identifying the user jason. That is, only when the jason user is signed in does it route to version 2. With any other user, or as anonymous, it will keep routing to version 1.
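As a rough illustration of the cookie match just described, a rule of this kind could look like the sketch below. It again assumes the pre-1.0 config.istio.io/v1alpha2 RouteRule schema. The metadata name and the exact regular expression are assumptions, so compare them with the lab's route-rule-reviews-test-v2.yaml rather than taking them as given.

```yaml
# Sketch only: assumes the pre-1.0 Istio RouteRule schema (config.istio.io/v1alpha2).
# The metadata name and the regex are assumptions; check the lab's file.
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: reviews-test-v2
spec:
  destination:
    name: reviews
  precedence: 2          # higher than the default v1 rule, so it is evaluated first
  match:
    request:
      headers:
        cookie:
          regex: "^(.*?;)?(user=jason)(;.*)?$"   # cookie header contains user=jason
  route:
  - labels:
      version: v2        # matching requests go to reviews v2
```

Requests that do not match fall through to the lower-precedence rule and keep going to version 1.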

  • Enforce this new routing rule:

    istioctl create -f /tmp/mgt/env/envSvcMesh/meshDemo/route-rule-reviews-test-v2.yaml

  • Now, go back to your web browser and refresh a few times. Being anonymous, you should always get routed to version 1 (i.e. no ratings).

  • Now, sign in as jason (no password).

  • You will see version 2 of the reviews page. You can tell because the ratings appear as black stars.

  • Refresh the page a few times and make sure it is always version 2.
  • Sign out and validate that you get back to version 1 (no ratings).

Testing BookInfo app with network latency-based Routing policies

In this scenario, we are going to test a way to make our microservices more resilient to external issues, such as a 3rd party microservice that is unresponsive or taking longer than expected to respond. Rather than waiting forever and ultimately timing out the overall transaction, we simply give up on that specific microservice call and return an error for it, while still being able to respond to the original caller.

Coming back to the BookInfo demo, the “Product page” is configured to tolerate up to 3 seconds when invoking the “Reviews” microservice, with 1 retry. That is 2 attempts of 3 seconds each, or 6 seconds of maximum tolerance in total, before giving up and returning a “Review Service unavailable” type of error.

Similarly, the “Reviews v2” microservice has been configured with a maximum tolerance of 10 seconds when calling the “Ratings” microservice, before giving up.

What we are going to do is inject a hard-coded 7-second delay when calling the “Ratings” microservice. We do this as part of a new Istio RouteRule, and only when signed in as the jason user.

That is:

  • Back to the PuTTY session, inspect the rule called: route-rule-ratings-test-delay.yaml

As you can see, lines 17 to 20 force a 7-second delay on calls to the ratings microservice, but only for requests made under the jason user account (lines 9 to 13).
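If you cannot open the file, the sketch below shows how a rule of this kind might be laid out. It assumes the pre-1.0 config.istio.io/v1alpha2 RouteRule schema, and the metadata name and cookie regex are assumptions. Written this way, the match block falls on lines 9 to 13 and the fault block on lines 17 to 20, which is consistent with the line references above, but the lab's route-rule-ratings-test-delay.yaml remains the authority.

```yaml
apiVersion: config.istio.io/v1alpha2   # sketch: assumed pre-1.0 RouteRule schema
kind: RouteRule
metadata:
  name: ratings-test-delay             # assumed name
spec:
  destination:
    name: ratings
  precedence: 2
  match:                               # lines 9 to 13: only requests from jason
    request:
      headers:
        cookie:
          regex: "^(.*?;)?(user=jason)(;.*)?$"
  route:
  - labels:
      version: v1
  httpFault:                           # lines 17 to 20: the injected delay
    delay:
      percent: 100                     # apply to every matching request
      fixedDelay: 7s
```

The delay is injected by the sidecar proxy, so the ratings service itself does not need to change for this test.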

  • Apply the rule:

    istioctl create -f /tmp/mgt/env/envSvcMesh/meshDemo/route-rule-ratings-test-delay.yaml

  • Go back to the BookInfo web page and log in as jason again.

    You will notice straightaway that there is a delay in the response: the page stalls for several seconds (Product page gives up after its two 3-second attempts, before the 7-second injected delay completes) and then shows an error saying that the call to the product reviews microservice was unsuccessful.

    This way, we prevented our microservice orchestration from lingering until it ultimately timed out completely.

  • However, was it really the Reviews microservice that caused the problem? How can we know for sure what really happened in the backend?

    Thanks to the Service Mesh paradigm, we can use tools such as Zipkin, which receives trace spans from the microservices as they occur at runtime.

  • Open a new browser tab and click on the Zipkin bookmark.

  • In the service name, select productpage, so that we see the full runtime trace. Then click Find Traces.

You should be able to see the previous calls to Product page from before the delay was introduced, all of which were successful. However, the latest one we just tried is marked in red.

  • Click on the latest red trace.
  • Zipkin will show the logical trace, including the injected delays, as follows:
    • Product page called the Details page in parallel, and this call took 6 milliseconds.
    • Product page called Reviews in parallel, and it took 3.009 seconds, which was beyond the tolerance, so it failed.
    • We can also see that Reviews called the Ratings page, and this is the one that took 7.652 seconds to respond (due to the ratings-test-delay rule). However, although it took ages to respond, it did not return an HTTP error code, because its tolerance is 10 seconds. So 7.652 seconds is within its expected response time.
    • A second attempt from Product page to call Reviews occurred; similarly, it errored after 3.002 seconds.
    • Finally, once again, the Reviews page called Ratings, and the latter took 7.662 seconds to respond.

It was simple to find the bottleneck thanks to the underlying benefits of Service Mesh.

  • Remove the rule (istioctl delete -f with the same route-rule-ratings-test-delay.yaml file) and make sure your service is back to normal when signed in as the jason user.

  • Check Zipkin and make sure that the Ratings page is now responding almost instantaneously (filter by newest first).

  • On the first attempt to call the Ratings service, it responds in 8.679 milliseconds.

I hope you found this blog useful. If you have any questions or comments, feel free to contact me directly at https://www.linkedin.com/in/citurria/

Thanks for your time.


Author: Carlos Rodriguez Iturria

I am extremely passionate about people, technology and the most effective ways to connect the two by sharing my knowledge and experience. Working collaboratively with customers and partners inspires and excites me, especially when the outcome is noticeably valuable to a business and results in true innovation. I enjoy learning and teaching, as I recognise that this is a critical aspect of remaining at the forefront of technology in the modern era. Over the past 10+ years, I have developed and defined solutions that are reliable, secure and scalable, working closely with a diverse range of stakeholders. I enjoy leading engagements and am very active in the technical communities – both internal and external. I have stood out as a mentor running technology events across major cities in Australia and New Zealand, covering various technology areas such as Enterprise Integrations, API Management, Cloud Integration, IaaS and PaaS adoption, DevOps, Continuous Integration and Continuous Automation, among others. In recent years, I have shaped my role and directed my capabilities towards educating and architecting benefits for customers using Oracle and AWS Cloud technologies. I get especially excited when I am able to position both as a way to exceed my customers’ expectations. I hold a bachelor’s degree in Computer Science and certifications in Oracle and AWS Solutions Architecture.
