Scaling microservices with Service Mesh

Nathan Luong | May 27, 2024

Problems with Microservices

Fragmented communication logic

[Figure: AWS microservices example]
  • With the rise of microservices, managing communication overhead is becoming extremely difficult, as more services are added while the backend application grows in complexity.
  • Within a typical microservice architecture, services often communicate directly with each other. When the application consists of only a couple of services, internal communication can be easily managed and maintained. However, as the application scales, having to manually trace down and implement how each service communicates becomes tedious and inefficient.
  • To make matters worse, these microservices can be written in different programming languages, use different frameworks, and speak different communication protocols.

Services should only be concerned about business logic

  • Let’s abstract away the technology and business logic, and consider this generic scenario of a typical microservice architecture.
  • Each service typically handles more than just business logic. For example, a service may also handle:
    • Communication logic: how services call each other, and through what protocol
    • Security logic: which microservices are allowed to communicate with others, preventing an attacker who has compromised one microservice from affecting the rest of the cluster
    • Retry logic: retry and/or circuit-breaking policies
    • Monitoring & tracing: which metrics collection library to use, and which metrics service to connect to (e.g., all services reporting to Datadog)

Service Mesh - The solution to manage microservices

Sidecar Proxying

  • One solution to the problem is to insert a sidecar proxy into each service, which handles the communication logic, security logic, and metrics, since this logic is common across all services.
  • This abstracts the communication logic away from the microservice, making it easy to implement custom transport-level communication logic, such as:
    • Mutual TLS, HTTPS
    • Layer 7 Load Balancing, Traffic Splitting
      • Automatic load balancing for HTTP, gRPC, WebSocket, and TCP traffic
    • Centralized Access Control List, etc…
  • To implement service mesh manually:
    • On Kubernetes, we insert a sidecar proxy to every pod.
    • On AWS, we insert a sidecar proxy to every ECS task definition.
    • With regular Docker, we can use a docker-compose file
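  • As a rough sketch of the Kubernetes case, a manually injected sidecar is just a second container in the pod spec. The image names and the `payment-envoy-config` ConfigMap below are hypothetical; the Envoy listener/cluster configuration would live in that ConfigMap:

```yaml
# Hypothetical pod: business-logic container plus an Envoy sidecar
apiVersion: v1
kind: Pod
metadata:
  name: payment
spec:
  containers:
  - name: payment-service          # the actual business-logic container
    image: example/payment-service:1.0
    ports:
    - containerPort: 8080
  - name: envoy-sidecar            # handles communication, TLS, and metrics
    image: envoyproxy/envoy:v1.30-latest
    volumeMounts:
    - name: envoy-config
      mountPath: /etc/envoy
  volumes:
  - name: envoy-config
    configMap:
      name: payment-envoy-config   # assumed to hold the Envoy config
```

Frameworks like Istio automate exactly this injection step, so the pod spec never has to mention the proxy.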
  • Some popular service mesh frameworks:
    • Istio: The most popular open source service mesh framework, built on top of Kubernetes and Envoy.
    • OpenShift Service Mesh: Part of Red Hat’s containerization platform, a managed Istio and Jaeger solution, aimed at simplifying the operation of Kubernetes and Istio at an enterprise level.
    • AWS App Mesh: A managed service mesh solution, offered by AWS, providing “application-level networking” for container-based AWS compute services (i.e., Fargate, ECS, EKS, EC2, and Kubernetes on EC2). Note that serverless workloads (i.e., Lambda) are not supported yet.
    • Consul: Another managed service mesh solution, offered by HashiCorp, which integrates with all HashiCorp products (including Vault, Nomad, and Terraform).

Why is service mesh useful

Out-of-the-box tracing and observability

  • With mature proxy technologies such as Envoy, countless observability and tracing tools have been built on top of them. Some examples include:
    • Prometheus: Open source monitoring solution, supported by default with Istio
    • Zipkin: Open source distributed tracing system
    • Jaeger: Another popular open source distributed tracing platform
    • Datadog: Popular commercial tool with top-tier developer experience
    • ServiceNow Cloud Observability: A managed cloud observability application (formerly Lightstep)

Out-of-the-box canary deployment capability

  • By default, performing a Canary Deployment with a traditional microservice architecture is an extremely tedious task.
[Figure: Traffic splitting example]
  • For example, if we want to deploy a canary version of the Payment Service and split incoming traffic by some percentage between the canary and stable versions, we either have to put a proxy in front of the Payment Service cluster, or implement that logic inside the upstream services.
  • In general, when a new version of a service is pushed to production, we might not have confidence in its ability to handle the production workload. This scenario is common, since the new version might contain breaking changes (e.g., changes in data schema, infrastructure, or third-party vendor dependencies), and failures in production are costly.
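  • With a service mesh, the same traffic split becomes a few lines of routing configuration. A sketch using Istio’s VirtualService (the hostname and subset names are illustrative; the `v1`/`v2` subsets would be defined in a matching DestinationRule):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: payment-canary
spec:
  hosts:
  - payment.prod.svc.cluster.local
  http:
  - route:
    - destination:
        host: payment.prod.svc.cluster.local
        subset: v1        # stable version
      weight: 90          # 90% of traffic
    - destination:
        host: payment.prod.svc.cluster.local
        subset: v2        # canary version
      weight: 10          # 10% of traffic
```

Ramping the canary up is then just a matter of adjusting the two weights.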

Istio

What is Istio

  • Now that we have convinced ourselves why service mesh is useful, let’s explore Istio, a popular open source service mesh implementation that runs with (or without) Kubernetes.
[Figure: Istio overview]
  • Istio works by injecting an Envoy proxy into every service, hence forming a proxy mesh on top of all services.
  • On top of that, Istio introduces the Istio control plane, which communicates with the mesh proxies, allowing remote configuration and telemetry analysis.
[Figure: AWS service mesh example]
  • Typically, the Istio control plane runs as part of the Kubernetes control plane; it is rarely the case that a company would want to use stand-alone Istio without Kubernetes.

Benefits of Istio

Heavy Integration with Kubernetes

  • Adopting Istio becomes much simpler since it is configured using Kubernetes YAML files
  • Istio uses K8s CustomResourceDefinitions (CRDs)
    • These extend the Kubernetes API, so Istio resources can be used like any other Kubernetes object

Dynamic Service Discovery

  • Istio keeps a central registry of all microservices; when a new service is deployed with Kubernetes, it is registered automatically, without extra configuration.
  • Using the registry, Envoy proxies can resolve these endpoints for internal communication

Security - Certificate Management

  • Istiod can act as a Certificate Authority (CA) for the microservices, hence supporting secure TLS communication between services, such as HTTPS and mTLS.
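  • As an illustration, mesh-wide mTLS can be enforced with a single PeerAuthentication resource (a sketch based on Istio’s security API; applying it in the `istio-system` root namespace makes the policy mesh-wide):

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # root namespace => policy applies mesh-wide
spec:
  mtls:
    mode: STRICT            # sidecars reject any plaintext traffic
```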

Metrics & Tracing

  • Similar to other service mesh frameworks, Istio collects metrics and telemetry from the Envoy proxies, enabling observability across the microservice stack.
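  • For example, trace sampling can be tuned mesh-wide with Istio’s Telemetry resource (a sketch; the 10% sampling rate is an arbitrary illustrative value):

```yaml
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system
spec:
  tracing:
  - randomSamplingPercentage: 10.0   # sample 10% of requests for tracing
```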

How to use Istio

[Figure: Istio architecture]
  • We configure service traffic using Istiod, by giving it descriptions of:
    • Virtual Services
    • Destination Rules
    • The Service Registry
  • Istiod takes this high-level configuration, converts it into Envoy configuration, and pushes it onto the mesh.

Istio Ingress Gateway

[Figure: Istio Ingress Gateway]
  • Part of the data plane. Acts as a gateway into the Kubernetes cluster, connecting to the Envoy proxies.
  • It runs as a pod in the cluster and acts as a load balancer, accepting incoming traffic from outside and forwarding those requests to Virtual Services
# Istio Gateway
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: httpbin-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "httpbin.example.com"
# Virtual Service that routes traffic from the Gateway
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
  - "httpbin.example.com"
  gateways:
  - httpbin-gateway
  http:
  - match:
    - uri:
        prefix: "/status"
    - uri:
        prefix: "/delay"
    route:
    - destination:
        port:
          number: 8080
        host: httpbin

Virtual Service

  • How to route traffic to a given destination
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: payment-route
spec:
  hosts:
  - payment.prod.svc.cluster.local
  http:
  - name: "payment-v2-routes"
    match:
    - uri:
        prefix: "/payment"
    rewrite:
      uri: "/newpayment"
    route:
    - destination:
        host: payment.prod.svc.cluster.local
        subset: v2
  - name: "payment-v1-route"
    route:
    - destination:
        host: payment.prod.svc.cluster.local
        subset: v1

Destination Rule

  • Configure what happens to traffic for that destination
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: payment-destination
spec:
  host: payment.prod.svc.cluster.local
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
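
  • Destination Rules are also where the transport-level policies mentioned earlier (load balancing, circuit breaking) are configured. A sketch extending the rule above with a hypothetical traffic policy; the thresholds are illustrative values:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: payment-destination-policy
spec:
  host: payment.prod.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: LEAST_REQUEST       # prefer the replica with the fewest active requests
    outlierDetection:             # basic circuit breaking
      consecutive5xxErrors: 5     # eject a replica after 5 consecutive 5xx responses
      interval: 30s               # how often replicas are scanned
      baseEjectionTime: 60s       # minimum time an ejected replica stays out
```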