Service Discovery & API Gateway

In a monolith, the order module calls the payment module with a local function call. In a microservices architecture, the order service needs to know the network address of the payment service — and that address changes as instances scale up, fail, and redeploy. Service discovery solves this: it maintains a registry of available service instances and their addresses. An API gateway solves a different problem: it provides a single entry point for external clients, handling routing, authentication, and protocol translation.

Service Discovery

The Problem

Each service runs multiple instances across different hosts. IP addresses change on every deployment. Instances come and go as auto-scaling adds and removes capacity.

Order Service needs to call Payment Service.

Payment Service instances:
  t=0:  10.0.1.5:8080, 10.0.1.6:8080
  t=1:  10.0.1.5:8080, 10.0.1.6:8080, 10.0.2.3:8080  (scaled up)
  t=2:  10.0.1.6:8080, 10.0.2.3:8080                   (10.0.1.5 crashed)
  t=3:  10.0.1.6:8080, 10.0.2.3:8080, 10.0.2.4:8080   (replaced + scaled)

Hardcoding addresses is impossible.

Client-Side Discovery

The calling service queries a service registry to get the list of available instances, then load-balances across them directly.

sequenceDiagram
    participant OS as Order Service
    participant SR as Service Registry (Consul / Eureka)
    participant P1 as Payment 10.0.1.6
    participant P2 as Payment 10.0.2.3

    OS->>SR: GET /services/payment-service
    SR->>OS: [10.0.1.6:8080, 10.0.2.3:8080]
    OS->>OS: Load balance (round-robin)
    OS->>P1: POST /charge {amount: 100}
    P1->>OS: 200 OK

How registration works: each service instance registers itself on startup and sends heartbeats. If heartbeats stop, the registry removes the instance.

# Service registration (on startup), using the python-consul client
import consul

c = consul.Consul()  # talks to the local Consul agent (default localhost:8500)
c.agent.service.register(
    name="payment-service",
    address="10.0.1.6",
    port=8080,
    # Consul polls /health every 10s and marks the instance unhealthy if it fails
    check=consul.Check.http("http://10.0.1.6:8080/health", interval="10s"),
)

| Pros | Cons |
| --- | --- |
| No intermediary — one fewer hop | Discovery logic in every service client |
| Client can use smart routing (latency-based, weighted) | Client must handle stale registry data |
| No single point of failure in the data path | Tighter coupling between client and registry |

Used by: Netflix Eureka (Spring Cloud), Consul with client-side load balancing, gRPC name resolution.
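The client-side flow can be sketched with an in-memory stand-in for the registry. `ServiceRegistry` and `ClientSideBalancer` are illustrative names, not a real Consul/Eureka client; the point is that the lookup and the round-robin both live in the calling service:

```python
import itertools

class ServiceRegistry:
    """Toy stand-in for Consul/Eureka: maps service name -> live addresses."""
    def __init__(self):
        self._services = {}

    def register(self, name, address):
        self._services.setdefault(name, []).append(address)

    def deregister(self, name, address):
        # In a real registry this happens when heartbeats stop.
        self._services[name].remove(address)

    def lookup(self, name):
        return list(self._services.get(name, []))

class ClientSideBalancer:
    """Round-robins across whatever instances the registry currently lists."""
    def __init__(self, registry, service_name):
        self.registry = registry
        self.service_name = service_name
        self._counter = itertools.count()

    def next_instance(self):
        instances = self.registry.lookup(self.service_name)
        if not instances:
            raise RuntimeError(f"no healthy instances of {self.service_name}")
        return instances[next(self._counter) % len(instances)]

registry = ServiceRegistry()
registry.register("payment-service", "10.0.1.6:8080")
registry.register("payment-service", "10.0.2.3:8080")

balancer = ClientSideBalancer(registry, "payment-service")
print(balancer.next_instance())  # 10.0.1.6:8080
print(balancer.next_instance())  # 10.0.2.3:8080
```

Because the balancer re-reads the registry on every call, a crashed instance disappears from rotation as soon as the registry drops it — which is exactly why stale registry data is the client's problem in this pattern.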

Server-Side Discovery

A load balancer sits between the client and the service instances. The client sends requests to a single known address; the load balancer queries the registry and routes.

sequenceDiagram
    participant OS as Order Service
    participant LB as Load Balancer
    participant SR as Service Registry
    participant P1 as Payment 10.0.1.6
    participant P2 as Payment 10.0.2.3

    Note over LB,SR: LB periodically syncs with registry

    OS->>LB: POST payment-service/charge
    LB->>LB: Select instance (round-robin)
    LB->>P2: POST /charge (10.0.2.3:8080)
    P2->>LB: 200 OK
    LB->>OS: 200 OK

| Pros | Cons |
| --- | --- |
| Clients are simple — just call one address | Load balancer is a single routing point (can become bottleneck) |
| Discovery logic centralized — update once | Extra hop adds latency |
| Client doesn’t need registry awareness | Load balancer must be highly available |

Used by: AWS ALB/NLB + ECS/EKS, Kubernetes Service (kube-proxy), NGINX with Consul template.
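A minimal sketch of the server-side pattern. The client only knows one address; the balancer owns the instance list. Here `sync()` is called by hand, standing in for the periodic registry sync shown in the diagram; a real balancer polls or subscribes. All names are illustrative:

```python
import itertools

class LoadBalancer:
    """Server-side discovery: clients call one well-known address;
    the LB queries the registry and picks the backend instance."""

    def __init__(self):
        self._instances = []
        self._counter = itertools.count()

    def sync(self, instances):
        # In production this list comes from Consul, etcd, or the cloud API.
        self._instances = list(instances)

    def route(self, path):
        if not self._instances:
            raise RuntimeError("no healthy backends")
        backend = self._instances[next(self._counter) % len(self._instances)]
        return f"forward {path} -> {backend}"

lb = LoadBalancer()
lb.sync(["10.0.1.6:8080", "10.0.2.3:8080"])
print(lb.route("/charge"))  # forward /charge -> 10.0.1.6:8080
```

Note the inversion relative to client-side discovery: the round-robin logic is identical, but it now lives in one shared component instead of in every caller.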

DNS-Based Discovery

The simplest form: use DNS to resolve a service name to instance IPs. The DNS server returns multiple A records; the client picks one.

dig payment-service.internal.example.com

payment-service.internal.example.com  A  10.0.1.6
payment-service.internal.example.com  A  10.0.2.3

Limitation: DNS TTL caching means stale records persist after an instance dies. Not suitable for rapidly changing instance pools. Works well for services that change infrequently.
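The TTL problem can be made concrete with a toy resolver cache. `DnsCache` and the injected clock are illustrative, but real resolvers behave the same way: a dead instance's A record keeps being served until the TTL expires:

```python
class DnsCache:
    """Toy DNS resolver cache: serves cached records until the TTL expires,
    even if the authoritative records have already changed underneath it."""

    def __init__(self, ttl_seconds, clock):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable clock keeps the example deterministic
        self._entries = {}  # name -> (records, fetched_at)

    def resolve(self, name, authoritative_lookup):
        records, fetched_at = self._entries.get(name, (None, None))
        if records is not None and self.clock() - fetched_at < self.ttl:
            return records  # possibly stale!
        records = authoritative_lookup(name)
        self._entries[name] = (records, self.clock())
        return records

# Simulated authoritative server whose records change under the cache.
records = {"payment-service.internal": ["10.0.1.5", "10.0.1.6"]}
now = [0.0]
cache = DnsCache(ttl_seconds=30, clock=lambda: now[0])

cache.resolve("payment-service.internal", records.get)    # fresh lookup at t=0
records["payment-service.internal"] = ["10.0.1.6"]        # 10.0.1.5 crashes
stale = cache.resolve("payment-service.internal", records.get)
# stale still contains 10.0.1.5 -- the dead IP is served from cache

now[0] = 31.0                                             # TTL expired
fresh = cache.resolve("payment-service.internal", records.get)
# fresh == ["10.0.1.6"]
```

Shortening the TTL narrows the staleness window but multiplies DNS query load, which is the trade-off that makes pure DNS discovery a poor fit for rapidly churning instance pools.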

Kubernetes DNS: payment-service.default.svc.cluster.local resolves to the ClusterIP — an internal virtual IP that kube-proxy routes to healthy pods. This combines DNS discovery with server-side load balancing.

API Gateway

An API gateway is a single entry point for all external client traffic. It sits between clients and the internal services, handling cross-cutting concerns that don’t belong in individual services.

flowchart TB
    Web([Web App]) --> GW
    Mobile([Mobile App]) --> GW
    Partner([Partner API]) --> GW

    subgraph API Gateway
        GW[Gateway] --> Auth[Authentication]
        Auth --> RL[Rate Limiting]
        RL --> Route[Routing]
    end

    Route --> US[User Service]
    Route --> OS[Order Service]
    Route --> PS[Payment Service]
    Route --> RS[Recommendation Service]

Responsibilities

| Responsibility | What It Does |
| --- | --- |
| Routing | Maps external URL (/api/orders) to internal service (order-service:8080/orders) |
| Authentication | Validates JWT/OAuth tokens before requests reach services — services trust the gateway’s identity headers |
| Rate limiting | Enforces per-client or per-endpoint rate limits (see Rate Limiting) |
| SSL termination | Handles TLS at the edge — internal traffic can be plaintext (within a trusted network) or mTLS |
| Protocol translation | Client speaks REST/JSON; internal services use gRPC/Protobuf. Gateway translates. |
| Request aggregation | Mobile app needs data from 3 services in one screen. Gateway makes 3 internal calls and returns one response. |
| Caching | Cache GET responses at the gateway to reduce backend load |
| Observability | Centralized access logs, request tracing IDs, latency metrics |
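Routing, the first responsibility above, reduces to a prefix-table lookup: strip the external prefix, forward the remainder to the matching backend. A sketch with hypothetical route entries (real gateways like Kong or NGINX express this declaratively):

```python
# External path prefix -> internal service address (illustrative entries)
ROUTES = {
    "/api/orders": "order-service:8080",
    "/api/users": "user-service:8080",
    "/api/payments": "payment-service:8080",
}

def route(path):
    """Map an external request path to the internal URL to forward it to."""
    for prefix, backend in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            internal_path = path[len("/api"):]  # drop the external /api prefix
            return f"http://{backend}{internal_path}"
    raise LookupError(f"no route for {path}")

print(route("/api/orders/42"))  # http://order-service:8080/orders/42
```

A production gateway layers the other responsibilities (auth, rate limiting, TLS) as filters that run before and after this lookup.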

Gateway Products

| Gateway | Type | Key Feature |
| --- | --- | --- |
| Kong | Open-source + enterprise | Plugin ecosystem (auth, rate limiting, logging), runs on NGINX |
| AWS API Gateway | Managed | Tight integration with Lambda, Cognito, IAM; pay-per-request |
| Envoy | Proxy/gateway | L7 proxy with advanced routing, circuit breaking, observability; used in service meshes |
| NGINX | Reverse proxy + gateway | Configuration-based routing, high performance, widely deployed |
| Spring Cloud Gateway | Java framework | Integrates with Spring ecosystem, reactive, filter chains |

Backend for Frontend (BFF)

A specialized API gateway per client type. Each BFF is tailored to its client’s data needs — the mobile BFF returns smaller payloads, the web BFF aggregates more data, the partner BFF uses a different auth scheme.

flowchart TB
    Web([Web App]) --> WBFF[Web BFF]
    Mobile([Mobile App]) --> MBFF[Mobile BFF]
    Partner([Partner]) --> PBFF[Partner BFF]

    WBFF --> US[User Service]
    WBFF --> OS[Order Service]
    WBFF --> RS[Recommendation Service]

    MBFF --> US
    MBFF --> OS

    PBFF --> OS

| Without BFF | With BFF |
| --- | --- |
| Mobile app receives full user profile (50 fields) and filters locally | Mobile BFF returns 8 fields the mobile app actually uses |
| Web and mobile share one API design — compromises for both | Each BFF is optimized for its client’s screen layouts and network constraints |
| Partner API changes risk breaking mobile app | Each BFF evolves independently |

Trade-off: N BFFs = N codebases to maintain. Use BFF when client data needs diverge significantly — don’t create a BFF for two clients that consume the same API shape.
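The payload-trimming in the table's first row can be sketched as a BFF response shaper. The field names and counts below are made up for illustration; the pattern is simply "project the upstream response onto what this client renders":

```python
# What the user service returns (dozens of fields in practice; a sample here)
full_profile = {
    "id": 7, "name": "Ada", "email": "ada@example.com",
    "avatar_url": "https://cdn.example.com/a.png", "created_at": "2021-03-01",
    "last_login": "2024-05-02", "marketing_opt_in": False,
    "address": {"city": "London"}, "loyalty_tier": "gold", "locale": "en-GB",
}

# The only fields the mobile screens actually render
MOBILE_FIELDS = ["id", "name", "avatar_url", "loyalty_tier"]

def mobile_bff_profile(profile):
    """Project the full upstream profile onto the mobile client's needs."""
    return {k: profile[k] for k in MOBILE_FIELDS}

print(mobile_bff_profile(full_profile))
# -> {'id': 7, 'name': 'Ada', 'avatar_url': ..., 'loyalty_tier': 'gold'}
```

The web BFF would keep a different field list (and likely aggregate calls to other services), which is exactly why the two can evolve independently.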

Service Mesh vs API Gateway

Two patterns that solve different traffic routing problems:

flowchart TB
    subgraph External - North/South Traffic
        Client([External Client]) --> GW[API Gateway]
    end

    subgraph Internal - East/West Traffic
        GW --> A[Service A + Sidecar]
        A <-->|mTLS, retry, circuit breaker| B[Service B + Sidecar]
        B <-->|mTLS, retry, circuit breaker| C[Service C + Sidecar]
    end

    subgraph Service Mesh Control Plane
        CP[Istio / Linkerd] -.->|config| A
        CP -.->|config| B
        CP -.->|config| C
    end

| Property | API Gateway | Service Mesh |
| --- | --- | --- |
| Traffic direction | North-south (external → internal) | East-west (service → service) |
| Deployment | Central, shared instance(s) | Sidecar proxy per service pod |
| Primary concern | External client management (auth, rate limiting, routing) | Internal service communication (mTLS, retries, circuit breakers, observability) |
| Examples | Kong, AWS API Gateway, NGINX | Istio (Envoy sidecars), Linkerd |
| When to use | Always — every system needs an external entry point | When service-to-service communication needs consistent security, retries, and observability without changing application code |

They are complementary, not alternatives. A typical production setup has an API gateway at the edge and a service mesh for internal communication.

ℹ️ Interview tip: When designing a microservices system, say: “External traffic enters through an API gateway that handles auth, rate limiting, and TLS termination. Internally, services discover each other through Kubernetes DNS — payment-service.default.svc.cluster.local resolves to a ClusterIP backed by healthy pods. For service-to-service concerns like retries, circuit breaking, and mTLS, I’d use a service mesh sidecar (Envoy) rather than implementing those in every service’s application code.” This shows you understand the separation between edge routing and internal communication.