Microservices Architecture: Lessons from the Trenches

June 12, 2024

Microservices promise scalability, resilience, and team autonomy—but they come with significant complexity. After years of building distributed systems, here are the lessons I've learned.

When to Go Microservices

Good Candidates

  1. Multiple Teams - Different teams need independent deployment cycles
  2. Technology Diversity - Different services benefit from different tech stacks
  3. Scale Requirements - Individual components need to scale independently
  4. Complex Business Logic - Bounded contexts are clearly defined

Bad Candidates

  1. Small Teams - Overhead outweighs benefits
  2. Simple Applications - Monolith is faster to develop
  3. Shared Database - If you can't avoid shared data, reconsider
  4. Low Traffic - Complexity isn't justified

// Service boundary decision matrix: each entry is one signal in favor of splitting
function shouldSplit({ teamSize, deploysPerWeek, stacks, scaleFactors, dataBoundaries }) {
  return {
    teamSize: teamSize > 8,                   // Multiple teams needed
    deploymentFrequency: deploysPerWeek > 1,  // Multiple deployments per week
    technologyStack: stacks.length > 1,       // Different tech needed
    scalability: scaleFactors.length > 1,     // Different scale requirements
    dataOwnership: dataBoundaries.length > 1  // Clear data boundaries
  };
}

Service Design Patterns

API Gateway Pattern

Central entry point for all client requests:

// API Gateway with Express.js
const express = require('express');
const httpProxy = require('http-proxy-middleware');

const app = express();

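// Authentication middleware: reject requests without a valid token
app.use(async (req, res, next) => {
  try {
    const token = req.headers.authorization;
    // validateToken is assumed to be defined elsewhere
    const user = await validateToken(token);
    if (!user) return res.status(401).json({ error: 'Unauthorized' });
    req.user = user;
    next();
  } catch (error) {
    res.status(401).json({ error: 'Unauthorized' });
  }
});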

// Route to microservices
app.use('/api/users', httpProxy.createProxyMiddleware({
  target: 'http://user-service:3001',
  changeOrigin: true,
  pathRewrite: { '^/api/users': '' }
}));

app.use('/api/orders', httpProxy.createProxyMiddleware({
  target: 'http://order-service:3002',
  changeOrigin: true,
  pathRewrite: { '^/api/orders': '' }
}));

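// Request aggregation
app.get('/api/dashboard', async (req, res) => {
  try {
    const [userStats, orderStats] = await Promise.all([
      fetch('http://user-service:3001/stats'),
      fetch('http://order-service:3002/stats')
    ]);
    // fetch resolves on HTTP error statuses, so check them explicitly
    if (!userStats.ok || !orderStats.ok) {
      return res.status(502).json({ error: 'Upstream service returned an error' });
    }

    res.json({
      users: await userStats.json(),
      orders: await orderStats.json()
    });
  } catch (error) {
    // Network failure or invalid JSON from an upstream service
    res.status(502).json({ error: 'Upstream service unavailable' });
  }
});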
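
An aggregation endpoint built on Promise.all fails as a whole when any single upstream call fails. If a partial dashboard is acceptable, one option is to collect results with Promise.allSettled. The sketch below is a minimal illustration; `aggregate` and the shape of its return value are hypothetical, not part of the gateway above.

```javascript
// Hypothetical helper: run named upstream calls in parallel, keep whatever
// succeeded, and report failures by name instead of rejecting.
async function aggregate(sources) {
  const names = Object.keys(sources);
  // The async wrapper turns synchronous throws into rejections as well
  const settled = await Promise.allSettled(
    names.map(async (name) => sources[name]())
  );

  const data = {};
  const errors = {};
  settled.forEach((outcome, index) => {
    const name = names[index];
    if (outcome.status === 'fulfilled') {
      data[name] = outcome.value;
    } else {
      const reason = outcome.reason;
      errors[name] = reason instanceof Error ? reason.message : String(reason);
    }
  });
  return { data, errors };
}
```

A dashboard route could call `aggregate({ users: fetchUserStats, orders: fetchOrderStats })` (both fetchers assumed) and return the `data` part with a flag telling the client which sections are missing.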

Service Discovery

Dynamic service registration and discovery:

// Consul service registration
const consul = require('consul');

class ServiceRegistry {
  constructor(serviceName, port) {
    this.consul = new consul();
    this.serviceName = serviceName;
    this.port = port;
    this.serviceId = `${serviceName}-${port}-${Date.now()}`;
  }
  
  async register() {
    await this.consul.agent.service.register({
      id: this.serviceId,
      name: this.serviceName,
      address: this.getIpAddress(),
      port: this.port,
      check: {
        http: `http://${this.getIpAddress()}:${this.port}/health`,
        interval: '10s',
        timeout: '5s'
      }
    });
  }
  
  async discover(serviceName) {
    const services = await this.consul.health.service({
      service: serviceName,
      passing: true
    });
    
    return services.map(service => ({
      address: service.Service.Address,
      port: service.Service.Port
    }));
  }
  
  async deregister() {
    await this.consul.agent.service.deregister(this.serviceId);
  }
}
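
`discover()` returns every passing instance, so the caller still has to choose one. A minimal sketch of client-side round-robin selection follows; `createRoundRobin` is a hypothetical helper, and it assumes the `{ address, port }` objects that `discover()` produces.

```javascript
// Hypothetical helper: rotate through the instances returned by discover().
function createRoundRobin() {
  let next = 0;
  return function pick(instances) {
    if (!instances || instances.length === 0) {
      throw new Error('No healthy instances available');
    }
    const instance = instances[next % instances.length];
    next = (next + 1) % Number.MAX_SAFE_INTEGER;
    return `http://${instance.address}:${instance.port}`;
  };
}
```

Keeping one picker per target service spreads requests evenly as long as the instance list is stable. When instances come and go, the rotation simply continues over the new list.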

Circuit Breaker Pattern

Prevent cascading failures:

class CircuitBreaker {
  constructor(options = {}) {
    this.failureThreshold = options.failureThreshold || 5;
    this.resetTimeout = options.resetTimeout || 60000;
    this.monitoringPeriod = options.monitoringPeriod || 10000;
    
    this.state = 'CLOSED'; // CLOSED, OPEN, HALF_OPEN
    this.failureCount = 0;
    this.nextAttempt = Date.now();
    this.monitoringStartTime = Date.now();
  }
  
  async execute(operation) {
    if (this.state === 'OPEN') {
      if (Date.now() < this.nextAttempt) {
        throw new Error('Circuit breaker is OPEN');
      }
      this.state = 'HALF_OPEN';
    }
    
    try {
      const result = await operation();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }
  
  onSuccess() {
    this.failureCount = 0;
    this.state = 'CLOSED';
    this.monitoringStartTime = Date.now();
  }
  
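  onFailure() {
    const now = Date.now();
    // Start a fresh counting window once the monitoring period has elapsed
    if (now - this.monitoringStartTime > this.monitoringPeriod) {
      this.failureCount = 0;
      this.monitoringStartTime = now;
    }
    this.failureCount++;
    
    // A failed trial call in HALF_OPEN reopens the circuit immediately
    if (this.state === 'HALF_OPEN' || this.failureCount >= this.failureThreshold) {
      this.state = 'OPEN';
      this.nextAttempt = now + this.resetTimeout;
    }
  }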
}

// Usage
const circuitBreaker = new CircuitBreaker({
  failureThreshold: 5,
  resetTimeout: 60000
});

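async function callUserService(userId) {
  return circuitBreaker.execute(async () => {
    const response = await fetch(`http://user-service/users/${userId}`);
    // fetch only rejects on network errors, so count HTTP 5xx responses as failures too
    if (response.status >= 500) {
      throw new Error(`User service responded with ${response.status}`);
    }
    return response;
  });
}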
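
When the breaker is open, the caller still needs something to return. A common companion is a fallback value such as cached data or a default. The helper below is a small sketch under that assumption; `executeWithFallback` is a hypothetical name, and it works with any object that exposes an `execute(operation)` method like the class above.

```javascript
// Hypothetical helper: run an operation through a breaker-like object and
// return a fallback when the call fails or the circuit is open.
async function executeWithFallback(breaker, operation, fallback) {
  try {
    return await breaker.execute(operation);
  } catch (error) {
    // A function fallback receives the error so it can decide what to return
    return typeof fallback === 'function' ? fallback(error) : fallback;
  }
}
```

Fallbacks hide failures from callers, so it is worth logging or counting each one to keep the outage visible.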

Data Management

Database per Service

Each service owns its data:

// User service database
const { Pool } = require('pg');

class UserDatabase {
  constructor() {
    this.pool = new Pool({
      connectionString: process.env.USER_DB_URL
    });
  }
  
  async createUser(userData) {
    const result = await this.pool.query(
      'INSERT INTO users (name, email) VALUES ($1, $2) RETURNING *',
      [userData.name, userData.email]
    );
    return result.rows[0];
  }
  
  async getUser(id) {
    const result = await this.pool.query(
      'SELECT * FROM users WHERE id = $1',
      [id]
    );
    return result.rows[0];
  }
}

// Order service database
class OrderDatabase {
  constructor() {
    this.pool = new Pool({
      connectionString: process.env.ORDER_DB_URL
    });
  }
  
  async createOrder(orderData) {
    // Store user_id as reference, not full user data
    const result = await this.pool.query(
      'INSERT INTO orders (user_id, total) VALUES ($1, $2) RETURNING *',
      [orderData.userId, orderData.total]
    );
    return result.rows[0];
  }
}
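
With separate databases there is no SQL join between orders and users, so a combined view has to be assembled in code. The sketch below shows one way to do that. `getOrderWithUser` is hypothetical, the `orders` and `users` arguments stand in for clients of the two services, and `getOrder` is an assumed method that the listing above does not define.

```javascript
// Hypothetical composition function: fetch the order from its owning
// service, then enrich it with user data from the user service.
async function getOrderWithUser(orderId, { orders, users }) {
  const order = await orders.getOrder(orderId);
  if (!order) return null;

  let user = null;
  try {
    // user_id is the reference column stored by the order service
    user = await users.getUser(order.user_id);
  } catch (error) {
    // User service unavailable: return the order without user details
    console.error('User lookup failed:', error.message);
  }
  return { ...order, user };
}
```

Treating the user lookup as optional keeps the order view available when the user service is down, at the cost of an occasionally incomplete response.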

Saga Pattern for Distributed Transactions

Manage long-running transactions:

class OrderSaga {
  constructor() {
    this.steps = [
      { action: 'validateUser', compensate: 'rollbackUserValidation' },
      { action: 'reserveInventory', compensate: 'releaseInventory' },
      { action: 'processPayment', compensate: 'refundPayment' },
      { action: 'createOrder', compensate: 'cancelOrder' }
    ];
  }
  
  async execute(orderData) {
    const executedSteps = [];
    
    try {
      for (const step of this.steps) {
        await this.executeStep(step.action, orderData);
        executedSteps.push(step);
      }
      
      return { success: true, orderId: orderData.id };
    } catch (error) {
      // Compensate for executed steps
      for (const step of executedSteps.reverse()) {
        try {
          await this.executeStep(step.compensate, orderData);
        } catch (compensateError) {
          console.error('Compensation failed:', compensateError);
        }
      }
      
      throw error;
    }
  }
  
  async executeStep(action, data) {
    const service = this.getServiceForAction(action);
    return await service[action](data);
  }
}
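
`OrderSaga` calls `this.getServiceForAction`, which the listing leaves undefined. One possible shape is a lookup table built from explicit registrations. This is a sketch only: `createServiceLookup` and the registration format are assumptions made for illustration.

```javascript
// Hypothetical factory for getServiceForAction: maps each action or
// compensation name to the service object that implements it.
function createServiceLookup(registrations) {
  const owners = new Map();
  for (const { service, actions } of registrations) {
    for (const action of actions) {
      if (typeof service[action] !== 'function') {
        throw new Error(`Service does not implement "${action}"`);
      }
      if (owners.has(action)) {
        throw new Error(`Action "${action}" is registered twice`);
      }
      owners.set(action, service);
    }
  }
  return function getServiceForAction(action) {
    const service = owners.get(action);
    if (!service) {
      throw new Error(`No service registered for action "${action}"`);
    }
    return service;
  };
}

// Possible wiring inside the OrderSaga constructor (service names assumed):
//   this.getServiceForAction = createServiceLookup([
//     { service: paymentService, actions: ['processPayment', 'refundPayment'] }
//   ]);
```

Validating the registrations up front means a missing compensation handler surfaces at startup rather than in the middle of a rollback.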

Communication Patterns

Event-Driven Architecture

Loose coupling through events:

// Event bus implementation
class EventBus {
  constructor() {
    this.subscribers = new Map();
  }
  
  subscribe(eventType, handler) {
    if (!this.subscribers.has(eventType)) {
      this.subscribers.set(eventType, []);
    }
    this.subscribers.get(eventType).push(handler);
  }
  
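  async publish(event) {
    const handlers = this.subscribers.get(event.type) || [];
    
    // Wrap each call in an async function so synchronous handlers and
    // synchronous throws are treated the same way as rejected promises
    await Promise.all(
      handlers.map(async (handler) => {
        try {
          await handler(event);
        } catch (error) {
          console.error('Event handler failed:', error);
        }
      })
    );
  }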
}

// Usage in services
class OrderService {
  constructor(eventBus, orderRepository) {
    this.eventBus = eventBus;
    this.orderRepository = orderRepository; // used by createOrder below
  }
  
  async createOrder(orderData) {
    const order = await this.orderRepository.save(orderData);
    
    // Publish event
    await this.eventBus.publish({
      type: 'ORDER_CREATED',
      data: { orderId: order.id, userId: order.userId },
      timestamp: new Date().toISOString()
    });
    
    return order;
  }
}

class NotificationService {
  constructor(eventBus) {
    this.eventBus = eventBus; // must be assigned before subscribing
    this.eventBus.subscribe('ORDER_CREATED', this.handleOrderCreated.bind(this));
  }
  
  async handleOrderCreated(event) {
    await this.sendOrderConfirmation(event.data);
  }
}
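
Once events travel over a real broker, the same event can arrive more than once, and a handler like `handleOrderCreated` would then send duplicate confirmations. A minimal sketch of an idempotent wrapper follows. It assumes the event type plus `data.orderId` identifies an event, which matches the `ORDER_CREATED` payload above but is not guaranteed in general. `idempotent` is a hypothetical helper, and the in-memory Set would need to be replaced by durable storage in a real service.

```javascript
// Hypothetical wrapper: skips events whose key has already been handled.
function idempotent(
  handler,
  keyOf = (event) => `${event.type}:${event.data.orderId}`
) {
  const seen = new Set(); // in-memory only; lost on restart
  return async (event) => {
    const key = keyOf(event);
    if (seen.has(key)) return false; // duplicate, nothing to do
    await handler(event);
    seen.add(key); // mark only after success so failures can be retried
    return true;
  };
}
```

The wrapped handler can be passed to `subscribe` in place of the original. Two copies of the same event processed at the same instant can still both run, so this covers redelivery rather than true concurrency.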

Message Queues

Reliable asynchronous communication:

// RabbitMQ implementation
const amqp = require('amqplib');

class MessageQueue {
  async connect() {
    this.connection = await amqp.connect(process.env.RABBITMQ_URL);
    this.channel = await this.connection.createChannel();
  }
  
  async publish(queueName, message) {
    await this.channel.assertQueue(queueName, { durable: true });
    await this.channel.sendToQueue(
      queueName,
      Buffer.from(JSON.stringify(message)),
      { persistent: true }
    );
  }
  
  async subscribe(queueName, handler) {
    await this.channel.assertQueue(queueName, { durable: true });
    await this.channel.consume(queueName, async (msg) => {
      if (msg) {
        try {
          const message = JSON.parse(msg.content.toString());
          await handler(message);
          this.channel.ack(msg);
        } catch (error) {
          console.error('Message processing failed:', error);
          this.channel.nack(msg, false, false); // Reject and don't requeue
        }
      }
    });
  }
}
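
The `nack(msg, false, false)` call above discards a failed message unless the queue has a dead-letter exchange configured. A sketch of that setup with amqplib's `assertExchange`, `assertQueue`, and `bindQueue` follows. The function name and the `.dlx` / `.dead` naming scheme are assumptions chosen for illustration.

```javascript
// Hypothetical setup helper: declares a work queue whose rejected messages
// are routed to a companion dead-letter queue instead of being dropped.
async function assertQueueWithDeadLetter(channel, queueName) {
  const deadLetterExchange = `${queueName}.dlx`;
  const deadLetterQueue = `${queueName}.dead`;

  // A fanout exchange ignores routing keys, so the binding pattern is empty
  await channel.assertExchange(deadLetterExchange, 'fanout', { durable: true });
  await channel.assertQueue(deadLetterQueue, { durable: true });
  await channel.bindQueue(deadLetterQueue, deadLetterExchange, '');

  // Messages rejected without requeue are republished to the exchange above
  await channel.assertQueue(queueName, { durable: true, deadLetterExchange });

  return { deadLetterExchange, deadLetterQueue };
}
```

RabbitMQ refuses to redeclare an existing queue with different arguments, so every `assertQueue` call for the same queue has to pass the same options, including the ones in `publish` and `subscribe`.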

Monitoring and Observability

Distributed Tracing

Track requests across services:

// OpenTelemetry setup
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');

const sdk = new NodeSDK({
  instrumentations: [getNodeAutoInstrumentations()],
  traceExporter: new OTLPTraceExporter({
    url: process.env.JAEGER_ENDPOINT
  })
});

sdk.start();

// Manual tracing
const { trace, SpanStatusCode } = require('@opentelemetry/api');

// orderRepository is passed in explicitly; `this` is not bound in a plain function
async function processOrder(orderData, orderRepository) {
  const tracer = trace.getTracer('order-service');
  const span = tracer.startSpan('process-order');
  
  try {
    span.setAttributes({
      'order.id': orderData.id,
      'order.amount': orderData.amount
    });
    
    const result = await orderRepository.save(orderData);
    span.setStatus({ code: SpanStatusCode.OK });
    
    return result;
  } catch (error) {
    span.recordException(error);
    span.setStatus({ 
      code: SpanStatusCode.ERROR,
      message: error.message 
    });
    throw error;
  } finally {
    span.end();
  }
}

Health Checks

Monitor service health:

class HealthChecker {
  constructor(dependencies = {}) {
    this.dependencies = dependencies;
  }
  
  async checkHealth() {
    const checks = {
      status: 'healthy',
      timestamp: new Date().toISOString(),
      checks: {}
    };
    
    // Database health
    try {
      await this.dependencies.database.query('SELECT 1');
      checks.checks.database = { status: 'healthy' };
    } catch (error) {
      checks.checks.database = { 
        status: 'unhealthy',
        error: error.message 
      };
      checks.status = 'unhealthy';
    }
    
    // External service health (the list may be empty or omitted)
    for (const [name, service] of Object.entries(this.dependencies.services || {})) {
      try {
        await service.ping();
        checks.checks[name] = { status: 'healthy' };
      } catch (error) {
        checks.checks[name] = { 
          status: 'unhealthy',
          error: error.message 
        };
        checks.status = 'unhealthy';
      }
    }
    
    return checks;
  }
}
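
A health report is only useful if probes and load balancers can read it, which usually means an HTTP endpoint that answers with a non-2xx status when something is wrong. The handler below is a sketch under that assumption. `healthHandler` is a hypothetical name, and it expects an Express-style `res` with `status()` and `json()`.

```javascript
// Hypothetical route handler factory: 200 when healthy, 503 otherwise.
function healthHandler(checker) {
  return async (req, res) => {
    try {
      const report = await checker.checkHealth();
      res.status(report.status === 'healthy' ? 200 : 503).json(report);
    } catch (error) {
      // The checker itself failed, so report the service as unhealthy
      res.status(503).json({ status: 'unhealthy', error: error.message });
    }
  };
}
```

With Express this could be mounted as `app.get('/health', healthHandler(new HealthChecker(dependencies)))`.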

Deployment Strategies

Container Orchestration

Kubernetes deployment example:

# user-service-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
    spec:
      containers:
      - name: user-service
        image: user-service:latest
        ports:
        - containerPort: 3001
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: url
        livenessProbe:
          httpGet:
            path: /health
            port: 3001
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 3001
          initialDelaySeconds: 5
          periodSeconds: 5
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"

Blue-Green Deployment

Zero-downtime deployments:

// Deployment script
class BlueGreenDeployment {
  constructor(k8sClient) {
    this.k8sClient = k8sClient;
  }
  
  async deploy(serviceName, newImage) {
    const currentColor = await this.getCurrentColor(serviceName);
    const newColor = currentColor === 'blue' ? 'green' : 'blue';
    
    // Deploy new version
    await this.deployNewVersion(serviceName, newColor, newImage);
    
    // Wait for new version to be ready
    await this.waitForReady(serviceName, newColor);
    
    // Switch traffic
    await this.switchTraffic(serviceName, newColor);
    
    // Clean up old version
    await this.cleanupOldVersion(serviceName, currentColor);
    
    return { success: true, deployedColor: newColor };
  }
  
  async switchTraffic(serviceName, color) {
    const service = await this.k8sClient.getService(serviceName);
    service.spec.selector.version = color;
    await this.k8sClient.updateService(service);
  }
}
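
The `waitForReady` step is what makes the switch safe, and it needs a timeout so that a broken release does not block the pipeline forever. A minimal polling sketch follows. `waitUntil`, its option names, and the default values are assumptions, and the readiness check itself is supplied by the caller.

```javascript
// Hypothetical polling helper: resolves once check() returns a truthy
// value, rejects when the timeout would be exceeded.
async function waitUntil(check, { timeoutMs = 120000, intervalMs = 2000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await check()) return;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Not ready after ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

`waitForReady` could delegate to this with a check that asks the cluster whether all replicas of the new color report ready. If it rejects, the deployment should stop before `switchTraffic` runs, leaving the old color serving requests.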

Common Pitfalls and Solutions

1. Distributed Monolith

Problem: Services are tightly coupled through shared databases or synchronous calls.

Solution:

  • Implement proper service boundaries
  • Use asynchronous communication
  • Ensure data ownership per service

2. Service Sprawl

Problem: Too many small services that are hard to manage.

Solution:

  • Start with larger services
  • Split when necessary, not before
  • Monitor service count and complexity

3. Testing Complexity

Problem: End-to-end testing becomes difficult.

Solution:

  • Implement contract testing
  • Use service virtualization
  • Invest in good testing infrastructure
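
To make contract testing concrete: the consumer writes down the fields it relies on, and the provider's test suite checks real responses against that list. The sketch below uses plain type checks rather than a contract-testing tool. `userContract` and `contractViolations` are hypothetical, and the field list is an example built from the `users` table shown earlier.

```javascript
// Hypothetical contract: field name -> expected `typeof` result.
const userContract = { id: 'number', name: 'string', email: 'string' };

// Returns a list of human-readable violations; empty means the payload
// satisfies the contract. Extra fields are allowed on purpose.
function contractViolations(contract, payload) {
  const violations = [];
  for (const [field, expectedType] of Object.entries(contract)) {
    if (!(field in payload)) {
      violations.push(`missing field: ${field}`);
    } else if (typeof payload[field] !== expectedType) {
      violations.push(
        `${field}: expected ${expectedType}, got ${typeof payload[field]}`
      );
    }
  }
  return violations;
}
```

Allowing extra fields lets the provider add data without breaking consumers, while removing or retyping a listed field fails the provider's build before it reaches production.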

4. Operational Overhead

Problem: Managing many services is operationally complex.

Solution:

  • Automate everything
  • Use good observability tools
  • Invest in DevOps capabilities

Best Practices

  1. Start Simple: Begin with a monolith, split when needed
  2. Clear Boundaries: Define service boundaries carefully
  3. Async Communication: Prefer event-driven communication
  4. Observability: Monitor everything from day one
  5. Automation: Automate deployment, monitoring, and scaling
  6. Failure Design: Design for failure, not success
  7. Data Ownership: Each service owns its data
  8. Version APIs: Plan for API evolution from the start

Conclusion

Microservices are a powerful architectural pattern, but they're not a silver bullet. The key is understanding when and how to apply them based on your specific needs and constraints.

Remember that the goal isn't to use microservices—it's to build systems that are scalable, maintainable, and can evolve with your business. Sometimes that means microservices, sometimes it doesn't.

Focus on the principles behind microservices—independence, resilience, and scalability—rather than the implementation details. Your future self will thank you.