Questions every service owner must be able to answer

  • How many services are written in each language?
  • Which services have security vulnerabilities or outdated dependencies?
  • Which upstream and downstream collaborators does Service A interact with?
  • Which services are production-critical? Which are spikes or experiments, or otherwise less important to critical application paths?
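These questions are much easier to answer when each service's metadata lives in a machine-readable catalog. The sketch below assumes a hypothetical in-memory catalog (`ServiceRecord`, with illustrative `language`, `tier`, and `dependencies` fields) rather than any particular tool. It answers three of the four questions; vulnerability and dependency-age data would normally come from a separate scanner and is left out.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ServiceRecord:
    """Hypothetical catalog entry holding facts about one service."""
    name: str
    language: str
    tier: str  # illustrative labels, e.g. "critical" or "experimental"
    dependencies: list = field(default_factory=list)  # services this one calls


def services_per_language(catalog):
    """How many services are written in each language?"""
    return Counter(s.language for s in catalog)


def collaborators(catalog, target):
    """Return (upstream, downstream) collaborators of `target`.

    Assumed convention: upstream services are the ones `target` calls,
    downstream services are the ones that call `target`.
    """
    by_name = {s.name: s for s in catalog}
    upstream = sorted(by_name[target].dependencies)
    downstream = sorted(s.name for s in catalog if target in s.dependencies)
    return upstream, downstream


def critical_services(catalog):
    """Which services are production-critical?"""
    return sorted(s.name for s in catalog if s.tier == "critical")


catalog = [
    ServiceRecord("orders", "go", "critical", ["payments", "users"]),
    ServiceRecord("payments", "java", "critical"),
    ServiceRecord("users", "go", "critical"),
    ServiceRecord("recs-spike", "python", "experimental", ["orders"]),
]
print(collaborators(catalog, "orders"))  # (['payments', 'users'], ['recs-spike'])
```

Keeping this data in a catalog, rather than in people's heads, is what makes the answers repeatable as the number of services grows.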

Minimum layers for documenting services

  • Overview: An overview of the service's purpose, intended usage, and overall architecture. Service overviews should be an entry point for team members and service users.
  • Contract: A service contract should describe the API that a service provides. Depending on the transport mechanism, this can be machine-readable, for example, using Swagger (HTTP APIs) or protocol buffers (gRPC).
  • Runbooks: Documented runbooks for production support, detailing common operational and failure scenarios.
  • Metadata: Facts about a servi...

Sections in a design review document for a new microservice

  • Problem & Context: What technical and/or business problem does this feature solve? Why are we doing this?
  • Solution: How do you intend to solve this problem?
  • Dependencies & Integration: How does it interact with existing or pl...

Characteristics of a successful on-call rotation

  • Inclusive: Everyone who can do it should do it, including VPs and directors.
  • Fair: On-call work should be remunerated on top of pay for normal working hours.
  • Sustainable: Enough engineers should be in a rotation to avoid burnout and to avoid disrupting work-life balance or day-to-day work in the office.
  • Reflective: Your team should constantly review alerts and pages to ensure that only alerts that matter wake someone up.

Increment's on-call blog post

Microservices Team Models

  1. Grouping by function
    • Unclear ownership
    • Lack of autonomy
    • No long-term responsibility
    • Risk of silos
  2. Grouping across functions
    • Teams aligned with business value will reflect that alignment in the application they develop; they will build services that explicitly implement business capabilities.
    • Individual services will have clear ownership.
    • Service architecture will reflect the low coupling and high cohesion of teams.
    • Functional specialists in different teams can collaborate informally to develop shared pr...

Principles of Effective Teams

  1. Ownership
    • A team might own multiple services.
    • 1:n ownership, where one service is owned by several teams, can lead to conflict about technical choices and make accountability unclear.
  2. Autonomy
    • Important for scaling
    • Self-forming teams
  3. End-to-end responsibility
    • DevOps

Useful information in log entries

  1. Timestamps
  2. Identifiers
  3. Source
  4. Level or category
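As a minimal sketch of these four fields, the formatter below uses Python's standard `logging` module to emit one JSON object per entry. The field names `request_id` and `service` are assumptions, passed per call through `extra`; a real service would usually propagate a correlation ID from incoming requests instead of setting it by hand.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        entry = {
            # Timestamp: UTC in ISO 8601 so entries from different hosts sort consistently.
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            # Identifiers: a hypothetical request ID supplied through `extra`.
            "request_id": getattr(record, "request_id", None),
            # Source: the emitting service (assumed `extra` field) and code location.
            "service": getattr(record, "service", None),
            "module": record.module,
            "line": record.lineno,
            # Level or category.
            "level": record.levelname,
            "message": record.getMessage(),
        }
        return json.dumps(entry)


def build_logger(name):
    """Return a logger that writes JSON lines to stdout."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep entries from also reaching the root logger
    return logger


log = build_logger("orders")
log.info("order created", extra={"request_id": "req-123", "service": "orders"})
```

Emitting one JSON object per line keeps entries machine-parseable, so a log aggregator can filter by identifier or level without regular expressions.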

The four golden signals for collecting metrics

  1. Latency
  2. Errors
  3. Traffic
  4. Saturation
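To make the four signals concrete, here is a hypothetical summary of one time window of requests. Assumptions: a 5xx status counts as an error, latency is reported as p50 and p99, and saturation is approximated as in-flight requests divided by a concurrency limit. Production systems typically compute these in a metrics backend rather than in application code.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Request:
    duration_ms: float
    status: int  # HTTP status code


def percentile(values, p):
    """p-th percentile (1-99) using the standard library's inclusive method."""
    if not values:
        return None
    if len(values) == 1:
        return values[0]
    return quantiles(values, n=100, method="inclusive")[p - 1]


def golden_signals(requests, window_seconds, in_flight, max_concurrency):
    """Summarize one window of requests as the four golden signals.

    Assumptions: 5xx responses count as errors, and saturation is approximated
    by in-flight requests divided by a concurrency limit.
    """
    durations = [r.duration_ms for r in requests]
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p50_ms": percentile(durations, 50),   # latency
        "latency_p99_ms": percentile(durations, 99),
        "error_rate": errors / len(requests) if requests else 0.0,  # errors
        "traffic_rps": len(requests) / window_seconds,  # traffic
        "saturation": in_flight / max_concurrency,      # saturation
    }
```

Reporting a high percentile alongside the median matters because averages hide the slow tail that users actually notice.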

Three common patterns for zero-downtime deployments

  1. Rolling deploy — You progressively take old instances (version N) out of service while you bring up new instances (version N+1), ensuring that you maintain a minimum percentage of capacity during deployment.
  2. Canaries — You add a single new instance into service to test the reliability of version N+1 before continuing with a full rollout. This pattern provides an added measure of safety beyond a normal rolling deploy.
  3. Blue-green deploys — You create a parallel group of services (the green set), running the new version of ...
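A rolling deploy can be planned as fixed-size batches whose size is bounded by the capacity that must stay in service. The sketch below assumes a fixed pool of identical instances and a hypothetical `min_available_fraction` parameter; it only plans batches and does not talk to any real orchestrator, nor does it model surge capacity.

```python
import math


def rolling_deploy_batches(total, min_available_fraction):
    """Split `total` instance slots into replacement batches for a rolling deploy.

    Each batch's version-N instances are taken out of service together and replaced
    with version N+1 before the next batch starts, so at least
    ceil(total * min_available_fraction) instances keep serving at every step.
    """
    min_available = math.ceil(total * min_available_fraction)
    batch_size = total - min_available
    if batch_size < 1:
        # No headroom: you would need to add N+1 instances before removing any N instances.
        raise ValueError("minimum capacity leaves no room to take instances out")
    slots = list(range(total))
    return [slots[i:i + batch_size] for i in range(0, total, batch_size)]


for batch in rolling_deploy_batches(10, 0.75):
    # In a real deploy: drain these instances, start N+1 replacements,
    # and wait for them to pass health checks before moving on.
    print("replace instances", batch)
```

A larger `min_available_fraction` makes the deploy safer but slower, since each batch is smaller.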

Six fundamental capabilities of a microservice production environment

  1. A deployment target, or runtime platform, where services are run, such as virtual machines (Ideally, engineers can use an API to configure, deploy, and update service configuration. You could also call this API the control plane, as shown in the figure.)
  2. Runtime management, such as auto-healing and autoscaling, that allows the service environment to respond dynamically to failure or changes in load without human intervention (For example, if a service instance fails, it should automatically be replaced.)
  3. Logging and monitoring to obs...
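Runtime management is usually a control loop that compares the desired state with the actual state and corrects the difference. The hypothetical `ServiceGroup` below sketches that idea in memory: `reconcile` replaces failed instances (auto-healing) and `autoscale` resizes the group from observed load. The requests-per-instance target is an assumed scaling metric; real platforms expose this through their own APIs.

```python
import itertools
import math
from dataclasses import dataclass


@dataclass
class Instance:
    id: str
    healthy: bool = True


class ServiceGroup:
    """Hypothetical in-memory model of runtime management for one service."""

    def __init__(self, name, desired, min_instances=1):
        self.name = name
        self.desired = desired
        self.min_instances = min_instances
        self.instances = []
        self._ids = itertools.count(1)

    def autoscale(self, current_rps, target_rps_per_instance):
        """Autoscaling: size the group so each instance handles roughly the target load."""
        needed = math.ceil(current_rps / target_rps_per_instance)
        self.desired = max(self.min_instances, needed)

    def reconcile(self):
        """One pass of the control loop: drop failures, then converge on `desired`."""
        failed = [i for i in self.instances if not i.healthy]
        self.instances = [i for i in self.instances if i.healthy]
        launched = []
        while len(self.instances) < self.desired:  # heal or scale out
            inst = Instance(id=f"{self.name}-{next(self._ids)}")
            self.instances.append(inst)
            launched.append(inst)
        terminated = self.instances[self.desired:]  # scale in: drop the newest extras
        self.instances = self.instances[:self.desired]
        return failed, launched, terminated


group = ServiceGroup("orders", desired=3)
group.reconcile()                      # launches orders-1..orders-3
group.instances[0].healthy = False     # simulate a crashed instance
failed, launched, _ = group.reconcile()
print([i.id for i in failed], [i.id for i in launched])  # ['orders-1'] ['orders-4']
```

Because `reconcile` only looks at desired versus actual state, the same loop handles crashes, scale-out, and scale-in without a human deciding which case applies.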

Common code to include in a microservice chassis

  • Logging
  • Configuration fetching
  • Metrics collection
  • Data store setup
  • Health checks
  • Service registry and discovery
  • The chosen transport-related boilerplate (AMQP, HTTP)
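A chassis packages these concerns once so that every service built on it gets them by default. The class below is a hypothetical, deliberately small sketch covering configuration fetching, metrics collection, and health checks. The `<SERVICE>_<KEY>` environment-variable convention is an assumption, and logging, data store setup, discovery, and transport boilerplate are omitted.

```python
import os
from collections import Counter


class Chassis:
    """Hypothetical minimal chassis: shared config, metrics, and health checks."""

    def __init__(self, service_name, env=None):
        self.service_name = service_name
        self._env = os.environ if env is None else env  # injectable for tests
        self.metrics = Counter()
        self._checks = {}

    def config(self, key, default=None):
        """Configuration fetching: read `<SERVICE>_<KEY>` from the environment."""
        name = f"{self.service_name}_{key}".upper().replace("-", "_")
        return self._env.get(name, default)

    def incr(self, metric, amount=1):
        """Metrics collection: a plain in-process counter."""
        self.metrics[metric] += amount

    def register_check(self, name, check):
        """Health checks: `check` is a zero-argument callable returning True when healthy."""
        self._checks[name] = check

    def health(self):
        """Run every check; a check that raises counts as unhealthy."""
        results = {}
        for name, check in self._checks.items():
            try:
                results[name] = bool(check())
            except Exception:
                results[name] = False
        return all(results.values()), results


chassis = Chassis("orders", env={"ORDERS_DB_URL": "postgres://db"})
chassis.register_check("db", lambda: True)
print(chassis.config("db_url"), chassis.health())
```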

How to build reliable communication

  1. Retries
    • Exponential backoff
      1. Use jitter
    • Considerations
      1. Always limit the total number of retries.
      2. Use exponential backoff with jitter to smoothly distribute retry requests and avoid compounding load.
      3. Consider which error conditions should trigger a retry, and therefore which retries are unlikely to succeed or will never succeed.
  2. Fallbacks
    • Graceful degradation
    • Caching
    • Functional redundancy
    • Stubbed data
  3. Timeou...
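The retry considerations and fallbacks above can be combined in a single wrapper. This is a sketch under stated assumptions: `RetryableError` is a hypothetical marker for transient failures, the backoff uses "full jitter" (a random delay between zero and the capped exponential delay), and `fallback` stands in for cached or stubbed data. Timeouts are not covered here.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical marker for transient failures such as timeouts or 503s."""


def call_with_retries(fn, max_attempts=4, base_delay=0.1, max_delay=2.0,
                      fallback=None, sleep=time.sleep, rng=random.random):
    """Call `fn`, retrying transient failures with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(max_attempts):  # the total number of attempts is bounded
        try:
            return fn()
        except RetryableError as exc:  # other exception types propagate immediately
            last_error = exc
            if attempt == max_attempts - 1:
                break
            # Exponential backoff: base_delay, 2x, 4x, ... capped at max_delay.
            backoff = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: wait a random fraction of the backoff to spread out retries.
            sleep(backoff * rng())
    if fallback is not None:
        return fallback()  # e.g. cached or stubbed data (graceful degradation)
    raise last_error
```

Injecting `sleep` and `rng` keeps the wrapper testable without real waiting; in production you would leave the defaults. Only errors explicitly marked as retryable are retried, which reflects the third consideration above.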

What’s the purpose of a microservice chassis?

  1. Making it easier to onboard team members
  2. Getting a good understanding of the code structure and concerns regarding the tech stack that an engineering team uses
  3. Limiting the scope of experimentation for production systems as the team builds common knowledge, even if not always in the same tech stack
  4. Helping to adhere to best practices

Maximizing Service Reliability

  1. Load balancing and service health
    • Readiness
    • Liveness
  2. Rate limits
    • Back-pressure via headers
  3. Validating reliability and fault tolerance
    • Load testing (The Art of Scalability)
      1. Load test as part of the delivery pipeline
      2. Exploratory load testing to identify limits and test assumptions
    • Chaos testing
      1. Chaos Toolkit
      2. Principles
        • Define a measurable steady state of normal system ...
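Rate limits with back-pressure headers can be sketched with a token bucket, one common rate-limiting algorithm (the notes above do not prescribe one). `Retry-After` is a standard HTTP header, typically sent alongside a 429 Too Many Requests status; the `X-RateLimit-*` names are a widespread convention rather than a standard and are assumed here.

```python
import math
import time


class TokenBucket:
    """Token-bucket rate limiter: `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so short bursts are allowed
        self.clock = clock             # injectable for tests
        self.updated = clock()

    def _refill(self):
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def check(self):
        """Return (allowed, headers) for one request costing one token."""
        self._refill()
        headers = {"X-RateLimit-Limit": str(self.capacity)}
        if self.tokens >= 1:
            self.tokens -= 1
            headers["X-RateLimit-Remaining"] = str(int(self.tokens))
            return True, headers
        # Back-pressure: tell the caller how many whole seconds until a token is available.
        headers["X-RateLimit-Remaining"] = "0"
        headers["Retry-After"] = str(math.ceil((1 - self.tokens) / self.rate))
        return False, headers
```

A well-behaved client reads `Retry-After` and delays its next attempt, which pushes load back upstream instead of letting the service queue work it cannot handle.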