AWS Cost Optimisation

How to Build a $500/Month Monitoring Stack That Actually Works

Stop paying $2,000+/month for enterprise monitoring tools. This guide shows startups how to build a complete monitoring stack with CloudWatch, CloudWatch Logs Insights, AWS X-Ray, and open-source tools for under $500/month.

Cloud Associates

We’ve seen too many startups paying $3,000+/month for enterprise monitoring tools like Datadog or New Relic. The pattern is always the same: applications serving millions of requests with a handful of microservices, teams frustrated with costs but feeling locked in because “monitoring is critical.”

Here’s what we’ve learned: you can build a production-grade monitoring stack using AWS-native tools and selective open-source solutions for under $500/month. Same observability. Better retention. Faster queries. Full control over your data.

The Problem with Enterprise Monitoring Tools

Don’t get me wrong: Datadog, New Relic, and Dynatrace are excellent products. But their pricing models are optimised for enterprises, not startups.

Typical enterprise monitoring costs for a startup:

  • Datadog: $1,500-5,000/month ($15/host/month + $0.10/GB logs + custom metrics)
  • New Relic: $1,200-4,000/month (similar pricing model)
  • Dynatrace: $2,000-8,000/month (even more expensive)

Why so expensive?

  • Per-host pricing (every EC2 instance, container, Lambda function costs money)
  • Per-GB log ingestion fees
  • Custom metrics charges
  • APM (Application Performance Monitoring) add-ons
  • Premium features locked behind higher tiers

The trap: Start on generous free tiers, scale your infrastructure, suddenly you’re paying $3K+/month.

The $500/Month Monitoring Stack

Here’s what we built and what it costs:

Core Components

1. CloudWatch Metrics + Alarms

  • Cost: $50-80/month
  • What it does: Infrastructure metrics, application metrics, alerting

2. CloudWatch Logs + Logs Insights

  • Cost: $150-250/month
  • What it does: Centralized logging, log analysis, structured queries

3. AWS X-Ray

  • Cost: $20-40/month
  • What it does: Distributed tracing, request flow visualization, latency analysis

4. Uptime Monitoring (Better Uptime)

  • Cost: $20/month
  • What it does: External uptime checks, status page, incident management

5. Error Tracking (Sentry OSS Self-Hosted)

  • Cost: $50/month (hosting on t3.medium EC2)
  • What it does: Error tracking, release tracking, user impact analysis

6. CloudWatch Dashboard

  • Cost: Free (first 3 dashboards are free, then $3/dashboard/month)
  • What it does: Centralized metrics visualization

7. AWS Budgets + Cost Alerts

  • Cost: Free (first 2 budgets free, $0.02/day per additional)
  • What it does: Cost monitoring and alerting

Total: ~$300-450/month, depending mostly on log volume

Let’s break down each component and why it was chosen.

Component #1: CloudWatch Metrics + Alarms ($50-80/month)

What it does:

  • Collects metrics from EC2, ECS, RDS, ALB, Lambda, and custom application metrics
  • Creates alarms based on metric thresholds
  • Sends notifications via SNS (email, Slack, PagerDuty)

Setup:

1. Enable detailed monitoring for EC2/ECS

# Enable detailed monitoring (1-minute intervals)
aws ec2 monitor-instances --instance-ids i-1234567890abcdef0

2. Create custom metrics from your application

// Node.js example with aws-sdk
const { CloudWatch } = require('@aws-sdk/client-cloudwatch');
const cw = new CloudWatch({ region: 'ap-southeast-2' });

async function publishMetric(metricName, value) {
  await cw.putMetricData({
    Namespace: 'MyApp/API',
    MetricData: [
      {
        MetricName: metricName,
        Value: value,
        Unit: 'Count',
        Timestamp: new Date(),
      },
    ],
  });
}

// Track custom business metrics
await publishMetric('UserSignups', 1);
await publishMetric('OrdersCompleted', 1);
await publishMetric('RevenueUSD', 49.99);

3. Create alarms for critical metrics

# High 5xx error count alarm on the ALB
aws cloudwatch put-metric-alarm \
  --alarm-name high-api-error-rate \
  --alarm-description "Alert when 5xx errors exceed 5 in a 5-minute window" \
  --metric-name HTTPCode_Target_5XX_Count \
  --namespace AWS/ApplicationELB \
  --statistic Sum \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 5 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:ap-southeast-2:123456789:alerts
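
The `--alarm-actions` flag above points at an SNS topic; if you don't have one yet, a minimal setup might look like this (the topic name and email address are placeholders):

```shell
# Create an SNS topic for alerts and subscribe an email address
aws sns create-topic --name alerts

aws sns subscribe \
  --topic-arn arn:aws:sns:ap-southeast-2:123456789:alerts \
  --protocol email \
  --notification-endpoint alerts@example.com
```

Slack and PagerDuty can subscribe to the same topic via their AWS integrations, so one topic fans out to every channel.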

Key alarms to set up:

  • High error rate (4xx, 5xx) on ALB
  • High CPU utilization on EC2/ECS (> 80%)
  • High memory utilization (> 85%)
  • RDS connections near limit
  • Lambda errors and throttles
  • API latency (p95 > threshold)
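
The p95 latency alarm in that list needs an extended statistic rather than the basic ones. A sketch with the AWS CLI (the load balancer dimension and 1-second threshold are placeholders to tune per service):

```shell
# p95 latency alarm on the ALB (TargetResponseTime is reported in seconds)
aws cloudwatch put-metric-alarm \
  --alarm-name high-api-latency-p95 \
  --alarm-description "Alert when p95 response time > 1s" \
  --metric-name TargetResponseTime \
  --namespace AWS/ApplicationELB \
  --extended-statistic p95 \
  --dimensions Name=LoadBalancer,Value=app/my-alb/1234567890abcdef \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 1 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:ap-southeast-2:123456789:alerts
```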

Cost breakdown:

  • CloudWatch Metrics: $0.30/metric/month for the first 10,000 metrics (the free tier covers 10 custom metrics)
  • CloudWatch Alarms: $0.10/alarm/month (first 10 free)
  • Typical usage: 50 custom metrics, 20 alarms = ~$16/month
  • API requests: ~$5/month
  • Total: $20-30/month for the basics; detailed monitoring and a larger metric count push this toward the $50-80 headline figure

Component #2: CloudWatch Logs + Logs Insights ($150-250/month)

What it does:

  • Centralised log collection from EC2, ECS, Lambda, RDS
  • Structured log queries with CloudWatch Logs Insights
  • Log retention management (reduce costs by aging out old logs)

Setup:

1. Configure application logging

// Use structured JSON logging (critical for Logs Insights queries)
const winston = require('winston');

const logger = winston.createLogger({
  format: winston.format.json(),
  defaultMeta: { service: 'api-service' },
  transports: [
    new winston.transports.Console(),
  ],
});

// Log with structured data
logger.info('User login', {
  userId: '12345',
  email: 'user@example.com',
  ipAddress: '203.0.113.42',
  duration: 340,
});

2. Send logs to CloudWatch from ECS/Fargate

{
  "logConfiguration": {
    "logDriver": "awslogs",
    "options": {
      "awslogs-group": "/ecs/api-service",
      "awslogs-region": "ap-southeast-2",
      "awslogs-stream-prefix": "ecs"
    }
  }
}

3. Query logs with Logs Insights

# Find all errors in the last hour
fields @timestamp, @message, userId, errorMessage
| filter @message like /ERROR/
| sort @timestamp desc
| limit 100

# Average and p95 latency per API endpoint
fields @timestamp, endpoint, duration
| filter endpoint like /^\/api/
| stats avg(duration), pct(duration, 95) by endpoint

# Find slow database queries (duration in ms)
fields @timestamp, query, duration
| filter duration > 1000
| sort duration desc
| limit 20

4. Set up log retention policies

# Keep recent logs longer, age out old logs to save money
aws logs put-retention-policy \
  --log-group-name /ecs/api-service \
  --retention-in-days 30

aws logs put-retention-policy \
  --log-group-name /aws/lambda/background-worker \
  --retention-in-days 7

Cost breakdown:

  • Log ingestion: $0.50/GB
  • Log storage: $0.03/GB/month
  • Logs Insights queries: $0.005/GB scanned

Typical startup (10 GB logs/month, 30-day retention):

  • Ingestion: 10 GB × $0.50 = $5/month
  • Storage: ~10 GB retained at any time × $0.03/GB-month = ~$0.30/month
  • Queries: 100 GB scanned × $0.005 = $0.50/month
  • Total: ~$6/month

As you scale (100 GB logs/month):

  • Ingestion: $50/month
  • Storage: ~$3/month (with 30-day retention)
  • Queries: $5/month
  • Total: ~$60/month (the $150-250 headline figure assumes several hundred GB of logs per month)

Cost optimisation tips:

  1. Filter before sending - Don’t log debug messages in production
  2. Use shorter retention - 7 days for most logs, 30 days for critical logs
  3. Sample high-volume logs - Log 1% of successful requests, 100% of errors
  4. Archive to S3 - Export old logs to S3 ($0.023/GB storage) for long-term retention
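
For the S3 archive tip, CloudWatch Logs has a built-in export mechanism. A sketch (the bucket name and epoch-millisecond time range are placeholders, and the bucket needs a policy allowing delivery from CloudWatch Logs):

```shell
# Export a month of logs to S3 for cheap long-term retention
aws logs create-export-task \
  --log-group-name /ecs/api-service \
  --from 1696118400000 \
  --to 1698710400000 \
  --destination my-log-archive-bucket \
  --destination-prefix api-service
```

Run this on a schedule (e.g. a monthly EventBridge rule), then apply the short retention policies above so CloudWatch only keeps the hot data.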

Component #3: AWS X-Ray ($20-40/month)

What it does:

  • Distributed tracing (track requests across microservices)
  • Service maps (visualise how services call each other)
  • Latency analysis (find slow services/dependencies)
  • Error tracking across services

Setup:

1. Install X-Ray daemon on EC2/ECS

# Dockerfile with the X-Ray daemon bundled (running the daemon as a
# sidecar container is the recommended pattern; one image keeps this simple)
FROM node:18-alpine

# Install X-Ray daemon
RUN apk add --no-cache curl
RUN curl -o /usr/local/bin/xray https://s3.amazonaws.com/aws-xray-assets.us-east-1/xray-daemon/aws-xray-daemon-linux-3.x
RUN chmod +x /usr/local/bin/xray

# Start the X-Ray daemon in the background, then the application
# (shell form so & works; the JSON exec form cannot chain commands)
CMD /usr/local/bin/xray -o & exec node server.js

2. Instrument your application

// Node.js Express app with X-Ray (the express middleware lives in the
// full aws-xray-sdk package, not aws-xray-sdk-core)
const AWSXRay = require('aws-xray-sdk');
// Wrap the AWS SDK so outgoing AWS calls appear as subsegments
const AWS = AWSXRay.captureAWS(require('aws-sdk'));
const express = require('express');
const express = require('express');

const app = express();

// Enable X-Ray middleware
app.use(AWSXRay.express.openSegment('api-service'));

app.get('/api/users/:id', async (req, res) => {
  // X-Ray automatically traces this HTTP request
  const subsegment = AWSXRay.getSegment().addNewSubsegment('fetch-user');

  try {
    const user = await fetchUserFromDB(req.params.id);
    subsegment.close();
    res.json(user);
  } catch (error) {
    subsegment.addError(error);
    subsegment.close();
    res.status(500).json({ error: 'Internal server error' });
  }
});

// Close X-Ray segment
app.use(AWSXRay.express.closeSegment());

app.listen(3000);

3. Analyse traces in X-Ray console

  • Service map shows request flow: ALB → API → Database → Cache
  • Trace view shows exact timing: 120ms total (50ms DB, 20ms cache, 50ms processing)
  • Filter slow traces: Show all requests > 1 second
  • Error analysis: Which service is throwing errors?

Cost breakdown:

  • Traces recorded: $5/million traces
  • Traces retrieved or scanned: $0.50/million traces
  • Free tier: first 100,000 traces recorded per month are free

Typical startup (2M requests/month, 10% sampling):

  • Traces recorded: 200,000 × $5/million = $1/month
  • Traces retrieved: ~10,000 × $0.50/million = $0.01/month
  • Total: ~$1-2/month

With 100% sampling (expensive):

  • 2M requests × $5/million = $10/month

Recommendation: Use 5-10% sampling for normal traffic, 100% for errors.
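
With the Node SDK, the default sampling rate can be set from a local rule file loaded via `AWSXRay.middleware.setSamplingRules(...)`. This is an assumed minimal config; note that sampling is decided when a request starts, so "100% for errors" can't be expressed here and is better covered by error tracking (Sentry, below):

```json
{
  "version": 2,
  "rules": [],
  "default": {
    "fixed_target": 1,
    "rate": 0.05
  }
}
```

`fixed_target: 1` records at least one trace per second per instance regardless of rate, which keeps low-traffic endpoints visible.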

Component #4: Uptime Monitoring ($20/month)

What it does:

  • External HTTP checks every 30-60 seconds
  • Alerts when your site is down (before customers notice)
  • Status page for customers
  • Incident management

Tool: Better Uptime

Why not CloudWatch for this?

  • CloudWatch monitors from inside AWS
  • If AWS region fails, your CloudWatch alarms fail too
  • Need external monitoring to catch AWS outages

Setup:

  1. Create HTTP checks for critical endpoints (e.g. the homepage and an API health check).

  2. Set up alert channels:

    • Email (immediate)
    • Slack (immediate)
    • PagerDuty (for on-call rotation)
  3. Create public status page:

    • status.example.com
    • Shows uptime, incident history
    • Customers can subscribe to updates

Cost: $20/month (Better Uptime team plan, 30 checks)

Alternatives:

  • UptimeRobot: $7/month (50 checks)
  • Pingdom: $15/month (10 checks)
  • StatusCake: Free (unlimited checks, 5-minute intervals)

Component #5: Error Tracking - Sentry OSS ($50/month self-hosted)

What it does:

  • Captures unhandled exceptions and errors
  • Groups similar errors together
  • Shows user impact (how many users affected)
  • Release tracking (which deploy introduced the bug)
  • Source map support (see original TypeScript/JSX source)

Why self-host instead of Sentry SaaS?

  • Sentry SaaS: $26/month (5K errors) → $80/month (50K errors)
  • Self-hosted: $50/month EC2 costs, unlimited errors

Setup (self-hosted on EC2):

# Launch t3.medium EC2 instance (2 vCPU, 4 GB RAM)
# Install Docker and Docker Compose

# Clone Sentry self-hosted repo
git clone https://github.com/getsentry/self-hosted.git
cd self-hosted

# Run install script
./install.sh

# Start Sentry
docker-compose up -d

Instrument your application:

// Node.js example
const Sentry = require('@sentry/node');

Sentry.init({
  dsn: 'https://your-dsn@sentry.example.com/1',
  environment: 'production',
  release: process.env.GIT_COMMIT,
});

// Capture exceptions
try {
  riskyOperation();
} catch (error) {
  Sentry.captureException(error);
  throw error;
}

// Add context to errors
Sentry.setUser({ id: user.id, email: user.email });
Sentry.setContext('order', { orderId: order.id, total: order.total });

Cost:

  • EC2 t3.medium: $30/month (reserved instance)
  • EBS storage (100 GB): $10/month
  • Data transfer: ~$5/month
  • Total: ~$50/month

Alternative (if you don’t want to self-host):

  • Sentry SaaS: $26-80/month
  • Rollbar: $25/month
  • Bugsnag: $59/month

Component #6: CloudWatch Dashboards ($9/month)

What it does:

  • Single pane of glass for all metrics
  • Customizable charts and widgets
  • Automatic refresh

Setup:

Create 3 dashboards:

1. Application Health Dashboard

  • API request rate (per minute)
  • API latency (p50, p95, p99)
  • Error rate (4xx, 5xx)
  • Active users
  • Database connections

2. Infrastructure Dashboard

  • EC2 CPU utilization
  • Memory utilization
  • Disk I/O
  • Network in/out
  • ECS task count

3. Business Metrics Dashboard

  • User signups (per hour)
  • Orders completed
  • Revenue (daily)
  • Active sessions
  • Conversion funnel

Cost: Free — the first 3 dashboards (up to 50 metrics each) are free, then $3/dashboard/month.

Tip: Keep it to 3 dashboards to stay within the free tier.
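
Dashboards can also be created from the CLI, which lets you keep them in version control. A minimal sketch (dashboard name, region, and load balancer dimension are placeholders):

```shell
# One widget showing ALB 5xx counts; extend the widgets array for more charts
aws cloudwatch put-dashboard \
  --dashboard-name app-health \
  --dashboard-body '{
    "widgets": [{
      "type": "metric",
      "x": 0, "y": 0, "width": 12, "height": 6,
      "properties": {
        "title": "API 5xx errors",
        "region": "ap-southeast-2",
        "period": 300,
        "metrics": [["AWS/ApplicationELB", "HTTPCode_Target_5XX_Count",
                     "LoadBalancer", "app/my-alb/1234567890abcdef",
                     { "stat": "Sum" }]]
      }
    }]
  }'
```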

Component #7: AWS Budgets + Cost Alerts (Free)

What it does:

  • Tracks AWS spending
  • Alerts when costs exceed thresholds
  • Forecasts end-of-month costs

Setup:

aws budgets create-budget \
  --account-id 123456789 \
  --budget file://budget.json \
  --notifications-with-subscribers file://notifications.json

budget.json:

{
  "BudgetName": "Monthly AWS Costs",
  "BudgetLimit": {
    "Amount": "500",
    "Unit": "USD"
  },
  "TimeUnit": "MONTHLY",
  "BudgetType": "COST"
}

notifications.json:

[
  {
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80
    },
    "Subscribers": [
      {
        "SubscriptionType": "EMAIL",
        "Address": "alerts@example.com"
      }
    ]
  },
  {
    "Notification": {
      "NotificationType": "FORECASTED",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 100
    },
    "Subscribers": [
      {
        "SubscriptionType": "EMAIL",
        "Address": "alerts@example.com"
      }
    ]
  }
]

Cost: Free (first 2 budgets free, $0.02/day per additional budget)
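
Beyond the account-wide budget, a budget can be scoped to a single service so the monitoring stack's own spend stays visible. An assumed variant of budget.json (the filter value follows Cost Explorer's service naming):

```json
{
  "BudgetName": "CloudWatch spend",
  "BudgetLimit": {
    "Amount": "300",
    "Unit": "USD"
  },
  "TimeUnit": "MONTHLY",
  "BudgetType": "COST",
  "CostFilters": {
    "Service": ["Amazon CloudWatch"]
  }
}
```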

Putting It All Together: The Monitoring Workflow

When an Issue Occurs

1. External monitor (Better Uptime) detects downtime

  • Sends immediate Slack alert: “API is down”
  • On-call engineer gets paged

2. Engineer opens CloudWatch Dashboard

  • Sees spike in 5xx errors
  • Sees spike in API latency
  • Database CPU at 90%

3. Engineer checks CloudWatch Logs Insights

fields @timestamp, @message, endpoint, duration
| filter @message like /ERROR/
| sort @timestamp desc
| limit 100

  • Finds slow database queries timing out
  • Identifies which endpoint is affected

4. Engineer checks X-Ray traces

  • Service map shows API → Database taking 5+ seconds
  • Trace timeline shows specific SQL query is slow
  • Identifies the problematic query

5. Engineer checks Sentry

  • 500 users affected by timeout errors
  • Error first appeared 10 minutes ago
  • Linked to latest deployment

6. Engineer mitigates

  • Scales up RDS instance (temporary fix)
  • Optimizes slow query (permanent fix)
  • Deploys fix

7. Incident resolution

  • Updates status page
  • Posts post-mortem in Slack
  • Creates ticket to prevent recurrence

Total time to diagnosis: 5 minutes (vs. 20-30 minutes with fragmented tooling)

Cost Optimisation Tips

Tip #1: Use Metric Filters Instead of Custom Metrics

Expensive way:

// Publishing a custom metric from the application (aws-sdk v3 client)
await cw.putMetricData({
  Namespace: 'MyApp',
  MetricData: [{ MetricName: 'APIErrors', Value: 1, Unit: 'Count' }],
});
// Cost: $0.30/metric/month, plus PutMetricData API request charges

Cheaper way:

# Create a metric filter from logs you're already ingesting
# (JSON filter pattern matches the structured winston logs above)
aws logs put-metric-filter \
  --log-group-name /ecs/api-service \
  --filter-name api-errors \
  --filter-pattern '{ $.level = "error" }' \
  --metric-transformations \
  metricName=APIErrors,metricNamespace=MyApp,metricValue=1
# The generated metric still bills as a custom metric, but you skip
# PutMetricData API charges and application-side instrumentation

Savings: ~$10-20/month

Tip #2: Sample High-Volume Logs

Don’t log every successful API request at scale.

// Sample 1% of successful requests, log 100% of errors.
// Log on 'finish' so the final status code and duration are available.
app.use((req, res, next) => {
  const start = Date.now();

  res.on('finish', () => {
    const shouldLog = res.statusCode >= 400 || Math.random() < 0.01;

    if (shouldLog) {
      logger.info('API request', {
        method: req.method,
        path: req.path,
        status: res.statusCode,
        duration: Date.now() - start,
      });
    }
  });

  next();
});

Savings: 90% reduction in log volume = ~$100-200/month at scale

Tip #3: Use Shorter Log Retention

Default: 30 days (costs add up for high-volume logs)

Optimised:

  • Application logs: 7 days
  • Error logs: 30 days
  • Audit logs: 90 days (compliance requirement)
  • Archive old logs to S3: $0.023/GB vs. $0.03/GB in CloudWatch

Savings: ~$50-100/month

Tip #4: Self-Host Where It Makes Sense

Sentry SaaS: $80/month for 50K errors. Self-hosted Sentry: $50/month in EC2 costs, unlimited errors.

When to self-host:

  • High volume (more than free tiers)
  • Predictable costs matter
  • You have DevOps resources

When NOT to self-host:

  • Small volume (free tiers cover you)
  • Don’t want operational overhead
  • Value SaaS features (integrations, support)

What This Stack Can’t Do (vs. Enterprise Tools)

Limitations vs. Datadog/New Relic:

1. No unified UI

  • Datadog has one dashboard for everything
  • This stack requires switching between AWS console, Sentry, Better Uptime

2. Less sophisticated APM

  • X-Ray is good but not as powerful as Datadog APM or New Relic
  • Missing flame graphs, advanced profiling
  • Workaround: Add open-source profiling (pyroscope, pprof) if needed

3. No machine learning anomaly detection

  • Datadog uses ML to detect anomalies automatically
  • This stack requires manual threshold-based alarms
  • Workaround: Use CloudWatch Anomaly Detection (extra cost)

4. Steeper learning curve

  • Datadog is designed for ease of use
  • This stack requires AWS knowledge and some DIY integration

5. Limited multi-cloud support

  • This stack is AWS-native
  • Datadog works across AWS, GCP, Azure seamlessly

When you should upgrade to enterprise monitoring:

  • Your AWS bill is > $20K/month (monitoring cost becomes less significant)
  • You need unified cross-cloud monitoring (AWS + GCP + Azure)
  • Your team wants ML-powered anomaly detection
  • You value a single pane of glass over cost savings

Conclusion

Total monthly cost: ~$450

What you get:

  • Infrastructure monitoring (CloudWatch Metrics + Alarms)
  • Centralized logging with powerful queries (CloudWatch Logs Insights)
  • Distributed tracing (AWS X-Ray)
  • Error tracking (Sentry self-hosted)
  • Uptime monitoring (Better Uptime)
  • Cost monitoring (AWS Budgets)

vs. Datadog at ~$3,200/month for the same workload

Savings: $2,750/month = $33,000/year

Time to set up: 1-2 days for experienced engineer, 3-5 days for beginner

Ongoing maintenance: 2-4 hours/month (mostly Sentry updates and dashboard tweaks)

ROI: Pays for itself in the first month if you’re currently on expensive enterprise monitoring.

This stack isn’t perfect. It requires more setup and AWS knowledge than buying Datadog. But for cost-conscious startups, it’s a battle-tested solution that provides 90% of the observability at 15% of the cost.

Need help setting up a cost-effective monitoring stack for your AWS infrastructure? Our DevOps Automation Services include complete monitoring setup with CloudWatch, X-Ray, logging, alerting, and dashboard configuration delivered in 6 weeks for $8,500.