About us

Fortiro is a growing fintech with a highly experienced team and significant investment behind it, offering you the opportunity to contribute to an award-winning enterprise SaaS platform. Our platform uses emerging technologies and AI to help detect financial crime and enable automation in processes like lending and insurance claim verification. It is trusted by leading banks and financial services businesses in Australia, New Zealand, and globally.

About the role

We are looking for a Cloud Observability Engineer to take ownership of how we monitor system health and performance across our AWS platform, and how we detect and respond to issues. Right now, our senior engineers are drowning in noisy alerts. Most of those alerts don't matter, and the team ends up reacting to problems instead of preventing them. We need someone who can flip that script: design observability so we catch problems early and kill the noise.

Your mission is to design, build, and continuously refine our observability stack so that performance degradation, latency trends, and execution issues are identified early, long before they become customer-facing incidents. When an alert does fire, it should be rare, accurate, and actionable.

This is a hands-on role with real ownership. You will be the architect of our signal, the owner of alert quality, and a critical contributor to platform stability in a high-trust, high-responsibility environment.

What you'll be doing

Own and improve observability across our AWS environment, focusing on performance trends, latency, and early warning signals rather than simple uptime.
Build deep visibility into serverless and event‑driven systems, including Lambda, Step Functions, and distributed workflows.
Design and continuously refine alerting rules to reduce noise and improve signal quality using anomaly detection and intelligent thresholds.
Create clear, high‑level dashboards that give engineers fast insight into overall system health.
Trace requests end to end across distributed services using modern observability tooling such as OpenTelemetry and OpenSearch.
Automate responses to common warning conditions to reduce manual intervention and prevent repeat issues.
Actively monitor and triage alerts during rostered periods, including weekend and public holiday coverage.
Build and maintain runbooks and documentation to support fast, consistent incident response.
Use post‑incident learnings to improve monitoring, alerting, and automation over time.

What you’ll bring to the role

Strong experience operating AWS-native environments with a deep understanding of ECS, Lambda, Step Functions, and core AWS services.
Proven experience managing and monitoring production AWS platforms.
Strong expertise in observability concepts including metrics, logs, tracing, and anomaly detection.
Hands-on experience with OpenSearch, OpenTelemetry, and distributed tracing techniques.
Ability to define and manage monitoring infrastructure through Infrastructure as Code using Terraform, CloudFormation, or AWS CDK.
Experience automating operational responses using scripts or workflows.
Strong analytical mindset with the ability to spot patterns, trends, and early warning signals.
Clear written and verbal communication skills, especially when translating system behaviour into actionable insights.
Ownership mentality with a bias toward prevention rather than reaction.
Comfort operating in a role with active monitoring responsibilities and rostered coverage expectations.

Why join Fortiro?

Be yourself: Flexible working style in a supportive, inclusive environment
Make an impact: Directly reduce incidents and improve platform reliability for enterprise customers
Work on meaningful systems: Build observability for high‑scale, high‑trust financial services platforms
Grow with us: Expand your technical influence as the platform and team scale
Long‑term focus: We invest in people who care about building things properly

No recruitment agencies, please. We have this covered and will reach out if we need support.

Cloud Observability Engineer

Turn noise into signal. Detect issues early. Keep our platform healthy.