Site Reliability Engineer
Krila Consultancy & Recruitment

Location: Onsite – Kanata, Ontario
About Our Client
Imagine a startup delivering real-time data insights that empower businesses to make smarter, faster decisions. Backed by one of the world’s top tech groups, we blend cutting-edge technology with deep expertise to help companies stay agile and ahead of the curve. With the strength of a powerhouse behind us, we drive innovation and create transformative solutions for today’s dynamic markets.
Edge Signal provides a full-fledged edge computing platform powering computer-vision applications across Retail, Hospitality and Warehousing. They run entirely on AWS, ingesting and analyzing data from massive fleets of on-premise devices, with Datadog monitoring.
We’re looking for an experienced Site Reliability Engineer to keep their cloud and edge infrastructure running flawlessly—and to help their customers get up and running smoothly.
This position is based at their head office in Kanata, Ottawa, reporting to the Director of Technology.
What You’ll Do
Operations
- Ensure highly available, fault-tolerant AWS services (auto-scaling, disaster recovery, capacity planning).
- Build and maintain Datadog dashboards, monitors and alerts for cloud resources and edge devices; author runbooks and automation scripts for incident response.
- Develop tooling to provision, update and health-check thousands of edge devices; ingest device telemetry into Datadog for unified observability.
- Automate routine ops tasks (onboarding steps, incident remediation) using shell, Python, etc.
Onboarding
- Lead customer installations by configuring IP cameras, NVRs, and Edge Signal agents on-site.
- Guide network, security and firmware setups to ensure seamless data flow from device to cloud.
Support
- Triage and resolve Freshdesk tickets; conduct root-cause analysis and drive timely closure.
- Convert complex issues into Jira epics/stories and collaborate with product teams to ship fixes.
Compliance
- Manage AWS IAM (users, roles, policies, SSO) and enforce security best practices.
- Monitor and optimize AWS spend—set budgets, report usage and recommend cost-savings strategies.
- Integrate secrets management, vulnerability scanning and other compliance controls.
What You Will Bring
- A Bachelor's degree or higher in Computer Science or a related engineering field (required).
- 3+ years as an SRE or DevOps engineer supporting production AWS environments.
- Proven expertise in Datadog (APM, Infrastructure, Logs, Synthetic checks)
- Strong Linux administration skills and proficient scripting ability (Bash, Python, or Go)
- Experience with AWS IAM, SSO, Control Tower, cost-management tools, and billing dashboards
- Excellent communicator with a bias toward collaboration and customer empathy
Bonus Points
- Prior work with edge computing or IoT device fleets
- Experience configuring IP cameras, RTSP streams, and NVR systems
- Freshdesk and Jira administration experience
- AWS DevOps or Solutions Architect certification