Site Reliability Engineer (Vancouver)

Gauss Labs


Date: 6 hours ago
City: Vancouver, BC
Contract type: Full time
Remote
Gauss Labs is seeking a highly skilled Site Reliability Engineer to join our team in Vancouver. As an SRE at Gauss Labs, you will play a critical role in ensuring our industrial AI platform's reliability, performance, and scalability. You will be responsible for building and maintaining a robust solution that supports our growing business at customer sites. This role requires a high level of technical expertise, a collaborative mindset, and a strong desire to continuously improve systems and processes.

Responsibilities

  • Monitoring and Alerting: Creating and maintaining robust monitoring systems to proactively identify and resolve issues before they impact customers. Implementing effective alerting mechanisms to ensure timely response to critical events
  • Incident Response: Participating in on-call rotations and leading incident response efforts to minimize downtime and restore service quickly
  • Automation: Developing and implementing automation tools and scripts to streamline operations, reduce manual effort, and improve efficiency
  • Capacity Planning: Forecasting resource needs, optimizing resource utilization, and ensuring customers' infrastructure can handle increasing workloads
  • Performance Optimization: Identifying and resolving performance bottlenecks, optimizing system performance, and improving response times
  • Collaboration: Partnering with software engineers, data scientists, and other teams to ensure alignment and efficient operations
  • Customer Focus: Working closely with the AI Program Manager and Technical Account Manager to understand customer issues, provide technical support, and improve customer satisfaction
  • Continuous Improvement: Driving a culture of continuous improvement by identifying opportunities to enhance system reliability, performance, and efficiency


Basic Qualifications

  • Bachelor's degree in computer science, engineering, or a related discipline
  • 5+ years of industry experience as a Site Reliability Engineer
  • Experience with cloud platforms (AWS, GCP, Azure), containerization technologies (Docker, Kubernetes), and observability and alerting tools (Prometheus, Grafana, Elasticsearch, Jaeger)
  • Experience with scripting languages (Python, Bash)
  • Working knowledge of GitHub, GitHub Actions, and CI/CD concepts
  • Experience in ticket management, issue resolution, and troubleshooting
  • Strong problem-solving and troubleshooting skills
  • Excellent customer communication and interpersonal skills, with fluency in spoken and written English


Preferred Qualifications

  • Knowledge of AI/ML infrastructure and workloads
  • Knowledge of big data technologies (Kafka, Flink)
  • Knowledge of database technologies (MongoDB, PostgreSQL)


Hiring Process

Application review - Phone interview - Virtual onsite interview - VP interview/Core Value interview
