Key Responsibilities
- Design, deploy, and manage Apache Kafka clusters across development, testing, and production environments.
- Deploy and manage Apache Spark and Apache Flink in production environments.
- Optimize Kafka performance, reliability, and scalability for high-throughput data pipelines.
- Ensure seamless integration of Kafka with other systems and services.
- Manage and troubleshoot Linux-based systems (Ubuntu) supporting Kafka infrastructure.
- Deploy, fine-tune, and operate Kafka on Kubernetes clusters using Helm, Operators, or custom manifests.
- Collaborate with cross-functional teams to identify and implement Kafka use cases.
- Contribute to automation and Infrastructure as Code (IaC) practices through CI/CD pipelines with GitLab.
- Monitor system health, implement alerting, and ensure high availability.
- Participate in incident response and root cause analysis for Kafka and related systems.
- Evaluate and recommend Kafka ecosystem tools like Kafka Connect, Schema Registry, MirrorMaker, and Kafka Streams.
- Build automation and observability tools for Kafka using Prometheus, Grafana, Fluent Bit, etc.
- Apply a deep understanding of streaming and batch processing architectures.
- Develop with Spark Structured Streaming and the Flink DataStream API.
- Work with teams to build end-to-end Kafka-based pipelines for various applications (data integration, event-driven microservices, logging, monitoring).
- Run Spark and Flink on Kubernetes, YARN, or standalone clusters.
- Configure resource allocation, job scheduling, and cluster scaling.
- Apply checkpointing, state management, and fault-tolerance mechanisms.
- Tune Spark and Flink jobs for low latency, high throughput, and resource efficiency.
- Optimize memory management, shuffle behavior, and parallelism settings.
- Monitor jobs through the Spark UI and Flink Dashboard, integrated with Prometheus/Grafana.
- Implement metrics collection, log aggregation, and alerting for job health and performance.
- Apply TLS encryption, Kerberos, and RBAC in distributed environments.
- Integrate with OAuth or other identity providers.
- Work with time-series databases.
Required Qualifications
- 5+ years of experience administering and supporting Apache Kafka in production environments.
- Strong expertise in Linux system administration (Red Hat and Debian).
- Solid experience with Kubernetes (CNCF distributions, OpenShift, Rancher, or upstream K8s).
- Proficiency in scripting (Bash, Python) and automation tools (Ansible, Terraform).
- Experience with Kafka security, monitoring (Prometheus, Grafana, Istio), and schema management.
- Familiarity with CI/CD pipelines and DevOps practices.
- Comfortable with Helm, YAML, Kustomize, and GitOps principles (GitLab).
- 4+ years of experience in Apache Spark development, including building scalable data pipelines and optimizing distributed processing.
What We Do
Backed by a legacy of engineering excellence, reliability, and industry-leading customer service, Telesat is one of the largest and most successful global satellite operators. Telesat works collaboratively with its customers to deliver critical connectivity solutions that tackle the world’s most complex communications challenges, providing powerful advantages that improve their operations and drive profitable growth.
Continuously innovating to meet the connectivity demands of the future, Telesat Lightspeed, the company’s Low Earth Orbit (LEO) satellite network, will be the first and only LEO network optimized to meet the rigorous requirements of telecom, government, maritime and aeronautical customers. Telesat Lightspeed will redefine global satellite connectivity with ubiquitous, affordable, high-capacity links with fiber-like speeds.