[Dubai] Site Reliability Engineer
Syndica
This job is no longer accepting applications
At Syndica, big things happen. Every day, we’re translating vision into reality by tackling new and exciting challenges head-on. This is a breakthrough stage in our company, and you’ll experience firsthand the infectious enthusiasm of our employees and leadership team. You’ll have the opportunity to learn new skills, grow your career, and work with the smartest, most passionate people in crypto.
This role will have primary accountability for maintaining and operating Syndica’s blockchain infrastructure platform. Golang knowledge is a necessity! The team operates with a “run what you write” philosophy and each engineer is responsible for deploying and operating the code they write.
A successful candidate must have demonstrable experience in at least one programming language (preferably Go, Rust, or C++), as well as previous work in SaaS application development and operations. You will work closely with the Support and Development teams on the architecture and configuration of our AWS- and GCP-hosted infrastructure, and on the management of our bare metal RPC nodes. You will be responsible for ensuring that the environment is configured, managed, and monitored correctly to support the business. You will drive decisions on right-sizing servers and storage, troubleshoot performance issues, ensure the highest level of reliability for the platform, and tune the environment for maximum scalability, cost efficiency, and security. The ideal candidate will also have prior experience developing applications on one of the three major cloud platforms (AWS, Azure, or GCP) using Kubernetes.
Responsibilities
Design, create, and provision infrastructure
Administer overall site availability, security, latency and system health
Provision, install, configure, operate, and maintain services, system software, and related infrastructure
Administer the state of all components in our cloud and bare metal environments
Deploy, manage, and operate the cloud environments
Design, build, manage and operate the infrastructure and configuration of SaaS applications with a focus on automation and infrastructure as code
Design, manage, and operate the infrastructure-as-a-service (IaaS) layer (hosted and cloud-based platforms) that supports the different platform services
Develop comprehensive monitoring solutions that provide full visibility into the different platform components, using tools and services such as Kubernetes, Prometheus, Grafana, ELK, Datadog, New Relic, and similar tools
Create the environments and tooling that enable the development team to release code quickly and reliably
Identify and troubleshoot availability and performance issues at every layer of the deployment, from hardware to the operating environment, network, and application
Evaluate performance trends and expected changes in demand and capacity, and establish the appropriate scalability plans
Troubleshoot and solve customer RPC issues
Ensure that SLAs are met when executing operational tasks
Work with development teams to ensure best practices for scalability, reliability, and security are designed and implemented from the start
Conduct periodic on-call duties
Qualifications
Great collaborator with 5+ years of experience in a DevOps or SRE role
Deep understanding of infrastructure-as-code (Terraform, etc.) and deploying large-scale systems reliably
Strong experience with Infrastructure as Code and Configuration Management tools
Experience with Prometheus/Grafana for metrics aggregation/visualization
Experience configuring CI/CD pipelines
Experience using Kubernetes
Experience with automation tools/platforms
Experience with alerting and monitoring tools
Strong knowledge of monitoring and performance analytics tools (Datadog, New Relic, etc.)
Commitment to implementing reliability and security best practices
Capacity planning experience, including resource optimization and load testing
Experience working in a highly distributed company is a plus
Ability to align part of your day with business hours in the Central Time Zone (UTC-6)
Working knowledge of information security issues
Experience building and managing virtualized systems (KVM, OVM, containers/Docker), and the ability to read and understand source code
Systematic problem-solving approach, combined with a strong sense of ownership and drive
Firm grasp of at least one modern programming language, beyond advanced scripting in Shell or Python
Working knowledge of web and network protocols and standards (HTTP, TLS, DNS, etc.)
Experience writing automation tools & eagerness to "automate all the things"
What does success in this role look like?
In three months, you have become our infrastructure administrator with respect to overall site availability, security, latency, system health, customer accounts, and billing. You’ll have taken on independent code review responsibilities and are collaborating on the design of new features
In six months, you have earned the trust of the team and are delivering tasks through the entire SDLC, from design through development with minimal guidance, and are helping to effectively mentor new engineers joining the team
In twelve months, you have established a cadence of predictable, on-time delivery without cutting corners