Sr. Site Reliability Engineer (SRE)
Location: Manhattan NYC, NY
Remote work: Employees can work remotely (100% US remote)
Job Type: Full-time
GETTR is one of the fastest-growing social media platforms, founded on the principles of free speech, independent thought, and rejecting political censorship and “cancel culture.” With best-in-class technology, our goal is to create a marketplace of ideas to share freedom and democracy around the world. We foster a healthy marketplace of ideas, where smart, motivated, curious people bring a diversity of opinions and the courage of their convictions to collaborate on ideas.
We are seeking an experienced Senior SRE who is passionate about the security, performance, and reliability of applications hosted in our global multi-cloud and on-prem data centers. You will be responsible for all aspects of our build and production deployment environments, including scaling, provisioning, monitoring, and automation. You should have significant CI/CD experience and strong cloud system administration skills. Java and Spring Boot development experience is also a plus. The successful candidate will be part of a collaborative “DevOps & SRE” business unit that provides continual, proactive technical infrastructure support and improvement across all environments of GETTR services, products, and platforms, ensuring optimal system availability and reliability. You must contribute to improving the DevOps implementation while complying with IT security best practices.
Participate in all stages of infrastructure provisioning, primarily providing staging and production support.
Assist in implementing security best practices and initiatives at all levels of the systems infrastructure.
Adhere to SRE (Site Reliability Engineering) principles/pillars on incident management and service level objectives.
Work closely with DevOps engineers to apply and refine the automation scripts and system designs they share, improving system efficiency in the production environment.
Ensure maximum uptime and stability of cloud and on-premises environments, especially staging and production.
Apply the latest OS and security patches while ensuring compatibility with the applications running on them.
Lead routine disaster recovery/business continuity (DRBC) exercises.
Handle help desk & JIRA tickets and mitigate any production issues.
Keep knowledge base documentation accurate and update it in a timely manner.
Strong knowledge of secure web app deployments in AWS (4+ years).
Advanced experience as a Linux or Windows server administrator.
The ability to work with little supervision; must be self-driven and motivated.
Experience with continuous integration/continuous delivery (CI/CD) using Jenkins and Git.
Experience with containerized microservices delivered with Docker, Kubernetes (Kops, AWS EKS), or OpenShift 4.x.
Manage and optimize the unified logging system and APM (Application Performance Management) monitoring tools to continually reduce MTTR (Mean Time to Recovery).
Strong experience with hybrid infrastructure systems monitoring and proactive incident management.
Strong scripting skills using Shell and Python or Go (a plus).
Ability to proactively triage and troubleshoot urgent production issues with precision under high time pressure.
Experience working collaboratively with application development teams throughout the organization to resolve mission-critical problems.
Excellent written and oral communication skills necessary to produce and process technical documents.
Excellent problem-solving and analytical skills and the ability to translate business requirements into information systems solutions.
Experience with IT security.
A strong team player.
Familiarity/experience with the DevOps process.
Professional IT certifications, such as Red Hat Certified Engineer/Windows Server, and AWS certifications (a huge plus).
Relevant work experience (8+ years), either in software development or IT infrastructure.
Master’s degree in a technology-related field, engineering, or computer science (a plus).
Participate in the on-call rotation as needed, with week-long shifts roughly every 3-4 weeks.
Provide mission-critical production support outside business hours in case of an outage, if necessary.
GETTR USA, Inc., is a privately held American social media company. Launched on July 4, 2021, by its Chief Executive Officer, former senior Trump advisor Jason Miller, GETTR celebrates free speech, rejects cancel culture, and provides a best-in-class technology platform for the marketplace of ideas.