Site Reliability Engineer

Job description

Who is Astronomer?

Founded in 2015, Astronomer is a modern clickstream infrastructure platform that collects, routes, and prepares both clickstream and customer data. It is optimized for ease of initial implementation, flexibility, security, and total cost of operation. Our team comes from a mix of multidisciplinary backgrounds and is backed by notable venture investors including Frontline Ventures, 500 Startups, AngelPad, and CincyTech.

Role Description

You will work on large-scale system design and troubleshooting, and you should be fluent in systems programming and automation, with a desire to take on the complex problems of scale. Familiarity with running production environments at scale is crucial in this job, along with an in-depth understanding of Linux system internals and networking. You must have experience with at least some of the following: Mesos, Kafka, Airflow, Marathon, DC/OS, AWS, Google Cloud, CoreOS, Python, Go.


  • Design, write and deliver software to improve the availability, scalability, latency, and efficiency of our services.
  • Solve problems relating to mission-critical services, and build automation to prevent their recurrence, with the goal of automating the response to all non-exceptional service conditions.
  • Influence and create new designs, architectures, standards and methods for large-scale distributed systems.
  • Engage in service capacity planning and demand forecasting, software performance analysis and system tuning.
  • Build and maintain monitoring and alerting systems.