Azure Infrastructure & Site Reliability Engineer
Listed on 2026-01-24
IT/Tech
Cloud Computing
ABOUT GREYSTAR
Greystar is a leading, fully integrated global real estate platform offering expertise in property management, investment management, development, and construction services in institutional‑quality rental housing. Headquartered in Charleston, South Carolina, Greystar manages and operates over $300 billion of real estate in more than 250 markets globally with offices throughout North America, Europe, South America, and the Asia‑Pacific region. Greystar is the largest operator of apartments in the United States, managing over 1,000,000 units/beds globally.
Across its platforms, Greystar has nearly $79 billion of assets under management, including over $35 billion of development assets and over $30 billion of regulatory assets under management. Greystar was founded by Bob Faith in 1993 to become a provider of world‑class service in the rental residential real estate business. To learn more, visit
We seek an experienced and highly skilled Azure Infrastructure and Site Reliability Engineer to join Greystar’s Data, Digital, and AI team (D2AI). This role involves designing, implementing, and managing Azure‑based infrastructure and cloud solutions, with additional expertise in Azure Databricks. The ideal candidate will be responsible for ensuring system scalability, security, and performance by leveraging DevOps best practices, Infrastructure as Code (IaC), and advanced data processing workflows.
This is a 100% hands‑on role that requires deep technical expertise and the ability to collaborate effectively across teams.
As an SRE on the D2AI team, you will be responsible for the stability, performance, and availability of our cloud‑based internal and external customer‑facing products. Your role will be crucial to seamless operations and rapid issue resolution.
What You Will Do
Azure Architecture and Resource Management
- Design, implement, and manage Azure solutions to meet technical and operational needs.
- Maintain comprehensive network and server documentation, including infrastructure diagrams, server configurations, standard operating procedures, and incident reports.
- Optimize Azure resource configuration for performance, cost, and security.
- Monitor the health and reliability of Azure resources, ensuring high availability.
- Continuously monitor network/server performance, using advanced network management and server administration tools to identify issues proactively.
- Monitor and manage the health, performance, and availability of our applications running on Azure (including ADF pipelines).
- Collaborate with development teams to align Azure architecture with application requirements.
- Provide guidance on best practices for Azure resource provisioning, scaling, and configuration.
- Enable teams to leverage Azure services, including Databricks, for analytics and data workflows.
- Detect and analyze network and server anomalies, security threats, and performance bottlenecks. Initiate incident response procedures and coordinate with relevant teams for swift resolution.
- Troubleshoot: investigate and resolve infrastructure and network/server‑related issues, escalate complex problems to higher‑level teams, and maintain detailed incident documentation.
- Set up and manage Azure Databricks environments for big data processing and advanced analytics.
- Support and optimize Databricks pipelines for data engineers and scientists.
- Effectively troubleshoot and resolve Databricks‑related challenges.
- Develop and maintain Infrastructure as Code (IaC) scripts using Terraform.
- Implement DevOps practices, including CI/CD pipelines, automated testing, and monitoring.
- Streamline workflows by collaborating with IT and operations teams.
- Act as the primary point of contact for Azure‑related issues within the project.
- Investigate, diagnose, and resolve complex technical issues in collaboration with development and operations teams.
- Implement preventive measures to minimize downtime and…