Lead SRE Engineer

Job in Baltimore, Anne Arundel County, Maryland, 21217, USA
Listing for: TEKsystems
Full Time position
Listed on 2026-03-03
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer, SRE/Site Reliability, IT Support
Job Description & How to Apply Below
Description

Think of TEKsystems Global Services (TGS) as the growth solution for enterprises today. We unleash growth through technology, strategy, design, execution and operations with a customer-first mindset for bold business leaders. We deliver cloud, data and customer experience solutions. Our partnerships with leading cloud, design and business intelligence platforms fuel our expertise. We value deep relationships, dedication to serving others and inclusion.

We drive positive outcomes for our people and our business, and we stay true to our commitments and act in harmony with our words. We exist to create significant opportunity for people to achieve fulfillment through career success. Ready to join us?

Here's what the opportunity supported through our TGS Talent Acquisition Team requires:

We are seeking a highly skilled and experienced Lead Site Reliability Engineer (SRE) to join our dynamic team. The ideal candidate will play a critical role in ensuring the reliability, scalability, and performance of our systems and applications.

The qualified candidate must demonstrate strong communication skills to collaborate with and influence many stakeholders across the organization and possess a deep technical background across technology stacks, including applications, data and messaging frameworks, and infrastructure components.

This individual should also demonstrate exceptional leadership in running a technical team, recruiting team members, and growing the organization.

Responsibilities:

· Design, implement, and maintain scalable monitoring and APM solutions (Dynatrace, Datadog, or New Relic) to ensure the reliability and performance of our systems.

· Express a bias for action by identifying inefficiencies and proposing solutions, working independently and collaboratively with the team.

· Develop and maintain automated alerting and incident response processes to proactively identify and address potential issues.

· Collaborate with cross-functional teams to define and implement best practices for monitoring, logging, observability, and incident management.

· Drive continuous improvement initiatives to enhance system reliability, scalability, and performance.

· Automate infrastructure provisioning, configuration, and deployment processes using Terraform and other infrastructure-as-code tools.

· Work closely with development teams to integrate monitoring and observability into the CI/CD pipeline.

· Provide guidance and mentorship to junior team members, fostering a culture of continuous learning and growth.

· Contribute to the SRE group in various technology domains and SRE practices, such as observability frameworks, resiliency, DevSecOps, etc.

· Set up and enhance SRE best practices in areas such as observability, automation, and resiliency.

· Be accountable for the delivery and performance of SRE Teams.

· Evangelize the SRE practice across organizational boundaries.

· Recommend relevant, implementable technologies that not only represent state-of-the-art SRE practices and trends but also benefit the overall application modernization journey.

· Architect new platforms/libraries/tool chains/APIs to enable broad-scope SRE adoption.

· Drive the broader developer community to adopt the SRE best practices.

· Develop playbooks on various SRE, DevOps, and related topics.

Required Skills & Qualifications:

· Bachelor's degree in a computer science or information technology field, or equivalent professional experience. A master's degree is preferred.

· 5+ years of professional Site Reliability and DevOps experience.

· 10+ years of total IT experience building complex systems.

· In-depth familiarity with SRE terminologies, including Service Level Objectives (SLOs), Service Level Indicators (SLIs), error budgets, incident management, postmortem analysis, Recovery Time Objectives (RTOs), and Recovery Point Objectives (RPOs).

· Ability to identify organization-wide gaps in the SRE domain and identify implementable solutions that contribute to the transformation of the organization.

· Ability to build and lead high-performance SRE teams to consistently achieve business results.

· Expertise with monitoring, APM, and…