
SRE Metrics Analyst Intern

Job in Reston, Fairfax County, Virginia, 22090, USA
Listing for: Leidos
Apprenticeship/Internship position
Listed on 2026-03-01
Job specializations:
  • IT/Tech
    Cloud Computing, SRE/Site Reliability, IT Project Manager

Description

We are seeking a detail-oriented and analytical SRE Metrics Analyst Intern to join our Site Reliability Engineering (SRE) team. In this role, you will be responsible for establishing and managing the collection of metrics related to system performance, reliability, and incidents. You will develop and maintain reporting frameworks to provide actionable insights to stakeholders, driving improvements in our systems and processes.

Your work will support the organization's commitment to delivering high-quality, reliable services.

This role is 50% telework, and candidates must be local to one of the following cities:

Norfolk, VA

Jacksonville, FL

Bremerton, WA

San Diego, CA

Key Responsibilities:

Metrics Collection Framework:

* Design and implement a comprehensive metrics collection framework that captures key performance indicators (KPIs) related to system reliability and operational efficiency.

* Identify relevant metrics and establish methods for collecting, aggregating, and storing data from various sources, including monitoring tools, logs, and databases.

Data Analysis and Visualization:

* Analyze collected metrics to identify trends, patterns, and anomalies that impact system reliability and performance.

* Develop dashboards and visualizations to present data in a clear and actionable manner using tools such as Grafana, Kibana, or Tableau.

* Ensure that stakeholders have access to real-time insights and reports that inform decision-making.

Reporting:

* Create regular reports on system performance, reliability, incident response times, and other critical metrics for various stakeholders, including technical teams and management.

* Provide insights and recommendations based on data analysis to drive continuous improvement initiatives.

* Prepare and present findings to stakeholders, facilitating discussions on reliability goals and performance enhancements.

Collaboration with SRE Teams:

* Work closely with SRE teams to identify their metric needs and ensure alignment with operational goals.

* Collaborate with engineering and operations teams to ensure that metric collection is integrated into development and deployment processes.

* Support incident response efforts by providing metrics that help identify root causes and areas for improvement.

Continuous Improvement:

* Stay current with industry trends and best practices in metrics collection, monitoring, and reporting within SRE and DevOps.

* Continuously evaluate and enhance the metrics collection and reporting processes to improve data accuracy, relevance, and accessibility.

* Foster a culture of data-driven decision-making within the SRE team and broader organization.

Key Qualifications:

* Enrolled in a degree program in a related major, with a GPA of 3.0 or better

* US citizenship required

* Ability to obtain and maintain a DoD security clearance

Experience:

* Experience in metrics collection, data analysis, or reporting, preferably in a Site Reliability Engineering or DevOps environment.

* Proven experience working with monitoring and observability tools (e.g., Prometheus, Datadog, New Relic).

Technical Skills:

* Strong understanding of key metrics used in site reliability engineering, including SLIs, SLOs, and SLAs.

* Proficiency in data analysis tools and languages (e.g., SQL, Python, R) for data manipulation and reporting.

* Experience with data visualization tools (e.g., Grafana, Kibana, Tableau) to create dashboards and reports.

Analytical Skills:

* Strong analytical and problem-solving skills, with the ability to interpret complex data sets and provide actionable insights.

* Ability to evaluate the relevance and accuracy of metrics and make recommendations for improvement.

Communication and Collaboration:

* Excellent communication skills, both written and verbal, with the ability to present data and findings to technical and non-technical audiences.

* Proven ability to work collaboratively with cross-functional teams and build strong relationships with stakeholders.

Preferred Qualifications:

* Experience with cloud platforms (AWS, GCP, Azure) and their monitoring tools.

* Familiarity with incident management processes and practices within an SRE context.

* Knowledge of software development methodologies and best practices.

Key Metrics of Success:

* Timely and accurate collection of key performance metrics with minimal data discrepancies.

* Effective visualization and reporting of metrics that inform decision-making and drive improvements in reliability.

* Positive feedback from stakeholders regarding the clarity and usefulness of reports and insights.

* Continuous improvement in the SRE metrics collection and reporting processes, leading to better operational performance.

Why Join Us?

Be part of a dynamic and innovative team focused on enhancing the reliability and performance of critical systems. Play a key role in shaping the metrics strategy that drives operational excellence and continuous improvement. Work in an environment that values collaboration, professional development, and a…
