Systems Engineer ; AWF
Listed on 2026-03-02
Engineering
Systems Engineer
Serving Maryland and the Greater Washington D.C. area, Sage Cor Solutions (Sage Cor) is a growing company bringing complete engineering services and true full-lifecycle Systems Engineering services to areas requiring (or desiring) nationally recognized expertise in high-performance computing, large data analytics, and cutting-edge information technologies.
Active TS/SCI with Polygraph required.
Required Experience:
- A High School Diploma or GED plus nineteen (19) years of general systems engineering experience.
- A Bachelor’s degree in a Qualified Engineering Field or a related discipline from an accredited college or university plus fifteen (15) years of systems engineering experience.
- A Master’s degree in a Qualified Engineering Field or a related discipline from an accredited college or university plus thirteen (13) years of systems engineering experience.
- A PhD in a Qualified Engineering Field or related discipline from an accredited college or university plus thirteen (13) years of systems engineering experience.
Description:
The project represents a foundational effort targeted at developing full reliability and resiliency for the newest HPC systems and customers. The work spans reviewing the reliability class requirements, mission customer needs, IT system requirements, implementation, project planning, risk management, and related areas. Team members independently analyze various elements, develop recommendations, engage in planning activities and implement various elements, validate and verify solutions through testing, integrate solutions to gaps identified, and support ongoing risk management.
Example tasks include:
- Determine the most critical mission support activities and corresponding IT assets, such as servers, applications, and data, which are essential for business continuity. Prioritize these assets in the resiliency plan based on their importance to the organization.
- Formulate strategies to recover critical IT assets following a disruptive event. This may include implementing backup systems, adopting redundancy measures, or using alternative work locations.
- Independently analyze and recommend redundancy measures for critical components, such as backup systems, communication networks, and data centers, to ensure continuous operation in case of a disruption. Also, establish fault tolerance mechanisms to detect, isolate, and recover from errors to maintain mission continuity.
- Develop a detailed disaster recovery plan, focusing on the requirements of high criticality missions. Define recovery time objectives (RTOs) and recovery point objectives (RPOs) based on the mission's specific needs and create procedures to restore critical systems and infrastructure to functional states. Regularly test and update the disaster recovery plan to maintain its effectiveness.
- Continuously review and update the resiliency plan to reflect changes in the infrastructure, business requirements, and potential threats. This may include regular testing and validation of the plan to ensure its effectiveness.
- Work with the customer metrics and monitoring team to introduce new metrics capabilities to support the resiliency program.
- Regularly review the plan's effectiveness and performance through post-event analysis, audits, or reviews. Implement improvements and updates to the plan as necessary based on lessons learned and changing circumstances.
- Involve key stakeholders in the development and implementation of the resiliency plan, such as upper management, IT staff, and external partners like vendors or service providers.
- Ensure that the resiliency plan aligns with relevant standards, guidelines, and regulations that govern highly critical missions. Conduct regular audits and assessments to maintain compliance and promote continuous improvement.
- Analyze and propose ongoing risk management and monitoring processes to identify and respond to emerging threats, changes in operational environments, and technology advancements. Regularly update and adapt the resiliency plan to maintain the mission's overall resilience.
- Experience with mission assurance, reliability and resiliency planning and stakeholder engagement is beneficial.
Consistent with federal and state law where Sage Cor conducts business, Sage Cor Solutions provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or veteran status, or any other protected class.