
EOP - System Reliability Engineer - TS/SCI

Remote / Online - Candidates ideally in Washington, District of Columbia, 20022, USA
Listing for: cFocus Software Incorporated
Remote/Work from Home position
Listed on 2026-03-06
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing
Salary/Wage Range or Industry Benchmark: 100,000 - 130,000 USD yearly
Job Description
Position: EOP - System Reliability Engineer - TS/SCI Required

cFocus Software seeks a System Reliability Engineer to join our program supporting the Executive Office of the President. This position is remote and requires a TS/SCI clearance.

Qualifications:
  • 5+ years of experience and a Bachelor's degree in Computer Programming, Science, Engineering, or a related technical discipline, or an equivalent combination of education, technical training, or work/military experience, including:
  • 3+ years of related systems programming experience
  • Experience maintaining an operational environment and using monitoring tools and dashboard interfaces (e.g., Kibana, Grafana)
  • Experience working with container images and platforms (Kubernetes/Docker)
  • Strong understanding of DevOps and software/application development processes
  • Understanding of GitLab, Jenkins, ArgoCD, and other DevOps/Continuous Integration tools for Kubernetes
  • Understanding of microservice design and architectural pattern best practices
  • Understanding of Python, Bash, and shell scripting
  • Knowledge of network technologies, common infrastructure components, load balancers, firewalls, virtual and physical infrastructure design
  • Problem-solving and troubleshooting skills
  • Communication and interpersonal skills
  • Must possess excellent time management skills and the drive to work unsupervised
  • Experience deploying to on-premises/data center infrastructure
  • Experience using Jira and Confluence on a daily basis
  • Experience building processes for deploying to a Kubernetes-based environment using GitLab and Helm
  • Understanding of access management and security groups (e.g., IAM, S3 buckets, SSH, VPN)
  • Ability to write and run unit and functional tests
  • Technical Skills:
    Proficiency in programming languages (such as Python, Go, or Bash) is essential for scripting and automation tasks. Knowledge of Linux/Unix systems is also crucial, as SREs often work in these environments.
  • Problem-Solving: Strong analytical and problem-solving skills are necessary to diagnose and resolve complex system issues effectively.
  • Understanding of SRE Principles:
    Familiarity with key SRE concepts such as Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets is important for measuring and maintaining system reliability.
  • Reliability and Availability: SRE practices help ensure that services are consistently available and reliable, which is critical for user satisfaction and business success.
  • Scalability: SREs implement strategies that allow systems to scale efficiently as demand increases, ensuring that performance remains optimal even under heavy load.
  • Cost Management:
    By optimizing resource usage and reducing downtime, SREs contribute to cost savings for organizations.
  • Programming and Scripting:
    Proficiency in languages like Python, Go, or Ruby is crucial for automating tasks and managing infrastructure.
  • Operating Systems: A strong understanding of Linux/Unix systems is essential for troubleshooting and managing servers.
  • Cloud Computing:
    Familiarity with cloud platforms like AWS, Azure, or Google Cloud is vital for deploying and managing applications in distributed environments.
  • Containers & Orchestration:
    Understanding containerization tools like Docker and managing containerized workloads with Kubernetes is crucial for cloud-native applications.
  • Monitoring and Logging:
    Proficiency in tools such as Prometheus, Grafana, or the Elasticsearch, Logstash, and Kibana (ELK) stack is necessary for tracking metrics, setting up alerts, and analyzing logs.
  • Networking:
    Knowledge of networking protocols and configurations is essential for maintaining system health and performance.
  • Configuration Management:
    Skills in managing and maintaining system configurations are critical for ensuring system reliability.
  • Incident Response:
    Ability to respond quickly and effectively to incidents, including documenting and learning from them.
  • Security Best Practices:
    Understanding security protocols and best practices to protect systems from vulnerabilities.
  • These skills are essential for SREs to maintain high availability and performance, balancing the demands of development and operations.
  • Support required during core business hours of 8am – 5pm, Monday through Friday.
  • On-call for…