Senior Software Engineer - Retrieval-Augmented Generation (RAG) System
Listed on 2026-01-16
Software Development
AI Engineer, Data Engineer
Job Title
Senior Software Engineer – Retrieval-Augmented Generation (RAG) System
About the role
We are seeking an engineer to build and support a healthcare-centered production-scale RAG system that combines document retrieval with response generation to deliver accurate, context-aware answers. The engineer will design, implement, and operate end-to-end RAG pipelines, including LLM interaction, API creation, and secure, high-performance delivery of knowledge-grounded capabilities. Collaboration with data engineers, platform teams, and product partners to ship reliable, scalable, and observable systems is essential.
Role and responsibilities
- Architect, implement, test, and operate end-to-end RAG workflows
- Ingest and normalize documents from diverse sources
- Generate and manage embeddings; index and query vector databases; retrieve relevant passages; apply reranking or fusion strategies; feed prompts to LLMs
- Build scalable, low-latency services and APIs (Python preferred; other languages acceptable); ensure production-grade reliability (monitoring, tracing, alerting)
- Integrate with vector databases and embedding pipelines and optimize for latency, throughput, and cost
- Design and implement MLOps workflows: model/version management, experiments, feature stores, CI/CD for ML-enabled services, rollback plans
- Develop robust data pipelines and governance around ingestion, provenance, quality checks, and access controls
- Collaborate with data engineers to improve retrieval quality (embedding strategies, reranking, cross-encoder models, prompt engineering) and implement evaluation metrics (precision/recall, MRR, QA accuracy, user-centric metrics)
- Implement monitoring and observability for RAG components (latency, success rate, cache hit rate, retrieval quality, data drift)
- Ensure security, privacy, and compliance (authentication, authorization, data masking, PII handling, audit logging)
Qualifications
- 5+ years of professional software engineering experience designing and delivering production systems
- Strong programming skills (Python required; Node.js a plus)
- Deep understanding of retrieval-augmented or application-scale NLP systems with practical experience building RAG-like pipelines
- Hands‑on experience with MLOps tooling and concepts (model serving, versioning, experiments, feature stores, reproducibility)
- Proficiency with cloud infrastructure and modern software practices (AWS/GCP/Azure; Docker; Kubernetes; CI/CD)
- Strong problem-solving skills, excellent communication, and ability to work with cross-functional teams
- Familiarity with data governance, privacy, and security best practices
- Experience with agentic workflow tools (e.g., LangGraph) and prompt engineering for LLMs
- Exposure to working with and evaluating different LLMs
- Knowledge of evaluation methodologies for retrieval and QA systems; ability to set up A/B tests and dashboards
- Experience with data processing frameworks (SQL, Pandas, Spark) and large-scale data pipelines
- Background in performance optimization for low-latency AI services (MLflow)
- Experience with monitoring and logging via New Relic, K9s, Portkey, etc.
- Experience with minimizing token usage and cost optimization
- Comfortable with design and implementation of security controls for data-intensive AI systems
$86,600 – $144,400 (geographic differentials may apply); if performed in New Jersey: $97,867 – $156,333.
Equal Opportunity Statement
We are an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law.