Unstructured Data Engineer
Listed on 2026-03-01
IT/Tech
Data Engineer, AI Engineer, Data Scientist
Description
The Leidos Digital Modernization Sector is seeking an Unstructured Data Engineer. This position allows full-time telework from any U.S.-based location.
POSITION SUMMARY:
We are seeking a highly skilled and innovative Unstructured Data Engineer to lead the design, implementation, and operationalization of unstructured data pipelines supporting Retrieval-Augmented Generation (RAG) and enterprise AI initiatives. This role will serve as the technical expert responsible for transforming raw, unstructured content into trusted, governed, AI-ready data products.
The ideal candidate has deep experience in RAG architectures, document preprocessing, metadata enrichment, vectorization, and embedding workflows, and understands how to operationalize these capabilities at enterprise scale. Experience with Ohalo Data xRay or similar unstructured data processing platforms is strongly preferred.
PRIMARY RESPONSIBILITIES:
- Design, build, and manage end-to-end RAG pipelines for enterprise AI applications.
- Lead preprocessing of unstructured data, including discovery, classification, cleansing, redaction, and metadata enrichment.
- Develop and optimize document chunking, embedding, and vectorization strategies for structured and unstructured datasets.
- Coordinate ingestion of curated datasets into vector databases and AI platforms.
- Package curated unstructured datasets as governed, reusable data products for enterprise consumption.
- Define and implement metadata tagging strategies to align with Collibra governance standards.
- Partner with Data Governance and Data Quality teams to ensure AI-ready data meets enterprise standards for lineage, classification, and compliance.
- Evaluate and optimize embedding models, retrieval strategies, and indexing performance.
- Monitor and tune RAG pipeline performance, including latency, retrieval accuracy, and cost efficiency.
- Implement automation for document ingestion, transformation, and publishing workflows.
- Support integration with enterprise AI platforms (e.g., ChatGPT Enterprise, Ask Sage, Moveworks).
- Conduct cost analysis and capacity planning for vector storage and processing workloads.
- Provide technical guidance on AI data readiness and unstructured data lifecycle management.
- Design, implement, and optimize enterprise-grade RAG and prompt engineering frameworks, including context engineering strategies (chunking, metadata enrichment, semantic filtering, dynamic context management) to improve retrieval accuracy, grounding, and response quality.
- Develop and maintain scalable multi-modal data pipelines that ingest, preprocess, embed, and integrate text, documents, images, audio, and structured data into governed vectorized data products consumable by enterprise AI platforms.
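The chunk, embed, and retrieve steps named in the responsibilities above can be sketched in a few lines. This is an illustrative toy, not Leidos's implementation: the bag-of-words "embedding" and cosine scorer are stand-ins for a dense embedding model and a vector database (e.g., the platforms listed under the qualifications below), and all function names here are hypothetical.

```python
# Minimal sketch of a RAG retrieval step: fixed-size chunking,
# a toy embedding, and cosine-similarity top-k search.
import math
from collections import Counter

def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping character windows (fixed-size chunking)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real pipeline would call a
    dense embedding model (Hugging Face, OpenAI APIs, etc.)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by similarity to the query and return the top k.
    A vector database replaces this linear scan at enterprise scale."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

In production, the retrieved chunks would be injected into the LLM prompt as grounding context; chunk size, overlap, and the embedding model are the main tuning knobs the posting refers to.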
BASIC QUALIFICATIONS:
- Bachelor’s degree in Computer Science, Data Engineering, AI/ML, or related field and 8+ years of relevant experience.
- Hands‑on experience designing and implementing RAG architectures in production environments.
- Experience working with unstructured data (PDFs, documents, email, transcripts, images with OCR, etc.).
- Strong proficiency in Python and experience with NLP/LLM frameworks (e.g., LangChain, LlamaIndex, Hugging Face, OpenAI APIs).
- Experience with vector databases (e.g., Pinecone, Weaviate, FAISS, OpenSearch, Azure AI Search).
- Experience implementing document chunking, embedding generation, and similarity search.
- Understanding of metadata modeling and governance principles.
- Experience building scalable data pipelines in cloud environments (AWS, Azure, or GCP).
- Hands‑on experience with prompt engineering, evaluation metrics, and context window optimization.
- Strong understanding of multi‑modal data processing and pipeline engineering.
- Strong knowledge of API integration and microservices architecture.
- US Citizenship is required.
PREFERRED QUALIFICATIONS:
- Experience with Ohalo Data xRay or similar unstructured data discovery and redaction platforms.
- Experience aligning RAG pipelines with enterprise Data Governance frameworks (e.g., Collibra).
- Familiarity with data classification, CUI/PII handling, and redaction controls.
- Experience packaging datasets as governed data products with defined SLAs and…