LLM Inference Performance Engineer
Listed on 2026-03-01
IT/Tech
AI Engineer, Machine Learning/ML Engineer, Data Scientist
EnCharge AI is a leader in advanced AI hardware and software systems for edge-to-cloud computing. EnCharge’s robust and scalable next-generation in-memory computing technology provides orders-of-magnitude higher compute efficiency and density compared to today’s best-in-class solutions. The high-performance architecture is coupled with seamless software integration and will make the immense potential of AI accessible in power-, energy-, and space-constrained applications.
EnCharge AI launched in 2022 and is led by veteran technologists with backgrounds in semiconductor design and AI systems.
About the Role
EnCharge AI is seeking an LLM Inference Deployment Engineer to optimize, deploy, and scale large language models (LLMs) for high-performance inference on its energy-efficient AI accelerators. You will work at the intersection of AI frameworks, model optimization, and runtime execution to ensure efficient model execution and low-latency AI inference.
Responsibilities
- Deploy and optimize post-training LLMs (GPT, LLaMA, Mistral, Falcon, etc.) from libraries such as Hugging Face.
- Utilize inference runtimes such as ONNX Runtime and vLLM for efficient execution.
- Optimize batching, caching, and tensor parallelism to improve LLM scalability in real-time applications.
- Develop and maintain high-performance inference pipelines using Docker, Kubernetes, and inference servers.
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or related field.
- Experience in LLM inference deployment, model optimization, and runtime engineering.
- Strong expertise in LLM inference frameworks (PyTorch, ONNX Runtime, vLLM, TensorRT-LLM, DeepSpeed).
- In-depth knowledge of the Python programming language for model integration and performance tuning.
- Strong understanding of high-level model representations and experience implementing framework-level optimizations for generative AI use cases.
- Experience with containerized AI deployments (Docker, Kubernetes, Triton Inference Server, TensorFlow Serving, TorchServe).
- Strong knowledge of LLM memory optimization strategies for long-context applications.
- Experience with real-time LLM applications (chatbots, code generation, retrieval-augmented generation).
EnCharge AI is an equal employment opportunity employer in the United States.