
LLM-powered features are slow by default, and the slowness is user-visible. The teams that ship fast LLM features apply a small set of optimizations -- streaming, prompt caching, smaller models for routing, parallel calls, prefetching. Here's how to actually make LLM calls feel fast.
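To make one of these concrete, here is a minimal sketch of the "parallel calls" idea: when the prompts are independent, issuing them concurrently brings total latency down to roughly the slowest single call instead of the sum of all of them. The `fake_llm_call` helper is a stand-in that simulates network latency, not a real provider SDK.

```python
import asyncio
import time

async def fake_llm_call(prompt: str, latency: float = 0.1) -> str:
    # Stand-in for a network-bound LLM request (hypothetical, for illustration).
    await asyncio.sleep(latency)
    return f"response to: {prompt}"

async def sequential(prompts: list[str]) -> list[str]:
    # Each call waits for the previous one: latency adds up.
    return [await fake_llm_call(p) for p in prompts]

async def parallel(prompts: list[str]) -> list[str]:
    # asyncio.gather awaits all coroutines concurrently: latency ~ max, not sum.
    return await asyncio.gather(*(fake_llm_call(p) for p in prompts))

start = time.perf_counter()
results = asyncio.run(parallel(["classify", "summarize", "extract"]))
elapsed = time.perf_counter() - start
print(f"{len(results)} responses in {elapsed:.2f}s")
```

With three simulated 100 ms calls, the parallel version finishes in about 100 ms rather than 300 ms; the same pattern applies when fanning out to a real API client that exposes async methods.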