Overview

Run large language models with predictable latency, controlled cost, and production reliability.

Shipping LLMs is an operational problem. Teams struggle with time to first token, tokens per second, GPU memory pressure, and a moving target of engines and datatypes. This book turns those issues into clear practices you can apply with DeepSpeed and the serving layers you already use. You get a practical path from checkpoint to stable API, with configuration that fits real workloads, not toy demos. Every topic is grounded in measurable outcomes so your stack meets SLOs under mixed traffic and budget constraints.

- Place DeepSpeed correctly in your stack and configure kernel injection, tensor parallelism, and ZeRO for real services
- Understand TTFT and throughput from prefill to decode, and set metrics for p95 latency and queue time
- Size and control the KV cache with paged attention, batching, and safe headroom targets
- Apply quantization that holds up under load, including W8A8, AWQ, GPTQ, FP8, and FP4
- Use speculative decoding with a sound drafter choice, acceptance math, and stable fallbacks
- Operate vLLM, TensorRT-LLM on Triton, and TGI with clean API surfaces and core flags
- Scale with Ray Serve and plan capacity from workload shapes and arrival patterns
- Tune for NVIDIA Hopper and Blackwell or AMD MI300X, with attention backends and NVLink planning
- Run on Kubernetes with GPU Operator, device plugin, MIG, and topology-aware placement
- Wire observability with Prometheus, DCGM, and OpenTelemetry spans, plus vllm bench, trtllm-bench, and genai-perf
- Ship safely with quotas, redaction, audit logs, go-live gates, and instant rollback plans

This is a code-heavy guide with working YAML, JSON, Shell, and Python examples that map directly to production, from gateway limits and network policies to rollout templates and exportable benchmark scripts.

Grab your copy today and build an LLM service that stays fast, measurable, and dependable.
Full Product Details

Author: Tara Malhotra
Publisher: Independently Published
Imprint: Independently Published
Dimensions: Width: 17.80cm, Height: 1.50cm, Length: 25.40cm
Weight: 0.508kg
ISBN: 9798274507356
Pages: 290
Publication Date: 14 November 2025
Audience: General/trade, General
Format: Paperback
Publisher's Status: Active
Availability: Available To Order. We have confirmation that this item is in stock with the supplier. It will be ordered in for you and dispatched immediately.
Countries Available: All regions