Overview

Build dependable speech and multimodal systems from data to deployment with NeMo, Riva, Triton, and NIM.

Shipping ASR, TTS, and vision-language features is hard because real traffic, latency budgets, and safety rules punish vague guidance. Teams need a concrete stack, tested workflows, and playbooks that hold up under load.

This book gives practitioners a practical path. Train with NeMo, serve with Triton and Riva, package stable APIs with NIM, and wire in observability, safety, and rollout controls so your services stay reliable after launch.

Map the NVIDIA stack in production: NeMo for training, Riva for runtime, NIM for standard APIs, Triton for serving and metrics
Set up containers, GPU drivers, CUDA, and validation checks for a clean starting environment
Build NeMo manifests, create tarred WebDataset shards, and manage data versions for repeatable training
Apply text processing that works in products: PnC models for punctuation and case, grammar-based ITN with Sparrowhawk
Choose and justify architectures: CTC and RNNT tradeoffs, FastConformer for short and long speech, Parakeet for multilingual, Canary for translation and timestamps
Design streaming with intent: lookahead, chunk size, and padding choices that balance latency and accuracy
Run NeMo 2 configs and NeMo Run cleanly, migrate experiments, track ablations, and keep results comparable
Evaluate with WER, CER, and MER, and slice by accent, SNR, and channel so quality numbers reflect reality
Add diarization that operators can trust: VAD with MarbleNet, embeddings with TitaNet, and MSDD integration
Export for serving the right way: ONNX or TorchScript paths, TensorRT where appropriate, and Triton model repos that scale
Tune Riva streaming ASR: chunk and padding settings, punctuation and ITN options, diarization flags and limits
Stand up NIM ASR endpoints with an OpenAI-compatible surface and autoscale them with Helm on Kubernetes
Build TTS that sounds right and runs fast: FastPitch with HiFi-GAN or BigVGAN, voice cloning data, lexicons, SSML controls
Manage prosody and latency for streaming audio; set clause sizes and playback buffers that feel responsive
Protect your product: content safeguards in TTS, consent gates for data and cloning, redaction and retention policies
Measure what matters: Triton metrics in Prometheus and Grafana, practical alert rules that catch real issues
Load test with perf_analyzer sweeps, batch and concurrency tuning, and sequence batching for conversational traffic
Engineer reliability: fault injection and backpressure, graceful degradation under spikes and partial failures
Wire NeMo Guardrails around ASR, TTS, and VLM flows so outputs stay on policy
Watermark and detect audio with AudioSeal and formalize a detection pipeline
Understand licenses and terms: NVIDIA AI Enterprise scope, Riva EULA, and NGC usage expectations
Use production playbooks with SLOs, cost caps, and rollback guards that turn operations into repeatable steps

This is a code-heavy guide with working Python, YAML, JSON, and Shell examples that you can adapt directly into real services.

Get the guide and build systems your users can rely on.

Full Product Details

Author: Ansel Corbyn
Publisher: Independently Published
Imprint: Independently Published
Dimensions: Width: 17.80cm, Height: 1.70cm, Length: 25.40cm
Weight: 0.535kg
ISBN: 9798273025103
Pages: 308
Publication Date: 04 November 2025
Audience: General/trade, General
Format: Paperback
Publisher's Status: Active
Availability: Available To Order. We have confirmation that this item is in stock with the supplier. It will be ordered in for you and dispatched immediately.
Countries Available: All regions