Multilingual Natural Language Processing Applications: From Theory to Practice

Author:   Daniel Bikel, Imed Zitouni
Publisher:   Pearson Education (US)
ISBN:   9780137151448
Pages:   640
Publication Date:   24 May 2012
Format:   Hardback
Availability:   Awaiting stock


Our Price $237.60

Overview

Multilingual Natural Language Processing Applications is the first comprehensive single-source guide to building robust and accurate multilingual NLP systems. Edited by two leading experts, it integrates cutting-edge advances with practical solutions drawn from extensive field experience.

Part I introduces the core concepts and theoretical foundations of modern multilingual natural language processing, presenting today’s best practices for understanding word and document structure, analyzing syntax, modeling language, recognizing entailment, and detecting redundancy.

Part II thoroughly addresses the practical considerations associated with building real-world applications, including information extraction, machine translation, information retrieval/search, summarization, question answering, distillation, processing pipelines, and more.

This book contains important new contributions from leading researchers at IBM, Google, Microsoft, Thomson Reuters, BBN, CMU, University of Edinburgh, University of Washington, University of North Texas, and others.

Coverage includes:
  Core NLP problems, and today’s best algorithms for attacking them
  Processing the diverse morphologies present in the world’s languages
  Uncovering syntactical structure, parsing semantics, using semantic role labeling, and scoring grammaticality
  Recognizing inferences, subjectivity, and opinion polarity
  Managing key algorithmic and design tradeoffs in real-world applications
  Extracting information via mention detection, coreference resolution, and events
  Building large-scale systems for machine translation, information retrieval, and summarization
  Answering complex questions through distillation and other advanced techniques
  Creating dialog systems that leverage advances in speech recognition, synthesis, and dialog management
  Constructing common infrastructure for multiple multilingual text processing applications

This book will be invaluable for all engineers, software developers, researchers, and graduate students who want to process large quantities of text in multiple languages, in any environment: government, corporate, or academic.

Full Product Details

Author:   Daniel Bikel, Imed Zitouni
Publisher:   Pearson Education (US)
Imprint:   IBM Press
Dimensions:   Width: 18.80cm, Height: 3.80cm, Length: 23.80cm
Weight:   1.120kg
ISBN:   9780137151448
ISBN 10:   0137151446
Pages:   640
Publication Date:   24 May 2012
Audience:   College/higher education, Postgraduate, Research & Scholarly
Format:   Hardback
Publisher's Status:   Out of Print
Availability:   Awaiting stock

Table of Contents

Preface
Acknowledgments
About the Authors

Part I: In Theory

Chapter 1: Finding the Structure of Words
  1.1 Words and Their Components
  1.2 Issues and Challenges
  1.3 Morphological Models
  1.4 Summary

Chapter 2: Finding the Structure of Documents
  2.1 Introduction
  2.2 Methods
  2.3 Complexity of the Approaches
  2.4 Performances of the Approaches
  2.5 Features
  2.6 Processing Stages
  2.7 Discussion
  2.8 Summary

Chapter 3: Syntax
  3.1 Parsing Natural Language
  3.2 Treebanks: A Data-Driven Approach to Syntax
  3.3 Representation of Syntactic Structure
  3.4 Parsing Algorithms
  3.5 Models for Ambiguity Resolution in Parsing
  3.6 Multilingual Issues: What Is a Token?
  3.7 Summary

Chapter 4: Semantic Parsing
  4.1 Introduction
  4.2 Semantic Interpretation
  4.3 System Paradigms
  4.4 Word Sense
  4.5 Predicate-Argument Structure
  4.6 Meaning Representation
  4.7 Summary

Chapter 5: Language Modeling
  5.1 Introduction
  5.2 n-Gram Models
  5.3 Language Model Evaluation
  5.4 Parameter Estimation
  5.5 Language Model Adaptation
  5.6 Types of Language Models
  5.7 Language-Specific Modeling Problems
  5.8 Multilingual and Crosslingual Language Modeling
  5.9 Summary

Chapter 6: Recognizing Textual Entailment
  6.1 Introduction
  6.2 The Recognizing Textual Entailment Task
  6.3 A Framework for Recognizing Textual Entailment
  6.4 Case Studies
  6.5 Taking RTE Further
  6.6 Useful Resources
  6.7 Summary

Chapter 7: Multilingual Sentiment and Subjectivity Analysis
  7.1 Introduction
  7.2 Definitions
  7.3 Sentiment and Subjectivity Analysis on English
  7.4 Word- and Phrase-Level Annotations
  7.5 Sentence-Level Annotations
  7.6 Document-Level Annotations
  7.7 What Works, What Doesn’t
  7.8 Summary

Part II: In Practice

Chapter 8: Entity Detection and Tracking
  8.1 Introduction
  8.2 Mention Detection
  8.3 Coreference Resolution
  8.4 Summary

Chapter 9: Relations and Events
  9.1 Introduction
  9.2 Relations and Events
  9.3 Types of Relations
  9.4 Relation Extraction as Classification
  9.5 Other Approaches to Relation Extraction
  9.6 Events
  9.7 Event Extraction Approaches
  9.8 Moving Beyond the Sentence
  9.9 Event Matching
  9.10 Future Directions for Event Extraction
  9.11 Summary

Chapter 10: Machine Translation
  10.1 Machine Translation Today
  10.2 Machine Translation Evaluation
  10.3 Word Alignment
  10.4 Phrase-Based Models
  10.5 Tree-Based Models
  10.6 Linguistic Challenges
  10.7 Tools and Data Resources
  10.8 Future Directions
  10.9 Summary

Chapter 11: Multilingual Information Retrieval
  11.1 Introduction
  11.2 Document Preprocessing
  11.3 Monolingual Information Retrieval
  11.4 CLIR
  11.5 MLIR
  11.6 Evaluation in Information Retrieval
  11.7 Tools, Software, and Resources
  11.8 Summary

Chapter 12: Multilingual Automatic Summarization
  12.1 Introduction
  12.2 Approaches to Summarization
  12.3 Evaluation
  12.4 How to Build a Summarizer
  12.5 Competitions and Datasets
  12.6 Summary

Chapter 13: Question Answering
  13.1 Introduction and History
  13.2 Architectures
  13.3 Source Acquisition and Preprocessing
  13.4 Question Analysis
  13.5 Search and Candidate Extraction
  13.6 Answer Scoring
  13.7 Crosslingual Question Answering
  13.8 A Case Study
  13.9 Evaluation
  13.10 Current and Future Challenges
  13.11 Summary and Further Reading

Chapter 14: Distillation
  14.1 Introduction
  14.2 An Example
  14.3 Relevance and Redundancy
  14.4 The Rosetta Consortium Distillation System
  14.5 Other Distillation Approaches
  14.6 Evaluation and Metrics
  14.7 Summary

Chapter 15: Spoken Dialog Systems
  15.1 Introduction
  15.2 Spoken Dialog Systems
  15.3 Forms of Dialog
  15.4 Natural Language Call Routing
  15.5 Three Generations of Dialog Applications
  15.6 Continuous Improvement Cycle
  15.7 Transcription and Annotation of Utterances
  15.8 Localization of Spoken Dialog Systems
  15.9 Summary

Chapter 16: Combining Natural Language Processing Engines
  16.1 Introduction
  16.2 Desired Attributes of Architectures for Aggregating Speech and NLP Engines
  16.3 Architectures for Aggregation
  16.4 Case Studies
  16.5 Lessons Learned
  16.6 Summary
  16.7 Sample UIMA Code

Index

Author Information

Daniel M. Bikel is a senior research scientist at Google, developing new methods for NLP and speech recognition. While at IBM, he architected the distillation system for IBM’s GALE multilingual information extraction and question-answering system. While pursuing his doctorate at Penn, he built the first extensible multilingual syntactic parsing engine.

Imed Zitouni is a senior research scientist at IBM. He has led IBM’s Arabic information extraction and data resources efforts since 2004. He previously led both DIALOCA’s Speech/NLP group and Bell Labs/Alcatel-Lucent’s language modeling and call routing activities. His work involves machine translation, NLP, and spoken dialog systems.

Countries Available

All regions