

James Pustejovsky holds the TJX/Felberg Chair in Computer Science at Brandeis University, where he directs the Lab for Linguistics and Computation, and chairs both the Program in Language and Linguistics and the Computational Linguistics MA Program.
Natural Language Annotation for Machine Learning
A Guide to Corpus-Building for Applications
Paperback, English, 2012, 1st edition, ISBN 9781449306663
Summary
Create your own natural language training corpus for machine learning. Whether you're working with English, Chinese, or any other natural language, this hands-on book guides you through a proven annotation development cycle: the process of adding metadata to your training corpus to help ML algorithms work more efficiently. You don't need any programming or linguistics experience to get started.
Using detailed examples at every step, you'll learn how the MATTER Annotation Development Process helps you Model, Annotate, Train, Test, Evaluate, and Revise your training corpus. You also get a complete walkthrough of a real-world annotation project.
- Define a clear annotation goal before collecting your dataset (corpus)
- Learn tools for analyzing the linguistic content of your corpus
- Build a model and specification for your annotation project
- Examine the different annotation formats, from basic XML to the Linguistic Annotation Framework
- Create a gold standard corpus that can be used to train and test ML algorithms
- Select the ML algorithms that will process your annotated data
- Evaluate the test results and revise your annotation task
- Learn how to use lightweight software for annotating texts and adjudicating the annotations
About Amber Stubbs
Table of Contents
1. The Basics
-The Importance of Language Annotation
-A Brief History of Corpus Linguistics
-Language Data and Machine Learning
-The Annotation Development Cycle
-Summary
2. Defining Your Goal and Dataset
-Defining Your Goal
-Background Research
-Assembling Your Dataset
-The Size of Your Corpus
-Summary
3. Corpus Analytics
-Basic Probability for Corpus Analytics
-Counting Occurrences
-Language Models
-Summary
4. Building Your Model and Specification
-Some Example Models and Specs
-Adopting (or Not Adopting) Existing Models
-Different Kinds of Standards
-Summary
5. Applying and Adopting Annotation Standards
-Metadata Annotation: Document Classification
-Text Extent Annotation: Named Entities
-Linked Extent Annotation: Semantic Roles
-ISO Standards and You
-Summary
6. Annotation and Adjudication
-The Infrastructure of an Annotation Project
-Specification Versus Guidelines
-Be Prepared to Revise
-Preparing Your Data for Annotation
-Writing the Annotation Guidelines
-Annotators
-Choosing an Annotation Environment
-Evaluating the Annotations
-Creating the Gold Standard (Adjudication)
-Summary
7. Training: Machine Learning
-What Is Learning?
-Defining Our Learning Task
-Classifier Algorithms
-Sequence Induction Algorithms
-Clustering and Unsupervised Learning
-Semi-Supervised Learning
-Matching Annotation to Algorithms
-Summary
8. Testing and Evaluation
-Testing Your Algorithm
-Evaluating Your Algorithm
-Problems That Can Affect Evaluation
-Final Testing Scores
-Summary
9. Revising and Reporting
-Revising Your Project
-Reporting About Your Work
-Summary
10. Annotation: TimeML
-The Goal of TimeML
-Related Research
-Building the Corpus
-Model: Preliminary Specifications
-Annotation: First Attempts
-Model: The TimeML Specification Used in TimeBank
-Annotation: The Creation of TimeBank
-TimeML Becomes ISO-TimeML
-Modeling the Future: Directions for TimeML
-Summary
11. Automatic Annotation: Generating TimeML
-The TARSQI Components
-Improvements to the TTK
-TimeML Challenges: TempEval-2
-Future of the TTK
-Summary
12. Afterword: The Future of Annotation
-Crowdsourcing Annotation
-Handling Big Data
-NLP Online and in the Cloud
-And Finally...
Appendix A: List of Available Corpora and Specifications
Appendix B: List of Software Resources
Appendix C: MAE User Guide
Appendix D: MAI User Guide
Appendix E: Bibliography
Index