Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration
Paperback, English, 2010, 1st edition, ISBN 9780470635179
Summary
The ultimate resource on building and deploying data integration solutions with Kettle
Kettle is a scalable and extensible open source ETL and data integration tool that lets you extract data from databases, flat and XML files, web services, ERP systems, and OLAP cubes. It provides over 120 built-in transformation steps to validate, cleanse, and conform data, as well as numerous options to load data into data warehouses and many other targets. Kettle is a comprehensive, low-cost alternative to traditional data integration tools like Informatica PowerCenter, IBM InfoSphere DataStage, and BusinessObjects Data Integrator.
This book explains in detail how to use Kettle to create, test, and deploy your own ETL and data integration solutions. You'll learn to use Kettle's programs to create transformations and jobs, use version control, audit data, and schedule your ETL solution. Then you'll progress to more advanced concepts such as clustering and cloud computing, real-time data integration, loading a Data Vault model, and extending Kettle by building your own plugins. In addition, you'll find hands-on examples and case studies that show exactly how to put Kettle's features into practice.
- Explore the components of the Kettle ETL toolset
- Discover how to install and configure Kettle and connect it to various data sources and targets
- Design and build every aspect of an ETL solution using Kettle
- Learn how to load a data warehouse with Kettle
- Understand the steps for deploying and scheduling ETL solutions
- Gain the skills to integrate Kettle with third-party products
- Learn to extend Kettle and build your own plugins
- Use clustering and cloud computing to scale and improve the performance of your Kettle ETL solutions
- Find out how to use Kettle for real-time data integration
About Jos van Dongen
Table of Contents
Part 1: Getting Started.
1. ETL Primer.
2. Kettle Concepts.
3. Installation and Configuration.
4. An Example ETL Solution—Sakila.
Part 2: ETL.
5. ETL Subsystems.
6. Data Extraction.
7. Cleansing and Conforming.
8. Handling Dimension Tables.
9. Loading Fact Tables.
10. Working with OLAP Data.
Part 3: Management and Deployment.
11. ETL Development Lifecycle.
12. Scheduling and Monitoring.
13. Versioning and Migration.
14. Lineage and Auditing.
Part 4: Performance and Scalability.
15. Performance Tuning.
16. Parallelization, Clustering, and Partitioning.
17. Dynamic Clustering in the Cloud.
18. Real-Time Data Integration.
Part 5: Advanced Topics.
19. Data Vault Management.
20. Handling Complex Data Formats.
21. Web Services.
22. Kettle Integration.
23. Extending Kettle.
Appendix A: The Kettle Ecosystem.
Appendix B: Kettle Enterprise Edition Features.
Appendix C: Built-in Variables and Properties Reference.
Index.