Genomics in the Azure Cloud

Scaling Your Bioinformatics Workloads Using Enterprise-Grade Solutions

Paperback English 2022 1st edition 9781098139049

Summary

This practical guide bridges the gap between general cloud computing architecture in Microsoft Azure and scientific computing for bioinformatics and genomics. You'll get a solid understanding of the architecture patterns and services that are offered in Azure and how they might be used in your bioinformatics practice. You'll get code examples that you can reuse for your specific needs. And you'll get plenty of concrete examples to illustrate how a given service is used in a bioinformatics context.

You'll also get valuable advice on how to:
- Use enterprise platform services to easily scale your bioinformatics workloads
- Organize, query, and analyze genomic data at scale
- Build a genomics data lake and accompanying data warehouse
- Use Azure Machine Learning to scale your model training, track model performance, and deploy winning models
- Orchestrate and automate processing pipelines using Azure Data Factory and Databricks
- Cloudify your organization's existing bioinformatics pipelines by moving your workflows to Azure high-performance compute services
- And more

Specifications

ISBN13: 9781098139049
Language: English
Binding: paperback
Number of pages: 200
Publisher: O'Reilly
Edition: 1
Publication date: 29-11-2022
Main category: IT management / ICT

Table of Contents

Preface
Who Should Read This Book
How the Book Is Organized
Software and Hardware Requirements
Code Conventions and Downloads
Conventions Used in This Book
Using Code Examples
O'Reilly Online Learning
How to Contact Us
Acknowledgments

1. Essentials of Cloud Architecture
Cloud Horsepower
Considerations for the Cloud
Three Benefits of the Cloud
Types of Cloud Services
Infrastructure Services
Platform Services
Software Services
Azure Environment Organization
Getting an Azure Account
Welcome to the Azure Portal
Setting Up a Resource Group
Creating Resources
Free Services
Basics of the Bioinformatics Workflow
Primary Analysis
Secondary Analysis
Tertiary Analysis
Other Analyses
Other File Formats

2. Organizing Genomics Data with Data Lakes
Organizing Your Genomics Data
Going for Bronze, Silver, and Gold
Letting Your Bioinformatics Workflow Dictate Your Data Lake Organization
Planning for -omics and Non-omics Data Together
Creating a Data Lake with Azure Storage
Blob Storage Versus Data Lake Storage
Balancing Costs Versus Performance in Data Storage
The Goldilocks Method of Storage Tiers
Genomics Data Lifecycle
Managing Access Inside the Lake
Role-Based Access Control
Access-Control Lists
Azure Open Datasets for Genomics

3. Querying Variant Data in SQL
Building a Genomics Data Warehouse
Example: Lab Results
Data Warehouse Architecture for Genomics
Azure Synapse Analytics
Creating an Azure Synapse Analytics Workspace
Registering Services in Subscriptions
Getting to Work in the Synapse Workspace
Using Open Row Sets
Creating External Tables
Did Someone Say “Pool Party”?
Connecting to More Data Sources
Azure SQL DB
Creating a Database in Azure SQL DB
Relaxing at Your Genomics Data Lakehouse
Efficient File Formats

4. Orchestrating Data Movement and Transformation
Creating Your Data Factory
Getting Started with Data Movement
Getting Data into Your Data Lake Using the Copy Data Tool
Linking to NCBI's FTP Server
Transforming Data Using Data Flows
Building and Triggering Pipelines for Automation

5. Azure Databricks (and Apache Spark)
Introduction to Apache Spark and Databricks
Setting Up an Azure Databricks Workspace
Connecting Databricks to Your Data Lake
Processing Variant Data with the Glow Package
Exploring DataFrames
Automating Variant Data Processing
Orchestrating a Databricks Notebook from Data Factory
A Brief Interlude About Distributed File Formats
Using Other Tools in Databricks
Single-Node Bioinformatics Tools
Koalas
Hail

6. Azure Machine Learning
How to Scale Machine Learning Tasks
Creating an Azure Machine Learning Workspace
Training a Drug Sensitivity Model
Creating a Compute Instance in Azure Machine Learning Studio
Datastores and Datasets
Experimenting with Cluster-Based Training
Automating Model Training with AutoML
Explainable Machine Learning
Using Azure Machine Learning Not for Machine Learning
Performing Alignment in a Notebook
Custom Docker Images for Bioinformatics

7. High-Performance Computing and Other Compute Services
Bring Your Own Pipeline (BYOP)
Why Azure for HPC?
Azure Batch
Scaling Workloads with Cromwell
Azure CycleCloud
Setting Up CycleCloud Clusters
Microsoft Genomics
Alignment and Variant Calling with the msgen Package

8. Deployment, Security, Compliance, and Potpourri
Automating the Deployment of Cloud Resources
Dev, Staging, and Prod
Lifting Your Deployment with ARMs and Biceps
Security Planning
Azure Active Directory
Role-Based Access Controls and Access-Control Lists
Compliance
HIPAA, HITECH, and HITRUST
Azure Blueprints
Cost Considerations
Azure Pricing Calculator
Retail Pricing Versus Enterprise Agreements
Budgeting Examples
Quota Problems
Please, Sir, Can I Have Some More (vCPUs)?
Getting General Support
Conclusion
Looking Backward
Baby Azure
What Else?
Using Other Web-Based Bioinformatics Platforms
Looking Forward
Cheaper Sequencing = More Data

Index
About the Author
