Daniel Vaughan

Data Science – The Hard Parts

Name: Data Science – The Hard Parts
Author: Daniel Vaughan

Techniques for Excelling at Data Science

Paperback, English, 2023, 1st edition, 9781098146474

Summary

This practical guide provides a collection of techniques and best practices that are generally overlooked in most data engineering and data science pedagogy. A common misconception is that great data scientists are experts in the "big themes" of the discipline, namely machine learning and programming. But most of the time, these tools can only take us so far. In practice, it is the smaller tools and skills that really separate a great data scientist from a not-so-great one.

Taken as a whole, the lessons in this book make the difference between an average data scientist candidate and a qualified data scientist working in the field. Author Daniel Vaughan has collected, extended, and used these skills to create value and train data scientists from different companies and industries.

With this book, you will:
- Understand how data science creates value
- Deliver compelling narratives to sell your data science project
- Build a business case using unit economics principles
- Create new features for an ML model using storytelling
- Learn how to decompose KPIs
- Perform growth decompositions to find root causes for changes in a metric

Daniel Vaughan is head of data at Clip, the leading paytech company in Mexico. He's th…

Specifications

ISBN13: 9781098146474

Keywords: data science

Language: English

Binding: paperback

Number of pages: 250

Publisher: O'Reilly

Edition: 1

Publication date: 17 November 2023

Main category: IT management / ICT

About Daniel Vaughan

Daniel Vaughan has worked with enterprise web applications for over 12 years. He is currently a software architect for a UK financial institution. An experienced Java developer, Daniel first started working with Google Web Toolkit soon after it was released in 2006 and loved the power and simplicity it brought to web application development. When Ext GWT came along, he was an early adopter and has used it as part of several large projects. Daniel currently splits his time between the beautiful tranquility of the Cotswolds, England, and the fast-moving city-state of Singapore. He enjoys travel, scuba diving, and learning new ideas.

Table of Contents

Preface
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments

Part I. Data Analytics Techniques
1. So What? Creating Value with Data Science
What Is Value?
What: Understanding the Business
So What: The Gist of Value Creation in DS
Now What: Be a Go-Getter
Measuring Value
Key Takeaways
Further Reading

2. Metrics Design
Desirable Properties That Metrics Should Have
Measurable
Actionable
Relevance
Timeliness
Metrics Decomposition
Funnel Analytics
Stock-Flow Decompositions
P×Q-Type Decompositions
Example: Another Revenue Decomposition
Example: Marketplaces
Key Takeaways
Further Reading

3. Growth Decompositions: Understanding Tailwinds and Headwinds
Why Growth Decompositions?
Additive Decomposition
Example
Interpretation and Use Cases
Multiplicative Decomposition
Example
Interpretation
Mix-Rate Decompositions
Example
Interpretation
Mathematical Derivations
Additive Decomposition
Multiplicative Decomposition
Mix-Rate Decomposition
Key Takeaways
Further Reading

4. 2×2 Designs
The Case for Simplification
What’s a 2×2 Design?
Example: Test a Model and a New Feature
Example: Understanding User Behavior
Example: Credit Origination and Acceptance
Example: Prioritizing Your Workflow
Key Takeaways
Further Reading

5. Building Business Cases
Some Principles to Construct Business Cases
Example: Proactive Retention Strategy
Fraud Prevention
Purchasing External Datasets
Working on a Data Science Project
Key Takeaways
Further Reading

6. What’s in a Lift?
Lifts Defined
Example: Classifier Model
Self-Selection and Survivorship Biases
Other Use Cases for Lifts
Key Takeaways
Further Reading

7. Narratives
What’s in a Narrative: Telling a Story with Your Data
Clear and to the Point
Credible
Memorable
Actionable
Building a Narrative
Science as Storytelling
What, So What, and Now What?
The Last Mile
Writing TL;DRs
Tips to Write Memorable TL;DRs
Example: Writing a TL;DR for This Chapter
Delivering Powerful Elevator Pitches
Presenting Your Narrative
Key Takeaways
Further Reading

8. Datavis: Choosing the Right Plot to Deliver a Message
Some Useful and Not-So-Used Data Visualizations
Bar Versus Line Plots
Slopegraphs
Waterfall Charts
Scatterplot Smoothers
Plotting Distributions
General Recommendations
Find the Right Datavis for Your Message
Choose Your Colors Wisely
Different Dimensions in a Plot
Aim for a Large Enough Data-Ink Ratio
Customization Versus Semiautomation
Get the Font Size Right from the Beginning
Interactive or Not
Stay Simple
Start by Explaining the Plot
Key Takeaways
Further Reading

Part II. Machine Learning
9. Simulation and Bootstrapping
Basics of Simulation
Simulating a Linear Model and Linear Regression
What Are Partial Dependence Plots?
Omitted Variable Bias
Simulating Classification Problems
Latent Variable Models
Comparing Different Algorithms
Bootstrapping
Key Takeaways
Further Reading

10. Linear Regression: Going Back to Basics
What’s in a Coefficient?
The Frisch-Waugh-Lovell Theorem
Why Should You Care About FWL?
Confounders
Additional Variables
The Central Role of Variance in ML
Key Takeaways
Further Reading

11. Data Leakage
What Is Data Leakage?
Outcome Is Also a Feature
A Function of the Outcome Is Itself a Feature
Bad Controls
Mislabeling of a Timestamp
Multiple Datasets with Sloppy Time Aggregations
Leakage of Other Information
Detecting Data Leakage
Complete Separation
Windowing Methodology
Choosing the Length of the Windows
The Training Stage Mirrors the Scoring Stage
Implementing the Windowing Methodology
I Have Leakage: Now What?
Key Takeaways
Further Reading

12. Productionizing Models
What Does “Production Ready” Mean?
Batch Scores (Offline)
Real-Time Model Objects
Data and Model Drift
Essential Steps in any Production Pipeline
Get and Transform Data
Validate Data
Training and Scoring Stages
Validate Model and Scores
Deploy Model and Scores
Key Takeaways
Further Reading

13. Storytelling in Machine Learning
A Holistic View of Storytelling in ML
Ex Ante and Interim Storytelling
Creating Hypotheses
Feature Engineering
Ex Post Storytelling: Opening the Black Box
Interpretability-Performance Trade-Off
Linear Regression: Setting a Benchmark
Feature Importance
Heatmaps
Partial Dependence Plots
Accumulated Local Effects
Key Takeaways
Further Reading

14. From Prediction to Decisions
Dissecting Decision Making
Simple Decision Rules by Smart Thresholding
Precision and Recall
Example: Lead Generation
Confusion Matrix Optimization
Key Takeaways
Further Reading

15. Incrementality: The Holy Grail of Data Science?
Defining Incrementality
Causal Reasoning to Improve Prediction
Causal Reasoning as a Differentiator
Improved Decision Making
Confounders and Colliders
Selection Bias
Unconfoundedness Assumption
Breaking Selection Bias: Randomization
Matching
Machine Learning and Causal Inference
Open Source Codebases
Double Machine Learning
Key Takeaways
Further Reading

16. A/B Tests
What Is an A/B Test?
Decision Criterion
Minimum Detectable Effects
Choosing the Statistical Power, Level, and P
Estimating the Variance of the Outcome
Simulations
Example: Conversion Rates
Setting the MDE
Hypotheses Backlog
Metric
Hypothesis
Ranking
Governance of Experiments
Key Takeaways
Further Reading

17. Large Language Models and the Practice of Data Science
The Current State of AI
What Do Data Scientists Do?
Evolving the Data Scientist’s Job Description
Case Study: A/B Testing
Case Study: Data Cleansing
Case Study: Machine Learning
LLMs and This Book
Key Takeaways
Further Reading

Index
About the Author