Mining the Social Web
Data Mining Facebook, Twitter, LinkedIn, Instagram, GitHub, and More
Paperback, English, 2019, 3rd edition, ISBN 9781491985045
Summary
Mine the rich data tucked away in popular social websites such as Twitter, Facebook, LinkedIn, and Instagram. With the third edition of this popular guide, data scientists, analysts, and programmers will learn how to glean insights from social media—including who’s connecting with whom, what they’re talking about, and where they’re located—using Python code examples, Jupyter notebooks, or Docker containers.
In part one, each standalone chapter focuses on one aspect of the social landscape, including each of the major social sites, as well as web pages, blogs and feeds, mailboxes, GitHub, and a newly added chapter covering Instagram. Part two provides a cookbook with two dozen bite-size recipes for solving particular issues with Twitter.
- Get a straightforward synopsis of the social web landscape
- Use Docker to easily run each chapter’s example code, packaged as a Jupyter notebook
- Adapt and contribute to the code’s open source GitHub repository
- Learn how to employ best-in-class Python 3 tools to slice and dice the data you collect
- Apply advanced mining techniques such as TF-IDF, cosine similarity, collocation analysis, clique detection, and image recognition
- Build beautiful data visualizations with Python and JavaScript toolkits
Table of Contents
A Note from Matthew Russell
README.1st
Managing Your Expectations
Python-Centric Technology
Improvements to the Third Edition
The Ethical Use of Data Mining
Conventions Used in This Book
Using Code Examples
O’Reilly Safari
How to Contact Us
Acknowledgments for the Third Edition
Acknowledgments for the Second Edition
Acknowledgments from the First Edition
I. A Guided Tour of the Social Web
Prelude
1. Mining Twitter: Exploring Trending Topics, Discovering What People Are Talking About, and More
Overview
Why Is Twitter All the Rage?
Exploring Twitter’s API
Fundamental Twitter Terminology
Creating a Twitter API Connection
Exploring Trending Topics
Searching for Tweets
Analyzing the 140 (or More) Characters
Extracting Tweet Entities
Analyzing Tweets and Tweet Entities with Frequency Analysis
Computing the Lexical Diversity of Tweets
Examining Patterns in Retweets
Visualizing Frequency Data with Histograms
Closing Remarks
Recommended Exercises
Online Resources
2. Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More
Overview
Exploring Facebook’s Graph API
Understanding the Graph API
Understanding the Open Graph Protocol
Analyzing Social Graph Connections
Analyzing Facebook Pages
Manipulating Data Using pandas
Closing Remarks
Recommended Exercises
Online Resources
3. Mining Instagram: Computer Vision, Neural Networks, Object Recognition, and Face Detection
Overview
Exploring the Instagram API
Making Instagram API Requests
Retrieving Your Own Instagram Feed
Retrieving Media by Hashtag
Anatomy of an Instagram Post
Crash Course on Artificial Neural Networks
Training a Neural Network to “Look” at Pictures
Recognizing Handwritten Digits
Object Recognition Within Photos Using Pretrained Neural Networks
Applying Neural Networks to Instagram Posts
Tagging the Contents of an Image
Detecting Faces in Images
Closing Remarks
Recommended Exercises
Online Resources
4. Mining LinkedIn: Faceting Job Titles, Clustering Colleagues, and More
Overview
Exploring the LinkedIn API
Making LinkedIn API Requests
Downloading LinkedIn Connections as a CSV File
Crash Course on Clustering Data
Normalizing Data to Enable Analysis
Measuring Similarity
Clustering Algorithms
Closing Remarks
Recommended Exercises
Online Resources
5. Mining Text Files: Computing Document Similarity, Extracting Collocations, and More
Overview
Text Files
A Whiz-Bang Introduction to TF-IDF
Term Frequency
Inverse Document Frequency
TF-IDF
Querying Human Language Data with TF-IDF
Introducing the Natural Language Toolkit
Applying TF-IDF to Human Language
Finding Similar Documents
Analyzing Bigrams in Human Language
Reflections on Analyzing Human Language Data
Closing Remarks
Recommended Exercises
Online Resources
6. Mining Web Pages: Using Natural Language Processing to Understand Human Language, Summarize Blog Posts, and More
Overview
Scraping, Parsing, and Crawling the Web
Breadth-First Search in Web Crawling
Discovering Semantics by Decoding Syntax
Natural Language Processing Illustrated Step-by-Step
Sentence Detection in Human Language Data
Document Summarization
Entity-Centric Analysis: A Paradigm Shift
Gisting Human Language Data
Quality of Analytics for Processing Human Language Data
Closing Remarks
Recommended Exercises
Online Resources
7. Mining Mailboxes: Analyzing Who’s Talking to Whom About What, How Often, and More
Overview
Obtaining and Processing a Mail Corpus
A Primer on Unix Mailboxes
Getting the Enron Data
Converting a Mail Corpus to a Unix Mailbox
Converting Unix Mailboxes to pandas DataFrames
Analyzing the Enron Corpus
Querying by Date/Time Range
Analyzing Patterns in Sender/Recipient Communications
Searching Emails by Keywords
Analyzing Your Own Mail Data
Accessing Your Gmail with OAuth
Fetching and Parsing Email Messages
Visualizing Patterns in Email with Immersion
Closing Remarks
Recommended Exercises
Online Resources
8. Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
Overview
Exploring GitHub’s API
Creating a GitHub API Connection
Making GitHub API Requests
Modeling Data with Property Graphs
Analyzing GitHub Interest Graphs
Seeding an Interest Graph
Computing Graph Centrality Measures
Extending the Interest Graph with “Follows” Edges for Users
Using Nodes as Pivots for More Efficient Queries
Visualizing Interest Graphs
Closing Remarks
Recommended Exercises
Online Resources
II. Twitter Cookbook
9. Twitter Cookbook
Accessing Twitter’s API for Development Purposes
Problem
Solution
Discussion
Doing the OAuth Dance to Access Twitter’s API for Production Purposes
Problem
Solution
Discussion
Discovering the Trending Topics
Problem
Solution
Discussion
Searching for Tweets
Problem
Solution
Discussion
Constructing Convenient Function Calls
Problem
Solution
Discussion
Saving and Restoring JSON Data with Text Files
Problem
Solution
Discussion
Saving and Accessing JSON Data with MongoDB
Problem
Solution
Discussion
Sampling the Twitter Firehose with the Streaming API
Problem
Solution
Discussion
Collecting Time-Series Data
Problem
Solution
Discussion
Extracting Tweet Entities
Problem
Solution
Discussion
Finding the Most Popular Tweets in a Collection of Tweets
Problem
Solution
Discussion
Finding the Most Popular Tweet Entities in a Collection of Tweets
Problem
Solution
Discussion
Tabulating Frequency Analysis
Problem
Solution
Discussion
Finding Users Who Have Retweeted a Status
Problem
Solution
Discussion
Extracting a Retweet’s Attribution
Problem
Solution
Discussion
Making Robust Twitter Requests
Problem
Solution
Discussion
Resolving User Profile Information
Problem
Solution
Discussion
Extracting Tweet Entities from Arbitrary Text
Problem
Solution
Discussion
Getting All Friends or Followers for a User
Problem
Solution
Discussion
Analyzing a User’s Friends and Followers
Problem
Solution
Discussion
Harvesting a User’s Tweets
Problem
Solution
Discussion
Crawling a Friendship Graph
Problem
Solution
Discussion
Analyzing Tweet Content
Problem
Solution
Discussion
Summarizing Link Targets
Problem
Solution
Discussion
Analyzing a User’s Favorite Tweets
Problem
Solution
Discussion
Closing Remarks
Recommended Exercises
Online Resources
III. Appendixes
A. Information About This Book’s Virtual Machine Experience
B. OAuth Primer
Overview
OAuth 1.0a
OAuth 2.0
C. Python and Jupyter Notebook Tips and Tricks
Index