How Netflix Directs 1/3rd of Internet Traffic
Haley Tucker
Mohit Vora
QCon
San Francisco
Nov 16, 2015
Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/netflix-streaming-arch
Presented at QCon San Francisco
www.qconsf.com
How Netflix Directs 1/3rd of Internet Traffic

Playback Overview
DATA PLANE (CDN)
CONTROL PLANE
STREAM
NETFLIX DEVICE
Project 366 #59; 280212 Days Gone By..., CC BY-SA, Pete 2012, Flickr
AUDIO
VIDEO
TEXT
STREAMS
How do we build a streaming “tape”?
Determine the preferred experience
DEVICE
TITLE
CONNECTIONS
COUNTRY
NETWORK
Broadband - wired or wifi
Cellular - Edge, 3G, LTE, ...
CUSTOMER

That’s exactly what I want
...now where can I get it?
Point the device to appropriate locations
Steering
GENERATE PLAYBACK MANIFEST
PLAYBACK MANIFEST
Uh-oh, the content is encrypted!
Keymaster, CC BY-SA, Sean McGrath 2007, Flickr
LICENSE
And...Action!
SESSION (START, STOP, PAUSE, RESUME, KEEPALIVE)
SESSION EVENTS
PLAYBACK LIFECYCLE
GENERATE PLAYBACK MANIFEST
PLAYBACK MANIFEST
LICENSE
SESSION (START, STOP, PAUSE, RESUME, KEEPALIVE)
Data Plane (CDN)
What is a Content Delivery Network?
Open Connect
A NETFLIX ORIGINAL
CONTENT RANK
BYTES STREAMED
PREDICTABLE VIEWING PATTERNS
FILLING WHEN YOU SLEEP
Dreaming…, CC BY-SA, Eleni Boulsaiki 2009, Flickr
READ XOR WRITE
ONEWAY, CC BY-SA, Kenny Louie 2010, Flickr
Content Delivery Mechanisms
DATA PLANE (CDN)
CONTROL PLANE
STREAM
NETFLIX DEVICE
ISP CO-LOCATION
STREAM
ISP DATA CENTER
ISP ROUTER
NETFLIX DEVICE
IXP INTERCONNECTION
STREAM
ISP DATA CENTER
IXP DATA CENTER
NFLX ROUTER
ISP ROUTER
ISP ROUTER
NETFLIX
NETFLIX DEVICE
NETFLIX
Control Plane
OPEN CONNECT
STREAM
NETFLIX DEVICE
CDN CONTROL PLANE
DEVICE CONTROL PLANE
DON’T KEEP SECRETS
Network Proximity
Content Positioning
Load Distribution
Network Proximity
Social Network in a Course, CC BY-SA, Hans Põldoja 2010, Flickr
By Specification?
Doesn’t scale
Border Gateway Protocol
TAKEAWAY Use BGP
ISP1 DATA CENTER
ISP1 BGP ROUTES
ISP1
ISP2 DATA CENTER
ISP2 BGP ROUTES
IXP DATA CENTER
CONTROL PLANE
NFLX
BGP ROUTE
175.231.128.0/24
(+ proximity attributes)
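One way to read the BGP slides: each Open Connect location reports the routes it has learned, and the control plane ranks locations for a client IP by how specific the matching prefix is. The sketch below is only an illustration under that assumption. The route tuples, the location names, and the use of AS-path length as the single "proximity attribute" are hypothetical and not taken from the talk.

```python
import ipaddress


def closest_locations(client_ip, routes):
    """Rank serving locations for a client IP.

    routes: list of (prefix, location, as_path_len) tuples. This shape is
    an assumption for illustration, not the talk's data model.
    """
    addr = ipaddress.ip_address(client_ip)
    matches = []
    for prefix, location, as_path_len in routes:
        network = ipaddress.ip_network(prefix)
        if addr in network:
            matches.append((network.prefixlen, as_path_len, location))
    # Most specific prefix first; among equal prefixes, shortest AS path first.
    matches.sort(key=lambda m: (-m[0], m[1]))
    return [location for _, _, location in matches]


# Hypothetical route table: an ISP-embedded cluster announces the /24,
# an IXP cluster sees the covering /16.
routes = [
    ("175.231.128.0/24", "isp1-dc", 1),
    ("175.231.0.0/16", "ixp-dc", 2),
]
print(closest_locations("175.231.128.7", routes))
```

Because the ranking comes from routes the ISP already announces, no per-ISP specification has to be maintained by hand, which is the scaling point the slides make.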
Content Positioning
LOCALIZE TRAFFIC
ISP DATA CENTER
SERVE CACHE MISS
HOW DO WE DETERMINE WHAT CONTENT WILL BE POPULAR TOMORROW?
CHANGING CATALOG
EVOLVING MEMBER TASTES
MINIMIZE FILL CHURN
ISP DATA CENTER
OFF PEAK FILL
USE HISTORICAL DATA
CONTENT RANK
BYTES STREAMED
bytesStreamed/bytesStored
IS ONE DAY OF HISTORY ENOUGH?
EXPONENTIALLY WEIGHTED MOVING AVERAGE
WEIGHT vs. DAYS AGO (0 to 40), … = 0.9
TAKEAWAY Weigh Recent Data Higher
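The "weigh recent data higher" idea can be sketched as an exponentially weighted moving average over daily popularity scores. This is a minimal sketch, assuming the 0.9 on the slide is a per-day decay factor and that the daily score is the bytesStreamed/bytesStored ratio shown earlier. The function names are illustrative.

```python
def daily_score(bytes_streamed, bytes_stored):
    """Popularity of a file for one day: bytes served per byte stored."""
    return bytes_streamed / bytes_stored


def ewma_popularity(daily_scores, decay=0.9):
    """Weighted average of daily scores, newest first.

    daily_scores[0] is today, daily_scores[d] is d days ago.
    decay=0.9 is an assumed reading of the "= 0.9" on the slide.
    """
    if not daily_scores:
        raise ValueError("need at least one day of history")
    weights = [decay ** days_ago for days_ago in range(len(daily_scores))]
    weighted_sum = sum(w * s for w, s in zip(weights, daily_scores))
    return weighted_sum / sum(weights)


# A title that was hot yesterday outranks one that was hot a week ago.
recent = ewma_popularity([0.0, 10.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
stale = ewma_popularity([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.0])
print(recent > stale)
```

A single day of history would make the rank jump with every spike. The decaying weights smooth that out while still letting new releases climb quickly.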
HOW SHOULD CONTENT BE ALLOCATED?
MILLIONS OF FILES
THOUSANDS OF SERVERS
SVR4
SVR2
SVR1
SVR3
FILE1
FILE3
FILE1
ALLOCATE MULTIPLE REPLICAS
RESILIENT TO CLUSTER CHANGES
REPEATABLE
TAKEAWAY Consistent Hashing
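The three properties on the slide (multiple replicas, resilience to cluster changes, repeatability) map onto a hash ring with virtual nodes. The following is a generic textbook sketch, not Netflix's allocator. The class name, the MD5 hash, and the virtual-node count are assumptions chosen for illustration.

```python
import bisect
import hashlib


def _hash(key):
    """Stable hash, so every process computes the same placement."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, servers, vnodes=100):
        # Each server owns `vnodes` points on the ring to even out load.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]
        self._server_count = len(set(servers))

    def replicas(self, file_id, count=2):
        """Return `count` distinct servers for a file, walking clockwise."""
        count = min(count, self._server_count)
        index = bisect.bisect(self._points, _hash(file_id))
        chosen = []
        while len(chosen) < count:
            server = self._ring[index % len(self._ring)][1]
            if server not in chosen:
                chosen.append(server)
            index += 1
        return chosen


ring = ConsistentHashRing(["svr1", "svr2", "svr3", "svr4"])
print(ring.replicas("file1", count=2))
```

Removing a server only moves the files that had a replica on it. Every other file keeps its placement, which limits fill churn when a cluster changes.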
ISP2 DATA CENTER
WHAT TO FILL?
CONTROL PLANE
IXP DATA CENTER
WHERE TO FILL FROM?
ISP1 DATA CENTER
S3
FILL OVER HTTP
Load Distribution
CONTENT RANK
BYTES STREAMED
LOTS OF THROUGHPUT
LOTS OF STORAGE
CONTENT WITH CONFLICTING CONSTRAINTS
WITHIN CLUSTERS
SSD BASED
SPINNING DISK BASED
ON EACH SERVER
MEMORY
SSD
SPINNING DISK
CONTENT RANK
BYTES STREAMED
TAKEAWAY Tier Infrastructure
BALANCE ACROSS SERVERS WITHIN CLUSTERS
BALANCE ACROSS EQUIDISTANT CLUSTERS
HOW DO WE BALANCE LOAD?
OPEN CONNECT
NETFLIX DEVICE
CDN CONTROL PLANE
DEVICE CONTROL PLANE
LOAD BALANCER
STREAM
HOW DO WE BALANCE LOAD? USING CONTENT DISTRIBUTION
AND WHEN WE HAVE EQUALLY ATTRACTIVE LOCATIONS TO SERVE FROM – FLIP A COIN
INCIDENT LOAD
SYSTEM METRICS
MAX
SANE
INSANE
HOW DO WE LOAD SERVERS OPTIMALLY?
… AMIDST EVER CHANGING INTERNET WEATHER
… AND DAILY TRAFFIC EBBS AND FLOWS
TRAFFIC vs. t
CONTROL
SERVE STREAMS
TRAFFIC EFFECT ON SYSTEM METRICS
FEEDBACK (+ / -)
WE INTRODUCE A FEEDBACK LOOP
TAKEAWAY PID CONTROLLER
Process Variable | Set Point | Control Variable
DC MOTOR: Current RPM | Desired RPM | Input Voltage
LOADING SERVERS: System Metrics | System Metrics Max | Controlled Traffic
ISP2 DATA CENTER
CONTROL TO 80%
CONTROL PLANE
IXP DATA CENTER
NO CONTROL
ISP1 DATA CENTER
0.0 < CONTROL VAR < 1.0
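The feedback loop can be sketched as a small PID controller in which the set point is the maximum allowed system metric, the process variable is the measured metric, and the control variable is the fraction of offered traffic a location accepts, clamped to the 0.0 to 1.0 range from the slide. The gains, the bias of 1.0, and the anti-windup rule below are assumptions for illustration. The talk does not give them.

```python
class PIDController:
    """Minimal PID loop whose output is a traffic fraction in [0.0, 1.0]."""

    def __init__(self, set_point, kp, ki, kd):
        self.set_point = set_point  # e.g. 0.8 for "control to 80%"
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, measured, dt=1.0):
        error = self.set_point - measured
        candidate_integral = self.integral + error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error

        # Bias of 1.0 (assumed): accept all traffic when there is no error.
        raw = (1.0 + self.kp * error
               + self.ki * candidate_integral
               + self.kd * derivative)
        output = min(1.0, max(0.0, raw))
        # Anti-windup: only keep integrating while the output is not clamped.
        if raw == output:
            self.integral = candidate_integral
        return output


controller = PIDController(set_point=0.8, kp=1.0, ki=0.5, kd=0.0)
for utilization in (0.5, 0.9, 0.95, 0.7):
    print(utilization, controller.update(utilization))
```

Whatever share of traffic the controller turns away is what would shift to the next-hop location.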
TRAFFIC
t
NEXT HOP
TRAFFIC SHIFTS TO NEXT HOP LOCATION
Steering
STREAM
NETFLIX DEVICE
CDN CONTROL PLANE
PLAYBACK SERVICES
STEERING
Got URLs for f1, f2, …, fn?
Yes, here are the URLs
PROXIMITY
HEALTH
CONTENT
CASS
KAFKA
OPEN CONNECT
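The steering exchange ("Got URLs for f1, f2, …, fn?") can be pictured as a function that combines the three inputs on the slide: proximity, health, and content. The sketch below is an assumption-heavy illustration. The location dictionaries, the numeric `distance` field, and the URL format are all hypothetical, and the coin flip among equally close locations follows the load-balancing slide.

```python
import random


def steer(file_ids, locations, rng=random):
    """Return an ordered list of URLs per file.

    locations: list of dicts with hypothetical keys
      'host', 'distance' (lower is closer), 'healthy', 'files'.
    """
    urls = {}
    for file_id in file_ids:
        candidates = [
            loc for loc in locations
            if loc["healthy"] and file_id in loc["files"]
        ]
        # Shuffle first, then stable-sort by distance, so that equally
        # attractive locations are ordered by a coin flip.
        rng.shuffle(candidates)
        candidates.sort(key=lambda loc: loc["distance"])
        urls[file_id] = [f"https://{loc['host']}/{file_id}" for loc in candidates]
    return urls


locations = [
    {"host": "isp1.example", "distance": 1, "healthy": True, "files": {"f1"}},
    {"host": "isp2.example", "distance": 1, "healthy": False, "files": {"f1"}},
    {"host": "ixp.example", "distance": 2, "healthy": True, "files": {"f1", "f2"}},
]
print(steer(["f1", "f2"], locations))
```

Returning a ranked list rather than a single URL gives the device a fallback if its first choice fails.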
Architecture Evolution
5 CHALLENGES
API
STEERING
SESSION
MANIFEST
DRM
LICENSE
How did we evolve from here...
API
STEERING
SESSION
MANIFEST
DRM
LICENSE
CLIENT SCRIPTS
SERVICE LAYER
RULES
INSIGHTS
...to here.
5 SOLUTIONS
CACHE
DEVICE
CUSTOMER
TITLE
NETWORK
Broadband - wired or wifi
Cellular - Edge, 3G, LTE, ...
CONNECTIONS
COUNTRY
CHALLENGE High dimensionality
How can we quickly alter the playback experience in a targeted manner?
USE CASE Stream Filtering
ALL STREAMS FOR CONTENT
RULES ENGINE
BEST STREAMS FOR SESSION
EXAMPLE RULES
ENGINE
CONFIGURATION
MANAGEMENT UI
UPDATING RULES
TOPIC
PUBLISH
RULES
SUBSCRIBE
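The stream-filtering use case and the publish/subscribe update path can be sketched together: rules are plain data, the engine narrows "all streams for content" down to "best streams for session", and a newly published rule set replaces the old one without a redeploy. Everything named below (the `Rule` fields, the bitrate cap, `on_rules_published`) is a hypothetical stand-in. The talk does not show the actual rule format.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    bitrate_kbps: int


@dataclass
class Rule:
    when: dict             # session attributes that must all match
    max_bitrate_kbps: int  # streams above this bitrate are filtered out


class RulesEngine:
    def __init__(self, rules=()):
        self._rules = tuple(rules)

    def on_rules_published(self, rules):
        """Subscriber callback: swap in the rule set published to the topic."""
        self._rules = tuple(rules)

    def best_streams(self, all_streams, session):
        streams = list(all_streams)
        for rule in self._rules:
            if all(session.get(k) == v for k, v in rule.when.items()):
                streams = [s for s in streams
                           if s.bitrate_kbps <= rule.max_bitrate_kbps]
        return streams


engine = RulesEngine()
catalog = [Stream(235), Stream(1750), Stream(5800)]
session = {"network": "cellular", "country": "US", "device": "iphone"}

# A rule published from the management UI takes effect on the next request.
engine.on_rules_published([Rule({"network": "cellular", "country": "US"}, 1750)])
print(engine.best_streams(catalog, session))
```

Keying rules on session attributes such as device, network, and country is one way to target a change narrowly across the high-dimensional space the challenge slide describes.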
TAKEAWAY Dynamic Business Rules
API
STEERING
SESSION
MANIFEST
DRM
LICENSE
RULES
CHALLENGE Pinpoint what is broken
Haystacks, CC BY-SA, John Pavelka 2008, Flickr
3:00 AM : Pager goes off
METRICS AND ALERTING
OK...error code 105 is elevated. But why?
Indexed Logging
TAKEAWAY Detailed Domain Insights
API
STEERING
SESSION
MANIFEST
DRM
LICENSE
RULES
INSIGHTS
CHALLENGE Large amount of state
How can we enable faster UIs and low-end devices?
We introduced a server-side caching tier
MANIFESTS
CUSTOMER A
CUSTOMER A
CUSTOMER B
Watch out for resiliency issues!!
Ping Pong project, CC BY-SA, Michael Knowles 2008, Flickr
TAKEAWAY Reduce client state
API
STEERING
SESSION
MANIFEST
DRM
LICENSE
RULES
INSIGHTS
CACHE
CHALLENGE Managing device protocols
Square peg, round hole, CC BY-SA, Simon Law 2006, Flickr
Can we allow devices to define their own protocols?
DYNAMIC SCRIPTING PLATFORM
SESSION
LICENSE
MANIFEST
XBOX
iPHONE
HTML5
PLAYER
iphone.groovy
JAVA SERVICE LAYER
xbox.groovy
html5.groovy
TAKEAWAY Client-driven protocols
API
CLIENT SCRIPTS
SERVICE LAYER
STEERING
SESSION
MANIFEST
DRM
LICENSE
RULES
INSIGHTS
CACHE
CHALLENGE Enabling high-velocity innovation
CC BY-SA, Nathan E Photography 2008, Flickr
How can we expose new data with the least amount of churn?
API MANIFEST
Stream
● Bitrate
● Framerate
● Dynamic Data
Stream’
● Bitrate
● Dynamic Data
This works from API:
● stream.getBitrate()
● stream.getDynamicData().get("FRAME_RATE")
Works both ways!
This works from CLIENT SCRIPT!
● stream.getDynamicData().get("BIT_RATE")
● stream.getDynamicData().get("FRAME_RATE")
CLIENT SCRIPT
Stream’’
● Dynamic Data
Works both ways!
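The pass-thru idea can be sketched as follows: each layer has typed accessors only for the fields it knows about, and everything else rides along in a dynamic data map, so a new field added at the MANIFEST layer reaches the client script without changing the layers in between. The Python class below is a hypothetical stand-in for the Java/Groovy objects on the slide. Only the `BIT_RATE` and `FRAME_RATE` keys come from the talk.

```python
class Stream:
    """A stream as seen by one layer: typed fields plus a dynamic data map."""

    def __init__(self, typed, dynamic=None):
        self.typed = dict(typed)
        self.dynamic = dict(dynamic or {})

    def for_layer(self, typed_names):
        """Project the stream for a layer that only knows `typed_names`.

        Fields the layer does not know are moved into dynamic data
        rather than dropped.
        """
        typed = {k: v for k, v in self.typed.items() if k in typed_names}
        dynamic = dict(self.dynamic)
        dynamic.update(
            {k: v for k, v in self.typed.items() if k not in typed_names}
        )
        return Stream(typed, dynamic)

    def get(self, name):
        """Look up a field wherever this layer happens to hold it."""
        if name in self.typed:
            return self.typed[name]
        return self.dynamic.get(name)


# MANIFEST knows both fields; API only has a typed bitrate;
# the client script reads everything from dynamic data.
manifest_stream = Stream({"BIT_RATE": 3000, "FRAME_RATE": 24})
api_stream = manifest_stream.for_layer({"BIT_RATE"})
script_stream = api_stream.for_layer(set())

print(api_stream.typed, api_stream.dynamic)
print(script_stream.dynamic)
```

The trade-off is weaker typing in the middle layers in exchange for not having to redeploy them whenever a new attribute is exposed.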
TAKEAWAY Data pass-thru
API
CLIENT SCRIPTS
SERVICE LAYER
STEERING
SESSION
MANIFEST
DRM
LICENSE
RULES
INSIGHTS
CACHE
TAKEAWAYS
● BGP based proximity
● Tiered Infrastructure
● PID Controller
● EWMA for historical data
● Consistent Hashing
● Dynamic business rules
● Detailed domain insights
● Reduce client state
● Client-driven protocols
● Data pass-thru
Questions?
Haley Tucker
@hwilson1204
Mohit Vora
@mohitvora
Image Attributions
● Background image from https://www.flickr.com/photos/centralasian/4099515384, Image was cropped and red lines and dots were drawn on top, https://creativecommons.org/licenses/by/2.0/.
● Image from https://www.flickr.com/photos/28705377@N04/4142872268, No modifications made, https://creativecommons.org/licenses/by/2.0/.
● Image of cassette is from https://www.flickr.com/photos/comedynose/6939206771, Image was cropped, https://creativecommons.org/licenses/by/2.0/.
● Image of speaker is from https://www.flickr.com/photos/av_hire_london/5578975575, No changes made, https://creativecommons.org/licenses/by/2.0/.
● Image of television is from https://www.flickr.com/photos/jvcamerica/3660897684/, No changes made, https://creativecommons.org/licenses/by/2.0/.
● Image of text is from https://www.flickr.com/photos/dno1967b/5754743006, No changes made, https://creativecommons.org/licenses/by/2.0/.
● Background image from https://www.flickr.com/photos/mcgraths/866572532, Image was cropped, https://creativecommons.org/licenses/by/2.0/.
● Image from https://www.flickr.com/photos/thatguyfromcchs08/2300190277, Image is dimmed, https://creativecommons.org/licenses/by/2.0/.
● Image from https://www.flickr.com/photos/mknowles/3134373590, Image was cropped, https://creativecommons.org/licenses/by-sa/2.0/.