Standardized Origin Pull Strategy for VOD CDN

1. Background & Problem:


  • Background:

      • Historically, the company has integrated many vendors for VOD CDN.

      • Origin-pull methods differ slightly from vendor to vendor.

      • Dedicated line capacity varies from vendor to vendor.

      • Vendors are positioned differently: some are full-mirror storage vendors that permanently store copies of origin resources, while others are mirror storage vendors of other kinds.

  • Problem:

      • When new traffic is scheduled or switched, or when a dedicated line recovers from an abnormal condition, the load on the dedicated line and the UPOS origin server is high.

      • Vendor origin-pull strategies are black boxes, so problems are hard to locate when they occur and are handled inefficiently.


2. Strategy Changes and Evolution of VOD Origin-Pull Architecture


1. Specific Cases and Reflections on VOD Origin-Pull Failures


During a past outage involving multi-vendor dedicated lines, sudden bursts of dedicated line bandwidth occurred both during the outage and when the line was being restored.


Our post-incident analysis showed that the bandwidth surge came from the combined effect of different types of origin-pull traffic on the vendors' dedicated lines. Because we lacked fine-grained dedicated line monitoring at the time of the failure, we did not switch CDN traffic for fear of triggering a traffic avalanche, so both locating the problem and stopping the loss were slow.


2. Origin-Pull Architecture Analysis


To solve the above problems, we set out to improve the origin-pull stability of CDN vendors, ensure origin server security, enable rapid problem localization and timely loss stopping when issues occur, and reduce CDN quality degradation caused by origin-pull problems. During initial research, we investigated the origin-pull strategies of all connected commercial CDNs and found two common origin-pull scenarios:

  • The first scenario is that after a user requests a video segment, the CDN waits for the entire file to finish pulling from origin before returning the requested segment, which tends to increase the user's playback delay.

  • The second scenario is that when pulling a segment from origin, the entire file that the segment belongs to is pulled synchronously, which greatly amplifies origin-pull bandwidth and puts pressure on the origin server.
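To make the cost of each scenario concrete, the following back-of-the-envelope sketch compares waiting for a whole file against waiting for one segment, and computes how much origin-pull traffic is generated when a whole file is pulled for a single segment request. The file size, segment size, and origin throughput are hypothetical values chosen for illustration; they are not measurements from this article.

```python
def wait_seconds(nbytes, origin_bytes_per_sec):
    """Time to pull nbytes from origin at a steady throughput."""
    return nbytes / origin_bytes_per_sec


def amplification(file_bytes, segment_bytes):
    """Origin bytes pulled per byte the user actually asked for."""
    return file_bytes / segment_bytes


MIB = 1024 * 1024
FILE_BYTES = 600 * MIB      # hypothetical whole video file
SEGMENT_BYTES = 2 * MIB     # hypothetical requested segment
ORIGIN_RATE = 50 * MIB      # hypothetical origin throughput, bytes per second

# Scenario 1: the user waits for the whole file instead of one segment.
print(wait_seconds(FILE_BYTES, ORIGIN_RATE))     # 12.0
print(wait_seconds(SEGMENT_BYTES, ORIGIN_RATE))  # 0.04

# Scenario 2: one segment request triggers a whole-file pull.
print(amplification(FILE_BYTES, SEGMENT_BYTES))  # 300.0
```

With these assumed numbers, the first scenario turns a wait of a few hundredths of a second into twelve seconds, and the second multiplies origin-pull traffic by 300 for that request.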


3. Origin-Pull Optimization Plan


After analyzing the origin-pull architecture of vendors, we proposed a standardized origin-pull solution based on the characteristics of the company's VOD business. The main logic is as follows:

  • First, prioritize responses to synchronous segment origin-pull to reduce user playback delay.

  • Second, judge whether a video file is hot or cold based on statistics of the frequency of segment origin-pull requests.

  • When a file is determined to be a hot file, the entire file is pulled synchronously to reduce origin-pull bandwidth when segments of this hot resource are requested again.

  • When a file is determined to be a cold file, it is added to the entire-file origin-pull queue and downloaded to storage during off-peak traffic hours, avoiding putting pressure on the origin server during peak traffic periods.

  • Then, flow control and rate limiting are implemented for the entire-file download logic to avoid excessive amplification of origin-pull bandwidth.

  • Finally, in extreme cases where the dedicated line is interrupted, asynchronous entire-file origin-pull and active distribution pulling are stopped to ensure that users' synchronous segment origin-pull can go through the public network normally, reducing the impact on the business.
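The logic above can be sketched roughly as follows. This is a minimal illustration, not the production implementation: the class names, action labels, threshold, and statistics window are all hypothetical, and the token bucket is simply one common way to rate limit, since the article does not say which mechanism is used.

```python
from collections import defaultdict, deque

HOT_THRESHOLD = 3       # hypothetical: segment pulls within the window that make a file "hot"
WINDOW_SECONDS = 60.0   # hypothetical statistics window


class TokenBucket:
    """Simple token bucket used to cap whole-file origin-pull bandwidth."""

    def __init__(self, rate_bytes_per_sec, capacity_bytes, now=0.0):
        self.rate = rate_bytes_per_sec
        self.capacity = capacity_bytes
        self.tokens = capacity_bytes
        self.last = now

    def try_consume(self, nbytes, now):
        # Refill according to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


class OriginPullPolicy:
    def __init__(self, bucket):
        self.bucket = bucket
        self.pull_times = defaultdict(deque)  # file_id -> recent segment pull timestamps
        self.offpeak_queue = deque()          # cold files awaiting an off-peak whole-file pull
        self.dedicated_line_up = True

    def on_segment_miss(self, file_id, file_bytes, now):
        """Return the actions to take for one segment cache miss."""
        actions = ["pull_segment"]            # the user's segment is always served first
        if not self.dedicated_line_up:
            return actions                    # line down: no whole-file pulls at all

        times = self.pull_times[file_id]
        times.append(now)
        while now - times[0] > WINDOW_SECONDS:
            times.popleft()                   # drop pulls that fell out of the window

        if len(times) >= HOT_THRESHOLD:       # hot file: pull the whole file right away
            if self.bucket.try_consume(file_bytes, now):
                if file_id in self.offpeak_queue:
                    self.offpeak_queue.remove(file_id)
                actions.append("pull_whole_file_now")
            else:
                actions.append("whole_file_rate_limited")
        elif file_id not in self.offpeak_queue:  # cold file: defer to off-peak hours
            self.offpeak_queue.append(file_id)
            actions.append("queue_whole_file_offpeak")
        return actions
```

In this sketch the first miss on a file queues it for off-peak download, the third miss within the window promotes it to an immediate whole-file pull if the rate limiter allows, and setting `dedicated_line_up = False` reduces every miss to a plain segment pull. A real limiter would more likely meter bytes as they stream rather than charge a whole file up front.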


4. Monitoring Dependency Optimization


We designed real-time monitoring data that covers each vendor's VOD origin-pull strategy and breaks origin-pull bandwidth down by origin-pull type.


The benefits of this approach are:

  • First, it enables rapid localization of abnormal origin-pull types when origin-pull bandwidth becomes abnormal.

  • Second, vendor origin-pull strategies are no longer black boxes; we can adjust and control vendors' origin-pull bandwidth based on their origin-pull strategies, avoiding pressure on the origin server.

  • Finally, it enables rapid confirmation of CDN quality issues caused by origin-pull problems.
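As an illustration of what classified monitoring makes possible, the sketch below aggregates origin-pull bytes per vendor and per origin-pull type, then flags the types whose bandwidth exceeds a multiple of their baseline. The sample format, type labels, vendor name, and the factor of 2 are assumptions made for this example; the article does not describe the actual data model or alerting rules.

```python
from collections import defaultdict


def bandwidth_by_type(samples, interval_seconds):
    """Aggregate (vendor, pull_type, nbytes) samples from one interval.

    Returns {(vendor, pull_type): bits per second}.
    """
    totals = defaultdict(int)
    for vendor, pull_type, nbytes in samples:
        totals[(vendor, pull_type)] += nbytes
    return {key: total * 8 / interval_seconds for key, total in totals.items()}


def abnormal_types(current, baseline, factor=2.0):
    """Return the (vendor, pull_type) keys whose bandwidth exceeds factor x baseline.

    A key with no baseline is treated as having a baseline of 0.
    """
    return sorted(
        key for key, bps in current.items()
        if bps > factor * baseline.get(key, 0.0)
    )


# Hypothetical samples collected over an 8-second interval.
samples = [
    ("vendor_a", "segment_sync", 1000),
    ("vendor_a", "whole_file_hot", 9000),
    ("vendor_a", "segment_sync", 1000),
]
current = bandwidth_by_type(samples, interval_seconds=8)
baseline = {
    ("vendor_a", "segment_sync"): 1500.0,
    ("vendor_a", "whole_file_hot"): 3000.0,
}
print(abnormal_types(current, baseline))  # [('vendor_a', 'whole_file_hot')]
```

Because the totals are keyed by type, a surge can be attributed to whole-file pulls rather than to user-driven segment pulls, which is the distinction that was missing during the outage described earlier.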


5. Commercial CDN Storage Bucketing Scheme


While we were implementing the standardized VOD origin-pull scheme, the architecture also evolved to decouple VOD CDN from storage. From the perspective of CDN resource operation, this decoupling brings two benefits:

  • First, it maximizes our ability to switch between vendors during CDN outages, within the capacity limits of dedicated lines and gateways.

  • Second, while keeping business costs optimized, it improves the efficiency of traffic switching and shortens the time it takes to realize the benefits.

At the same time, it also brings a more complex environment on the origin-pull side.

Therefore, we made improvements on the origin-pull side:

  • First, we adapted the bucketing scheme for full mirror vendors, split different origin-pull types, implemented different priority and strategy guarantees for each type, and made it compatible with the implementation of standardized CDN origin-pull.

  • Second, after decoupling CDN and storage, we calculated resource requirements, including the dedicated line capacity, temporary storage size, and gateway requirements on both sides corresponding to per-unit CDN access bandwidth.

  • Finally, we worked with the procurement and network teams to promote corresponding resource expansion for CDN vendors and storage business parties.

The optimized architectural scheme is as follows:


6. Origin-Pull Security Optimization


To ensure origin server security, we introduced OSIG authentication into the standardized origin-pull scheme.

Both the CDN and the origin server's public network endpoint use the UPSIG authentication scheme. If the origin server's public address is leaked, users can therefore access resources directly by swapping in the origin domain name, which exposes the origin server to uncontrollable risk.

For this reason we introduced the OSIG authentication scheme: an additional OSIG signature is added when a vendor CDN pulls from origin. This isolates the CDN's authentication logic from the origin server's and protects the origin server.
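The idea of a separate origin-side signature can be sketched as follows. The article does not describe the real UPSIG or OSIG formats, parameters, or key management, so everything here is hypothetical: the function names, the signed fields (path and expiry), the use of HMAC-SHA256, and the assumption that the origin checks both signatures.

```python
import hashlib
import hmac


def sign(secret: bytes, path: str, expires: int) -> str:
    """Hypothetical signature: HMAC-SHA256 over the path and an expiry timestamp."""
    message = f"{path}\n{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, path: str, expires: int, signature: str, now: int) -> bool:
    if now > expires:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sign(secret, path, expires), signature)


def origin_accepts(path, expires, upsig, osig, now, upsig_key, osig_key):
    """Assumed origin-side check: the user-facing signature alone is not enough.

    The request must also carry a signature made with a key that only the
    CDN vendor and the origin share.
    """
    return (verify(upsig_key, path, expires, upsig, now)
            and verify(osig_key, path, expires, osig, now))


UPSIG_KEY = b"user-facing-key"   # hypothetical key
OSIG_KEY = b"origin-pull-key"    # hypothetical key shared with CDN vendors only

upsig = sign(UPSIG_KEY, "/video/1.m4s", 100)
osig = sign(OSIG_KEY, "/video/1.m4s", 100)

# A vendor CDN pulling from origin carries both signatures.
print(origin_accepts("/video/1.m4s", 100, upsig, osig, 50, UPSIG_KEY, OSIG_KEY))   # True
# A user who learned the origin address has only the user-facing signature.
print(origin_accepts("/video/1.m4s", 100, upsig, "", 50, UPSIG_KEY, OSIG_KEY))     # False
```

Under these assumptions, a leaked origin address is no longer enough to fetch resources, because a URL that is valid at the CDN lacks the second signature.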


3. Summary and Outlook:


Standardizing origin-pull took more than half a year. We ran into various problems along the way and the scheme evolved gradually, but the initial rollout has succeeded and is moving in the expected direction. Most of the functions are now complete. Next, we will continue to optimize resources from the perspective of resource operation to make the VOD CDN business more stable and reliable.


-End-

Author | Zhentao



This is a discussion topic separated from the original topic at https://www.bilibili.com/read/cv35992195/