If you're wondering what 10-bit video is, you probably don't need this algorithm, though some may still find it interesting.

Wikipedia tells us that BT.601 defines Y'C_{B}C_{R} using the following constants:

K_{B} = 0.114

K_{R} = 0.299

It also gives the conversion to 8-bit values, for R'G'B' inputs in the range 0 to 1:

Y' = 16 + (65.481 * R' + 128.553 * G' + 24.996 * B')

C_{B} = 128 + (-37.797 * R' - 74.203 * G' + 112 * B')

C_{R} = 128 + (112.0 * R' - 93.786 * G' - 18.214 * B')

The conversion here goes from RGB to 8-bit Y'C_{B}C_{R}. It is based on floating point and is insufficient for our needs. In a 10-bit video system, a full sample is 30 bits: a 10-bit Y', a 10-bit C_{B} and a 10-bit C_{R} value. In 4:2:2, two pixels side by side share a common C_{B}C_{R} pair, so it is more accurate to say that, once processed and filtered, a 4:2:2 video system uses 40 bits for two pixels. That means at most 20 bits of precision per pixel. Reducing each component to 8 bits cuts that to 16 bits per pixel, which is cheeky.
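For reference, the 8-bit floating-point form can be written directly in C. This is only a sketch of the formula as quoted: the names `ycbcr8_t` and `rgb_to_ycbcr8_float` are made up for illustration, and rounding to nearest by adding 0.5 is my assumption, since the formula itself doesn't say how to round.

```c
#include <stdint.h>

/* Hypothetical container for one 8-bit Y'CbCr sample. */
typedef struct { uint8_t y, cb, cr; } ycbcr8_t;

/* 8-bit BT.601 conversion; r, g, b are R'G'B' values in the range 0.0 to 1.0.
   Every result is positive, so adding 0.5 and truncating rounds to nearest
   (an assumption, the formula does not specify rounding). */
static ycbcr8_t rgb_to_ycbcr8_float(double r, double g, double b)
{
    ycbcr8_t out;
    out.y  = (uint8_t)( 16.0 + ( 65.481 * r + 128.553 * g +  24.996 * b) + 0.5);
    out.cb = (uint8_t)(128.0 + (-37.797 * r -  74.203 * g + 112.000 * b) + 0.5);
    out.cr = (uint8_t)(128.0 + (112.000 * r -  93.786 * g -  18.214 * b) + 0.5);
    return out;
}
```

Black (0, 0, 0) comes out as (16, 128, 128) and pure blue (0, 0, 1) as (41, 240, 110).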

We want to keep as much color data as possible. The formula provided by Wikipedia uses floating point for the calculations, with input values from 0 to 1. Now, don't get me wrong. Floating point is great, but it's slow as heck, hard to vectorize and can't be handled using table lookups. We'll take the formula and do two different things to it:

- Convert it to output 10-bit values
- Convert it to be integer based instead of floating point.

- Start with

y = 16 + (65.481 * r + 128.553 * g + 24.996 * b)

Cb = 128 + (-37.797 * r -74.203 * g + 112.0 * b)

Cr = 128 + (112.0 * r - 93.786 * g - 18.214 * b)

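- Scale the output to 0-1023 instead of 0-255 (multiply the coefficients by 4), and take r, g, b as 8-bit values, dividing by 256 to bring them back to roughly 0-1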

y = 64 + ((261.924 * r + 514.212 * g + 99.984 * b) / 256.0)

Cb = 512 + ((-151.188 * r - 296.812 * g + 448.0 * b) / 256.0)

Cr = 512 + ((448.0 * r - 375.144 * g - 72.856 * b) / 256.0)

- Multiply the coefficients by 1024 to work in integers instead of floating point. The division by 256 and by 1024 then becomes a single right shift by 18 bits

y = 64 + ((268210 * r + 526553 * g + 102383 * b) >> 18)

Cb = (uint32_t)((512 << 18) + ((-154817 * (int32_t)r) - (303935 * (int32_t)g) + (458752 * (int32_t)b)) >> 18);

Cr = (uint32_t)((512 << 18) + ((458752 * (int32_t)r) - (384147 * (int32_t)g) - (74605 * (int32_t)b)) >> 18);
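The integer constants can be reproduced from the original 8-bit coefficients with a small helper. This is a sketch under my own assumptions: `fixed_coeff` is a hypothetical name, and it rounds to nearest, which is a guess at how the constants were produced.

```c
#include <stdint.h>

/* Hypothetical helper: take a coefficient from the 8-bit formula, scale it
   to 10-bit output (x4) and then to fixed point (x1024), rounding to the
   nearest integer (away from zero on ties). */
static int32_t fixed_coeff(double c8)
{
    double scaled = c8 * 4.0 * 1024.0;
    return (int32_t)(scaled < 0.0 ? scaled - 0.5 : scaled + 0.5);
}
```

For example, `fixed_coeff(65.481)` gives 268210 and `fixed_coeff(-37.797)` gives -154817. Rounding to nearest matches every constant above except the blue luma term, where it yields 102384; the 102383 used here corresponds to truncating 102383.616 instead.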

## C code

The end result is found below. It's a chunk of code that performs the full 8-bit per channel RGB input to 10-bit YC_{B}C_{R} conversion. So far, I'm pretty pleased with how it works. I'll correct it if I come across any errors when testing further.

    uint32_t y = 64 + (((268210 * red()) + (526553 * green()) + (102383 * blue())) >> 18);
    uint32_t Cb = (uint32_t)((512 << 18) + ((-154817 * (int32_t)red()) - (303935 * (int32_t)green()) + (458752 * (int32_t)blue())) >> 18);
    uint32_t Cr = (uint32_t)((512 << 18) + ((458752 * (int32_t)red()) - (384147 * (int32_t)green()) - (74605 * (int32_t)blue())) >> 18);
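To try the conversion outside the original class, the same arithmetic can be wrapped in a standalone function. This is a sketch, not the original code: `red()`, `green()` and `blue()` are replaced with plain `uint8_t` parameters, and the names `ycbcr10_t` and `rgb8_to_ycbcr10` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for one 10-bit Y'CbCr sample. */
typedef struct { uint32_t y, cb, cr; } ycbcr10_t;

/* 8-bit-per-channel RGB to 10-bit Y'CbCr using the integer constants above.
   All sums fit in a signed 32-bit int and stay non-negative for 8-bit input,
   so the right shifts never operate on a negative value. */
static ycbcr10_t rgb8_to_ycbcr10(uint8_t r8, uint8_t g8, uint8_t b8)
{
    int32_t r = r8, g = g8, b = b8;
    ycbcr10_t out;
    out.y  = 64 + (uint32_t)((268210 * r + 526553 * g + 102383 * b) >> 18);
    out.cb = (uint32_t)(((512 << 18) - 154817 * r - 303935 * g + 458752 * b) >> 18);
    out.cr = (uint32_t)(((512 << 18) + 458752 * r - 384147 * g -  74605 * b) >> 18);
    return out;
}
```

With these constants, black (0, 0, 0) maps to (64, 512, 512) and white (255, 255, 255) to (936, 512, 512). White lands slightly under the nominal 10-bit level of 940 because the derivation divides the 8-bit input by 256 rather than 255.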

Please notice that the value 512 is scaled up by 2^{18} (shifted left by 18 bits) and added before shifting. This compensates for the fact that the chroma sums it is added to will very often be negative. Performing the addition before the shift means we never have to shift a negative number.
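The effect of the bias can be checked at the worst case for C_{B}, which is r = g = 255 and b = 0. The helper names below are hypothetical and exist only for this check. Keep in mind that in C, right-shifting a negative signed value is implementation-defined, which is another reason to keep the operand non-negative.

```c
#include <stdint.h>

/* Smallest value the biased Cb sum can take for 8-bit input
   (r = g = 255, b = 0). It is still positive. */
static int32_t cb_biased_min(void)
{
    return (512 << 18) - 154817 * 255 - 303935 * 255;
}

/* The same sum without the bias. It is negative, so shifting it right
   would depend on the compiler's behavior. */
static int32_t cb_unbiased_min(void)
{
    return -154817 * 255 - 303935 * 255;
}
```

Here `cb_biased_min()` is 17235968, which shifts down to 65, while `cb_unbiased_min()` is -116981760.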