How to use VideoToolbox to decompress an H.264 video stream

I had a lot of trouble figuring out how to use Apple's hardware-accelerated video framework to decompress an H.264 video stream. After a few weeks I figured it out and wanted to share an extensive example, since I couldn't find one.

My goal is to give a thorough, instructive example of Video Toolbox as introduced in WWDC '14 session 513. My code will not compile or run as-is, because it needs to be integrated with an elementary H.264 stream (such as a video read from a file or streamed from online) and needs to be tweaked depending on the specific case.

I should mention that I have very little experience with video encoding/decoding beyond what I learned while googling this subject. I don't know all the details about video formats, parameter structures, etc., so I've only included what I think you need to know.

I am using Xcode 6.2 and have deployed to iOS devices running iOS 8.1 and 8.2.

Concepts:

NALUs: A NALU is simply a chunk of data of varying length that begins with a start code header 0x00 00 00 01 YY, where the first 5 bits of YY tell you what type of NALU it is and therefore what type of data follows the header. (Since you only need the first 5 bits, I use YY & 0x1F to get just the relevant bits.) I list all the types in the NSString * const naluTypesStrings[] array below, but you don't need to know what they all are.
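For example, with a buffer laid out the way the code below expects (a 4-byte start code at offset 0, then the NALU header byte), pulling out the type looks like this. This fragment is just an illustration; frame and naluTypesStrings are the same names used in the full code example further down:

    // Illustration only: read the NALU type from the byte that follows a
    // 4-byte Annex B start code (0x00 00 00 01 YY ...).
    int startCodeIndex = 0;                              // start code begins at offset 0 here
    int nalu_type = frame[startCodeIndex + 4] & 0x1F;    // low 5 bits of YY are the NALU type
    NSLog(@"NALU type %d: %@", nalu_type, naluTypesStrings[nalu_type]);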

Parameters: Your decoder needs parameters so it knows how the H.264 video data is stored. The two you need to set are the Sequence Parameter Set (SPS) and the Picture Parameter Set (PPS), and they each have their own NALU type number. You don't need to know what the parameters mean; the decoder knows what to do with them.

H.264 stream format: In most H.264 streams, you will receive an initial set of PPS and SPS parameters followed by an i frame (also called an IDR frame or flush frame) NALU. Then you will receive several P frame NALUs (maybe a few dozen or so), then another set of parameters (which may be the same as the initial parameters) and an i frame, more P frames, and so on. i frames are much bigger than P frames. Conceptually you can think of the i frame as a complete image of the video, and the P frames as just the changes made to that i frame, until you receive the next i frame.

Procedure:

  1. Generate individual NALUs from your H.264 stream. I cannot show code for this step since it depends a lot on what video source you're using. I made this graphic to show what I was working with ("data" in the graphic is "frame" in my code), but your case may and probably will differ. My method receivedRawVideoFrame: is called every time I receive a frame (uint8_t *frame), which was one of two types. In the diagram, those two frame types are the two big purple boxes.

  2. Create a CMVideoFormatDescriptionRef from your SPS and PPS NALUs with CMVideoFormatDescriptionCreateFromH264ParameterSets(). You cannot display any frames without doing this first. The SPS and PPS may look like a jumble of numbers, but VTD knows what to do with them. All you need to know is that CMVideoFormatDescriptionRef is a description of the video data, such as width/height, format type (kCMPixelFormat_32BGRA, kCMVideoCodecType_H264, etc.), aspect ratio, color space, and so on. The decoder holds onto the parameters until a new set arrives (sometimes the parameters are resent regularly even when they haven't changed).

  3. Re-package your IDR and non-IDR frame NALUs according to the "AVCC" format. This means removing the NALU start codes and replacing them with a 4-byte header that states the length of the NALU. You don't need to do this for the SPS and PPS NALUs. (Note that the 4-byte NALU length header is big-endian, so if you have a UInt32 value it must be byte-swapped before copying it into the CMBlockBuffer using CFSwapInt32; in my code this is done with the htonl function call.)

  4. Package the IDR and non-IDR NALU frames into a CMBlockBuffer. Do NOT do this with the SPS and PPS parameter NALUs. All you need to know about CMBlockBuffers is that they are a way to wrap arbitrary blocks of data in Core Media. (Any compressed video data in a video pipeline is wrapped in one of these.)

  5. Package the CMBlockBuffer into a CMSampleBuffer. All you need to know about CMSampleBuffers is that they wrap our CMBlockBuffers together with other information (here that would be the CMVideoFormatDescription and CMTime, if CMTime is used).

  6. Create a VTDecompressionSessionRef and feed the sample buffer into VTDecompressionSessionDecodeFrame(). Alternatively, you can use AVSampleBufferDisplayLayer and its enqueueSampleBuffer: method and skip the VTDecompSession entirely. It's simpler to set up, but it won't throw errors the way VTD does if something goes wrong.

  7. In the VTDecompSession callback, use the resulting CVImageBufferRef to display the video frame. If you need to convert your CVImageBuffer to a UIImage, see my StackOverflow answer; a rough sketch of one way to do it is also included right after this list.
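Here is a minimal sketch (my addition, not part of the original code) of one way to do that CVImageBuffer-to-UIImage conversion with Core Image. The helper name UIImageFromImageBuffer is hypothetical, and it assumes the image buffer delivered to the callback is a CVPixelBufferRef (which it is on iOS):

    // Sketch: convert a decoded CVImageBufferRef into a UIImage via Core Image.
    // Creating a CIContext per frame is wasteful; cache it if you do this for real.
    static UIImage * UIImageFromImageBuffer(CVImageBufferRef imageBuffer)
    {
        CIImage *ciImage = [CIImage imageWithCVPixelBuffer:(CVPixelBufferRef)imageBuffer];
        CIContext *context = [CIContext contextWithOptions:nil];
        CGRect rect = CGRectMake(0, 0,
                                 CVPixelBufferGetWidth((CVPixelBufferRef)imageBuffer),
                                 CVPixelBufferGetHeight((CVPixelBufferRef)imageBuffer));
        CGImageRef cgImage = [context createCGImage:ciImage fromRect:rect];
        UIImage *image = [UIImage imageWithCGImage:cgImage];
        CGImageRelease(cgImage);
        return image;
    }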

Other notes:

  • H.264 streams can vary a lot. From what I learned, NALU start code headers are sometimes 3 bytes (0x00 00 01) and sometimes 4 (0x00 00 00 01). My code works with 4 bytes; if yours uses 3, you will need to change a few things around. (A small sketch after these notes shows one way to tell which length you have.)

  • If you want to know more about NALUs, I found this answer very helpful. In my case, I found that I didn't need to ignore the "emulation prevention" bytes it describes, so I personally skipped that step, but you may need to know about it.

  • If your VTDecompressionSession outputs an error number (like -12909), look up the error code in your Xcode project. Find the VideoToolbox framework in the project navigator, open it, and find the header VTErrors.h. If you can't find it, I've also included all the error codes below in another answer.
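Regarding the 3-byte vs. 4-byte start codes in the first note above, here is a small sketch (my own helper, not part of the original code) of one way to tell which one a NALU begins with before deciding how many header bytes to strip or replace:

    // Hypothetical helper: returns the length of the start code at buf[offset]
    // (4 for 0x00 00 00 01, 3 for 0x00 00 01, 0 if there is no start code there).
    static int startCodeLength(const uint8_t *buf, size_t size, size_t offset)
    {
        if (offset + 4 <= size &&
            buf[offset] == 0x00 && buf[offset+1] == 0x00 &&
            buf[offset+2] == 0x00 && buf[offset+3] == 0x01)
        {
            return 4;
        }
        if (offset + 3 <= size &&
            buf[offset] == 0x00 && buf[offset+1] == 0x00 && buf[offset+2] == 0x01)
        {
            return 3;
        }
        return 0;
    }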

Code example:

So let's start by declaring some global variables and including the VT framework (VT = Video Toolbox).

    #import <VideoToolbox/VideoToolbox.h>
    #import <AVFoundation/AVFoundation.h>   // needed for AVSampleBufferDisplayLayer

    @property (nonatomic, assign) CMVideoFormatDescriptionRef formatDesc;
    @property (nonatomic, assign) VTDecompressionSessionRef decompressionSession;
    @property (nonatomic, retain) AVSampleBufferDisplayLayer *videoLayer;
    @property (nonatomic, assign) int spsSize;
    @property (nonatomic, assign) int ppsSize;

The following array is only used so that you can print out what type of NALU frame you're receiving. If you know what all these types mean, good for you, you know more about H.264 than I do :) My code only handles types 1, 5, 7 and 8.

    NSString * const naluTypesStrings[] =
    {
        @"0: Unspecified (non-VCL)",
        @"1: Coded slice of a non-IDR picture (VCL)",    // P frame
        @"2: Coded slice data partition A (VCL)",
        @"3: Coded slice data partition B (VCL)",
        @"4: Coded slice data partition C (VCL)",
        @"5: Coded slice of an IDR picture (VCL)",       // I frame
        @"6: Supplemental enhancement information (SEI) (non-VCL)",
        @"7: Sequence parameter set (non-VCL)",           // SPS parameter
        @"8: Picture parameter set (non-VCL)",            // PPS parameter
        @"9: Access unit delimiter (non-VCL)",
        @"10: End of sequence (non-VCL)",
        @"11: End of stream (non-VCL)",
        @"12: Filler data (non-VCL)",
        @"13: Sequence parameter set extension (non-VCL)",
        @"14: Prefix NAL unit (non-VCL)",
        @"15: Subset sequence parameter set (non-VCL)",
        @"16: Reserved (non-VCL)",
        @"17: Reserved (non-VCL)",
        @"18: Reserved (non-VCL)",
        @"19: Coded slice of an auxiliary coded picture without partitioning (non-VCL)",
        @"20: Coded slice extension (non-VCL)",
        @"21: Coded slice extension for depth view components (non-VCL)",
        @"22: Reserved (non-VCL)",
        @"23: Reserved (non-VCL)",
        @"24: STAP-A Single-time aggregation packet (non-VCL)",
        @"25: STAP-B Single-time aggregation packet (non-VCL)",
        @"26: MTAP16 Multi-time aggregation packet (non-VCL)",
        @"27: MTAP24 Multi-time aggregation packet (non-VCL)",
        @"28: FU-A Fragmentation unit (non-VCL)",
        @"29: FU-B Fragmentation unit (non-VCL)",
        @"30: Unspecified (non-VCL)",
        @"31: Unspecified (non-VCL)",
    };

Now this is where all the magic happens.

    -(void) receivedRawVideoFrame:(uint8_t *)frame withSize:(uint32_t)frameSize isIFrame:(int)isIFrame
    {
        OSStatus status;

        uint8_t *data = NULL;
        uint8_t *pps = NULL;
        uint8_t *sps = NULL;

        // I know what my H.264 data source's NALUs look like so I know start code index is always 0.
        // if you don't know where it starts, you can use a for loop similar to how i find the 2nd and 3rd start codes
        int startCodeIndex = 0;
        int secondStartCodeIndex = 0;
        int thirdStartCodeIndex = 0;

        long blockLength = 0;

        CMSampleBufferRef sampleBuffer = NULL;
        CMBlockBufferRef blockBuffer = NULL;

        int nalu_type = (frame[startCodeIndex + 4] & 0x1F);
        NSLog(@"~~~~~~~ Received NALU Type \"%@\" ~~~~~~~~", naluTypesStrings[nalu_type]);

        // if we havent already set up our format description with our SPS PPS parameters, we
        // can't process any frames except type 7 that has our parameters
        if (nalu_type != 7 && _formatDesc == NULL)
        {
            NSLog(@"Video error: Frame is not an I Frame and format description is null");
            return;
        }

        // NALU type 7 is the SPS parameter NALU
        if (nalu_type == 7)
        {
            // find where the second PPS start code begins, (the 0x00 00 00 01 code)
            // from which we also get the length of the first SPS code
            for (int i = startCodeIndex + 4; i < startCodeIndex + 40; i++)
            {
                if (frame[i] == 0x00 && frame[i+1] == 0x00 && frame[i+2] == 0x00 && frame[i+3] == 0x01)
                {
                    secondStartCodeIndex = i;
                    _spsSize = secondStartCodeIndex;   // includes the header in the size
                    break;
                }
            }

            // find what the second NALU type is
            nalu_type = (frame[secondStartCodeIndex + 4] & 0x1F);
            NSLog(@"~~~~~~~ Received NALU Type \"%@\" ~~~~~~~~", naluTypesStrings[nalu_type]);
        }

        // type 8 is the PPS parameter NALU
        if(nalu_type == 8)
        {
            // find where the NALU after this one starts so we know how long the PPS parameter is
            for (int i = _spsSize + 4; i < _spsSize + 30; i++)
            {
                if (frame[i] == 0x00 && frame[i+1] == 0x00 && frame[i+2] == 0x00 && frame[i+3] == 0x01)
                {
                    thirdStartCodeIndex = i;
                    _ppsSize = thirdStartCodeIndex - _spsSize;
                    break;
                }
            }

            // allocate enough data to fit the SPS and PPS parameters into our data objects.
            // VTD doesn't want you to include the start code header (4 bytes long) so we add the - 4 here
            sps = malloc(_spsSize - 4);
            pps = malloc(_ppsSize - 4);

            // copy in the actual sps and pps values, again ignoring the 4 byte header
            memcpy (sps, &frame[4], _spsSize-4);
            memcpy (pps, &frame[_spsSize+4], _ppsSize-4);

            // now we set our H264 parameters
            uint8_t*  parameterSetPointers[2] = {sps, pps};
            size_t parameterSetSizes[2] = {_spsSize-4, _ppsSize-4};

            status = CMVideoFormatDescriptionCreateFromH264ParameterSets(kCFAllocatorDefault, 2,
                                                    (const uint8_t *const*)parameterSetPointers,
                                                    parameterSetSizes, 4,
                                                    &_formatDesc);

            NSLog(@"\t\t Creation of CMVideoFormatDescription: %@", (status == noErr) ? @"successful!" : @"failed...");
            if(status != noErr) NSLog(@"\t\t Format Description ERROR type: %d", (int)status);

            // See if decomp session can convert from previous format description
            // to the new one, if not we need to remake the decomp session.
            // This snippet was not necessary for my applications but it could be for yours
            /*BOOL needNewDecompSession = (VTDecompressionSessionCanAcceptFormatDescription(_decompressionSession, _formatDesc) == NO);
             if(needNewDecompSession)
             {
                 [self createDecompSession];
             }*/

            // now lets handle the IDR frame that (should) come after the parameter sets
            // I say "should" because that's how I expect my H264 stream to work, YMMV
            nalu_type = (frame[thirdStartCodeIndex + 4] & 0x1F);
            NSLog(@"~~~~~~~ Received NALU Type \"%@\" ~~~~~~~~", naluTypesStrings[nalu_type]);
        }

        // create our VTDecompressionSession.  This isnt neccessary if you choose to use AVSampleBufferDisplayLayer
        if((status == noErr) && (_decompressionSession == NULL))
        {
            [self createDecompSession];
        }

        // type 5 is an IDR frame NALU.  The SPS and PPS NALUs should always be followed by an IDR (or IFrame) NALU, as far as I know
        if(nalu_type == 5)
        {
            // find the offset, or where the SPS and PPS NALUs end and the IDR frame NALU begins
            int offset = _spsSize + _ppsSize;
            blockLength = frameSize - offset;
            data = malloc(blockLength);
            data = memcpy(data, &frame[offset], blockLength);

            // replace the start code header on this NALU with its size.
            // AVCC format requires that you do this.
            // htonl converts the unsigned int from host to network byte order
            uint32_t dataLength32 = htonl (blockLength - 4);
            memcpy (data, &dataLength32, sizeof (uint32_t));

            // create a block buffer from the IDR NALU
            status = CMBlockBufferCreateWithMemoryBlock(NULL, data,  // memoryBlock to hold buffered data
                                                        blockLength,  // block length of the mem block in bytes.
                                                        kCFAllocatorNull, NULL,
                                                        0, // offsetToData
                                                        blockLength,   // dataLength of relevant bytes, starting at offsetToData
                                                        0, &blockBuffer);

            NSLog(@"\t\t BlockBufferCreation: \t %@", (status == kCMBlockBufferNoErr) ? @"successful!" : @"failed...");
        }

        // NALU type 1 is non-IDR (or PFrame) picture
        if (nalu_type == 1)
        {
            // non-IDR frames do not have an offset due to SPS and PSS, so the approach
            // is similar to the IDR frames just without the offset
            blockLength = frameSize;
            data = malloc(blockLength);
            data = memcpy(data, &frame[0], blockLength);

            // again, replace the start header with the size of the NALU
            uint32_t dataLength32 = htonl (blockLength - 4);
            memcpy (data, &dataLength32, sizeof (uint32_t));

            status = CMBlockBufferCreateWithMemoryBlock(NULL, data,  // memoryBlock to hold data. If NULL, block will be alloc when needed
                                                        blockLength,  // overall length of the mem block in bytes
                                                        kCFAllocatorNull, NULL,
                                                        0,     // offsetToData
                                                        blockLength,  // dataLength of relevant data bytes, starting at offsetToData
                                                        0, &blockBuffer);

            NSLog(@"\t\t BlockBufferCreation: \t %@", (status == kCMBlockBufferNoErr) ? @"successful!" : @"failed...");
        }

        // now create our sample buffer from the block buffer,
        if(status == noErr)
        {
            // here I'm not bothering with any timing specifics since in my case we displayed all frames immediately
            const size_t sampleSize = blockLength;
            status = CMSampleBufferCreate(kCFAllocatorDefault,
                                          blockBuffer, true, NULL, NULL,
                                          _formatDesc, 1, 0, NULL, 1,
                                          &sampleSize, &sampleBuffer);

            NSLog(@"\t\t SampleBufferCreate: \t %@", (status == noErr) ? @"successful!" : @"failed...");
        }

        if(status == noErr)
        {
            // set some values of the sample buffer's attachments
            CFArrayRef attachments = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, YES);
            CFMutableDictionaryRef dict = (CFMutableDictionaryRef)CFArrayGetValueAtIndex(attachments, 0);
            CFDictionarySetValue(dict, kCMSampleAttachmentKey_DisplayImmediately, kCFBooleanTrue);

            // either send the samplebuffer to a VTDecompressionSession or to an AVSampleBufferDisplayLayer
            [self render:sampleBuffer];
        }

        // free memory to avoid a memory leak, do the same for sps, pps and blockbuffer
        if (NULL != data)
        {
            free (data);
            data = NULL;
        }
    }

The following method creates your VTD session. Recreate it whenever you receive new parameters. (You don't have to recreate it every single time you receive parameters, pretty sure.)

If you want to set attributes for the destination CVPixelBuffer, read up on the CoreVideo PixelBufferAttributes values and put them in NSDictionary *destinationImageBufferAttributes.

    -(void) createDecompSession
    {
        // make sure to destroy the old VTD session
        _decompressionSession = NULL;
        VTDecompressionOutputCallbackRecord callBackRecord;
        callBackRecord.decompressionOutputCallback = decompressionSessionDecodeFrameCallback;

        // this is necessary if you need to make calls to Objective C "self" from within in the callback method.
        callBackRecord.decompressionOutputRefCon = (__bridge void *)self;

        // you can set some desired attributes for the destination pixel buffer.  I didn't use this but you may
        // if you need to set some attributes, be sure to uncomment the dictionary in VTDecompressionSessionCreate
        NSDictionary *destinationImageBufferAttributes = [NSDictionary dictionaryWithObjectsAndKeys:
                                                          [NSNumber numberWithBool:YES],
                                                          (id)kCVPixelBufferOpenGLESCompatibilityKey,
                                                          nil];

        OSStatus status =  VTDecompressionSessionCreate(NULL, _formatDesc, NULL,
                                                        NULL, // (__bridge CFDictionaryRef)(destinationImageBufferAttributes)
                                                        &callBackRecord, &_decompressionSession);
        NSLog(@"Video Decompression Session Create: \t %@", (status == noErr) ? @"successful!" : @"failed...");
        if(status != noErr) NSLog(@"\t\t VTD ERROR type: %d", (int)status);
    }

Now this method gets called every time the VTD finishes decompressing any frame you sent to it. It gets called even if there's an error or the frame is dropped.

    void decompressionSessionDecodeFrameCallback(void *decompressionOutputRefCon,
                                                 void *sourceFrameRefCon,
                                                 OSStatus status,
                                                 VTDecodeInfoFlags infoFlags,
                                                 CVImageBufferRef imageBuffer,
                                                 CMTime presentationTimeStamp,
                                                 CMTime presentationDuration)
    {
        THISCLASSNAME *streamManager = (__bridge THISCLASSNAME *)decompressionOutputRefCon;

        if (status != noErr)
        {
            NSError *error = [NSError errorWithDomain:NSOSStatusErrorDomain code:status userInfo:nil];
            NSLog(@"Decompressed error: %@", error);
        }
        else
        {
            NSLog(@"Decompressed successfully");

            // do something with your resulting CVImageBufferRef that is your decompressed frame
            [streamManager displayDecodedFrame:imageBuffer];
        }
    }

This is where we actually send the sampleBuffer off to the VTD to be decoded.

    - (void) render:(CMSampleBufferRef)sampleBuffer
    {
        VTDecodeFrameFlags flags = kVTDecodeFrame_EnableAsynchronousDecompression;
        VTDecodeInfoFlags flagOut;
        NSDate* currentTime = [NSDate date];
        VTDecompressionSessionDecodeFrame(_decompressionSession, sampleBuffer, flags,
                                          (void*)CFBridgingRetain(currentTime), &flagOut);

        CFRelease(sampleBuffer);

        // if you're using AVSampleBufferDisplayLayer, you only need to use this line of code
        // [videoLayer enqueueSampleBuffer:sampleBuffer];
    }

If you're using AVSampleBufferDisplayLayer, be sure to initialize the layer like this, in viewDidLoad or inside some other init method.

    -(void) viewDidLoad
    {
        // create our AVSampleBufferDisplayLayer and add it to the view
        videoLayer = [[AVSampleBufferDisplayLayer alloc] init];
        videoLayer.frame = self.view.frame;
        videoLayer.bounds = self.view.bounds;
        videoLayer.videoGravity = AVLayerVideoGravityResizeAspect;

        // set Timebase, you may need this if you need to display frames at specific times
        // I didn't need it so I haven't verified that the timebase is working
        CMTimebaseRef controlTimebase;
        CMTimebaseCreateWithMasterClock(CFAllocatorGetDefault(), CMClockGetHostTimeClock(), &controlTimebase);

        //videoLayer.controlTimebase = controlTimebase;
        CMTimebaseSetTime(self.videoLayer.controlTimebase, kCMTimeZero);
        CMTimebaseSetRate(self.videoLayer.controlTimebase, 1.0);

        [[self.view layer] addSublayer:videoLayer];
    }

In case you can't find the VTD error codes in the framework, I decided to include them here. (Again, all these errors and more can be found in the VTErrors.h header inside VideoToolbox.framework itself in the project navigator.)

You will get one of these error codes in the VTD decode frame callback, or when you create your VTD session, if you did something incorrectly.

    kVTPropertyNotSupportedErr = -12900,
    kVTPropertyReadOnlyErr = -12901,
    kVTParameterErr = -12902,
    kVTInvalidSessionErr = -12903,
    kVTAllocationFailedErr = -12904,
    kVTPixelTransferNotSupportedErr = -12905, // cf -8961
    kVTCouldNotFindVideoDecoderErr = -12906,
    kVTCouldNotCreateInstanceErr = -12907,
    kVTCouldNotFindVideoEncoderErr = -12908,
    kVTVideoDecoderBadDataErr = -12909, // cf -8969
    kVTVideoDecoderUnsupportedDataFormatErr = -12910, // cf -8970
    kVTVideoDecoderMalfunctionErr = -12911, // cf -8960
    kVTVideoEncoderMalfunctionErr = -12912,
    kVTVideoDecoderNotAvailableNowErr = -12913,
    kVTImageRotationNotSupportedErr = -12914,
    kVTVideoEncoderNotAvailableNowErr = -12915,
    kVTFormatDescriptionChangeNotSupportedErr = -12916,
    kVTInsufficientSourceColorDataErr = -12917,
    kVTCouldNotCreateColorCorrectionDataErr = -12918,
    kVTColorSyncTransformConvertFailedErr = -12919,
    kVTVideoDecoderAuthorizationErr = -12210,
    kVTVideoEncoderAuthorizationErr = -12211,
    kVTColorCorrectionPixelTransferFailedErr = -12212,
    kVTMultiPassStorageIdentifierMismatchErr = -12213,
    kVTMultiPassStorageInvalidErr = -12214,
    kVTFrameSiloInvalidTimeStampErr = -12215,
    kVTFrameSiloInvalidTimeRangeErr = -12216,
    kVTCouldNotFindTemporalFilterErr = -12217,
    kVTPixelTransferNotPermittedErr = -12218,

A good Swift example of much of this can be found in Josh Baker's Avios library: https://github.com/tidwall/Avios

Note that Avios currently expects the user to handle chunking the data at NAL start codes, but it does handle decoding the data from that point forward.

Also worth a look is the Swift-based RTMP library HaishinKit (formerly "LF"), which has its own decoding implementation, including more robust NALU parsing: https://github.com/shogo4405/lf.swift

In addition to the VTErrors above, I thought it worth adding the CMFormatDescription, CMBlockBuffer, and CMSampleBuffer errors that you may encounter while trying Livy's example.

    kCMFormatDescriptionError_InvalidParameter = -12710,
    kCMFormatDescriptionError_AllocationFailed = -12711,
    kCMFormatDescriptionError_ValueNotAvailable = -12718,

    kCMBlockBufferNoErr = 0,
    kCMBlockBufferStructureAllocationFailedErr = -12700,
    kCMBlockBufferBlockAllocationFailedErr = -12701,
    kCMBlockBufferBadCustomBlockSourceErr = -12702,
    kCMBlockBufferBadOffsetParameterErr = -12703,
    kCMBlockBufferBadLengthParameterErr = -12704,
    kCMBlockBufferBadPointerParameterErr = -12705,
    kCMBlockBufferEmptyBBufErr = -12706,
    kCMBlockBufferUnallocatedBlockErr = -12707,
    kCMBlockBufferInsufficientSpaceErr = -12708,

    kCMSampleBufferError_AllocationFailed = -12730,
    kCMSampleBufferError_RequiredParameterMissing = -12731,
    kCMSampleBufferError_AlreadyHasDataBuffer = -12732,
    kCMSampleBufferError_BufferNotReady = -12733,
    kCMSampleBufferError_SampleIndexOutOfRange = -12734,
    kCMSampleBufferError_BufferHasNoSampleSizes = -12735,
    kCMSampleBufferError_BufferHasNoSampleTimingInfo = -12736,
    kCMSampleBufferError_ArrayTooSmall = -12737,
    kCMSampleBufferError_InvalidEntryCount = -12738,
    kCMSampleBufferError_CannotSubdivide = -12739,
    kCMSampleBufferError_SampleTimingInfoInvalid = -12740,
    kCMSampleBufferError_InvalidMediaTypeForOperation = -12741,
    kCMSampleBufferError_InvalidSampleData = -12742,
    kCMSampleBufferError_InvalidMediaFormat = -12743,
    kCMSampleBufferError_Invalidated = -12744,
    kCMSampleBufferError_DataFailed = -16750,
    kCMSampleBufferError_DataCanceled = -16751,

@Livy: to remove a memory leak, you should add the following before CMVideoFormatDescriptionCreateFromH264ParameterSets:

    if (_formatDesc)
    {
        CFRelease(_formatDesc);
        _formatDesc = NULL;
    }
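In the same spirit, here is a hedged sketch (my own addition, not from the answers above) of how the old decompression session could also be torn down inside createDecompSession before it is overwritten, rather than only setting the pointer to NULL:

    // Sketch only: invalidate and release the old session before creating a new one,
    // since simply assigning _decompressionSession = NULL would leak the old session.
    if (_decompressionSession)
    {
        VTDecompressionSessionInvalidate(_decompressionSession);
        CFRelease(_decompressionSession);
        _decompressionSession = NULL;
    }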