How to use VideoToolbox to decompress an H.264 video stream

I had a lot of trouble figuring out how to use Apple's hardware-accelerated video framework to decompress an H.264 video stream. After a few weeks I figured it out and wanted to share an extensive example, since I couldn't find one.

My goal is to give a thorough, instructive example of Video Toolbox as introduced in WWDC '14 session 513. My code will not compile or run as-is, because it needs to be integrated with an elementary H.264 stream (such as a video read from a file or streamed from online) and needs to be tweaked depending on the specific case.

I should mention that I have very little experience with video encoding/decoding beyond what I learned while googling this subject. I don't know all the details about video formats, parameter structures, etc., so I've only included what I think you need to know.

I am using Xcode 6.2 and have deployed to iOS devices running iOS 8.1 and 8.2.

Concepts:

NALUs: A NALU is simply a chunk of data of varying length that begins with a start code header 0x00 00 00 01 YY, where the first 5 bits of YY tell you what type of NALU it is and therefore what type of data follows the header. (Since you only need the first 5 bits, I use YY & 0x1F to get just the relevant bits.) I list all the types in the NSString * const naluTypesStrings[] array below, but you don't need to know what they all are.
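For example, with a buffer laid out the way the code below expects (a 4-byte start code at offset 0, then the NALU header byte), pulling out the type looks like this. This fragment is just an illustration; frame and naluTypesStrings are the same names used in the full code example further down:

    // Illustration only: read the NALU type from the byte that follows a
    // 4-byte Annex B start code (0x00 00 00 01 YY ...).
    int startCodeIndex = 0;                              // start code begins at offset 0 here
    int nalu_type = frame[startCodeIndex + 4] & 0x1F;    // low 5 bits of YY are the NALU type
    NSLog(@"NALU type %d: %@", nalu_type, naluTypesStrings[nalu_type]);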

Parameters: Your decoder needs parameters so it knows how the H.264 video data is stored. The two you need to set are the Sequence Parameter Set (SPS) and the Picture Parameter Set (PPS), and they each have their own NALU type number. You don't need to know what the parameters mean; the decoder knows what to do with them.

H.264 stream format: In most H.264 streams, you will receive an initial set of PPS and SPS parameters followed by an i frame (also called an IDR frame or flush frame) NALU. Then you will receive several P frame NALUs (maybe a few dozen or so), then another set of parameters (which may be the same as the initial parameters) and an i frame, more P frames, and so on. i frames are much bigger than P frames. Conceptually you can think of the i frame as a complete image of the video, and the P frames as just the changes made to that i frame, until you receive the next i frame.

Procedure:

  1. Generate individual NALUs from your H.264 stream. I cannot show code for this step since it depends a lot on what video source you're using. I made this graphic to show what I was working with ("data" in the graphic is "frame" in my code), but your case may and probably will differ. My method receivedRawVideoFrame: is called every time I receive a frame (uint8_t *frame), which was one of two types. In the diagram, those two frame types are the two big purple boxes.

  2. Create a CMVideoFormatDescriptionRef from your SPS and PPS NALUs with CMVideoFormatDescriptionCreateFromH264ParameterSets(). You cannot display any frames without doing this first. The SPS and PPS may look like a jumble of numbers, but VTD knows what to do with them. All you need to know is that CMVideoFormatDescriptionRef is a description of the video data, such as width/height, format type (kCMPixelFormat_32BGRA, kCMVideoCodecType_H264, etc.), aspect ratio, color space, and so on. The decoder holds onto the parameters until a new set arrives (sometimes the parameters are resent regularly even when they haven't changed).

  3. Re-package your IDR and non-IDR frame NALUs according to the "AVCC" format. This means removing the NALU start codes and replacing them with a 4-byte header that states the length of the NALU. You don't need to do this for the SPS and PPS NALUs. (Note that the 4-byte NALU length header is big-endian, so if you have a UInt32 value it must be byte-swapped before copying it into the CMBlockBuffer using CFSwapInt32; in my code this is done with the htonl function call.)

  4. Package the IDR and non-IDR NALU frames into a CMBlockBuffer. Do NOT do this with the SPS and PPS parameter NALUs. All you need to know about CMBlockBuffers is that they are a way to wrap arbitrary blocks of data in Core Media. (Any compressed video data in a video pipeline is wrapped in one of these.)

  5. Package the CMBlockBuffer into a CMSampleBuffer. All you need to know about CMSampleBuffers is that they wrap our CMBlockBuffers together with other information (here that would be the CMVideoFormatDescription and CMTime, if CMTime is used).

  6. Create a VTDecompressionSessionRef and feed the sample buffer into VTDecompressionSessionDecodeFrame(). Alternatively, you can use AVSampleBufferDisplayLayer and its enqueueSampleBuffer: method and skip the VTDecompSession entirely. It's simpler to set up, but it won't throw errors the way VTD does if something goes wrong.

  7. In the VTDecompSession callback, use the resulting CVImageBufferRef to display the video frame. If you need to convert your CVImageBuffer to a UIImage, see my StackOverflow answer; a rough sketch of one way to do it is also included right after this list.
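Here is a minimal sketch (my addition, not part of the original code) of one way to do that CVImageBuffer-to-UIImage conversion with Core Image. The helper name UIImageFromImageBuffer is hypothetical, and it assumes the image buffer delivered to the callback is a CVPixelBufferRef (which it is on iOS):

    // Sketch: convert a decoded CVImageBufferRef into a UIImage via Core Image.
    // Creating a CIContext per frame is wasteful; cache it if you do this for real.
    static UIImage * UIImageFromImageBuffer(CVImageBufferRef imageBuffer)
    {
        CIImage *ciImage = [CIImage imageWithCVPixelBuffer:(CVPixelBufferRef)imageBuffer];
        CIContext *context = [CIContext contextWithOptions:nil];
        CGRect rect = CGRectMake(0, 0,
                                 CVPixelBufferGetWidth((CVPixelBufferRef)imageBuffer),
                                 CVPixelBufferGetHeight((CVPixelBufferRef)imageBuffer));
        CGImageRef cgImage = [context createCGImage:ciImage fromRect:rect];
        UIImage *image = [UIImage imageWithCGImage:cgImage];
        CGImageRelease(cgImage);
        return image;
    }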

Other notes:

  • H.264 streams can vary a lot. From what I learned, NALU start code headers are sometimes 3 bytes (0x00 00 01) and sometimes 4 (0x00 00 00 01). My code works with 4 bytes; if yours uses 3, you will need to change a few things around. (A small sketch after these notes shows one way to tell which length you have.)

  • If you want to know more about NALUs, I found this answer very helpful. In my case, I found that I didn't need to ignore the "emulation prevention" bytes it describes, so I personally skipped that step, but you may need to know about it.

  • If your VTDecompressionSession outputs an error number (like -12909), look up the error code in your Xcode project. Find the VideoToolbox framework in the project navigator, open it, and find the header VTErrors.h. If you can't find it, I've also included all the error codes below in another answer.
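Regarding the 3-byte vs. 4-byte start codes in the first note above, here is a small sketch (my own helper, not part of the original code) of one way to tell which one a NALU begins with before deciding how many header bytes to strip or replace:

    // Hypothetical helper: returns the length of the start code at buf[offset]
    // (4 for 0x00 00 00 01, 3 for 0x00 00 01, 0 if there is no start code there).
    static int startCodeLength(const uint8_t *buf, size_t size, size_t offset)
    {
        if (offset + 4 <= size &&
            buf[offset] == 0x00 && buf[offset+1] == 0x00 &&
            buf[offset+2] == 0x00 && buf[offset+3] == 0x01)
        {
            return 4;
        }
        if (offset + 3 <= size &&
            buf[offset] == 0x00 && buf[offset+1] == 0x00 && buf[offset+2] == 0x01)
        {
            return 3;
        }
        return 0;
    }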

Code example:

So let's start by declaring some global variables and including the VT framework (VT = Video Toolbox).

    #import <VideoToolbox/VideoToolbox.h>
    #import <AVFoundation/AVFoundation.h>   // needed for AVSampleBufferDisplayLayer

    @property (nonatomic, assign) CMVideoFormatDescriptionRef formatDesc;
    @property (nonatomic, assign) VTDecompressionSessionRef decompressionSession;
    @property (nonatomic, retain) AVSampleBufferDisplayLayer *videoLayer;
    @property (nonatomic, assign) int spsSize;
    @property (nonatomic, assign) int ppsSize;

The following array is only used so that you can print out what type of NALU frame you're receiving. If you know what all these types mean, good for you, you know more about H.264 than I do :) My code only handles types 1, 5, 7 and 8.

    NSString * const naluTypesStrings[] =
    {
        @"0: Unspecified (non-VCL)",
        @"1: Coded slice of a non-IDR picture (VCL)",    // P frame
        @"2: Coded slice data partition A (VCL)",
        @"3: Coded slice data partition B (VCL)",
        @"4: Coded slice data partition C (VCL)",
        @"5: Coded slice of an IDR picture (VCL)",       // I frame
        @"6: Supplemental enhancement information (SEI) (non-VCL)",
        @"7: Sequence parameter set (non-VCL)",           // SPS parameter
        @"8: Picture parameter set (non-VCL)",            // PPS parameter
        @"9: Access unit delimiter (non-VCL)",
        @"10: End of sequence (non-VCL)",
        @"11: End of stream (non-VCL)",
        @"12: Filler data (non-VCL)",
        @"13: Sequence parameter set extension (non-VCL)",
        @"14: Prefix NAL unit (non-VCL)",
        @"15: Subset sequence parameter set (non-VCL)",
        @"16: Reserved (non-VCL)",
        @"17: Reserved (non-VCL)",
        @"18: Reserved (non-VCL)",
        @"19: Coded slice of an auxiliary coded picture without partitioning (non-VCL)",
        @"20: Coded slice extension (non-VCL)",
        @"21: Coded slice extension for depth view components (non-VCL)",
        @"22: Reserved (non-VCL)",
        @"23: Reserved (non-VCL)",
        @"24: STAP-A Single-time aggregation packet (non-VCL)",
        @"25: STAP-B Single-time aggregation packet (non-VCL)",
        @"26: MTAP16 Multi-time aggregation packet (non-VCL)",
        @"27: MTAP24 Multi-time aggregation packet (non-VCL)",
        @"28: FU-A Fragmentation unit (non-VCL)",
        @"29: FU-B Fragmentation unit (non-VCL)",
        @"30: Unspecified (non-VCL)",
        @"31: Unspecified (non-VCL)",
    };

Now this is where all the magic happens.

    -(void) receivedRawVideoFrame:(uint8_t *)frame withSize:(uint32_t)frameSize isIFrame:(int)isIFrame
    {
        OSStatus status;

        uint8_t *data = NULL;
        uint8_t *pps = NULL;
        uint8_t *sps = NULL;

        // I know what my H.264 data source's NALUs look like so I know start code index is always 0.
        // if you don't know where it starts, you can use a for loop similar to how i find the 2nd and 3rd start codes
        int startCodeIndex = 0;
        int secondStartCodeIndex = 0;
        int thirdStartCodeIndex = 0;

        long blockLength = 0;

        CMSampleBufferRef sampleBuffer = NULL;
        CMBlockBufferRef blockBuffer = NULL;

        int nalu_type = (frame[startCodeIndex + 4] & 0x1F);
        NSLog(@"~~~~~~~ Received NALU Type \"%@\" ~~~~~~~~", naluTypesStrings[nalu_type]);

        // if we havent already set up our format description with our SPS PPS parameters, we
        // can't process any frames except type 7 that has our parameters
        if (nalu_type != 7 && _formatDesc == NULL)
        {
            NSLog(@"Video error: Frame is not an I Frame and format description is null");
            return;
        }

        // NALU type 7 is the SPS parameter NALU
        if (nalu_type == 7)
        {
            // find where the second PPS start code begins, (the 0x00 00 00 01 code)
            // from which we also get the length of the first SPS code
            for (int i = startCodeIndex + 4; i < startCodeIndex + 40; i++)
            {
                if (frame[i] == 0x00 && frame[i+1] == 0x00 && frame[i+2] == 0x00 && frame[i+3] == 0x01)
                {
                    secondStartCodeIndex = i;
                    _spsSize = secondStartCodeIndex;   // includes the header in the size
                    break;
                }
            }

            // find what the second NALU type is
            nalu_type = (frame[secondStartCodeIndex + 4] & 0x1F);
            NSLog(@"~~~~~~~ Received NALU Type \"%@\" ~~~~~~~~", naluTypesStrings[nalu_type]);
        }

        // type 8 is the PPS parameter NALU
        if(nalu_type == 8)
        {
            // find where the NALU after this one starts so we know how long the PPS parameter is
            for (int i = _spsSize + 4; i < _spsSize + 30; i++)
            {
                if (frame[i] == 0x00 && frame[i+1] == 0x00 && frame[i+2] == 0x00 && frame[i+3] == 0x01)
                {
                    thirdStartCodeIndex = i;
                    _ppsSize = thirdStartCodeIndex - _spsSize;
                    break;
                }
            }

            // allocate enough data to fit the SPS and PPS parameters into our data objects.
            // VTD doesn't want you to include the start code header (4 bytes long) so we add the - 4 here
            sps = malloc(_spsSize - 4);
            pps = malloc(_ppsSize - 4);

            // copy in the actual sps and pps values, again ignoring the 4 byte header
            memcpy (sps, &frame[4], _spsSize-4);
            memcpy (pps, &frame[_spsSize+4], _ppsSize-4);

            // now we set our H264 parameters
            uint8_t*  parameterSetPointers[2] = {sps, pps};
            size_t parameterSetSizes[2] = {_spsSize-4, _ppsSize-4};

            status = CMVideoFormatDescriptionCreateFromH264ParameterSets(kCFAllocatorDefault, 2,
                                                    (const uint8_t *const*)parameterSetPointers,
                                                    parameterSetSizes, 4,
                                                    &_formatDesc);

            NSLog(@"\t\t Creation of CMVideoFormatDescription: %@", (status == noErr) ? @"successful!" : @"failed...");
            if(status != noErr) NSLog(@"\t\t Format Description ERROR type: %d", (int)status);

            // See if decomp session can convert from previous format description
            // to the new one, if not we need to remake the decomp session.
            // This snippet was not necessary for my applications but it could be for yours
            /*BOOL needNewDecompSession = (VTDecompressionSessionCanAcceptFormatDescription(_decompressionSession, _formatDesc) == NO);
             if(needNewDecompSession)
             {
                 [self createDecompSession];
             }*/

            // now lets handle the IDR frame that (should) come after the parameter sets
            // I say "should" because that's how I expect my H264 stream to work, YMMV
            nalu_type = (frame[thirdStartCodeIndex + 4] & 0x1F);
            NSLog(@"~~~~~~~ Received NALU Type \"%@\" ~~~~~~~~", naluTypesStrings[nalu_type]);
        }

        // create our VTDecompressionSession.  This isnt neccessary if you choose to use AVSampleBufferDisplayLayer
        if((status == noErr) && (_decompressionSession == NULL))
        {
            [self createDecompSession];
        }

        // type 5 is an IDR frame NALU.  The SPS and PPS NALUs should always be followed by an IDR (or IFrame) NALU, as far as I know
        if(nalu_type == 5)
        {
            // find the offset, or where the SPS and PPS NALUs end and the IDR frame NALU begins
            int offset = _spsSize + _ppsSize;
            blockLength = frameSize - offset;
            data = malloc(blockLength);
            data = memcpy(data, &frame[offset], blockLength);

            // replace the start code header on this NALU with its size.
            // AVCC format requires that you do this.
            // htonl converts the unsigned int from host to network byte order
            uint32_t dataLength32 = htonl (blockLength - 4);
            memcpy (data, &dataLength32, sizeof (uint32_t));

            // create a block buffer from the IDR NALU
            status = CMBlockBufferCreateWithMemoryBlock(NULL, data,  // memoryBlock to hold buffered data
                                                        blockLength,  // block length of the mem block in bytes.
                                                        kCFAllocatorNull, NULL,
                                                        0, // offsetToData
                                                        blockLength,   // dataLength of relevant bytes, starting at offsetToData
                                                        0, &blockBuffer);

            NSLog(@"\t\t BlockBufferCreation: \t %@", (status == kCMBlockBufferNoErr) ? @"successful!" : @"failed...");
        }

        // NALU type 1 is non-IDR (or PFrame) picture
        if (nalu_type == 1)
        {
            // non-IDR frames do not have an offset due to SPS and PSS, so the approach
            // is similar to the IDR frames just without the offset
            blockLength = frameSize;
            data = malloc(blockLength);
            data = memcpy(data, &frame[0], blockLength);

            // again, replace the start header with the size of the NALU
            uint32_t dataLength32 = htonl (blockLength - 4);
            memcpy (data, &dataLength32, sizeof (uint32_t));

            status = CMBlockBufferCreateWithMemoryBlock(NULL, data,  // memoryBlock to hold data. If NULL, block will be alloc when needed
                                                        blockLength,  // overall length of the mem block in bytes
                                                        kCFAllocatorNull, NULL,
                                                        0,     // offsetToData
                                                        blockLength,  // dataLength of relevant data bytes, starting at offsetToData
                                                        0, &blockBuffer);

            NSLog(@"\t\t BlockBufferCreation: \t %@", (status == kCMBlockBufferNoErr) ? @"successful!" : @"failed...");
        }

        // now create our sample buffer from the block buffer,
        if(status == noErr)
        {
            // here I'm not bothering with any timing specifics since in my case we displayed all frames immediately
            const size_t sampleSize = blockLength;
            status = CMSampleBufferCreate(kCFAllocatorDefault,
                                          blockBuffer, true, NULL, NULL,
                                          _formatDesc, 1, 0, NULL, 1,
                                          &sampleSize, &sampleBuffer);

            NSLog(@"\t\t SampleBufferCreate: \t %@", (status == noErr) ? @"successful!" : @"failed...");
        }

        if(status == noErr)
        {
            // set some values of the sample buffer's attachments
            CFArrayRef attachments = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, YES);
            CFMutableDictionaryRef dict = (CFMutableDictionaryRef)CFArrayGetValueAtIndex(attachments, 0);
            CFDictionarySetValue(dict, kCMSampleAttachmentKey_DisplayImmediately, kCFBooleanTrue);

            // either send the samplebuffer to a VTDecompressionSession or to an AVSampleBufferDisplayLayer
            [self render:sampleBuffer];
        }

        // free memory to avoid a memory leak, do the same for sps, pps and blockbuffer
        if (NULL != data)
        {
            free (data);
            data = NULL;
        }
    }

The following method creates your VTD session. Recreate it whenever you receive new parameters. (You don't have to recreate it every single time you receive parameters, pretty sure.)

If you want to set attributes for the destination CVPixelBuffer, read up on the CoreVideo PixelBufferAttributes values and put them in NSDictionary *destinationImageBufferAttributes.

    -(void) createDecompSession
    {
        // make sure to destroy the old VTD session
        _decompressionSession = NULL;
        VTDecompressionOutputCallbackRecord callBackRecord;
        callBackRecord.decompressionOutputCallback = decompressionSessionDecodeFrameCallback;

        // this is necessary if you need to make calls to Objective C "self" from within in the callback method.
        callBackRecord.decompressionOutputRefCon = (__bridge void *)self;

        // you can set some desired attributes for the destination pixel buffer.  I didn't use this but you may
        // if you need to set some attributes, be sure to uncomment the dictionary in VTDecompressionSessionCreate
        NSDictionary *destinationImageBufferAttributes = [NSDictionary dictionaryWithObjectsAndKeys:
                                                          [NSNumber numberWithBool:YES],
                                                          (id)kCVPixelBufferOpenGLESCompatibilityKey,
                                                          nil];

        OSStatus status =  VTDecompressionSessionCreate(NULL, _formatDesc, NULL,
                                                        NULL, // (__bridge CFDictionaryRef)(destinationImageBufferAttributes)
                                                        &callBackRecord, &_decompressionSession);
        NSLog(@"Video Decompression Session Create: \t %@", (status == noErr) ? @"successful!" : @"failed...");
        if(status != noErr) NSLog(@"\t\t VTD ERROR type: %d", (int)status);
    }

Now this method gets called every time the VTD finishes decompressing any frame you sent to it. It gets called even if there's an error or the frame is dropped.

    void decompressionSessionDecodeFrameCallback(void *decompressionOutputRefCon,
                                                 void *sourceFrameRefCon,
                                                 OSStatus status,
                                                 VTDecodeInfoFlags infoFlags,
                                                 CVImageBufferRef imageBuffer,
                                                 CMTime presentationTimeStamp,
                                                 CMTime presentationDuration)
    {
        THISCLASSNAME *streamManager = (__bridge THISCLASSNAME *)decompressionOutputRefCon;

        if (status != noErr)
        {
            NSError *error = [NSError errorWithDomain:NSOSStatusErrorDomain code:status userInfo:nil];
            NSLog(@"Decompressed error: %@", error);
        }
        else
        {
            NSLog(@"Decompressed successfully");

            // do something with your resulting CVImageBufferRef that is your decompressed frame
            [streamManager displayDecodedFrame:imageBuffer];
        }
    }

This is where we actually send the sampleBuffer off to the VTD to be decoded.

    - (void) render:(CMSampleBufferRef)sampleBuffer
    {
        VTDecodeFrameFlags flags = kVTDecodeFrame_EnableAsynchronousDecompression;
        VTDecodeInfoFlags flagOut;
        NSDate* currentTime = [NSDate date];
        VTDecompressionSessionDecodeFrame(_decompressionSession, sampleBuffer, flags,
                                          (void*)CFBridgingRetain(currentTime), &flagOut);

        CFRelease(sampleBuffer);

        // if you're using AVSampleBufferDisplayLayer, you only need to use this line of code
        // [videoLayer enqueueSampleBuffer:sampleBuffer];
    }

If you're using AVSampleBufferDisplayLayer, be sure to initialize the layer like this, in viewDidLoad or inside some other init method.

    -(void) viewDidLoad
    {
        // create our AVSampleBufferDisplayLayer and add it to the view
        videoLayer = [[AVSampleBufferDisplayLayer alloc] init];
        videoLayer.frame = self.view.frame;
        videoLayer.bounds = self.view.bounds;
        videoLayer.videoGravity = AVLayerVideoGravityResizeAspect;

        // set Timebase, you may need this if you need to display frames at specific times
        // I didn't need it so I haven't verified that the timebase is working
        CMTimebaseRef controlTimebase;
        CMTimebaseCreateWithMasterClock(CFAllocatorGetDefault(), CMClockGetHostTimeClock(), &controlTimebase);

        //videoLayer.controlTimebase = controlTimebase;
        CMTimebaseSetTime(self.videoLayer.controlTimebase, kCMTimeZero);
        CMTimebaseSetRate(self.videoLayer.controlTimebase, 1.0);

        [[self.view layer] addSublayer:videoLayer];
    }

In case you can't find the VTD error codes in the framework, I decided to include them here. (Again, all these errors and more can be found in the VTErrors.h header inside VideoToolbox.framework itself in the project navigator.)

You will get one of these error codes in the VTD decode frame callback, or when you create your VTD session, if you did something incorrectly.

    kVTPropertyNotSupportedErr = -12900,
    kVTPropertyReadOnlyErr = -12901,
    kVTParameterErr = -12902,
    kVTInvalidSessionErr = -12903,
    kVTAllocationFailedErr = -12904,
    kVTPixelTransferNotSupportedErr = -12905, // cf -8961
    kVTCouldNotFindVideoDecoderErr = -12906,
    kVTCouldNotCreateInstanceErr = -12907,
    kVTCouldNotFindVideoEncoderErr = -12908,
    kVTVideoDecoderBadDataErr = -12909, // cf -8969
    kVTVideoDecoderUnsupportedDataFormatErr = -12910, // cf -8970
    kVTVideoDecoderMalfunctionErr = -12911, // cf -8960
    kVTVideoEncoderMalfunctionErr = -12912,
    kVTVideoDecoderNotAvailableNowErr = -12913,
    kVTImageRotationNotSupportedErr = -12914,
    kVTVideoEncoderNotAvailableNowErr = -12915,
    kVTFormatDescriptionChangeNotSupportedErr = -12916,
    kVTInsufficientSourceColorDataErr = -12917,
    kVTCouldNotCreateColorCorrectionDataErr = -12918,
    kVTColorSyncTransformConvertFailedErr = -12919,
    kVTVideoDecoderAuthorizationErr = -12210,
    kVTVideoEncoderAuthorizationErr = -12211,
    kVTColorCorrectionPixelTransferFailedErr = -12212,
    kVTMultiPassStorageIdentifierMismatchErr = -12213,
    kVTMultiPassStorageInvalidErr = -12214,
    kVTFrameSiloInvalidTimeStampErr = -12215,
    kVTFrameSiloInvalidTimeRangeErr = -12216,
    kVTCouldNotFindTemporalFilterErr = -12217,
    kVTPixelTransferNotPermittedErr = -12218,

A good Swift example of much of this can be found in Josh Baker's Avios library: https://github.com/tidwall/Avios

Note that Avios currently expects the user to handle chunking the data at NAL start codes, but it does handle decoding the data from that point forward.

Also worth a look is the Swift-based RTMP library HaishinKit (formerly "LF"), which has its own decoding implementation, including more robust NALU parsing: https://github.com/shogo4405/lf.swift

In addition to the VTErrors above, I thought it worth adding the CMFormatDescription, CMBlockBuffer, and CMSampleBuffer errors that you may encounter while trying Livy's example.

    kCMFormatDescriptionError_InvalidParameter = -12710,
    kCMFormatDescriptionError_AllocationFailed = -12711,
    kCMFormatDescriptionError_ValueNotAvailable = -12718,

    kCMBlockBufferNoErr = 0,
    kCMBlockBufferStructureAllocationFailedErr = -12700,
    kCMBlockBufferBlockAllocationFailedErr = -12701,
    kCMBlockBufferBadCustomBlockSourceErr = -12702,
    kCMBlockBufferBadOffsetParameterErr = -12703,
    kCMBlockBufferBadLengthParameterErr = -12704,
    kCMBlockBufferBadPointerParameterErr = -12705,
    kCMBlockBufferEmptyBBufErr = -12706,
    kCMBlockBufferUnallocatedBlockErr = -12707,
    kCMBlockBufferInsufficientSpaceErr = -12708,

    kCMSampleBufferError_AllocationFailed = -12730,
    kCMSampleBufferError_RequiredParameterMissing = -12731,
    kCMSampleBufferError_AlreadyHasDataBuffer = -12732,
    kCMSampleBufferError_BufferNotReady = -12733,
    kCMSampleBufferError_SampleIndexOutOfRange = -12734,
    kCMSampleBufferError_BufferHasNoSampleSizes = -12735,
    kCMSampleBufferError_BufferHasNoSampleTimingInfo = -12736,
    kCMSampleBufferError_ArrayTooSmall = -12737,
    kCMSampleBufferError_InvalidEntryCount = -12738,
    kCMSampleBufferError_CannotSubdivide = -12739,
    kCMSampleBufferError_SampleTimingInfoInvalid = -12740,
    kCMSampleBufferError_InvalidMediaTypeForOperation = -12741,
    kCMSampleBufferError_InvalidSampleData = -12742,
    kCMSampleBufferError_InvalidMediaFormat = -12743,
    kCMSampleBufferError_Invalidated = -12744,
    kCMSampleBufferError_DataFailed = -16750,
    kCMSampleBufferError_DataCanceled = -16751,

@Livy: to remove a memory leak, you should add the following before CMVideoFormatDescriptionCreateFromH264ParameterSets:

    if (_formatDesc)
    {
        CFRelease(_formatDesc);
        _formatDesc = NULL;
    }
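In the same spirit, here is a hedged sketch (my own addition, not from the answers above) of how the old decompression session could also be torn down inside createDecompSession before it is overwritten, rather than only setting the pointer to NULL:

    // Sketch only: invalidate and release the old session before creating a new one,
    // since simply assigning _decompressionSession = NULL would leak the old session.
    if (_decompressionSession)
    {
        VTDecompressionSessionInvalidate(_decompressionSession);
        CFRelease(_decompressionSession);
        _decompressionSession = NULL;
    }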