
Creating QuickTime Movies in iOS Using AVFoundation

I finally set aside some time to update my iOS app, Wave, and in doing so decided to start converting some of the underlying codebase from Objective-C to Swift. The code that handles converting a recorded session to a QuickTime movie was a good candidate to start with since it is well isolated from the rest of the app. On a more educational note, it also shows how using CGContext effectively can simplify the architecture, reduce complexity, and eliminate duplicated code.

When viewing a session in Wave, it is rendered into the CGContext of the view. To render it into a QuickTime movie, it is instead rendered into a bitmap graphics context (which is just a special type of CGContext) that represents one frame of the movie. That way, all the rendering lives in one place, and I just pass in a different CGContext depending on the render target.
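To make that concrete, here is a minimal sketch of what such a renderer could look like. Only drawInContext(_:), pos, and length are taken from how the renderer is used later in this post; everything else (including how length is determined) is assumed purely for illustration, and Session is the app's own model type.

import CoreGraphics

final class WaveRenderer {
    let session: Session

    /// Current frame index and total number of frames in the session.
    var pos: Int = 0
    var length: Int = 0   // assumed to be derived from the session's contents

    init(session: Session) {
        self.session = session
        // length would be computed from the session here
    }

    /// Draws the frame at `pos` into any CGContext: the view's context when
    /// displaying on screen, or a bitmap context when exporting to video.
    func drawInContext(_ context: CGContext) {
        // Session-specific drawing code goes here
    }
}

With that in mind, let's see how to create the QuickTime movie.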

import Foundation
import AVFoundation

fileprivate let kVideoPixelFormatType = kCVPixelFormatType_32ARGB
fileprivate let kVideoFPS = 60

struct VideoExporter {
    typealias VideoProgressBlock = (Int, Int) -> Void
    typealias VideoCompletionBlock = (URL?, Error?) -> Void
    
    let session: Session
    let renderer: WaveRenderer
    
    init(session: Session) {
        self.session = session
        self.renderer = WaveRenderer(session: session)
    }
    
    func export(size: CGSize, withProgressBlock progressBlock: VideoProgressBlock?, completion: VideoCompletionBlock?) {
        // Export to QuickTime...
    }
}

The public API of the VideoExporter struct is shown above; it can be called to export a session to a QuickTime movie like this:

let session = ...
let exporter = VideoExporter(session: session)
        
DispatchQueue.global(qos: .default).async {
    let videoSize = CGSize(width: 640, height: 480)
    exporter.export(size: videoSize) { (framesCompleted, totalFrames) in
        // Report progress
        debugPrint("\(framesCompleted)/\(totalFrames)")
    } completion: { (fileUrl, error) in
        if let movieUrl = fileUrl {
            debugPrint("Export done: \(movieUrl.absoluteString)")
        } else {
            debugPrint("Export error: ", error?.localizedDescription ?? "unknown error")
        }
    }
}

The export method starts by building a temporary file URL to write the QuickTime movie to:

var fileUrl = URL(fileURLWithPath: NSTemporaryDirectory())
fileUrl.appendPathComponent(UUID().uuidString)
fileUrl.appendPathExtension("mov")

To write a QuickTime file, we need an AVAssetWriter instance, which takes the URL of the file we want to write to and the format of that file. It also needs an input source, an AVAssetWriterInput, which is what actually writes the content into the file. As mentioned at the start, we want to render into a bitmap context backed by a pixel buffer and append that buffer as the frames of the QuickTime movie, so we also need an AVAssetWriterInputPixelBufferAdaptor wrapped around the input.

let options: [String: Any] = [kCVPixelBufferCGImageCompatibilityKey as String: kCFBooleanTrue as Any]
var pixelBufferOut: CVPixelBuffer?
let createStatus = CVPixelBufferCreate(kCFAllocatorDefault, Int(size.width), Int(size.height), kVideoPixelFormatType, options as CFDictionary, &pixelBufferOut)
guard createStatus == kCVReturnSuccess, let pixelBuffer = pixelBufferOut else {
    // Handle error
    return
}

let videoSettings: [String: Any] = [AVVideoCodecKey: AVVideoCodecType.hevc,
                                    AVVideoWidthKey: size.width,
                                    AVVideoHeightKey: size.height
]

let pixelBufferAttr: [String: Any] = [kCVPixelBufferPixelFormatTypeKey as String: kVideoPixelFormatType]
let writerInput = AVAssetWriterInput(mediaType: .video, outputSettings: videoSettings, sourceFormatHint: nil)
let adaptor = AVAssetWriterInputPixelBufferAdaptor(assetWriterInput: writerInput, sourcePixelBufferAttributes: pixelBufferAttr)

let assetWriter = try! AVAssetWriter(url: fileUrl, fileType: .mov)
assetWriter.add(writerInput)
assetWriter.startWriting()
assetWriter.startSession(atSourceTime: .zero)

We’re now ready to render the frames and append them to the asset writer. The bitmap context is created on top of the CVPixelBuffer’s memory, so when we render into the bitmap context, the pixel data is written directly into the buffer that gets appended to the asset writer.

var status = CVPixelBufferLockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: 0))
if status != kCVReturnSuccess {
    // Handle error
}

let pixelData = CVPixelBufferGetBaseAddress(pixelBuffer)
let colorSpace = CGColorSpaceCreateDeviceRGB()
let bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
let bitmapInfo = CGImageAlphaInfo.premultipliedFirst.rawValue
let bitmapContext = CGContext(data: pixelData, width: Int(size.width), height: Int(size.height), bitsPerComponent: 8, bytesPerRow: bytesPerRow, space: colorSpace, bitmapInfo: bitmapInfo)!

let width = CGFloat(bitmapContext.width)
let height = CGFloat(bitmapContext.height)

// Flip the coordinate system for iOS
bitmapContext.translateBy(x: 0.0, y: height)
bitmapContext.scaleBy(x: 1.0, y: -1.0)

// Each 'pos' in the renderer corresponds to 1 frame
while (renderer.pos < renderer.length) {
    bitmapContext.addRect(CGRect(x: 0.0, y: 0.0, width: width, height: height))
    bitmapContext.setFillColor(gray: 0.3, alpha: 1.0)
    bitmapContext.drawPath(using: .fill)
    
    renderer.drawInContext(bitmapContext)
    
    var appendOk = false
    var idx = 0
    while !appendOk && (idx < 1000) {
        if adaptor.assetWriterInput.isReadyForMoreMediaData {
            let frameTime = CMTime(value: CMTimeValue(renderer.pos), timescale: CMTimeScale(kVideoFPS))
            appendOk = adaptor.append(pixelBuffer, withPresentationTime: frameTime)
        } else {
            Thread.sleep(forTimeInterval: 0.01)
        }
        
        idx += 1
    }
    
    if !appendOk {
        debugPrint("Missed rendering frame.")
        // Handle error
    }
    
    renderer.pos += 1
    DispatchQueue.main.async {
        progressBlock?(self.renderer.pos, self.renderer.length)
    }
}

adaptor.assetWriterInput.markAsFinished()
status = CVPixelBufferUnlockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: 0))
if status != kCVReturnSuccess {
    // Handle error
}

When appending the pixel buffer through the adaptor, the presentation time of that frame needs to be provided as well. Here that is the frame number as the value and the FPS (60 frames per second) as the timescale. In other words, this specifies where a particular frame lands on the video’s timeline (frame number 30, for example, will be 0.5 seconds in). Furthermore, the asset writer input may not be ready for more media data the moment we ask for it, so we check its isReadyForMoreMediaData property in a loop (retrying for up to roughly ten seconds) before appending the pixel buffer with the adaptor.
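As a quick sanity check on that arithmetic, CMTime makes the relationship easy to verify:

import CoreMedia

// Frame 30 at a timescale of 60 frames per second is half a second in.
let frameTime = CMTime(value: 30, timescale: 60)
debugPrint(frameTime.seconds)   // 0.5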

All of the frames have now been appended, and the QuickTime movie is almost ready. The final step is to let the asset writer know that we’re done and, once it has finished writing, call the completion block:

assetWriter.finishWriting {
    switch assetWriter.status {
    case .completed:
        DispatchQueue.main.async {
            completion?(assetWriter.outputURL, nil)
        }
    case .failed:
        // Handle error
        DispatchQueue.main.async {
            completion?(nil, assetWriter.error)
        }
    default:
        // Handle other cases
        break
    }
}