Using Vision and RealityKit Rotates Counterclockwise and Distorts (Stretches?) Video
I am attempting to learn object detection in iOS and then mark the place of the detected object. I have the model trained and installed in the project. My next step is to show an AR view on screen, and that is working. When I turn my Vision processing code on via a button, the image on screen ends up rotated and distorted (most likely just stretched due to an inverted axis).
I found a partial tutorial that I was using to help guide me, and the author seems to have run into this issue and solved it, but did not show the solution, and I have no way of contacting them. The author's comment was: "one slightly tricky aspect to this was that the coordinate system returned from Vision was different than SwiftUI's coordinate system (normalized and the y-axis was flipped), but some simple transformations did the trick."
I have no idea which simple transformations those were, but I suspect they were simd-related. If anyone has insight into this rotation and distortion issue, I would appreciate it.
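For reference, Vision reports VNRecognizedObjectObservation.boundingBox in a normalized coordinate space (0...1) with the origin at the bottom-left, while SwiftUI and UIKit put the origin at the top-left. A minimal sketch of that kind of conversion, assuming the camera image fills a view of known size (the helper name and the viewSize parameter are illustrative, and any aspect-fill cropping done by the AR view is ignored):

import Vision
import CoreGraphics

/// Converts a Vision bounding box (normalized, bottom-left origin)
/// into a rect in a top-left-origin view coordinate space.
func convertToViewRect(_ boundingBox: CGRect, viewSize: CGSize) -> CGRect {
    // Scale the normalized rect up to the view's dimensions.
    let scaled = VNImageRectForNormalizedRect(boundingBox,
                                              Int(viewSize.width),
                                              Int(viewSize.height))
    // Flip the y-axis so the rect can be used as a SwiftUI/UIKit frame.
    return CGRect(x: scaled.origin.x,
                  y: viewSize.height - scaled.origin.y - scaled.height,
                  width: scaled.width,
                  height: scaled.height)
}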
I do get errors in the console as soon as Vision starts, with messages similar to these:
2022-05-12 21:14:39.142550-0400 Find My Apple Remote[66143:9990936] [Assets] Resolving material name 'engine:BuiltinRenderGraphResources/AR/arInPlacePostProcessCombinedPermute7.rematerial' as an asset path -- this usage is deprecated; instead provide a valid bundle
2022-05-12 21:14:39.270684-0400 Find My Apple Remote[66143:9991089] [Session] ARSession <0x111743970>: ARSessionDelegate is retaining 11 ARFrames. This can lead to future camera frames being dropped.
2022-05-12 21:14:40.121810-0400 Find My Apple Remote[66143:9991117] [CAMetalLayer nextDrawable] returning nil because allocation failed.
The one that concerns me the most is the last one.
My code, so far, is:
struct ContentView: View {
    @State private var isDetecting = false
    @State private var success = false

    var body: some View {
        VStack {
            RealityKitView(isDetecting: $isDetecting, success: $success)
                .overlay(alignment: .top) {
                    Image(systemName: (success ? "checkmark.circle" : "slash.circle"))
                        .foregroundColor(success ? .green : .red)
                }
            Button {
                isDetecting.toggle()
            } label: {
                Text(isDetecting ? "Stop Detecting" : "Start Detecting")
                    .frame(width: 150, height: 50)
                    .background(
                        Capsule()
                            .fill(isDetecting ? Color.red.opacity(0.5) : Color.green.opacity(0.5))
                    )
            }
        }
    }
}
import SwiftUI
import ARKit
import RealityKit
import Vision

struct RealityKitView: UIViewRepresentable {
    let arView = ARView()
    let scale = SIMD3<Float>(repeating: 0.1)
    let model: VNCoreMLModel? = RealityKitView.returnMLModel()

    @Binding var isDetecting: Bool
    @Binding var success: Bool

    @State var boundingBox: CGRect?

    func makeUIView(context: Context) -> some UIView {
        // Start AR Session
        let session = configureSession()
        // Handle ARSession events via delegate
        session.delegate = context.coordinator
        return arView
    }

    func configureSession() -> ARSession {
        let session = arView.session
        let config = ARWorldTrackingConfiguration()
        config.planeDetection = [.horizontal, .vertical]
        config.environmentTexturing = .automatic
        session.run(config)
        return session
    }

    static func returnMLModel() -> VNCoreMLModel? {
        do {
            let detector = try AppleRemoteDetector()
            let model = try VNCoreMLModel(for: detector.model)
            return model
        } catch {
            print("RealityKitView:returnMLModel failed with error: \(error)")
        }
        return nil
    }

    func updateUIView(_ uiView: UIViewType, context: Context) {}

    func makeCoordinator() -> Coordinator {
        Coordinator(self)
    }

    class Coordinator: NSObject, ARSessionDelegate {
        var parent: RealityKitView

        init(_ parent: RealityKitView) {
            self.parent = parent
        }

        func session(_ session: ARSession, didUpdate frame: ARFrame) {
            // Start vision processing
            if parent.isDetecting {
                guard let model = parent.model else {
                    return
                }
                // I suspect the problem is here, where the image is captured in a buffer and then
                // turned into an input for the CoreML model.
                let pixelBuffer = frame.capturedImage
                let input = AppleRemoteDetectorInput(image: pixelBuffer)
                do {
                    let request = VNCoreMLRequest(model: model) { (request, error) in
                        guard
                            let results = request.results,
                            !results.isEmpty,
                            let recognizedObjectObservation = results as? [VNRecognizedObjectObservation],
                            let first = recognizedObjectObservation.first
                        else {
                            self.parent.boundingBox = nil
                            self.parent.success = false
                            return
                        }
                        self.parent.success = true
                        print("\(first.boundingBox)")
                        self.parent.boundingBox = first.boundingBox
                    }
                    model.featureProvider = input
                    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: CGImagePropertyOrientation.right, options: [:])
                    try handler.perform([request])
                } catch {
                    print(error)
                }
            }
        }
    }
}
Solution 1:[1]
After days of trying to figure this out, with research and more research, I came across this question and answer that provides the solution. Please note that both answers there are valid; it just depends on the structure of your app.
The crux of the issue is that causing a state change in RealityKitView causes the ARView to be re-instantiated. However, this time it is instantiated with a size of zero, and that is what produces the [CAMetalLayer nextDrawable] returning nil because allocation failed error. Initializing the view with a nonzero frame, like this:
let arView = ARView(frame: .init(x: 1, y: 1, width: 1, height: 1), cameraMode: .ar, automaticallyConfigureSession: false)
resolves that issue.
For the sake of those who are attempting this in the future, here is the current working UIViewRepresentable:
import SwiftUI
import ARKit
import RealityKit
import Vision

struct RealityKitView: UIViewRepresentable {
    let arView = ARView(frame: .init(x: 1, y: 1, width: 1, height: 1), cameraMode: .ar, automaticallyConfigureSession: false)

    // Making this implicitly unwrapped. If this fails, the app should crash anyway...
    let model: VNCoreMLModel! = RealityKitView.returnMLModel()

    @Binding var isDetecting: Bool // This turns Vision on and off
    @Binding var success: Bool     // This is the state of Vision's finding the object
    @Binding var message: String   // This allows different messages to be communicated to the user

    @State var boundingBox: CGRect?

    func makeUIView(context: Context) -> some UIView {
        // Start AR Session
        let session = configureSession()
        // Add coaching overlay
        addCoachingOverlay(session: session)
        // Handle ARSession events via delegate
        session.delegate = context.coordinator
        return arView
    }

    func addCoachingOverlay(session: ARSession) {
        let coachingOverlay = ARCoachingOverlayView()
        coachingOverlay.autoresizingMask = [.flexibleWidth, .flexibleHeight]
        coachingOverlay.session = session
        coachingOverlay.goal = .horizontalPlane
        arView.addSubview(coachingOverlay)
    }

    func configureSession() -> ARSession {
        let session = arView.session
        let config = ARWorldTrackingConfiguration()
        config.planeDetection = [.horizontal, .vertical]
        config.environmentTexturing = .automatic
        session.run(config)
        return session
    }

    static func returnMLModel() -> VNCoreMLModel? {
        do {
            let config = MLModelConfiguration()
            config.computeUnits = .all
            let detector = try AppleRemoteDetector(configuration: config)
            let model = try VNCoreMLModel(for: detector.model)
            return model
        } catch {
            print("RealityKitView:returnMLModel failed with error: \(error)")
        }
        return nil
    }

    func updateUIView(_ uiView: UIViewType, context: Context) {}

    func makeCoordinator() -> Coordinator {
        Coordinator(self)
    }

    class Coordinator: NSObject, ARSessionDelegate {
        var parent: RealityKitView

        init(_ parent: RealityKitView) {
            self.parent = parent
        }

        func session(_ session: ARSession, didUpdate frame: ARFrame) {
            if parent.isDetecting {
                // Do not enqueue other buffers for processing while another Vision task is still running.
                // The camera stream has only a finite number of buffers available; holding too many buffers for analysis would starve the camera.
                guard currentBuffer == nil, case .normal = frame.camera.trackingState else {
                    return
                }
                // Retain the image buffer for Vision processing.
                self.currentBuffer = frame.capturedImage
                classifyCurrentImage()
            }
        }

        // MARK: - Vision classification

        // Vision classification request and model
        /// - Tag: ClassificationRequest
        private lazy var classificationRequest: VNCoreMLRequest = {
            // Instantiate the model from its generated Swift class.
            let request = VNCoreMLRequest(model: parent.model, completionHandler: { [weak self] request, error in
                self?.processClassifications(for: request, error: error)
            })
            // Crop input images to a square area at the center, matching the way the ML model was trained.
            request.imageCropAndScaleOption = .scaleFill
            // Use CPU for Vision processing to ensure that there are adequate GPU resources for rendering.
            request.usesCPUOnly = true
            return request
        }()

        // The pixel buffer being held for analysis; used to serialize Vision requests.
        private var currentBuffer: CVPixelBuffer?

        // Queue for dispatching Vision classification requests.
        private let visionQueue = DispatchQueue(label: "com.alelin.Find-My-Apple-Remote.ARKitVision.serialVisionQueue")

        // Run the Vision+ML classifier on the current image buffer.
        /// - Tag: ClassifyCurrentImage
        private func classifyCurrentImage() {
            guard let currentBuffer = currentBuffer else {
                return
            }
            // Most computer vision tasks are not rotation-agnostic, so it is important to pass in the orientation of the image with respect to the device.
            // This uses an extension on CGImagePropertyOrientation (see the sketch after this code).
            let orientation = CGImagePropertyOrientation(UIDevice.current.orientation)
            let input = AppleRemoteDetectorInput(image: currentBuffer)
            parent.model.featureProvider = input
            let requestHandler = VNImageRequestHandler(cvPixelBuffer: currentBuffer, orientation: orientation, options: [:])
            visionQueue.async {
                do {
                    // Release the pixel buffer when done, allowing the next buffer to be processed.
                    defer { self.currentBuffer = nil }
                    try requestHandler.perform([self.classificationRequest])
                } catch {
                    print("Error: Vision request failed with error \"\(error)\"")
                }
            }
        }

        // Handle completion of the Vision request and choose results to display.
        /// - Tag: ProcessClassifications
        func processClassifications(for request: VNRequest, error: Error?) {
            guard
                let results = request.results,
                !results.isEmpty,
                let recognizedObjectObservations = results as? [VNRecognizedObjectObservation],
                let recognizedObjectObservation = recognizedObjectObservations.first,
                let bestResult = recognizedObjectObservation.labels.first(where: { result in result.confidence > 0.5 }),
                let label = bestResult.identifier.split(separator: ",").first
            else {
                self.parent.boundingBox = nil
                self.parent.success = false
                if let error = error {
                    print("Unable to classify image.\n\(error.localizedDescription)")
                }
                return
            }
            self.parent.success = true
            print("\(recognizedObjectObservation.boundingBox)")
            self.parent.boundingBox = recognizedObjectObservation.boundingBox
            // Show a label for the highest-confidence result (but only above a minimum confidence threshold).
            let confidence = String(format: "%.0f", bestResult.confidence * 100)
            let labelString = String(label)
            parent.message = "\(labelString) at \(confidence)"
        }

        func session(_ session: ARSession, didFailWithError error: Error) {
            guard error is ARError else { return }
            let errorWithInfo = error as NSError
            let messages = [
                errorWithInfo.localizedDescription,
                errorWithInfo.localizedFailureReason,
                errorWithInfo.localizedRecoverySuggestion
            ]
            // Filter out optional error messages.
            let errorMessage = messages.compactMap({ $0 }).joined(separator: "\n")
            DispatchQueue.main.async {
                self.parent.message = "The AR session failed with error: \(errorMessage)"
            }
        }
    }
}
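Two notes on wiring this up. First, the ContentView from the question needs a matching @State property (for example, @State private var message = "") passed in as message: $message to satisfy the new binding, and the boundingBox it receives can be converted for display with a helper like the one sketched earlier. Second, CGImagePropertyOrientation(UIDevice.current.orientation) in classifyCurrentImage() is not a framework initializer; it refers to an extension like the one in Apple's ARKit/Vision sample code. If your project does not already include such an extension, a sketch along those lines (assuming the back camera, whose captured image ARKit delivers in landscape orientation) is:

import UIKit
import ImageIO

extension CGImagePropertyOrientation {
    /// Maps the current device orientation to the orientation of the
    /// back camera's captured image for Vision requests.
    init(_ deviceOrientation: UIDeviceOrientation) {
        switch deviceOrientation {
        case .portraitUpsideDown: self = .left
        case .landscapeLeft:      self = .up
        case .landscapeRight:     self = .down
        default:                  self = .right // portrait, faceUp/faceDown, unknown
        }
    }
}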
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Yrb |