Deep 6-DoF Object Detection and Tracking (2024)

Knowledge of the six-degrees-of-freedom (6-DoF) pose of objects in front of a camera, meaning their rotation and translation in 3D space, is essential for many tasks in augmented reality (AR) and robotic object manipulation. For example, displaying local assistance information in AR glasses during assembly tasks helps workers position workpieces correctly by annotating the field of view in real time with the necessary movements.
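
To make the notion of a 6-DoF pose concrete, the following minimal sketch applies a rotation and a translation to a 3D model point and projects the result into the image with a pinhole camera model. The helper names, the camera intrinsics and the numeric values are illustrative assumptions, not taken from the work described here, and for brevity the rotation is restricted to a single axis, although a full pose has three rotational degrees of freedom.

```python
import math

def rotation_z(angle_rad):
    """3x3 rotation matrix for a rotation about the z-axis (one of three rotational DoF)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transform(R, t, p):
    """Apply the rigid transform x_cam = R * p + t to a 3D model point p."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def project(p_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-space point to pixel coordinates."""
    x, y, z = p_cam
    return (fx * x / z + cx, fy * y / z + cy)

# Hypothetical values: a model point 10 cm along the object's x-axis,
# the object rotated by 90 degrees and placed 1 m in front of the camera.
R = rotation_z(math.pi / 2)
t = [0.0, 0.0, 1.0]
p_cam = transform(R, t, [0.1, 0.0, 0.0])
u, v = project(p_cam, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
```

An AR overlay uses the same chain in every frame: once the pose (R, t) is known, any annotation defined in object coordinates can be drawn at its projected pixel position.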

Multi-Object Tracking

In our research, we focus on the creation of neural networks for multi-object pose estimation from RGB images. A single neural network differentiates between a fixed number of objects and estimates their poses accordingly. In contrast to state-of-the-art procedures, which train an independent network for every object, we need only one network inference regardless of the number of different objects in the image, which reduces the computation time. Furthermore, introducing only a few object-specific extra weights decreases the amount of memory needed to store the detection network. During training, all objects to be found are shown to the network simultaneously, so that it learns to distinguish similar objects by their geometric differences.

We take ideas from neural style transfer and conditional image synthesis and apply them to the task of pose estimation. The pixel-wise object segmentation is decoupled from the feature-point regression that provides the 2D-3D correspondences for pose estimation. Instead, the semantic segmentation guides the estimation in a dedicated feature-point decoder: object-specific (de)normalization parameters transform the feature maps based on the spatial arrangement of the segmentation map.
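
The idea of object-specific (de)normalization can be sketched as follows. This is a simplified illustration in plain Python, not the authors' implementation: each feature channel is normalized over all spatial positions, and then every pixel is scaled and shifted with the parameters of the object class that the segmentation map assigns to it. The function name, the list-based tensor layout and the toy values are assumptions made for this example.

```python
import math

def class_adaptive_denorm(features, seg_map, gamma, beta, eps=1e-5):
    """Normalize each channel, then modulate per pixel by its class.

    features: list of channels, each a 2D list [y][x] of floats
    seg_map:  2D list [y][x] of integer class ids
    gamma, beta: per-class, per-channel scale and shift, indexed [class][channel]
    """
    out = []
    for c, channel in enumerate(features):
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        std = math.sqrt(var + eps)
        out.append([
            [gamma[seg_map[y][x]][c] * (v - mean) / std + beta[seg_map[y][x]][c]
             for x, v in enumerate(row)]
            for y, row in enumerate(channel)
        ])
    return out

# Toy input: one channel, two classes (top row is class 0, bottom row is class 1).
features = [[[1.0, 3.0], [1.0, 3.0]]]
seg_map = [[0, 0], [1, 1]]
gamma = [[1.0], [2.0]]   # hypothetical learned scales
beta = [[0.0], [10.0]]   # hypothetical learned shifts
modulated = class_adaptive_denorm(features, seg_map, gamma, beta)
```

Because only one small scale and shift vector is stored per object, the number of object-specific weights stays low while the rest of the decoder is shared.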

Synthetic Training Data

Training a Convolutional Neural Network (CNN) for object pose estimation requires a large number of images annotated with the associated instance segmentations and object poses. Recording and annotating this training data manually is very time-consuming, inflexible and not feasible in most real-world scenarios. Instead, the 3D models of the objects are used to generate the training data synthetically by means of computer graphics. This makes it easy to generate large datasets whose annotations are exact by construction. We reduce the loss of performance caused by the difference between real and synthetic images, known as the domain gap, by combining domain-randomised images (with diverse, randomised object textures, scene lighting and backgrounds) with near-photorealistic synthetic images.
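
A rough sketch of how such a mixed synthetic dataset could be planned is given below. It only draws the scene parameters; the rendering itself is out of scope. All parameter names, value ranges and the 50/50 mixing ratio are assumptions for illustration and do not describe the actual data generation pipeline.

```python
import random

def sample_randomised_scene(rng, num_textures, num_backgrounds):
    """Draw one hypothetical domain-randomised scene configuration."""
    return {
        "style": "domain_randomised",
        "texture_id": rng.randrange(num_textures),
        "background_id": rng.randrange(num_backgrounds),
        "light_intensity": rng.uniform(0.2, 2.0),
        "light_azimuth_deg": rng.uniform(0.0, 360.0),
        "light_elevation_deg": rng.uniform(10.0, 90.0),
    }

def plan_dataset(num_images, photoreal_fraction, seed=0):
    """Mix near-photorealistic and domain-randomised scene configurations."""
    rng = random.Random(seed)  # fixed seed keeps the plan reproducible
    plan = []
    for _ in range(num_images):
        if rng.random() < photoreal_fraction:
            plan.append({"style": "near_photorealistic"})
        else:
            plan.append(sample_randomised_scene(rng, num_textures=100,
                                                num_backgrounds=500))
    return plan

plan = plan_dataset(num_images=1000, photoreal_fraction=0.5)
```

Since the object pose and the 3D model are known for every rendered image, segmentation masks and pose labels come for free with each entry of the plan.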

Besides using the synthetic datasets for pose estimation of specific objects, we evaluate whether the data is suitable for the more general task of learning generic 2D-3D similarity metrics.

Local Refinement

Especially in AR applications, stable poses and pixel-accurate overlays are essential to maintain the illusion of interacting with virtual objects. A CNN is well suited to providing initial pose estimates, but temporal stability of the tracking in video data is hard for the neural network to achieve directly.

Markerless tracking based on the principle of analysis by synthesis estimates the movement of the component in real time after pose initialization. Here, the distance between the projected edges of the computer-generated model and the image edges is minimised by adjusting the object pose. An Iteratively Reweighted Least Squares (IRLS) optimisation stabilises the procedure in case of occlusions, e.g. by hands. For uniformly coloured (untextured) objects, this matching must be purely geometry-based. Suitable features are the silhouette and geometry edges of the component, which are easily recognisable in the camera image regardless of the lighting situation.
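
The stabilising effect of IRLS can be shown on a much simpler stand-in problem. The sketch below fits a line to points of which one is a gross outlier, playing the role of an edge measurement corrupted by an occluding hand. In the tracking setting the unknowns would be the six pose parameters and the residuals the edge distances; the line model, the Huber weight function and all values here are assumptions chosen for illustration.

```python
def weighted_line_fit(xs, ys, ws):
    """Closed-form weighted least-squares fit of y = a * x + b."""
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = sw * sxx - sx * sx
    a = (sw * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b

def huber_weight(residual, delta):
    """Weight 1 for small residuals, decaying as delta / |r| for large ones."""
    r = abs(residual)
    return 1.0 if r <= delta else delta / r

def irls_line_fit(xs, ys, delta=1.0, iterations=50):
    """Alternate between a weighted fit and re-weighting by the residuals."""
    ws = [1.0] * len(xs)
    a, b = weighted_line_fit(xs, ys, ws)
    for _ in range(iterations):
        ws = [huber_weight(y - (a * x + b), delta) for x, y in zip(xs, ys)]
        a, b = weighted_line_fit(xs, ys, ws)
    return a, b

# Points on y = 2x + 1, with one measurement corrupted by a large error.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[5] += 50.0

a_ls, b_ls = weighted_line_fit(xs, ys, [1.0] * len(xs))  # plain least squares
a_irls, b_irls = irls_line_fit(xs, ys)                   # robust estimate
```

The plain least-squares fit is pulled noticeably towards the outlier, while the reweighted fit stays close to the true line because the corrupted measurement receives a small weight after the first iterations.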

Publications

N. Gard, A. Hilsmann, P. Eisert,
CASAPose: Class-Adaptive and Semantic-Aware Multi-Object Pose Estimation, Proc. 33rd British Machine Vision Conference (BMVC), Nov. 2022.

N. Gard, A. Hilsmann, P. Eisert,
Combining Local and Global Pose Estimation for Precise Tracking of Similar Objects, International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, virtual, Feb. 2022.

Projects

This topic is funded in the projects BIMKIT and digitalTwin.
