{"id":175,"date":"2023-05-08T20:59:32","date_gmt":"2023-05-08T20:59:32","guid":{"rendered":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/?page_id=175"},"modified":"2023-12-19T00:03:05","modified_gmt":"2023-12-19T00:03:05","slug":"fall","status":"publish","type":"page","link":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/fall\/","title":{"rendered":"Fall 2023"},"content":{"rendered":"\n<h1 class=\"wp-block-heading\">Overview<\/h1>\n\n\n\n<p>Our <strong>previous approach<\/strong> to depth estimation contains three main submodules: unary feature extraction, 3D spherical sweeping, and cost volume computation. We experimented with optimizing the 3D volume computation network and with different <strong>CNN<\/strong> models for extracting the unary features, such as deformable or spherical convolutions, but the <strong>improvements<\/strong> were <strong>minimal<\/strong>. We therefore transitioned to a recent <strong>foundation model<\/strong> with a <strong>transformer<\/strong>-based architecture, <strong>DINOv2<\/strong> [1], which enables us to leverage pre-trained models to achieve greater efficiency and potentially improved performance.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">About DINO<\/h2>\n\n\n\n<p>DINOv2 employs self-supervised learning to train a <strong>Vision Transformer<\/strong> on a curated dataset of images.
This process produces versatile visual features that can be applied across different image tasks without fine-tuning.<\/p>\n\n\n\n<p class=\"has-text-align-center\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/SKc90ylvo2pz2aVQ4ozIhMYjJxRIbp4BSMsIqs_kqGYcxT4Yx_ugy-Sks_-VX9DB_8PfwiDrdcWFQn9v3hxWlCw_htft-Ug3SzV-NMtuhyKFOjLjA5d9ZXVroUEgJIlJQEOKnS-fll0IYtFuz_qxoW15bw=nw\" style=\"width: 600px\"><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">DINO &#8211; Depth Estimation decoder<\/h2>\n\n\n\n<p>For <strong>depth estimation<\/strong>, DINOv2 employs a <strong>DPT decoder head<\/strong>, which allows for detailed and dense depth predictions. Specifically, DPT [5] first reassembles tokens from multiple transformer stages into image-like representations at different scales, then applies fusion modules that progressively merge these representations to predict depth. DINOv2 also provides simpler linear decoders, but the DPT head is our primary focus due to its better performance.<\/p>\n\n\n\n<p class=\"has-text-align-center\"><img decoding=\"async\" width=\"743px;\" height=\"295px;\" src=\"https:\/\/lh7-us.googleusercontent.com\/yLXYntRhsiBPe4o6IfdHPR2W-Q52-BcQukW5AeZ0rcxIc17Op-TMdcQvHEeBuhrfM012SH6NEMRJy6P7IVgIG6uUyB0biqIsvyl5_PbPhLY5MJpW4GCTfwf0_xlcU_jvdylGMH2xuncHLup4L3srJ40Smg=nw\"><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">DINO &#8211; Depth Estimation Results<\/h2>\n\n\n\n<p>The table below reports depth estimation results. The methods are evaluated with the <strong>RMSE metric<\/strong> across three datasets; lower values indicate better performance.
DINOv2 with the DPT head outperforms the listed state-of-the-art self-supervised learning models.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"356\" src=\"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-content\/uploads\/sites\/93\/2023\/12\/image-8-1024x356.png\" alt=\"\" class=\"wp-image-232\" srcset=\"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-content\/uploads\/sites\/93\/2023\/12\/image-8-1024x356.png 1024w, https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-content\/uploads\/sites\/93\/2023\/12\/image-8-300x104.png 300w, https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-content\/uploads\/sites\/93\/2023\/12\/image-8-768x267.png 768w, https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-content\/uploads\/sites\/93\/2023\/12\/image-8.png 1218w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">DINO with fisheye<\/h2>\n\n\n\n<p>But the question remains \u2013 how does <strong>DINOv2<\/strong> perform <strong>with<\/strong> <strong>distorted images<\/strong>? As shown in the example below, DINOv2 gives a <strong>coarse depth prediction<\/strong> but misses depth information on finer structures and curved surfaces, such as the edge of the bench.
We therefore also need to investigate ways to make DINOv2 more distortion-invariant.<\/p>\n\n\n\n<p class=\"has-text-align-center\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/HNXU5-tCmzGf5YphCusVeQQyUpgqRc1dyckSDGY98ml82083sOSDM_nIB40PAkGOVWot4xtlMHF7GaybHSDXhvAaExBJ8I-Hi6LK6W64JhfRGIyPbi56nsBSskRPmeKNTlCPEGO3wC5yfme_VLINCs-j6w=nw\" style=\"width: 260px\"><img decoding=\"async\" style=\"width: 260px\" src=\"https:\/\/lh7-us.googleusercontent.com\/4Anc8h7NbsPdujz12yNOCVwbdeCMJvvlSLPc-KX05K6LZ__NT-_CcjB5CuK3Cl64ODFpq0g79CbyBRhf60zr-ZtwtEbVez9YZy_psNZQlDUp4Mg9J1Z91DSGaYKNthvZ5WZjgDHb6-iFVThugCCYPrGc8Q=nw\"><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">DarSwin &#8211; Polar Patch Partition<\/h2>\n\n\n\n<p>Recent research like DarSwin [2] offers insights into <strong>adapting transformers for distorted images<\/strong>. Unlike standard patch embedding, DarSwin employs a <strong>distortion-aware Polar Patch Partition<\/strong> that follows the geometry of the fisheye image. It splits the image equiangularly in the azimuth direction, while the radial split follows the lens distortion curve, accommodating the inherent distortion of fisheye lenses.<\/p>\n\n\n\n<p class=\"has-text-align-center\"><img decoding=\"async\" style=\"width: 260px\" src=\"https:\/\/lh7-us.googleusercontent.com\/_LyGWNN02C-2_DtHfkNWhSYGhXA87U7FNuX6_BYV6HMhrOXPakrixFMt4iuuvz0b8d0wKENq15GnpaMF321lQp-0I21i4F3fGJbluu-qJyE90Y27SfMsXQ9xmvtI3bc9JksKghCJ_EeB9FLwwuBtBKoJUA=nw\"><img decoding=\"async\" style=\"width: 260px\" src=\"https:\/\/lh7-us.googleusercontent.com\/N3Y3jmUVhxf9y6uTZUPA8C6iXRMeGbisgceUF-qTSnn9KHGCNypuWRWgpkJstP5WYVraVMiKQiS4n7fxrE0nTevrtG47YWHY_lBpPIWUUyK9ywdV1Ur7x3v9uuUzQj47FMHymQYBTtikurT7f2FsjVXrqQ=nw\"><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">DarSwin &#8211; Distortion-aware Sampling<\/h2>\n\n\n\n<p>Furthermore, DarSwin samples the same number of points within each patch and feeds them into the patch embedding layers.
As a result, the DarSwin model explicitly accounts for the fisheye distortion, leading to more accurate depth estimation. We leverage the same <strong>annular patching method<\/strong> to process the raw fisheye image directly instead of relying on square patches as in DINOv2.<\/p>\n\n\n\n<p><img decoding=\"async\" width=\"699px;\" height=\"227px;\" src=\"https:\/\/lh7-us.googleusercontent.com\/aoIDxgv997q6-7GiuSlI_wjXqx0hT-hzwFCv6uqZBaRK4udz_WYwCoOBFhHoExSrlpAOxwYaM-hcH_lWDSgFkwFJeOMu6O__0Kalp5iduaHwSsidyt4Mc-zFm2SPMsOoxJbPaQyma5Sz0g8YS-kmlX5ktg=nw\"><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Distortion-aware Patch Embeddings<\/h2>\n\n\n\n<p>Once we apply the annular sector-based patching method to the input fisheye image, we propose to use <strong>different linear projections<\/strong> for each set of distortion levels. The original ViT (and DINO) applies a single linear projection to all the square patches to obtain the individual visual tokens. However, given that the level of distortion changes radially in a fisheye image (increasing from 1 to 4 in the example below), we apply a different learned linear projection for <strong>each of the distortion levels<\/strong>. The idea is that these learned projections also learn how to deal with the image distortion at those levels.
Once we get these <strong>distortion-aware patches,<\/strong> we then add polar positional embeddings which are then fed to the attention mechanism like before.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"961\" height=\"378\" src=\"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-content\/uploads\/sites\/93\/2023\/12\/image-1.png\" alt=\"\" class=\"wp-image-198\" srcset=\"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-content\/uploads\/sites\/93\/2023\/12\/image-1.png 961w, https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-content\/uploads\/sites\/93\/2023\/12\/image-1-300x118.png 300w, https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-content\/uploads\/sites\/93\/2023\/12\/image-1-768x302.png 768w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/figure>\n\n\n\n<p>Since the pixel distribution changes radially in fisheye images, we leverage <strong>polar embeddings<\/strong> from the SPE module [3] that rely on the sampled polar point (r, theta) instead of the traditional cosine or RelPos embeddings.<\/p>\n\n\n\n<p class=\"has-text-align-center\"><img decoding=\"async\" width=\"716px;\" height=\"262px;\" src=\"https:\/\/lh7-us.googleusercontent.com\/hHmhIP3xeOFFuNJcdIooCWN3glQM9yyeHXWPzq7d1OfD8K1GbaJao8yzobX0E-2ti3b6EcRx3B0LLp4rIilyGEJjk60XPNceZiTZEV-vNz2e0ZYqy02YN3gOo1O6QGJPiJlfhyr66lIkT3yDTcZ5Ar_lvw=s2048\"><\/p>\n\n\n\n<p>The polar positional embeddings can be expressed as below where (<strong>r, theta<\/strong>) represents the position of each sampling point. 
The sine and cosine functions in the i-th dimension of the position encoding are modulated by a <strong>frequency<\/strong> parameter, and the <strong>power <\/strong>parameter is used to adjust the <strong>expressivity <\/strong>of the position encoding.<\/p>\n\n\n\n<p class=\"has-text-align-center\"><img decoding=\"async\" width=\"468px;\" height=\"58px;\" src=\"https:\/\/lh7-us.googleusercontent.com\/1R_H5Yr8aLiXqoVlOncWvOps-oCirfftnHecvi0NsJ_0CEmfmDdXxfiz__EiUob5h_9PuoGghbEWI-qNrHvtMxGTF-1_EyeIKbF8tYqRqlqkG14j2x-F0X9BxCiJByn7y4FnI5jLgfUYNZyLDqI39OJ6JA=s2048\"><\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Overall architecture<\/h1>\n\n\n\n<p>The overall architecture of the encoder, from the annular patching approach, to the distortion-aware patch embedding, to the polar positional encodings that yield the visual tokens, is shown below. These tokens are then fed to the DPT decoder to get the <strong>corresponding depth map <\/strong>in the fisheye space.<\/p>\n\n\n\n<p class=\"has-text-align-center\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/PRg65bUSqtcXE1TIJPHS_2LikAS8vrvvRxRKWbWbecXr2M5cJZN1w_9byPzuZd7dCDeB-q1ZjLWhoGfypBpYBZmkqIehOvKxFq6WeRmk-eEvuB4tYGVZCByeechBijHvmmZj28ZR_NWlegNKIKKrCIc1Tg=s2048\" style=\"width: 700px\"><\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Results<\/h1>\n\n\n\n<p>We first compare the performance of the DINO approach against our baseline <strong>NDDepth <\/strong>[4] on <strong>pinhole <\/strong>images and then evaluate its performance on fisheye images directly. Below, we present some representative results in an indoor environment (house) with the six images collected from AirSim (<strong>Back, Bottom, Front, Left, Right, Top<\/strong>) and the respective depth estimation results from DINO and NDDepth.
It can be observed that DINO results better capture <strong>fine-grained features<\/strong> from the input pinhole image than the baseline.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"882\" height=\"454\" src=\"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-content\/uploads\/sites\/93\/2023\/12\/image-3.png\" alt=\"\" class=\"wp-image-208\" style=\"width:675px;height:auto\" srcset=\"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-content\/uploads\/sites\/93\/2023\/12\/image-3.png 882w, https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-content\/uploads\/sites\/93\/2023\/12\/image-3-300x154.png 300w, https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-content\/uploads\/sites\/93\/2023\/12\/image-3-768x395.png 768w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/figure>\n\n\n\n<p>Next, we evaluate the performance of the DINO approach on a set of environments collected in AirSim and present their relative ground truth depths. 
The metrics show that DINO suffers from a higher error in fisheye space than in the comparatively easier pinhole-space depth estimation.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"414\" height=\"412\" src=\"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-content\/uploads\/sites\/93\/2023\/12\/image-4.png\" alt=\"\" class=\"wp-image-211\" style=\"width:539px;height:auto\" srcset=\"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-content\/uploads\/sites\/93\/2023\/12\/image-4.png 414w, https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-content\/uploads\/sites\/93\/2023\/12\/image-4-300x300.png 300w, https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-content\/uploads\/sites\/93\/2023\/12\/image-4-150x150.png 150w, https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-content\/uploads\/sites\/93\/2023\/12\/image-4-100x100.png 100w\" sizes=\"auto, (max-width: 414px) 100vw, 414px\" \/><\/figure>\n<\/div>\n\n\n<p>The <strong>house <\/strong>environment is indoors, <strong>Gascola <\/strong>models an outdoor in-the-wild space, and <strong>Victorian <\/strong>street models a more urban outdoor landscape.
These three environments were selected to test performance across a <strong>range of environment settings<\/strong>.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"620\" height=\"310\" src=\"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-content\/uploads\/sites\/93\/2023\/12\/image-7.png\" alt=\"\" class=\"wp-image-214\" style=\"width:673px;height:auto\" srcset=\"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-content\/uploads\/sites\/93\/2023\/12\/image-7.png 620w, https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-content\/uploads\/sites\/93\/2023\/12\/image-7-300x150.png 300w\" sizes=\"auto, (max-width: 620px) 100vw, 620px\" \/><\/figure>\n\n\n\n<p>Note: The reported depth values were clipped to the range <em>(0.001, 1.0)<\/em>.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">References<\/h1>\n\n\n\n<p>[1] Oquab, M., et al. (2023). DINOv2: Learning Robust Visual Features without Supervision.<\/p>\n\n\n\n<p>[2] Athwale, A., et al. (2023). DarSwin: Distortion Aware Radial Swin Transformer. IEEE\/CVF International Conference on Computer Vision (ICCV).<\/p>\n\n\n\n<p>[3] Yang, D., et al. (2023). Sector Patch Embedding: An Embedding Module Conforming to The Distortion Pattern of Fisheye Image.<\/p>\n\n\n\n<p>[4] Shao, S., et al. (2023). NDDepth: Normal-Distance Assisted Monocular Depth Estimation and Completion. IEEE\/CVF International Conference on Computer Vision (ICCV).<\/p>\n\n\n\n<p>[5] Ranftl, R., et al. (2021). Vision Transformers for Dense Prediction. IEEE\/CVF International Conference on Computer Vision (ICCV).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Overview Our previous approach to depth estimation contains three main submodules: unary feature extraction, 3D spherical sweeping, and cost volume computation.
We have done experiments to optimize the 3D volume computation network and choose different CNN models to extract the unary features, such as deformable, or spherical convolutions, but the improvements were very minimal.&nbsp;Thus we &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/fall\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Fall 2023&#8221;<\/span><\/a><\/p>\n","protected":false},"author":183,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-175","page","type-page","status-publish","hentry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Fall 2023 - Multi-view robot perception on Edge Devices<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/fall\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Fall 2023 - Multi-view robot perception on Edge Devices\" \/>\n<meta property=\"og:description\" content=\"Overview Our previous approach to depth estimation contains three main submodules: unary feature extraction, 3D spherical sweeping, and cost volume computation. 
We have done experiments to optimize the 3D volume computation network and choose different CNN models to extract the unary features, such as deformable, or spherical convolutions, but the improvements were very minimal.&nbsp;Thus we &hellip; Continue reading &quot;Fall 2023&quot;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/fall\/\" \/>\n<meta property=\"og:site_name\" content=\"Multi-view robot perception on Edge Devices\" \/>\n<meta property=\"article:modified_time\" content=\"2023-12-19T00:03:05+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/lh7-us.googleusercontent.com\/SKc90ylvo2pz2aVQ4ozIhMYjJxRIbp4BSMsIqs_kqGYcxT4Yx_ugy-Sks_-VX9DB_8PfwiDrdcWFQn9v3hxWlCw_htft-Ug3SzV-NMtuhyKFOjLjA5d9ZXVroUEgJIlJQEOKnS-fll0IYtFuz_qxoW15bw=nw\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"8 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/f23team16\\\/fall\\\/\",\"url\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/f23team16\\\/fall\\\/\",\"name\":\"Fall 2023 - Multi-view robot perception on Edge 
Devices\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/f23team16\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/f23team16\\\/fall\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/f23team16\\\/fall\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/lh7-us.googleusercontent.com\\\/SKc90ylvo2pz2aVQ4ozIhMYjJxRIbp4BSMsIqs_kqGYcxT4Yx_ugy-Sks_-VX9DB_8PfwiDrdcWFQn9v3hxWlCw_htft-Ug3SzV-NMtuhyKFOjLjA5d9ZXVroUEgJIlJQEOKnS-fll0IYtFuz_qxoW15bw=nw\",\"datePublished\":\"2023-05-08T20:59:32+00:00\",\"dateModified\":\"2023-12-19T00:03:05+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/f23team16\\\/fall\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/f23team16\\\/fall\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/f23team16\\\/fall\\\/#primaryimage\",\"url\":\"https:\\\/\\\/lh7-us.googleusercontent.com\\\/SKc90ylvo2pz2aVQ4ozIhMYjJxRIbp4BSMsIqs_kqGYcxT4Yx_ugy-Sks_-VX9DB_8PfwiDrdcWFQn9v3hxWlCw_htft-Ug3SzV-NMtuhyKFOjLjA5d9ZXVroUEgJIlJQEOKnS-fll0IYtFuz_qxoW15bw=nw\",\"contentUrl\":\"https:\\\/\\\/lh7-us.googleusercontent.com\\\/SKc90ylvo2pz2aVQ4ozIhMYjJxRIbp4BSMsIqs_kqGYcxT4Yx_ugy-Sks_-VX9DB_8PfwiDrdcWFQn9v3hxWlCw_htft-Ug3SzV-NMtuhyKFOjLjA5d9ZXVroUEgJIlJQEOKnS-fll0IYtFuz_qxoW15bw=nw\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/f23team16\\\/fall\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/f23team16\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Fall 2023\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/f23team16\\\/#website\",\"url\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/f23team16\\\/\",\"name\":\"Multi-view robot 
perception on Edge Devices\",\"description\":\"CMU MSCV &#039;23 capstone project\",\"publisher\":{\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/f23team16\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/f23team16\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/f23team16\\\/#organization\",\"name\":\"Multi-view robot perception on Edge Devices\",\"url\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/f23team16\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/f23team16\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/f23team16\\\/wp-content\\\/uploads\\\/sites\\\/93\\\/2023\\\/05\\\/cropped-cropped-icon.png\",\"contentUrl\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/f23team16\\\/wp-content\\\/uploads\\\/sites\\\/93\\\/2023\\\/05\\\/cropped-cropped-icon.png\",\"width\":250,\"height\":250,\"caption\":\"Multi-view robot perception on Edge Devices\"},\"image\":{\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/f23team16\\\/#\\\/schema\\\/logo\\\/image\\\/\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Fall 2023 - Multi-view robot perception on Edge Devices","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/fall\/","og_locale":"en_US","og_type":"article","og_title":"Fall 2023 - Multi-view robot perception on Edge Devices","og_description":"Overview Our previous approach to depth estimation contains three main submodules: unary feature extraction, 3D spherical sweeping, and cost volume computation. We have done experiments to optimize the 3D volume computation network and choose different CNN models to extract the unary features, such as deformable, or spherical convolutions, but the improvements were very minimal.&nbsp;Thus we &hellip; Continue reading \"Fall 2023\"","og_url":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/fall\/","og_site_name":"Multi-view robot perception on Edge Devices","article_modified_time":"2023-12-19T00:03:05+00:00","og_image":[{"url":"https:\/\/lh7-us.googleusercontent.com\/SKc90ylvo2pz2aVQ4ozIhMYjJxRIbp4BSMsIqs_kqGYcxT4Yx_ugy-Sks_-VX9DB_8PfwiDrdcWFQn9v3hxWlCw_htft-Ug3SzV-NMtuhyKFOjLjA5d9ZXVroUEgJIlJQEOKnS-fll0IYtFuz_qxoW15bw=nw","type":"","width":"","height":""}],"twitter_card":"summary_large_image","twitter_misc":{"Est. 
reading time":"8 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/fall\/","url":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/fall\/","name":"Fall 2023 - Multi-view robot perception on Edge Devices","isPartOf":{"@id":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/#website"},"primaryImageOfPage":{"@id":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/fall\/#primaryimage"},"image":{"@id":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/fall\/#primaryimage"},"thumbnailUrl":"https:\/\/lh7-us.googleusercontent.com\/SKc90ylvo2pz2aVQ4ozIhMYjJxRIbp4BSMsIqs_kqGYcxT4Yx_ugy-Sks_-VX9DB_8PfwiDrdcWFQn9v3hxWlCw_htft-Ug3SzV-NMtuhyKFOjLjA5d9ZXVroUEgJIlJQEOKnS-fll0IYtFuz_qxoW15bw=nw","datePublished":"2023-05-08T20:59:32+00:00","dateModified":"2023-12-19T00:03:05+00:00","breadcrumb":{"@id":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/fall\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/fall\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/fall\/#primaryimage","url":"https:\/\/lh7-us.googleusercontent.com\/SKc90ylvo2pz2aVQ4ozIhMYjJxRIbp4BSMsIqs_kqGYcxT4Yx_ugy-Sks_-VX9DB_8PfwiDrdcWFQn9v3hxWlCw_htft-Ug3SzV-NMtuhyKFOjLjA5d9ZXVroUEgJIlJQEOKnS-fll0IYtFuz_qxoW15bw=nw","contentUrl":"https:\/\/lh7-us.googleusercontent.com\/SKc90ylvo2pz2aVQ4ozIhMYjJxRIbp4BSMsIqs_kqGYcxT4Yx_ugy-Sks_-VX9DB_8PfwiDrdcWFQn9v3hxWlCw_htft-Ug3SzV-NMtuhyKFOjLjA5d9ZXVroUEgJIlJQEOKnS-fll0IYtFuz_qxoW15bw=nw"},{"@type":"BreadcrumbList","@id":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/fall\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/"},{"@type":"ListItem","position":2,"name":"Fall 
2023"}]},{"@type":"WebSite","@id":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/#website","url":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/","name":"Multi-view robot perception on Edge Devices","description":"CMU MSCV &#039;23 capstone project","publisher":{"@id":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/#organization","name":"Multi-view robot perception on Edge Devices","url":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/#\/schema\/logo\/image\/","url":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-content\/uploads\/sites\/93\/2023\/05\/cropped-cropped-icon.png","contentUrl":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-content\/uploads\/sites\/93\/2023\/05\/cropped-cropped-icon.png","width":250,"height":250,"caption":"Multi-view robot perception on Edge 
Devices"},"image":{"@id":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/#\/schema\/logo\/image\/"}}]}},"_links":{"self":[{"href":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-json\/wp\/v2\/pages\/175","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-json\/wp\/v2\/users\/183"}],"replies":[{"embeddable":true,"href":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-json\/wp\/v2\/comments?post=175"}],"version-history":[{"count":18,"href":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-json\/wp\/v2\/pages\/175\/revisions"}],"predecessor-version":[{"id":233,"href":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-json\/wp\/v2\/pages\/175\/revisions\/233"}],"wp:attachment":[{"href":"https:\/\/mscvprojects.ri.cmu.edu\/f23team16\/wp-json\/wp\/v2\/media?parent=175"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}