{"id":54,"date":"2022-12-19T07:08:35","date_gmt":"2022-12-19T07:08:35","guid":{"rendered":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/?page_id=54"},"modified":"2023-04-30T05:08:46","modified_gmt":"2023-04-30T05:08:46","slug":"calibration","status":"publish","type":"page","link":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/calibration\/","title":{"rendered":"Calibration"},"content":{"rendered":"\n<p><\/p>\n\n\n\n<p><strong>Motivation<\/strong>: To investigate how model architecture affects the distribution of predicted confidence, we measure how well different models are calibrated on the detection task.<\/p>\n\n\n\n<p><strong>Dataset<\/strong>: COCO 2017, 2D detection.<\/p>\n\n\n\n<p><strong>Metrics<\/strong>: mAP, <a href=\"https:\/\/jamesmccaffrey.wordpress.com\/2021\/01\/22\/how-to-calculate-expected-calibration-error-for-multi-class-classification\/\">ECE<\/a>, <a href=\"https:\/\/arxiv.org\/abs\/2004.13546\">D-ECE<\/a>, Brier score<\/p>\n\n\n\n<p><strong>Candidates<\/strong>: GLIP <strong>(vision-language model, zero-shot)<\/strong>, CenterNet-V1, CenterNet-V2, FCOS-R-50, FCOS-X-101, Faster R-CNN, RetinaNet<\/p>\n\n\n\n<p><strong>Calibration methods<\/strong>: beta calibration and histogram binning<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><strong><strong>Before<\/strong> calibration<\/strong>: <\/p>\n\n\n\n<p>We first compare the performance of the different models. The <a href=\"https:\/\/docs.google.com\/spreadsheets\/d\/1UnrAHv8U9zVBkntdswDuM8Bae5ULVfBFRCa01tNiFVY\/edit?usp=sharing\">results<\/a> show the following observations:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>GLIP is extremely overconfident for P &lt; 0.7 and slightly underconfident for P &gt; 0.7<\/li>\n\n\n\n<li>CenterNet is perfectly calibrated at low P and becomes increasingly underconfident as P increases<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Faster R-CNN tends to predict extreme P (0 or 1), which might have to do with its two-stage structure (the RPN objectness confidence score is disregarded). 
In addition, D-ECE is sensitive to the box position in the image.<\/li>\n\n\n\n<li>All methods are overconfident at low P and underconfident at high P <strong>(except for the two-stage Faster R-CNN!)<\/strong><\/li>\n\n\n\n<li>CenterNet2 (probability-aware) and RetinaNet (<a href=\"https:\/\/proceedings.neurips.cc\/paper\/2020\/file\/aeb7b30ef1d024a76f21a1d40e30c302-Paper.pdf\">trained with focal loss<\/a>) are the best calibrated in general<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large is-style-default\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"409\" src=\"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-154825@2x-1024x409.png\" alt=\"\" class=\"wp-image-114\" srcset=\"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-154825@2x-1024x409.png 1024w, https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-154825@2x-300x120.png 300w, https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-154825@2x-768x307.png 768w, https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-154825@2x-1536x614.png 1536w, https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-154825@2x.png 1586w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><figcaption class=\"wp-element-caption\">Confidence histogram and reliability diagram of Faster R-CNN and RetinaNet before calibration.<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large is-style-default\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"343\" src=\"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-133711@2x-1024x343.png\" alt=\"\" class=\"wp-image-106\" 
srcset=\"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-133711@2x-1024x343.png 1024w, https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-133711@2x-300x101.png 300w, https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-133711@2x-768x257.png 768w, https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-133711@2x-1536x515.png 1536w, https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-133711@2x.png 1760w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><figcaption class=\"wp-element-caption\">Difference between confidence and performance in each region of the image for Faster R-CNN before calibration.<\/figcaption><\/figure>\n\n\n\n<p><strong><strong>After<\/strong> calibration<\/strong>: <\/p>\n\n\n\n<p>Surprisingly, <strong>after<\/strong> calibration, the <a href=\"https:\/\/docs.google.com\/spreadsheets\/d\/18nw9VsB1xo3kQrePKH9uwn7PKyeh292bZSg4WG6HhQo\/edit?usp=sharing\">results<\/a> show that GLIP achieves <strong>the best performance<\/strong> in both mAP (detection accuracy) and ECE (uncertainty measurement). This suggests that:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>1. Simple classification calibration methods can already achieve relatively good results on detection.&nbsp;<\/li>\n\n\n\n<li>2. 
Large-scale pre-training is potentially useful for uncertainty awareness, since the model knows more about \u2018what is certain and what is uncertain&#8217; in a more realistic setting.&nbsp;<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large is-style-default\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"440\" src=\"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-134126@2x-1024x440.png\" alt=\"\" class=\"wp-image-108\" srcset=\"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-134126@2x-1024x440.png 1024w, https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-134126@2x-300x129.png 300w, https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-134126@2x-768x330.png 768w, https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-134126@2x-1536x660.png 1536w, https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-134126@2x.png 1824w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><figcaption class=\"wp-element-caption\">Visualization of GLIP before and after simple calibration.<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large is-style-default\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"345\" src=\"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-134410@2x-1024x345.png\" alt=\"\" class=\"wp-image-111\" srcset=\"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-134410@2x-1024x345.png 1024w, https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-134410@2x-300x101.png 300w, 
https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-134410@2x-768x259.png 768w, https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-134410@2x-1536x517.png 1536w, https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-134410@2x.png 1704w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><figcaption class=\"wp-element-caption\">Difference between confidence and performance in each region of the image for GLIP after calibration.<\/figcaption><\/figure>\n\n\n\n<p> <\/p>\n\n\n\n<p> <\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Motivation: In order to investigate how model architecture might affect predicted confidence distribution, we propose to measure how different models are calibrated on detection task. Dataset: COCO 2017, 2D detection. Metric: mAP, ECE, D-ECE, Brier score Candidate: GLIP(Vision-language model, zero shot), CenterNet-V1, CenterNet-V2, FCOS-R-50, FCOS-X-101, Faster-Rcnn, RetinaNet Calibration method: beta calibration and histogram binning Before &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/calibration\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Calibration&#8221;<\/span><\/a><\/p>\n","protected":false},"author":154,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-54","page","type-page","status-publish","hentry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Calibration - Uncertainty in Image classification<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link 
rel=\"canonical\" href=\"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/calibration\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Calibration - Uncertainty in Image classification\" \/>\n<meta property=\"og:description\" content=\"Motivation: In order to investigate how model architecture might affect predicted confidence distribution, we propose to measure how different models are calibrated on detection task. Dataset: COCO 2017, 2D detection. Metric: mAP, ECE, D-ECE, Brier score Candidate: GLIP(Vision-language model, zero shot), CenterNet-V1, CenterNet-V2, FCOS-R-50, FCOS-X-101, Faster-Rcnn, RetinaNet Calibration method: beta calibration and histogram binning Before &hellip; Continue reading &quot;Calibration&quot;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/calibration\/\" \/>\n<meta property=\"og:site_name\" content=\"Uncertainty in Image classification\" \/>\n<meta property=\"article:modified_time\" content=\"2023-04-30T05:08:46+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-154825@2x-1024x409.png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2023team6\\\/calibration\\\/\",\"url\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2023team6\\\/calibration\\\/\",\"name\":\"Calibration - Uncertainty in Image classification\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2023team6\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2023team6\\\/calibration\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2023team6\\\/calibration\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2023team6\\\/wp-content\\\/uploads\\\/sites\\\/77\\\/2022\\\/12\\\/WX20221219-154825@2x-1024x409.png\",\"datePublished\":\"2022-12-19T07:08:35+00:00\",\"dateModified\":\"2023-04-30T05:08:46+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2023team6\\\/calibration\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2023team6\\\/calibration\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2023team6\\\/calibration\\\/#primaryimage\",\"url\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2023team6\\\/wp-content\\\/uploads\\\/sites\\\/77\\\/2022\\\/12\\\/WX20221219-154825@2x.png\",\"contentUrl\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2023team6\\\/wp-content\\\/uploads\\\/sites\\\/77\\\/2022\\\/12\\\/WX20221219-154825@2x.png\",\"width\":1586,\"height\":634},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2023team6\\\/calibration\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/ms
cvprojects.ri.cmu.edu\\\/2023team6\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Calibration\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2023team6\\\/#website\",\"url\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2023team6\\\/\",\"name\":\"Uncertainty in Image classification\",\"description\":\"Students: Jia Shi | Advisors: Deva Ramanan (CMU), Shu Kong (CMU, TAMU), Francesco Ferroni (Argo AI), Arun Balajee Vasudevan (CMU) | Sponsor: Argo AI\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2023team6\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Calibration - Uncertainty in Image classification","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/calibration\/","og_locale":"en_US","og_type":"article","og_title":"Calibration - Uncertainty in Image classification","og_description":"Motivation: In order to investigate how model architecture might affect predicted confidence distribution, we propose to measure how different models are calibrated on detection task. Dataset: COCO 2017, 2D detection. 
Metric: mAP, ECE, D-ECE, Brier score Candidate: GLIP(Vision-language model, zero shot), CenterNet-V1, CenterNet-V2, FCOS-R-50, FCOS-X-101, Faster-Rcnn, RetinaNet Calibration method: beta calibration and histogram binning Before &hellip; Continue reading \"Calibration\"","og_url":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/calibration\/","og_site_name":"Uncertainty in Image classification","article_modified_time":"2023-04-30T05:08:46+00:00","og_image":[{"url":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-154825@2x-1024x409.png","type":"","width":"","height":""}],"twitter_card":"summary_large_image","twitter_misc":{"Est. reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/calibration\/","url":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/calibration\/","name":"Calibration - Uncertainty in Image classification","isPartOf":{"@id":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/#website"},"primaryImageOfPage":{"@id":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/calibration\/#primaryimage"},"image":{"@id":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/calibration\/#primaryimage"},"thumbnailUrl":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-154825@2x-1024x409.png","datePublished":"2022-12-19T07:08:35+00:00","dateModified":"2023-04-30T05:08:46+00:00","breadcrumb":{"@id":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/calibration\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/calibration\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/calibration\/#primaryimage","url":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-154825@2x.png","contentUrl":"https:\/\/mscvprojects.ri.cmu.ed
u\/2023team6\/wp-content\/uploads\/sites\/77\/2022\/12\/WX20221219-154825@2x.png","width":1586,"height":634},{"@type":"BreadcrumbList","@id":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/calibration\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/"},{"@type":"ListItem","position":2,"name":"Calibration"}]},{"@type":"WebSite","@id":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/#website","url":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/","name":"Uncertainty in Image classification","description":"Students: Jia Shi | Advisors: Deva Ramanan (CMU), Shu Kong (CMU, TAMU), Francesco Ferroni (Argo AI), Arun Balajee Vasudevan (CMU) | Sponsor: Argo AI","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"}]}},"_links":{"self":[{"href":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-json\/wp\/v2\/pages\/54","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-json\/wp\/v2\/users\/154"}],"replies":[{"embeddable":true,"href":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-json\/wp\/v2\/comments?post=54"}],"version-history":[{"count":7,"href":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-json\/wp\/v2\/pages\/54\/revisions"}],"predecessor-version":[{"id":125,"href":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-json\/wp\/v2\/pages\/54\/revisions\/125"}],"wp:attachment":[{"href":"https:\/\/mscvprojects.ri.cmu.edu\/2023team6\/wp-json\/wp\/v2\/media?parent=54"}],"curies":[{"name":"wp","href":"https:
\/\/api.w.org\/{rel}","templated":true}]}}