{"id":9,"date":"2025-05-08T16:52:30","date_gmt":"2025-05-08T16:52:30","guid":{"rendered":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/?page_id=9"},"modified":"2025-12-12T21:05:04","modified_gmt":"2025-12-12T21:05:04","slug":"method","status":"publish","type":"page","link":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/","title":{"rendered":"Overview"},"content":{"rendered":"\n<p>Recent advances in video generation have demonstrated impressive visual quality, but current models remain difficult to scale, control, and adapt to real-world use cases. Large video diffusion models are computationally expensive to train and slow to run, while still offering limited controllability over the generated content. Most systems rely on coarse global conditioning and support only text or image inputs, making it challenging to perform precise, temporally consistent video editing or to incorporate richer, time-varying guidance. These gaps motivate the need for a scalable video generation framework that supports fine-grained control and efficient large-scale deployment.<\/p>\n\n\n\n<p><strong>Key challenges:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Restricted input modalities:<\/strong> Most models support only text and image conditioning, limiting their ability to leverage richer, per-frame or structured control signals.<\/li>\n\n\n\n<li><strong>High computational cost:<\/strong> State-of-the-art video generation models are large and slow, requiring significant GPU resources for both training and inference.<\/li>\n\n\n\n<li><strong>Limited controllability:<\/strong> Existing models lack fine-grained, frame-level control, making it difficult to enforce temporal constraints or perform precise video edits.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2102\" height=\"982\" 
src=\"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/wp-content\/uploads\/sites\/135\/2025\/12\/Screenshot-2025-12-12-at-1.01.43-PM.png\" alt=\"\" class=\"wp-image-224\" srcset=\"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/wp-content\/uploads\/sites\/135\/2025\/12\/Screenshot-2025-12-12-at-1.01.43-PM.png 2102w, https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/wp-content\/uploads\/sites\/135\/2025\/12\/Screenshot-2025-12-12-at-1.01.43-PM-300x140.png 300w, https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/wp-content\/uploads\/sites\/135\/2025\/12\/Screenshot-2025-12-12-at-1.01.43-PM-1024x478.png 1024w, https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/wp-content\/uploads\/sites\/135\/2025\/12\/Screenshot-2025-12-12-at-1.01.43-PM-768x359.png 768w, https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/wp-content\/uploads\/sites\/135\/2025\/12\/Screenshot-2025-12-12-at-1.01.43-PM-1536x718.png 1536w, https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/wp-content\/uploads\/sites\/135\/2025\/12\/Screenshot-2025-12-12-at-1.01.43-PM-2048x957.png 2048w\" sizes=\"auto, (max-width: 706px) 89vw, (max-width: 767px) 82vw, 740px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Goal<\/strong><\/h2>\n\n\n\n<p>Our goal is to build a scalable and controllable video generation system based on <strong>VACE<\/strong>, a state-of-the-art video diffusion framework that enables <strong>per-frame control signals<\/strong> to be provided as context during generation. By supporting fine-grained, time-varying conditioning such as masks, reference frames, and depth maps, our system aims to unlock more precise and expressive video editing and generation capabilities.<\/p>\n\n\n\n<p>A central focus of this project is <strong>efficient large-scale training and inference<\/strong>. We implement VACE using the <strong>NVIDIA Megatron Core<\/strong> library to leverage advanced parallelism strategies, including tensor, pipeline, context, and data parallelism. 
These strategies are designed to let the model scale to <strong>thousands of GPUs<\/strong>, making it feasible to train and deploy large video generation models with long temporal context while maintaining high throughput and memory efficiency. Together, these contributions bridge cutting-edge video modeling and production-ready distributed systems design.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recent advances in video generation have demonstrated impressive visual quality, but current models remain difficult to scale, control, and adapt to real-world use cases. Large video diffusion models are computationally expensive to train and slow to run, while still offering limited controllability over the generated content. Most systems rely on coarse global conditioning and support &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Overview&#8221;<\/span><\/a><\/p>\n","protected":false},"author":254,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-9","page","type-page","status-publish","hentry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Overview - Scalable Video Creating and Editing<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Overview - Scalable Video Creating and Editing\" \/>\n<meta property=\"og:description\" content=\"Recent advances in video generation have demonstrated impressive visual quality, but current models remain difficult 
to scale, control, and adapt to real-world use cases. Large video diffusion models are computationally expensive to train and slow to run, while still offering limited controllability over the generated content. Most systems rely on coarse global conditioning and support &hellip; Continue reading &quot;Overview&quot;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/\" \/>\n<meta property=\"og:site_name\" content=\"Scalable Video Creating and Editing\" \/>\n<meta property=\"article:modified_time\" content=\"2025-12-12T21:05:04+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/wp-content\/uploads\/sites\/135\/2025\/12\/Screenshot-2025-12-12-at-1.01.43-PM.png\" \/>\n\t<meta property=\"og:image:width\" content=\"2102\" \/>\n\t<meta property=\"og:image:height\" content=\"982\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2025team2-1\\\/\",\"url\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2025team2-1\\\/\",\"name\":\"Overview - Scalable Video Creating and Editing\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2025team2-1\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2025team2-1\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2025team2-1\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2025team2-1\\\/wp-content\\\/uploads\\\/sites\\\/135\\\/2025\\\/12\\\/Screenshot-2025-12-12-at-1.01.43-PM.png\",\"datePublished\":\"2025-05-08T16:52:30+00:00\",\"dateModified\":\"2025-12-12T21:05:04+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2025team2-1\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2025team2-1\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2025team2-1\\\/#primaryimage\",\"url\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2025team2-1\\\/wp-content\\\/uploads\\\/sites\\\/135\\\/2025\\\/12\\\/Screenshot-2025-12-12-at-1.01.43-PM.png\",\"contentUrl\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2025team2-1\\\/wp-content\\\/uploads\\\/sites\\\/135\\\/2025\\\/12\\\/Screenshot-2025-12-12-at-1.01.43-PM.png\",\"width\":2102,\"height\":982},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2025team2-1\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2025team2-1\\\/\"},{\"@type\":\"Lis
tItem\",\"position\":2,\"name\":\"Overview\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2025team2-1\\\/#website\",\"url\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2025team2-1\\\/\",\"name\":\"Scalable Video Creating and Editing\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/mscvprojects.ri.cmu.edu\\\/2025team2-1\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Overview - Scalable Video Creating and Editing","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/","og_locale":"en_US","og_type":"article","og_title":"Overview - Scalable Video Creating and Editing","og_description":"Recent advances in video generation have demonstrated impressive visual quality, but current models remain difficult to scale, control, and adapt to real-world use cases. Large video diffusion models are computationally expensive to train and slow to run, while still offering limited controllability over the generated content. Most systems rely on coarse global conditioning and support &hellip; Continue reading \"Overview\"","og_url":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/","og_site_name":"Scalable Video Creating and Editing","article_modified_time":"2025-12-12T21:05:04+00:00","og_image":[{"width":2102,"height":982,"url":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/wp-content\/uploads\/sites\/135\/2025\/12\/Screenshot-2025-12-12-at-1.01.43-PM.png","type":"image\/png"}],"twitter_card":"summary_large_image","twitter_misc":{"Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/","url":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/","name":"Overview - Scalable Video Creating and Editing","isPartOf":{"@id":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/#website"},"primaryImageOfPage":{"@id":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/#primaryimage"},"image":{"@id":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/#primaryimage"},"thumbnailUrl":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/wp-content\/uploads\/sites\/135\/2025\/12\/Screenshot-2025-12-12-at-1.01.43-PM.png","datePublished":"2025-05-08T16:52:30+00:00","dateModified":"2025-12-12T21:05:04+00:00","breadcrumb":{"@id":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/#primaryimage","url":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/wp-content\/uploads\/sites\/135\/2025\/12\/Screenshot-2025-12-12-at-1.01.43-PM.png","contentUrl":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/wp-content\/uploads\/sites\/135\/2025\/12\/Screenshot-2025-12-12-at-1.01.43-PM.png","width":2102,"height":982},{"@type":"BreadcrumbList","@id":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/"},{"@type":"ListItem","position":2,"name":"Overview"}]},{"@type":"WebSite","@id":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/#website","url":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/","name":"Scalable Video Creating and 
Editing","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"}]}},"_links":{"self":[{"href":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/wp-json\/wp\/v2\/pages\/9","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/wp-json\/wp\/v2\/users\/254"}],"replies":[{"embeddable":true,"href":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/wp-json\/wp\/v2\/comments?post=9"}],"version-history":[{"count":16,"href":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/wp-json\/wp\/v2\/pages\/9\/revisions"}],"predecessor-version":[{"id":255,"href":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/wp-json\/wp\/v2\/pages\/9\/revisions\/255"}],"wp:attachment":[{"href":"https:\/\/mscvprojects.ri.cmu.edu\/2025team2-1\/wp-json\/wp\/v2\/media?parent=9"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}