For my dance game that I’m working on, I needed a way for users to edit the videos that they wanted to dance to, before publishing them to the main page for others to play.
The idea is simple:
- Allow users to create “projects” on their computers, which consist of all the information associated with a dance chart, including title, artist, dance video ID, and so on.
- The editor will allow users to fine-tune the video to their liking and to analyze it using TensorFlow.js.
- After analysis, users can playtest their charts in the editor to ensure that everything is set up properly.
- Finally, users can publish their dance chart to the main page so anyone can play it.
Designing the Interface
The first step in the process was to design the interface for the editor - what should it look like?
I’ve learned that designing good-looking graphical user interfaces is really difficult. That’s why there’s an entire field of study dedicated to it at universities, and why most companies have roles dedicated solely to how their products look. It takes an incredible amount of time, effort, and fine-tuning to get something that looks decent.
I don’t have any background in design, so I tried my best to make something that was functional, looked okay, and didn’t take too long to design. I want to get my game to a playable state quickly, so I can’t spend months figuring out how the editor should look.
After a week or two of designing the editor, I came up with the following designs.
The Landing Page
The landing page is just a place to manage your projects. You can either edit an existing project or create a new one with just a title and a YouTube link. After you click on a project, it loads and displays the next screen (pictured below).
The General Layout
The top left section in the editor has a few different tabs - Edit, Review, and Publish. I had to come up with the purpose of each of these tabs even before I started creating the design, which was a little tricky. There’s a save button right underneath these tabs.
On the right, there’s a video preview that displays the video as you scrub through the timeline, with a few controls: play/pause, next/previous frame, and navigate to the beginning/end. Along the bottom, you can see the timeline with time-based video thumbnails, an area for keyframes, and an audio waveform.
The Edit Tab
The Edit tab holds an assortment of what I’m calling “components,” or little addons that modify different properties of your dance chart. The Project Metadata component is a default component that holds details about your project, such as its name, chart title, song artist, and difficulty. A Blocked Area component allows you to section off specific areas in the video that won’t be used for scoring. This is useful if there are multiple people in the video and you only want to analyze the movements of one of them.
Some component properties can be keyframed over the course of the video to allow for animation. For example, this is useful for Blocked Area components so that you can move the blocked area to keep covering a dancer as they move throughout the video.
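To make this concrete, here’s a hypothetical shape for a keyframed Blocked Area component (the structure and field names are illustrative, not the actual implementation):

const blockedArea = {
  type: 'blocked-area',
  // Each keyframe pins the area's position at a timestamp in the video;
  // positions between keyframes would be interpolated during playback
  keyframes: [
    { time: 0.0, x: 120, y: 80, width: 200, height: 320 },
    { time: 4.5, x: 310, y: 85, width: 200, height: 320 },
  ],
};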
The Review Tab
The Review tab is where you review all of the components you added and analyze the video using TensorFlow.js. The automatic analysis will scrub through the video using the MoveNet model and gather keypoint data for every frame (well, not quite every frame - more on the implementation later).
After the automatic analysis is done, the Analysis Summary will show any potential problems that were detected with the analysis. These problems might interfere with the chart when played, so it’s best to try to solve them before publishing.
And here you’re also able to playtest your chart! Click the little play button to test out everything you’ve put together so far.
The Publish Tab
The Publish tab is the most basic of the three and is mostly there for validation and for publishing your chart to the main page so anyone can play it. You can review all of your project and chart information, and then publish once all checks are met.
Editor Implementation
Now that the design and idea for the editor are done (for now), it’s time to build it!
Grid Layout
The layout of the editor looks like `display: grid` would be perfect for designing it. Before this project, I didn’t know how to use the grid layout in CSS.
Since starting this project, one of my goals has been to learn a lot more about web dev and gain a deeper understanding of the driving principles behind the properties. One way that I’ve done this is through Josh’s CSS for JS course, which I would highly recommend if you want a deeper dive into why things in CSS appear the way that they do.
Before taking his course, I was often confused about how to do things in CSS and why they wouldn’t work, which led to me trying random things until it looked okay. This approach often led to problems on different resolutions, though.
After going through most of the course, I am much more confident in my CSS abilities and in my ability to craft a user interface well. The CSS for the main grid container looks a little something like this:
main.grid-container {
  display: grid;
  width: 100%;
  height: 100%;
  grid-template-columns: repeat(3, 1fr);
  grid-template-rows: repeat(14, 1fr);
  background-color: var(--color-gray-500);
  gap: 1px;
  --tabs-row-amount: 11; /* Number of rows that the tabs section should span */
}
It’s fairly straightforward, but I’ll explain it anyway, because this was something completely new to me when I first started.
It’s a grid that fills up the entire container with 100% width and height. Then we specify how many columns and rows the grid should have - in my case, 3 columns (1 for the tabs section and 2 for the video preview section) and 14 rows (14 was arbitrary, based on how thick I wanted the save button to be, since I planned for it to take up one row’s height of space right underneath the tabs section).
Next I give it a background color and a gap, and then define a CSS variable to determine how tall the tabs section should be. The `--tabs-row-amount` variable tells some of the other CSS rules how many rows the tabs section should take up - in other words, what its approximate height should be.
Cool, right? And then we just have to tell each section in the grid which rows and columns it should take up. Here are a couple snippets of some of the sections:
section.tabs {
  grid-column: 1 / 2;
  grid-row: 1 / var(--tabs-row-amount);
  overflow-y: auto;
}

section.timeline {
  grid-column: 1 / -1;
  grid-row: calc(var(--tabs-row-amount) + 1) / -1;
}

section.preview {
  grid-column: 2 / -1;
  grid-row: 1 / calc(var(--tabs-row-amount) + 1);
}
`grid-column` tells the grid which columns the section should take up, spanning from the first number to the second. If the second number is -1, the section spans to the end. `grid-row` follows the same concept, except for rows.
The trickiest part to follow here is the `calc()` in the row values - it uses the CSS variable I defined above to determine where certain sections should span to and how many rows they should take up.
Video Preview
It’s easy to display a video element, but what about preventing users from right-clicking and downloading it? That’s a bit trickier.
While I did technically build a YouTube video downloader for my game, I don’t want people to be able to save these downloaded videos onto their computer. I only want them to be available for use with my game. This isn’t easy to do, and for more tech-savvy users, there is always a way around it.
My solution here is to use a canvas element to display the video and then sync up the audio with it separately. This way, you can’t just right click the canvas to save the video; right clicking it only allows you to save the current frame.
There are a few other reasons to use a canvas in this instance:
- I already have separate video and audio files, so I would have had to sync the audio/video anyway.
- A canvas allows me to draw complex things over the video easily, and this will be needed for certain components, such as the Blocked Areas component where users can draw shapes over the canvas.
- TensorFlow.js can also use a canvas instead of a video for analysis, which makes things much easier in the end. I can simply draw the blocked areas on the canvas and TensorFlow won’t be able to see behind them.
Displaying a video onto a canvas is fairly straightforward, with just a few key steps (see the sketch after this list):
- Run a continuous loop to keep grabbing each frame of the video, using `window.requestAnimationFrame`. I would not recommend using `setInterval`/`setTimeout`, as those don’t play nicely with the event loop (click for an excellent video on the event loop).
- Use `drawImage` on the 2D context of the canvas to display the current frame of the video. There’s a great solution for some of the flaws of just using `drawImage`, which you can check out here.
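Here’s a minimal sketch of what that loop can look like, assuming `video` is an already-loaded video element and `canvas` is the preview canvas:

const ctx = canvas.getContext('2d');

function renderFrame() {
  // Draw the current video frame, scaled to fill the canvas
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  // Queue the next draw just before the browser's next repaint
  window.requestAnimationFrame(renderFrame);
}

window.requestAnimationFrame(renderFrame);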
One drawback of this canvas-based approach is that the video playback isn’t quite as smooth as normal video playback would be. I expect that this is a limitation of `requestAnimationFrame`, but I haven’t yet found the cause. There might be a way around it, or perhaps my browser is limiting the number of animation frames per second.
Syncing the audio and video was a trivial task, since the main hurdle is just to play/pause them at the same time and ensure that their `currentTime` values match.
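In code, that boils down to something like this (a simplified sketch, assuming `video` and `audio` are the two media elements):

async function playBoth() {
  audio.currentTime = video.currentTime; // Align before starting
  await Promise.all([video.play(), audio.play()]);
}

function pauseBoth() {
  video.pause();
  audio.pause();
  audio.currentTime = video.currentTime; // Re-align after pausing
}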
The design for the video preview also had the current frame number along with minutes and seconds on either side of the progress bar. To find the current frame, you’ll need the frames per second of the video, and then you can use something like this:
Math.floor(((time % 60) % 1) * fps)
Looks a bit odd, right? Let’s break it down.
`time` is the current time in the video, such as 12.432 seconds. We first mod it by 60 to get just the seconds within the current minute. Then we mod that by 1 to get just the decimal part of the current second, such as 0.432. Multiplying that by the frames per second gives us the exact frame the video is on, and all that’s left is to round it down to get a whole number.
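Putting that together with the minutes and seconds display, a helper might look something like this (a hypothetical formatting function, assuming the video’s frame rate is already known):

function formatTimestamp(time, fps) {
  const minutes = Math.floor(time / 60);
  const seconds = Math.floor(time % 60);
  const frame = Math.floor(((time % 60) % 1) * fps);
  return `${minutes}:${String(seconds).padStart(2, '0')} (frame ${frame})`;
}

formatTimestamp(12.432, 30); // -> "0:12 (frame 12)"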
While I was working on the video preview, I found this incredibly useful MDN page about audio and video manipulation on the web.
Timeline Navigation
Navigation using the timeline at the bottom will be the most used way to scrub through different parts of the video. Click on any part of it and the video will jump right there, placing the yellow line at the clicked position.
It’s not too difficult to do - just use an `on:click` event and use the event’s `layerX` property along with the total timeline width to determine the percentage of the timeline that was clicked. My code looks something like this:
const percentClick = e.layerX / (width - timeline_padding * 2 - 2); // Account for padding + border width
$createVideo.currentTime = percentClick * $createVideoDuration;
Using that percentage of the way along the timeline, I multiply it by the total duration of the video to find the time that the user clicked on, and then set the video’s current time to it.
Timeline Audio Waveform
I wanted to display the audio in a waveform in the timeline so that it’s easy to see where the highs and lows of the song are, which should make for faster navigation.
I figured someone had already made a package for generating waveforms, and I found one that was pretty easy to use here! You pretty much just create the WaveSurfer instance, point it at your audio, and off you go (`audioUrl` below stands in for wherever the chart’s audio file lives):

const wavesurfer = WaveSurfer.create({
  container: '#waveform',
  waveColor: 'rgb(38, 126, 97)',
  progressColor: 'rgb(77, 189, 152)',
  interact: false,
  height: 50,
  responsive: true,
  hideScrollbar: true,
});

wavesurfer.load(audioUrl); // Load the audio to render its waveform
One thing that I want to emphasize is the `responsive` option - setting this to `true` ensures that the waveform resizes when the browser window is resized! Otherwise it won’t change at all.
Timeline Thumbnails
As seen in the design from earlier, I wanted to have little thumbnails on the timeline to show what the video approximately looks like at different timestamps.
As per usual, the first step was to look around to see if anyone had done something similar. Someone had, in the form of video-metadata-thumbnails. It returns all the thumbnails as a bunch of blobs, which we can use as images. I tried it out, and it was incredibly slow, even with low-quality thumbnails.
My solution was to strip out the only file I needed from that package and write a method to get thumbnails from the video in parallel. This way, each worker only has to get a portion of the video’s thumbnails, so each one should complete faster.
The end result worked well, but when there were too many instances running in parallel, it caused latency and stuttering issues. I settled on running only three workers in parallel, which still resulted in a massive speedup over the original implementation.
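Here’s a rough sketch of that idea, where `getThumbnails` is a hypothetical wrapper around the stripped-down thumbnail code that grabs thumbnails for a single time range of the video:

const WORKER_COUNT = 3; // More than this caused stuttering for me

async function getAllThumbnails(videoUrl, duration) {
  const slice = duration / WORKER_COUNT;
  // Each worker handles one contiguous chunk of the video...
  const jobs = Array.from({ length: WORKER_COUNT }, (_, i) =>
    getThumbnails(videoUrl, i * slice, (i + 1) * slice)
  );
  // ...then the chunks are flattened back into chronological order
  return (await Promise.all(jobs)).flat();
}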
After retrieving all the thumbnails, I needed to display them in the timeline, which turned out to be much more difficult than I anticipated.
To display the thumbnails, I first needed to find out how many of them to display, given the width of the timeline. But to do that, I needed to figure out the width of one thumbnail, which also turned out to be a little tricky, given that the width is automatic based on the height of the space allocated for the thumbnails.
Eventually, after a bunch of trial and error, I was able to figure it out. It’s a little complicated, so I’ll spare you most of the implementation details. But one cool detail that I do want to mention is that the timeline automatically gets more or fewer thumbnails depending on its width as you resize the window! I thought that was a neat part of it - no matter your screen resolution, you’ll have proper video thumbnails.
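For the curious, the heart of the calculation is something like this (a simplified sketch of the idea, not the exact code):

// The thumbnail width follows from the allocated height and the
// video's aspect ratio; the count then follows from the timeline width
function thumbnailCount(timelineWidth, thumbHeight, aspectRatio) {
  const thumbWidth = thumbHeight * aspectRatio; // e.g. 16 / 9
  return Math.ceil(timelineWidth / thumbWidth); // Cover the full width
}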
Automatic Analysis with TensorFlow
TensorFlow.js is able to analyze a frame of a video or a static image and return data about any person detected in it. Since the MoveNet model is able to run in realtime, I can simply play the video back and run TensorFlow on it as it plays.
There is a catch to this, though: not all of the frames will be analyzed and have data for them. The model is bound to be slightly too slow on some frames or skip others, so we won’t have data for every frame - and that’s okay! In most cases, a person’s movements don’t differ by a huge amount between consecutive frames.
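Here’s a hedged sketch of what that analysis loop can look like using the pose-detection package, where `canvas` and `video` are the preview elements from earlier and the storage format is purely illustrative:

import * as poseDetection from '@tensorflow-models/pose-detection';

const detector = await poseDetection.createDetector(
  poseDetection.SupportedModels.MoveNet
);
const keypointsByTime = new Map();

async function analyzeFrame() {
  // estimatePoses accepts a canvas, so any blocked areas drawn over
  // the video are automatically hidden from the model
  const poses = await detector.estimatePoses(canvas);
  if (poses.length > 0) {
    keypointsByTime.set(video.currentTime, poses[0].keypoints);
  }
  if (!video.ended) window.requestAnimationFrame(analyzeFrame);
}

window.requestAnimationFrame(analyzeFrame);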
I decided to go a little bit further with this idea and add a “Video Playback Speed” slider to the analysis, which allows you to set how fast the video plays back as it is analyzed. Depending on your computer specs, you might be able to speed it up to 2x or 3x speed and still get good results.
I hit a roadblock while saving this keypoint data with the project, though: it exceeded the maximum quota for LocalStorage, which can only hold up to 5MB of data per website. I used this method to analyze how much data was actually being stored in the keypoints, and it turned out to be just under 5MB - far too much for LocalStorage, especially if you want to have multiple projects. The solution was to use IndexedDB again, the same place that the videos are downloaded to.
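Here’s a minimal sketch of that write using the raw IndexedDB API - the database and store names are hypothetical, and `keypointData`/`projectId` stand in for the real values:

const request = indexedDB.open('dance-projects', 1);

request.onupgradeneeded = () => {
  // Create a store for keypoint data, keyed by project ID
  request.result.createObjectStore('keypoints');
};

request.onsuccess = () => {
  const tx = request.result.transaction('keypoints', 'readwrite');
  tx.objectStore('keypoints').put(keypointData, projectId);
};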
The Result
After reading all this, I bet you want to see how it turned out, right? Does it look anything like the design I created at the beginning?
As it turns out, it does! Here’s a quick video of it:
I’m really pleased with how it turned out! It works great, and I think it looks even better than my mockup design. One interesting note is that the loader is determinate - its progress tracks thumbnail generation - so it accurately reflects the actual loading time.
And as I write this, I realize that the video preview somehow isn’t centered - this has been fixed now! 😅
Next Steps
With the editor in good shape, it’s time to finally work on the gameplay! We need to playtest the charts that we’re making, and to do that, we need the gameplay screen to be fleshed out. Soon we’ll be able to dance to any YouTube video and get realtime scores telling us how well we’re doing. Stay tuned!