It's (not yet) Dance Time

I’ve had this idea for about a year now: a game similar to Just Dance but with a more open song ecosystem, different scoring mechanisms, and live multiplayer (with funky avatars).

In case you’re not familiar, Just Dance is a video game series in which you’re able to choose a song and dance to it, following the movements of the on-screen characters. You receive scores on how well you perform the dance movements.

Looks like fun, right? I’ve actually never played, but it does look like fun! It’s really easy to get into nowadays, too: you can connect your computer to your phone app and play using your phone’s camera instead of a webcam.

It gets a little less fun when you only get about 40 songs total to play when you purchase the game. You can get a few hundred more songs by subscribing to the Just Dance Unlimited service for around $4 a month. That doesn’t sound like a terrible deal, but what if they don’t have your favorite song?

You’re out of luck there, and that’s where my idea begins.

A Year of Research

That’s a bit of an overstatement: I didn’t spend an entire year painstakingly researching every possible way to create this game. I was finishing my last year at university in the 2020-2021 school year, so this was just one of my many side projects.

The Xbox Kinect

The original idea was to use an Xbox Kinect because it has an RGBD camera, which means that in addition to detecting color, it can also detect depth. This means that it knows how far away each pixel is from the camera.

Using this data, I would be able to create a 3D skeleton of lines and dots to represent a person and track their body movement. I’d be able to understand their dancing in real time.

There are some good packages for harnessing the power of a Kinect in Unity, my game engine of choice. I experimented a lot with the Kinect and hooked it up to 3D avatars in Unity to create a “virtual dance scene” for my Dance Dance Revolution livestream (another story for another time), and it turned out…just okay.

Needless to say, as you can see above, it wasn’t great. I scrapped that idea for a while and tabled the dance game until recently.

TensorFlow.js

Just this year, TensorFlow.js released a new pose detection model called MoveNet, which can detect a person’s pose in real time, right in the browser.

MoveNet can track keypoints through fast motions and atypical poses.

(Image credit: TensorFlow.js)

After seeing it, I realized that it was exactly what I was looking for. An ultra fast method to detect a pose (or multiple poses of multiple people) that even runs in a browser! I could use this on someone’s webcam to detect their pose as they dance and compare it to the pose that they’re trying to perform from the song.
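MoveNet reports 17 body keypoints per person (nose, eyes, shoulders, elbows, wrists, hips, knees, ankles), each with pixel coordinates and a confidence score. Here’s a rough sketch of what that per-frame output looks like, along with a helper that drops keypoints the model isn’t sure about; the 0.3 cutoff is my own placeholder, not an official recommendation:

```typescript
// Shape of a single keypoint as returned by the TensorFlow.js
// pose-detection API: x/y in pixels, score in [0, 1].
interface Keypoint {
  x: number;
  y: number;
  score: number;
  name: string; // e.g. "nose", "left_shoulder", "right_ankle"
}

// Keep only the keypoints the model is reasonably confident about,
// so occluded limbs don't pollute the comparison later.
function confidentKeypoints(keypoints: Keypoint[], minScore = 0.3): Keypoint[] {
  return keypoints.filter((kp) => kp.score >= minScore);
}

const sample: Keypoint[] = [
  { x: 320, y: 120, score: 0.95, name: "nose" },
  { x: 280, y: 240, score: 0.12, name: "left_elbow" }, // occluded
];
console.log(confidentKeypoints(sample).map((kp) => kp.name)); // → ["nose"]
```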

Perfect.

The Framework

Now that I knew how I would detect someone’s poses, the next question was: which framework should I use to build the website?

I had experience with React from a few other projects in the past, so I considered using that. However, I wanted to learn something new, and there were some aspects of React that I wasn’t particularly thrilled about.

After some browsing around and seeing how other people had used MoveNet from TensorFlow.js, I found some projects written in Svelte. Instead of explaining what it is myself, I’ll let you watch this quick 100-second video that covers most of the key aspects.

Sounds like fun, right? It’s pretty different from React, and I’m interested in learning something new, so this seemed right up my alley. Based on my progress in the tutorial, it also seems to fix most of the gripes I have with React, which is great!

You’re also able to make really smooth and cool-looking UI without too much extra effort, which I am a fan of as well.

Now It’s Dance Time

With all of the pieces in place, it’s time to get coding! I’ll be using Svelte as the framework to build the website. The site will use the MoveNet model to detect poses and compare them to what is expected in the dance in real time.

Song Editor

In order to play songs, we need a song editor so they can be created!

While it would be nice to host all of the gigabytes of music videos on my own server, that’s a little too expensive and not quite worth it. Instead, I think that simply embedding any YouTube video will work quite well.

The editor flow might look something like this:

  1. Click on the Editor tab
  2. Create a new project and enter some basic information, such as title, artist, etc.
  3. Enter the YouTube URL of the song that you want to create the dance chart from. Dance choreography videos are great for this.
  4. Hit “Analyze” to analyze the video and create pose data for it.
  5. Scrub through the song to make sure the pose data looks good. If it doesn’t, then you can use some of the additional tools I’m thinking about below to help.
  6. Trim the song and make any other edits that you need.
  7. Save and publish the song to the song browser/store so you and others can play it.
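The project metadata from step 2 might look something like the sketch below. Every field name here is my own guess at a schema, not a finalized design, and the publish check in step 7 would surely grow beyond this:

```typescript
// Hypothetical shape of a song project's metadata.
interface SongProject {
  title: string;
  artist: string;
  youtubeUrl: string;   // the embedded video from step 3
  difficulty: number;   // a single predefined level, set before publishing
  trimStartSec: number; // trim edits from step 6
  trimEndSec: number;
}

// Tiny sanity check before publishing to the song browser (step 7).
function isPublishable(p: SongProject): boolean {
  return (
    p.title.length > 0 &&
    p.artist.length > 0 &&
    p.youtubeUrl.startsWith("https://www.youtube.com/") &&
    p.trimStartSec >= 0 &&
    p.trimEndSec > p.trimStartSec
  );
}
```

Keeping the metadata this small reinforces the point below: nothing copyrighted lives on the server, just a pointer to YouTube plus some numbers.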

The key thing here is that I won’t be storing any copyrighted content on my server. The only thing that I’ll be storing on my server is all of the song metadata and a small database of the pose data and any keyframed extra content.

We’ll get into keyframed extra content in a later post, but the gist of it is that it’ll allow you to animate certain aspects of the song over time, such as fade in/out, target zone (only analyzing a certain area of the video for poses), titles, and even pose keypoint weights.

I’m currently thinking about splitting the song information into two pieces: a JSON file with all the metadata, and a separate store with tables for the pose data and keyframe data. I’d like a data structure that allows for fast lookup based on timestamps, and I think simply using IndexedDB will work. I’ve never used IndexedDB before, so it will be a good learning experience, too.
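IndexedDB itself only exists in the browser, but the timestamp lookup can be sketched on its own: keep pose frames sorted by timestamp (IndexedDB would hand them back in key order via an index) and binary-search for the last frame at or before the current playback time. This is my own sketch of the idea, not a settled design:

```typescript
// One stored frame of pose data, keyed by its position in the song.
interface PoseFrame {
  timestampMs: number;
  keypoints: Array<{ x: number; y: number; score: number }>;
}

// Binary search for the last frame at or before `timeMs`.
// Assumes `frames` is sorted ascending by timestampMs.
function frameAt(frames: PoseFrame[], timeMs: number): PoseFrame | null {
  let lo = 0;
  let hi = frames.length - 1;
  let best = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (frames[mid].timestampMs <= timeMs) {
      best = mid; // candidate; look for a later one
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return best >= 0 ? frames[best] : null;
}
```

The nice property is that scrubbing anywhere in the video stays O(log n) regardless of song length.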

Scoring

Because I come from a DDR (Dance Dance Revolution) background, I’m used to a scoring system that expects frame-perfect accuracy and precision. A scoring system like that would never work for a free-form dance game, but we can do something similar.

Every frame, we’ll compare the player’s current keypoints with the expected keypoints at the corresponding timestamp in the song. The expected keypoints will need some scaling to match the person who’s playing; otherwise, being taller or shorter than the reference dancer could affect your score.

Combined with a similarity threshold, this system should be able to detect how well someone is performing the dance at any given moment. If they’re doing well, indicate it on the screen, keep giving them points, and add to a combo. If they are having trouble, indicate it on screen and reset the combo.
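The moment-to-moment loop could then be a tiny state update per frame. The threshold, point values, and combo bonus here are all placeholders I made up for illustration:

```typescript
interface ScoreState {
  score: number;
  combo: number;
}

// Update score and combo given this frame's pose distance.
// A distance at or below `threshold` counts as matching the move:
// award base points plus a growing combo bonus. A miss resets the combo
// but never takes points away.
function updateScore(
  state: ScoreState,
  distance: number,
  threshold = 0.25
): ScoreState {
  if (distance <= threshold) {
    return { score: state.score + 10 + state.combo, combo: state.combo + 1 };
  }
  return { score: state.score, combo: 0 };
}
```

For example, starting from zero, two matching frames award 10 then 11 points (the second includes a combo bonus of 1), and a miss afterward resets only the combo.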

Existing games have either one difficulty per song or multiple difficulties for a single song. In this game, multiple difficulties per song would require each version of the dance to be meaningfully different to warrant a difficulty change. The similarity detection threshold could be loosened instead, but you’d still have to perform the same complex moves.

My current plan is to have a single difficulty per song. Once you’ve finished creating your song in the editor, you can set its difficulty to one of the predefined levels. With this system, it might be a good idea to have song “packs” where there are multiple versions of the same song, but with different difficulties and different dances. A search with a filter for song name and difficulty could also work.

Multiplayer

This isn’t a feature for the minimum viable product, but it would be awesome to have. In Just Dance, you’re able to play multiplayer with friends next to you or online (I think). However, all you see are their names and the score they received on the last move. I’d like a little bit more information, such as total running score and current combo.

It would also be really cool to see little dancing avatars that represent the other players dancing when playing online multiplayer! Then you wouldn’t have to share your full webcam and you’d be able to get a sense of what the other person is doing while still retaining a little bit of anonymity. There are a lot of exciting things that are possible with little funky avatars, but this is a feature that will be developed much later.

Now Hold On…

There’s still a little bit more investigation that needs to be done to determine if this is all feasible!

This is a list of all the little things I need to test before really diving into things (for brevity, pose and keypoint data is abbreviated as PKD):

  • How to generate PKD from a YouTube video
  • How to store PKD
  • How to quickly read PKD from a database based on its timestamp during a YouTube video
  • How to compare PKD from the database to the live PKD from a webcam (including potential PKD scaling)

Once we’ve figured out how to do all of these preliminary steps, then we can start!

Stay tuned! 🕺