
MIT SloanFronts
AI VS Humans -
WTF or FTW?
A 72-hour anime creation, binge study, and redesign of 10+ AI tools, based on Human-Centered AI and XAI theory
TL;DR
📲 Product
Industry: Academic
Phase: 0-to-1 Product
Type: Web & Video
Status: Aired on Dec. 6, 2023
📽️ Project
Duration: 72 hours
Type: Academic | Teaching Assistantship
Team:
Producer (Professor)
Director (😃 Me)
Designer (😃 Me)
🏋🏻♀️ My Contribution
UX Research & Audit
Product Redesign
Production
Illustration
🎯 Goals
Project Goal:
Elevate the annual MIT SloanFronts creativity show
Design Goal:
Create, from scratch in 72 hours, an anime with movie-style storytelling that ties together all presentations of the show
🎉 Impacts
Brought the event, along with the class Creative Industries, to the next level
Surfaced UX design topics on XAI for further discussion
Jump to Results
A project that helped me see through the darkness of UX layoffs and find the silver lining.
Background
As a Teaching Assistant for the class Creative Industries 23-24, I came up with a new topic that had not been touched previously - the impact of AI-Generated Content (AIGC) on human creativity.
To work toward at least a preliminary answer to the question of whether Gen-AI will be disruptive enough to overshadow human creativity and production, I pushed myself into an ultimate challenge: How far could I leverage AI tools to create an anime for the 2023 SloanFronts design show in just 72 hours? To what extent can AI tools truly take over human work?

Problems & Goals
Goals
PROJECT goal - bring the annual SloanFronts creativity show to the next level
DESIGN goal - create, from scratch in 72 hours, an anime with movie-style storytelling for the show, ideally fully powered by AI
Problem 1 - Timing
72 hours for an anime literally from scratch sounds like mission impossible, given the overall procedure (storyboarding, character design, sound design, audio making, graphic design, dramatic motion work, video editing, etc.).
Problem 2 - Delivery
My personal storytelling style combines humor with dramatic emotional build-up. I had a limited sense of how native speakers (the majority of the audience) would vibe with the wording and tone suggested by AI.
Problem 3 - Content (diversity, culture, professionalism)
Giving AI tools an exact prompt with a concise and accurate description is hard because of translation nuances and the depth of detail required;
Because of cultural differences, adopting puns suggested by AI tools can be risky, since some punchlines only land with specific audiences;
Before this event, I knew little about audio production and voiceover generation.
My Solutions
Project management
My strategy was to be REALISTIC - run a fast UX audit of the main AI tools for each phase of creation and stick with the most intuitive option with the shortest learning curve.
I mapped out each phase of design for this project in terms of the estimated time for manual creation and the bottom line for each design deliverable.
After that, I binge-studied popular tools for each phase, looking at their features, pricing, learning curve and, more than anything, the following UX auditing criteria:
Accessibility (e.g., Tool tips for laypersons to understand each piece of UI);
Explainability (i.e., results & algorithms explainable to users)
Navigation of Info Architecture ("IA") (for beginners, a comprehensive but intuitive IA helps save lots of time)
Page speed (it determines possibilities of multitasking)

User Research & Persona
Through desktop research, I learned about five different XAI needs and the importance of being mindful of human cognitive biases. As a beginner myself, I decided to build the persona of a Gen-AI beginner who is more desperate than others for precise and accurate AIGC outcomes and who seeks a less steep learning curve.

A diagram of different audiences for a given prediction and explanation, by Meg Kurdziolek (2022)

User Flow
Although most current AI tools are not positioned as consumer products - making it arguably too early to discuss UX design for such products - I still found two points where UX & UI design can substantially enhance the user journey:
Before users input prompts to the AI, i.e., the stage before the black-box system. This phase calls for clean, intuitive and instructional UIs so that even a beginner who has not watched any tutorials can operate the tools smoothly.
After the AIGC results are provided. In this phase, users quickly decide whether to accept the results, modify their prompts for regeneration, or abandon the attempt, which requires the system to give users feedback on problematic inputs, better alternatives, etc.

Redesign
Selection of tools for production
Based on the UX auditing guidance, I set rules for picking the RIGHT tools instead of the most comprehensive ones:
user-friendly to desperate and anxious beginners 🤷🏻♂️ in terms of explainability and accessibility;
organized and well-structured in its Info Architecture so that users can stay focused on the same screen;
offering generous trial plans or low-priced plans.
I binge-studied 10+ AI tools covering pre-production, production and post-production by:
desktop research on AI product reviews;
picking the tools most suitable for my case and walking through all essential tutorials;
and mapping out their key pros and cons (see the summary below)

1) Midjourney
1.1 Before
Midjourney is powerful but more closed than other image Gen-AI tools. Regional modification of a generated image is allowed, but not based on new prompts.

1.2 After
In my redesign, users are able to customize regional regeneration based on their new prompts.

2) Runway
2.1 Before
Runway's multimedia AI-generated content offers impressive features, yet some, like the image-to-image function, are hindered by a counterintuitive user interface. For instance, a long list of text-based descriptions for visual styles can be overwhelming and unclear, leaving users confused about the visual content.

2.1 After
I've redesigned this section to include visual thumbnails, making artistic terms more accessible and reducing cognitive load for users.

2.2 Before
The current design allows for some customization but lacks clear guidance. For example, first-time users unfamiliar with the term 'Seed' find no helpful tooltips in the current design. Additionally, there's no way for users to comment on or provide feedback about the generated results.
When inputting specific keywords like 'Junji Ito,' which typically represents a black and white comic line art style, the system fails to explain why the results may deviate from expectations.

2.2 After
I understand there might be technical or other reasons why current AI tools don't explain their results for each piece. Nevertheless, I've redesigned the interface to include a speech bubble pop-up, helping users understand which part of their prompts might have led to unexpected content generation.
I've also provided clearer examples of negative prompts to inspire and guide users.
Tooltips have been added to key sections like 'Seed' to assist newcomers in understanding the finer details.

3) Eleven Labs
3.1 Before
Eleven Labs has made significant strides since its initial release, yet there's always room for improvement.
For example, the system introduced sliders for users to customize voice generation and set safety ranges to prevent inappropriate outputs.

3.1 After
While the sliders are a good feature, I've enhanced them with preset scenario tooltips (e.g., optimal settings for audiobooks/marketing) and an official voice preview function. This way, users can more easily configure their ideal voices without constant adjustments.

3.2 Before
In the voice generation section, the customization options were limited. Moreover, users sometimes struggled to gauge how sample voices would sound with exaggerated expressions if the sample text didn't include specific voice types.

3.2 After
In my redesign, I've enabled users to experiment with various text-to-speech options. Additionally, I've integrated the Speech Synthesis Markup Language (SSML) as an option, allowing users to precisely achieve the vocal effects they desire.
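To make the SSML option concrete, below is a minimal sketch of what such markup could look like, using standard W3C SSML tags (prosody, break, emphasis). The snippet is illustrative only, assuming a generic SSML-compatible text-to-speech backend rather than Eleven Labs' actual syntax or current level of SSML support.

```python
# A minimal, hypothetical sketch of SSML markup for one dramatic line of dialogue.
# These are standard W3C SSML tags, not any specific vendor's syntax.
ssml_line = """
<speak>
  <prosody rate="slow" pitch="+2st">
    Ladies and gentlemen, <break time="400ms"/>
    <emphasis level="strong">the show is about to begin!</emphasis>
  </prosody>
</speak>
"""

# <prosody> adjusts speaking rate and pitch, <break> inserts a timed pause, and
# <emphasis> stresses a phrase -- fine-grained vocal control that sliders or
# plain text alone cannot express.
print(ssml_line)
```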

4) D-ID
4.1 Before
A significant challenge in creating smooth animation is lip syncing. Current AI tools, like D-ID, only support speaking human figures that face the camera head-on, which is highly limiting for animation creation. This means that if the character is non-human or posed in a side profile, these AI tools won't produce satisfactory results.
Additionally, the guides for D-ID's lip syncing feature can be confusing, especially when they list several criteria in text form, such as measurements or numerical calculations (e.g., understanding the size of 200x200 pixels by sight).

4.1 After
In my redesign, users can use a dummy as a template to proportionally upload, crop, and resize their character images.
For images that aren't exactly frontal facing, users can pinpoint the characters' facial features to assist the AI in mapping the face asymmetrically, resulting in more accurate figures.


4.2 Before
When selecting voices in D-ID, users were presented with a scrolling list of all available voices without detailed attributes, forcing them to essentially guess which voice would best suit their needs.

4.2 After
In my redesign, I've added attribute tags to each voice, allowing users to more quickly and accurately find their ideal voice.

5) LALAMU
5.1 Before
While LaLaMu performs well at lip-syncing, its submission process is unfriendly - users won't know whether their upload violates any rules until after it has been uploaded, and it usually takes around four rounds of back-and-forth to get satisfying content.


5.1 After
For images that aren't exactly frontal facing, users can pinpoint the characters' facial features to assist the AI in mapping the face asymmetrically, resulting in more accurate figures.

Final Outcomes
I summarized my time commitment across all tools, as well as the pros and cons of each AI tool from a desperate beginner's perspective, below:

Comparing the actual time commitment against my expectations, I found several points where I could strongly feel what the current AI tools were not able to cover:
Making an impressive and creative script (not a mediocre or clichéd one);
Creating cohesive and consistent characters from beginning to end;
Doing dramatic or exaggerated animations without sacrificing high-resolution visual quality;
Producing smooth lip-syncing for multi-dimensional figures.

The final 4-minute anime, half made by AI after 72 hours of binge study and production, looked great and made the event a blast!
Discussion
Creativity - Human wins.
While AI-generated content (AIGC) appears impressive, its creativity often falls short, tending toward predictability and cliché. However, these outputs can still spark inspiration and fuel human creativity.
Paradigms of future UX for Gen-AI
As we explore various tools across different production phases, it's becoming clear that the future of UX design will heavily focus on explaining AI to users, emphasizing explainability.
With the growing ability for users to customize and train their own generative AI models, it's crucial to ensure they understand the inner workings of the "black box" and how they can enhance results based on this knowledge.
From my perspective, key elements of effective UX/UI design for a generative AI tool should include:
Presets & Templates
The most commonly used preset parameters and templates, prompt scenarios, etc. (see the sketch after this list)
Clear previews
Images - visual styles
Audio - voice tones, emphases, expressions, etc.
Video - movement directions, etc.
Regional Mod
Regional, more refined sub-scale alterations and modifications based on prompts
Intuitive guides
Tips for crafting effective prompts to improve generation efficiency, essential guides for beginners to understand the finer details
Feedback system
Alerts on Privacy
Prevent misuse of private information in customized model training, such as commercial exploitation or accidental disclosure of confidential information (as seen in cases where employees were terminated due to sharing sensitive data in ChatGPT for model training)
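As a rough illustration of how the "Presets & Templates" and "Clear previews" ideas could fit together behind the UI, here is a hypothetical sketch; the names, fields, and values below are assumptions for illustration, not any existing tool's data model.

```python
from dataclasses import dataclass

@dataclass
class VoicePreset:
    """Hypothetical scenario preset for a voice-generation tool."""
    name: str                  # scenario label shown to the user
    stability: float           # example slider value (0.0-1.0)
    style_exaggeration: float  # example slider value (0.0-1.0)
    tooltip: str               # plain-language explanation for beginners
    preview_url: str           # short official sample so users can hear it first

# Illustrative presets a tool could ship with (all values are made up).
PRESETS = [
    VoicePreset("Audiobook", 0.8, 0.2,
                "Steady, neutral narration suited to long-form reading",
                "https://example.com/previews/audiobook.mp3"),
    VoicePreset("Marketing", 0.5, 0.7,
                "Upbeat, expressive delivery for short promos",
                "https://example.com/previews/marketing.mp3"),
]
```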
Expanding the Scope
FIVE SENSES
I believe we are nearing the Singularity, a point where established practices, paradigms, and systems undergo unprecedented disruption. In terms of interaction design modalities, we have primarily explored the visual and auditory dimensions,
while the sensory realms of smell, taste, and touch remain largely unexplored. This untapped space could present significant opportunities for future UX and product designers.

COUNTER-VERIFICATION
In an era where seeing is no longer believing, I anticipate a growing need for "counter-verification" to discern whether content is authentic or fabricated.
This will become increasingly important as we navigate the complexities of digital information.
Areas for Improvement

UX Perspective - Depth of research
Ideally, I would conduct detailed user interviews to gather insights from users with varying levels of proficiency in AI products. This would provide a more nuanced understanding of their needs and experiences.
Product Perspective - Depth of use cases study
The depth of this case study was constrained by time and budget limitations. However, I am actively participating in AI-related hackathons focused on content creation to deepen my understanding of UX in these contexts.
Role-wise - Just DO!
I usually tend to overthink when faced with a new challenge and over-prepare for it. Ideally, I would apply design-sprint thinking more often - just kick off trials, fail fast, and grow fast to gain insights.
Positivity/Future Outlook
Given the rapid pace of AI technology development, the findings of my case study may quickly become outdated. Nevertheless, as an AI-powered product designer, I maintain a positive outlook, recognizing that it's never too late to acquire new skills and never too early to embrace revolutionary design approaches.