
MIT SloanFronts
AI VS Humans -
WTF or FTW?
A 72-hour anime creation, binge study, and redesign of 10+ AI tools, based on Human-Centered AI and XAI theory
TL;DR
📲 Product
Industry: Academic
Phase: 0-to-1 Product
Type: Web & Video
Status: Aired on Dec. 6, 2023
📽️ Project
Duration: 72 hours
Type: Academic | Teaching Assistantship
Team:
Producer (Professor)
Director (😃 Me)
Designer (😃 Me)
🏋🏻♀️ My Contribution
UX Research & Audit
Product Redesign
Production
Illustration
🎯 Goals
Project Goal:
Elevate the annual MIT SloanFronts creativity show
Design Goal:
Create, from scratch in 72 hours, an anime with movie-style storytelling that ties together all presentations of the show
🎉 Impacts
Brought the event, along with the class Creative Industries, to the next level
Surfaced UX design topics on XAI for further discussion
Jump to Results
A project that helped me see through the darkness of UX layoffs and find the silver lining.
Background
As a Teaching Assistant for the class Creative Industries 23-24, I came up with a new topic that had not been touched previously - the impact of AI-Generated Content (AIGC) on human creativity.
To work toward at least a preliminary answer to the question of whether Gen-AI will be disruptive enough to overshadow human creativity and production, I pushed myself into an ultimate challenge: How far could I leverage AI tools to create an anime for the 2023 SloanFronts design show in just 72 hours? To what extent can AI tools truly take over human work?

Problems & Goals
Goals
PROJECT goal - bring the annual SloanFronts creativity show to the next level
DESIGN goal - create, from scratch in 72 hours, an anime with movie-style storytelling for the show, ideally fully powered by AI
Problem 1 - Timing
72 hours for an anime literally from scratch sounds like mission impossible, given the overall procedure (storyboarding, character design, sound design, audio making, graphic design, dramatic motion work, video editing, etc.).
Problem 2 - Delivery
My personal storytelling style combines humor with dramatic emotional build-up. I had a limited sense of how native speakers (the majority of the audience) would vibe with the wording and tone suggested by AI.
Problem 3 - Content (diversity, culture, professionalism)
Giving AI tools an exact prompt with a concise and accurate description is hard because of translation nuances and the depth of detail required;
Because of cultural differences, adopting puns suggested by AI tools can be risky, since some punchlines only land with specific audiences;
Before this event, I knew little about audio production and voiceover generation.
My Solutions
Project management
My strategy was to be REALISTIC - run a fast UX audit of the main AI tools for each phase of creation and stick with the most intuitive option with the shortest learning curve.
I mapped out each phase of design for this project in terms of the estimated time for manual creation and the bottom line for each design deliverable.
After that, I binge-studied popular tools for each phase, looking at their features, pricing, learning curve and, more than anything, the following UX auditing criteria:
Accessibility (e.g., Tool tips for laypersons to understand each piece of UI);
Explainability (i.e., results & algorithms explainable to users)
Navigation of Info Architecture ("IA") (for beginners, a comprehensive but intuitive IA helps save lots of time)
Page speed (it determines possibilities of multitasking)

User Research & Persona
Through desktop research, I learned about five different XAI needs and the importance of being mindful of human cognitive biases. As a beginner myself, I decided to build the persona of a Gen-AI beginner who is more desperate than others for precise and accurate AIGC outcomes and who seeks a less steep learning curve.

A diagram of different audiences for a given prediction and explanation, by Meg Kurdziolek (2022)

User Flow
Although most current AI tools are not positioned as consumer products - making it arguably too early to discuss UX design for such products - I still found two points where UX & UI design can substantially enhance the user journey:
Before users input prompts to the AI, i.e., the stage before the black-box system. This phase calls for clean, intuitive and instructional UIs so that even a beginner who has not watched any tutorials can operate the tools smoothly.
After the AIGC results are provided. In this phase, users quickly decide whether to accept the results, modify their prompts for regeneration, or abandon the attempt, which requires the system to give users feedback on problematic inputs, better alternatives, etc.

Redesign
Selection of tools for production
Based on the UX auditing guidance, I set rules for picking the RIGHT tools instead of the most comprehensive ones:
user-friendly to desperate and anxious beginners 🤷🏻♂️ in terms of explainability and accessibility;
organized and well-structured in its Info Architecture so that users can stay focused on the same screen;
offering generous trial plans or low-priced plans.
I binge-studied 10+ AI tools covering pre-production, production and post-production by:
desktop research on AI product reviews;
picking the tools most suitable for my case and walking through all essential tutorials;
and mapping out their key pros and cons (see the summary below)

1) Midjourney
1.1 Before
Midjourney is powerful but more closed than other image Gen-AI tools. Regional modification of a generated image is allowed, but not based on new prompts.

1.2 After
In my redesign, users are able to customize regional regeneration based on their new prompts.

2) Runway
2.1 Before
Runway's multimedia AI-generated content offers impressive features, yet some, like the image-to-image function, are hindered by a counterintuitive user interface. For instance, a long list of text-based descriptions for visual styles can be overwhelming and unclear, leaving users confused about the visual content.

2.1 After
I've redesigned this section to include visual thumbnails, making artistic terms more accessible and reducing cognitive load for users.

2.2 Before
The current design allows for some customization but lacks clear guidance. For example, first-time users unfamiliar with the term 'Seed' find no helpful tooltips in the current design. Additionally, there's no way for users to comment on or provide feedback about the generated results.
When inputting specific keywords like 'Junji Ito,' which typically represents a black and white comic line art style, the system fails to explain why the results may deviate from expectations.

2.2 After
I understand there might be technical or other reasons why current AI tools don't explain their results for each piece. Nevertheless, I've redesigned the interface to include a speech bubble pop-up, helping users understand which part of their prompts might have led to unexpected content generation.
I've also provided clearer examples of negative prompts to inspire and guide users.
Tooltips have been added to key sections like 'Seed' to assist newcomers in understanding the finer details.

3) Eleven Labs
3.1 Before
Eleven Labs has made significant strides since its initial release, yet there's always room for improvement.
For example, the system introduced sliders for users to customize voice generation and set safety ranges to prevent inappropriate outputs.

3.1 After
While the sliders are a good feature, I've enhanced them with preset scenario tooltips (e.g., optimal settings for audiobooks/marketing) and an official voice preview function. This way, users can more easily configure their ideal voices without constant adjustments.

3.2 Before
In the voice generation section, the customization options were limited. Moreover, users sometimes struggled to gauge how sample voices would sound with exaggerated expressions if the sample text didn't include specific voice types.

3.2 After
In my redesign, I've enabled users to experiment with various text-to-speech options. Additionally, I've integrated the Speech Synthesis Markup Language (SSML) as an option, allowing users to precisely achieve the vocal effects they desire.
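To make the SSML option concrete, below is a minimal sketch of what such markup could look like, using standard W3C SSML tags (prosody, break, emphasis). The snippet is illustrative only, assuming a generic SSML-compatible text-to-speech backend rather than Eleven Labs' actual syntax or current level of SSML support.

```python
# A minimal, hypothetical sketch of SSML markup for one dramatic line of dialogue.
# These are standard W3C SSML tags, not any specific vendor's syntax.
ssml_line = """
<speak>
  <prosody rate="slow" pitch="+2st">
    Ladies and gentlemen, <break time="400ms"/>
    <emphasis level="strong">the show is about to begin!</emphasis>
  </prosody>
</speak>
"""

# <prosody> adjusts speaking rate and pitch, <break> inserts a timed pause, and
# <emphasis> stresses a phrase -- fine-grained vocal control that sliders or
# plain text alone cannot express.
print(ssml_line)
```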

4) D-ID
4.1 Before
A significant challenge in creating smooth animation is lip syncing. Current AI tools, like D-ID, only support speaking human figures that face the camera head-on, which is highly limiting for animation creation. This means that if the character is non-human or posed in a side profile, these AI tools won't produce satisfactory results.
Additionally, the guides for D-ID's lip syncing feature can be confusing, especially when they list several criteria in text form, such as measurements or numerical calculations (e.g., understanding the size of 200x200 pixels by sight).

4.1 After
In my redesign, users can use a dummy as a template to proportionally upload, crop, and resize their character images.
For images that aren't exactly frontal facing, users can pinpoint the characters' facial features to assist the AI in mapping the face asymmetrically, resulting in more accurate figures.


4.2 Before
When selecting voices in D-ID, users were presented with a scrolling list of all available voices without detailed attributes, forcing them to essentially guess which voice would best suit their needs.

4.2 After
In my redesign, I've added attribute tags to each voice, allowing users to more quickly and accurately find their ideal voice.

5) LALAMU
5.1 Before
While LaLaMu performs well at lip-syncing, its submission process is unfriendly - users won't know whether their upload violates any rules until after it has been uploaded, and it usually takes around four rounds of back-and-forth to get satisfying content.


5.1 After
For images that aren't exactly frontal facing, users can pinpoint the characters' facial features to assist the AI in mapping the face asymmetrically, resulting in more accurate figures.

Final Outcomes
I summarized my time commitment across all tools, as well as the pros and cons of each AI tool from a desperate beginner's perspective, below:

Comparing the actual time commitment against my expectations, I found several points where I could strongly feel what the current AI tools were not able to cover:
Making an impressive and creative script (not a mediocre or clichéd one);
Creating cohesive and consistent characters from beginning to end;
Doing dramatic or exaggerated animations without sacrificing high-resolution visual quality;
Producing smooth lip-syncing for multi-dimensional figures.

The final 4-minute anime, half made by AI after 72 hours of binge study and production, looked great and made the event a blast!
Discussion
Creativity - Human wins.
While AI-generated content (AIGC) appears impressive, its creativity often falls short, tending toward predictability and cliché. However, these outputs can still spark inspiration and fuel human creativity.
Paradigms of future UX for Gen-AI
As we explore various tools across different production phases, it's becoming clear that the future of UX design will heavily focus on explaining AI to users, emphasizing explainability.
With the growing ability for users to customize and train their own generative AI models, it's crucial to ensure they understand the inner workings of the "black box" and how they can enhance results based on this knowledge.
From my perspective, key elements of effective UX/UI design for a generative AI tool should include:
Presets & Templates
The most commonly used preset parameters and templates, prompt scenarios, etc. (see the sketch after this list)
Clear previews
Images - visual styles
Audio - voice tones, emphases, expressions, etc.
Video - movement directions, etc.
Regional Mod
Regional, more refined sub-scale alterations and modifications based on prompts
Intuitive guides
Tips for crafting effective prompts to improve generation efficiency, essential guides for beginners to understand the finer details
Feedback system
Alerts on Privacy
Prevent misuse of private information in customized model training, such as commercial exploitation or accidental disclosure of confidential information (as seen in cases where employees were terminated due to sharing sensitive data in ChatGPT for model training)
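As a rough illustration of how the "Presets & Templates" and "Clear previews" ideas could fit together behind the UI, here is a hypothetical sketch; the names, fields, and values below are assumptions for illustration, not any existing tool's data model.

```python
from dataclasses import dataclass

@dataclass
class VoicePreset:
    """Hypothetical scenario preset for a voice-generation tool."""
    name: str                  # scenario label shown to the user
    stability: float           # example slider value (0.0-1.0)
    style_exaggeration: float  # example slider value (0.0-1.0)
    tooltip: str               # plain-language explanation for beginners
    preview_url: str           # short official sample so users can hear it first

# Illustrative presets a tool could ship with (all values are made up).
PRESETS = [
    VoicePreset("Audiobook", 0.8, 0.2,
                "Steady, neutral narration suited to long-form reading",
                "https://example.com/previews/audiobook.mp3"),
    VoicePreset("Marketing", 0.5, 0.7,
                "Upbeat, expressive delivery for short promos",
                "https://example.com/previews/marketing.mp3"),
]
```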
Expanding the Scope
FIVE SENSES
I believe we are nearing the Singularity, a point where established practices, paradigms, and systems undergo unprecedented disruption. In terms of interaction design modalities, we have primarily explored the visual and auditory dimensions,
while the sensory realms of smell, taste, and touch remain largely unexplored. This untapped space could present significant opportunities for future UX and product designers.

COUNTER-VERIFICATION
In an era where seeing is no longer believing, I anticipate a growing need for "counter-verification" to discern whether content is authentic or fabricated.
This will become increasingly important as we navigate the complexities of digital information.
Areas for Improvement

UX Perspective - Depth of research
Ideally, I would conduct detailed user interviews to gather insights from users with varying levels of proficiency in AI products. This would provide a more nuanced understanding of their needs and experiences.
Product Perspective - Depth of use cases study
The depth of this case study was constrained by time and budget limitations. However, I am actively participating in AI-related hackathons focused on content creation to deepen my understanding of UX in these contexts.
Role-wise - Just DO!
I usually tend to overthink when faced with a new challenge and over-prepare for it. Ideally, I would apply design-sprint thinking more often - just kick off trials, fail fast, and grow fast to gain insights.
Positivity/Future Outlook
Given the rapid pace of AI technology development, the findings of my case study may quickly become outdated. Nevertheless, as an AI-powered product designer, I maintain a positive outlook, recognizing that it's never too late to acquire new skills and never too early to embrace revolutionary design approaches.