On a warm, gentle morning in Bandung, sunlight filtered through the glass windows of a small café inside Institut Teknologi Bandung (ITB). There, in that cozy cafeteria, I sat across from Prof. Lee Jung Hee and researcher Kim Yeon Hee. They are the key figures behind a groundbreaking Korean–Foreign Language Parallel Corpus Project that could transform the way millions around the world experience K-dramas, K-pop, and Korean culture.
They had arrived from Jakarta after meetings at the University of Indonesia. Before heading into an official discussion with ITB researchers, Prof. Lee graciously took the time to speak with me. What she shared reveals not just a research project but a vision that directly shapes the future of global Hallyu.
A National Mission: Making Korea a Global AI Leader with the Corpus Language Project
Prof. Lee introduced the initiative as part of a long-term national project by the South Korean government. The goal is ambitious: to make Korea one of the world’s top three AI powerhouses.
A core pillar of this mission?
Building high-quality multilingual translation data, especially for languages that global AI systems have historically covered poorly, such as Indonesian, Vietnamese, Thai, Tagalog, Hindi, Khmer, Uzbek, and Russian.
This effort is a crucial component of the Korean–Foreign Language Parallel Corpus Project, which, in turn, is part of the broader Modu Corpus (Corpus for All) initiative led by the National Institute of Korean Language (NIKL).
Meet Prof. Lee Jung Hee: The Teacher Behind a Global Language Movement

Prof. Lee is a respected educator at Kyung Hee University, where she has taught Korean language education since 1996. She smiled as she recalled:
“I’ve met so many foreign learners over the past decades. Many of them are now Korean language experts in their own countries.”
Prof. Lee Jung Hee.
When the parallel corpus project began, she realized something important:
Korean language education had matured, and a new global generation of Korean-speaking talent could help build language data that reflects real, diverse, authentic Korean.
This multilingual team is now the backbone of the corpus creation process—reviewing, refining, and aligning Korean sentences with their equivalents in eight languages.
And their skills have grown, too.
“Our translation reviewers, after five years in this project, have become more fluent not only in Korean but also in their own native languages,” she said proudly.
How This Corpus Language Project Quietly Improved AI — Including Your Favorite Subtitles
Back in 2021, when the project first started, Korean-to-other-language AI translation quality was still poor. Many global AI models struggled with languages outside English and Chinese.
But the tide turned.
“Companies like Google and Naver could train their models better using the data we built. That’s why AI translation for Indonesian, Thai, Vietnamese, and others has improved dramatically.”
Prof. Lee Jung Hee.
The parallel corpus now contains tens of millions of words, consisting of:
- 40% written text
- 60% spoken language, including YouTube conversations, everyday dialogue, and colloquial expressions.
This heavy inclusion of daily Korean speech is essential for entertainment content.
How the Corpus Language Project Transforms K-Content for Global Fans

For K-entertainment lovers, this is where the Korean–Foreign Language Parallel Corpus Project truly shines.
Prof. Lee emphasized that the corpus is not just linguistic data.
“It is key infrastructure that supports the global localization of Korean content.”
Prof. Lee Jung Hee.
Because the corpus includes authentic spoken expressions—from slang to idioms—it allows AI systems to:
- Translate K-dramas with more natural dialogue
- Produce subtitles that capture nuance (sarcasm, humor, tone)
- Improve K-pop lyrics translation accuracy
- Reduce awkward or overly literal subtitles on streaming platforms
The languages included also reflect major Hallyu regions—Southeast Asia, Central Asia, and Russia.
“K-pop and K-drama rely heavily on spoken expressions and cultural nuance. Our large-scale parallel corpus helps ensure these nuances are preserved in translation.”
Prof. Lee Jung Hee.
Imagine future AI subtitle systems that understand when a character whispers “헐…”, or when someone says “대박!”, or how to localize informal speech patterns used by idols in variety shows.
This project makes that future possible.
Does the Corpus Include Slang and Everyday Korean?
“Yes. Quite a lot,” she confirmed.
Because 60% of the dataset comes from spoken interactions and YouTube content, it includes:
- Youth slang
- Regional expressions
- Internet language
- Conversational shortcuts
- Multi-generational speech patterns
This ensures AI can capture the real voices of Korean society—not just textbook Korean.
Collaboration with OTT Platforms, Agencies, and Media?
“Highly possible,” says Prof. Lee.
The corpus is public-sector data, built under NIKL, so formal collaboration frameworks are needed. However:
- OTT platforms
- K-pop agencies
- Drama production companies
- Media localization studios
could all benefit enormously from this resource.
Better data → better subtitles → better global fan experience.
Preserving Korean Language Identity in a Globalized Digital World
The professor stressed a crucial cultural mission:
“This corpus records authentic Korean expressions as they are.
It allows AI to understand not just meaning, but nuance and cultural context.”
In other words:
As Korean content spreads worldwide, AI-powered translation shouldn’t flatten or erase unique Korean expressions—but help preserve them.
The corpus becomes a digital archive of the living Korean language.
Indonesia’s Role: A Promising Partnership
During her visit to Indonesia, Prof. Lee Jung Hee met with Badan Bahasa, University of Indonesia (UI), and Institut Teknologi Bandung (ITB).
She found that Indonesia has rich linguistic resources that can complement Korea’s.
“If both countries translate and share data, the scale will grow much faster and support better AI systems.”
Prof. Lee Jung Hee.
This partnership could empower:
- Local Indonesian tech companies
- Startups wanting to build AI for Indonesian–Korean commerce
- Better tools for Korean language learners
- More accurate translation for Indonesian Hallyu audiences
AI for SMEs: Unlocking New Business Possibilities
Small businesses in Korea and Indonesia increasingly want their own AI systems. Prof. Lee believes the corpus will help them:
“They can build more precise AI models for their specific context using our data.”
Prof. Lee Jung Hee.
This could support e-commerce, tourism, customer support, cross-border K-content licensing, and multilingual marketing.
The Future: Chain-of-Thought (CoT) Parallel Corpus
One of the most exciting future directions Prof. Lee shared:
“We plan to expand beyond sentence-level translation into Chain-of-Thought—data that includes reasoning and explanation behind translations.”
Prof. Lee Jung Hee.
This will allow next-generation AI to understand not just how to translate, but why.
It could revolutionize:
- AI subtitle generation
- Localization tone and style
- Cultural nuance modeling
- Korean language learning apps
It could also open new research collaborations across Southeast Asia and with institutions worldwide.
Final Thoughts: A New Era of Enjoying K-Culture
Sitting in that quiet Bandung café, I realized the magnitude of what Prof. Lee is building.
For fans, this isn’t just a technical project.
It’s a gateway—one that will let the world watch K-dramas with subtitles that feel alive, understand K-pop lyrics more deeply, and learn Korean culture in ways that are more natural and meaningful.
Prof. Lee’s calm passion reflects a simple truth:
Better language understanding leads to better cultural connections.
And this project brings us one step closer to a world where Korean entertainment truly feels like it was made for everyone.