Cang Jie System

AI Generated Arts
Data Visualization
Virtual Reality
Time
2021 - 2022
Tools
Python (PyTorch, Matplotlib), Web Development, Unity, 3D Modeling
Role
This is my undergraduate thesis project that I worked on individually.
Intro
The Cang Jie System is an experimental project that uses deep learning models to better understand the extensive Chinese character system. Because Chinese characters are logographic, I treated them as images and studied them with image processing techniques, seeking deeper insight into the structure of this intricate writing system.
Background
Chinese is written with a logographic script. Its characters are glyphs whose visual components depict concrete objects or abstract concepts. The earliest known record of Chinese writing, the Oracle Bone Script, mimics the shapes of actual objects to tell stories.

Over the centuries, the forms of Chinese characters continued to evolve, with standardization and simplification as the most noticeable trends. Throughout these changes, the information the characters carry and the interrelations among them were preserved. I borrow the figure of Cang Jie, the legendary inventor of Chinese characters, to frame the mission of my practice.
Experiment I: A Preliminary Understanding
Before constructing a high-dimensional latent space that could thoroughly describe the characters, I first experimented with projecting the character image data into 2-dimensional space, hoping to derive a preliminary understanding that could guide later investigations. The visualization of the results showed that characters with the same radicals cluster together in the 2D space. The 2D projection also shows that the distribution of the characters is strongly related to their complexity (indicated by the number of strokes).
Visualization of Characters Mapped to 2D Space (only a subset of the characters is shown to ensure readability)
Visualization of Locations of Characters With the Same Radicals. Characters with the same radicals are clustered together.

The clusters overlap because the space is extremely condensed. The radicals in the graph (from left to right, top to bottom) are 讠 (talk), 女 (female), 口 (mouth), 刀 (knife), 人 (human), 一 (one).
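
The text does not name the exact projection used for this experiment, so the following is only a minimal sketch under assumptions: it uses t-SNE from scikit-learn as one possible way to map flattened character images to 2D, and random arrays stand in for the rendered character images. The names `images`, `radical_ids`, and `coords_2d` are hypothetical.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Stand-in data (assumption): 120 fake "glyph images" of 32x32 pixels,
# drawn around 4 centres that play the role of shared radicals.
n_per_group, n_groups, side = 30, 4, 32
centres = rng.normal(size=(n_groups, side * side))
images = np.concatenate(
    [c + 0.3 * rng.normal(size=(n_per_group, side * side)) for c in centres]
)
radical_ids = np.repeat(np.arange(n_groups), n_per_group)

# Flattened pixels go in; one 2D coordinate per character comes out.
projector = TSNE(n_components=2, perplexity=15, init="pca", random_state=0)
coords_2d = projector.fit_transform(images)
```

The resulting `coords_2d` array could be drawn as a Matplotlib scatter plot, colored by `radical_ids` or by stroke count, to look for the kind of clustering described above.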
Experiment II: Generate New Characters with GAN
The preliminary insights gained through manifold learning suggested that the entire character system could be described with algorithmic models. Building on this, I developed generative deep learning models that could generate new characters and provide deeper insights into the character system.

To achieve this, I chose the generative adversarial network (GAN) architecture, known for producing high-quality, seamless outputs across various applications. In the following sections, I detail my experiments with multiple GAN variants: DCGAN, InfoGAN, WGAN, and VAE-GAN.
DCGAN & InfoGAN
The experiments with both DCGAN and InfoGAN ran into mode collapse, where the generator starts to produce only a limited variety of outputs.
WGAN
Further experiments showed that another GAN variant, WGAN, resolved the mode collapse issue. Yet the characters it generated were still partially glitched.
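
To make the difference from a standard GAN concrete, here is a minimal PyTorch sketch of one WGAN training step. It is not the project's actual code: it assumes the original weight-clipping formulation of WGAN (the text does not say whether clipping or a gradient penalty was used), and the layer sizes, learning rate, and names such as `critic_step` are hypothetical.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical sizes: 64x64 grayscale glyphs (flattened) and 100-d noise.
IMG_DIM, NOISE_DIM, CLIP = 64 * 64, 100, 0.01

critic = nn.Sequential(
    nn.Linear(IMG_DIM, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1)
)
generator = nn.Sequential(
    nn.Linear(NOISE_DIM, 256), nn.ReLU(), nn.Linear(256, IMG_DIM), nn.Tanh()
)
opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
opt_g = torch.optim.RMSprop(generator.parameters(), lr=5e-5)


def critic_step(real):
    """Train the critic to score real images higher than generated ones."""
    fake = generator(torch.randn(real.size(0), NOISE_DIM)).detach()
    loss = critic(fake).mean() - critic(real).mean()
    opt_c.zero_grad()
    loss.backward()
    opt_c.step()
    # Weight clipping keeps the critic roughly Lipschitz (assumed variant).
    with torch.no_grad():
        for p in critic.parameters():
            p.clamp_(-CLIP, CLIP)
    return loss.item()


def generator_step(batch_size):
    """Train the generator to raise the critic's score on its outputs."""
    loss = -critic(generator(torch.randn(batch_size, NOISE_DIM))).mean()
    opt_g.zero_grad()
    loss.backward()
    opt_g.step()
    return loss.item()
```

Because the critic outputs an unbounded score rather than a real/fake probability, its gradient does not vanish when the generator is far from the data, which is the usual explanation for WGAN's reduced tendency toward mode collapse.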
VAE-GAN
To compensate for the lack of metrics for validating generated characters, I experimented with another GAN variant, the VAE-GAN. VAE-GAN combines a variational autoencoder (VAE) with a GAN, so the model can both generate new characters and reconstruct original images. Comparing the reconstructions with the originals gives a direct visual check on what the model has learned.
Comparison between the original characters (left) and the reconstructed characters (right) created by the VAE-GAN model.
A collection of generated characters from the trained VAE-GAN model
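
The two abilities described above, reconstruction and generation, can be sketched in PyTorch as follows. This is an illustrative simplification rather than the project's model: the fully connected layers, the sizes, and the pixel-wise reconstruction loss are all assumptions (VAE-GAN as originally proposed measures reconstruction in the discriminator's feature space), and every name in the block is hypothetical.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
IMG_DIM, LATENT_DIM = 64 * 64, 32  # hypothetical sizes


class Encoder(nn.Module):
    """Maps an image to the mean and log-variance of its latent code."""

    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(IMG_DIM, 256), nn.ReLU())
        self.mu = nn.Linear(256, LATENT_DIM)
        self.logvar = nn.Linear(256, LATENT_DIM)

    def forward(self, x):
        h = self.body(x)
        return self.mu(h), self.logvar(h)


encoder = Encoder()
decoder = nn.Sequential(
    nn.Linear(LATENT_DIM, 256), nn.ReLU(), nn.Linear(256, IMG_DIM), nn.Sigmoid()
)
discriminator = nn.Sequential(
    nn.Linear(IMG_DIM, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1)
)


def reconstruct(x):
    """Encode, sample with the reparameterization trick, then decode."""
    mu, logvar = encoder(x)
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
    return decoder(z), mu, logvar


def generate(n):
    """Decode n latent vectors drawn from the standard normal prior."""
    return decoder(torch.randn(n, LATENT_DIM))


def vae_gan_losses(x):
    """Return reconstruction, KL, and adversarial losses for a batch."""
    recon, mu, logvar = reconstruct(x)
    recon_loss = F.binary_cross_entropy(recon, x)  # pixel-wise (simplified)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    logits = discriminator(recon)
    # The decoder is rewarded when the discriminator labels its output real.
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return recon_loss, kl, adv
```

In this setup, `reconstruct` corresponds to the left/right comparison figure and `generate` to the collection of new characters.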
Experiment III: Visualize the Character Space
To interpret the learned latent space through visualization in three-dimensional space, one crucial step is to reduce the number of dimensions of the latent space. Principal Component Analysis (PCA) was used to reduce the high-dimensional data to 3-dimensional vectors. The following point cloud visualization shows the character distribution in the reduced 3D space.
3D Visualization of Correlation Between Characters' Complexity and Locations in Condensed Latent Space. The visualization demonstrates that the third dimension in the condensed latent space roughly describes the characters' complexity.
3D Visualization of Correlation Between Characters' Radicals and Locations in Condensed Latent Space. Characters with the same radicals are clustered together in the condensed latent space.
3D Visualization of All Characters with Radical 艹 (艸)
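
The PCA reduction used for these visualizations can be sketched with NumPy alone. This is a minimal illustration, not the project's code: the 64-dimensional random `latents` array is a stand-in for the encoded characters, and `pca_reduce` is a hypothetical helper name.

```python
import numpy as np


def pca_reduce(latents, n_components=3):
    """Project row vectors onto their top principal components."""
    centred = latents - latents.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return centred @ vt[:n_components].T, explained[:n_components]


rng = np.random.default_rng(0)
latents = rng.normal(size=(500, 64))  # stand-in for encoded characters
points_3d, explained = pca_reduce(latents)
```

Each row of `points_3d` is one character's position in the point cloud, and `explained` gives the fraction of total variance that each of the three axes retains, which is worth checking before reading meaning into any single axis.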
Live-Generation Web Application
To better communicate the experiment results to a general audience, I built a live-generation web application that lets visitors interact with the model.
Visit the live application at cangjie-system.herokuapp.com

Visitors can experiment with the trained model themselves and generate their own Chinese characters by entering different character calculations. The web application also serves as a gallery that exhibits a variety of generated characters, along with documentation of the project's concepts and execution.
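
The text does not define what a character calculation is. One plausible reading, offered here purely as an assumption, is arithmetic on the latent vectors of existing characters, with the result decoded into a new glyph. The sketch below shows only the arithmetic step with NumPy; the random `latent_codes` stand in for codes from a trained encoder, and `combine` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM = 32  # hypothetical latent size

# Stand-in latent codes (assumption); real ones would come from the encoder.
latent_codes = {ch: rng.normal(size=LATENT_DIM) for ch in "木日月"}


def combine(terms):
    """Weighted sum of latent codes, e.g. [("木", 0.5), ("日", 0.5)]."""
    return sum(weight * latent_codes[ch] for ch, weight in terms)


blend = combine([("木", 0.5), ("日", 0.5)])        # halfway between two characters
difference = combine([("月", 1.0), ("日", -1.0)])  # one character minus another
```

Under this reading, a vector such as `blend` would be passed to the trained decoder or generator to render the new character.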

The application was built with Flask and hosted on Heroku.
VR Character Space
As part of this project, I also experimented with 3D data visualization in a VR environment. Each character exists in the 3D character space as a character bulb. The audience can travel across the character space and interact with the character bulbs. The light beams that connect the character bulbs show how the new characters were created.
Prototype of original characters (in gold) and generated characters (in silver). Connections between the new and the original characters are highlighted.
Workflow: placing a few sample characters in the space
Workflow: placing the 100 most used characters and 30 generated new characters with C# scripts, making the visualization data-driven.
Workflow: experimenting with different visuals, iteration 1
Workflow: experimenting with different visuals, iteration 2