Hancheol Park

I am currently a Senior AI Research Engineer at Nota Inc., where I research and develop model compression and optimization techniques to build efficient AI models. I received my Ph.D. in Computer Science from the School of Computing at KAIST, where my research focused on Natural Language Processing under the supervision of Prof. Jong C. Park.

Email: hancheolp (at) gmail.com

Web: https://hancheolp.github.io

News

MTN News interview on Nota’s AI model optimization technology

I was interviewed by MTN News about Nota’s AI model optimization technology; the segment aired on MTN.

MTN News

2026

Two MoE quantization papers accepted to AdaptFM @ ICML 2026

Two papers on our quantization methods tailored to Mixture-of-Experts (MoE) LLMs were accepted to AdaptFM, a workshop at ICML 2026.

News Article Nota AI Blog ICML 2026 (1) ICML 2026 (2)

2026

NVIDIA Nemotron Hackathon Seoul

We won 1st place in Track C and were selected as the overall winner at the NVIDIA Nemotron Hackathon Seoul with our data-driven method for MoE model optimization.

YouTube News Article NVIDIA Blog

2026

Official quantized Solar-Open-100B models released

Using our MoE-specific quantization method, we released official quantized variants of Solar-Open-100B, a model from the Sovereign AI Foundation Model project.

Hugging Face (1) Hugging Face (2) Technical Report Nota AI Blog

2026

AWS technical blog on LLM optimization for Inferentia

AWS published a technical blog post introducing our LLM optimization and quantization techniques for AWS Inferentia.

AWS Blog

2026

Research Interest

Foundation Models

LLM/VLM pretraining, instruction fine-tuning, retrieval-augmented generation, RAG-aware fine-tuning, vLLM-based serving, custom model architecture integration in vLLM, and model evaluation.

Hugging Face Nota AI Blog

Efficient AI

Efficient LLMs/VLMs, lightweight neural architecture design, quantization, pruning, knowledge distillation, model porting, graph optimization, ONNX/TFLite conversion, and NPU/GPU-aware deployment.

ICML 2026 (1) ICML 2026 (2) arXiv preprint AWS Inferentia Blog Hugging Face

Reliable NLP

Hallucination mitigation, ambiguity detection, uncertainty estimation, selective answering, uncertainty-aware generation, and confidence calibration.

COLING 2025 (1) arXiv preprint (1) COLING 2025 (2) arXiv preprint (2) ACL 2023

Human-Centric Computer Vision

Image and video understanding, surveillance video analytics, event detection, human activity recognition, and vision systems for user convenience and safety.

CVPR 2024 (1) CVPR 2024 (2) ICCV 2023 CVPR 2023 (1)

Publication

DREAM-MoE: Downstream Routing Error-Aware Margin-Preserving Quantization for Mixture-of-Experts Large Language Models

Hancheol Park, Geonho Lee, Tae-Ho Kim. AdaptFM @ ICML, 2026.

SRA-MoE: Output-Aware Selective Router Alignment for MoE Quantization

Value-and-Structure Alignment for Routing-Consistent Quantization of Mixture-of-Experts Models

Nota AI at GenAI Detection Task 1: Unseen Language-Aware Detection System for Multilingual Machine-Generated Text

Where do LLMs Encode the Knowledge to Assess the Ambiguity?

Assessing the Answerability of Queries in Retrieval-Augmented Code Generation

Self-Knowledge Distillation for Learning Ambiguity

Cluster Self-Refinement for Enhanced Online Multi-Camera People Tracking

Road Object Detection Robust to Distorted Objects at the Edge Regions of Images

Computationally Efficient Decoders for Semantic Segmentation Models

Efficient Semantic Segmentation Models with Weighted Sum-based Feature Fusion Decoders

The First Visual Object Tracking Segmentation VOTS2023 Challenge Results

Deep Model Compression Also Helps Models Capture Ambiguity

Question-Answering in a Low-resourced Language: Benchmark Dataset and Models for Tigrinya

Addressing the Occlusion Problem in Multi-Camera People Tracking with Human Pose Estimation

A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation

Cut Inner Layers: A Structured Pruning Strategy for Efficient U-Net GANs

Constructing a Paraphrase Database for Agglutinative Languages

Detection of Non-Standard Meaning Usage with Word Embedding

Predicting Symptoms of Depression for Social Media Users via Linguistic Patterns

Neural Theorem Prover with Word Embedding for Efficient Automatic Annotation

Addressing Low-Resource Problems in Statistical Machine Translation of Manual Signals in Sign Language

Enhanced Sign Language Transcription System via Hand Tracking and Pose Estimation

Neural Theorem Prover with Word Embedding for Efficient Automatic Annotation

Addressing Low-Resource Problems in Statistical Machine Translation of Sign Language

Affix Modification-Based Bilingual Pivoting Method for Paraphrase Extraction in Agglutinative Languages

The Correlation between Search Quality and Query Popularity

Initiating Moderation in Problematic Smartphone Usage Patterns

Measuring Popularity of Machine-Generated Sentences Using Term Count, Document Frequency, and Dependency Language Model

Sentential Paraphrase Generation for Agglutinative Languages Using SVM with a String Kernel

An Automatic Evaluation Metric for Korean Paraphrase via Semantic Frame

Granted Patent

Method and System for Computationally Efficient High-Performance Object Detection

Method and System for Compressing Natural Language Understanding Models via Layer Pruning

Technique for Reducing Hallucinated Answers from AI-Based Language Models

Method and Apparatus for Early Fire Detection

Method and Apparatus for Determining Ambiguity in an Input Prompt

Method for Estimating Crowd Count, Method for Training a Model for Crowd Count Estimation, and Electronic Device for Performing the Same

Method of Lightweighting a Neural Network for Object Recognition, Method of Recognizing an Object Using the Lightweighted Neural Network, and Electronic Device for Performing the Same

Method of Lightweighting a Neural Network for Object Recognition, Method of Recognizing an Object Using the Lightweighted Neural Network, and Electronic Device for Performing the Same

Knowledge Distillation Method and System Specialized for Pruning-Based Deep Neural Network Compression

Statement Reliability Evaluation System and Method Using Commonsense Knowledge and Linguistic Patterns

System and Method for Constructing Emotion Lexicon by Paraphrasing and Recognizing Emotion Frames

Method and System for Personality Recognition from Dialogues

Apparatus for Detecting Non-standard Meaning Usage of Words, Method for Detecting Non-standard Meaning Usage of Words, and Recording Medium

System and Method for Communication Training Program over Virtual Reality and Continued Feedback via Mobile Device

Awards & Honors

NVIDIA Nemotron Hackathon Seoul

GenAI Content Detection Task 1

NVIDIA AI City Challenge 2024 Track 4

NVIDIA AI City Challenge 2024 Track 1

NVIDIA AI City Challenge 2023 Track 1

Visual Object Tracking 2023 Challenge

Outstanding Paper Award

Best SK AI Partner

Best Presentation Award

Best Paper Award

Best Paper Award

AFNLP Best Asian Paper Award

Public Highlights

MTN News Interview

Two MoE Quantization Papers Accepted to ICML 2026 Workshop

NVIDIA Nemotron Dev Days Seoul

NVIDIA Korea Blog Recap

LLM Model Quantization Techniques for AWS Inferentia by Nota AI

Two MoE Quantization Papers Accepted to an ICML 2026 Workshop

NotaMoEQuantization: An MoE-Specific Quantization Method for Solar-Open-100B

Solar-Open-100B-NotaMoEQuant-Int4

Solar-Open-100B-NotaMoEQuant-NVFP4

Unseen Language-Aware Detection System for Multilingual Machine-Generated Text

Where Do LLMs Encode the Knowledge to Assess the Ambiguity?