StepFun Challenges OpenAI with StepAudio 2.5 Voice AI

Shanghai-based StepFun has launched StepAudio 2.5, an end-to-end voice model that the company says outperforms industry leaders in emotional intelligence.

Shanghai-based StepFun has unveiled StepAudio 2.5 Realtime, an end-to-end voice model that processes audio input directly instead of first converting speech to text.

Redefining Paralinguistic Comprehension

The model distinguishes itself by reading paralinguistic cues such as speaking rate, emotional tone, and the speaker's apparent age. In objective benchmarks, StepAudio 2.5 scored 82.18 on paralinguistic perception, ahead of major competitors such as GPT Realtime 1.5.

"We are moving beyond simple query-response systems toward soul-level companions that maintain character stability through long-tail conversations," the lab stated.

  • Trained on over 10,000 human-authored persona seeds.
  • Achieved an 86.36 score in general dialogue quality.
  • StepFun is backed by $1.7 billion in total funding.

FAQ

How does StepFun prevent AI from going out of character?

The lab uses roleplay-specific RLHF (reinforcement learning from human feedback), which optimizes for persona stability rather than general response quality alone, so that the model stays in character during extended interactions.
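
To make the idea concrete, here is a toy sketch of how a preference signal might weight persona consistency above general quality. It is not StepFun's training method, which the lab has not detailed here; the names TurnScores, roleplay_reward, and pick_preferred, the two score fields, and the 0.7 weight are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TurnScores:
    """Hypothetical per-reply scores in [0, 1], e.g. from human raters."""
    general_quality: float      # fluency, helpfulness, and similar traits
    persona_consistency: float  # how well the reply stays in character


def roleplay_reward(scores: TurnScores, persona_weight: float = 0.7) -> float:
    """Blend the two scores, weighting persona consistency more heavily.

    persona_weight is an illustrative assumption, not a published value.
    """
    if not 0.0 <= persona_weight <= 1.0:
        raise ValueError("persona_weight must be between 0 and 1")
    return (persona_weight * scores.persona_consistency
            + (1.0 - persona_weight) * scores.general_quality)


def pick_preferred(a: TurnScores, b: TurnScores, persona_weight: float = 0.7) -> str:
    """Return which candidate reply ("a" or "b") earns the higher blended reward."""
    if roleplay_reward(a, persona_weight) >= roleplay_reward(b, persona_weight):
        return "a"
    return "b"


# A polished reply that breaks character vs. a plainer reply that stays in character.
polished_but_off_persona = TurnScores(general_quality=0.9, persona_consistency=0.3)
plain_but_in_character = TurnScores(general_quality=0.6, persona_consistency=0.9)

preferred = pick_preferred(polished_but_off_persona, plain_but_in_character)
```

Under this weighting the in-character reply is preferred even though the other reply scores higher on general quality. Setting persona_weight to 0 reduces the comparison to general quality alone and flips the preference.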

Who leads StepFun?

The company was founded in 2023 by Jiang Daxin, a veteran developer who spent 16 years at Microsoft managing Bing and Cortana.
