Stable Audio Open Small: Revolutionizing Mobile AI Audio Generation

AI News | Posted by niko

Stable Audio Open Small is a groundbreaking text-to-audio generation model open-sourced by Stability AI. Designed for mobile devices, it has 341 million parameters and marks a leap for AI audio generation toward edge computing and on-device use.

The model builds on the earlier Stable Audio Open and has been heavily optimized. With Arm's KleidiAI library, it can generate up to 11 seconds of 44.1 kHz stereo audio in under 8 seconds on a smartphone, entirely offline. Because it does not rely on cloud processing, it is well suited to offline scenarios.
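To put the speed claim in concrete terms, the short calculation below works out how many samples an 11-second 44.1 kHz stereo clip contains and what "under 8 seconds" means as a real-time factor. The 16-bit PCM width used for the buffer-size estimate is our assumption for illustration only; the article does not state the model's output sample format. The function names are hypothetical.

```python
# Back-of-the-envelope numbers for "up to 11 s of 44.1 kHz stereo audio
# generated in under 8 s". The 16-bit PCM width is an assumption used only
# to estimate buffer size; it is not a documented output format.

SAMPLE_RATE_HZ = 44_100   # 44.1 kHz, as stated in the article
CHANNELS = 2              # stereo
CLIP_SECONDS = 11.0       # maximum clip length
GEN_SECONDS = 8.0         # reported upper bound on generation time


def total_samples(seconds: float, rate: int = SAMPLE_RATE_HZ, channels: int = CHANNELS) -> int:
    """Number of individual samples summed across all channels."""
    return round(seconds * rate) * channels


def pcm_bytes(seconds: float, bytes_per_sample: int = 2) -> int:
    """Uncompressed buffer size; 2 bytes per sample assumes 16-bit PCM."""
    return total_samples(seconds) * bytes_per_sample


def real_time_factor(gen_seconds: float, audio_seconds: float) -> float:
    """Values below 1.0 mean audio is produced faster than it plays back."""
    return gen_seconds / audio_seconds


if __name__ == "__main__":
    print(total_samples(CLIP_SECONDS))                              # 970200
    print(pcm_bytes(CLIP_SECONDS))                                  # 1940400
    print(round(real_time_factor(GEN_SECONDS, CLIP_SECONDS), 3))    # 0.727
```

An 8-second ceiling for an 11-second clip is a real-time factor of roughly 0.73, so the phone produces audio faster than it would take to play it back.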

It uses a latent diffusion model (LDM) that combines T5 text embeddings with a transformer-based diffusion architecture, generating a wide range of sounds from simple English text prompts. AIbase tests indicate that it is well suited to sound design and music production and can produce detailed short audio clips.
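The data flow of a latent diffusion pipeline (text to embedding, noise to clean latent, latent to audio) can be sketched with toy components. Everything below is a hypothetical stand-in, not Stability AI's code: `encode_text` imitates the T5 encoder with a hash-seeded vector, `denoise` imitates the transformer denoiser with a trivial rule, and `decode` imitates the autoencoder decoder by simple repetition. The sampler is a generic deterministic Euler solver in the common x = x0 + sigma * noise parameterization, which we assume only for illustration.

```python
import hashlib
import random
from typing import List, Sequence

# Toy sizes chosen only for illustration; the real model's latent shape differs.
LATENT_DIM = 8
SAMPLES_PER_LATENT = 4


def encode_text(prompt: str) -> List[float]:
    """Hypothetical stand-in for the T5 text encoder: a deterministic
    pseudo-embedding derived from a hash of the prompt."""
    seed = int.from_bytes(hashlib.sha256(prompt.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(LATENT_DIM)]


def denoise(x: List[float], sigma: float, cond: List[float]) -> List[float]:
    """Hypothetical stand-in for the transformer denoiser. A real network
    predicts the clean latent from (x, sigma, cond); this toy simply
    returns the conditioning vector as its 'clean' prediction."""
    return list(cond)


def sample_latent(cond: List[float], sigmas: Sequence[float], seed: int = 0) -> List[float]:
    """Deterministic Euler sampler, x = x0 + sigma * noise parameterization."""
    rng = random.Random(seed)
    x = [sigmas[0] * rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
    next_sigmas = list(sigmas[1:]) + [0.0]
    for sigma, next_sigma in zip(sigmas, next_sigmas):
        x0 = denoise(x, sigma, cond)
        # Slope of the probability-flow ODE: dx/dsigma = (x - x0) / sigma.
        slope = [(xi - ci) / sigma for xi, ci in zip(x, x0)]
        x = [xi + (next_sigma - sigma) * si for xi, si in zip(x, slope)]
    return x


def decode(latent: List[float]) -> List[float]:
    """Hypothetical stand-in for the autoencoder decoder: repeats each
    latent value to mimic upsampling back to the audio rate."""
    return [v for v in latent for _ in range(SAMPLES_PER_LATENT)]


def generate(prompt: str, sigmas: Sequence[float] = (10.0, 1.0, 0.1), seed: int = 0) -> List[float]:
    cond = encode_text(prompt)                    # text -> embedding
    latent = sample_latent(cond, sigmas, seed)    # noise -> clean latent
    return decode(latent)                         # latent -> "waveform"
```

The point of running diffusion in a compact latent space rather than on raw 44.1 kHz samples is that each denoiser call works on far fewer values, which is part of what makes on-device generation feasible.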

Released under the Stability AI Community License, it is free for researchers, individual users, and small companies. The model weights and code are available on Hugging Face and GitHub, and enterprises can purchase an enterprise license for commercial use. All of its training data comes from royalty-free audio sources to ensure copyright compliance.

The Adversarial Relativistic-Contrastive (ARC) post-training method is a key innovation. It improves generation speed and prompt adherence without relying on traditional distillation or classifier-free guidance. The model achieves a CLAP conditional diversity score of 0.41, the highest among comparable models. Its ping-pong sampling technique further optimizes few-step inference, balancing speed and quality.
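Two ideas from this paragraph can be sketched in a few lines. Ping-pong sampling alternates between jumping to a clean estimate ("ping") and re-noising that estimate with fresh noise at the next, lower noise level ("pong"), so only a handful of denoiser calls are needed. The loss functions illustrate a relativistic pairing adversarial loss, in which the discriminator is trained to score each real clip above the generated clip it is paired with; this is our reading of the "relativistic" half of ARC, and the contrastive term is not shown. The x = x0 + sigma * noise parameterization, the toy denoiser in the usage, and all function names are assumptions for illustration, not Stability AI's implementation.

```python
import math
import random
from typing import Callable, List, Sequence

Denoiser = Callable[[List[float], float], List[float]]


def ping_pong_sample(denoiser: Denoiser, sigmas: Sequence[float], dim: int, seed: int = 0) -> List[float]:
    """Few-step ping-pong sampling under an assumed x = x0 + sigma * noise
    parameterization. `sigmas` must be non-empty and decreasing."""
    rng = random.Random(seed)
    x = [sigmas[0] * rng.gauss(0.0, 1.0) for _ in range(dim)]
    x0: List[float] = x
    for i, sigma in enumerate(sigmas):
        x0 = denoiser(x, sigma)                   # ping: predict the clean signal
        if i + 1 < len(sigmas):
            next_sigma = sigmas[i + 1]            # pong: fresh noise at the next level
            x = [v + next_sigma * rng.gauss(0.0, 1.0) for v in x0]
    return x0


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def relativistic_d_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Discriminator loss on paired scores: low when each real clip
    outscores the generated clip it is paired with."""
    return sum(softplus(f - r) for r, f in zip(d_real, d_fake)) / len(d_real)


def relativistic_g_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Generator loss: the mirror image, pushing fake scores above real ones."""
    return sum(softplus(r - f) for r, f in zip(d_real, d_fake)) / len(d_real)
```

For example, with a hypothetical denoiser that always returns the same clean vector, `ping_pong_sample(toy, [1.0, 0.3, 0.05], dim=3)` returns that vector after exactly three denoiser calls; the number of steps, not the size of the noise, sets the cost.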

Subjective tests show high scores for diversity, quality, and prompt adherence. The release is poised to reshape the AI audio generation landscape by making audio creation more accessible to ordinary users. The model does have limitations: it supports only English prompts and performs poorly on non-Western music styles. Stability AI plans future improvements to multilingual support and musical style diversity.
