Vocal Synthesis: Technology That Redefines Playing an Instrument

Imagine a keyboard like no other that has come before—one that can “sing” pre-programmed lyrics according to the musical phrases you play. Unlike conventional keyboards and synthesizers, and in contrast to specialized studio production equipment, the Casiotone CT-S1000V represents an all-new performance instrument concept built around a groundbreaking new technology: Vocal Synthesis.

Since the 1980s, PCM-based technology has given keyboard players access to a myriad of authentic instrument sounds as standard. But convincing replications of the human singing voice have always remained out of reach. Not only do vocals exhibit tremendous variations in timbre even at a constant pitch, but lyrics add a whole other layer of complexity. Even for a given word, differences in melody and phrasing, as well as the words that precede and follow it, will call for changes to the waveform.

And while there have been products that aimed to replicate the sound of the human voice in a keyboard format, they required specialized operating and performance techniques or minute calibration and performance of pre-programmed lyrics—factors that have limited uptake among musicians.

Casio’s Vocal Synthesis technology overcomes these issues by putting authentic reproductions of the human singing voice at the fingertips of keyboardists. And the launch of the CT-S1000V means that anybody can create sung vocals whenever they feel like it, without the need for special techniques.

An Instrument That “Sings” in Real Time

Vocal Synthesis, a groundbreaking new technology that puts sung vocal lines within easy reach of instrumentalists, was developed by Casio in conjunction with researchers at the Nagoya Institute of Technology. Articulation models based on earlier machine learning research are used to modulate vocal tones from a range of virtual vocal patches known as Vocalists. These are combined with built-in preset lyrics and user-programmed lyrics entered into the Lyric Creator app to produce fluently sung lines via the keyboard. The vocals are created by combining vocal tones, produced by a PCM sound source designed to mimic human vocal cords, with filters that generate phrasing in accordance with the lyrics input. They can also be tweaked in real time via physical knobs to further adjust characteristics such as age and gender.

This technology is a radical departure from prior approaches that triggered pre-recorded vocal sounds via a keyboard, or vocoders, which combine sung vocals in real time with synthesizer tones. And as well as eliminating the need for special techniques, Vocal Synthesis allows you to pair your pre-programmed lyrics with any melody or harmonies you wish, opening up new possibilities at the nexus of instrumental and lyrical expression.

Note Mode and Phrase Mode Give Control Over Lyrical Progressions

How the lyrics advance is determined by one of two modes: Note Mode, in which the notes you play form the melody and the lyrics advance with each note played; and Phrase Mode, which steps through the lyrics automatically at a fixed meter as you play.

Note Mode produces fluent, natural-sounding vocal lines by triggering each syllable of your lyrics according to the notes you play and applying the terminal consonants as you release the keys. There are also several features to guard against the risk of losing your place after playing a wrong note. Casio’s SP-3 and SP-20 pedals (sold separately) or another commercially available footswitch can be used to step forward and backward through the lyrics, or to reset the lyric position. You can also select which syllable to pronounce next using the bass register keys, and control how syllables should advance when chords are played.
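
The behavior described above can be pictured as a cursor moving through a list of syllables. The Python sketch below is purely illustrative and is not Casio's implementation: the class name LyricCursor, its methods, and the wrap-around at the end of the lyric are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class LyricCursor:
    """Hypothetical model of Note Mode: one syllable per note played."""
    syllables: list[str]
    position: int = 0

    def note_on(self) -> str:
        """Return the syllable for the note just played, then advance."""
        syllable = self.syllables[self.position]
        # Wrapping back to the start after the last syllable is an assumption.
        self.position = (self.position + 1) % len(self.syllables)
        return syllable

    def step(self, offset: int) -> None:
        """Footswitch-style step forward (+1) or backward (-1)."""
        self.position = (self.position + offset) % len(self.syllables)

    def reset(self) -> None:
        """Jump back to the first syllable."""
        self.position = 0


cursor = LyricCursor(["hap", "py", "birth", "day", "to", "you"])
sung = [cursor.note_on() for _ in range(3)]  # three notes played
cursor.step(-1)                              # wrong note: step back one syllable
sung.append(cursor.note_on())                # replay the third syllable
```

Here the step back makes the next note repeat "birth" instead of moving on to "day", which is the kind of recovery the pedal functions are described as providing.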

Note Mode, like existing vocal synthesis software, largely demands that the vocal melody be fixed in advance. In Phrase Mode, however, the CT-S1000V gives you options for improvisation by moving the lyrics forward automatically as you play. Simply set up the lyrics, their rhythmic subdivisions, and the BPM in advance, then play freely and hear the lyrics sung along in time.

This mode allows you to move beyond simple conceptions of melody, and create complex vocal lines based on advanced keyboard techniques. At the same time, if your keyboard skills are more basic, you can still be confident that the phrasing will not ‘break’ if you accidentally hit some wrong keys. Another benefit of Phrase Mode is natural phonetic junctures that create flowing vocal passages. This also provides a great compositional platform, allowing you to work out new vocal lines and develop arrangements via the intuitive interface of the keyboard.

Indeed, it is Phrase Mode that takes the CT-S1000V beyond simply a new keyboard with new sound generators and tones, transforming it into an instrument that opens up new possibilities for musical expression.

Real-Time Control Over Vocal Timbre

The CT-S1000V has 22 vocal presets, or “Vocalists,” each with their own character, from “Choir Group” and “Bossa Nova” to “Child” and even “Vocoder.” Each comprising multiple waveforms and elements such as white noise, these Vocalists are precisely tuned to produce clarity on both vowels and consonants, whether performing monophonic melody lines or more complex harmonies.

Meanwhile, the physical control knobs on the CT-S1000V’s console panel offer real-time control over parameters such as vibrato and portamento, as well as tonal characteristics that determine your chosen Vocalist’s age and gender profile. And by adjusting attack and release, and the speed of enunciation, you can sculpt even more natural transitions between words and syllables, and subtly control the timing with which the resulting vocals hit the ear. What’s more, you can upload a 16-bit/44.1 kHz WAV file and create your own User Vocalist, while combining different parameters can help to unearth completely original sounds. It’s this scope for experimentation that makes the CT-S1000V so unique and enjoyable, all underpinned by Casio’s Vocal Synthesis technology.

Vocalist Examples


Mid-size female choir sound. The consonants are clearly articulated and can be heard even when playing chords.


Male choir trio. The consonants are clearly articulated and can be heard even when playing chords.


Vocoder-like sound popularized by 80s disco hits, with a pitch one octave lower layered underneath.


Breathy bossa nova-style female vocal sound.


Operatic sound with a distinctive vibrato.


A fierce vocal sound known as 'Growl', used in the Death Metal genre. No fundamental tone is contained in the sound.


Bend-ups are added at the initial phase to create an amusing, ghostly sound.


A unique sound in which lyrics can be spoken while mimicking chickens, cows, lions, cats and goats.

Combine Functions to Create Complex Vocal Textures

The creative possibilities of Vocal Synthesis and the CT-S1000V are expanded still further by combining its many built-in features. For example, the arpeggiator generates arpeggiated chords and other phrases from held-down keys, but can also be used in conjunction with the Syllable Randomizer to vocalize syllables from lyrics in random order to create otherworldly phonetic clusters.

Built-in DSP effects combined with the instrumental tones unlock even more dynamic combinations. Holding down a key with the Retrigger function active recreates the effect of striking the key in rapid, perfectly timed succession. You can use Retrigger in combination with the Hold function to have keys retriggering even after they have been released, facilitating textures and arrangements that would be impossible to physically play.

Upload Original Verses Using the Lyric Creator App

Check the App Store/Google Play for compatibility information regarding your smart device model and OS.

Enter Your Lyrics

Favorite song lyrics and original creations alike can be entered in English and Japanese using your iOS or Android device via Casio’s own Lyric Creator app. This text is automatically divided into syllable units (though you can also assign divisions manually and group multiple syllables together), and after exporting the resulting data to your CT-S1000V, you’re ready to play.

Set the Meter

In Phrase Mode, the playback meter of the lyrics is determined by assigning note values (8th notes, quarter notes, etc.) to the individual syllable units and inserting rests. Individual lyric tones include tempo data that can be adjusted via the CT-S1000V itself. Tempo can also be synced to the MIDI clock from your DAW or other external MIDI device to ensure that your vocal phrasing always remains in time regardless of how adventurous you get.
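
To make the relationship between note values, rests, and tempo concrete, here is a small Python sketch that works out when each syllable would begin. It is an illustration only: the phrase layout (a list of syllable and note-value pairs, with None standing for a rest) and the function name syllable_onsets are assumptions for this example, not the Lyric Creator data format.

```python
# Note lengths in quarter-note beats (standard music notation).
BEATS = {"whole": 4.0, "half": 2.0, "quarter": 1.0, "eighth": 0.5, "16th": 0.25}


def syllable_onsets(phrase, bpm):
    """Return (syllable, start_time_in_seconds) pairs for a phrase.

    `phrase` is a list of (syllable, note_value) pairs; a syllable of None
    stands for a rest. This layout is illustrative, not the app's format.
    """
    seconds_per_beat = 60.0 / bpm
    onsets = []
    elapsed_beats = 0.0
    for syllable, note_value in phrase:
        if syllable is not None:
            onsets.append((syllable, elapsed_beats * seconds_per_beat))
        # Rests produce no syllable but still take up time.
        elapsed_beats += BEATS[note_value]
    return onsets


phrase = [("hel", "eighth"), ("lo", "eighth"), (None, "quarter"), ("world", "half")]
onsets = syllable_onsets(phrase, bpm=120)
# onsets == [("hel", 0.0), ("lo", 0.25), ("world", 1.0)]
```

Halving the BPM to 60 would double every start time, which is why a tempo change or an external clock shifts the whole phrase without any change to the note values themselves.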

Get Granular with Phrasing and Diction

Users with the appetite for a truly granular approach can go even deeper and edit the phonemes that comprise individual syllables. And besides crafting clearer vocal diction, this process can be used to approximate regional accents or mimic the pronunciation of words in languages other than English and Japanese. (Note that the available phoneme library consists only of sounds that occur in standard English and Japanese.)

Table of Phonetic Symbols Used in Phoneme Editing and IPA Equivalents

Chain Lyrics Together for Longer Sequences

While Lyric Creator places a limit on the length of lyric that can be entered (up to 100 eighth-note syllables), once uploaded to your CT-S1000V, individual lyrics can be chained together into much longer sequences. This function allows you to fine tune individual sections at the input stage before combining them within your CT-S1000V to create a complete song.
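
As a rough picture of this workflow, the Python sketch below splits a long lyric into sections that fit the input limit and then chains them back together. It reads the limit simply as 100 syllable units per lyric, which is an assumption, and the function names are hypothetical.

```python
MAX_SYLLABLES = 100  # per-lyric limit quoted above, read here as a unit count


def split_into_lyrics(syllables, limit=MAX_SYLLABLES):
    """Split a long syllable list into consecutive sections within the limit."""
    return [syllables[i:i + limit] for i in range(0, len(syllables), limit)]


def chain(lyrics):
    """Join separately entered lyrics back into one long sequence."""
    return [syllable for lyric in lyrics for syllable in lyric]


song = [f"syl{i}" for i in range(230)]   # placeholder syllables for a full song
sections = split_into_lyrics(song)       # three sections: 100, 100, and 30 units
full_sequence = chain(sections)          # same order as the original song
```

In practice you would probably choose section boundaries at musical points such as verse and chorus rather than at a fixed count, so that each section can be fine-tuned on its own.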

Create Your Own Vocalists

The Lyric Creator app can also be used to transform a WAV audio sample (16-bit/44.1 kHz, mono or stereo, max. 10 seconds in length) stored within your smart device into an original Vocalist patch that can then be loaded into the CT-S1000V. The editing interface allows you to set characteristics such as age, gender, vocal range, and vibrato.
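
If you prepare samples on a computer first, you can check a file against the requirements listed above before transferring it to your smart device. The sketch below uses Python's standard wave module; the function name check_user_vocalist_wav is hypothetical, and the checks cover only the figures quoted here, so passing them is no guarantee that the app will accept the file.

```python
import wave


def check_user_vocalist_wav(path, max_seconds=10.0):
    """Return a list of problems; an empty list means the file fits the spec."""
    problems = []
    # wave.open raises wave.Error for files that are not uncompressed PCM WAV.
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:          # 2 bytes per sample = 16-bit
            problems.append("sample width is not 16-bit")
        if wav.getframerate() != 44100:
            problems.append("sample rate is not 44.1 kHz")
        if wav.getnchannels() not in (1, 2):
            problems.append("not mono or stereo")
        duration = wav.getnframes() / wav.getframerate()
        if duration > max_seconds:
            problems.append(f"{duration:.1f} s exceeds {max_seconds:.0f} s")
    return problems


# Example call (the file name is a placeholder):
# problems = check_user_vocalist_wav("my_sample.wav")
```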

The CT-S1000V’s 22 Vocalist presets have each been designed for maximum clarity of enunciation by blending different waveforms with elements such as white noise, and as such User Vocalist waveforms may not achieve the same level of articulation. But with some experimentation you can create new sounds, including abstract ones akin to the CT-S1000V’s Animal preset.

Start experimenting by downloading a free User Vocalist-compatible waveform sample (SawC4+WhiteNoise.wav) from the link below:

Connecting the CT-S1000V to Your Smart Device

Once the Lyric Creator app is installed on your smart device, you can start transferring lyrics, sequences, vocal samples, etc., by connecting your device to your CT-S1000V via a USB cable. While connected, you can also use the app to view how much space is available on the CT-S1000V’s internal drive, delete files, and edit file names. Program files are exported using a proprietary format that enables sharing between CT-S1000V users. You can also import MusicXML lyric data and note values from your DAW.
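
For a sense of what lyric data and note values look like inside a MusicXML file, the Python sketch below pulls each sung syllable and its note type out of a small hand-written fragment. The fragment is trimmed for brevity (pitch and duration elements are omitted), and how Lyric Creator itself interprets an imported file is not documented here, so treat this only as a way to inspect your own exports.

```python
import xml.etree.ElementTree as ET

# A trimmed MusicXML fragment written for this example.
SAMPLE = """<score-partwise version="4.0">
  <part id="P1">
    <measure number="1">
      <note><type>eighth</type><lyric><syllabic>begin</syllabic><text>hel</text></lyric></note>
      <note><type>eighth</type><lyric><syllabic>end</syllabic><text>lo</text></lyric></note>
      <note><rest/><type>quarter</type></note>
      <note><type>half</type><lyric><syllabic>single</syllabic><text>world</text></lyric></note>
    </measure>
  </part>
</score-partwise>"""


def lyric_syllables(musicxml_text):
    """Return (text, note_type) for every note that carries a lyric."""
    root = ET.fromstring(musicxml_text)
    pairs = []
    for note in root.iter("note"):
        text = note.findtext("lyric/text")
        if text is not None:              # rests and unsung notes have no lyric
            pairs.append((text, note.findtext("type")))
    return pairs


pairs = lyric_syllables(SAMPLE)
# pairs == [("hel", "eighth"), ("lo", "eighth"), ("world", "half")]
```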

Vocal Synthesis: The First Giant Leap in Sound Generation Since PCM

Casio’s History of Sound Source Development

Born from audio-industry research and development that began in the 1970s, Pulse Code Modulation (PCM) technology dramatically impacted our everyday musical landscape through the digital reproduction of soundwaves from analog sources like physical instruments and the human voice.


Casiotone 201

It was in 1980, driven by the goal of leveraging digital technology to bring the joy of playing music to everyone, that Casio Computer Co., Ltd. first entered the musical instrument market with the Casiotone 201 and its groundbreaking Vowel-Consonant synthesis approach. Then, in the mid-1980s, with recorded music shifting from analog records to digital CDs, and rival instrument makers competing to develop a digital synthesis platform for music creation, Casio launched the CZ-101 synthesizer, based on our unique Phase Distortion (PD) sound source. And Casio’s contribution to digital synthesis took another leap forward in 1988 with the launch of the VZ-1 and its Integrated Phase Distortion (iPD) technology.

The 1980s was a period when electronic instruments had a tremendous impact on popular music. And new keyboards and synthesizers featuring PCM technology took center stage by enabling artists to create records using completely new sounds and to explore new performance styles. Meanwhile, improvements in memory continued to drive down the price of digital instruments, increasingly placing them within reach of ordinary consumers.

In 1985, Casio released the MT-500 electronic keyboard, which used PCM-based tone generation to digitize the sounds of drums and cymbals and paved the way for the runaway success of the legendary SK-1 sampling keyboard, which sold a million units following its launch in 1986. This was followed in 1988 by the CT-640, an electronic keyboard based entirely on PCM-based tone generation.

Around the same period, PCM technology was also giving rise to electronic pianos with increasingly authentic sounds. The PCM-based CDP-3000, Casio’s first electronic piano with a hammer-action keyboard, was another new arrival in 1988, while 1991 brought a full expansion into the electronic piano market with the introduction of the still-popular CELVIANO series, as Casio launched the AP-7 and its Advanced Piano (AP) sound source. In the years that followed, improvements in memory continued to drive heightened performance and sound quality at ever-more affordable prices.

But PCM sound sources were not without their limitations.

While they were ideal for faithful reproduction of stored sounds, PCM sound sources struggled to reproduce subtle variations in tone and articulation due to playing dynamics. Casio launched a development drive aimed at addressing this issue and advancing the PCM architecture, and in 1993 the CTK-1000 was launched: an electronic piano whose Integrated Cross-Sound Architecture (iXA) sound source combined PCM-based tone generation with touch response and DSP functionality.

Many of our current products still use PCM-based sound sources combined with unique Casio technologies to reproduce complex shifts in tone due to factors such as note decay and playing dynamics. Our electronic piano lineup includes both the Privia and CELVIANO series, which feature Acoustic and Intelligent Resonator (AiR) sound sources, and the CELVIANO Grand Hybrid with its AiR Grand sound source. Meanwhile, the Casiotone series, too, features the PCM-based Acoustic Intelligent multi-Expression (AiX) sound source.

Privia PX-S1100

CELVIANO Grand Hybrid GP-510BP

Casiotone CT-S1

But while such advances have enabled PCM-based tone generation to provide faithful recreations of a huge range of instrumental sounds, it continues to struggle with the most fundamental and historically important instrument of all: the human voice. Reproductions of sung vocal lines have to contend with a range of challenges. Not only do different vocal techniques produce wildly complex variations in tone, but lyrical content also presents an overwhelming number of variables to process, from the diversity of vocabulary to transitions between syllables and the way a different melody or phrasing can completely transform the necessary waveform for a given word. And while PCM-based approaches have made progress in recent years, along with vocoders and other technologies, significant shortcomings have continued to hamper widespread adoption.

Now, at long last, 2022 brings Casio’s longstanding development efforts to fruition with an entirely new approach to sound generation technology, Vocal Synthesis, and an altogether new kind of instrument: the Casiotone CT-S1000V. Combining Vocal Synthesis with a revolutionary Phrase Mode, this new device performs the staggeringly complex task of putting sung vocal lines at your fingertips, but with an interface that is simple and intuitive enough for anyone to use. And just as the last 30 years have brought a wealth of changes to our lifestyles, we hope that this latest innovation can have a similar impact in the fields of musical performance and composition.

Vocal Synthesis

Casiotone CT-S1000V