Hacker News
Vocaloid 6 (vocaloid.com)
205 points by brudgers on Nov 10, 2022 | 224 comments


I went from Vocaloid hater to fan in the span of this year. There are Japanese Vocaloid producers who are pushing the boundaries of pop music in a way that wouldn't be possible with a real singer. I've never come across anything like this music in the West. Definitely an acquired taste.

My Vocaloid song recommendation: Ungray Days by the producer Tsumiki. Tsumiki creates a sharp, aggressive sound that is disagreeable at first but really addictive. https://www.youtube.com/watch?v=UvF3Mwj5d4E


I'm sorry, but come on. Being a musician myself, I always try to keep an open mind and fully realize that not all music is supposed to give you the warm and fuzzies, but IMO nothing about that is interesting. It sounds like mathcore on fast forward with a chipmunk incomprehensibly chattering over it. No interesting harmonic structure, no interesting instrumentation or arrangement, rhythms straight stolen from other genres.

I'm being genuine when I say I'm interested to hear what about this moves you. I almost always get it even if I don't like it. This...I don't get it.


Certainly Ungray Days incorporates many rhythms, chord progressions, and other structures that are derivative of other works. I think this is true of all pop music though. These strike me as songwriting elements that have withstood the test of time because they work, so I don't necessarily view them negatively - to the contrary I feel that I need a certain number of these familiar elements for music to emotionally resonate with me. What I love about Tsumiki's music is these common pop structures combined with the fast tempo, weird voice, and busy instrumentation. There is chaos unfolding in the music but at the same time it is held together by classic structures that I recognize from older songs. It produces this chaotic yet focused energy that really gets me going. There's definitely nothing complex or groundbreaking in this song in terms of musical structure, but I haven't really come across other music with a similar "vibe", if you will (System of a Down may come closest for me.)


What I like about the song OP linked lies mostly in the legacy of how Vocaloid has been used in the past, e.g. "Disappearance of Hatsune Miku" [0] - there are certain textural elements that come with using Vocaloid that show up repeatedly, and a big one among them is "rapid chatter" effects. There are other ways of getting a similar result that don't sound the same, just like a session performance is probably going to sound different from a sampled instrument.

If you examine this stuff on the basis of harmonic structure, rhythm or arrangement you're basically going down the path of discounting most of electronic music, which is discretized into microgenres just on the basis of using a faster tempo, a different snare hit sound or an unusual mixing strategy. You have to really lean into timbre and texture to find what to appreciate.

[0] https://youtu.be/5qkTpJAhywg


There are all sorts of styles that use vocaloid, it's not all the fast chipmunk stuff.

I'm also a musician, there's a lot I can enjoy with vocaloids and utaus.

Also fwiw, 'rhythms stolen from other genres' is a really weird comment for a musician to make.


None of the things you mentioned are necessary conditions for good music.


It's definitely an acquired taste. I'm about 50/50 on vocaloid songs personally.

IMHO, the best gateway drug is the album/series this is from, though that may just be my personal nostalgia: https://www.youtube.com/watch?v=s_lGrcOtzck

Another interesting vocaloid artist is SOOOO. However, anyone who has struggled with depression/self harm or has suffered abuse should not look them up if they think there is any possibility of being triggered by a mention of it (to the degree where I opened youtube in a private tab to find these links so that I wouldn't risk them being recommended while I'm in a bad headspace).

That being said, https://www.youtube.com/watch?v=RUIelJYMO4U, https://www.youtube.com/watch?v=0OOWWNFTguY, and https://www.youtube.com/watch?v=uGZ0I71Yawo encompass many of those feelings better than any other songs I've heard.

I personally like the chaotic quality that many vocaloid songs carry, and much like with other somewhat experimental genres, you start to be able to follow more after listening to it over time.


There are certain limitations (which will get better in time), and it's also a stylistic choice. Vocaloid certainly carved out its stylistic niche due to early limitations, and it is quite hard to find songs in other styles using voicebanks suited to them.

EDIT: That being said, Miku can also be amazing: https://www.nicovideo.jp/watch/sm23655091

The original in a more Miku style: https://www.youtube.com/watch?v=PqJNc9KVIZE

Or a more jazzy version but not as good as the first: https://www.youtube.com/watch?v=CwG9viczjhs

Compare to a human cover: https://www.youtube.com/watch?v=fEsyBaG-uNw

Here is a different example:

https://www.youtube.com/watch?v=gIBdpzporFs

Of course the original version which is true to Vocaloid style is like this:

https://www.youtube.com/watch?v=Mqps4anhz0Q

Even if Vocaloid is capable of much more than just Miku, Miku is immensely influential in the subculture.

Does it sound better live ( https://www.youtube.com/watch?v=K_xTet06SUo )? Or, for a closer comparison with the first link, https://www.youtube.com/watch?v=nepNc0Gk1E8 ? Sure, even though the Vocaloid style has its own charm. But you also do not need to be able to sing in order to create a song with Vocaloid, so overall it's a great tool. If the song is good, someone will eventually cover it live.

Vocaloid allows many who want to compose but do not sing to participate in a remixing ecosystem. The collaborative nature of the community is an incredible strength.


> sounds like mathcore on fast forward with a chipmunk incomprehensibly chattering over it

Such an excellent and on-point description for 99% of Vocaloid content.


Try listening to what passes as mainstream J-pop/J-rock. By Japanese standards, the vocaloid scene is actually pretty innovative.


I'm not up to date on J-pop/J-rock as of 2022, but things from past years like King Gnu's Hakujitsu or LiSA's First Take performance seem pretty decent as far as fairly popular music goes, and I'm sure I'm missing a lot due to my musical interests having shifted a fair bit over the years.

[1] King Gnu - Hakujitsu: https://www.youtube.com/watch?v=ony539T074w

[2] LiSA - Gurenge: https://youtu.be/MpYy6wwqxoo?t=45


Hah. I was just about to post how that song earlier reminded me a bit of LiSA https://m.youtube.com/watch?v=CwkzK-F0Y00 or the Dorohedoro opening https://youtu.be/_nsNWzypHHg


Innovative by mathcore standards (TIL this is a genre, tbh). Frankly, I'd say GP is spot on, even though I love Vocaloid songs.


You're right. And you are too soft in your wording. This sounds disgusting.


Please listen to Kiichi's "Francium" and tell me how much you hate it. My enjoyment of the song is greatly increased whenever I'm reminded that people dislike it.


> My enjoyment of the song is greatly increased whenever I'm reminded that people dislike it.

Why?


There is a clear explanation for this. If you like something that others find disgusting, it lets you consider yourself special. Entire subcultures are built on this psychological foundation. Everything has its place under heaven, and that is neither good nor bad.


Good or bad, it does mean a general increase in "unique" behaviors which are adopted for reasons besides "it's good".

Diversity is important, but it has the drawback that compatibility suffers. Monoculture is no better, but the tendency to dismiss what others find unique is a recognition of the general (biological) strategy: conserve some status quo to build community.

General advice: don't have strong feelings about what others like. If they enjoy similar things, fine; if not, their opinions aren't worth much to you in the first place. At the same time, don't be afraid to enjoy what you like or to find your tastes changing over time; it's natural and not necessarily wrong.

The worst thing you can do to a young person is be old and tell them you love everything they like: either they will think you a foolish old person or be devastated that they aren't as hip as they thought they were...


Yeah, that's pretty weird. I thought this stuff would be cool slowed down, maybe like some DJ Screw type stuff, but no.


I don't feel much from that one. But I will drop some of my favorites here in case anyone wants to discover them.

CHO-DARI- - Hatsune Miku https://www.youtube.com/watch?v=DU1HjAPvHG8

Mum / 雄之助 feat. flower https://www.youtube.com/watch?v=IjAcngUNiZ8

Hana to Nare / Yunosuke feat. KAFU https://www.youtube.com/watch?v=XqKbuEDvaf8

IA - Conqueror https://www.youtube.com/watch?v=C3E5fb39xcs


Wow, it's time for me to shine (see my username)

Twitter Land - STEAKA : https://www.youtube.com/watch?v=e_qQEU_uGjw

Chimera - DECO*27 : https://www.youtube.com/watch?v=c6HKcNVbByc

Start Up! - Nariyama Ryo : https://www.youtube.com/watch?v=LFOV9NbkiJM

My name is - yanagamiyuki : https://www.youtube.com/watch?v=1hj3BDehQGc

Dance with me - Osanzi : https://www.youtube.com/watch?v=n37kZTKbpSM

Highlight - KIRA : https://www.youtube.com/watch?v=AYUNaQaDfa8

Ghost city tokyo - ayase : https://www.youtube.com/watch?v=lWl5viCqGSc

Aqua illumination - PedestrianP : https://www.youtube.com/watch?v=F02fIei8gZU

These songs might be pedestrian (heh) to you, but there is so much niche and experimental work all using the same voice, and I find that highly fascinating.

The above songs I think give a pretty wide longitudinal view of Vocaloid music and the variety you can find in the fandom, from just Hatsune Miku as the vocal.


> My name is - yanagamiyuki : https://www.youtube.com/watch?v=1hj3BDehQGc

Never thought I'd find something this interesting from a HN thread, thanks!


> My name is - yanagamiyuki

This one is incredible! It's like rap+vocoder. I listened to all of these and loved it, thanks!


Also adding: a couple of imageboard originals that introduced me to vocaloid, though I'm not really a fan of most vocaloid music. I still don't know what sub-genre these are; the producers being anonymous doesn't help!

Luka(?) sounds pretty natural here, too, especially for 2009. I still remember thinking at first that it was sung by a human.

/jp/ themesong - anonymous ft. Luka, Len, Rin, Miku https://www.youtube.com/watch?v=SCSM4W8vk3Q

/jp/ themesong 2 - anonymous ft. Luka, Miku https://commons.wikimedia.org/wiki/File:Jp_themesong_2.webm

Edit: better audio on the first one: https://www.youtube.com/watch?v=UC2QrK4c3Qw


Adding to this, I highly recommend works from nulut[0] and niki[1].

[0] - https://www.youtube.com/watch?v=3sEptl-psU0

[1] - https://www.youtube.com/watch?v=wgHbvHpT5ww


All of the examples linked so far sound highly artificial, so much so that I'm guessing part of the charm for listeners is the artificial, robotic sound/voicing, and the producers make it extra so in order to give higher satisfaction.

Are there any examples of songs out there going the opposite way, trying to use something like Vocaloid to make the voice and singing as realistic and human-like as possible?


Ai Dee by Mitchie M is famously tagged [Miku & Luka sing like humans] on the original upload, so that was an explicit goal. Of course, it may not sound as impressive today as it did in 2012, but it's a reasonable attempt.


You could make a case for SOOOO in https://www.youtube.com/watch?v=0OOWWNFTguY. I don't know if it's intended to be as realistic/human sounding, but portions get close.

Most vocaloid definitely leans into the robotic quality.


Which portions specifically? I gave it a listen and didn't find any parts that came close to not sounding robotic. Maybe compared to the other examples linked here that one was better (though I'm unsure if that's actually true), but if that's the state of the art, I don't think it really comes close to hitting the mark.


Definitely not state of the art as far as sounding human; that would be the top-level link.

It's also not particularly humanoid, just closer than most vocaloids I've heard. Going off my memory of the song, it's more the intonation than the timbre that stuck out as being more realistic.

I'm afraid I'm not willing to listen through the song right now; I usually dredge up some stuff I don't want to deal with when I listen to this artist's music.


Vocaloid sounding robotic is kind of the appeal to me.


That IA one was nuts!


I've gone from being Vocaloid-indifferent to a Vocaloid hater. Being the father of a 12-year-old girl who is obsessed with imaginary Japanese Vocaloid artists, I'm totally over the sound of it, although I agree the histrionics produced by these things can be fairly amazing to hear from time to time.

What I did find more interesting was the AI-"sung" version of Jolene that was doing the rounds a few days ago, based on the voice of Holly Herndon:

https://youtu.be/kPAEMUzDxuo

Interested to see where that goes, although I've got to admit I'm a purist, and any type of digital vocalist is going to make me go "meh" sooner or later when compared to even a half-decent human singer.


There is definitely an element of high-energy music that appeals to younger crowds. Same reason clubs tend to turn music up to levels capable of causing hearing damage.

I even exploited this fact as a way of staying awake a couple times while taking long road trips, as a stand in for caffeine.

Having high-energy music is OK; allowing it to disturb the peace is not. Time to teach the lessons about manners and being considerate, I suppose (buy her a pair of headphones and limit the volume so she doesn't suffer early-onset hearing loss).

No guarantee she won't turn out to be obnoxious as an adult, but that's the genetic lottery, I'm afraid.


Hmm, interesting. I'm not familiar with this scene, and I'm not sure how I feel about it. It feels kind of hollow to me: there's a lot of energy in it, but the vocaloid part just feels so emotionless. Maybe that's a cultural barrier, though. J-pop and K-pop make me feel similarly, and they're actually singing.

On the western side, in a similar vein, you've got hyperpop coming up from 100 gecs, Laura Les, and whatnot. This kind of sound, hypertuned and almost as incomprehensible, sounds better to me. You do still get a vein of emotion. I love this sound.

https://www.youtube.com/watch?v=879ysA4h9r4


On the cultural-barrier part, vocaloids are nowhere near the top of the charts even in Japan, so it is far from being a widely accepted thing yet.

To me the most interesting part to vocaloid is the ability for a sole producer to make a complete song without any external help. The vocal parts have always been a barrier, and while emotionless and still lacking in some areas, vocaloids are “good enough” to support a well produced song.

We’ve seen creators rise through the ranks through vocaloid, gain experience and exposure, and then move to full professional production with a staff and an actual singer (whose voice will also be heavily processed, but they have a ton of tuning experience at that point).

I also agree with the parent comment that some creators do benefit from the “mechanical” part. Throwing in more links: Giga works with both singers and vocaloids and is pretty good at extracting the best of both: https://youtube.com/c/GigaVideos


> To me the most interesting part to vocaloid is the ability for a sole producer to make a complete song without any external help. The vocal parts have always been a barrier, and while emotionless and still lacking in some areas, vocaloids are “good enough” to support a well produced song.

An early example of this was the debut album of Boston which was mostly recorded in Scholz's basement with him on every instrument except drums, then the tapes were mailed to LA for Delp to record vocals. I think it's rather funny in particular that Rock and Roll Band was written and mostly recorded before the band even existed.


> I'm not sure how I feel about it. I feel like it feels kind of hollow. Like there's a lot of energy in it, but the vocaloid part just feels so emotionless.

In Japan, the term is "denpa" (電波ソング). Denpa music is as intentionally strange as it is catchy, and as hypnotic as it is awkward. There are many producers creating high-BPM electronic vocaloid music that is chaotic for effect. It is a bit more twee than the western sounds, as you mentioned, but it can be quite enjoyable if you're in the right mood.

More on denpa music: https://en.wikipedia.org/wiki/Denpa_song

Nanahira playlist, an example of a vocaloid character: https://www.youtube.com/watch?v=NHIyvhJadXM

Explaining Vocaloid in 3 minutes: https://www.youtube.com/watch?v=GODXMGAMpVc

Also, I think you'd enjoy the Song Exploder podcast. If you haven't heard it already, check out the episode where 100 gecs break down how Money Machine was created:

https://songexploder.net/100-gecs


Nanahira is a real person, not a vocaloid


There's no overlap between denpa and vocaloid; I think you'd have a very difficult time having the vocaloid sing out of tune in a charming way.

ななひら (nanahira) is probably the most well known denpa artist, but she mostly sings normal songs now I think (and has a lovely voice doing so).

ココ is my favourite denpa artist https://youtu.be/2wl8Ofce8TE


> In Japan, the term is "denpa" (電波ソング)

In Japanese, there is no distinction between syllable-final [n] and syllable-final [m]. But in English there is. Traditional romanizations of Japanese will transcribe this as "dempa", for the obvious reasons that (a) that is what the Japanese spelling says; and (b) that is also how the word is pronounced.

I often see English speakers get very confused over exotic modern transcriptions such as "denpa" or "senpai", believing there must be a reason they are written that way. But I'm not sure what that reason is supposed to be.
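The traditional rule being described is mechanical enough to state as code. A minimal sketch, assuming the input is already-romanized text with ん written uniformly as "n" (the function name and examples are mine, purely illustrative):

```python
import re

def hepburn_nasal(romaji: str) -> str:
    """Traditional Hepburn-style rule: syllable-final "n" is written
    "m" before the bilabials b, p, m, and left as "n" otherwise."""
    # Zero-width lookahead so the following consonant is preserved.
    return re.sub(r"n(?=[bpm])", "m", romaji)

print(hepburn_nasal("denpa"))    # -> dempa
print(hepburn_nasal("shinbun"))  # -> shimbun
print(hepburn_nasal("denki"))    # -> denki (no change before a velar)
```

This is deliberately naive: it has no notion of syllable boundaries, so it only works on text where every "n" before b/p/m really is a syllable-final ん.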


Following the "spelling" surely suggests consistently spelling 電(でん) as "den", not alternating n/m depending on the environment? The Japanese don't write different んs for 電波(でんぱ)・電流(でんりゅう)・電話(でんわ).

Attempting to approximate pronunciation is a valid theory of transcription, but one which also ought to prescribe that 電気(でんき) be transcribed as dengki; English is not much less discerning of syllable-final [n] vs [ŋ] than it is of [n] vs [m]. This is not a position I've ever seen anyone defend in earnest, though.

(Romanization for anglophones is a bit of a lost cause anyway, since we're going to fuck up the vowels no matter what you do.)


> Attempting to approximate pronunciation is a valid theory of transcription, but one which also ought to prescribe that 電気(でんき) be transcribed as dengki; English is not much less discerning of syllable-final [n] vs [ŋ] as it is vs [m].

That is blatantly incorrect. English converts syllable-final [n] to [ŋ] when followed by a velar exactly the same way Japanese does, and English spelling reflects that. Consider the English words "think", "clunky", or "handkerchief".


Sure, now show us lack of assimilation to a subsequent bilabial (in a context where /nk/ does assimilate), which is what Japanese does and that you're implying English does differently (it doesn't). English has it baked in so deeply that most would-be /np/s are already spelled <mp>, which muddies the waters a bit, but these past few days have given us plenty of clips of people pronouncing "government", haven't they?


What are you trying to show? You seem to agree that the English spelling of /nt/ is "nt", the English spelling of /ŋk/ is "nk", and the English spelling of /mp/ is "mp". There is no possibility of "np", "nb", or "nm".

How would that suggest that it's reasonable to spell the Japanese word "dempa" as "denpa"?

For demonstrating lack of assimilation of /n/ to following bilabial, there are a couple distinct questions you might ask. It's very frequent for people to preserve the tongue gesture associated with /n/, because a bilabial stop doesn't use the tongue and so [n] is easily coarticulated. But that turns into /mp/ or /mb/ over time because the difference is not easy to hear. In contrast, for a word such as "impossible" where this process completed many hundreds of years ago, the tongue is not used at all in the pronunciation of /mp/. This is a kind of lack of assimilation.

You can also see lack of assimilation in the very people who go to special efforts to pronounce [n] in Japanese words where that is inappropriate.

Note that the English and Japanese phenomena you're talking about are very distinct. This is a fact about the historical development of sounds in English (and Latin...) that doesn't apply to current English, where a sequence like /ng/ will often be preserved across word boundaries. ("One ghost"; this is the only context in which such a sequence can occur at all.[1]) English maintains a robust distinction between /n/ and /m/ and a weaker one between /ŋ/ and the other two.[2]

In contrast, Japanese ん assimilates to whatever follows it, and in the case that nothing follows it it may (rarely) be realized as nothing more than nasalization of the preceding vowel. Word boundaries are not relevant. Japanese does not have a phonemic syllable-final /n/ or /m/ (or /ŋ/). It has a single sound (usually indicated /N/ by specialists, apparently, due to even more weirdnesses that it involves) that gets realized differently in different contexts.

So again - what would justify representing the Japanese sound as "n" regardless of context in languages where, unlike in Japanese, the distinction between "n" and "m" is meaningful?

[1] You say that most would-be /np/s are already spelled "mp", but this is false - the words that are spelled "mp" changed long ago, and do not represent attempts by modern speakers to pronounce an /np/ sequence. They represent attempts to pronounce an /mp/ sequence.

[2] Why weaker? /ŋ/ doesn't have the status the other two do; it cannot begin a syllable. And it makes for a less than perfect contrast with /n/ and /m/ because it has a fairly pronounced effect on the vowel that precedes it, which makes drawing a clean contrast difficult.


> What are you trying to show? You seem to agree that the English spelling of /nt/ is "nt", the English spelling of /ŋk/ is "nk", and the English spelling of /mp/ is "mp". There is no possibility of "np", "nb", or "nm".

Consider "inpainting", "unbiased", and (as suggested earlier) "government", each of which is a synchronically transparent /n/ across a morpheme boundary, yet a cursory survey of recorded English speech suggests that it's pretty common for the tongue gesture associated with /n/ to be absent—infamously, the second syllable of the last routinely loses its coda altogether. This occurs across a transparent morpheme boundary, even with affixes productive in the modern language, even in learned usage.

English does have a lot more wrenches to throw into this, like producing nuclear nasals in a range of situations and not always assimilating across prosodic word boundaries—heck, it probably goes both ways in an utterance like "in my main menu". Words spelled "mp" are reliably [mp] in the modern language, but it's not as simple as "mp" spelling /mp/ read [mp] and "np" spelling /np/ read [np]; English phonotactics also coerces the nasal in /np/ to a bilabial realization.

> because a bilabial stop doesn't use the tongue and so [n] is easily coarticulated

That doesn't sound quite right—this assimilation surely wouldn't be nearly as globally prevalent as it actually is if that were true.

Try it. While you'd think from the descriptions that a bilabial stop shouldn't care where the tongue goes, I think you'll find it quite challenging to coarticulate [n] with [b]—tongue positioning at lower teeth is pretty obligatory—and much easier to sequence them or produce [mb].

Clearly you can see the unnaturalness of lack of assimilation to call the attempt to do so "special effort"! So of course, the typical anglophone is not going to try to realize [n.p], they'll just see the <np> and read [mp] because that's what they would with any other internal /np/.

> So again - what would justify representing the Japanese sound as "n" regardless of context in languages where, unlike in Japanese, the distinction between "n" and "m" is meaningful?

Now, this gets to an entirely different issue: the purpose of the transcription. You seem convinced that the main goal of romanization is to provide a pronunciation guide for anglophones. But in the context of discussing a niche musical genre on the internet, that's not necessarily a high priority in the first place; you might care more about, say, searchability: we're looking for https://en.wikipedia.org/wiki/Denpa, not https://www.worldbank.org/en/programs/debt-toolkit/dempa.

And in a wider context, the principal users of romaji Japanese aren't anglophones; they're Japanese speakers who for some reason or other need to coerce Japanese text into an ~ASCII-subset representation, targeting primarily computer systems with that sort of limitation (the most common case being keyboards via IME; hold that thought) and secondarily other people who can read Japanese; and naturally they make the distinctions Japanese makes and largely don't make the distinctions Japanese doesn't make. So unless backed by a marketing department, they tend to produce n (or nn as needed) for ん, because they have a tenuous grasp on how anglos spell [mp] in the first place, and でmぱ is garbage that their IME won't convert into the right word, so why type that?

(This is also why pinyin can be the way it is, yet their IMEs routinely have modes to ignore the s-sh/n-ng/n-l distinctions.)


>> because a bilabial stop doesn't use the tongue and so [n] is easily coarticulated

> That doesn't sound quite right—this assimilation surely wouldn't be nearly as globally prevalent as it actually is if that were true.

> Try it.

You know, I mentioned a specific theory here that you've completely ignored. The coarticulation is easy. But it is difficult for a listener to tell the difference between coarticulated [nb] and [mb]. If you're willing to let multiple generations pass, this means that /nb/ will become /mb/ regardless of how easy it is to pronounce.

You will also note that this theory of what's happening mostly cannot be disproved by recordings, which you appear to want to do. You'd want an X-ray or MRI study, something which shows you what the tongue is doing.

> I think you'll find it quite challenging to coarticulate [n] with [b]—tongue positioning at lower teeth is pretty obligatory

This is just obviously false. You have no problems producing [b] with your tongue positioned however you like. You can position it for [t], you can position it for [tʃ], you can position it for [k]. And of the three coarticulations I just mentioned, all of them are well attested, though only the middle one is attested in English ("pshaw", a scoffing sound).

> and much easier to sequence them

This is worthy of comment; there is a linguistic concept called "coarticulation", but all cases of coarticulated consonants seem to have a conventional sequence associated with them. I have no real knowledge or opinion on how real the conventional sequencing is, or how much sequencing is allowed before you stop calling the sounds coarticulated. I suspect that indeed it is easier to sequence two events than to coordinate them to occur at exactly the same time; this is true for all types of events, not just language-related ones. I don't think that the linguistic concept requires absolute synchronization of particular points in time; my understanding is that producing any given phoneme requires some motion and therefore takes place over a nonzero span of time, and "coarticulated" consonants are those for which the durations overlap, not necessarily those for which the durations perfectly coincide.

But I will note that while sequencing of /nb/ is obviously necessary in a way that is not true for /pt/, since /n/ must have nasal airflow and /b/ must not, there is no reason for "coarticulation" of /nb/ to be more difficult than it is in the attested coarticulation /tm/ (exactly the same as /np/ for our purposes; /tm/ also features a voicing difference between /t/ and /m/).

> Consider "inpainting", "unbiased", and (as suggested earlier) "government", each of which is a synchronically transparent /n/ across a morpheme boundary

I don't think "government" is a valid example, and you should stop trying to lean on it. In my view, the pronunciation of "government" has as much to do with the morphemes suggested by its spelling as the pronunciation of "comfortable" does with the morphemes suggested by its spelling.

I have no problem with "unbiased"; that's a great example of what we're talking about.

> Clearly you can see the unnaturalness of lack of assimilation to call the attempt to do so "special effort"!

I don't agree with this. I claim that it is common for Anglophones pronouncing "unbiased" to make contact between the tip of their tongue and their alveolar ridge while they pass over the /n/ in the word. (And here, we're on firm ground saying that the internal phoneme is /n/ and not /m/, since it's part of a productive prefix un-.) I further believe that they make no special effort to do so. They may or may not allow a longer duration of nasal murmur than they do in other contexts, to make the /n/ clear; doing this would constitute a special effort. I believe that some speakers will do this and some won't bother. Of those who do, only a small amount of effort will be given to the task.

But the case of English speakers attempting to pronounce Japanese is different. They will go to great lengths to demonstrate that they want to comply with the bizarre textual representation they see. They are happy to produce highly unnatural speech in order to do so. (Which isn't really a problem; they don't really have an alternative to producing unnatural-sounding speech in early attempts to pronounce a foreign language. But this is something they shouldn't encounter problems with.)

> And in a wider context, the principal users of romaji Japanese aren't anglophones; they're Japanese-speakers who for some or other reason need need to coerce Japanese text into an ~ASCII-subset representation, targeting primarily computer systems with that sort of limitation (most common case being keyboards via IME, hold that thought)

> (This is also why pinyin can be the way it is, yet their IMEs have routinely have modes to ignore s-sh/n-ng/n-l distinctions.)

This isn't a flattering comparison for the all-n Japanese transcription system. The pinyin for 吕 is lü. Chinese people don't use German keyboards, which makes the pinyin impossible to type. So where ü contrasts with u, pinyin input methods require you to input V. And Chinese people have responded to this by adopting v-based spellings; it is common to see pseudo-pinyin like "lv" where that pinyin has been generated by an ordinary Chinese person for their own purposes, such as a sign over their business or an online username.

But the letter V is formally not a part of pinyin at all, which means that text generated by the government never uses it and neither do instructional texts.

It is true that this situation is the reverse of the one we're discussing - the Chinese are making a distinction that is required by their language but forbidden by their keyboard, and the fact that they are aware of the distinction makes it easy for them to know what to do. The Japanese are failing to make a distinction that doesn't exist in their language but does exist on their keyboard; this is precisely parallel to the pinyin IME settings you note that will allow the user to ignore phonemic distinctions that they don't make. Again we see that the system maintains the distinction and it's the job of the input method to interpret what the user wants to say.

Chinese IMEs also offer a "double pinyin" input method, in which you type one letter to indicate the onset of a syllable and a second letter to indicate the rime. All syllables are two input-letters long; this model matches the traditional Chinese view of their own phonology. You could just as easily base your system of English transcription on this: instead of "Xi Jinping", 习近平's name would be "Xi Jnp;". Instead of "Sun Yat-sen", we'd talk about "Sp Yixm".
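The two-keystroke scheme can be sketched as a toy encoder (Python; the key tables below are a made-up mini-subset chosen only to reproduce the "Xi Jnp;" example, not any real shuangpin layout):

```python
# Toy "double pinyin" encoder: each syllable becomes exactly two
# keystrokes, one for the onset (initial) and one for the rime (final).
# These key tables are a hypothetical mini-subset, NOT a real layout.

ONSETS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
          "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]

ONSET_KEY = {"x": "x", "j": "j", "p": "p"}     # onset -> one keystroke
RIME_KEY = {"i": "i", "in": "n", "ing": ";"}   # rime  -> one keystroke

def split_syllable(syl: str):
    # longest-match the onset; whatever remains is the rime
    for onset in sorted(ONSETS, key=len, reverse=True):
        if syl.startswith(onset):
            return onset, syl[len(onset):]
    return "", syl  # zero-onset syllable

def encode(syl: str) -> str:
    onset, rime = split_syllable(syl)
    return ONSET_KEY[onset] + RIME_KEY[rime]

# "xi" + "jin" + "ping" encode to "xi", "jn", "p;"
```

The point being: the encoding is perfectly regular and easy to type, and perfectly terrible as a public spelling.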

That's what it looks like when you base spelling on what it's convenient for foreigners to type as an intermediate input to their own, different spelling. (As is the case with Japanese input methods.) There are zero people who believe it's a good idea. It's not a better idea in the Japanese case.


+1 for 100 gecs; their sound is a very distinct aural palate cleanser that I consistently enjoy. See also Charli XCX's How I'm Feeling Now album and midwxst's SUMMER03 EP


It's a random-walk of blues riffs over a stock diatonic chord progression with a slow (and predictable) harmonic rhythm.

The only conceivable surprise is a crude chromatic key change to the minor version of the raised mediant.

You'd think the precision of those dynamic envelopes and timbral games would push the artist to venture out and explore that mediant relationship to create quicker and more jarring harmonic progressions and modulations. But no-- it turns out to be less inventive than the mediant chains emanating from, say, Joni Mitchell and her acoustic guitar over fifty years ago:

https://www.youtube.com/watch?v=3q2jiRUVLgI

(I find some of the lyrics apt, too.)

Compared to the cookie-cutter harmony and melody of the music you linked, even Mitchell's augmented triad in the melody at the end of the chorus sounds like the musical equivalent of solving fast homomorphic encryption.

It's the audio tech that is on display in the music you linked, so every other musical consideration shifts to the background to illuminate that tech. I get that. But holy shit, why does that bassline have to be stuck in the fucking 1650s? While I love the "electrified Vivaldi" hack that is heavy metal from the late 70s/early 80s (Master of Puppets et al), I question whether we really need more than one musical genre based on that parlor trick.

It would be like every stand up comedian ending their set with increasingly theatrical pyrotechnic pull-my-finger jokes. I could laugh my ass off at the absurdity for a year, maybe two. But forever?


It’s possible you’re stuck on the first slope of the Dunning-Kruger graph.

My (possibly wrong) impression of your comment is that you seem to have made the mistake of associating complexity with quality in music which is extremely common in those who’ve just started looking into music theory.

Most music needs only the smallest dash of novelty to achieve the perfect mix of the new and familiar to its target audience. If you start attempting to evaluate popular music on what about it is inventive or new, you’re likely to find yourself unable to appreciate most of what people are enjoying and cut yourself off from loving a broad spectrum of musical expression.

You might also find yourself unable to express why you enjoy the music you do like in a way that doesn’t come across as if you’re arguing an objective scientific point, an approach which might undercut your argument by making you unintentionally come across as someone who has just learned a lot of fancy theory jargon and is eager for an excuse to wield it.


I'm not sure I understand what Vocaloid does? Does it generate vocal parts "from scratch" / just from lyrics? Or is it more like a vocoder?

The track you reference sounds like chipmunks sped up 2x; it's not unpleasant to listen to, and fun, but I feel it could be made just like that (record at 80bpm, high pass filter, maybe transpose 1 octave, and speed up to 180), no "AI" involved.
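For what it's worth, that "record slow, speed up" trick is just resampling, which raises tempo and pitch together (unlike a phase vocoder, which can decouple them). A rough sketch in Python/NumPy, using a naive linear-interpolation resampler and made-up parameter values:

```python
import numpy as np

def speed_up(samples, factor):
    """Play the same samples back 'factor' times faster by reading at
    fractional positions (naive linear interpolation). Tempo and pitch
    both scale by 'factor': the chipmunk effect."""
    idx = np.arange(0, len(samples), factor)   # fractional read positions
    base = np.clip(idx.astype(int), 0, len(samples) - 2)
    frac = idx - idx.astype(int)
    return (1 - frac) * samples[base] + frac * samples[base + 1]

# A 220 Hz tone sped up 2.25x (80 -> 180 bpm) comes out near 495 Hz.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220 * t)
fast = speed_up(tone, 180 / 80)
```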


It's an instrument: you have a piano roll interface, draw in your melody like editing MIDI in a DAW, and add lyrics to each note (usually with some manual phoneme fine-tuning), and it outputs a stream of vocal audio.
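Conceptually, each piano-roll entry is just a MIDI-style note with a lyric attached; a hypothetical data model (not Vocaloid's actual file format) might look like:

```python
from dataclasses import dataclass

@dataclass
class VocalNote:
    """One piano-roll entry: pitch and timing like a MIDI note,
    plus the lyric (or phoneme string) to sing on it."""
    midi_pitch: int       # e.g. 69 == A4 == 440 Hz
    start_beats: float
    length_beats: float
    lyric: str

def pitch_hz(midi_pitch: int) -> float:
    # standard equal-temperament MIDI-to-frequency conversion
    return 440.0 * 2 ** ((midi_pitch - 69) / 12)

melody = [
    VocalNote(69, 0.0, 1.0, "ko"),
    VocalNote(71, 1.0, 0.5, "n"),
    VocalNote(72, 1.5, 1.5, "ni"),
]
```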

Human Japanese singers, especially women, tend to operate in a higher octave range than what is common in the west. It's slightly culturally insensitive to take shots at vocal pitch when talking about J-Pop. Pitch is largely a social/cultural construct, and Japan generally leans into the idea of higher pitch -> polite or cute and lower pitch -> aggressive or rude. (e.g. you raise your pitch when talking to your boss, and drop it to express your disgust with someone.) Just putting that out there, not trying to be accusatory or anything. It's just always good to keep in mind that western cultural norms are hardly universal.


Ah, thanks for the heads up!

For the record, I was responding to the gp saying

> ... pushing the boundaries of pop music in a way that wouldn't be possible with a real singer

=> I felt it was possible to do what the example does by singing slowly and speeding it up afterwards.


The chipmunk effect isn't even necessarily part of it. Most vocaloid music is in a more "normal" range.

It's a synthesizer. It's an alternative to human singers. I can imagine someone seeing a digital piano for the first time. "I'm not sure what it even does. I could just use an acoustic piano. It sounds the same."


Yeah fair enough. I have no problem with Vocaloid -- but I do have a problem with the over the top marketing copy (sorry if that was unclear).


As for the chipmunk sound, it's not unusual for female j-pop vocalists to operate one or two octaves higher than the unfamiliar western ear would generally consider pleasant.

There's also plenty of music directly derivative of the vocaloid scene that maintains a similar aesthetic with 'organic' vocalists and dispenses with some of the awkwardness of vocaloid-oriented compositions. Example: https://www.youtube.com/watch?v=hjJMIWyl_l4


A slightly more natural sounding track for reference: https://youtu.be/9vyIPWBeRes

This one is an official track for a popular vocaloid rhythm game.

Also, at this point the “chipmunk” sound is part of the brand and will be kept to some extent for tracks labelled as vocaloid (it’s kind of a market of its own).


It generates audio from phonetic lyrics.


you write down phonemes on a DAW, and it synthesizes the voice for you. you also put down vibrato or other modifiers like you would for most other instruments.

the audio is generated from a voicebank that is a database of prepared phonemes recorded from a voice actor. some packages come with multiple variants of voicebanks, like you could have a "soft" voice and a "vivid" voice.
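A toy sketch of that concatenative idea (sine bursts stand in for the recorded phonemes, a linear crossfade stands in for the real boundary smoothing, and the voicebank contents are made up):

```python
import numpy as np

SR = 16000

def burst(freq, dur=0.2):
    """Stand-in 'recording': a short sine burst per phoneme."""
    t = np.arange(int(SR * dur)) / SR
    return np.sin(2 * np.pi * freq * t)

# Hypothetical mini voicebank: phoneme -> prerecorded audio clip.
VOICEBANK = {"ka": burst(300), "sa": burst(400), "mi": burst(500)}

def concatenate(phonemes, fade=0.02):
    """Join clips with a short linear crossfade, a toy version of the
    transition smoothing a concatenative synthesizer performs."""
    n = int(SR * fade)
    ramp = np.linspace(0.0, 1.0, n)
    out = VOICEBANK[phonemes[0]].copy()
    for p in phonemes[1:]:
        clip = VOICEBANK[p]
        out[-n:] = out[-n:] * (1 - ramp) + clip[:n] * ramp
        out = np.concatenate([out, clip[n:]])
    return out

audio = concatenate(["ka", "sa", "mi"])
```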


Look at the upload date on the video. It's 2 years old. No AI was involved, as that's being touted as a new feature of V6.


This is a geeky song I did in English using Vocaloid singers:

https://www.youtube.com/watch?v=Y2k8EOBL75o

Not great, not terrible.


Ungray Days slightly reminds me of oneohtrix point never.



This is such an excellent album.


Thanks for turning me on to that.

It's definitely not my thing, but Tsumiki's use of vocals is interesting and well executed.


IME addictive music is the worst.


It gives me Splatoon vibes.


The rapper/producer Deko has been doing some very interesting stuff with adding vocaloid synth characters to rap/hyperpop music. He has two vocaloid "characters", Lil Yammeii and Lil Hard Drive. Most vocaloid rap I listen to is terrible but this stuff is super well produced. He'll even do things like add breathing noises to the vocaloid tracks, which improves the sound a lot. https://youtu.be/usRDtHjYKzU

He also does some funny parody bits, like rapping about having a lot of money/jewels/etc. and then having the vocaloid characters rap about having a lot of RAM.


> disagreeable at first but really addictive.

Nope, certainly stays disagreeable to me. I wonder what makes people enjoy weird stuff in so many different ways. I might not like this, but I enjoy white noise artist Merzbow [0] or breakcore from Drumcorps [1]

[0]: https://www.youtube.com/watch?v=przphi3RjeE

[1]: https://www.youtube.com/watch?v=WXEzn43VITM


I think people who have never heard Merzbow are likely going to misunderstand your post as being dismissive rather than understanding of either opinion.

As a small form of resistance to the surveillance state I partake in, I have taught kids to ask any nearby personal assistants to play woodpecker #2 and they find it hilarious.


Hacking on GPU Wavelets at the moment for time series. Multiresolution autocorrelation seems like it should be near instantaneous. And then, that opens up not just pitch correction, but "shifting peaks" ;)

Efficient Pitch Detection Techniques for Interactive Music

https://ccrma.stanford.edu/~pdelac/PitchDetection/icmc01-pit...

New Phase-Vocoder Techniques For Pitch-Shifting, Harmonizing and other Exotic Effects

https://www.ee.columbia.edu/~dpwe/papers/LaroD99-pvoc.pdf
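A crude time-domain version of the autocorrelation approach those references survey, in Python/NumPy (toy code; the papers' methods are far more robust):

```python
import numpy as np

def detect_pitch(x, sr, fmin=80.0, fmax=1000.0):
    """Autocorrelation pitch detector: find the lag, within the plausible
    period range, at which the signal best matches a shifted copy of
    itself, then convert that lag back to a frequency."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # keep lags >= 0
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

sr = 4000
t = np.arange(sr) / sr                               # one second of audio
f0 = detect_pitch(np.sin(2 * np.pi * 220 * t), sr)   # close to 220 Hz
```

The integer-lag resolution is what the fancier multiresolution/phase-vocoder methods improve on.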


Can someone point to a good open source alternative for vocaloid?

I know of Sinsy [0] but I couldn't get it working. eCantorix [1] is very old and rudimentary (it uses espeak underneath [2]).

Searching just now I see OpenUtau [3] but I have no experience with it.

Seems crazy there isn't a good FOSS solution for this.

[0] http://www.sinsy.jp/

[1] https://github.com/divVerent/ecantorix

[2] https://espeak.sourceforge.net/

[3] https://github.com/stakira/OpenUtau


I haven't used it much but OpenUtau (under active development) has been nice. Cross-platform, automatically handles internationalization issues, and the phonemizer system is very nice. Works well with the free multilingual Kumi Hitsuboku voicebank[0].

Also has integration with NNSVS (neural net based vocal synthesizer) and the entire UTAU ecosystem.

I would say this is the good FOSS solution!

[0]: https://cubialpha.wixsite.com/koomstar


My daughter has both OpenUtau and Vocaloid 5. She tends to use OpenUtau more, although I don't know why.


the vocaloid editor is notorious for being really hard to use.



OpenUtau is the main one if I recall correctly, used to be really into this stuff as a teenager. Not sure how things have changed in the 15ish years since


I agree with a lot of people in this thread: vocaloid music can be really great, and a lot of the great music takes advantage of the slightly robotic or unusual sounds it can produce (https://www.youtube.com/watch?v=M92c6pl10u0). But it almost never actually sounds human; my favourite example is very good but still not quite there (https://www.youtube.com/watch?v=a_QoqcsrRgc)

I have recently found Synth V, which uses AI trained on a singer, and it can produce shockingly good results. It can also still be used to produce sounds similar to vocaloids.

Here are some of my favourite Synth V covers and songs: https://www.youtube.com/watch?v=jU_CG_FF6WI https://www.youtube.com/watch?v=EKOSQGKn5Cw https://www.youtube.com/watch?v=ShG8Ij6_Hbo https://www.youtube.com/watch?v=cXv_vKX6Y-0


The near-universal understanding in the fandom is that Vocaloid is not a replacement for the human voice, nor is it really aiming to be some sort of faithful emulation of it; rather it is "its own thing", like how a piano isn't an emulation of the human voice, nor is a DAW a replacement for a guitar or whatever.


But for those who wanted an artificial singer, as far as I know there weren't really any more realistic-sounding alternatives. It's nice that there is now an alternative, instead of having to spend so much effort tuning vocaloids to sound realistic.

And knowing the limitations of vocaloid never stopped anyone from trying. Mitchie M is a great example.


The new V6 vocaloids, even with the AI expressiveness, still sound very much like vocaloids. There's a specific timbre to the way they pronounce certain vowels or do odd formant shifts that always comes through, and I'm unsure whether it's intentional - on one hand it's very signature, on the other hand it doesn't sound quite as 'realistic'.

As far as the iconic characters go, they're moving to an entirely different engine (Piapro NT) anyways, so I wonder how future works created using them will sound.


Agreed. I wonder how much of it is intentional; one of the common complaints I see from Synth V users is that none of the voice banks, except maybe Eleanor Forte, have that synthetic, "vocaloid-y" sound. Maybe they know they can't compete on realism, so they're leaning into the signature sound?


Solaria sounds like a well tuned Vocaloid.


Solaria can be very good. There's a song, Dawn by Circus-P, that I've used on a few occasions to startle people with its apparent authenticity. It exhibits an unnatural range, but there are only a couple of moments (unfortunately, near the start) where the pronunciation sounds clearly artificial if you know what you're listening for.


Gumi is definitely an iconic character, and the new Gumi release is probably as big as the Vocaloid 6 news itself. (Not only are there a lot of "vocaloid classics" that were made with Gumi, but some of the biggest hits in the past few years, such as KING by Kanaria or Getcha by Giga and Kira, have used Gumi.)

I kind of suspect Piapro NT is going to end up being a bust, with a pivot back to Yamaha's platform. We're a few years in and they've still only released Miku, none of the other Crypton Future Media characters, and a lot of people are sticking to the Vocaloid 4 release because they're not fans of how NT sounds. Now V6 is out and the technological gulf is widening.


The Vocaloid sound is basically part of the brand now. I can't imagine them ever changing it, and if they did, the existing producers would most likely shun it in favour of the sound that they've gotten used to.


Agreed. I think a proper solution for them to this would just be to create a separate spin-off product that is focused on realistic voice synthesis, while continuing the development of the typical-vocaloid-sounding product line as they currently are.

The audience wanting a more vocaloid-like sound and the one wanting a more realistic sound aren't really the same, and the overlap between them, I suspect, is not large. So it makes way more sense to capture the latter group by creating that more-realistic-voice spin-off product line, as opposed to being forced to choose between the realistic and vocaloid-like target demographics.

We already know the size of the vocaloid-sound target audience, but I bet the audience for realistic-sound synthesis is going to be orders of magnitude larger (mostly because of the versatility of where that tech could be useful, while vocaloid is mostly constrained to music production and vocaloid-related visual arts accompanied by a typical vocaloid voice).


There's nothing stopping them from making new "characters" that sound more realistic.


Check out Synthesizer V with Eleanor Forte.

Synth V is extremely fast and the output is shockingly good with some tweaking - good enough to be indistinguishable for many people.


Huh, even really good tuning has that quality. [1]

[1] https://www.youtube.com/watch?v=GcxIuAWX7Ws


The unintentional marketing effect of the Vocaloid characters (Hatsune Miku, Kagamine Rin/Len) is phenomenal. I want to buy and use Vocaloid itself just because those characters are cute


Crypton actually licenses their characters under the noncommercial Creative Commons license, since they recognize that their whole business (and the desirability for companies to want to license them) stems from fans creating a whole world of music and visual art surrounding them.

It's a great example of a company seizing an opportunity and not stepping on the fan communities that built them said opportunity.


It's very intentional.


It's very clever and very intentional.

Vocaloid was developed as a DSP project at a Spanish university, supported by Yamaha, and first marketed in the UK - with almost zero success.

It wasn't until it was personalised/mythologised by combining it with anime in Japan that it really exploded. Crypton have very deliberately milked it for everything it's worth. Making the output branded but royalty-free was marketing genius.

One bizarre thing about it - it's converging with autotuned vocal stylings applied to organic human vocals, especially in genres like hyperpop. It's becoming increasingly hard to tell them apart.

Another bizarre thing - there's a kind of corrosive psychedelic "What does human mean?" aesthetic that applies to AI art in general. Vocaloid music is a subset of other emulated artforms: simultaneously cute, spectacular, and naive, but also disturbing, overwrought, and uncanny.

This looks very much like a new era, in the manner of Baroque, Romantic, and Modern. It's going to be explosively transformative for all of the arts, and it's not obvious yet that there's going to be much left that's still recognisable after the dust has settled.


It was not obvious that they would become as popular as they are now. Vocaloid1 Meiko wasn't popular, and there was very little official artwork for Miku at the beginning. The rise of Nicovideo UGC helped a lot.


50/50. I'm sure they would've liked the characters to catch on with people, but they grew way more, even internationally, than I'm sure Yamaha ever thought they would. Weebs everywhere love Miku and don't even know she's from audio software by Yamaha.


Her sleeves are literally a Yamaha DX7, which makes it obvious to synth nerds everywhere, but... not sure how much of the weeb population are also synth nerds.


Minor correction: Miku, Luka, Rin/Len, Kaito and Meiko are not Yamaha products. Crypton Future Media is the company behind them, as they're third party voice banks for Yamaha's Vocaloid platform. (+/- the newer NT version of Miku, which uses its own thing.) Yamaha only recently started shipping their own voices at all, and none of them have the same fan cachet.


like their targeted advertising collaboration with dominos, known to be the favored pizza of audio professionals https://www.youtube.com/watch?v=yPuI4l0jK7s


Related (and of interest to HN) - the effort to save the Hatsune Miku Dominos app[0], the quest to track down Scott Oelkers, President of Dominos Pizza Japan (featured in the ad)[1] and interview [2].

[0] https://www.youtube.com/watch?v=341IsnWdaT4

[1] https://www.youtube.com/watch?v=MCaZt6Dy2_A

[2] https://www.youtube.com/watch?v=vLUCZux2Sbs


Being true to who you are makes all your dreams possible.

https://www.youtube.com/watch?v=r_XMIGjJoS8


Probably also natural. Japan loves to make characters for everything.


Huh, it was cool when vocaloid was released but at this point I would expect more with all the different AI improvements available.

For example DiffSinger sounds better (though not perfect). It uses a diffusion model like the popular AI image generators. I cannot find English demos but these are not too bad:

https://www.bilibili.com/video/av599316695/ (unmute the player)

https://www.youtube.com/watch?v=hJ0wNFZGECo (bad mixing but the voice is still better than vocaloid)

code and huggingface demo:

https://github.com/MoonInTheRiver/DiffSinger

https://huggingface.co/spaces/Silentlin/DiffSinger

and other techniques (demos at bottom):

https://r9y9.github.io/projects/nnsvs/

Of course these are not complete, user-friendly software packages, but I would expect Vocaloid to have something like these implemented.


Vocaloid isn't really meant to sound real, though. At least, that's not what the users or fans particularly care about.


Cannot help but remember the lyrics of "Video Killed the Radio Star"

   They took the credit for your second symphony /
   Rewritten by machine and new technology


Vocaloid has the name brand from being old, and Hatsune Miku, but at this point Synthesizer V has completely blown it out of the water. SOLARIA[0] in particular came out in January and, to my ears, sounds significantly better than any of the demo tracks for V6. There's also been a number of AI voicebanks for SynthV like Natalie that came out in the past few months which have sounded way better than the V6 demo tracks.

Vocaloid was the only real competitor in vocal synths for so long they completely stopped innovating. Very IE6-esque.

[0] https://www.youtube.com/watch?v=Uf2a2so86uw


This is quite frustrating to see as someone who has been in the Vocaloid fandom since '08 and maintains a pretty sizable Vocaloid community.

The "realism" of Vocaloid is not exactly a priority--a lot of people enjoy the robotic sounding voice. Making ridiculous statements like "blow it out of the water" really rubs me the wrong way of the typical techbro looking for objective KPIs instead of what the general vibe of the fandom.

To me, this is like saying "DAWs blow physical instruments out of the water because they can create a lot more sounds!"--you're right, but you're totally missing the point.

Here's a KPI: nobody is making music with Synthesizer V, and it's not exactly very popular.

> Vocaloid was the only real competitor in vocal synths for so long they completely stopped innovating. Very IE6-esque.

This is probably the only fair argument here. The Vocaloid editor has always been quite difficult to use, and recent competitors in that space have made improvements to it. Hatsune Miku's new release moves away from Vocaloid in favor of a Piapro/Crypton voice engine with a better editor, for instance.


I didn't mean to sound so negative on Vocaloid. For what it's worth, I'm a producer who used to make songs with Vocaloid (Gumi English) and switched to SynthV this year, so I'm speaking mainly from that perspective. Vocaloid can still make great songs. I'm a big fan of all of the NEXTLIGHT artists (picco, Twinfield, etc) in particular. Certainly not trying to dunk on the fandom or artists.

However, I do think that it's a lot more limited in what it can do. The voices in Vocaloid sound quite synthetic, and there's only so far you can go if you're not going for a stereotypical "Vocaloid-sounding" vocal. IMO, Synth V can get pretty close to the Vocaloid sound with Eleanor Forte, but in addition can also produce some much more realistic-sounding vocals, especially with Solaria, Eleanor AI, or the newer Dreamtronics AI voicebanks.

People are making music with Synth V! Check out AIKA who has made some amazing songs using Synth V.


I do wonder if Vocaloid is like Xerox now. People will hear a SynthV produced song and think "wow, cool Vocaloid".


I agree. And Synthesizer V v1.8.0 has just been released with diffusion probabilistic models, which improve the quality even further.


Wow, nice, thanks, I hadn't seen that yet. The DPM demo sounds awesome. I absolutely love that Synth V is still pushing things forward, despite already being so significantly ahead of the competition.


There's a mega-thread about Synth V with numerous demos on vi-control forum[1]. The progress Synth V has made in just one year is really impressive.

[1]: https://vi-control.net/community/threads/synthesizer-v-vocal...


Youtuber "Doctor Mix" recently used it for a rendition of Bohemian Rhapsody, it was interesting watching how he put it to use. https://www.youtube.com/watch?v=pAkgxhK91kk


As much as I want to dislike this, after hearing some examples, I must admit that it has made quite some progress since the laughable first versions. Of course it falls apart quickly if you hear it solo, but I can absolutely see this used widely for placeholder vocals or background/contract music. And of course in Japan - can someone please explain to me why Japanese electronic musicians seem to be obsessed with artificial singing, although it usually sounds like Mickey Mouse through autotune?


There are also other versions of similar tech like Cevio (https://cevio.jp) which can handle both singing and speech. It can sound a lot more realistic and has the 'gimmick' of making voicebanks for various popular people in Japan. Often the differences between the software-generated result and the person's voice come down to tuning preferences rather than technical limitations. (On a side note, Cevio uses RNNs, while IIRC the original Vocaloids used to concatenate voice clippings and smooth out the transitions using some Fourier-space tricks. I'm not aware of how modern Vocaloid works, but it wouldn't be too surprising if it too uses deep learning.)

It initially became a thing in Japan because Japanese has fewer phonemes than most mainstream languages, so it was easier to make. Then it gained popularity for the cute characters and the freedom it gave to people who would go on to be music composers. Many popular Japanese composers got their start making and uploading Vocaloid songs on Youtube/NicoNico. On top of that, they have had a series of well designed rhythm games on several platforms for decades now which were pretty popular. These days they have a fairly popular mobile gacha/gambling game too.

Nostalgia is probably also a big driving force since many people grew up with either the music or the games.

These days it's still pretty popular within the anime fanbase outside Japan. Pre-Covid there used to be a vocaloid concert series every year which would alternate between Europe and US tours. The reason it hasn't gone much more mainstream is likely that English and related languages have far more phoneme combinations to record, so making a good voicebank is significantly harder.


As far as I can tell, because it allows one-man production teams. A single person can do everything, without needing to find a vocalist. It allows them to try more stuff quicker and easier, and anything that happened to get popular ends up covered by vocalists anyways (and the producer gets to sell those rights).


And a popular enough producer can 'go pro' (n-buna -> Yorushika, or Hachi -> Kenshi Yonezu).


It's kind of an acquired taste - it works better for some kinds of music than others. But I admit I can't stand the English voice because that sounds more obviously wrong to me.

I'm not sure you can paint the Japanese music industry like that? It's more like, this gives indie songwriters an easily accessible tool.


> I'm not sure you can paint the Japanese music industry like that?

I didn't mean to. I was just wondering why Vocaloid is so popular specifically in Japan.


I think J-Pop in general tends to create interest through composition (fast and complex chord changes, syncopated melodies with lots of movement) compared to Western pop, which tends to create interest through vocal performance. The characteristics of J-Pop mesh really well with Vocaloid, where the voice itself is not that interesting to listen to, but you gain composition possibilities, like fast tempos or wide pitch ranges, that wouldn't be possible with a human singer.


Ah ok, sorry. No clue there, other than the software also being developed there.


I believe it is the democratization of music. Hatsune Miku songs are written and produced by fans. You don’t need to be able to sing, or record instruments. Everyone can join.

From Wikipedia:

> In August 2010, over 22,000 original songs had been written under the name Hatsune Miku. Later reports confirmed that she had 100,000 songs in 2011 to her name.


Well yes, but from what I can tell, it's a very Japanese phenomenon. This hasn't really caught on in other countries, as far as I know.


It's Nico Nico Douga/nicovideo.jp. There is a unique community and reward mechanism there that values technical achievement far more than presentation, which I think pushes creators well past the skill level that YouTube and many other social media demand. That culture is an extension of Japanese anonymous BBSes (2ch/Futaba), and I find content from those communities to be of abnormally high quality compared to the Internet outside.


I can walk into Walmart, FYE or Hot Topic and walk out with Hatsune Miku merchandise. You can't even say that for many popular anime franchises (only stuff like Naruto or Dragonball that's been popular for decades). I've even been to small fan run projection concerts.

Vocaloid music has a huge following in the US. It's just a subculture thing and not part of the mainstream sphere...which is a good thing.


China has Luo Tianyi, Miku and co have had concerts in the US, and Miku was slated to perform at Coachella: https://www.rollingstone.com/music/music-news/hatsune-miku-c...

(Admittedly, those are likely all offshoots of the anime/manga/etc. fandom though.)


I don't like Nickelback because of the vocal processing. It annoys me. I watched the vocaloid.com video. It also annoyed me. It pretends to emotion but lacks it.


The (now) Netflix series "Bee & Puppycat: Lazy In Space" features a character (Puppycat) who's voiced by Vocaloid.


6?! Last time I checked anything Vocaloid it was barely on 3 and most people were still using v2 vocaloids. I'm getting old :(


That demo video sounds like a complete mess. I have no idea about the product, but that is one terrible first impression.


My daughter discovered this music genre a year ago and ever since we've been listening to vocaloids in the car. It's alright, but as non-Japanese speakers we don't really get what the true experience of this genre is. Maybe that'll change in the future when more people make music.


Given how many problems V5 incurred, I wonder how V6 compares, but also how its reception is going to be...


La la la. What a mess. First the whole music industry will collapse under A.I. lords, and then maybe people will start searching for human-made music.

All the stats about the rise of old music support my claim :) Young people, invest in your talents, with analogue processes in mind.

This will be a huge market.


Vocaloid has been a thing for almost 20 years now. People still sing. Digital art has been a thing for longer, people still paint.

People do art because it's fun. Nothing will stop people from getting together in a band and jamming, because it's fun.


And it's not just that some people still engage in those activities; a lot of people are starting their careers under the direct influence of those tools.

And, this is subjective, but a lot of those tool-enabled artists seem to do better, even in the absence of such automagic enablers, than non-enabled artists. Good AIs sharpen humans into unassuming Olympians; bad AIs just fall out of the Internet's attention span.


I always thought the main appeal of Vocaloid was its signature robotic sound.

If you want to synthesize more realistic sounding voices there are better options.


And it's an instrument like any other. It doesn't magically write lyrics and a melody for you, it doesn't mix, and it certainly doesn't sound great without a good ear for the different parameters and EQing. It takes a lot of skill to make music with vocaloid.


It takes talent to make Vocaloid tracks sound like more than another riff on World is Mine.


Yeah, if you're not rich enough to be able to hire a full choir and orchestra, you don't deserve to write songs for more than one person.


Might as well be arguing that old school autocomplete (not even copilot) will kill the software industry.


> First all the music industry will collapse under A.I. lords

The purpose seems, as usual, to be to get rid of as many human musicians/singers as they can, so that everything can be made by a single person (or an AI in a few years), thereby saving money. In a different context, the transition from multi-member bands to one-man bands with a keyboard, and finally to karaoke, is in many cases motivated by costs as well.


The point is that it enables musicians who would never have found a vocalist before this. Vocaloid has been around a decade and hasn't eliminated any jobs, unless maybe someone hurt their throat trying to sing Disappearance of Hatsune Miku.

There's hardly any evidence automation ever destroys jobs; it seems to actually create them. It's very silly people just keep claiming this.


I was not referring to musicians using electronics to create something they have no access to (I have two synthesizers just here because I can't afford an orchestra) but to the music industry that in many contexts pushes for solutions motivated only by money.


Ya ya ya. I am lorde. Ya ya ya. (I think they missed a trick not using Randy Marsh[0] for their marketing)

All jokes aside, I will throw serious money at the first streaming service that implements an 'autotuned' tag, and lets me filter anything tagged with it out of my stream. Like, $100 a month. Maybe more.

[0]: https://www.youtube.com/watch?v=AkMJ5GSC37g


I think you're underestimating how subtle and ubiquitous modern autotuning is. The video here is an extreme example of a very broad category of effects. In reality, almost every professional recording you hear uses autotune in the same way that almost every professional photograph you see uses color correction. Most people don't notice it most of the time (despite most people believing they always notice it).

An alternative you've also heard extremely often is a singer recording the same line 100 times, then producers going through each word (or each syllable) to cherry-pick the sample where it was most on-pitch and blend them together.

Neither one of these represents the singer's real ability (whatever that even means), but both can be unnoticeable by 99.99% of the population when applied skillfully.


> almost every professional recording you hear uses autotune

That seems a bit too general. If we narrowed that down to a few popular music genres I would probably subscribe to it. Plenty of solo artists wouldn't be caught dead using autotune or melodyne or whatever (outside of using it for effect/intentional distortion). When being an exceptional singer is your entire brand, you don't want to show up with training wheels.


> The video here is an extreme example of a very broad category of effects.

I mean, it's a joke, it's South Park. It's why I said 'jokes aside' in my next comment. However, if you're into jokes and such, I'd highly recommend that entire episode.


Fact is, people say this, but when artists actually release music that uses no autotuning, it tends to be less popular. Hardcore fans will say “release your next album without autotuning” but when the album actually comes out, those hardcore fans aren’t supporting it.

It’s very difficult to tell if autotuning is used, if it is used well.


That's an interesting defence of autotune, but I don't think it's relevant to my comment, which clearly talks about my own personal preference.

I don't want to listen to artists who use autotune. I _do_ want to listen to artists who don't. I know that there definitely is a set of artists who explicitly don't use it, who are more concerned with accurately expressing their creative intent and musical virtuosity than they are with gaining popularity and mass appeal. What I am saying is that I would like to be able to consciously choose to only listen to and support those musicians, and I will gladly pay a disproportionate amount for it. Anecdotally, I know for a fact that I am not the only one who feels this way.

Also btw -

> when artists actually release music that uses no autotuning, it tends to be less popular

Normally when you make a statement like this on HN I'd expect to see a citation or reference to where that statement came from.


>I know that there definitely is a set of artists who explicitly don't use it, who are more concerned with accurately expressing their creative intent and musical virtuosity than they are with gaining popularity and mass appeal

I think you are forgetting that the autotune IS their creative intent. T-Pain is probably the most famous heavy autotune user of all time, despite having an amazing voice without it. It's an intentional effect - like how electric guitars aren't "worse" because you aren't hearing the raw sound of the string.


I know T-Pain is an incredible singer. I know artists use it as an intentional effect. I don't think I'm forgetting anything though, I'm not sure of what your argument is here.


It may not exactly be about the artist. For instance, the person mixing the artist's song (putting together all the individual instrument and vocal tracks and ensuring they sound good together) may use autotune without the artist even knowing.

Then you have the problem of budget. Studio time is quite expensive, and after the nth take hard decisions have to be made: book another session, or fix the tuning with software?

Also take into account that the vocals you hear in songs, even ones that are not obviously autotuned, may actually be composed of dozens of takes. That technique is called comping: the mixer (sometimes together with the artist) chooses the best take out of dozens, often for each phrase or even a single word. It's a sort of natural autotune, where only the phrases that sound in tune are picked.
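The comping idea can be sketched as a simple selection problem - per phrase, keep the take that lands closest to the target pitch. A toy illustration (the take names and deviation figures below are entirely invented; real comping is done by ear in a DAW, not by a script):

```python
# Toy model of comping: per phrase, invented mean pitch deviations
# (in cents from the target notes) for each of three takes.
takes = {
    "phrase 1": {"take 1": 18.0, "take 2": 6.5, "take 3": 11.2},
    "phrase 2": {"take 1": 4.1,  "take 2": 9.8, "take 3": 7.3},
    "phrase 3": {"take 1": 14.7, "take 2": 13.9, "take 3": 3.2},
}

def comp(takes):
    """Pick, for each phrase, the take with the smallest deviation."""
    return {phrase: min(devs, key=devs.get) for phrase, devs in takes.items()}

print(comp(takes))
# -> {'phrase 1': 'take 2', 'phrase 2': 'take 1', 'phrase 3': 'take 3'}
```

In practice the "deviation" judgement also covers phrasing, tone and timing, which is why a human does the picking.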


imagine thinking you can actually notice light autotune (let alone melodyne!) use in a recording


Imagine thinking there aren't people with exceptional hearing that notice all sorts of things the average person doesn't.


Sorry but it falls into the realm of audiophilia to me - people who think they hear a difference in their audio because they're using $30,000 gold plated wires lifted 0.25mm off of any surface such as to not disturb the audio harmonics or whatever voodoo they've been sold to believe in.

There are absolutely edits so subtle that without having seen or heard the original you'd have no way of knowing it was modified at all. Pitch correcting someone's voice up 1/1400th of a step is not going to be noticeable no matter how perfect one thinks their hearing is. These kinds of subtle changes are far more common than the drastic and noticeable edits or even smaller but still quite large edits where people with a trained eye/ear will notice but the average person wouldn't.


> people who think they hear a difference in their audio because they're using $30,000 gold plated wires lifted 0.25mm off of any surface such as to not disturb the audio harmonics or whatever voodoo they've been sold to believe in

You're obviously entitled to share your opinion, but I also find your strawman a little offensive. Vocal pitch correction is a very real and very noticeable category of processing, absolutely not in the same realm of snake oil bullshit as $30,000 gold plated wires. If it were, it just wouldn't exist. Why would anyone ever take the time to write software that corrects vocal tuning if it made zero perceivable difference to the output?

> There are absolutely edits so subtle [...] Pitch correcting someone's voice up 1/1400th of a step is not going to be noticeable no matter how perfect one thinks their hearing is.

Nobody ever 'corrects someone's voice up 1/1400th of a step'. It just doesn't happen; the human voice can't consistently hold a frequency to that resolution for more than a few milliseconds. For reference, a 1/4 step vocal oscillation is only considered 'moderate vibrato' [0]. Even a 1/128th pitch variation is rarely considered noticeable or consistent enough to correct. This is miles away from '1/1400th' (which is also a very strange fraction to choose btw, even when you take harmonics into account).

I posted this in another comment, but you should take a couple of minutes and take the MusicLab Tone Deafness test [1]. It'll give you an idea of what 1/64th variance sounds like, and how noticeable it is or isn't to your ears.

[0] https://www.vocaltechnique.info/vibrato.html

[1] https://www.themusiclab.org/quizzes/td
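To put those fractions in perspective, here's a quick sketch of the standard cents arithmetic (100 cents per semitone, 1200 cents per octave) applied to an A440 reference; the specific fractions are just the ones from the discussion above:

```python
def semitone_fraction_to_ratio(fraction: float) -> float:
    """Frequency ratio for a pitch shift of `fraction` of a semitone
    (100 cents per semitone, 1200 cents per octave)."""
    cents = 100.0 * fraction
    return 2.0 ** (cents / 1200.0)

A4 = 440.0  # reference pitch in Hz
for label, frac in [("1/4 step", 1 / 4),
                    ("1/128 step", 1 / 128),
                    ("1/1400 step", 1 / 1400)]:
    ratio = semitone_fraction_to_ratio(frac)
    print(f"{label}: ratio {ratio:.6f}, shifts A440 to {A4 * ratio:.3f} Hz")
```

A 1/4-step shift moves A440 by over 6 Hz; a 1/1400th-step shift moves it by well under a tenth of a hertz.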


The fraction was pulled out of my behind to say: yes, there are edits that - even if completely pointless to make - are made nonetheless, and nobody will ever know about them without being told the recording was edited; anyone claiming otherwise is a liar.

I'm not sure what the point of the Tone Deafness test is - I scored 31/32, but on many of the questions, if one tone had been replaced with the lower/higher tone, I'd never know it had been replaced despite my ability to tell them apart from one another. My point wasn't that you couldn't tell A from a very similar B (you can!) but that if A had been replaced by that similar B in the first place, you'd never have known (because you couldn't!). You'd need to have been privy to the editing process to actually know if it had been edited at all.

Imagine an A/B test where A has been removed entirely and you've been asked which edits B has made to A. It's a very different test from an A/B test where you can compare A with B. While a trained ear may be able to hear obvious edits there are literally hundreds of non-obvious edits made to songs during production that you couldn't possibly know about without hearing the original recording to compare against.


> Sorry but it falls into the realm of audiophilia to me

Perhaps so. I'm not one of those people that picks up on (or claims to) these subtleties, so I wouldn't know. I wonder if any studies have been done on this.


It's sort of like thinking statistical outliers are relevant to a general argument.


Who thinks that? I certainly don't and never said I did. Nevertheless there is no 'light autotune' filter on any of the major streaming services either.

There's a pretty interesting test[0] on themusiclab.org that measures how good your microtonal perception is. See how you score - it might align with how well you're able to perceive autotune. Here's my score [1]. This was from my first attempt a few months ago, but I've done it a couple more times since and my score is pretty consistent - whether I like it or not.

[0] https://www.themusiclab.org/quizzes/td [1] https://imgur.com/a/5R3K43Z


You also just _completely_ missed my point.

I want to be able to choose to consciously support artists who don't use autotune. I want to be recommended and discover new artists, who consciously don't use autotune. I want to be able to choose to listen to real vocals as a genre. I want that choice.


I'm very curious how autotuned/melodyned vocals are not "real vocals", but vocals that have had EQ applied to them, or have been compressed or flanged or post-process reverberated or delayed, are "real vocals".

Perhaps there is also a standard mic for Vocal Realness. I assume it must be a SM58; it is, after all, well-known that having a low-pass switch on the microphone lowers the industry-calibrated Realness Score by at least 250 mSpr(ingsteen). More if it's on.

edit: A friend also pointed out to me the inherent Springsteen ceiling of computer-reproduced audio. And you know, he's right, I'm going to go find a chamber and hire some monks for the true realness that only an authentic Gregorian chant can provide. Denon sells them in twelve-packs, you know.


>I'm very curious how autotuned/melodyned vocals are not "real vocals", but vocals that have had EQ applied to them, or have been compressed or flanged or post-process reverberated or delayed, are "real vocals".

It's not based on some technical argument or law of physics. People either get why, or they don't.

If you want some kind of technical justification: those effects you mentioned just add some sparkle (flanger), make a recording - which starts out deader in a sound-treated studio - sound closer to a real-life environment (reverb), or even out the dynamics (compression).

Autotune, on the other hand, changes the pitch and vibrato - you know, two of the main things a singer is supposed to produce. And if overdone as an "effect", it also fucks up the timbre.

And let's be real, nobody says this about some singer using autotune to fix a flat note or two. It's the autotune-as-effect (whether T-pain levels or more subtle) that people complain about.


I think you may have missed what I'm getting at here. I very much "get" why people don't like it, and most of the music I listen to regularly goes light on autotune/melodyne. And I am not saying that one has to like those vocals at all. But the idea that they're not "real vocals" is an attempt at shitty gatekeeping that has no place in music. This wannabe arbiter does not get to decide what "real" is or what "art" is.

It's gatekeeping bullshit.


It isn't gatekeeping bullshit. It's a matter of personal preference. Personally, I can't sing for shit. I also listen to a lot of artists that use autotune. I bought melodyne on the day of release when they brought their polyphonic editor back in 2009, and used it for years before that. It doesn't change the fact that autotuned vocals are the vocal equivalent of quantised drums. They're fine, but sometimes I want the option to filter it out from my recommended stream. Sometimes I like hearing microtonal mistakes. I like hearing a missed beat. I want that option. I don't want sterile perfection.

> But the idea that they're not "real vocals" is an attempt at shitty gatekeeping that has no place in music.

I'm not sure whether you see the irony in this statement. You're saying my opinion has no place 'in music'?

> This wannabe arbiter does not get to decide what "real" is or what "art" is.

I never claimed to be the arbiter of what art is. Art is subjective. It's a matter of personal preference. Like not wanting to listen to autotune.

Lastly, you seem really angry. I'm really sorry if anything I've said has upset you.


> You're saying my opinion has no place 'in music'?

Man, this is basic Popper stuff. If you'd said "I don't like autotune", I'd have probably agreed with you. If you're saying that what others are doing aren't "real" because you don't like it, that's a whole different kettle of fish.

> I never claimed to be the arbiter of what art is.

You picked the word "real". Words mean things, and "real" does not mean "to my preference". It is an assertion of legitimacy; it is that assertion to which all of my comments in this thread are directed, and it's something that neither you nor I get to take away from somebody.

> Lastly, you seem really angry. I'm really sorry if anything I've said has upset you.

I wouldn't say that I'm angry, I've been on the internet a long time and random posts have to be really special to do that, but I do write sharply when I care about something. If one believes genuinely in the openness and democracy of art--and I do--I don't think there's a properly strident reaction to the implications you laid down that wouldn't be a little bit testy.


Real (untuned) vocals are like real (unquantised) drums or real (unphotoshopped) photographs. They're no longer real if they have been manually edited after capture.

There is nothing wrong with that. It is still art, I never claimed it wasn't. Deepfaked actors can still constitute 'art' (see Sassy Justice with Fred Sassy for a notable example). Photoshopped photos are still 'art'.

However, we don't refer to those as 'real actors' or 'real photos'. They are examples of creative expression through manual editing. Photoshopped photos can be real art without being real photos, but there's a reason why National Geographic photographers don't go crazy with the spot healing brush.


This comment section is about as well-informed as sound engineers on gearspace talking about why VSTs written in Javascript have more analogue warmth than ones written in C++. Some people commenting clearly have no idea how music is made.


Even in the analog world, I've heard of things like adding reverb to a track by playing it in a bathroom and recording it again.


This is a legit technique, has been for a loong time. You'd be surprised what they used to call it! [0]

[0]: https://en.wikipedia.org/wiki/Echo_chamber


I've produced with DAWs for 30 years, and even put together a few effects of my own with Max and Reaktor, so probably not me...

Do you have anything in mind that sounds misinformed? Or can you just not fathom that anybody who can tell what a granular delay or an LFO or an automation envelope is could possibly dislike Autotune?


I like gatekeeping. It means people see value and lack of value, as opposed to considering everything the same.


There is value in art done honestly, "bad" or not. And neither autotune nor melodyne have intrinsic characteristics that affect either honesty (unless you extend that to all forms of audio engineering that changes the voice) nor quality.

Put frankly: your take is a small one. It's one that weakens the idea of art and, no less importantly, is cruel and sabotaging to people. You should change your mind, but you're kind of revelling in that cruelty and that smallness throughout this thread (which is gross!), so I will not be holding my breath for it.


It doesn't have to have 'intrinsic characteristics'. The tendency, and the majority of real-life applications being shitty, is enough.

Unless you thought that when we complain about autotune being bad, we're talking about some rare band that uses it as a creative tool, and not about the millions that use it as a crutch or for the 1000000th recreation of the same BS sound...


> There is value in art done honestly, "bad" or not

If I think it's "bad" then it has _zero_ value to _me_. Why does this bother you or anyone else?


Even more so, if I think it's "bad" I might also consider it detrimental to the music world, and even to society at large. Music is not just an isolated consumption, it's also a social force.

And people can still be able to differentiate between stuff they merely don't like (taste) and stuff they consider detrimental to music in a larger way. In fact they might even like the latter and still consider them detrimental (e.g. I find some commercial pop tunes catchy, but consider them a bad musical and societal influence).


Let people like what they like. Let people want what they want.


"I want to be able to choose to consciously support artists who don't use EQ. I want to be recommended and discover new artists, who consciously don't use EQ. I want to be able to choose to listen to real vocals as a genre. I want that choice."

Note that many recording artists do not actually make that choice; it happens further up the chain. Regardless, whether or not a singer or producer uses a particular effect on their voice does not distinguish between the vocals being 'real' or not. If you simply don't like the way it sounds, on the level of artistic taste, well, you can make that judgement for yourself, but, to claim it's more profound than that is just pure pretense.

Also, why are we talking about autotuned vocals on a thread about a speech synthesizer? Claims of 'real vocals' are already out the window at that point.


>Note that many recording artists do not actually make that choice; it happens further up the chain.

If they don't make such choices for themselves, they're not much of an artist, more like mass-produced manufactured candy pop/r&b/etc...


"if you're leaving artistic choices in the hands of someone else and aren't building your computer, DAW, and semi-conductors from scratch, all on your own, you're not much of an artist"


This is some novel kind of cross between a strawman and a slippery slope fallacy, so full of bad faith that it isn't really worth an answer...

Because of course there can't be a cutoff point, with my argument applying to the level of involvement I described (whether one's vocals are processed with autotune or some other major effect is a major thing for a vocalist to leave to others; even the level of reverb or echo will often be a thing to debate with the producer/engineers) and not to ridiculous meta-levels like building your own DAW.


My point is that you're depriving yourself of good art for an arbitrary moralism. It's just as ridiculous as saying "I'll never listen to a singer that uses compression, because the artists I listen to MUST have perfect dynamics (even though this has zero impact on the final product)".

I'm making fun of you for being silly.


I never said I don't listen to it. I said I want the option of explicitly filtering it out sometimes. The two are entirely different.


He says 'I' meaning he's choosing for himself for his own reason, he's not forcing it on others. You might at least respect that even if you, and I, don't share the reasoning.


> I don't want to listen to artists who use autotune. I _do_ want to listen to artists who don't.

If you can’t tell the difference, why do you care? If you can tell the difference, why do you need the label?

Speaking as an amateur musician, I think these arguments reflect a lack of understanding of how music production works, and how that’s connected to the artist’s creative vision. I’ll say that a big part of the blame lies with poorly used autotune. Just to pick an example, the entire first season of Glee is especially bad, to the point where I want to leave the room. Then there’s various places where autotune has been overused on singers who don’t actually need it to begin with (Bublé comes to mind), or where autotune has been used to cover up some sloppy singing. (Bublé is particularly illustrative—it is known that he uses autotune, but he has said that he doesn’t… I suspect that Bublé is simply unaware that autotune is being used. I suspect that many other singers are also unaware that they are being autotuned—but you can sometimes find clear evidence for it when you analyze their songs with a computer.)

But autotune is also used, manually, by producers, to make small adjustments as needed to improve a take. It can mean that the singer does fewer takes to nail the song, because with your comping and autotune choices, you get what you need faster. The amount of comping and autotune that you do is a matter of style and situation—producers are free to do comping and autotune as they see fit, to capture what they think is the best version of each part.

It means that you can say, “take 3 has the most beautiful phrasing and a lot of soul to it, it’s just a little sharp” and then manually adjust the pitch by the amount you think is appropriate.

> Normally when you make a statement like this on HN I'd expect to see a citation or reference to where that statement came from.

I’m not sure why you have that expectation or think that it is at all reasonable. Like everyone else, I have a lifetime of experience. Not everything that I tell you was written down in the first place, and I can’t reasonably be expected to remember the exact source for every piece of information, nor should I be expected to shut up just because I don’t provide a source for something I say.


> why do you need the label

I don't want to get halfway through listening to a song to hit autotuned vox. I just don't, I'd like the option to filter on that. I would like a recommendation/discovery engine that eliminates that anxiety.

> a lack of understanding of how music production works, and how that’s connected to the artist’s creative vision

Quite the opposite. I understand music production extensively. I studied it, I have used it, half my social circle are studio engineers or conservatoire grads. I just want the option of filtering on artists whose creative vision _is_ the original vocal recording.

> I suspect that Bublé is simply unaware that autotune is being used

The only reason that man has a career is because his geriatric target audience is too old to know what autotune is (hence his denial is plausible). I'm pretty sure his record sales depend almost entirely on the abuse of autotune for the hard of hearing. He isn't an example I'd have used in this argument.

> But autotune is also used, manually, by producers, to make small adjustments as needed to improve a take. It can mean that the singer does fewer takes to nail the song, because with your comping and autotune choices, you get what you need faster

I totally know and appreciate all of this. Recording without it is considerably more effort and money, it's often not economically viable. Nevertheless, I want to be able to sometimes filter on it.

> I’m not sure why you have that expectation or think that it is at all reasonable. Like everyone else, I have a lifetime of experience. Not everything that I tell you was written down in the first place

When your argument is your opinion or something drawn from your personal experience, that's fine. However your words were: "Fact is, people say this, but when artists actually release music that uses no autotuning, it tends to be less popular". Stating a 'counterintuitive fact' the way you did suggests it was proven in a study or experiment of some kind. When putting arguments like that forward, most posters on HN tend to reference their sources. Otherwise your 'fact' is just an opinion. There's nothing wrong with that, but it is not a 'fact'. If you state something is a fact, I think an expectation that you can back it up with an external reference is entirely reasonable.

> nor should I be expected to shut up just because I don’t provide a source for something I say

Not sure where I said that you were expected to shut up. If I implied it I apologise. However, I stand by my expectations of providing external references for things stated as fact.


> I would like a recommendation/discovery engine that eliminates that anxiety.

Is “anxiety” the right word? This is an unexpectedly serious way to phrase things and I’m not sure I understand you correctly. If you’re viscerally bothered by the presence of autotuning, I can see why that must be very frustrating.

> I just want the option of filtering on artists whose creative vision _is_ the original vocal recording.

I think the idea of “original vocal recording” gets weaker the closer you look at it. You want something that’s not autotuned, fine. You want to logically explain your personal preferences in terms of preferring the “original” audio recording? The logic doesn’t hold up. That’s okay, you don’t need to logically explain your personal preferences.

In fact, I’d rather you didn’t. Pet peeve of mine. Maybe if you get a visceral reaction to bad autotune, I get a visceral reaction when someone explains the “logic” behind their personal preferences. It’s aesthetics. Maybe someone can bridge the gap between logic and aesthetics someday, but I haven’t yet met somebody who has succeeded.

> The only reason that man has a career is because his geriatric target audience is too old to know what autotune is (hence his denial is plausible).

The reason I brought up Bublé is because it illustrates that autotune is sometimes used even when the singer denies it, even when it’s unnecessary. Please don’t lay on the hate.


I just don't get how you can defend Bublé. I'm sorry.


I’m really not surprised to see this kind of tacky comment in the thread. For some people, the most important part of music taste is making sure that other people know that you hate the right kinds of music.

Gross.


> I know that there definitely is a set of artists who explicitly don't use it, who are more concerned with accurately expressing their creative intent and musical virtuosity than they are with gaining popularity and mass appeal.

Any examples? Are they all fairly small artists, or are there some big names who explicitly don't use it?


Is Bob Dylan a big name? Tom Waits? Coldplay? And if those are "old", one can still find tons of modern (20-something) musicians which don't use it, but still have a sizable following.

Not sure what you have in mind, but people don't need to listen to the top-20 or BS R&B.

Even if you want to listen to electronic music, there's a big universe of artists who have nothing to do with the "autotune" sound and modern commercial productions.


> Is Bob Dylan a big name? Tom Waits? Coldplay?

Tom Waits and Coldplay (or specifically, their mixing engineers) have gone on the record saying they use Melodyne in their marketing brochures.

Again, it is literally impossible for a human to know if pitch correction was used on a song. But if it's a song that was released after 2010 and mixed by an engineer, they used pitch correction, guaranteed.

[0] https://musicmarketing.ca/DNET/rack/brochure_melodyne_3.pdf


>Tom Waits and Coldplay (or specifically, their mixing engineers) have gone on the record saying they use Melodyne in their marketing brochures.

Not exactly. A guy who worked with them said it. They also have worked with dozens of others which would be where they used it. Also Waits recorded for 25+ years before Autotune (much less Melodyne) was even a thing.


> They also have worked with dozens of others which would be where they used it. Also Waits recorded for 25+ years before Autotune (much less Melodyne) was even a thing.

And instead recorded the same piece several dozen times and for the final recording spliced together what sounded best.

This is like complaining that you don't want authors using spell check, they should have to retype the entire paragraph every time they make a mistake!

The end result is the same, the only difference is the path to get there.

There may be some fair complaints about usage at live shows, but for recordings, the end result is going to be the same if melodyne is used or if the artist is recorded again and again until everything is "perfect".


>Fact is, people say this, but when artists actually release music that uses no autotuning, it tends to be less popular.

The kind of artists worth listening to wouldn't touch autotune with a 1000ft pole. They're less popular to begin with - but can still have tens of millions of fans globally (say, someone like Tom Waits).


> The kind of artists worth listening to wouldn't touch autotune with a 1000ft pole. They're less popular to begin with - but can still have tens of millions of fans globally (say, someone like Tom Waits).

Whoops, bad news! Tom Waits' mixing engineer is explicitly mentioned in Melodyne's brochure.[0] Melodyne 3, too, so he's been using it for quite a while. You may want to shorten the length of that 1000ft pole.

[0] https://musicmarketing.ca/DNET/rack/brochure_melodyne_3.pdf


>Whoops, bad news! Tom Waits' mixing engineer is explicitly mentioned in Melodyne's brochure

Which is not Tom Waits. A guy who has worked with 200 clients and has done gigs with Waits puts his most famous client names under "has worked with" - not necessarily the ones he used Melodyne on (which he doesn't even claim).


Any recommendations? I get why singers do it, but I want to hear the imperfections.


> I will throw serious money at the first streaming service that implements an 'autotuned' tag, and lets me filter anything tagged with it out of my stream.

Well, I hope you don't like listening to any music made after ~2010, then. Melodyne is completely standard for any modern vocal processing, in all genres of music.


> in all genres of music

I get where you're coming from. I remember Celemony announcing their polyphonic tuning engine a little over a decade ago, and remember buying it as soon as it came out and re-tuning a load of Imogen Heap tunes and loving it, in something like Reaper v2. I know how prevalent Melodyne is, I know the commercial and production related justifications for its use. But I also know for a fact that this is not the case 'in all genres of music'.

Sometimes, I want to listen exclusively to music with real vocals. I love the first CHVRCHES album, it's one of my favourite albums of the last decade. As it happens, it took me ages, a couple of years, to figure out that the reason I liked it as much as I did was the lack of autotune. I (ironically) figured this out when they released their second album which _did_ utilise autotune. I never made it all the way through listening to that second album.


I guarantee that the music you're listening to that has "real vocals" has used Melodyne, you just don't notice it. Even if the artist themselves thinks they didn't use it, if they worked with a professional mixing/mastering engineer, they likely did vocal correction without the singer's knowledge.

You may as well say "I don't listen to music that uses EQ" or "I don't listen to music that uses compressors."


> You may as well say "I don't listen to music that uses EQ" or "I don't listen to music that uses compressors."

Not the same as either of those things. You might as well be arguing that I'm claiming that volume knobs shouldn't exist.

Pick an equivalent like photography. In the photography world, EQ is like a fill light. A compressor you could compare to a polarising filter. Autotune (or drum quantisation), though, is like photoshopping. At best it's removing all of the skin blemishes and imperfections; at worst, and more often than not, it's fake, disproportionate waist and booty enhancement.


This doesn't make any sense. Vocal compression alters the source signal significantly more than pitch correction, generally. Like getting -10, even -20dB of compression across multiple compressors (some of which are "character" compressors like the 1176 or LA2A, so doing more than just dynamics) is not uncommon. Every vocal runs through a de-esser (a specialized compressor, really) that removes sibilance, which is much more analogous to "removing all of the skin blemishes and imperfections".


It does make sense. Compression is a uniformly applied filter. It's like a contrast adjustment or colour curve. It also affects amplitude only. (It can also be done really badly - I'm not a huge fan of its abuse in the loudness wars, but that's not what we're talking about here.)

Compression is an effect that's applied. Selective tuning, like quantisation, is the result of an _interactive, selective edit_. It's the equivalent of photoshopping out the spots on my ass and making my hips a little narrower. Sure I could have not had that ingrown asshair, and maybe my hips would be narrower if I worked out. But the reality is different.

(Also IMHO ratio is a better single indicator of severity of compression, rather than dB)


If you're seriously interested in this topic, you should look into some vocal processing courses or videos on YouTube, because you are quite misinformed on some of this stuff.

> Compression is a uniformly applied filter. It's like a contrast adjustment or colour curve. It also affects amplitude only.

Very incorrect. Many famous compressors are known for the color (saturation, technically speaking) that they add to the sound. Hell, some are known for how badly they destroy the sound, like the Level-Loc. Analog-style compressors (which most engineers still use for vocals, in particular) also react very differently to different input gains, so it's not a uniformly applied filter.

> (Also IMHO ratio is a better single indicator of severity of compression, rather than dB)

It is not. Talking in terms of total gain reduction is more indicative of the effect on the sound. Using a 100:1 ratio (in practice, a limiter, something like Pro-L on Safe mode) is very common on vocals to catch quick peaks, but only catching a few dB of gain reduction. You won't notice that a limiter is being used on vocals this way. But you would notice if I used a 2:1 ratio on a vocal and set the threshold all the way down, crushing the dynamic range. You also can't talk about ratios when talking about using compressors in serial, which again, is standard vocal processing.
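To put rough numbers on this: for an idealised hard-knee compressor, gain reduction depends on both the ratio and how far the signal exceeds the threshold, so a "brutal" 100:1 ratio with the threshold set just under the peaks can be gentler than a "mild" 2:1 with the threshold pulled way down. A simplified sketch (static curve only, ignoring attack/release and knee; the dB figures are illustrative, not from the thread):

```python
def gain_reduction_db(input_db, threshold_db, ratio):
    """dB of gain reduction a hard-knee downward compressor
    applies to a signal peaking at input_db."""
    if input_db <= threshold_db:
        return 0.0
    over = input_db - threshold_db
    # Output above threshold is threshold + over / ratio,
    # so the reduction is everything the ratio shaves off.
    return over - over / ratio

peak = -6.0  # a vocal peak at -6 dBFS

# 100:1 "limiter" set just below the peaks: under 2 dB of reduction.
print(gain_reduction_db(peak, threshold_db=-8.0, ratio=100))  # 1.98

# 2:1 with the threshold buried at -30 dB: 12 dB of reduction,
# audibly crushing the dynamic range despite the gentler ratio.
print(gain_reduction_db(peak, threshold_db=-30.0, ratio=2))   # 12.0
```

This is why total gain reduction, not ratio, is the more telling single number.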


> If you're seriously interested in this topic, you should look into some vocal processing courses or videos on YouTube, because you are quite misinformed on some of this stuff.

Recommend one, I'd appreciate it.

> Many famous compressors are known for the color (saturation, technically speaking)

Are you comparing colouration / saturation from a classic compressor to autotune?

I agree with you on most of the last paragraph. Thanks for the explanation.


I'm sure there will always be music made with varying levels of technology. It seems like any technology introduced after one's childhood will always be the cutoff for some people.


This is one of the funniest clips from the show I can think of; the subversion of expectation, from his initial clip to a legit Lorde-sounding song, is so funny to me.


Autotune also, in my opinion, sets unrealistic standards for people getting into singing.


On the other hand, autotune is great for letting people get into making songs without having the greatest voice.


Seriously, that was complete torture watching that video. I thought for some stupid reason it might be interesting. Wrong.


Is there any Vocaloid music that isn’t cheesy pop music? I know of some by Hosono, that’s about it.



Mikgazer is a pretty popular shoegaze album. There's plenty, but they're a bit harder to find.

https://www.youtube.com/watch?v=TPYqrVDlc_4


Now I know what simcard nano uses for their Radiohead covers! I've always wondered where those vocals came from. https://www.youtube.com/watch?v=ApL1d_OQYk4


Random comment: saw a cool device recently (it's old) called the Pocket Miku.


It's refreshing that the demo in the video is done not on a Mac but on a Windows machine.


But... but... Melodyne..


Melodyne is (really great) pitch correction/vocal editing software, while Vocaloid is a vocal synth.


It took me like 5 minutes to understand what the product does. Why do so many landing pages NOT TELL YOU WHAT THE DAMN THING IS? Are we just expected to know?

>VOCALOID6 is an AI-based technology created by Yamaha to fully support the musical expressiveness of creators from all perspectives, offering an even more natural singing voice than ever before together with unprecedented freedom to express your vocal ideas. This product lets you express your ideas on the spot in vocal form while producing music.

What does this even mean? It reads like GPT-3 output.


It's software that sings for you. You may have seen the stable of characters marketed around the software, like Hatsune Miku https://www.youtube.com/watch?v=vSnKX7kAgIc


Thanks, I did figure this out eventually, I’m just frustrated that it took me watching the video and scrolling through the landing page for a while until I got it.

“Software that sings for you” is a great explanation, which should have been right there at the top of the landing page instead of that endless stream of buzzwords.


"This product lets you express your ideas on the spot in vocal form while producing music." sounds pretty direct to me. If they'd said "this is a voice synthesizer" you'd complain that they were just expecting you to know what that is.


Ever heard of Hatsune Miku? https://youtu.be/jhl5afLEKdo?t=65

It's just voice synthesis.



