I went from Vocaloid hater to fan in the span of this year. There are Japanese Vocaloid producers who are pushing the boundaries of pop music in a way that wouldn't be possible with a real singer. I've never come across anything like this music in the West. Definitely an acquired taste.
My Vocaloid song recommendation: Ungray Days by the producer Tsumiki. Tsumiki creates a sharp, aggressive sound that is disagreeable at first but really addictive. https://www.youtube.com/watch?v=UvF3Mwj5d4E
I'm sorry, but come on. Being a musician myself, I always try to keep an open mind and fully realize that not all music is supposed to give you the warm and fuzzies, but IMO nothing about that is interesting. It sounds like mathcore on fast forward with a chipmunk incomprehensibly chattering over it. No interesting harmonic structure, no interesting instrumentation or arrangement, rhythms straight stolen from other genres.
I'm being genuine when I say I'm interested to hear what about this moves you. I almost always get it even if I don't like it. This...I don't get it.
Certainly Ungray Days incorporates many rhythms, chord progressions, and other structures that are derivative of other works. I think this is true of all pop music though. These strike me as songwriting elements that have withstood the test of time because they work, so I don't necessarily view them negatively - to the contrary I feel that I need a certain number of these familiar elements for music to emotionally resonate with me. What I love about Tsumiki's music is these common pop structures combined with the fast tempo, weird voice, and busy instrumentation. There is chaos unfolding in the music but at the same time it is held together by classic structures that I recognize from older songs. It produces this chaotic yet focused energy that really gets me going. There's definitely nothing complex or groundbreaking in this song in terms of musical structure, but I haven't really come across other music with a similar "vibe", if you will (System of a Down may come closest for me.)
What I like about the song OP linked lies mostly in the legacy of how Vocaloid has been used in the past, e.g. "Disappearance of Hatsune Miku" [0] - there are certain textural elements that come with using Vocaloid that show up repeatedly, and a big one among them is "rapid chatter" effects. There are other ways of getting a similar result that don't sound the same, just like a session performance is probably going to sound different from a sampled instrument.
If you examine this stuff on the basis of harmonic structure, rhythm or arrangement you're basically going down the path of discounting most of electronic music, which is discretized into microgenres just on the basis of using a faster tempo, a different snare hit sound or an unusual mixing strategy. You have to really lean into timbre and texture to find what to appreciate.
Another interesting vocaloid artist is SOOOO. However, anyone who has struggled with depression/self harm or has suffered abuse should not look them up if they think there is any possibility of being triggered by a mention of it (to the degree where I opened youtube in a private tab to find these links so that I wouldn't risk them being recommended while I'm in a bad headspace).
I personally like the chaotic quality that many vocaloid songs carry, and much like other somewhat experimental genres, you start to be able to follow along more after listening over time.
There are certain limitations (which will improve in time), and it's also a stylistic choice. Vocaloid certainly carved out its stylistic niche due to early limitations, and it is quite hard to find songs in other styles, since few voicebanks are suitable for them.
Even if Vocaloid is capable of much more than just Miku, Miku is immensely influential in the subculture.
Does it sound better live ( https://www.youtube.com/watch?v=K_xTet06SUo )? Or for a closer comparison with the first link https://www.youtube.com/watch?v=nepNc0Gk1E8 ? Sure, even though the Vocaloid style has its own charm. But you also do not need to be able to sing in order to create a song with Vocaloid, so overall it's a great tool. If the song is good, someone will eventually cover it live.
Vocaloid allows many who want to compose but do not sing to participate in a remixing ecosystem. The collaborative nature of the community is an incredible strength.
I'm not up to date on J-pop/J-rock as of 2022 but things of past years like King Gnu's Hakujitsu or LiSa's First Take performance seem pretty decent as far as fairly popular music goes, and I'm sure I am missing a lot of stuff due to my musical interests having shifted a fair bit over the years.
Please listen to Kiichi's "Francium" and tell me how much you hate it. My enjoyment of the song is greatly increased whenever I'm reminded that people dislike it.
There is a clear explanation for this. If you like something that others find disgusting, it allows you to consider yourself special. Entire subcultures are built on this psychological mechanism. Everything has a place under heaven, and this is neither good nor bad.
Good or bad, it does mean a general increase in "unique" behaviors which are adopted for reasons besides "it's good".
Diversity is important, but it has the drawback that compatibility suffers. Monoculture is no better, but the tendency to dismiss what others find unique is a recognition of the general (biological) strategy: conserve some status quo to build community.
General advice: don't have strong feelings about what others like - if they enjoy similar things, fine; if not, their opinions are not worth much in the first place. At the same time, don't be afraid to enjoy what you like, or to find your tastes changing over time; it's natural and not necessarily wrong.
Worst thing you can do to a young person, be old, and tell them you love everything they like - either they will think you a foolish old person or be devastated they aren't as hip as they thought they were...
These songs might be pedestrian (heh) to you, but there is so much that is niche and experimental, all using the same voice - I find this highly fascinating.
The above songs I think give a pretty wide longitudinal view of Vocaloid music and the variety you can find in the fandom, from just Hatsune Miku as the vocal.
Also adding: a couple of imageboard originals that introduced me to vocaloid, not really a fan of most vocaloid music though. I still don't know what sub-genre these are -- the producers being anonymous doesn't help!
Luka(?) sounds pretty natural here, too, especially for 2009. I still remember thinking at first that it was sung by a human.
All of the examples linked so far sound highly artificial - so much so that I'm guessing part of the charm for listeners is the artificial, robotic sound/voicing, and the producers make it extra so in order to give higher satisfaction.
Are there any examples of songs out there going the opposite way, trying to use something like Vocaloid to make the voice and singing as realistic and human-like as possible?
Ai Dee by Mitchie M is famously tagged [Miku & Luka sing like humans] on the original upload, so that was an explicit goal. Of course, it may not sound as impressive today as it did in 2012, but it's a reasonable attempt.
Which portions specifically? Gave it a listen and didn't find any parts that came close to not sounding robotic. Maybe in comparison to the other examples linked here, that one was better (but I'm unsure if that's actually true), but if that's state of the art, I'm not sure it really comes close to hitting the mark.
Definitely not state of the art as far as being humanoid, that would be the top level link.
It's also not particularly humanoid, just closer than most vocaloids I've heard. Going off of my memory of the song, it's more the intonation than the timbre that stuck out as being more realistic.
I'm afraid I'm not willing to listen through the song currently; this artist's music usually dredges up some stuff I'd rather not deal with right now.
I've gone from being Vocaloid indifferent to a Vocaloid hater. Being the father of a 12-year-old girl who is obsessed with imaginary Japanese Vocaloid artists, I'm totally over the sound of it, although I agree the histrionics produced by these things can be fairly amazing to hear from time to time.
What I did find more interesting was the AI "sung" version of Jolene that was doing the rounds a few days ago, based on the voice of Holly Herndon:
https://youtu.be/kPAEMUzDxuo
Interested to see where that goes, although I've got to admit, I'm a purist, and any type of digital vocalist is going to make me go "meh" sooner or later when compared to even a half decent human singer.
There is definitely an element of high energy music which appeals to younger crowds. Same reason clubs tend to turn music up to levels capable of causing hearing damage.
I even exploited this fact as a way of staying awake a couple times while taking long road trips, as a stand in for caffeine.
Having high energy music is OK; allowing it to disturb the peace is not. Time to teach the lessons about manners and being considerate, I suppose (buy her a pair of headphones and limit her volume so she doesn't suffer early-onset hearing loss).
No guarantee she won't turn out to be obnoxious as an adult, but that's the genetic lottery, I'm afraid.
Hmm interesting. I'm not familiar with this scene. I'm not sure how I feel about it. I feel like it feels kind of hollow. Like there's a lot of energy in it, but the vocaloid part just feels so emotionless. Maybe that's a cultural barrier though. Jpop and Kpop make me feel similarly and they're actually singing.
On the western side in a similar vein you've got hyperpop coming up from 100 gecs and laura les and what not. This kind of sound, hypertuned and almost as incomprehensible, sounds better to me. You do still get a vein of emotion. I love this sound.
On the cultural barrier part, vocaloids are nowhere near touching the leaderboards even in Japan, so it is far from being a widely accepted thing yet.
To me the most interesting part to vocaloid is the ability for a sole producer to make a complete song without any external help. The vocal parts have always been a barrier, and while emotionless and still lacking in some areas, vocaloids are “good enough” to support a well produced song.
We’ve seen creators rise through the ranks through vocaloid, get experience and exposure, and then move to full professional production with a staff and an actual singer (whose voice will also be heavily processed, but they have a ton of tuning experience at that point).
I also agree with the parent comment that some creators do benefit from the “mechanical” part. Throwing more links: Giga works with both singers and vocaloids and is pretty good at extracting the best of both: https://youtube.com/c/GigaVideos
> To me the most interesting part to vocaloid is the ability for a sole producer to make a complete song without any external help. The vocal parts have always been a barrier, and while emotionless and still lacking in some areas, vocaloids are “good enough” to support a well produced song.
An early example of this was the debut album of Boston which was mostly recorded in Scholz's basement with him on every instrument except drums, then the tapes were mailed to LA for Delp to record vocals. I think it's rather funny in particular that Rock and Roll Band was written and mostly recorded before the band even existed.
> I'm not sure how I feel about it. I feel like it feels kind of hollow. Like there's a lot of energy in it, but the vocaloid part just feels so emotionless.
In Japan, the term is "denpa" (電波ソング). Denpa music is as intentionally strange as it is catchy, and as hypnotic as it is awkward. There are many producers creating high-BPM electronic vocaloid music that is chaotic for effect. It is a bit more twee than the western sounds, as you mentioned, but it can be quite enjoyable if you're in the right mood.
Also, I think you'd enjoy the Song Exploder podcast. If you haven't heard it already, check out the episode where 100 gecs break down how Money Machine was created:
In Japanese, there is no distinction between syllable-final [n] and syllable-final [m]. But in English there is. Traditional romanizations of Japanese will transcribe this as "dempa", for the obvious reasons that (a) that is what the Japanese spelling says; and (b) that is also how the word is pronounced.
I often see English speakers get very confused over exotic modern transcriptions such as "denba" or "senpai", believing there must be a reason they are written that way. But I'm not sure what that reason is supposed to be.
Following the "spelling" surely suggests consistently spelling 電(でん) as "den", not alternating n/m depending on the environment? The Japanese don't write different んs for 電波(でんぱ)・電流(でんりゅう)・電話(でんわ).
Attempting to approximate pronunciation is a valid theory of transcription, but one which also ought to prescribe that 電気(でんき) be transcribed as dengki; English is not much less discerning of syllable-final [n] vs [ŋ] than it is of [n] vs [m]. This is not a position I've ever seen anyone defend in earnest, though.
(Romanization for anglophones is a bit of a lost cause anyway, since we're going to fuck up the vowels no matter what you do.)
> Attempting to approximate pronunciation is a valid theory of transcription, but one which also ought to prescribe that 電気(でんき) be transcribed as dengki; English is not much less discerning of syllable-final [n] vs [ŋ] as it is vs [m].
That is blatantly incorrect. English converts syllable-final [n] to [ŋ] when followed by a velar exactly the same way Japanese does, and English spelling reflects that. Consider the English words "think", "clunky", or "handkerchief".
Sure, now show us lack of assimilation to a subsequent bilabial (in a context where /nk/ does assimilate), which is what Japanese does and that you're implying English does differently (it doesn't). English has it baked in so deeply that most would-be /np/s are already spelled <mp>, which muddies the waters a bit, but these past few days have given us plenty of clips of people pronouncing "government", haven't they?
What are you trying to show? You seem to agree that the English spelling of /nt/ is "nt", the English spelling of /ŋk/ is "nk", and the English spelling of /mp/ is "mp". There is no possibility of "np", "nb", or "nm".
How would that suggest that it's reasonable to spell the Japanese word "dempa" as "denpa"?
For demonstrating lack of assimilation of /n/ to following bilabial, there are a couple distinct questions you might ask. It's very frequent for people to preserve the tongue gesture associated with /n/, because a bilabial stop doesn't use the tongue and so [n] is easily coarticulated. But that turns into /mp/ or /mb/ over time because the difference is not easy to hear. In contrast, for a word such as "impossible" where this process completed many hundreds of years ago, the tongue is not used at all in the pronunciation of /mp/. This is a kind of lack of assimilation.
You can also see lack of assimilation in the very people who go to special efforts to pronounce [n] in Japanese words where that is inappropriate.
Note that the English and Japanese phenomena you're talking about are very distinct. This is a fact about the historical development of sounds in English (and Latin...) that doesn't apply to current English, where a sequence like /ng/ will often be preserved across word boundaries. ("One ghost"; this is the only context in which such a sequence can occur at all.[1]) English maintains a robust distinction between /n/ and /m/ and a weaker one between /ŋ/ and the other two.[2]
In contrast, Japanese ん assimilates to whatever follows it, and in the case that nothing follows it it may (rarely) be realized as nothing more than nasalization of the preceding vowel. Word boundaries are not relevant. Japanese does not have a phonemic syllable-final /n/ or /m/ (or /ŋ/). It has a single sound (usually indicated /N/ by specialists, apparently, due to even more weirdnesses that it involves) that gets realized differently in different contexts.
So again - what would justify representing the Japanese sound as "n" regardless of context in languages where, unlike in Japanese, the distinction between "n" and "m" is meaningful?
[1] You say that most would-be /np/s are already spelled "mp", but this is false - the words that are spelled "mp" changed long ago, and do not represent attempts by modern speakers to pronounce an /np/ sequence. They represent attempts to pronounce an /mp/ sequence.
[2] Why weaker? /ŋ/ doesn't have the status the other two do; it cannot begin a syllable. And it makes for a less than perfect contrast with /n/ and /m/ because it has a fairly pronounced effect on the vowel that precedes it, which makes drawing a clean contrast difficult.
> What are you trying to show? You seem to agree that the English spelling of /nt/ is "nt", the English spelling of /ŋk/ is "nk", and the English spelling of /mp/ is "mp". There is no possibility of "np", "nb", or "nm".
Consider "inpainting", "unbiased", and (as suggested earlier) "government", each of which is a synchronically transparent /n/ across a morpheme boundary, yet a cursory survey of recorded English speech suggests that it's pretty common for the tongue gesture associated with /n/ to be absent—infamously, the second syllable of the last routinely loses its coda altogether. This occurs across a transparent morpheme boundary, even with affixes productive in the modern language, even in learned usage.
English does have a lot more wrenches to throw in this, like producing nuclear nasals in a range of situations and not always assimilating across prosodic word boundaries—heck, it probably goes both ways in an utterance like "in my main menu". Words spelled "mp" are reliably [mp] in the modern language, but it's not as simple a case as "mp" spelling /mp/ read [mp] and "np" spelling /np/ read [np]; English phonotactics also coerces the nasal in /np/ to a bilabial realization.
> because a bilabial stop doesn't use the tongue and so [n] is easily coarticulated
That doesn't sound quite right—this assimilation surely wouldn't be nearly as globally prevalent as it actually is if that were true.
Try it. While you'd think from the descriptions that a bilabial stop shouldn't care where the tongue goes, I think you'll find it quite challenging to coarticulate [n] with [b]—tongue positioning at lower teeth is pretty obligatory—and much easier to sequence them or produce [mb].
Clearly you can see the unnaturalness of lack of assimilation to call the attempt to do so "special effort"! So of course, the typical anglophone is not going to try to realize [n.p], they'll just see the <np> and read [mp] because that's what they would with any other internal /np/.
> So again - what would justify representing the Japanese sound as "n" regardless of context in languages where, unlike in Japanese, the distinction between "n" and "m" is meaningful?
Now, this gets to an entirely different issue: the purpose of the transcription. You seem convinced that the main goal of romanization is to provide a pronunciation guide for anglophones. But in the context of discussing a niche musical genre on the internet, that's not necessarily a high priority in the first place; you might care more about, say, searchability: we're looking for https://en.wikipedia.org/wiki/Denpa, not https://www.worldbank.org/en/programs/debt-toolkit/dempa.
And in a wider context, the principal users of romaji Japanese aren't anglophones; they're Japanese-speakers who for some reason or other need to coerce Japanese text into an ~ASCII-subset representation, targeting primarily computer systems with that sort of limitation (most common case being keyboards via IME, hold that thought) and secondarily other people who can read Japanese; and naturally they make the distinctions Japanese makes and largely don't make the distinctions Japanese doesn't make. So unless backed by a marketing department, they tend to produce n (or nn as needed) for ん, because they have a tenuous grasp on how anglos spell [mp] in the first place and でmぱ is garbage that their IME won't convert into the right word, so why type that?
(This is also why pinyin can be the way it is, yet their IMEs routinely have modes to ignore s-sh/n-ng/n-l distinctions.)
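A toy sketch of why the IME angle matters: a greedy longest-match romaji-to-kana converter that only knows "n" for ん. The table and the failure behavior below are invented for illustration; real IMEs are more forgiving and vary in what they accept.

```python
# Toy greedy romaji-to-kana converter. The table is a tiny made-up
# subset just big enough for this one word; real IMEs differ.
KANA = {"de": "で", "n": "ん", "pa": "ぱ"}

def to_kana(romaji):
    out, i = "", 0
    while i < len(romaji):
        # Try the longest chunk first (2 chars, then 1).
        for size in (2, 1):
            chunk = romaji[i:i + size]
            if chunk in KANA:
                out += KANA[chunk]
                i += size
                break
        else:
            return None  # no conversion for this chunk: input is rejected
    return out

print(to_kana("denpa"))  # -> でんぱ
print(to_kana("dempa"))  # -> None: the bare "m" has nowhere to go
```

In this toy model, "denpa" round-trips to the kana the typist wants while "dempa" dead-ends, which is roughly the incentive the comment above describes.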
>> because a bilabial stop doesn't use the tongue and so [n] is easily coarticulated
> That doesn't sound quite right—this assimilation surely wouldn't be nearly as globally prevalent as it actually is if that were true.
> Try it.
You know, I mentioned a specific theory here that you've completely ignored. The coarticulation is easy. But it is difficult for a listener to tell the difference between coarticulated [nb] and [mb]. If you're willing to let multiple generations pass, this means that /nb/ will become /mb/ regardless of how easy it is to pronounce.
You will also note that this theory of what's happening mostly cannot be disproved by recordings, which you appear to want to do. You'd want an X-ray or MRI study, something which shows you what the tongue is doing.
> I think you'll find it quite challenging to coarticulate [n] with [b]—tongue positioning at lower teeth is pretty obligatory
This is just obviously false. You have no problems producing [b] with your tongue positioned however you like. You can position it for [t], you can position it for [tʃ], you can position it for [k]. And of the three coarticulations I just mentioned, all of them are well attested, though only the middle one is attested in English ("pshaw", a scoffing sound).
> and much easier to sequence them
This is worthy of comment; there is a linguistic concept called "coarticulation", but all cases of coarticulated consonants seem to have a conventional sequence associated with them. I have no real knowledge or opinion on how real the conventional sequencing is, or how much sequencing is allowed before you stop calling the sounds coarticulated. I suspect that indeed it is easier to sequence two events than to coordinate them to occur at exactly the same time; this is true for all types of events, not just language-related ones. I don't think that the linguistic concept requires absolute synchronization of particular points in time; my understanding is that producing any given phoneme requires some motion and therefore takes place over a nonzero span of time, and "coarticulated" consonants are those for which the durations overlap, not necessarily those for which the durations perfectly coincide.
But I will note that while sequencing of /nb/ is obviously necessary in a way that is not true for /pt/, since /n/ must have nasal airflow and /b/ must not, there is no reason for "coarticulation" of /nb/ to be more difficult than it is in the attested coarticulation /tm/ (exactly the same as /np/ for our purposes; /tm/ also features a voicing difference between /t/ and /m/).
> Consider "inpainting", "unbiased", and (as suggested earlier) "government", each of which is a synchronically transparent /n/ across a morpheme boundary
I don't think "government" is a valid example, and you should stop trying to lean on it. In my view, the pronunciation of "government" has as much to do with the morphemes suggested by its spelling as the pronunciation of "comfortable" does with the morphemes suggested by its spelling.
I have no problem with "unbiased"; that's a great example of what we're talking about.
> Clearly you can see the unnaturalness of lack of assimilation to call the attempt to do so "special effort"!
I don't agree with this. I claim that it is common for Anglophones pronouncing "unbiased" to make contact between the tip of their tongue and their alveolar ridge while they pass over the /n/ in the word. (And here, we're on firm ground saying that the internal phoneme is /n/ and not /m/, since it's part of a productive prefix un-.) I further believe that they make no special effort to do so. They may or may not allow a longer duration of nasal murmur than they do in other contexts, to make the /n/ clear; doing this would constitute a special effort. I believe that some speakers will do this and some won't bother. Of those who do, only a small amount of effort will be given to the task.
But the case of English speakers attempting to pronounce Japanese is different. They will go to great lengths to demonstrate that they want to comply with the bizarre textual representation they see. They are happy to produce highly unnatural speech in order to do so. (Which isn't really a problem; they don't really have an alternative to producing unnatural-sounding speech in early attempts to pronounce a foreign language. But this is something they shouldn't encounter problems with.)
> And in a wider context, the principal users of romaji Japanese aren't anglophones; they're Japanese-speakers who for some reason or other need to coerce Japanese text into an ~ASCII-subset representation, targeting primarily computer systems with that sort of limitation (most common case being keyboards via IME, hold that thought)
> (This is also why pinyin can be the way it is, yet their IMEs have routinely have modes to ignore s-sh/n-ng/n-l distinctions.)
This isn't a flattering comparison for the all-n Japanese transcription system. The pinyin for 吕 is lü. Chinese people don't use German keyboards, which makes the pinyin impossible to type. So where ü contrasts with u, pinyin input methods require you to input V. And Chinese people have responded to this by adopting v-based spellings; it is common to see pseudo-pinyin like "lv" where that pinyin has been generated by an ordinary Chinese person for their own purposes, such as a sign over their business or an online username.
But the letter V is formally not a part of pinyin at all, which means that text generated by the government never uses it and neither do instructional texts.
It is true that this situation is the reverse of the one we're discussing - the Chinese are making a distinction that is required by their language but forbidden by their keyboard, and the fact that they are aware of the distinction makes it easy for them to know what to do. The Japanese are failing to make a distinction that doesn't exist in their language but does exist on their keyboard; this is precisely parallel to the pinyin IME settings you note that will allow the user to ignore phonemic distinctions that they don't make. Again we see that the system maintains the distinction and it's the job of the input method to interpret what the user wants to say.
Chinese IMEs also offer a "double pinyin" input method, in which you type one letter to indicate the onset of a syllable and a second letter to indicate the rime. All syllables are two input-letters long; this model matches the traditional Chinese view of their own phonology. You could just as easily base your system of English transcription on this: instead of "Xi Jinping", 习近平's name would be "Xi Jnp;". Instead of "Sun Yat-sen", we'd talk about "Sp Yixm".
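The onset+rime idea can be sketched with a toy two-keystroke decoder. The mapping tables below are invented for illustration and don't match any real double-pinyin scheme; they're just big enough to expand the "Jnp;" example above.

```python
# Toy "double pinyin" style decoder: one key for the onset, one for the
# rime, so every syllable is exactly two keystrokes. Tables are made up.
ONSETS = {"x": "x", "j": "j", "p": "p"}
RIMES = {"i": "i", "n": "in", ";": "ing"}

def decode(two_keys):
    """Expand a two-keystroke syllable back into full pinyin."""
    onset, rime = two_keys
    return ONSETS[onset] + RIMES[rime]

# "Xi Jnp;" -> "Xi Jinping", one decode call per two-key syllable
print(decode("xi"), decode("jn") + decode("p;"))
```

The point being that each syllable's spelling is now an artifact of keyboard convenience, which is the comparison being drawn to romaji-as-IME-input.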
That's what it looks like when you base spelling on what it's convenient for foreigners to type as an intermediate input to their own, different spelling. (As is the case with Japanese input methods.) There are zero people who believe it's a good idea. It's not a better idea in the Japanese case.
+1 for 100 gecs; their sound is a very distinct aural palate cleanser that I consistently enjoy. See also Charli XCX's How I'm Feeling Now album and midwxst's SUMMER03 EP
It's a random-walk of blues riffs over a stock diatonic chord progression with a slow (and predictable) harmonic rhythm.
The only conceivable surprise is a crude chromatic key change to the minor version of the raised mediant.
You'd think the precision of those dynamic envelopes and timbral games would push the artist to venture out and explore that mediant relationship to create quicker and more jarring harmonic progressions and modulations. But no-- it turns out to be less inventive than the mediant chains emanating from, say, Joni Mitchell and her acoustic guitar over fifty years ago:
Compared to the cookie-cutter harmony and melody of the music you linked, even Mitchell's augmented triad in the melody at the end of the chorus sounds like the musical equivalent of solving fast homomorphic encryption.
It's the audio tech that is on display in the music you linked, so every other musical consideration shifts to the background to illuminate that tech. I get that. But holy shit, why does that bassline have to be stuck in the fucking 1650s? While I love the "electrified Vivaldi" hack that is heavy metal from the late 70s/early 80s (Master of Puppets et al), I question whether we really need more than one musical genre based on that parlor trick.
It would be like every stand up comedian ending their set with increasingly theatrical pyrotechnic pull-my-finger jokes. I could laugh my ass off at the absurdity for a year, maybe two. But forever?
It’s possible you’re stuck on the first slope of the Dunning-Kruger graph.
My (possibly wrong) impression of your comment is that you seem to have made the mistake of associating complexity with quality in music which is extremely common in those who’ve just started looking into music theory.
Most music needs only the smallest dash of novelty to achieve the perfect mix of the new and familiar to its target audience. If you start attempting to evaluate popular music on what about it is inventive or new, you’re likely to find yourself unable to appreciate most of what people are enjoying and cut yourself off from loving a broad spectrum of musical expression.
You might also find yourself unable to express why you enjoy the music you do like in a way that doesn’t come across as if you’re arguing an objective scientific point; that approach might undercut your argument by making you unintentionally come across as someone who has just learned a lot of fancy theory jargon and is eager for an excuse to wield it.
I'm not sure I understand what Vocaloid does? Does it generate vocal parts "from scratch" / just from lyrics? Or is it more like a vocoder?
The track you reference sounds like chipmunks sped up 2x; it's not unpleasant to listen to, and fun, but I feel it could be made just like that (record at 80bpm, high pass filter, maybe transpose 1 octave, and speed up to 180), no "AI" involved.
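For what it's worth, the tape-speed trick described here really does couple tempo and pitch. A minimal numpy sketch, using a sine wave as a stand-in "vocal" and the 80-to-180 BPM numbers from this comment (both just illustrative):

```python
# Naive "tape speed-up": playing samples back faster raises pitch by the
# same factor, so 80 -> 180 BPM is 2.25x, a bit over an octave up.
import numpy as np

sr = 44100
t = np.arange(sr) / sr                # one second of audio
vocal = np.sin(2 * np.pi * 220 * t)   # a 220 Hz stand-in "voice"

speed = 180 / 80                      # tempo ratio = pitch ratio
# Crude resampling by index-skipping: keep every (1/speed)-th sample.
idx = (np.arange(int(len(vocal) / speed)) * speed).astype(int)
sped_up = vocal[idx]

# The clip is now shorter and its fundamental is 220 * speed = 495 Hz.
print(len(sped_up), 220 * speed)
```

(That naive 2x chipmunk recipe is roughly what varispeed/"nightcore" edits do; Vocaloid differs in that pitch and tempo are controlled independently per note.)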
It's an instrument: you have a piano roll interface, draw in your melody like editing MIDI in a DAW, and add lyrics to each note (usually with some manual phoneme fine-tuning), and it outputs a stream of vocal audio.
Human Japanese singers, especially women, tend to operate in a higher octave range than what is common in the west. It's slightly culturally insensitive to take shots at vocal pitch when talking about J-Pop. Pitch is largely a social/cultural construct, and Japan generally leans into the idea of higher pitch -> polite or cute and lower pitch -> aggressive or rude. (e.g. you raise your pitch when talking to your boss, and drop it to express your disgust with someone.) Just putting that out there, not trying to be accusatory or anything. It's just always good to keep in mind that western cultural norms are hardly universal.
The chipmunk effect isn't even necessarily part of it. Most vocaloid music is in a more "normal" range.
It's a synthesizer. It's an alternative to human singers. I can imagine someone seeing a digital piano for the first time. "I'm not sure what it even does. I could just use an acoustic piano. It sounds the same."
As for the chipmunk sound, it's not unusual for female j-pop vocalists to operate one or two octaves higher than the unfamiliar western ear would generally consider pleasant.
There's also plenty of music directly derivative of the vocaloid scene that maintains a similar aesthetic with 'organic' vocalists and dispenses with some of the awkwardness of vocaloid-oriented compositions. Example: https://www.youtube.com/watch?v=hjJMIWyl_l4
This one is an official track for a popular vocaloid rhythm game.
Also, at this point the “chipmunk” sound is part of the brand and will be kept to some extent for tracks labelled as vocaloid (it’s kind of a market of its own).
You write down phonemes in a DAW, and it synthesizes the voice for you. You also add vibrato or other modifiers like you would for most other instruments.
The audio is generated from a voicebank, which is a database of prepared phonemes recorded by a voice actor. Some packages come with multiple voicebank variants, e.g. a "soft" voice and a "vivid" voice.
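As a rough mental model of the above (all names, the fake ".wav" strings, and the phoneme labels here are made up; real engines resample, pitch-shift and crossfade actual recordings), a voicebank is a phoneme-keyed lookup that a melody of lyric-carrying notes is rendered through:

```python
# Toy sketch of concatenative voicebank rendering, not any real engine.
from dataclasses import dataclass

@dataclass
class Note:
    midi_pitch: int      # e.g. 69 = A4
    length_beats: float
    phonemes: list       # the note's lyric broken into phoneme labels

class VoiceBank:
    """Maps phoneme labels to pre-recorded samples (faked as strings)."""
    def __init__(self, variant="soft"):
        self.samples = {p: f"{variant}:{p}.wav"
                        for p in ["d", "e", "N", "p", "a"]}

    def render(self, notes):
        # Concatenate the sample for each phoneme, tagged with the
        # target pitch it would be shifted to.
        return [(self.samples[ph], note.midi_pitch)
                for note in notes for ph in note.phonemes]

melody = [Note(69, 1.0, ["d", "e", "N"]), Note(71, 1.0, ["p", "a"])]
print(VoiceBank("soft").render(melody))
```

Swapping `VoiceBank("soft")` for `VoiceBank("vivid")` would pull the same phonemes from a different recording set, which is roughly what the multi-voicebank packages mentioned above offer.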
The rapper/producer Deko has been doing some very interesting stuff with adding vocaloid synth characters to rap/hyperpop music. He has two vocaloid "characters", Lil Yammeii and Lil Hard Drive. Most vocaloid rap I listen to is terrible but this stuff is super well produced. He'll even do things like add breathing noises to the vocaloid tracks, which improves the sound a lot. https://youtu.be/usRDtHjYKzU
He also has some funny parody bits, like rapping about having a lot of money/jewels/etc. and then the vocaloid characters rapping about having a lot of RAM.
Nope, certainly stays disagreeable to me. I wonder what makes people enjoy weird stuff in so many different ways. I might not like this, but I enjoy white noise artist Merzbow [0] or breakcore from Drumcorps [1]
I think people who have never heard Merzbow are likely going to misunderstand your post as being dismissive rather than understanding of either opinion.
As a small form of resistance to the surveillance state I partake in, I have taught kids to ask any nearby personal assistants to play woodpecker #2 and they find it hilarious.