Viability of replacing human singers with voice synthesis – Research reviewed

Summary of Research Question and Outcome 

Inspired by questions from VOCALOID fans, especially fans of the most famous Japanese virtual singer ‘Miku Hatsune’, my research question focused on the viability of replacing human singers with voice synthesis technology. The research combined primary and secondary sources: online academic journals, descriptions and information guides, email interviews and visits to online stores each provided distinct components required for this research. Throughout the research, I examined the structure of voice synthesis technology, significant factors related to music production, opinions from people connected to the music industry and actual cases of human singers being replaced. At the end of the research process, I concluded that comparisons between human singers and voice synthesisers are nearly impossible, and that voice synthesisers should be considered not as ‘a factor of replacement’ but as another type of digital musical instrument for ‘experiment’.

Link to the research:


Evaluating the Research Processes 

The main structure of my research was built on extensive reading of scientific journals and studies online, which allowed me to quickly establish a specific understanding of voice synthesis technology. I decided to obtain data about the structure of voice synthesisers because I expected that an understanding of voice synthesis would be required for this research. The first step of the research, finding out what voice synthesis is and how it works, gave me the ‘statement’ that voice synthesis is the technology of reconstructing recorded speech samples to make a speech phrase. This gave me part of my research answer: ‘the voice synthesisers should not be considered as a factor that can replace human singers but another method of using human voice’.

The majority of the data available online were ‘unofficial’ resources that cannot be trusted, such as fan fiction and ‘personal opinions’. A lot of ‘fan artworks’ were filtered out as ‘unreliable’ resources, even though ‘public image’ is one of the most important factors for marketing. The only internet sources that were valid and credible for answering my research question were the journal articles and scientific descriptions of the technologies used for voice synthesis.

All the interviews were conducted via email with people who have experience related to the music industry. Interviews with experts, including a vocalist, professors and producers, allowed me to hear their ‘honest’ opinions and viewpoints on voice synthesis. Some interviewees were not confident about their opinions and stated that some of their responses might not be the ‘truth’, which is quite acceptable. Because the feedback came from experts who work in the industry, I was able to consider the gathered information valid. Some interviewees did not really know about the commercial brands or technical structure of singing synthesisers, so some ‘misled’ judgements were made. Those ‘incorrect’ data were partially modified without changing the key points of the interviewees’ responses.

Decisions made in Light of Challenges and Opportunities 

The biggest challenge I found in answering my question was people’s personal preferences, which affect the viability of replacing human singers with synthesised voices. Some ‘emotional’ definitions and expressions of ‘artworks’ become an issue when an ‘artwork’ is not originally made by a human. The ‘creation’ of a song is still directed by a ‘human artist’, so it can be considered an ‘artwork’ made by an artist; however, some specific art terms such as ‘soul’ and ‘spirit’ are quite questionable and depend entirely on the viewpoint. To deal with biased information and subjective, personal viewpoints, I did not discard the data related to this issue. Instead, those terms became factors to consider when checking the viability of replacing human singers, which also made the content of the research more ‘interesting’. I decided to treat the biases as part of the ‘research data’ so that the outcome could include multiple viewpoints, which helped me improve the quality of the answer to the research question in terms of the ‘preference’ of people working in the music industry.

To help establish whether replacement was viable, I needed to look at whether it would be cost effective. However, a number of the vocalists available online did not reveal how much they charge clients for voice recording. The vocalists who did reveal their usual recording charge stated that the price may be set per number of bars (a bar being a segment of time corresponding to a specific number of beats), and the usual charge for 16 bars depended entirely on the vocalist. Sellers of voice synthesisers only sold a permanent user license for the software, which means that users can make as many recordings as they want after a one-time payment. The difference between these two pricing systems made the price comparison difficult. I tried to improve this by collecting as much publicly available data as possible and calculating the average price to get a ‘constant’ value for the ‘expected’ price of recording. However, this still did not give a reliable conclusion about cost, which is a weakness in my research and in answering my question.
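The difficulty of comparing a per-recording fee with a one-time license can be illustrated with a small calculation. This is only a sketch under stated assumptions: the fee list, the license price and the function names are hypothetical and chosen for illustration, not figures collected in this research, and each fee is assumed to cover one 16-bar recording.

```python
import math


def average_fee(fees):
    """Return the mean of the publicly listed fees (one entry per vocalist)."""
    if not fees:
        raise ValueError("at least one fee is required")
    return sum(fees) / len(fees)


def break_even_recordings(license_price, fee_per_recording):
    """Return the smallest whole number of recordings at which hiring a
    vocalist each time costs at least as much as the one-time license."""
    if fee_per_recording <= 0:
        raise ValueError("fee_per_recording must be positive")
    return math.ceil(license_price / fee_per_recording)


# Hypothetical figures, NOT data from the research.
listed_fees = [80.0, 120.0, 100.0]   # assumed fee per 16-bar recording
license_price = 250.0                # assumed one-time synthesiser license

expected_fee = average_fee(listed_fees)
recordings_needed = break_even_recordings(license_price, expected_fee)
print(f"Average fee per recording: {expected_fee:.2f}")
print(f"Recordings before the license pays for itself: {recordings_needed}")
```

A break-even count like this only compares price. It says nothing about audio quality, listener preference or the time needed to operate the software, so it cannot settle the cost question on its own.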

One problem I encountered was that not much previous research had been done on this topic. Technical information and news articles about the development of voice synthesis were available. Replies to questions about the viability of replacing human singers with voice synthesisers, which can be considered ‘personal opinions’ written from ‘personal perspectives’, could also be easily found online. However, articles on actual research into preference, audio quality, recording price and time management that could be referenced were hard to find. Voice synthesis is not a common technique among the general public, so some people might feel ‘uncomfortable’ when listening to the ‘results’ made with voice synthesis or when talking about this specific topic, which led to the lack of practical data. I gained more of the ‘required’ data by researching the audio technologies related to the ‘main topic’ of this research, which is what voice synthesis is and how viable it is to replace human singers with voice synthesisers, and as a result the research outcome became better constructed.

Another challenge was that even cross-referencing the data obtained via online research did not give me enough reliable data, so I decided to ask school music staff to help me. The staff suggested that I email some ‘experts’ to ask for help. Compared to the resources obtained from the online research, those experts gave me more reliable data, such as the positive impressions of voice synthesis, examples of the ‘practical’ use of voice synthesis and future expectations for this technology. As a result, the quality of the outcome improved in terms of understanding how voice synthesisers would affect the music market or how the technology might improve.

Evaluating the Research Outcome 

Only a limited range of questions, those about preferences and viewpoints on songs that are already released, could be asked of a limited number of people working in the music industry (7 interview requests were sent and 5 people replied). To improve the research, I should have investigated audio quality, such as audio frequency and the amount of white noise, and the quality of speech, such as pronunciation and the transitions between syllables, because ‘audio’ is one of the biggest parts of ‘music’. These investigations could be done in cooperation with audio engineers, if they were available and happy to participate. Even though the email interviews were conducted with audio experts, the opinions of experts cannot exactly represent the opinions of others, so separate experiments on preference would also have to be done with the public. More interviews with music producers who make or have made songs using voice synthesis could be conducted to give a brief description of what those people think about the use of voice synthesis in general. This would improve my Outcome by adding more explanation of the structure and limitations of voice synthesis, and it could therefore include better suggestions for improving voice synthesis technology.

The report should have included a broader knowledge base by referencing more scientific and technical documents. As the technology of voice synthesis is still not very common, it was difficult to find scientific documents related to it. The research became unreliable because its sources and range were limited, and the conclusion of the research outcome was weakened because there was not enough reliable data to answer the main research question. The process of answering the research question generated follow-up questions such as ‘would people intend to listen to the outcome of the voice synthesis?’ and ‘how cost-efficient it is to use voice synthesis?’, which are crucial for answering the main research question and could not be answered during this research.

Not many people have previously attempted this research, so it can be considered a ‘new attempt’ at resolving the argument among fans of voice synthesis technology, and it may be helpful for people who would like to research a similar topic in the future. The target audience of the research outcome is the users who asked about the viability of replacing human singers with voice synthesis in online communities, and music producers who are interested in making new ‘experimental’ trials with voice synthesis technology.

The majority of the key findings of this research helped to answer the research question by describing the differences between voice samples recorded directly from a vocalist and audio files made with voice synthesis. Interviews with people who have experience with voice synthesis and the music industry provided a professional ‘expectation’ of the future of voice synthesis technology, explaining which option people prefer, for what reason, and how this technology can be improved.

Overall, the outcome is somewhat robust, but the topic is something I would like to continue researching in future years.
