Talk:Speech recognition

Voice Recognition[edit]

"Voice Recognition" is analysis of the spectral patterns of one's speech to verify if that voice belongs to a registered individual. Voice recognition is used in authentication systems. "Speech Recognition" is analysis of the speech stream to parse semantic content, frequently used for command and control. Why is even wikipedia conflating these two terms? —Preceding unsigned comment added by 24.18.198.206 (talk) 13:50, 22 April 2009 (UTC)[reply]

Probably because Voice Recognition is often mentioned together with Speech recognition in published literature. A second reason would be that VR techniques come from SR. OrenBochman (talk) 14:25, 28 March 2012 (UTC)[reply]
And laypeople often confuse the terms. --Thnidu (talk) 17:03, 14 October 2018 (UTC)[reply]

Textbook Link ?[edit]

http://www.cs.colorado.edu/%7Emartin/SLP/Updates/ —Preceding unsigned comment added by 84.135.80.109 (talk) 10:26, 7 September 2007 (UTC)

I have no idea how to use wikipedia, but someone needs to revert this page back to how it was a couple edits back. —Preceding unsigned comment added by 69.138.244.50 (talk) 09:19, 27 November 2009 (UTC)[reply]

Old comments[edit]

Content from Speech Recognition, now a redirect here.

My understanding is that the Entropic HTK toolkit, while available, is copyrighted by Microsoft. I would suggest looking elsewhere for places to start... probably CMU Sphinx, as evil and difficult as it is to use.

NotSoAnonymousCoward 17 Nov, 2005

Phoneme identification begins when the sound of speech reaches the computer as an analogue waveform. At the end of the process, the identified phonemes are assembled into words, whether the input is discrete (isolated-word) or continuous speech. Gathering every single word requires long, hard work on training samples (known as corpora). There are also several problems in doing this: first, recognising the kind of speech, depending on whether it is deliberate or continuous; second, the difficulty of identifying any speaker, given the variability that individual speech brings; third, the contamination of the system by outside noise; and finally, overcoming the recognition errors caused by differences in accents, dialects and spoken languages.

Charles Matthews 09:20, 6 May 2004 (UTC)[reply]


Note on the Technical Issues Section: I recently added (as an anonymous user) the first part of this section (up to SPHINX) and deleted the original content which was not really technical and which was mostly speculative rather than factual. Many parts of the original section are wrong or irrelevant. Someone put back the original part which makes the whole thing look like it was stitched together. Perhaps the original part (after SPHINX) should be called something else, e.g. "challenges in speech recognition". -Dan

If it's wrong, then correct it. If it's mislabeled, then label it correctly. Don't just delete--that's not how we do things here. Nohat 23:08, 18 Apr 2005 (UTC)
I restored the above content. It's generally considered bad form to delete content from Talk pages unless it's been archived to an archive page. Nohat 23:56, 18 Apr 2005 (UTC)

Other old content from Speech Recognition, now moved here as it is too vague and too out of place. Perhaps someone can boil down some observations in it into a paragraph on the technical difficulty on speech recognition.


Some other key technical problems in speech recognition are:

  • Inter-speaker differences and intra-speaker variations are often large and difficult to account for. It is not clear which characteristics of speech are speaker-independent.
  • Speech recognition systems are based on simplified stochastic models that do not match real speech accurately.
  • The interpretation of many phonemes, words and phrases is context sensitive. For example, phonemes are often shorter in long words than in short words. Words have different meanings in different sentences, e.g. "Philip lies" could be interpreted either as Philip telling lies or as Philip lying on a bed.
  • Co-articulation of phonemes and words, depending on the input language, can make the task of speech recognition considerably more difficult. Some languages, such as English, have large amounts of co-articulation in conversational speech (consider for example the sentence "what are you going to do?", which can be pronounced as "whatchagonnado?", which bears no resemblance to the "correctly" pronounced sentence). Other languages have almost no co-articulation and are therefore much easier to recognize. Japanese, for example, is strictly syllable based and has almost no co-articulation, which makes it much easier to recognize than English.
  • Intonation and speech timbre can completely change the correct interpretation of a word or sentence, e.g. "Go!", "Go?" and "Go." can clearly be recognised by a human, but not so easily by a computer.
  • Words and sentences can have several valid interpretations such that the speaker leaves the choice of the correct one to the listener.
  • Written language may need punctuation according to strict rules that are not strongly present in speech, and are difficult to infer without knowing the meaning (commas, ending of sentences, quotations).

The "understanding" of the meaning of spoken words is regarded by some as a separate field, that of natural language understanding. However, there are many examples of sentences that sound the same, but can only be disambiguated by an appeal to context: one famous T-shirt worn by Apple Computer researchers stated, I helped Apple wreck a nice beach, which, when spoken, sounds like I helped Apple recognize speech.

A general solution of many of the above problems effectively requires human knowledge and experience, and would thus require advanced pattern recognition and artificial intelligence technologies to be implemented on a computer. In particular, statistical language models are often employed for disambiguation and improvement of recognition accuracy.
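To make this concrete, here is a toy sketch in Python (not from the original text; the bigram probabilities below are invented purely for illustration) of how a statistical language model can prefer "recognize speech" over the acoustically similar "wreck a nice beach":

 import math

 # Toy bigram language model with invented log-probabilities (illustration only).
 bigram_logprob = {
     ("apple", "recognize"): math.log(1e-4),
     ("recognize", "speech"): math.log(1e-2),
     ("apple", "wreck"): math.log(1e-6),
     ("wreck", "a"): math.log(1e-2),
     ("a", "nice"): math.log(1e-3),
     ("nice", "beach"): math.log(1e-3),
 }

 def sentence_logprob(words, unseen=math.log(1e-7)):
     # Sum bigram log-probabilities, backing off to a small constant for unseen pairs.
     return sum(bigram_logprob.get(pair, unseen) for pair in zip(words, words[1:]))

 hyp1 = "i helped apple recognize speech".split()
 hyp2 = "i helped apple wreck a nice beach".split()

 # With roughly equal acoustic scores, the hypothesis with the higher
 # language-model score is preferred; here that is hyp1.
 print(sentence_logprob(hyp1), sentence_logprob(hyp2))

In a real recognizer this language-model score is combined with the acoustic score rather than used on its own.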

For foreign speakers an unintended side-effect of using speech recognition technology is that they can improve their pronunciation while trying to make the computer understand what they're saying.

-- 18 Apr 2004; moved by Dan

copyvio[edit]

Possible copyright violation from http://www.wombatnation.com/2004/04/speech-recognition/ Arvindn 08:09, 7 May 2005 (UTC)[reply]

Yes, a large portion of the content does appear to have been copied directly from a post I made to my blog in April, 2004, albeit with a bit of reorganization and editing. All the material on my blog is licensed under a non-commercial, attribution Creative Commons license. I certainly don't mind having content I've authored show up on Wikipedia, and given the nature of Wikipedia articles, I don't expect attribution. However, it would have been nice to have been notified about it directly. Thanks very much, Arvind, for bringing it to my attention, as a link to this page showed up in my website referer log today. RobertStewart 00:45, 9 May 2005 (UTC)[reply]

It looks like someone copied in the material on March 4, 2005. Nohat, I saw your comment, "this section is not written in a very encyclopedic style--it is too breezy and sounds like it's written from a single POV." That's spot on! I thought I was just providing a useful summary on my blog, not writing material for an encyclopedia article. As I stated above, you're welcome to use whatever you want of what I wrote. The edits to date have certainly improved it, but I agree that it (at least the parts derived from my original post) could use a lot more editing to make the content more suitable as an encyclopedia article. I'll try to do some editing myself to update things that have changed over time. RobertStewart 01:05, 9 May 2005 (UTC)[reply]

amend[edit]

"Speech recognition systems have found use where the speed of text input is required to be extremely fast." It is hard to believe it could outrank the keyboard with a very proficient typist. Even if the speaker trains to speak extremely fast (which is only possible for short amounts of time due to the huge consumption of neural processing) there still wouldn't be a market and thus no software for it.

speech recognition vs voice recognition[edit]

I was under the impression that speech recognition differs from voice recognition (which uses voice or voiceprints to identify someone). If so, I don't think a search for voice recognition should redirect someone here to speech recognition.

Am I too off target?

Bill

You're thinking of voice authentication, which as of this writing is an article which does not exist yet. You are correct, voice recognition should be a disambiguation page pointing to speech recognition and voice authentication. Nohat 23:42, 10 April 2006 (UTC)[reply]

Open source software[edit]

Are there any open source speech recognition projects? It would be great to summarize how the best few are doing or note the lack if there are none. — Hippietrail 17:35, 15 April 2006 (UTC)[reply]

There are some; all of the following are under an MIT-like license:

http://cmusphinx.sourceforge.net/html/cmusphinx.php

http://www.cavs.msstate.edu/hse/ies/projects/speech/index.html

SimonSays[edit]

Quoting: "Start-ups are also making an impact in speech recognition, most notably SimonSays Voice Technologies. A Toronto-based company, SimonSays has made several breakthroughs in robust server-based speech recognition. Though SimonSays currently possesses a smaller market share, they are certainly a company to watch."

I would rather like to have some references. This sounds too much like an advertisement. SiriusGrey 17:06, 18 April 2006 (UTC)[reply]

non-encyclopedic?[edit]

"In terms of freely available resources, the HTK book (and the accompanying HTK toolkit) is one place to start to both learn about speech recognition and to start experimenting (if you are very brave)." The last bit about being brave seems kinda POV... sentence should be reworded? Ben Tibbetts 23:12, 10 May 2006 (UTC)[reply]

  • "In this entry, we will the use of hidden Markov model (HMM) because notably it is very widely used in many systems. (Language modeling has many other applications such as smart keyboard and document classification; to the corresponding entries.)". In the section "Performance of speech recognition systems" is rather unclear. "In this entry"? Remains of the copy from the blog, possibly? Musically ut 15:11, 29 July 2007 (UTC)[reply]

some clean ups[edit]

I have cleaned up some of the descriptions and mathematics in the previous version, up to the point of "HMM-based speech recognition". The previous description was biased towards commercial speech recognition, so it could easily mislead readers on some basic facts.

One thing I will point out is that, on the one hand, a dictation engine can have high recognition performance. However, the recognition rate varies from speaker to speaker; notably, when users speak English with different accents, the performance will not usually be 98%. So this is generally more a claim than a fact.

Another thing I will point out concerns the mathematical explanation of the basic principle of speech recognition. The previous authors evidently confused the noisy-channel formulation (the P(W|A) thing, if you don't know what I mean) with the HMM itself. I call this a mistake because, without the max operation that appears in finding the best word sequence, it is actually not trivial to remove the term P(A). The theory of speech recognition is quite sophisticated, and the explanation in the old version lacked a certain mathematical rigor.
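For reference, the formulation in question is the Bayes decision rule

 \hat{W} = \arg\max_{W} P(W \mid A) = \arg\max_{W} \frac{P(A \mid W)\,P(W)}{P(A)} = \arg\max_{W} P(A \mid W)\,P(W),

where A is the acoustic observation sequence and W a candidate word sequence. P(A) does not depend on W, so it can be discarded only inside the arg max; without the max it is not trivial to remove.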

This version still falls short of a certain standard. For example, I strongly agree with the comments under "non-encyclopedic?" and "SimonSays" above: the former passage is really POV and the latter is really just an ad. I also disapprove of using Bill Gates's quote on speech recognition. (He is really not a researcher in speech recognition, or in any speech-related research at all.)

We also need more scholarly articles and references to support the content. Hopefully we can add those in the future.

This section was added by Arthchan2003 at 06:50, 18 May 2006; see diff. --Thnidu (talk) 17:12, 14 October 2018 (UTC)[reply]

Missing history[edit]

This article jumps into the technical issues without giving any context to the history of speech recognition advances over the years. In the intro there is a very short summary of applications, but there are no dates or names associated with them. I guess patent filings would give a good history of who did what when in this field. --DeweyQ 15:59, 22 July 2006 (UTC)[reply]

I agree that the history is very poor for this topic. For some reason, the content seems to imply that speech recognition software is limited to, or at least largely aimed at, medical applications. My awareness of the history of this idea is that it was going to "make keyboards redundant within X years" ... its failure to do so is in my opinion what the history should cover. 150.101.121.107 01:18, 7 November 2007 (UTC)[reply]

"One of the most notable domains for the commercial application of speech recognition in the United States ..." another article which reads like ONLY the US have such technologies ... read i.e. the German version, and learn how Philips (from the Netherlands) took leadership on speech recognition. If I had a wish free at christmas: stop US-propaganda in Wikipedia. --77.186.130.110 (talk) 16:31, 3 November 2008 (UTC)[reply]

I also miss a history section on the topic! E.g. the German article covers it, and usually even English WP articles with much less content feature a history section! --PutzfetzenORG (talk) 12:34, 11 February 2012 (UTC)[reply]

I just made a first attempt at re-writing the history section. --Dithridge (talk) 20:04, 17 January 2015 (UTC)[reply]

Redirected from Speech to text[edit]

Funnily enough, the Speech-to-text article was redirected but the Speech_to_text article wasn't. I have rectified this. rmccue 01:01, 23 July 2006 (UTC)[reply]

Broader Implications of Speech Recognition[edit]

This article focuses heavily on the "nuts and bolts" technical discussion of speech recognition technology, and gives scant coverage to the actual business uses, which are many. For example, this article could be expanded to include the impact phone-based speech recognition systems have had on customer service (positive and negative). There's also an interesting movement to displace outdated and cumbersome touch-tone IVRs with speech-enabled systems.

Nezzo 15:29, 5 September 2006 (UTC)[reply]

Software?[edit]

No mention of the programs that one could use to perform speech recognition (i.e., NaturallySpeaking or ViaVoice)? Or how about cell phones and car navigation systems with voice commands? Personally, I thought this was lacking from this article. RobertM525 03:36, 12 October 2006 (UTC)[reply]

Yes, as a user I am more or less stunned that the article doesn't even mention Dragon NaturallySpeaking, which is clearly the best speech recognition program out there. I don't know of anyone who disagrees with that, but I would be happy to hear differently. 70.32.206.42 01:46, 5 December 2006 (UTC)Gene Venable 4 December 2006.[reply]

Well, that's a POV. In my experience, Dragon is not the best. Probably better to stay away from naming specific companies, although perhaps a reference list of the most well known is useful. A list of applications would be much more useful (such as the cell phones, or car navigation). NWebb 19:17, 31 January 2007 (UTC)[reply]

There is a pointer to a separate List of speech recognition software page that has that information, including identification of discontinued and unsupported software (e.g., ViaVoice). But I think this page should have some useful information, some sort of top-level summary, not just an easily missed link in the last sentence before the "See also" section. Further, that page may give an incorrect impression, because the available software options for platforms other than Windows are presented in table formats, but the Windows options are presented in three successive unordered lists. I will have a run at producing a software list table for Windows so that the presentation is consistent. Brucewh --/Bruce/ [aka Slasher] 07:51, 16 July 2015 (UTC)[reply]

Remove external link[edit]

Removed a spam link (several times) to a website called ivrdictionary. This is a thinly veiled attempt to put advertising on Wikipedia. Links were added by several anonymous users within a tight IP range. Website purports to list ivr terminology, but in reality it prominently displays an advertisement to Angel dot com, which is a commercial company that sells IVR related products. The same links were added to other articles that are related to IVR technology. Calltech 16:56, 17 November 2006 (UTC)[reply]

Hello.[edit]

Would anyone like to take a stab at making an overview for the following almost-the-same-thing subjects. I know someone round here is just salivating at the prospect of 4 slightly different articles saying the same things but I personally find the whole thing confusing and unrequired. Please put an overview.

--I'll bring the food 01:15, 26 November 2006 (UTC)[reply]

  • I think your first three are pretty much all the same thing. Don't forget Automatic Speech Recognition, (ASR), voice recognition, Direct Voice Input, (DVI), voice command, speech interface, natural language processing, etc etc. Martinevans123 (talk) 00:31, 12 December 2007 (UTC)[reply]
  • The latest edit by Tbutzon to the opening paragraph is very welcome. Am sure similar improvements could be made to the whole of this article. But the statement that ASR "converts spoken words to text" is not strictly true. I agree in most applications it will, but a visual representation of the output from ASR, including standard graphemes, may or may not be produced, even when the recognition is successful. That’s a system design/ HMI question. But I'm not sure how to correct this. Martinevans123 (talk) 09:27, 7 January 2008 (UTC)[reply]

Books Section[edit]

I find it odd that this article has a section titled 'Books'. This section currently contains a single book on the subject (despite there being many) and links to an online bookshop website where the book can be purchased. To me this reads like an advertisement, especially as the book presented seems possibly less specialised on the subject of speech recognition than many other books that are out there but of course are not all listed. Canderra 01:41, 18 January 2007 (UTC)[reply]


Possible external link[edit]

I think that VoxForge (www.voxforge.org) should be added as an external link (but I am the VoxForge maintainer, so I cannot add the link myself). LDC is listed, and it *sells* Speech Corpora. VoxForge is trying to create a free English speech corpus for use in creating acoustic models for open source speech recognition engines. We are similar to the BAS – Bavarian Archive for Speech Signals site, which provides a free database of spoken German, which also has an external link. Kmaclean 18:42, 10 May 2007 (UTC)[reply]

Vandalism here and else where from 65.175.138.211[edit]

Please note the "wes is gay ..." comment here. Look then at the history of changes and look at the other things this AC has changed recently. Vandalism pure and simple. —Preceding unsigned comment added by 69.44.127.179 (talk) 18:34, 4 October 2007 (UTC)[reply]

"machine-readable text?"[edit]

I would agree with User: Three-quarter-ten that the task of most ASR is to take an input of human vocal utterances and to deduce from them, by means of phonemes, syllables or word shapes, a series of "words" conforming to an expected syntax or natural language. But the visible (or audible) output is going to be a system design decision, e.g. the output might be where to route a bag in an airport baggage handling depot, might be a grade on a pupil progress chart, or might be a string of words spoken in a different language, i.e. not necessarily "written text" at all. Martinevans123 (talk) 23:12, 15 January 2008 (UTC)[reply]

Very good point. I revised my edit from "converts spoken words to machine-readable text, that is, to a string of character codes" to "converts spoken words to machine-readable input, for example, to a string of character codes". I guess that the irreducible common denominator, regardless of the system, is that the output of speech recognition is a binary string to be used as some sort of input. With a string of character codes to be input into a Word document being a very archetypical example. Thanks! — ¾-10 00:33, 16 January 2008 (UTC)[reply]
  • Binary code certainly, but I'm not sure where "character codes" come in. That's usually the output of an alphabetic keyboard/ keypad, which the ASR usually circumvents entirely. Martinevans123 (talk) 00:46, 16 January 2008 (UTC)[reply]
I see what you mean. From the point of view of the average user, what they "put in" is speech and what they "get out" is a string of character codes, which is to say, "typed" text in their Word document. I'm going to try another revision: "converts spoken words to machine-readable input (for example, to the binary code for a string of character codes)". That's a little dense for the lay reader, but it's more accurate than what I had before. If anybody has any ideas for yet a better phrasing (accurate but lay-friendly), feel free. — ¾-10 01:17, 16 January 2008 (UTC)[reply]


Solicitations of any kind are not proper for inclusion in Wikipedia articles. OccamzRazor (talk) 00:49, 11 May 2008 (UTC)[reply]

Shameful anti-Gates bias![edit]

The article makes no mention of Windows Vista, even though it has some of the most advanced spoken command recognition and speech dictation capabilities among affordable home and office environments. 82.131.210.162 (talk) 14:34, 27 May 2008 (UTC)[reply]

And? Just add it - better than howling around here. By the way, those "capabilities" may work in English, maybe ... maybe ... it is not working in the German Vista version.

--77.186.130.110 (talk) 16:34, 3 November 2008 (UTC)[reply]

Future of[edit]

There is no mention of the expected future for speech recognition. Any studies on the expansion of markets, acceptability of users, rate of increase of robustness to noise, etc?

It has been stated that speech recognition machines may exceed a human's understanding by the year 2012. Can anyone confirm? —Preceding unsigned comment added by 71.65.21.242 (talk) 05:40, 8 June 2008 (UTC)[reply]

whoever said that is so dumb, i mean really dumb, for real. —Preceding unsigned comment added by 96.26.178.124 (talk) 04:45, 10 November 2010 (UTC)[reply]

Speech-to-text currently redirects to this article; however, Speech-to-Text Reporter is not even mentioned in this article. Speech-to-text reporting is obviously a subset of speech recognition, but as the corresponding article presents no in-text citations and relatively little information anyway, I recommend a merge rather than an inset summary. Neelix (talk) 23:15, 10 November 2008 (UTC)[reply]

  • Agree. Martinevans123 (talk) 20:00, 11 November 2008 (UTC) [reply]
  • Completely disagree. Speech to text describes the process whereby people (stenographers or palantypists) create a live syllabic feed that a computer compares to a dictionary to display words on screen. The computer does not directly recognise what the speaker is saying; it is the human operator who does. —Preceding unsigned comment added by 93.152.126.73 (talk) 15:44, 7 April 2009 (UTC)[reply]
You seem to be simply suggesting that because the computer speech recognition is not optimal in this case, the operator (or a second operator) has to check the output word-by-word before it is saved? But isn't that true for many, if not most, applications which have a visual HMI component? Martinevans123 (talk) 16:12, 7 April 2009 (UTC)[reply]
But reading Speech-to-Text Reporter I now see that it is, as you say, basically a real-time audio-typing function, albeit with a special keyboard. It thus seems to have nothing to do with computer speech recognition and I have to change my view to "Completely disagree" also. Apologies for missing the point first time here. It may be a "subset of speech recognition", but only insofar as is any natural human speech recognition. Martinevans123 (talk) 17:48, 16 April 2009 (UTC)[reply]

Proposed addition: LumenVox Speech Engine[edit]

{{request edit}}

Full disclosure: I am employed by LumenVox, which sells a commercial automatic speech recognizer. I make this disclosure in compliance with the WP:SCOIC guidelines. While I am a longtime user of Wikipedia, I have no experience as a contributor and would appreciate any help or guidance from editors.

Essentially I would like to suggest that our product, the LumenVox Speech Engine, be added to the list of "Commercial software/middleware" in this page. A description of the product can be found on our Web site at http://www.lumenvox.com/products/speech_engine/

I am not sure precisely what sorts of third-party references are needed to justify inclusion into this list, but if any editors can supply me with the type of references that would be required to justify notability, I can happily provide references. I don't see any references for the other applications in the list. I do believe the product (and the company) meet the notability guidelines.

Stephen Keller (talk) 19:09, 10 March 2009 (UTC)[reply]

It looks like someone added this in the mean time. Is it accurate? -- kenb215 talk 20:18, 14 April 2009 (UTC)[reply]
Yes, it looks good. Stephen Keller (talk) 20:25, 14 April 2009 (UTC)[reply]

Speech-to-Text vs Text-to-Speech[edit]

Aren't those 2 different things? Speech-to-text is like Dragon Naturally Speaking whereas Text-to-Speech would be the email readers or the speech reader for the Kindle 2. Shouldn't that be clarified? Harriska2 (talk) —Preceding undated comment added 16:39, 16 April 2009 (UTC).[reply]

Laptop?[edit]

The caption under the main pic (screensaver) describes it as a Toshiba Laptop; is this accurate? —Preceding unsigned comment added by 86.157.24.9 (talk) 14:19, 25 April 2009 (UTC)[reply]


Conflation Error, Corrected?[edit]

An individual speaker can make a speech with their voice, but isn't always the owner of the speech that they have just spoken; a speech can be written down on paper in words (or binary), and repeated.

A voice is a completely different thing, a voice is what an individual generates in their mind, and owns, even before it leaves their head (even if it's then recorded, and finally repeated elsewhere afterwards, as a speech / waveform).

A voice might be pre-molecular in origin, it might not be, but it might also be beyond any systematic explanation; so what, it's still a voice (just like a tree is a tree).

The semantics, the lack of scientific categorisation, and use of terms, or lack of, has completely messed-up this article.

Wiki needs definitive linked definitions to the following:

Voice recognition, as a generic term (with links to the following two types of voice recognition).

Speaker recognition, as a means of voice evaluation / verification, etc.

Speech recognition, as a means of voice conversion to text / command controls, etc. —Preceding unsigned comment added by 86.159.197.208 (talk) 17:50, 26 April 2009 (UTC)[reply]

Linkfarms[edit]

I've remove the Commercial software/middleware and Open Source Software/Middleware sections per WP:EL and WP:NOTLINK. --Ronz (talk) 00:48, 13 July 2009 (UTC)[reply]

Thanks Ronz. For the benefit of anyone feeling that the lists should be restored, the See Also section has wikilinks to List of speech recognition software and Speech recognition in Linux which cover most of the notable ones without turning the main article into a link farm Kiore (talk) 11:44, 13 July 2009 (UTC)[reply]

Request for Review of Potential New Article: LumenVox[edit]

I am an employee of a company that I believe deserves an article on Wikipedia, but I am reluctant to post the article myself due to my obvious conflict of interest (I believe in the past my company had some employees post articles which were then deleted). It was previously suggested to me that I write a version of it in my user space and ask for it to be reviewed and eventually created by other editors. I have written a draft of the article at User:Stephen_Keller/LumenVox (edit | talk | history | links | watch | logs) and would like feedback on whether it is sufficiently NPOV, researched, and if it meets the notability guidelines. Any help is appreciated. Stephen Keller (talk) 00:04, 6 March 2010 (UTC)[reply]

This has been done.Stephen Keller (talk) 00:23, 13 March 2010 (UTC)[reply]

Update Links (Euro Fighter)[edit]

Reference 3 is a dead link, isn't it? (Euro Fighter) I don't know the original page, but my suggestion is: http://www.eurofighter.com/capabilities/technology/voice-throttle-stick/direct-voice-input.html —Preceding unsigned comment added by 88.134.136.209 (talk) 14:00, 10 May 2010 (UTC)[reply]

Current Research[edit]

The discussion of current research is controversial, provocative and not supported by citation. The implication that funding for speech recognition has been reduced since 2001 is factually false. The DARPA projects EARS and GALE were both very well funded.

It is also misleading to say that performance has plateaued. Large vocabulary speech recognition has always been a very difficult task. Most of the performance improvement over the last 35 years has been due to the steady accumulation of small incremental improvements. The published results from the EARS and GALE research continue this trend. In fact, the EARS project was noted for a reduction in error rate on certain tasks of nearly a factor of two, one of the largest single-year improvements in speech recognition performance.
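For reference, the error rates quoted in such evaluations are word error rates, defined as

 \mathrm{WER} = \frac{S + D + I}{N},

where S, D and I are the numbers of substituted, deleted and inserted words in the recognizer output relative to a reference transcript, and N is the number of words in the reference.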

It is true that DARPA has changed focus in that GALE no longer supports recognition of English, but only of Mandarin and Arabic. However, it has always been DARPA practice in speech recognition funding to continually shift focus to harder tasks as sufficient progress is made on the earlier tasks. It is a sign of success, not of failure. Jay Page (talk) 16:42, 24 July 2010 (UTC)[reply]

It is true that it is no longer DARPA funding this research. But then other entities should carry on this research; DARPA is always at the forefront of research. According to this article, the situation hadn't changed much from 2001 to 2006: http://robertfortner.posterous.com/the-unrecognized-death-of-speech-recognition If there are other results, you should add them with the bibliography. —Preceding unsigned comment added by 188.81.64.228 (talk) 10:54, 4 November 2010 (UTC)[reply]

An error in the History section[edit]

There seems to be an error in the first paragraph of the History section: it says the IBM Shoebox was first exhibited at New York's World Fair of 1964, even though the article for the IBM Shoebox states, just as the official IBM site does, that it was shown to the public in 1962 at the Seattle World's Fair.

Excuse me if my way of speaking is confusing, English is not my first language and I haven't practiced it in years. —Preceding unsigned comment added by 201.172.101.197 (talk) 05:06, 12 October 2010 (UTC)[reply]

Quality & refs[edit]

I am not happy with the references (just 8, 3 of which are irrelevant) or the overall quality of the article. It talks a lot about applications, but does not really deal with the technical challenges etc. I would say a 70% rewrite is needed. Not that I can pay attention to it now, but it should be suggested for someone in the field to rewrite. History2007 (talk) 13:32, 7 February 2011 (UTC)[reply]

Stress and fatigue voice characteristics[edit]

Where is the best place for voice monitoring for other measures such as stress, e.g. [1] and fatigue, e.g. [2],[3] and [4]? Thanks. Martinevans123 (talk) 21:26, 27 September 2012 (UTC)[reply]

How close are we...?[edit]

I mean, how close is Wikipedia from a [Click for Audio] button so I could listen to Wikipedia articles while I'm doing something else entirely? Anybody know if that is even in the pipeline? Thanks!

very misleading section under the title "Neural Networks"[edit]

The five paragraphs under the title "Neural Networks" are very misleading. They imply that the only work done on neural networks and speech recognition before 2010 was to use neural networks on their own, that is, without integrating them with HMMs. Whilst this was true for a few years in the late '80s, the early '90s saw the integration of MLPs and RNNs with HMMs (the neural nets replaced the GMMs in the HMMs). This allowed phoneme recognition (see Deep_learning#Automatic_speech_recognition, which starts in 1992) and word recognition (see the section Deep_learning#History, which says "Both shallow and deep learning (e.g., recurrent nets) of ANNs have been explored for many years.[42][43][44]"). In my mind this is enough evidence to show what the problem is, but I can find more if needed.

As I did quite a bit of the RNN/HMM work in the '90's I'm well placed to see the problem here and I do know the history so I could fix it. However, that would be in violation of the Wikipedia:Conflict_of_interest, so if I can help someone else do it please let me know.

DrTonyR (talk) 18:40, 8 December 2018 (UTC)[reply]
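For readers unfamiliar with the hybrid approach described above, here is a minimal Python/NumPy sketch (illustrative only; the dimensions and values are invented) of the basic idea of replacing the GMM in an HMM with a neural network: the network produces per-frame posteriors over HMM states, which are divided by the state priors to give scaled likelihoods usable as emission scores in Viterbi decoding.

 import numpy as np

 # Illustrative stand-in for a trained network's softmax outputs:
 # per-frame posteriors P(state | frame) for 100 frames and 10 HMM states.
 rng = np.random.default_rng(0)
 num_frames, num_states = 100, 10
 posteriors = rng.dirichlet(np.ones(num_states), size=num_frames)

 # State priors P(state), e.g. estimated from state-level alignments of the training data.
 priors = rng.dirichlet(np.ones(num_states))

 # Scaled log-likelihoods log P(frame | state) - log P(frame), used as HMM
 # emission scores in place of GMM likelihoods during decoding.
 scaled_loglik = np.log(posteriors) - np.log(priors)
 print(scaled_loglik.shape)  # (100, 10)

The rest of the HMM machinery (transition model, lexicon, language model and the Viterbi search) is left unchanged in this scheme.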

Voice Tag[edit]

Voice Tag redirects here, but is not explained in the article. -- Beland (talk) 23:52, 1 March 2019 (UTC)[reply]

hello dear kira — Preceding unsigned comment added by 92.0.161.205 (talk) 03:45, 8 December 2023 (UTC)[reply]

A spoken language that's easy to understand for AIs?[edit]

I'm aware humans are attempting to make software understand their speech. But is there a spoken language specifically designed to be easy to understand by software? I enabled closed captions on a random YouTube video, and Google's software still messes up so much... Surely humans are smart enough to be able to design a spoken language that's easy for both humans (for speech) and AIs (to recognize)? Unless I'm overlooking it, I don't see anything on this in the article. --143.176.30.65 (talk) 15:18, 23 March 2021 (UTC)[reply]

Human natural languages have developed to optimise "human-human" communication. Are you talking just about artificial grammars and vocabularies that use words from existing languages, but restrict legal utterances? Or are you also looking to include general phonetic restrictions as well to create, for example, a wholly artificial spoken language? I guess there would be a trade-off on how easy that would be for any human to learn. AI-recognisable spoken language used in most applications is quite task-specific? Martinevans123 (talk) 15:27, 23 March 2021 (UTC)[reply]
Hi Martinevans123. I'm aware of WP:NOTFORUM, so I'll keep it relatively short. Basically, I'm wondering if a 'wholly artificial spoken language' exists with essentially a full vocabulary that's designed to make speech recognition much easier - yet still being easy enough for humans to learn. A universal language that any human, regardless their native language, could learn to easily communicate with AIs. Because lots of effort is put into improving software to understand dozens - if not hundreds - of different languages that are spoken in all kinds of ways. Is there a versatile language with characteristics (words or sounds) that are impossible to mix up for AIs. Is any research going into that direction. --143.176.30.65 (talk) 15:58, 23 March 2021 (UTC)[reply]
I see. I have never heard of any. I suspect the problems might outweigh the benefits. Martinevans123 (talk) 16:06, 23 March 2021 (UTC)[reply]

A Commons file used on this page or its Wikidata item has been nominated for deletion[edit]

The following Wikimedia Commons file used on this page or its Wikidata item has been nominated for deletion:

Participate in the deletion discussion at the nomination page. —Community Tech bot (talk) 20:25, 11 September 2021 (UTC)[reply]

Everyone[edit]

Hi everyone 27.109.115.130 (talk) 13:34, 8 March 2022 (UTC)[reply]

There is a message from handsome Heri[edit]

Hello darling 36.68.53.51 (talk) 14:52, 31 August 2022 (UTC)[reply]

India Education Program course assignment[edit]

This article was the subject of an educational assignment at College Of Engineering Pune supported by Wikipedia Ambassadors through the India Education Program during the 2011 Q3 term. Further details are available on the course page.

The above message was substituted from {{IEP assignment}} by PrimeBOT (talk) on 19:56, 1 February 2023 (UTC)[reply]

First sentence is not a sentence?[edit]

"Speech recognition most important an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers with the main benefit of searchability."

Perhaps I'm just having an aneurysm, but this seems confusing. 72.192.173.145 (talk) 04:24, 27 February 2023 (UTC)[reply]

I also noticed that and came here to see what was up with it 66.108.250.83 (talk) 07:41, 27 February 2023 (UTC)[reply]

Wiki Education assignment: Linguistics in the Digital Age[edit]

This article was the subject of a Wiki Education Foundation-supported course assignment, between 21 August 2023 and 11 December 2023. Further details are available on the course page. Student editor(s): Pbroskoff (article contribs).

— Assignment last updated by Fedfed2 (talk) 00:53, 9 December 2023 (UTC)[reply]

Bias[edit]

I am a first time editor, please let me know if this looks ok? I think it would be good to add in the performance section.

Bias[edit]

Speech recognition tends to understand speakers of certain races and genders better than others. Speech recognition performs worse for women and non-white English speakers when accuracy in English is measured. Speech recognition has difficulty with certain accents and dialects, and the rate of accuracy depends on the dialect that is spoken. Scottish and Indian accents tend to be misunderstood by speech recognition programs, like YouTube's auto-captioning.

These biases exist because speech recognition software has had more exposure to certain types of voices, which can lead to the programs misrecognizing voices that were not included in the training data. Companies that use speech recognition are taking steps to try to fix the bias, but the bias exists on many speech recognition platforms today.

Pbroskoff (talk) 23:54, 13 October 2023 (UTC)[reply]

Crear voz para hablar[edit]

Crear voz para hablar [Create a voice to speak] 152.206.232.240 (talk) 09:11, 26 December 2023 (UTC)[reply]

Could you make any comment(s) here in English? Thanks. Martinevans123 (talk) 12:29, 26 December 2023 (UTC)[reply]