13360418 (P.T.A.B. Feb. 17, 2016)

Ex Parte Dhanakshirur et al

Patent Trial and Appeal Board, Feb. 17, 2016

UNITED STATES PATENT AND TRADEMARK OFFICE

UNITED STATES DEPARTMENT OF COMMERCE
United States Patent and Trademark Office
Address: COMMISSIONER FOR PATENTS, P.O. Box 1450, Alexandria, Virginia 22313-1450
www.uspto.gov

APPLICATION NO.: 13/360,418
FILING DATE: 01/27/2012
FIRST NAMED INVENTOR: Girish Dhanakshirur
ATTORNEY DOCKET NO.: N0484.70607US01
CONFIRMATION NO.: 7472

91337 7590 02/19/2016
Nuance Communications, Inc.
c/o Wolf, Greenfield & Sacks, P.C.
600 Atlantic Avenue
Boston, MA 02210-2206

EXAMINER: ADESANYA, OLUJIMI A
ART UNIT: 2658
NOTIFICATION DATE: 02/19/2016
DELIVERY MODE: ELECTRONIC

Please find below and/or attached an Office communication concerning this application or proceeding.

The time period for reply, if any, is set in the attached communication.

Notice of the Office communication was sent electronically on above-indicated "Notification Date" to the following e-mail address(es):
Patents_eOfficeAction@wolfgreenfield.com
N0484_eOfficeAction@WolfGreenfield.com
IP.Inbox@nuance.com

PTOL-90A (Rev. 04/07)

UNITED STATES PATENT AND TRADEMARK OFFICE

BEFORE THE PATENT TRIAL AND APPEAL BOARD

Ex parte GIRISH DHANAKSHIRUR and JAMES R. LEWIS

Appeal 2014-002405
Application 13/360,418
Technology Center 2600

Before JOSEPH L. DIXON, JAMES R. HUGHES, and ERIC S. FRAHM, Administrative Patent Judges.

FRAHM, Administrative Patent Judge.

DECISION ON APPEAL

STATEMENT OF THE CASE

Appellants appeal under 35 U.S.C. § 134 from a rejection of claims 1, 16, and 18-35. We have jurisdiction under 35 U.S.C. § 6(b).

We affirm.

The invention relates to collecting prompts in an interactive voice response system in order to replace Text to Speech (TTS) generated audio prompts with professionally recorded audio (see Spec. ¶¶ 1, 5-7). Claim 1, reproduced below, is illustrative of the claimed subject matter:

1.
A method comprising:
    executing a voice application that presents one or more user prompts as audio, wherein the voice application dynamically generates text for at least a first user prompt during execution of the voice application;
    determining, during execution of the voice application, that recorded audio corresponding to the text of the first user prompt is not available; and
    in response to the determining, storing the text of the first user prompt in a set of prompt texts for which recorded audio is to be recorded.

REFERENCES

The prior art relied upon by the Examiner in rejecting the claims on appeal is:

Gilmore     US 2003/0216923 A1    Nov. 20, 2003
Da Palma    US 2005/0131707 A1    June 16, 2005
Ju          US 2006/0025996 A1    Feb. 2, 2006

REJECTIONS

The Examiner made the following rejections:[1]

Claims 1, 16, 18-21, 26, 27, and 31-33 stand rejected under 35 U.S.C. § 103(a) as being unpatentable over Gilmore and Ju.

Claims 22-25, 28-30, 34, and 35 stand rejected under 35 U.S.C. § 103(a) as being unpatentable over Gilmore, Ju, and Da Palma.

ANALYSIS

Appellants contend the "Office Action does not specifically identify how one of ordinary skill would have modified Gilmore's system to insert Ju's confirmation message" and the "Office Action does not explain why 'improving the quality of voice dialing systems' would have provided a reason for one of ordinary skill to modify Gilmore, when Gilmore does not relate to voice dialing systems at all" (Br. 10). Appellants also contend the combination of Gilmore and Ju does not disclose "the voice application dynamically generates text for at least a first user prompt during execution of the voice application," and "in response to the determining [that recorded audio corresponding to the text of the first user prompt is not available], storing the text of the first user prompt in a set of prompt texts for which recorded audio is to be recorded," as recited in claim 1 (Br. 11-13).
We disagree with Appellants. Gilmore discloses an interactive voice response system as follows:

A dynamic content generation (DCG) command may be used in the voice scripts to significantly increase the ability of the scripts to dynamically change in response to different types of callers and in response to different caller inputs. DCG commands are inserted into the text of the scripts when the scripts are created, prior to storing them in data store 214. When the voice gateway 208 requests a script from the application server 212, the application server 212 access the script from the data store 214 and processes the script by resolving any DCG commands within the script into voice instructions (i.e., grammar or prompt instructions). The server 212 then sends the processed script to the voice gateway 208, and the voice gateway 208 presents the script to the caller, for example, as an audio message.

(Gilmore ¶ 52).

[1] The Examiner has withdrawn the double patenting rejection of claims 1, 16, and 18-35 (Ans. 2).

In other words, the result of "processing all of the DCG commands in the voice script is a voice script in which all of the DCG commands have been replaced by none, one, or more than one voice instructions. DCG scripts, thereby, allow a voice developer to create voice scripts that vary in content on-the-fly . . . ." (Gilmore ¶ 61). Specifically, replacing each dynamic content command includes "accessing a text block from a file corresponding to the identifier parameter of the DCG command . . . . This block of text is 'spoken' by the TTS engine of the gateway 208 when the gateway 208 is unable to access a prompt file corresponding to the URL specified by the prompt instruction." (Gilmore ¶ 70).
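For illustration only, the fallback that Gilmore ¶ 70 describes (the gateway's TTS engine speaks the text block when the prompt file at the specified URL cannot be accessed) might be sketched as below. This is a minimal sketch and is not drawn from the record: the names `PromptInstruction`, `present_prompt`, `recorded_prompts`, and `tts` are hypothetical, and an in-memory dictionary stands in for whatever storage a gateway would actually use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class PromptInstruction:
    """Hypothetical stand-in for a resolved prompt instruction."""
    url: str   # where the recorded prompt file is expected to be
    text: str  # text block to be spoken if the file cannot be accessed


def present_prompt(
    instruction: PromptInstruction,
    recorded_prompts: Dict[str, bytes],
    tts: Callable[[str], bytes],
) -> Tuple[str, bytes]:
    """Return recorded audio if available, otherwise TTS audio for the text."""
    audio = recorded_prompts.get(instruction.url)
    if audio is not None:
        # A prompt file exists for this URL, so the recording is used.
        return ("recorded", audio)
    # No prompt file could be accessed: fall back to speaking the text block.
    return ("tts", tts(instruction.text))


# Assumed example data: one recorded prompt and a fake TTS engine.
recorded = {"prompts/welcome.wav": b"RIFF"}
fake_tts = lambda text: text.encode("utf-8")

welcome = present_prompt(
    PromptInstruction("prompts/welcome.wav", "Welcome"), recorded, fake_tts
)
balance = present_prompt(
    PromptInstruction("prompts/balance.wav", "Your balance is due"),
    recorded,
    fake_tts,
)
```

The sketch only selects between recorded audio and TTS output. It does not store the text of a prompt that lacks a recording, which is the limitation the Board finds in Ju rather than in Gilmore.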
Accordingly, we find Gilmore teaches dynamic generation of a voice script comprised of variable blocks of text specified by dynamic content generation commands that can be read to a caller through the use of a TTS (text-to-speech) engine. Thus, we find Gilmore meets the claim 1 limitation "dynamically generates text for at least a first user prompt during execution of the voice application" because the text content of Gilmore's voice script is dynamically generated.

We further find Ju discloses the claim 1 limitation "in response to the determining [that recorded audio corresponding to the text of the first user prompt is not available], storing the text of the first user prompt in a set of prompt texts for which recorded audio is to be recorded." Specifically, Ju discloses a voice-dialing system where the system confirms the name of an intended call recipient by prompting the caller with "initial inquiry statements (e.g., 'Who would you like to contact?'), confirming statements (e.g., 'Did you say the name ?' or 'I think you said , is that right?'), etc." (Ju ¶ 32). Ju's system can use voice talent recordings or TTS to provide a caller with the audio response of the call recipient's name, but further "accomplishes the use of personal audio recordings to improve name confirmation" (Ju ¶ 35). Accordingly, "the personal recording database 230 is automatically re-built every night, or at other predetermined times or frequencies (e.g., once a week, etc[.]) form the personal recordings collected/updated from a collection/update module or component 250" (Id.). Further:

[T]he voice-dialing system keeps track of which potential recipients have personal recordings. One method of keeping track of this information is to embed the personal recording availability in the CFG [context-free grammar].
In order for the confirmation module 240 to efficiently decide whether a personal recording is available without posting back to the web server database 230, the nightly built grammar CFG 212 returns not only the ID and full text of the recognized names, but also information or data indicative of whether each recipient has a personal recording available.

(Ju ¶ 37). During operation of the voice-dialing system,

[F]or the case where a personal recording is not available ... the confirmation module 240 can use a statement such as "Did you say ?", or "I think you said , is that right?", or "I think you want , is that right?" In these example statements, since a personal recording cannot be used, the statements can be generated using voice talent recordings, TTS generation, or a combination of the two.

(Ju ¶ 40).

In view of the foregoing, we find Ju teaches collecting and storing personal recordings of call recipients in a database and tracking which call recipients do not have personal recordings by embedding in the CFG the "full text of the recognized names, but also information or data indicative of whether each recipient has a personal recording available" (Ju ¶ 37). Thus, we find Ju's CFG meets the claim 1 limitation "storing the text of the first user prompt in a set of prompt texts for which recorded audio is to be recorded" because Ju suggests those names without personal recordings are intended to be recorded with personal audio by disclosing "the use of personal audio recordings to improve name confirmation" and a module 250 which collects personal recordings and periodically updates personal recording database 230 (Ju ¶ 35).
We further find this "storing" is "in response to the determining [that recorded audio corresponding to the text of the first user prompt is not available]" because one of ordinary skill in the art would understand that in the case that a personal recording is not available for a certain name (see Ju ¶ 40), the CFG would subsequently be rebuilt with the text of the name and data indicating no personal recording is available (see Ju ¶ 37).

We are thus not persuaded by Appellants' arguments that the combination of Gilmore and Ju fails to disclose the disputed limitations (see Br. 11-13).

We are also not persuaded by Appellants' argument that the Examiner failed to provide a motivation to combine the references and failed to show how they would be combined (see Br. 10). We agree with the Examiner that both Gilmore and Ju relate to interactive voice systems, and that Ju suggests improving interactive voice systems by using collected personal audio recordings when available (Final Act. 11; Ans. 18). That is, in certain situations such as voice-dialing a call recipient, personal audio recordings are preferential to voice talent recordings or TTS generation (see Ju ¶¶ 34-41). Further, one of ordinary skill in the art would recognize the references would be combined by adding Ju's personal recording database and CFG to Gilmore in order to store the personal recordings and track names or other text for which there are no personal recordings.

Moreover, we note that "if a technique has been used to improve one device, and a person of ordinary skill in the art would recognize that it would improve similar devices in the same way, using the technique is obvious unless its actual application is beyond his or her skill." KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398, 417 (2007).
Appellants have not shown applying Ju's technique of storing and tracking personal recordings for use in an interactive voice system to Gilmore's system would have been beyond the level of ordinary skill in the art.

We are, therefore, not persuaded that the Examiner erred in rejecting claim 1, and claims 16 and 18-35 not specifically argued separately.

CONCLUSION

The Examiner did not err in rejecting claims 1, 16, and 18-35 under 35 U.S.C. § 103(a).

DECISION

For the above reasons, the Examiner's rejection of claims 1, 16, and 18-35 is affirmed.

No time period for taking any subsequent action in connection with this appeal may be extended under 37 C.F.R. § 1.136(a)(1)(iv).

AFFIRMED

kis