International Business Machines Corporation, Patent Trial and Appeal Board, Appeal 2020-005066 (P.T.A.B. Oct. 1, 2021)

UNITED STATES PATENT AND TRADEMARK OFFICE
UNITED STATES DEPARTMENT OF COMMERCE
United States Patent and Trademark Office
Address: COMMISSIONER FOR PATENTS, P.O. Box 1450, Alexandria, Virginia 22313-1450, www.uspto.gov

APPLICATION NO.: 15/471,436
FILING DATE: 03/28/2017
FIRST NAMED INVENTOR: Aaron K. Baughman
ATTORNEY DOCKET NO.: END920170086US1
CONFIRMATION NO.: 1049

108686 7590 10/01/2021
IBM IPLAW (GLF) c/o Garg Law Firm, PLLC
800 Bonaventure Way, Suite 115
Sugar Land, TX 77479

EXAMINER: SIRJANI, FARIBA
ART UNIT: 2659
NOTIFICATION DATE: 10/01/2021
DELIVERY MODE: ELECTRONIC

Please find below and/or attached an Office communication concerning this application or proceeding. The time period for reply, if any, is set in the attached communication. Notice of the Office communication was sent electronically on the above-indicated "Notification Date" to the following e-mail address(es): dpandya@garglaw.com, garglaw@gmail.com, uspto@garglaw.com. PTOL-90A (Rev. 04/07)

UNITED STATES PATENT AND TRADEMARK OFFICE
____________
BEFORE THE PATENT TRIAL AND APPEAL BOARD
____________
Ex parte AARON K. BAUGHMAN, JOHN M. GANCI JR., STEPHEN C. HAMMER, and CRAIG M. TRIM
____________
Appeal 2020-005066
Application 15/471,436
Technology Center 2600
____________
Before CARL W. WHITEHEAD JR., ERIC S. FRAHM, and DAVID M. KOHUT, Administrative Patent Judges.

PER CURIAM

DECISION ON APPEAL

Pursuant to 35 U.S.C. § 134(a), Appellant1 appeals from the Examiner's decision to reject claims 1–25. We have jurisdiction under 35 U.S.C. § 6(b). We AFFIRM.

1 We use "Appellant" to reference the "applicant" as defined in 37 C.F.R. § 1.42. Appellant identifies the real party in interest as "International Business Machines Corporation." Appeal Br. 2.
STATEMENT OF THE CASE

Appellant's Invention

The present invention relates to a "language model provid[ing] probabilistic indications that a series of words are found together for a particular subject-matter domain" (Spec. ¶ 6), e.g., "'thoracic' and 'surgery'" (id. ¶ 7), and an "acoustic model determin[ing] a probability that a given audio signal includes a series of words of human speech" (id. ¶ 10), i.e., "the probability of phones (a phonetically distinct sound in a speech) given a signal" (id.). The invention can "cause[] an error in the acoustic model [to] be corrected through retraining using the language model's success." Id. ¶ 17. The invention can "further operate[] . . . the language model on the [of-interest] portion of the speech signal . . . [to produce a] textual representation . . . for the training from the language model that correctly recognized the word[(s)]." Id. ¶ 21. And, the invention can "determine[] a severity of an error associated with the acoustic model incorrectly recognizing the portion of the speech signal . . . [and] boost[] a number of occurrences of the training . . . [as] a function of the severity." Id. ¶ 19.

Claim 1, reproduced below, is illustrative of the argued subject matter.

1.
A method comprising:

selecting, to recognize spoken words in a speech signal generated from a speech, a model-pair, the model-pair comprising an acoustic model and a language model;

computing a degree of disjointedness between the acoustic model and the language model relative to the speech by comparing, responsive to the model pair performing speech recognition on the speech signal, a first recognition output produced from the acoustic model and a second recognition output produced from the language model;

determining, responsive to the acoustic model incorrectly recognizing a portion of the speech signal as a first word and the language model correctly recognizing the portion of the speech signal as a second word, a textual representation of the second word;

associating with the textual representation, a set of sound descriptors;

generating, using the textual representation and the set of sound descriptors, a training speech pattern; and

training, using the training speech pattern to produce a retrained acoustic model, the acoustic model to recognize the portion of the speech signal as the second word, the training causing the retrained acoustic model and the language model to recognize the portion of the speech signal as the second word.

Appeal Br. 16 (Claims Appendix).

Rejections

Claims 1–3, 6, 7, 13, 15–17, and 20–25 are rejected under 35 U.S.C. § 103 as being unpatentable over Choi (US 2017/0053652 A1; Feb. 23, 2017) and Faisman (US 2008/0319743 A1; Dec. 25, 2008). Final Act. 14–35.

Claim 4 is rejected under 35 U.S.C. § 103 as being unpatentable over Choi, Faisman, and Wang (US 2007/0219798 A1; Sept. 20, 2007). Final Act. 35–37.

Claim 5 is rejected under 35 U.S.C. § 103 as being unpatentable over Choi, Faisman, Wang, and Matsuda (US 2016/0260428 A1; Sept. 8, 2016). Final Act. 38–39.

Claims 8 and 10–12 are rejected under 35 U.S.C.
§ 103 as being unpatentable over Choi, Faisman, and Suendermann '536 (US 2010/0268536 A1; Oct. 21, 2010). Final Act. 39–44.

Claim 9 is rejected under 35 U.S.C. § 103 as being unpatentable over Choi, Faisman, Suendermann '183 (US 2012/0166183 A1; June 28, 2012), and Rennie (US 2017/0053644 A1; Feb. 23, 2017). Final Act. 44–48.

Claim 14 is rejected under 35 U.S.C. § 103 as being unpatentable over Choi, Faisman, and Reich (US 8,285,546 B2; Oct. 9, 2012). Final Act. 49–50.

Claims 18 and 19 are rejected under 35 U.S.C. § 103 as being unpatentable over Choi, Faisman, and Suzuki (US 2017/0169813 A1; June 15, 2017). Final Act. 50–54.

OPINION

Appellant presents the same arguments for each of the independent claims and asserts the dependent claims stand on the patentability of their independent claims. Appeal Br. 10 et seq. We address the arguments with reference to claim 1. For the following reasons, we are unpersuaded of error and thus sustain the rejections of claims 1–25.

A dispositive issue before us is whether Choi teaches or suggests the following claim limitation: "responsive to the acoustic model incorrectly recognizing a portion of the speech signal as a first word and the language model correctly recognizing the portion of the speech signal as a second word." Appeal Br. 10–11 (emphasis omitted). Appellant contends:

Choi discloses a speech recognition system using both an acoustic model and a language model. A combiner unit combines outputs from each model to produce a final speech recognition result. In particular, the acoustic model provides the combiner unit with probabilities that the speech matches particular letters, a syllable, or a higher-level linguistic unit. The language model also provides the combiner unit with probabilities that the speech matches various linguistic units, this time considering the relationships among current and past linguistic units.
The combiner combines the highest-probability recognition results from each model to determine a final result. . . . Importantly, in Choi's implementation the acoustic model produces probabilities that the speech matches a particular linguistic unit, not probabilities that any such match is correct or incorrect. . . . Similarly, the language model produces probabilities that the speech matches various linguistic units—also not probabilities that any such match is correct or incorrect. Without any evaluation of correctness, Choi cannot properly be interpreted as teaching or suggesting [the disputed claim limitation].

Appeal Br. 11–12 (emphasis and internal citations omitted). The Examiner responds:

Applicant disputes the mapping of the concept of "correctly recognizing a speech input" to Choi. The disputed qualifier "correctly" is [n]ot defined in the Claim. The ordinary meaning of "correctly recognizing a speech input," in the context of the art of speech recognition, is having a result with a "confidence/likelihood/probability" value that is considered acceptable according to some criterion. . . . Choi considers the result with the "highest probability," from among two results, as the "accurate" one. The "highest probability" result of Choi teaches "correct" result of the Claim.

Ans. 4 (formatting altered, e.g., bullet-format omitted). Appellant replies:

Since Appellants did not define the term "correct" in the Application, the plain meaning of the word is necessarily used. According to the Merriam-Webster online dictionary at https://www.merriam-webster.com/dictionary/correct, the term "correct" is an adjective defined as "conforming to or agreeing with fact, logic, or known truth". This definition is the first definition listed for the term, and it is this definition that is the plain meaning intended by the Appellants[.]

Reply Br. 2 (emphasis omitted). We are unpersuaded the Examiner erred.
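The competing readings above turn on whether "correct" means a result whose confidence clears some criterion (the Examiner's probabilistic reading) or a result matching the speaker's actual intent (Appellant's absolute reading). The probabilistic reading can be illustrated with a minimal sketch; all function and variable names here are hypothetical illustrations, not from Choi, Faisman, or the record:

```python
# Minimal sketch (hypothetical, not from the record) of the Examiner's
# probabilistic reading of "correct"/"incorrect": each model emits a best
# guess with an associated probability, and the higher-probability guess
# is deemed the "correct" recognition of the disputed speech portion.

def deem_correct(acoustic_guess, acoustic_prob, language_guess, language_prob):
    """Return (model deemed 'correct', its best-guess word)."""
    results = {
        "acoustic": (acoustic_guess, acoustic_prob),
        "language": (language_guess, language_prob),
    }
    # Under the Examiner's reading, the result with the highest
    # probability is treated as the "accurate" (i.e., "correct") one.
    correct_model = max(results, key=lambda m: results[m][1])
    final_word = results[correct_model][0]
    return correct_model, final_word

# Illustration using the Specification's example domain term "thoracic":
# the acoustic model's low-probability guess is deemed "incorrect," and
# the language model's high-probability guess is deemed "correct."
model, word = deem_correct("thorough", 0.30, "thoracic", 0.85)
```

On this sketch, the language model's guess would supply the textual representation used to retrain the acoustic model, consistent with the Specification's retraining description quoted above.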
We rather agree with the Examiner's claim interpretation because the Specification explains that the claimed language model and acoustic model each provide a "probability" that a speech signal contains the respectively output series of words. See supra 2 (description of Appellant's invention). Accordingly, we find the Specification teaches that each output provides a 'best guess' and corresponding likelihood of the speaker's intended series of words. Further, we find (i) the claimed "language model correctly recognizing the portion of the speech signal" to mean that a high probability for the respective guess is found; and (ii) the claimed "acoustic model incorrectly recognizing a portion of the speech signal" to mean that a low probability for the respective guess is found. That is, based on the probabilities of the 'best guesses,' the invention deems the language model's guess as "correct," the acoustic model's guess as "incorrect," and then trains the acoustic model to recognize the speech signal as the language model's guess.

Appellant's argued claim construction asks us to, essentially, forego the above probabilistic determination of a "correct" output (and an "incorrect" output) that reasonably turns on the language and acoustic models disclosed by the Specification. See id. Appellant asks us to apply, instead, an absolute (i.e., free from imperfection) determination of a "correct" output that unreasonably turns upon the abstract intent of a speaker implied by the claim.2 See id. And in support, Appellant offers only a single general dictionary definition of "correct," which is "conforming to . . . logic" (Reply Br. 2 (block-quoted above) (emphasis omitted)) and thus comports with a probabilistic determination of "correct." See also Merriam Webster Dictionary, available at https://www.merriam-webster.com/dictionary/correct (last visited Sept.
20, 2020) (supporting a probabilistic-threshold meaning for the claimed "correct" and "incorrect" by defining the term "correct" (adjective) as "conforming to an approved . . . standard").

2 That is, Appellant asks us to interpret "correct" as meaning the claimed language model's output is the exact series of words intended by the speaker and "incorrect" as meaning the claimed acoustic model's output is not the exact series of words intended by the speaker.

For the foregoing reasons, the Examiner's interpretation of "correct" is reasonable and more comprehensive than Appellant's interpretation. See Phillips v. AWH Corp., 415 F.3d 1303, 1321 (Fed. Cir. 2005) ("[T]he specification is the single best guide to the meaning of a disputed term . . . [and] acts as a dictionary when it expressly defines terms used in the claims or when it defines terms by implication." (quotation marks and citations omitted)).

Appellant also contends:

Faisman also discloses using both an acoustic model and a language model for speech recognition. A human proofreader corrects a resulting transcript; then the corrected transcript is used to train the language model, and used along with the original audio to train the acoustic model. However, in Faisman's implementation the proofreader corrects the final results, after both models have processed the speech. Faisman does not disclose, and cannot be reasonably interpreted as disclosing, "comparing, responsive to the model pair performing speech recognition on the speech signal, a first recognition output produced from the acoustic model and a second recognition output produced from the language model."

Appeal Br. 12 (emphasis and internal citations omitted). The Examiner responds: "[T]he Claim is entirely mapped to Choi but for the last limitation of 'training' for which Faisman was added. . . .
[T]he [Choi-Faisman combination is a] tandem of the recognition output of the process of Choi providing input to the training process of Faisman." Ans. 7.

Appellant does not reply to the Examiner's above clarification of how Faisman is applied, nor address Faisman whatsoever. Reply Br. 2 et seq. As Appellant does not address the Examiner's clarification of the Choi-Faisman combination, the arguments misapprehend and thereby fail to address the Examiner's actual reliance on Faisman.

OVERALL CONCLUSION

For the foregoing reasons, we affirm the Examiner's decision to reject claims 1–25.

DECISION SUMMARY

Claim(s) Rejected            | 35 U.S.C. § | Reference(s)/Basis                      | Affirmed                     | Reversed
1–3, 6, 7, 13, 15–17, 20–25  | 103         | Choi, Faisman                           | 1–3, 6, 7, 13, 15–17, 20–25  |
4                            | 103         | Choi, Faisman, Wang                     | 4                            |
5                            | 103         | Choi, Faisman, Wang, Matsuda            | 5                            |
8, 10–12                     | 103         | Choi, Faisman, Suendermann '536         | 8, 10–12                     |
9                            | 103         | Choi, Faisman, Suendermann '183, Rennie | 9                            |
14                           | 103         | Choi, Faisman, Reich                    | 14                           |
18, 19                       | 103         | Choi, Faisman, Suzuki                   | 18, 19                       |
Overall Outcome              |             |                                         | 1–25                         |

TIME PERIOD FOR RESPONSE

No time period for taking any subsequent action in connection with this Appeal may be extended under 37 C.F.R. § 1.136(a). See 37 C.F.R. § 1.136(a)(1)(iv).

AFFIRMED