Ex Parte Deng, Appeal 2007-1864, Application 10/100,717 (B.P.A.I. Sep. 11, 2007)

The opinion in support of the decision being entered today is not binding precedent of the Board.

UNITED STATES PATENT AND TRADEMARK OFFICE
____________
BEFORE THE BOARD OF PATENT APPEALS AND INTERFERENCES
____________
Ex parte LI DENG
____________
Appeal 2007-1864
Application 10/100,717
Technology Center 2600
____________
Decided: September 11, 2007
____________

Before MAHSHID D. SAADAT, ROBERT E. NAPPI, and JOHN A. JEFFERY, Administrative Patent Judges.

JEFFERY, Administrative Patent Judge.

DECISION ON APPEAL

Appellant appeals under 35 U.S.C. § 134 from the Examiner’s rejection of claims 1-35. We have jurisdiction under 35 U.S.C. § 6(b). We affirm-in-part.

STATEMENT OF THE CASE

Appellant invented a speech recognition method with improved capability to recognize, among other things, hypo-articulated and hyper-articulated speech. Specifically, a predicted speech value for a hypothesis speech unit is determined using an articulatory dynamics value. In one embodiment, the articulatory dynamics value depends on such a value at a previous time and an articulation target. In another embodiment, the articulatory dynamics value depends in part on acoustic environmental values, such as noise and distortion values. And in another embodiment, a time constant that defines the articulatory dynamics values is trained using a variety of articulation styles.1

Claim 1 is illustrative:

1.
A method of speech recognition, the method comprising:

receiving an observable value that describes a portion of a speech signal;

identifying a predicted value for a hypothesis phonological unit using an articulatory dynamics value that depends on an articulatory dynamics value at a previous time and an articulation target; and

comparing the observed value to the predicted value to determine a likelihood for the hypothesis phonological unit.

The Examiner relies on the following prior art reference to show unpatentability:

Hutchins    US 4,980,917    Dec. 25, 1990

1 See generally Specification 2:1 - 4:12.

Claims 1-35 stand rejected under 35 U.S.C. § 102(b) as being anticipated by Hutchins.2

Rather than repeat the arguments of Appellant or the Examiner, we refer to the Briefs and the Answer3 for their respective details. In this decision, we have considered only those arguments actually made by Appellant. Arguments which Appellant could have made but did not make in the Briefs have not been considered and are deemed to be waived. See 37 C.F.R. § 41.37(c)(1)(vii).

OPINION

Anticipation is established only when a single prior art reference discloses, expressly or under the principles of inherency, each and every element of a claimed invention as well as disclosing structure which is capable of performing the recited functional limitations. RCA Corp. v. Applied Digital Data Systems, Inc., 730 F.2d 1440, 1444, 221 USPQ 385, 388 (Fed. Cir. 1984); W.L. Gore and Associates, Inc. v. Garlock, Inc., 721 F.2d 1540, 1554, 220 USPQ 303, 313 (Fed. Cir. 1983).

2 We note that the Examiner’s Answer does not expressly state the Examiner’s grounds of rejection, but instead refers us to a previous office action (Answer 3). Such incorporations by reference, however, are improper under current practice.
See MPEP § 1207.02 (“An examiner's answer should not refer, either directly or indirectly, to any prior Office action without fully restating the point relied on in the answer.”).

3 An Appeal Brief was first mailed July 12, 2006. A second Brief, however, was filed Aug. 14, 2006 to correct various informalities. In response, a first Examiner’s Answer was mailed Dec. 6, 2006, which was followed by a Reply Brief filed Jan. 16, 2007. However, a second Answer was mailed Mar. 6, 2007 to correct various informalities. Throughout this opinion, we refer to (1) the Aug. 2006 Brief; (2) the Reply Brief; and (3) the March 2007 Answer.

Claims 1-12

The Examiner has indicated how the claimed invention is deemed to be fully met by the disclosure of Hutchins (Final Rejection 2-4). Regarding independent claim 1, Appellant argues that Hutchins does not identify a predicted value for a hypothesis phonological unit using an articulatory dynamics value that depends on an articulatory dynamics value at a previous time and an articulation target. According to Appellant, the articulatory parameters in Hutchins are not dependent upon a previous set of articulatory parameters (Br. 4-5; Reply Br. 3). The Examiner contends that any one of the identified eight articulatory parameters (76) in Figure 6 of Hutchins depends on predefined articulatory parameters or phonemes represented by the matrices (72).

Appellant also argues that Hutchins does not compare an observed value to a predicted value to determine a likelihood for a hypothesis phonological unit as claimed (Br. 4). The Examiner responds that the claimed comparing step reads on Hutchins’ mapping spectral data into a series of predefined articulatory parameters (Answer 6). Appellant, however, disagrees and notes that Hutchins maps observed spectra into the predefined articulatory parameters: a technique involving converting or transforming the spectra into values for these predefined parameters.
According to Appellant, there is no comparison between the spectra and the parameters; rather, the spectra are multiplied by a transform matrix to produce the parameter values. Appellant emphasizes that mapping a value into another value is not the same as comparing one value to another (Reply Br. 4) (emphasis added).

We will not sustain the Examiner’s rejection of independent claim 1. Hutchins discloses a system for determining articulatory parameters from audio speech.4 To this end, certain segments of digital speech data are selected for analysis based on predefined magnitude changes in data energy. After transforming the selected segments to spectral data segments, they are multiplied with a class distinction matrix 62 to provide a normalized vectorial representation of the probability of which predefined spectral class5 the received sound falls into (i.e., the normalized probability class vector 68) (Hutchins, col. 3, ll. 47-68; col. 4, ll. 28-37; col. 16, l. 48 - col. 17, l. 7; Fig. 6).

4 These articulatory parameters describe the anatomy of the vocal tract that produces human speech. In particular, each parameter corresponds to a respective portion or sector of the anatomical representation. The value of the parameter indicates the displacement or instantaneous location of the represented anatomical portion with respect to an initial location (Hutchins, col. 3, ll. 9-38). In a preferred embodiment, Hutchins provides eight articulation parameters corresponding to: (1) jaw opening (JO); (2) lip rounding (RO); (3) tongue center height (TC); (4) tongue back horizontal position (BX); (5) tongue back vertical position (BY); (6) tongue tip horizontal position (TX); (7) tongue tip vertical position (TY); and (8) lower lip retraction (LL) (Hutchins, col. 12, l. 62 - col. 13, l. 8).
5 The predefined spectral classes represent groups of similar speech phonemes -- the most basic, distinguishable units of speech in a given language (Hutchins, col. 3, ll. 42-53; col. 14, ll. 44-55). In a preferred embodiment, Hutchins provides six classes (Classes 0-5) corresponding to: (1) fricatives; (2) front vowels; (3) low vowels; (4) back vowels; (5) R’s; and (6) L and nasals (Hutchins, col. 14, ll. 50-55; col. 18, ll. 53-58). Two additional classes (Classes 6 and 7) are designated as null classes. See class distinction matrix example bridging columns 18 and 19 in Hutchins; see also matrix bridging columns 21 and 22.

Simultaneously, the spectral data segments are also applied to multiple class matrix multipliers 72 that provide vectorial outputs representative of predetermined articulatory parameter values. The class distinction vector information (i.e., obtained from the normalized probability class vector 68) is directed to plural multipliers 70 for combination with the output of the class matrix multipliers 72. As a result, a weighted average of class vectors is generated for a given sound. These resultant class vectors are then combined to form a single feature vector 76 whose elements are the articulatory parameter values for the speech data being processed (Hutchins, col. 3, l. 68 - col. 4, l. 11; col. 4, ll. 37-49; col. 17, ll. 7-38; Fig. 6).

Turning now to the rejection, we note at the outset that the Examiner indicates that the “predicted value” limitation in claim 1 corresponds to Hutchins’ predefined articulatory parameters (Answer 6-7). The Examiner, however, also indicates that the “articulatory dynamics value at a previous time” likewise corresponds to the “predefined acoustic or articulatory parameter” (Answer 6).
While we generally agree with the Examiner that a predefined parameter corresponds to a parameter defined earlier in time, we fail to see how these predefined parameters themselves can reasonably correspond both to the predicted value limitation and to the previous articulatory dynamics values. These predefined parameters, which characterize the relevant anatomical aspects of interest,6 are instead used as the basis for the single feature vector 76. This vector results from the mapping process -- a mapping process which accounts for the probability that the speech has certain spectral characteristics.

6 See n.4, supra, of this opinion.

Assuming that the Examiner intended for the claimed “predicted value” to correspond to the single feature vector 76 obtained via the mapping process shown in Figure 6, we fail to see how this vector uses an articulatory dynamics value that depends on an articulatory dynamics value at a previous time and an articulation target, as claimed. At best, this single feature vector is based, at least in part, on two predefined parameters: (1) the eight predefined articulatory parameters, and (2) the six predefined spectral classes (and two null classes) forming the basis for class distinction matrix 62. Although we find that these eight predetermined articulatory parameters or six spectral classes (and two null classes) can be broadly considered “articulatory dynamics values at a previous time,” the Examiner has still failed to identify -- nor can we reasonably ascertain -- how the predicted value (i.e., the single feature vector) also depends on an articulation target as claimed. In fact, the Examiner did not identify an “articulation target” at all, let alone the recited dependence on such a target.7 Although Hutchins does indicate that the articulatory parameter values of the feature vector are visually inspected on a display (Hutchins, col. 17, ll. 39-50; Figs.
7-8), which would suggest a “target” application (i.e., an “articulation target”) for the “predicted value,” we still fail to see how the predicted value depends on such a target. To the contrary, the target application would depend on the predicted value under this interpretation.

7 Appellant, too, noted that the Examiner failed to identify the articulation target in a claim comparison chart in the Reply Brief. See Reply Br., at 3 (noting that the Examiner cited no language from Hutchins corresponding to the recited “articulation target” limitation).

For this reason alone, the Examiner has failed to make a prima facie case of anticipation of claim 1 based on Hutchins.

Nevertheless, we do agree with the Examiner regarding the other recited limitations. The Examiner indicates that the claimed “observed value” corresponds to “phoneme that are observed” and that the claimed “comparing” limitation reads on “mapping” (Answer 6). As best we can understand, the Examiner appears to take the position that the mapping process outlined in Figure 6 -- a process that ultimately maps samples of received speech (an “observed value”) into eight articulatory parameters represented by the single feature vector 76 (the “predicted value”) -- inherently involves comparing the observed and predicted values.

We agree with this position. At least at a fundamental level, mapping one value into another necessarily requires comparing the respective values.8 At a minimum, a comparison is needed to make logical connections between the entities involved in the mapping process (e.g., identify and distinguish the source and target entities, etc.). We further note that the single feature vector 76 (“predicted value”) in Hutchins is based in part on the normalized probability class vector 68. Thus, the values constituting the single feature vector, in effect, are determined by the likelihood that a particular speech segment falls within a certain class.
Therefore, Hutchins’ mapping process, in effect, compares observed values with predicted values -- a process that also determines a likelihood that a particular speech segment falls within a certain class.

8 The term “map” is defined in pertinent part as “[t]o make logical connections between two entities.” See Webopedia (Internet.com), at http://www.webopedia.com/TERM/m/map.html (last visited Aug. 30, 2007).

Despite our general agreement with the Examiner regarding certain limitations of claim 1 noted above, we nevertheless conclude that the Examiner has failed to make a prima facie case of anticipation for all limitations of independent claim 1. We therefore will not sustain the Examiner’s rejection of claim 1 or dependent claims 2-12.

Claims 27-35

We will also not sustain the Examiner’s rejection of independent claim 27. Claim 27 calls for, in pertinent part, selecting a time constant from a group of time constants that have been trained using constructed speech. The Examiner provides no separate discussion of this limitation, but rather refers generally to the reasons provided in connection with the rejection of claims 1-12 as justification for the rejections of claims 13-35 (Final Rejection 4; Answer 8). We therefore presume that the Examiner’s position with respect to claim 10 (calling for the articulatory dynamics value to depend on a time constant) was intended to also apply to claim 27.

The Examiner’s argument in connection with claim 10 in the Answer does not address how the time constant limitation is met by Hutchins (Answer 7). Turning to the Final Rejection, the Examiner indicates that Hutchins’ articulatory dynamics value depends on a time constant as evidenced in column 9, lines 28-37 of the reference. Hutchins bases the decision to select a particular segment, at least in part, on whether the energy level rises for at least four sample periods (Hutchins, col. 8, ll. 46-68).
As shown in Table 1, the rise time is determined in this selection decision (Hutchins, col. 9, ll. 1-26; Steps 113-120).

Certainly, determining rise time necessarily involves determining a corresponding time constant: the two values are merely proportional.9 But Hutchins’ segment selection that is based, in effect, on a time constant clearly does not involve training a time constant through a variety of articulation styles as claimed, let alone selecting a time constant from a group of such trained time constants. At best, Hutchins merely determines rise time (and therefore a time constant) which forms the basis for segment selection. For this reason alone, we will not sustain the Examiner’s anticipation rejection of independent claim 27 and dependent claims 28-35.

Independent claim 13

We will, however, sustain the Examiner’s rejection of independent claim 13. At the outset, we note an ambiguity in the claim language with respect to lines 5 and 6 of the claim pertaining to the acoustic environment value. Specifically, it is unclear whether the phrase “that depends in part on an acoustic environment value” modifies the preceding “articulatory value” limitation or the “predicted acoustic value” limitation. That is, the claim could be construed as follows: (1) determining a predicted acoustic value for a phonological unit, where the predicted acoustic value depends in part on an acoustic environment value; or (2) utilizing an articulatory value that (a) describes a dynamic aspect of a speech signal, and (b) depends in part on an acoustic environment value. Our particular choice of construction, however, does not substantively impact our overall interpretation of the claim.

9 See First-Order RC and RL Circuits, UCSB ECE2A, Spring 2007 Lab #6, at http://www.ece.ucsb.edu/courses/ECE002/2A_S07Banerjee/ECE2A%20lab%206.pdf (noting that rise time can be expressed as 2.2 x the time constant).
Ultimately, the predicted acoustic value will depend either (1) directly on the acoustic environment value, or (2) indirectly on the acoustic environment value through an articulatory value which, in turn, depends on that acoustic environment value. Nevertheless, despite this ambiguity, we find the second construction to be the most reasonable as it most naturally aligns with the disclosure.10 We therefore construe claim 13 as requiring the articulatory value to depend in part on the acoustic environment value.

With this construction, we turn to Hutchins. In our view, Hutchins’ single feature vector 76 reasonably corresponds to “a predicted acoustic value for a phonological unit” since this vector effectively depends on the normalized probability class vector 68. This normalized probability class vector represents the probability that a particular spectral segment (“phonological unit”) falls within a respective class in accordance with the class distinction matrix 62 (“articulatory values” that describe dynamic aspects of the speech signal). These class distinction values are dependent upon not only particular articulatory patterns that arise frequently, but also the particular language spoken (i.e., the particular linguistic environment). See Hutchins, col. 12, ll. 30-37. Thus, the class distinction values are dependent upon “acoustic environmental values” pertaining to, at a minimum, the specific language spoken.

10 See, e.g., claims 8 and 9 (reciting that the articulatory dynamics value depends on noise and distortion values respectively); see also Phillips v. AWH Corp., 415 F.3d 1303, 1316 (Fed. Cir. 2005) (en banc) (“The construction that stays true to the claim language and most naturally aligns with the patent’s description of the invention will be, in the end, the correct construction.”) (citations omitted).
But even if we were to construe the claim such that the predicted acoustic value directly depended in part on an acoustic environmental value (the first construction above), the claim would still be fully met. In that case, the vector normalization element 66 would constitute an “acoustic environmental value” as it utilizes data from the raw class vector 64 -- data dependent, at least in part, on the class distinction values which, in turn, depend on the particular acoustic environment as noted above (e.g., the linguistic environment). For at least these reasons, we will sustain the Examiner’s rejection of independent claim 13.

Claims 14 and 15

We will also sustain the Examiner’s rejection of claims 14 and 15, which call for the acoustic environmental value to comprise a noise and a distortion11 value, respectively.

11 The term “distortion” is not defined in Appellant’s specification; thus, we construe this term as having its plain meaning. The term “distort” is defined, in pertinent part, as “to change something from its usual, original, natural or intended meaning, condition or shape.” See Cambridge Dictionaries Online, at http://dictionary.cambridge.org/define.asp?key=22695&dict=CALD (last visited Sept. 4, 2007).

As noted previously in connection with claim 27, we presume that the Examiner’s position with respect to claims 8 and 9 (calling for the articulatory dynamics value to depend on noise and distortion values respectively) was intended to also apply to claims 14 and 15. For both limitations, the Examiner refers to column 6, lines 3-17 of Hutchins (Final Rejection 3). The passage refers to conditioning the frequency domain samples to minimize the impact of variations and noise. In this regard, Hutchins also notes that the output from the FFT element 42 is conditioned by a transform conditioning element 44 before final processing to account for processing variations and noise (Hutchins, col. 10, ll. 48-54; col. 11, ll.
56; Figs. 1 and 5). The values generated as a result of this conditioning process, in our view, are reasonably considered “acoustic environmental values” generally. By minimizing the impact of variations (i.e., distortion) and noise, the conditioning process effectively accounts for spurious products generated within the acoustic environment. Furthermore, since the values in the class distinction matrix and raw class vector (and therefore the vector normalization element 66) depend, at least in part, on the selected samples, they likewise depend on these acoustic environmental values as well. For the foregoing reasons, we will sustain the Examiner’s rejection of claims 14 and 15.

Claim 16

We will also sustain the Examiner’s rejection of claim 16. As we indicated previously in connection with claim 1, we find that the eight predetermined articulatory parameters or six spectral classes (and two null classes) can be broadly considered “articulatory dynamics values at a previous time” -- values that fully meet articulatory values “of the previous time frame” as claimed. Significantly, since these predefined parameters are defined earlier in time, they correspond to articulatory values “of the previous time frame” as claimed -- a time frame that is unspecified. For the foregoing reasons, we will therefore sustain the Examiner’s rejection of claim 16.

Claims 17-20

We will not, however, sustain the Examiner’s rejection of claim 17. As we indicated previously in connection with claim 1, the Examiner has failed to identify -- nor can we reasonably ascertain -- how the predicted acoustic value (i.e., the single feature vector) depends on an articulation target as claimed. Although Hutchins does indicate that the articulatory parameter values of the feature vector are visually inspected on a display (Hutchins, col. 17, ll. 39-50; Figs.
7-8), which would suggest a “target” application (i.e., an “articulatory target”) for the articulatory value used by the predicted value, we still fail to see how the articulatory value depends on such a target. For the foregoing reasons, we will not sustain the Examiner’s rejection of claim 17 or dependent claims 18-20.

Claim 21

We will, however, sustain the Examiner’s rejection of claim 21. As we indicated previously in connection with claim 27, Hutchins determines rise time (and therefore a time constant) which forms the basis for segment selection.12 Thus, since the articulatory value depends on the segments selected, it therefore depends, at least in part, on a time constant. Claim 21 is therefore fully met by Hutchins.

12 See P. 9, supra, of this opinion.

Claims 22-26

We will not, however, sustain the Examiner’s rejection of claims 22-26. Hutchins’ segment selection that is based, in effect, on a time constant does not involve training a time constant, let alone training a time constant using a variety of articulation styles as claimed. At best, Hutchins merely determines rise time (and therefore a time constant) which forms the basis for segment selection. For this reason alone, we will not sustain the Examiner’s anticipation rejection of claims 22-26.

DECISION

We have sustained the Examiner’s rejections with respect to claims 13-16 and 21. We have not, however, sustained the Examiner’s rejections with respect to claims 1-12, 17-20, and 22-35. Therefore, the Examiner’s decision rejecting claims 1-35 is affirmed-in-part.

No time period for taking any subsequent action in connection with this appeal may be extended under 37 C.F.R. § 1.136(a)(1)(iv).

AFFIRMED-IN-PART

eld

WESTMAN CHAMPLIN (MICROSOFT CORPORATION)
SUITE 1400
900 SECOND AVENUE SOUTH
MINNEAPOLIS, MN 55402-3319