12635235 (P.T.A.B. Aug. 31, 2017)

Ex Parte Nilsson et al

Patent Trial and Appeal Board, Aug 31, 2017


United States Patent and Trademark Office
UNITED STATES DEPARTMENT OF COMMERCE
United States Patent and Trademark Office
Address: COMMISSIONER FOR PATENTS, P.O. Box 1450, Alexandria, Virginia 22313-1450, www.uspto.gov

APPLICATION NO.: 12/635,235
FILING DATE: 12/10/2009
FIRST NAMED INVENTOR: Mattias Nilsson
ATTORNEY DOCKET NO.: 335483-US-CIP
CONFIRMATION NO.: 8963

69316 7590 09/05/2017
MICROSOFT CORPORATION, ONE MICROSOFT WAY, REDMOND, WA 98052

EXAMINER: ORTIZ SANCHEZ, MICHAEL
ART UNIT: 2658
PAPER NUMBER:
NOTIFICATION DATE: 09/05/2017
DELIVERY MODE: ELECTRONIC

Please find below and/or attached an Office communication concerning this application or proceeding. The time period for reply, if any, is set in the attached communication. Notice of the Office communication was sent electronically on the above-indicated "Notification Date" to the following e-mail address(es): usdocket@microsoft.com, chriochs@microsoft.com

PTOL-90A (Rev. 04/07)

UNITED STATES PATENT AND TRADEMARK OFFICE
BEFORE THE PATENT TRIAL AND APPEAL BOARD

Ex parte MATTIAS NILSSON, SOREN VANG ANDERSON, and KOEN BERNARD VOS

Appeal 2017-005466
Application 12/635,235 [1]
Technology Center 2600

Before CARLA M. KRIVAK, HUNG H. BUI, and JON M. JURGOVAN, Administrative Patent Judges.

BUI, Administrative Patent Judge.

DECISION ON APPEAL

Appellants seek our review under 35 U.S.C. § 134(a) from the Examiner’s Final Rejection of claims 1–5 and 7–24, which are all the claims pending in the application. We have jurisdiction under 35 U.S.C. § 6(b).

We AFFIRM-IN-PART. [2]

[1] According to Appellants, the real party in interest is Skype. Br. 3.
[2] Our Decision refers to Appellants’ Appeal Brief (“Br.”) filed March 18, 2016; Examiner’s Answer (“Ans.”) mailed September 29, 2016; Final Office Action (“Final Act.”) mailed October 19, 2015; and original Specification (“Spec.”) filed December 10, 2009.
STATEMENT OF THE CASE

Appellants’ Invention

Appellants’ invention relates to a system and a method that regenerates wideband speech from narrowband speech, improves speech naturalness, and alleviates metallic artefacts. Spec. 1:8–10, 2:15–16, 3:10–11, 8:13–15, Title, and Abstract. According to Appellants, samples of a narrowband speech signal are received and modulated, via a modulation signal having a modulating frequency adapted to upshift each frequency in a narrowband frequency band (associated with the narrowband speech signal) by an amount determined by the modulating frequency. Abstract. The narrowband frequency band that is to be translated upwards is selected to be a band that is more likely to have a harmonic structure closer to that of a missing wideband speech portion. Spec. 4:18–20, 8:13–18, 11:11–15. For example, the narrowband frequency band can be selected to include frequencies with a specified signal characteristic, e.g., good signal-to-noise ratio, minimum echo, minimum distortion, and/or a certain degree of voicing. Spec. 5:5–10, 8:13–18, 9:25–30. The modulating frequency is similarly selected so as to upshift the selected narrowband frequency band into a target band of the missing wideband speech, thereby preserving harmonic structure when regenerating the wideband speech. Spec. 4:18–20, 5:13–14, 6:20–24, 7:23–25, Abstract.

Claims 1 and 16 are independent. Representative claim 1 is reproduced below with disputed limitations in italics:

1. A method implemented in a receiver to regenerate wideband speech from narrowband speech, the method comprising:
receiving, by a decoder of the receiver, samples of a narrowband speech signal in a first range of frequencies, the narrowband speech signal missing a portion of wideband speech from which the narrowband speech signal was generated;
evaluating signal characteristics of each of the frequencies in the first range, an evaluation indicating which of the frequencies in the first range when translated into a target band are determined to be more likely, based on the signal characteristics, to result in a reduced-artefact wideband speech signal having a harmonic structure that is closer to a harmonic structure of the portion of the wideband speech that is missing than other frequencies in the first range;
modulating the received samples of the narrowband speech signal with a modulation signal having a modulating frequency adapted to upshift each frequency in the first range of frequencies by the modulating frequency, the modulating frequency selected to translate into the target band a selected frequency band of the first range of frequencies that is selected according to the evaluation;
filtering the modulated samples using a target band filter to form a regenerated speech signal in the target band; and
combining the narrow band speech signal with the regenerated speech signal in the target band to regenerate the reduced-artefact wideband speech signal.

Br. 21–26 (Claims App’x).

Examiner’s Rejections & References

(1) Claims 1–5 and 7–15 stand rejected under 35 U.S.C. § 103(a) as being unpatentable over Iser et al. (US 2008/0195392 A1; published August 14, 2008; “Iser”) and Nilsson et al. (US 2003/0009327 A1; published January 9, 2003; “Nilsson”). Final Act. 4–12.

(2) Claims 16–24 stand rejected under 35 U.S.C. § 103(a) as being unpatentable over Iser, Nilsson, and Jax et al. (US 2003/0050786 A1; published March 13, 2003; “Jax”). Final Act. 12–18.

Issues on Appeal

Based on Appellants’ arguments, the dispositive issues on appeal are whether: (1) the combination of Iser and Nilsson teaches or suggests (i) “receiving, by a decoder of the receiver, samples of a narrowband speech signal in a first range of frequencies, the narrowband speech signal missing a portion of wideband speech from which the narrowband speech signal was generated” and (ii) “evaluating signal characteristics of each of the frequencies in the first range, an evaluation indicating which of the frequencies in the first range when translated into a target band are determined to be more likely, based on the signal characteristics, to result in a reduced-artefact wideband speech signal having a harmonic structure that is closer to a harmonic structure of the portion of the wideband speech that is missing than other frequencies in the first range,” as recited in Appellants’ independent claim 1 (Br. 12–15); and (2) the combination of Iser, Nilsson, and Jax teaches or suggests “determining a signal characteristic for selecting which of the frequencies in the first range of frequencies are to be translated into the target band . . . the signal characteristic that is determined being identifiable in the frequencies in the first range that are more likely, when translated according to a pitch-dependent spectral translation, to result in a regenerated wideband speech signal having a harmonic structure that approximates a harmonic structure of the portion of the wideband speech that is missing,” as recited in Appellants’ independent claim 16 (Br. 16–19).
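The “evaluating” limitation of claim 1, together with the selection of a modulating frequency “according to the evaluation,” can be pictured with a short Python sketch. It is an illustration under stated assumptions, not Appellants’ implementation: the per-frequency scores stand in for any of the signal characteristics the Specification lists (signal-to-noise ratio, echo, distortion, voicing), the “best contiguous band” rule and the function names `select_band` and `modulating_frequency` are hypothetical, and the pitch-dependent refinement of the shift discussed under claim 16 is left out. The target edge of 4 kHz and the resulting 2 kHz shift mirror the Specification’s example of translating 2–4 kHz into a 4–6 kHz target band.

```python
from typing import Sequence, Tuple


def select_band(scores: Sequence[float], band_bins: int) -> Tuple[int, float]:
    """Return (start_index, mean_score) of the contiguous run of `band_bins`
    scored frequencies whose mean score is highest.

    Hypothetical selection rule: scores[i] is some per-frequency signal
    characteristic (e.g. an SNR or voicing estimate) where larger is better.
    """
    if not 0 < band_bins <= len(scores):
        raise ValueError("band_bins must be between 1 and len(scores)")
    best_start, best_mean = 0, float("-inf")
    for start in range(len(scores) - band_bins + 1):
        mean = sum(scores[start:start + band_bins]) / band_bins
        if mean > best_mean:
            best_start, best_mean = start, mean
    return best_start, best_mean


def modulating_frequency(band_low_hz: float, target_low_hz: float) -> float:
    """Upshift (Hz) that moves the selected band's lower edge onto the target band's lower edge."""
    return target_low_hz - band_low_hz


# Usage with made-up scores on a 500 Hz grid of narrowband frequencies.
freqs_hz = [500.0, 1000.0, 1500.0, 2000.0, 2500.0, 3000.0, 3500.0]
scores = [0.2, 0.3, 0.4, 0.9, 0.8, 0.85, 0.1]
start, _ = select_band(scores, band_bins=3)
shift_hz = modulating_frequency(freqs_hz[start], target_low_hz=4000.0)
print(freqs_hz[start], shift_hz)  # 2000.0 2000.0
```

Averaging over a contiguous run is only one plausible way to turn per-frequency scores into a single translatable band; a threshold on each score would serve equally well for illustration.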
ANALYSIS

Claims 1–5 and 7–15

With respect to independent claim 1, the Examiner finds Iser’s bandwidth extension system regenerates wideband speech from narrowband speech using a spectral shifter for modulating a narrowband speech signal (received acoustic signal) with a modulation signal having a modulating frequency (of a predetermined shifting frequency value), the modulating frequency adapted to upshift a frequency range of the received acoustic signal “so that the shifted signal covers a frequency range suitable for complementing the [received] acoustic signal.” Final Act. 5 (citing Iser ¶ 26). The Examiner further finds Iser filters modulated samples to form a regenerated speech signal (upper bandwidth extension signal), and then combines the narrowband speech signal (received acoustic signal) with the regenerated speech signal, as claimed. Final Act. 5 (citing Iser ¶¶ 30, 42). To support the conclusion of obviousness, the Examiner relies on Nilsson for teaching a narrowband speech signal missing a portion of wideband speech from which the narrowband speech was generated, as recited in claim 1. Final Act. 6 (citing Nilsson ¶ 82). The Examiner also finds Nilsson extracts at least one essential attribute from the narrowband speech signal, thereby evaluating signal characteristics of each of the frequencies in a first narrowband range, as claimed. Ans. 18 (citing Nilsson ¶ 82). The Examiner further finds Nilsson’s extracted essential attribute(s) from the narrowband speech signal provide confidence levels and estimated parameter values for wideband frequency components that are missing from the narrowband speech signal, thereby teaching indicating which narrowband frequencies, when translated, are determined to be more likely to result in a reduced-artefact wideband speech signal, as claimed. Final Act. 6–7 (citing Nilsson ¶¶ 22, 83–84); Ans. 18–19.
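The modulate, filter, and combine chain that the Examiner reads on Iser can be sketched in a few lines of Python. The sketch assumes an already-upsampled 12 kHz frame (the rate used in the Specification’s example), cosine modulation, and an idealised DFT-masking band-pass standing in for a practical target band filter; none of these choices is drawn from Iser or the Specification, and the helper names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform (adequate for one short frame)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[complex]:
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def target_band_filter(x: Sequence[float], fs: float, lo: float, hi: float) -> List[float]:
    """Idealised band-pass: zero every DFT bin whose |frequency| lies outside [lo, hi]."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        freq = min(k, n - k) * fs / n  # bins above n/2 hold negative frequencies
        if not lo <= freq <= hi:
            spectrum[k] = 0
    return [v.real for v in idft(spectrum)]


def regenerate(nb: Sequence[float], fs: float, f_mod: float,
               lo: float, hi: float) -> List[float]:
    """Modulate, filter to the target band, then add back the narrowband input."""
    # Cosine modulation moves each component f to f + f_mod (and to |f - f_mod|).
    modulated = [s * math.cos(2 * math.pi * f_mod * t / fs) for t, s in enumerate(nb)]
    # The target band filter keeps only the upshifted copy inside [lo, hi].
    high_band = target_band_filter(modulated, fs, lo, hi)
    return [a + b for a, b in zip(nb, high_band)]


# Usage: one 20 ms frame (240 samples at 12 kHz) holding a 2.5 kHz tone.
fs, n = 12000.0, 240
nb = [math.cos(2 * math.pi * 2500.0 * t / fs) for t in range(n)]
wb = regenerate(nb, fs, f_mod=2000.0, lo=4000.0, hi=6000.0)
spectrum = dft(wb)
bin_hz = fs / n  # 50 Hz per bin
print(round(abs(spectrum[int(4500 / bin_hz)]), 6))  # 60.0: upshifted copy at 4.5 kHz
```

Cosine modulation also yields a downshifted copy (here at 500 Hz); in this sketch the target band filter is what discards it, leaving the original 2.5 kHz tone plus a half-amplitude copy at 4.5 kHz.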
Nilsson’s Figure 9, as described in paragraphs 81–84 of Nilsson, is reproduced below with additional markings for illustration.

Nilsson’s Figure 9 illustrates a method of producing a wideband acoustic signal on the basis of a narrowband acoustic signal. Nilsson ¶ 81.

Appellants dispute the Examiner’s factual findings regarding Nilsson and Iser. Particularly, Appellants contend the combination of Nilsson and Iser does not teach or suggest the claimed “receiving, by a decoder of the receiver, samples of a narrowband speech signal in a first range of frequencies, the narrowband speech signal missing a portion of wideband speech from which the narrowband speech signal was generated.” Br. 12. Appellants further argue “[t]here is simply no mention in Nilsson of indicating which frequencies can be translated into a target frequency band to produce an artefact-reduced wideband speech signal.” Br. 13. Rather, according to Appellants, Nilsson merely extracts an “essential attribute” from a narrowband acoustic signal to obtain “parameters that estimate characteristics of a high band speech signal,” which does not teach the claimed evaluation indicating frequencies that are more likely, when translated, to produce an artefact-reduced wideband signal. Br. 12–13 (citing Nilsson ¶ 82).

Appellants also dispute the Examiner’s rationale for combining Nilsson and Iser. Br. 13–15. Particularly, Appellants argue the Examiner’s rationale for combining the references is “too general” and “does not address why the skilled artisan . . . . would have selected particular components for combination in the manner claimed.” Br. 14–15.

We do not find Appellants’ arguments persuasive. Instead, we find the Examiner has provided a comprehensive response to Appellants’ arguments supported by a preponderance of evidence. Ans. 17–19. As such, we adopt the Examiner’s findings and explanations provided therein. Id.
For additional emphasis, we note Appellants do not address the Examiner’s rejection based on a combination of references, in which the Examiner relied on Iser for “receiving, by a decoder of the receiver, samples of a narrowband speech signal in a first range of frequencies,” and Nilsson for a “narrowband speech signal missing a portion of wideband speech from which the narrowband speech signal was generated.” Final Act. 4, 6 (citing Iser ¶ 21; Nilsson ¶¶ 22, 82 (“A first step 901 receives a segment of the incoming narrow-band acoustic signal. . . . The wide-band acoustic signal includes wide-band frequency components outside the spectrum of the narrow-band acoustic signal”)); Ans. 18. We agree with these reasonable findings of the Examiner, which Appellants do not address.

We further agree with the Examiner that Nilsson’s “extract[ing] at least one essential attribute from the narrow-band acoustic signal” teaches “evaluating signal characteristics of each of the frequencies” in the first range as recited in Appellants’ claim 1. Ans. 18 (citing Nilsson ¶ 82). Nilsson’s “essential attribute” (parameter z_nb) “describes particular properties of the received narrow-band acoustic signal a_NB” and may include “[t]he degree of voicing r, which represents one such essential feature.” See Nilsson ¶ 47. Thus, Nilsson’s “essential attribute” represents a “signal characteristic” of each of the frequencies in a “first range of frequencies,” as claimed. Further, Nilsson’s confidence level determines an allocation of signal energies (estimated parameters) to frequency components, thereby indicating frequencies that are more likely to appear in a wideband speech signal. See Nilsson ¶¶ 17, 64 (the confidence level “can thus also be used to control the energy (or shape) of the bandwidth extended regions . . . of the wide-band acoustic signal,” such that “a relatively high energy is allocated to frequency components being associated with a confidence level that represents a comparatively high degree certainty” and “a relatively low energy is allocated to frequency components if the confidence level being associated with a confidence level that represents a comparatively low degree certainty” (emphases added)), 68. Thus, Nilsson’s confidence levels and estimated parameters (signal energies) indicate which frequencies are determined to be more likely to result in a reduced-artefact wideband speech signal, as recited in Appellants’ claim 1. Ans. 18–19; Final Act. 6–7.

We are also not persuaded by Appellants’ argument that “[t]here is simply no mention in Nilsson of indicating which frequencies can be translated into a target frequency band to produce an artefact-reduced wideband speech signal” (see Br. 13 (emphasis added)). Particularly, Nilsson discloses that a model of energy dependencies between narrowband and wideband controls the signal energies allocated to regenerate a high quality wideband speech signal from narrowband speech by spectral folding. See Nilsson ¶¶ 22, 35, 46, 74 (describing spectral folding), Fig. 3. Thus, we agree with the Examiner that Nilsson teaches an evaluation indicating which frequencies, when translated, are determined to be more likely to result in reduced-artefact wideband speech, as claimed. Ans. 18–19; Final Act. 6–7. Additionally, Iser discloses “shift[ing] the portion of the received acoustic signal x(n) that is above a predetermined lower frequency value and/or below a predetermined upper frequency value” so that “the shifted signal covers a frequency range suitable for complementing the received acoustic signal x(n),” also suggesting an evaluation indicating which frequency translations are suitable for extending a narrowband speech signal x(n). See Iser ¶¶ 21, 26–27.
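Because Nilsson regenerates the high band by spectral folding while Iser and the claims translate (shift) frequencies, a short arithmetic sketch may help show what “harmonic structure” means in this context. It assumes folding in the common sense of zero-insertion upsampling of an 8 kHz narrowband signal, which mirrors a component at f Hz to 8000 - f Hz, and borrows the 180 Hz pitch and 1980 Hz shift from the Specification’s example quoted in the claim 16 discussion. The function names are hypothetical, and the sketch illustrates terminology only.

```python
from typing import List, Sequence


def fold(freqs_hz: Sequence[float], nb_fs_hz: float = 8000.0) -> List[float]:
    """Spectral folding by zero-insertion upsampling (x2): a component at f
    reappears mirrored about nb_fs/2, i.e. at nb_fs - f."""
    return [nb_fs_hz - f for f in freqs_hz]


def translate(freqs_hz: Sequence[float], shift_hz: float) -> List[float]:
    """Spectral translation (modulation): every component moves up by shift_hz."""
    return [f + shift_hz for f in freqs_hz]


def on_harmonic_grid(freqs_hz: Sequence[float], f0_hz: float, tol_hz: float = 1e-6) -> bool:
    """True if every frequency is (within tol_hz) an integer multiple of f0_hz."""
    return all(abs(f - round(f / f0_hz) * f0_hz) < tol_hz for f in freqs_hz)


f0 = 180.0
harmonics = [k * f0 for k in range(12, 23)]   # 2160 ... 3960 Hz, inside 2-4 kHz
shifted = translate(harmonics, 1980.0)        # 1980 Hz = 11 * 180 Hz
folded = fold(harmonics)
print(on_harmonic_grid(shifted, f0), on_harmonic_grid(folded, f0))  # True False
print(folded[0] % f0)  # 80.0: folded harmonics sit 80 Hz off the pitch grid
```

Shifting by an integer multiple of the pitch maps each harmonic onto another harmonic (here into 4140–5940 Hz), whereas mirroring about 4 kHz places the images 80 Hz off the 180 Hz grid for this particular pitch.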
As to Appellants’ argument that the Examiner’s combination of Nilsson and Iser is improper (Br. 13–15), the Examiner has articulated sufficient reasoning for refining Iser’s frequency-upshifting technique for speech extension with Nilsson’s wideband speech extension based on human voice characteristics, to improve the quality of reconstructed speech, a concern shared by Iser and Nilsson. Final Act. 7 (citing Nilsson ¶ 22); see also Iser ¶ 6; Nilsson ¶¶ 7, 13–14, 55. In particular, Iser and Nilsson both teach the desirability of reducing artefacts in reconstructed speech, which logically links the references together. See Iser ¶ 6; Nilsson ¶¶ 13, 22. Enabling wideband speech reconstruction with reduced artefacts, as taught by both references, would have been a desirable feature to a person of ordinary skill, and these teachings provide sufficient rationale to combine the references. In sum, there was “an apparent reason to combine the known elements in the fashion claimed.” KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 417–18 (2007). We find the Examiner provides “some articulated reasoning with some rational underpinning to support the legal conclusion of obviousness.” Id. at 418 (quoting In re Kahn, 441 F.3d 977, 988 (Fed. Cir. 2006)). Thus, we agree with the Examiner that the combination of Iser and Nilsson teaches the “evaluating” step recited in Appellants’ claim 1.

Based on this record, we are not persuaded of Examiner error. Accordingly, we sustain the Examiner’s obviousness rejection of independent claim 1, and the rejection of its dependent claims 2–5 and 7–15, for which no substantive arguments are provided. Br. 15.
Claims 16–24

Independent claim 16 recites, inter alia:

means for controlling which frequencies in the first range of frequencies are to be translated into a target band, the controlling including determining a signal characteristic for selecting which of the frequencies in the first range of frequencies are to be translated into the target band, the signal characteristic comprising one of a minimum echo, a minimum pre-processor distortion, or a minimum degree of voicing, and the signal characteristic that is determined being identifiable in the frequencies in the first range that are more likely, when translated according to a pitch-dependent spectral translation, to result in a regenerated wideband speech signal having a harmonic structure that approximates a harmonic structure of the portion of the wideband speech that is missing.

Br. 24 (emphasis added). Appellants’ Specification further describes determining a signal characteristic for selecting frequencies in a narrowband range that are more likely, when translated according to a pitch-dependent spectral translation, to result in regenerated wideband speech without harmonic distortions (e.g., metallic-sounding artefacts). See Spec. 6:20–28, 7:20–8:3, 8:13–18, 9:19–10:2, 11:11–15. According to Appellants’ Specification:

[A] pitch dependent spectral translation translates a frequency band (a range of frequencies from the narrowband speech signal) into a target frequency band with properly preserved harmonics. In the embodiment . . . the range of the frequencies from 2-4kHz is translated to the target frequency band of between 4 and 6kHz. However, it will be clear from the following that these can be selected differently without diverging from the concepts of the invention. . . . The modulation frequency fmod is determined such as to preserve the harmonic structure in the regenerated excitation high band. In the present implementation, the modulating frequency is normalised by the sampling frequency. Taking the specific example, consider the pitch frequency to be 180Hz, then the closest frequency to 2kHz that is an integer multiple of the pitch frequency is floor([2000]/180)*180 (1980Hz). Normalised by [12000]Hz it becomes 0.165. For a sampling frequency (after upsampling) of 12kHz and a value of 2kHz of the frequency shift, the frequency fmod can be expressed as fmod=floor(p/6)/p, where p represents the fractional pitch-lag. . . . The frequency band of the narrow band speech x which is translated can be selected to alleviate metallic artefacts by selection of a frequency band that is more likely to have harmonic structure closer to that of the missing (high) frequency band by selection of a frequency band that includes frequencies showing an identified signal characteristic. . . . The signal characteristic can be chosen from a number of different possibilities. . . . [A] possibility is that the block 30 determines the degree of voicing. According to one example, a measure of the degree of voicing can be the normalised correlation between the signal inside a frequency band and the same signal one pitch-cycle earlier. Smoothed versions of this measure can also be used to determine whether or not a frequency should be included in the first range of frequencies for translation.

Spec. 6:20–28, 7:20–8:3, 8:13–18, 9:19–10:2 (emphases added).
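The Specification’s modulating-frequency formula and its voicing measure can be checked with a brief Python sketch. The function names are hypothetical; the formula f_mod = floor(p/6)/p is taken from the quoted passage (12 kHz sampling, 2 kHz nominal shift), while the integer pitch lag in the voicing function and the assumption that its input is already limited to one frequency band are simplifications not found in the Specification.

```python
import math
from typing import Sequence


def fmod_from_pitch_lag(p: float) -> float:
    """Normalised modulating frequency f_mod = floor(p / 6) / p, per the quoted
    passage for fs = 12 kHz and a 2 kHz shift; p is the pitch lag in samples."""
    return math.floor(p / 6) / p


def fmod_from_pitch_hz(f0_hz: float, fs_hz: float = 12000.0, shift_hz: float = 2000.0) -> float:
    """Same quantity, spelled out: largest pitch multiple not above shift_hz, divided by fs."""
    return math.floor(shift_hz / f0_hz) * f0_hz / fs_hz


def degree_of_voicing(x: Sequence[float], lag: int) -> float:
    """Normalised correlation between x[n] and x[n - lag], i.e. the signal one
    pitch cycle earlier. Assumes x is already restricted to one frequency band
    and uses an integer lag (the Specification mentions a fractional pitch lag)."""
    if not 0 < lag < len(x):
        raise ValueError("lag must be between 1 and len(x) - 1")
    cur, prev = x[lag:], x[:-lag]
    num = sum(a * b for a, b in zip(cur, prev))
    den = math.sqrt(sum(a * a for a in cur) * sum(b * b for b in prev))
    return num / den if den > 0 else 0.0


# Specification example: 180 Hz pitch at 12 kHz -> p = 12000 / 180, about 66.67 samples.
p = 12000.0 / 180.0
print(round(fmod_from_pitch_lag(p), 6), round(fmod_from_pitch_hz(180.0), 6))  # 0.165 0.165
print(round(fmod_from_pitch_lag(p) * 12000.0, 6))  # 1980.0 (Hz shift)

# A strictly periodic band signal (period 60 samples) is fully "voiced" at its own lag.
tone = [math.sin(2 * math.pi * t / 60) for t in range(600)]
print(round(degree_of_voicing(tone, 60), 6))  # 1.0
```

Since p = fs/f0, floor(p/6)/p equals floor(2000/f0) * f0 / 12000, so the two functions agree; with a 180 Hz pitch the resulting 1980 Hz shift is 11 × 180 Hz, which is how the translation keeps shifted harmonics on the pitch grid.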
Appellants assert Iser, Nilsson, and Jax do not discuss a “pitch-dependent spectral translation,” and do not teach or suggest a determined signal characteristic that is “identifiable in the frequencies in the first range that are more likely, when translated according to a pitch-dependent spectral translation, to result in a regenerated wideband speech signal having a harmonic structure that approximates a harmonic structure of the portion of the wideband speech that is missing,” as recited in claim 16. Br. 16, 18 (emphases added). Thus, Appellants argue the combination of Iser, Nilsson, and Jax does not teach or suggest all features of claim 16. Br. 16–18.

We agree with Appellants. The Examiner has not responded to Appellants’ arguments in the Answer. The Examiner, in the Final Action, merely states that “Iser does not teach . . . the signal characteristic that is determined being identifiable in the frequencies in the first range that are more likely, when translated according to a pitch-dependent spectral translation, to result in a regenerated wideband speech signal,” without further addressing the claimed “pitch-dependent spectral translation.” Final Act. 13–14 (emphasis added). We have also reviewed the cited portions of Nilsson and Jax and do not find that they disclose a signal characteristic identifiable based on “a pitch-dependent spectral translation” of frequencies in a range, as claimed. See Final Act. 14–16 (citing Nilsson ¶¶ 22, 82–84; Jax ¶¶ 73–75, 80, 171). Absent evidence of such teachings, or reasoning as to why it would have been obvious to modify Iser, Nilsson, and Jax to distinguish a signal characteristic for selecting narrowband frequencies based on a pitch-dependent spectral translation, on the record before us, we do not sustain the Examiner’s rejection of claim 16 and its dependent claims 17–24 under 35 U.S.C. § 103(a) over Iser, Nilsson, and Jax.
CONCLUSION

On the record before us, we conclude Appellants have demonstrated the Examiner erred in rejecting claims 16–24, but not claims 1–5 and 7–15.

DECISION

As such, we AFFIRM the Examiner’s Final Rejection of claims 1–5 and 7–15. However, we REVERSE the Examiner’s Final Rejection of claims 16–24.

No time period for taking any subsequent action in connection with this appeal may be extended under 37 C.F.R. § 1.136(a)(1)(iv).

AFFIRMED-IN-PART