Ex Parte Dockser
Board of Patent Appeals and Interferences
Nov 2, 2009
11150501 (B.P.A.I. Nov. 2, 2009)

UNITED STATES PATENT AND TRADEMARK OFFICE
____________
BEFORE THE BOARD OF PATENT APPEALS AND INTERFERENCES
____________
Ex parte KENNETH ALAN DOCKSER
____________
Appeal 2009-002099
Application 11/150,501¹
Technology Center 2100
____________
Decided: November 3, 2009
____________

Before LEE E. BARRETT, JOSEPH L. DIXON, and THU A. DANG, Administrative Patent Judges.

BARRETT, Administrative Patent Judge.

DECISION ON APPEAL

This is a decision on appeal under 35 U.S.C. § 134(a) from the final rejection of claims 9, 12, 14, 16, and 20-25. Claims 1-8, 10, 11, 13, 15, and 17-19 have been canceled. We have jurisdiction pursuant to 35 U.S.C. § 6(b).

We reverse.

¹ Filed June 9, 2005, titled "Software Selectable Adjustment of SIMD Parallelism." The real party in interest is Qualcomm Incorporated.

STATEMENT OF THE CASE

The invention

The invention relates to selectively controlling active status (active and inactive) of one or a number of parallel data processing elements, e.g. of a Single Instruction, Multiple Data (SIMD) processor, based on software instructions, to conserve power for low power applications. Spec. ¶ [0001].

For example, in a 128-bit SIMD design, the processing elements might be two 64-bit SIMD arithmetic logic units (ALUs). When program operations require less than the full 128-bit width of the data path, a software instruction of the program changes the mode to 64-bit. One ALU that is not needed can be shut down to conserve power and the other ALU executes instructions for the 64-bit wide data path. Even in the 64-bit mode when one ALU is inactive, the processor may handle instructions for processing of 128-bit data by expanding that SIMD instruction to two instructions calling for processing of data of the 64-bit width. Spec. ¶ [0013].
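By way of illustration only, the 128-bit example above can be modeled in a few lines of Python. This is a rough sketch under stated assumptions, not the implementation disclosed in the Specification: the names `Alu64`, `Processor`, `set_mode`, and `simd_add128`, the 32-bit lane width, and the choice of an add operation are all hypothetical and do not appear in the record.

```python
from dataclasses import dataclass

MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


@dataclass
class Alu64:
    """One 64-bit SIMD ALU, modeled as two 32-bit lanes (lane width is assumed)."""
    powered: bool = True

    def add(self, a: int, b: int) -> int:
        if not self.powered:
            raise RuntimeError("ALU is powered down")
        lo = ((a & MASK32) + (b & MASK32)) & MASK32   # low lane, wraps at 32 bits
        hi = ((a >> 32) + (b >> 32)) & MASK32         # high lane, wraps at 32 bits
        return (hi << 32) | lo


class Processor:
    """Hypothetical 128-bit SIMD data path built from two 64-bit ALUs."""

    def __init__(self) -> None:
        self.alus = [Alu64(), Alu64()]  # index 0 = "first" ALU, index 1 = "second" ALU
        self.width = 128                # active data path width in bits
        self.issued = []                # which ALU executed each 64-bit operation

    def set_mode(self, width: int) -> None:
        """Stand-in for the mode change instruction (name is an assumption)."""
        if width not in (64, 128):
            raise ValueError("unsupported data path width")
        self.width = width
        self.alus[0].powered = (width == 128)  # power the first ALU down or back up

    def simd_add128(self, a: int, b: int) -> int:
        """Execute a 128-bit SIMD add in whichever mode is active."""
        result = 0
        for i, shift in enumerate((0, 64)):
            # 128-bit mode: half i goes to ALU i (conceptually in parallel).
            # 64-bit mode: the instruction is expanded into two 64-bit
            # operations that run in sequence through the second ALU.
            unit = i if self.width == 128 else 1
            self.issued.append(unit)
            part = self.alus[unit].add((a >> shift) & MASK64, (b >> shift) & MASK64)
            result |= part << shift
        return result
```

In this model the same 128-bit instruction yields the same result in either mode; only the number of active ALUs and the number of sequential steps differ.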
At a later time, when the added capacity is needed, execution of another software instruction sets the mode of operation to that of the wider data path, typically the full width, and the mode change reactivates the previously shut-down processing element. Spec. ¶ [0020]. The ALU may be disabled by cutting off the clock signal, Spec. ¶ [0037], or by cutting off the power, Spec. ¶ [0038].

The claims

Claim 9 is reproduced below:

9. A method of processing a Single Instruction, Multiple Data (SIMD) instruction when the SIMD instruction requires a data path width greater than an active data path width in a parallel data processor, comprising:

    executing one or more instructions in parallel in at least two parallel arithmetic logic units of the data processor, so as to process data of a first width, said at least two parallel arithmetic logic units are SIMD type arithmetic logic units;

    upon execution of a mode change instruction, powering down a first one of the two parallel arithmetic logic units to conserve power; and

    while the first arithmetic logic unit is inactive, executing one or more instructions in a second one of the two parallel arithmetic logic units, so as to process data of a second width smaller than the first width;

    receiving a SIMD instruction calling for processing of data of the first width;

    expanding the SIMD instruction in response to the received SIMD instruction calling for processing of data of the first width to at least two instructions calling for processing of data of the second width; and

    executing the two instructions resulting from the expansion in sequence through the second arithmetic logic unit.

The references

Gschwind US 2003/0037221 A1 Feb. 20, 2003
Giernalczyk US 2004/0254965 A1 Dec. 16, 2004
Raman et al., Implementing Streaming SIMD Extensions on the Pentium III Processor, IEEE Micro, Vol. 20, No. 4, pp. 47-57 (July/Aug. 2000) ("Raman.").
The rejection

Claims 9, 12, 14, 16, and 20-25 stand rejected under 35 U.S.C. § 103(a) as unpatentable over Giernalczyk, Gschwind, and Raman.

The Examiner finds that the instructions in Giernalczyk change the width of the data path and are "mode change instructions" that make some ALUs inactive, but Giernalczyk does not teach making the ALUs inactive by powering them down to conserve power. Final Rej. 3. The Examiner finds that Gschwind teaches powering down functional units when not required to execute an instruction and concludes that one of ordinary skill in the art would have been motivated to incorporate the teachings of Gschwind into Giernalczyk to reduce power consumption. Id. The Examiner interprets "mode change instruction" as not precluding the instruction from performing other functions, such as specifying an ordinary operation, in addition to causing powering down of an ALU. Ans. 4.

The Examiner finds that Giernalczyk and Gschwind do not teach, while an ALU is inactive, expanding an instruction of a first width into at least two instructions calling for processing of data of a second width smaller than the first width. Final Rej. 4. The Examiner finds that Raman teaches expanding a 128-bit instruction to 64-bit instructions and concludes that it would have been obvious to incorporate the teachings of Raman into Giernalczyk and Gschwind to allow a system to process larger instructions. Id.

PRINCIPLE OF LAW

Obviousness requires that the combination of references teach or suggest to a person of ordinary skill in the art all of the claim limitations. 35 U.S.C. § 103(a).
ISSUE

Has Appellant shown that the Examiner erred in concluding that the combination of Giernalczyk, Gschwind, and Raman would have taught or suggested to one of ordinary skill in the art a method of processing data of a first width "in parallel in at least two parallel arithmetic logic units of the data processor" and (1) "upon execution of a mode change instruction, powering down a first one of the two parallel arithmetic logic units to conserve power," and (2) "while the first arithmetic logic unit is inactive" "receiving a SIMD instruction calling for processing of data of the first width; expanding the SIMD instruction . . . calling for processing of data of the first width to at least two instructions calling for processing of data of the second width; and executing the two instructions resulting . . . through the second arithmetic logic unit" as recited in claim 9?

Claims 12, 16, and 20 contain similar limitations.

FACTS

Giernalczyk

Giernalczyk describes means for creating SIMD based processing units whose data path width can be varied to match the word length of the data word to be processed. ¶ [0011].

As shown in Figure 1, a data processor 1 comprises a computational unit (CU) 3 having a plurality of SIMD based processing elements (PEs) 5, 7, 9, 11, each of which has an arithmetic logic unit (ALU) 15, 17, 19, 21, and a control circuit 31 (also called a boundary circuit) which controls the transmission of data to one or more processing elements within the processor to enable the processor to operate on multiple bit data. ¶¶ [0027]-[0030].

An aspect of the invention is to "provide a system for grouping SIMD based processing elements into a processing unit whose bit width can be varied to match that that of the word being computed." ¶ [0036].

An extension circuit 111, Figure 2A, is used to combine computational units to widen the selective data path. "For example, the extension circuit allows two N-bit computational units 103, 105 to be combined such that a 2N bit wide processing unit is formed." ¶ [0037].

"The extension circuit 223 has a first operating mode or state, in which the first and second CUs are decoupled and have their individual parallel processing capability, and a second operating mode or state, which couples the CUs together to combine their parallel processing capability, i.e. for parallel processing a word having length which is the sum of the lengths of the words that they can parallel process individually." ¶ [0047].

"[I]n the coupled mode, the extension circuit provides the required connections to integrate the two arrays of processor elements of the first and second CUs into a parallel processor having the combined number of PEs." ¶ [0050].

Gschwind

Gschwind describes a processor implementation in which scalar and vector processing components are merged to reduce complexity. Abstract.

A vector (SIMD) instruction identifies a single operation to be performed on a plurality of data elements and a scalar instruction identifies a single operation to be performed on a single data element. ¶ [0016].

As shown in Figure 3, a scalar-vector register file (SVRF) 310 stores data lines consisting of four scalar data words, such that each of the four data words is associated with a different functional unit pair 311-314, each functional unit pair consisting of a fixed point unit (FXU) and a floating point unit (FPU). ¶ [0029]. Issue logic directs the SVRF 310 to pass the contents of a data line to the appropriate functional unit within each set of functional units. ¶ [0030].

Scalar processing occurs in one of the units, designated as the "preferred slot." ¶ [0025].

Scalar and vector operations are identified by the operation code. ¶ [0033].

"If scalar processing is selected, then functional units other than the functional units associated with the preferred slot will enter a power-saving operation mode and not produce valid results." ¶ [0041].

"Selected functional units can be enabled or disabled in accordance with the invention by enabling or disabling the clock input for the functional unit, or, in an alternative embodiment, by disabling the power input for the functional unit." ¶ [0042].

Raman

Raman describes implementing streaming SIMD extensions (SSE), a set of processor instructions designed to boost performance of multimedia and Internet applications. P. 47.

Raman describes:

    The Pentium III implements each four-wide (128-bit) SSE computational macroinstruction as two, two-wide (64-bit) microinstructions. . . . The instruction decoder transforms each 128-bit micro-operation into a pair of 64-bit internal micro-operations. . . . This approach avoids the massive and intrusive changes of adding 128-bit buses, 128-bit execution units, and register renaming in both the in-order and out-of-order portions of the machine.

P. 51.

ANALYSIS

1. "Execution of a mode change instruction" causing "powering down a first one of the two parallel arithmetic logic units to conserve power"

Appellant argues that Giernalczyk does not teach a mode change instruction, which upon execution, "power[s] down a first one of the two parallel arithmetic logic units to conserve power" as recited in claim 9, but merely adjusts the data path on a per instruction basis. Br. 13. It is argued that "Giernalczyk does not address a fundamental problem of conserving power associated with at least two parallel arithmetic units." Id.
It is also argued that "Giernalczyk does not address another fundamental problem which occurs when one of the at least two parallel arithmetic units is powered down in response to a mode change instruction, then adapting computation of a data type larger than supported by the remaining active arithmetic unit." Id. at 14.

As noted by the Examiner, the rejection does not rely on Giernalczyk for powering down an ALU or, while the ALU is inactive, expanding an instruction calling for processing data of a first width into at least two instructions. Ans. 21-22.

Appellant argues that Gschwind obtains power saving by powering down functional units not required by an instruction to be executed based on decoded information associated with the instruction. Br. 14. It is argued that Gschwind, like Giernalczyk, does not teach a mode change instruction, which upon execution, "power[s] down a first one of the two parallel arithmetic logic units to conserve power" as recited in claim 9, but merely adjusts the data path on a per instruction basis. Id. By contrast, it is argued that the invention may have different data path widths for different executions of the same instruction. Id.

The Examiner finds that each instruction of Giernalczyk and Gschwind can be a mode change instruction because the claim is open-ended and does not preclude other functions. Ans. 4-5, 23.

We agree with the Examiner that as to the limitation under consideration of "upon execution of a mode change instruction, powering down a first one of the two parallel arithmetic logic units to conserve power," the instructions in Giernalczyk and Gschwind are "mode change instructions" in that they change the mode of operation between two data path widths. The instructions in Gschwind are "mode change instructions" as claimed because they power down (as well as power up) a functional unit as well as changing data path widths.
The fact that the instruction in Gschwind is interpreted by the system to change data path widths in addition to controlling power is not precluded by the claim, which is open-ended; i.e., a "mode change instruction" is not limited to only powering down an ALU. Appellant's argument about different data path widths for different executions of the same instruction refers to the second limitation of expanding an instruction while the first ALU is inactive.

It is not clear exactly why the Examiner uses Giernalczyk. To the extent that Giernalczyk shows ALUs and Gschwind shows "functional units," we do not think Appellant argues that the functional units are not part of or equivalent to an ALU. Nevertheless, we agree with the Examiner that it would have been obvious to control power to ALUs in Giernalczyk given the teachings of Gschwind.

The limitation of "powering down a first one of the two parallel arithmetic logic units to conserve power" in claim 9 would have been obvious over the combination of Giernalczyk and Gschwind.

2. "While the first arithmetic logic unit is inactive . . . expanding the SIMD instruction . . . calling for processing of data of the first width to at least two instructions calling for processing of data of the second width"

Neither Giernalczyk nor Gschwind, singly or in combination, describes or suggests processing data of a first width while one of the ALUs or functional units is inactive (powered down) by expanding a SIMD instruction calling for processing of data of the first width to at least two instructions calling for processing of data of the second width. These references do not teach or suggest adapting computation of a data type larger than the active ALU, but utilize the same data path each time the same instruction type is executed.
Appellant argues that Raman does not recognize a need for powering down an execution unit to conserve power since only one 64-bit execution unit is used in Raman. Br. 16-17.

The Examiner responds that Raman is not relied upon for powering down an execution unit. Ans. 25.

While we understand the Examiner's point that Raman is not relied upon to show powering down an execution unit, or using less than all of the execution units, the fact that Raman does not switch between a 128-bit mode and a 64-bit mode is relevant to the question of motivation discussed next.

Appellant argues that Raman expands all 128-bit SSE instructions to 64-bit instructions regardless of the state of the execution units whereas "the present invention supports 128-bit data path operations to obtain performance and supports 64-bit data path operations to conserve power through use of a mode change instruction, none of which are taught or made obvious by Raman." Br. 17. The Reply Brief argues at length that the combination of references does not teach or suggest expanding instructions while a first ALU is inactive. Reply Br. 2-5.

The Examiner responds that one skilled in the art would appreciate the ability to have wider instructions split into two narrow instructions in order to conserve power and "allow for longer battery life in mobile systems." Ans. 26. In the statement of the rejection in the Answer, the Examiner states for the first time:

    One of ordinary skill in the art would appreciate the ability to execute a wide instruction as two narrow instructions in order to save power. . . . In the event of a mobile system it would be advantageous to exchange a small amount of processor performance (all 128-bit instructions now take twice as long, but 64-bit instructions would remain unaffected) for a significant improvement in power efficiency (half of the execution units are always powered down). This would allow the battery life to be extended in a mobile system.

Ans. 10-11.

This reasoning about making the modification to save power is new. It does not appear that the reasoning is correct since executing 128-bit instructions as 64-bit instructions would require the same amount of power (or even more due to inefficiencies due to circuitry needed to split the 128-bit instructions), just on fewer execution units. Power savings are realized only when instructions require only a 64-bit execution unit and the other one can be turned off, as taught in Gschwind. Thus, we do not agree with the Examiner's motivation for the modification.

Giernalczyk and Gschwind do not teach or suggest executing the same instruction using different data path widths and therefore do not teach or suggest expanding an instruction of a first width into instructions of a second width while an ALU is inactive (powered down). Raman describes that all instructions are expanded and does not suggest that some instructions are expanded and some are not, much less expanding instructions while an ALU is inactive. Accordingly, we do not find any reason why one of ordinary skill in the art would have modified Giernalczyk and Gschwind to expand instructions while one of the ALUs is inactive.

CONCLUSION

Appellant has shown that the Examiner erred in concluding that the combination of Giernalczyk, Gschwind, and Raman would have taught or suggested to one of ordinary skill in the art a method of processing data of a first width "in parallel in at least two parallel arithmetic logic units of the data processor" and (1) "upon execution of a mode change instruction, powering down a first one of the two parallel arithmetic logic units to conserve power," and (2) "while the first arithmetic logic unit is inactive" "receiving a SIMD instruction calling for processing of data of the first width; expanding the SIMD instruction . . . calling for processing of data of the first width to at least two instructions calling for processing of data of the second width; and executing the two instructions resulting . . . through the second arithmetic logic unit" as recited in claim 9.

The rejection of claim 9 is reversed. Claims 12, 16, and 20 contain similar limitations. The rejection of claims 12, 16, and 20, and dependent claims 14 and 21-25 is reversed.

REVERSED

erc

QUALCOMM INCORPORATED
5775 MOREHOUSE DR.
SAN DIEGO, CA 92121