Ex Parte Richtarsky et al., Appeal 2018-007580, Application 13/874,327 (P.T.A.B. May 28, 2019)

UNITED STATES PATENT AND TRADEMARK OFFICE
UNITED STATES DEPARTMENT OF COMMERCE
Address: COMMISSIONER FOR PATENTS, P.O. Box 1450, Alexandria, Virginia 22313-1450, www.uspto.gov

Application No.: 13/874,327    Filing Date: 04/30/2013    First Named Inventor: Martin Richtarsky
Attorney Docket No.: 34874-870001US    Confirmation No.: 8751
Examiner: VO, CECILE H.    Art Unit: 2153
Notification Date: 05/30/2019    Delivery Mode: ELECTRONIC
Correspondence: Mintz Levin/SAP, Mintz Levin Cohn Ferris Glovsky and Popeo, P.C., One Financial Center, Boston, MA 02111
Notice of the Office communication was sent electronically on the above-indicated "Notification Date" to: IPDocketingBOS@mintz.com, IPFileroombos@mintz.com

UNITED STATES PATENT AND TRADEMARK OFFICE
BEFORE THE PATENT TRIAL AND APPEAL BOARD

Ex parte MARTIN RICHTARSKY, FRANZ FARBER, JUCHANG LEE, and IVAN SCHRETER 1

Appeal 2018-007580
Application 13/874,327
Technology Center 2100

Before BRADLEY W. BAUMEISTER, SHARON PENICK, and RUSSELL E. CASS, Administrative Patent Judges.

CASS, Administrative Patent Judge.

DECISION ON APPEAL

Appellants appeal under 35 U.S.C. § 134(a) from the Examiner's final rejection of claims 1, 3-9, 11-12, and 15, which constitute all the claims pending in this application. Appeal Br. 1. 2 We have jurisdiction under 35 U.S.C. § 6(b). We affirm. We designate the rejection as constituting a new ground pursuant to our discretionary authority under 37 C.F.R. § 41.50(b).

1 Appellants list SAP SE as the real party in interest. Appeal Brief filed January 1, 2018 ("Appeal Br.") 2.
2 Rather than repeat the Examiner's positions and Appellants' arguments in their entirety, we refer to the above-mentioned Appeal Brief, as well as the following documents for their respective details: the Final Action mailed July 27, 2017 ("Final Act."); the Advisory Action mailed October 12, 2017 ("Advisory Act."); the Examiner's Answer mailed May 22, 2018 ("Ans."); and the Reply Brief filed July 17, 2018 ("Reply Br.").

THE INVENTION

Appellants' invention is directed to a method for database management. Spec. ¶ 2. Appellants' Specification explains that in a traditional row-based database, a set number of bytes of storage must be set aside for each row. Spec. ¶¶ 4-6. Appellants' invention seeks to reduce the amount of memory needed for a database by replacing the data in each column of the database with a reference to a value in a dictionary. Id. ¶ 7. The dictionary includes distinct values for each column of the database. Id. ¶ 8. Both the reference to the dictionary and the dictionary, itself, are stored in the memory structure. Id. Appellants explain that further memory savings can be achieved by compressing the block of memory defined for the dictionary. Id. ¶ 9. To perform this compression, the system identifies one or more columns that include the same character in each row, and excludes these constant characters from storage in memory. Id. ¶¶ 41-47. Claim 1 is illustrative of the claims at issue:

1.
A method comprising:

storing, by one or more processors, data in a memory structure organized in a column store format defined by a plurality of columns and a plurality of rows;

generating, by the one or more processors, a fixed string dictionary for each column in the memory structure, the fixed string dictionary having distinct values for each column, each value of the fixed string dictionary being represented by a plurality of characters occupying preset positions in the fixed string dictionary;

generating, by the one or more processors, a reference to the fixed string dictionary for each column in the memory structure;

storing, by the one or more processors, the fixed string dictionary and the reference to the fixed string dictionary in the memory structure; and

compressing, by the one or more processors, a block of memory defined for each fixed string dictionary, the compressing comprising excluding storage of at least one character of the plurality of characters that is repeated in each row of at least one column that is specific to a corresponding at least one preset position in the block of memory.

Appeal Br. 15 (Claims Appendix).

THE EXAMINER'S REJECTION AND APPELLANTS' CONTENTIONS

The Examiner rejected claims 1, 3-9, and 11-12 as being unpatentable over Franke (US 8,359,316 B2; issued January 22, 2013) in view of Netz (US 8,108,361 B2; issued January 31, 2012) and further in view of Mishra (US 2010/0223237 A1; published September 2, 2010). Final Act. 3-4.
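Before turning to the parties' dispute, the scheme recited in claim 1 can be illustrated with a short sketch. This code is ours, not part of the record; the function names and the sample column are invented. It builds a per-column dictionary of distinct fixed-width values, replaces the column with index references, and compresses the dictionary by excluding character positions whose character repeats in every entry.

```python
def build_fixed_string_dictionary(column):
    """Collect the distinct fixed-width values of one column, in first-seen order,
    and replace the raw column with references (indices) into that dictionary."""
    values = list(dict.fromkeys(column))
    refs = [values.index(v) for v in column]
    return values, refs

def compress_dictionary(values):
    """Exclude character positions whose character repeats in every entry.
    The excluded characters are stored once in a template, not per entry."""
    width = len(values[0])
    constant = [p for p in range(width) if len({v[p] for v in values}) == 1]
    template = {p: values[0][p] for p in constant}
    stripped = ["".join(v[p] for p in range(width) if p not in constant)
                for v in values]
    return template, stripped

def decompress(template, stripped, width):
    """Reinsert the excluded constant characters to recover the full entries."""
    out = []
    for s in stripped:
        chars, it = [], iter(s)
        for p in range(width):
            chars.append(template[p] if p in template else next(it))
        out.append("".join(chars))
    return out

column = ["DE-001", "DE-002", "DE-001", "DE-003"]     # invented sample data
values, refs = build_fixed_string_dictionary(column)  # 3 distinct values
template, stripped = compress_dictionary(values)      # "DE-00" prefix excluded
assert decompress(template, stripped, 6) == values
assert [values[r] for r in refs] == column            # lossless round trip
```

The round trip at the end shows why excluding the constant positions is lossless: the template records exactly what was dropped.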
The Examiner finds that Franke and Netz disclose all of the limitations of the claim except for the final claim limitation, which requires: "compressing, by the one or more processors, a block of memory defined for each fixed string dictionary, the compressing comprising excluding storage of at least one character of the plurality of characters that is repeated in each row of at least one column that is specific to a corresponding at least one preset position in the block of memory" ("the compressing limitation"). Final Act. 6. The Examiner relies on Mishra to teach this limitation. Final Act. 6-7; Advisory Action; Ans. 4-5.

Appellants do not dispute the Examiner's findings concerning the teachings of Franke and Netz, but argue that Mishra does not teach the compressing limitation of claim 1. In the Final Action, the Examiner relied on Figure 71 and paragraph 269 of Mishra. Figure 71 of Mishra is reproduced below:

[Figure 71 of Mishra: a 3-bit bitmask transformed to a 2-bit bitmask]

Figure 71 of Mishra depicts a process for removing bits from a bitmask according to that invention. The Examiner states that Figure 71 of Mishra discloses repeating values in the first column of each row, with the exception of "the first bit position with count 0 is considered to be the initial skip map, e.g., the first bit position in the [R]emoved Bits column in Fig. 71." Final Act. 7. The Examiner determines that "[f]rom this teaching, it is understood that the first bit position with count 0 is excluded from the repeating count." Ans. 5. See also Advisory Act. Thus, the Examiner concludes, "Mishra's Fig. 71 teach[es] the specific claimed features of compression" in claim 1. Ans. 5.

Appellants argue that Mishra does not teach the "compressing" limitation of claim 1. Appellants acknowledge that in Fig.
71, "the first column of the structure after the rotation is subsequently removed," but argues that "[i]t is inherent that to remove this column, it must be noted that the first value in the column is O and all subsequent values are 1 to prevent loss of value stored in the removed bits." Appeal Br. 11. Appellants assert that "[ n ]oting that the first value is O is necessary to enable Mishra' s lossless compression." Id. Appellants further argue that "nowhere does Mishra disclose or suggest that the first row having bits '000' is an initial skip map." Id. at 12. Appellants argue that the use of a skip map is not discussed in connection with Figure 71, but instead relates to a different portion of Mishra, paragraph 269, which discusses the use of a skip map in connection with Figure 64. Reply Br. 10. Figure 64, Appellants argue, involves a different technique than Figure 71, which uses a data block that remains constant across different slices. Id. Therefore, Appellants argue, "the Examiner inappropriately conflates the concepts taught by Mishra at cited FIG. 71 and cited paragraph [0269]." Id. ANALYSIS I. The Examiner's Reliance on Figure 71 of Mishra The dispute here centers on the disclosure of Mishra, which the Examiner relies on to teach the compressing limitation of claim 1. Mishra discloses a method for storing and compressing data (specifically control words forming an instruction sequence for a program) using a dictionary and a series ofbitmasks. Mishra, Abstract, ,r,r 87, 95-96, 102, 105-106, 262, 5 Appeal2018---007580 Application 13/874,327 265. The dictionary is used to compress the control words by storing certain "commonly occurring" data sequences. Id. ,r 95. Sequences in the control words that match sequences stored in the dictionary are replaced "with a codeword that points to the index of the dictionary that contains" that sequence. Id. 
The system also uses bitmasks, a series of bits that allow the system to code mismatches between the data and the dictionary entries by identifying (1) the bits in the data that are different from the dictionary entry, and (2) the location of those bits. Id. ¶¶ 97, 102, 105.

The Examiner relies on Figure 71 of Mishra, which discloses a method for compressing a bitmask. However, we agree with Appellants that the method of compression disclosed in Figure 71 of Mishra does not exclude storage of "at least one character of the plurality of characters that is repeated in each row of at least one column that is specific to a corresponding at least one preset position in the block of memory." The Examiner relies on the first column on the right-hand portion of Figure 71 to teach this limitation. However, as Appellants point out, the first value in the first column of that portion is a "0," which differs from the "1" values in the remainder of that column. Thus, the "1" value is not repeated in each row of the first column of Figure 71.

We also agree that the Examiner has not established that the first row of Figure 71 having bits "000" is an "initial skip map." Appeal Br. 12; see Ans. 5. The Examiner does not point to any disclosure in Mishra of the use of an "initial skip map" with the embodiment of Figure 71. Instead, the Examiner points to paragraph 269 for this disclosure. Ans. 5. Paragraph 269, however, discusses Figures 63 and 64, which involve a different compression technique applied to "the control word sequence" (i.e., the data to be compressed), not to the bitmask, as is the case with Figure 71. Mishra, ¶¶ 269-270. The technique in Figures 63 and 64 is used for "eliminating non[-]changing bits" in each bit position of the control word, and generating a "skip map" that indicates which bit positions can be skipped. Id.
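As an illustrative sketch of the skip-map idea just described (our code, not Mishra's; the sample words are invented), a bit position that never changes across the control words can be recorded in a skip map and removed from the data handed to the compressor:

```python
def initial_skip_map(words):
    """Mark bit positions (columns) that are constant across all words.

    Returns a skip map (True = position can be skipped) and the reduced
    words with those constant positions removed. This is only a sketch of
    the 'removing constant bits' step, not Mishra's actual implementation.
    """
    width = len(words[0])
    skip = [len({w[p] for w in words}) == 1 for p in range(width)]
    reduced = ["".join(w[p] for p in range(width) if not skip[p])
               for w in words]
    return skip, reduced

# Invented example: column 0 is all '0' and column 1 is all '1',
# so both are skipped; only the two changing columns remain.
words = ["0101", "0111", "0100"]
skip, reduced = initial_skip_map(words)
assert skip == [True, True, False, False]
assert reduced == ["01", "11", "00"]
```

Because the skipped columns are constant, their values can be stored once alongside the skip map, so the reduction is lossless.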
We agree with Appellants that the Examiner has not established that the "skip map" disclosed in ¶¶ 269-270 of Mishra is used in connection with the compression technique of Figure 71. Nor has the Examiner indicated why the first row ("000") in Figure 71 can be removed. Consequently, we agree with Appellants that the Examiner has not established that Figure 71 of Mishra teaches the compressing limitation of claim 1.

II. New Ground of Rejection Based on Mishra

The error in the Examiner's determination, however, does not end our inquiry, because portions of Mishra other than Figure 71 teach the "compressing" limitation of claim 1. Specifically, paragraph 265 of Mishra teaches an initial step of compressing the data by removing constant bits in a column to create an initial skip map, which map represents the bits that can be skipped from compression. Mishra ¶ 265 ("Initially all constant bits are removed to get reduced control words along with initial skip map."). Later in the process, an additional compression is performed, described in connection with Figure 64, for "eliminating non[-]changing bits and less frequently occurring bits" using a skip map. Mishra ¶ 269.

To better understand this operation, it is helpful to consider Figure 64 of Mishra in more detail. Figure 64 is reproduced below:

[Figure 64 of Mishra: a sample bit matrix with its resulting skip map and conflict map]

Figure 64 of Mishra illustrates removal of constant and less frequent bits according to one embodiment of that invention. The compression technique of Figure 64 first "calculates the number of ones in each bit position (column)" of the data on the left-hand portion of Figure 64. Mishra ¶ 269.
Columns that have all "0"s or a number of "1"s "less than threshold t" (t=2 in this embodiment) are skipped, and are reflected in the skip map on the right side of Figure 64 by a "0". Id. ¶¶ 269-270. Columns that have a number of "1"s above the threshold t are not skipped, and are represented in the skip map by a "-". Id. The remaining data to be encoded is shown on the right side of Figure 64 underneath the skip map. Id. ¶ 270.

Although the particular example shown in Figure 64 does not depict any columns with all "0"s, Mishra makes it clear that the compression technique of Figure 64 is designed to eliminate columns with all "0"s when they occur in the control word data. See Mishra ¶ 269 (explaining that the algorithm "eliminat[es] non[-]changing bits"); id. (explaining that the algorithm creates a skip map including "only those bit positions with count 0 or less than threshold t"). Therefore, Mishra discloses a compression technique that "exclud[es] storage of at least one character of the plurality of characters that is repeated in each row of at least one column that is specific to a corresponding at least one preset position," as required by claim 1.

To further flesh out the meaning of Mishra's disclosure, we also rely on the teachings of an article authored by Chetan Murthy and Prabhat Mishra, both co-inventors listed on the cited Mishra reference: Chetan Murthy & Prabhat Mishra, Bitmask-based Control Word Compression for NISC Architectures, Proceedings of the 19th ACM Great Lakes Symposium on VLSI, 321-326 (May 10-12, 2009) ("Murthy"). More specifically, Murthy includes additional disclosure of the use of a skip map, as described in the Mishra patent application relied on by the Examiner.
Murthy states that in step 1 of the control-word compression "all the constant bits are removed to get reduced control words along with an initial skip map" and explains that a "skip map represents the bits that can be skipped from compression." Id. at 323. Murthy also explains that a later compression step, comparable to the embodiment of Mishra's Figure 64, creates a skip map that includes the bit positions (columns) that are constant or change less than a threshold, and provides further description of this compression technique. Id. at 324.

Claim 1 also includes the additional requirement that the compression occur for a "block of memory defined for each fixed string dictionary." The portions of Mishra relied on above disclose eliminating repeated bits in the control word sequences rather than in the dictionary. See Mishra ¶¶ 265 ("Initially all constant bits are removed to get reduced control words along with initial skip map"), 270 ("Figure 64 illustrates a sample control word sequence under[]going bits reduction."). We find, however, that it would have been obvious to a person of ordinary skill to use the skip map techniques in Mishra for a fixed string dictionary. Mishra teaches that these techniques can be used "[t]o achieve further code reduction" by "provid[ing] improvement without adding any significant overhead on the decoder." Mishra ¶ 265. Mishra further teaches that removal of bits that "are constant or change[] less frequently throughout" a set of data "improves compression efficiency." Id. ¶ 269. Mishra teaches that as a result of the bit removal, "there is a significant reduction in the code size to compress." Id. ¶ 270. Consequently, Mishra provides a motivation to use the disclosed compression technique for other forms of data, and one of ordinary skill would have been motivated to use that technique for a dictionary, as taught in Franke and Netz. See Final Act. 7.
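The threshold step described above for Figure 64 can likewise be sketched in code. This is again our illustration with invented data, not Mishra's or Murthy's implementation; it uses t=2 as in the quoted embodiment, mirrors the "0"/"-" skip-map notation the decision quotes, and omits the conflict handling for the dropped infrequent "1" bits.

```python
def threshold_skip_map(words, t=2):
    """Skip bit positions whose count of '1's is zero or below threshold t.

    Skipped positions get '0' in the skip map; kept positions get '-'.
    Illustrative sketch only: recording conflicts for the infrequent
    '1' bits that fall in skipped columns is omitted.
    """
    width = len(words[0])
    ones = [sum(w[p] == "1" for w in words) for p in range(width)]
    skip = ["0" if ones[p] < t else "-" for p in range(width)]
    remaining = ["".join(w[p] for p in range(width) if skip[p] == "-")
                 for w in words]
    return "".join(skip), remaining

# Invented data: column 0 has count 0, columns 2 and 3 have count 1
# (below t=2), so only column 1 survives into the data to be encoded.
words = ["0100", "0110", "0101", "0100"]
skip, remaining = threshold_skip_map(words)
assert skip == "0-00"
assert remaining == ["1", "1", "1", "1"]
```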
Because our reasoning differs from that of the Examiner, and because we also rely on Murthy to further clarify Mishra's teachings, we designate the rejection of claim 1 as constituting a new ground pursuant to our discretionary authority under 37 C.F.R. § 41.50(b).

III. Appellants' Arguments Based on Figure 64 of Mishra

Although the Examiner relies on Figure 71 of Mishra, Appellants provide additional arguments in their Briefs attempting to distinguish the skip map of Figure 64. Appeal Br. 6-7; Reply Br. 6-7. We do not find these arguments to be persuasive.

Appellants first argue that Figure 64 of Mishra "does not have any column in which a same character is repeated in each row," as recited in claim 1. Appeal Br. 10-11. As discussed in Section II above, Figure 64 itself does not depict any columns with the same character repeated in each row (e.g., all "0"s in a row). As also discussed above, however, Mishra makes it clear that this compression technique is designed to eliminate columns with all "0"s when they occur in the control word data. See Mishra ¶ 269 (explaining that the algorithm "eliminat[es] non[-]changing bits"); id. (explaining that the algorithm creates a skip map including "only those bit positions with count 0 or less than threshold t"). Consequently, a person of ordinary skill would understand Mishra to disclose a compression technique that involves removing a column in which the same character is repeated in each row.

Second, Appellants argue as follows:

Moreover, even if assumed, for the sake of argument, that Mishra's slice shown in FIG. 64 can possibly have a column that has the same character in every row, it is respectfully submitted that Mishra may not exclude this column, as recited in claim 1. Mishra may not exclude that column with same character in every row because Mishra excludes characters that are constant across different slices rather than being same within every row a column.
Accordingly, Mishra's selection of characters that are excluded from storage to attain compression is fundamentally different from the claimed implementation's selection of characters that are excluded from storage to attain compression.

Appeal Br. 11.

It is unclear what protocol Appellants are referring to when they assert that "Mishra excludes characters that are constant across different slices rather than being the same within every row or column." Neither Figure 64 nor paragraphs 269-270 discuss excluding characters that are constant across different slices. Appellants provide no citation for this assertion or attempt to tie the assertion to specific portions of Mishra's disclosure. And Figure 64 and the accompanying disclosure show the use of a skip map to remove from compression repeating bits in a column. Consequently, we do not find this argument persuasive.

IV. Conclusion

Having considered the Examiner's rejection of claim 1 in light of the Appellants' arguments and the evidence of record, we conclude that claim 1 is unpatentable over Franke and Netz in view of Mishra and further in view of Murthy. We, therefore, sustain the obviousness rejection of that claim. Because our reasoning differs from that of the Examiner, and because we also rely on Murthy, we designate the rejection of claim 1 as constituting a new ground pursuant to our discretionary authority under 37 C.F.R. § 41.50(b). For the reasons set forth by the Examiner, as modified by our above-noted alternative reasoning, we likewise sustain the Examiner's rejection of claims 3-9, 11-12, and 15, which Appellants do not argue separately. See Appeal Br. 10-13.

DECISION

We affirm the Examiner's rejection of claims 1, 3-9, 11-12, and 15. We designate our affirmance of the modified rejections as constituting a new ground of rejection pursuant to 37 C.F.R. § 41.50(b).

Rule 37 C.F.R.
§ 41.50(b) provides "[a] new ground of rejection pursuant to this paragraph shall not be considered final for judicial review." Rule 37 C.F.R. § 41.50(b) also provides that Appellants, WITHIN TWO MONTHS FROM THE DATE OF THE DECISION, must exercise one of the following two options with respect to the new grounds of rejection to avoid termination of the appeal as to the rejected claims:

(1) Reopen prosecution. Submit an appropriate amendment of the claims so rejected or new Evidence relating to the claims so rejected, or both, and have the matter reconsidered by the Examiner, in which event the proceeding will be remanded to the examiner. The new ground of rejection is binding upon the examiner unless an amendment or new Evidence not previously of Record is made which, in the opinion of the examiner, overcomes the new ground of rejection designated in the decision. Should the examiner reject the claims, appellant may again appeal to the Board pursuant to this subpart.

(2) Request rehearing. Request that the proceeding be reheard under § 41.52 by the Board upon the same Record. The request for rehearing must address any new ground of rejection and state with particularity the points believed to have been misapprehended or overlooked in entering the new ground of rejection and also state all other grounds upon which rehearing is sought.

Pursuant to 37 C.F.R. § 1.136(a)(1)(iv), no time period for taking any subsequent action in connection with this appeal may be extended. See 37 C.F.R. § 41.50(f).

AFFIRMED
37 C.F.R. § 41.50(b)

Notice of References Cited (PTO-892), Application/Control No. 13/874,327, Appeal No. 2018-007580, Art Unit 2153.
U.S. Patent Documents: none cited. Foreign Patent Documents: none cited. Non-Patent Documents: Chetan Murthy and Prabhat Mishra, Bitmask-based Control Word Compression for NISC Architectures. (A copy of this reference is not being furnished with this Office action. See MPEP § 707.05(a).)

Bitmask-based Control Word Compression for NISC Architectures

Chetan Murthy and Prabhat Mishra
Computer & Information Science & Engineering
University of Florida, Gainesville, FL 32611, USA
{cmurthy, prabhat}@cise.ufl.edu

ABSTRACT

Implementing a custom hardware is not always feasible due to cost and time considerations. No instruction set computer (NISC) architecture is one of the promising directions to design a custom datapath for each application using its execution characteristics. A major challenge with NISC control words is that they tend to be at least 4 to 5 times larger than regular instructions, thereby imposing higher memory requirements. A promising approach is to compress these control words to reduce the code size of the application. This article proposes an efficient bitmask-based compression technique to drastically reduce the control word size while keeping the decompression overhead minimal. The main contributions of our approach are: i) efficient don't care resolution for maximum bitmask coverage using limited dictionary entries, ii) run length encoding to significantly reduce repetitive control words, and iii) smart encoding of constant and less frequently changing bits. Our experimental results demonstrate that our approach improves compression efficiency by an average of 20% over the best known control word compression, giving a compression ratio of 25% to 35%.

Categories and Subject Descriptors: C.3 [Special-Purpose and Application-based Systems]: Microprocessor applications
General Terms: Design, Algorithms
Keywords: No-Instruction-Set Computer, Compression.

1. INTRODUCTION

It is not always efficient to run an application on a generic processor, whereas implementing a custom hardware is not always feasible due to cost and time considerations. One of the promising directions is to design a custom datapath for each application using its execution characteristics. The abstraction of the instruction set in generic processors limits choosing such a custom datapath. No instruction set architecture (NISC) promises faster performance by analyzing the datapath behavior and eliminating the abstraction of the instruction set to choose a custom datapath, thus controlling the selection of an optimal datapath to meet the application's performance requirements. These datapath control words (CW) tend to be at least 4 to 5 times wider than regular instructions, thus increasing the code size of applications. One promising approach is to reduce these control words by compressing them.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.
To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. GLSVLSI'09, May 10-12, 2009, Boston, Massachusetts, USA. Copyright 2009 ACM 978-1-60558-522-2/09/05 ...$5.00.

Figure 1 shows the compressed control word execution flow on a NISC architecture. The compressed control word is read from control word memory (CW Mem) and decoded to obtain the original control word, which is sent to the controller for execution.

Compression ratio is the metric commonly used to measure the effectiveness of a compression technique (Equation 1). Clearly, the smaller the compression ratio, the better the compression efficiency.

    Compression Ratio = Compressed Size / Uncompressed Size    (1)

Widely used code compression techniques utilize a dictionary to store the most frequently occurring instructions. The instructions are replaced with smaller dictionary indices. The application of such an algorithm on NISC control words is found not to result in significant reduction in code size. This is because the control words replacing the original instructions might not retain the same repeating pattern. Gorjiara et al. [3] described an interesting approach in which the control words are split to obtain redundancies and then compressed using multiple dictionaries. The main disadvantage of this technique is that the complete binary is stored in the dictionary. This leads to large compressed code (less compression) and variable-length dictionary size that requires a variable number of block RAMs (BRAMs) on which these dictionaries are stored. Seong et al. [9] presented a promising approach to achieve better compression by using limited dictionary entries and recording bit changes to match most of the instructions with the dictionary. The direct application of this algorithm is found not to reduce the control word size significantly.
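Equation (1) can be checked with a quick calculation (numbers invented for illustration): a 1,000-byte control-word program compressed to 300 bytes has a compression ratio of 30%, and smaller is better.

```python
def compression_ratio(compressed_size, uncompressed_size):
    """Equation (1): compressed size divided by uncompressed size."""
    return compressed_size / uncompressed_size

# Invented example: 300 compressed bytes from 1000 original bytes.
assert compression_ratio(300, 1000) == 0.30
```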
It is a major challenge to develop an efficient compression technique which significantly reduces control word size and has minimal decompression overhead.

[Figure 1: NISC architecture and decoder placement, showing CW Mem, decoder with offset and address, functional units, and data memory]

In this paper, we propose an efficient compression technique that takes advantage of both NISC control word compression [3] and bitmask-based compression [9] techniques. This paper makes four important contributions: i) an efficient control word compression technique to improve compression ratio by splitting control words and compressing them using multiple dictionaries, ii) a bitmask-aware don't care resolution to decrease dictionary size and improve dictionary coverage, iii) smart encoding of constant and less frequently changing bits to further reduce the control word size, and iv) run length encoding of repetitive sequences to both improve compression ratio and decrease decompression overhead by providing the uncompressed words instantaneously.

The rest of the paper is organized as follows. Section 2 discusses the existing code compression techniques. Section 3 describes the NISC architecture and existing bitmask-based compression. Section 4 describes our compression technique, followed by a discussion on multiple-dictionary-based decompression in Section 5. Section 6 presents our experimental results. Finally, Section 7 concludes the paper.

2. RELATED WORK

The first code-compression technique for embedded processors was proposed by Wolfe and Chanin [1]. Their technique uses Huffman coding, and the compressed program is stored in the main memory. The decompression unit is placed between the main memory and the instruction cache. They used a Line Address Table (LAT) to map original code addresses to compressed block addresses. Lekatsas et al. [4] proposed a dictionary-based decompression prototype that is capable of decoding one instruction per cycle.
The idea of using a dictionary to store the frequently occurring instruction sequences has been explored by various researchers [2], [8]. The techniques discussed so far target reduced instruction set computer (RISC) processors. There has been a significant amount of research in the area of code compression for very long instruction word (VLIW) and no instruction set computer (NISC) processors. The technique proposed by Ishiura and Yamaguchi [6] splits a VLIW instruction into multiple fields, and each field is compressed by using a dictionary-based scheme.

Gorjiara et al. [3] apply a similar approach in splitting the control words into different fields and compressing them using multiple dictionaries. Their approach can lead to unacceptable compression since it stores the complete binary in the dictionary. Moreover, it will require a variable number of block RAMs (BRAMs) to store variable-length dictionaries for different applications. Our approach outperforms [3] on both fronts by achieving 20% better compression (on average) using a fixed-length dictionary. Seong et al. [9] propose a bitmask-based compression to improve compression ratio by creating matching patterns using bitmasks. However, the direct application of their algorithm is not beneficial due to lack of redundancy in longer control words. Moreover, the existing approach does not handle the presence of don't cares in input control words. As a result, it will sacrifice compression efficiency by randomly replacing don't cares by 0's or 1's. The previous two methods ([3] and [9]) are closest to our approach. Section 6 presents experimental results to show how our approach improves compression efficiency compared to these approaches, without introducing any additional decompression overhead.

3. BACKGROUND AND MOTIVATION

3.1 No Instruction Set Computer (NISC)

NISC technology is based on horizontal microcoded architecture.
In this technology, first a custom datapath is generated for an application, and then the datapath is synthesized and laid out properly to meet timing and physical constraints. The final step is to compile the program on the generated datapath. If the application is changed after synthesis, it is simply recompiled on the existing datapath. This feature significantly improves the productivity of the designer by avoiding repetition of the timing closure phase. NISC relies on a sophisticated compiler [7] to compile a program described in a high-level language to binary that directly drives the control signals of components in the datapath. The values of control signals generated for each cycle are called a control word. The control words (CW) are stored in control word memory (CW Mem, shown in Figure 1) in programmable IPs, while they are synthesized to lookup-table logic in hardwired dedicated IPs.

[Figure 2: RISC (a) and NISC (b) architectures, each showing the PC, program memory, control word (CW), data path, and data memory]

Figure 2 shows both RISC and NISC architectures. In RISC architectures, instructions are fetched from the program memory and stored in the instruction register (IR). The decoder decodes it to generate the required control word. In contrast, the NISC architecture stores the control word in the program memory, thus eliminating the instruction decoder and hardware scheduler, as shown in Figure 2(b). However, the code size is very large compared to RISC architectures due to two factors: control words are wider than instructions, and the number of NISC control words can be more than the number of RISC instructions. On MiBench benchmarks, a NISC implementation runs 5.54 times faster than the RISC-based MicroBlaze, while its code size is four times larger [3]. The large code size increases the size of control memory in programmable IPs, and the area of control logic in dedicated IPs.
The goal of our code compression technique is to reduce the control word size of NISC processors while maintaining their performance benefits.

3.2 Bitmask-based Compression

Dictionary-based compression techniques provide good compression efficiency as well as a fast decompression mechanism. The basic idea is to take advantage of commonly occurring instruction sequences by using a dictionary. Repeating occurrences are replaced with the index of the dictionary entry that contains the original data. Figure 3 shows an example of dictionary-based code compression using a 2-entry dictionary. The most frequently occurring words are stored in the dictionary and are replaced with the dictionary index (1 bit) during compression. The compressed program consists of both indices and uncompressed instructions. Figure 4(a) and (b) show the encoding scheme for a generic dictionary-based compression technique. In this example, dictionary-based compression achieves a compression ratio of 97.5%.

Figure 3: Dictionary- and bitmask-based compression

Seong et al. [9] improve standard dictionary-based compression by considering mismatches. The basic idea is to find the instruction sequences that differ in a few consecutive bit positions and store that difference information in the compressed program.
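As a rough illustration, the plain dictionary scheme described above can be sketched in a few lines of Python. This is a hypothetical sketch, not the authors' implementation; the word strings and dictionary size are illustrative.

```python
from collections import Counter

def dict_compress(words, dict_size):
    """Plain dictionary-based code compression: store the most frequent
    words in a dictionary and replace each occurrence with its index;
    all other words are emitted uncompressed behind a flag bit."""
    dictionary = [w for w, _ in Counter(words).most_common(dict_size)]
    index = {w: i for i, w in enumerate(dictionary)}
    compressed = []
    for w in words:
        if w in index:
            compressed.append(("C", index[w]))   # compressed: flag + index bits
        else:
            compressed.append(("U", w))          # uncompressed: flag + full word
    return dictionary, compressed
```

With a 2-entry dictionary, every repeated word costs only the flag bit plus a 1-bit index, which is the source of the savings illustrated in Figure 3.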
The compression ratio depends on how many bit changes (and the length of each consecutive change) are considered during compression. Figure 3 shows the same example compressed using one bitmask allowing 2 consecutive bit changes starting at even locations. In this example we are able to compress all the mismatched words using a smaller number of bits and achieve a compression ratio of 87.5%. Figure 4(b), (c), and (d) show the encoding formats used by these techniques for a w-bit program. In general, the compression ratio depends mainly on the instruction width, the dictionary size, and the number of bitmasks used. A smaller instruction size results in more direct matches but increases the number of words to compress. A larger dictionary can match more words, replacing them with a dictionary index, but at the cost of more dictionary index bits. More bitmasks compress more words but require more bits to encode the bitmask information. Our approach splits the wider control words to achieve better redundancy by employing multiple dictionaries.

Figure 4: Encoding formats of existing compression techniques: (a) compressed with dictionary index (isCompressed flag, 1 bit; dictionary index, log(d) bits); (b) uncompressed word (isCompressed flag, 1 bit; word, w bits); (c) bitmask-compressed using only a dictionary index (isCompressed, isBitmasked, dictionary index); (d) compressed using bitmasks (isCompressed, isBitmasked, number of bitmasks, bitmask position and value, dictionary index)

4. CONTROL WORD COMPRESSION

The existing bitmask-based compression is promising, but the challenges discussed in the previous section need to be addressed. NISC control words are usually 3 to 4 times wider than normal instructions.
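The mismatch handling of [9] can be sketched as follows. This is a simplified, hypothetical check for whether one bitmask of a fixed width, aligned to even positions as in the Figure 3 example, can cover the difference between a word and a dictionary entry.

```python
def bitmask_match(word, entry, mask_bits=2):
    """Return (offset, mask) if `word` differs from `entry` only inside a
    single `mask_bits`-wide window starting at an aligned (even) position;
    return None for an exact match or an uncoverable difference."""
    diff = [i for i in range(len(word)) if word[i] != entry[i]]
    if not diff:
        return None  # exact match; no bitmask needed
    start = (diff[0] // mask_bits) * mask_bits   # align window start
    if diff[-1] < start + mask_bits:
        mask = "".join("1" if i in diff else "0"
                       for i in range(start, start + mask_bits))
        return start, mask
    return None
```

For the Figure 3 example, `bitmask_match("01000010", "01001110")` reports the two flipped bits at offset 4 as mask `"11"`, so the word can be stored as a bitmask-compressed reference to the dictionary entry.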
To achieve more redundancy and to reduce code size, the control words are split into two or more slices depending on the width of the control word. Don't care bits are then resolved using vertex coloring. The resultant control words are next scanned for constant and less frequently changing bits, which are encoded as a skip map. Finally, each slice is compressed using the bitmask-based algorithm described in [9] with profitably selected parameters; the parameters are selected based on the length of each slice.

Algorithm 1 lists the major steps in compressing NISC control words. In step 1, all constant bits are removed to obtain reduced control words along with an initial skip map (the skip map represents the bits that can be skipped during compression and are hardcoded). In step 2, the input is split into the required slices, and the less frequently changing bits are removed from each slice using Algorithm 3. For each slice, don't care values are resolved using Algorithm 2 in step 3.1. The resultant slices are compressed in step 3.2 using a combination of run length encoding (RLE) and bitmask-based compression [9]. Step 4 returns the compressed control words.

Algorithm 1: Multi-dictionary compression
Input: i) control words with don't cares I; ii) number of slices n; iii) threshold t of bits that can change
Output: compressed control words C
1. W = remove_constant_bits(I)
2. S[] = slice_and_remove_less_frequent_bits(W, n, t)
3. forall s in S[] do
   3.1 S[i] = bitmask_aware_dont_care_resolve(s)
   3.2 C[i] = RLE_bitmask_compress(S[i])
end
4. return C

The complete compression and decompression methodology for control words is shown in Figure 5. In this example the input control word is split into three slices. The input file containing the control words is passed to the compressor, which reduces the control word size by applying Algorithm 1 and produces the compressed file in slice order.
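Step 1 of Algorithm 1 (constant-bit removal with an initial skip map) might look like the following sketch. Representing the skip map as a string with '-' marking retained positions is our own illustrative choice, not the paper's format.

```python
def remove_constant_bits(words):
    """Drop bit positions that never change across the control words and
    record their hardcoded values in a skip map; '-' marks positions
    that remain in the reduced words."""
    width = len(words[0])
    skip_map = []
    for i in range(width):
        col = {w[i] for w in words}
        # A single-valued column is constant: hardcode it in the skip map.
        skip_map.append(col.pop() if len(col) == 1 else "-")
    keep = [i for i, c in enumerate(skip_map) if c == "-"]
    reduced = ["".join(w[i] for i in keep) for w in words]
    return reduced, "".join(skip_map)
```

The reduced words are what the slicing and bitmask stages then operate on; the skip map itself never enters the compressed stream, since its bits are hardcoded.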
Each decoder then fetches compressed words from different locations in memory. These compressed words are decoded using the dictionary stored in block RAM (BRAM), and the decompressed slices are assembled to form the original control word.

Figure 5: Control word compression and decompression

4.1 Bitmask-Aware Don't Care Resolution

In a generic NISC processor implementation, not all functional units are involved in a given datapath; such functional units can be either enabled or disabled. The compiler [7] inserts don't care bits in such control words. To obtain maximum compression, a compression algorithm should utilize these don't care values efficiently. One such algorithm, presented in [3], creates a conflict graph whose nodes represent unique control words and whose edges indicate that two words cannot be merged. Coloring these vertices with a minimal number of colors k results in k merged words. Vertex coloring is a well-known NP-hard problem, so the heuristic algorithm proposed by Welsh and Powell [5] is applied to color the vertices and obtain the merged dictionary. This algorithm is well suited to reducing the dictionary size with exact matches; however, the dictionary it chooses might not yield good bitmask coverage.

An intuitive improvement is to account for the fact that the dictionary entries will be used for bitmask-based matching during compression. Algorithm 2 describes the steps involved in choosing such a dictionary. While creating the conflict graph, the algorithm tolerates bit differences that can be covered by a bitmask, so those differences are not represented as edges and the graph can be colored with fewer colors. This drastically reduces the dictionary size, yielding fewer dictionary index bits and thus a smaller final compressed code size.
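The Welsh-Powell heuristic referenced above can be sketched as a greedy coloring. This is a generic sketch of the heuristic itself, not tied to the authors' data structures.

```python
def welsh_powell_color(nodes, edges):
    """Greedy Welsh-Powell coloring: visit vertices in order of
    decreasing degree, assigning each the smallest color not used
    by an already-colored neighbor."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    colors = {}
    for n in sorted(nodes, key=lambda n: len(adj[n]), reverse=True):
        used = {colors[m] for m in adj[n] if m in colors}
        c = 0
        while c in used:
            c += 1
        colors[n] = c
    return colors
```

Each resulting color class corresponds to one merged dictionary entry, so fewer colors directly means a smaller dictionary and fewer dictionary index bits.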
Note that while merging vertices, if conflicting bits are already set, the bits originating from the most frequent words are retained. This promises a reduced code size, as those bits result in more direct matches.

Algorithm 2: Bitmask-Aware Don't Care Resolution
Input: i) unique input control words C = {ci, fi}; ii) number and type of bitmasks B = {si, ti}
Output: merged control words M
forall u in C do
   forall v in C do
      if bit_conflict(u, v) cannot be bitmasked using B then
         add edge (u, v) with cuv = fu and edge (v, u) with cvu = fv
   end
end
colors = wp_color_graph(G)
sort_on_frequencies(G)
forall clr in colors do
   M = merge all the nodes with color clr
   retain the bits of the most frequent words while merging
end
return M

Figure 6 shows an example don't care resolution of NISC control words and a merging iteration. The input words and their frequencies are shown in Figure 6(a); there are four inputs, A, B, C, and D. Figure 6(b) shows the conflict graph constructed by the don't care resolution algorithm in [3], which requires three colors representing the merged dictionary entries. Our approach skips the edges that can be bitmasked, as shown in Figure 6(c). This example uses one 1-bit bitmask to store differences, and the final colors indicate the merged dictionary entries. The traditional approach requires three dictionary entries, whereas our approach requires only two, saving one bit in the dictionary index.

4.2 Encoding Less Frequently Changing Bits

Closer analysis of the control word sequence reveals that some bits are constant or change infrequently throughout the code segment. Removing such bits improves compression efficiency and does not affect the matches provided by the remaining bits. The less frequently changing bits are encoded by using an unused bitmask value as a marker (01 in the case of a 2-bit bitmask).
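The edge test at the heart of Algorithm 2 — add a conflict edge only when the differing concrete bits cannot be covered by the available bitmasks — might be sketched as follows. The helper names are hypothetical, and 'X' denotes a don't care.

```python
def conflict_positions(u, v):
    """Positions where both words have concrete, differing bits;
    'X' (don't care) never conflicts."""
    return [i for i, (a, b) in enumerate(zip(u, v))
            if a != "X" and b != "X" and a != b]

def needs_edge(u, v, n_masks=1, mask_width=1):
    """Add a conflict-graph edge only if the conflicting bits exceed
    what the available bitmasks can cover (here: one 1-bit mask)."""
    return len(conflict_positions(u, v)) > n_masks * mask_width
```

For inputs such as 000X1 and 100X1 (cf. Figure 6), only one concrete bit differs, so the bitmask-aware graph omits the edge that traditional resolution would add, and the two words can share a dictionary entry.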
A threshold determines how many times a bit may change at a given position throughout the code segment; we found 10 to 15 to be a good threshold for the benchmarks used in our experiments. Algorithm 3 lists the steps for eliminating constant and less frequently changing bits. First, Algorithm 3 counts the number of ones and zeros in each bit position. In the next step, only those bit positions that change fewer than threshold t times are placed in the initial skip map. For any given control word, if more than one bit position changes, it is not profitable to encode all of these bit changes. To avoid this, the last step of the algorithm updates the initial skip map by constructing a conflict map for each control word; the bit position causing the least conflict is retained for skipping.

Figure 6: Bitmask-aware don't care resolution: (a) control words A = 000X1 (frequency 10), B = 100X1 (frequency 2), C = 00011 (frequency 5), D = 011X0 (frequency 8); (b) traditional vertex coloring; (c) bitmask-aware vertex coloring

Algorithm 3: Removal of Less Frequently Changing Bits
Input: i) control words with don't cares D; ii) threshold t
Output: skip map S
S = {}
forall w in D do
   forall bi, the ith bit in w, do
      count ones and zeros
   end
end
create a skip map of the bit positions whose change count < threshold t
forall w in D do
   if w conflicts with skip_map then
      count the number of bits of w that conflict with skip_map
      if conflicts > 1 then
         remove the most conflicting bit from the skip map
end
return S

Figure 7: Removal of constant and less frequently changing bits (example control words, skip map, and conflict map with threshold 2)

Figure 7 shows an example control word sequence to demonstrate bit reduction.
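The first phase of Algorithm 3 — counting ones and zeros per column and flagging positions whose minority value occurs fewer than t times — can be sketched as below. This is a simplified sketch that ignores don't cares.

```python
def infrequent_columns(words, threshold):
    """Bit positions whose minority value appears fewer than `threshold`
    times are candidates for the initial skip map; positions with a
    zero minority count are fully constant."""
    width = len(words[0])
    candidates = []
    for i in range(width):
        ones = sum(w[i] == "1" for w in words)
        zeros = sum(w[i] == "0" for w in words)
        if min(ones, zeros) < threshold:
            candidates.append(i)
    return candidates
```

The second phase of Algorithm 3 then prunes these candidates using the per-word conflict map, since encoding more than one skipped-bit toggle per word is not profitable.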
Each control word is scanned to count the number of ones and zeros in each bit position. The last three bit positions never change throughout the input, so they are removed from the input and recorded in a skip map. Columns whose bits change fewer times than the threshold (2 in this example), i.e., columns 2, 4, and 5, have infrequent bit changes. In the final step a conflict map is created (shown at the bottom of the figure) representing the number of collisions. Bit positions with 0 or 1 collisions are considered for skipping; the remaining columns (column 4) are excluded from the initial skip map. The skip map and the bits that need to be encoded are shown on the right side of the figure. Note the significant reduction in control word size available for compression. The decompression section discusses how these less frequently changing bits are reassembled.

4.3 Run Length Encoding

Careful analysis of the control word patterns reveals that the input control words contain consecutive repeating words. Bitmask-based compression [9] encodes such patterns as identical repeated compressed words. Instead, we run length encode (RLE) such repetitions, and no extra bits are needed to represent this encoding. The key observation is that bitmask value 0 is never used, because an exact match would have been encoded using zero bitmasks; using this value as a special marker, repetitions can be encoded. This smart encoding avoids the extra indicator bit that would otherwise be required on every compressed word. Another advantage of this run length encoding is that it alleviates decompression overhead by providing the decompressed control word to the configuration hardware in the same cycle. Figure 8 illustrates the bitmask-based RLE. The input contains the control word "0000 0000" repeated 5 times.
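The run-length word packs the repeat count into the bitmask offset and dictionary index fields behind the reserved bitmask value 00. One possible sketch, based on our own interpretation of the field layout (the first occurrence is stored as a normal compressed word):

```python
def rle_encode(run_length, offset_bits, index_bits):
    """Encode `run_length - 1` extra repetitions (the first copy is kept
    as a normal compressed word) in the combined offset+index field of a
    word whose bitmask value "00" marks it as an RLE count."""
    count = run_length - 1
    field = format(count, "b").zfill(offset_bits + index_bits)
    return "00", field[:offset_bits], field[offset_bits:]
```

For five repetitions with a 2-bit offset and a 1-bit index this yields mask "00", offset "10", and index "0", i.e., the binary count 100 (= 4 repetitions beyond the first), matching the Figure 8 example.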
In normal bitmask-based compression these control words would be compressed as repeated identical compressed words, whereas our approach replaces such repetitions with a bitmask of "00". The number of repetitions is encoded in the bitmask offset and dictionary index bits combined. In this example, the bitmask offset is "10" and the dictionary index is "0", so the repetition count is "100", i.e., 4 (in addition to the first occurrence). Compressed words are run length encoded only if the savings from the RLE word exceed its cost: if a compressed word of x bits repeats r times and encoding the run length takes y bits, RLE is used only when y < r * x.

Figure 8: An illustrative example of RLE with bitmask

5. DECOMPRESSION MECHANISM

This section analyzes the modifications required to the decompression engine proposed in [9]. Figure 9 describes the structure of the NISC control word decompression engine. The decompression hardware consists of multiple decoding units, one for each slice of the compressed control words. Each decoder contains an input buffer to store the incoming data from memory. Based on the type of compressed word, control is passed to the corresponding decode unit. Each decoding engine has a skip map register to insert the bits that were removed during less frequently changing bit reduction; a separate unit handles these difference bits by reading the offset within the skip map register, toggling the bit, and placing the result in the output buffer. All outputs from the decoding engines are then directed to the constant bits skip map, which holds the completely skipped bits (bits that never change).
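The skip-map-driven reassembly in the decompression engine amounts to interleaving decoded bits with hardcoded constant bits. A minimal sketch, assuming a skip map string where '-' marks positions filled from the decoded slice and 0/1 are hardcoded constants (an illustrative convention, not the hardware's actual register format):

```python
def reinsert_skipped_bits(decoded, skip_map):
    """Rebuild the full control word: positions marked '-' in the skip
    map are filled from the decoded slice bits in order; all other
    positions take their hardcoded constant value."""
    bits = iter(decoded)
    return "".join(next(bits) if c == "-" else c for c in skip_map)
```

This is the inverse of the constant-bit removal performed during compression, so the controller always receives full-width control words.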
The output from the constant bit register is connected to the controller for execution.

Figure 9: Multi-dictionary based decompression engine

6. EXPERIMENTS

The effectiveness of our proposed compression technique is measured using MiBench benchmarks [3]. In particular, we use the adpcmcoder, adpcmdecoder, crc32, dijkstra, sha, and fpmp3 programs. These applications are commonly used embedded software in the mobile, network, security, and telecom domains, and these benchmarks were also used in the existing control word compression work [3]. We evaluate the compression ratio and the resources used by the decompression engine (BRAMs) against the compression approach proposed by Gorjiara et al. [3] and the original bitmask-based compression [9].

6.1 Compression Performance

The benchmarks are compiled in release mode using the NISC compiler [7]. The profitable parameters for bitmask-based compression are determined by the width of the control word. For example, for a control word between 16 and 32 bits, a dictionary size of 16 and two 2-bit sliding bitmasks are selected.

Figure 10: Our approach using multiple dictionaries
For a control word of less than 16 bits, a dictionary size of 8 and a single 2-bit sliding bitmask are selected for compression. Figure 10 compares the compression ratios of our approach (m-BMC) using one (m-BMC-op1), two (m-BMC-op2), three (m-BMC-op3), and four (m-BMC-op4) dictionaries. We split the input control words equally among the dictionaries used, and for each control word slice we select the profitable parameters mentioned above for bitmask-based compression. The three-dictionary option clearly outperforms the other combinations except for the sha and crc32 programs, where the two- and four-dictionary options, respectively, result in better compression ratios. Overall, the three-dictionary option is best for most benchmarks.

Figure 11: Bitmask-based compression versus our approach

Figure 11 compares our approach (using three dictionaries) with the existing bitmask-based compression technique (BMC [9]). Our bitmask-based RLE approach, combined with constant and less frequently changing bit encoding, outperforms the existing bitmask-based compression method [9] by an average of 20 to 30%. Figure 12 compares the compression ratios of the existing multi-dictionary compression technique proposed by Gorjiara et al. [3] and our approach; both approaches use three dictionaries. On average, our approach outperforms the NISC compression technique [3] by 15 to 20%.
It is important to note that our approach uses up to six times fewer BRAMs to store the dictionary compared to [3].

6.2 Decompression Overhead

Our approach automatically generates a Verilog-based decompression engine with the selected compression parameters. The decompression engine operates at the speed of the decoding units, which are capable of running at 100 MHz, within the operating range of NISC processors. The dictionary sizes used in all the benchmarks are small and limited. In our approach, the BRAMs used to store these dictionaries are fixed, requiring only 1 or 2 BRAMs, whereas the existing method [3] uses up to six.

Figure 12: Our approach versus existing NISC compression

7. CONCLUSIONS

This paper presented a bitmask-based compression technique that reduces the size of NISC control words by splitting them and compressing the slices using multiple dictionaries. We designed a bitmask-aware don't care resolution that produces a dictionary with large bitmask coverage at a minimal, restricted dictionary size. We developed an efficient RLE technique that encodes consecutive repetitive patterns to improve both compression efficiency and decompression performance. This paper also developed an efficient way of encoding constant and less frequently changing bits to significantly reduce the control word size. Our approach improved compression efficiency by 20 to 30% over the best known compression technique [3] with no additional decompression overhead.

8.
ACKNOWLEDGMENTS

This work was partially supported by NSF CAREER award 0746261. We would like to thank Dr. Bita Gorjiara and Dr. Mehrdad Reshadi for their insightful comments and suggestions about NISC technology and control word compression.

9. REFERENCES

[1] A. Wolfe and A. Chanin. Executing compressed programs on an embedded RISC architecture. MICRO, 81-91, 1992.
[2] C. Lefurgy et al. Improving code density using compression techniques. MICRO, 194-203, 1997.
[3] B. Gorjiara and D. Gajski. FPGA-friendly code compression for horizontal microcoded custom IPs. FPGA, 108-115, 2007.
[4] H. Lekatsas, J. Henkel, and V. Jakkula. Design of a one-cycle decompression hardware for performance increase in embedded systems. DAC, 34-39, 2002.
[5] T. Jensen and B. Toft. Graph Coloring Problems. Wiley-Interscience Series in Discrete Mathematics and Optimization, 1995.
[6] N. Ishiura and M. Yamaguchi. Instruction code compression for application specific VLIW processors based on automatic field partitioning. SASIMI, 105-109, 1997.
[7] M. Reshadi. No-Instruction-Set-Computer (NISC) technology modeling and compilation. PhD thesis, UC Irvine, 2007.
[8] S. Liao, S. Devadas, and K. Keutzer. Code density optimization for embedded DSP processors using data compression techniques. Adv. Res. VLSI, 393-399, 1995.
[9] S.-W. Seong and P. Mishra. Bitmask-based code compression for embedded systems. IEEE Trans. CAD, 27(4):673-685, 2008.

Bitmask-based control word compression for NISC architectures, https://dl.acm.org/citation.cfm?id=1531616