11677096 (P.T.A.B. Apr. 9, 2014)

Ex Parte Praveena et al

Patent Trial and Appeal BoardApr 9, 2014

11677096 (P.T.A.B. Apr. 9, 2014)

UNITED STATES PATENT AND TRADEMARK OFFICE 1 ___________ 2 3 BEFORE THE PATENT TRIAL AND APPEAL BOARD 4 ___________ 5 6 Ex parte MYLAVARAPU PRAVEENA and BHASHYAM RAMESH 7 ___________ 8 9 Appeal 2011-010553 10 Application 11/677,096 11 Technology Center 2100 12 ___________ 13 14 15 Before ANTON W. FETTING, JOSEPH A. FISCHETTI, and 16 NINA L. MEDLOCK, Administrative Patent Judges. 17 18 FETTING, Administrative Patent Judge. 19 20 21 DECISION ON APPEAL 22 Appeal 2011-010553 Application 11/677,096 2 STATEMENT OF THE CASE1 1 1 Our decision will make reference to the Appellants’ Appeal Brief (“App. Br.,” filed November 17, 2010) and the Examiner’s Answer (“Ans.,” mailed February 16, 2011). Mylavarapu Praveena and Bhashyam Ramesh (Appellants) seek review 2 under 35 U.S.C. § 134 of a non-final rejection of claims 1-7, the only claims 3 pending in the application on appeal. We have jurisdiction over the appeal 4 pursuant to 35 U.S.C. § 6(b). 5 The Appellants invented a way of selecting for use a stored execution 6 plan for a dynamic SQL query within a database system. (Specification 7 1:26-27). 8 An understanding of the invention can be derived from a reading of 9 exemplary claim 1, which is reproduced below [bracketed matter and some 10 paragraphing added]. 11 1. A method of 12 selecting for use 13 a stored or previously compiled execution plan 14 for a dynamic SQL query 15 within a database system, 16 the method comprising: 17 [1] maintaining respective selectivity values 18 associated with one or more predicates 19 in the dynamic SQL query 20 for respective historical data values; 21 [2] maintaining respective confidence level values 22 associated with one or more 23 of the selectivity values; 24 [3] receiving one or more data values 25 with which to execute the dynamic SQL query; 26 [4] calculating respective selectivity values 27 Appeal 2011-010553 Application 11/677,096 3 for one or more of the predicates 1 in the dynamic SQL query 2 for the received data value(s); 3 [5] comparing 4 the stored selectivity values 5 with 6 respective corresponding calculated selectivity values; 7 and 8 [6] selecting for execution 9 the stored execution plan 10 on detecting substantial equality 11 between the respective pairs of compared values. 12 13 The Examiner relies upon the following prior art: 14 Chaudhuri ’901 US 6,529,901 B1 Mar. 4, 2003 Chaudhuri ’779 US 2005/0228779 A1 Oct. 13, 2005 15 Claims 1-7 stand rejected under 35 U.S.C. § 103(a) as unpatentable over 16 Chaudhuri ’901 and Chaudhuri ’779. 17 ISSUES 18 The issues of obviousness turn primarily on whether the optimizer in 19 Chaudhuri ’901 uses statistics for estimation of selectivity of predicates. 20 FACTS PERTINENT TO THE ISSUES 21 The following enumerated Findings of Fact (FF) are believed to be 22 supported by a preponderance of the evidence. 23 Facts Related to Claim Construction 24 01. Selectivity is an estimate of the percentage of rows that will be 25 returned by a filter in a query. Spec. 8:6-7. 26 02. “Statistics” is not lexicographically defined in the Specification. 27 28 Appeal 2011-010553 Application 11/677,096 4 Facts Related to the Prior Art 1 Chaudhuri ’779 2 01. Chaudhuri ’779 is directed to database query optimization and 3 more particularly to selectivity estimation for query plan 4 evaluation. Chaudhuri ’779 para [0001]. 5 02. The task of the query optimizer is to select a low-cost query 6 plan. The execution cost of a query plan depends on a large 7 number of parameters, including the sizes of the relations being 8 queried, the selectivity of query operators, the amount of memory 9 available at query execution time, the number of concurrently 10 executing queries, the contents of the buffer cache, and the 11 physical layout of selected records on disk. Because many of 12 these factors are unknown at query compilation time, the standard 13 approach to query optimization is as follows: first, generate rough 14 guesses as to the values of the relevant parameters, using heuristic 15 rules or extrapolating from any available statistics. Next, using 16 the rough guesses as input, a search algorithm is invoked to find 17 the least costly plan. The search phase typically treats the 18 estimated parameter values as though they were completely 19 precise and accurate values, rather than the coarse estimates that 20 they actually are. This may lead to less predictable behavior by 21 the optimizer when it selects a query plan that promises a quick 22 query execution time, but is in reality based on estimated 23 selectivity values that are generated with relatively little 24 information and, therefore, low confidence. The execution time 25 Appeal 2011-010553 Application 11/677,096 5 penalty when the selectivity estimate is incorrect can be 1 significant. Chaudhuri ’779 para [0004]. 2 03. By specifying a desired threshold of confidence in selectivity 3 estimation for queries, a database system user or architect is able 4 to specify a level of tradeoff between predictability and 5 performance. Chaudhuri ’779 para [0005]. 6 04. A selectivity of a query expression on a database that stores 7 tuples is estimated by deriving a probability distribution for 8 possible selectivity values for the query expression. The 9 probability distribution is evaluated using a desired selectivity 10 confidence to derive the estimated selectivity. Chaudhuri ’779 11 para [0006]. 12 05. The probability distribution for possible selectivity values can 13 be determined by evaluating the query expression on an 14 appropriate precomputed random sample of database tuples to 15 determine an observed selectivity. A probability density function 16 can then be formed based on the observed selectivity using 17 Bayes’s rule. The probability density function can be derived, for 18 example, by using uniform or Jeffreys prior distribution. A 19 cumulative distribution can be inverted and solved to determine 20 the estimated selectivity based on the desired selectivity 21 confidence to produce a selectivity value such that the actual 22 selectivity, with the threshold confidence, is no higher than the 23 estimated selectivity. Chaudhuri ’779 para [0007]. 24 06. Selectivity estimation is an important subproblem in query 25 optimization. During query processing, the query optimizer 26 Appeal 2011-010553 Application 11/677,096 6 chooses from alternative execution plans by estimating and 1 comparing the time to execute, hence the cost of, the plans. The 2 amount of time that a particular query plan takes to execute is 3 dependent on the sizes of the relations accessed in the query, both 4 the base relations stored on disk and the temporary relations 5 produced at intermediate stages in the query plan. Therefore, 6 accurate estimation of the size or cardinality of relations and 7 intermediate results is important to choosing the most appropriate 8 execution plan. The sizes of relations produced as intermediate 9 results in a query plan can generally not be computed without first 10 executing the query plan, so in order to produce an estimate for 11 the cost of a query plan, the query optimizer relies on quickly-12 computed estimates of the sizes of intermediate relations. 13 Chaudhuri ’779 para [0013]. 14 Chaudhuri ’901 15 07. Chaudhuri ’901 is directed to query optimization for database 16 systems. Chaudhuri ’901 1:14-15. 17 08. A query optimizer estimates the selectivity of queries and 18 generates efficient execution plans for queries. Query optimizers 19 generate execution plans based on the data distribution and other 20 statistical information on the column(s) of the table(s) referenced 21 in the queries. For example, information about data distribution is 22 used to approximate query processing, load balancing in parallel 23 database systems, and guiding the process of sampling from a 24 relation. The increasing importance of decision support systems 25 has amplified the need to ensure that optimizers produce query 26 Appeal 2011-010553 Application 11/677,096 7 plans that are as optimal as possible. The quality of the optimizer 1 is the most important factor in determining the quality of the 2 plans. The query optimizer component of a database system relies 3 on the statistics on the data in the database for generating query 4 execution plans. The availability of the necessary statistics can 5 greatly improve the quality of plans generated by the optimizer. 6 In the absence of statistics, the cost estimates can be dramatically 7 different, often resulting in a poor choice of execution plans. On 8 the other hand, the presence of statistics that are not useful may 9 incur a substantial overhead due to cost of creation and the cost of 10 keeping them updated. Chaudhuri ’901 1:30-55. 11 09. Essential statistics identification utility tool 230 for other 12 examples may identify an initial essential set of statistics E 204 13 for step 304 of FIG. 3 in accordance with a suitable magic number 14 sensitivity analysis technique depicted in flow diagram 1100 of 15 FIG. 11. Chaudhuri ’901 17:11-15. 16 10. A technique called magic number sensitivity analysis (MNSA) 17 can be used to identify candidate statistics that cannot have an 18 impact on the plan of the query without creating them in the first 19 place. Using this technique it is possible to significantly reduce 20 the cost of creating statistics for a query without affecting the 21 quality of the execution plan generated. The goal of MNSA is to 22 determine that statistics on a column are not useful without first 23 building statistics on that column. Chaudhuri ’901 17:16-33. 24 11. To implement MNSA, it is necessary to select a concept of 25 equivalence to be used when comparing an existing set of 26 Appeal 2011-010553 Application 11/677,096 8 statistics with a proposed set of statistics. Three possible choices 1 for equivalence concepts are Execution-Tree equivalence, 2 Optimizer-Cost equivalence, and t-Optimizer-Cost equivalence. 3 Two sets of statistics S and S' are Execution-Tree equivalent for a 4 query Q if the optimizer generates the same query execution tree 5 for Q for both choices of S and S'. Execution-Tree equivalence is 6 the strongest notion of equivalence since it implies execution cost 7 equivalence. Two sets of statistics S and S' are optimizer-cost 8 equivalent for query Q if the optimizer-estimated costs of the 9 plans are the same irrespective of whether S or S' is used. In such 10 cases, the plans generated by the optimizer for Q can still be 11 different. Thus, this is a weaker notion of equivalence than 12 Execution-Tree equivalence. t-Optimizer-Cost equivalence 13 generalizes optimizer-cost equivalence. Chaudhuri ’901 17:34-51. 14 12. While any number of equivalence concepts may be used to 15 implement MNSA successfully, for purposes of this description 16 the third notion of equivalence, t-Optimizer-Cost equivalence will 17 be used. Estimated-Cost (Q,S) denotes the optimizer-estimated 18 cost for query Q when the existing set of statistics is S, two sets of 19 statistics S and S' are t-Optimizer-Cost equivalent if Estimated-20 Cost (Q,S) and Estimate-Cost (Q,S') are within t % of each other. 21 This definition reflects the intuition that while analyzing the 22 choice of statistics, it may be advantageous to ignore the 23 difference among plans that differ in cost in a limited fashion. 24 The value of t reflects the degree of rigor used to enforce 25 Appeal 2011-010553 Application 11/677,096 9 equivalence. A value of t=20% has proven to be effective in 1 experiments on Microsoft SQL Server. Chaudhuri ’901 17:52-67. 2 13. In general, MNSA has two components, first a technique which 3 detects when creating additional statistics would have no effect on 4 the quality of plans generated by the optimizer. This can happen 5 when, for the current database and the query, the existing set of 6 statistics is adequate for obtaining the "best" possible plan for the 7 query (i.e., as good a plan as having all the statistics). The second 8 component to MNSA is a strategy for selecting the next statistic to 9 create from the remaining candidate statistics when it is 10 determined that additional statistics would improve the quality of 11 the plan. Chaudhuri ’901 18:1-10. 12 14. The first component of MNSA tests if the existing set of 13 statistics for a database (denoted S) includes an essential set of Q 14 but without creating statistics in C-S, where C denotes the set of 15 candidate statistics for the query. A key to MNSA is to consider 16 how the presence of statistics impacts optimization of queries. 17 The optimizer uses statistics primarily for estimation of 18 selectivity of predicates. For example, if a histogram is available 19 on R.a and the query Q has a condition R.a<10, then the histogram 20 is used to estimate a selectivity for the predicate. If statistics 21 appropriate for a predicate are not available, then the optimizer 22 uses a default “magic number” for the corresponding selectivity. 23 Magic numbers are systemwide constants between 0 and 1 that are 24 predetermined for various kinds of predicates. For example, 25 consider query Q in the above example. Since statistics on the 26 Appeal 2011-010553 Application 11/677,096 10 column E.Age are not present, most relational optimizers use a 1 default magic number, such as 0.30, for the selectivity of the range 2 predicate on E.Age. Thus, for an SPJ query Q, the dependence of 3 the optimizer on statistics can be conceptually characterized by a 4 set of selectivity variables, with one selectivity variable 5 corresponding to each predicate in Q. The specific value used for 6 the selectivity variable for a predicate must be between 0 and 1. 7 The value is determined either by using an existing statistic or by 8 a default magic number. Chaudhuri ’901 18:11-35. 9 ANALYSIS 10 We are not persuaded by the Appellants’ argument that Chaudhuri ’779 11 generally describes "estimated selectivity values" but does not describe, 12 suggest, or otherwise allude to "stored selectivity values." App. Br. 5. As 13 the Examiner found at Answer 7, all values used in a computer are 14 inherently and necessarily stored, for otherwise there would be no way to 15 access such values. 16 We are not persuaded by the Appellants’ argument that a step of 17 "creating statistics" (as in Chaudhuri ’901) is not a step of "estimating 18 selectivity values" (as in Chaudhuri ’779)." App. Br. 6. The optimizer in 19 Chaudhuri ’901 uses statistics primarily for estimation of selectivity of 20 predicates. FF 14. The statistics produce a selectivity value. 21 We are not persuaded by the Appellants’ argument that 22 even arguendo if Chaudhuri 1 were to be modified in view of 23 Chaudhuri 2 (as proposed by the Examiner in the present 24 application), the resulting modification would be a database 25 system method in which selectivity values are estimated based 26 Appeal 2011-010553 Application 11/677,096 11 on a probability distribution and in which statistics are created 1 using an MNSA technique. The resulting modification would 2 not be a database system method in which stored selectivity 3 values are compared with respective corresponding calculated 4 selectivity values. 5 App. Br. 6-7. 6 The optimizer in Chaudhuri ’901uses statistics primarily for estimation of 7 selectivity of predicates. FF 14. The statistics produce a selectivity value. 8 The Appellants’ fourth argument at Appeal Brief 7 is premised entirely 9 on the third argument. 10 We are not persuaded by the Appellants’ argument that since Chaudhuri 11 ’901 selectivity is generated from statistics, the two (selectivity and 12 statistics) cannot be equivalent. Reply Br. 2. The optimizer in Chaudhuri 13 ’901 uses statistics primarily for estimation of selectivity of predicates. 14 FF 14. The statistics produce a selectivity value. As a selectivity value is 15 simply an estimate of the percentage of rows that will be returned by a filter 16 in a query, the statistics behind that estimate provide such an estimate. It is 17 at least predictable to convert the statistics to the estimate prior to 18 comparison, although as the statistics imply the estimate, basing 19 comparisons on the statistics without incurring an unnecessary conversion 20 would be within the scope of using selectivity values, particularly as the 21 claims do not recite a particular format for how the selectivity value is 22 represented. 23 CONCLUSIONS OF LAW 24 The rejection of claims 1-7 under 35 U.S.C. § 103(a) as unpatentable 25 over Chaudhuri ’901 and Chaudhuri ’779 is proper. 26 Appeal 2011-010553 Application 11/677,096 12 DECISION 1 The rejection of claims 1-7 is affirmed. 2 No time period for taking any subsequent action in connection with this 3 appeal may be extended under 37 C.F.R. § 1.136(a). See 37 C.F.R. 4 § 1.136(a)(1)(iv) (2011). 5 6 7 AFFIRMED 8 9 10 11 Klh 12