2020004613 (P.T.A.B. Sep. 7, 2021)

Fortinet, Inc.

Patent Trial and Appeal Board, Sep. 7, 2021


UNITED STATES PATENT AND TRADEMARK OFFICE

UNITED STATES DEPARTMENT OF COMMERCE
United States Patent and Trademark Office
Address: COMMISSIONER FOR PATENTS, P.O. Box 1450, Alexandria, Virginia 22313-1450
www.uspto.gov

APPLICATION NO.: 15/214,245
FILING DATE: 07/19/2016
FIRST NAMED INVENTOR: Xiping Cao
ATTORNEY DOCKET NO.: FOR-303
CONFIRMATION NO.: 1018

88268 7590 09/07/2021
Law Office of Dorian Cartwright
P.O. Box 6629
San Jose, CA 95150

EXAMINER: FERRER, JEDIDIAH P
ART UNIT: 2164
NOTIFICATION DATE: 09/07/2021
DELIVERY MODE: ELECTRONIC

Please find below and/or attached an Office communication concerning this application or proceeding.

The time period for reply, if any, is set in the attached communication.

Notice of the Office communication was sent electronically on above-indicated "Notification Date" to the following e-mail address(es):
eofficeaction@appcoll.com
uspto@cartwrightesq.com
vibrantnet@yahoo.com

PTOL-90A (Rev. 04/07)

UNITED STATES PATENT AND TRADEMARK OFFICE

BEFORE THE PATENT TRIAL AND APPEAL BOARD

Ex parte XIPING CAO and YE MA

Appeal 2020-004613
Application 15/214,245
Technology Center 2100

Before AMBER L. HAGY, DAVID J. CUTITTA II, and MICHAEL J. ENGLE, Administrative Patent Judges.

HAGY, Administrative Patent Judge.

DECISION ON APPEAL

STATEMENT OF THE CASE

Pursuant to 35 U.S.C. § 134(a), Appellant¹ appeals from the Examiner’s decision to reject claims 1–25, which are all of the pending claims. See Final Act. 1–2; Brief 5. We have jurisdiction under 35 U.S.C. § 6(b).

We affirm.

¹ “Appellant” herein refers to “applicant” as defined in 37 C.F.R. § 1.42. Appellant identifies the real party in interest as Fortinet, Inc. Brief 3.
CLAIMED SUBJECT MATTER

The subject matter of the present application pertains to “web page classification,” and in particular to “systems and methods for web page classification/categorization based on removal of noisy content/tags/hyperlinks, and classifying the web page based on the remaining meaningful content/hyperlinks.” Spec. ¶ 2.

By way of background, the Specification describes the process of web page classification for purposes of, e.g., “providing relevant web directories/pages to a search user” and “improving the quality of search results,” as well as “blocking/filtering web pages that contain objectionable material/content.” Id. ¶ 4. The Specification notes that one drawback of existing web-page classification systems is that they “typically use the complete content and hyperlinks of a web page . . . regardless of their relevance to the webpage, and hence do not yield the most accurate classification.” Id. ¶ 8.

The Specification describes an improvement to the accuracy of web content classification by “removing perceived noise,” wherein a system receives a Uniform Resource Locator (URL) of a web page to be classified, and parses the web page so as to construct a tree containing a list of tags. Unwanted tags are removed from the list of tags to yield a tree containing only desired tags that form part of the web page. Subsequently, a list of hyperlinks are based on processing of the tree having desired tags, wherein the list of hyperlinks can include unwanted/undesired/invalid hyperlinks and valid hyperlinks. Unwanted hyperlinks can accordingly be removed from the list of hyperlinks, and each valid hyperlink can be categorized based on a list of categories, and a final category for the web page is determined based on a vector analysis of each category assigned to each valid hyperlink. Id. ¶ 9.

Claims 1, 10, and 19 are independent.
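Editorial illustration (not part of the record): the processing flow summarized from Spec. ¶ 9 can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the Appellant's implementation. The tag lists, stop links, keyword table, and every function name are hypothetical, and a simple majority vote stands in for the "vector analysis" that the Specification mentions but this decision does not detail.

```python
from collections import Counter
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Hypothetical configuration; none of these values come from the record.
UNWANTED_TAGS = {"script", "style", "nav", "footer"}
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}
STOP_LINKS = {"/login", "/privacy"}
CATEGORY_KEYWORDS = {"sports": ("football", "tennis"), "news": ("politics",)}


@dataclass
class Node:
    tag: str
    attrs: dict
    children: list = field(default_factory=list)


class TreeBuilder(HTMLParser):
    """Builds a simple tree of markup tags for one page."""

    def __init__(self):
        super().__init__()
        self.root = Node("document", {})
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs))
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:  # void elements never get a closing tag
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def filter_tags(node):
    """Drop unwanted tags (and their subtrees), keeping the desired ones."""
    node.children = [filter_tags(c) for c in node.children
                     if c.tag not in UNWANTED_TAGS]
    return node


def collect_links(node):
    """Gather href values from the anchor tags that remain in the tree."""
    links = [node.attrs["href"]] if node.tag == "a" and node.attrs.get("href") else []
    for child in node.children:
        links.extend(collect_links(child))
    return links


def categorize(url):
    """Assign a category to one link by keyword lookup (an assumption)."""
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(k in url for k in keywords):
            return category
    return None


def classify_page(html):
    builder = TreeBuilder()
    builder.feed(html)
    tree = filter_tags(builder.root)
    valid = [u for u in collect_links(tree) if u not in STOP_LINKS]
    votes = Counter(c for c in map(categorize, valid) if c is not None)
    return votes.most_common(1)[0][0] if votes else None
```

In this sketch a link inside a removed `nav` subtree never reaches the categorization step, which mirrors the ordering described in ¶ 9: tags are filtered first, and hyperlinks are gathered only from what remains.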
Claim 1, reproduced below with disputed limitations italicized, is representative:

1. A system for web page classification comprising:

a non-transitory storage device having embodied therein one or more routines operable to facilitate categorization of content of a web page; and

one or more processors coupled to the non-transitory storage device and operable to execute the one or more routines, wherein the one or more routines include:

a Uniform Resource Locator (URL) receive module, which when executed by the one or more processors, receives a URL of a web page to be categorized;

a URL tree construction module, which when executed by the one or more processors, constructs a tree for the web page, wherein the tree represents a layout and a hierarchy of a plurality of markup language tags that are used to represent the web page;

a tag based filtration module, which when executed by the one or more processors, filters out from the tree a first set of markup language tags from the plurality of markup language tags to retain desired markup language tags in the tree that are indicative of relevant and actual content displayed by or linked by the web page;

a hyperlink list retrieval module, which when executed by the one or more processors, retrieves from the tree a list of hyperlinks that form part of the web page based on processing of the desired markup language tags;

a valid hyperlink list generation module, which when executed by the one or more processors, processes the list of hyperlinks to generate a valid hyperlink list based on rejection of any or a combination of irrelevant hyperlinks, stop hyperlinks, and hyperlinks having a distance from a valid hyperlink of greater than a defined threshold; and

a valid hyperlink list based categorization module, which when executed by the one or more processors, processes the valid hyperlink list to associate a final category from a plurality of categories with the web page.
Brief 23–24 (Claims App.).

REFERENCES

The Examiner relies on the following references:

Name²      Reference               Date
Henkin     US 2011/0213655 A1      Sept. 1, 2011
Sandhaus   US 2012/0254726 A1      Oct. 4, 2012
Gattani    US 8,315,849 B1         Nov. 20, 2012
Brown      US 9,311,423 B1         April 12, 2016
Kislyuk    US 9,356,941 B1         May 31, 2016
Katzer     US 10,083,222 B1        Sept. 25, 2018

² All references are cited using the first-named inventor.

REJECTIONS

Claims 1, 4–6, 8–10, 13–15, 17–19, 22, 24, and 25 stand rejected under 35 U.S.C. § 103 as obvious over the combined teachings of Katzer, Sandhaus, and Kislyuk. Final Act. 4–21.³

Claims 2, 11, and 20 stand rejected under 35 U.S.C. § 103 as obvious over the combined teachings of Katzer, Sandhaus, Kislyuk, Lee, and Gattani. Final Act. 22–25.

Claims 3, 12, and 21 stand rejected under 35 U.S.C. § 103 as obvious over the combined teachings of Katzer, Sandhaus, Kislyuk, and Brown. Final Act. 25–26.

Claims 7, 16, and 23 stand rejected under 35 U.S.C. § 103 as obvious over the combined teachings of Katzer, Sandhaus, Kislyuk, and Henkin. Final Act. 27–28.

³ The statement of the rejection on page 4 of the Final Action omits reference to claims 4 and 24; however, those claims are specifically identified as rejected on pages 19 and 21, respectively, of the Final Action. The statement of the rejection on page 4 of the Final Action also includes claims 2 and 11; however, those claims are rejected on a different ground on pages 22–25 of the Final Action.

OPINION

We have considered Appellant’s arguments (Brief 12–22) in light of the Examiner’s findings and explanations (Final Act. 4–28; Ans. 4–10). For the reasons set forth below, we are not persuaded of Examiner error in the rejections of the pending claims, and we, therefore, sustain the Examiner’s rejections.

Appellant argues only claim 1 with particularity, and argues the remaining claims are patentable for the reasons expressed as to claim 1. See Brief 20–21.
Therefore, based on Appellant’s arguments, we decide the appeal of claims 1–25 based on claim 1 alone. See 37 C.F.R. § 41.37(c)(1)(iv) (2019).

The Examiner relies on a combination of Katzer, Sandhaus, and Kislyuk as teaching or suggesting the subject matter of claim 1. Final Act. 4–9. Appellant argues the Examiner’s findings are in error with regard to three limitations, which are discussed in turn below. See Brief 10–20.

A. “tag based filtration module . . . ”

With regard to the disputed limitation “a tag based filtration module, which . . . filters out from the tree a first set of markup language tags from the plurality of markup language tags to retain desired markup language tags . . . that are indicative of relevant and actual content . . . ,” the Examiner relies on a combination of Katzer and Sandhaus. Final Act. 5–7 (citing Katzer Abs., 4:64–5:22, 14:58–65, Fig. 3A; Sandhaus ¶¶ 2–5, 52, 55). In particular, the Examiner finds Katzer teaches an application that parses portions of a web page to distinguish words (which the Examiner interprets as “tags”) and to identify keywords (which the Examiner interprets as “desired tags”). Id. at 5 (citing Katzer Abs., 4:64–5:22, 14:58–65, Fig. 3A). The Examiner finds Katzer does not expressly disclose constructing a tree for the web page and filtering tags from the tree (id. at 6), but finds Sandhaus teaches constructing a tree for a web page, “wherein the tree represents a layout and a hierarchy of a plurality of markup language tags that are used to represent the web page” (id. at 7 (citing Sandhaus ¶¶ 49–54, Figs. 2–4)), and also teaches a parser module that trims the tree by removing all HTML tags “that may be the root node of a sub tree that may contain a substantial amount of link text from the parse tree and associated sub trees” (id. (citing Sandhaus ¶¶ 52, 55, Figs. 4–5)).
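Editorial illustration (not part of the record): the kind of parse-tree trimming the Examiner attributes to Sandhaus can be pictured with the sketch below, which removes any candidate subtree whose share of link text exceeds a threshold. This is a minimal sketch under stated assumptions, not Sandhaus' actual method. The `Node` type, the candidate tag list, the character-count measure, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    text: str = ""
    children: list = field(default_factory=list)


def text_lengths(node, inside_link=False):
    """Return (link_chars, total_chars) for the subtree rooted at node."""
    inside_link = inside_link or node.tag == "a"
    total = len(node.text)
    link = total if inside_link else 0
    for child in node.children:
        child_link, child_total = text_lengths(child, inside_link)
        link += child_link
        total += child_total
    return link, total


def trim(node, threshold=0.5, candidates=("div", "ul", "table")):
    """Delete candidate subtrees whose link-text ratio exceeds the threshold.

    The candidate tags and the threshold are assumptions for this sketch.
    """
    kept = []
    for child in node.children:
        link, total = text_lengths(child)
        if child.tag in candidates and total and link / total > threshold:
            continue  # mostly link text: treat the subtree as noise
        kept.append(trim(child, threshold, candidates))
    node.children = kept
    return node
```

For example, a `div` holding only navigation anchors has a link-text ratio of 1.0 and is dropped, while a `div` holding a paragraph of article text plus one short link falls below the threshold and is kept along with its tags.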
Appellant argues the Examiner’s findings are in error because the trimming described in Sandhaus is of tags “that are the root of a subtree containing ‘a substantial amount of link text.’” Brief 13 (emphasis omitted). Appellant contends such tags “are not undesired tags” that are filtered “to retain desired markup language tags,” but rather “are HTML tags that trigger analysis of a ratio of link text and a comparison of that ratio to a threshold to determine whether to delete the subtree at issue.” Id. at 14 (emphasis omitted).

The Examiner responds by stating that Sandhaus’ teaching of trimming a parse tree does “result in filtering a portion of the tree that contains tags . . . retaining a desired remainder of the tree that also contains tags, which would essentially be the retained desired tags (such as the HTML or HTML tags described in ¶ 0055).” Ans. 5. The Examiner further finds that, by teaching retaining tags indicating leaf nodes “with the desired link text to text ratio,” Sandhaus teaches “desired tags indicative of their respective content.” Id.

We are not persuaded of Examiner error in this finding. Sandhaus describes “systems and methods for automatically detecting and extracting semantically significant text from a HyperText Markup Language (‘HTML’) document.” Sandhaus ¶ 2. Sandhaus describes that “the existence of irrelevant text in a web page may increase the likelihood that a search engine will return irrelevant pages.” Id. ¶ 3. “[I]nsignificant text,” according to Sandhaus, includes “template text,” which includes “headers, footers, navigation, and advertisements.” Id. ¶ 21. Sandhaus also describes “link text”—that is, “text associated with HTML links”—as insignificant, and states that a parser module may delete tags that are the root node of a sub tree that contain “an amount of link text greater than a threshold amount.” Id. ¶ 32. As the Examiner finds, and we agree, Sandhaus’ disclosure teaches or suggests removing content/tags that are undesired (e.g., tags indicating leaf nodes containing more than a threshold amount of semantically insignificant “link text”), thus “resulting in retained/remaining desired information.” See Ans. 5.

The Examiner also notes that, with regard to this disputed limitation, the rejection is based on the combination of Katzer and Sandhaus, in which Katzer is “used especially to teach tags indicative of the relevant/actual content.” Id. The Examiner states that, as noted above, Sandhaus also teaches “tags indicative of relevant/actual content,” but states “it is primarily used in conjunction with Katzer in a manner where Katzer teaches the majority of the limitation, and Sandhaus addresses deficiencies regarding retaining desired tags and regarding markup language tags.” Id.
at 5–6. Appellant does not address the Examiner’s findings regarding the combination of Katzer and Sandhaus. It is well established that one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 425 (CCPA 1981); In re Merck & Co., 800 F.2d 1091 (Fed. Cir. 1986).

B. “hyperlink list retrieval module . . . ”

With regard to the limitation “hyperlink list retrieval module which . . . retrieves from the tree a list of hyperlinks that form part of the web page based on processing of the desired markup language tags,” the Examiner again relies on a combination of Katzer and Sandhaus, as described above, and additionally finds Sandhaus teaches “outputting a trimmed parse tree that was trimmed to exclude the sub tree 510 of Fig. 5, ¶ 0056 [which would not be desired tags] and then outputting segments/paths.” Final Act. 7–8.

Appellant argues the Examiner’s findings are in error for essentially the same reasons argued as to the “tag based filtration module” limitation—that is, because “to the extent Sandhaus retrieves a list of hyperlinks from the trimmed parse tree, it is not ‘based on processing of the desired markup language tags’ as required.” Brief 18. In other words, Appellant’s emphasis in this argument is over whether the list of hyperlinks is based on processing of desired markup language tags. Appellant again argues that Sandhaus does not trim the tree to retain only “desired markup language tags,” but instead filters based on whether the link represents a subtree that contains an amount of text greater than or equal to a threshold of link text. Id. at 17. This is the same reasoning Appellant used in challenging the Examiner’s findings as to the “tag based filtration module” limitation, which we find unpersuasive for essentially the reasons stated by the Examiner, as noted above. See Ans. 5–6.
Because Appellant’s argument as to the “hyperlink list retrieval module” is premised on the same alleged error, it is also not persuasive of Examiner error in the rejection.

C. “valid hyperlink list generation module . . . ”

With regard to the limitation “a valid hyperlink list generation module, which . . . processes the list of hyperlinks to generate a valid hyperlink list based on rejection of any or a combination of irrelevant hyperlinks . . . ,” the Examiner relies on the combination of Katzer, Sandhaus, and Kislyuk. In particular, the Examiner finds Katzer teaches categorizing and evaluating URLs, storing an entry for each evaluated URL, and creating a pool of validated URLs. Final Act. 6 (citing Katzer Abs., 15:4–19, Fig. 3A). The Examiner additionally relies on Kislyuk as teaching classifying certain links as irrelevant, finding Kislyuk teaches, e.g., classifying web pages (identified by URLs) as “suspicious, malicious, or benign,” which the Examiner finds “can be considered irrelevant” to a list of validated URLs. Final Act. 9 (citing Kislyuk, 6:11–13, 6:25–30, 6:48–53, 8:55–67, 10:39–46, Fig. 3).

Appellant argues the Examiner errs in finding “malicious” web pages can be considered “irrelevant,” and insists “malicious web pages appear to be highly relevant to establishment of Kislyuk’s classification model as the classification model is said to be ‘based at least in part on the plurality of malicious web pages.’” Brief 19–20 (emphasis omitted).

The Examiner responds by noting that “the claims do not provide an explicit indication to what the hyperlinks are irrelevant.” Ans. 9. The Examiner then interprets “irrelevant hyperlinks” as “unwanted hyperlinks that are to be removed.” Id.

We are not persuaded of Examiner error in the rejection.
First, we conclude that the Examiner’s broad interpretation of “irrelevant hyperlinks” is reasonable in light of the Specification, which describes “unwanted hyperlinks” to be removed as including “irrelevant hyperlinks.” E.g., Spec. ¶ 27. We also agree with the Examiner that Kislyuk teaches attempting to prevent users from being redirected to web pages identified by malicious hyperlinks, and “therefore, the malicious web pages do directly correspond to unwanted hyperlinks that are to be removed.” Ans. 9–10; see Kislyuk 6:11–13, 6:25–30, 6:48–53, 8:55–67, 10:39–46, Fig. 3. We are not persuaded of Examiner error in finding that the combination of Katzer and Kislyuk discloses creating a list of valid hyperlinks, which excludes “irrelevant” hyperlinks—that is, hyperlinks that are unwanted. See Final Act. 8–9; Ans. 9–10.

For the foregoing reasons, we are not persuaded of Examiner error in the rejection of claim 1, and we, therefore, sustain that rejection, along with the rejection of independent claims 10 and 19, and all of the dependent claims, which are all argued collectively with claim 1. See Brief 20–21.

CONCLUSION

We sustain the Examiner’s obviousness rejections of claims 1–25.

DECISION SUMMARY

In summary:

Claim(s) Rejected                        35 U.S.C. §  Reference(s)/Basis                        Affirmed                                 Reversed
1, 4–6, 8–10, 13–15, 17–19, 22, 24, 25   103          Katzer, Sandhaus, Kislyuk                 1, 4–6, 8–10, 13–15, 17–19, 22, 24, 25
2, 11, 20                                103          Katzer, Sandhaus, Kislyuk, Lee, Gattani   2, 11, 20
3, 12, 21                                103          Katzer, Sandhaus, Kislyuk, Brown          3, 12, 21
7, 16, 23                                103          Katzer, Sandhaus, Kislyuk, Henkin         7, 16, 23
Overall Outcome:                                                                                1–25

TIME PERIOD FOR RESPONSE

No time period for taking any subsequent action in connection with this appeal may be extended under 37 C.F.R. § 1.136(a). See 37 C.F.R. § 1.136(a)(1)(iv).

AFFIRMED