UNITED STATES PATENT AND TRADEMARK OFFICE
UNITED STATES DEPARTMENT OF COMMERCE
United States Patent and Trademark Office
Address: COMMISSIONER FOR PATENTS
P.O. Box 1450
Alexandria, Virginia 22313-1450
www.uspto.gov
APPLICATION NO.   FILING DATE   FIRST NAMED INVENTOR   ATTORNEY DOCKET NO.   CONFIRMATION NO.
15/214,245        07/19/2016    Xiping Cao             FOR-303               1018
88268 7590 09/07/2021
Law Office of Dorian Cartwright
P.O. Box 6629
San Jose, CA 95150
EXAMINER
FERRER, JEDIDIAH P
ART UNIT PAPER NUMBER
2164
NOTIFICATION DATE DELIVERY MODE
09/07/2021 ELECTRONIC
Please find below and/or attached an Office communication concerning this application or proceeding.
The time period for reply, if any, is set in the attached communication.
Notice of the Office communication was sent electronically on above-indicated "Notification Date" to the
following e-mail address(es):
eofficeaction@appcoll.com
uspto@cartwrightesq.com
vibrantnet@yahoo.com
PTOL-90A (Rev. 04/07)
UNITED STATES PATENT AND TRADEMARK OFFICE
BEFORE THE PATENT TRIAL AND APPEAL BOARD
Ex parte XIPING CAO and YE MA
Appeal 2020-004613
Application 15/214,245
Technology Center 2100
Before AMBER L. HAGY, DAVID J. CUTITTA II, and
MICHAEL J. ENGLE, Administrative Patent Judges.
HAGY, Administrative Patent Judge.
DECISION ON APPEAL
STATEMENT OF THE CASE
Pursuant to 35 U.S.C. § 134(a), Appellant¹ appeals from the
Examiner’s decision to reject claims 1–25, which are all of the pending
claims. See Final Act. 1–2; Brief 5. We have jurisdiction under
35 U.S.C. § 6(b).
We affirm.
¹ “Appellant” herein refers to “applicant” as defined in 37 C.F.R.
§ 1.42. Appellant identifies the real party in interest as Fortinet, Inc.
Brief 3.
CLAIMED SUBJECT MATTER
The subject matter of the present application pertains to “web page
classification,” and in particular to “systems and methods for web page
classification/categorization based on removal of noisy content/tags/
hyperlinks, and classifying the web page based on the remaining meaningful
content/hyperlinks.” Spec. ¶ 2. By way of background, the Specification
describes the process of web page classification for purposes of, e.g.,
“providing relevant web directories/pages to a search user” and “improving
the quality of search results,” as well as “blocking/filtering web pages that
contain objectionable material/content.” Id. ¶ 4. The Specification notes
that one drawback of existing web-page classification systems is that they
“typically use the complete content and hyperlinks of a web page . . .
regardless of their relevance to the webpage, and hence do not yield the most
accurate classification.” Id. ¶ 8. The Specification describes an
improvement to the accuracy of web content classification by “removing
perceived noise,” wherein
a system receives a Uniform Resource Locator (URL) of a web
page to be classified, and parses the web page so as to construct
a tree containing a list of tags. Unwanted tags are removed
from the list of tags to yield a tree containing only desired tags
that form part of the web page. Subsequently, a list of
hyperlinks are based on processing of the tree having desired
tags, wherein the list of hyperlinks can include unwanted/
undesired/invalid hyperlinks and valid hyperlinks. Unwanted
hyperlinks can accordingly be removed from the list of
hyperlinks, and each valid hyperlink can be categorized based
on a list of categories, and a final category for the web page is
determined based on a vector analysis of each category
assigned to each valid hyperlink.
Id. ¶ 9.
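The sequence of steps quoted above can be illustrated with a minimal Python sketch. It is not the Specification's implementation. It assumes the page HTML has already been fetched from the URL. The names `UNWANTED_TAGS`, `is_valid`, `categorize_link`, and `keyword_map` are hypothetical. A simple majority vote over per-link categories stands in for the "vector analysis" that the Specification mentions but that this decision does not detail.

```python
from collections import Counter
from html.parser import HTMLParser

UNWANTED_TAGS = {"script", "style", "nav", "footer"}  # hypothetical "noise" tags
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # tags with no end tag


class Node:
    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a tree that mirrors the nesting of the page's markup tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open ancestor with this tag, if there is one.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent


def build_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def filter_tags(node, unwanted=UNWANTED_TAGS):
    """Remove unwanted tags (and their subtrees), keeping the desired tags."""
    node.children = [c for c in node.children if c.tag not in unwanted]
    for child in node.children:
        filter_tags(child, unwanted)
    return node


def collect_links(node):
    """Retrieve hyperlinks from the tags that remain in the tree."""
    links = []
    href = node.attrs.get("href")
    if node.tag == "a" and href:
        links.append(href)
    for child in node.children:
        links.extend(collect_links(child))
    return links


def is_valid(href):
    # Hypothetical validity rule: keep only absolute http(s) links.
    return href.startswith(("http://", "https://"))


def categorize_link(href, keyword_map):
    # Hypothetical per-link categorizer: first keyword found in the URL wins.
    for keyword, category in keyword_map.items():
        if keyword in href:
            return category
    return "unknown"


def classify_page(html, keyword_map):
    tree = filter_tags(build_tree(html))
    valid = [h for h in collect_links(tree) if is_valid(h)]
    votes = Counter(categorize_link(h, keyword_map) for h in valid)
    # Majority vote is an assumed stand-in for the "vector analysis" step.
    return votes.most_common(1)[0][0] if votes else None
```

In this sketch the tag filter runs before link retrieval, so hyperlinks nested under a removed tag never reach the categorization step.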
Claims 1, 10, and 19 are independent. Claim 1, reproduced below
with disputed limitations italicized, is representative:
1. A system for web page classification comprising:
a non-transitory storage device having embodied therein
one or more routines operable to facilitate categorization of
content of a web page; and
one or more processors coupled to the non-transitory
storage device and operable to execute the one or more
routines, wherein the one or more routines include:
a Uniform Resource Locator (URL) receive
module, which when executed by the one or more
processors, receives a URL of a web page to be
categorized;
a URL tree construction module, which when
executed by the one or more processors, constructs a tree
for the web page, wherein the tree represents a layout and
a hierarchy of a plurality of markup language tags that
are used to represent the web page;
a tag based filtration module, which when
executed by the one or more processors, filters out from
the tree a first set of markup language tags from the
plurality of markup language tags to retain desired
markup language tags in the tree that are indicative of
relevant and actual content displayed by or linked by the
web page;
a hyperlink list retrieval module, which when
executed by the one or more processors, retrieves from
the tree a list of hyperlinks that form part of the web page
based on processing of the desired markup language
tags;
a valid hyperlink list generation module, which
when executed by the one or more processors, processes
the list of hyperlinks to generate a valid hyperlink list
based on rejection of any or a combination of irrelevant
hyperlinks, stop hyperlinks, and hyperlinks having a
distance from a valid hyperlink of greater than a defined
threshold; and
a valid hyperlink list based categorization module,
which when executed by the one or more processors,
processes the valid hyperlink list to associate a final
category from a plurality of categories with the web
page.
Brief 23–24 (Claims App.).
REFERENCES
The Examiner relies on the following references:
Name²      Reference               Date
Henkin     US 2011/0213655 A1      Sept. 1, 2011
Sandhaus   US 2012/0254726 A1      Oct. 4, 2012
Gattani    US 8,315,849 B1         Nov. 20, 2012
Brown      US 9,311,423 B1         April 12, 2016
Kislyuk    US 9,356,941 B1         May 31, 2016
Katzer     US 10,083,222 B1        Sept. 25, 2018
REJECTIONS
Claims 1, 4–6, 8–10, 13–15, 17–19, 22, 24, and 25 stand rejected
under 35 U.S.C. § 103 as obvious over the combined teachings of Katzer,
Sandhaus, and Kislyuk. Final Act. 4–21.³
² All references are cited using the first-named inventor.
³ The statement of the rejection on page 4 of the Final Action omits
reference to claims 4 and 24; however, those claims are specifically
identified as rejected on pages 19 and 21, respectively, of the Final Action.
The statement of the rejection on page 4 of the Final Action also includes
claims 2 and 11; however, those claims are rejected on a different ground on
pages 22–25 of the Final Action.
Claims 2, 11, and 20 stand rejected under 35 U.S.C. § 103 as obvious
over the combined teachings of Katzer, Sandhaus, Kislyuk, Lee, and Gattani.
Final Act. 22–25.
Claims 3, 12, and 21 stand rejected under 35 U.S.C. § 103 as obvious
over the combined teachings of Katzer, Sandhaus, Kislyuk, and Brown.
Final Act. 25–26.
Claims 7, 16, and 23 stand rejected under 35 U.S.C. § 103 as obvious
over the combined teachings of Katzer, Sandhaus, Kislyuk, and Henkin.
Final Act. 27–28.
OPINION
We have considered Appellant’s arguments (Brief 12–22) in light of
the Examiner’s findings and explanations (Final Act. 4–28; Ans. 4–10). For
the reasons set forth below, we are not persuaded of Examiner error in the
rejections of the pending claims, and we, therefore, sustain the Examiner’s
rejections.
Appellant argues only claim 1 with particularity, and argues the
remaining claims are patentable for the reasons expressed as to claim 1.
See Brief 20–21. Therefore, based on Appellant’s arguments, we decide the
appeal of claims 1–25 based on claim 1 alone. See 37 C.F.R.
§ 41.37(c)(1)(iv) (2019).
The Examiner relies on a combination of Katzer, Sandhaus, and
Kislyuk as teaching or suggesting the subject matter of claim 1. Final
Act. 4–9. Appellant argues the Examiner’s findings are in error with regard
to three limitations, which are discussed in turn below. See Brief 10–20.
A. “tag based filtration module . . . ”
With regard to the disputed limitation “a tag based filtration module,
which . . . filters out from the tree a first set of markup language tags from
the plurality of markup language tags to retain desired markup language tags
. . . that are indicative of relevant and actual content . . . ,” the Examiner
relies on a combination of Katzer and Sandhaus. Final Act. 5–7 (citing
Katzer Abs., 4:64–5:22, 14:58–65, Fig. 3A; Sandhaus ¶¶ 2–5, 52, 55). In
particular, the Examiner finds Katzer teaches an application that parses
portions of a web page to distinguish words (which the Examiner interprets
as “tags”) and to identify keywords (which the Examiner interprets as
“desired tags”). Id. at 5 (citing Katzer Abs., 4:64–5:22, 14:58–65, Fig. 3A).
The Examiner finds Katzer does not expressly disclose constructing a tree
for the web page and filtering tags from the tree (id. at 6), but finds
Sandhaus teaches constructing a tree for a web page, “wherein the tree
represents a layout and a hierarchy of a plurality of markup language tags
that are used to represent the web page” (id. at 7 (citing Sandhaus ¶¶ 49–54,
Figs. 2–4)), and also teaches a parser module that trims the tree by removing
all HTML tags “that may be the root node of a sub tree that may contain a
substantial amount of link text from the parse tree and associated sub trees”
(id. (citing Sandhaus ¶¶ 52, 55, Figs. 4–5)).
Appellant argues the Examiner’s findings are in error because the
trimming described in Sandhaus is of tags “that are the root of a subtree
containing ‘a substantial amount of link text.’” Brief 13 (emphasis omitted).
Appellant contends such tags “are not undesired tags” that are filtered “to
retain desired markup language tags,” but rather “are HTML tags that trigger
analysis of a ratio of link text and a comparison of that ratio to a threshold to
determine whether to delete the subtree at issue.” Id. at 14 (emphasis
omitted).
The Examiner responds by stating that Sandhaus’ teaching of
trimming a parse tree does “result in filtering a portion of the tree that
contains tags . . . retaining a desired remainder of the tree that also contains
tags, which would essentially be the retained desired tags (such as the
HTML or HTML tags described in ¶ 0055).” Ans. 5.
The Examiner further finds that, by teaching retaining tags indicating leaf
nodes “with the desired link text to text ratio,” Sandhaus teaches “desired
tags indicative of their respective content.” Id.
We are not persuaded of Examiner error in this finding. Sandhaus
describes “systems and methods for automatically detecting and extracting
semantically significant text from a HyperText Markup Language (‘HTML’)
document.” Sandhaus ¶ 2. Sandhaus describes that “the existence of
irrelevant text in a web page may increase the likelihood that a search engine
will return irrelevant pages.” Id. ¶ 3. “[I]nsignificant text,” according to
Sandhaus, includes “template text,” which includes “headers, footers,
navigation, and advertisements.” Id. ¶ 21. Sandhaus also describes “link
text”—that is, “text associated with HTML links”—as insignificant, and
states that a parser module may delete tags that are the root node of a sub
tree that contain “an amount of link text greater than a threshold amount.”
Id. ¶ 32. As the Examiner finds, and we agree, Sandhaus’ disclosure teaches
or suggests removing content/tags that are undesired (e.g., tags indicating
leaf nodes containing more than a threshold amount of semantically
insignificant “link text”), thus “resulting in retained/remaining desired
information.” See Ans. 5.
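The threshold-based trimming attributed to Sandhaus can be illustrated with a short Python sketch. It is not Sandhaus's actual algorithm. The `Node` type, the character-count measure of text, the 0.5 threshold, and the choice to evaluate only non-anchor container tags are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    text: str = ""
    children: list = field(default_factory=list)


def total_text(node):
    """Characters of text anywhere in the subtree rooted at node."""
    return len(node.text) + sum(total_text(c) for c in node.children)


def link_text(node, inside_link=False):
    """Characters of text that sit inside an <a> element in the subtree."""
    inside = inside_link or node.tag == "a"
    own = len(node.text) if inside else 0
    return own + sum(link_text(c, inside) for c in node.children)


def link_ratio(node):
    total = total_text(node)
    return link_text(node) / total if total else 0.0


def trim(node, threshold=0.5):
    """Drop child subtrees whose share of link text exceeds the threshold."""
    kept = []
    for child in node.children:
        # Assumption: only container tags are candidates, so an inline link
        # inside an otherwise text-heavy paragraph survives.
        if child.tag != "a" and link_ratio(child) > threshold:
            continue
        kept.append(trim(child, threshold))
    node.children = kept
    return node
```

Under these assumptions a navigation block made up entirely of links is deleted as a unit, while a paragraph containing one short link is retained along with its tags.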
The Examiner also notes that, with regard to this disputed limitation,
the rejection is based on the combination of Katzer and Sandhaus, in which
Katzer is “used especially to teach tags indicative of the relevant/actual
content.” Id. The Examiner states that, as noted above, Sandhaus also
teaches “tags indicative of relevant/actual content,” but states “it is primarily
used in conjunction with Katzer in a manner where Katzer teaches the
majority of the limitation, and Sandhaus addresses deficiencies regarding
retaining desired tags and regarding markup language tags.” Id. at 5–6.
Appellant does not address the Examiner’s findings regarding the
combination of Katzer and Sandhaus. It is well established that one cannot
show nonobviousness by attacking references individually where the
rejections are based on combinations of references. See In re Keller, 642
F.2d 413, 425 (CCPA 1981); In re Merck & Co., 800 F.2d 1091 (Fed. Cir.
1986).
B. “hyperlink list retrieval module . . . ”
With regard to the limitation “hyperlink list retrieval module which
. . . retrieves from the tree a list of hyperlinks that form part of the web page
based on processing of the desired markup language tags,” the Examiner
again relies on a combination of Katzer and Sandhaus, as described above,
and additionally finds Sandhaus teaches “outputting a trimmed parse tree
that was trimmed to exclude the sub tree 510 of Fig. 5, ¶ 0056 [which would
not be desired tags] and then outputting segments/paths.” Final Act. 7–8.
Appellant argues the Examiner’s findings are in error for essentially
the same reasons argued as to the “tag based filtration module” limitation—
that is, because “to the extent Sandhaus retrieves a list of hyperlinks from
the trimmed parse tree, it is not ‘based on processing of the desired markup
language tags’ as required.” Brief 18. In other words, Appellant’s emphasis
in this argument is over whether the list of hyperlinks is based on processing
of desired markup language tags. Appellant again argues that Sandhaus
does not trim the tree to retain only “desired markup language tags,” but
instead filters based on whether the link represents a subtree that contains an
amount of text greater than or equal to a threshold of link text. Id. at 17.
This is the same reasoning Appellant used in challenging the Examiner’s
findings as to the “tag based filtration module” limitation, which we find
unpersuasive for essentially the reasons stated by the Examiner, as noted
above. See Ans. 5–6. Because Appellant’s argument as to the “hyperlink
list retrieval module” is premised on the same alleged error, it is also not
persuasive of Examiner error in the rejection.
C. “valid hyperlink list generation module . . . ”
With regard to the limitation “a valid hyperlink list generation
module, which . . . processes the list of hyperlinks to generate a valid
hyperlink list based on rejection of any or a combination of irrelevant
hyperlinks . . . ,” the Examiner relies on the combination of Katzer,
Sandhaus, and Kislyuk. In particular, the Examiner finds Katzer teaches
categorizing and evaluating URLs, storing an entry for each evaluated URL,
and creating a pool of validated URLs. Final Act. 6 (citing Katzer Abs.,
15:4–19, Fig. 3A). The Examiner additionally relies on Kislyuk as teaching
classifying certain links as irrelevant, finding Kislyuk teaches, e.g.,
classifying web pages (identified by URLs) as “suspicious, malicious, or
benign,” which the Examiner finds “can be considered irrelevant” to a list of
validated URLs. Final Act. 9 (citing Kislyuk, 6:11–13, 6:25–30, 6:48–53,
8:55–67, 10:39–46, Fig. 3).
Appellant argues the Examiner errs in finding “malicious” web pages
can be considered “irrelevant,” and insists “malicious web pages appear to
be highly relevant to establishment of Kislyuk’s classification model as the
classification model is said to be ‘based at least in part on the plurality of
malicious web pages.’” Brief 19–20 (emphasis omitted). The Examiner
responds by noting that “the claims do not provide an explicit indication to
what the hyperlinks are irrelevant.” Ans. 9. The Examiner then interprets
“irrelevant hyperlinks” as “unwanted hyperlinks that are to be removed.” Id.
We are not persuaded of Examiner error in the rejection. First, we
conclude that the Examiner’s broad interpretation of “irrelevant hyperlinks”
is reasonable in light of the Specification, which describes “unwanted
hyperlinks” to be removed as including “irrelevant hyperlinks.” E.g., Spec.
¶ 27. We also agree with the Examiner that Kislyuk teaches attempting to
prevent users from being redirected to web pages identified by malicious
hyperlinks, and “therefore, the malicious web pages do directly correspond
to unwanted hyperlinks that are to be removed.” Ans. 9–10; see Kislyuk
6:11–13, 6:25–30, 6:48–53, 8:55–67, 10:39–46, Fig. 3. We are not
persuaded of Examiner error in finding that the combination of Katzer and
Kislyuk discloses creating a list of valid hyperlinks, which excludes
“irrelevant” hyperlinks—that is, hyperlinks that are unwanted. See Final
Act. 8–9; Ans. 9–10.
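The Examiner reads “irrelevant hyperlinks” as unwanted hyperlinks that are to be removed. That reading can be illustrated with a brief Python sketch of building a valid hyperlink list by rejecting links labeled as unwanted. It is not Kislyuk's classification model. `label_url`, `UNWANTED_LABELS`, and the host blocklist are hypothetical stand-ins for whatever classifier supplies the "suspicious, malicious, or benign" labels.

```python
from urllib.parse import urlparse

UNWANTED_LABELS = {"malicious", "suspicious"}  # labels treated as "unwanted"


def label_url(url, blocklist):
    """Hypothetical labeler: a host blocklist stands in for a trained model."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        return "suspicious"
    if (parts.hostname or "") in blocklist:
        return "malicious"
    return "benign"


def valid_hyperlinks(urls, blocklist):
    """Keep only hyperlinks whose label is not in the unwanted set."""
    return [u for u in urls if label_url(u, blocklist) not in UNWANTED_LABELS]
```

The sketch makes the interpretive point concrete. A link can matter a great deal to the labeling step and still be excluded from the resulting list, which is the sense of “irrelevant” the Examiner adopts.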
For the foregoing reasons, we are not persuaded of Examiner error in
the rejection of claim 1, and we, therefore, sustain that rejection, along with
the rejection of independent claims 10 and 19, and all of the dependent
claims, which are all argued collectively with claim 1. See Brief 20–21.
CONCLUSION
We sustain the Examiner’s obviousness rejections of claims 1–25.
DECISION SUMMARY
In summary:
Claim(s) Rejected                        35 U.S.C. §   Reference(s)/Basis                        Affirmed                                 Reversed
1, 4–6, 8–10, 13–15, 17–19, 22, 24, 25   103           Katzer, Sandhaus, Kislyuk                 1, 4–6, 8–10, 13–15, 17–19, 22, 24, 25
2, 11, 20                                103           Katzer, Sandhaus, Kislyuk, Lee, Gattani   2, 11, 20
3, 12, 21                                103           Katzer, Sandhaus, Kislyuk, Brown          3, 12, 21
7, 16, 23                                103           Katzer, Sandhaus, Kislyuk, Henkin         7, 16, 23
Overall Outcome                                                                                  1–25
TIME PERIOD FOR RESPONSE
No time period for taking any subsequent action in connection with
this appeal may be extended under 37 C.F.R. § 1.136(a). See 37 C.F.R.
§ 1.136(a)(1)(iv).
AFFIRMED