CA 03150187 2022-02-07
METHOD AND APPARATUS FOR PROTECTING WEB SCRIPT CODES
BACKGROUND OF THE INVENTION
Technical Field
[0001] The present invention relates to the technical field of cyber security,
and more particularly
to a method and an apparatus for protecting web script codes.
Description of Related Art
[0002] Web scripts are written in an interpreted language, which can be run without first
being compiled into binary machine code. Opening a page in a browser loads and runs the
web script source codes, so the source codes of web scripts are completely open and
nothing in them is confidential. Browsers also make breakpoint debugging easy, which
seriously endangers critical front-end business logic. In fact, the first step the
underground industry takes in an attack is usually to analyze the script codes of
front-end webpages.
[0003] To avoid this danger, the common protective solutions are to obfuscate and to
encrypt web script codes. However, neither defends satisfactorily against malicious
code analysis. First, obfuscation of web script codes is in essence simplification,
including simplifying codes, simplifying variable names, removing comments, and
simplifying statements, and it makes no difference to the code logic. The only change
is an increased reading cost. For example, UglifyJS is a commonly used obfuscation
tool for web script codes, yet codes it has obfuscated can still be processed by a
code beautifier until they are almost as readable as the source codes. Second,
encrypted web script codes are not secure. Since decryption means or private keys can
only be provided in the form of web
Date Recue/Date Received 2022-02-07
script codes, a person skilled in the art can locate the decryption means or private
keys. For example, encrypted character strings can be easily decrypted by simply
calling the decryption function that is shipped with the script.
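To illustrate why encrypting strings inside a script offers little protection, the following minimal JavaScript sketch ships an XOR-encoded string together with its decoder. The key, the XOR scheme, and the sample path `/api/login` are assumptions made purely for illustration and are not taken from any particular tool.

```javascript
// Hypothetical "protected" script: strings are stored XOR-encoded as hex,
// but the key and the decoder must ship in the same script to be usable.
const KEY = 0x2a; // assumed key, readable by anyone who opens the script

// Encoder that a build step would run (shown here only to produce sample data).
function encode(text) {
  let out = "";
  for (const ch of text) {
    out += (ch.charCodeAt(0) ^ KEY).toString(16).padStart(2, "0");
  }
  return out;
}

// Decoder that the published script itself must contain.
function decode(hex) {
  let out = "";
  for (let i = 0; i < hex.length; i += 2) {
    out += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16) ^ KEY);
  }
  return out;
}

const hidden = encode("/api/login"); // what the published script would carry

// An analyst does not need to break the cipher: calling the shipped decoder
// from the browser console is enough to recover the plaintext.
const recovered = decode(hidden);
console.log(hidden, "->", recovered);
```

The sketch only handles ASCII text, which is sufficient to show that the plaintext is one function call away.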
[0004] To sum up, normal means for obfuscation and encryption can be defeated
by professional
malicious code analysis. A well trained code breaker can directly acquire
business logics
in codes and accordingly forge webpage requests.
SUMMARY OF THE INVENTION
[0005] The objective of the present invention is to provide a method and an
apparatus for
protecting web script codes, which effectively protect web script codes from
malicious
code analysis.
[0006] To achieve the foregoing objective, in a first aspect the present
invention provides a
method for protecting web script codes. The method comprises:
[0007] analyzing the web script codes by means of a code analysis tool, so as
to obtain a tree
code structure composed of plural nodes;
[0008] traversing the nodes to be encrypted in the tree code structure, and
encrypting and
converting each of the nodes to be encrypted based on offset parameters
successively in
a bottom-to-top order, until the nodes to be encrypted at a top layer have all
been
converted, so as to generate encrypted bytecodes;
[0009] configuring virtual machine interpreters used to construct and execute
the encrypted
bytecodes according to the offset parameters; and
[0010] packaging and storing the virtual machine interpreters and the encrypted bytecodes in
web script code documents, from which they are called for execution.
[0011] Preferably, before the step of analyzing the web script codes by means
of a code analysis
tool, so as to obtain a tree code structure composed of plural nodes, the
method further
comprises:
[0012] performing initial obfuscation on source codes of a web script by means
of a code
obfuscation tool, so as to obtain the web script codes.
[0013] More preferably, after the step of performing initial obfuscation on
source codes of a web
script by means of a code obfuscation tool, so as to obtain the web script
codes, the
method further comprises:
[0014] based on the web script codes, selecting and marking a part or all of
the script codes with
protection code blocks, in which the protection code blocks comprise entry
mark
information.
[0015] Preferably, the tree code structure comprises node information
corresponding to each of
the nodes, and the node information comprises information about types, names,
sub-nodes, and locations of the nodes.
[0016] Preferably, the offset parameters are randomly and dynamically
generated based on the
current web script codes.
[0017] Preferably, the step of traversing the nodes to be encrypted in the
tree code structure, and
encrypting and converting each of the nodes to be encrypted based on offset
parameters
successively in a bottom-to-top order, until the nodes to be encrypted at a
top layer have
all been converted, so as to generate encrypted bytecodes comprises:
[0018] with reference to the marked protection code blocks, screening out the
nodes to be
encrypted from the tree code structure;
[0019] identifying the nodes to be encrypted corresponding to the entry marks
so as to perform
deep traversal downward, and recording information about the type, length, and
contents
of each said node to be encrypted; and
[0020] based on the offset parameters, encrypting and converting the nodes to
be encrypted in
every layer from bottom to top by means of a recursion method, until the nodes
to be
encrypted at the top layer have been converted, so as to generate the
encrypted bytecodes.
[0021] More preferably, the step of configuring virtual machine interpreters
used to construct
and execute the encrypted bytecodes according to the offset parameters
comprises:
[0022] according to the type of every node in the tree code structure, pre-generating the
virtual machine code corresponding to each said node to be encrypted, respectively; and
[0023] using the offset parameters to configure and generate a unique virtual machine
interpreter capable of interpreting and executing said virtual machine code.
[0024] Further, the step of packaging and storing the virtual machine
interpreters and the
encrypted bytecodes in web script code documents comprises:
[0025] when the virtual machine interpreters and the encrypted bytecodes
correspond to each
other in a one-to-one manner, packing mutually corresponding said virtual
machine
interpreter and said encrypted bytecodes together, respectively, and storing
the packages
in independent web script code documents, respectively; and
[0026] when one said virtual machine interpreter corresponds to plural said
encrypted bytecodes,
packaging the virtual machine interpreter and the plural said encrypted
bytecodes
separately, and storing the virtual machine interpreter and the encrypted
bytecodes in
separate said web script code documents at the same time, respectively.
[0027] As compared to the prior art, the method for protecting web script
codes of the present
invention has the following beneficial effects:
[0028] The method for protecting web script codes provided by the present invention first
converts web script codes into a tree code structure by means of a code analysis tool,
and then, based on the offset parameters, encrypts and converts the nodes to be
encrypted in every layer from bottom to top by means of a recursion method, until the
nodes to be encrypted at the top layer have been converted, so as to generate the
encrypted bytecodes. Correspondingly, using the same configuration of the offset
parameters, the method dynamically generates virtual machine interpreters that
interpret and execute the encrypted bytecodes. Finally, the virtual machine
interpreters and the encrypted bytecodes are packaged and stored in web script code
documents, so that terminal software (such as a browser) only needs to run the virtual
machine codes. This prevents code breakers from setting breakpoints on the business
logic codes of the web script, and significantly reduces the efficiency of dynamic
debugging. Meanwhile, since the web script codes are converted into encrypted
bytecodes, they cannot be dynamically matched and distinguished using ordinary regular
expressions. The disclosed scheme thus makes web script codes more difficult to break,
thereby providing enhanced protection for web script codes.
[0029] In another aspect, the present invention provides an apparatus for
protecting web script
codes, which is applied to the method for protecting web script codes as
recited in the
foregoing technical scheme. The apparatus comprises:
[0030] an initially-obfuscating unit, for performing initial obfuscation on
source codes of a web
script by means of a code obfuscation tool, so as to obtain the web script
codes;
[0031] a code-block-marking unit, for selecting and marking, based on the web script codes, a
part or all of the script codes with protection code blocks, in which the protection
code blocks comprise entry mark information;
[0032] a code-analyzing unit, for analyzing the web script codes by means of a
code analysis
tool, so as to obtain a tree code structure composed of plural nodes;
[0033] an encrypting-and-converting unit, for traversing the nodes to be
encrypted in the tree
code structure, and encrypting and converting each of the nodes to be
encrypted based on
offset parameters successively in a bottom-to-top order, until the nodes to be
encrypted
at a top layer have all been converted, so as to generate encrypted bytecodes;
[0034] a virtual-machine-generating unit, for configuring, according to the offset
parameters, a virtual machine interpreter that is used to generate, interpret and
execute the encrypted bytecodes; and
[0035] a packaging unit, for packaging and storing the virtual machine interpreters and the
encrypted bytecodes in web script code documents, from which they are called for
execution.
[0036] As compared to the prior art, the disclosed apparatus for protecting
web script codes
provides beneficial effects that are similar to those provided by the
disclosed method for
protecting web script codes as enumerated above, and thus no repetitions are
made herein.
[0037] In a third aspect, the present invention provides a computer-readable storage medium
storing thereon a computer program. When the computer program is executed by a
processor, it implements the steps of the method for protecting web script codes as
described previously.
[0038] As compared to the prior art, the disclosed computer-readable storage
medium provides
beneficial effects that are similar to those provided by the disclosed method
for protecting
web script codes as enumerated above, and thus no repetitions are made herein.
BRIEF DESCRIPTION OF THE DRAWINGS
[0039] The accompanying drawing is provided herein for better understanding of the present
invention and forms a part of this disclosure. The illustrative embodiments and their
descriptions are for explaining the present invention and by no means form any
improper limitation to the present invention, wherein:
[0040] FIG. 1 is a flowchart of a method for protecting web script codes
according to one
embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0041] To make the foregoing objectives, features, and advantages of the
present invention
clearer and more understandable, the following description will be directed to
some
embodiments as depicted in the accompanying drawings to detail the technical
schemes
disclosed in these embodiments. It is, however, to be understood that the embodiments
referred to herein are only a part of all possible embodiments and thus not
exhaustive. All other embodiments that people of ordinary skill in the art can
conceive without creative labor based on the embodiments of the present invention
shall be embraced in the scope of the present invention.
Embodiment 1
[0042] Referring to FIG. 1, the present embodiment provides a method for
protecting web script
codes. The method comprises:
[0043] analyzing the web script codes by means of a code analysis tool, so as
to obtain a tree
code structure composed of plural nodes; traversing the nodes to be encrypted
in the tree
code structure, and encrypting and converting each of the nodes to be
encrypted based on
offset parameters successively in a bottom-to-top order, until the nodes to be
encrypted
at a top layer have all been converted, so as to generate encrypted bytecodes;
configuring
virtual machine interpreters used to construct and execute the encrypted
bytecodes
according to the offset parameters; and packaging and storing the virtual
machine
interpreters and the encrypted bytecodes in web script code documents, from which they
are called for execution.
[0044] The method for protecting web script codes provided by the present invention is
suitable for environments running a webpage scripting language (JavaScript), such as
mainstream browsers and various mini programs. The method first converts web script
codes into a tree code structure by means of a code analysis tool, and then, based on
the offset parameters, encrypts and converts the nodes to be encrypted in every layer
from bottom to top by means of a recursion method, until the nodes to be encrypted at
the top layer have been converted, so as to generate the encrypted bytecodes.
Correspondingly, using the same configuration of the offset parameters, the method
dynamically generates virtual machine interpreters that interpret and execute the
encrypted bytecodes. Finally, the virtual machine interpreters and the encrypted
bytecodes are packaged and stored in web script code documents, so that terminal
software (such as a browser) only needs to run the virtual machine codes. This
prevents code breakers from setting breakpoints on the business logic codes of the web
script, and significantly reduces the efficiency of dynamic debugging. Meanwhile,
since the web script codes are converted into encrypted bytecodes, they cannot be
dynamically matched and distinguished using ordinary regular expressions. The
disclosed scheme thus makes web script codes more difficult to break, thereby
providing enhanced protection for web script codes.
[0045] Specifically, in the embodiment described above, before the step of
analyzing the web
script codes by means of a code analysis tool, so as to obtain a tree code
structure
composed of plural nodes, the method further comprises: performing initial
obfuscation
on source codes of a web script by means of a code obfuscation tool, so as to
obtain the
web script codes.
[0046] In particular implementations, a code obfuscation tool is used to obfuscate the web
script source codes, so as to remove comments from the source codes and to shorten
meaningful variable names and method names. This makes the source codes of web
scripts less clear, thereby making breaking attempts more time-consuming and more
difficult.
[0047] Further, in the embodiment described above, after the step of performing
initial
obfuscation on source codes of a web script by means of a code obfuscation
tool, so as to
obtain the web script codes, the method further comprises: based on the web
script codes,
selecting and marking a part or all of the script codes with protection code
blocks, in
which the protection code blocks comprise entry mark information.
[0048] In particular implementations, the web script codes may optionally be marked with
protection code blocks. The marking indicates that the marked part of the web script
codes is to be reinforced. Within the protection code blocks, marks indicating that
protection is to be skipped may be added to web script codes requiring complex
computation, so as to exclude the marked codes from protection. These codes will not
be executed in the virtual machine interpreters, but their results will still be saved
inside the virtual machine. Since reinforcement may degrade the computing performance
of compute-intensive source codes, marking such codes as skipped from protection
improves computing performance at critical codes. In addition, if there is no mark
anywhere in the web script, all the codes in the web script are regarded as reinforced
by default. It is understandable that marking by means of code comments or code
character instructions is only one possible kind of mark.
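The marking rules above can be sketched in JavaScript as follows. This is a minimal illustration, assuming the marks are written as whole-line comments; the marker spellings (`// @protect-begin`, `// @skip-begin`, and so on) and the function name `markProtectedLines` are hypothetical, since the description does not fix a syntax. A real implementation would more likely read the marks from comment nodes of the parsed tree than scan lines.

```javascript
// Hypothetical comment markers; their spelling is an assumption of this sketch.
const MARKS = {
  begin: "// @protect-begin",
  end: "// @protect-end",
  skipBegin: "// @skip-begin",
  skipEnd: "// @skip-end",
};

// Classify every non-marker source line as protected (true) or not (false).
function markProtectedLines(source) {
  const lines = source.split("\n");
  const hasBegin = lines.some((l) => l.trim() === MARKS.begin);
  let inBlock = !hasBegin; // no protection block marked: protect everything
  let skipping = false;
  const result = [];
  for (const line of lines) {
    const t = line.trim();
    if (t === MARKS.begin) { inBlock = true; continue; }
    if (t === MARKS.end) { inBlock = false; continue; }
    if (t === MARKS.skipBegin) { skipping = true; continue; }
    if (t === MARKS.skipEnd) { skipping = false; continue; }
    result.push({ line, protect: inBlock && !skipping });
  }
  return result;
}

const sample = [
  "const banner = 'hi';",
  "// @protect-begin",
  "const token = sign(user);",
  "// @skip-begin",
  "const hash = heavyLoop(data);",
  "// @skip-end",
  "send(token, hash);",
  "// @protect-end",
].join("\n");

const marked = markProtectedLines(sample);
for (const m of marked) console.log(m.protect ? "[vm]   " : "[plain]", m.line);
```

In this sample the compute-heavy `heavyLoop` line stays outside the virtual machine while the lines around it are protected.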
[0049] Specifically, in the embodiment described above, web script codes are
analyzed by a code
analysis tool, so as to obtain a tree code structure composed of plural nodes.
Therein, the
tree code structure includes node information corresponding to each node. The
node
information includes the type, the name, the contents and the location of the
node.
[0050] The tree code structure is generated by describing the contents of the code nodes in a
one-to-one manner, which allows complete information about the codes to be collected
and analyzed conveniently. The tree code structure includes comprehensive information
about the source codes, such as node types, names, contents, and locations, and is
interchangeable with the source codes in a one-to-one manner. With the tree code
structure, the codes can easily be traversed in a loop. It is to be noted that the
code analysis tool may be any of various options, such as the commonly used Babel.js,
and the present embodiment sets no limitation thereto.
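As an illustration of such a tree code structure, the sketch below writes out by hand the tree for the statement `a = 1`, traverses it depth-first, and regenerates the source from it. The node shapes are loosely modeled on the trees that parsers such as Babel produce, but the exact field set here is an assumption; no parser library is called so that the example stays self-contained.

```javascript
// Hand-written tree for `a = 1`. `start` and `end` are character offsets
// into the source; the field names are assumptions modeled on common parsers.
const tree = {
  type: "AssignmentExpression", operator: "=", start: 0, end: 5,
  left: { type: "Identifier", name: "a", start: 0, end: 1 },
  right: { type: "NumericLiteral", value: 1, start: 4, end: 5 },
};

// Depth-first traversal that records the type and location of every node.
function visit(node, depth, out) {
  out.push(`${"  ".repeat(depth)}${node.type} [${node.start},${node.end})`);
  for (const key of ["left", "right"]) {
    if (node[key]) visit(node[key], depth + 1, out);
  }
  return out;
}

// Regenerate source text from the tree, showing that the two forms are
// interchangeable for this small subset of the language.
function toSource(node) {
  if (node.type === "Identifier") return node.name;
  if (node.type === "NumericLiteral") return String(node.value);
  return `${toSource(node.left)} ${node.operator} ${toSource(node.right)}`;
}

const listing = visit(tree, 0, []);
console.log(listing.join("\n"));
console.log(toSource(tree)); // a = 1
```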
[0051] In order to further ensure security of the ciphertext, in the embodiment described
above, the offset parameters are randomly and dynamically generated based on the
current web script codes. The offset parameters may include core parameter values of
the ciphertext, such as character strings, variable names, node lengths, and so on.
The offset parameters are used both for configuring the virtual machine interpreters
and for generating the encrypted bytecodes, and ensure a one-to-one match between the
virtual machine interpreters and the encrypted bytecodes in use. Additionally, since
the offset parameters are randomly and dynamically generated based on the current web
script codes, even for the same web script source codes, different offset parameters
lead to different encrypted bytecodes, which in turn makes the virtual machine codes
different. Therefore, no single reverse program can be written that applies to all
virtual machine interpreters, which makes code breaking even more time-consuming and
difficult.
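The effect of per-build parameters can be sketched as follows. The example assumes, as one possible reading of the description, that the offset parameters are the widths of a node's type field and length field, and that the length field expresses the total length of the encoded node. The names `randomParams`, `encodeLeaf`, and `decodeLeaf`, the `_` padding character, and the value ranges are all hypothetical. `Math.random` is used only to keep the sketch dependency-free and is not a secure source of randomness.

```javascript
// Draw fresh parameters for one build (ranges are illustrative assumptions).
function randomParams() {
  const pick = (min, max) => min + Math.floor(Math.random() * (max - min + 1));
  return { typeWidth: pick(1, 3), lengthWidth: pick(2, 4) };
}

// Encode a leaf node as: type field + length field + contents.
// The length field holds the total encoded length, header included.
function encodeLeaf(typeCode, text, p) {
  const total = p.typeWidth + p.lengthWidth + text.length;
  return (
    typeCode.padEnd(p.typeWidth, "_") +
    String(total).padStart(p.lengthWidth, "0") +
    text
  );
}

// Decode a leaf; this only works with the parameters used for encoding.
function decodeLeaf(encoded, p) {
  const header = p.typeWidth + p.lengthWidth;
  const typeCode = encoded.slice(0, p.typeWidth).replace(/_+$/, "");
  const total = Number(encoded.slice(p.typeWidth, header));
  const n = total - p.typeWidth - p.lengthWidth; // length of the contents
  return { typeCode, text: encoded.slice(header, header + n) };
}

// The same identifier encoded under two different builds.
const buildA = { typeWidth: 1, lengthWidth: 2 };
const buildB = { typeWidth: 2, lengthWidth: 3 };
const encA = encodeLeaf("I", "token", buildA);
const encB = encodeLeaf("I", "token", buildB);
console.log(encA); // I08token
console.log(encB); // I_010token

// A decoder configured for build B cannot read build A's output.
console.log(decodeLeaf(encA, buildA).text, "|", decodeLeaf(encA, buildB).text);

// Freshly drawn parameters still round-trip with their own decoder.
const fresh = randomParams();
const roundTrip = decodeLeaf(encodeLeaf("I", "token", fresh), fresh).text;
console.log(fresh, roundTrip);
```

Because the decoder depends on the parameters, a tool written against one build's layout does not carry over to another build without first recovering that build's parameters.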
[0052] In the embodiment described above, the step of traversing the nodes to
be encrypted in
the tree code structure, and encrypting and converting each of the nodes to be
encrypted
based on offset parameters successively in a bottom-to-top order, until the
nodes to be
encrypted at a top layer have all been converted, so as to generate encrypted
bytecodes
comprises:
[0053] with reference to the marked protection code blocks, screening out the
nodes to be
encrypted from the tree code structure; identifying the nodes to be encrypted
corresponding to the entry marks so as to perform deep traversal downward, and
recording information about the type, length, and contents of each said node
to be
encrypted; and based on the offset parameters, encrypting and converting the
nodes to be
encrypted in every layer from bottom to top by means of a recursion method,
until the
nodes to be encrypted at the top layer have been converted, so as to generate
the encrypted
bytecodes.
[0054] In particular implementations, the entry mark is found from the tree
code structure, and
deep traversal is performed downward from the corresponding node to be
encrypted.
During the traversal, information about the types, lengths, and contents of
every node to
be encrypted is recorded. Upon completion of the traversal, the nodes to be
encrypted are
encrypted and converted layer by layer from bottom to top using the offset parameters
according to the recursion method, until the nodes to be encrypted at the top layer
have been converted, at which point the conversion of the entire tree code structure
is complete. Exemplarily, an encrypted node has the structure of an encrypted
character string composed of type (1 character) + length (2 characters) + contents (n
characters), where the type (1 character) and the length (2 characters) are the
current offset parameters. When interpreting an encrypted node, the virtual machine
interpreter restores the n characters of contents by successively deducting the 1
character of the type and the 2 characters of the length from the value actually
expressed by the length.
[0055] An example is now described for easy understanding. An assignment
expression of a=1
will generate: type (the assignment expression) + length (x characters) +
contents (the left
node + the right node), wherein the left node is also of the structure: type
(variable name
node) + length (1) + contents (variable name a); the right node is of the
similar structure:
type (the constant) + length (1) + contents (the number 1).
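The `a=1` example can be turned into a small runnable sketch. It assumes one reading of the layout described above: the offset parameters are the field widths (1 and 2), and the length field expresses the total length of the encoded node, so the contents length is obtained by deducting both widths. The one-letter type codes and the names `encodeNode` and `readNode` are hypothetical, and the sketch only encodes the structure; it applies no actual cipher.

```javascript
// Hypothetical one-letter codes for the three node types in the example.
const TYPE_CODES = { AssignmentExpression: "A", Identifier: "I", NumericLiteral: "N" };

// Offset parameters of this build: widths of the type and length fields.
const params = { typeWidth: 1, lengthWidth: 2 };

// Encode one node. Children are encoded first, so the recursion converts
// the tree from the bottom layer up to the top node.
function encodeNode(node, p) {
  let contents;
  if (node.type === "AssignmentExpression") {
    contents = encodeNode(node.left, p) + encodeNode(node.right, p);
  } else if (node.type === "Identifier") {
    contents = node.name;
  } else {
    contents = String(node.value);
  }
  const total = p.typeWidth + p.lengthWidth + contents.length;
  return (
    TYPE_CODES[node.type].padEnd(p.typeWidth, "_") +
    String(total).padStart(p.lengthWidth, "0") +
    contents
  );
}

// Read the node that starts at `pos`; `next` is where the following node begins.
function readNode(code, pos, p) {
  const header = p.typeWidth + p.lengthWidth;
  const type = code.slice(pos, pos + p.typeWidth).replace(/_+$/, "");
  const total = Number(code.slice(pos + p.typeWidth, pos + header));
  const n = total - p.typeWidth - p.lengthWidth; // deduct both field widths
  return { type, contents: code.slice(pos + header, pos + header + n), next: pos + total };
}

const tree = {
  type: "AssignmentExpression",
  left: { type: "Identifier", name: "a" },
  right: { type: "NumericLiteral", value: 1 },
};

const bytecode = encodeNode(tree, params);
console.log(bytecode); // A11I04aN041

// Walk back down: the top node's contents hold the left and right nodes.
const top = readNode(bytecode, 0, params);
const left = readNode(top.contents, 0, params);
const right = readNode(top.contents, left.next, params);
console.log(top.type, left.type, left.contents, right.type, right.contents);
```

Here `I04a` is the left node, `N041` the right node, and the outer `A11` header wraps both, matching the nesting described in the example.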
[0056] In the embodiment described above, the step of configuring virtual
machine interpreters
used to construct and execute the encrypted bytecodes according to the offset
parameters
comprises:
[0057] according to the type of every node in the tree code structure, pre-generating the
virtual machine code corresponding to each said node to be encrypted, respectively;
and using the offset parameters to configure and generate a unique virtual machine
interpreter capable of interpreting and executing said virtual machine code. It is
understandable that the virtual machine interpreter actually simulates the operation
of every kind of encrypted node. The interpreter code describes the behavior of the
encrypted nodes, and the execution code of a non-encrypted node is not added into the
virtual machine interpreter. However, its node instructions or custom virtual
instructions can be copied in to increase the complexity of the virtual machine. The
interpreting process conducted by the virtual machine interpreter is also the reverse
of the process that encrypted the bytecode. For example, an encrypted node of the
character string type will be extracted by the virtual machine interpreter and
returned as the contents of a character string;
an encrypted node of the type of variables will be defined in the
corresponding virtual
memory space inside the virtual machine interpreter or acquired as
corresponding
variables; and an encrypted node of the type of function expressions will be
defined as a
new function in a virtual machine environment.
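A minimal interpreter along these lines might look like the sketch below. It assumes the same illustrative layout as before (type field of width 1, length field of width 2 holding the total node length) and hypothetical one-letter type codes `A`, `I`, `N`, and `S`. Only strings, constants, variables, and assignments are handled; function expressions are left out to keep the sketch short.

```javascript
// Build an interpreter configured with the given offset parameters.
function createInterpreter(p) {
  const memory = new Map(); // virtual memory space holding variables
  const header = p.typeWidth + p.lengthWidth;

  // Split off the node starting at `pos` using the configured field widths.
  function read(code, pos) {
    const type = code.slice(pos, pos + p.typeWidth).replace(/_+$/, "");
    const total = Number(code.slice(pos + p.typeWidth, pos + header));
    return { type, contents: code.slice(pos + header, pos + total), next: pos + total };
  }

  // One handler per node type simulates what that node does.
  const handlers = {
    S: (node) => node.contents,             // string: return its text
    N: (node) => Number(node.contents),     // constant: return its value
    I: (node) => memory.get(node.contents), // variable: read from memory
    A: (node) => {                          // assignment: left = right
      const target = read(node.contents, 0);
      const value = run(node.contents, target.next);
      memory.set(target.contents, value);
      return value;
    },
  };

  // Execute the node that starts at `pos` and return its value.
  function run(code, pos = 0) {
    const node = read(code, pos);
    const handler = handlers[node.type];
    if (!handler) throw new Error(`unknown node type: ${node.type}`);
    return handler(node);
  }

  return { run, memory };
}

const vm = createInterpreter({ typeWidth: 1, lengthWidth: 2 });
vm.run("A11I04aN041"); // encoded form of `a = 1`
vm.run("A11I04bI04a"); // encoded form of `b = a`
console.log(vm.memory.get("a"), vm.memory.get("b")); // 1 1
console.log(vm.run("S05hi")); // hi
```

The business logic never appears as ordinary script statements; only the dispatch loop of `run` is visible to a debugger stepping through the page.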
[0058] In the embodiment described above, the step of packaging and storing
the virtual machine
interpreters and the encrypted bytecodes in web script code documents
comprises:
[0059] when the virtual machine interpreters and the encrypted bytecodes
correspond to each
other in a one-to-one manner, packing mutually corresponding said virtual
machine
interpreter and said encrypted bytecodes together, respectively, and storing
the packages
in independent web script code documents, respectively; and when one said
virtual
machine interpreter corresponds to plural said encrypted bytecodes, packaging
the virtual
machine interpreter and the plural said encrypted bytecodes separately, and
storing the
virtual machine interpreter and the encrypted bytecodes in separate said web
script code
documents at the same time, respectively.
[0060] It is to be emphasized that, when separate packaging is used, plural encrypted
bytecodes may share the same virtual machine interpreter. In this case, the key point
is to keep the offset parameters consistent. Preferably, the eventually generated web
script code document may be further obfuscated and encrypted with an open-source
tool, such as javascript-obfuscator, thereby further increasing the difficulty of
reverse analysis.
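The two packaging modes can be sketched as follows. File names, the `run(...)` call written into the generated documents, and the shape of the parameter objects are all hypothetical; the point of the sketch is only the consistency check on the offset parameters when one interpreter is shared.

```javascript
// Compare two offset-parameter objects field by field (hypothetical fields).
function sameParams(a, b) {
  return a.typeWidth === b.typeWidth && a.lengthWidth === b.lengthWidth;
}

// One-to-one case: each interpreter is stored together with its own bytecode.
function packageTogether(units) {
  return units.map((unit, i) => ({
    file: `protected-${i}.js`,
    content: `${unit.interpreterSource}\nrun(${JSON.stringify(unit.bytecode)});`,
  }));
}

// One-to-many case: the interpreter goes into its own document and every
// bytecode into a separate one. All of them must share the same parameters.
function packageSeparately(interpreter, bytecodes) {
  for (const b of bytecodes) {
    if (!sameParams(interpreter.params, b.params)) {
      throw new Error("bytecode built with different offset parameters");
    }
  }
  const files = [{ file: "vm.js", content: interpreter.source }];
  bytecodes.forEach((b, i) => {
    files.push({ file: `code-${i}.js`, content: `run(${JSON.stringify(b.code)});` });
  });
  return files;
}

const params = { typeWidth: 1, lengthWidth: 2 };
const interpreter = { params, source: "/* generated interpreter defining run() */" };
const shared = packageSeparately(interpreter, [
  { params, code: "A11I04aN041" },
  { params, code: "A11I04bN042" },
]);
console.log(shared.map((f) => f.file).join(", ")); // vm.js, code-0.js, code-1.js
```

Rejecting mismatched bytecode at packaging time is a design choice of this sketch; it surfaces a parameter mismatch before a page fails at run time.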
Embodiment 2
[0061] The present embodiment provides an apparatus for protecting web script
codes, which
comprises:
[0062] an initially-obfuscating unit, for performing initial obfuscation on
source codes of a web
script by means of a code obfuscation tool, so as to obtain the web script
codes;
[0063] a code-block-marking unit, for selecting and marking, based on the web script codes, a
part or all of the script codes with protection code blocks, in which the protection
code blocks comprise entry mark information;
[0064] a code-analyzing unit, for analyzing the web script codes by means of a
code analysis
tool, so as to obtain a tree code structure composed of plural nodes;
[0065] an encrypting-and-converting unit, for traversing the nodes to be
encrypted in the tree
code structure, and encrypting and converting each of the nodes to be
encrypted based on
offset parameters successively in a bottom-to-top order, until the nodes to be
encrypted
at a top layer have all been converted, so as to generate encrypted bytecodes;
[0066] a virtual-machine-generating unit, for configuring, according to the offset
parameters, a virtual machine interpreter that is used to generate, interpret and
execute the encrypted bytecodes; and
[0067] a packaging unit, for packaging and storing the virtual machine interpreters and the
encrypted bytecodes in web script code documents, from which they are called for
execution.
[0068] As compared to the prior art, the apparatus for protecting web script
codes of the present
embodiment provides beneficial effects that are similar to those provided by
the method
for protecting web script codes as enumerated in the previous embodiment, and
thus no
repetitions are made herein.
Embodiment 3
[0069] The present embodiment provides a computer-readable storage medium,
storing thereon
a computer program. When the computer program is executed by a processor, it
implements the steps of the method for protecting web script codes as
described
previously.
[0070] As compared to the prior art, the computer-readable storage medium of
the present
embodiment provides beneficial effects that are similar to those provided by
the disclosed
method for protecting web script codes as enumerated in the previous
embodiment, and
thus no repetitions are made herein.
[0071] As will be appreciated by people of ordinary skill in the art,
implementation of all or a
part of the steps of the method of the present invention as described
previously may be
realized by having a program instruct related hardware components. The program may
be stored in a computer-readable storage medium and, when executed, performs
the individual steps of the methods described in the foregoing embodiments.
The storage
medium may be a ROM/RAM, a hard drive, an optical disk, a memory card or the
like.
[0072] The present invention has been described with reference to the
preferred embodiments
and it is understood that the embodiments are not intended to limit the scope
of the present
invention. Moreover, as the contents disclosed herein should be readily
understood and
can be implemented by a person skilled in the art, all equivalent changes or
modifications
which do not depart from the concept of the present invention should be
encompassed by
the appended claims. Hence, the scope of the present invention shall only be
defined by
the appended claims.