Language selection

Search

Patent 2868948 Summary

Third-party information liability

Some of the information on this Web page has been provided by external sources. The Government of Canada is not responsible for the accuracy, reliability or currency of the information supplied by external sources. Users wishing to rely upon this information should consult directly with the source of the information. Content provided by external sources is not subject to official languages, privacy and accessibility requirements.

Claims and Abstract availability

Any discrepancies in the text and image of the Claims and Abstract are due to differing posting times. Text of the Claims and Abstract are posted:

  • At the time the application is open to public inspection;
  • At the time of issue of the patent (grant).
(12) Patent Application: (11) CA 2868948
(54) English Title: SYSTEM AND METHOD FOR IDENTIFYING EXPERTS ON SOCIAL MEDIA
(54) French Title: SYSTEME ET METHODE D'IDENTIFICATION D'EXPERTS SUR LES MEDIAS SOCIAUX
Status: Dead
Bibliographic Data
(51) International Patent Classification (IPC):
  • G06F 16/9038 (2019.01)
  • G06F 16/9536 (2019.01)
  • G06F 17/00 (2019.01)
  • H04L 12/16 (2006.01)
(72) Inventors :
  • BANSAL, NILESH (Canada)
  • KOUDAS, NICK (Canada)
(73) Owners :
  • THE GOVERNING COUNCIL OF THE UNIVERSITY OF TORONTO (Canada)
(71) Applicants :
  • THE GOVERNING COUNCIL OF THE UNIVERSITY OF TORONTO (Canada)
(74) Agent: BHOLE, ANIL
(74) Associate agent:
(45) Issued:
(22) Filed Date: 2014-10-24
(41) Open to Public Inspection: 2016-04-24
Availability of licence: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): No

(30) Application Priority Data: None

Abstracts

English Abstract


A system and method for identifying experts on social media and more
specifically to systems
and methods for identifying experts, topics and followers in social media
networks that may be
used to engage or track a wide and relevant audience for message targeting.


Claims

Note: Claims are shown in the official language in which they were submitted.


CLAIMS
1. A system for identifying one or more experts of a topic on a social
network, the system
comprising a server in communication over a network with a social network, the
server
comprising:
(a) a user interface unit configured to obtain a topical query representing
the topic;
(b) an obtaining unit configured to obtain social network data from the social

network, the social network data comprising one or more topical lists and a
social
graph representing user relationships in the social network, each topical list

identifying one or more users;
(c) a tokenizing unit configured to:
(i) tokenize titles of the topical lists and lexically group the tokens into
token
groupings; and
(ii) tokenize the topical query to determine at least one token grouping to
which the topical query corresponds; and
(d) a processing unit configured to:
(i) generate, for each user, a topic signature vector comprising topic
signature vector elements corresponding to the token groupings for which
the user is identified in the corresponding topical lists;
(ii) generate for each topic signature vector element an occurrence count
representing the number of times each of the token groupings is identified
for the user;
(iii) rank the users by their occurrence counts for the at least one token
groupings corresponding to the topical query; and
(iv) return a selected set of the ranked users as experts in the topic.
2. The system of claim 1, wherein the system is further configured to identify
one or more
related topics of interest to users interested in the topical query, wherein
the processing
unit is further configured to: determine, for each expert of the topical
query, other topics
for which the expert is identified; and generate a ranked list of the
determined other
topics using a scoring function.

3. The system of claim 1, wherein the system is further configured to identify
one or more
related topics of interest to users interested in the topical query, wherein:
(a) the obtaining unit is further configured to obtain social network messages
in the
social network data;
(b) the tokenizing unit is configured to:
(i) tokenize the social network messages and lexically group the tokens into
the token groupings; and
(c) the processing unit is configured to:
(i) determine a subset of the social network messages containing the topical
query;
(ii) generate an aggregate signature comprising summing the topic signature
vectors of the experts identified for the topical query;
(iii) determine other topics having a high occurrence count in the aggregate
signature; and
(iv) return a selected set of the other topics as secondary topics.
4. The system of claim 3, wherein generating the aggregate signature further
comprises
determining the number of followers of the experts.
5. The system of claim 3, wherein generating the aggregate signature further
comprises
analyzing a social graph to determine reach.
6. The system of claim 2, wherein the processing unit is further configured to
determine
conversations the identified experts participate in and share.
7. The system of claim 1, wherein the processing unit is further configured to
determine, for
a given user, other users having similar interests.
21

8. The system of claim 3, wherein the processing unit is further configured to
determine
changes in conversations around events by determining changes in aggregate
signatures over time.
9. The system of claim 8, wherein the changes in conversations enables an
identification of
users relevant to the conversations and insights into how conversations evolve
over
time.
10. The system of claim 8, wherein the processing unit is further configured
to identify times
at which the conversations change significantly.
11. A computer network implemented method for identifying one or more experts
of a topic
on a social network, the method comprising:
(a) obtaining a topical query representing the topic;
(b) obtaining social network data from the social network, the social network
data
comprising one or more topical lists and a social graph representing user
relationships in the social network, each topical list identifying one or more
users;
(c) tokenizing titles of the topical lists and lexically grouping the tokens
into token
groupings;
(d) tokenizing the topical query to determine at least one token grouping to
which the
topical query corresponds;
(e) generating, by a processing unit comprising one or more processors, for
each
user, a topic signature vector comprising topic signature vector elements
corresponding to the token groupings for which the user is identified in the
corresponding topical lists;
(f) generating, by the processing unit, for each topic signature vector
element an
occurrence count representing the number of times each of the token groupings
is identified for the user;
22

(g) ranking the users by their occurrence counts for the at least one token
groupings
corresponding to the topical query; and
(h) returning a selected set of the ranked users as experts in the topic.
12. The method of claim 11, further comprising identifying one or more related
topics of
interest to users interested in the topical query, by determining, for each
expert of the
topical query, other topics for which the expert is identified; and generate a
ranked list of
the determined other topics using a scoring function.
13. The method of claim 11, further comprising identifying one or more related
topics of
interest to users interested in the topical query, by:
(a) obtaining social network messages in the social network data;
(b) tokenizing the social network messages and lexically group the tokens into
the
token groupings;
(c) determining a subset of the social network messages containing the topical

query;
(d) generating an aggregate signature comprising summing the topic signature
vectors of the experts identified for the topical query;
(e) determining other topics having a high occurrence count in the aggregate
signature; and
(f) returning a selected set of the other topics as secondary topics.
14. The method of claim 13, wherein generating the aggregate signature further
comprises
determining the number of followers of the experts.
15. The method of claim 13, wherein generating the aggregate signature further
comprises
analyzing a social graph to determine reach.
23

16. The method of claim 12, further comprising determining conversations the
identified
experts participate in and share.
17. The method of claim 11, further comprising determining, for a given user,
other users
having similar interests.
18. The method of claim 13, further comprising determining changes in
conversations
around events by determining changes in aggregate signatures over time.
19. The method of claim 18, wherein the changes in conversations enables an
identification
of users relevant to the conversations and insights into how conversations
evolve over
time.
20. The method of claim 18, further comprising identifying times at which the
conversations
change significantly.
24

Description

Note: Descriptions are shown in the official language in which they were submitted.


CA 02868948 2014-10-24
1 SYSTEM AND METHOD FOR IDENTIFYING EXPERTS ON SOCIAL MEDIA
2 TECHNICAL FIELD
3 [0001] The following relates generally to a system and method for
identifying experts on
4 social media and more specifically to systems and methods for identifying
experts, topics and
followers in social media networks that may be used to engage or track a wide
and relevant
6 audience for message targeting.
7 BACKGROUND
8 [0002] Social media has transformed the way we interact online as
individuals and
9 consumers. At the same time, it is transforming the way businesses aim to
interact with their
customers and fans online. Before social media became mainstream, online
marketers and
11 advertisers resorted to the collection of behavioral online information
regarding individuals to
12 target their messages. Individuals were primarily targeted based on the
topical focus of the sites
13 they visited. For example, sports news sites might display advertising
related to the perceived
14 interests of sports fans. The general interests of sports fans would be
derived based on third
party market research (e.g., males aged 25-35 with interest in sports are also
interested in
16 certain types of movies or specific male grooming products).
17 [0003] In the early stages of the social web, bloggers on particular
topics with wide
18 followings were identified to endorse or sponsor specific products. At
the same time, bloggers
19 started serving advertisements on their blog real estate.
[0004] Social media is transforming the way marketers and advertisers spend
their budgets.
21 Novel ways to market online are gaining traction both from an academic
as well as a practical
22 point of view. In particular, influencer-based targeting in social media
has emerged as a very
23 popular way to market on social platforms (such as Twitter and
Facebook). Individuals are
24 identified as online experts in particular topics; they are either
incentivized to participate in
sponsored advertising by spreading the messages to their followers or the
platforms
26 automatically insert sponsored messages in their activity streams (as in
the case of
27 Twitter/Facebook advertising). Further, they may be targeted with
relevant content such that
28 they organically share it with their followers. The goal is to increase
brand awareness, by
29 increasing the number of impressions (e.g., how many followers see a
particular message) and
click-throughs to particular campaign (how many click on the link embedded in
the message)
31 with the ultimate goal to track conversions (how many end up purchasing
a product).
1

CA 02868948 2014-10-24
1 [0005] As an example, with more than 250 million users, Twitter
has emerged as a
2 prominent marketing and advertising vehicle in addition to being a
prominent social
3 communications platform.
4 SUMMARY
[0006] In one aspect a system for identifying one or more experts of a
topic on a social
6 network is provided, the system comprising a server in communication over
a network with a
7 social network, the server comprising: (a) a user interface unit
configured to obtain a topical
8 query representing the topic; (b) an obtaining unit configured to obtain
social network data from
9 the social network, the social network data comprising one or more
topical lists and a social
graph representing user relationships in the social network, each topical list
identifying one or
11 more users; (c) a tokenizing unit configured to: (i) tokenize titles of
the topical lists and lexically
12 group the tokens into token groupings; and (ii) tokenize the topical
query to determine at least
13 one token grouping to which the topical query corresponds; and (d) a
processing unit configured
14 to: (i) generate, for each user, a topic signature vector comprising
topic signature vector
elements corresponding to the token groupings for which the user is identified
in the
16 corresponding topical lists; (ii) generate for each topic signature
vector element an occurrence
17 count representing the number of times each of the token groupings is
identified for the user; (iii)
18 rank the users by their occurrence counts for the at least one token
groupings corresponding to
19 the topical query; and (iv) return a selected set of the ranked users as
experts in the topic.
[0007] In another aspect, a computer network implemented method for
identifying one or
21 more experts of a topic on a social network is provided, the method
comprising: (a) obtaining a
22 topical query representing the topic; (b) obtaining social network data
from the social network,
23 the social network data comprising one or more topical lists and a
social graph representing
24 user relationships in the social network, each topical list identifying
one or more users; (c)
tokenizing titles of the topical lists and lexically grouping the tokens into
token groupings; (d)
26 tokenizing the topical query to determine at least one token grouping to
which the topical query
27 corresponds; (e) generating, by a processing unit comprising one or more
processors, for each
28 user, a topic signature vector comprising topic signature vector
elements corresponding to the
29 token groupings for which the user is identified in the corresponding
topical lists; (f) generating,
by the processing unit, for each topic signature vector element an occurrence
count
31 representing the number of times each of the token groupings is
identified for the user; (g)
32 ranking the users by their occurrence counts for the at least one token
groupings corresponding
33 to the topical query; and (h) returning a selected set of the ranked
users as experts in the topic.
2

CA 02868948 2014-10-24
1 BRIEF DESCRIPTION OF THE DRAWINGS
2 [0008] The features of the invention will become more apparent in
the following detailed
3 description in which reference is made to the appended drawings wherein:
4 [0009] Fig. 1 is a block diagram of a system for identifying
experts on social media;
[0010] Fig. 2 is a flowchart illustrating the process of creating a topic
signature;
6 [0011] Fig. 3 illustrates a user interface for accessing the
system;
7 [0012] Fig. 4 illustrates a user interface for accessing the
system;
8 [0013] Fig. 5 illustrates a user interface for accessing the
system;
9 [0014] Fig. 6 is a sample Twitter user and topic graph;
[0015] Fig. 7 is a graph illustrating the potential effect of random
sampling on aspects of the
11 system;
12 [0016] Fig. 8 is a graph illustrating the potential effect of
random sampling on aspects of the
13 system;
14 [0017] Fig. 9 is a graph illustrating the potential effect of
random sampling on aspects of the
system;
16 [0018] Fig. 10 is a graph illustrating the potential effect of
random sampling on aspect of the
17 system; and
18 [0019] Fig. 11 is a graph illustrating the potential effect of
random sampling on the system.
19 DETAILED DESCRIPTION
[0020] It will be appreciated that for simplicity and clarity of
illustration, where considered
21 appropriate, reference numerals may be repeated among the figures to
indicate corresponding
22 or analogous elements. In addition, numerous specific details are set
forth in order to provide a
23 thorough understanding of the embodiments described herein. However, it
will be understood by
24 those of ordinary skill in the art that the embodiments described herein
may be practiced without
these specific details. In other instances, well-known methods, procedures and
components
26 have not been described in detail so as not to obscure the embodiments
described herein. Also,
27 the description is not to be considered as limiting the scope of the
embodiments described
28 herein.
3

CA 02868948 2014-10-24
1 [0021] It will also be appreciated that any module, unit,
component, server, computer,
2 terminal or device exemplified herein that executes instructions may
include or otherwise have
3 access to computer readable media such as storage media, computer storage
media, or data
4 storage devices (removable and/or non-removable) such as, for example,
magnetic disks,
optical disks, or tape. Computer storage media may include volatile and non-
volatile, removable
6 and non-removable media implemented in any method or technology for
storage of information,
7 such as computer readable instructions, data structures, program modules,
or other data.
8 Examples of computer storage media include RAM, ROM, EEPROM, flash memory
or other
9 memory technology, CD-ROM, digital versatile disks (DVD) or other optical
storage, magnetic
cassettes, magnetic tape, magnetic disk storage mother magnetic storage
devices, or any
11 other medium which can be used to store the desired information and
which can be accessed
12 by an application, module, or both. Any such computer storage media may
be part of the device
13 or accessible or connectable thereto. Any application or module herein
described may be
14 implemented using computer readable/executable instructions that may be
stored or otherwise
held by such computer readable media and executed by the one or more
processors.
16 [0022] Advertising and marketing on Twitter involves two crucial
steps. First, being able to
17 identify who are the "experts" on any topic on the platform and second,
being able to identify
18 sets of users with active "interest" on a particular topic. In the
context of Twitter, an expert in a
19 particular topic is represented as an account (user) that primarily
produces and shares content
related to that topic and has a wide following that actively engages with the
produced content
21 (sharing, re-tweeting, etc.). A user may demonstrate interest in a
particular topic if, for example
22 the user follows a number of experts in the topic and engages with the
content they produce.
23 [0023] A need exists to be able to identify experts on any given
topic and analytical
24 functions on the set of experts' accounts on a specific topic, such as
what other topics they are
experts in, what conversations they participate in, and what types of content
they share online.
26 A further need exists to identify other users (e.g., followers) that are
likely to be interested in a
27 given topic.
28 [0024] The following relates generally to systems and methods for
identifying experts on
29 social media. The system is configured to collect data on user
interaction, communication and
profile information to identify experts, topics and followers in social
networks that may be used
31 to engage or track a wide and relevant audience for marketing purposes.
In another aspect,
32 such information may be provided via a user interface to enable message
targeting decisions.
4

CA 02868948 2014-10-24
1 [0025] Social networks like Google+, Facebook, Twitter, and
Pinterest, have emerged as
2 vehicles for marketing and branding. Marketers seeking to engage with,
and advertise to,
3 consumers may wish to identify a network of experts in and followers of
given topics to whom to
4 market specific content, as interests in a topic may correlate to sales
of a given product or
service. Without a loss of generality, Twitter will be used herein as an
example of a social
6 platform from which content may be collected to provide data regarding
experts, topics and
7 followers. The techniques described may be applied equally well to any
other similar social
8 platform. The terms "follower", "marketer" and "social networks" are used
herein illustratively
9 and in a non-limiting manner. These terms could be substituted for
appropriate parties as
applicable to alternative implementations.
11 [0026] In another aspect, a system and method is provided for
characterizing the expertise
12 of particular social network users among a set of topics, including the
generation of a topic
13 signature for each user of a social network. A topic signature comprises
a list of all topics of
14 expertise of the user. Additionally, a system and method is provided to
produce an aggregate
signature. An aggregate signature comprises a list of topics in which a set of
users has
16 expertise. Both topic and aggregate signatures can be interpreted as a
ranking of most relevant,
17 for the purposes of reaching the largest audience, to least relevant
topics.
18 [0027] Many social networks provide advertising platforms with
tools for marketers. For
19 example, in use, a marketer would utilize the Twitter advertising
platform in one of the following
three ways: Firstly, the advertiser provides a set of Twitter user handles,
and Twitter targets
21 advertisements to the followers of these accounts. Being able to
identify sets of experts in any
22 topic readily aids advertisers to identify the most relevant accounts to
provide advertisements to
23 while instigating a Twitter advertising campaign.
24 [0028] Secondly, the advertiser bids on a list of topics on
Twitter. Twitter, using their own
proprietary algorithms, identifies which users are interested in the topic and
subsequently
26 targets those users with messages "promoted" by the advertisers
(inserting them in their tweet
27 stream). By analyzing related topics for a topic of interest,
advertisers can identify possibly
28 cheaper topics to bid on. For example, if the price for 'social
marketing' is too high, 'seo as a
29 related topic, which may have a relatively lower bid price may be used
instead. The
effectiveness of the campaign may be the same, due to the substantial overlap
between the
31 two.
32 [0029] Thirdly, the advertisers bid on search keywords (to target
searches input to the
33 Twitter search feature). Information on Twitter is temporal by nature
and events evolve with
5

CA 02868948 2014-10-24
1 time, thus the keywords used in searches evolve over time. When a keyword
is used during a
2 search query on Twitter for which an advertisement exists, the platform
will display promoted
3 messages (as advertising) along with the search results. Advertisers may
wish to identify
4 keywords related to a queried keyword at a given time.
[0030] However, the applicants have now determined that advertisers may
also be
6 interested in specific users that would be highly relevant as followers
of a given user. These
7 new followers should be highly interested in the topics for which the
given user has expertise,
8 since the followers desire to follow the given user's messages.
Furthermore, the applicants have
9 determined that advertisers may be well served by not just understanding
who the experts are in
relation to a given topic, but what other topics those users are interested
in; for example, by
11 identifying all experts in 'cloud computing' with interests in
'photography' or experts in 'food and
12 dining' with interest in 'movies'. Such sets of experts can be targets
of novel engagement
13 campaigns that attract attention by combining their area of expertise
and their interests.
14 [0031] Various interactions between users and social networks such
as Twitter result in data
generation. Many social media networks record their interactions with their
users. The system is
16 configured to obtain interaction data via various sources and/or
connectors. The social networks
17 collect and record such data in logs stored on social network nodes or a
network-accessable
18 server. A social network may provide access to data collected on user
interactions. For
19 example, Twitter provides a Gardenhose streaming API which may be used
to access
messages and user profile information. Thus, Twitter activity may be stored in
files that may be
21 automatically created and maintained by a given server or set of
servers. In another aspect,
22 Twitter may be accessed directly via network connection, such as the
internet, and data may be
23 crawled, scraped and indexed. Crawling and scraping may be performed
using various
24 techniques by employing varying levels of automation. The obtained data
may be stored in a
database for ready use by the system.
26 [0032] Referring now to Fig. 1, an exemplary system for identifying
experts in a social
27 network is shown. The system comprises network 106 connected elements
including a server
28 100 linked to a database 101, and a social network 105. The network is
in most
29 implementations the internet. The server comprises an obtaining unit 109
for obtaining social
network data, a tokenizing unit 108 for tokenizing social media messages, a
processing unit 102
31 for processing the social media data to generate each user's topic
signature and an aggregate
32 signature for each topics, respectively, and an indexing unit 107 for
interaction with the
33 database to locally store the social network data. In further
embodiments, the system may
6

CA 02868948 2014-10-24
1 comprise a user interface unit 103. In yet an additional embodiment, a
graphing unit 104 may be
2 provided for generating or obtaining a social graph enabling a query for
users that a given user
3 is following. A representative embodiment contemplates the use of a
processor which, it will be
4 understood, could be implemented by a plurality of processors which could
be distributed.
[0033] Referring now to Fig. 3, an exemplary user interface is shown. The
user interface
6 enables search for various social network experts. As previously
described with reference to
7 Fig. 1, the user interface enables interaction by a marketer (or similar
person, organization or
8 other end-user) with the system. A marketer may select from a plurality
of commands in order to
9 view social network expert data.
[0034] As shown in Fig. 3, the system may present a user with a plurality
of options for
11 identifying expert accounts. A search box 301 accepts topic queries with
full boolean syntax.
12 Alternatively, several advanced queries may be performed. The marketer
may input a query to
13 identify a user that is an expert in one topic 302 and displays an
interest in another topic 303.
14 The marketer may also find users interested in a given topic 304, and
configure the search by
selecting a checkbox to identify users that woud provide the widest reach 305.
The option "max
16 reach" 305, if selected, instructs the system to identify experts of
topic A with interest in topic B,
17 to collectively maximize the follower reach. This effectively means that
the set of accounts
18 reported of cardinality K collectively reach the maximum number of
unique accounts (thus
19 maximizing user impressions) among all possible subsets of expert
accounts of size K, as will
be discussed in more detail below. The user interface is further operable to
accept the input of a
21 particular user account 306 and a topic 307 to find followers of that
user that have interest in a
22 given topic. Any of the foregoing searches could also be limited to
particular time intervals to
23 pinpoint experts at relevant times.
24 [0035] To accomplish the foregoing, as mentioned above, the server
comprises an
obtaining unit 109 for obtaining social network data, a tokenizing unit 108
for tokenizing social
26 media messages, a processing unit 102 for processing the social media
data to generate each
27 user's topic signature and an aggregate signature for each topics,
respectively, and an indexing
28 unit 107 for interaction with the database to locally store the social
network data. In the case of
29 the exemplary system that utilizes public Twitter data, obtaining unit
109 fetches 3 pieces of
information from public Twitter data feed: the actual textual tweet contents
from the public
31 Gardenhose streaming API, the Twitter follower graph (who follows who),
and the Twitter lists,
32 as more fully set forth below.
7

CA 02868948 2014-10-24
1 [0036] The tokenizing unit 108 tokenizes the Twitter lists to produce
topics and associates
2 the topics to the users of the lists. The processing unit 102 uses the
output association (of user
3 to topics) from tokenizing unit 108 to instruct indexing unit 107 to
stores it as a fast accessible
4 index "IXD" in the database. The user interface unit 103 can use index
"IXD" to process the
topic query "q" to return the list of experts "E" as associated to the topics
by tokenizing unit 108.
6 [0037] "IXD" supports this by storing (1) the inverted index from
topics to a collection of
7 users who belong to a Twitter list associated with the topic, and (2)
total number of topic lists a
8 given user belongs to. Using "IXD", one can compute, given a specific
topic and a specified
9 user, the number of times the user is listed in a Twitter list associated
with the specified topic
(represented as "frequency count" or "occurrence count"). The inverted index
is used to
11 compute the set of all experts "E" associated with "q" by finding set
intersection of index
12 associated with each topic "t" present in the query "q". The experts in
"E" can be ranked for
13 display using user interface unit 103 by using the frequency count as
described above.
14 [0038] To return a list of related topics to "q", user interface unit
103 will consult "IXD" to
lookup all other topics for all users from "E", call this set of all topics
"A". From "A", a list of top
16 ranked topics is presented to the user using some scoring function
(frequency count, or tf.idf)
17 [0039] Referring now to Fig. 4, upon issuing a topic query, a
plurality of expert accounts are
18 retrieved and displayed 401 along with corresponding profile
information. This can be
19 accomplished by use of topic signature generation, which is described
more fully below.
Analytics on these accounts are also provided, which as shown include topics
associated with
21 the query topic in the topic signatures of these users 402, keywords
used frequently in the
22 recent messages of these users 403, keyword pairs resulting from a
frequency analysis of the
23 keyword associations between keywords in messages 404, as well as
hashtags 405 which
24 represent frequent hashtags in the recent messages of the accounts
identified . For each such
analysis function a visual word cloud 406 may be provided to study the words
identified and
26 their associated frequency via suitable font sizing. Fig. 4 illustrates
how one can infer what
27 topics they are also experts in (402), what type of conversations they
participate in and share
28 online (403, 404 and 405).
29 [0040] Referring now to Fig. 5, all domains from which content is
frequently shared (in the
form of URL links) via the messages of these users are shown 501.
Additionally, a user may
31 select the option to 'Analyze the interests of their followers' 502 and
the system may
32 automatically conduct the same analysis but this time taking into
account collective followers of
33 these accounts. This provides a mechanism for identifying the interests
and analyzing the
8

CA 02868948 2014-10-24
1 'audience' of any expert group on Twitter. Preferably in this case the
same type of output are
2 returned to keep the same user experience. The means to provide the
response to each of the
3 foregoing types of queries will now be described.
4 [0041] Referring again to Fig. 1, the server 100 is in real time
communication with the social
network 105. The obtaining unit 109 obtains messages and corresponding
metadata from the
6 social network. Generally, message data and metadata can be obtained from
APIs provided by
7 social network, such as public APIs from Twitter, for example. For
example, the obtaining unit
8 109 may utilize a streaming API (not shown) provided by the social
network 105 to receive
9 messages and associated metadata. The metadata may include the following:
author name,
author userid, set of followers and friends of the author, and lists the
author has created. The
11 database 101 may be integrated with the server, located in proximity of
the server, or remotely
12 from the server and accessible by network connection. Upon obtaining the
message data and
13 metadata, the obtaining unit 109 stores the data and metadata on the
database. A suitable
14 storage approach utilizes a compressed row format with each message
being assigned a
message identifier. As the data is stored, the indexing unit 107 is configured
to create a table for
16 each day for storage in the database. Each row in the table is a unique
account identifier and a
17 list of all message identifiers the account produced that day.
18 [0042] Relaxed transactional semantics may be run to increase
throughput across multiple
19 threads reading and writing the table. The tables for a selected time
period may be stored on
solid state drives (SSDs) for increased performance. The collection of tables
keeping the
21 association between account identifier and message identifiers may be
stored in the database.
22 The indexing unit may retrieve for any day, the identifiers of all
messages produced that day for
23 any set of accounts. The indexing unit may then provide the collection
of all message identifiers
24 to the database to retrieve the actual messages.
[0043] The obtaining unit is configured to collect account relationships,
such as which users
26 follow others directly. Certain social networks are configured to permit
users to create lists
27 containing a descriptive name (supplied by the creator) and a set of
accounts associated to the
28 list (supplied by the creator). For example, a list on "machine
learning" may contain all accounts
29 that are experts or very related to the topic of machine learning. The
obtaining unit is configured
to store the above mentioned data in the database 101.
31 [0044] The obtaining unit is further configured to receive
information about which accounts
32 follow others along with a set of metadata appended by the social
network, associated with the
33 accounts, for storage in the database. In an embodiment, this data may
be represented by a
9

CA 02868948 2014-10-24
1 graph that may be stored in a MYSQL instance. It will be appreciated that
another relational
2 database may be used as an alternative to MYSQL. The indexing unit is
further configured to
3 index this data. In embodiments, an Apache Lucene index may be used. It
will be appreciated
4 that another text search engine library may be used as an alternative to
Apache Lucene. This
data provides an expertise vector, or a set of all lists a given account is
associated with. That
6 information is then directed to Lucene to populate the index of topics as
associated with the
7 account. The index supports full Lucene query syntax including phrase
queries and Boolean
8 logic. At the same time, the social graph provides related information
about user interests. For
9 example, if a user follows someone with expertise in cooking, one may
infer that the user has
interest in cooking. Given all accounts followed by a given account, the union
of both expertise
11 vectors may produce an interest vector.
12 [0045] The server is configured to un-shorten multiple URLs. Since
typically URLs in
13 messages are shortened (using popular URL shortening sites like bit.ly
or t.co) conducting
14 analysis on the shared domains to provide insight into the source of the
content is challenging
as each URL has to be un-shortened (possibly multiple times). Thus the server
efficiently un-
16 shortens multiple URLs. Utilizing asynchronous 10 this process may be
conducted for tens of
17 thousands of URLs in parallel on a single thread, typically in a short
time frame.
18 [0046] Given the receipt of some or a combination of the data
mentioned above, the
19 processing unit is able to generate useful information and analysis such
as a topic signature for
a given user, an aggregate signature for a group of users and techniques to
automatically
21 identify changes in the aggregate signatures over time for a given
query.
22 [0047] Referring now to Fig. 2, a flowchart showing the process
for generating a topic
23 signature is shown. At block 200, the obtaining unit obtains all or a
subset of all lists on the
24 social network. As lists are obtained, at block 202 the tokenizing unit
is configured to tokenize
the titles of the lists and, at block 204, remove stop words and other
frequently appearing
26 information as well as idioms that carry no information (e.g., if,
friends, etc.) from the list names.
27 At block 206, a dictionary of similar words is then used to group
together tokens that have the
28 same meaning. A suitable technique utilizes WordNet, a large lexical
database of English.
29 Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive
synonyms (synsets),
each expressing a distinct concept. It will be appreciated that alternative
lexical databases may
31 be used by the system for a similar purpose. At block 208, for each
account, a vector of all
32 tokens in the titles of all the lists of which the user is a member is
assembled. Each token is
33 accompanied by a number that expresses the occurrence count, namely, the
processing unit

CA 02868948 2014-10-24
1 calculates the number of times each token (grouping) was identified in a
list title that of which
2 the account is a member.
3 [0048] The vector may be referred to as the topic signature,
assuming a total ordering on all
4 topics (tokens) and assigning a value of zero to the occurrence count for
a topic if the account is
not associated with that topic at all. Assuming each token to be a unique
dimension in a multi-
6 dimensional space, the occurrence counts are normalized to produce the
unit topic signature
7 vector in Li space. Thus, this vector represents the weight of the
account being associated with
8 a topic. The Li space is used and hence, the length of the expertise
vector is normalized to 1
9 using the manhattan norm. The union of all these vectors will result in a
multi-dimensional space
with each unique token corresponding to a dimension.
11 [0049] In an exemplary scenario, consider a user @john that is
member of three Lists
12 {toronto-dentist, dentists, music-toronto}. The set of tokens with
occurrence counts for this user
13 is {dentist(2), toronto(2), music(1) After normalization, the unit topic
signature vector becomes
14 topics(john) = aentist +151nusic + ."toronto. The vector above is of
unit length in Li space, with
non-zero values across three dimensions and zero across all others.
16 [0050] Considering two more users in the same scenario: @henry
belonging to lists
17 {dentists, squash-london, music}, and @susan who is a member of lists
{squash, music-london,
18 squash-london}. After considering all the 3 users, a 5-dimensional topic
space is produced:
7 dentist music london squash toronto\
2 1 2
John ¨5 0 0 ¨5
5
1 1 1 1
henry ¨4 ¨4 ¨4 ¨4 0
1 2 2
Vusan 0 0
5 5 5
19 The above matrix is a compact form notation of individually writing the
vectors as
topics(john) = dentist +15ln'usic + Olondon + OS'quash +iforonto, and soon.
21 [0051] The process of computing topic signatures for each user is
linear in the number of
22 users and the length of their topic signatures. In this example, the
technique of extracting topic
23 signatures is applied on messages from 240 million users and 15 million
lists. Apache Lucene
24 may be used for implementing the tokenizing unit and indexing unit to
tokenize and index the
lists.
11

CA 02868948 2014-10-24
1 [0052] The generated topic signature provide for fast response to a
request for an expert list
2 in response to a topic query. The index allows query with full boolean
syntax, and is used to
3 quickly return all users having certain topic associations. For example,
in response to a query,
4 the query is tokenized and processed lexically in similar manner to lists
are processed as
described in reference to Fig. 2. Users may be ranked by the occurrence count
in their
6 respective pre-normalized topic signature for the token grouping(s)
corresponding to the querry,
7 and a particular number of the highly ranked users may be returned as
experts. It will be
8 appreciated that obtaining from the database a list of the users related
to the query can be
9 made relatively quickly due to the intelligent approach to indexing
taking at the time of storage.
[0053] To describe how the processing unit generates the aggregate
signature, the
11 following setting and notation will be used: Let the set of all users be
denoted by Um, which has
12 cardinality M. Let u c Um denote a unique user, with unit normalized
topic signature vector
13 topics (u). The vectors are derived from a multi-dimensional space S'
with N dimensions. The
14 matrix representing signature vectors for all users will then have M x N
entries. We denote this
matrix as MsE.
16 [0054] If si is a specific dimension in SN , then the signature
vector may be represented as
17
follows where wi represents length of the vector across dimension Si, topics =
w1.1 2' + w 2 +
18 === + WN.N. The graphing unit generates a social graph, where the social
graph spanning all
19 users denoting follower relationships is represented by (Um ,E), where
Um is the set of nodes
and E is the set of edges. Each user represents a node. Edges are follower
relationships, i.e., if
21 a user u follows another user v, then the directed edge from u to v will
be part of the set of
22 edges E. Formally, u, v E Um , follows(u,v) = true (u, v) c E.Let q
represent a keyword query
23 such as, "hurricane sandy" or "pepsi"; the query may permit more complex
queries that include
24 boolean operators such as, "elections AND ("barack obama" OR "mitt
romney")" as well. Let R
represent the set of message results after evaluating the query against the
content of all raw
26 messages obtained by the obtaining unit from the social network and
stored in the database. If
27 the search has a time restriction t denoting that only results within
the time interval t are of
28 interest, the set of results is R. Each entry r E Rt is a message such
that matches(q,r) = true
29 meaning that the query evaluates to true on message r and post
-time (r) E t, namely that the
message was posted within the time interval of interest t. Let At denote the
set of all unique
31
authors (users) in Rt, i.e., set of authors of all messages r e R. u E At
3r E Rt:author(r) =
32 u.
12

CA 02868948 2014-10-24
1 [0055] As MsE can potentially consists of a large number of
entries, it is desirable to produce
2 a concise summary of MsE as aggregate signature. This can be done by
first computing the
3 relevant rank of each user in the set Um using the social graph and
network ranking algorithms
4 such as PageRank. Then, conditional probability can be utilized to
aggregate the topics. This
process is mathematically the following : Pr(s) = Pr(s I uj )* Pr(u1) for
expertise s, and user u,
6 where Pr(s) denotes the probability of s, over the appropriate sample
space.
7 [0056] The aggregate signature may be used for two main purposes,
namely, a) to obtain a
8 concise view of the topics associated with the messages of interest
(denoted by the query q
9 above). This is done by aggregating the expertise vector of all authors
who have authored one
of the messages of interest, and b) to rank those topics based on their
potential for
11 dissemination within the social network. This is done by summing over
the users and their
12 topics/expertise using conditional probability. The utility of such an
aggregate signature may be
13 to gain insight on what topics a marketer may associate with q in order
to increase the reach
14 and dissemination of messages related to q in the Twitter network, via
sharing through re-
posting messages from those accounts sending messages about q.
16 [0057] The processing unit may alternatively generate the aggregate
signature of all
17 followers of a particular account (instead of author group for a query).
For example, a marketer
18 may be interested in understanding who is talking about the brand Pepsi
on Twitter and the
19 aggregate of topics associated with that group. This information may be
used to create better
advertising content for this group, e.g., if many users are associated with
travel, then a good
21 strategy could be to create marketing messages incorporating travel as a
theme. That way one
22 aims to capture the attention of the group in multiple ways and
increases engagement and
23 content sharing.
24 [0058] Thus, in order to maximize the spread of marketing content
to a relevant group it is
desirable to direct resources toward users who can spread the content (e.g.,
by re-tweeting or
26 resharing the message). Hence, when constructing the aggregate signature
not every member
27 of At is considered with an equal weight. The aggregate signature
aggrsig(q,t) (or
28 correspondingly aggrsig(At)) is computed by taking in account the
ability of a user to spread a
29 message.
[0059] Given a query q, time interval t, and its associated set of authors
At from result set Rt
31 the processing unit generates the aggregate signature aggrsig(q,t)
taking into account the
32 potential reach of each author in At. Having retrieved all messages Rt
with respect to the query
33 q of time interval t, the processing unit scans through all items in Rt
to resolve the set of unique
13

CA 02868948 2014-10-24
1 users from Rt as A. The processing unit is operable to generate the topic
signatures for each
2 user in At.
3 [0060] One strategy to produce aggregate signatures is to sum up
the topic signatures
4 retrieved and normalize them to unit length. However, this method fails
to capture the relative
importance of each user in disseminating a message to their followers with
respect to the query
6 q. Under this scheme all users are assumed equally important as far as
the dissemination
7 potential is concerned, which may not actually reflect reality. For
example, the set At may
8 contain several users with association in the topic of music but each
with very few followers, and
9 few users with association in the topic of travel but each with many
followers.
[0061] Thus, the processing unit generates an aggregate score which may be
referred to as
11 "AGGR" herein. AGGR represents the relative ranking of u E A. By looking
at the subgraph
12 induced by At on the original follower graph (Um ,E) only, a user u1 may
have a substantial
13 number of followers in (Um ,E) but have very few followers who also
belong to A. The number
14 of followers in At may be more important than the total number of
followers across the entire
social graph, as the aim is to find users who can disseminate the message to
potentially largest
16 group of relevant users.
17 [0062] To capture these intuitions the processing unit models this
scenario as a Hidden
18 Markov Model (HMM), with each user in 11 E At represented as a node in
the hidden layer, and
19 each topic in their aggregate signatures' represented as a node in the
output layer. For users
u, V E At, if user u follows V, a directed edge is added in the Markov chain
from u to v.
21 Transition from one node to another takes place with equal probability.
That is, if there are eu
22 edges out
of node u, one of the edges is selected for transition with equal probability
Since
23 the Markov chain may have disconnected components, with a small pre-
specified probability a a
24 random jump takes place, and with probability 1 ¨ a one of the outgoing
edges is selected.
[0063] Traversing the Markov chain, while at node it, having eu outgoing
edges, the
26 probability of transition is computed as follows. If eu is zero, the
next node after transition is
27 randomly picked from set A. Let 'At' be the cardinality of the set A. If
eu is non zero, then the
28 next node will be:
a
pickrandomv E At withprob
lAtl-en
29 next (u) = I 1¨a
pickrandomvlfollows(u, v) withprob ¨
eu
14

CA 02868948 2014-10-24
1 This completes the construction of the Markov chain and emission
probability for the topics is
2 assigned at each node. The symbols being emitted from the HMM are
dimensions of the topic
3 signature. For
example, if the topic signature of a user u is topic(u)= w + w As=
1. -1 - 2.-2 =
4 -1 friusic + -1 quash, then one of music or squash is emitted with equal
probability when at the
2 2
node in the HMM associated with this particular user u. Since the topic
signatures are of unit
6 length in LI. space, further normalizations are not needed to compute
symbol emission
7 probabilities. For a topic signature, topic(u) = w1.1 + w2. 2 + = = = +
wN. N. The symbol si will be
8 emitted with probability wi. Since, Wo 14/1 === WN = 1, the sum of all
probabilites will be 1.
9 [0064] Continuing from the example used for the creation of the
topic signature above and
assuming each of the three users follow each other, the HMM as displayed in
Fig. 6 is
11 constructed. The hidden layer is constructed with three nodes,
representing the three users.
12 Transition edges are added, each with probability 2-2, such that
Pri!=i(uilui) = 0.5. As a result,
13 the steady state distribution is observed to be Pr(john) = Pr(henry) =
Pr(susan) = 1/3 for the
14 hidden layer. Marginalizing out user probability from
Pr(topicsignature,user), the processing
unit can compute the aggregate signature for the entire graph. For example,
Pr(dentist) =
16
Pr(dentistljohn)Pr(john) + Pr(dentistlhenry)Pr(henry) = X X = -12-60, and,
17 Pr(toronto) = Pr(torontoljohn)Pr(john) = x = . Similarly, Pr(music) =
Pr(london) =
18 Pr(squash) = E60. As a final check, performed by the processing
unit,Pr(dentist) + Pr(music) +
19 Pr(london) + Pr(squash) + Pr(toronto) = 1. The resulting aggregate
signature, therefore
13 -= 13 ^ 2
isTO dentist + ¨13Ihusic + ¨ london +13 .C=-quash + ¨15toronto.
60 60 60
21 [0065] The HMM has now been defined by the processing unit with a
set of nodes,
22 transition probabilities, and emission probabilities for symbols. The
steady state probabilities for
23 this HMM will allow the processing unit to compute the aggregate
signature across the set of all
24 users A. At steady state, assuming that the probability that a symbol si
is seen is proh(si), then
the aggregate signature will be, aggrsig(At) = prob(s1) 1+ prob(s2) 2 + = = =
+ prob(sN). N.
26 which is of unit length in L.
27 [0066] At steady state, assuming the probability that a symbol si is
seen is prob(si), then
28 the aggregate signature will be, aggrsig(At) = prob(si)S'i + prob(s2) 2
+ = == + prob(sN). N.
29 which is of unit length in L. To compute the aggregate signature given
the steady state
distribution from the Twitter follower graph, the processing unit uses the
definition of conditional
31 probability. Observe that for topic s:

CA 02868948 2014-10-24
1 Pr(s) = Et, Pr(s,u) = Pr(slu)Pr(u) (1)
2 and since Pr(slu) (the topic probability of a user u) and Pr(u) (the
steady state probability of
3 user it) are independent and known from the Markov chain and
preprocessing, the processing
4 unit proceeds to solve the HMM first by the hidden user layers, then the
emission (topic) layer.
[0067] Marketers invest significant effort to change brand perception and
association. A
6 change in the audience of a brand could be organic over time, or it may
be influenced by an
7 event. For example, numerous marketing efforts attempt to reinvent or
reposition brands in new
8 target segments and change the way brands are perceived online or
offline. An effort to make a
9 brand more fashionable or trendy may be successful if the people taking
about the brand online
associate themselves with fashion and/or fashion trends. Thus, such changes,
if one is able to
11 identify them, may point to the success or failure of marketing efforts
online. Identifying such
12 changes in the conversation around events may further identify parties
relevant to a political or
13 academic subject and how insights evolve over time. In an exemplary
scenario, the query
14 "hurricane sandy" is considered. The processing unit conducts the search
for one day time
intervals for a 92 day long period from 1 Oct 2012 to 31 Dec 2012.
16 [0068] The processing unit may proceed to generate results as to
how aggrsig(q,
17 evolves over time for a long time range T consisting of D smaller time
intervals, T =
18 ftl,t2,...tD}. The processing unit generates the aggregate signature for
a given query q for
19 each of the time intervals as: ASM(q,T) =
taggrsig(q,t1),aggrsig(q,t2),...aggrsig(q,tD)}. The resulting matrix ASM(q,T)
has N rows
21 and D columns. The rows will each correspond to a topic dimension from
SN, and columns will
22 each correspond to a time interval from T. This matrix is referred to as
the aggregate signature
23 matrix (ASM(q,T)) over time T for the query q.
24 [0069] Given an aggregate signature matrix ASM(q,T) = faggrsig(q,t1),
...aggrsig(q,tD)},
a pre specified k < D, and a function score that measures similarity of
aggregate signatures,
26 namely, score(aggrsig(q,ti) aggrsig(q, ti)) E IV define a disjoint,
continuous k partitioning
27 of [1,2, ...,D] as Pk: = ([61,e1],[b2,e2],...,[bk,ekll with b1 -= 1, ek
= D, e1 E Z, ei bi and
28 vi <j, e1 = b1 ¨ 1 by solving for argminpk Ei
score(aggrsig(q,tbi),...,aggrsig(q,te,)).
29 [0070] In embodiments, the processing unit may iterate over a few
values of k and trace the
value of the overall function score for each value of k. Points at which large
discontinuities arise
31 are typically good candidates for k.
16

CA 02868948 2014-10-24
1 [0071] The processing unit selects k groups of continuous days
across the 92 day period for
2 a pre-specified k. Each of these k date ranges will represent a distinct
aspect of the event. For
3 example, if k was 3, resulting date ranges could be expected to represent
the pre-hurricane
4 period, the period during the hurricane specifically as it passed over
New York city, and the
period post-hurricane.
6 [0072] Using the notations defined above, given the aggregate
signature matrix ASM(q,T)
7 and specified k < D, the processing unit partitions T into k continuous
and disjoint intervals. The
8 aim is to group similar time periods together, and this is formalized by
defining a scoring
9 function capturing similarity that is minimized. Once the scoring
function has been chosen, the
problem reduces to that of identifying the optimal partitioning.
11 [0073] The processing unit generates two scoring functions. The
first function minimizes the
12 total error represented as the sum of root mean square distance between
the average
13 aggregate signature of a collection of signatures and the aggregate
signatures in the collection.
14 The first measure, given a collection of aggregate signatures
ASM = taggrsig(q,t1),aggrsig(q,t2),...aggrsig(q,tD)J, access the distance
using the root
16 mean square error : score = EiII aggrsig(q, ti) (E, ag grsig (al
112. The RMSE score
17 increases as the distance between aggregate signatures increases, i.e.,
when the topics across
18 [N., t2, , tD) are different, and decreases when the topics are the
same. Therefore, with this
19 score function intervals of time are singled out where the aggregate
signatures are very similar
to each other.
21 [0074] The second discretizes ASM(q,T) into an indicator matrix of
0 and 1, and measures
22 similarity as the hamming distance across neighbouring aggregate
signatures. A second error
23 measure is proposed that involves the discretization of aggregate
signatures. The value in each
24 dimension of aggrsig(q,t,), is between 0 and 1. The aggregate signature
can be discretized by
assigning each dimension the value of 0 or 1. There are many ways to
discretize the signature;
26 a statistically sound way is to assess the mean of all the values and
assign a value as 1 if it is
27 above some standard deviation of the mean and zero otherwise. Denote the
discretized
28 aggrsig(q,ti) as aggrs7g(q,t1) and similarly, the discretized ASM(q,T)
matrix as ASM(q,T).
29 With t1, t2, ...,tD ¨ 1 = T1 and t2, ...,tD = T2, rewritten in the
compact form: score =11
ASM(q,T2) ¨ ASM(q,Ti) 'IF using the Frobenius norm. II A 11,= Ei lAiiI2
where A is a
31 matrix.
17

CA 02868948 2014-10-24
1 [0075] In further embodiments, given a function score that computes
a distance between
2 aggregate signatures of ASM(q,T), the following recurrence may be
generated by the
3 processing unit to measure the similarity of aggregate signatures. Bi,k
is defined to be the best k
4 partition score of the first] columns of ASM(q,T) using the given score
function:
Bj,k = score(exp(q,ti)=== exp(q,tj)). (2)
6 [0076] The best K partition of ASM(q,T) for all K < D may be
computed using Equation 2.
7 Notice that it would take 0 (DD)score steps to solve the best K partition
of ASM(q,T) for all K in
8 a brute force way. Since there are D ¨ 1K 1 ways to produce k disjoint
continuous intervals for
9 [1,2, D] , the processing unit generates
0(Di) = 0(DD).
[0077] Looking at the recurrence Equation 2 the processing unit may pre
compute
11 score(ASM(q,Ti)=== ASM(q,Tj)) as it is independent from the recurrence.
12 argLI1ci Bi-1,k I SCOi will take 0(D) steps. When solving for the best
K partition
13 of ASM(q,T) for all K < D using dynamic programming, the runtime is
dramatically reduced to
14 0(D3). The space requirement can also be optimized by noting that in
Equation 2, B j,k depends
only on the values from the previous iteration. Therefore, after an iteration
is complete, the
16 processing unit may discard the optimal interval partitioning and the
optimal scoring from the
17 last iteration, bringing the space requirement excluding the precomputed
scores, down to 0(D).
18 [0078] Continuing with the example above, for each day, the
aggregate signature vector is
19 computed based on everyone who is talking about the hurricane on that
day i.e. who have
posted a message containing the words hurricane and sandy). As time
progresses, a natural
21 evolution in the matrix ASM(q,T) for this query is expected. Hurricane
Sandy first affected
22 Caribbean and Bermuda on Oct 22nd, and Twitter users actively
participating in the discussion
23 topically associated with these regions. As days progressed, more
American and subsequently
24 global audience started discussing the hurricane. As the hurricane
traveled from the southeast
of the US (Florida, Virginia, Carolinas), to the mid-Atlantic region
(Washington DC, Maryland,
26 New Jersey), and finally reached New York City, the group of users
talking about the hurricane
27 changed. In November, post-hurricane, the discussion shifted further to
rebuilding efforts and
28 those discussing were associated with politics. Intuitively it is
evident that this 92 day time
29 period can be partitioned into a discrete time periods which capture the
evolution of this story
namely tracing the geographical path of the storm (by observing the topics
associated with
31 those taking about it) and then capturing the political discussion
centered in re-building efforts.
18

CA 02868948 2014-10-24
1 [0079] In embodiments, the processing unit may be configured to
perform random sampling
2 of data to speed up this computation without sacrificing quality. This
effectively offers a good
3 tradeoff between accuracy and speed.
4 [0080] The system may be configured to process a subset of search
results to reduce
processing time for a query. Run time may be improved by using random sampling
on the set
6 A. Instead of constructing the HMM with all users in At, only a fraction,
such as, for
7 example,f 1.0 may be randomly selected. Referring now to Fig. 11,
processing time is shown
8 as the fraction f varies.
9 [0081] Referring now to Fig. 7, particular exemplary queries
indicate that the use of 30% or
more of the search results provides better than 90% (using cosine similarity)
accuracy as
11 compared to all results.
12 [0082] As the fraction f is reduced, the number of topics with non-
zero weights may also
13 decrease, as depicted in Fig. 8. Particular exemplary queries indicate
that the use of 30% of the
14 search results provide that the resulting AS has only half the number of
topics with non-zero
weights compared to AS', however the topics not present are only those with
small weight in
16 AS'. Thus random sampling may speed up the AGGR processing considerably
while producing
17 similar results to AS'. Reduction in the number of topics with non-zero
weights in AS further
18 helps in reducing the run time and memory usage. While the specific
results may not be
19 representative of all queries, they indicate that the use of subsets of
results can be made
without substantial loss of accuracy, in cases.
21 [0083] Referring now to Fig. 9, the running time of k as a function
of the number of days of
22 messages is considered. Particular exemplary queries indicate that given
messages for one
23 year, the run time, may be less than an hour. Further exemplary queries
show that overall
24 memory consumption scales linearly with the amount of data, as can be
seen in Fig. 10.
[0084] Other applications may become apparent.
26 [0085] Although the invention has been described with reference to
certain specific
27 embodiments, various modifications thereof will be apparent to those
skilled in the art without
28 departing from the spirit and scope of the invention as outlined in the
claims appended
29 hereto.The entire disclosures of all references recited above are
incorporated herein by
reference.
19

Representative Drawing
A single figure which represents the drawing illustrating the invention.
Administrative Status

For a clearer understanding of the status of the application/patent presented on this page, the site Disclaimer , as well as the definitions for Patent , Administrative Status , Maintenance Fee  and Payment History  should be consulted.

Administrative Status

Title Date
Forecasted Issue Date Unavailable
(22) Filed 2014-10-24
(41) Open to Public Inspection 2016-04-24
Dead Application 2018-10-24

Abandonment History

Abandonment Date Reason Reinstatement Date
2017-10-24 FAILURE TO PAY APPLICATION MAINTENANCE FEE

Payment History

Fee Type Anniversary Year Due Date Amount Paid Paid Date
Application Fee $400.00 2014-10-24
Maintenance Fee - Application - New Act 2 2016-10-24 $100.00 2016-09-27
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
THE GOVERNING COUNCIL OF THE UNIVERSITY OF TORONTO
Past Owners on Record
None
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents

To view selected files, please enter reCAPTCHA code :



To view images, click a link in the Document Description column. To download the documents, select one or more checkboxes in the first column and then click the "Download Selected in PDF format (Zip Archive)" or the "Download Selected as Single PDF" button.

List of published and non-published patent-specific documents on the CPD .

If you have any difficulty accessing content, you can call the Client Service Centre at 1-866-997-1936 or send them an e-mail at CIPO Client Service Centre.


Document
Description 
Date
(yyyy-mm-dd) 
Number of pages   Size of Image (KB) 
Cover Page 2016-04-25 1 54
Representative Drawing 2016-03-31 1 26
Abstract 2014-10-24 1 7
Description 2014-10-24 19 1,078
Claims 2014-10-24 5 145
Drawings 2014-10-24 11 195
Assignment 2014-10-24 6 103
Maintenance Fee Payment 2016-09-27 2 61