Authors: Ning Xia, Han Hee Song, Yong Liao, Marios Iliofotou, Antonio
Nucci, Zhi-Li Zhang and Aleksandar Kuzmanovic.
For a growing number of users, online social networking (OSN) sites such as
Facebook and Twitter have become an integral part of their online activities. This
paper calls attention to privacy leakage in mobile network data, and in
particular to one important aspect of the problem: the potential danger to
user privacy posed by a third party that does not simply crawl data directly
from OSN sites but gathers the digital footprints users leave in
cyberspace. GPS and other location information in mobile cellular
data make it possible to tie users’ cyber activities to their presence in the
physical world. The confluence of smart phones and OSNs renders the ability to
glean personal information from mobile data a far more potent threat to user
privacy than attacks on each individual service. This leakage stems from
shortcomings in the design of certain OSNs, as well as from fundamental
limitations of the current Web and Internet from a user privacy perspective,
such as the cookie mechanism used with the stateless HTTP protocol.
The authors refer to this problem as constructing a MOSAIC of a user from their
online digital footprints, and correspondingly refer to the gathered footprint
pieces as TESSERAE.
To demonstrate the threat, they developed the Tessellation methodology. Through
Tessellation, they show how user identity information such as OSN IDs and
device tracking cookies can be extracted from the traffic. Furthermore, they
describe how the remaining pieces of traffic with no identity leakages can be
attributed to the known user identities.
They report that Tessellation can
attribute 50% of traffic to its owners with only 5% error. Optionally, the
coverage can be increased to 80%, with just a 2% increase in the error rate. Using
this methodology, they were able to create mosaics for more than 16,000 users
and classify their personal information into 59 categories including user
demographics, locations, affiliations, social activities, interests, etc.
Finally, they suggest possible countermeasures to safeguard against this
alarming leakage of private information.
====================== Q/A ======================
Q. From where do they obtain OSN user identifiers and information?
A: Many OSN sites, because of weak design, “leak” their user identifiers,
which allows Tessellation to attribute traffic to real users. HTTP traffic is
inspected for URL, cookie and payload information that reveals user login and
session key information.
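As a rough illustration of the answer above, the sketch below pulls candidate identifiers out of a request URL and a Cookie header. It is not the authors' implementation: the key names (`user_id`, `uid`, `session_key`), the example host, and the function `extract_identifiers` are all hypothetical, and payload inspection is left out.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit

# Hypothetical key names; a real OSN would use its own parameter names.
IDENTITY_KEYS = {"user_id", "uid", "session_key"}


def extract_identifiers(url: str, cookie_header: str) -> dict:
    """Collect identity-like key/value pairs from a URL query and a Cookie header."""
    found = {}

    # Query-string parameters, e.g. ...?user_id=12345
    for key, values in parse_qs(urlsplit(url).query).items():
        if key in IDENTITY_KEYS:
            found[key] = values[0]

    # Cookie header, e.g. "session_key=abc123; theme=dark"
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    for key, morsel in cookie.items():
        if key in IDENTITY_KEYS:
            # Keep the URL value if the same key was already seen there.
            found.setdefault(key, morsel.value)

    return found


example = extract_identifiers(
    "http://osn.example/profile?user_id=12345&ref=home",
    "session_key=abc123; theme=dark",
)
print(example)
```

Only keys on the allow-list are kept, so unrelated parameters such as `ref` or `theme` are ignored.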
Q. How is coverage computed? What are the types of coverage?
A: There are two types of coverage: a) session-level coverage and b)
user-level coverage. Session-level coverage is the number of sessions that are given
a prediction (i.e., sum of sessions in all Ts), divided by the total number of
sessions. User-level coverage is the number of ground truth users for whom
Tessellation identified all or a subset of their sessions divided by the total
number of ground truth users.
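The two definitions above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the data layout (dictionaries keyed by session ID) and the function names are assumptions, and a user is counted as covered as soon as at least one of their ground-truth sessions receives any prediction, which is one possible reading of "identified all or a subset of their sessions".

```python
from typing import Dict, Optional


def session_level_coverage(predictions: Dict[str, Optional[str]]) -> float:
    """Fraction of sessions that received a prediction (None means no prediction)."""
    if not predictions:
        return 0.0
    predicted = sum(1 for user in predictions.values() if user is not None)
    return predicted / len(predictions)


def user_level_coverage(
    predictions: Dict[str, Optional[str]],
    ground_truth: Dict[str, str],
) -> float:
    """Fraction of ground-truth users with at least one predicted session."""
    all_users = set(ground_truth.values())
    if not all_users:
        return 0.0
    covered = {
        ground_truth[session]
        for session, user in predictions.items()
        if user is not None and session in ground_truth
    }
    return len(covered) / len(all_users)


# Toy data (made up): session ID -> predicted user, and session ID -> true user.
predictions = {"s1": "alice", "s2": None, "s3": "bob", "s4": None}
ground_truth = {"s1": "alice", "s2": "alice", "s3": "bob", "s4": "carol"}

session_cov = session_level_coverage(predictions)           # 2 of 4 sessions
user_cov = user_level_coverage(predictions, ground_truth)   # 2 of 3 users
print(session_cov, user_cov)
```

In the toy data, alice is covered even though only one of her two sessions was predicted, while carol is not covered at all.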