
How the NSA filters an overload of data

The National Security Agency's Special Source Operations branch manages "partnerships" in which U.S. and foreign telecommunications companies allow the NSA to use their facilities to intercept phone calls, emails and other data. This briefing describes problems with overcollection of data from e-mail address books and buddy lists, as well as NSA efforts to filter out what it does not need.

- - - - -

What is a "session"?

A session is another term for a data interchange between two computers, such as when you log into a service or mail is transferred. Each of these "sessions" crosses the NSA's collection points, filling storage repositories with redundant data.

"Selectors detasked"

Selectors are the NSA's term for what it is searching for - such as an email address or phone number. Detasking means the agency stops collection. One slide laments that the Yahoo Messenger problem forced the agency to stop collecting important information about Greece and Libya.

How many address books are collected?

This slide sets out the number of contact lists collected on a single day, Jan. 10, 2012, from the six top overseas access points, which are designated by alphanumeric codes. The "US" prefix denotes an NSA access point and "DS" refers to the NSA's Australian counterpart.


MARINA is an NSA database and analysis tool for internet metadata. MAINWAY is primarily for telephone metadata for contact chaining, and PINWALE for written content.


Address books make up an unexpectedly large share of information pulled in by the NSA. Many of them are less useful to the NSA because they are "unattributed," with the owners unknown.

Why collect "buddy lists"?

Buddy lists sometimes include the text of messages waiting to be delivered, which count as content. Webmail inboxes, which list new messages, often include a line or two of the text.

"500,000 buddy lists and inboxes collected on a representative day"

When the NSA searches for a specific target, such as an email address used by a terrorist, it usually finds only a listing in someone else's address book. More valuable finds - the target's own address book, a person communicating with the target or a message that mentions the target - are rarer.

A targeted account gets hacked

Four slides tell the story of a Yahoo email account, under NSA surveillance, that was hacked and subsequently used by spammers to send bulk mail. S2E is the Middle East and North Africa office of the NSA's Analysis and Production subdirectorate. The user of this email account had a number of Yahoo groups in his or her address book, some of them with thousands of members. Spammers used the account to send emails to all of them. The spam created so many false connections that the Yahoo account had to be "emergency detasked" to prevent the collection system from overflowing.

- - - - -

This is a glance at problems with the National Security Agency's overcollection of address books and buddy lists and its efforts to weed out useless content.


SCISSORS is an NSA system that helps parse electronic communications. There are five kinds of data, details unknown, that are collected at the four named access points.

Ownerless address books blocked by SCISSORS

For "ownerless" address books, which the NSA cannot attribute to a specific account holder, SCISSORS tries to block collection of content. (Graphic includes chart that shows how often that happened in mid-summer 2012.)

Ownerless address books blocked, by points of access

This chart displays the same data as the previous slide, separated by "signals intelligence address," or point of access.

Emergency detasks

Improved filtering between late 2011 and mid-2012 allowed the NSA to reduce the number of accounts for which it had to stop collection urgently.


SIGDEV is signals intelligence development, or analysis of data flows to discover new forms of useful information.

"Shifting collection philosophy at NSA"

Accustomed to siphoning in as much electronic data as possible, in case it proves useful later, the NSA (according to the authors of this presentation) needs to become more selective. One slide's bullet point: The "shifting collection philosophy at NSA" is "Memorialize what you need" versus "Order one of everything off the menu and eat what you want."

- - - - -

This excerpt is from an article in the NSA's "Intellipedia," a classified system built with the same open-source software used by Wikipedia. Like the other documents, it describes the problem of high-volume, low-value data collection - and the NSA's response - with a focus on Internet contact lists.

How SCISSORS blocks collection

Most address books are targeted simply because they are address books, without specifying a foreign target. Using the SCISSORS tool, NSA will try to prevent useless content from being sent to the PINWALE repository. Even for "ownerless" address books, however, the NSA keeps the metadata that links each of the contacts.


The NSA is trying to filter out unwanted data at the point of collection, using a selection tool called XKEYSCORE, rather than send everything to central repositories for processing by SCISSORS.

"Little or no useful FI information"

SCISSORS was necessary because "unattributed address books" account for collection of a large amount of data with "little or no 'foreign intelligence' information."