Doing The Math

Legal issues aside, is having the NSA do some kind of wide-net surveillance for the purposes of counterterrorist data mining a good idea? Arguably, yes, but it's very hard to know without knowing more about the program. But here's a thought. Suppose I have an algorithm that's supposed to listen to a certain amount of recorded conversation and evaluate whether or not the speaker is a terrorist. It's a pretty damn good program, but thanks to the inherent difficulty of the task, the challenges of voice-recognition and dealing with foreign languages, etc., it has a ten percent error rate. That's to say -- whoever you are, nine times out of ten the program labels you correctly: it flags nine out of ten terrorists and clears nine out of ten innocent people. That's a pretty decent program. So what happens if we put it into use on a widespread basis? Well, it sort of depends:

Suppose we had a group of 1,000 people we were interested in monitoring and 900 of them are terrorists. The program will correctly identify 810 terrorists as terrorists. 90 terrorists will evade its clutches. Out of the 100 non-terrorists, 90 will be correctly identified as innocent, and 10 will be wrongly labeled as terrorists. That seems pretty useful.

But say we have a group of 1,000 suspects and only 100 of them are terrorists. A ten percent shot that a given person is a terrorist doesn't reach the "probable cause" standard, but seeing as how thousands of lives could easily be on the line, maybe we want to relax the burden of proof and run the 1,000 through the program. Well, we'll catch 90 terrorists out of the 100, which is good. But out of the 900 non-terrorists, 90 innocent people are going to get labeled terrorists. In other words, out of the 180 people the program will say are terrorists, we can expect half to actually be innocent. Thus, even though the algorithm only has a very small 10 percent error rate, the overall surveillance program makes a lot of mistakes.

If we expand the program things get worse. Say we want to monitor a group of 10,000 people that includes 200 terrorists. We're going to catch 180 actual terrorists plus a whopping 980 innocent people. Thus, out of our total pool of 1,160 "terrorists" only 15.5 percent will genuinely be terrorists.
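
For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes, as the examples above do, that a single error rate applies equally to terrorists and non-terrorists. The function name `screen` and its field names are made up for illustration.

```python
def screen(population, terrorists, error_rate):
    """Expected outcomes when one error rate applies to both groups."""
    innocents = population - terrorists
    caught = terrorists * (1 - error_rate)      # terrorists correctly flagged
    missed = terrorists * error_rate            # terrorists wrongly cleared
    wrongly_flagged = innocents * error_rate    # innocents labeled terrorists
    flagged = caught + wrongly_flagged
    return {
        "caught": round(caught),
        "missed": round(missed),
        "wrongly_flagged": round(wrongly_flagged),
        "flagged": round(flagged),
        "share_genuine": caught / flagged,      # fraction of flags that are real
    }

# The three pools discussed above, all at a 10 percent error rate.
for population, terrorists in [(1_000, 900), (1_000, 100), (10_000, 200)]:
    print(population, terrorists, screen(population, terrorists, 0.10))
```

The error rate never changes across the three runs. Only the mix of the pool does, and that alone takes the share of genuine terrorists among the flagged from nearly all, to half, to roughly 15.5 percent.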

Depending on what we do with the output of the program, this could be very problematic. If, for example, the CIA picks up all the purported terrorists and subjects them to "coercive interrogations" we're going to be torturing a bunch of innocent people. Worse, coercive techniques are going to lead to a lot of innocent people "confessing" and probably "ratting out" various other innocent people. This is bad on its own terms, but it's also going to further pollute our basic data pool with all kinds of wrong information. Lather, rinse, repeat and all of a sudden you're looking at a witch hunt rather than a serious counterterrorism program.

And that's all assuming the program comes in with a very low 10 percent error rate. Even something as bad as a 25 percent error rate might look like an appealing tool. And it could be, in some ways, but the number of innocent people swept up by an algorithm like that could be absolutely enormous, especially if the program is being deployed in a very wide-net manner as sort of seems to be the case.

Of course, if you do something less drastic than torturing the people labeled by the program, this might not be so bad. Or it might be pretty bad after all. It sort of winds up depending on what, exactly, you do with the information. It also depends, as the math here shows, on how many terrorists there actually are and how wide a net the program casts. We don't know any of that, which makes it impossible to say what we're talking about. But this sort of thing is the reason any big surveillance program needs fairly robust oversight. Not only do members of Congress need to be able to monitor it, but they're going to need to be able to consult relevant experts -- computer people, counterterrorism people, etc. -- who could say something meaningful about how well it works. And they're going to need to know what the government is doing with the data it mines, since even a very reliable data mining program can produce a lot of errors depending on how small the needle-to-haystack ratio is. My sense with regard to terrorism is that we're looking for a quite small number of needles in a pretty large haystack of "Muslim people with anti-American views" or whatever, which gives me a lot of concern.


Comments (57)

avatar

Good post. What are your thoughts about factoring in the human analysis of this data? When all this semi-raw data comes down to the hands of an individual there's a whole element of his/her perspectives. The Brits have introduced an interesting wrinkle by monitoring all cars which says to visitors & residents alike, "Live here and you will be monitored, we know where you live ..." It's right there in your face. Lastly, the use of "wiretapping" seems Caponesque to me and I'm wondering when Bush will hide behind what he sees as a limited definition of what he's doing. He isn't, in my book, doing warrantless wiretaps. I want my Republican friends to understand the scope of this undertaking. Bush isn't just doing this based on his narrow definition, no he's blanketed us all, no one is immune. One piece, one element of the Bill of Rights at a time until our rights are scraps in a scrap yard.

avatar

Too much Christmas Turkey?  All you have done is show that you can design a pretty crappy program that wouldn't be worth implementing.

While the issue of false positives and false negatives is a real one that the NSA or someone needs to take up, you really don't know what the actual error rates are of the actual NSA technology involved.

You also appear to assume that if f() is great enough, their next step is torture.  Perhaps their next step is fbi().

Even your tree killing posts are better than this one.

Yo Saturnalia and hag sameach.

Look, Jerry, obviously you need to know the error rate and exactly what's done with the data in order to evaluate it. Indeed, if you read my post I think you'll see that I already say as much. That's my point. You need to know more.


But my other point is simply that even an error rate that looks low at first glance (10 percent, say) can actually turn out to be much more problematic than it initially seems depending on various other factors. Thus, a lot of the merits of the program wind up turning on the issue of roughly how many terrorists are "out there" overall. If terrorists are relatively common, then data mining is a promising way of trying to find them. But if terrorists are very rare, then for a data mining program to catch more terrorists than false positives, its error rate is going to need to be implausibly low. My sense is that we're in the second situation -- 9/11 was perpetrated by just a few dozen people.


If that's the case, then probabilistic sieves aren't a good way to catch terrorists, and it seems that that's what the NSA is doing. Maybe it isn't -- secret programs are hard to evaluate. And maybe I'm wrong and there are tons of terrorists out there but I think that's wrong.
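
To put a rough number on "implausibly low": if one error rate applies to both groups, flagged terrorists outnumber flagged innocents only when the error rate is smaller than the share of terrorists in the pool being screened. Here is a short sketch of that break-even point. The function name is hypothetical, and the prevalence figures are illustrative assumptions, not estimates of anything real.

```python
def true_hits_outnumber_false(prevalence, error_rate):
    """True when flagged terrorists outnumber flagged innocents.

    Assumes one symmetric error rate for both groups, as in the post.
    """
    true_hits = prevalence * (1 - error_rate)      # share of pool: terrorists flagged
    false_hits = (1 - prevalence) * error_rate     # share of pool: innocents flagged
    return true_hits > false_hits

# Algebra: p*(1 - e) > (1 - p)*e simplifies to p > e, so the error rate
# has to be smaller than the fraction of the pool that is actually terrorists.
for prevalence in (0.02, 1e-6):
    for error_rate in (0.10, 0.01, 1e-7):
        print(prevalence, error_rate,
              true_hits_outnumber_false(prevalence, error_rate))
```

So if, hypothetically, one person in a million were a terrorist, the program would need to be wrong less than one time in a million before its hits outnumbered its false alarms.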

avatar

The post is good but it assumes what needs to be demonstrated: that the Bush Administration is using this only to identify terrorists and not to monitor political opponents and protest groups.


The historic reputation of the NSA is not good on this front, and both Bush and Cheney have used some pretty extreme "you are with us or you are against us" language in regards to domestic opponents.


One reason for NOT going to the courts is that one or more judges might wonder what the odds are that a terrorist is named "Matt Yglesias", "Josh Marshall" or "Markos Moulitsas".


Keep in mind that the Bush Administration successfully kept this story out of the papers for a year, and as someone noted on another blog, may have successfully suppressed some others. Because how would you know?


I didn't have blind faith in George Bush on September 12th, I suspected he would extract every bit of political gain and consolidation of power he could. And I had no reason to believe then, nor any reason to believe now that fundamentally he defines "freedom" in any other way than "freedom to project American power anywhere and anyhow I want, here in the USA or anywhere else, because I am the Commander in Chief".


In my mind the only logical explanation for ducking the Court and its 90 days of retroactivity is that some of the names would not have passed the smell test.


We had a White House Enemies List in this country in my lifetime. And there were full blown plans to use all the agencies of government: CIA, FBI and IRS to implement it after the 1972 election. A "third-rate burglary" got in the way. But unless you can show me some evidence that at heart Bush is more pro-civil liberties than Nixon, I am going to suspect the worst. Because what we already know is pretty shocking.

avatar

I just posted an historical take on this problem at eurotrib.

Surveillance vs Civil Liberties 

The main point is that there are fundamental information gathering problems with surveillance when the target is an unknown actor.

However, the same techniques work well when aimed at domestic activists who are addressing social problems. So historically once a secret police is set up it tends to go after a country's own citizens.

I argue that this may be part of the motivation from the start. 

 

avatar

...we're going to be torturing a bunch of innocent people...

 

No one is innocent. Let alone tortured, as soon as you're arrested you're already guilty as hell. Go read Darkness At Noon or something.

avatar

Good summary.  And it obviously goes farther - a 99% solution sounds great until you do that same math for finding 50 people out of a million.

One detail - generally you don't have a single error rate - you have a false positive rate and a false negative rate.  Not only are these not generally identical, they're often inversely related.  Lowering your terroristness threshold will reduce your false negative rate (you'll catch more bad guys), but will also increase your false positive rate (you'll catch more guys who just happened to email out "Afghan music is DA bomb!").
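
A quick sketch of that 50-in-a-million case, with the two error rates kept as separate knobs. The function name is made up for illustration, and reading "a 99% solution" as 1% false negatives plus 1% false positives is an assumption about what the phrase means.

```python
def flagged_breakdown(population, targets, false_negative_rate, false_positive_rate):
    """Return (expected true positives, expected false positives)."""
    innocents = population - targets
    true_positives = targets * (1 - false_negative_rate)   # bad guys caught
    false_positives = innocents * false_positive_rate      # innocents flagged
    return true_positives, false_positives

tp, fp = flagged_breakdown(
    population=1_000_000,
    targets=50,
    false_negative_rate=0.01,   # assumed: misses 1% of real targets
    false_positive_rate=0.01,   # assumed: flags 1% of everyone else
)
print(f"{tp:.1f} real hits, {fp:.1f} false alarms, "
      f"{tp / (tp + fp):.2%} of flags are real")
```

That works out to roughly 50 real hits buried under roughly 10,000 false alarms, so fewer than one flag in a hundred points at an actual target.
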
avatar

Excellent post.

One thing I have not seen discussed yet in the blogosphere was the Bush administration's initiatives aimed at terrorist funding. A couple of years ago there were arrests involving various Hamas support groups in America, and I wouldn't be surprised if this data-mining was not only aimed directly at "terrorists" but also at the organizations Bush had declared terrorist sponsors and enablers.

I forget what the money-transfer system used in much of the third world is called, but it involves simple phone calls between trustworthy agents instead of bank transfers. IOW, we don't know yet what all they were looking for in the data-mining and surveillance.
avatar

Any program of this type has an "adjustable" alert level.  The adjustment sets the balance between correct detection and false alarm rates. 

Conceivably, the threshold could be set for extremely low false alarm rates.  This would inevitably lower the correct detection rate also.

A more interesting point:  it seems to me that it is quite likely that they don't know the false alarm rates.  Every alarm is flagged for followup, and the alarm threshold would be set to match the followup capacity.

It seems to me that the likelihood is that the vast majority of alarms would be false alarms.  We have the suggestion that there have been a few high profile successes, and that this justifies the program.
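
Here is a toy sketch of both ideas in this comment: moving the alert threshold trades detections against false alarms, and the threshold can be set purely by how many alarms the follow-up staff can handle. Everything here is made up for illustration, including the scores, the labels, and the function names. Real systems would not know the labels, which is the comment's point about unknown false alarm rates.

```python
# Hypothetical scored cases: (score, is_real_threat). Made-up toy data.
CASES = [
    (0.97, True), (0.91, False), (0.88, False), (0.84, True), (0.79, False),
    (0.71, False), (0.66, False), (0.58, True), (0.42, False), (0.30, False),
]

def alarm_counts(cases, threshold):
    """Return (correct detections, false alarms) for scores at or above threshold."""
    flagged = [is_real for score, is_real in cases if score >= threshold]
    detections = sum(flagged)
    return detections, len(flagged) - detections

def threshold_for_capacity(cases, capacity):
    """Lowest threshold that flags at most `capacity` cases (assumes distinct scores)."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    scores = sorted((score for score, _ in cases), reverse=True)
    return scores[min(capacity, len(scores)) - 1]

# Lowering the threshold catches more real threats and more innocents.
for threshold in (0.95, 0.80, 0.50):
    print(threshold, alarm_counts(CASES, threshold))

# Threshold chosen only by follow-up capacity, not by any known error rate.
print(threshold_for_capacity(CASES, capacity=5))
```

In the toy data, the strictest threshold gives one detection and no false alarms, while the loosest gives three detections and five false alarms.
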
avatar

Whether surveillance works or not isn't the question. The question is whether we want the government watching everything we do so we can't cause any "trouble"? These kinds of surveillance powers, once accepted, tend only to expand. Powers used to find so-called terrorists will be used to find other people doing things the government doesn't approve of and that (in some government officials' eyes) might be dangerous. We know the Pentagon has already expanded the definition of terrorist to include PETA, Quaker peace activists, Catholic Workers, etc. And that's only after a very short time doing this kind of spying. Give it a few decades and we'll be living in a police state, with big brother watching our every move.


Nip this in the bud now. It ain't America.


 

avatar

Matt--


Great to bring this up.  It's actually a classic problem in public health--specificity vs. sensitivity.  


And your conclusion is exactly right--testing (or, in counter-terrorism, spying) has a cost.  


That's why in public health (where making the right decision has far greater life-or-death consequences than counter-terrorism), few screening tests are used on the entire population.  Even a "low" rate of false positives can mean that the test causes greater harm than not having the test.


Mammograms, for example, are only performed on high-risk populations.  Because, the fact of the matter is, most women don't have breast cancer.  But if you gave the test to everyone and then followed up aggressively based on the results, you'd have a lot more healthy 25-year-old women walking around with one breast lopped off than you'd have 25-year-old women with cancer who were saved by the test.


This isn't to excuse racial profiling as a means of "targeting."  It's to point out that there is a cost to testing--to me, the cost of pissing off patriotic Muslims who we will need as translators and operatives in order to win the war on terror is far higher than the reward of catching a mentally-ill Muslim who wants to take down the Brooklyn Bridge with a blowtorch.


What it says to me is that we have to be very honest with ourselves about what each type of targeting yields, and how it should be applied.  We should know sensitivity tests from specificity tests, and their power to label someone as a "bad guy" should be limited the less specific they are.


And it makes civil rights MUCH more important, because that's the ultimate specificity test.  If your case can survive a writ of habeas corpus suit, that's a really specific test, and we're far more confident that we're not persecuting the innocent.


I mean, if we're willing to NOT test people when life and death is actually on the line, why aren't we willing to NOT imprison people when life and death isn't on the line?

avatar

I am no math wizard, but it seems to me that the wider the net that is cast, the greater the number of false positives and false negatives that you will net, because there is no perfect program.  Won't you end up with a lot of extraneous info that could distract you from catching the real terrorists?

avatar

Interesting post. What you've touched on is the curious fact that lawyers don't quantify important legal concepts.

What is an acceptable error rate for "reasonable suspicion", "probable cause" or "beyond a reasonable doubt"?

Reasonable Suspicion is the threshold the cop must pass before he can stop you on the street to ask for your ID, or pull you over to check your blood alcohol level. I'd say this requires about a 25% certainty.

Probable Cause is the threshold for a search warrant, arrest warrant or indictment. This would mean at least 51% certainty.

Beyond a reasonable doubt, of course, is what a criminal jury must find you guilty beyond. I'd guess 95% certainty (but I've never seen this percentage spelled out in a case before).

I think if the government can show that data mining is only minimally intrusive (the equivalent of a cop asking for your ID on a city street), then maybe they could get away with it on merely "reasonable suspicion". But the courts will look to see how often someone who's searched turns out to be a "bad guy"; the Feds better be able to show at least a 25% accuracy rate. If it's lower than that, or the Feds just start data mining everyone, that can't possibly be constitutional.

avatar

I'm afraid it's even worse than you realize. Data mining is used to uncover patterns in large amounts of data.  It works well for things like determining that people who buy cookies also tend to buy milk, because there are lots of examples to find, especially if you look over the course of months and dozens of markets.

But data mining cannot work where there are very few positive examples of the pattern you're looking for.  I think we can all agree that actual terrorists make up a tiny fraction of the population.  So there might be, at most, a few dozen positive examples to find out of millions of data points (like, say emails). There is simply not enough of a pattern to find. The only way such a program could really work is if terrorists really communicate in a unique way that stands out.  But of course, genuine terrorists would try to make their communications seem innocuous.  And normal people often use language that is more threatening and dangerous literally than they actually mean. So, it seems nearly certain that terrorists are not so easy to pick out, racial profiling notwithstanding.

This means that a 10% error rate is a pipe dream.  The error rate is likely to be closer to 100% than it is to 10%.  Nearly every person identified as a potential terrorist will be innocent. While it may sound good in theory to spot terrorists by large scale monitoring of communications or other behavior, it only works in the movies and on TV.  Computationally, it's basically impossible.

Combine this data mining scheme with the Bush Administration's willingness to detain suspected terrorists without charge, without trial, without access to counsel, and without even notifying families (not to mention its willingness to use torture) and you have the very real stuff of Orwell and Kafka.

Matt 

avatar

This is an Administration about which the most charitable possible description, giving full benefit of every doubt and then some, is that their intelligence was inaccurate.


We're being asked to trust a President who gives no hint of trustworthiness.


Some people might think they have nothing to hide, so what's the big deal. To them, I say it's only possible to feel that way because we're American.


Our sense of security comes from the very rights that George Bush took away. If this is allowed to stand, our kids and grandkids won't be nearly so carefree about "what they have to hide."

avatar

This analysis seems to be confusing two issues, though the general point you're raising is on target.

In all such algorithms, there is a distinction made between the rate of false positives and the rate of false negatives.

In general, the rates of false negatives and false positives can be vastly different, and usually adjustable with tradeoffs -- you can diminish the number of false positives by increasing the number of false negatives. That is, in the case in point, if the number of people falsely suspected of terrorism is too high (too many false positives), then you can "crank up" the criteria for being declared a terrorism suspect, but with the tradeoff that a larger number of genuine terrorists will not be flagged (increasing false negatives).

But the most basic point you're making, namely that any such program is almost certain to have an ENORMOUS number of false positives is absolutely true. Of course, the expectation here is that there would in the end be human intervention to determine if the suspected terrorists really look to be likely terrorists.

Yet even there, I'd be astonished if the human decisions did not include a vast number of non-terrorists, simply swamping the number of real terrorists. In the end, it's the numbers that dictate this is how it will turn out. There are hundreds of millions of people in the US alone, there are, I'd guess, probably well less than 100 potential genuine terrorists in our midst. Any algorithm or decision procedure that must winnow down by so many orders of magnitude must be extraordinarily precise and reliable to generate the smaller set. And, in such vague areas, algorithms and decision procedures NEVER are precise or reliable.

This is indeed the single most troubling aspect of this program -- that there is NO WAY to get at the real terrorists WITHOUT implicating a huge number of innocent people. That, again, is what the numbers themselves dictate. 

avatar

for the scenario where they're spying on all phone calls, not just overseas calls.

Let's say there are 250 million americans who make phone calls on a given day, and make an average of 4 calls per day. That's a billion calls.

Now let's say the algorithm has a .01% false positive rate. That's 100,000 calls falsely flagged as terrorist.  Even if it's .001% false positive -- 99.999% accurate -- that's still 10,000 innocent communications being reviewed without a warrant every day.

That's the problem with vast fishing expeditions. If you are looking for a small enough needle in a large enough haystack, the noise overwhelms the signal.
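
A back-of-the-envelope sketch of that calculation, using only the figures assumed in this comment (250 million callers, 4 calls each). The function name is made up. It shows the count both ways, since the answer depends on whether the false positive rate is applied per call or per caller.

```python
def falsely_flagged(items_screened, false_positive_rate):
    """Expected number of innocent items flagged per day."""
    return items_screened * false_positive_rate

callers = 250_000_000
calls = callers * 4   # one billion calls a day, per the comment's assumptions

for rate in (0.0001, 0.00001):   # 0.01% and 0.001%
    print(f"rate {rate:.3%}: "
          f"{falsely_flagged(calls, rate):,.0f} calls, "
          f"{falsely_flagged(callers, rate):,.0f} callers")
```

Applied per call, the two rates give about 100,000 and 10,000 false flags a day. Applied per caller, they give about 25,000 and 2,500. Either way the false flags dwarf any plausible number of real ones.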

I'm pretty skeptical that this sort of fishing expedition is worthwhile, even if a law could be crafted that could satisfy civil liberties concerns.  Traditional law enforcement, where you start with real leads, has a much higher probability of success at catching the bad guys.


 


avatar

Just to follow up a bit on this post.

What I suspect is true is this program, however enhanced by human decision makers, is very likely to be actually counterproductive, even if one has no legal or ethical problem with its falsely identifying vast numbers of innocent people as potential terrorists.

The deep operational problem is that those enormous numbers of innocent people take a huge toll on the limited resources of law enforcement simply to keep track of in any effective manner. Those resources might be used in other, more direct, ways to identify terrorists. My expectation is that the more direct approaches would be far more effective.

If it were taken seriously, the diversion of resources argument in and of itself should likely be sufficient to shoot down the program.

avatar

The same concern applies with the use of biometrics databases for law enforcement.

A digital photo works well to confirm that you are who you say you are. But searching a digital photo database works poorly to pick a suspected criminal out of a football crowd, for the same reason.

Local governments aren't always smart buyers of technology (also see voting machines). They buy based on glowing promises from vendors. When there is secrecy in the procurement process, they are even less likely to get good analytical advice.

I'm more familiar with local/state issues than federal issues, but it is not hard to imagine that a lot of homeland security technology is oversold at the federal level. 

 


When the NSA story first broke, NBC's Nightly News illustrated the program with an example of something that actually happened. 
Apparently NSA picked up a cell phone call made from a car.  They had nothing on the caller but supposedly recognized the voice in the background as a high-value target.
A Predator drone was launched that tracked down the car and blew it up with a missile.  The story was done with animated simulation.  In admiration of the technology, my first reaction was cool -- which I think is what was intended by the piece and is probably most viewers' final reaction.  Then I thought about it some more. 
Presenting the story in animated form gave it an unreal video-game quality especially since the occupants of the car except for the identity of the target weren't discussed at all.  The fact that there were human beings in the exploding car at all wasn't apparent.
But aside from the moral dimensions of the story, I wondered what degree of certainty the voice-recognition software has.  I wondered because to acquaintances my sister and I sound alike on the phone.  Now I don't expect my sister to be engaging in any nefarious schemes that might result in a missile attack, but suppose she did.  Could I be mistaken for her in a phone intercept and blown up by mistake? 
Isn't bin Laden just one of fifty something children?  How many brothers does he have?  Do any of them sound like him?  If so, they should be plenty careful about making cell phone calls.

---Emma
avatar

The example of using facial recognition for identifying terrorists in a crowd is very apt here.

As best as I can make out, virtually no one uses this technology anymore, not because it necessarily raises any civil liberties issues, but rather because it was operationally a total bust, producing no positive results, while being a huge sink for law enforcement resources, which were obliged to follow up all the false leads.

I can't imagine any reason in the world to think that monitoring hundreds of millions (billions?) of phone calls a day -- as opposed to only tens of thousands of people in a stadium -- could possibly prove any better. Speech recognition generally sucks, to begin with, at least as bad as facial recognition. Just this one really weak link in the chain would probably do the whole program in by itself, not to mention all the stupidities introduced down the line by any natural language processing.

avatar

Let's say there are 250 million americans who make phone calls on a given day, and make an average of 4 calls per day. That's a billion calls.

But I think that the idea that this would involve the monitoring of every phone call in the U.S. is pretty dubious.  Such a program would, as Matt suggests, generate so many false positives as to be useless.  I would expect, rather, that the monitoring would be restricted to those connected, in some way, to known terrorists or terrorist associates -- say, anybody who routinely talks to a person who routinely talks to somebody else known to be connected to Al Qaeda.  And I'd expect marriage, family, commercial, and background relationships to be used to restrict the data mining operations (e.g. we know of several Al Qaeda recruits who have come from a particular Saudi village, so those in the U.S. with connections to this place might be scrutinized).  One could imagine a set of circumstantial 'risk factors' (place of birth, relatives, membership in a Wahhabi-financed mosque in the U.S.) that would not justify a warrant but that would reduce the false positive rate substantially. 

So such a program might actually be effective--but that does not mean it should be allowed (that is a different question). 

avatar

I had originally thought too that the program was somehow restricted in its application.

Apparently, though, that isn't true -- it's much wider than that, including, as best I can make out, EVERY conversation, potentially.

Times link:

http://www.nytimes.com/2005/12/24/politics/24spy.html

 

avatar

The Texas legislature authorized such a system in 2005, after turning it down in 2003.  These are hard to fight -- the vendors have connections and can take the legislators out for dinner.

The one bright spot is that an amendment was added to the bill requiring a report to follow up on the results of the program. So the expected false positive rate will be studied and published.

avatar

I'll bet if we put a surveillance camera on every block we'd catch more criminals, and perhaps a few terrorists too. And I'll bet if we assigned a government monitor to every citizen and resident alien in America we'd have less crime, and terrorism probably too. I'll bet the suburbs would even accept the former at least, just as their ancestors tried to stay out of that silly revolution their neighbors were holding.

I want to launch into some kind of diatribe about America's newfound imperial tendencies, and how there will continue to be barbarians at the gates until this country rediscovers its republican roots, but really these tendencies are as old as the republic. Eliot and Pound dated the death of the republic to the 1820s or 1830s, Henry Adams to the 1860s or 1870s, others to the 1890s and 1900s, and still others to the 1940s, but the revolution never really resolved the conflict between Federalism and Antifederalism, and even as the Antifederalists manage to prevent imperial drift for a few decades eventually the Federalists (sometimes called Democrats, and sometimes, including lately, called Republicans) get back in power, and it becomes awful hard to fully undo what they have done (the bad or good parts). No one should forget that the legacy of the Civil War is not just an end to slavery but the beginning of the national police state. No one should forget that the legacy of the New Deal and second world war is not just social security and a victory over fascism but a lasting expansion of those police state powers. You bribe and cajole the people with some greater good, and take away their freedoms, and their privacy, in the process.

I don't claim to be a legal or constitutional scholar. I have little idea whether this president will be so much as scolded by Congress, let alone impeached, for his actions, or granted broader statutory powers to listen to our phone sex. But the precedents strike me as grim. Eventually the Republicans or the Democrats who replace them will get around to bribing us with health care or free handjobs in the restroom of your local megachurch (just a thought; they appear to be running out of things to bribe us with) so they can have their continued war on terror and Very Important Place in History, taking away a little more of our freedom in the process. In the midst of imperial drift, the police state replaces culture as the primary arbiter of good behavior.

Perhaps eventually there will be secession movements over these things. As a matter of fact there already are. But someone should tell the Second Vermont republicans that should they ultimately attain their goal of independence the NSA will almost certainly be watching them still.
avatar

Frankly0, in your two posts, you make the point that I was trying to make in my simple and short comment above, but you treat the subject much more effectively and at greater length than I did.  To me, just simple common sense says that using resources in this manner will be counterproductive, if your goal is simply to stop terrorists.  Of course, the Bush administration's goals may not be quite this simple.

avatar

A smarter way to use this kind of program is to recognize the uncertainty in the measurement and develop a score.  Say it is the probability that this person is worth questioning.  People trying to stop terrorist attacks have a finite amount of resources.  You want to use data mining to focus the resources most efficiently.   

  Some of the recent leaks have already talked about how the "call" patterns are the most valuable data.  Many messages are encrypted and it takes more effort to eavesdrop.  So a first pass analysis might take a list of email addresses, phone numbers, IP addresses and web sites that are known to be Al-Qaeda.  Call them level 0.   Who do they contact and who contacts them?  Call them level 1.  Then take those addresses and see who they contact.  As the network is constructed and correlated with data from investigators, they are able to see which branches from level 0 are the live ones.  Mark the bad leads as no good and create the scoring system.
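
The level 0, level 1, level 2 idea above is essentially a breadth-first walk over a contact graph. Here is a minimal sketch of just that labeling step. It is an assumption about how such a pass might look, not a description of any actual system, and the identifiers and function name are made up.

```python
from collections import deque

def contact_levels(contacts, seeds, max_level):
    """Breadth-first labeling: seeds are level 0, their contacts level 1, etc."""
    levels = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        if levels[current] == max_level:
            continue                      # do not expand beyond the last level
        for neighbor in contacts.get(current, ()):
            if neighbor not in levels:    # keep the shortest distance found
                levels[neighbor] = levels[current] + 1
                queue.append(neighbor)
    return levels

# Hypothetical, made-up identifiers; contacts are listed as two-way links.
CONTACTS = {
    "seed-A": ["p1", "p2"],
    "p1": ["seed-A", "p3"],
    "p2": ["seed-A"],
    "p3": ["p1", "p4"],
    "p4": ["p3"],
}

print(contact_levels(CONTACTS, seeds=["seed-A"], max_level=2))
```

With `max_level=2` the walk labels `p1` and `p2` as level 1 and `p3` as level 2, and never reaches `p4`. The scoring and the pruning of dead branches would have to come from investigators' data, which this sketch does not model.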

  If you think about what has happened over the last few years, there is a lot of data to work with.  Computers seized when the training camps were overrun in Afghanistan.  Zarqawi’s laptop seized in Iraq.  Lots of interrogations all over the world, from the cell in Lackawanna, New York, to Gitmo, etc.
 
Do the Dems want to run against the government using this kind of information efficiently to stop attacks in the US?  It does not seem like a strategy that will take back the House of Reps, Senate or White House.

 Bob,
  The money changing system is called "hawalas". It is covered in Section 405 of proposed new Patriot act from the conference committee.

avatar

<i>While the issues of false positives and false negatives is a real one that the NSA or someone needs to take up, you really don't know what the actual error rates are of the actual NSA technology involved.  </i>

Really, this just misses the point.

The point is that ANY feasible technology is bound to produce "crappy" results, simply because of the numbers involved.

The error rates could not POSSIBLY be so low as not to create the sorts of problems people have been talking about here. Anyone with a small acquaintance with the kinds of technology involved, such as speech recognition, or natural language processing, or "pattern matching" understands that there's no way they can be made reliable on the scale involved without producing overwhelming numbers of false positives. You need an algorithm that might produce, say, a false positive error rate of less than 0.01%. I have never heard of such an algorithm in either of those fields, or anything remotely approaching such an algorithm.
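To put rough numbers on that, here is a small sketch of the arithmetic. The population of 200 million, the 1,000 terrorists and the 90% detection rate are assumptions picked only for illustration, not figures anyone has reported. The function name `screen` is likewise invented.

```python
def screen(population, terrorists, detect_rate, false_positive_rate):
    """Return (terrorists flagged, innocents flagged, share of flags that are real)."""
    innocents = population - terrorists
    true_hits = terrorists * detect_rate
    false_hits = innocents * false_positive_rate
    precision = true_hits / (true_hits + false_hits)
    return true_hits, false_hits, precision

# Hypothetical inputs, chosen only for illustration.
POPULATION = 200_000_000
TERRORISTS = 1_000
DETECT_RATE = 0.90

for fp_rate in (0.10, 0.01, 0.0001):  # 0.0001 is the 0.01% figure
    caught, innocent, precision = screen(POPULATION, TERRORISTS, DETECT_RATE, fp_rate)
    print(f"false positive rate {fp_rate:.2%}: "
          f"{caught:,.0f} terrorists flagged, {innocent:,.0f} innocents flagged, "
          f"{precision:.2%} of flags are real")
```

Under these assumptions even a 0.01% false positive rate flags about 20,000 innocent people against 900 real terrorists, so the large majority of flags would still be wrong.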

My guess is that the place where this will break down most dramatically is in the natural language processing and/or pattern matching area. (As I think about it, unreliability in the speech processing part might only, say, double or triple the number of false positives. That's where the real work of winnowing by many orders of magnitude would have to take place, yet where real unreliability in prediction would certainly still exist.)

Excellent piece, Matthew, on the overall design problems with this type of tool in the GWOT.
 Nelson raises some good points about ways data can be weighed. One element that needs to be looked at to improve the identification of targets is the information from captured computers and the historical Outer-net records that make up the suspect's global Outer-net profile.    One problem is defining what priority a target is given: how many resources should be assigned to a particular flagged target?  Once a suspect is output from the data mining of the data in motion, the data at rest needs to be examined.
  When examining the data at rest, the number of fields in the records examined grows exponentially. Is this person a private, sergeant, or officer in the army of terrorists? Is he a recruiter? Is he undercover, working for some ally in the GWOT? Is this person a potential target for undercover work?
  Computers can assign scores for this also.
  How much information do we have on the target? Do we have class lists from college courses? Do we have the list of students in those classes? Do we know the tradecraft of the subject? Do we have their grocery card buying patterns? Is there a change in lifestyle or diet?  Scores can be assigned for this.
  My point here is that the data at rest in Outer-net fields and records can be exploited to a near nano level. The problem is when these tools are used outside the GWOT.  Who is to monitor who has access to this in-motion and at-rest Outer-net data?
  How invasive should these fields and records be? How quickly do they need to be assembled? How many humans are actually getting this output? Do we need more resources in this area, after appropriate guidelines and oversight are in place? Do we need to increase penalties for Privacy Act violations, once a 21st-century Privacy Act is passed?
Last, do we want these tools to be used under contract by corporations? Or do we want this done by government employees who have passed security background checks? Given the value of the data, I think the answer is obvious. It is no different from the issue of whether the security screeners at the airport should be federal employees or low-bid contractors.
 While the GWOT may have required some emergency corporate access to this data in motion and at rest, it is time to build a method to compartmentalize this process inside government as much as possible, to prevent abuse that could lead to hedge fund manipulation, just to name one possible disaster.

avatar

Powers used to find so-called terrorists will be used to find other people doing things the government doesn't approve of and that (in some government officials' eyes) might be dangerous.

It is easy to imagine one of those phone taps finding someone who might have been discussing a terrorist act, but actually discussing a small plot of marijuana. So, does anyone think that person would not be prosecuted as a result of that discovery? Of course further investigating would be done and the phone tap would never see the light of day.

Or, perhaps the discussion included some joking about hiding under the table income from the IRS. Or, about "borrowing" some materials from one's job. In all of those cases the Constitution would be grossly violated, but those people would be prosecuted.

We must use a "zero defects" policy when it comes to violations of the Bill of Rights.

avatar

There are hundreds of millions of people in the US alone, and there are, I'd guess, probably well under 100 potential genuine terrorists in our midst.

I don't know if elephants are truly afraid of mice, but that is the analogy that comes to mind. Isn't it totally insane to go along this path for such a pitifully small number of possible terrorist threats? Then to ice the cake, the next terrorist attack here is far more likely to come from a nutcase not even associated with Islam - an abortion protestor for example, or a Jim Jones type "religious" person.

We need to wake up and recognize that there is no war on terror. There is no state of war. There is no significant threat to our country from foreign terrorists. What there is, instead, is a very serious threat to our constitutional rights from our elected president.

avatar

Do the Dems want to run against the government using this kind of information efficiently to stop attacks in the US?

Winning elections is important, but not the most important thing in the world. If it were, there are plenty of illegal means available to increase the odds on winning.

What we are discussing here is not a political problem. It is a problem of a rogue president, using illegal means not just to get elected, but to dominate the citizens of the country. The smart and, more importantly, the legal way to use this system is to junk it totally. It is illegal for the government to spy on American citizens. I can't see why that is so hard to understand. If we were at war it might be interesting to discuss this again, but the answer would still be that it is illegal. And, no we are not at war - if Bush says we are, of course we aren't.

avatar

And maybe I'm wrong and there are tons of terrorists out there but I think that's wrong.

Some of us are old enough to remember very well when we had tons of communists out there, even in the federal government. We almost lost our freedoms because of that - not because of what the tons of communists did, but because of what we did thinking we were fighting them. The situation today is much worse in that the actual number of people here who have the mental and moral condition that would allow them to commit a terrorist act is vanishingly small compared to the actual number of communists we used to have and probably still do.

avatar

Re: Quaker Peace Activists?------------
How in the heck can this community pose a threat? That's just ridiculous.


 

avatar

Reading the NY Times article, it talks about 'vast' amounts of data -- but if it suggested monitoring *every* call, I missed it.

Since all mechanical test systems, whether medical, police, judicial, or intel, generate error, follow-up tests are applied.

Medically, a positive leads to other tests. Police evidence is tested in court. Judicial tests (trials) are tested on appeal. The best argument against the death penalty is the combination of high error rate and no way back after execution.

If it's deep-black intel, you're screwed. Without a FISA warrant there are no grounds and no procedures to catch error. With a FISA warrant there would be some hope.

Apropos that, I find this item very intriguing. The AG office is asking the FISA court in March 2002 if the Act can now "be used primarily for a law enforcement purpose, so long as a significant foreign intelligence purpose remains."  Odd, considering that law enforcement would be difficult without a FISA warrant (as has been the reported practice since then, for some cases.) Also, there is little evidence of administration interest in prosecution and no successful prosecutions to date. The court rejected the request, on statutory grounds.

avatar

How in the heck can this community pose a threat? That's just ridiculous.

If they are not reined in they pose a significant threat to Bush's peace of mind - no...that's not quite right. They pose a significant threat to the GOP dreams of eternal power.

avatar

If that's the case, then probabilistic sieves aren't a good way to catch terrorists, and it seems that that's what the NSA is doing. Maybe it isn't -- secret programs are hard to evaluate. And maybe I'm wrong and there are tons of terrorists out there but I think that's wrong.

Tons is really not a good metric for quantifying terrorist activity, since we know that 20 men with some support network can kill thousands in a matter of a few hours.  Or maybe it is: 20 men would be about 2 tons of terrorists, I guess, if they each weighed 200 lbs. heh.
Anyway, it is interesting to note two things:
1. Since 9-11 there have been mass casualty attacks in Bali, Madrid, London, Russia and all over the Middle East but none in the US. This despite tons of threats by Bin Laden, Zawahiri and others.
2. Most of the civil rights and rights of privacy issues we are debating are rights in most of Europe.  Generally, we are much more civil libertarian than Europe.

Probabilistic sieves seem like a very rational way to conduct counterterrorism operations, so I support them.  The fact that there is a risk they will be abused is not, in and of itself, a reason to abolish them, any more than the fact that there are welfare cheats is a reason to end welfare.  My biggest problem with the Bush Admin on the prevention side has been the wasted money trying to keep tweezers, screwdrivers etc. off the airplanes.
I share the concern about growth of government power and prosecutorial abuse, and want to see it limited.  I think ending the war on drugs is a good place to start.  But I don’t see anyone in either party pushing for that, because a big majority of the American people don’t agree with me.
But I believe the threat of jihadi attack here is very real.  If one happens I expect most of the posters on this board will blame the Bush Admin and that’s OK with me.

avatar JaneBoatler

No, the false positives and false negatives tend to go in opposite directions. Bust everybody and you'll have no false negatives -- you'll get all the terrorists! Bust nobody and you'll have no false positives -- no innocents arrested!
avatar

No, the false positives and false negatives tend to go in opposite directions.

Not quite. That is only true for a fixed population of people being snooped on. If you cast a really big net, vs. a targeted net, you get both more false positives and false negatives.
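For the fixed-pool case, the trade-off is easy to sketch. The example below assumes each person gets a suspicion score and that everyone at or above a threshold is flagged. The scores, the labels and the `errors` helper are invented for illustration.

```python
def errors(people, threshold):
    """Count (false positives, false negatives) when flagging scores >= threshold."""
    false_pos = sum(1 for score, guilty in people if score >= threshold and not guilty)
    false_neg = sum(1 for score, guilty in people if score < threshold and guilty)
    return false_pos, false_neg

# Hypothetical pool: (suspicion score, actually a terrorist?)
people = [
    (0.95, True), (0.80, True), (0.60, False), (0.55, True),
    (0.40, False), (0.30, False), (0.20, True), (0.10, False),
]

for threshold in (0.0, 0.5, 1.01):
    fp, fn = errors(people, threshold)
    print(f"threshold {threshold}: {fp} false positives, {fn} false negatives")
```

A threshold of 0.0 is "bust everybody" and a threshold above 1 is "bust nobody". With the pool held fixed, moving the threshold only trades one kind of error for the other. Widening the pool is a different move, because it adds more innocents to mislabel and more terrorists to miss at every threshold.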

Aren't we forgetting that the problem behind 9/11 wasn't the lack of data, but the lack of an ability to analyze and coordinate with that data? There were lots of data points that could have been put together to predict the 9/11 attacks, but no mechanism for doing that. With the NSA generating orders of magnitude more data, the problem gets worse, not better. But, the ability to target pesky political opposition does get better. That kinda makes one wonder, doesn't it?

avatar

I think that this is a camel's-nose-in-the-tent situation. On surveillance, torture, denial of habeas corpus, and other such issues, the Bush-Yoo administration simply wants all possible powers. The pretexts and arguments they give in public argument are irrelevant, and only ways of getting the powers through Congress. The particular cases these powers are applied to at first are pretty irrelevant too. Once they have the powers they'll figure out how to use them.


The camel's-nose argument is a kind of slippery-slope argument, but it does describe the particular kind of case in which the slippery-slope argument is most valid -- when you don't trust the actor on the other side. Bush easily meets that criterion for me, and even for some moderates and conservatives by now.


Principled opposition would use a generalized slippery-slope argument: "Even if we trust Bush, some other President we don't trust might come along later". But we don't need that here; the hypothetical worst case is actual.

avatar Jesus H Christ Matt...what a logician's nightmare

A fuggit about it..I ain't botherin to try to pick out the pearls out of that garbage..

When you put the law aside and went into "arguably" mode, I couldn't make it past the second paragraph for all the side splitting laughter

So let's return to some clear-head thinking shall we...

Putting the law first and last...

Not Authorized By Law: Domestic Spying and Congressional Consent


UofPitt Law School's JURIST Guest Columnist Jordan Paust of the University of Houston Law Center says that contrary to assertions by President Bush and the US Department of Justice, post-9/11 Congressional legislation on the use of military force against terrorists does not authorize domestic spying...





George W. Bush and US Attorney General Alberto Gonzales claim that domestic spying in manifest violation of the Foreign Intelligence Surveillance Act (FISA) was authorized by Congress in broad language in the 2001 Authorization for Use of Military Force (AUMF) regarding persons responsible for the 9/11 attacks. Similar claims have been made in a December 22 letter from Assistant Attorney General William Moschella to the leaders of the House and Senate Intelligence Committees. The claims are patently false.

avatar There is nothing wrong with eavesdropping if there is a court warrant, which is easy to obtain.

Eavesdropping without a court warrant is illegal and criminal.
avatar MW's comments above point the way to lowering the false alarms, but I'd assert that this comes at the cost of lowering the correct detection rate.

Once the pool is restricted in various (obvious) ways, the terrorists will take elementary precautions to avoid being swept up.  Some of the restrictions may be clear violation of civil rights - "profiling" based on religious affiliation, etc.

I'd say that in the end the program has to be expansive to catch terrorists that are trying to be covert.  Then we're back to relying on a wide net, which will generate comparatively more false alarms.
avatar

 === But my other point is simply that even an error rate that looks low at first glance (10 percent, say) ===

Take it from someone who works in the field:  if you had a data mining algorithm with a 10% error rate, you would be sitting in your Silicon Valley mansion counting your billions right now, and deciding whether to buy the NYT or WaPo rather than scratching out a living as a freelance writer.  50% error rate is probably closer to the industry average, although I have seen many such systems with >50% errors (in other words, the client would be better off not using the mined data at all).

sPh 

avatar

Groups may be monitored because some of their members may have used verbal and physical abuse. This is an even likelier consequence in cliques because some of its members can become compulsive or obsessional about other people. As I said this situation is more a function of culture than agency. Perhaps a crisis like 9-11 will bring these behaviours out starkly. For some reason, they get away with them.

Given that we know the FBI and the military don't have enough Arabic language experts to translate the material found in close proximity to real terrorists (or in the case of Iraq, real insurgents) why does anyone think that information gleaned from a sieve applied to all the telecommunications in the U.S. is going to be looked at?  I think this is a classic case of the drunk looking for his lost keys under the streetlamp, because that's where the light is.  The NSA knows how to apply a computerized filter to the backbone switches in the telecom infrastructure, and the Bush apparatchiks love a good technology play and think they understand it.  So let's look there, rather than do the hard work to deploy more translators and human intelligence.

avatar

Reasonable Suspicion is the threshold the cop must pass before he can stop you on the street to ask for your ID, or pull you over to check your blood alcohol level. I'd say this requires about a 25% certainty. Probable Cause is the threshold for a search warrant, arrest warrant or indictment. This would mean at least 51% certainty. Beyond a reasonable doubt, of course, is the standard a criminal jury must meet to find you guilty. I'd guess 95% certainty (but I've never seen this percentage spelled out in a case before).

 Alas, surveys have consistently shown that most people are both very bad at estimating and applying probabilities, and are not very generous with respect to these legal concepts.  Most people ascribe to "Reasonable Suspicion" anything that is possible and plausible, or in other words, anything greater than 0% chance, sometimes tempered by a requirement of first-hand knowledge.  "Probable cause" is equated in most people's minds with dismally low chances, generally on the order of 25%.  Meanwhile, the real shocker is that people are all over the map as to what "Beyond a reasonable doubt" means.  Some peg it at 95%, fewer still at 99%.  Some at 50% or even below.  The consensus is in the 75% ballpark.

Given this discussion about error rates for tests, consider the number of criminal trials resulting in convictions at error rates of 25% false positives*.

 *  hopefully other mechanisms combine to ensure lower error rates.

avatar One person wrote: Eavesdropping without a court warrant is illegal and criminal.

This is true only for domestic surveillance.


The POTUS is the head of the executive branch, a co-equal part of our government on equal terms with the Congress (the legislative branch) and the courts (the judicial branch). He does not serve at the pleasure of Congress or the Courts -- he serves at the pleasure of the People.  The POTUS derives his power as Commander-in-Chief from Article 2 of the Constitution and no act of Congress or the Courts can diminish that power -- only "WE THE PEOPLE", via a Congressional Amendment, can do that.  We have NOT chosen to do so.  As Commander-in-Chief the POTUS stands unchallenged under the Constitution in his power to defend the nation from foreign threat. The Fourth Amendment to the Constitution (and from that the FISA court) applies only to domestic surveillance -- i.e., where both parties being monitored are located inside the USA.  In the Keith ruling, 1972, the SCOTUS acknowledged that the POTUS does have limits on his authority to conduct warrantless domestic surveillance, but it expressly noted that it did not question the POTUS's authority to conduct foreign surveillance without a warrant. Clinton and Carter claimed the same privilege. Thus the NSA program is legal and necessary.  Bush will win this argument both legally and politically.
avatar This post of Matthew's makes no sense at all.  I mean, yeah, false positives and false negatives.  Right.  So how does this distinguish the NSA program from every single other method of detection ever created in the history of mankind?

I mean, let's just do away with every bit of electronics ever created, and we can detect terrorists using only human beings and their eyes, ears, and brains.  OK?  Now, every single point Matthew made in his entire post is exactly the same!  We still get false positives.  We still get false negatives.  It still matters what we do with our determinations (whether we pick up people and torture them right away, e.g.).

So, what the hell is Matthew's point???
avatar

As Fate would have it, I've a relative involved in CIA-financed (sort of) voice recognition/data mining software.

In a world where people are trying to make their problems understood with all of our odd languages, dialects, accents, idioms, ignorance, unintentional errors, stammers, hems-and-haws, etc., 10% failure would be dream world accuracy. In a world where people suspect that they're being listened to, evasion would be simplicity itself.

I suspect that a 10% success rate would be wildly optimistic.

avatar

He does not serve at the pleasure of Congress or the Courts -- he serves at the pleasure of the People.

 

 

Not so. A President also serves, theoretically, at the pleasure of the group with the power of removal: the Congress. A Congress, protective and jealous of its power, would have already removed Bush. 

That the Republican Congress hasn't already removed Bush is an indication that an American Caesar is not unthinkable. 

avatar


You did the math on the wrong algorithm.  As I understand it, the algorithm at the NSA seeks to make conclusions about events not individuals.  The statistical methods are more akin to those employed for example in modern hard drives and other data storage systems.  The reliability of a single bit of data is highly suspect, but by layering statistical rules that combine information from neighboring (orthogonal) bits, one can arrive at more reliable conclusions.

If the goal is to determine a target, say the Brooklyn Bridge, the scheme seems much more plausible.  
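The layering idea can be sketched with the simplest possible redundancy scheme: a majority vote over several independent readings. This is a toy repetition code, far cruder than what real storage systems use, and nothing here is known about the NSA's actual methods. The function `majority_error` and the 10% per-reading error are assumptions for illustration.

```python
from math import comb

def majority_error(n, p):
    """Chance that a majority of n independent readings is wrong, each wrong with probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# Hypothetical per-reading error rate of 10%, with odd numbers of readings.
for n in (1, 3, 5, 9):
    print(f"{n} readings: majority wrong with probability {majority_error(n, 0.10):.5f}")
```

The catch is the word "independent". If the signals about an event share the same source of error, stacking them buys much less than this calculation suggests.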

avatar

When evaluating the effectiveness of any screening program, you also need to consider one other factor that I did not see mentioned: the extent of the consequences when a false positive or false negative is the result.
In testing for breast cancer, for example, getting a false negative (failing to detect someone with the disease) is serious, but it only affects that one individual. In that case, you probably do not want to have a large number of false positives just to catch that one missed case. You will discard the test.
In terrorism, however, the consequences are a little different. To use your example of screening 10,000 people, of whom 200 are terrorists and the accuracy of the test is 90%, you correctly identify 180 of the terrorists, while 20 go undetected. This result is weighed against 980 innocent people being falsely labelled as suspects.
Is that acceptable? If you are screening for a disease, or DUI, probably not. If you are trying to identify someone who is planning and capable of killing perhaps 5,000 people or more, you might evaluate the test differently. You must weigh the number of people ‘saved’ by catching the 180 terrorists, which might be as many as 100,000,  against the unfortunate result of 980 innocent people being falsely accused. Should you disregard catching those 180 because of the 980?
I cannot answer that question for everyone, but at some point the number of potential lives saved outweighs the ‘rights’ of the falsely accused, and you then proceed with the screening. It is a variation on the current torture debate, where the question becomes how big a potential catastrophe must be to justify torturing a suspect in order to prevent it? Under no circumstances?

Individual rights are critical in our democracy, but I consider the right to live as the most important. Depending on the extent of the threat to each, compromising my right to live by upholding my right to privacy is a bad trade-off.
