Hello, content filters
It's starting to look a lot like China:
DRM on music may be dying, but network filtering of copyrighted material is alive and well. In fact, over the next few months, two different filtering initiatives from Big Content could both come to fruition, bringing the magic of Big Brother to colleges and ISPs near you. It's still a contested issue, but the situation has developed to the point where it is at least plausible to imagine ubiquitous network filtering in the US.
Once the telcos (and Universities!) start monitoring and filtering traffic, where does it stop? Should we also filter for "hate speech" and un-American activities? Disturbing stuff...
Update: It turns out that the original article is somewhat alarmist. As an update to the article states:
"Neither of these two provisions is tied to a school's participation in the federal student aid programs in any way," we're told. "In other words, no school and no student will ever lose funding if a school doesn't make plans to address IP theft, or purchase any programs to do it. And no school will ever lose any federal aid because its students engage in illegal downloading or file-sharing, period."
So the RIAA can sue schools, but not jeopardize their federal funds.
I also looked at the MPAA's "university toolkit" monitoring application, and all it does is naive traffic monitoring, not content filtering. The documentation includes this laughable sentence: "The program cannot distinguish between legal and illegal activity and does not identify the titles of the files being passed across the network." So while they may be more aggressive in the future, they really have no idea what they're doing now.

I work a good deal in the Internet operational engineering area, and can speak to the experience of a number of universities. They have been filtering, often on transfer type (e.g., P2P) rather than source.
They have done this because they have run out of bandwidth for their primary teaching and research activities. Analysis of the traffic on their Internet links shows a huge volume of music downloads, which seems to grow whenever they increase the Internet access bandwidth -- and its cost to the institution.
Many have tried to implement reasonable alternatives. For example, P2P among dormitory networks is not restricted. The restrictions often are relaxed at night and on weekends.
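A rough sketch of that kind of policy, to show that it needs only the protocol class of a flow, its endpoints, and the clock, never the content. It assumes flows have already been classified (for example as "p2p"); the dormitory address ranges, the overnight window, and the function names are all hypothetical.

```python
from datetime import datetime
from ipaddress import ip_address, ip_network

# Hypothetical dormitory address ranges; a real campus would list its own.
DORM_NETS = [ip_network("10.20.0.0/16"), ip_network("10.21.0.0/16")]

# Hypothetical off-peak window: 22:00 to 06:59 local time, plus all weekend.
NIGHT_START_HOUR = 22
NIGHT_END_HOUR = 7


def in_dorms(addr: str) -> bool:
    """True if the address falls inside any dormitory network."""
    ip = ip_address(addr)
    return any(ip in net for net in DORM_NETS)


def off_peak(when: datetime) -> bool:
    """True on Saturday/Sunday or during the overnight window."""
    if when.weekday() >= 5:  # Monday is 0, so 5 and 6 are the weekend
        return True
    return when.hour >= NIGHT_START_HOUR or when.hour < NIGHT_END_HOUR


def should_restrict(flow_class: str, src: str, dst: str, when: datetime) -> bool:
    """Decide whether to rate-limit a flow by its protocol class, never its content."""
    if flow_class != "p2p":
        return False  # only the P2P transfer type is policed
    if in_dorms(src) and in_dorms(dst):
        return False  # P2P among dormitory networks is not restricted
    return not off_peak(when)  # restrictions are relaxed at night and on weekends
```

For example, `should_restrict("p2p", "10.20.1.5", "203.0.113.9", datetime(2008, 1, 15, 14, 0))` is true on that Tuesday afternoon, while the same flow on Saturday, January 19 is not restricted.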
Some measure traffic type by individual user and, if a bandwidth quota is exceeded, reduce the bandwidth permitted to that particular user. The underlying problem is as classic an example of the Tragedy of the Commons as any overgrazing by sheep.
I emphasize that most of the filtering I am aware of is based only loosely on content, as expressed by the communications protocol, and not at all on digital rights. A number of universities regard controlling bandwidth use as a matter of their networks' survival.
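Per-user quotas of this sort amount to simple bookkeeping. Here is a minimal sketch, assuming the network already tags each transfer with a user and a traffic class; the class name, quota size, rates, and which classes are metered are hypothetical policy choices.

```python
from collections import defaultdict


class QuotaTracker:
    """Per-user accounting of metered traffic, with a throttle once over quota.

    All parameters are hypothetical: the quota, the two rates, and which
    traffic classes count against the quota are local policy decisions.
    """

    def __init__(self, quota_bytes: int, normal_bps: int, throttled_bps: int,
                 metered_classes=frozenset({"p2p"})):
        self.quota_bytes = quota_bytes
        self.normal_bps = normal_bps
        self.throttled_bps = throttled_bps
        self.metered_classes = metered_classes
        self.used = defaultdict(int)  # user id -> metered bytes this period

    def record(self, user: str, traffic_class: str, nbytes: int) -> None:
        """Add a transfer to the user's tally if its class is metered."""
        if traffic_class in self.metered_classes:
            self.used[user] += nbytes

    def allowed_rate(self, user: str) -> int:
        """Bandwidth cap (bits/s) to apply to this user right now."""
        if self.used.get(user, 0) > self.quota_bytes:
            return self.throttled_bps
        return self.normal_bps

    def reset(self) -> None:
        """Begin a new accounting period (e.g. daily)."""
        self.used.clear()
```

Only the over-quota user is slowed; everyone else keeps the normal rate, and the tally starts over each period.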
--
Howard
*equal opportunity offense to both extremes*
"Those who cannot remember the past are condemned to repeat it" [George Santayana]
January 15, 2008 2:47 PM
It's important not to conflate IP and engineering issues.
I have no problem with content- and source-neutral, protocol-based traffic prioritization. Heck, if they want to, Universities are welcome to completely block Napster and Limewire ports. Bandwidth is expensive, and the university network is nominally an "educational resource."
Nor do I have a problem with individual quotas. Sanctioning individual students (or even ISP users) for using too much of the communal bandwidth seems entirely reasonable. I would guess that bandwidth use is exponentially distributed among students, so lenient individual bandwidth quotas would save a lot of total bandwidth while only affecting a few people.
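Whether a lenient cap really saves a lot of bandwidth while touching only a few people depends on how skewed the usage is, and that is easy to check against real per-user counters. A minimal sketch, assuming per-user totals for one period are already available; the function name and the sample numbers below are made up for illustration.

```python
def quota_impact(usage: list[float], cap: float) -> tuple[float, float]:
    """Return (share of users over the cap, share of total traffic above the cap).

    `usage` holds each user's consumption for one period and `cap` is a
    hypothetical per-user quota in the same units. The second value is what a
    hard cap would save if capped users stopped exactly at the cap.
    """
    total = sum(usage)
    over = [u for u in usage if u > cap]
    users_hit = len(over) / len(usage)
    traffic_above = sum(u - cap for u in over) / total
    return users_hit, traffic_above
```

For instance, `quota_impact([1, 1, 1, 1, 16], cap=4)` returns `(0.2, 0.6)`: one user in five is affected, and 12 of the 20 units lie above the cap.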
I get worried when the university or ISP is forced to examine content rather than treating it as a stream of bits. The problem is that once you've got content filtering in place, it's only a matter of time before someone decides to use it for something else. All the hard work's been done, after all -- the content filters are already there.
It's like what happened with Google keeping search logs: sure, they could just use them to improve their interface and feed their resident PhDs, but then the FBI sees the data just sitting there, and comes by with a National Security Letter. And remember that the RIAA are the people who think it's illegal for you to rip your own CDs and copy them to your computer. They "probably won't sue you," but to be legal, you should buy both versions.
Of course, this will probably just lead to an arms race: encrypting protocols, banning encryption, steganography, etc. Eventually we're worse off than before: the same copyrighted content is being exchanged, but less efficiently, further taxing the network.
January 15, 2008 3:28 PM
I can only say that I don't know anyone in serious Internet engineering who believes that content filtering is going to be a major issue. There are practical limits to how much deep packet inspection you can do at 10 or 40 gigabits per second. I have had to slow down very big routers, and double the hardware involved, to protect against the reality of botnet distributed denial of service. Digital rights filtering is lost in the noise.
Individual bandwidth quotas seem to be working rather well without content filtering. As you point out, the demand is exponential, although in practice the curve is on the flat side of exponential.
I really don't understand what you are proposing, then, if bandwidth filtering is inadequate. Encrypting P2P accomplishes very little, because I don't need to read the packet contents to recognize the statistical characteristics of P2P streams and block them.
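To put "practical limits" in numbers, the worst case for any inspection engine is a link full of minimum-size packets. This back-of-the-envelope sketch assumes Ethernet framing (64-byte minimum frame plus 8 bytes of preamble and a 12-byte inter-frame gap); other link layers have different overheads, and real traffic has larger average packets.

```python
# Worst case for an inspection engine: back-to-back minimum-size Ethernet frames.
# Each occupies 64 bytes of frame + 8 bytes preamble/SFD + 12 bytes inter-frame gap.
MIN_WIRE_BYTES = 64 + 8 + 12  # 84 bytes per frame on the wire


def per_packet_budget(link_bps: float) -> tuple[float, float]:
    """Return (packets per second, nanoseconds available per packet) at full line rate."""
    pps = link_bps / (MIN_WIRE_BYTES * 8)
    return pps, 1e9 / pps


for gbps in (10, 40):
    pps, ns = per_packet_budget(gbps * 1e9)
    print(f"{gbps} Gb/s: {pps / 1e6:.2f} Mpps, {ns:.1f} ns per packet")
# 10 Gb/s: 14.88 Mpps, 67.2 ns per packet
# 40 Gb/s: 59.52 Mpps, 16.8 ns per packet
```

Tens of nanoseconds per packet leaves little room for anything beyond header lookups, which is why payload inspection at these speeds takes dedicated hardware.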
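The point that encryption doesn't hide P2P can be illustrated with header-only features. A minimal sketch, assuming flow records (addresses, ports, protocol) are already exported by the router; the particular heuristic and its thresholds are illustrative assumptions, not the method described above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Header-only flow record: nothing here depends on (possibly encrypted) payload."""
    local_ip: str
    remote_ip: str
    remote_port: int
    proto: str  # "tcp" or "udp"


def likely_p2p_hosts(flows, min_peers=20, max_ratio_gap=0.2):
    """Return local hosts whose connection pattern looks like P2P.

    Illustrative heuristic with made-up thresholds. A host is flagged when it
      * talks to at least `min_peers` distinct remote addresses,
      * sees roughly one distinct remote port per remote address (peers pick
        their own ports, whereas web or mail clients reuse a few well-known ones),
      * uses both TCP and UDP.
    """
    peers = defaultdict(set)
    ports = defaultdict(set)
    protos = defaultdict(set)
    for f in flows:
        peers[f.local_ip].add(f.remote_ip)
        ports[f.local_ip].add(f.remote_port)
        protos[f.local_ip].add(f.proto)

    flagged = set()
    for host, remote_ips in peers.items():
        if len(remote_ips) < min_peers or protos[host] != {"tcp", "udp"}:
            continue
        ports_per_peer = len(ports[host]) / len(remote_ips)
        if abs(ports_per_peer - 1.0) <= max_ratio_gap:
            flagged.add(host)
    return flagged
```

A browser talking to thirty web servers uses only a couple of remote ports, so its ratio stays far below one; a host swapping data with thirty peers on thirty different ports lands near one.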
--
Howard
January 15, 2008 3:52 PM
I'm not sure what you mean by "not a major issue." Do you mean that it just won't happen, or that it's trivially easy compared to defending against DoS attacks and other threats? If the former, how does China manage to filter all their traffic? From what little I've read, it sounds content-based.
I didn't mean to propose anything -- I think the current port- and quota-based controls are just fine, and we should leave it at that. But the COAA bill says universities should develop plans for "technology-based deterrents to prevent such illegal activity," and I doubt quotas would count as "deterrents of illegal activity" in the RIAA's eyes. Given that P2P is also used to distribute legitimate data like Free Software releases and demo videos, the only way to monitor the legality of transferred data is to unpack it and peek at it.
Finally, could you point me to any papers on traffic-statistic-based filtering? I hadn't heard of this before, and it sounds like an interesting machine learning problem.
Anyways, it's good we still have people like you at TPMC, who will get into interesting discussions that have nothing to do with polls or race-baiting.
January 15, 2008 4:58 PM
There were several papers, from academic institutions, about specific bandwidth quotas and P2P recognition, at NANOG. I'll have to dig through the archives, although I'm pretty sure there was one at St. Louis, since that's the last one I attended in person.
Perhaps a little OT, but I have some back-burner research of my own, based on medical epidemiology, where plotting incidence against time shows characteristic patterns for a single stable source of infection, a single moving source (Typhoid Mary, or the SARS "superinfectors"), or secondary contagion. Botnets, worms, and viruses seem to show quite comparable patterns; virus incidence has to be correlated with working hours, since spreading one usually requires a manual action.
Depending on the network pathogen, it may also be interesting to plot incidence against CIDR block, or to notice a very flat distribution across IP addresses, which tends to mean something like the address randomizer in the original Slammer.
AFAIK, the bulk of Chinese filtering is content-based, with traffic funneled principally into humongous traffic-inspecting firewalls in Beijing and Shanghai, which mostly target URLs. Various universities with direct connections outside China aren't as heavily filtered, and apparently Hong Kong is simply left alone.
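The CIDR-block view can be sketched in a few lines. This illustration rests on assumptions that aren't in the comment: IPv4 addresses, /8 granularity, and a chi-square statistic as the measure of flatness.

```python
from collections import Counter
from ipaddress import ip_network


def incidence_by_block(addresses, prefix_len=8):
    """Count observed IPv4 addresses per CIDR block (a /8 by default)."""
    return Counter(ip_network(f"{a}/{prefix_len}", strict=False) for a in addresses)


def flatness(counts, n_blocks):
    """Chi-square statistic of per-block counts against a uniform spread.

    `n_blocks` is the number of blocks in the space considered (256 for all /8s),
    so blocks with no hits count as zeros. Values near n_blocks - 1 are what
    uniformly random addresses tend to give; much larger values mean clustering.
    """
    total = sum(counts.values())
    expected = total / n_blocks
    observed = list(counts.values()) + [0] * (n_blocks - len(counts))
    return sum((o - expected) ** 2 / expected for o in observed)
```

A pathogen with a random address generator spreads its hits evenly and scores low; one that scans locally or works from a hit list piles up in a few blocks and scores high.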
*sigh* I thought I left polling behind with SDLC, but I'm now running into it in new marine radio protocols. Can't speak to the race-baiting, but race conditions are Bad Things to have in operating systems. Modern operating systems, though, tend to be agnostic about race, minimally because they run on highly integrated circuits.
--
Howard
January 15, 2008 10:30 PM
Conjecturing on the basis of server logs I have access to, as well as some limited discussion, there seems to be a great deal of website content scraping being done from mainland Chinese IPs.
Although I do not know whether it is controlled from within China, some sort of botnet built from hacked boxes is currently engaged in a great deal of content acquisition by downloading websites. It uses the forged user-agent string:
In case you are not aware of it, Majestic 12 is a public search engine that uses a distributed-workload collaboration model for its spidering. Their official spider is currently v1.2.1, which makes it easy to regex log files when looking for instances of the forged user-agent. Majestic 12 has more info on this subject, if you're curious or have a use for it, including a few blocking methods and offsite links to instructions for blocking the bot. These workarounds are new within the last two weeks, which was the last time I posted the forged bot's entries from the log files of servers I oversee. I am going to try one out after finishing this message.
The method uses the Apache module mod_setenvif to set an environment variable when the user-agent string matches a regular expression; the Apache module mod_access can then deny all server access to requests carrying that variable. It seems fairly straightforward. I shall soon see...
It's actually not a major concern on any of these servers at present, because they all operate with a healthy amount of leased bandwidth slack, but the bot is a nasty server hammer that attempts to suck down a whole file directory at a time, as fast as the zombie client can. Its latest hit was yesterday: in a 20:54 interval it made 442 file requests, downloading a bit over 22.5mb of data. I've seen it suck data at a much higher rate before.
And Howard, check your TPMCafe private messages, OK?
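A minimal sketch of that log search, assuming Apache's "combined" log format. The forged string itself isn't reproduced above, so the pattern here is an assumption: it supposes the forged agent calls itself "MJ12bot" (the usual name of the Majestic-12 spider) with a version other than v1.2.1. Adjust it to the actual string in your logs.

```python
import re

OFFICIAL_VERSION = "v1.2.1"  # current official spider version, per the comment

# Assumption: the forged agent identifies itself as "MJ12bot/<version>" with a
# version other than the official one. Adjust the pattern to the actual string.
UA_VERSION = re.compile(r"MJ12bot/(v[0-9.]+)")

# Apache "combined" log lines end with: "referer" "user-agent"
LAST_TWO_QUOTED = re.compile(r'"[^"]*" "([^"]*)"\s*$')


def forged_agent_lines(lines):
    """Yield log lines that claim to be the spider but not its official version."""
    for line in lines:
        tail = LAST_TWO_QUOTED.search(line)
        if tail is None:
            continue  # not in combined format
        m = UA_VERSION.search(tail.group(1))
        if m and m.group(1) != OFFICIAL_VERSION:
            yield line
```

Run it over an open access log, e.g. `for hit in forged_agent_lines(open("access_log")): print(hit, end="")`, where the file name is whatever your server uses.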
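The two-step Apache logic (flag the request by user-agent, then deny anything flagged) can be mirrored in a small Python WSGI middleware. This is an analogue for illustration, not Apache configuration; the class name, flag name, and whatever regex you pass in are assumptions.

```python
import re


class DenyUserAgent:
    """WSGI middleware mirroring the two-step Apache recipe described above.

    Step 1 (like mod_setenvif): mark the request when its User-Agent matches
    a regular expression. Step 2 (like mod_access "Deny from env=..."):
    refuse any request that carries the mark.
    """

    def __init__(self, app, ua_pattern, flag="bad_bot"):
        self.app = app
        self.ua_pattern = re.compile(ua_pattern)
        self.flag = flag  # hypothetical variable name, like the Apache env var

    def __call__(self, environ, start_response):
        if self.ua_pattern.search(environ.get("HTTP_USER_AGENT", "")):
            environ[self.flag] = "1"  # step 1: set the "environment variable"
        if environ.get(self.flag):  # step 2: deny requests carrying it
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)
```

Splitting the match from the denial, as Apache does, lets several different patterns set the same flag while a single rule does the refusing.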
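The averages for that visit work out as follows. This sketch assumes "20:54" means 20 minutes 54 seconds and "mb" means megabytes of 10^6 bytes; both readings are assumptions, and an average over the whole interval hides any bursts within it.

```python
def average_rates(interval: str, requests: int, megabytes: float) -> tuple[float, float]:
    """Return (requests per second, kilobytes per second) averaged over an "mm:ss" interval.

    Assumes "20:54" means 20 minutes 54 seconds and "mb" means megabytes of
    10**6 bytes; both readings are assumptions.
    """
    minutes, seconds = (int(part) for part in interval.split(":"))
    elapsed_s = minutes * 60 + seconds
    return requests / elapsed_s, megabytes * 1000 / elapsed_s


req_s, kb_s = average_rates("20:54", 442, 22.5)
print(f"{req_s:.2f} requests/s, {kb_s:.1f} kB/s")  # 0.35 requests/s, 17.9 kB/s
```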
January 16, 2008 12:55 PM
I found this article on botnet detection at the St. Louis NANOG, but that's about it.
Your disease work sounds very interesting. I briefly looked at the spread of HIV from a genetic perspective; the epidemiology seems to have been fairly well-studied, and there's plenty of public data, so it might be a good test case for your methods if you haven't already used it.
PS -- I thought of following up on polling and races (maybe mention frontside/backside bus segregation?), but... I just can't compete.
January 16, 2008 1:55 PM
Ars Technica is engaging in a bit of alarmism over this, it seems. The relevant citation for the proposed legislation:
H.R.4137: To amend and extend the Higher Education Act of 1965, and for other purposes. (The link is to GPO's 2.9mb PDF file. Thomas' XML markup of this bill is presently screwed up, and the XML transcripts of Congressional legislation that Thomas serves are godawful enough even when they are properly coded. The DTD and XSL stylesheet are nightmares that look as if they haven't been updated since the turn of the millennium.)
Now to the specific relevant text in the proposed legislation:
It's not as draconian as Ars Technica makes it out to be in their post. It isn't an absolute mandate, nor is it even a requirement; it is more along the lines of a recommendation, to be implemented "to the extent practicable." It is a low priority.
The bill has been introduced in the House, has made it out of all overseeing House committees, and has now been placed on the House Calendar for consideration by the whole body. It's still a long, long way from passage in both chambers of Congress, and will probably have many changes made to it before that occurs. Even if passed unamended, it doesn't kick in until FY2009 at the earliest, and that date may not hold.
There is not one word regarding packet-sniffing or privacy invasion within this legislation. It is a general statement of purpose, intended to emphasise that the theft of copyrighted material is indeed theft, and provides a level of funding for initiatives which seek to curtail this theft.
The Ars Technica article mentions an MPAA "toolkit" which the MPAA is providing free to interested institutions. Given the MPAA's recent litigiousness toward many universities' enrolled students, a significant portion of which consisted of frivolous nuisance suits predicated on hair-up-their-ass conjecture rather than fact, I find it rather unlikely that university sysadmins are going to accept this software with open arms and warmly embrace its implementation.
That would be akin to sleeping with the enemy, and the potential security risks would be immense. If I were tasked with keeping a network secure, I would be adamantly opposed to installing proprietary software produced by a known past bad net actor inside its security perimeter.
January 16, 2008 5:46 AM
There are those of us who consider updated XML, DTD, and XSL to be nightmares. If God had intended us to use these things, He wouldn't have invented troff, nroff, and, with good and evil, HTML.
Howard,
who is very cranky from struggling with Nvu to produce a webpage that has any artistic merit whatsoever -- much less useful effects.
Seriously, your point about sysadmins is well taken.
January 16, 2008 12:02 PM
The last time I played around with Nvu, it was still suffering from a lack of active developers to update its codebase regularly. I am partial to chami dot com's HTML-Kit for my general web coding needs. It has a very good freeware version without a steep learning curve, if you already possess a good basic knowledge of the web coding language you intend to use.
It does require a bit of time to set up your configuration preferences, because most of its useful functions are enabled through add-on plug-ins, and there are over 400 of those to browse through. If you decide to give it a try, let me know, along with what sort of functionality you want from web editing software, and I'll tell you my personal plug-in preferences.
Downsides: it doesn't have decent WYSIWYG functionality, and there is no real embedded CSS editor. The latter issue is fairly easy to work around, though, because it integrates nicely with the free lite version of Newsgator's TopStyle CSS editor (the lite version's download link is near the bottom of the page).
The W3C's open source web editor Amaya is a possible choice for use too.
January 16, 2008 1:27 PM
Yes, they do seem to be reading some alarmism between the lines there. I did a bit more digging, and have updated the post (or will do so shortly).
But it's nearly impossible for me to tell what this will really mean. On the one hand, "to the extent practicable" could mean "well, shucks, we gave it the old college try and couldn't get anywhere." On the other, "practicable" could be defined as "at least as invasive as the MPAA's toolkit," and enforced by lawsuit harassment. By making enforcement the responsibility of individual colleges, the law hands Big Content many small, financially strapped targets. The RIAA shows no signs of letting up on its campaign to frighten students and others into settling out of court, and presumably the industry has the legal muscle to threaten smaller colleges into using its toolkit. Only the richest institutions can afford a fight.
I haven't had much contact with university sysadmins, but from the user agreements and news stories I've read, most colleges are very cooperative about turning over logs and the like, so they'll probably go along with whatever else is asked of them.
January 16, 2008 1:59 PM