During the Watson-related madness earlier this year I wrote a script to put all of the Jeopardy! game data from J! Archive into a SQL database (they have no API, so I parsed the very-broken HTML for each game). I never ended up doing anything with the data, but there’s been renewed interest in it around the house, so I took some time to make a word cloud of the Jeopardy! categories.
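A word cloud boils down to counting words, so here is a minimal sketch of that step, assuming the categories sit in a SQLite table. The table name `categories`, its `name` column, the sample rows, and the stop-word list are all made up for illustration; this is not the actual script or schema.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical stop words; a real list would be longer.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to"}


def category_word_counts(conn):
    """Count how often each word appears across all category names."""
    counts = Counter()
    # Assumed schema: a table `categories` with a text column `name`.
    for (name,) in conn.execute("SELECT name FROM categories"):
        # Lowercase, then keep runs of letters and apostrophes as words.
        for word in re.findall(r"[a-z']+", name.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts


def demo_connection():
    """Build an in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE categories (name TEXT)")
    conn.executemany(
        "INSERT INTO categories VALUES (?)",
        [("WORLD HISTORY",), ("AMERICAN HISTORY",), ("THE WORLD OF SCIENCE",)],
    )
    return conn


if __name__ == "__main__":
    print(category_word_counts(demo_connection()).most_common(3))
```

The resulting counts are what a word cloud renderer would scale font sizes by.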
The cloud is great. Stop the hype. – This is an excellent article on what cloud computing is and isn’t and when the use of the cloud is the correct technical or architectural choice. I had a long post planned on the overloaded term “cloud computing” but OmniTI covers all the important points in this article. Like any new approach to infrastructure deployment that promises quick provisioning of services, people often forget that all of that infrastructure needs to be managed. There are a lot of good tools coming out to help with that management but none make it zero cost.
Dissecting Today’s Internet Traffic Spikes – With the above article on cloud computing and this article on the sudden nature of internet traffic spikes, I’m becoming an OmniTI fanboy. Part of my job is to worry about designing and provisioning correctly for sudden changes in traffic patterns, and Theo is correct that you have to design for spikes, not react to spikes.
Kanban For Sysadmins – I’ve started doing Kanban at work for one of our Operations teams and have been really pleased with the results so far, so much so that we’re rolling it out for another team this week and hopefully the rest of the department over the next few weeks. We track our work in Request Tracker, but it is hard to know (1) what is being worked on right now and (2) how much throughput a team has. Kanban tells us both, and it also lets us avoid prioritizing future work ahead of time: we only prioritize when we are ready to start new work. I’ll post a follow-up to this once we’re further along in our Kanban experiment.
Hello From A libc-free World! – Have you ever wondered what, exactly, your “Hello, world!” program does? Jessica at Ksplice dives into what happens when you build a super-simple C program (it’s more complicated than you think!).
Data-Intensive Text Processing With MapReduce – A freely available draft in PDF of an upcoming book on using MapReduce to process large text datasets. One of the cool things we’ve done at ITA is add tracking data to each and every request that passes through our reservation system, and we output this tracking information in each log entry in every component we’ve written. The structure of this tracking data is such that if you aggregate the logs from all of the components you can easily construct a graph of the request’s path through the reservation system (including the asynchronous calls). The problem now is searching all of that log data, and I’ve been curious about MapReduce as it applies to this sort of data mining.
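To make the idea concrete, here is a rough sketch of how per-request graphs could fall out of a map step and a reduce step. The log format is invented for illustration: one JSON object per line with hypothetical `request_id`, `component`, and `caller` fields. It is not the real tracking format, and the `run` function is only an in-process stand-in for a real MapReduce framework.

```python
import json
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Made-up log lines; the field names are assumptions, not a real format.
SAMPLE_LOGS = [
    '{"request_id": "r1", "component": "frontend", "caller": null}',
    '{"request_id": "r1", "component": "pricing", "caller": "frontend"}',
    '{"request_id": "r2", "component": "frontend", "caller": null}',
    '{"request_id": "r1", "component": "inventory", "caller": "frontend"}',
    '{"request_id": "r1", "component": "audit", "caller": "pricing"}',
]


def map_entry(line):
    """Map step: emit (request_id, (caller, component)) for one log line."""
    entry = json.loads(line)
    yield entry["request_id"], (entry.get("caller"), entry["component"])


def reduce_request(request_id, edges):
    """Reduce step: turn one request's edges into an adjacency mapping."""
    graph = defaultdict(set)
    for caller, component in edges:
        if caller is not None:  # entry points have no caller, so no edge
            graph[caller].add(component)
    return request_id, {src: sorted(dsts) for src, dsts in graph.items()}


def run(lines):
    """Tiny in-process stand-in for the shuffle between map and reduce."""
    pairs = sorted(
        (pair for line in lines for pair in map_entry(line)),
        key=itemgetter(0),  # group by request_id only
    )
    return dict(
        reduce_request(rid, [edge for _, edge in group])
        for rid, group in groupby(pairs, key=itemgetter(0))
    )


if __name__ == "__main__":
    print(run(SAMPLE_LOGS))
```

Keying on the request identifier is the design choice that matters: log lines from different components can arrive in any order and still end up in the same reduce call.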
On MicroSD Problems – The investigation of a failing batch of MicroSD cards leads to an amazing story of detective work that delves into the world of semiconductor manufacturing, gray markets, and failure rates.
CloudClimate CDN Speed Test – A clever use of XMLHttpRequest to time HTTP downloads of small files (64 KB) to your machine from the leading CDNs and cloud providers. I’m a sucker for the pretty graphs the tool creates with the data, but beyond that I can see how this tool is useful for people evaluating CDN/cloud choices by geographic location.
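The same measurement can be approximated outside the browser. This is a small sketch using Python's standard library rather than XMLHttpRequest, so it is not how the tool itself works; the function names are made up, and any URLs you pass in are your own choice of test files.

```python
import time
import urllib.request


def fetch_url(url):
    """Download the full response body and return it as bytes."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def time_download(url, fetch=fetch_url, clock=time.perf_counter):
    """Return (seconds elapsed, bytes received) for one download."""
    start = clock()
    body = fetch(url)
    return clock() - start, len(body)


def fastest(urls, fetch=fetch_url):
    """Time each URL once and return (url, seconds, bytes) fastest first."""
    results = [(url, *time_download(url, fetch)) for url in urls]
    return sorted(results, key=lambda item: item[1])
```

A single timing per URL is noisy, so a fairer comparison would repeat each download several times and look at the median.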
Drizzle – “An Open Source Microkernel DBMS for High Performance Scale-Out Applications” are all words I know, and put together in that order they sound interesting. Has anyone played with this yet?
NCSA Mosaic – Now you can run Mosaic on your hexacore i7 box; the fastest AJAX is the kind that doesn’t even happen!
The Panic Status Board – I recently learned the term “information radiator” and this is a perfect example of the concept. A simple, striking visualization for what is most important to Panic for the operation of their business. It’s a network operations center for your entire business. It’s hard to see how a single board would work for a large organization, but I’d love to build one for the group I’m in at work.
DevOps, SecOps, DBAOps, NetOps – A discussion of the problem of silos inside operations organizations, and how it is important to focus on the relationships between those groups as well as relationships with people outside of Ops. As I see it, all of the *Ops initiatives are attempts to fix the brokenness in communication that traditional software shop organizational charts create; managers and up need to realize the cost in agility that comes with creating silos. On the other hand, there is a clear benefit to specialization and building service groups around specific disciplines once a company gets to a certain size. I don’t have a good solution to this problem, though I spend a lot of time thinking about it. I do know it pays to meet the people you are working with face to face, have a beer, and understand what drives those groups to make the decisions they do. I sometimes wonder if doing “embedded engineering” is the right approach, with engineers from all of the silos sitting together for the duration of a cross-functional project. If anyone has any thoughts on this I’d love to hear them.
Some interesting articles & tools: