<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Simple Logic &#187; visualization</title>
	<atom:link href="http://www.thesimplelogic.com/category/visualization/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.thesimplelogic.com</link>
	<description></description>
	<lastBuildDate>Tue, 20 Dec 2011 14:21:24 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Wandering Wikipedia: Datamining My Firefox History</title>
		<link>http://www.thesimplelogic.com/2010/04/17/wandering-wikipedia-datamining-my-firefox-history/</link>
		<comments>http://www.thesimplelogic.com/2010/04/17/wandering-wikipedia-datamining-my-firefox-history/#comments</comments>
		<pubDate>Sat, 17 Apr 2010 04:24:54 +0000</pubDate>
		<dc:creator>Adam Fletcher</dc:creator>
				<category><![CDATA[fun]]></category>
		<category><![CDATA[visualization]]></category>
		<category><![CDATA[dirtnap]]></category>
		<category><![CDATA[foxygraph]]></category>
		<category><![CDATA[github]]></category>
		<category><![CDATA[jesstess]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[tanning addiction]]></category>
		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://www.thesimplelogic.com/?p=197</guid>
		<description><![CDATA[My friends and I frequently get lost in Wikipedia. I&#8217;ll start out searching for something innocuous, like neutrino, and then suddenly I&#8217;m learning all about tanning addiction. This happens so often that my girlfriend suggested that it would be fascinating to plot the various trips through Wikipedia by datamining the Firefox history database, and since [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><a href="http://www.thesimplelogic.com/wordpress/wp-content/uploads/2010/04/foxygraph_example1.png"><img class="aligncenter size-full wp-image-196" title="Wikipedia Hypertree" src="http://www.thesimplelogic.com/wordpress/wp-content/uploads/2010/04/foxygraph_example1.png" alt="" width="630" height="392" /></a></p>
<p style="text-align: left;">My friends and I frequently get lost in <a href="http://www.wikipedia.org">Wikipedia</a>. I&#8217;ll start out searching for something innocuous, like <a href="http://en.wikipedia.org/wiki/Neutrino">neutrino</a>, and then suddenly I&#8217;m learning all about <a href="http://en.wikipedia.org/wiki/Tanning_addiction">tanning addiction</a>. This happens so often that <a href="http://blog.ksplice.com/author/jesstess/">my girlfriend</a> suggested that it would be fascinating to plot the various trips through Wikipedia by datamining the Firefox history database, and since she is busy with her thesis I stole the idea and spent a few hours writing a Python script to visually display my Wikipedia wanderings.</p>
<p style="text-align: left;"><a href="http://www.mozilla.com/en-US/firefox/personal.html?from=getfirefox">Firefox 3</a> stores its history in a <a href="http://sqlite.org/">SQLite 3</a> database file in your profile directory; on OS X that directory looks like <code>~/Library/Application Support/Firefox/Profiles/cn3x93q2.default</code> (the profile name will differ on your machine), and the database file we&#8217;re interested in is <code>places.sqlite</code>.</p>
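<p>As a rough sketch of how a script might open that file: the snippet below copies <code>places.sqlite</code> to a temporary directory before connecting, on the assumption that a running Firefox may hold a lock on the live file. The helper names <code>open_history_copy</code> and <code>list_tables</code> are made up for this example and are not part of any library.</p>

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def open_history_copy(places_path):
    """Copy places.sqlite to a temp directory and open the copy.

    Assumption: Firefox may lock the live file while it is running, so
    reading a copy avoids "database is locked" errors. The temp directory
    is left behind for the caller to clean up.
    """
    tmp_dir = Path(tempfile.mkdtemp(prefix="history_copy_"))
    copy_path = tmp_dir / "places.sqlite"
    shutil.copy2(str(places_path), str(copy_path))
    return sqlite3.connect(str(copy_path))


def list_tables(conn):
    """Return the names of all tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]
```

<p>To try it, pass the full path to your own <code>places.sqlite</code> to <code>open_history_copy</code> and print the result of <code>list_tables</code>; <code>moz_places</code> and <code>moz_historyvisits</code> should be in the list.</p>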
<p>The history database schema is described <a href="http://www.forensicswiki.org/wiki/Mozilla_Firefox_3_History_File_Format">here</a>, but the two tables we&#8217;re interested in are <code>moz_places</code> and <code>moz_historyvisits</code>. The first, <code>moz_places</code>, has the URL, title and other data related to the links we&#8217;ve visited. What it doesn&#8217;t have is information on the paths we have traversed to get to the URLs in <code>moz_places</code> &#8211; that information is in <code>moz_historyvisits</code>. Each row of <code>moz_historyvisits</code> has an internal reference to the visit we came from (the column <code>from_visit</code>) and a reference to the <code>moz_places</code> table via the <code>place_id</code> column.</p>
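<p>To make the relationship between the two tables concrete, here is a small sketch that walks <code>from_visit</code> backwards from one visit to rebuild the trail that led to it. Everything in it is illustrative: the tables are cut down to four columns, the rows are invented (including the intermediate page), and using <code>0</code> to mean "no referring visit" is an assumption about how Firefox fills the column.</p>

```python
import sqlite3


def build_toy_history():
    """Create an in-memory database with cut-down, invented history rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE moz_places (id INTEGER PRIMARY KEY, url TEXT, title TEXT);
        CREATE TABLE moz_historyvisits (
            id INTEGER PRIMARY KEY,
            from_visit INTEGER,   -- id of the visit we came from (0 = none, assumed)
            place_id INTEGER,     -- id of the page in moz_places
            visit_date INTEGER
        );
        INSERT INTO moz_places VALUES
            (1, 'http://www.google.com/search?q=neutrino', 'neutrino - Google Search'),
            (2, 'http://en.wikipedia.org/wiki/Neutrino', 'Neutrino'),
            (3, 'http://en.wikipedia.org/wiki/Sun', 'Sun'),
            (4, 'http://en.wikipedia.org/wiki/Tanning_addiction', 'Tanning addiction');
        INSERT INTO moz_historyvisits VALUES
            (10, 0, 1, 1000),
            (11, 10, 2, 2000),
            (12, 11, 3, 3000),
            (13, 12, 4, 4000);
        """
    )
    return conn


def path_to_visit(conn, visit_id):
    """Follow from_visit links backwards and return page titles, oldest first."""
    titles = []
    seen = set()  # guards against a malformed chain that loops back on itself
    while visit_id not in seen:
        seen.add(visit_id)
        row = conn.execute(
            "SELECT p.title, v.from_visit "
            "FROM moz_historyvisits v JOIN moz_places p ON p.id = v.place_id "
            "WHERE v.id = ?",
            (visit_id,),
        ).fetchone()
        if row is None:  # no such visit: we have reached the start of the trail
            break
        titles.append(row[0])
        visit_id = row[1]
    return list(reversed(titles))
```

<p>With the toy rows above, <code>path_to_visit(build_toy_history(), 13)</code> returns the four titles in order, from the search page to "Tanning addiction".</p>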
<div id="attachment_200" class="wp-caption alignright" style="width: 413px"><a href="http://www.thesimplelogic.com/wordpress/wp-content/uploads/2010/04/foxygraph_exampledot.png"><img class="size-full wp-image-200 " style="border: 1px solid black; margin: 2px;" title="foxygraph_exampledot" src="http://www.thesimplelogic.com/wordpress/wp-content/uploads/2010/04/foxygraph_exampledot.png" alt="" width="403" height="596" /></a><p class="wp-caption-text">How I got from neutrino to tanning addiction. </p></div>
<p>A very talented data architect I know helped write this query (entirely wrote is maybe more accurate):</p>
<p><code>SELECT<br />
curr.id, curr.url, curr.title,<br />
prev.id, prev.url, prev.title,<br />
1, t.visit_date<br />
FROM<br />
moz_places curr, moz_places prev,<br />
moz_historyvisits frm,<br />
moz_historyvisits t<br />
WHERE<br />
t.place_id = curr.id AND<br />
frm.place_id = prev.id AND<br />
frm.id = t.from_visit AND<br />
curr.url LIKE 'http://en.wikipedia.org/%' AND<br />
prev.url NOT LIKE 'http://en.wikipedia.org/%'<br />
</code><br />
This query returns all Wikipedia URLs that are the starting points of my journeys through Wikipedia by finding all of the Wikipedia links I&#8217;ve visited whose referrer is not Wikipedia itself. With a few changes to the last two clauses we can find all the Wikipedia URLs whose referrers are also Wikipedia links (i.e., the waypoints in my travels through Wikipedia). Finally, by asking for a <code>curr.url</code> which is not part of Wikipedia but which has a <code>prev.url</code> that is Wikipedia, we know when we&#8217;ve left Wikipedia.</p>
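<p>The three variants differ only in whether each of the last two clauses uses <code>LIKE</code> or <code>NOT LIKE</code>, so they can be generated from one template. The sketch below is one way to do that from Python, not necessarily how FoxyGraph does it; <code>HOPS</code>, <code>find_hops</code>, <code>build_toy_history</code> and the invented rows exist only for this example.</p>

```python
import sqlite3

WIKI = "http://en.wikipedia.org/%"

# Same joins as the query above, with the two LIKE operators left as slots.
BASE_QUERY = """
SELECT curr.url, prev.url
FROM moz_places curr, moz_places prev,
     moz_historyvisits frm, moz_historyvisits t
WHERE t.place_id = curr.id
  AND frm.place_id = prev.id
  AND frm.id = t.from_visit
  AND curr.url {curr_op} :wiki
  AND prev.url {prev_op} :wiki
ORDER BY t.visit_date
"""

# (operator for curr.url, operator for prev.url) for each kind of hop
HOPS = {
    "start": ("LIKE", "NOT LIKE"),  # arrived at Wikipedia from elsewhere
    "waypoint": ("LIKE", "LIKE"),   # moved from one article to another
    "exit": ("NOT LIKE", "LIKE"),   # left Wikipedia
}


def find_hops(conn, kind):
    """Return (current url, previous url) pairs for one kind of hop."""
    curr_op, prev_op = HOPS[kind]  # operators come from HOPS, never from user input
    sql = BASE_QUERY.format(curr_op=curr_op, prev_op=prev_op)
    return conn.execute(sql, {"wiki": WIKI}).fetchall()


def build_toy_history():
    """In-memory stand-in for places.sqlite, filled with invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE moz_places (id INTEGER PRIMARY KEY, url TEXT, title TEXT);
        CREATE TABLE moz_historyvisits (
            id INTEGER PRIMARY KEY, from_visit INTEGER,
            place_id INTEGER, visit_date INTEGER
        );
        INSERT INTO moz_places VALUES
            (1, 'http://www.google.com/search?q=neutrino', 'neutrino - Google Search'),
            (2, 'http://en.wikipedia.org/wiki/Neutrino', 'Neutrino'),
            (3, 'http://en.wikipedia.org/wiki/Sun', 'Sun'),
            (4, 'http://en.wikipedia.org/wiki/Tanning_addiction', 'Tanning addiction'),
            (5, 'http://www.example.com/', 'Example');
        INSERT INTO moz_historyvisits VALUES
            (10, 0, 1, 1000),
            (11, 10, 2, 2000),
            (12, 11, 3, 3000),
            (13, 12, 4, 4000),
            (14, 13, 5, 5000);
        """
    )
    return conn
```

<p>On the toy data, <code>find_hops(conn, "start")</code> finds the single hop from the search page to Neutrino, <code>"waypoint"</code> finds the two article-to-article hops, and <code>"exit"</code> finds the hop from Tanning addiction out to the non-Wikipedia page.</p>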
<p>My script outputs graphs in <a href="http://www.graphviz.org/">Dot</a> format and JSON. The JSON output is in a representation that is compatible with <a href="http://thejit.org/">JIT</a>, a web 2.0 AJAXy graphing library, the output of which you can see in the title graphic of this post.</p>
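<p>For a feel of what the two output formats look like, here is a minimal sketch that turns a list of (parent, child) page titles into Dot text and into a nested JSON tree. The function names are hypothetical, the edge list is invented, and the <code>id</code>/<code>name</code>/<code>data</code>/<code>children</code> layout is assumed to be the tree shape JIT loads; check the JIT documentation before relying on it.</p>

```python
import json


def dot_quote(text):
    """Wrap a label in double quotes, escaping any quotes inside it."""
    return '"' + text.replace('"', '\\"') + '"'


def to_dot(edges):
    """Render (parent, child) pairs as a Graphviz digraph."""
    lines = ["digraph wanderings {"]
    for parent, child in edges:
        lines.append("    {} -> {};".format(dot_quote(parent), dot_quote(child)))
    lines.append("}")
    return "\n".join(lines)


def to_tree(edges, root):
    """Nest edges under root as dicts with id, name, data and children keys.

    Node ids are built from the path to the node so that a page reached by
    two different routes still gets two distinct ids. Pages already on the
    current path are skipped, so cycles in the history cannot recurse forever.
    """
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    def build(path):
        name = path[-1]
        return {
            "id": "/".join(path),
            "name": name,
            "data": {},
            "children": [
                build(path + (kid,))
                for kid in children.get(name, [])
                if kid not in path
            ],
        }

    return build((root,))


def to_json(edges, root):
    """Serialize the nested tree to a JSON string."""
    return json.dumps(to_tree(edges, root), indent=2)
```

<p>For example, <code>to_dot([("Neutrino", "Sun"), ("Sun", "Tanning addiction")])</code> produces a three-node digraph that the <code>dot</code> command can render, and <code>to_json</code> with the same edges and <code>"Neutrino"</code> as the root produces the matching nested tree.</p>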
<p>I&#8217;ve put the script up on <a href="https://github.com/">GitHub</a> and called it <a href="http://github.com/adamf/FoxyGraph">FoxyGraph</a> (be kind; it was written in a few hours for a specific purpose and is probably full of bugs). I&#8217;ll be updating FoxyGraph later with more interesting visualizations of my Firefox history, but for now you can see the <a href="http://www.thesimplelogic.com/foxygraph/">immense clickable web 2.0 hypertree of my Wikipedia wanderings</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.thesimplelogic.com/2010/04/17/wandering-wikipedia-datamining-my-firefox-history/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

