<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Sean Neilan</title>
	<atom:link href="http://seanneilan.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://seanneilan.com</link>
	<description>The voices in my head tell me I&#039;m the sanest man on the moon.</description>
	<lastBuildDate>Mon, 01 Mar 2010 14:35:45 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Best MITM Thingy Ever</title>
		<link>http://seanneilan.com/2010/03/01/best-mitm-thingy-ever/</link>
		<comments>http://seanneilan.com/2010/03/01/best-mitm-thingy-ever/#comments</comments>
		<pubDate>Mon, 01 Mar 2010 14:35:45 +0000</pubDate>
		<dc:creator>sneilan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seanneilan.com/?p=145</guid>
		<description><![CDATA[Proxystrike is the best man-in-the-middle attack thingy for http requests I&#8217;ve ever seen. If you ever want to quickly reverse engineer some program that connects to the internet or a private iphone api (like mint.com&#8217;s), you can configure your os to use iptables to route http packets through this program. Or, if a program [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.edge-security.com/proxystrike.php"><img class="alignnone" title="Proxystrike" src="http://farm4.static.flickr.com/3411/3251415324_242d45c681.jpg" alt="Proxystrike" width="500" height="108" /></a> is the best man-in-the-middle attack thingy for http requests I&#8217;ve ever seen. If you ever want to quickly reverse engineer some program that connects to the internet or a private iphone api (<a href="http://www.mint.com/">like mint.com</a>&#8217;s), you can configure your os to use iptables to route http packets through this program. Or, if a program can detect that you&#8217;re using a proxy server, you can configure squid to act as a hidden proxy but then use Proxystrike as its proxy. Kind of like a proxy-proxy. Next time I bother reverse engineering something I&#8217;ll post the code to this.</p>
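<p>If the program you are poking at is your own script, you can skip iptables and point its HTTP client straight at the intercepting proxy. Here is a minimal Python 3 sketch of that simpler route (not the iptables setup described above). The address 127.0.0.1:8008 and the helper name <code>make_proxied_opener</code> are assumptions for illustration; use whatever address your proxy actually listens on.</p>

```python
import urllib.request

# Assumption: the intercepting proxy listens here; change it to match your setup.
PROXY = "http://127.0.0.1:8008"


def make_proxied_opener(proxy_url=PROXY):
    """Return an opener that sends plain http requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


opener = make_proxied_opener()
# opener.open("http://example.com/") would now travel through the proxy,
# where the request and the response can be inspected.
```

<p>Nothing is sent until <code>opener.open()</code> is called, so the sketch is safe to run without a proxy in place.</p>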
]]></content:encoded>
			<wfw:commentRss>http://seanneilan.com/2010/03/01/best-mitm-thingy-ever/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>My Solution to SFTP on Linux</title>
		<link>http://seanneilan.com/2010/03/01/my-solution-to-sftp-on-linux/</link>
		<comments>http://seanneilan.com/2010/03/01/my-solution-to-sftp-on-linux/#comments</comments>
		<pubDate>Mon, 01 Mar 2010 14:22:42 +0000</pubDate>
		<dc:creator>sneilan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seanneilan.com/?p=134</guid>
		<description><![CDATA[Sftp (and ftp) clients on linux suck.
Fortunately, there&#8217;s a program called sshfs which mounts an sftp connection as a drive using fuse. To install in Ubuntu, sudo apt-get install sshfs.
Then, when you want to connect to an sftp server, add an alias to your bashrc file like this:
#mounting alias
alias seanneilan.com=&#8217;mkdir ~/ftp/seanneilan.com/; sshfs -o uid=1000 -o [...]]]></description>
			<content:encoded><![CDATA[<p>Sftp (and ftp) clients on linux suck.</p>
<p>Fortunately, there&#8217;s a program called <a href="http://fuse.sourceforge.net/sshfs.html">sshfs</a> which mounts an sftp connection as a drive using fuse. To install in Ubuntu, sudo apt-get install sshfs.</p>
<p>Then, when you want to connect to an sftp server, add an alias to your bashrc file like this:</p>
<p>#mounting alias<br />
alias seanneilan.com=&#039;mkdir ~/ftp/seanneilan.com/; sshfs -o uid=1000 -o gid=1000 sneilan@seanneilan.com:/ ~/ftp/seanneilan.com/; cd ~/ftp/seanneilan.com/home/sneilan&#039;</p>
<p>#unmounting alias<br />
alias useanneilan.com=&#039;cd ~; fusermount -u ~/ftp/seanneilan.com; rmdir ~/ftp/seanneilan.com&#039;</p>
<p>Choose a name for the connection. Mine is seanneilan.com so when I type seanneilan.com at my bash shell, it will come up. Create an ftp folder in your home directory. Replace sneilan@seanneilan.com:/ with your username, host &amp; mount path. Replace ~/ftp/seanneilan.com/ with ~/ftp/name-of-your-connection-alias/</p>
<p>For the unmounting alias, it&#8217;s the exact same thing except you prepend a u to your connection alias.</p>
<p>When you type your alias, it will prompt you for the password, automatically mount your connection in the ftp folder &amp; send you to it. You can edit files in this directory and they will be automatically uploaded.</p>
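<p>The two aliases boil down to a pair of commands built from a username, a host and a connection name. Here is a rough Python sketch of that recipe, in case you would rather script it than edit your bashrc. The function names <code>mount_command</code> and <code>unmount_command</code> are made up for this example, and the code only builds the argument lists; it does not run anything.</p>

```python
import os


def mount_point(name):
    """Folder under ~/ftp that the connection gets mounted on."""
    return os.path.join(os.path.expanduser("~"), "ftp", name)


def mount_command(user, host, name, remote_path="/", uid=1000, gid=1000):
    """Build the sshfs argument list used by the mounting alias."""
    return ["sshfs", "-o", "uid=%d" % uid, "-o", "gid=%d" % gid,
            "%s@%s:%s" % (user, host, remote_path), mount_point(name)]


def unmount_command(name):
    """Build the fusermount argument list used by the unmounting alias."""
    return ["fusermount", "-u", mount_point(name)]


print(" ".join(mount_command("sneilan", "seanneilan.com", "seanneilan.com")))
print(" ".join(unmount_command("seanneilan.com")))
```

<p>Passing either list to <code>subprocess.call</code> would run it, after creating the mount folder the same way the alias does with mkdir.</p>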
<p>If you lose your connection, you&#8217;ll have to unmount, remount and reopen all your files since the file pointers will get messed up. I was thinking about implementing something with <a href="http://www.stefan.buettcher.org/cs/fschange/index.html">fschange</a>. That way, I could make a program that makes a copy of whatever you&#8217;re about to edit and instead opens up the copy rather than the one on the sftp server. Then, when you save, fschange will notify some program to copy/paste your changes over to the sftp server. Should the sftp connection drop, the program will automatically reconnect. I might do this later.</p>
<p>It also helps if you use <a href="https://help.ubuntu.com/community/SSH/OpenSSH/Keys">ssh keys</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://seanneilan.com/2010/03/01/my-solution-to-sftp-on-linux/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Xmonad is Amazing</title>
		<link>http://seanneilan.com/2010/03/01/xmonad-is-amazing/</link>
		<comments>http://seanneilan.com/2010/03/01/xmonad-is-amazing/#comments</comments>
		<pubDate>Mon, 01 Mar 2010 14:10:10 +0000</pubDate>
		<dc:creator>sneilan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seanneilan.com/?p=131</guid>
		<description><![CDATA[There&#8217;s this window manager called Xmonad. It&#8217;s magnificent because it lets you manage windows with a keyboard. You have 8 or so portals to different workspaces on each monitor.

You can have different programs running on different workspaces and switch them by pressing Alt-~-0-9. Bring up a new program with Alt-p-p &#38; then type the name [...]]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s this window manager called <a href="http://xmonad.org/">Xmonad</a>. It&#8217;s magnificent because it lets you manage windows with a keyboard. You have 8 or so portals to different workspaces on each monitor.</p>
<p><img class="alignnone" title="Xmonad Workspaces" src="http://xmonad.org/images/tour/workspaces.png" alt="Xmonad Workspaces" width="588" height="171" /></p>
<p>You can have different programs running on different workspaces and switch between them by pressing Alt-~-0-9. Bring up a new program with Alt-p-p &amp; then type the name of the program. Kill a program with Alt-Shift-C. Send windows to other desktops with Alt-Shift-W or E. Use Wmii bindings like <a href="http://seanneilan.com/wp-content/uploads/2010/03/xmonad.txt">mine</a>. There&#8217;s a lot more to xmonad and you can read my config file or just go <a href="http://xmonad.org/tour.html">here</a>.</p>
<p>Xmonad is definitely an advancement in interaction with computers for those who are willing to go an extra step to learn new things.</p>
<p>I&#8217;ve been running Linux for a long time. For the most part, I&#8217;ve understood the various linux desktops as largely trying to be like windows but more powerful. (At least when you look at Gnome or KDE, the great front ends to Linux.)</p>
]]></content:encoded>
			<wfw:commentRss>http://seanneilan.com/2010/03/01/xmonad-is-amazing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dominicks/Jewel Price Wars Are BS</title>
		<link>http://seanneilan.com/2010/01/22/dominicksjewel-price-wars-are-bs/</link>
		<comments>http://seanneilan.com/2010/01/22/dominicksjewel-price-wars-are-bs/#comments</comments>
		<pubDate>Sat, 23 Jan 2010 02:36:05 +0000</pubDate>
		<dc:creator>sneilan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[dominicks jewel aldis]]></category>

		<guid isPermaLink="false">http://seanneilan.com/2010/01/22/dominicksjewel-price-wars-are-bs/</guid>
		<description><![CDATA[I just biked to Aldi and got something like $50 worth of groceries for like $22. That was awesome. Even with their supposed price cuts, Dominicks &#038; Jewel just can&#8217;t beat Aldi.
]]></description>
			<content:encoded><![CDATA[<p>I just biked to Aldi and got something like $50 worth of groceries for like $22. That was awesome. Even with their supposed price cuts, Dominicks &#038; Jewel just can&#8217;t beat Aldi.</p>
]]></content:encoded>
			<wfw:commentRss>http://seanneilan.com/2010/01/22/dominicksjewel-price-wars-are-bs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>pybwsal so far</title>
		<link>http://seanneilan.com/2009/12/16/pybwsal-so-far/</link>
		<comments>http://seanneilan.com/2009/12/16/pybwsal-so-far/#comments</comments>
		<pubDate>Wed, 16 Dec 2009 21:30:05 +0000</pubDate>
		<dc:creator>sneilan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[pybwsal]]></category>

		<guid isPermaLink="false">http://seanneilan.com/?p=117</guid>
		<description><![CDATA[Things have been a bit slow in the past couple of days as my friend Derek &#38; I have been building an android application over winter break.
I have added the arbitrator, controller, buildmanager and constructionmanager classes to the project. The constructionmanager class is halfway done &#38; the other three classes haven&#8217;t been tested. I need to [...]]]></description>
			<content:encoded><![CDATA[<p>Things have been a bit slow in the past couple of days as my friend Derek &amp; I have been building an android application over winter break.</p>
<p>I have added the arbitrator, controller, buildmanager and constructionmanager classes to the project. The constructionmanager class is halfway done &amp; the other three classes haven&#8217;t been tested. I need to complete the holy trinity of construction, production and morph manager to test them properly. Testing of course will probably take longer than it took to write, but, hey, by January 1st, we should have this whole pybwsal thing done.</p>
]]></content:encoded>
			<wfw:commentRss>http://seanneilan.com/2009/12/16/pybwsal-so-far/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>pybwsal &#8211; a python implementation of bwsal</title>
		<link>http://seanneilan.com/2009/12/12/pybwsal-a-python-implementation-of-bwsal/</link>
		<comments>http://seanneilan.com/2009/12/12/pybwsal-a-python-implementation-of-bwsal/#comments</comments>
		<pubDate>Sat, 12 Dec 2009 18:57:05 +0000</pubDate>
		<dc:creator>sneilan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[bwsal]]></category>
		<category><![CDATA[pybwsal]]></category>

		<guid isPermaLink="false">http://seanneilan.com/?p=108</guid>
		<description><![CDATA[My friend Derek Schaefer &#38; I have started a project called pybwsal for the starcraft ai competition. Bwsal is a set of utility functions that make programming starcraft ai&#8217;s in bwapi (which is a set of utility functions in c++ for programming starcraft ai&#8217;s) easier.
So far, the building placer has been implemented &#38; [...]]]></description>
			<content:encoded><![CDATA[<p>My friend <a href="http://derekschaefer.net/">Derek Schaefer</a> &amp; I have started a project called <a href="http://code.google.com/p/pybwsal/"><strong>pybwsal</strong></a> for the <a href="http://eis.ucsc.edu/StarCraftAICompetition">starcraft ai competition</a>. <a href="http://code.google.com/p/bwsal/">Bwsal</a> is a set of utility functions that make programming starcraft ai&#8217;s in <a href="http://code.google.com/p/bwapi/">bwapi</a> (which is a set of utility functions in c++ for programming starcraft ai&#8217;s) easier. <img src='http://seanneilan.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>So far, the building placer has been implemented &amp; I was working on implementing the building manager but unfortunately ran into issues yesterday. See <a href="http://code.google.com/p/pybw/issues/detail?id=20">http://code.google.com/p/pybw/issues/detail?id=20</a></p>
<p>In the meantime, I will work on programming the bid system as <a href="http://code.google.com/u/chad.retz/">Chad Retz</a> (and perhaps others) from the <a href="http://bwapi-jbridge.googlecode.com/">bwapi-jbridge</a> project is interested in that.</p>
<p>You can keep up to date on pybwsal by checking back at <a href="http://seanneilan.com/tag/pybwsal/">http://seanneilan.com/tag/pybwsal/</a> every once in a while.</p>
]]></content:encoded>
			<wfw:commentRss>http://seanneilan.com/2009/12/12/pybwsal-a-python-implementation-of-bwsal/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NEWSBREAK: Youtube account oldcountrytunes is suspended..</title>
		<link>http://seanneilan.com/2009/12/08/newsbreak-youtube-account-oldcountrytunes-is-suspended/</link>
		<comments>http://seanneilan.com/2009/12/08/newsbreak-youtube-account-oldcountrytunes-is-suspended/#comments</comments>
		<pubDate>Wed, 09 Dec 2009 03:54:39 +0000</pubDate>
		<dc:creator>sneilan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seanneilan.com/?p=106</guid>
		<description><![CDATA[My goodness. What a massive loss to the community. Thousands and thousands of country music songs have been lost from youtube thanks to copyright claims. I began to like country music too and now there is simply no way to get any of it without going to the library? I mean, never in my life [...]]]></description>
			<content:encoded><![CDATA[<p>My goodness. What a massive loss to the community. Thousands and thousands of country music songs have been lost from youtube thanks to copyright claims. I began to like country music too and now there is simply no way to get any of it without going to the library? I mean, never in my life would I pay for anything in this world that might be gotten for free on the Internet. (Library costs money! Gas!)</p>
<p>And if you care, here is his old bio:</p>
<blockquote><p>Old as dirt. retired. I am a Christian but I am also a mortal man. Not perfect, born in sin so I believe I must rely on something more powerful than man to guide and help me during my bad times. I want to do good when I am able and avoid doing bad . That is my goal although it is not possible to achieve all the time.</p></blockquote>
<p>Maybe I can troll thepiratebay.org for Corrina Cordwell, Slim Whitman, Jim Reeves, Merle Haggard, Merle Travis, Ernie Freeman, Kitty Wells, Marty Robbins, Highway 101, Eddy Arnold, Billy Walker, Frank Ifield, Tom T. Hall, Gene Autry, Roy Rogers, Johnny Cash and Johnny Horton.</p>
]]></content:encoded>
			<wfw:commentRss>http://seanneilan.com/2009/12/08/newsbreak-youtube-account-oldcountrytunes-is-suspended/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>YouTube &#8211; Gene Autry &#8211; Jingle Jangle Jingle</title>
		<link>http://seanneilan.com/2009/10/07/youtube-eddie-fisher-sings-thinking-of-you-and-im-yours/</link>
		<comments>http://seanneilan.com/2009/10/07/youtube-eddie-fisher-sings-thinking-of-you-and-im-yours/#comments</comments>
		<pubDate>Thu, 08 Oct 2009 04:35:27 +0000</pubDate>
		<dc:creator>sneilan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[eddie fisher]]></category>

		<guid isPermaLink="false">http://seanneilan.com/?p=89</guid>
		<description><![CDATA[
love this song.
]]></description>
			<content:encoded><![CDATA[<p><object width="425" height="350"><param name="movie" value="http://www.youtube.com/v/6nSZGXk92So&amp;feature=related"></param><param name="wmode" value="transparent"></param><embed src="http://www.youtube.com/v/6nSZGXk92So&amp;feature=related" type="application/x-shockwave-flash" wmode="transparent" width="425" height="350"></embed></object></p>
<p>love this song.</p>
]]></content:encoded>
			<wfw:commentRss>http://seanneilan.com/2009/10/07/youtube-eddie-fisher-sings-thinking-of-you-and-im-yours/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Scraping Google Search Results and Images</title>
		<link>http://seanneilan.com/2009/10/07/scraping-google-images/</link>
		<comments>http://seanneilan.com/2009/10/07/scraping-google-images/#comments</comments>
		<pubDate>Thu, 08 Oct 2009 03:00:56 +0000</pubDate>
		<dc:creator>sneilan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://seanneilan.com/?p=45</guid>
		<description><![CDATA[Scraping google images
Here is some rudimentary code for scraping google images. Later on, I&#8217;ll add features for search options &#038; perhaps limiting the number of images this thing downloads. Leave a comment below if you&#8217;re interested in these features or if this is useful to you.
Put this code into a file like gimages.py


import urllib
import urllib2
import [...]]]></description>
			<content:encoded><![CDATA[<h3>Scraping google images</h3>
<p>Here is some rudimentary code for scraping google images. Later on, I&#8217;ll add features for search options &#038; perhaps limiting the number of images this thing downloads. Leave a comment below if you&#8217;re interested in these features or if this is useful to you.</p>
<p>Put this code into a file like gimages.py
</p>
<pre class="brush: python; gutter: false; collapse: true">
import urllib
import urllib2
import os
import sys
#import pdb

import demjson # a powerful python json decoder/encoder. Necessary for decoding garbled google JSON output within a reasonable development time frame

############################
# The meat of the project
############################
def search(term):
    """returns a results object for getting images."""
    return results(term)

class results:
    """iterable list of image results.
    here are the properties of each image
    [0] # google images page for image
    [1] # unknown
    [2] # unknown
    [3] # image url
    [4] # google images thumbnail width
    [5] # google images thumbnail height
    [6] # title text describing relevance of image to query somehow. Not sure of ruleset for this
    [7] # unknown
    [8] # unknown
    [9] # dimensions &#038; size
    [10] # filetype
    [11] # original domain
    [12] # unknown
    [13] # unknown
    [14] # server that url contained in [0] resides on
    [15] # unknown
    [16] # unknown
    [17] # unknown

    still unknown as how to get the alt text stored in the image without having to visit actual page.

    results.stats_text will give you some html containing the time it took to load the page, the total images retrieved, etc.
    searching by size, type &#038; color will come later"""

    def __init__(self, term):
        self.term = urllib.quote(term) # the original search term, url-escaped
        self.index = 0 # position in self.images of the next image to return
        self.images = [] # stash/return images (from) here
        self.max_images = 1000 # only retrieve first 1000 images due to restrictions placed by google. Google says it can find hundreds of millions of images, but, it will only return the first 1000 results. Such a crude example of unnoticed false advertising, IMHO.
        self.stats_text = '' # html stats blurb taken from the results page
        self.fetch_page(0) # retrieve initial images

    def fetch_page(self, start):
        """downloads the page of results beginning at offset start. returns how many images it added."""
        # Google uses an internal json api to retrieve images :)  Yup.
        url = 'http://images.google.com/images?hl=en&#038;q=%s&#038;ijn=page&#038;start=%d' % (self.term, start)
        page = get(url)
        # strip the javascript comment markers so that only json is left
        page = page.replace('/*', '')
        page = page.replace('*/', '')
        page = demjson.decode(page)
        self.images.extend(page['images']) # add to existing list of images
        self.stats_text = page.get('sd', self.stats_text)
        return len(page['images'])

    def __iter__(self):
        return self

    def next(self): # return next image, getting a new page first if the stash has run out
        if self.index >= self.max_images:
            raise StopIteration

        if self.index >= len(self.images): # if we need to go to the next google images page
            if self.fetch_page(self.index) == 0: # google has nothing more to give us
                raise StopIteration

        image = self.images[self.index]
        self.index = self.index + 1
        return image # return next image!

# @TODO Make gimages.get(url) keep trying if the server says it's down.
# @TODO Add support for searching by size, type &#038; color.
# @TODO In results object, extract total number of images retrieved &#038; other stats out of HTML, rather than make user do that. Same with each image: get the width, height &#038; size out of image for the user.
# @TODO Make cookie file optional in case script is being run from a read only directory.

# This is some boilerplate code for using urllib with cookies
# at the end, we get a nice get(url) function that has its own cookie file

COOKIEFILE = 'cookies.lwp'
# the path and filename to save your cookies in

cj = None
ClientCookie = None
cookielib = None

# Let's see if cookielib is available
try:
    import cookielib
except ImportError:
    # If importing cookielib fails
    # let's try ClientCookie
    try:
        import ClientCookie
    except ImportError:
        # ClientCookie isn't available either
        urlopen = urllib2.urlopen
        Request = urllib2.Request
    else:
        # imported ClientCookie
        urlopen = ClientCookie.urlopen
        Request = ClientCookie.Request
        cj = ClientCookie.LWPCookieJar()

else:
    # importing cookielib worked
    urlopen = urllib2.urlopen
    Request = urllib2.Request
    cj = cookielib.LWPCookieJar()
    # This is a subclass of FileCookieJar
    # that has useful load and save methods

if cj is not None:
# we successfully imported
# one of the two cookie handling modules

    if os.path.isfile(COOKIEFILE):
        # if we have a cookie file already saved
        # then load the cookies into the Cookie Jar
        cj.load(COOKIEFILE)

    # Now we need to get our Cookie Jar
    # installed in the opener;
    # for fetching URLs
    if cookielib is not None:
        # if we use cookielib
        # then we get the HTTPCookieProcessor
        # and install the opener in urllib2
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        urllib2.install_opener(opener)

    else:
        # if we use ClientCookie
        # then we get the HTTPCookieProcessor
        # and install the opener in ClientCookie
        opener = ClientCookie.build_opener(ClientCookie.HTTPCookieProcessor(cj))
        ClientCookie.install_opener(opener)

def get(url, txdata=None):
    try:
        # fake a user agent, some websites (like google) don't like automated exploration
        txheaders =  {'User-agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}

        #txdata = None
        # if we were making a POST type request,
        # we could encode a dictionary of values here,
        # using urllib.urlencode(somedict)

        req = Request(url, txdata, txheaders)
        # create a request object

        handle = urlopen(req)
        # and open it to return a handle on the url

    except IOError, e:
        print 'We failed to open "%s".' % url
        if hasattr(e, 'code'):
            print 'We failed with error code - %s.' % e.code
        elif hasattr(e, 'reason'):
            print "The error object has the following 'reason' attribute :"
            print e.reason
            print "This usually means the server doesn't exist,"
            print "is down, or we don't have an internet connection."
        sys.exit()

    else:
        #print 'Here are the headers of the page :'
        #print handle.info()
        page = handle.read()
        # handle.read() returns the page
        # handle.geturl() returns the true url of the page fetched
        # (in case urlopen has followed any redirects, which it sometimes does)

    if cj is not None:
        #print 'These are the cookies we have received so far :'
        #for index, cookie in enumerate(cj):
        #    print index, '  :  ', cookie
        cj.save(COOKIEFILE)                     # save the cookies again
    return page
</pre>
<p>
Then go like this:
</p>
<pre name="code" class="brush: python; gutter: false; toolbar: false">
import gimages
results = gimages.search('asdf')
for image in results:
    print image[3]
</pre>
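<p>The loop above pulls pages lazily: the results object only asks for another page when it has handed out everything it already holds. Here is a small offline sketch of that paging pattern which you can run without touching Google. The names <code>fake_get</code> and <code>PagedResults</code> and the page size of 3 are invented for the demo, and it uses the standard json module instead of demjson because the fake response is well-formed.</p>

```python
import json

PAGE_SIZE = 3  # made-up page size, just to keep the demo short


def fake_get(start, total=7):
    """Stand-in for get(url): returns json wrapped in /* */ comment markers."""
    rows = [["page%d" % i, "", "", "http://example.com/%d.jpg" % i]
            for i in range(start, min(start + PAGE_SIZE, total))]
    return "/*" + json.dumps({"images": rows}) + "*/"


class PagedResults(object):
    """Iterates over image rows, fetching a new page only when needed."""

    def __init__(self, fetch):
        self.fetch = fetch
        self.images = []
        self.index = 0

    def _load(self, start):
        # Strip the comment markers, decode, and stash the new rows.
        text = self.fetch(start).replace("/*", "").replace("*/", "")
        rows = json.loads(text)["images"]
        self.images.extend(rows)
        return len(rows)

    def __iter__(self):
        return self

    def __next__(self):
        # An empty page means the results are exhausted.
        if self.index >= len(self.images) and self._load(self.index) == 0:
            raise StopIteration
        row = self.images[self.index]
        self.index += 1
        return row

    next = __next__  # Python 2 spelling of the same method


urls = [row[3] for row in PagedResults(fake_get)]
print(len(urls))  # prints 7
```

<p>Stopping on an empty page, rather than on a fixed page size, means the loop ends cleanly even when a search has fewer results than expected.</p>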
<p>
Each image row has these family values:</p>
<pre>
    [0] # google images page for image
    [1] # unknown
    [2] # unknown
    [3] # image url
    [4] # google images thumbnail width
    [5] # google images thumbnail height
    [6] # title text describing relevance of image to query somehow. Not sure of ruleset for this
    [7] # unknown
    [8] # unknown
    [9] # dimensions &#038; size
    [10] # filetype
    [11] # original domain
    [12] # unknown
    [13] # unknown
    [14] # server that url contained in [0] resides on
    [15] # unknown
    [16] # unknown
    [17] # unknown
</pre>
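<p>Remembering that the image url lives at position 3 gets old quickly. One option is to give the known positions names. The sketch below does that with a plain dict; the field names and the <code>label_row</code> helper are made up here (the scraper itself only returns bare lists), and the unknown positions are simply skipped.</p>

```python
# Names for the row positions listed above; None marks the unknown ones.
# These names are invented for this example.
FIELD_NAMES = [
    "page_url", None, None, "image_url", "thumb_width", "thumb_height",
    "title", None, None, "dimensions_and_size", "filetype", "domain",
    None, None, "page_server", None, None, None,
]


def label_row(row):
    """Return a dict holding only the fields of an image row we understand."""
    return dict((name, value)
                for name, value in zip(FIELD_NAMES, row)
                if name is not None)


# A fake 18-item row standing in for a real result.
sample = ["http://images.example/p", "", "", "http://example.com/a.jpg"] + [""] * 14
print(label_row(sample)["image_url"])  # prints http://example.com/a.jpg
```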
<p>
You&#8217;ll also have to download <a href="http://deron.meranda.us/python/demjson/">Demjson</a>. Demjson is a really good json library &#038; it&#8217;s necessary to parse google&#8217;s weird internal json api.
</p>
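<p>The cookie boilerplate in gimages.py is written for Python 2. In Python 3, cookielib became http.cookiejar and urllib2 became urllib.request, so the same idea gets much shorter. Treat the following as a sketch rather than a drop-in replacement: <code>make_cookie_opener</code> is a made-up helper name, and the cookie file is written to a temporary folder so the example cleans up after itself.</p>

```python
import http.cookiejar
import os
import tempfile
import urllib.request


def make_cookie_opener(cookie_file):
    """Return (opener, jar); the jar is loaded from cookie_file if it exists."""
    jar = http.cookiejar.LWPCookieJar(cookie_file)
    if os.path.isfile(cookie_file):
        jar.load()  # reuse cookies saved by an earlier run
    processor = urllib.request.HTTPCookieProcessor(jar)
    return urllib.request.build_opener(processor), jar


# Assumption: a throwaway location; the original script uses 'cookies.lwp'.
cookie_file = os.path.join(tempfile.mkdtemp(), "cookies.lwp")
opener, jar = make_cookie_opener(cookie_file)
# opener.open(url) would collect any cookies the server sets into jar.
jar.save()  # write the jar back to cookie_file
```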
<h3>Scraping Google Search</h3>
<p>With this code:</p>
<pre class="brush: python; collapse: true; gutter: false">
import urllib
import urllib2
import os
import sys
#from xml.etree.ElementTree import ElementTree
#from xml.etree.ElementTree import XMLTreeBuilder
import lxml.html
import pdb

############################
# The meat of the project
############################
def search(term):
    """returns a results object for getting searches."""
    return results(term)

class results:
    def parse_page(self, page):
        tree = lxml.html.document_fromstring(page)
        searches = []
        for i in tree.find_class('g'):
            temp = {}
            temp['url'] = i.find_class('l')[0].get('href')
            temp['title'] = i.find_class('l')[0].text_content()
            temp['desc'] = i.find_class('s')[0].text_content()
            searches.append(temp)
        return searches

    def get_stats_text(self, page):
        tree = lxml.html.document_fromstring(page)
        stats_text = tree.get_element_by_id('ssb').text_content()
        return stats_text

    def __init__(self, term):
        self.term = urllib.quote(term) # the original search term, url-escaped
        self.index = 0 # position in self.searches of the next result to return
        self.searches = [] # stash/return search results (from) here
        self.max_searches = 1000 # only retrieve first 1000 results due to restrictions placed by google
        # retrieve initial results
        page = self.fetch_page(0)
        self.stats_text = self.get_stats_text(page)

    def fetch_page(self, start):
        """downloads the results page beginning at offset start, stashes its results & returns the raw html."""
        url = 'http://www.google.com/search?q=%s&#038;start=%d' % (self.term, start)
        page = get(url)
        # "/html/body/div[2]/div/p" is the search info
        # "/html/body/div[2]/div[3]/div/ol" is the array of searches
        self.searches.extend(self.parse_page(page))
        return page

    def __iter__(self):
        return self

    def next(self): # return next search result, getting a new page first if the stash has run out
        if self.index >= self.max_searches:
            raise StopIteration

        if self.index >= len(self.searches): # if we need to go to the next google results page
            before = len(self.searches)
            self.fetch_page(self.index)
            if len(self.searches) == before: # google has nothing more to give us
                raise StopIteration

        result = self.searches[self.index]
        self.index = self.index + 1
        return result # return next search result!

COOKIEFILE = 'cookies.lwp'
# the path and filename to save your cookies in

cj = None
ClientCookie = None
cookielib = None

# Let's see if cookielib is available
try:
    import cookielib
except ImportError:
    # If importing cookielib fails
    # let's try ClientCookie
    try:
        import ClientCookie
    except ImportError:
        # ClientCookie isn't available either
        urlopen = urllib2.urlopen
        Request = urllib2.Request
    else:
        # imported ClientCookie
        urlopen = ClientCookie.urlopen
        Request = ClientCookie.Request
        cj = ClientCookie.LWPCookieJar()

else:
    # importing cookielib worked
    urlopen = urllib2.urlopen
    Request = urllib2.Request
    cj = cookielib.LWPCookieJar()
    # This is a subclass of FileCookieJar
    # that has useful load and save methods

if cj is not None:
# we successfully imported
# one of the two cookie handling modules

    if os.path.isfile(COOKIEFILE):
        # if we have a cookie file already saved
        # then load the cookies into the Cookie Jar
        cj.load(COOKIEFILE)

    # Now we need to get our Cookie Jar
    # installed in the opener;
    # for fetching URLs
    if cookielib is not None:
        # if we use cookielib
        # then we get the HTTPCookieProcessor
        # and install the opener in urllib2
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        urllib2.install_opener(opener)

    else:
        # if we use ClientCookie
        # then we get the HTTPCookieProcessor
        # and install the opener in ClientCookie
        opener = ClientCookie.build_opener(ClientCookie.HTTPCookieProcessor(cj))
        ClientCookie.install_opener(opener)

def get(url, txdata=None):
    try:
        # fake a user agent, some websites (like google) don't like automated exploration
        txheaders =  {'User-agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}

        #txdata = None
        # if we were making a POST type request,
        # we could encode a dictionary of values here,
        # using urllib.urlencode(somedict)

        req = Request(url, txdata, txheaders)
        # create a request object

        handle = urlopen(req)
        # and open it to return a handle on the url

    except IOError, e:
        print 'We failed to open "%s".' % url
        if hasattr(e, 'code'):
            print 'We failed with error code - %s.' % e.code
        elif hasattr(e, 'reason'):
            print "The error object has the following 'reason' attribute :"
            print e.reason
            print "This usually means the server doesn't exist,"
            print "is down, or we don't have an internet connection."
        sys.exit()

    else:
        #print 'Here are the headers of the page :'
        #print handle.info()
        page = handle.read()
        # handle.read() returns the page
        # handle.geturl() returns the true url of the page fetched
        # (in case urlopen has followed any redirects, which it sometimes does)

        # Save the cookies before returning. Anything placed after
        # the return statement would never run.
        if cj is not None:
            #print 'These are the cookies we have received so far :'
            #for index, cookie in enumerate(cj):
            #    print index, '  :  ', cookie
            cj.save(COOKIEFILE)                 # save the cookies again
        return page
</pre>
<p>With this, you&#8217;ll be able to scrape Google search results. It&#8217;s the same procedure as above, except you&#8217;ll need <a href="http://codespeak.net/lxml/">lxml</a>. You can get it by typing easy_install lxml or whatever. It&#8217;s a very good library for scraping. It lets me develop Facebook scraping code quickly and efficiently <img src='http://seanneilan.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  Place the code into something like gsearches.py &#038; do the following:</p>
<pre class="brush: python; gutter: false; toolbar: false">
import gsearches
for result in gsearches.search('asdf'):
    print result
</pre>
<p>Results have the following format:<br />
{'url': 'http://scipp.ucsc.edu/groups/babar/charm2007.ppt', 'desc': 'File Format: Microsoft Powerpoint &#8211; View as HTML1. D0-D0 Mixing at BaBar. Charm 2007 August, 2007. Abe Seiden. University of California at Santa Cruz. for. The BaBar Collaboration &#8230;scipp.ucsc.edu/groups/babar/charm2007.ppt &#8211; Similar', 'title': 'aSDf'}</p>
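<p>Since each result is a plain dict with url, title and desc keys, post-processing is easy. Here&#8217;s a rough sketch that groups results by hostname. The group_by_host helper and the sample list are made up for illustration and are not part of gsearches.py.</p>

```python
try:
    from urlparse import urlparse        # Python 2
except ImportError:
    from urllib.parse import urlparse    # Python 3


def group_by_host(results):
    """Group result dicts (keys: 'url', 'title', 'desc') by hostname."""
    grouped = {}
    for result in results:
        host = urlparse(result['url']).netloc
        grouped.setdefault(host, []).append(result)
    return grouped


# Hypothetical sample data in the same shape as the result shown above.
sample = [
    {'url': 'http://scipp.ucsc.edu/groups/babar/charm2007.ppt',
     'title': 'aSDf', 'desc': '...'},
    {'url': 'http://example.com/a', 'title': 'A', 'desc': '...'},
    {'url': 'http://example.com/b', 'title': 'B', 'desc': '...'},
]

grouped = group_by_host(sample)
for host in sorted(grouped):
    print('%s: %d result(s)' % (host, len(grouped[host])))
```

<p>In real use you would pass gsearches.search('asdf') in place of the sample list.</p>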
<p>Like I said, later on I&#8217;ll add some more features, like using a proxy server or limiting the number of results. Perhaps I&#8217;ll release my Facebook scraping code too at some point.</p>
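<p>Until then, both features can probably be bolted on from the outside. This is only a sketch under a few assumptions: limited, build_proxy_opener and fake_search are hypothetical names, fake_search stands in for gsearches.search so the example runs offline, and the proxy address is a placeholder for wherever your proxy listens.</p>

```python
import itertools

try:
    import urllib2 as request_module           # Python 2
except ImportError:
    import urllib.request as request_module    # Python 3


def limited(results, max_results):
    """Return an iterator over at most max_results items of results."""
    return itertools.islice(results, max_results)


def build_proxy_opener(proxy_url):
    """Return an opener that routes http requests through proxy_url."""
    handler = request_module.ProxyHandler({'http': proxy_url})
    return request_module.build_opener(handler)


def fake_search(query):
    # Hypothetical stand-in for gsearches.search(); yields dicts forever.
    n = 0
    while True:
        n += 1
        yield {'url': 'http://example.com/%d' % n, 'title': query, 'desc': ''}


# Take only the first three results from an endless stream.
first_three = list(limited(fake_search('asdf'), 3))
for result in first_three:
    print(result['url'])

# Assumed local proxy address; nothing is fetched here.
opener = build_proxy_opener('http://127.0.0.1:8008')
```

<p>To have the scraper actually use the proxy, the opener would need to be installed with install_opener, the same way the cookie-handling opener is installed in the code above.</p>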
]]></content:encoded>
			<wfw:commentRss>http://seanneilan.com/2009/10/07/scraping-google-images/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Switched Vim for Intellij</title>
		<link>http://seanneilan.com/2009/10/07/switched-vim-for-intellij/</link>
		<comments>http://seanneilan.com/2009/10/07/switched-vim-for-intellij/#comments</comments>
		<pubDate>Thu, 08 Oct 2009 02:16:14 +0000</pubDate>
		<dc:creator>sneilan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[ide]]></category>
		<category><![CDATA[intellij]]></category>
		<category><![CDATA[vim]]></category>

		<guid isPermaLink="false">http://seanneilan.com/?p=41</guid>
		<description><![CDATA[I&#8217;ve used vim for a couple years now and I like it a lot. Except, it just doesn&#8217;t have the things I need like context aware code completion, automated refactoring and integrated debugging. Don&#8217;t tell me Vim can do those things. I know it doesn&#8217;t do context aware code completion &#38; automated refactoring. Maybe it [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve used Vim for a couple of years now and I like it a lot. But it just doesn&#8217;t have the things I need, like context-aware code completion, automated refactoring and integrated debugging. Don&#8217;t tell me Vim can do those things. I know it doesn&#8217;t do context-aware code completion &amp; automated refactoring. Maybe it does integrated debugging with gdb or the like, but debugging in an IDE is easier.</p>
<p>Vim has lots of great ways to move text around &amp; rebind itself until you don&#8217;t even recognize that it&#8217;s Vim. Those things aren&#8217;t very important to me. I just want my refactoring, code completion &amp; debugging. I suppose I could build these things in Vim myself, but it&#8217;s just not worth it.</p>
<p>IntelliJ has all that &amp; it supports vim commands with <a href="http://ideavim.sourceforge.net/">IdeaVim</a>.</p>
<p>I think that&#8217;s a winrar right there.</p>
]]></content:encoded>
			<wfw:commentRss>http://seanneilan.com/2009/10/07/switched-vim-for-intellij/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
