2010
02.01

Fuck Art

Whenever I’m afraid or anxious, I want to go back to doing what I live to do. For a long time, I thought that was art.

I’m always programming computers and thus I don’t get to paint as much as I want. Last night, I realized that art is not my passion. It’s computer science. Computer science is what I love to do.

2010
01.22

I just biked to Aldi and got something like $50 worth of groceries for like $22. That was awesome. Even with their supposed price cuts, Dominick’s & Jewel just can’t beat Aldi.

2010
01.09

It took me so long to understand how to command an organization. I noticed problems in the Computer Science Society’s organizational structure from day one. I was doing too much and the officers were doing too little. People brought this up to me and I knew. And like any good software engineer (or one who has aspirations of becoming among the best) I asked questions to see what was wrong. Normally, I can figure out how problems are my fault and work to fix myself and then the world around me. But, it took me a couple quarters to figure out what an event coordinator should do. She or he should reserve rooms, figure out how much things will cost and tell people what gruntwork needs to be done. A marketing coordinator should make posters, write emails & help people get the word out about CSS. The vice president should be able to make decisions in my absence. The treasurer should keep track of the expenses, donations and who is owed what. The activities coordinator should send emails to people to keep ’em in line so I don’t have to.

As an engineer managing an organization, I simply didn’t know what questions to ask and if I did, I didn’t understand how to direct people into doing what they need to do. It was overwhelming before.

There are people to do work like putting up posters, making equipment and managing projects. These are not business related positions.

It’s all so clear now. Before, I kept beating myself up about my preconceived notions of my own laziness. Anxiety and nervousness crept in and so I learned how to maintain sanity in all situations. I learned this the hard way, actually, by impulsively taking a grad level course and then getting so nervous about it that I failed two other courses. Had I prevented myself from worrying about all those things I was keeping track of (a lot of which should’ve been delegated to officers), I might not have failed those courses. I have changed though. I had a life changing experience a couple days ago (I did something fun ;) and saw how worried I was all the way from the subconscious to the conscious levels of thought. I began thinking as a rational person again so slowly that I was able to observe every single thought as it passed through my mind. I could see the worries and how they sprung up and what they told me.

I now have a calmness for everything. I know what worries me and have no reason to worry about it.

And now at last, we have restructured the organization. People who could not do their jobs for whatever reason are not in those jobs any longer. The organization is as follows:

President: Sean Neilan
Vice President: Chris Miree
Treasurer: Igor Kovalishin
Event Coordinator: Mark Lairs
Marketing Coordinator: Devon Blandin
Activities Coordinator: Corey Ladd
Project Manager: Derek Schaefer

2009
12.16

Things have been a bit slow in the past couple of days as my friend Derek & I have been building an Android application over winter break.

I have added the arbitrator, controller, buildmanager and constructionmanager classes to the project. The constructionmanager class is halfway done & the other three classes haven’t been tested. I need to complete the holy trinity of construction, production and morph manager to test them properly. Testing of course will probably take longer than it took to write, but, hey, by January 1st, we should have this whole pybwsal thing done.

2009
12.12

My friend Derek Schaefer & I have started a project called pybwsal for the StarCraft AI competition. Bwsal is a set of utility functions that makes programming StarCraft AIs in bwapi (which is a set of utility functions in C++ for programming StarCraft AIs) easier. :)

So far, the building placer has been implemented & I was working on implementing the building manager but unfortunately ran into issues yesterday. See http://code.google.com/p/pybw/issues/detail?id=20

In the meantime, I will work on programming the bid system as Chad Retz (and perhaps others) from the bwapi-jbridge project is interested in that.
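For anyone wondering what a bid system means here: the rough idea is that several managers each bid on a unit, and whoever bids highest gets to control it. Below is a minimal sketch of that idea in plain Python 3. The class and method names (Arbitrator, set_bid, remove_bid, owner) and the manager and unit names are hypothetical, made up for illustration, and are not taken from Bwsal, bwapi or pybwsal.

```python
class Arbitrator:
    """Tracks which controller has the highest bid on each object (e.g. a unit)."""

    def __init__(self):
        # object -> {controller: bid value}
        self.bids = {}

    def set_bid(self, controller, obj, value):
        """Place or update a controller's bid on an object."""
        self.bids.setdefault(obj, {})[controller] = value

    def remove_bid(self, controller, obj):
        """Withdraw a controller's bid, if it has one."""
        self.bids.get(obj, {}).pop(controller, None)

    def owner(self, obj):
        """Return the controller with the highest bid, or None if nobody bid."""
        bidders = self.bids.get(obj)
        if not bidders:
            return None
        return max(bidders, key=bidders.get)


# Hypothetical managers competing for one worker.
arbitrator = Arbitrator()
arbitrator.set_bid("build_manager", "worker_1", 50)
arbitrator.set_bid("scout_manager", "worker_1", 30)
print(arbitrator.owner("worker_1"))

# Once the higher bidder withdraws, control falls to the next one.
arbitrator.remove_bid("build_manager", "worker_1")
print(arbitrator.owner("worker_1"))
```

A real arbitrator would presumably also notify controllers when they gain or lose a unit; the sketch leaves that out.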

You can keep up to date on pybwsal by checking back at http://seanneilan.com/tag/pybwsal/ every once in a while.

2009
12.08

My goodness. What a massive loss to the community. Thousands and thousands of country music songs have been lost from youtube thanks to copyright claims. I began to like country music too and now there is simply no way to get any of it without going to the library? I mean, never in my life would I pay for anything in this world that might be gotten for free on the Internet. (Library costs money! Gas!)

And if you care, here is his old bio:

Old as dirt. retired. I am a Christian but I am also a mortal man. Not perfect, born in sin so I believe I must rely on something more powerful than man to guide and help me during my bad times. I want to do good when I am able and avoid doing bad . That is my goal although it is not possible to achieve all the time.

Maybe I can troll thepiratebay.org for Corrina Cordwell, Slim Whitman, Jim Reeves, Merle Haggard, Merle Travis, Ernie Freeman, Kitty Wells, Marty Robbins, Highway 101, Eddy Arnold, Billy Walker, Frank Ifield, Tom T. Hall, Gene Autry, Roy Rogers, Johnny Cash and Johnny Horton.

2009
11.13

My Goal

I love to draw but can’t always do it. I must not allow this part of me to fade away. One day I will make enough $$ to paint and draw until the end of my days. Then, my mind will flourish. This is my goal. I may never even be that great. That’s OK. I know what I want to go onto a sheet of paper. God wills it.

I put whatever I have right now in my dorm on the wall. It doesn’t matter whether it’s bad or good. It probably isn’t very good. I don’t care.


And from there, expand and combine computer science with art, learn how to draw nude portraits, make scenes, learn more about aesthetics, learn from artists like Adolf Wölfli, Howard Finster and Jean-Michel Basquiat. I’ll keep painting and drawing until I become lost in what I’ve made and then have to find my way out again.

2009
10.07

love this song.

2009
10.07

Scraping google images

Here is some rudimentary code for scraping google images. Later on, I’ll add features for search options & perhaps limiting the number of images this thing downloads. Leave a comment below if you’re interested in these features or if this is useful to you.

Put this code into a file like gimages.py

import urllib
import urllib2
import os
import sys
#import pdb

import demjson # a powerful python json decoder/encoder. Necessary for decoding garbled google JSON output within a reasonable development time frame

############################
# The meat of the project
############################
def search(term):
    """returns a results object for getting images."""
    return results(term)

class results:
    """iterable list of image results.
    here are the properties of each image
    [0] # google images page for image
    [1] # unknown
    [2] # unknown
    [3] # image url
    [4] # google images thumbnail width
    [5] # google images thumbnail height
    [6] # title text describing relevance of image to query somehow. Not sure of ruleset for this
    [7] # unknown
    [8] # unknown
    [9] # dimensions & size
    [10] # filetype
    [11] # original domain
    [12] # unknown
    [13] # unknown
    [14] # server that url contained in [0] resides on
    [15] # unknown
    [16] # unknown
    [17] # unknown

    still unknown as how to get the alt text stored in the image without having to visit actual page.

    results.stats_text will give you some html containing the time it took to load the page, the total images retrieved, etc.
    searching by size, type & color will come later"""

    def __init__(self, term):
        self.term = urllib.quote(term) # the original search term
        self.index = 1 # which image we are returning
        self.images = [] # stash/return images (from) here
        #self.curPageObj = {} # stores json object containing list of images
        self.cur_page_num = 1 # page we are on in google images
        self.max_images = 1000 # only retrieve first 1000 images due to restrictions placed by google. Google says it can find hundreds of millions of images, but, it will only return the first 1000 results. Such a crude example of unnoticed false advertising, IMHO.
        # retrieve initial images
        url = 'http://images.google.com/images?hl=en&q=%s&ijn=page&start=%d' % (self.term, self.cur_page_num) # Google uses an internal json api to retrieve images :)  Yup.
        page = get(url)
        page = page.replace('/*', '')
        page = page.replace('*/', '')
        page = demjson.decode(page)
        self.images.extend(page['images'])
        self.stats_text = page['sd']

    def __iter__(self):
        return self

    def next(self): # return the next image row, fetching another page from google when we run out
        if self.index > self.max_images:
            raise StopIteration

        if self.index > len(self.images): # if we need to go to the next google images page
            # get the next page (start uses the same 1-based counting as the first request in __init__)
            self.cur_page_num = self.index
            url = 'http://images.google.com/images?hl=en&q=%s&ijn=page&start=%d' % (self.term, self.cur_page_num)
            page = get(url)
            page = page.replace('/*', '')
            page = page.replace('*/', '')
            page = demjson.decode(page)

            if not page['images']: # google has nothing more to give us
                raise StopIteration
            self.images.extend(page['images']) # add to existing list of images

        image = self.images[self.index - 1] # self.index counts from 1, the list counts from 0
        self.index = self.index + 1
        return image # return next image!

# @TODO Make gimages.get(url) keep trying if the server says it's down.
# @TODO Add support for searching by size, type & color.
# @TODO In results object, extract total number of images retrieved & other stats out of HTML, rather than make user do that. Same with each image: get the width, height & size out of image for the user.
# @TODO Make cookie file optional incase script is being run from a read only directory.

# This is some boilerplate code for using urllib with cookies
# at the end, we get a nice get(url) function that has its own cookie file

COOKIEFILE = 'cookies.lwp'
# the path and filename to save your cookies in

cj = None
ClientCookie = None
cookielib = None

# Let's see if cookielib is available
try:
    import cookielib
except ImportError:
    # If importing cookielib fails
    # let's try ClientCookie
    try:
        import ClientCookie
    except ImportError:
        # ClientCookie isn't available either
        urlopen = urllib2.urlopen
        Request = urllib2.Request
    else:
        # imported ClientCookie
        urlopen = ClientCookie.urlopen
        Request = ClientCookie.Request
        cj = ClientCookie.LWPCookieJar()

else:
    # importing cookielib worked
    urlopen = urllib2.urlopen
    Request = urllib2.Request
    cj = cookielib.LWPCookieJar()
    # This is a subclass of FileCookieJar
    # that has useful load and save methods

if cj is not None:
# we successfully imported
# one of the two cookie handling modules

    if os.path.isfile(COOKIEFILE):
        # if we have a cookie file already saved
        # then load the cookies into the Cookie Jar
        cj.load(COOKIEFILE)

    # Now we need to get our Cookie Jar
    # installed in the opener;
    # for fetching URLs
    if cookielib is not None:
        # if we use cookielib
        # then we get the HTTPCookieProcessor
        # and install the opener in urllib2
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        urllib2.install_opener(opener)

    else:
        # if we use ClientCookie
        # then we get the HTTPCookieProcessor
        # and install the opener in ClientCookie
        opener = ClientCookie.build_opener(ClientCookie.HTTPCookieProcessor(cj))
        ClientCookie.install_opener(opener)

def get(url, txdata=None):
    try:
        # fake a user agent, some websites (like google) don't like automated exploration
        txheaders =  {'User-agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}

        #txdata = None
        # if we were making a POST type request,
        # we could encode a dictionary of values here,
        # using urllib.urlencode(somedict)

        req = Request(url, txdata, txheaders)
        # create a request object

        handle = urlopen(req)
        # and open it to return a handle on the url

    except IOError, e:
        print 'We failed to open "%s".' % url
        if hasattr(e, 'code'):
            print 'We failed with error code - %s.' % e.code
        elif hasattr(e, 'reason'):
            print "The error object has the following 'reason' attribute :"
            print e.reason
            print "This usually means the server doesn't exist,"
            print "is down, or we don't have an internet connection."
        sys.exit()

    else:
        #print 'Here are the headers of the page :'
        #print handle.info()
        page = handle.read()
        # handle.read() returns the page
        # handle.geturl() returns the true url of the page fetched
        # (in case urlopen has followed any redirects, which it sometimes does)

        # save the cookies before returning; code placed after the return never runs
        if cj is not None:
            cj.save(COOKIEFILE)                 # save the cookies again
        return page

Then go like this:

import gimages
results = gimages.search('asdf')
for image in results:
    print image[3]
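Until there is a proper option for limiting the number of images, one way to stop early is to wrap the results in itertools.islice, which works on any iterator. This is a rough sketch written for Python 3 (gimages.py itself is Python 2, where islice exists as well). The take helper and the fake_results generator are made up for illustration; fake_results only stands in for gimages.search so the example runs without hitting google.

```python
from itertools import islice


def take(results, limit):
    """Return an iterator over at most `limit` items from `results`."""
    return islice(results, limit)


def fake_results():
    """Stand-in for gimages.search(); yields rows with a URL at index 3."""
    n = 0
    while True:
        yield [None, None, None, "http://example.com/%d.jpg" % n]
        n += 1


# Only the first three rows are ever pulled from the (endless) generator.
for image in take(fake_results(), 3):
    print(image[3])
```

Since islice is lazy, no extra pages would be fetched beyond what the limit needs.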

Each image row has these family values:

    [0] # google images page for image
    [1] # unknown
    [2] # unknown
    [3] # image url
    [4] # google images thumbnail width
    [5] # google images thumbnail height
    [6] # title text describing relevance of image to query somehow. Not sure of ruleset for this
    [7] # unknown
    [8] # unknown
    [9] # dimensions & size
    [10] # filetype
    [11] # original domain
    [12] # unknown
    [13] # unknown
    [14] # server that url contained in [0] resides on
    [15] # unknown
    [16] # unknown
    [17] # unknown
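Remembering that [3] is the image url gets old fast, so here is a small sketch (Python 3) that turns a row into a dict keyed by name. The field names in KNOWN_FIELDS are invented here from the descriptions in the list above, and the sample row is made up; only the index positions come from the list.

```python
# Index -> name for the row positions identified so far. The names are made up here.
KNOWN_FIELDS = {
    0: "page_url",
    3: "image_url",
    4: "thumb_width",
    5: "thumb_height",
    6: "title",
    9: "dimensions_and_size",
    10: "filetype",
    11: "domain",
    14: "page_server",
}


def describe(row):
    """Return a dict of the known fields in a result row, skipping unknown slots."""
    return {name: row[i] for i, name in KNOWN_FIELDS.items() if i < len(row)}


# A made-up 18-element row, just to show the shape.
sample = [""] * 18
sample[3] = "http://example.com/cat.jpg"
sample[10] = "jpg"

info = describe(sample)
print(info["image_url"], info["filetype"])
```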

You’ll also have to download Demjson. Demjson is a really good json library & it’s necessary to parse google’s weird internal json api.
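One of the @TODO items in gimages.py is to keep trying when the server says it’s down. A rough sketch of how that could look in Python 3 is below. The get_with_retry name, its parameters and the flaky test function are all made up for illustration. It also assumes a fetch function that raises IOError on failure, which is not how get() above behaves (that one prints the error and calls sys.exit()).

```python
import time


def get_with_retry(fetch, url, attempts=3, delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on IOError up to `attempts` times in total.

    Waits `delay` seconds after the first failure and doubles the wait each
    time. Re-raises the last error if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except IOError:
            if attempt == attempts - 1:
                raise
            sleep(delay * (2 ** attempt))


# A fake fetcher that fails twice and then succeeds, so nothing touches the network.
calls = []


def flaky(url):
    calls.append(url)
    if len(calls) < 3:
        raise IOError("server is down")
    return "page for " + url


# sleep is swapped for a no-op so the demo does not actually wait.
print(get_with_retry(flaky, "http://example.com/", sleep=lambda seconds: None))
```

Passing sleep in as a parameter is just a convenience for trying the function out without waiting; in real use the default time.sleep would apply.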

Scraping Google Search

With this code:

import urllib
import urllib2
import os
import sys
#from xml.etree.ElementTree import ElementTree
#from xml.etree.ElementTree import XMLTreeBuilder
import lxml.html
import pdb

############################
# The meat of the project
############################
def search(term):
    """returns a results object for getting searches."""
    return results(term)

class results:
    def parse_page(self, page):
        tree = lxml.html.document_fromstring(page)
        searches = []
        for i in tree.find_class('g'):
            temp = {}
            temp['url'] = i.find_class('l')[0].get('href')
            temp['title'] = i.find_class('l')[0].text_content()
            temp['desc'] = i.find_class('s')[0].text_content()
            searches.append(temp)
        return searches

    def get_stats_text(self, page):
        tree = lxml.html.document_fromstring(page)
        stats_text = tree.get_element_by_id('ssb').text_content()
        return stats_text

    def __init__(self, term):
        self.term = urllib.quote(term) # the original search term
        self.index = 1 # which search result we are returning
        self.searches = [] # stash/return search results (from) here
        self.cur_page_num = 1 # value of google's start parameter for the page we are on
        self.max_searches = 1000 # only retrieve first 1000 results due to restrictions placed by google. Google says it can find hundreds of millions of results, but, it will only return the first 1000. Such a crude example of unnoticed false advertising, IMHO.
        # retrieve initial results
        url = 'http://www.google.com/search?q=%s&start=%d' % (self.term, self.cur_page_num)
        page = get(url)
        # xpath notes:
        # /html/body/div[2]/div/p is the search info
        # /html/body/div[2]/div[3]/div/ol is the array of searches
        self.searches.extend(self.parse_page(page))
        self.stats_text = self.get_stats_text(page)

    def __iter__(self):
        return self

    def next(self): # return the next search result, fetching another page from google when we run out
        if self.index > self.max_searches:
            raise StopIteration

        if self.index > len(self.searches): # if we need to go to the next google results page
            # get the next page (start uses the same 1-based counting as the first request in __init__)
            self.cur_page_num = self.index
            url = 'http://www.google.com/search?q=%s&start=%d' % (self.term, self.cur_page_num)
            page = get(url)
            more = self.parse_page(page)
            if not more: # google has nothing more to give us
                raise StopIteration
            self.searches.extend(more) # add to existing list of search results

        result = self.searches[self.index - 1] # self.index counts from 1, the list counts from 0
        self.index = self.index + 1
        return result # return next search result!

COOKIEFILE = 'cookies.lwp'
# the path and filename to save your cookies in

cj = None
ClientCookie = None
cookielib = None

# Let's see if cookielib is available
try:
    import cookielib
except ImportError:
    # If importing cookielib fails
    # let's try ClientCookie
    try:
        import ClientCookie
    except ImportError:
        # ClientCookie isn't available either
        urlopen = urllib2.urlopen
        Request = urllib2.Request
    else:
        # imported ClientCookie
        urlopen = ClientCookie.urlopen
        Request = ClientCookie.Request
        cj = ClientCookie.LWPCookieJar()

else:
    # importing cookielib worked
    urlopen = urllib2.urlopen
    Request = urllib2.Request
    cj = cookielib.LWPCookieJar()
    # This is a subclass of FileCookieJar
    # that has useful load and save methods

if cj is not None:
# we successfully imported
# one of the two cookie handling modules

    if os.path.isfile(COOKIEFILE):
        # if we have a cookie file already saved
        # then load the cookies into the Cookie Jar
        cj.load(COOKIEFILE)

    # Now we need to get our Cookie Jar
    # installed in the opener;
    # for fetching URLs
    if cookielib is not None:
        # if we use cookielib
        # then we get the HTTPCookieProcessor
        # and install the opener in urllib2
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        urllib2.install_opener(opener)

    else:
        # if we use ClientCookie
        # then we get the HTTPCookieProcessor
        # and install the opener in ClientCookie
        opener = ClientCookie.build_opener(ClientCookie.HTTPCookieProcessor(cj))
        ClientCookie.install_opener(opener)

def get(url, txdata=None):
    try:
        # fake a user agent, some websites (like google) don't like automated exploration
        txheaders =  {'User-agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}

        #txdata = None
        # if we were making a POST type request,
        # we could encode a dictionary of values here,
        # using urllib.urlencode(somedict)

        req = Request(url, txdata, txheaders)
        # create a request object

        handle = urlopen(req)
        # and open it to return a handle on the url

    except IOError, e:
        print 'We failed to open "%s".' % url
        if hasattr(e, 'code'):
            print 'We failed with error code - %s.' % e.code
        elif hasattr(e, 'reason'):
            print "The error object has the following 'reason' attribute :"
            print e.reason
            print "This usually means the server doesn't exist,"
            print "is down, or we don't have an internet connection."
        sys.exit()

    else:
        #print 'Here are the headers of the page :'
        #print handle.info()
        page = handle.read()
        # handle.read() returns the page
        # handle.geturl() returns the true url of the page fetched
        # (in case urlopen has followed any redirects, which it sometimes does)

        # save the cookies before returning; code placed after the return never runs
        if cj is not None:
            cj.save(COOKIEFILE)                 # save the cookies again
        return page

You’ll be able to scrape google search results. Same procedure as above except you’ll need lxml. You can get this by typing easy_install lxml or whatever. It’s a very good library for scraping. It allows me to develop facebook scraping code quickly and efficiently ;) Place the code into something like gsearches.py & do the following:

import gsearches
for result in gsearches.search('asdf'):
    print result

Results have the following format:
{'url': 'http://scipp.ucsc.edu/groups/babar/charm2007.ppt', 'desc': 'File Format: Microsoft Powerpoint – View as HTML1. D0-D0 Mixing at BaBar. Charm 2007 August, 2007. Abe Seiden. University of California at Santa Cruz. for. The BaBar Collaboration …scipp.ucsc.edu/groups/babar/charm2007.ppt – Similar', 'title': 'aSDf'}

Like I said, later on I’ll add some more features like using a proxy server or limiting the number of results. Perhaps I’ll release my facebook scraping code too at some point.
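For the proxy part, the standard library already has what’s needed. Here is a minimal sketch in Python 3 using urllib.request.ProxyHandler; in the Python 2 code above the same classes live in urllib2. The build_proxy_opener helper and the proxy address are made up for illustration, and this is not wired into gsearches.py.

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that sends http and https requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Hypothetical local proxy address; nothing is fetched until opener.open() is called.
opener = build_proxy_opener("http://127.0.0.1:8080")

# Same fake user agent as in the scraper's get() function.
opener.addheaders = [("User-agent", "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)")]
```

Calling opener.open(url) would then go through the proxy, assuming one is actually listening at that address.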

2009
10.07

I’ve used Vim for a couple years now and I like it a lot. Except, it just doesn’t have the things I need like context aware code completion, automated refactoring and integrated debugging. Don’t tell me Vim can do those things. I know it doesn’t do context aware code completion & automated refactoring. Maybe it does integrated debugging with gdb or the like but debugging in an IDE is easier.

Vim has lots of great ways to move text around & rebind itself until you don’t even recognize that it’s Vim. These things aren’t very important. I just want my refactoring, code completion & debugging. I suppose I could build these things in Vim myself, but, it’s just not worth it.

IntelliJ has all that & it supports vim commands with IdeaVim.

I think that’s a winrar right there.
