I found a nice little time saving device when testing different numbers of nearest neighbour weights in the excellent PySAL library in python, so I thought I’d share it.
Basically I wanted to test a number of different values of k when choosing a weighting scheme for spatial smoothing using nearest neighbours, but I didn’t want to have to continually recalculate the weight’s matrix for different values of k. Here’s what I did:
- Calculate a k nearest neighbour vector for a high value of k, this should be the same size, or larger than what you anticipate as being your maximum value of k.
- Store this table in a database, the database is useful as for large values of k for a large set of data you create k x n rows. Databases are optimised to store and query this amount of data in a way that text files aren’t!
- Create the weights matrix in PySAL from first principles: grab all the data from the database and order it by the ‘from’ neighbour id, and then the weights of the ‘to’ neighbours, Create the weights matrix as specified in the PySAL docs on weighting, but only use as many observations as you want to test by slicing the input matrixes.
Here is my code showing how this works:
import _mysql #Library that connects to Mysql
from pysal import W #Weights part of pysal
# Important - Database connection!
db = _mysql.connect(host="localhost",user="root",passwd="",db="data")
# Now Create a Spatial Weights Object
# These first 4 lines pull in the weights data from my MySQL database
# and store it as a list of tuples called 'resultWeight'
getWeights = "select * from `spatialweight` order by `polyID`,`weight`"
db.query(getWeights)
r = db.store_result()
resultWeight = r.fetch_row(maxrows=0)
nList = [] # Empty list for neighbours
wList = [] # Empty list for weights
neighbours = {} # Empty dictionary to hold neighbours
weights = {} # Empty dictionary to hold weights
# Now iterate through the results to form dictionaries with lists of
# neighbours and weights for each relevant point in the dataset
recNum = 1
while recNum < (len(resultWeight)+1):
# append data from resultWeights until the limit
# for k is reached for each point
nList.append(int(resultWeight[recNum-1][1]))
wList.append(float(resultWeight[recNum-1][2]))
if recNum % 50 == 0: # 50 as I used a maximum value of k = 50
# Slice nList and wList depending on the value of k to test
neighbours[int(resultWeight[recNum-1][0])] = nList[0:20]
weights[int(resultWeight[recNum-1][0])] = wList[0:20]
# Reset nList and wList for the next point in the weights data.
nList = []
wList = []
recNum += 1
# Now simply create the weights matrix for the value of k specified.
w = W(neighbours,weights)
#the order the matrix for use later.
if not w.id_order_set:
w.id_order = range(1,n)
# Where n = number of observations in the dataset (assuming
# point IDs are sequential integers starting at 1.)
~ End Article and Begin Conversation ~
There are no comments yet...
~ Now It's Your Turn ~