Alex Beutel's Blog

June 12th, 2012 · 1 Comment

Recently my friend Josh Hammer has been sending a TON of tweets.  He is a staunch Republican and Mitt Romney supporter.  If you want to read every news story through the GOP lens, subscribe to his Twitter feed.  He is effectively a GOP Twitter bot.

In the past couple of weeks, his tweeting began to get out of control, and I wondered: at this rate, how many tweets will he send by Election Day?  Taking what I have learned from following his Twitter feed and reading conservative news, I took a small amount of data and irresponsibly extrapolated it.  Thus, Time Series Twitter was born.

Time Series Twitter is a small, simple web app – it simply loads the last 1000 tweets from a given user, graphs the time series plots for the tweets (by day), and does a linear regression on the tweets.  With the linear regression you can over extrapolate and find the expected number of tweets a given user will send some time in the future.

For Hammer, this is currently 57 Tweets per day by the Election, significantly more than other news sources: @FoxNews is expected to send 6.3 tweets/day, @ESPN is expected to send 10.9 tweets/day, and @MSNBC to send -36.2 tweets/day. (All of these measurements are at time of writing – they obviously change a lot.)

This web app is nothing novel (or accurate), but it has been fun to watch friends’ Twitter activity over time.  Give it a try.

Interactive Quadtrees and Well-Separated Pairs Decomposition

January 16th, 2012 · 4 Comments

While doing computational geometry research with Professor Pankaj K. Agarwal and Thomas Mølhave at Duke, we looked at using quadtrees and the well-separated pairs decomposition (WSPD) for one of our algorithms. As I was reading about the algorithms and geometric concepts, I decided to code them in Javascript and HTML5’s canvas to get more intuition about their behavior. I will describe the basics of the math behind the geometry in this blog post. To just play with the demo go here.

[All of what I am describing is based off of my reading of Professor Sariel Har-Peled’s book on the subject. I suggest you read that (or other online notes) if you are really interested in the mathematical properties of well-separated pairs decomposition (or a range of other geometric topics).]

Quadtrees are a fairly common geometric concept and also data structure for storing points in $$d$$-dimensional space. We assume all points are within some large cube or in 2-dimensions, as we will focus on, a square. We split the square into four equal sized smaller squares. We can do this again on each of the smaller squares and do this continuously (recursively). We consider each square to be a cell and the four smaller cells within it to be its “children.” A quadtree is a division of the square into smaller and smaller cells until every point has its own cell.

Therefore, we can search the space fairly easily and quickly. We can even plot non-spacial things in a spacial metric to make use of the data structure. For computer science-minded people, these cells are organized in a tree (why its called a quadtree), compressed for cells that don’t contain any points, and thus each leaf node contains one point. Searching the quadtree is thus fairly fast. The properties and uses of quadtrees, and its many variants, such as how accurate it is for nearest neighbor searches, are extensive and, again, can be read in Professor Har-Peled’s notes among other places.

Well-Separated Pairs Decomposition

The well-separated pairs decomposition provides a more specific method than quadtrees of characterizing the general distance between points in a large set of points without storing the distance for each of the $${n \choose 2}$$ combinations. Broadly, it does this by clustering points that are close together and only describing the distance between such clusters. The conditions for a WSPD are relatively simple, but require some mathematical notation. I will try to give a simple description below.

We say that the points are clustered into sets. The diameter of a set $$Q$$ is $$\mathrm{diam}(Q) = \max_{p,q \in Q} \| p – q \|$$ where $$\|p-q\|$$ is the distance between $$p$$ and $$q$$. In English, the diameter is the distance between the two points furthest from each other in the set. Two sets $$Q$$ and $$R$$ are $$1/\epsilon$$-separated if $$\max({\rm diam}(Q),{\rm diam}(R)) \leq \epsilon \cdot d(Q,R).$$Here $$d(Q,R) = \min_{q\in Q, s\in R} \|q-s\|,$$or $$d(Q,R)$$ is the smallest distance between any pair of points from $$Q$$ and $$R$$, intuitively giving how close the two clusters are. Therefore, if two sets are $$1/\epsilon$$-separated then all of the points in one set are roughly the same distance from all of the same points in the other set, with some error relative to $$\epsilon$$. Finally, a pair decomposition is a set of set pairs: $$\mathscr{W} = \{ \{A_1,B_1\}, \ldots \{A_s, B_s\} \}$$ such that:

1. All points in any $$A_i$$ or $$B_i$$ are part of the original set of data points, $$P$$: $$A_i,B_i \subset P$$
2. For any pair $$\{A_i,B_i\}$$ there is no overlap in points between the two sets $$A_i \cap B_i = \emptyset$$
3. For any pair of points $$p,q \in P$$ there is at least one pair $$\{A_i,B_i\}$$ such that $$p$$ is in $$A_i$$ and $$q$$ is in $$B_i$$

Therefore, if we have a well-separated pair decomposition, then we have a pair decomposition where each pair of sets $$\{A_i,B_i\}$$ are $$1/\epsilon$$-separated. The third criteria for a pair decomposition here is most interesting, simply stating that for any pair of points we can estimate their distance apart with the WSPD.

One algorithm to construct the WSPD climbs down the quadtree and uses the cells of the quadtree as possible set pairs in the WSPD. This is the algorithm I used in my implementation, as is shown in the Javascript [pseudocode] below:

?View Code JAVASCRIPT
 function ConstructWSPD(cell1, cell2) { var arr = new Array(); if(cell1.diam() < cell2.diam()) { // Swap cell1 and cell2 var temp = cell1; cell1 = cell2; cell2 = temp; } if(cell1.diam() < epsilon * d(cell1,cell2)) { //The two cell's points are 1/epsilon-separated   // This is a theoretical class SetPair which would link the two sets arr.push(new SetPair(cell1.points, cell2.points));   //My code actually uses the following statement to draw the relationship, rather than store it efficiently arr.push(new Dumbell( new PBall(cell1.points), new PBall(cell2.points) ));   return arr; } if(cell1.children != null) { for(var i = 0; i < 4; i++ ) { arr = arr.concat(ConstructWSPD(cell1.children[i], cell2)); } } return arr; } // Here QuadTreeHead is the top cell in the quadtree: var wspd = ConstructWSPD(QuadTreeHead,QuadTreeHead);

The code above recursively steps through the quadtree attempting to use different cells from the quadtree as sets for the WSPD. Check out my small web application here to get some more intuition about how WSPDs relate to quadtrees as well as the impact of $$\epsilon$$ on the WSPD. There are many interesting properties and uses of quadtrees and WSPDs that I encourage you to read about further online or in Professor Har-Peled’s book.

On a completely separate note, this blog post was my first chance to use MathJax, which is awesome.

Announcing TerraNNI

November 9th, 2011 · No Comments

I am excited to announce that TerraNNI, the project from my undergraduate senior thesis, is being open-sourced.

For those who have not heard of the project, TerraNNI is a tool for taking LIDAR data, common for geographic information systems (GIS), and creating high-resolution grid digital elevation models (DEMs).  Most GIS take DEMs as an input and can perform a wide range of tasks, such as flood mapping, line-of-sight computations, and city planning.  The amount of LIDAR data available is growing quite fast.  TerraNNI makes full use of this large quantity of data to efficiently produce large-scale, high-resolution DEMs.

In most of these cases, LIDAR data is 3D spatial data and we produced a 2D grid DEM where each grid point has some estimated elevation.  Looking forward, we are beginning to see the rise of  spatial-temporal data (4D) which offers the opportunity to perform more complex geographic studies such as change detection on terrains.  Making use of new spatial-temporal data, TerraNNI provides efficient, scalable, high-resolution volumetric grid DEM construction.  In this case, we produce a 3D grid, with axes x,y in space and t time; again each grid point has an estimated elevation.  We expect this tool to become increasingly useful as LIDAR data becomes even more prevalent.

On a technical level, TerraNNI was an interesting project, incorporating computational geometry, GPU programming, and I/O efficient computing.  The program performs an approximation of natural neighbor interpolation (NNI), which is based on the Voronoi diagram.  It makes heavy use of the graphics card for most of the computations, using both OpenGL and CUDA.  We also use TPIE for efficient disk access, since data sets can be much larger than the computer’s memory.  TerraNNI is one of the fastest programs for producing high-resolution grid DEMs and one of the first to be able to perform 3D natural neighbor interpolation.

Please download TerraNNI from Github and give it a try for yourself.  Let me know if you run into any problems.  Also feel free to look at our two papers about the project

or at my previous blog post about computing Voronoi diagrams on the GPU.

→ No CommentsTags: C++ · Duke · OpenGL

Interactive Voronoi Diagrams with WebGL

August 2nd, 2010 · 19 Comments

Within computational geometry, the Voronoi diagram is a relatively simple concept with a wide-range of applications.  Everything from physics research to the “snap-to” feature in GUI-design uses Voronoi diagrams as a simple underlying data structure to decompose space. I have been working with Voronoi diagrams for the past few months, as my research has focused on interpolation, specifically scalable natural neighbor interpolation. Because the interpolation I have been doing (and the demo I have created) focuses on Voronoi diagrams in a plane, I will only discuss the Voronoi diagrams in 2D space. However, the mathematical concepts can be extended to higher dimensions.
If you already know all about Voronoi diagrams or simply don’t care about the math and just want to play, jump to here or go straight to my Voronoi diagram generator application.

The Voronoi Diagram

I would first like to briefly describe what precisely the Voronoi diagram is, along with how nearest neighbor and natural neighbor interpolation make use of this data structure. The Voronoi diagram is the division of plane into discrete Voronoi cells. Given a set of input points S, we would like to divide the plane such that for any given position in the plane, we know the nearest input point. This is the nearest neighbor problem. A Voronoi cell Vor(p,S) represents the region of the plane for which the given point p is the closest input point from the set of input points S. Mathematically the Voronoi cell Vor(p,S) is defined as In the Voronoi diagram below, we color each Voronoi cell with a unique color. For the entire region covered by each cell, the input point within the cell is the closest point from the set of input points. You can also easily observe that the boundaries between neighboring cells is the set of points equidistant between two nearby input points. Interpolation with the Voronoi Diagram

Given this understanding of the Voronoi diagram, we can easily answer a nearest neighbor query. Given a point q at position (x,y) we need only check which Voronoi cell it is within to see which input point is closest to it. Thus, if each input point also has some height z we could roughly interpolate the height of q by taking the height of its nearest neighbor.
A more complex interpolation scheme, known as natural neighbor interpolation, is to take a weighted average of all of the nearby input points. This is done by inserting the query point into the Voronoi diagram and observing how much area the Voronoi cell of the query point steals from the neighboring Voronoi cells in the original Voronoi diagram. Specifically, the fractional weight assigned to each natural neighbor is the area stolen from that nearest neighbor’s original Voronoi cell divided by the total area of the Voronoi cell of the query point. Mathematically this interpolation can be defined as: While more complicated than nearest neighbor interpolation, natural neighbor interpolation offers a higher quality interpolation and produces smooth results.

Approximating the Voronoi Diagram in OpenGL

As explained briefly in the OpenGL Red Book, we can compute the Voronoi diagram easily in OpenGL by rendering cones at each input point and using the depth buffer to figure out the boundaries between Voronoi cells. Specifically, if we place a right angle cone with the apex at a point, the height of the cone at a given position is equal to the distance form the point in the x,y plane. As such, if we place all the cones in the plane and look at them from below, we will only see the lowest cone at any given point and thus only the cone for which the point it represents is closest. This is known as taking the lower envelope. (For more information, you can read Hoff et. al.’s paper on how this works.)
Therefore, by merely drawing the cones and placing the viewpoint appropriately within OpenGL, the framebuffer will store the Voronoi diagram. Naturally, when doing computations with OpenGL, the diagram will be discretized into pixels. Additionally, OpenGL can not draw cones, only triangles. Therefore, we approximate a cone with a f-sided pyramid. With a reasonably large value for f, we will not notice the effect of the discretization, but making f a small value like 4-10 can be interesting. Both of these discretizations of course create some deviation from the previously defined Voronoi diagram.
I have used this method, implemented in WebGL, the Javascript API to OpenGL, to create the following interactive Voronoi diagram generator.

Enough Math

While math is necessary to precisely describe these concepts, it is not always the best way to convey some of the nuances of the geometric behavior. As a result, one night, both to play with WebGL and to better understand Voronoi diagrams, I implemented some of these concepts in the browser. Below is a short screencast of me playing with the tool, which may be useful if either you don’t have a WebGL-enabled browser or you want to learn some of the features of tool; or just play with it for yourself.

On Code

Not much to say here. To make this small application, I used two canvases overlaid, one with WebGL and one with just the 2D-context. This allowed me to draw over the Voronoi diagram (such as the points). Following the learningwebgl.com tutorials, I used the Sylvester library for Matrix math as well as the glUtils library from the tutorials.
WebGL is still a little buggy and memory management can be an issue, among other things. But overall, it samples a good portion of OpenGL. Unfortunately, getting started can take a little effort. For example, unlike OpenGL in C++, you must write your own shader programs; I suggest using samples from learningwebgl.com at least to get started. However, once you have a functioning WebGL canvas, it is easy to pick up speed. I also went slightly overboard on the movement functions, but it was a good chance to play with Javascript closures.

Some cool Voronoi diagrams

And finally, here are some Voronoi diagrams I have generated that I find interesting, cool, or merely pretty.
Similar kinetic Voronoi diagram but with only 6-sided pyramids used.
Below, the same set of input points are plotted with two different types of cones. The first uses a 50-sided pyramid. The second only uses 6-sided pyramids, creating the jagged effect between neigboring Voronoi cells.  The following two Voronoi diagrams show the Voronoi cells and area stolen for a natural neighbor query.  CMake, OpenGL, and GLX on OS X

July 28th, 2010 · 4 Comments

As you may know, I am staying at Duke this summer for research.  My research, which I started last semester and am going to be doing for a while, focuses on scalable algorithms.  While that is pretty broad, I am currently working on using the graphics card to speed up processing. Again, here are some technical notes on code.

In doing my work I have come to have to write and run C++ OpenGL code on both OS X and Ubuntu.  To make this easier, CMake has been great for both  having readable make files and for being able to keep the files consistent across both platforms.  Additionally, while GLUT is great for cross-platform OpenGL coding, I needed more control over the environment. Being able to run X11 on OS X has allowed me to test GLX natively without always testing on Linux.  However, to get this set up on OS X without XCode can be a little tricky.  So, two snippets of CMake files for using OpenGL on OS X and Linux.

If you want to run OpenGL with the standard OS X libraries, CMake generally finds the correct frameworks, but it can’t hurt to specify them more precisely.  A basic example of the CMakeLists is shown below:

IF(APPLE)
INCLUDE_DIRECTORIES ( /System/Library/Frameworks )
FIND_LIBRARY(COCOA_LIBRARY Cocoa)
FIND_LIBRARY(GLUT_LIBRARY GLUT )
FIND_LIBRARY(OpenGL_LIBRARY OpenGL )
GLUT_LIBRARY
OpenGL_LIBRARY)
SET(EXTRA_LIBS ${COCOA_LIBRARY}${GLUT_LIBRARY} ${OpenGL_LIBRARY}) ENDIF (APPLE) target_link_libraries(main${EXTRA_LIBS})


With this, your C++ should have the following includes:

 #include #include #include #include 

If, however, you are looking to run your code also on a Linux machine or simply want to use GLX rather than CGL, you will need to have X11 installed (which your OS X installation disk contains) and specifically point CMake at the X11 libraries.  The CMake code for this is shown below:

include_directories(/usr/X11R6/include/)

 #include #include #include #include #include #include