This is not a how-to article. This is more the story of how I found out why making even a simple map is so darn complex, and why ultimately I plan to rebuild the map I made. Going from an idea to an end product is no simple task. In this case, my idea concerns the distribution of WPR pledges throughout the state of Wisconsin by zip code. I am curious about which areas of the state (and country) produce the most revenue. I'm not exactly sure what I will do with this information, although I have a few ideas.
After some failed attempts to create a map, I called my brother, a graduate student in GIS, and he pointed me to a tool called gvSIG. While I try not to make a habit of complaining about open source software, I feel I must mention that this particular distribution of gvSIG had a lot of bugs.
To help create a map of the state of Wisconsin, my brother pointed me to the census site where one can download shapefiles. This was a great discovery for me, as these files enabled me to represent the shapes of each zip code.
As for the data, I had no difficulty pulling pledge and zip code information from our membership database with SQL. Connecting my zip code data with the state zip code shapefiles I had downloaded and imported into gvSIG, on the other hand, felt like banging my head against the wall. I became reacquainted with DBF files, one of the two accepted file formats in gvSIG, as far as I could tell. I tried to convert a text file into a DBF file using a variety of text editors, Microsoft Excel (which no longer supports DBF, apparently), Microsoft Access, SQL Server Management Studio, OpenOffice, and other programs I downloaded from the web. After dozens of failed attempts, my brother figured it out for me on his Windows machine. I hope to avoid DBF files in future projects.
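For what it's worth, the join itself is conceptually simple once both sides are in the same place. Here is a minimal Python sketch of the idea, not what gvSIG does internally: sum the pledges per zip code, then attach a total to every zip code that has a shape. The function names and the sample rows are made up for illustration.

```python
from collections import defaultdict


def total_pledges_by_zip(pledges):
    """Sum pledge amounts per zip code.

    Zip codes are kept as 5-character strings so they compare
    consistently with the zip codes stored alongside the shapes.
    """
    totals = defaultdict(float)
    for zip_code, amount in pledges:
        totals[str(zip_code).zfill(5)] += amount
    return dict(totals)


def join_to_shapes(shape_zips, totals):
    """Left join: every shape keeps a row, with 0.0 where no pledges exist."""
    return [(z, totals.get(z, 0.0)) for z in shape_zips]


# Hypothetical sample data standing in for the SQL export and the shapefile.
pledges = [("53703", 120.0), ("53703", 60.0), ("54601", 35.0)]
shape_zips = ["53703", "53704", "54601"]

joined = join_to_shapes(shape_zips, total_pledges_by_zip(pledges))
print(joined)
```

Treating the zip code as a string on both sides is the important design choice here; a zip code stored as a number on one side and as text on the other will silently fail to match.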
DBF files seemed to be a better solution than importing the data in the other accepted format in gvSIG: CSV. My problem with this format was not just figuring out the separator (gvSIG uses a semicolon as a separator instead of a comma, probably because the comma is used as the decimal separator in Spanish). The bigger problem for me was getting gvSIG to recognize each zip code's donation total as a number rather than a string. I wanted to display the donation totals for each zip code in color-coded categories, and the string “$2531” would not fit into a numerical category of $2000-$4000. I hate to sound like I am trashing gvSIG, because I know in this case the user (me), not the software, could very well be to blame. The bottom line: I found data importing and manipulation to be very tricky (or at least unfamiliar) in gvSIG. I have scrambled the data, but the image above is essentially my result.
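In hindsight, the string-versus-number problem is the kind of thing that is easier to clean up before the file ever reaches the mapping tool. A rough sketch in Python, assuming a semicolon-separated file with hypothetical column names `zip` and `total`, and US-style amounts where a comma is a thousands separator:

```python
import csv
import io


def parse_amount(text):
    """Turn a string such as "$2531" or "$2,531" into a float.

    Assumes US-style formatting: "$" is a prefix and "," is a
    thousands separator, not a decimal separator.
    """
    cleaned = text.strip().replace("$", "").replace(",", "")
    return float(cleaned)


def read_totals(handle):
    """Read semicolon-separated rows into a {zip: numeric total} dict."""
    reader = csv.DictReader(handle, delimiter=";")
    return {row["zip"]: parse_amount(row["total"]) for row in reader}


# Hypothetical sample standing in for the exported file.
sample = io.StringIO("zip;total\n53703;$2531\n54601;$480\n")
totals = read_totals(sample)
print(totals)
```

Once the total is a real number, a test like `2000 <= totals["53703"] < 4000` works, which is exactly what the string “$2531” could not do.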
This process also taught me about some of the complexities that come with choosing colors. I thought I could have a different color for each of our dozens of giving levels. My brother informed me that good maps should not have more than four or five different colors -- in part because it is difficult for the eye to easily distinguish between more than five colors -- and he pointed me to a website to help choose colors: https://colorbrewer2.org.
Then I realized that if I’m only dealing with four or five different colors, I can only break down the donation totals into four or five categories. I had plans to use categories like $1000-$2000 and $2001-$3000. However, it turns out there are numerous methods for breaking data into categories, and my brother said, “You should do this right or it will be like you are lying.” We chose natural intervals over equal or quantile intervals, although I'm still figuring out what exactly that means.
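To get a feel for why the choice matters, here is a small Python sketch comparing two of the methods, equal intervals and quantile intervals, on made-up totals with one very large value. It does not implement natural intervals, which need a more involved algorithm; the numbers and function names are purely illustrative.

```python
import statistics


def equal_interval_breaks(values, k):
    """Split the range from min to max into k classes of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]


def quantile_breaks(values, k):
    """Pick cut points so each class holds roughly the same number of values."""
    return statistics.quantiles(values, n=k, method="inclusive")


def classify(value, breaks):
    """Return the class index (0 .. len(breaks)) that a value falls into."""
    return sum(value > b for b in breaks)


def class_counts(values, breaks):
    """Count how many values land in each class."""
    counts = [0] * (len(breaks) + 1)
    for v in values:
        counts[classify(v, breaks)] += 1
    return counts


# Made-up donation totals: seven modest zip codes and one outlier.
totals = [100, 150, 200, 250, 300, 400, 500, 5000]

equal_counts = class_counts(totals, equal_interval_breaks(totals, 4))
quantile_counts = class_counts(totals, quantile_breaks(totals, 4))
print("equal intervals:   ", equal_counts)
print("quantile intervals:", quantile_counts)
```

With equal intervals, the single outlier stretches the scale so that nearly every zip code ends up in the lowest class and the map looks flat. With quantile intervals, the classes are evenly filled, but zip codes with very different totals can share a color. Same data, two quite different maps, which I think is what my brother meant about lying.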
I have basically given up on gvSIG and I have plans to re-create my map using Processing. I want more control and more interactivity. I want to make it so users can hover over pieces of data and see the totals. I want to be able to create a dynamic website. I am sure gvSIG could provide solutions for other use cases. For me, I'm ready to try this using a different tool. So stay tuned.