Install CouchDB (GeoCouch) on CentOS 5

Use these steps to install CouchDB (or more specifically, GeoCouch) on a CentOS 5 instance. Note that some commands reflect an i386 system and should be changed to x86_64 as appropriate. These steps were run on a lean server install that is also configured to use the RPMforge repository.

Install OS dependencies

This process builds some required tools from source. The following operating system packages provide the compilers, build tools and libraries needed to compile them.

$ sudo yum install gcc glibc-devel make ncurses-devel openssl-devel libicu-devel curl-devel git nmap

Install updated curl

At this writing, the curl package available through yum on CentOS is too far out of date: CouchDB requires curl >= 7.18.0. Follow the steps below to download, build and install curl from source.


$ wget http://curl.haxx.se/download/curl-7.20.1.tar.gz
$ tar -xvzf curl-7.20.1.tar.gz
$ cd curl-7.20.1
$ ./configure --prefix=/usr/local
$ make
$ make test
$ sudo make install
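Because the stock CentOS 5 curl is older than CouchDB's minimum, it can be worth confirming that the freshly built curl meets the 7.18.0 requirement before moving on. The sketch below is a hypothetical helper, version_ge, written in plain POSIX shell because the coreutils shipped with CentOS 5 predates `sort -V`. It assumes purely numeric, dot-separated version strings.

```shell
# version_ge A B: succeed (exit 0) when dotted version A >= version B.
# Assumes numeric fields only (e.g. 7.20.1); missing fields count as 0.
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        # Leading numeric field of each string (empty becomes 0).
        x=${a%%.*}
        y=${b%%.*}
        x=${x:-0}
        y=${y:-0}
        if [ "$x" -gt "$y" ]; then return 0; fi
        if [ "$x" -lt "$y" ]; then return 1; fi
        # Drop the field just compared; with no dot left, the string is done.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    # All fields equal.
    return 0
}
```

For example: version_ge "$(/usr/local/bin/curl --version | head -n 1 | awk '{print $2}')" 7.18.0 && echo "curl is new enough". The path assumes the --prefix=/usr/local used above.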

Install SpiderMonkey

CouchDB requires SpiderMonkey, Mozilla’s JavaScript engine.


$ wget http://ftp.mozilla.org/pub/mozilla.org/js/js-1.8.0-rc1.tar.gz
$ tar xvzf js-1.8.0-rc1.tar.gz
$ cd js/src
$ make BUILD_OPT=1 -f Makefile.ref
$ sudo make BUILD_OPT=1 JS_DIST=/usr/local -f Makefile.ref export

Install Erlang

Download and install Erlang (ignore configure warnings about Java compiler, wxWidgets and fop).


$ wget http://erlang.org/download/otp_src_R13B04.tar.gz
$ tar xvzf otp_src_R13B04.tar.gz
$ cd otp_src_R13B04
$ ./configure && make && sudo make install

Install GeoCouch

GeoCouch is a version of CouchDB that supports geometry data types. GeoCouch is maintained on GitHub.


$ git clone http://github.com/vmx/couchdb.git geocouch
$ cd geocouch
$ ./bootstrap
$ ./configure
$ make && sudo make install

Configure GeoCouch

Edit the CouchDB configuration file

# vi /usr/local/etc/couchdb/local.ini

Create a couchdb user and change file ownership:

$ sudo adduser -r --home /usr/local/var/lib/couchdb -M --shell /bin/bash --comment "CouchDB Administrator" couchdb
$ sudo chown -R couchdb: /usr/local/var/lib/couchdb /usr/local/var/log/couchdb

Set CouchDB service to run on startup:

# ln -s /usr/local/etc/rc.d/couchdb /etc/init.d/couchdb
# chkconfig --add couchdb
# chkconfig --level 345 couchdb on

Start CouchDB and verify that it is running:

# service couchdb start
# service couchdb status
# curl http://127.0.0.1:5984/
# netstat -nap | grep 5984
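The init script may return before CouchDB is ready to accept connections, so a curl issued immediately afterward can fail. A minimal sketch of a polling helper (wait_for_url is a hypothetical name) retries once per second until the URL answers with a success status or the timeout expires.

```shell
# wait_for_url URL [TIMEOUT]: poll URL once per second until it returns an
# HTTP success status; give up (exit 1) after TIMEOUT seconds (default 30).
wait_for_url() {
    url=$1
    timeout=${2:-30}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # -s: silent, -f: fail on HTTP errors (>= 400), -o: discard the body
        if curl -sf -o /dev/null "$url"; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

For example: service couchdb start && wait_for_url http://127.0.0.1:5984/ 30 && curl http://127.0.0.1:5984/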

See Also

  • http://wiki.apache.org/couchdb/Installing_on_RHEL5
  • http://library.linode.com/databases/couchdb/centos-5

DCStat: An Overview

Originally posted in August 2006, this article is republished on iPublic.org.

Two short years ago (2004) I was asked to develop a new program for Robert Bobb, City Administrator for the District of Columbia government. Christened DCStat, the system combines technology and business process to provide key, timely information that helps executive managers become more data-driven, customer-focused and accountable.

How DCStat works:

  • Collects transactional data from agency operations on a real-time basis
  • Organizes and combines this information to produce reports, charts & maps
  • Provides policy-makers, managers, line staff and residents access to the same information through applications and data feeds
  • Supports regular executive-level sessions to review agency performance and hold managers accountable


Real, Quantifiable Benefits from Open Government Data

In April I was invited to speak at a meeting of the Federal Data.gov Agency POC Working Group.  This group is on the front line of President Barack Obama’s Open Government Directive, driving federal agencies toward greater transparency, participation, and collaboration by publishing government data sets on the Internet.

I presented ‘Perspectives on Government Open Data’, sharing with federal leaders the benefits the District of Columbia has experienced through its open government data program. The talk covered the drivers that led to the creation of DC’s program, its mission, and its findings and outcomes, expressed as quantifiable improvements for citizens.

As federal agencies move to publish content, I challenged the group to make this information more useful:

  • Add value by incorporating into each data set common ‘handles’ such as location, time, organizations, etc. that enable content correlation and mashups
  • Coordinate a curation process that organizes source content into a government-wide, shared ontology framework using controlled vocabularies
  • Publish content using RESTful web services with basic filtering capabilities (such as date selection) rather than simple file downloads
  • Eat your own dog food: using the data internally helps you produce more useful data

These recommendations stem from experience in DC and are design tenets of the Civic OpenMedia publishing system for government data.

Talking Open Government Data with Federal DAS

This month I presented “Early Demonstrations of Federal, State & Local Government for Open Government Data” to the Federal Data Architecture Subcommittee (DAS).  The topic covered a cross section of government projects that I’ve supported.  The invitation arose from Brand K Niemann’s interest in one of these: the collaborative, semantic web-based FEMA Region III Levee website.  Brand and his father, Brand Sr., who work at DOJ and EPA respectively, are developing wikis and open linked data in the Federal Government.

I spoke about:

  • Anatomy of wiki-based, semantic information solutions
  • District of Columbia’s open data program, and
  • Insights on events, organizations and activities in the municipal open data community.

Along with FEMA’s site, I explained the purpose and design of Mediawiki/Semantic Mediawiki projects that similarly combine structured and unstructured content: modeling natural hazard event impacts on Critical Infrastructure/Key Resources (CI/KR), and managing public transit facilities using the Google Transit Feed Specification (GTFS) vocabulary.

I reviewed the mission, technology and data content that produced Washington DC’s Citywide Data Warehouse program.

The federal folks were interested to learn about the open data movement at local/municipal level, including organizations, events and projects such as: Code for America, CityCamp, Transparency Camp, E-Democracy.org and others.

Many thanks to Brand, Brand Sr., Federal DAS Chair Dr. Suzanne Acars and the committee members. A copy of the presentation is available here.

Online Resources for Open Source Municipal Data System Community

Since the “Call to Action” post on this topic, a number of enthusiastic folks have contacted me to participate in the Open Source Municipal Data System project.  Your willingness to step up and contribute demonstrates that this project’s time has come.  We now have assembled an energized, talented group with a common vision.  So, here we go!

To prepare, over the past couple of weeks I’ve: a) done a deep dive evaluating key technology that is new to me, and b) stood up a development and collaboration framework to support the community’s activities.

Thoughts and findings from the technology research activity belong to a separate post.  This article is about tools and services now available for the community’s use.

Collectively, these tools will help us organize, communicate, coordinate activities, capture knowledge and protect our contributions.  While wiki pages reference these resources and the tools themselves cross-link between one another, for convenience I’ve assembled this list in one spot:

  • iPublic Wiki – design, development and user documentation and information. See the Civic OpenMedia project home page.
  • OpenMuni wiki – a fellow CityCamp effort: policies, processes and how-tos related to open data at the local level
  • GitHub – ipublic/sivic-openmedia: shared repository for source code
  • Issue tracker – report, assign and close software issues and feature requests (linked to the GitHub repo)
  • Developer email list server – all who are interested in the development process should subscribe here
  • User email list server – general discussion and support list: subscribe here
  • Website – it’s stood up but empty right now. Ultimately, the plan is to make this the primary node for organizing information about, demonstrating and downloading the system

Please investigate and register to participate on these sites. I expect conversations will start quickly. Let me know of any issues you encounter.

I have taken the step to give the system a candidate name: Civic OpenMedia.  “Civic” of course refers to municipal matters. “Open” means… well, you know what open means.  Dictionary.com defines “Media” as: “the means of communication, as radio and television, newspapers, and magazines, that reach or influence people widely”.

Not very Web 2.0-ish, but it is meaningful.  I realize others may have different ideas to contribute.  If so, I ask that we conduct that conversation in the email list threads as an early topic.

Install Git on Mac OS X and Initialize a Remote Repo on GitHub

GitHub provides hosting services for the Git distributed version control system at no charge for open source projects. Following are steps to install and configure git on a Mac and to connect a local development project to a shared GitHub repository.

Install Git

With MacPorts installed, open a Terminal window and use the following commands to install git (with git-svn support) and then verify the installation. I’m running Snow Leopard; for my configuration, this took extra time to download and build a number of dependencies.

sudo port install git-core +svn +doc
git --version
which git

With git installed, it’s useful to run a few configuration commands. The first two set your global name and email. The third ensures consistent line endings are committed to the repository. The last sets TextMate as the default git editor.

git config --global user.name "my_name"
git config --global user.email "my_email_address"
git config --global core.autocrlf input
git config --global core.editor "mate -w"

Create a Repository

These steps assume you have: 1) registered an account at GitHub, 2) uploaded the SSH public key required to push changes, and 3) created a repository on GitHub named <my_project>.

Create a project directory and initialize a local repository. Then add a file named README.md (GitHub renders files with the .md extension as Markdown), add it to version control and commit the change.

mkdir <my_project>
cd <my_project>
git init
touch README.md
git add README.md
git commit -m 'first commit'

All done, right? Not yet. For those new to git, notice that I referred to git as a distributed version control system. The commit command checked the change into your local git repository. Unlike with Subversion, CVS and similar centralized systems, a second step is needed to publish changes to a shared remote repository.

If this is the first push to an empty remote repository, tell git where it lives with the following command:

git remote add origin git@github.com:ipublic/<my_project>.git

Then push the change:

git push origin master

The file is now added to the shared remote repository on GitHub.
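The local steps, plus registering the remote, can be bundled into a single shell function. This is a sketch: init_github_repo is a hypothetical name, and it deliberately stops short of pushing so you can review the commit first.

```shell
# init_github_repo DIR REMOTE_URL: create DIR, initialize a git repository,
# commit an empty README.md, and register REMOTE_URL as "origin".
# Pushing is left as a separate, explicit step.
init_github_repo() {
    dir=$1
    remote_url=$2
    mkdir "$dir" || return 1
    # Subshell so the cd does not change the caller's working directory.
    (
        cd "$dir" || exit 1
        git init -q &&
        touch README.md &&
        git add README.md &&
        git commit -q -m 'first commit' &&
        git remote add origin "$remote_url"
    )
}
```

For example: init_github_repo my_project git@github.com:ipublic/my_project.git, then cd my_project && git push origin master.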

Open Source Municipal Data System: A Call to Action

On the weekend of January 23-24, I engaged with a group of like-minded folks who share an interest in extending Gov 2.0 capabilities and benefits to municipal governments. The event was the CityCamp unconference in Chicago, expertly co-organized by Jennifer Pahlka and Kevin Curry and graciously hosted by the Rockefeller Foundation.

My motivation for attending took root through conversations with Andrew Greenhill, Chief of Staff at the City of Tucson, last September at the Gov 2.0 Summit. Andrew was there divining the technologies and processes necessary to open government data and more effectively engage with Tucson residents. Listening to Andrew, I realized that many of Tucson’s challenges and unknowns were the same as those my team faced building the District of Columbia’s DCStat and Internet data feed systems.

A vision to share lessons learned, help avoid pitfalls and transfer knowledge between 50 states, 83,767 local governments and 300 million people prompted setup of the iPublic.org site in the first place!  Obviously, there’s more to do.

Since September, I’ve stepped up writing and posting material on iPublic. I’ve also reviewed my team’s circa 2004/2005 system design. Technology gains since then mean that publishing civic data to the Internet need be neither lengthy nor cost-prohibitive. In fact, I’m convinced that an improved system can now be made widely available to local governments under an open source licensing model.

CityCamp presented a timely opportunity to energize a community around this cause. With that in mind, I moderated a discussion on Open Source Municipal Data Feeds. We had great turnout and insightful discussion. We reached consensus that a successful open government data program rests not only on technology, but also on coordinated policy and process. At the CityCamp session I took on the action item to organize and mobilize a group to build such an open source system.

This post is a call to action to business analysts, library scientists, technology developers and others who wish to contribute to realizing this vision. It’s also a call for support and participation from municipal leaders and foundation representatives.

I’m in the process of setting up collaboration tools for this community.  You can communicate your interest in the meantime by sending me an email at: dan -dot- thomas -at- ipublic -dot- org.

Add an existing project to Subversion control

Use the following steps to add an existing directory tree under version control:

svn mkdir -m "Create trunk" http://example.com/svnroot/myproject/trunk
cd myproject
svn co http://example.com/svnroot/myproject/trunk .
svn add *
svn ci -m "Initial import"

Automate MySQL full database backup

Use these steps to create a backup of all MySQL databases on a daily basis with the following behaviors:

  • cron (via run-parts) runs the files in /etc/cron.daily in alphabetical order, so prefixing the name with 0 makes this job run first.
  • The file name includes the day of the week, so each file is overwritten a week later, leaving seven rolling backups.
  • mysqldump writes plain SQL text, which compresses well, so the command pipes its output through gzip.

This approach requires root access. If the output directory doesn’t exist, create it:

su -
mkdir /var/dump

Create the file /etc/cron.daily/0mysql.cron and add the following lines:

vi /etc/cron.daily/0mysql.cron

#!/bin/sh
/bin/nice -n 19 /usr/bin/mysqldump -u root -p<password> --all-databases -c | \
/bin/nice -n 19 /bin/gzip -9 > /var/dump/mysql-all-$(date '+%a').sql.gz

Set the execute bit on the file.

chmod +x /etc/cron.daily/0mysql.cron

Finally, test that all is working as expected by running the file:

/etc/cron.daily/0mysql.cron
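Because the job runs unattended, it helps to check the resulting archive now and then. The hypothetical verify_backup function below is one way to do that: it confirms the file is non-empty, that gzip -t finds an intact stream, and that the dump ends with the "-- Dump completed" comment mysqldump writes by default. That last check matters because gzip still produces a valid archive even if mysqldump fails partway through the pipeline. It assumes comments were not turned off with --skip-comments.

```shell
# verify_backup FILE: succeed only if FILE is non-empty, is an intact gzip
# stream, and its decompressed content ends with mysqldump's completion note.
verify_backup() {
    file=$1
    [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
    # gzip -t decompresses in memory and checks integrity without writing output.
    gzip -t "$file" 2>/dev/null || { echo "corrupt gzip: $file" >&2; return 1; }
    # Assumes mysqldump's default trailing "-- Dump completed" comment;
    # look at the last few lines in case of trailing blank lines.
    gzip -dc "$file" | tail -n 5 | grep -q '^-- Dump completed' ||
        { echo "incomplete dump: $file" >&2; return 1; }
}
```

For example: verify_backup /var/dump/mysql-all-$(date '+%a').sql.gz && echo "backup OK"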

Install Mediawiki on an Amazon EC2 instance

Attach an EBS Volume to Instance

Create an Elastic Block Store (EBS) volume where the wiki and MySQL data will live (and survive loss of the instance), and attach it to the EC2 instance. In this example, the volume is attached as device /dev/sdh with mount point /ebs. Run these commands as root.

su -
cat /proc/partitions
mkdir /ebs
mke2fs -F -j /dev/sdh
mount /dev/sdh /ebs
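A volume mounted by hand is not remounted after a reboot unless /etc/fstab lists it. The sketch below (add_fstab_entry is a hypothetical helper) appends an entry only if the mount point is not already present, so it is safe to rerun. The ext3 type matches the mke2fs -j command above; the mount options are an assumption to adjust for your setup.

```shell
# add_fstab_entry FSTAB DEVICE MOUNTPOINT FSTYPE OPTIONS
# Append a mount line to FSTAB unless MOUNTPOINT already has a
# non-comment entry, so repeated runs do not create duplicates.
add_fstab_entry() {
    fstab=$1 device=$2 mountpoint=$3 fstype=$4 options=$5
    # Field 2 of an fstab line is the mount point; skip comment lines.
    if awk -v mp="$mountpoint" '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$fstab"; then
        return 0
    fi
    # Dump and fsck-pass fields set to 0 0.
    printf '%s\t%s\t%s\t%s\t0 0\n' "$device" "$mountpoint" "$fstype" "$options" >> "$fstab"
}
```

For example: add_fstab_entry /etc/fstab /dev/sdh /ebs ext3 defaults, followed by mount -a to confirm the entry parses.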

Install Mediawiki Dependencies

In addition to the CentOS base install, the AMI will need a few additional items to support Mediawiki requirements. Use yum to download the required packages. Set Apache and MySQL to start automatically on reboot.

yum install httpd mysql-server mysql php-mysql php php-mbstring ImageMagick
chkconfig httpd on
chkconfig mysqld on

Relocate MySQL and Web Server Files to EBS Volume

Set up a new home for the MySQL data on the EBS volume: create a directory, replace the standard MySQL data location with a symbolic link, and change ownership of the new directory to the mysql account.

mkdir -p /ebs/var/lib/mysql
mv /var/lib/mysql /var/lib/mysql-orig
ln -s /ebs/var/lib/mysql /var/lib/mysql
chown -R mysql:mysql /ebs/var/lib/mysql

Now perform a similar operation to replace /var/www, which holds Apache’s default DocumentRoot (/var/www/html), with a symbolic link to the EBS volume:

mkdir -p /ebs/var/www/html
mv /var/www /var/www-orig
ln -s /ebs/var/www /var/www
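Both relocations follow the same move-and-link pattern. The function below (relocate_to_volume is a hypothetical name) captures it and refuses to run twice, so a rerun cannot clobber the saved -orig copy. As in the steps above, the new location starts empty and the original contents remain in the -orig directory.

```shell
# relocate_to_volume SRC DEST: move directory SRC aside to SRC-orig,
# create DEST on the mounted volume, and replace SRC with a symlink to DEST.
relocate_to_volume() {
    src=$1
    dest=$2
    if [ -L "$src" ]; then
        echo "already a symlink: $src" >&2
        return 1
    fi
    if [ -e "$src-orig" ]; then
        echo "backup already exists: $src-orig" >&2
        return 1
    fi
    mkdir -p "$dest" &&
    mv "$src" "$src-orig" &&
    ln -s "$dest" "$src"
}
```

For example:
relocate_to_volume /var/lib/mysql /ebs/var/lib/mysql && chown -R mysql:mysql /ebs/var/lib/mysql
relocate_to_volume /var/www /ebs/var/www && mkdir -p /ebs/var/www/html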

Download Mediawiki

Download and uncompress the latest version from the Mediawiki website

cd /var/www/html
wget http://download.wikimedia.org/mediawiki/1.15/mediawiki-1.15.1.tar.gz
tar xvzf mediawiki-1.15.1.tar.gz

As a clean default skin for Mediawiki, I prefer the daddio skin, which is available from sustainablex.com. Note that a couple of code changes (documented later in this post) are necessary to get daddio to behave properly.

cd /var/www/html/mediawiki-1.15.1/skins
wget http://sustainablex.com/daddio.zip
unzip daddio.zip

Initialize Apache and MySQL

Change web root directory ownership to enable apache access, start up Apache and MySQL services, set MySQL root password for first time.

chown -R apache.apache /var/www/html
service httpd start
service mysqld start
mysqladmin -u root password NEWPASSWORD

You should now be able to connect to the server and proceed with Mediawiki configuration: http://myserver.com/mediawiki-1.15.1

Daddio Skin Adjustments

The Mediawiki daddio skin is clean and attractive. However, a couple of modifications are necessary to make it play nice. First, the CSS forces wiki article titles to lower case. Since titles are case-sensitive, this is problematic. Correct it by editing the main.css file and commenting out one line.

vi /var/www/html/mediawiki-1.15.1/skins/daddio/main.css

On or about line 253, comment out the text-transform rule for #p-cactions li:

/* text-transform: lowercase; */

The second adjustment directs the Daddio skin to reference the Common.css stylesheet, which includes styles that you may add directly in the wiki through the MediaWiki:Common.css page.
vi /var/www/html/mediawiki-1.15.1/skins/Daddio.php
On or about line 58, add the following (the href assumes the wiki is served from /w; adjust /w/index.php to match your installation, e.g. /mediawiki-1.15.1/index.php for the install above):
<!-- Added to include mediawiki:common.css settings -->
<link rel="stylesheet" type="text/css" href="/w/index.php?title=MediaWiki:Common.css&usemsgcache=yes&ctype=text%2Fcss&smaxage=18000&action=raw&maxage=18000" />