Mobile Data Collection with Preloaded Data


Recently I was asked to design a mobile-based data collection and management system for sugarcane growers in Pakistan. The objective was a system that could collect data from sugarcane growers using Android-based devices. The collected data had to include text, numeric values and pictures, along with the geographic shapes (polygons) of the sugarcane fields.

The field enumerators were expected to have a low level of education, so the mobile data collection had to be as intuitive and user-friendly as possible. The weather and field conditions were tough and also had to be taken into account.

The collected data was to be audited by supervisors through a web interface, where they wanted to view and verify the collected data as well as the location, shape and size of each sugarcane field. They wanted to view the polygon of the sugarcane field overlaid on a satellite image, with the ability to modify the coordinates of the polygon.

Additionally, some mechanism was required to make the existing data on sugarcane growers available on the mobile data collection devices, so that enumerators would not need to fill in every data field. Instead, they could simply verify whether the existing data was correct.

Form Design with Preloaded Data:

We decided to base our work on Open Data Kit (ODK), with customized data collection forms and a reporting server. ODK Collect 1.4.3 allows data to be preloaded for a new round of a survey. We took advantage of that and created a survey form with an associated database of existing grower information. Some key technical aspects of designing such forms are the following:

  1. Create a .csv file containing the data you want to preload into your questions. For example, our csv file is named SCGDV1.csv.
  2. The .csv file must contain a column whose name ends with “_key”. This column is used for the lookup. In our case we used “grower_id_key”.
  3. The names of the other columns should also be short and unique.
  4. Create a simple form using ODK Build or any other XML form builder of your choice.
  5. Open the XML form in Notepad or any other XML editor to make the advanced changes.
  6. Search for “<bind nodeset” to reach the part of the form that contains the data nodes.
  7. Initially they will look like this:
    <bind nodeset="/data/grower_id" type="int"/>

    <bind nodeset="/data/name" type="string" required="true()"/>

    <bind nodeset="/data/father_name" type="string"/>

    <bind nodeset="/data/nic" type="int" required="true()"/>

    <bind nodeset="/data/land" type="int"/>

    <bind nodeset="/data/location" type="geopoint"/>

  8. Add the pulldata() function to the nodesets where you want preloaded data.
  9. The syntax looks like this:
    calculate="pulldata('SCGDV1', 'name', 'grower_id_key',  /data/grower_id)"

  10. Here SCGDV1 is the name of the csv file, “name” is the heading of the column whose value you want to pull, “grower_id_key” is the column that is searched, and /data/grower_id holds the grower id to search for.
  11. Suppose you enter 15 as the grower_id in a question and use the pulldata() function to fetch the name of the grower with id 15 from the csv file. It searches for 15 in the “grower_id_key” column, finds the corresponding name in that record and fills the Name question with it.
  12. The pulldata() function is used inside a calculate attribute on each nodeset, and the resulting code looks like this:
    <bind nodeset="/data/grower_id" type="int"/>

    <bind calculate="pulldata('SCGDV1', 'name', 'grower_id_key',  /data/grower_id)" nodeset="/data/name" type="string" required="true()"/>

    <bind calculate="pulldata('SCGDV1', 'father_name', 'grower_id_key',  /data/grower_id)" nodeset="/data/father_name" type="string"/>

    <bind calculate="number(pulldata('SCGDV1', 'nic', 'grower_id_key',  /data/grower_id))" nodeset="/data/nic" type="int" required="true()"/>

    <bind calculate="number(pulldata('SCGDV1', 'land', 'grower_id_key',  /data/grower_id))" nodeset="/data/land" type="int"/>

    <bind nodeset="/data/location" type="geopoint"/>

  13. Even when they are numbers, data fields pulled from a .csv file are treated as text strings. You may therefore need to use the int() or number() function to convert a preloaded field into numeric form. In my case int() did not work, but number() works fine, as can be seen above. I had to use it for every nodeset whose data type was integer; otherwise the form gives an error.
  14. Once the form is complete, test it using ODK Validate and upload it to your ODK Aggregate server along with the csv file. Deploy it on your mobile device and it works perfectly.
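
To make the lookup in steps 10 to 13 concrete, here is a minimal Python sketch of what pulldata() does conceptually. It is not ODK code: the function below is a hypothetical stand-in that takes the csv text directly instead of a file name, the sample rows are invented for illustration, and returning an empty string when no row matches is an assumption of this sketch.

```python
import csv
import io

# Hypothetical extract of SCGDV1.csv; all rows are invented sample data.
SAMPLE_CSV = """grower_id_key,name,father_name,nic,land
15,Ali Khan,Ahmed Khan,1234567890123,12
16,Bilal Shah,Imran Shah,1234567890124,7
"""


def pulldata(csv_text, column, key_column, key_value):
    """Return `column` from the first row whose `key_column` equals `key_value`.

    Every value read from a csv file is text, so the key is compared as a
    string and the result is returned as a string. An empty string is
    returned when nothing matches (an assumption of this sketch).
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[key_column] == str(key_value):
            return row[column]
    return ""


# The enumerator types 15 into the grower_id question.
name = pulldata(SAMPLE_CSV, "name", "grower_id_key", 15)
land_text = pulldata(SAMPLE_CSV, "land", "grower_id_key", 15)

# The pulled value is still text; converting it plays the role of number().
land = float(land_text)

print(name, repr(land_text), land)
```

The explicit conversion on the pulled land value is the same reason the integer nodesets in the form wrap pulldata() in number().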

Mobile Data Collection in Pakistan

Almost every project in rural development, disaster management and community awareness calls for field surveys to collect primary data. In a low-income country like Pakistan, where capacity and administrative problems with data collection are common, surveys are often the only way to collect reliable data. Paper-based data collection has been the standard method for decades, but errors are frequent, storage costs are prohibitive, and the costs of double data entry are high. Recent advances in communication technology have introduced electronic methods of data collection that merge the processes of data collection and data entry. Handheld devices such as personal digital assistants and smartphones are increasingly being used instead of paper-and-pencil methods of data collection.

In 2008 Pakistan was the world’s third fastest growing telecommunications market. Pakistan’s telecom infrastructure is improving dramatically with foreign and domestic investments into fixed-line and mobile networks; fiber systems are being constructed throughout the country to aid in network growth. Approximately 90 percent of Pakistanis live within areas that have cell phone coverage and more than half of all Pakistanis have access to a cell phone. With 118 million mobile subscribers in March 2012, Pakistan has the highest mobile penetration rate in the South Asian region (Wikipedia 2012). This gives us a very positive opportunity to use mobile based data collection mechanisms in our regular data collection and research activities to reduce our cost and improve accuracy and efficiency.

The concept of electronic data collection has been applied successfully in many developing countries (see Map) in the fields of health, agriculture, socio-economic studies, livelihoods & economic development, microfinance, market analysis and customer satisfaction studies. Recently this data collection mechanism has been adopted in Pakistan by some national and international organizations to collect data from remote areas at a reasonably large scale.

Open Data Kit (ODK) is a suite of tools that allows data collection using Android mobile devices and data submission to an online server, even without an Internet connection or mobile carrier service at the time of data collection. One may streamline the data collection process with ODK Collect by replacing traditional paper forms with electronic forms that allow text, numeric data, GPS, photo, video, barcodes, and audio uploads to an online server. You can host your data online using Google’s powerful hosting platform, AppEngine, manage your data using ODK Aggregate and visualize your data as a map using Google Fusion Tables and Google Earth.

Created by developers at the University of Washington’s Computer Science and Engineering department and members of Change, Open Data Kit is an open-source project available to all. It consists of three main components, Build, Collect and Aggregate, as shown below:


To my knowledge, mobile data collection using Android-based smartphones has been used, at least partially, in the following projects in Pakistan (as of May 2014):

  1. Multi-sector Initial Rapid Assessment for Pakistan (MIRA) implemented by OCHA and NDMA
  2. Collection of primary data about ‘elements at risk’ in flood plain areas of Indus River implemented by City Pulse (Pvt.) Ltd. (Mar 2012)
  3. Real time data analysis of Participants’ Feedback in training sessions (Jan 2014)
  4. Labour Force Survey in Gilgit Baltistan implemented by AKFP and AKRSP
  5. A pilot project on monitoring of health facilities using smart phones implemented by LUMS
  6. The PakistanGIS team has been training a few groups of university researchers in mobile data collection systems and smartphone-based primary data collection, to improve the efficiency and accuracy of data collection for their research. (Aug 2012)
  7. IRG has used Mobile data collection for Electricity Consumers’ Census in KPK for PESCO. Mobile data collection and Management solution has been provided by City Pulse (Pvt.) Ltd. (April 2014)

Special thanks to Mr. Qadeer for the write-up.