March 13th and 17th 2025
The International Food Policy Research Institute (IFPRI), in collaboration with The Australian National University (ANU) and the University of Papua New Guinea (UPNG), recently conducted a 1.5-day training on data analysis and management using Stata, a widely used statistical software package. The training was held at UPNG in Port Moresby on March 13th and 17th 2025 and was attended by 35 participants: 12 faculty and staff members and 23 graduate students from the Master’s in Economics and Public Policy (MEPP) program. The sessions were led by Rishabh Mukerjee, Harry Gimiseve and Emily Schmidt from IFPRI.
Objective and Overview
The training aimed to give participants a thorough, hands-on introduction to Stata, a powerful statistical software package for analyzing household datasets and many other types of data, and to familiarize them with statistical techniques for analyzing and interpreting data. The key objectives were to:
- Equip participants with essential skills in data management, interpretation and analysis of the 2023 rural household survey data
- Enhance their capacity to understand the nature of the data so that they can apply appropriate analytical tools
- Teach statistical methods and techniques that can be applied in coursework or related research projects
The workshop was structured around three topics: i) exploring datasets and describing data, ii) transforming data to achieve analysis goals, and iii) analyzing data for policy analysis.
To facilitate the training, IFPRI provided Stata licenses and cleaned data, which were pre-loaded onto the university’s computer systems, along with a training manual and related excerpts from the 2023 rural household questionnaire for participants’ reference.
Introduction to Stata and Background of the 2023 Rural Household Survey Data
The training began with an overview of the 2023 Rural Household Survey and the subsequent report published in 2024. Participants were introduced to the survey objectives, questionnaire components, sample selection, and methodology to familiarize them with the dataset used in the session.
Before the coding lessons began, selected key findings from the 2023 Rural Household Survey Report were shared to demonstrate how data can provide insights into agriculture, employment, consumption, and water, sanitation, and hygiene (WASH) indicators in rural communities across PNG. Participants quickly recognized the potential of household data to inform discussions on food systems, household resilience, and welfare. This exercise also let them see directly the types of outputs Stata can generate when working with real-world data.
Because most participants had no prior experience with Stata, the training started with a walkthrough of Stata’s interface, giving an overview of each component and its use.
Describing and Analyzing Data
The first lesson focused on describing, viewing, and understanding the nature of a dataset. It included a detailed session on how to browse data and distinguish between string, categorical, and numerical variables in Stata. The training then introduced key descriptive commands commonly used in data analysis to compute frequency distributions, averages, medians, and similar statistics, and to present them in tables and histograms. These lessons were applied to the housing quality data collected during the 2023 PNG Rural Household Survey, where participants analyzed information on water sources, roof materials, and floor materials. The session concluded with a discussion on conditional commands, which allow users to filter outputs based on specific criteria.
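As a rough illustration of the kinds of descriptive commands covered in such a lesson, the sketch below uses Stata's bundled auto dataset as a stand-in. The survey data and its variable names are not given in this article, so the variables here (make, price, foreign, rep78) are assumptions chosen only because they ship with Stata; this is not the training's actual material.

```stata
* Illustrative sketch only: the bundled auto dataset stands in for the survey data
sysuse auto, clear

* Inspect variable names, storage types (string vs. numeric), and labels
describe

* View the first few rows: make is a string, foreign is categorical, price is numeric
list make price foreign in 1/5

* Mean, median (50th percentile), and other summary statistics
summarize price, detail

* Frequency distribution of a categorical variable, including missing values
tabulate rep78, missing

* Histogram showing counts rather than densities
histogram price, frequency

* Conditional commands: restrict output with the if qualifier
summarize price if foreign == 1
tabulate rep78 if price > 5000 & !missing(price)
```

With the survey data, the same pattern would apply to variables such as water source or roof material in place of the stand-ins used here.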
Transforming Data
The second half of the first day was dedicated to data transformation techniques in Stata. Participants were trained in creating, renaming, labeling, replacing and recoding variables. The session also covered how to create and interpret dummy variables, collapse and merge data, remove outliers and handle missing values.
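The transformations listed above might look roughly like the following sketch. It again uses Stata's bundled auto dataset rather than the survey data, and every new variable name (price_k, expensive, rep_group, mean_price), the 6000 cutoff, and the 99th-percentile outlier rule are assumptions made purely for illustration.

```stata
* Illustrative sketch only: the bundled auto dataset stands in for the survey data
sysuse auto, clear

* Create a variable, label it, then replace its values
generate price_k = price / 1000
label variable price_k "Price (thousands of USD)"
replace price_k = round(price_k, 0.1)

* Rename a variable
rename mpg miles_per_gallon

* Dummy variable: 1 if price is above 6000, 0 otherwise, missing if price is missing
generate byte expensive = price > 6000 if !missing(price)

* Recode a 5-point rating into three labeled groups (missing stays missing)
recode rep78 (1 2 = 1 "Poor") (3 = 2 "Average") (4 5 = 3 "Good"), generate(rep_group)

* Check for missing values
misstable summarize
count if missing(rep78)

* One simple outlier rule (an assumption): drop values above the 99th percentile
quietly summarize price, detail
drop if price > r(p99) & !missing(price)

* Collapse to group means in a temporary file, then merge them back
preserve
collapse (mean) mean_price = price, by(foreign)
tempfile group_means
save `group_means'
restore
merge m:1 foreign using `group_means'
```

Interpreting the dummy is straightforward: the mean of expensive is the share of observations with a price above the cutoff.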
During the session, Emily Schmidt shared insights from a recent analysis on the prevalence and correlates of child growth outcomes in rural PNG. This presentation provided students with a foundation for understanding regression outputs and their significance.
Statistical Tests and Regression Analysis
On the final day, participants were introduced to hypothesis testing and given lessons on correlation analysis, t-tests, and regression analysis. The session began with an explanation of the concepts behind hypothesis testing, including null and alternative hypotheses, significance levels, and p-values. Participants then practiced computing correlation coefficients and creating scatter plots.
Building on this foundation, participants learned about t-tests, which they used to compare mean child growth outcomes between households that did and did not consume protein.
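A minimal sketch of these steps, assuming Stata's bundled auto dataset in place of the survey data: price, weight, mpg, and foreign are stand-ins, with foreign playing the role of a two-group indicator such as household protein consumption.

```stata
* Illustrative sketch only: the bundled auto dataset stands in for the survey data
sysuse auto, clear

* Pairwise correlation coefficient with its p-value
pwcorr price weight, sig

* Scatter plot of the same two variables
scatter price weight

* Two-sample t-test: is mean mpg different between the two groups?
* Null hypothesis: the group means are equal
ttest mpg, by(foreign)

* The two-sided p-value is stored in r(p); compare it with the chosen significance level
display "Two-sided p-value: " r(p)
```

If the p-value falls below the chosen significance level (commonly 0.05), the null hypothesis of equal means is rejected.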
For the final exercise, participants applied what they had learned by replicating the regression analysis that Emily Schmidt had presented on the first day. Using the rural household survey dataset, they ran multiple regressions to identify factors influencing children’s height-for-age Z-scores.
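The general shape of such a multiple regression in Stata is sketched below. The model specification used in the training is not described in this article, so the outcome and explanatory variables here come from the bundled auto dataset and are assumptions only; with the survey data, the height-for-age Z-score would take the place of the outcome variable.

```stata
* Illustrative sketch only: the bundled auto dataset stands in for the survey data
sysuse auto, clear

* Multiple regression: a continuous outcome on two continuous regressors
* and one categorical regressor (i. creates the indicator automatically)
regress price mpg weight i.foreign

* Same model with heteroskedasticity-robust standard errors
regress price mpg weight i.foreign, vce(robust)

* Number of observations and R-squared are stored in e()
display "N = " e(N) "   R-squared = " e(r2)
```

Each coefficient is read as the expected change in the outcome for a one-unit change in that regressor, holding the others constant.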
The training concluded with a certificate distribution ceremony attended by Dr. Lawrence Sause, Acting Executive Dean, School of Business and Public Policy, UPNG, and Mr. Nic Jonsson, Counsellor-Economic, Australian High Commission. They congratulated the participants and highlighted the ANU-UPNG partnership’s role in strengthening research capacity within PNG.
The participants are now equipped with basic data analysis techniques that they can apply to their research projects and related initiatives. This training event marked a significant step forward in building the capacity of faculty and students at UPNG. The training was well received, with participants expressing interest in further training on advanced data analysis and tools.