Nurse_Salaries_Project

Nurse Salaries Project

This research project aims to analyze the salaries of nurses in different countries before and after the pandemic. It aims to create an analytics tool to processes various datasets, generates reports, and visualizes the data trends to help HR teams make data-driven decisions about healthcare employment.

This project is important for making it visible the disparities in the compensation for healthcare workers, even trhough a worldwide crisis as the pandemic. It helps to visualize the trends in order to support the discussion about rising pay.

A key feature of the project is the iterative process in four phases.

The public repository of this project is available Here

Table of Contents

Introduction

This research examines nurses’ salaries in Ireland compared to those in various countries, including Australia, Canada, the USA, and several European nations. The study factors in GDP, cost of living, exchange rates, and other variables to understand salary differences. In 2022, Ireland’s average nurse salary was $64,000, ranking third among the countries studied. The paper also explores the evolving role of nurses in healthcare and highlights disparities in compensation despite their crucial role in patient care. The aim is to provide insights for nurses to make informed career decisions. The project follows the Waterfall model for structured project management and uses the Knowledge Discovery in Databases (KDD) framework for data analysis, with Jupyter Local chosen as the development environment.

Here is a Datafolio (Poster) report of the research process and takeaways:

image.png

Structure

/Nurse-Salaries-project
├── /data
│ ├── HEALTH_REAC_.csv              # Raw OECD dataset
│ ├── HEALTH_REAC_Interpolated.csv  # Cleaned dataset
│ ├── HEALTH_REAC_Augmented.csv     # Enriched dataset
│ ├── HEALTH_REAC_Sorted.csv        # Same dataset, ordered by country GDP
│ ├── List of countries by GDP.csv  # ordered list of countries, to sort data.
├── /images
│ ├── Boxplots and Facetgrids plots
├── /notebooks                      # JUPYTER Notebooks
│ ├── 1. Data Preparation - Nurse Salary Project.ipynb
│ ├── 2. Exploratory Data Analysis - Nurse Salary Project.ipynb
│ ├── 3. Statistical Analysis - Nurse Salary Project.ipynb
│ ├── 4. Machine Learning - Nurse Salary Project.ipynb
├── /src
│ ├── preprocess.py                 # Data preprocessing scripts
├── FINAL REPORT - Nurse Salaries Project.pdf
├── Annex 1. Data Science project management methodologies.pdf
├── Annex 2 List of countries by GDp (nominal) - Wikipedia.pdf
├── Annex 3  List_of_countries_by_GDP_(nominal)_per_capita.pdf
├── Annex 4  List_of_countries_by_GDP_(PPP)_per_capita.pdf
├── Annex 5_ Nurse Salaries - Project Plan.docx
├── Annex 6_ Nurse salaries - Project Timeline.xlsx
├── README.md                       # Project documentation
├── requirements.txt                # Python dependencies
├── LICENSE.md                      # Project licencing

Goals

The aim of this project is to establish a model of a complete software development cycle applied to a simple research project. This is the Capstone Project for the Master in Data Analytics of the University of Dublin.

The structure of the project is consisting in 4 main phases, each one deliverying specific types of insights:

  1. Data Preparation
  2. Exploratory Data Analysis
  3. Statistical Analysis
  4. Machine learning to answer research questions.

The idea is that the insights gathered in each phase inform the next, and eventually, the process turn iterative, since subsequent analysis and advancing with the research points the team to start working from scratch, or with the same supplies, but in a different direction.

This Git version controlled repository includes the notebooks that describe the four main phases of the project, as well as the documents that help shape the knowledge aquired about the topic of choice.

The complete datasets used were available publicly and without any license, from the OCDE website.

Problem

The particular problem of choice is: “What are the characteristics of the salaries of nurses in the countries od the European Union, in particular in Ireland, from the data publicly available?”. This project started during 2023 running experiments to gather data, transform it and test if it was feasible to answer business questions with it.

The initial idea was to build a data analysis tool that calculates and visualizes healthcare salary trends by processing datasets from different developed countries, generating reports, and applying statistics techniques and Purchasing Power Parity and cost of living adjustments to correct salary values. By accounting for economic factors, it provides a more accurate, less biased picture of salary changes over time and across a wide geography.

Then, after a lenghty analysis of the datasets, there were defined new descriptive variables in a Feature Engineering process that involved domain knowledge from the nursing field to generate new descriptors and classifiers. Using that information, a Clustering Model for countries were developed. And finally, many different Machine Learning algorithms were implmented using those new features in order to try to answer more difficult questions.

Analysis

This report explores nurse salaries across 39 countries, using data from the OECD. The dataset, presented in XML format, lacked a dictionary and required research to interpret various variables and ranges. Assumptions were made regarding the dataset’s source (OECD), consistency, and data privacy. Key Data Findings and Dataset Issues:

Exploratory Data Analysis (EDA):

Plots 1 to 5

How does different measurement methodologies, GDP ranking affect salaries of different professions. Salaries are shown in these units:

image.png

image.png

image.png

image.png

image.png

Correlation and Statistical Analysis:

Machine Learning Algorithms:

Several machine learning models were suggested for addressing analytical challenges:

Plots 6

How have adjusted and standardized salaries varied over time?

image.png

Key Outcomes:

Plots 7

How are adjusted for PPP salaries compared among professions?

image.png

Plots 8

How are adjusted salaries compared to average salary of the country among professions?

image.png

Discussion:

The report identifies several factors influencing nurse salaries, such as specialization, demand, experience, and location. Specialist nurses with advanced training and more responsibilities earn higher wages. Countries facing nursing shortages or with strong unions tend to offer better compensation. Addressing salary disparities is vital to improving recruitment, retention, and job satisfaction in the nursing profession, ensuring a robust healthcare workforce globally.

License

All rights reserved

No Use Without Permission

No Commercial Use

Modification Prohibited

Attribution must be made

See the LICENSE file for details.