
Investigating Average Age of Deceased Males and Females in the Titanic Tragedy via MapReduce Program


Analysis of Titanic Tragedy Data using MapReduce: Determining the Averages of Male and Female Age at Death


In this project, we use Hadoop MapReduce to calculate the average age of the male and female passengers who died in the Titanic disaster.

Preparing the Environment

  1. First, download the Titanic dataset from its specific link.
  2. Create a new Eclipse project named "Titanic_Data_Analysis".
  3. Add the Hadoop Common and Hadoop MapReduce Client Core JAR files to the project as external JARs.
  4. Create a new class named "Average_age" within the project.

Preparing the Data

  1. Write the MapReduce program's code into the "Average_age.java" file.
  2. The Titanic dataset has 12 columns, and each row describes one passenger.
  3. For this project, the program keeps only the passengers who did not survive and extracts their gender and age.
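
The filtering described above can be sketched in plain Java, without any Hadoop classes, so the parsing logic can be checked on its own. This is a minimal sketch under stated assumptions: the column order (PassengerId, Survived, Pclass, Name, Sex, Age, ...) follows the commonly used Kaggle `train.csv` layout, a `Survived` value of `0` marks a passenger who died, and the `Name` field may contain a comma inside double quotes. The class and method names (`DeceasedRecordParser`, `parseDeceased`, `splitCsv`) are hypothetical and not taken from the original program, and the sample rows in `main` are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class DeceasedRecordParser {

    // ASSUMED column positions (Kaggle-style train.csv layout); adjust for your copy of the dataset.
    static final int SURVIVED = 1;
    static final int SEX = 4;
    static final int AGE = 5;

    /** Splits one CSV line, treating commas inside double quotes as part of the field. */
    static List<String> splitCsv(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // A doubled quote inside a quoted field stands for one literal quote character.
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"');
                    i++;
                } else {
                    inQuotes = !inQuotes;
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** A (sex, age) pair for one passenger who died. */
    record SexAge(String sex, double age) {}

    /** Returns sex and age for a deceased passenger; empty for survivors, the header row, and rows without an age. */
    static Optional<SexAge> parseDeceased(String line) {
        List<String> f = splitCsv(line);
        if (f.size() <= AGE || !f.get(SURVIVED).trim().equals("0")) {
            return Optional.empty();
        }
        String ageText = f.get(AGE).trim();
        if (ageText.isEmpty()) {
            return Optional.empty(); // some passengers have no recorded age
        }
        try {
            return Optional.of(new SexAge(f.get(SEX).trim(), Double.parseDouble(ageText)));
        } catch (NumberFormatException e) {
            return Optional.empty(); // malformed age value: skip the row
        }
    }

    public static void main(String[] args) {
        // Hypothetical rows in the assumed layout, for illustration only.
        String[] sample = {
            "PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked",
            "1,0,3,\"Doe, Mr. John\",male,22,1,0,TICKET-1,7.25,,S",
            "2,1,1,\"Roe, Mrs. Jane\",female,38,1,0,TICKET-2,71.28,C85,C",
            "3,0,3,\"Poe, Mr. Sam\",male,,0,0,TICKET-3,8.46,,Q"
        };
        for (String line : sample) {
            System.out.println(parseDeceased(line));
        }
    }
}
```

Only the second sample row produces a value: the header and the survivor are rejected by the `Survived` check, and the last row has no age.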

Setting Up Hadoop

  1. Start Hadoop Daemons using the commands and .
  2. Upload the Titanic dataset to Hadoop's HDFS using the command .
  3. Check if the dataset has been uploaded with the command .

Running the MapReduce Program

  1. Export the project as a JAR file.
  2. Run the exported .jar file on Hadoop using the command .
  3. View the output file in the Hadoop web interface at . Navigation path: .

Processing the Data

In the Hadoop MapReduce Java program, finding the average age of the male and female passengers who died involves two phases:

  1. Mapper phase:
       • Parse each record.
       • Keep only records whose survival status shows that the passenger died, and skip records with no age value.
       • Emit a key-value pair with the gender ("male" or "female") as the key and the age as the value.
  2. Reducer phase:
       • Receive all ages grouped by gender.
       • Sum the ages and count the records for each gender.
       • Calculate the average age as the total age divided by the count.
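
The two phases can be simulated in a single Java process to check the logic before packaging it for Hadoop. In this hypothetical sketch, rows are assumed to be already parsed into a `Passenger` record, and a `TreeMap` stands in for Hadoop's shuffle-and-sort step. The class, record, and method names are illustrative only, and the rows in `main` are made up rather than taken from the Titanic dataset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AverageAgeByGender {

    /** One pre-parsed row: survival flag (0 = died), sex, and raw age text (may be empty). */
    record Passenger(int survived, String sex, String age) {}

    /** Mapper logic: keep passengers who died and have a known age; emit (sex, age). */
    static void map(Passenger p, Map<String, List<Double>> grouped) {
        if (p.survived() != 0 || p.age().isBlank()) {
            return;
        }
        double age = Double.parseDouble(p.age());
        // Stand-in for the shuffle: collect every emitted value under its key.
        grouped.computeIfAbsent(p.sex(), k -> new ArrayList<>()).add(age);
    }

    /** Reducer logic: sum the ages and count the records for one key, then divide. */
    static double reduce(String sex, List<Double> ages) {
        // The key is unused here; it is kept to mirror the reduce(key, values) shape.
        double sum = 0;
        int count = 0;
        for (double a : ages) {
            sum += a;
            count++;
        }
        return sum / count;
    }

    /** Runs map over all rows, then reduce once per key (keys in sorted order). */
    static Map<String, Double> run(List<Passenger> input) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (Passenger p : input) {
            map(p, grouped);
        }
        Map<String, Double> result = new TreeMap<>();
        grouped.forEach((sex, ages) -> result.put(sex, reduce(sex, ages)));
        return result;
    }

    public static void main(String[] args) {
        // Made-up rows for illustration only; NOT values from the Titanic dataset.
        List<Passenger> rows = List.of(
            new Passenger(0, "male", "30"),
            new Passenger(0, "male", "40"),
            new Passenger(1, "male", "20"),    // survived: filtered out
            new Passenger(0, "female", "25"),
            new Passenger(0, "female", "")     // missing age: filtered out
        );
        System.out.println(run(rows)); // {female=25.0, male=35.0}
    }
}
```

In a real job, the bodies of `map` and `reduce` would move into subclasses of Hadoop's `Mapper` and `Reducer`, and the framework, not a `TreeMap`, would group the values by key.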

Understanding the Results

Survival on the Titanic depended strongly on sex and age because of the "women and children first" lifeboat policy, so the age profile of those who died is expected to differ between males and females. The exact averages, however, must be computed programmatically.

Without running the MapReduce job on the actual dataset, we can only describe the expected trends:

  • The average age of males who died is expected to be higher than that of females who died, because women and children were given lifeboat priority and survived in greater proportion.
  • The female passengers who died are expected to include a larger share of older women and young girls, which skews the female average differently from the male deaths, which were dominated by adult men.

In conclusion, the average age by gender of the Titanic passengers who died is not something to take from general references. It must be calculated by running a filtering-and-averaging MapReduce program in Java over the Titanic dataset. The dataset and the method are well known, but the specific numbers come only from executing the program on the data.

In the Hadoop MapReduce Java program for the "Titanic_Data_Analysis" project, no special data structure is needed to organize the passengers' gender and age. Between the map and reduce phases, Hadoop's shuffle-and-sort step groups every emitted age under its gender key. Each reducer call therefore receives all ages for one gender and needs only a running sum and count to compute the average accurately.
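
Keeping a sum and a count, rather than a running average, also matters if partial results are pre-aggregated before reaching the reducer (for example, by a combiner). (sum, count) pairs merge exactly, while averaging per-split averages gives the wrong answer when the splits hold different numbers of records. The minimal sketch below uses hypothetical names (`PartialAverages`, `SumCount`) and made-up numbers to show the difference.

```java
class PartialAverages {

    /** Partial aggregate for one key: total of the ages and number of records. */
    record SumCount(double sum, long count) {
        /** Combines two partial aggregates for the same key. */
        SumCount merge(SumCount other) {
            return new SumCount(sum + other.sum, count + other.count);
        }

        double mean() {
            return sum / count;
        }
    }

    public static void main(String[] args) {
        // Two hypothetical splits for the same key with different record counts.
        SumCount splitA = new SumCount(30 + 40 + 50, 3); // mean 40.0
        SumCount splitB = new SumCount(20, 1);           // mean 20.0

        double exact = splitA.merge(splitB).mean();               // (120 + 20) / 4 = 35.0
        double meanOfMeans = (splitA.mean() + splitB.mean()) / 2; // (40 + 20) / 2 = 30.0
        System.out.println(exact + " vs " + meanOfMeans);         // 35.0 vs 30.0
    }
}
```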

Once the MapReduce job completes successfully, the average ages of the male and female passengers who died are written to the job's output directory in HDFS, where they can be read directly.
