Web Scraping ICC Rankings Using Python: A Step-by-Step Guide

Authors: Maroof Tahir and Faisal Rafiq

Overview

This project extracts player and team data from the ESPNcricinfo website, converts it to CSV format, and visualizes metrics such as the number of players, top players, and most wins. It has four main phases: data collection, data cleaning, data storage, and visualization. In the data collection phase, data is scraped from ESPNcricinfo.com using APIs to gather the latest statistics and rankings. In the data cleaning phase, the scraped data is processed into a usable format by addressing missing values, inconsistencies, and errors. In the data storage phase, the cleaned data is written to CSV files for analysis and reporting. In the visualization phase, the stored data is charted to summarize the statistics.
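As a rough illustration of the cleaning phase, the sketch below tidies a small table of scraped rankings with Pandas. The column names (`rank`, `player`, `team`), the `clean_rankings` helper, and the sample rows are assumptions made for this example; they are not taken from the project's scripts or from real rankings.

```python
import pandas as pd


def clean_rankings(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of a scraped rankings table.

    Assumes hypothetical columns 'rank', 'player' and 'team'.
    """
    cleaned = df.copy()
    # Strip stray whitespace left over from the HTML.
    for column in ("rank", "player", "team"):
        cleaned[column] = cleaned[column].str.strip()
    # Convert rank text to numbers; unparseable values become NaN.
    cleaned["rank"] = pd.to_numeric(cleaned["rank"], errors="coerce")
    # Drop rows missing a rank or a player name, then remove duplicates.
    cleaned = cleaned.dropna(subset=["rank", "player"])
    cleaned = cleaned[cleaned["player"] != ""]
    cleaned = cleaned.drop_duplicates(subset=["player", "team"])
    cleaned["rank"] = cleaned["rank"].astype(int)
    return cleaned.sort_values("rank").reset_index(drop=True)


# Made-up rows that mimic typical scraping problems.
raw = pd.DataFrame(
    {
        "rank": ["1", " 2 ", "n/a", "2", "3"],
        "player": [" Player A", "Player B ", "Player C", "Player B", None],
        "team": ["Team X", "Team Y", "Team Z", "Team Y", "Team X"],
    }
)
cleaned = clean_rankings(raw)
```

Here the row with an unparseable rank, the row with a missing player name, and the duplicate entry are dropped. Whether dropping or repairing such rows is appropriate depends on how the data will be used.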

Technologies Used

  • Python
  • Matplotlib
  • Pandas
  • Pyplot (the matplotlib.pyplot module)
  • BeautifulSoup4
  • Requests

Project Structure

The project directory contains the following files and directories:

  • 10 Players Bio/: A folder containing biographies of 10 selected players.
  • ICC_ODI_Rankings.csv: CSV file containing ICC ODI rankings data.
  • ICC_Project.py: Python script for extracting and processing ICC rankings data.
  • ICC_T20I_Rankings.csv: CSV file containing ICC T20I rankings data.
  • ICC_Test_Rankings.csv: CSV file containing ICC Test rankings data.
  • ICC_WODI_Rankings.csv: CSV file containing ICC Women’s ODI rankings data.
  • ICC_WT20I_Rankings.csv: CSV file containing ICC Women’s T20I rankings data.
  • index.txt: Text file with indexing information.
  • Individual Players/: A folder containing detailed information about individual players.
  • Individual.py: Used to extract all player data for a team.
  • Players Ranking/: A folder containing various player ranking files.
  • Players.py: Used to extract the top 10 players from every team.
  • team.py: Python script for extracting and processing team data.

Steps to Extract and Process Data

  1. Extract Data from Website: The data is extracted from the ESPN Cricinfo website using web scraping techniques.
  2. Scrape Data: The ‘ICC_Project.py’ and ‘team.py’ scripts are used to scrape player and team data, respectively.
  3. Convert to CSV: The scraped data is processed and converted into CSV files for easier analysis and visualization.
  4. Visualize Data: The team rankings across the season, individual player stats, and the top players of each team are visualized.
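The scrape-and-convert steps above can be sketched roughly as follows, using Requests and BeautifulSoup4. This is a minimal sketch, not the project's actual code: the helper names (`fetch_html`, `parse_table`, `write_csv`) are hypothetical, and it assumes the rankings appear in the first HTML table on the page, which may not match the real markup. The demonstration parses a made-up table so that it runs without a network connection.

```python
import csv

import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download a page; the caller supplies the rankings page URL."""
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    response.raise_for_status()
    return response.text


def parse_table(html: str) -> list[list[str]]:
    """Return the first HTML table as a list of rows (header row first)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


def write_csv(rows: list[list[str]], path: str) -> None:
    """Write the parsed rows to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)


# Offline demonstration with a made-up table instead of a live request.
sample_html = """
<table>
  <tr><th>Rank</th><th>Player</th><th>Team</th></tr>
  <tr><td>1</td><td>Player A</td><td>Team X</td></tr>
  <tr><td>2</td><td>Player B</td><td>Team Y</td></tr>
</table>
"""
rows = parse_table(sample_html)
```

Against the live site, `fetch_html` would be called with the rankings page URL and its result passed to `parse_table`, then `write_csv`. Keeping the download separate from the parsing makes the parser easy to check against saved pages.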

Visualized Data

The data is visualized to show various metrics, including:

  • Number of players
  • Top players
  • Most wins by teams in a season

File Details

  1. Ten Players Bio/

This directory contains detailed biographies of 10 selected players. Each file in this directory provides in-depth information about a player, including their career statistics and achievements.

  2. ICC_ODI_Rankings.csv

This CSV file contains the ICC ODI rankings data. The columns include player names, team names, and their respective rankings.

  3. ICC_Project.py

This Python script is responsible for extracting and processing ICC rankings data. It includes functions to scrape data from the website and convert it into a structured format.

  4. ICC_T20I_Rankings.csv

This CSV file contains the ICC T20I rankings data. Similar to the ODI rankings file, it includes player names, team names, and their respective rankings.

  5. ICC_Test_Rankings.csv

This CSV file contains the ICC Test rankings data. It includes player names, team names, and their respective rankings.

  6. ICC_WODI_Rankings.csv

This CSV file contains the ICC Women’s ODI rankings data. It includes player names, team names, and their respective rankings.

  7. ICC_WT20I_Rankings.csv

This CSV file contains the ICC Women’s T20I rankings data. It includes player names, team names, and their respective rankings.

  8. index.txt

This text file contains indexing information that helps in organizing and accessing the data files.

  9. Individual Players/

This directory contains detailed information about individual players. Each file provides comprehensive data on a specific player, including their statistics and career highlights.

  10. Individual.py

This Python script is responsible for extracting and processing individual player data of each team. It includes functions to scrape data from the website and convert it into a structured format.

  11. Players Ranking/

This directory contains various player ranking files. Each file provides rankings for different categories and formats.

  12. players.py

This Python script is responsible for extracting and processing Top 10 players from every team. It includes functions to scrape data from the website and convert it into a structured format.

  13. team.py

This Python script is responsible for extracting and processing team data. It includes functions to scrape data from the website and convert it into a structured format.

Usage

  1. Run the ‘ICC_Project.py’ script to scrape and process the ICC rankings data.
  2. Run the ‘team.py’ script to scrape and process the team data.
  3. Use the generated CSV files to visualize the data and analyze various metrics.

Visualization Examples

  • Number of players: A bar chart showing the number of players from each team.
  • Top players: A table listing the top players based on their rankings and performance.
  • Most wins: A graph showing the teams with the most wins in different formats.

This visual representation lets us quickly identify which teams rank highest or lowest in each format. For example, India ranks highest in ODIs, while Australia Women hold the top spot in T20Is. The charts can be regenerated from the latest scraped datasets so that they reflect current rankings, as we did here.
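The first chart listed above, the number of players per team, might be produced along these lines with Pandas and Matplotlib. The `team` column name, the `plot_players_per_team` helper, and the sample rows are assumptions for illustration; in practice the DataFrame would be loaded from one of the generated CSV files.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_players_per_team(df: pd.DataFrame, output_path: str) -> pd.Series:
    """Save a bar chart of player counts per team and return the counts."""
    counts = df["team"].value_counts().sort_index()
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(list(counts.index), counts.to_numpy())
    ax.set_xlabel("Team")
    ax.set_ylabel("Number of players")
    ax.set_title("Players per team")
    fig.tight_layout()
    fig.savefig(output_path)
    plt.close(fig)
    return counts


# Made-up rows standing in for a scraped players table.
players = pd.DataFrame(
    {
        "player": ["Player A", "Player B", "Player C", "Player D"],
        "team": ["Team X", "Team Y", "Team X", "Team X"],
    }
)
output_path = os.path.join(tempfile.gettempdir(), "players_per_team.png")
counts = plot_players_per_team(players, output_path)
```

Returning the counts alongside the saved image makes it easy to check the numbers behind the chart.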

Top 5 ODI All-rounders, Batters and Bowlers

All-Rounders:

In ODI cricket, the top 5 all-rounders include Mohammad Nabi, known for his consistent performances with both bat and ball. Shakib Al Hasan, Sikandar Raza, Rashid Khan, and Assad Vala also feature prominently for their match-winning abilities.

Batters:

Among the top batters, Babar Azam continues to reign supreme, followed by Rohit Sharma, Shubman Gill, Virat Kohli, and Harry Tector. These players have been crucial in accumulating runs for their teams.

Bowlers:

On the bowling side, Keshav Maharaj and Josh Hazlewood lead the charge, with Adam Zampa, Kuldeep Yadav, and Bernard Scholtz also dominating the bowling charts.
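Lists like the three above could be derived from a combined rankings table by taking the best-ranked rows in each category. The sketch below assumes hypothetical `role`, `rank`, and `player` columns and uses placeholder names rather than real players.

```python
import pandas as pd


def top_n_by_role(df: pd.DataFrame, n: int = 5) -> dict[str, list[str]]:
    """Map each role to its n best-ranked player names (rank 1 is best)."""
    ordered = df.sort_values(["role", "rank"])
    top = ordered.groupby("role").head(n)
    return {role: group["player"].tolist() for role, group in top.groupby("role")}


# Placeholder data: six made-up players per role, listed worst rank first.
roles = ["All-rounder", "Batter", "Bowler"]
rankings = pd.DataFrame(
    [
        {"role": role, "rank": rank, "player": f"{role} {rank}"}
        for role in roles
        for rank in range(6, 0, -1)
    ]
)
top_five = top_n_by_role(rankings)
```

Sorting before grouping means each list comes back in rank order, regardless of the order in which rows were scraped.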

Similar Visualizations for T20, Test, and Women’s Cricket

Just like we’ve visualized the top rankings for ODIs, the same process will be applied to T20, Test, and Women’s Cricket. Data for these formats will be scraped, cleaned, and visualized to show top players, rankings, and team performances, giving us deeper insights into each format of the game. Stay tuned for more detailed charts and rankings for all formats! 
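One way to reuse the same steps across formats is to loop over the rankings CSV files named in the File Details section. The sketch below loads whichever of those files are present in a directory; the `load_rankings` helper and the format labels used as dictionary keys are illustrative names, not part of the project.

```python
from pathlib import Path

import pandas as pd

# File names as listed under File Details; the format labels are our own.
RANKING_FILES = {
    "ODI": "ICC_ODI_Rankings.csv",
    "T20I": "ICC_T20I_Rankings.csv",
    "Test": "ICC_Test_Rankings.csv",
    "Women's ODI": "ICC_WODI_Rankings.csv",
    "Women's T20I": "ICC_WT20I_Rankings.csv",
}


def load_rankings(data_dir: str = ".") -> dict[str, pd.DataFrame]:
    """Load each rankings CSV found in data_dir, keyed by format label."""
    frames = {}
    for label, filename in RANKING_FILES.items():
        path = Path(data_dir) / filename
        if path.exists():  # skip formats that have not been scraped yet
            frames[label] = pd.read_csv(path)
    return frames
```

Each loaded DataFrame can then be passed through the same cleaning and plotting functions used for the ODI data.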
