
HiAnimez.to Scraper: An Anime Data Extractor API in Python


If you are an anime enthusiast or a developer working on anime-related projects, you know how important it is to gather accurate and up-to-date anime data. Whether you’re building a recommendation system, a review platform, or an anime tracker, data such as titles, ratings, release years, and episodes is crucial for the smooth functioning of your application. However, what if the website you’re interested in, such as HiAnimez.to, doesn’t provide a public API? In that case, scraping the data directly from the website can be your best solution.

What is HiAnimez.to?

HiAnimez.to is an anime streaming website where users can watch anime online for free. In addition to streaming, it provides a wealth of information about each anime, including titles, ratings, release years, and episode listings.

While HiAnimez.to offers a great collection of anime content, it does not have an official public API. Without an API, developers often turn to web scraping to extract the information they need. Scraping allows you to pull the data from the site’s HTML pages and use it in your projects.

Why Should You Scrape Data from HiAnimez.to?

There are several reasons why scraping HiAnimez.to might be a good idea for your project:

  1. No Public API: HiAnimez.to doesn’t provide an official API, making scraping the only practical option for programmatically accessing data.
  2. Custom Data Extraction: Scraping allows you to tailor the data you extract to suit your specific needs, whether that’s just titles and ratings or detailed episode breakdowns.
  3. Up-to-date Data: By scraping the site regularly, you can get the latest anime updates, ensuring your application always has fresh content.
  4. Automation: Once you’ve set up your scraper, you can automate the process of retrieving new anime data, saving you time.

Setting Up the Environment

Before we start coding, let’s set up our Python environment and install the libraries we’ll use for scraping.

Required Libraries:

We’ll need the following Python libraries:

  1. requests – sends HTTP requests and retrieves the raw HTML of each page.
  2. BeautifulSoup (beautifulsoup4) – parses the HTML and extracts the elements we need.
  3. pandas – organizes the scraped data and writes it out as a CSV file.

Installing the Libraries:

Open your terminal or command prompt and run the following commands to install the necessary libraries:

pip install requests beautifulsoup4 pandas

Once the libraries are installed, we’re ready to start coding the scraper.
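To confirm everything installed correctly, you can import each library and print its version before going any further:

```python
import requests
import bs4
import pandas as pd

# Each import succeeding confirms the library is available in your environment
print("requests:", requests.__version__)
print("beautifulsoup4:", bs4.__version__)
print("pandas:", pd.__version__)
```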

Building the Scraper:

Now, let’s dive into the code. We will create a scraper that:

  1. Sends HTTP requests to HiAnimez.to.
  2. Parses the HTML response to extract anime data like titles, ratings, release years, and more.
  3. Handles multiple pages (pagination) if there are many anime listings.
  4. Saves the extracted data into a CSV file for later use.

Step-by-Step Code Explanation:

Here’s the full Python code for the HiAnimez.to Scraper:

import requests
from bs4 import BeautifulSoup
import pandas as pd
import time

# Function to send HTTP request with proper headers (including User-Agent)
def get_page_content(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    }
    
    try:
        # Send GET request
        response = requests.get(url, headers=headers, timeout=10)  # timeout prevents hanging indefinitely
        response.raise_for_status()  # Raise an exception for HTTP errors
        return response.text
    except requests.exceptions.RequestException as e:
        print(f"Error fetching the page: {e}")
        return None

# Function to parse the anime data from a page
def parse_anime_data(page_content):
    soup = BeautifulSoup(page_content, 'html.parser')
    
    # Find all anime entries
    anime_list = soup.find_all('div', class_='anime-card')  # Adjust based on actual structure of the page

    # Ensure data is found
    if not anime_list:
        print("No anime found on this page")
        return None

    anime_data = []
    
    for anime in anime_list:
        try:
            title = anime.find('h3').get_text(strip=True)  # Anime title
            link = anime.find('a')['href']  # Link to the anime's page
            rating = anime.find('span', class_='rating')
            rating = rating.get_text(strip=True) if rating else 'N/A'  # Rating (default to N/A if not found)
            release_year = anime.find('span', class_='release-year')
            release_year = release_year.get_text(strip=True) if release_year else 'N/A'  # Release year (default to N/A)

            # Add the data to the list
            anime_data.append({
                'Title': title,
                'Link': link,
                'Rating': rating,
                'Release Year': release_year
            })
        except AttributeError as e:
            print(f"Error extracting data from an anime entry: {e}")

    return anime_data

# Function to scrape anime data from HiAnimez.to
def fetch_anime_data(base_url, total_pages=1):
    all_anime_data = []
    
    for page_number in range(1, total_pages + 1):
        url = f"{base_url}?page={page_number}"  # Adjust pagination based on the website structure
        print(f"Scraping page {page_number}...")
        
        # Get page content
        page_content = get_page_content(url)
        if page_content is None:
            print(f"Skipping page {page_number} due to error.")
            continue
        
        # Parse anime data from the page content
        anime_data = parse_anime_data(page_content)
        if anime_data:
            all_anime_data.extend(anime_data)

        # Optional: delay to avoid overwhelming the server
        time.sleep(2)  # Sleep for 2 seconds between requests

    return all_anime_data

# Function to save data to a CSV file
def save_to_csv(data, filename='anime_data.csv'):
    df = pd.DataFrame(data)
    df.to_csv(filename, index=False)
    print(f"Saved data to {filename}")

# Main function to execute the scraping process
def main():
    base_url = 'https://hianimez.to/anime-list/'  # Replace with actual URL if different
    total_pages = 5  # Set the number of pages you want to scrape

    anime_data = fetch_anime_data(base_url, total_pages)
    if anime_data:
        # Save the collected data to a CSV file
        save_to_csv(anime_data)
        print(f"Successfully scraped {len(anime_data)} anime entries.")
    else:
        print("No anime data was scraped.")

if __name__ == "__main__":
    main()

Explanation of Each Section

Sending HTTP Requests:

We use the requests library to send HTTP requests to the website. The User-Agent header is included to simulate a real browser, which can prevent the site from blocking our scraper.

response = requests.get(url, headers=headers, timeout=10)

If the request fails (e.g., due to network issues), we handle the error using a try-except block and return None.
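To make the fetch more resilient, you can combine a request timeout with a simple retry loop. Here is a minimal sketch of that idea; the retry count and delay values are arbitrary choices for illustration, not requirements of the site:

```python
import time
import requests

def get_page_content_with_retries(url, retries=3, delay=2, timeout=10):
    """Fetch a URL, retrying a few times on transient network failures."""
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    }
    for attempt in range(1, retries + 1):
        try:
            # timeout stops the request from hanging indefinitely
            response = requests.get(url, headers=headers, timeout=timeout)
            response.raise_for_status()
            return response.text
        except requests.exceptions.RequestException as e:
            print(f"Attempt {attempt} failed: {e}")
            if attempt < retries:
                time.sleep(delay)
    return None
```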

Parsing the HTML Content:

Once we fetch the page, we use BeautifulSoup to parse the HTML content. We then search for specific tags and classes to extract the anime data.

anime_list = soup.find_all('div', class_='anime-card')

In this case, we’re looking for div elements with the class anime-card, which contains each anime’s title, link, rating, and release year.
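To see how this selector behaves, here is a self-contained example run against a small hand-written HTML snippet. Keep in mind that the anime-card class name is an assumption about the page layout, as the code comments note; you should verify the real class names in your browser's developer tools:

```python
from bs4 import BeautifulSoup

# A tiny stand-in for what a listing page might contain
sample_html = """
<div class="anime-card"><h3>Show A</h3><a href="/a">details</a></div>
<div class="anime-card"><h3>Show B</h3><a href="/b">details</a></div>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
cards = soup.find_all('div', class_='anime-card')

print(len(cards))                      # 2
print(cards[0].find('h3').get_text())  # Show A
```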

Handling Missing Data:

We check if the anime data exists (e.g., title, rating, release year) before trying to extract it. If it’s missing, we return a default value like 'N/A' to prevent errors.

rating = rating.get_text(strip=True) if rating else 'N/A'
release_year = release_year.get_text(strip=True) if release_year else 'N/A'
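The pattern generalizes to any optional field: attempt the lookup first, then fall back to a default if the element is absent. For example, given a card with no rating element at all:

```python
from bs4 import BeautifulSoup

# This card deliberately has no <span class="rating"> element
html = '<div class="anime-card"><h3>Show C</h3></div>'
card = BeautifulSoup(html, 'html.parser').find('div', class_='anime-card')

rating = card.find('span', class_='rating')
rating = rating.get_text(strip=True) if rating else 'N/A'
print(rating)  # N/A -- the span is absent, so the default is used
```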

Pagination Handling:

We support pagination by iterating through multiple pages of anime listings. For each page, we build the URL with the page number and scrape the data. The time.sleep(2) delay ensures we don’t overwhelm the server.

url = f"{base_url}?page={page_number}"
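As a quick illustration, the URLs generated for the first three pages would look like this. Note that the ?page= query parameter is an assumption about how the site paginates; check the real URL pattern in your browser before relying on it:

```python
base_url = 'https://hianimez.to/anime-list/'

# Build the URL for each page number
urls = [f"{base_url}?page={n}" for n in range(1, 4)]
for u in urls:
    print(u)
# https://hianimez.to/anime-list/?page=1
# https://hianimez.to/anime-list/?page=2
# https://hianimez.to/anime-list/?page=3
```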

Saving Data:

Finally, we store the scraped data in a pandas DataFrame and save it as a CSV file.

df.to_csv(filename, index=False)

This makes it easy to analyze the data later or integrate it into other projects.
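A quick round-trip check confirms the CSV can be read back later, using a couple of made-up rows in the same shape the scraper produces:

```python
import pandas as pd

# Sample rows matching the scraper's output format
data = [
    {'Title': 'Show A', 'Link': '/a', 'Rating': '8.1', 'Release Year': '2020'},
    {'Title': 'Show B', 'Link': '/b', 'Rating': 'N/A', 'Release Year': '2021'},
]

df = pd.DataFrame(data)
df.to_csv('anime_data.csv', index=False)

# Reading it back reproduces the same rows
df2 = pd.read_csv('anime_data.csv')
print(len(df2))               # 2
print(df2['Title'].tolist())  # ['Show A', 'Show B']
```

One subtlety worth knowing: pandas parses the literal string 'N/A' back as NaN when reading a CSV; pass keep_default_na=False to read_csv if you want to preserve it as text.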

Running the Scraper

To run the script, simply call the main() function. It will scrape anime data from the specified number of pages and save it to anime_data.csv.

python hianimez_scraper.py

Conclusion

In this tutorial, we walked through creating a HiAnimez.to Scraper in Python to extract valuable anime data from the website. This scraper can be adapted for different sites and different data types, making it a powerful tool for any anime-related project.
