Scraping Site Data with Jsoup (in Kotlin)

31.10.2023 — Scraping, Jsoup, Web Scraping, Kotlin — 1 min read

Scraping lets you extract information from web pages for your own use. In this article, we will use the Jsoup library in Kotlin to download a page, select elements from its HTML, and read their text. Whether you are an Android developer or just curious about web scraping with Kotlin, the examples below should give you a working starting point.

Setting Up Jsoup in an Android Project

Before we dive into the code, let's set up Jsoup in our Android project:

Open your project in Android Studio.
In your app-level build.gradle file, add the following dependency within the dependencies block:

implementation 'org.jsoup:jsoup:x.x.x'

Replace x.x.x with the Jsoup version you want to use, then sync your Gradle files to fetch the library.
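The dependency line above uses Groovy syntax. If your module uses the Kotlin Gradle DSL instead (a build.gradle.kts file, which is an assumption about your project), the same dependency would look roughly like this, with x.x.x still standing in for the version:

```kotlin
// build.gradle.kts of the app module (assumed file name for a Kotlin DSL project)
dependencies {
    // x.x.x is a placeholder: substitute the Jsoup release you intend to use.
    implementation("org.jsoup:jsoup:x.x.x")
}
```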

With Jsoup successfully integrated into your project, we can now proceed to perform web scraping operations.

Example: Scraping HTML Content

Let's start by scraping the HTML content of a web page. Suppose we want to extract the titles of articles from a news website. Here's how you can achieve it using Jsoup in Kotlin:

import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import java.io.IOException

fun main() {
    val url = "https://example.com/news"

    try {
        // Download the page and parse it into a Document.
        val document: Document = Jsoup.connect(url).get()
        // CSS selector: every <h2> element with the class "article-title".
        val articleTitles = document.select("h2.article-title")

        for (title in articleTitles) {
            println(title.text())
        }
    } catch (e: IOException) {
        // Network failures and HTTP error statuses surface as IOException.
        e.printStackTrace()
    }
}

In the snippet above, Jsoup.connect(url).get() downloads the page at the given URL and parses it into a Document. We then call document.select("h2.article-title") to select every <h2> element with the class "article-title", which in this example marks the article titles. Finally, we loop over the selected elements and print the text of each one.
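To try a selector without touching the network, you can parse an HTML string directly with Jsoup.parse. The sketch below uses made-up markup (the sampleHtml value and the titlesFrom helper are hypothetical names for illustration) to show which elements "h2.article-title" matches:

```kotlin
import org.jsoup.Jsoup

// Hypothetical markup standing in for a real news page.
val sampleHtml = """
    <html><body>
      <h2 class="article-title">First headline</h2>
      <h2 class="other">Not an article title</h2>
      <h2 class="article-title">Second headline</h2>
    </body></html>
""".trimIndent()

// Parse the HTML string and return the text of every matching <h2>.
fun titlesFrom(html: String): List<String> {
    val document = Jsoup.parse(html)
    return document.select("h2.article-title").map { it.text() }
}

fun main() {
    println(titlesFrom(sampleHtml)) // prints [First headline, Second headline]
}
```

The <h2> with the class "other" is skipped because the selector requires both the tag name and the class to match.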

Remember to replace the value of url with the address of the site you want to scrape, and adjust the selector to match that site's markup.
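Since the URL and the selector change from site to site, it can help to pass them in as parameters. The following is a minimal sketch, not the only way to structure it: extractTexts and fetchTexts are hypothetical helper names, and the user agent string and the 10-second timeout are illustrative values rather than recommendations.

```kotlin
import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import java.io.IOException

// Pure step: return the text of every element matching the selector.
fun extractTexts(document: Document, selector: String): List<String> =
    document.select(selector).map { it.text() }

// Network step: download the page, then reuse extractTexts.
fun fetchTexts(url: String, selector: String): List<String> =
    try {
        val document = Jsoup.connect(url)
            .userAgent("Mozilla/5.0 (compatible; ExampleScraper/1.0)") // illustrative value
            .timeout(10_000) // milliseconds, illustrative value
            .get()
        extractTexts(document, selector)
    } catch (e: IOException) {
        // Connection problems, timeouts, and HTTP error statuses end up here.
        emptyList()
    }

fun main() {
    fetchTexts("https://example.com/news", "h2.article-title").forEach(::println)
}
```

Keeping the extraction separate from the download makes it possible to check the selector logic against a parsed string. If you call fetchTexts from an Android app rather than from main(), run it off the main thread (Android throws NetworkOnMainThreadException otherwise) and declare the INTERNET permission in the manifest.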

Example: Extracting Data from Tables

Web scraping often involves extracting data from structured elements like tables. Let's consider an example where we want to scrape a table containing stock prices from a financial website. Here's how you can achieve it using Jsoup in Kotlin:

import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import java.io.IOException

fun main() {
    val url = "https://example.com/stocks"

    try {
        val document: Document = Jsoup.connect(url).get()
        // Select every <table> element with the class "stock-table".
        val table = document.select("table.stock-table")

        // Walk the rows, then the <td> cells inside each row.
        val rows = table.select("tr")
        for (row in rows) {
            val columns = row.select("td")
            for (column in columns) {
                println(column.text())
            }
        }
    } catch (e: IOException) {
        // Network failures and HTTP error statuses surface as IOException.
        e.printStackTrace()
    }
}

In this example, we select the table with the class "stock-table" and iterate over its rows. For each row, row.select("td") selects the <td> cells, and column.text() returns the text of each cell. Header cells marked up as <th> are not matched by this selector, so a header row prints nothing.
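Printing one cell per line loses track of which cells belong to the same row. As a sketch of one alternative, the code below collects each row into its own list. The markup, the symbols, and the prices are made-up sample values, and tableRows is a hypothetical helper name.

```kotlin
import org.jsoup.Jsoup

// Hypothetical table markup; the symbols and prices are made-up sample values.
val stockHtml = """
    <table class="stock-table">
      <tr><th>Symbol</th><th>Price</th></tr>
      <tr><td>AAA</td><td>10.50</td></tr>
      <tr><td>BBB</td><td>7.25</td></tr>
    </table>
""".trimIndent()

// Return one list of cell texts per data row.
fun tableRows(html: String): List<List<String>> =
    Jsoup.parse(html)
        .select("table.stock-table tr")
        .map { row -> row.select("td").map { it.text() } }
        .filter { it.isNotEmpty() } // the header row has only <th> cells, so it yields no <td>

fun main() {
    tableRows(stockHtml).forEach { println(it.joinToString(" | ")) }
    // prints:
    // AAA | 10.50
    // BBB | 7.25
}
```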

Feel free to modify the code according to your specific requirements and the structure of the table you want to scrape.
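One such modification is mapping each row to a typed object. The sketch below assumes a table whose first cell is a ticker symbol and whose second cell is a price. That layout, the StockQuote class, and the parseQuotes function are hypothetical and only serve as an illustration.

```kotlin
import org.jsoup.Jsoup

// Hypothetical row shape: first cell is a ticker symbol, second cell is a price.
data class StockQuote(val symbol: String, val price: Double)

fun parseQuotes(html: String): List<StockQuote> =
    Jsoup.parse(html)
        .select("table.stock-table tr")
        .mapNotNull { row ->
            val cells = row.select("td")
            // Skip header rows, short rows, and rows whose price is not a plain number.
            val price = cells.getOrNull(1)?.text()?.toDoubleOrNull()
                ?: return@mapNotNull null
            StockQuote(cells[0].text(), price)
        }

fun main() {
    val html = """
        <table class="stock-table">
          <tr><th>Symbol</th><th>Price</th></tr>
          <tr><td>AAA</td><td>10.50</td></tr>
          <tr><td>BBB</td><td>n/a</td></tr>
        </table>
    """.trimIndent()
    println(parseQuotes(html)) // prints [StockQuote(symbol=AAA, price=10.5)]
}
```

Rows that do not fit the expected shape are dropped rather than causing an exception, which is a design choice for this sketch. Depending on your use case, you may prefer to log or report them instead.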

In Conclusion

Scraping site data with Jsoup in Kotlin opens up a world of possibilities for extracting information from websites. Whether you need to gather data for research, build a content aggregator, or automate processes, Jsoup provides a robust set of tools to make web scraping tasks easier!