Date: 2023-11-03

Knowing where to look for data – and how to access it – should be a priority for investigative journalists. Effective use of data can not only improve the overall quality of an investigation but also increase its public service value.


Over the past 20 years, the amount of data available has grown at an unprecedented rate. According to the International Data Corporation (IDC), by 2025 the collective sum of the world’s data will reach 175 zettabytes (a zettabyte is one trillion gigabytes; as IDC puts it, if one could store the 2025 datasphere on DVDs, the resulting line of DVDs would circle the Earth 222 times).

Some estimates claim that Google, Facebook, Microsoft, and Amazon alone store at least 1,200 petabytes of data between them (one petabyte = one million gigabytes). Investigative and data journalists are using more quantitative, qualitative, and categorical data than ever before – but getting good data is still a challenge.

Accessing, or finding, structured data (defined as data in a clearly defined, standardized format ready for analysis) is still difficult, no matter the region. So is dealing with bad or incomplete data: false data, dirty, faulty, or “rogue” data, duplicate data, scattered data, and ambiguous data. Part of the solution to this problem is to increase data literacy: we need to understand how data is collected, cleaned, verified, analyzed, and visualized, because it is an interrelated process. For journalists, data literacy is important.

In data journalism, as in any type of journalistic practice, we look for ways to access all types of data, such as from leaks, from thousands of PDF files, or from listings published on websites – organized or not. Some of these are easy to access, while others require technical tools, which takes time.

However, there are tools and methods that make it both enjoyable and simple – like scraping data from websites. Scraping in this manner means using computer programs or software to extract or copy specific data from websites. This process can be used to collect or analyze data, and it is faster and more efficient than obtaining data manually.
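To make the idea concrete, here is a minimal sketch of what a scraping program does under the hood, written in Python with only the standard library. The HTML fragment, the ministry names, and the figures are invented for illustration; a real scraper would first download the page, and real pages are usually messier than this.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collects the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical page fragment, standing in for HTML downloaded from a website.
SAMPLE_HTML = """
<table>
  <tr><th>Ministry</th><th>Budget</th></tr>
  <tr><td>Health</td><td>1200</td></tr>
  <tr><td>Education</td><td>950</td></tr>
</table>
"""


def scrape_table(html):
    """Return the table in the given HTML as a list of rows (lists of strings)."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = scrape_table(SAMPLE_HTML)
for row in rows:
    print(row)
```

The program walks through the page markup and keeps only the table cells, which is essentially what a scraping tool automates at scale.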

The benefits of data scraping for journalists include:

Speed and range: Data scraping allows journalists to gather information quickly and efficiently. Pulling data from a variety of sources on the Internet gives you a broader perspective, and helps you base your stories on a more solid foundation.

Verification: Data scraping can help journalists in the verification process. You can check information on the web and compare data to find contradictions, which helps verify information and increase its credibility.

Highlighting trends: Data scraping can be used to uncover patterns related to a particular topic or phenomenon. For example, by analyzing large datasets, you can understand trends in social media or public opinion and integrate this information into your news.

Data visualization: Visualizing data collected by data scraping helps journalists present their stories more effectively. By using graphs, charts, and interactive visuals, you can make data more understandable and give readers a better understanding of the topic.

Enabling deeper checks: Data scraping allows journalists to conduct more in-depth research. By analyzing large datasets, for example, financial data, you can gain a deeper understanding of company operations or government policies.

Increasing value of news: Data scraping can produce newsworthy stories. Statistics, trends, demographics, or other data can make your stories more engaging and compelling.
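As an illustration of the verification point, the sketch below compares the same figures scraped from two sources and flags the ones that disagree. The source names and numbers are hypothetical, and the function is only one simple way to do such a comparison.

```python
def find_contradictions(source_a, source_b, tolerance=0.0):
    """Return the keys present in both sources whose values differ by more than tolerance."""
    mismatches = {}
    for key in sorted(source_a.keys() & source_b.keys()):
        a, b = source_a[key], source_b[key]
        if abs(a - b) > tolerance:
            mismatches[key] = (a, b)
    return mismatches


# Hypothetical figures scraped from two different websites.
ministry_site = {"Health": 1200, "Education": 950, "Transport": 400}
budget_portal = {"Health": 1200, "Education": 900, "Defence": 700}

# Entries reported by both sources but with different values.
contradictions = find_contradictions(ministry_site, budget_portal)

# Entries that appear in only one of the two sources.
missing = sorted(ministry_site.keys() ^ budget_portal.keys())

print(contradictions)
print(missing)
```

A mismatch is not proof of an error; it is a lead that tells you which figures to check with the original publisher.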

Data Miner is a free data extraction tool and browser extension that enables users to scrape web pages and quickly collect secure data. It automatically collects data from web pages and saves it in Excel, CSV, or JSON formats.

However, keep in mind that collecting large amounts of data from websites may be a violation of their terms of use or the law. It is important to read the website’s terms of use carefully and act in accordance with all legal rules and regulations before using a browser extension or plugin. You should also review the terms of service of the extension you’re using.
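One machine-readable signal worth checking alongside the written terms is a site's robots.txt file. The sketch below uses Python's standard `urllib.robotparser` module on an invented robots.txt; the user agent name `newsroom-bot` and the URLs are assumptions for illustration. Note that robots.txt is not a substitute for reading the terms of use or seeking legal advice.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one lives at https://<site>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "newsroom-bot" is a made-up user agent name for this example.
allowed = parser.can_fetch("newsroom-bot", "https://example.org/reports/2023.html")
blocked = parser.can_fetch("newsroom-bot", "https://example.org/private/list.html")
delay = parser.crawl_delay("newsroom-bot")

print(allowed, blocked, delay)
```

For a live site, `set_url()` followed by `read()` would load the real file instead of the sample text.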

How Journalists Can Use Data Miner

Here are the steps to scrape a website with the Data Miner browser extension.

1. Install the Data Miner add-on in your browser. Add-ons are generally available for browsers like Chrome or Firefox. Find and install the Data Miner add-on from your browser’s add-on store.

2. Open the target website. Open the website from which you want to scrape data in your browser, and launch the extension – in other words, find Data Miner in the Extensions/Plugins menu in your browser and open it. The extension is usually located in the upper right corner of your browser.

3. Create a new task/recipe for web scraping. The Data Miner extension has a “My Recipes” option. Click this option to create a new web scraping job. You will be presented with a command screen to continue the mining process.

4. Set options for scraping a website: Data Miner has various options and settings for scraping a website. For example, you can specify what data you want to scrape, and you can set up automated actions, such as page navigation or form filling.

5. Start scraping the website. Once you finalize the settings, you can start data scraping by clicking the “Scrape” button in the Data Miner extension dashboard. The extension will crawl the website and collect the data you specify. (You can also watch the process in this short video.)

6. Save or export data. You can usually save your scraped data as a CSV file or Excel spreadsheet. You can also copy the scraped data using the Clipboard feature, which is convenient and saves time. If your scraped data exceeds 10,000 rows, it will be downloaded as two separate files.
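If you later process exported data yourself, the same idea of splitting a large result into several files is easy to reproduce. The sketch below is an illustration only and is not Data Miner's own implementation; the function names, the sample records, and the 10,000-row limit used as a default are assumptions taken from the description above.

```python
import csv
import io


def split_rows(rows, max_rows=10_000):
    """Split a list of rows into consecutive chunks of at most max_rows each."""
    return [rows[i:i + max_rows] for i in range(0, len(rows), max_rows)]


def to_csv_text(header, rows):
    """Render a header and rows as CSV text (kept in memory for this example)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical scraped records: 12,500 rows, more than fit in one chunk.
header = ["id", "name"]
rows = [[i, f"record {i}"] for i in range(12_500)]

chunks = split_rows(rows)
files = [to_csv_text(header, chunk) for chunk in chunks]

print(len(files), [len(chunk) for chunk in chunks])
```

Each chunk repeats the header row, so every file can be opened and analyzed on its own.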

By following these steps, you can scrape one or multiple websites with Data Miner, and you can run any of over 60,000 data scraping rules, or create your own to extract only the necessary data from a web page. You can also build a customized scraping method, since both single-page and multi-page automated scraping are supported.

You can automate scraping and run batches of scraping jobs based on a list of website URLs. Plus, you can use 50,000 free, pre-built queries for over 15,000 popular websites. You can also crawl URLs, paginate them, and scrape a page from a single location – no coding required.
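For readers curious about what batch scraping with pagination involves, the sketch below builds the list of page URLs a batch job would visit. It does not fetch anything; the base URLs, the `page` query parameter, and the function names are hypothetical, and real sites may paginate differently.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def paginated_urls(base_url, pages, param="page"):
    """Return base_url with the page parameter set to 1..pages, keeping other parameters."""
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))
    urls = []
    for page in range(1, pages + 1):
        query[param] = str(page)
        urls.append(urlunsplit(parts._replace(query=urlencode(query))))
    return urls


def build_job_list(base_urls, pages_per_site):
    """Expand a list of starting URLs into every page URL a batch job would visit."""
    jobs = []
    for base in base_urls:
        jobs.extend(paginated_urls(base, pages_per_site))
    return jobs


# Hypothetical starting points for a batch job.
start_urls = [
    "https://example.org/tenders?year=2023",
    "https://example.org/contracts",
]
jobs = build_job_list(start_urls, pages_per_site=2)

for url in jobs:
    print(url)
```

A real batch job would then request each URL in turn, pausing between requests to avoid overloading the site.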

Using extensions also has the following advantages.

