The Parseur extension for Google Chrome lets you scrape web pages and extract data from them. Use this Chrome extension to send a web page to Parseur automatically. Parseur then parses the content of the page and extracts the data you need. What’s more, with a little trick you can go one step further and automate the crawling and scraping of web pages.
First you need to install the Google Chrome Web Browser. If you haven’t already, here is the Chrome installation page.
Step 1: Install the extension
Go to the Chrome Web Store and search for “Parseur”, or click here to directly access the Parseur extension page.
Then click “Add to Chrome”:
You should then see the extension in the top right corner of your browser:
Step 2: Use the extension to send web pages to your Parseur mailbox
First, create your Parseur account (if you don’t have one already): Enter your email, password and name here.
Then, follow the instructions to create your first mailbox.
Once created, copy the mailbox address by clicking the “copy to clipboard” button, as shown below:
Then, go to a webpage that you’d like to parse and click the Parseur extension icon:
Then paste (or type) your mailbox address in the mailbox field:
You may have to remove the @in.parseur.com part from your mailbox name. Don’t worry, you’ll only have to type this in once.
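If you keep mailbox addresses in a script or spreadsheet, trimming the domain can be sketched as below. This is only an illustration: the function name `mailbox_name` is hypothetical and not part of Parseur or the extension, and the only assumption taken from the text above is that the suffix to drop is `@in.parseur.com`.

```python
PARSEUR_DOMAIN = "@in.parseur.com"


def mailbox_name(address: str) -> str:
    """Return the mailbox name without the @in.parseur.com suffix, if present."""
    address = address.strip()
    # Compare case-insensitively, but slice the original string to keep its casing.
    if address.lower().endswith(PARSEUR_DOMAIN):
        return address[: -len(PARSEUR_DOMAIN)]
    return address
```

Calling `mailbox_name("my.mailbox@in.parseur.com")` gives `my.mailbox`, the value to paste into the mailbox field; an address without the suffix is returned unchanged.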
Then click send!
Your web page will show up as a new document in your mailbox!
Note: unlike with emails, we can’t always display web pages exactly as they show up in your browser. The page may look bare, but all the data from the original web page should still be there.
Step 3: Scrape web pages and extract data
You can now create a template for this web page and extract interesting data from it, including tables! Parseur will guide you through the creation of your first template. If you need more information on how to create your template, follow our Getting Started Guide.
Step 4 (optional): Automate web crawling and scrape more pages
Sometimes the data you need is spread across several pages. That can be the case, for instance, with a large paginated table. You can then move from one page to the next automatically using “Linked Document” fields.
A “Linked Document” field is a field that captures a URL and then automatically downloads that URL as a new document in Parseur. Follow this link to learn how to set up a Linked Document field.
If the downloaded document is in the same format as the original one, data will be automatically extracted using the template you created in step 3. If the document is in a different format, Parseur will ask you to create a new template.
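To picture what following a paginated table means, here is a minimal Python sketch of the general idea: read a page, capture the URL of the next page, download it, and repeat. It is not Parseur's implementation. The names `NextLinkFinder`, `find_next_url` and `crawl` are hypothetical, and it assumes the site marks its pagination link as `<a rel="next">`, which many sites do not.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Records the href of the first <a rel="next"> link found in a page."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").split()
        if tag == "a" and "next" in rel_values and self.next_href is None:
            self.next_href = attributes.get("href")


def find_next_url(page_url, html):
    """Return the absolute URL of the next page, or None if there is none."""
    finder = NextLinkFinder()
    finder.feed(html)
    if finder.next_href is None:
        return None
    # Pagination links are often relative, so resolve them against the page URL.
    return urljoin(page_url, finder.next_href)


def crawl(start_url, fetch, max_pages=50):
    """Follow "next" links from start_url and return the URLs visited, in order.

    fetch is any callable that takes a URL and returns that page's HTML.
    max_pages and the visited check guard against endless or looping pagination.
    """
    visited = []
    url = start_url
    while url is not None and url not in visited and len(visited) < max_pages:
        visited.append(url)
        url = find_next_url(url, fetch(url))
    return visited
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library: you can try it with a dictionary of sample pages (`crawl(start, pages.__getitem__)`) before pointing it at a real site. In Parseur itself, the Linked Document field plays the role of `find_next_url`, and each downloaded page becomes a new document.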