Web Scraper Firefox Extension
April 17, 2019
Since the Firefox team has implemented the same browser extension API that Chrome has, we decided that we could also publish Web Scraper on Firefox. Without much effort we got Web Scraper running. You can install the Firefox add-on from here: https://addons.mozilla.org/en-US/firefox/addon/web-scraper/.

Ever since Firefox got its huge “Quantum” update back in 2017, Mozilla’s popular browser has been on a resurgence. It’s faster than ever, and the back-end overhaul meant that extension developers had to redesign their add-ons to work with Firefox Quantum.

To switch the user agent in Firefox you can use the User Agent Switcher extension. For Chrome there is currently no extension, but you can set the user agent from the command line at startup: chromium-browser --user-agent="my custom user agent". For Internet Explorer you can use the UAPick extension. And for Python scripts you can set the proxy header with.

Web Scraper is a Chrome browser extension and a library built for data extraction from web pages. Using this extension you can create a plan (sitemap) that describes how a website should be traversed and what should be extracted. Using these sitemaps, Web Scraper will navigate the site accordingly and extract all data.
We are happy to announce that Web Scraper 0.4.0 has been released. This release contains a new selector, updates to other selectors and an improved CSS selector generator. Starting from version 0.4.0, Web Scraper is also available for Firefox.
Sitemap.xml link selector
Many websites want to be crawled by scrapers. For example, news outlets want their articles to appear in search engine results. For this to happen, a search engine has to crawl the entire site. The site can make the crawler's job more efficient by listing all of the relevant URLs in a sitemap.xml file, which also helps ensure that everything within the site gets indexed.
With the Sitemap.xml Link selector you can leverage this feature to access all of the relevant URLs in a site without having to build a path through the site using Link selectors for navigation and pagination. With a single selector you can access every product page in an e-commerce site. It is always worth checking whether the site has sitemap.xml files before creating other selectors, as using this method can speed up the scraper configuration significantly.
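To illustrate what a sitemap.xml file offers a scraper, here is a minimal Python sketch that lists the URLs in one. It is not how Web Scraper itself reads the file. The function name urls_from_sitemap and the example.com URLs are made up for illustration; the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> URL found in a sitemap (or sitemap index) document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sitemap content, for illustration only.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/product/1</loc></url>
  <url><loc>https://example.com/product/2</loc></url>
</urlset>"""

for url in urls_from_sitemap(EXAMPLE_SITEMAP):
    print(url)
```

Each URL returned this way corresponds to a page that child selectors could extract data from.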
When using the Sitemap.xml Link selector, use the Add from robots.txt button to automatically discover sitemap.xml links. If no links are discovered, you can manually check whether an example.com/sitemap.xml page exists. Add child selectors under the Sitemap.xml Link selector to extract data from the URLs that the sitemap.xml file leads to.
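The discovery step can be approximated outside the extension with Python's standard library. This is only a sketch of the idea, not the extension's own code: the helper name candidate_sitemaps and the sample robots.txt content are assumptions, while RobotFileParser.site_maps() is a real standard-library method (Python 3.8+).

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def candidate_sitemaps(robots_txt: str, base_url: str) -> list[str]:
    """Return sitemap URLs declared in robots.txt, else the conventional location."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    declared = parser.site_maps()  # None when no "Sitemap:" lines exist
    if declared:
        return declared
    # Nothing declared: fall back to the location worth checking manually.
    return [urljoin(base_url, "/sitemap.xml")]


# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS = """User-agent: *
Disallow: /cart
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-products.xml
"""

print(candidate_sitemaps(EXAMPLE_ROBOTS, "https://example.com"))
print(candidate_sitemaps("User-agent: *\nDisallow:", "https://example.com"))
```

The fallback URL is only a candidate; whether the page actually exists still has to be checked.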
Element click selector
With this release it is now possible to add an Element Click Selector under another Element Click Selector. With this feature you can go through multiple product color/size variations within a single product page to get the SKU and the price for every variation.
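Conceptually, nesting one Element Click selector under another behaves like a nested loop over the variations. The Python sketch below only models that traversal; it does not drive a browser and is not Web Scraper's implementation. The colors, sizes, SKUs and prices are hypothetical.

```python
# Hypothetical variation data for a single product page.
COLORS = ["red", "blue"]
SIZES = ["S", "M"]
CATALOG = {
    ("red", "S"): ("SKU-RED-S", 10.0),
    ("red", "M"): ("SKU-RED-M", 11.0),
    ("blue", "S"): ("SKU-BLUE-S", 10.5),
    ("blue", "M"): ("SKU-BLUE-M", 11.5),
}


def scrape_variations(colors, sizes, catalog):
    """Collect one record per color/size combination."""
    rows = []
    for color in colors:      # plays the role of the outer click selector
        for size in sizes:    # plays the role of the nested click selector
            sku, price = catalog[(color, size)]
            rows.append({"color": color, "size": size, "sku": sku, "price": price})
    return rows


for row in scrape_variations(COLORS, SIZES, CATALOG):
    print(row)
```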
You can also now use the Element Click selector to click through options within a <select> element.
Element scroll down selector
Element scroll down selector now scrolls down with a smooth animation. It will additionally try a few tricks to trigger the data load event within the website. Generally the Element scroll down selector isn't as reliable as Link selectors but with this update it should also work in some additional edge cases.
Firefox
I'll start by saying a big thanks to the Firefox team. They have done a lot of work to bring the Web Extensions API into their browser. The most painful part of this was probably that they had to remove their previous add-on API, along with all of the add-ons that developers had been building for years. Despite this, it was a good choice. The Web Extensions API is compatible with other browsers and removes the overhead of developing the same solution for different platforms.
You can download the Firefox version of Web Scraper here. If the Firefox version isn't behaving as expected, please let us know by posting a bug report on the Web Scraper Forum.
CSS Selector generator
When you select an element within a page, Web Scraper generates a CSS selector. In this release we made some improvements to the CSS selector generator. When generating a CSS selector, the generator will now also try to use element attributes and their values. It will also generate better CSS selectors for description lists by using the :contains() selector. We made some additional tweaks to reduce the use of the order-based selector :nth-of-type(), which frequently doesn't work well across multiple pages.
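The preference for attribute-based selectors over order-based ones can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Web Scraper's actual generator: the function build_selector and the list of attributes it treats as stable are hypothetical.

```python
def build_selector(tag, attrs, position, stable_attrs=("id", "name", "itemprop")):
    """Prefer an attribute selector; fall back to :nth-of-type() as a last resort."""
    for name in stable_attrs:
        value = attrs.get(name)
        if value:
            # Escape backslashes and double quotes inside the quoted value.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            return f'{tag}[{name}="{escaped}"]'
    # No usable attribute: the order-based selector is all that is left.
    return f"{tag}:nth-of-type({position})"


print(build_selector("span", {"itemprop": "price"}, 3))
print(build_selector("li", {}, 2))
```

An attribute such as itemprop="price" tends to stay the same from page to page, while an element's position among its siblings often changes, which is why the order-based form is kept as the fallback here.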