FAW BOT is a crawler that allows you to search for all the URLs of the pages linked to the home page from which the search is started. The most interesting feature of this tool is the ability to search for pages and information in websites protected by credentials (example: Facebook, Linkedin, etc …).

The FAW BOT toolbar is visible below (Fig. 1)


Fig. 1

The following buttons are present (Fig. 2):


Fig. 2

(1) Navigation
Allows you to browse like a normal browser

(2) Search
The URL search begins

(3) Save
Save the list of URLs found in an xml file that can be opened with FAW MULTI

(4) Go to FAW
Export the list of URLs found directly in the FAW MULTI tool for automatic acquisition

(5) Get Height
Try to get the total height of the web page displayed. The value obtained is shown in the textbox next to the button.

 

In addition to these five buttons, the toolbar also contains the following options:

Sub Domains
Choose “yes” if you want the search to also include sub domains, otherwise leave “no”.

Max Level
Allows you to choose the depth (levels) in which the search is to be performed.

Acquisition page height
If “auto” is selected, the tool tries to find the total height of the page displayed (be careful, using this feature could slow down the search a lot).

Filter
It is a text box that can be used to filter results using regular expressions. For example, if you want only the pages that contain the word “business” just write in the Filter: business box; if you want only the pages that contain any e-mail address just write in the box Filter: ^[a-zA-Z0-9.!#$%&’*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$

 

The correct steps, therefore, to use the search with the FAW BOT tool are the following:

  1. In Navigation mode, go to the page from which the search is to start.
  2. Set your search options: sub domains, levels, page height and filter.
  3. Press the [Search] button
  4. At the end of the search click on the [Save] button, the search results will be saved in the Case folder with the name ResultsBOT001.xml, in case of multiple searches the files will be called ResultsBOT002.xml, ResultsBOT003.xml… etc.
  5. At this point, if you want to start acquiring websites, you must click the [Go to FAW] button, in this way FAW BOT closes and the FAW MULTI tool opens for the automatic acquisition of the pages found.

 

Structure of the ResultsBOT001.xml file

An example of the content of the xml file with the search results is shown below:

<?xml version="1.0" encoding="UTF-8"?>
<CrawlerResults>
<CrawlerResult>
<Height>0</Height>
<Url>https://www.testurl.com/</Url>
<TimeSecondsFrom>0</TimeSecondsFrom>
<TimeSecondsTo>0</TimeSecondsTo>
</CrawlerResult>
<CrawlerResult>
<Height>1550</Height>
<Url>https://www.testurl.com/contacts/</Url>
<TimeSecondsFrom>10</TimeSecondsFrom>
<TimeSeconsTo>10</TimeSecondsTo>
</CrawlerResult>
<CrawlerResult>
<Height>9000</Height>
<Url>https://www.testurl.com/feed/</Url>
<TimeSecondsFrom>30</TimeSecondsFrom>
<TimeSecondsTo>30</TimeSecondsTo>
</CrawlerResult>
<CrawlerResult>
<Height>748</Height>
<Url>https://www.testurl.com/comments/video/a1s4frk/</Url>
<TimeSecondsFrom>15</TimeSecondsFrom>
<TimeSecondsTo>220</TimeSecondsTo>
</CrawlerResult>
</CrawlerResults>