Connecting to Web Data in Microsoft Excel

Connecting to Web Data in Microsoft Excel

SaveToDB Add-in for Microsoft Excel allows connecting to web data sources.

The following web data sources are supported:

  • Web services including OData services
  • Web pages

SaveToDB supports the following web data formats: HTML, XML, JSON, and CSV.

SaveToDB supports the following authorization methods: Windows, Basic, Forms, OAuth 1.0, and OAuth 2.0.

SaveToDB supports the following authorization providers: Google, Facebook, LinkedIn, Twitter, Microsoft Azure Marketplace, Windows Live, Yahoo, and Yahoo API Key.

SaveToDB Standard and Enterprise allows saving data changes to OData web services.

Selecting Provider

Use Gartle Web Data Provider to connect web data.

Connecting Excel to Web - Selecting provider

Connecting to Web Data Source

This step differs from the same step for database connections.

Connecting Excel to Web - Specifying URL

You have to specify the following data:

  • URL
  • Service URL
  • OAuth Provider
  • Scope
  • Accept

When you change the URL, the add-in tries to discover the required parameters in the background.
If the parameters cannot be discovered by the add-in, or other values should be used, you can specify them manually.

Any time you can test the connection with the current connection data.

The Next button is available for successfully connected OData web services as a web service object is selected in the next step.
For other web data sources, the Finish button is used as the specified URL data is inserted in Microsoft Excel.

URL

In general, you can insert the URL from a browser.

SaveToDB also allows redefining HTTP query parameters. See below.

You can also add the pages and rootPath parameters to the end of the URL. See below.

Service URL

The service URLs are defined unambiguously for OData web services only. The service URL is the service root URL that returns the service document.

The service URL for other web services and websites is a URL of any protected area.

A website can contain multiple protected areas. The root URLs of these areas can be used as service URLs.

Such areas cannot be found in all cases. So, sometimes you can need to specify the service URL manually.

Don't worry if the service URL is not defined correctly. In this case, you can be asked for authorization for multiple URLs from the website.

OAuth Provider

SaveToDB supports the following authorization methods:

  1. Windows
  2. Basic
  3. Forms
  4. OAuth 1.0
  5. OAuth 2.0

The first three methods are discovered automatically. The OAuth providers are discovered by website URLs.

You can manually set or clear the OAuth provider.

Scope

The scope is used for OAuth 2.0 providers only. SaveToDB discovers the known scopes for known OAuth providers.
However, the scope requirements are very different for various providers and websites. So, you can specify the required scope manually.

Accept

You can the Accept header by default or tune it if an error is occurred.

Redefining HTTP Query Parameters

The SaveToDB add-in detects parameters of HTTP queries automatically.

The parameters can be redefined in the form {<Parameter name>=<Default value>}.

For example, the URL https://finance.yahoo.com/q/hp?s=GOOG+Historical+Prices

has the "s" parameter with the "+Historical+Prices" suffix to the stock symbol.

You can configure the URL like https://finance.yahoo.com/q/hp?s={Symbol=GOOG}+Historical+Prices

In this case, the Symbol parameter is used to specify the stock symbol only.

The parameters are placed on the ribbon and allow changing the query further.

Pages and RootPath Parameters

;Pages=<number of pages>

The option specifies the number of loaded pages for the specified URL.

SaveToDB tries to find the next page URL and to load the next pages.

For example, you can load option data for multiple expiration dates from Yahoo! Finance:

https://finance.yahoo.com/q/op?s={Symbol=AAPL};pages=20

;RootPath=<root path>

The option specifies the root element of columns to output.

For XML and JSON, SaveToDB parsers look for the first element with the full column name ended with the specified value.

The full column name consists of all column names in the path separated by dots.

For JSON documents, you can specify multiple roots separated by comma or semicolon.

For HTML, you can specify a table number to output, not column name.
To find the right value, just try different values: ;rootPath=1, ;rootPath=2, and so on.

For example, you can load option data for calls and puts from Google Finance:

http://www.google.com/finance/option_chain?q={Symbol=AAPL}&authuser=0&output=json;RootPath=calls,puts;pages=2

Also, you can load Income Statements, Balances, CashFlows from Google Finance using the following URLs:

https://www.google.com/finance?q={Symbol=AAPL}&fstype=ii;RootPath=2
https://www.google.com/finance?q={Symbol=AAPL}&fstype=ii;RootPath=4
https://www.google.com/finance?q={Symbol=AAPL}&fstype=ii;RootPath=6
https://www.google.com/finance?q={Symbol=AAPL}&fstype=ii;RootPath=3
https://www.google.com/finance?q={Symbol=AAPL}&fstype=ii;RootPath=5
https://www.google.com/finance?q={Symbol=AAPL}&fstype=ii;RootPath=7

;CollapsedNodes=<node>[,...]

The option defines the XML nodes that include values of children nodes.

Example:

;CollapsedNodes=passages

;SkippedNodes=<node>[,...]

The option defines suffixes of XML nodes to exclude from the output.

To exclude specific columns, use complete paths with column names separated by dots like parent1.parent2.column.

Example:

;SkippedNodes=.type

;IgnoredTags=<tag>[,...]

The option defines tags to skip as separate columns. The values are included into parent columns.

The typical scenario is ignoring text highlight tags. The option is applicable only with XML.

Example:

;IgnoredTags=hlword

;NoSourceHeaders=True

The option suppresses using HTML table headers as column names.

Use this option to import financial statements from the web with static column names.

;AsIs=True

The option suppresses any special processing of the output data.

For example, the add-in converts Yahoo timestamps to datetime values. You can disable this using the AsIs option.

;RowValues=True

This option applies XML or JSON parsers that print all values in rows.

This option is useful for learning document structures.

Selecting OData Object

This step allows selecting an OData web service object to connect.

OData containers are similar to database query list views.

Using the query list, you can select a container and then an object of the container.

Connecting Excel to OData - Selecting EntitySet or FunctionImport

Specifying Parameters

This step is used to specify parameters of OData FunctionImport objects and HTTP queries.

Connecting Excel to OData - Specifying OData query parameters

Further, you can work with web data like with other database data.