A post on the Tableau Linked In forum got me thinking about how to explain about Tableau Server (version 6-8) Data Sources. A Data Source published to Tableau Server is a convenient method to share the data connection instructions (meta-data) including dimensions, measures, sets and any calculated fields. Tableau covers some of the issues related to Data Sources in its knowledge base article found here.
But, the differences are quite perplexing and subtle, and which combination of tactics is used depends on the speed of the actual data repository connection and the reason for using Tableau server access rights to expose a data source rather than a file based solution (save a .TDS file to a network drive). Below is an explanation of the terms as well as insights into considerations of the various elements.
Extracts - A copy of the data is placed in a proprietary database engine. Use extracts when the direct connect to the data is cumbersomely slow, the data is a snapshot, or the features of the underlying data source lack functionality (like the MS Jet engine)
Data Connection -- When authoring a workbook, data connections are created to obtain data from a specific data repository. Repetitive data connections can be made to the same data repository (a global filter applies to a specific data connection. A use case for repetitive connections is when there are some worksheets created that use a global filter and others in the same workbook that shouldn't have the global filter applied). A data connection is a SQL statement that is comprises of either a single table, multiple tables with join conditions or custom SQL.
Data Source -- This is a Tableau term for the xml instructions that can be saved either to a file or to the server. The actual data is not contained in these files or loaded to the server. Data Sources can include instructions about how and where an extract can be found. A data source only includes the metadata that tells Tableau how to create the data connection and what formulas and native data elements exist in that data connection. It does not include formatting.
You can make it seem that the data is on the tableau server. When an extract is saved to a network location (use the UNC filename, not mapped drives) that the server and user can also access, then when a user selects 'Import Data Source' from the server, the data, dimensions, measures and sets appear in the workbook because the user has access to the data extract that is defined in the data source, not because the data resides on Tableau Server.
Version 6 permits saving extract files to a network drive, while prior versions required extracts to reside on a local drive. Extracts on a network drive can impact performance.
Publishing Data Sources to a server is a method to share a common set of formulas, etc. with other server users (the publisher chooses who to give permissions). If the connection requires database permissions, then those credentials will have to be entered by the user to access the actual data. However, come caveats:
1. Read only vs. update permissions to the extract located on a network drive have to be managed by network drive access permissions.
2. If the network extract file is open by any process, then the extract cannot be updated. A network drive extract file can have several workbooks read the file at the same time.
3. A workbook with update file rights can add dimensions, measures, sets to the data source and re-generate the extract. The new dimensions aren't shared, however, unless the data source is re-published to the server. When other workbooks open, they see a message "A newer version of the published data source is available. Would you like to use the latest version?" Answering no means you get to the data source as before, but use the old XML definition that was in the workbook. Some elements could be broken in your workbook due to renamed or removed items from the extract file. Answering yes brings in the new elements.
If the data connection was saved with instructions to use it as an extract, then a popup appears to either:
1. locate the the extract (which could be on a network drive)
2. remove the extract (it will then require appropriate permission to use the data connection)
3. Deactivate the extract (keeps the extract info in the xml file, but doesn't use that information -- instead requires appropriate permission to use the data connection) If the extract file didn't exist before, an error will occur when attempting to regenerate the extract.
4. Regenerate the extract (careful! this regenerates the extract in the same location where it was originally generated - so to do this you will need permission to use the data connection and permission to save the extract wherever it was - it shows you a shortened version of the save location, but you don't have a choice to change this location)
So, lets complicate it some more:
When you publish a workbook to the server and it uses an extract, the extract is pulled from the network/local drive and placed on the server as part of a packaged workbook. This is a copy of the extract and not the local/network located data source. So, if the loc data source is modified and saved to the network, the Tableau server version of the workbook doesn't pull from that location, instead it pulls from the extract copy. The meta-data about the data source is frozen as of the time that the server version of the workbook was published. To get the server version to recognize changes, the workbook needs to be re-published with the updated data source information to be embedded into a packaged workbook.
I recently had an issue where I needed to have both a live connection to an Excel workbook and a MS SQL Server connection as a data extract in the same workbook. By saving the workbook as a twb file, the live connection to the local file remained active. If I saved the workbook as a twbx, then a copy of the Excel file was embedded in the workbook and the live connection was lost. So, here comes the tricky part. I wanted to publish this workbook to Tableau Server with a live connection to the Excel file. The problem happens when publishing, Tableau automatically 'packages' the workbook for consumption to the server when there is an extract present. Packaging is another term for what happens when a twbx is created. So, when I published the workbook to the server, the Excel file connection no longer pointed to the local resource (I used UNC naming convention to point it to a network share that the Tableau server could access). To overcome this issue, I had to publish the Excel workbook as a datasource to the Tableau Server -- but unchecking the box 'Include External Files', thereby making the datasource a live read to the Excel file. Last, I created a new data connection to the this Tableau server datasource, then replaced the original Excel Datasource with this new one. Now, when I publish the workbook to the server, the live connection doesn't break, as it is defined by the published datasource, not by the workbook. Even as a packaged workbook, the live link remains working.