The Need

Data in Synapse is normally accessed via the Synapse client, but Synapse also allows direct access to selected data in Amazon S3. The original, motivating use case is to support access to data uploaded by the Bridge Exporter. The details are given here:

...

Here is the documentation for the use of the AmazonS3 connector. I believe the code is here. Using Spark with Synapse (without the STS token feature) would require a Hadoop Synapse connector. Note that Spark partitions data, nominally based on the file path hierarchy, so the connector may have to reflect that hierarchy.
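To illustrate the partitioning point, here is a minimal sketch of how partition columns can be read from a file path. It assumes the Hive-style `name=value` directory convention that Spark's partition discovery recognizes. The function name and the example key are hypothetical and do not come from any existing connector.

```python
from typing import Dict


def partition_values(key: str) -> Dict[str, str]:
    """Extract Hive-style partition columns (name=value directories) from an object key.

    Hypothetical helper: a connector would need to expose a path hierarchy
    like this for Spark to discover partitions from it.
    """
    directories = key.split("/")[:-1]  # drop the trailing file name
    values: Dict[str, str] = {}
    for directory in directories:
        name, separator, value = directory.partition("=")
        if separator and name:  # keep only directories shaped like name=value
            values[name] = value
    return values


if __name__ == "__main__":
    # Example key is made up for illustration.
    print(partition_values("exports/study=abc/date=2020-01-01/part-0000.parquet"))
```

A connector that flattens or renames this hierarchy would hide the partition columns from Spark, which is why the path layout may need to be preserved.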

Potential Solutions

Separate S3 Access from Synapse

In this approach, Synapse would merely track user permissions, but have no representation of the object being accessed. The user info returned by Synapse could include governance-related information as well as permissions. A separate application (acting as an OIDC client) would authenticate a user through Synapse, retrieve the user's permissions, and contain the logic that creates STS tokens for approved users. Note that such an application could gate access to a variety of AWS resources (EC2 instances, clusters, batch compute services), providing the appropriate STS token for the service of interest.
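As a rough sketch of the token-creation logic such an application might contain, the function below builds an IAM session policy that limits a token to one bucket prefix. The function name, its parameters, and the read/write split are assumptions made for illustration. How permissions would actually be retrieved from Synapse is not shown.

```python
import json


def build_session_policy(bucket: str, prefix: str, can_write: bool) -> str:
    """Return an IAM policy document (JSON string) scoped to one bucket prefix.

    Hypothetical helper: `can_write` stands in for whatever permission
    information the application would retrieve from Synapse.
    """
    prefix = prefix.strip("/")
    object_actions = ["s3:GetObject"]
    if can_write:
        object_actions += ["s3:PutObject", "s3:DeleteObject"]
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing only the keys under the prefix.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Allow object-level access only under the prefix.
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }
    return json.dumps(policy)


if __name__ == "__main__":
    # Bucket and prefix names are made up for illustration.
    print(build_session_policy("example-bucket", "project-1", can_write=False))
```

The resulting string could then be supplied as the `Policy` argument of an STS call such as `GetFederationToken` or `AssumeRole`, so that the issued credentials carry no more access than the user has been approved for.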

Synapse S3 Bucket Entity

In this approach Synapse would have an entity representing the S3 bucket, or a folder within, but would not represent the contained objects. The entity could have an access control list and governance settings (access requirements) and Synapse would contain the logic by which STS tokens are generated.
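The sketch below shows one way the decision described above could look: an entity that records a bucket location, an access control list, and access requirements, plus a check that decides which kind of token (if any) a user may receive. All class, field, and permission names here are hypothetical and are not taken from the Synapse data model.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class BucketEntity:
    """Hypothetical entity standing for a bucket or a folder within it."""
    bucket: str
    prefix: str
    # Maps a principal ID to the permissions granted on this entity.
    acl: Dict[str, Set[str]] = field(default_factory=dict)
    # IDs of access requirements a user must have satisfied.
    access_requirements: Set[str] = field(default_factory=set)


def token_permission(entity: BucketEntity, user_id: str,
                     approvals: Set[str]) -> Optional[str]:
    """Return "read_write", "read_only", or None if no token should be issued."""
    # Governance check: every access requirement must be satisfied.
    if not entity.access_requirements <= approvals:
        return None
    granted = entity.acl.get(user_id, set())
    if "UPDATE" in granted:
        return "read_write"
    if "READ" in granted:
        return "read_only"
    return None


if __name__ == "__main__":
    entity = BucketEntity(
        bucket="example-bucket",
        prefix="project-1",
        acl={"user-1": {"READ"}},
        access_requirements={"ar-1"},
    )
    print(token_permission(entity, "user-1", approvals={"ar-1"}))
```

Because the entity does not represent the contained objects, the check applies to the bucket or folder as a whole, and the token would cover everything beneath it.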

Maintain STS Folder Feature, With Limited Flexibility

Fix All Known Edge Cases