LLM integration with the Synapse API

As part of PLFM-8502, we explored options for adding LLM integration into Synapse. Specifically, we looked at using Agents for Amazon Bedrock as a possible mechanism to incorporate LLMs into Synapse. As part of that issue we created a proof of concept application that uses a Bedrock agent to support chatting with an LLM. The agent was configured with a set of functions that the agent could use to request more information from Synapse to support the user’s chat requests. For more information about this proof of concept application see its GitHub project: Synapse-Agent-Lambda.

Summary of Results

When asked to perform search tasks, the agent would correctly invoke the appropriate action group function to fetch the user-requested data from Synapse. While it would often take more than 10 seconds for the agent’s LLM (Claude 3 Sonnet by Anthropic in this case) to process and respond to the data, I found the resulting summaries to be useful. Once the desired data was loaded and processed by the agent, the response times improved. The agent was able to provide insightful information about the wiki pages that it digested. A user with a bit of patience would likely find such insights useful in understanding some of the complex projects found in Synapse.

Initially, I tried to use fancy “prompt engineering” to get the agent to do exactly what I wanted. For example, I tried to get the agent to always show search results as a table with specific columns. Occasionally, the agent would follow such prompting but other times it would not. In fact, each session with the agent would produce slightly different results. I soon found that I could simply tell the agent to provide the desired results during any interaction. Therefore, I decided to be less prescriptive with the configuration “prompt engineering” and trust that users will be able to ask for what they want during their sessions.

Each new action group function that was made available to the agent improved the quality of interactions. Basically, the more data the agent has available to it, the better the results.

Recommendation

Based on my experience with the agent application, I believe a Bedrock agent integration with Synapse would provide value to some of our users. Ideally, the integration would include the following action group functions:

Search - The agent can execute Synapse searches on the user’s behalf
Entity metadata:
- Annotation
- View/Table schemas
- ACLs
- Access Requirements
- Children of containers
Table and view query support - Train the agent to execute SQL commands on the user’s behalf.
User profile information - Provide information about who created/modified Entities
Help documentation - The agent should be able to read the help docs on the user’s behalf to answer questions.
Add Files to a User’s Download List and help them understand and meet access restrictions.
Search and read a project's forum posts
Anything else we think would be useful.

Proposed API

Feature Overview

Based on the proof of concept application, I believe we can add a new asynchronous worker that would work in the same way. Basically, the worker would accept text (for now) input from the user and process it via invoke_agent requests. The following diagram illustrates an example call:

In this example we can see that the worker forwarded the user’s request to the Bedrock agent with an invoke_agent call. However, the agent determined that it did not have enough information to process the user’s request. So, the agent responded with a “return control” result that informed the worker that the agent would like to execute a Synapse search using the term “cows”. After running the search query for the agent, the worker sends the search results back to the agent with a second invoke_agent call. Since the search results contain truncated wiki data, the agent once again decides it needs more information to respond to the user’s request. The second response from the agent is also a “return control” that requests the full description of entity syn123. The worker fetches the requested description and then forwards it back to the agent by making a third invoke_agent request. At this point the agent has enough information to respond to the user’s original request. The worker then returns the results from the agent directly to the user.
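The exchange described above can be sketched as a simple loop. This is only a sketch under stated assumptions: AgentReply, run_chat, and scripted_agent are hypothetical names, and the invoke_agent argument is a simplified stand-in for the real Bedrock call (the actual request and response shapes are richer). The scripted agent just replays the “cows” example so the loop can be run without AWS.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class AgentReply:
    """Simplified stand-in for one invoke_agent response (hypothetical shape)."""
    text: Optional[str] = None        # final answer, set when the agent is done
    function: Optional[str] = None    # set when the agent returns control
    parameters: Dict[str, str] = field(default_factory=dict)


# invoke_agent(input_text, function_result) -> AgentReply
InvokeAgent = Callable[[Optional[str], Optional[dict]], AgentReply]


def run_chat(invoke_agent: InvokeAgent,
             functions: Dict[str, Callable[..., str]],
             chat_text: str,
             max_round_trips: int = 10) -> str:
    """Call the agent until it answers with text instead of returning control."""
    reply = invoke_agent(chat_text, None)
    round_trips = 0
    while reply.function is not None:
        if round_trips >= max_round_trips:
            raise RuntimeError("agent did not answer within the round-trip limit")
        # Run the function the agent asked for, then send the result back.
        result = functions[reply.function](**reply.parameters)
        reply = invoke_agent(None, {"function": reply.function, "result": result})
        round_trips += 1
    return reply.text or ""


def scripted_agent() -> Tuple[InvokeAgent, List[tuple]]:
    """Fake agent that replays the 'cows' example; records every call it receives."""
    replies = iter([
        AgentReply(function="search", parameters={"term": "cows"}),
        AgentReply(function="get_description", parameters={"entity_id": "syn123"}),
        AgentReply(text="syn123 is a project about cows."),
    ])
    calls: List[tuple] = []

    def invoke(input_text: Optional[str], function_result: Optional[dict]) -> AgentReply:
        calls.append((input_text, function_result))
        return next(replies)

    return invoke, calls


# Hypothetical worker-side functions; real ones would call Synapse services.
functions = {
    "search": lambda term: f"search results for {term}",
    "get_description": lambda entity_id: f"full description of {entity_id}",
}
invoke, calls = scripted_agent()
print(run_chat(invoke, functions, "Tell me about cows"))
print(len(calls), "invoke_agent calls")
```

The round-trip limit is my own addition; some cap seems prudent since the number of return-control requests is decided by the agent, not the worker.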

Sessions

The Bedrock agent is capable of running many conversations concurrently. Each conversation has its own unique session ID that isolates the data associated with that conversation. The agent stores the data for each session’s conversation in its “memory” (managed by AWS). Therefore, the worker must assign and track each user’s session ID for each conversation. This will allow a user to have a single, long-running conversation with an agent that spans multiple asynchronous worker calls over time. The session IDs also isolate each user’s conversations from each other.
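A minimal sketch of the session tracking described above, assuming an in-memory store and randomly generated IDs. SessionStore and its methods are hypothetical names for illustration; the real worker would persist sessions in the database.

```python
import uuid
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Session:
    session_id: str
    started_by: int  # ID of the user that owns the conversation


class SessionStore:
    """In-memory stand-in for wherever the worker would persist sessions."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Session] = {}

    def create(self, user_id: int) -> Session:
        # A random UUID keeps session IDs unique and hard to guess.
        session = Session(session_id=str(uuid.uuid4()), started_by=user_id)
        self._sessions[session.session_id] = session
        return session

    def get_for_user(self, session_id: str, user_id: int) -> Session:
        # Reject unknown sessions and sessions owned by someone else alike.
        session = self._sessions.get(session_id)
        if session is None or session.started_by != user_id:
            raise PermissionError("session does not exist or belongs to another user")
        return session
```

The worker would call get_for_user before every invoke_agent request, so that a session ID can only ever be used by the user who started it.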

Authorization

Each asynchronous worker call is already authenticated at the API level. This means the worker knows which user made the call. During a conversation, each time the agent requests more data from the worker, the worker will only provide information that the calling user is already authorized to see. Since the agent only sees what the user can see, the agent never needs to do any additional authorization filtering. Since each user’s conversational data is isolated to that user’s session, it is not possible for the agent to provide the user with data that the user is unauthorized to see.

Cost

Bedrock agents are “serverless”. This means we do not need to pay to keep any (new) server machines running. Instead, AWS Bedrock charges us for each call to invoke_agent based on how many tokens are exchanged with the underlying LLM during each user’s conversation. This means we only pay for the feature if users actually use it.

It is difficult to predict exactly how much this feature would cost, as the number of tokens exchanged with the LLM is not directly in our control, nor is it fully transparent. However, I did look at the Bedrock costs under the developer account that I have been using to test the proof of concept application. I have run dozens of experiments and our total cost for Claude 3 Sonnet (Bedrock Edition) in that account is $0.03.
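To make the token-based pricing concrete, here is a small arithmetic sketch. The per-1,000-token rates and the token counts below are placeholders chosen for illustration, not measurements from the proof of concept and not a quote of AWS pricing; the current Bedrock price list should be consulted for real numbers.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      usd_per_1k_input: float, usd_per_1k_output: float) -> float:
    """Token-based cost of one or more invoke_agent calls."""
    return ((input_tokens / 1000) * usd_per_1k_input
            + (output_tokens / 1000) * usd_per_1k_output)


# Placeholder values for illustration only (assumptions, see the text above).
EXAMPLE_USD_PER_1K_INPUT = 0.003
EXAMPLE_USD_PER_1K_OUTPUT = 0.015

cost = estimate_cost_usd(input_tokens=4000, output_tokens=500,
                         usd_per_1k_input=EXAMPLE_USD_PER_1K_INPUT,
                         usd_per_1k_output=EXAMPLE_USD_PER_1K_OUTPUT)
print(f"estimated cost: ${cost:.4f}")
```

Note that a single user request can involve several invoke_agent round trips, so the token counts for one chat message are the sum over all of those calls.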

Web Services

Response	url	Request	Description	Authorization
AgentSession	POST /agent/session	CreateAgentSessionRequest	Used to start a new session with the Synapse agent.	A user must be authenticated to make this call.
ListAgentSessionsResponse	POST /agent/sessions/list	ListAgentSessionsRequest	List all of the user’s sessions starting with the latest.	Will only list sessions that belong to the caller.
SessionHistoryResponse	POST /agent/session/history/{sessionId}	SessionHistoryRequest	Get a single page of a session’s conversation history.	Only the owner of the session may make this call.
JobState	POST /agent/chat/async/start	AgentChatRequest	Start an asynchronous job to send a chat message to the Synapse agent.	A user must be authenticated to make this call. The provided sessionId must belong to the caller. The agent’s access level will be determined by the level selected by the user for the session.
AgentChatResponse	GET /agent/chat/async/get/{job_id}		Get the results of an asynchronous job that contains the Synapse agent’s response to a user’s request.	Authentication required. Only the user that started the job can get its results.
TraceEventsResponse	POST /agent/chat/trace/{job_id}	TraceEventsRequest	Get a single page of trace events associated with an AgentChatRequest job. Note: The AgentChatRequest.enableTrace must be set to “true” to enable tracing for a job.	Authentication required. Only the user that started the job can get its results.
AgentRegistration	PUT /agent/registration	RegisterAgentRequest	Idempotent registration of a custom agent.	Only internal users (Sage employees) will be authorized to register agents.
AgentRegistration	GET /agent/registration/{agentRegistrationId}		Get the details of a registered custom agent using its ID.	A user must be authenticated to make this call.
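A client would typically chain three of these calls: create a session, start a chat job, then poll for the result. The sketch below shows that flow against an injected send function so it does not depend on any particular HTTP library. Two details are assumptions on my part and are not specified by the table: that the start call returns the job ID in a field named "jobId", and that the get call returns a non-200 status while the job is still running.

```python
import time
from typing import Callable, Optional, Tuple

# send(method, path, body) -> (http_status, parsed_json_body); supplied by the caller.
Send = Callable[[str, str, Optional[dict]], Tuple[int, dict]]


def chat_once(send: Send, chat_text: str,
              access_level: str = "PUBLICLY_ACCESSIBLE",
              poll_seconds: float = 1.0, max_polls: int = 60) -> str:
    """Start a session, send one chat message, and wait for the agent's reply."""
    _, session = send("POST", "/agent/session", {"agentAccessLevel": access_level})
    _, job = send("POST", "/agent/chat/async/start",
                  {"sessionId": session["sessionId"], "chatText": chat_text})
    job_id = job["jobId"]  # assumed field name
    for _ in range(max_polls):
        status, body = send("GET", f"/agent/chat/async/get/{job_id}", None)
        if status == 200:  # assumption: any other status means "still running"
            return body["responseText"]
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

To continue the same conversation, a real client would keep the sessionId and reuse it for later chat jobs rather than creating a new session each time.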

Object Models

CreateAgentSessionRequest:

{
	"description": "Information about a specific session (conversation) with an agent.  Only the access level can be changed on an existing session.  You will need to start a new session if you wish to use a different agentId.",
	"properties": {
		"agentAccessLevel": {
			"description": "Required. Specifies the access level that the agent will have during this session only.",
			"$ref": "org.sagebionetworks.repo.model.agent.AgentAccessLevel"
		},
		"agentRegistrationId": {
			"type": "string",
			"description": "Optional. When provided, the registered agent will be used for this session.  When excluded the default 'baseline' agent will be used."
		}
	}
}

AgentAccessLevel:

{
	"description": "Defines the level of data access that the agent will be given during a session.",
	"name": "AgentAccessLevel",
	"type": "string",
	"enum": [
		{
			"name": "PUBLICLY_ACCESSIBLE",
			"description": "The agent will only have access to ALL data that is already publicly available to anyone."
		},
		{
			"name": "READ_YOUR_PRIVATE_DATA",
			"description": "The agent will have access to ALL data in Synapse that you have 'read' access to.  The agent will also have access to ALL data that is publicly available."
		},
		{
			"name": "WRITE_YOUR_PRIVATE_DATA",
			"description": "Grant the agent permission to make data changes within Synapse on your behalf. Any change the agent makes will be attributed to you.  Under this level the Agent will also have access to ALL data in Synapse that you have 'read' access to.  The agent will also have access to ALL data that is publicly available."
		}
	]
}
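One way the worker could apply these levels is sketched below. It assumes that PUBLICLY_ACCESSIBLE is enforced by running the agent's reads as an anonymous user; that is my assumption, not something the schema prescribes, and ANONYMOUS_USER_ID, effective_reader, and may_write are hypothetical names.

```python
from enum import Enum


class AgentAccessLevel(Enum):
    PUBLICLY_ACCESSIBLE = "PUBLICLY_ACCESSIBLE"
    READ_YOUR_PRIVATE_DATA = "READ_YOUR_PRIVATE_DATA"
    WRITE_YOUR_PRIVATE_DATA = "WRITE_YOUR_PRIVATE_DATA"


ANONYMOUS_USER_ID = -1  # hypothetical placeholder for "no authenticated user"


def effective_reader(level: AgentAccessLevel, caller_id: int) -> int:
    """Return the user whose read permissions apply to data fetched for the agent."""
    if level is AgentAccessLevel.PUBLICLY_ACCESSIBLE:
        return ANONYMOUS_USER_ID
    return caller_id


def may_write(level: AgentAccessLevel) -> bool:
    """Only the highest level lets the agent change data on the user's behalf."""
    return level is AgentAccessLevel.WRITE_YOUR_PRIVATE_DATA
```

With this shape, every function the agent invokes goes through the normal permission checks for the effective reader, so no agent-specific filtering is needed.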

AgentSession:

{
	"description": "Information about a specific session (conversation) with an agent.  Only the access level can be changed on an existing session.  You will need to start a new session if you wish to use a different agentId.",
	"properties": {
		"sessionId": {
			"type": "string",
			"description": "The unique identifier for a conversation with an agent.  The sessionId is issued by Synapse when the session is started.  The caller must provide this sessionId with each chat request to identify a specific conversation with an agent.  A sessionId can only be used by the user that created it."
		},
		"agentAccessLevel": {
			"description": "Specifies the access level that the agent will have during this session only.",
			"$ref": "org.sagebionetworks.repo.model.agent.AgentAccessLevel"
		},
		"startedOn": {
			"type": "string",
			"format": "date-time",
			"description": "The date this session was started."
		},
		"startedBy": {
			"type": "integer",
			"description": "The id of the user that started this session"
		},
		"modifiedOn": {
			"type": "string",
			"format": "date-time",
			"description": "The date this session was last modified."
		},
		"agentRegistrationId": {
			"type": "string",
			"description": "Identifies the agent that will be used for this session.  The default value is null, which indicates that the default agent will be used."
		},
		"etag": {
			"type": "string",
			"description": "Will change whenever the session changes."
		}
	}
}

ListAgentSessionsRequest:

{
	"description": "Request a single page of agent sessions for the current user.  The sessions are ordered by 'startedOn' descending.",
	"properties": {
		"nextPageToken": {
			"type": "string",
			"description": "Forward the returned 'nextPageToken' to get the next page of results."
		}
	}
}

ListAgentSessionsResponse:

{
	"description": "A single page of agent sessions",
	"properties": {
		"page": {
			"type": "array",
			"items": {
				"$ref": "org.sagebionetworks.repo.model.agent.AgentSession"
			}
		},
		"nextPageToken": {
			"type": "string",
			"description": "Forward this token to get the next page of results."
		}
	}
}
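The nextPageToken pattern used by the list and history calls can be consumed with a small generator. This sketch assumes that a missing or null nextPageToken marks the last page; iterate_pages and fetch_page are hypothetical names, and fetch_page stands in for whatever issues the actual POST.

```python
from typing import Callable, Iterator, Optional


def iterate_pages(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item across all pages, following nextPageToken until it runs out."""
    token: Optional[str] = None
    while True:
        response = fetch_page(token)
        yield from response.get("page", [])
        token = response.get("nextPageToken")
        if not token:  # assumption: absent/null token means no more pages
            return
```

For example, fetch_page could wrap a POST to /agent/sessions/list with a body of {"nextPageToken": token}.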

AgentChatRequest:

{
	"description": "Send a chat message to the Synapse chat agent",
	"implements": [
		{
			"$ref": "org.sagebionetworks.repo.model.asynch.AsynchronousRequestBody"
		}
	],
	"properties": {
		"sessionId": {
			"description": "The sessionId that identifies the conversation with the agent.",
			"type": "string"
		},
		"chatText": {
			"description": "The user's text message to send to the agent.",
			"type": "string"
		},
		"enableTrace": {
			"description": "Optional. When trace is enabled, the agent will include information about its decision process and the functions/tools it will use to process this request. Default value is false.",
			"type": "boolean"
		}
	}
}

AgentChatResponse:

{
	"description": "The response to an agent chat request.",
	"implements": [
		{
			"$ref": "org.sagebionetworks.repo.model.asynch.AsynchronousResponseBody"
		}
	],
	"properties": {
		"sessionId": {
			"description": "The sessionId that identifies the conversation with the agent.",
			"type": "string"
		},
		"responseText": {
			"description": "The agent's text response to the user's request",
			"type": "string"
		}
	}
}

SessionHistoryRequest:

{
	"description": "Request a single page of a session's history.  The history is ordered by the interaction time stamp descending.",
	"properties": {
		"nextPageToken": {
			"type": "string",
			"description": "Forward the returned 'nextPageToken' to get the next page of results."
		}
	}
}

SessionHistoryResponse:

{
	"description": "A single page of an agent session history",
	"properties": {
		"sessionId": {
			"description": "The session ID of this conversation's history",
			"type": "string"
		},
		"page": {
			"description": "A single page of a session's history.  The history is ordered by the interaction time stamp descending.",
			"type": "array",
			"items": {
				"$ref": "org.sagebionetworks.repo.model.agent.Interaction"
			}
		},
		"nextPageToken": {
			"type": "string",
			"description": "Forward this token to get the next page of results."
		}
	}
}

Interaction:

{
	"description": "Represents a single interaction between the user and an agent.",
	"properties": {
		"usersRequestText": {
			"type": "string",
			"description": "The text of the user's request"
		},
		"usersRequestTimestamp": {
			"type": "string",
			"format": "date-time",
			"description": "The time stamp when the user made the request"
		},
		"agentResponseText": {
			"type": "string",
			"description": "The text of the agent's response"
		},
		"agentResponseTimestamp": {
			"type": "string",
			"format": "date-time",
			"description": "The time stamp when the agent produced the response."
		}
	}
}

TraceEventsRequest:

{
	"description": "A request to get a single page of trace events for a specified asynchronous job.",
	"properties": {
		"jobId": {
			"type": "string",
			"description": "The job ID issued when the agent chat request job was started."
		},
		"newerThanTimestamp": {
			"type": "integer",
			"description": "When a timestamp value is provided, only trace events that occurred after the provided timestamp will be included in the results."
		}
	}
}

TraceEventsResponse:

{
	"description": "A single page of agent trace events for an asynchronous agent chat job. The events are sorted by timestamp ascending.",
	"properties": {
		"jobId": {
			"type": "string",
			"description": "The job ID issued when the agent chat request job was started."
		},
		"page": {
			"description": "A single page of trace events.",
			"type": "array",
			"items": {
				"$ref": "org.sagebionetworks.repo.model.agent.TraceEvent"
			}
		}
	}
}

TraceEvent:

{
	"description": "Represents a single trace event generated during an agent chat asynchronous job request.",
	"properties": {
		"timestamp": {
			"type": "integer",
			"description": "The time stamp identifies when the agent generated this trace event.  It is also used to uniquely identify this event within the context of this asynchronous job."
		},
		"message": {
			"type": "string",
			"description": "The trace text message generated by the agent while processing a chat request."
		}
	}
}
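Because each event's timestamp doubles as its identifier, a client can follow a running job by repeatedly asking for events newer than the last one it saw. A minimal sketch of one such polling step, where collect_new_trace_events and fetch_trace are hypothetical names and fetch_trace stands in for the POST to the trace endpoint:

```python
from typing import Callable, List, Optional, Tuple


def collect_new_trace_events(fetch_trace: Callable[[dict], dict],
                             job_id: str,
                             newer_than: Optional[int] = None
                             ) -> Tuple[List[dict], Optional[int]]:
    """Fetch events newer than the given timestamp; return them with the new high-water mark."""
    request = {"jobId": job_id}
    if newer_than is not None:
        request["newerThanTimestamp"] = newer_than
    events = fetch_trace(request).get("page", [])
    # Keep the previous mark when nothing new arrived.
    latest = max((e["timestamp"] for e in events), default=newer_than)
    return events, latest
```

A caller would invoke this in a loop while the chat job is running, passing the returned timestamp back in on the next call so that each event is shown only once.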

AgentRegistrationRequest:

{
	"description": "Request to register a custom AWS agent with Synapse.  Currently, only internal users are authorized to register custom agents.",
	"properties": {
		"awsAgentId": {
			"type": "string",
			"description": "The AWS issued agent ID of the agent to be registered."
		},
		"awsAliasId": {
			"type": "string",
			"description": "The AWS issued agent alias ID of the agent alias to be used. Optional. If an alias is not provided then 'TSTALIASID' will be used."
		}
	}
}

AgentRegistration:

{
	"description": "The registration of a custom AWS agent.",
	"properties": {
		"agentRegistrationId": {
			"type": "string",
			"description": "The unique ID issued by Synapse when this agent was registered. Provide this ID when starting a session to use the registered agent for a session."
		},
		"awsAgentId": {
			"type": "string",
			"description": "The AWS issued agent ID of the agent."
		},
		"awsAliasId": {
			"type": "string",
			"description": "The AWS issued agent alias ID. If an alias ID was not provided, a default value of 'TSTALIASID' will be used."
		},
		"registeredOn": {
			"type": "string",
			"format": "date-time",
			"description": "The date this agent was registered."
		}
	}
}
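The idempotent behavior of PUT /agent/registration could look roughly like the sketch below. It assumes that a registration is identified by the pair (awsAgentId, awsAliasId) after the default alias has been applied; that keying is my assumption, and AgentRegistry is a hypothetical in-memory stand-in for the real persistence layer.

```python
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple

DEFAULT_ALIAS_ID = "TSTALIASID"


class AgentRegistry:
    """In-memory sketch of idempotent agent registration."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, str], dict] = {}
        self._next_id = 1

    def register(self, aws_agent_id: str, aws_alias_id: Optional[str] = None) -> dict:
        alias = aws_alias_id or DEFAULT_ALIAS_ID
        key = (aws_agent_id, alias)
        existing = self._by_key.get(key)
        if existing is not None:
            # Repeating the same request returns the original registration.
            return existing
        registration = {
            "agentRegistrationId": str(self._next_id),
            "awsAgentId": aws_agent_id,
            "awsAliasId": alias,
            "registeredOn": datetime.now(timezone.utc).isoformat(),
        }
        self._next_id += 1
        self._by_key[key] = registration
        return registration
```

Registering the same agent and alias twice yields the same agentRegistrationId, which is what makes it safe for a client to retry the PUT.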