Universal Social Data Structure

There are many excellent social applications on the Internet that remain confined to boutique audiences because the bar for audience critical mass is set very high. When competing with the likes of Google+, Twitter and Facebook, whose audiences are measured in the hundreds of millions, other applications are considered niche applications of siloed applicability.

The Universal Social Data Structure (USDS) of the Universal Social Data Protocol (USDP) is intended to level the playing field through the interoperability of a common protocol across all participating social platforms. Rather than creating many small audiences of participants wary of the invasive practices of the major social networks, the specialized private network platforms can now all support one large audience of motivated participants.

Overview

The USDS is broken into four primary categories: "Descriptor Data," "Sender Information," "Recipient Information" and "Cargo," with the "Cargo" being the data passed and its associated descriptors. To keep the data structure simple, not all data is tucked into subcategories. The sender data is collected within the "Source" substructure, and the recipient data is collected in the "Dest" substructure, because the two share subelement names. The cargo and descriptor elements of the structure are first-level elements for simplicity and easy access. Some cargo elements do, however, have substructures to handle distinct and multiple data elements within them.

Special Note

Keep in mind while examining this description of the USDS that this structure, and the protocol it supports, are not intended to eliminate the third party from the conversation, only to localize it. This is not a peer-to-peer protocol; it is intended to be brokered through a personal social engine that is the primary recipient and distributor of this data per predefined rules. This social engine is also known as a "message broker" in some technical circles. We think of it as each person's, family's or business's personal online social advocate. Hence the name "Fautore," French for "Advocate," that we have given our prototype OSA social engine.

This advocate model allows for lightweight communication between well-known clients and servers and other known social advocate engines.

OSA JSON

JSON is the medium chosen to provide structure when passing data to, from and within the OSA POD. JSON (JavaScript Object Notation) is a lightweight transport mechanism for communications with OSA embedded functions/signals. This mechanism and the details herein describe communication transport via HTTP over TCP/IP.

References

www.json.org
www.json.org/JSONRequest.html
www.ietf.org/rfc/rfc4627.txt

Current Limitations

Only JSON strings and numbers are allowed.
The data contained within any value (name/value pairs) must be escaped according to JSON specifications in RFC4627, Section 2.5 "Strings".
BINARY values fall outside of the JSON specifications. We are looking at alternatives (see later discussion on Binary JSON).
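
As a rough illustration of the first two limitations, the Python sketch below checks that a decoded message carries only strings and numbers as leaf values, and shows the standard library applying the string escaping required by RFC4627. The helper name "only_strings_and_numbers" is hypothetical and not part of OSA. It assumes that objects and arrays are still permitted as containers, as the "Source" and "Object" segments suggest.

```python
import json

def only_strings_and_numbers(value):
    """Return True if every leaf value is a JSON string or number.

    Assumption: objects and arrays are allowed as containers. Booleans and
    null are rejected because the limitation names only strings and numbers
    (a null msgKey, which the Descriptor section permits, would need a
    special case in real use).
    """
    if isinstance(value, bool) or value is None:
        return False
    if isinstance(value, (str, int, float)):
        return True
    if isinstance(value, dict):
        return all(only_strings_and_numbers(v) for v in value.values())
    if isinstance(value, list):
        return all(only_strings_and_numbers(v) for v in value)
    return False

# json.dumps escapes quotes, backslashes and control characters in strings.
escaped = json.dumps({"Summary": 'He said "hi"\n'})
```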

Planned Enhancements

Process binary in JSON to pass images, etc. directly.
Implementation of streaming to split broadcast point-to-point video, where the OCE simply provides authentication and serves as a switch.

USDS JSON Structure

The OSA USDS relies on one JSON data structure designed to handle social data universally. With the goal of providing a universal communication medium, the structure is called the Universal Social Data Structure, or USDS. OSA uses this primary data structure to pass all message data. Any data or API calls are passed as JSON data within the USDS as Adjunct data in the "Adjunct/Data" field.

The third, "Admin" USDS message type is not yet implemented within OSA.
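
To give a feel for the overall shape, here is a minimal Python sketch that assembles a small qMsg and serialises it to JSON. Only the field names come from the sections of this page; every value is a placeholder we made up for illustration, and the exact nesting is our reading of the descriptions rather than a confirmed layout.

```python
import json

# Placeholder values throughout; only the field names come from this page.
message = {
    "msgType": "qMsg",
    # "msgKey" is left undefined here so that the receiving OCE assigns one.
    "Visibility": 0,
    "Source": {
        "AppKey": "example-app-key",
        "AppId": "chat:examplechat",
        "Member": "member-123",
    },
    "Dest": {"Member": "member-456,member-789"},
    "Summary": "Lunch?",
    "Detail": "Are you free for lunch on Friday?",
    "Adjunct": {"Desc": "", "Data": "", "Keys": {}},
}

# The text that would travel as the body of an HTTP request.
wire_text = json.dumps(message)
```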

Descriptor

The descriptor information is a set of fields at the top level of the data structure used to declare the privacy level of the message, identify the data being passed as one of three types, and tag the message with a key that prevents send loops.

msgType: Required string type holding one of three values defining the type of message being passed.

Value	Description
qMsg	This message type indicates that the data being passed is intended to be processed for redistribution. Only security based processing occurs before passing to the OSA. The bulk of all messages being passed will be of qMsg type.
oceOp	The oceOp message is passed when needing to transact directly with the OCE for such activities as registering an application as an authorized client. These messages are reparsed to extract the OCE instruction oriented JSON and derive the necessary function call.
oceAdm	For use with specialized clients for changing operational settings of the OCE. These messages are processed similarly to the oceOp type, with the applicable JSON being stored in the "Adjunct/Data" segment of the main message structure.

msgKey: A unique key assigned to a qMsg type, intended to prevent perpetual message send looping. If null upon receipt (or entirely undefined), the OCE must create a msgKey with a low probability of duplication. If the msgKey exists, it cannot be changed. This assures that an originating OCE sets a message key and that every OCE in the routing path has the opportunity to identify messages it has already received. The idea here is not to stamp every message ever created with an individual unique key, but to identify all messages sent within a reasonable period of time, to prevent broadcast storms made exponentially worse by looping on the same message. An OCE may time out a msgKey for its own filtering to prevent the creation of an infinitely deep history pool, but msgKeys should not be reused within close proximity or they stand a good chance of being falsely discarded by other OCEs. Intentional duplication of an existing key is in violation of the OSA standard.
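
The msgKey rules can be sketched roughly as follows in Python. The class and function names are hypothetical, the one hour timeout is an arbitrary choice, and the use of a random UUID as a key of low duplication probability is our assumption; the standard does not prescribe a key format.

```python
import time
import uuid

class SeenKeys:
    """Remember recently seen msgKeys and forget them after ttl seconds."""

    def __init__(self, ttl_seconds=3600.0):
        self.ttl = ttl_seconds
        self._seen = {}  # msgKey -> time first seen

    def is_duplicate(self, msg_key, now=None):
        now = time.monotonic() if now is None else now
        # Time out old keys so the history pool cannot grow without bound.
        self._seen = {k: t for k, t in self._seen.items() if now - t < self.ttl}
        if msg_key in self._seen:
            return True
        self._seen[msg_key] = now
        return False

def ensure_msg_key(message):
    """Assign a msgKey only when none exists; an existing key is never changed.

    Assumption: an empty string is treated the same as null or undefined.
    """
    if not message.get("msgKey"):
        message["msgKey"] = uuid.uuid4().hex
    return message["msgKey"]
```

An OCE following this sketch would call "ensure_msg_key" on receipt and drop the message when "is_duplicate" returns True.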

Visibility: This highly encouraged value is a flag indicating the minimum visibility rating that a receiving entity must carry. Private messages are not to be forwarded outside the receiving POD. The default of this flag must be zero (0) at the OSA message level. The OCE may, however, default the value it places in this element to any OSA defined value. The idea behind this is that the OSA is a substrate layer beyond the control of the Internet participant and needs to default toward a common, socially acceptable level of trust. OCE selection is an individual decision of the Internet participant regarding the defaults they would like their communication tool to use. In short, an inbound message without a value must always be treated as reasonably private. Outbound message "Visibility" field defaults can be OCE specific.

Note: These are early value definitions and subject to change as real world prototype use necessitates refinement.

Value	Name	Characteristics	Description
3	Internalized	Data remains in application or OCE and available only to OCE	Internally managed data, OCE extension, e.g. a performance enhancing cache application.
2	Secure	Data hosted within POD and encrypted with limited access	Locally stored, encrypted, limited access, e.g. an address book.
1	Safe	Data hosted within POD, visible within POD only	Locally stored, e.g. an application that stores files locally.
0	Normal	Data hosted in POD, visible to limited audience	Subject to normal trust relationships, could be screen-scraped, e.g. a password protected gallery hosted within POD.
-1	Shared	Data transferred outside POD to well-known recipients	Transfer of data ownership outside POD to trusted entities, e.g. an SMTP server where all OCE initiated data transfers can only be relayed to recipients registered to the OCE. All OCE-to-OCE communications are of rating negative 1.
-2	Free	Publicly available data hosted within POD	E.g. an unprotected gallery hosted within the POD available for anonymous viewing.
-3	Media	Transfer of data to untrusted entities	Transfer of data to a commercial entity or service, e.g. an unrestrained SMTP server capable of sending OCE initiated data to any email address.
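
The table above maps naturally onto an enumeration. The Python sketch below is one possible encoding, with the inbound default of zero applied when the field is missing. The function names are hypothetical, and the comparison in "recipient_may_receive" reflects only our reading of "minimum visibility rating that a receiving entity must carry"; these are early value definitions and may change.

```python
from enum import IntEnum

class Visibility(IntEnum):
    INTERNALIZED = 3
    SECURE = 2
    SAFE = 1
    NORMAL = 0
    SHARED = -1
    FREE = -2
    MEDIA = -3

def inbound_visibility(message):
    """Inbound messages without a value are treated as 0 (Normal)."""
    raw = message.get("Visibility")
    return Visibility.NORMAL if raw is None else Visibility(int(raw))

def recipient_may_receive(message, recipient_rating):
    """Assumed reading: the recipient's rating must be at least the message's."""
    return recipient_rating >= inbound_visibility(message)
```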

Sender Information (Source)

Sender information is stored in the "Source" segment of the data structure. The Source segment has four data elements to identify the sender.

SubField	Type	Disp	Description
OCE	string	opt	The unique key for the OCE that is sending to a remote OCE. This field is listed as optional only because it should not be used when the source is an app. This is a required field if the source is an OCE sending to an app or another OCE.
AppKey	string	opt	The application key given to this software client when registering with the OSA. This field should only be blank in cases a message is OCE generated such as an OCE status alert.
AppId	string	opt	Application developer supplied key for use in supporting like-application default routing. The first value in this field acts as a categorical designator so that the qMsg will default delivery to any application of the same category. A second, colon separated value specifies the application to which the data should go. The second value can be thought of as the "preferred" application to receive the qMsg data. No field, however, can override OCE member instructions for delivery.
Member	string	req	The member identifier assigned to the user of the client application that is generating the data structure. This key would be provided by the OCE at registration time.
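
Splitting the "AppId" value into its category and preferred application could look like the following Python sketch. The function name and the sample identifiers are hypothetical.

```python
def parse_app_id(app_id):
    """Split 'category:application' into (category, preferred_application).

    The preferred application is None when only a category is given.
    """
    category, _, preferred = app_id.partition(":")
    return category.strip(), (preferred.strip() or None)
```

With this helper, "chat:examplechat" yields the category "chat" and the preferred application "examplechat", while a bare "chat" yields the category alone. Any delivery decision would still be subject to the OCE member instructions.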

Recipient Information (Dest)

Recipient information is stored in the "Dest" segment of the data structure. Note that the destination application cannot be specified. It is either derived from source application or specified in the OCE instruction definitions.

SubField	Type	Disp	Description
OCE	multi	opt	Unique key of destination OCE. This value is optional in that the local OCE can derive the destination OCE from other provided data. A remote OCE destination should be able to be derived from a remote member specification by querying data to derive what OCE they belong to. Default, if no OCE can be derived, is to be a local OCE destination. This value can be a single OCE key, a comma separated list of OCE keys, or a hash of hashes where the OCE ref is the key and the subordinate values to each key represent the other typical values found in a standard single value "Dest->OCE" reference.
Coterie	multi	opt	The inclusion of a Coterie id recognized by the OSA will have the data passed to all members of the Coterie as defined by the OCE instructions. This value can be a single Coterie key, a comma separated list of Coterie keys, or a hash of arrays where the OCE ref is the key and the array is a list of destination OCE members intended to receive the message.
Group	string	opt	Name of a group of Coterie members. This value is always a single string, but can be a comma separated list. This is only processed for messages coming to the OCE from applications. The Group data element is not available for messages being sent between OCEs.
Commons	string	opt	The id of the Commons intended to receive the message data. This value is always a single string, but can be a comma separated list.
Member	string	opt	The member identifier assigned to the intended recipient(s) being targeted to receive the data. This value is always a single string, but can be a comma separated list. An absence of member id will leave the message to be distributed according to the broadcast rules defined at the OCE for the sending application.
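
Because several "Dest" fields accept a single key, a comma separated list, or a hash, a receiving OCE will probably want to normalise them before routing. The Python sketch below shows one way to do that; the function name is hypothetical and the sample keys are invented.

```python
def as_key_list(value):
    """Normalise a Dest field to a list of keys.

    Accepts a single key, a comma separated string, or a hash (dict) whose
    keys are the identifiers. Missing or empty values give an empty list.
    """
    if not value:
        return []
    if isinstance(value, dict):
        return list(value.keys())
    return [part.strip() for part in str(value).split(",") if part.strip()]
```

An empty result for "Dest->OCE" would then correspond to the documented default of a local OCE destination.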

Cargo

The message cargo is made up of several fields designed to deliver different types of data. With the exception of the "Object" cargo container the definitions for the data types belonging in the containers are more guidelines than anything. The intent is to make management and delivery of the data as easy and flexible as possible.

Summary: Optional, 164 characters - A field of short data for use in minimum content situations like picture captions and snippet communications.

Detail: Optional, virtually unlimited - Larger text field to carry articles, correspondence, weblogs and long photo descriptions.

Object: Optional, binary transport - The Object section of the USDS is a JSON array of objects that originated as binary data. Binary data is encoded at the client and decoded at the recipient. The OSA does not natively alter the Object cargo in transport.

SubField	Type	Disp	Description
Type	string	opt	Mime type of the associated object in the list of cargo objects.
Encoding	string	opt	Type of binary encoding. Currently only base64 is supported. Additional encodings and an OSA specific format are in the works.
Data	string	opt	Actual encoded data being passed. The USDS does not currently handle straight binary data due to limitations of the underlying JSON protocol.
Title	string	opt	Summary description of the provided object. Master record "Summary" field will be used, when provided, if this field is empty.
Detail	string	opt	Full descriptive text for object. Master record "Detail" field will be used, when provided, if this field is empty.
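
A client might build and read "Object" entries along the lines of this Python sketch. The function names are hypothetical; the subfield names and the base64 encoding come from the table above.

```python
import base64

def make_object_entry(raw_bytes, mime_type, title=""):
    """Build one entry for the "Object" array from binary data."""
    return {
        "Type": mime_type,
        "Encoding": "base64",
        "Data": base64.b64encode(raw_bytes).decode("ascii"),
        "Title": title,
    }

def object_title(message, entry):
    """Fall back to the master record "Summary" when "Title" is empty."""
    return entry.get("Title") or message.get("Summary", "")

def decode_object_entry(entry):
    """Recover the original bytes at the recipient."""
    if entry.get("Encoding", "base64") != "base64":
        raise ValueError("only base64 is supported in this sketch")
    return base64.b64decode(entry["Data"])
```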

Adjunct: Optional, text - The Adjunct data container is intended for passing application specific data. It is used to pass JSON formatted instructions to the OSA social engine. It can also be used to pass handling instructions to any other destination application conforming to OSA standards. The "Adjunct" section could become an array to handle the passing of multiple instruction sets to a variety of applications that might handle the same data. We're keeping it simple at the moment though.

SubField	Type	Disp	Description
Desc	string	opt	A freeform field to complement the data field. Its use is defined by the destination application for which the content is intended. This field could be as innocuous as a human readable description of what is being passed, or it could hold critical defining information such as the name of a function intended to be called at the remote location.
Encoding	string	opt	Type of binary encoding. Currently only base64 is supported. Additional encodings and an OSA specific format are in the works.
Data	string	opt	Application specific content being passed. When passing instructions to the OSA this is JSON content that is reparsed before being acted upon.
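
The reparsing of "Adjunct/Data" can be pictured with the short Python sketch below, where the instruction JSON travels as a string inside the outer message and is decoded a second time on arrival. The function name and the contents of the instruction payload are invented for illustration; "RegApps" is used as the "Desc" value only because this page mentions a function of that name.

```python
import json

def extract_instruction(message):
    """Reparse the JSON text carried in Adjunct/Data, or return None."""
    data = message.get("Adjunct", {}).get("Data")
    if not data:
        return None
    return json.loads(data)

# Hypothetical instruction payload; the field name inside is invented.
op_message = {
    "msgType": "oceOp",
    "Adjunct": {
        "Desc": "RegApps",
        "Data": json.dumps({"AppName": "ExampleChat"}),
    },
}
```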

Adjunct->Keys: Optional, hash - The "Keys" value of the "Adjunct OSA:USDS" structure section holds defined key value pairs for use by the destination application. While the "Data" section of the Adjunct space exists to transport application specific data, the "Keys" section provides a location of reliable structure to transport application specific data that may be of use to other applications, such as those in the same type group. This information notably can be made available to the OCE itself for instruction criteria evaluation. See the "Instruction Criteria Definition" section of the "RegApps" function for details.

Each value directly beneath the "Keys" reserved word represents the technical name of the data set. This can be thought of as a left operand or the "variable name." This value can be any string without whitespace. It is recommended to avoid special characters to prevent problems in different OCEs with varying tolerance of special characters.
The technical name reference is itself a hash of values. The values below are the currently supported literals:

DisplayName - The text that should be used to describe the value on any screens developed to show the value to the OCE participants.
Value - The right operand, or "variable value" to be used in any instruction criteria evaluations.

See the "Full USDS" example above for a JSON example of how these values would look. Below is a Perl example to aid in understanding how instruction keys for a Chat program might be constructed.

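my %Keys = ();  # instruction keys, indexed by technical name
# $msg is assumed to be a hash reference to the decoded USDS message
$Keys{'Length'} = {'DisplayName' => 'Line Length', 'Value' => length($msg->{'Summary'})};
$Keys{'CodeVer'} = {'DisplayName' => 'App Version', 'Value' => "3.7"};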
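
For readers who want to see the resulting JSON, the same two keys can be built and serialised in Python as sketched below. The "summary" variable stands in for the message "Summary" field, and placing "Keys" directly under "Adjunct" follows the "Adjunct->Keys" naming used on this page.

```python
import json

summary = "Lunch?"  # stands in for the message "Summary" field
keys = {
    "Length": {"DisplayName": "Line Length", "Value": len(summary)},
    "CodeVer": {"DisplayName": "App Version", "Value": "3.7"},
}
adjunct = {"Keys": keys}

# Pretty-printed JSON text for the Adjunct section.
keys_json = json.dumps(adjunct, indent=2)
```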

OSA JSON HTTP Transport Layer

Definition of JSON over HTTP use in OSA

Request Methods

There are two possible request methods sent to the OSA JSON server:

'POST' which calls an object request passing JSON name/value pairs.
'GET' which simply calls a requested object (no name/value pairs are passed)

Header Terminators

Header termination definition.

All header lines are terminated with CR + LF
The header itself has a further CR + LF pair to denote the end of the header.

Note: CR + LF termination of header lines is defined in RFC1945 and RFC2616; however, many HTTP clients and servers have gone outside the specifications and utilise just an LF. OSA itself has been designed to detect and handle either CR + LF or LF terminated lines (either will work).
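
A tolerant header splitter in the spirit of this note might look like the Python sketch below. It is not the OSA implementation; the function name is hypothetical and the approach (look for whichever blank-line marker appears first) is just one way to accept both terminators.

```python
def split_header_and_body(raw):
    """Split a raw HTTP message (bytes) into header lines and body.

    Accepts either CR+LF or bare LF line endings in the header.
    """
    # Find whichever blank-line marker appears first.
    candidates = [(raw.find(sep), len(sep)) for sep in (b"\r\n\r\n", b"\n\n")]
    candidates = [(pos, n) for pos, n in candidates if pos != -1]
    if not candidates:
        # No blank line found: treat everything as header, with no body.
        return raw.replace(b"\r\n", b"\n").split(b"\n"), b""
    pos, n = min(candidates)
    header, body = raw[:pos], raw[pos + n:]
    lines = header.replace(b"\r\n", b"\n").split(b"\n")
    return lines, body
```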

Example: POST Method HEADER

POST /request HTTP/1.1(cr+lf)
Accept: application/jsonrequest(cr+lf)
Content-Encoding: identity(cr+lf)
Content-Length: xxxx(cr+lf)
Content-Type: application/jsonrequest(cr+lf)
Host: OSA(cr+lf)
(cr+lf)
{ JSON name/value pairs }

Post Header Notes:

/request shall define the function used to action the JSON object data (eg. /getUser)
Accept: must always be 'application/jsonrequest'
Content-Encoding: must always be 'identity'
Content-Type: must always be 'application/jsonrequest'
Host: must always be 'OSA'
Content-Length: will be equal to the length of '{ JSON name/value pairs }'
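
Putting the POST header notes together, a client could assemble a request roughly as in this Python sketch. The function name and the payload field are hypothetical, and counting "Content-Length" in bytes of the UTF-8 encoded body is our assumption.

```python
import json

def build_post_request(function_path, payload):
    """Return the raw bytes of an OSA-style POST request."""
    body = json.dumps(payload).encode("utf-8")
    header_lines = [
        f"POST {function_path} HTTP/1.1",
        "Accept: application/jsonrequest",
        "Content-Encoding: identity",
        f"Content-Length: {len(body)}",  # length of the body in bytes
        "Content-Type: application/jsonrequest",
        "Host: OSA",
    ]
    # Each header line ends with CR+LF; one more CR+LF ends the header.
    header = "\r\n".join(header_lines) + "\r\n\r\n"
    return header.encode("ascii") + body
```

For example, build_post_request("/getUser", {"Member": "member-123"}) produces a header block followed directly by the JSON name/value pairs.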

Example: GET Method HEADER

GET /request HTTP/1.1(cr+lf)
Accept: application/jsonrequest(cr+lf)
Host: OSA(cr+lf)
(cr+lf)

Get Header Notes:

/request shall define the function used to action the JSON object data (eg. /getHomeTribe)
Accept: must always be 'application/jsonrequest'
Host: must always be 'OSA'

Responses

The server replies to both 'POST' and 'GET' methods in the same way.

Success

HTTP/1.1 200 OK(cr+lf)
Content-Type: application/jsonrequest(cr+lf)
Content-Length: xxxx(cr+lf)
(cr+lf)
{ JSON name/value pairs }

Success Response Notes:

Content-Type: will always be 'application/jsonrequest'
Content-Length: will be equal to the length of '{ JSON name/value pairs }'

Host/Syntax Failure

HTTP/1.1 400 Descriptive Message(cr+lf)
(cr+lf)

Failure Response Notes:

The descriptive message provides deeper insight as to the failure.
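
On the client side, the two response shapes above can be told apart from the status line. The following Python sketch is a simplified reader under the assumption of CR + LF line endings; the function name is hypothetical.

```python
import json

def parse_response(raw):
    """Return (status_code, message, payload) from a raw response.

    payload is the decoded JSON object for 200 responses, otherwise None.
    This sketch assumes CR+LF line endings.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n")[0].decode("ascii")
    _version, code, message = status_line.split(" ", 2)
    payload = json.loads(body) if code == "200" and body else None
    return int(code), message, payload
```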

Binary JSON

Whilst JSON is very efficient for the transport of object data such as strings and numbers, it is limited in the area of binary data, for example an image file.
Traditionally the workaround has been to encode such data using the likes of:

BASE64 encoding
MIME
Simple %xx escaping.

The above methods however require both more processing (both ends) and increase the length of the data significantly. They DO however maintain one of the JSON principles that data values are compatible with EVALuation functions as found in most modern programming languages.
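
The size increase is easy to measure. The Python sketch below encodes the same arbitrary binary data with base64 and with %xx escaping and records the resulting lengths; the sample data is made up purely for the measurement.

```python
import base64
from urllib.parse import quote_from_bytes

raw = bytes(range(256)) * 3            # 768 bytes of arbitrary binary data
b64_len = len(base64.b64encode(raw))   # 4 output characters per 3 input bytes
pct_len = len(quote_from_bytes(raw, safe=""))  # up to 3 characters per byte
```

Here base64 grows the data by one third (768 bytes become 1024 characters), and %xx escaping grows it further, since most byte values need three characters each.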

UBJSON

A new player has entered the scene of late: 'Universal Binary JSON' or UBJSON (see ubjson.org).
The specification certainly does handle binary data; however, reading through it one soon realises that the suggested data format is no longer compatible with the original JSON data format, which was based upon simplicity.
Formatting data as UBJSON introduces detection and application of various numeric types, string types and binary types.
Further, it introduces more processing to convert, for example, simple ASCII based numbers into a packed byte representation (big endian order). For example, the number 1 as ASCII text is the single character with hex code 31 (two hex digits), whereas UBJSON would carry the value as the single byte hex 01.
It also introduces length specifications, once again in packed binary, as 1-4 byte integers.
This packing does provide a size reduction, and for small strings and numbers it represents a good percentage.
However, the introduced processing overhead and complexity seem a drawback.
The size reduction as a percentage becomes minimal when processing larger strings, for example 100 characters or more, and at some point the result will actually become larger than standard JSON.
The one interesting point I note is the introduction of a length value for binary data.
If we know the length, we can easily extract the binary data and avoid the problem that it may contain characters which would break our JSON.

Note: Objects utilising UBJSON as such will FAIL programming EVALuation unless preprocessed further to make the object syntactically correct for the programming language being used and thus evaluation possible.
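
The difference between ASCII digits and packed bytes can be seen with Python's standard "struct" module, as in the sketch below. It illustrates only the packing of the value itself and is not a complete UBJSON encoder; refer to the UBJSON specification for the full format.

```python
import struct

ascii_one = b"1"                       # the digit as JSON text
packed_one = struct.pack(">b", 1)      # one signed byte, big endian

ascii_big = b"123456"                  # six ASCII characters
packed_big = struct.pack(">i", 123456) # four byte signed integer, big endian

hex_forms = (ascii_one.hex(), packed_one.hex(), packed_big.hex())
```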

An Alternative Approach

JSON is simple

"somename":"its a string"     For strings the data value is enclosed within quotes.
"anothername":123456          Numbers have no quotes

Proposal for introducing a BINARY type similar to the above regime.

"somebinary":~10,binarydata~

The ~ (tilde) character simply flags that binary data follows.
The next digits specify, in ASCII numbers, the length of the data.
The following , (comma) denotes the end of the length characters and that the binary data follows.
From the above example the binary data will be 10 bytes in length and can be extracted very easily.
The trailing ~ (tilde) character, which should be checked for, confirms the end of this name/value pair.
(If it is not found, the length was incorrect, which makes for a good test.)

This appears simplicity itself. Existing JSON systems need only incorporate this new type, represented by its being enclosed within ~ (tilde) characters.
From OSA's point of view this is a piece of cake.

Note: Objects utilising this method as such will FAIL programming EVALuation unless preprocessed further to make the object syntactically correct for the programming language being used and thus evaluation possible.
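
To show how little code the proposed binary type needs, here is a Python sketch of a reader and writer for the ~length,data~ form. This is an illustration of the proposal as described above, not OSA code; both function names are hypothetical.

```python
def read_binary_value(buf, start):
    """Parse ~<length>,<bytes>~ beginning at buf[start].

    Returns (data, next_index). Raises ValueError on malformed input.
    """
    if buf[start:start + 1] != b"~":
        raise ValueError("binary value must start with ~")
    comma = buf.index(b",", start + 1)  # raises ValueError if absent
    length_text = buf[start + 1:comma]
    if not length_text.isdigit():
        raise ValueError("length must be ASCII digits")
    length = int(length_text)
    data_start = comma + 1
    data_end = data_start + length
    # The trailing ~ doubles as a check that the length was correct.
    if buf[data_end:data_end + 1] != b"~":
        raise ValueError("missing trailing ~; length is probably wrong")
    return buf[data_start:data_end], data_end + 1

def write_binary_value(data):
    """Wrap raw bytes in the proposed ~length,data~ form."""
    return b"~" + str(len(data)).encode("ascii") + b"," + data + b"~"
```

Because the reader relies on the length rather than scanning for a terminator, the data itself may safely contain tildes, commas or quotes.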
