
Joining Excel files to tables in published schemas

Hello. In my use case, business users want to join their local data (Excel, CSV, etc.) to published schemas on enterprise data (bookings, billings, etc.) for ad hoc analysis. The tables in the published schema have row-level security filter logic in the SQL based on SSO. I would like IT to retain control of published schemas, but allow business power users to upload their Excel files and join them to tables in already published schemas.
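
For context, the kind of row-level filter referred to is a predicate baked into the table's SQL, keyed off the logged-in user. A minimal sketch (all table, column, and variable names here are illustrative, and the exact session-variable syntax for the SSO user, written as $user below, is an assumption):

```sql
-- Illustrative row-level security filter on a published bookings table.
-- $user stands in for a session variable resolving to the SSO login;
-- sales_territory_access is a hypothetical IT-maintained mapping table.
SELECT b.*
FROM bookings b
JOIN sales_territory_access a
  ON a.territory_id = b.territory_id
WHERE a.user_login = $user
```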

  1. Is this possible without creating an additional schema containing the enterprise table and the user-uploaded Excel file? The power users will not understand the SQL behind the schema tables, and I would like to avoid them creating a new schema each time they want to join a new Excel file to existing enterprise data.
  2. If creating a new schema each time is the only way, it becomes a security hole if users can apply their own data security filters. Is there a better place to put the row-level security SQL filter in that case?
  • Hi Sanjay Akhare -- one approach a number of our customers take in situations such as this is to:

    a) create a new schema owned just by that user, containing only the tables defined by the Excel files.  From this new user-owned schema, joins can be made to the published schema tables.

    In some cases, however, this presents some security challenges.  Often companies don't want to share the published schemas at the physical level with all users.  In cases such as this, we recommend

    b) creating a new schema that is identical to, or a subset of, the published schema using nothing but alias tables.  Alias tables allow the same tables to be loaded into Incorta once but leveraged in different schemas in different ways (i.e., with different joins).  Note that record-level security can't be redefined at the alias table level; it is enforced at the base table and carried through into the alias table.

    In short, these new schemas defined solely by alias tables unlock a degree of freedom and ad hoc analysis that doesn't require loading the same data from source multiple times.
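
    To make option (b) concrete, here is a hedged sketch of the resulting join; the schema and table names (sandbox_jdoe, target_accounts) are hypothetical, with bookings standing in for an alias of the published base table:

    ```sql
    -- Illustrative join between a user's uploaded Excel table and an alias table.
    -- sandbox_jdoe.target_accounts holds the Excel upload; sandbox_jdoe.bookings
    -- is an alias of the published base table, so the base table's row-level
    -- security is already applied to the rows the join sees.
    SELECT t.account_id,
           t.segment,
           SUM(b.booking_amount) AS total_bookings
    FROM sandbox_jdoe.target_accounts t
    JOIN sandbox_jdoe.bookings b
      ON b.account_id = t.account_id
    GROUP BY t.account_id, t.segment
    ```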

    Let us know if you think it will meet your need.  If it doesn't, let us know here and we will brainstorm with you further.

    • Dan Brock Looks like option b will work in concept; I can try it out. Will this approach duplicate the data in the new schema containing the alias table, or will it reuse the already loaded data from the original schema? The enterprise tables are several TB each, and we won't want each user replicating the dataset in their own alias table.

    • Sanjay Akhare The beauty of the alias table is that the data is only loaded once (in the physical published schema), so there is no data duplication or duplicate loading.  Sorry, I should've spelled that out more explicitly above.  Let us know how it goes.

  • Sanjay, just a pointer...  Anytime you allow a user to upload a data file and load data into schema tables, you need to grant that user the Schema Manager role.  But this role will also enable the user to create new schemas and tables.  Also, if you don't monitor the volume of data uploads, it will impact the memory available for other schema data when you have a memory limitation.

    Senthil

    • SenthilKumar Sivakumar Is there a way to cap memory use per schema or per user?

    • Sanjay Akhare Sort of.  You can provision multiple loader services (on different servers or on the same server) and control their memory usage/maximum from the CMC.  From there, you can optionally "pin" specific schemas (or tenants) to specific loader services.

      Curious, why do you ask?  Are you worried about users loading too much data?  You may want to consider spinning up a separate "sandbox" Incorta environment that has data replicated from production nightly that end users can use as their "playground".  

    • Dan Brock Yes, that is our intent: to have power users work in a sandbox. I was replying to Senthil's comment that people will be able to load unlimited data once they have the required privileges, and that uploads will have to be monitored to see how much memory is being used. Personally, I am a fan of caps as opposed to monitoring. I will let our IT team figure that one out.

    • Sanjay Akhare As Dan suggested, there are a few workarounds to accomplish your end goal, but you also need to consider the additional cost and effort required from IT to manage them.

      You can restrict users to uploading data files of a certain size by setting maxHttpHeaderSize="<required value in bytes>" and maxPostSize="<required value in bytes>" in your analytics service config file, but this limit can be circumvented if a power user figures out they can keep adding data in chunks and performing incremental loads.
