Fast Inserts in TACTIC

Published by tactic on

There are times when you need to add a lot of data to TACTIC through custom code.  This could be an ingest at the start of a project or a large delivery from an external source.  Although you could go directly to the database and execute raw SQL commands, this leaves a lot to be desired: you completely bypass the security mechanisms built into TACTIC.

Let’s say we want to insert 10,000 items into the TACTIC database.  Doing this individually:

from pyasm.search import SearchType

search_type = "vfx/shot"
for index in range(1, 10001):
    sobject = SearchType.create(search_type)
    sobject.set_value("code", "SHOT%0.5d" % index)
    sobject.commit()

This is decidedly slow: it issues 10,000 individual INSERT statements to the database.  There is a way in TACTIC that lets you build the inserts through the same TACTIC code as any other insert, but chunk them together so that the operation is much, much faster.

from pyasm.search import SearchType

search_type = "vfx/shot"

# get the sql object to send direct SQL statements
sql = SearchType.get_sql_by_search_type(search_type)

# set a chunk size for the number of inserts grouped together
chunk = 100
statements = []

# create 10,000 shots
num_shots = 10000
for index in range(1, num_shots + 1):
    sobject = SearchType.create(search_type)
    sobject.set_value("code", "SHOT%0.5d" % index)

    # build the INSERT statement without committing it
    statement = sobject.get_statement()
    statements.append(statement)

    # flush the accumulated statements in a single call
    if index % chunk == 0 or index == num_shots:
        sql.do_update(";\n".join(statements))
        statements = []

This code will chunk up the inserts into sets of 100.  The exact chunk size is fairly arbitrary, since most databases can handle quite large queries, but using a chunk size ensures that there is at least some limit on the size of the SQL statement sent to the database.
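Concretely, each flushed chunk is just the individual INSERT statements joined into one multi-statement string.  The sketch below shows what one such chunk looks like (the table and column names are hypothetical, and this is only an illustration of the string being built, not the TACTIC API):

```python
# Build one chunk of INSERT statements and join them into a single
# multi-statement string (hypothetical table and column names).
chunk = 5  # small chunk so the output is readable
statements = [
    "INSERT INTO shot (code) VALUES ('SHOT%0.5d')" % i
    for i in range(1, chunk + 1)
]
batch = ";\n".join(statements)
print(batch)
```

The database then parses and executes the whole string in one round trip, which is where the savings come from.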

The primary advantage of this method is execution speed.  Rough tests have found it to be over 10 times faster than TACTIC’s individual commits.  This is the method used in TACTIC’s CSV import when the trigger mode is set to “No Triggers”.  On a simple laptop running TACTIC, 10,000 shots were imported in about 30 seconds.
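The gap between per-row commits and chunked calls can be seen even in a toy setting.  The sketch below times the two approaches using sqlite3 as a stand-in for TACTIC’s database layer (all names here are hypothetical, and the exact numbers will vary by machine and database):

```python
import sqlite3
import time

def per_row(n):
    """One INSERT and one commit per row, like individual TACTIC commits."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE shot (code TEXT)")
    start = time.perf_counter()
    for i in range(1, n + 1):
        conn.execute("INSERT INTO shot (code) VALUES (?)", ("SHOT%0.5d" % i,))
        conn.commit()
    count = conn.execute("SELECT COUNT(*) FROM shot").fetchone()[0]
    return time.perf_counter() - start, count

def chunked(n, chunk=100):
    """Group INSERT statements and send each group in a single call."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE shot (code TEXT)")
    start = time.perf_counter()
    statements = []
    for i in range(1, n + 1):
        statements.append("INSERT INTO shot (code) VALUES ('SHOT%0.5d')" % i)
        if i % chunk == 0 or i == n:
            conn.executescript(";\n".join(statements))
            statements = []
    count = conn.execute("SELECT COUNT(*) FROM shot").fetchone()[0]
    return time.perf_counter() - start, count

n = 2000
t1, c1 = per_row(n)
t2, c2 = chunked(n)
print("per-row: %.4fs  chunked: %.4fs" % (t1, t2))
```

Both paths insert the same rows; only the number of round trips to the database differs.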

There are some disadvantages and limitations:

  1. Since a large number of inserts is sent in a single call, there is no opportunity to query data between inserts.
  2. Triggers are not executed.

Given these limitations, this operation is best reserved for large ingests.  Still, ingests are common enough to warrant using this method.

Categories: Articles