The program begins with the get_files() function, which checks whether the
program was imported as a module by another program or invoked from the
command line. If it was invoked from the command line, the system arguments are
checked and the input and output file names are stored. The process_file()
function is then called; it stores the output of get_known_targets() in the
variable known_targets. get_known_targets() connects to the database, runs a
query for class, type, variant, and specific from the table target, and returns
the query result. Once this list has been returned, ``gt2import.py'' steps
through the current input file line by line, which allows files of extreme
length to be processed.
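The query step described above can be sketched as follows. This is only an illustration: the in-memory SQLite database, the column types, and the sample row are assumptions standing in for the real database, and passing the connection as an argument is a choice made here rather than something the text specifies.

```python
import sqlite3


def get_known_targets(conn):
    """Return every (class, type, variant, specific) row of the target table."""
    query = 'SELECT "class", "type", "variant", "specific" FROM target'
    return conn.execute(query).fetchall()


# Stand-in database (assumption): the real program connects to its own server.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE target ("class" TEXT, "type" TEXT, "variant" TEXT, "specific" TEXT)'
)
conn.execute("INSERT INTO target VALUES ('vehicle', 'truck', 'a', '1')")

known_targets = get_known_targets(conn)
```

Fetching the whole list once keeps later membership checks in memory rather than issuing one query per target.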
Upon encountering the first line, the bytes per pixel value is found by calling
get_bytes_pixel(), which opens the file again, searches for the first
reference, parses it, and extracts its bytes per pixel value. The sequence
location is then stored and load_sequence_id(bytes_pixel) is called. This
function connects to the database, checks that this sequence has not already
been entered, and, if this is a new sequence location, inserts the needed
information into the sequence_location table (Appendix C.2). Control is handed
back to process_file() and the time at which this sequence was recorded is
stored.
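A minimal sketch of the check-then-insert behaviour of load_sequence_id() is given below. The column names (location, bytes_pixel), the extra conn and location parameters, and the SQLite stand-in database are assumptions made for illustration; the actual schema is the one given in Appendix C.2.

```python
import sqlite3


def load_sequence_id(conn, location, bytes_pixel):
    """Insert the sequence location unless it is already present.

    Returns True when a new row was inserted and False otherwise.
    """
    row = conn.execute(
        "SELECT 1 FROM sequence_location WHERE location = ?", (location,)
    ).fetchone()
    if row is not None:
        return False  # sequence already entered, nothing to do
    conn.execute(
        "INSERT INTO sequence_location (location, bytes_pixel) VALUES (?, ?)",
        (location, bytes_pixel),
    )
    conn.commit()
    return True


# Stand-in database with a hypothetical two-column schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequence_location (location TEXT PRIMARY KEY, bytes_pixel INTEGER)"
)
first = load_sequence_id(conn, "/data/seq_001", 2)   # new location
second = load_sequence_id(conn, "/data/seq_001", 2)  # repeated location
```

Running the import twice on the same file therefore leaves a single row for that sequence.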
The second line of the gt2 format contains the number of targets in the file,
so this value is stored as number_targets for later use. The number of
references for each target is stored starting on line three, so number_targets
lines are read and parsed for these counts. The counts are then stored in a
dictionary for use in parsing the main sections of the file.
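The header parsing just described might look like the sketch below. It assumes that each count line holds a single integer and that targets are keyed by their zero-based position; neither detail is stated in the text, and the function name read_reference_counts is hypothetical.

```python
import io


def read_reference_counts(lines):
    """Read line two and the count lines from an iterator of text lines.

    Assumes line one has already been consumed by the caller.
    """
    number_targets = int(next(lines).strip())
    reference_counts = {}
    for target_index in range(number_targets):
        # One count per line (assumption about the gt2 layout).
        reference_counts[target_index] = int(next(lines).strip())
    return number_targets, reference_counts


# Made-up sample standing in for the top of a gt2 file.
sample = io.StringIO("sequence header\n2\n3\n5\n")
next(sample)  # skip line one
number_targets, reference_counts = read_reference_counts(sample)
```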
The next number_targets lines hold target-specific information.
``gt2import.py'' iterates number_targets times, reading each line and storing
its values in a list, which is then transformed into a dictionary. A
consistency check is also done (through check_consistency()) to help ensure
the file format is valid.
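The list-to-dictionary step can be illustrated as follows. The field names in TARGET_FIELDS, the whitespace-separated layout, and the form of check_consistency() shown here (a simple field count) are assumptions; the real check may test more than this.

```python
# Hypothetical field order for one target line.
TARGET_FIELDS = ("class", "type", "variant", "specific", "priority")


def check_consistency(values, expected_count):
    """Raise ValueError when a line does not have the expected field count."""
    if len(values) != expected_count:
        raise ValueError(
            "expected %d fields, found %d" % (expected_count, len(values))
        )


def parse_target_line(line):
    """Split one target line into a list, validate it, and build a dictionary."""
    values = line.split()
    check_consistency(values, len(TARGET_FIELDS))
    return dict(zip(TARGET_FIELDS, values))


target = parse_target_line("1 2 3 4 PRIMARY")
```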
In the gt2 format, priority levels are stored as words (``PRIMARY,''
``SECONDARY,'' ...), but the database uses numerals so that priority can be
sorted, so this value is converted. The class, type, variant, and specific
values are then stored as a single string in the list target_ids to enable
checking. The current target code is checked against the list to which it was
just added, and if there is no match ``gt2import.py'' queries the database to
see whether this target has already been entered. If it has not, all needed
target information is stored in the database through an INSERT query; if the
target is already present, this step is skipped.
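The conversion and the target code can be sketched as below. The numerals assigned to each word, the hyphen separator, and the function names are assumptions for illustration only; the text does not give the actual mapping or the string layout, and any further priority levels are left out.

```python
# Hypothetical numerals; only the two words named in the text are listed.
PRIORITY_LEVELS = {"PRIMARY": 1, "SECONDARY": 2}


def priority_to_number(word):
    """Convert a gt2 priority word to a sortable numeral."""
    return PRIORITY_LEVELS[word.strip().upper()]


def target_code(target):
    """Join class, type, variant, and specific into one comparable string."""
    keys = ("class", "type", "variant", "specific")
    return "-".join(str(target[key]) for key in keys)


target = {
    "class": 1,
    "type": 2,
    "variant": 3,
    "specific": 4,
    "priority": "PRIMARY",
}
target["priority"] = priority_to_number(target["priority"])
target_ids = [target_code(target)]
```

Storing the four values as one string lets a plain membership test on target_ids decide whether a target has been seen before.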
We have now entered the ``main'' section of the gt2 file, where reference
data is stored. ``gt2import.py'' iterates through list_targets, checking each
time whether it should append to or overwrite the output. The temporary output
file is opened, and the number of lines indicated earlier is read, checked for
consistency, and stored in a dictionary. The output is then formatted as a
tab-delimited record and written to the output file. If EOF is then
encountered, ``gt2import.py'' breaks out of the loop; otherwise it continues to
the next line. Once the current target's references have been read and
written, the output file is closed, the current directory is stored, and
load_frame_rows() is called to load the output data into the database (the
LOAD command is significantly faster than inserting rows individually). Once
the data is loaded, the temporary output file is erased and the next target is
processed.
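The write, load, and erase cycle for one target can be sketched as follows. SQLite has no LOAD command, so executemany() stands in for the bulk load here; the table name frame, its columns, the sample rows, and the helper write_rows are all assumptions made for illustration.

```python
import csv
import os
import sqlite3
import tempfile


def write_rows(path, rows):
    """Write the rows to path as a tab-delimited file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)


def load_frame_rows(conn, path):
    """Load every row of the tab-delimited file in one batch.

    Stand-in for the database's LOAD command (assumption).
    """
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    conn.executemany(
        "INSERT INTO frame (target_id, frame_number, x, y) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE frame (target_id INTEGER, frame_number INTEGER, x INTEGER, y INTEGER)"
)

fd, temp_path = tempfile.mkstemp(suffix=".tsv")
os.close(fd)
write_rows(temp_path, [(1, 0, 10, 20), (1, 1, 11, 21)])
loaded = load_frame_rows(conn, temp_path)
os.remove(temp_path)  # the temporary output file is erased after loading
```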