*
  • To provide functions for reading XML into a data structure.
  • To provide functions for writing that data structure into an SQL database.
  • To provide functions for retrieving the identical data associated with a given XML object.
    *
    * Because of its hierarchical nature, any data structure representing XML must necessarily
    * contain a lot of recursion. We will refer to any tag containing other tags as a group and to any tag
    * containing data as a field. A group can contain both groups and fields.
    *
    * In this implementation, each group contains arrays listing the types of fields and groups it may contain, and
    * arrays holding any instances of those fields or groups. A group may contain multiple instances of the same field as
    * well as multiple groups of the same type. Functions are provided for adding new groups and new fields, as well as
    * instantiations of both.
    *
    * An id system is used to maintain the relationships between elements within an XML object. * Every group, with the exception of the root, has a parent group.
    * Each group has an id, a parent id, and a main id. The ids do not exist as part of the XML being represented but are
    * meta fields used by this implementation to keep track of the relationships between groups.
    *
    * For the discussion below, the following XML string will be used as the model:
    *
    * value
    *
    * value
    *

    *

    *
    * Each group will be converted into an SQL table. The table will contain a column for each field within that group.
    * A dilemma arises because a group may contain more than one instantiation of a particular field. There are two solutions
    * to this problem:
    *
    *
    * @package JLex
    */
class group {
    /**#@+
    * @access public
    */
    /**
    * The name of this group element. This is taken from the tag itself.
    *
    * In the example above, the first group will have the value "HEADGROUP". However, I typically convert
    * all tags to lower case prior to using them. In hierarchy_discoverer, I first convert the tags to lowercase
    * and then create a new group or field based on the lowercase name.
    * @var string
    */
    public $name;
    /**
    * An array of the fields this group may contain, with the maximum count of each.
    *
    * This is an associative array indexed by field name. The value contained is the number of times the group
    * with the most instantiations of that field has that field. This information is used when creating an
    * SQL table representing the current group. A new column will be created in the table representing the current
    * group for each existing field type. For a more complete discussion of this see above.
    * @var array
    */
    public $fields;
    /**
    * The array containing instantiations of a field.
    *
    * This is an associative array indexed by field name and containing arrays of the values for the given field name.
    * @var array
    */
    public $field_vals;
    /**
    * An array containing references to the groups that this group may contain as subgroups.
    *
    * The objects within this array belong to the group class. The primary reason I created this implementation
    * was to allow for easy reloading of an XML structure from a flat structure. By obeying the requirements
    * of this and the $fields array, it is unambiguous whether a current field=value pair
    * belongs to the current group, its parent or its child.
    * To create a new group value, one would clone the relevant object from this array and place the copy
    * in the $group_vals array.
    *
    * @var array
    */
    public $groups;
    /**
    * The array containing the instantiations of (sub)groups for this group.
    *
    * Note that the groups within this array should have values in either or both of $field_vals and $group_vals. In other
    * words, these instantiations should not simply be the models; they should have data as well.
    * @var array
    */
    public $group_vals;
    /**
    * This is the primary wrapper tag enclosing each entry to be added to the database. It is used to identify
    * exactly which parts of the XML document are to be added as rows to the database.
    * @var string
    */
    public $head_tag;
    /**
    * The id of this group. This should only have a value in the instantiated version of a group.
    *
    * An id is only assigned to an instance of a group. I use instance not to refer to the instantiation of a
    * group object but to when a group object contains data.
    * @var integer
    */
    public $id;
    /**
    * The id of the "main group" which this group belongs to.
    *
    * The main group is the uppermost group in an xml structure, i.e. the group that is not part of another group.
    * I use this id to facilitate searches. Every table created for subgroups (not the main group) in the MySQL
    * database will contain a "main_id" column. When searching subgroup tables, I return the main_id.
    * I then use the set of main_ids to produce a complete result set. It is not always the case that a
    * complete result set is desired. For example, if a query returns hundreds of hits, you may not want to
    * return the full xml structure for each. The main_id provides an easy way of going back and querying
    * the database for those rows that matched the original query.
    * @var integer
    */
    public $main_id;
    /**
    * The id of the parent to which this group belongs.
    *
    * As built by assign_ids(), the parent id always has the form id_groupname. So if a group belongs to the
    * "customer" group with $id=4, the parent_id for this group will be "4_customer". The parent_id is used for
    * maintaining the relationships between groups and their child groups.
    *
    * @var string
    */
    public $parent_id;
    /**
    * This array contains the next available id for each table that belongs to this project.
    *
    * This is an associative array indexed by group name containing integers.
    * @var array
    */
    public static $ids;
    /**
    * This variable points to the parent group of this group. If this is a top-level group, $parent is set to false.
    *
    * @var group
    */
    public $parent;
    /**
    * The template tracks the fields and subgroups for a particular group.
    *
    * I created this in order to track the exact order of an xml structure. It is then used to
    * put together the XML structure in the exact order it was originally inputted into the database.
    * Therefore (though not recommended), if you want to rely on the order of an xml structure,
    * you are assured that the order will be the same.
    * Note that fields and the start tag opening a group are entered by name. The end tag of a group will have a slash * in it. Also, the main group will not be included in the template, but is assumed. So for the xml structure above * the template will be:
    * field1 subgroup1 field2 /subgroup2
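    * As an illustration only, the nesting implied by such a template can be recovered with a short loop.
    * This is a minimal sketch and not part of the group class: nest_template() is a hypothetical helper, and
    * the sample template assumes a subgroup that is opened and closed under the same name.

```php
<?php
// Hypothetical helper: turns a flat list of template tokens into a nested array.
// $group_names lists the tokens that open a subgroup; everything else is a field.
function nest_template(array $tokens, array $group_names)
{
    $tree = array();
    while (count($tokens) > 0) {
        $token = array_shift($tokens);
        if (in_array($token, $group_names, true)) {
            // Collect everything up to the matching "/name" end token.
            $inner = array();
            while (count($tokens) > 0) {
                $next = array_shift($tokens);
                if ($next === ("/" . $token)) {
                    break;
                }
                $inner[] = $next;
            }
            $tree[] = array($token => nest_template($inner, $group_names));
        } else {
            $tree[] = $token;
        }
    }
    return $tree;
}

// Assumed sample template: two fields of the main group around one subgroup.
$tokens = explode(" ", "field1 subgroup1 field2 /subgroup1 field3");
$tree = nest_template($tokens, array("subgroup1"));
```

    * Like the template-driven output in this class, the sketch stops at the first matching end token, so it
    * does not handle a subgroup nested inside a subgroup of the same name.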
    * Note that this field is not important if the exact xml structure does not need to be reproduced. It is used
    * by the function to_xml_by_template, which uses the template to build the XML. If to_xml is used to reproduce the xml,
    * the output will be in the order in which fields are stored in the $field_vals and $group_vals arrays.
    *
    * @var string
    */
    public $template = "";
    /**#@-*/
    function __autoload($class_name) {
        require_once $class_name.".php5";
    }
    function __construct($name, $parent) {
        $this->name = $name;
        $this->parent = $parent;
        $this->fields = array();
        $this->field_vals = array();
        $this->groups = array();
        $this->group_vals = array();
    }
    function __destruct() { }
    /**
    * This is a recursive function which initializes the ids array.
    *
    * The function first sets the next available id for this group to 1 in the ids array. It then calls itself
    * for each subgroup that does not yet have an entry.
    */
    function init_ids() {
        self::$ids[$this->name] = 1;
        foreach($this->groups as $group) {
            if(!array_key_exists($group->name,self::$ids)) {
                $group->init_ids();
            }
        }
    }
    /**
    * This function adds a field to the $fields array. It is used for building the template structure of an XML
    * structure.
    *
    * To add a field, the name of the field and the maximum number of times it can occur in any particular entry
    * are provided.
    *
    * @param string $field The name of the field to be added.
    * @param integer $count The maximum number of times this field can occur in a particular entry.
    */
    function add_field($field,$count) {
        $this->fields[$field] = $count;
    }
    /**
    * This function adds a group to the $groups array. It is used for building the template structure of an XML structure.
    *
    * Note that groups are added by reference.
    * @param group $group The group to be added to $groups.
    */
    function add_group($group) {
        $this->groups[$group->name] = $group;
    }
    /**
    * This function returns a clone of the group specified by $group_name. If no group exists with that name, false
    * is returned.
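    * A minimal sketch of this lookup-and-clone behaviour follows. It uses a hypothetical stand-in class
    * (demo_group is an assumption made for the example and is not part of this file) that keeps only the
    * members the lookup needs.

```php
<?php
// Hypothetical stand-in for the group class, reduced to what the lookup needs.
class demo_group
{
    public $name;
    public $groups = array();
    public $field_vals = array();

    function __construct($name)
    {
        $this->name = $name;
    }

    function add_group($group)
    {
        $this->groups[$group->name] = $group;
    }

    // Depth-first search; returns a clone of the first match, or false.
    function get_group($group_name)
    {
        if ($this->name == $group_name) {
            return clone $this;
        }
        foreach ($this->groups as $group) {
            $result = $group->get_group($group_name);
            if ($result) {
                return $result;
            }
        }
        return false;
    }
}

$entry = new demo_group("entry");
$sense = new demo_group("sense");
$entry->add_group($sense);

// The clone can be filled with data without touching the model it came from.
$copy = $entry->get_group("sense");
$copy->field_vals["gloss"][] = "dog";
```

    * The clone is shallow: arrays such as $field_vals are copied, but any objects stored inside them are
    * still shared with the original.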
    *
    * @param string $group_name The name of the group being searched for.
    * @return group|false A clone of the group object with the matching name, or false if none is found.
    */
    function get_group($group_name) {
        $result = false;
        if($this->name == $group_name) {
            $result = clone $this;
        } else if(array_key_exists($group_name,$this->groups)) {
            $result = clone $this->groups[$group_name];
        } else {
            foreach($this->groups as $group) {
                $result = $group->get_group($group_name);
                if($result) {
                    break;
                }
            }
        }
        return $result;
    }
    /**
    * has_value returns true if this group contains a field in the $field_vals array, i.e. the group has a field
    * with a non-empty string.
    *
    * @param string $field The field being searched for.
    * @return boolean True if $field_vals has an entry for the field.
    */
    function has_value($field) {
        return (array_key_exists($field,$this->field_vals));
    }
    /**
    * assign_ids assigns the relevant ids for this group object.
    *
    * This function should be used to assign ids to a group rather than directly accessing the variables themselves.
    * @param integer $main_id The main_id belonging to the group containing this group.
    * @param integer $parent_id The id of the parent group to which this group belongs.
    * @param string $parent_name The name of the parent group to which this group belongs.
    */
    function assign_ids($main_id, $parent_id, $parent_name) {
        $this->main_id = $main_id;
        $this->parent_id = $parent_id."_".$parent_name;
        $this->id = self::$ids[$this->name];
        self::$ids[$this->name]++;
    }
    /**
    * add_value adds a field=value pair to the $field_vals array.
    *
    * @param string $field The field name being added.
    * @param string $value The value for the specified field.
    *
    * @return boolean False is returned if the field does not exist in the $fields array.
    */
    function add_value($field,$value) {
        if(array_key_exists($field,$this->fields)) {
            $this->field_vals[$field][] = $value;
            return true;
        } else {
            return false;
        }
    }
    /**
    * Adds a group to $group_vals and sets the parent of the child group to this.
    *
    * @param group $group The group to be added.
*/ function add_group_value($group) { $this->group_vals[$group->name][] = $group; $group->parent = $this; } /** * Returns an array of all the names of the groups contained within this group object. * * @return array An array containing the names of all the groups contained in this object. */ function get_group_names() { $names = array($this->name); foreach($this->groups as $group) { $subgroup_names = $group->get_group_names(); foreach($subgroup_names as $name) { if(!in_array($name,$names)) { $names[] = $name; } } } return $names; } /** * A useful debugging function which prints out template information for this group. * * This function is useful for showing the template of a group. It prints to general out. */ function print_group_info() { echo "Group: $this->name main id: $this->main_id; parent_id: $this->parent_id;"; echo " id: this->id
    "; foreach($this->fields as $field=>$count) { echo "field: $field count: $count
    "; } foreach($this->groups as $group) { echo "group: $group->name : XX fields : ".count($group->groups)." groups
    "; } foreach($this->groups as $group) { $group->print_group_info(); } } /** * This functions prints out the values of given group (not the template). * * This is useful for seeing what currently exists in a given group. */ function print_group_values() { foreach($this->fields as $field=>$count) { if(array_key_exists($field,$this->field_vals)) { foreach($this->field_vals[$field] as $val) { echo "\\$field $val
    "; } } } foreach($this->groups as $group) { if(array_key_exists($group->name, $this->group_vals)) { foreach($this->group_vals[$group->name] as $g) { $g->print_group_values(); } } } } /** * load_query_result loads all data for a given xml structure from the set of rows corresponding to different * subgroups within the MySQL database. * * Any row within a given table contains only the values of fields for a particular group. In other words, * it does not contain any data for subgroups of the current group. This function first loads all field data. * It then determines the subgroups in the following way: * * The caveat with this implementation is that the XML structure must be known before a group can be loaded. A schema * must therefore exist describing the allowable XML structures. However, because this is data that is being loaded * from the database, it was (necessarily) first inputted into the database. Therefore, it is always the case * that we know schema at this point in the processing. * * @param array $row The fields, indexed by column name, of data for a given group. Note that all columns in all * MySQL tables are indexed numerically. So "field" become "field_0", "field_1", "field_n". This * last "_n" is removed before calling the add_field() function. * @param array $data This array is indexed by "[parent_id]_[child-group-name]" and contains arrays corresponding to * rows in an MySQL table. * @return void This group object will be filled with data following a call to this function. */ function load_query_result($row,$data) { foreach($row as $field=>$val) { if($field == "template") { $group_names = $this->get_group_names(); $cols = explode(" ",$val); foreach($cols as $col) { $c = str_replace("/","",$col); if($this->contains_field($col) || in_array($c,$group_names)) { $template .= $col." 
"; } } $this->template = $template; } else if((trim($val) != "") && !ereg("id",$field)) { $val = ereg_replace("&","&",$val); $index = strrpos($field,"_"); if($index) { $field = substr($field,0,$index); } if($this->add_value($field,$val)) { //echo "parent: ".$this->parent->name." : $this->name : $field=$val
    "; } } } $id = $row[$this->name."_id"]."_".$this->name; foreach($this->groups as $group) { if(array_key_exists($id."_".$group->name, $data)) { foreach($data[$id."_".$group->name] as $row) { $new_group = clone $group; $new_group->id = $row[$group->name."_id"]; $new_group->load_query_result($row,$data); $this->group_vals[$new_group->name][] = $new_group; unset($new_group); } } } } /** * destroy_subgroups derefences all variables in this group and this group's subgroups. * * This function is required in PHP4. Because of the way PHP4 handles references, letting its trash collecting * system failed to eliminate many of the objects that got created. The consequence was that loading a large * XML file caused a memory shortage and the program to eventually crash (depending on the memory allocation * amount set in the PHP.ini script.
    * Specifically, even if an object was unset, if that object maintained a reference to another object via the $parent
    * variable, the parent object continued to exist, even if the parent was explicitly destroyed. For example,
    * A has a pointer to B. If A is unset and B is unset, B will continue to exist in memory unless the pointer in A is
    * explicitly unset. I do not know why this occurs or why the PHP guys chose this implementation. But it is no longer
    * an issue in PHP5. That is, if A and B are both unset, one does not need to explicitly unset any pointers in A to B
    * in order for B to be removed from memory.
    */
    function destroy_subgroups() {
        unset($this->parent);
        $this->parent = false;
        unset($this->field_vals);
        $this->field_vals = array();
        foreach($this->group_vals as $groups) {
            foreach($groups as $group) {
                $group->destroy_subgroups();
            }
        }
        unset($this->group_vals);
        $this->group_vals = array();
        $this->template = "";
    }
    /**
    * This function destroys all values in the $field_vals and $group_vals arrays.
    *
    * Originally, I believed this function was sufficient for removing an object from memory. However, as discussed in
    * the destroy_subgroups documentation, this is not the case. This function is still useful for simply
    * removing the values of an object.
    */
    function reset_values() {
        unset($this->field_vals);
        $this->field_vals = array();
        unset($this->group_vals);
        $this->group_vals = array();
        $this->template = "";
    }
    /**
    * A function to remove html encodings of special characters.
    */
    function unhtmlentities($string) {
        $trans_tbl = get_html_translation_table(HTML_ENTITIES);
        $trans_tbl = array_flip($trans_tbl);
        return strtr($string, $trans_tbl);
    }
    /**
    * to_ssv puts all the data for a given group into a two dimensional array indexed by group name containing
    * the rows corresponding to each group for an XML structure.
    *
    * The $head_tag is required as an argument because the head group (the one indicated by the head tag) is slightly
    * different from subgroups. Namely, the head group only has a single id, whereas subgroups have 3 ids. The main
    * group also has a template (see discussion of the template variable).
    * Note that this data will eventually be put into a file to be uploaded using the MySQL "LOAD" command. The load
    * command requires that every single column of a row be represented in the file containing the data to be uploaded.
    * Therefore, columns which are empty for a particular row are still included, as empty strings, and
    * the array being built for a particular row will contain a value for every column (the empty string
    * when there is no value). Here is how this is handled:
    *
    * Each row will then be imploded into a single string using the '^' character to enclose data, with each piece of
    * data separated by a space. This string will then be entered into the $result array, which is indexed by
    * group name and contains an array of strings.
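    * The padding and enclosing described above can be sketched as follows. This is only an illustration under
    * stated assumptions: build_ssv_row() is a hypothetical helper that handles field columns alone, whereas
    * to_ssv also places the id columns (and, for the head group, the template) in front of them.

```php
<?php
// Hypothetical helper: builds one '^'-enclosed, space-separated row.
// $fields maps field name => number of columns reserved for that field;
// $field_vals maps field name => list of values actually present.
function build_ssv_row(array $fields, array $field_vals)
{
    $cells = array();
    foreach ($fields as $field => $count) {
        $vals = array_key_exists($field, $field_vals) ? $field_vals[$field] : array();
        // Emit every value present, then pad with empty strings up to $count.
        $columns = max($count, count($vals));
        for ($i = 0; $i < $columns; $i++) {
            $cells[] = array_key_exists($i, $vals) ? $vals[$i] : "";
        }
    }
    $row = "";
    foreach ($cells as $cell) {
        $row .= "^" . $cell . "^ ";
    }
    return $row;
}

// "lx" has one column and no value; "ge" has two columns and one value.
$line = build_ssv_row(
    array("lx" => 1, "ge" => 2),
    array("ge" => array("dog"))
);
```

    * Every column is present in the result even when it is empty, which is what a bulk load of fixed-width
    * rows needs.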
    * Next, we must cycle through the subgroups of this group. This function is called recursively for each
    * instance of a subgroup. to_ssv returns an array of arrays indexed by group name containing arrays of rows.
    * The result of the recursively called function is then added to the result of the object making the
    * recursive call. The final result will be an array indexed by group name containing arrays of rows. Note
    * that the nested structure of subgroups is flattened here. For the purposes of inputting this data into the
    * SQL database, the relationship between one row and another does not matter. This information
    * is maintained via the parent_id. Eventually, there will be a separate file created for each unique table
    * of a project containing properly formatted (enclosed by the ^ symbol and space separated) lines corresponding
    * to future rows of the table.
    *
    * @param string $head_tag The outermost tag of the XML structure.
    * @return array An array of arrays indexed by group name containing an array of rows.
    */
    function to_ssv($head_tag) {
        $row = array();
        $row[$head_tag][0] = $this->main_id;
        if($this->name != $head_tag) {
            $row["parent_id"][0] = $this->parent_id;
            $row["id"][0] = $this->id;
        } else {
            $row["template"][] = $this->template;
        }
        foreach($this->fields as $field=>$count) {
            if(array_key_exists($field,$this->field_vals)) {
                foreach($this->field_vals[$field] as $val) {
                    $val = $this->unhtmlentities($val);
                    $row[$field][] = $val;
                }
                $num_fields = count($this->field_vals[$field]);
                for($i=$num_fields;$i<$count;$i++) {
                    $row[$field][$i] = "";
                }
            } else {
                for($i=0; $i<$count;$i++) {
                    $row[$field][$i] = "";
                }
            }
        }
        $s = "";
        foreach($row as $field=>$vals) {
            foreach($vals as $val) {
                $s .= "^$val^ ";
            }
        }
        $result[$this->name][] = $s;
        foreach($this->groups as $group) {
            if(array_key_exists($group->name, $this->group_vals)) {
                foreach($this->group_vals[$group->name] as $g) {
                    //echo "adding ".$g->name."
    "; $subgroup_tabledata = $g->to_ssv($head_tag); foreach($subgroup_tabledata as $table=>$rows) { foreach($rows as $row) { $result[$table][] = $row; } } } } } return $result; } /** * This returns an array of all groups containing $field. * * The array is ordered in key-value pairs of group_name=count where count is * the amount of fields which may exist in that group. For example, * if a group can have up to 3 lxa fields, the count is 3. * @param $field The field being looked up. * @return array The array is indexed by group name and contains the maximum number of fields allowed for that group. */ function find_group($field) { $group_names = array(); if(array_key_exists($field,$this->fields) ) { $group_names[$this->name] = $this->fields[$field]; } foreach($this->groups as $group) { $subgroup_names = $group->find_group($field); foreach($subgroup_names as $name=>$count) { if(!array_key_exists($name,$group_names)) { $group_names[$name] = $count; } } } return $group_names; } /** * I think this function was intended to produce dynnamically generated code for making an multi-dimensional array * equivalent in size to the group structure. I don't believe it's been tested. */ function make_tables_array($tables) { $s = "\"$this->name\" => array("; foreach($this->fields as $field=>$count) { $s .= "\"$field\"=>$count,"; } $s = substr($s,0,-1).")"; $tables[$this->name] = $s; foreach($this->groups as $group) { if(!array_key_exists($group->name, $tables)) { $group->make_tables_array($tables); } } } /** * to_xml produces an xml string containing all the information (not ids) contained in this group structure. * * This function though does not maintain the original ordering of the XML structure when it was first loaded * into the database. Rather, first the fields are printed out in order that they were inputted into the $field_vals * array. Second, the groups are printed out in order that they exist in the $groups (not $group_vals) array. 
*/ function to_xml() { $xml = "<$this->name>\n"; foreach($this->field_vals as $field=>$vals) { foreach($vals as $val) { if(trim($val) != "") { $xml .= "<$field>$val\n"; } } } foreach($this->groups as $group) { if(array_key_exists($group->name, $this->group_vals)) { foreach($this->group_vals[$group->name] as $g) { $xml .= $g->to_xml(); } } } $xml .= "name>\n"; return $xml; } /** * Prints out the XML structure of this group in the exact order it was originally inputted into MySQL. * In addition, the option to print with ids may be provided. * * This function requires a template to dictate the ordering of the XML. The template must obey the following rules: * * If an error is encountered with the template, i.e. a token is encountered which is invalid, the function will quit. * Errors include fields which are not part of the current subgroup, the start of subgroups which do not exist in the * current group, the end of a subgroup which is not the current group.
    * If $with_ids parameter is set to true, the XML structure will include the id of the current group though * not the parent or main id. * * @param string $template A space seperated string of the fields and subgroups describing the ordering of the * current group. * @param boolean $with_ids When set to true, the group id will be included in the XML structure being output. * @return string The XML structure of this group. */ function to_xml_by_template($template, $with_ids) { $xml = "<$this->name>\n"; if($with_ids) { $xml .= "<".$this->name."_id>$this->idname."_id>\n"; } $indices = array(); while(count($template) != 0) { $cur_tag = array_shift($template); if(!array_key_exists($cur_tag,$indices)) { $indices[$cur_tag] = 0; } $index = $indices[$cur_tag]; $indices[$cur_tag]++; if(array_key_exists($cur_tag,$this->fields)) { if(array_key_exists($cur_tag,$this->field_vals)) { $val = $this->field_vals[$cur_tag][$index]; } else { $val = ""; } $xml .= "<$cur_tag>$val\n"; } else if(array_key_exists($cur_tag,$this->group_vals)) { $group = $this->group_vals[$cur_tag][$index]; $subgroup_template = array(); while(($cur_tag = array_shift($template)) != "/".$group->name) { $subgroup_template[] = $cur_tag; } $xml .= $group->to_xml_by_template($subgroup_template,$with_ids); } else { print_r($this->field_vals); echo "Error: ".$this->field_vals["ref"][0]." : No $cur_tag group within this group object \n"; die("Error!"); } } $xml .= "name>\n"; return $xml; } /** * contains_field returns true if this group or any of its subgroups contains the field. * * Note that this does not search for whether a subgroup contains a value for the given field. Only * whether the group or a subgroup contains the field in the $fields array. has_value should * be used to determine if the current group contains a value for a given field. However, that function * only searches the current group but not any subgroups. Furthermore, the function * only searches down the tree, but not up it. 
* * @param string $field The field being search for. * @return boolean True if this group or any subgroup contains the field. */ function contains_field($field) { if(array_key_exists($field,$this->fields)) { return true; } else { foreach($this->groups as $group) { $result = $group->contains_field($field); if($result) { return true; } } } return false; } /** * This function produces the schema for the group. * * A schema has the following format:
    *
    * field-name
    *
    * field-name
    *

    *

    * The schema is used to create the group structure representing allowable XML structures. Please see the * documentation for the hierarchy_loader class to understand how this is accomplished. * @return string A string in XML of the schema representing this group. */ function structure_to_xml() { $xml = "name\">\n"; ksort($this->fields); foreach($this->fields as $field=>$count) { $xml .= "$field\n"; } foreach($this->groups as $group) { $xml .= $group->structure_to_xml(); } $xml .= "\n"; return $xml; } /** * write_schema writes to disk, under the provided file name, the schema for this group. * * @param string $file_name The name of the file written to disk. */ function write_schema($file_name) { $out = fopen($file_name,"w"); fwrite($out,$this->structure_to_xml()); fclose($out); } /** * This function sets the $parent variable in each of the groups in the $groups array to this group. * * I don't know if this function is ever used or why exactly I wrote it in the first place. */ function set_parent() { $names = array_keys($this->groups); foreach($names as $name) { $g = $this->groups[$name]; $g->parent = $this; $g->set_parent(); } } /** * get_field_names returns an array sorted alphabetically of all the fields in this group and all of its * subgroups. * * @return array An alphabetically sorted array of all the fields in this group and all of its subgroups. */ function get_field_names() { $fields = array(); foreach($this->fields as $field=>$count) { $fields[] = $field; } foreach($this->groups as $group) { $fields = array_merge($fields,$group->get_field_names()); } $fields = array_unique($fields); sort($fields); return $fields; } /** * Produces a shoebox formatted string of the values of the fields within this group. * * @param array $group_field_orders An associative array indexed by group name containing arrays of the * fields for the group in the order they are to be reordered. 
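    *
    * As an illustration of the output format, the sketch below writes one backslash-marked line per value in
    * the requested field order. shoebox_entry() is a hypothetical helper, not part of this class; it assumes
    * the order array is keyed by field name and it leaves out subgroups.

```php
<?php
// Hypothetical helper: emits "\field value" lines following $field_order.
// $field_order maps field name => position; $field_vals maps field name => values.
function shoebox_entry(array $field_order, array $field_vals)
{
    $result = "";
    foreach ($field_order as $field => $index) {
        if (!array_key_exists($field, $field_vals)) {
            continue; // Fields without values produce no line.
        }
        foreach ($field_vals[$field] as $value) {
            $result .= "\\" . $field . " " . $value . "\n";
        }
    }
    return $result;
}

// Values are stored in a different order than the one requested.
$entry = shoebox_entry(
    array("lx" => 0, "ps" => 1, "ge" => 2),
    array("ge" => array("dog", "hound"), "lx" => array("perro"))
);
```

    * The output follows the order array rather than the order in which values were stored, and a field that
    * occurs several times yields one line per value.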
*/ function produce_reordered_shoebox_entry($group_field_orders) { $result = ""; $field_order = $group_field_orders[$this->name]; foreach($field_order as $field=>$index) { if(array_key_exists($field,$this->field_vals)) { $values = $this->field_vals[$field]; foreach($values as $value) { $result .= "\\".$field." ".$value."\n"; } } else if(array_key_exists($field,$this->group_vals)) { $groups = $this->group_vals[$field]; foreach($groups as $g) { $result .= $g->produce_reordered_shoebox_entry($group_field_orders); } } } return $result; } /****************************** assign_fields ********************************/ function assign_fields($existing_groups) { $this->fields = $existing_groups[$this->name]; foreach($this->groups as $group) { $group->assign_fields($existing_groups); } } } ?>