Wednesday, 19 March 2008

Chapter 4: Developing SSIS Packages.

Create a Package -

Use BIDs -
  • Needed because 'more complicated' packages cannot be created thro I/E Wizard.
  • Need to create a set of co-ordinated pckgs.

Use SQL's Import/Export Wizard -
I know this already!!

Create data flow by using the data flow designer -

Configure sources & destinations -
  • Navigate to data flow tab.
  • 3 types of obj in toolbox:
    1. Data flow source adapters - along with destinations, they use package connections. Only raw file adapter doesn't use PC.
  • 2. Data flow transforms - a transform performs an op. on one row at a time or several rows at a time. Types of tranforms - a) Logical row-lvl transform - performs op on rows without needing other rows. b) Multi-input or multi-output transforms - either can work with > 1 input or generate >1 output. Used to combine or branch data. Merge & Merge Join require sorted inputs. These 2 transforms will wait for rows from either input based on the defined sort order, to preserve the sorted output. Thus rows may not immediately be sent out on the output path. c) multi-row transforms - perform work based on criteria from multiple input rows, or generate multiple output rows from a single row. Such transforms can be more intensive in operation and memory overhead. Sort, Aggregate - these are blocking transdforms - all output rows are blocked, allowing the tranform to perform the work before sending rows downstream. d) Advanced data-prep transforms - e.g. script component, fuzzy lookup.
  • 3. Data flow destination adapters - Set ValidateExternalMetadata property -> false if e.g. a destination tbl is not available at design time, as it is created during execution. OLE DB Destination Adapter - can be configured for bulk insert, otherwise data is inserted 1 row at a time.
  • connect srcs, tranforms & destinations by connecting data paths (green or red).
Configure transformations -

Create control flow by using the control flow designer -
  • The control flow is the workflow engine & contains control flow tasks, containers & precedence constraints. In BIDs we define control flow objs with the control flow designer (1st tab).
  • (Dataflow is directly related to processing & transforming data from sources -> destinations).
  • 3 primary types of control flow objs:
  • 1. Control flow tasks - workflow objs that perform operation-lvl jobs.
  • 2. Containers - provide a grouping mechanism for tasks.
  • 3. Constraints - Allow one to connect tasks & containers and thus define exec order + precedence.

Control Flow Tasks -

  • Each CFT has an editor to configure it.
  • Pckg must contain at least 1 task.

Control Flow Containers -

  • A container allows one to grp tasks to control parellelization, ordering, logging & txns.
  • Containers also provide the ability to execute the tasks within them several times based on iterative requirements.
  • Sequence container - Lets you organise subordinate tasks by grping them together, & then apply txn or logging to the cont.
  • For Loop container - As Seq. Cnt. but also allow for running of tasks therein to be run multiple times, based on an eval. condition e.g. looping from 1 to 10.
  • ForEach Loop Cont. - instead of providing a condition expr. for looping, you loop over a set of objs such as files in a folder. Can iterate over diff. types of objects.
Sequence tasks by using precedence constraints -

Organize tasks by using containers -

Manage SSIS package configurations -

Connect to different data source types by adding connection managers -
  • Data Sources - An SSIS Project object. live at project lvl and are found in Solution Explorer under logical folder 'Data Sources'. Once an existing data src is changed you need to open each package to synch the conn string inside each pckg, with the data src.
  • Connections - scoped @ pckg lvl. Connections can be based on project data sources (New Connection from Data Source...) or can stand alone within pckgs.

No comments: