1. What is Informatica exactly? Informatica is a popular ETL (extract, transform, load) tool used for data integration and data warehousing. It has a simple visual interface, is easy to use, and lets you schedule jobs to run automatically at a later time.
2. How does ETL work in Informatica? Data is extracted from a variety of sources (mainframes, RDBMSs, flat files, XML, VSAM, SAP, and more). The data is then converted to a common format, with any required queries applied, and loaded into a usable target database. This series of steps is called a mapping.
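The extract, transform, and load steps above can be illustrated outside Informatica with a minimal Python sketch. This is only an analogy, not Informatica's API: the sample data, the `orders` table, and the function names are all assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample sources: a flat file (CSV text) and rows from another system.
FLAT_FILE = "id,name,amount\n1,alice,10.50\n2,bob,7.25\n"
OTHER_ROWS = [{"ID": 3, "NAME": "Carol", "AMOUNT": "3.00"}]


def extract():
    """Read rows from every source as plain dictionaries."""
    rows = list(csv.DictReader(io.StringIO(FLAT_FILE)))
    rows.extend(OTHER_ROWS)
    return rows


def transform(rows):
    """Convert each row to one common format: (int id, title-cased name, float amount)."""
    common = []
    for row in rows:
        # Sources disagree on column-name casing, so normalize the keys first.
        normalized = {key.lower(): value for key, value in row.items()}
        common.append((
            int(normalized["id"]),
            str(normalized["name"]).title(),
            float(normalized["amount"]),
        ))
    return common


def load(rows, connection):
    """Write the transformed rows into the target table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS orders (id INTEGER, name TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    connection.commit()


def run_etl():
    """Run the three steps in order against an in-memory target database."""
    connection = sqlite3.connect(":memory:")
    load(transform(extract()), connection)
    return connection.execute(
        "SELECT id, name, amount FROM orders ORDER BY id"
    ).fetchall()
```

Calling `run_etl()` returns the loaded rows in the common format. In Informatica the same flow would be drawn visually as a mapping rather than written as code.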
3. What are the developer tools within Informatica? These are the PowerCenter Client tools, which enable the developer to design the transformation process (Designer), define the run-time processes for mappings (Workflow Manager), monitor session execution (Workflow Monitor), manage the repository (Repository Manager), and report on metadata (Metadata Reporter).
4. What is a transformation? A transformation is a repository object that manipulates the extracted data. Transformations can be active (changing the number of rows that pass through, changing transaction boundaries, or changing row types) or passive (doing none of these). A connected transformation is linked to other transformations or to a target table in the data flow; an unconnected transformation is not. These are the categories of transformations; many individual transformations are available within them.
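The active versus passive distinction can be shown with a small Python sketch. The two functions below are hypothetical stand-ins written for illustration, not Informatica objects: one keeps the row count unchanged, the other can change it.

```python
def uppercase_names(rows):
    """Passive: every input row yields exactly one output row."""
    return [{**row, "name": row["name"].upper()} for row in rows]


def filter_large_orders(rows, minimum=10):
    """Active: rows below the threshold are dropped, so the row count can change."""
    return [row for row in rows if row["amount"] >= minimum]


# Hypothetical sample rows.
rows = [{"name": "alice", "amount": 12}, {"name": "bob", "amount": 4}]
passive_out = uppercase_names(rows)      # same number of rows as the input
active_out = filter_large_orders(rows)   # fewer rows than the input
```

Here `passive_out` has the same number of rows as `rows`, while `active_out` keeps only the rows that pass the filter.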
5. What is the Integration Service? This is the Informatica service that runs workflows and handles their scheduling. Each data integration job is a workflow. Workflows can be run on demand, continuously, at a given time, or repeatedly at set intervals.
6. What are reusable schedules? In the Workflow Manager, you can name a particular schedule so that you can use it again. You can assign several workflows to the same schedule. Just be careful if you delete a reusable schedule: all the workflows assigned to it become invalid, so you need to assign a new schedule to them.
7. What is the difference between abort and stop? A stop order will not terminate the workflow until all running objects are completed. Abort stops the running object immediately.
8. What is the Lookup function? Incoming data is compared against a lookup table. When a match is found, the related data is returned. It works much like the VLOOKUP function in Excel.
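As a rough sketch of the lookup idea, the Python below enriches incoming rows with a value fetched from a lookup table. The table contents, the field names, and the `"UNKNOWN"` default are assumptions chosen for illustration, not Informatica behavior.

```python
# Hypothetical lookup table: country code -> country name.
COUNTRY_LOOKUP = {"DE": "Germany", "ES": "Spain"}


def add_country_name(rows, lookup_table, default="UNKNOWN"):
    """Compare each incoming row's code to the lookup table and attach the match."""
    enriched = []
    for row in rows:
        country = lookup_table.get(row["country_code"], default)
        enriched.append({**row, "country": country})
    return enriched


incoming = [
    {"order_id": 1, "country_code": "DE"},
    {"order_id": 2, "country_code": "FR"},  # no match in the lookup table
]
enriched_rows = add_country_name(incoming, COUNTRY_LOOKUP)
```

The first row picks up its related value from the table, and the second falls back to the default because its code is not found.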
9. What are the index and data caches? The index cache stores only the key values used to find a row (for example, the columns in a lookup condition). The data cache stores the actual data values associated with those keys.
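One way to picture the split between the two caches is the Python sketch below. It is a simplified model under stated assumptions, not how Informatica stores its caches internally; the sample rows and the names `index_cache`, `data_cache`, and `lookup` are hypothetical.

```python
# Hypothetical lookup source: customer_id is the key column.
LOOKUP_ROWS = [
    (101, "Alice", "Berlin"),
    (102, "Bob", "Madrid"),
]

# Index cache: key value -> position of the matching entry in the data cache.
index_cache = {}
# Data cache: the actual values returned when a key matches.
data_cache = []

for customer_id, name, city in LOOKUP_ROWS:
    index_cache[customer_id] = len(data_cache)
    data_cache.append((name, city))


def lookup(customer_id):
    """Return the cached values for a key, or None when there is no match."""
    position = index_cache.get(customer_id)
    return None if position is None else data_cache[position]
```

A call such as `lookup(102)` first consults the index cache for the key and only then reads the values from the data cache.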
10. What are batches and sessions? A batch is a collection of sessions to be run either sequentially or all at the same time. Each session is a set of instructions telling how and when to move data from a source to a target.
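The difference between a sequential and a concurrent batch can be sketched in Python. The sessions here are hypothetical stand-in callables rather than real Informatica sessions, and a thread pool is only one possible way to model running them at the same time.

```python
from concurrent.futures import ThreadPoolExecutor


def make_session(name):
    """Build a stand-in session: a callable that reports which session ran."""
    def session():
        return f"{name} done"
    return session


def run_sequential(sessions):
    """Run each session only after the previous one has finished."""
    return [session() for session in sessions]


def run_concurrent(sessions):
    """Submit all sessions to a thread pool and collect results in submission order."""
    with ThreadPoolExecutor(max_workers=max(1, len(sessions))) as pool:
        futures = [pool.submit(session) for session in sessions]
        return [future.result() for future in futures]


# Hypothetical batch of two sessions.
batch = [make_session("load_customers"), make_session("load_orders")]
```

Both `run_sequential(batch)` and `run_concurrent(batch)` return one result per session; they differ only in whether the sessions wait for each other.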