Thursday, July 31, 2008

LO-COCKPIT extraction


The first and most important extraction method: LO-COCKPIT.

Positioning of LO-COCKPIT:



What is the Logistics Cockpit?
We can say that it is a newer technique to extract logistics information; it consists of a series of standard extract structures (that is, from a BW perspective, standard datasources) delivered in the Business Content.

Mind you, the navigation part will not be discussed here, only the theory required to understand the LO extraction process. The navigation steps will be mentioned only in brief.

Here we start...........

Data Extraction is the process of loading data from OLTP to OLAP (BW/BI). Here is an illustration...

Say I have a company in which thousands of transactions happen daily, all over the world. To analyze my business on a yearly or monthly basis, I am moving to SAP BW/BI so that I can generate reports and take business decisions.

Tomorrow I am going to load all the data captured up to yesterday from SAP R/3 to BW/BI. I do a full load for this. After completing this task, I need to load the transactions that happen from tomorrow onwards to BW. This can be done daily, weekly or monthly, based on the volume of transactions.
If there are thousands of transactions per day, I can use a daily load; tens of thousands, a weekly load; and if they run into lakhs (hundreds of thousands), a monthly load.

So, to be precise, data has to be extracted in two modes:

1. Full load - the entire data available at the source is loaded to BW/BI.
2. Delta load - only the new/changed/deleted data is loaded (a small sketch of the difference follows below).
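
To make the distinction concrete, here is a small, purely illustrative Python sketch (this is not SAP code; the rows, field names and the last-extraction bookmark are made up for the example):

```python
from datetime import datetime

# Toy "application table": each row carries the timestamp of its last change.
sales_orders = [
    {"order": "4711", "changed_at": datetime(2008, 7, 28, 10, 0)},
    {"order": "4712", "changed_at": datetime(2008, 7, 30, 15, 30)},
    {"order": "4713", "changed_at": datetime(2008, 7, 31, 9, 45)},
]

def full_load(rows):
    """Full load: everything available at the source goes to BW/BI."""
    return list(rows)

def delta_load(rows, last_extraction):
    """Delta load: only records created/changed after the last extraction."""
    return [r for r in rows if r["changed_at"] > last_extraction]

print(len(full_load(sales_orders)))                                # 3 records
print(len(delta_load(sales_orders, datetime(2008, 7, 30, 0, 0))))  # 2 records
```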


Full Load Data Flow:
Let us see how the data is loaded to BW in full mode.

Here we need to understand a few basic things that happen on the R/3 side.

Document posting means creating a transaction, i.e. writing into the application/transaction tables.
So whenever a sales order is created (a document is posted), the transaction is written into the database/application/transaction tables (e.g. EKKO, EKPO, VBAK, VBAP).

Whenever you are doing a full load, setup tables are used.

Setup tables:

Access to the application tables is not permitted during extraction, hence setup tables are there to collect the required data from the application tables.


When a load fails, you can re-run the load to pull the data from the setup tables; the data will still be there. Setup tables are used to initialize delta loads and for full loads. They are part of the LO extraction scenario.

With this option, you avoid pulling from R/3 directly, since the field values need to be brought together from multiple tables. You can see the data in the setup tables. The setup table name is the extract structure name followed by SETUP. Setup table names start with 'MC', followed by the application component ('01', '02', etc.), then the last digits of the datasource name, and finally SETUP.
We can also say it is the communication structure name (on the R/3 side; you can check it in LBWE as well) followed by 'SETUP'.

example: MC13VD0HDRSETUP
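
Going by the naming rule above, the setup table name is simply the extract/communication structure name with 'SETUP' appended. A trivial sketch (the second structure name below is only an example):

```python
def setup_table_name(extract_structure: str) -> str:
    # Naming rule from the text above: structure name + 'SETUP'
    return extract_structure + "SETUP"

print(setup_table_name("MC13VD0HDR"))  # -> MC13VD0HDRSETUP
print(setup_table_name("MC11VA0ITM"))  # -> MC11VA0ITMSETUP (example structure)
```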
  • If you want to check the data in the setup tables, have a look at transaction NPRT; there you can see the table names from which the data is being picked.
  • Setup tables are cluster tables and are used to extract the data from the R/3 tables (LO extractors).
  • Basically, each application, like SD-Billing, has its own setup tables, so when filling the setup tables we usually fill them for the entire application.
Ex: OLI7BW is for filling the setup tables for the SD (sales order) application.
OLI9BW is the T-code for the billing application.
  • When you fill the setup tables, the data from the different tables (VBAK, VBAP, VBRK, VBRP, etc.) comes through the communication structures and is saved in the setup tables.
  • The main advantage of having setup tables is that we can read the data at different levels: header level as well as item level.
  • When we run an init load or full load in BW, the data is read from the setup tables for the first time (the entire data is read). The delta records are updated to the delta queue once the V3 job runs, and we can extract the delta records from the delta queue.
  • Once we have successfully run the init, we can delete the setup tables.
  • Filling up the setup tables depends on the datasource.
There are different T-codes for the respective extract structures:

OLI1BW  INVCO Stat. Setup: Material Movements
OLI2BW  INVCO Stat. Setup: Stor. Loc. Stocks
OLI3BW  Reorg. PURCHIS BW Extract Structures
OLI4BW  Reorg. PPIS Extract Structures
OLI7BW  Reorg. of VIS Extr. Struct.: Order
OLI8BW  Reorg. VIS Extr. Str.: Delivery
OLI9BW  Reorg. VIS Extr. Str.: Invoices
OLIABW  Setup: BW Agency Business
OLIFBW  Reorg. Rep. Manuf. Extr. Structs
OLIIBW  Reorg. of PM Info System for BW
OLIQBW  QM Info System Reorganization for BW
OLISBW  Reorg. of CS Info System for BW
OLIZBW  INVCO Setup: Invoice Verification

OLI7BW is the T-code for sales orders.

Steps for full load:

*************** This will be updated later - the navigation part*********************


Delta Load:

As a prerequisite, we need to discuss the various update methods for the delta load.

  1. Serialized V3
  2. Queued Delta
  3. Direct Delta
  4. Unserialized V3
Before that, we need to understand V1, V2 and V3 updates. These are different work processes on the application server that take the update LUW from the running program and execute it. They are separated to optimize transaction processing.

For example:
If you create/change a purchase order (me21n/me22n), when you press 'SAVE' and see a success message (PO.... changed..), the update to underlying tables EKKO/EKPO has happened (before you saw the message). This update was executed in the V1 work process.

There are some statistics collecting tables in the system which can capture data for reporting. For example, LIS table S012 stores purchasing data (it is the same data as EKKO/EKPO stored redundantly, but in a different structure to optimize reporting). Now, these tables are updated with the transaction you just posted, in a V2 process. Depending on system load, this may happen a few seconds later (after you saw the success message). You can see V1/V2/V3 queues in SM12 or SM13.

Statistical tables are for reporting on R/3, while update tables are for BW extraction, and the data is stored redundantly in these two (three, if you include the application tables) sets of tables.
The difference is that update tables are temporary: the V3 job continually empties them (as I understand it), whereas the statistics tables keep accumulating all the data. Update tables can be thought of as a staging place on R/3 from where data is consolidated into packages and sent to the delta queue (by the V3 job).

Update tables can be bypassed (if you use 'direct' or 'queued' delta instead of V3) to send the updates (data) directly to the BW queue (delta queue). V3 is, however, better for performance, so it remains an option along with the others, and it uses the update tables.

Statistical tables have existed since the pre-BW era (for analytical reporting) and are still in use when customers want their reporting on R/3.

The structure of a statistical table might be different from that of the update table/BW queue, so even though they are based on the same data, they might be different subsets of the same superset.
V3 collective update means that the updates are only processed when the V3 job has run.

At the time of the OLTP transaction, the update entry is made in the update table. Once you have posted the transaction, it is available in the update table, waiting for the V3 job to run. When the V3 job runs, it picks up these entries from the update table and pushes them into the delta queue, from where the BW extraction job extracts them.
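
The following Python sketch is only a conceptual model of the three update types described above; it is not how the ABAP update task actually works, and the table/queue variables are made up for the illustration:

```python
# Conceptual model of what happens when one document is posted.
application_tables = []   # e.g. EKKO/EKPO - updated synchronously (V1)
statistics_tables  = []   # e.g. S012      - updated asynchronously (V2)
update_tables      = []   # staging for BW - processed later by the V3 job
delta_queue        = []   # read by the BW delta extraction

def post_document(doc):
    # V1: synchronous update - happens before the user sees the success message
    application_tables.append(doc)
    # V2: asynchronous statistics update - may lag a few seconds behind
    statistics_tables.append(doc)
    # V3: the update record is only parked here; nothing reaches BW yet
    update_tables.append(doc)

def run_v3_collective_job():
    # Scheduled via LBWE "Job control": drain the update tables into the delta queue
    while update_tables:
        delta_queue.append(update_tables.pop(0))

post_document({"po": "4500000001", "item": 10})
run_v3_collective_job()
print(len(delta_queue))  # 1 entry now waiting for the BW extraction
```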

  • Synchronous update (V1 update): the statistics update is carried out at the same time (synchronously) as the document update (in the application tables).
  • Asynchronous update (V2 update): the document update and the statistics update take place in different tasks.
So V1 and V2 updates do not require any scheduling activity.
  • Collective update (V3 update): as with V2, the document update is handled separately from the statistics update, but unlike the V2 update, the V3 collective update must be scheduled as a job (via LBWE).
Remember that the V3 update only processes the update data that was successfully processed by the V2 update.

-------------------------------------------------------------------------------------
Serialized V3:

Take an example of the same PO item changing many times in quick succession.
V1 (with its enqueue mechanism) ensures that the OLTP tables are updated consistently. The update table receives these update records, which may or may not end up in the correct sequence (as there is no locking there) by the time they reach BW. 'Serialized V3' was meant to ensure the correct sequence of update records going from the update tables to the delta queue (and then to BW).

Since update table records have the timestamp, when the V3 job runs, it can sequence these records correctly and thus achieve 'serialization'.

The problems in Serialized V3 update are:
  • Several changes in one second: for technical reasons, collective-run updates that are generated in the same second cannot be serialized. That is, the serialized V3 update can only guarantee the correct sequence of extraction data for a document if the document did not change twice in one second.
  • Different instances and time synchronization: it is easy to see how likely it is that, in a landscape with several application servers for the same environment, different times are displayed. The timestamp used for the sort order in our BW extractions is taken from the R/3 kernel, which uses the operating system clock. But, as experience teaches, the clocks on different machines generally differ and are not exactly synchronized. The conclusion is that the serialized V3 update can only ensure the correct extraction sequence for a document if the times are synchronized exactly on all system instances, so that the timestamp of the update record (determined from the local time of the application server) sorts the update data correctly.
  • The V2 update dependence: the serialized V3 update also has the weakness of depending on the successful conclusion of the V2 processing. The method can only ensure that the extraction data of a document is in the correct sequence (serialized) if no error occurs beforehand in the V2 update, since the V3 update only processes update data for which the V2 update was processed successfully. Independently of serialization, update errors that occur in the V2 update of a transaction and cannot be reposted mean that the still-open V3 updates for that transaction can never be processed. This can lead to serious inconsistencies in the data in the BW system.

Example:

Take a case where the first update (based on the earliest timestamp) to be processed is in language EN (for the same PO item). The V3 job then processes all update records of language EN in chronological sequence before moving on to the records of the next language. If an update in another language (for the same PO item) happened between two EN-language updates, it is processed only after all EN updates and thus ends up out of sequence.


[Figure: update records for the same PO item, ordered by timestamp, posted in two logon languages - EN shown in red, IT shown in blue]

In the figure above, all the documents in red (EN language) are processed first and only afterwards the blue ones (IT language), which is an inconsistency in the sequence.
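
A small Python sketch of the behaviour just described (a model only, not the real V3 coding): the records are sorted by timestamp, but processed one logon language at a time, so the IT record ends up after all EN records even though it was posted in between:

```python
# Update records for the same PO item; 'lang' is the user's logon language.
update_records = [
    {"ts": 1, "lang": "EN", "change": "qty = 10"},
    {"ts": 2, "lang": "IT", "change": "qty = 20"},
    {"ts": 3, "lang": "EN", "change": "qty = 30"},
]

def serialized_v3(records):
    # The collective run sorts by timestamp ...
    ordered = sorted(records, key=lambda r: r["ts"])
    # ... but works through one language at a time.
    languages = []
    for r in ordered:
        if r["lang"] not in languages:
            languages.append(r["lang"])
    delta_queue = []
    for lang in languages:
        delta_queue.extend(r for r in ordered if r["lang"] == lang)
    return delta_queue

print([r["ts"] for r in serialized_v3(update_records)])  # [1, 3, 2] - out of sequence
```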


Direct Delta (the second delta update method in our list)

With this update mode:
  • each document posting is directly transferred into the BW delta queue;
  • each document posting with delta extraction leads to exactly one LUW in the respective BW delta queues.
Just to recall, 'LUW' stands for Logical Unit of Work, and it can be considered an inseparable sequence of database operations that ends with a database commit (or a rollback if an error occurs).
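
A conceptual sketch of direct delta in the same toy style (nothing here reflects the real V1 coding): every posting writes its own LUW straight into the delta queue, which is why the number of LUWs grows with every single document change:

```python
delta_queue = []  # BW delta queue; in this model one entry = one LUW

def post_document_direct_delta(doc):
    # V1 update: the document is written to the application tables and, in the
    # same unit of work, the delta record goes straight into the BW delta queue.
    delta_queue.append([doc])   # exactly one LUW per document posting

for i in range(3):
    post_document_direct_delta({"order": f"100{i}", "change": i})

print(len(delta_queue))  # 3 LUWs - one per posting, nothing is summarized
```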

Benefits:
  • There is no need to schedule a job at regular intervals (through LBWE 'Job control') in order to transfer the data to the BW delta queues; thus, additional monitoring of update data or of an extraction queue is not required.
  • Logically, the restrictions and problems described for the 'serialized V3 update' and its collective run do not apply to this method: by writing to the delta queue within the V1 update process, the serialization of documents is ensured by the enqueue concept for applications and, above all, extraction is independent of the V2 update result.
Limits:
  • The number of LUWs per datasource in the BW delta queues increases significantly, because different document changes are not summarized into one LUW in the BW delta queues (as was previously the case with the V3 update). Therefore this update method is recommended only for customers with a low volume of documents (at most 10,000 document changes - creation, change or deletion - between two delta extractions) for the relevant application. Otherwise, the larger number of LUWs can cause dumps during the extraction process.
  • No documents can be posted during the delta initialization procedure, from the start of the recompilation run in R/3 (the setup-table filling job) until all records have been successfully updated in BW: every document posted in the meantime is irretrievably lost.
  • The V1 update would be too heavily burdened by this process.
(Remember that stopping the posting of documents always applies to the entire client.)


------------------------------------------------------------------------------------------

Queued Delta (the third update method)

With the queued delta update mode, the extraction data (for the relevant application) is written into an extraction queue (instead of into the update data, as with V3) and can be transferred to the BW delta queues by an update collective run, as previously executed during the V3 update.
After activating this method, up to 10,000 document deltas/changes are cumulated into one LUW per datasource in the BW delta queues.

If you use this method, it is necessary to schedule a job to regularly transfer the data to the BW delta queues.

As always, the simplest way to perform the scheduling is via the 'Job control' function in LBWE.
SAP recommends scheduling this job hourly during normal operation after a successful delta initialization, but there is no fixed rule: it depends on the peculiarities of every specific situation (business volume, reporting needs and so on).
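
The same toy model for queued delta (again only illustrative; the 10,000-change limit and the transaction names come from the description in this section): V1 fills the extraction queue and a scheduled collective run bundles the queued changes into LUWs in the delta queue:

```python
from collections import deque

extraction_queue = deque()  # filled in V1 at posting time (visible via LBWQ)
delta_queue = []            # filled by the collective run (the BW delta queue)

MAX_CHANGES_PER_LUW = 10000  # per the queued-delta description above

def post_document_queued_delta(doc):
    extraction_queue.append(doc)     # V1 only touches the extraction queue

def collective_run():
    # Scheduled e.g. hourly via LBWE "Job control": bundle queued document
    # changes into LUWs of at most 10,000 changes each.
    while extraction_queue:
        luw = []
        while extraction_queue and len(luw) < MAX_CHANGES_PER_LUW:
            luw.append(extraction_queue.popleft())
        delta_queue.append(luw)

for i in range(25000):
    post_document_queued_delta({"doc": i})
collective_run()
print([len(luw) for luw in delta_queue])  # [10000, 10000, 5000]
```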

Benefits:

  • When you need to perform a delta initialization in the OLTP system, thanks to the logic of this method the document postings (relevant for the involved application) can be opened again as soon as the execution of the recompilation run (or runs, if several are running in parallel) ends, that is, when the setup tables are filled and a delta init request has been posted in BW, because the system is able to collect new document data during the delta init upload as well (with one strong recommendation: remember to avoid running the update collective run before all delta init requests have been successfully updated in your BW!).
  • By writing to the extraction queue within the V1 update process (which is therefore more burdened than with V3), serialization is ensured by the enqueue concept, but the collective run clearly performs better than the serialized V3, and in particular the slowdown caused by documents posted in multiple languages does not apply to this method.
  • In contrast to direct delta, this process is especially recommended for customers with a high volume of documents (more than 10,000 document changes - creation, change or deletion - performed each day for the application in question).
  • In contrast to the V3 collective run (see OSS Note 409239, 'Automatically trigger BW loads upon end of V3 updates', in which this scenario is described), event handling is possible here, because a definite end of the collective run is identifiable: when the collective run for an application ends, an event (&MCEX_nn, where nn is the number of the application) is automatically triggered and can therefore be used to start a subsequent job.
  • Extraction is independent of the V2 update.
Limits:
  • V1 is more heavily burdened compared to V3.
  • Administrative overhead of the extraction queue.
Note:
  1. If you want to take a look at the data of all extract structure queues in the Logistics Cockpit, use transaction LBWQ or the 'Log queue overview' function in LBWE (although there you can only see the queues that currently contain extraction data).
  2. In the posting-free phase before a new init run in OLTP, you should always execute the update collective run once (as with the old V3) to make sure the extraction queue is emptied of any old delta records (especially if you are already using the extractor); otherwise these can cause serious inconsistencies in your data.
  3. If you then want to make changes (through LBWE or RSA6) to the extract structures of an application for which you selected this update method, you have to be absolutely sure that no data is left in the extraction queue before executing these changes in the affected systems (and especially before importing these changes into the production environment!).
    To perform a check when the V3 update is already in use, you can run the RMCSBWCC check report in the target system.


--------------------------------------------------------------------------------------
Unserialized V3 (the last one):

With this update mode, which we can consider the serializer's brother, the extraction data continues to be written to the update tables using a V3 update module and is then read and processed by a collective update run (scheduled through LBWE).
But, as the name of this method suggests, the unserialized V3 delta disowns the main characteristic of its brother: data is read in the update collective run without taking the sequence into account and is then transferred to the BW delta queues.

Issues:

  • Only suitable for data targets whose design does not depend on the correct sequence of changes, e.g. material movements.
  • The V2 update has to be successful.
When can this method be used?

Only if it is irrelevant whether or not the extraction data is transferred to BW in exactly the same sequence (serialization) in which it was generated in R/3 (thanks to a specific design of the data targets in BW and/or because the functional data flow does not require a correct temporal sequence).
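
For contrast, the same toy model for unserialized V3: the collective run drains the update tables in whatever order the records happen to be read, without sorting by timestamp, which is why this mode is only acceptable where the sequence of changes does not matter (the shuffle below merely stands in for "order not guaranteed"):

```python
import random

update_tables = [{"ts": 1, "change": "A"}, {"ts": 2, "change": "B"}, {"ts": 3, "change": "C"}]
delta_queue = []

def unserialized_v3_collective_run():
    # No sorting by timestamp: records reach the delta queue in whatever
    # order they are read from the update tables.
    read_order = update_tables[:]
    random.shuffle(read_order)
    delta_queue.extend(read_order)
    update_tables.clear()

unserialized_v3_collective_run()
print([r["ts"] for r in delta_queue])  # e.g. [2, 1, 3] - sequence is not guaranteed
```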




This concludes the update methods.

************************************************************

Some important points :

  • The setup tables are the base tables for the datasource used for a full upload.
  • Full update is possible in LO extractors. In a full update, whatever data is present in the setup tables (from the last init that was done) is sent to BW.
  • But setup tables do not receive the delta data from the deltas done after the init. So if your full update should get ALL the data from the source system, you will need to delete and re-fill the setup tables.
**************************************************************
Some Questions:

Question:
The serialized V3 update can guarantee the correct sequence for the extraction data of a document only if there were no errors in the V2 update. This is because the V3 update only processes update data for which a V2 update has been carried out successfully.
Why is V3 dependent on V2, what is V2 and V1 update?

Answer:
V1 - Synchronous update

V2 - Asynchronous update

V3 - Batch asynchronous update

These are different work processes on the application server that take the update LUW (which may contain various DB manipulation SQL statements) from the running program and execute it. They are separated to optimize transaction processing.

Taking an example -
If you create/change a purchase order (me21n/me22n), when you press 'SAVE' and see a success message (PO.... changed..), the update to underlying tables EKKO/EKPO has happened (before you saw the message). This update was executed in the V1 work process.

There are some statistics collecting tables in the system which can capture data for reporting. For example, LIS table S012 stores purchasing data (it is the same data as EKKO/EKPO stored redundantly, but in a different structure to optimize reporting). Now, these tables are updated with the txn you just posted, in a V2 process. Depending on system load, this may happen a few seconds later (after you saw the success message). You can see V1/V2/V3 queues in SM12 or SM13.

V3 is specifically for BW extraction. The update LUW for these is sent to V3 but is not executed immediately. You have to schedule a job (eg in LBWE definitions) to process these. This is again to optimize performance.

V2 and V3 are separated from V1 as they are not as real-time critical (they update statistical data). If all these updates were put together in one LUW, system performance (concurrency, locking, etc.) would be impacted.

Serialized V3 update is called after V2 has happened (this is how the code running these updates is written), so if you have both V2 and V3 updates from a transaction and V2 fails or is waiting, V3 will not happen yet.

BTW, 'serialized' V3 is discontinued now; in later releases of the PI you will have only unserialized V3.
--------------------------------------------------------------------
Question:

There are following tables
1. Application tables (R/3 tables)
2. Statistical tables (for reporting purpose)
3. update tables
4. BW queue

For application tables it is the V1 update, and for statistical tables the V2 update; is it that the same information is again redundantly stored in the update tables?

How are statistical tables different from update tables? I understood what statistical tables are; my question is: is the same information again redundantly stored in the update tables for the collective V3 update to pull the records to the BW queue?

Also, is the V3 collective update the same as a synchronous V3 update? How do the records get saved in the update tables?

Answer:

Statistical tables are for reporting on R/3 while update tables are for BW extraction. Is data stored redundantly in these two (three, if you include the application tables) sets of tables? Yes, it is.
The difference is that update tables are temporary: the V3 job continually empties them (as I understand it). This is different from the statistics tables, which keep accumulating all the data. Update tables can be thought of as a staging place on R/3 from where data is consolidated into packages and sent to the delta queue (by the V3 job).

Update tables can be bypassed (if you use 'direct' or 'queued' delta instead of V3) to send the updates (data) directly to the BW queue (delta queue). V3 is, however, better for performance, so it remains an option along with the others, and it uses the update tables.

Statistical table existed since pre BW era (for analytical reporting) and have continued and are in use when customers want their reporting on R/3.

The structure of statistical table might be different from the update table/BW queue, so, even though it is based on same data, these might be different subsets of the same superset.

V3 collective update means that the updates are going to be processed only when the V3 job has run. I am not sure about 'synchronous V3'. Do you mean serialized V3?

At the time of oltp transaction, the update entry is made to the update table. Once you have posted the txn, it is available in the update table and is waiting for the V3 job to run. When V3 job runs, it picks up these entries from update table and pushes into delta queue from where BW extraction job extracts it.
-----------------------------------------------------------------------------------
Question:

What do you mean by serialization? Is it the serialization between the sequence of records in the update tables and the sequence in the BW queue?
And can you explain a little more about the collective-run behaviour with different languages?

Answer:

The requirement in 'delta' capturing on R/3 side is to be able to capture the delta 'exactly once in order'.

Take an example of the same PO item changing many times in quick succession.

V1 (with enqueue mechanism) ensures that the OLTP tables are updated consistently. Update table gets these update records which may or may not end up in correct sequence (as there is no locking) when it reaches BW. 'Serialized V3' was to ensure this correct sequence of update records going from update tables to delta queue (and then to BW).

Since update table records have the timestamp, when the V3 job runs, it can sequence these records correctly and thus achieve 'serialization'. However, there is a technical problem with this. The timestamp recorded in update record is sent by the application server (where user executed the txn) and if there are multiple app servers there might be some inconsistency in their system time which can cause incorrect serialization.

Another problem is in the fundamental design of the V3 process. V3 Job sequences the updates on timestamp, and then processes the update records from update table (to send it to delta queue), but it does so for one language at a time (update record also has user logon language stored). Why this is done is not clear to me, but it is a basic design feature and can not be subverted.

This causes a potential issue if multiple logon languages are used by users. Serialization may not happen correctly in such a case. Take a case where the first update (based on earliest timestamp) to be processed is in language EN (for same PO item). V3 job is then going to process all the update records of language EN in chronological sequence before going to next language records. If another language update (for same PO item) happened in between two EN language updates, this is going to be processed later after all EN updates are processed and thus become out of sequence. The weblog mentions this scenario.

These two constraints remain for 'serialized V3', where 'serialization' could not truly be achieved. Hence newer PIs have discarded 'serialized V3' altogether, and you no longer have this option (if you are using a newer PI).

If you use 'serialized V3', you have to be clear that serialization may not always work in the two scenarios above (a multi-language environment, and multiple app servers or updates to the same records within the same second, as the timestamp has granularity only down to the second).

***************************************************************************

Now we will discuss about the functions of LO-COCKPIT:
  • Maintain Extract Structures: here you can add additional fields from the communication structures available to the extract structure.
  • Maintain DataSources: in the datasource maintenance screen, you can customize the datasource using the following fields: field name, short text, selection, hide field, inversion (cancellation field or reverse posting), and field only known in customer exit.
  • Activating update: by setting the update to active, data is written into the extract structures both online and during the filling of the setup tables (restructure tables / LO initialization tables). Depending on the update mode, a job has to be scheduled with which the updated data is transferred in the background into the central delta management (delta queue).
  • Controlling update: this concerns the delta update mode you are using and how you control the data load based on the volume of data. The LO Cockpit supports four update modes (the delta modes we have already discussed): serialized V3 update, direct delta, queued delta and unserialized V3 update.
***************************************************************************

With this, the theory required for LO-COCKPIT ends here. If you need the navigation steps, please leave a comment.

To download Ebook version of the above content, click here.

Extraction

First of all we will go through the extraction part which is the toughest and the most important.

As a first step we will discuss the various extraction methods, and at the end we will categorize these extractors and discuss why they have been categorized that way.

Methods:

1. LO-COCKPIT
2. CO-PA
3. FI-SL
4. LIS
5. Generic Extractor

DATA WAREHOUSE


What is a Data Warehouse?



"A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process".

subject-oriented : Data that gives information about a particular subject instead of about a company's ongoing operations.

integrated : Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.

time-variant : All data in the data warehouse is identified with a particular time period.

non-volatile : Data is stable in a data warehouse. More data is added but data is never removed. This enables management to gain a consistent picture of the business.


What is Data Warehousing?

Data warehousing is essentially what you need to do in order to create a data warehouse, and what you do with it. It is the process of creating, populating, and then querying a data warehouse.


Data Warehousing tools

SAP-BW, SAP-BI (7.0, the latest version).

There are three data warehousing functions:

1. Modeling
2. Extraction
3. Reporting

Modeling: this covers the architecture and design of the data objects - how you are going to store the data, where, and what type of data.

Extraction: this is about how you load the data from OLTP (R/3) to OLAP (SAP BW/BI). In this process you take care of filtering the data and adding more data based on business requirements. Here you have various extractors with which you can extract the data of a particular application.

Reporting: whatever has been done so far with the warehouse tool is ultimately meant to generate a report. Reporting is done based on business requirements, for the higher executives to take business decisions after analysing the reports.