Showing posts with label dataflow. Show all posts
Showing posts with label dataflow. Show all posts

Thursday, March 22, 2012

DataSet table as DataFlow Destination Component

Is it possible to use a datatable as destination component in dataflow ?

Sample code Fr MS:
Create the source component.
IDTSComponentMetaData90 source = dataFlow.ComponentMetaDataCollection.New();

source.ComponentClassID = "DTSAdapter.OleDbSource";
CManagedComponentWrapper srcDesignTime = source.Instantiate();
srcDesignTime.ProvideComponentProperties();

// Create the destination component.
IDTSComponentMetaData90 destination = dataFlow.ComponentMetaDataCollection.New();

destination.ComponentClassID = WHAT ? for dataset Table ?

CManagedComponentWrapper destDesignTime = destination.Instantiate();
destDesignTime.ProvideComponentProperties();

thanks in advance
PatrickI saw that the datareader can be used as Destination.
So, my "ask for" is closing.
Thanks
Patricksql

dataset object dispose after first use

I have an Execute SQL Task that returns a dataset to variable DfltValData. A dataflow follows that with a script component that access that dataset (read only variable) (see code below) and everything is fine. Now, after that, there's another dataflow with a script component, with the same code as below, trying to access DfltValData. Here is where the problem is, the DfltValData object does not contains any row. Whats happening and how to solve this?

Thanks!

Dim olead As New Data.OleDb.OleDbDataAdapter
Dim dt As New Data.DataTable
Dim row As System.Data.DataRow
olead.Fill(dt, Me.Variables.DfltValData)
For Each row In dt.Rows
.

.

.// read value from row

.HAHAHAHA!! Not to laugh at your problem, but search a bit and you'll find a VERY recent discussion on this very topic.|||http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1406350&SiteID=1|||

Use a MULTICAST if you need to work with same rows again.

There is a code work-around (shown in the forum post Phil referenced), but it's ugly.

|||I don't understand how multicast would work if we have multiple dataflow.|||

Sorry, I didn't read the original post closely enough.

If the basic problem is that you want to reuse the data in another data flow, you could use a multicast in your first data flow, and output the data to a raw file. Then use a raw file source to pull it into your second data flow.

|||

jwelch wrote:

Sorry, I didn't read the original post closely enough.

If the basic problem is that you want to reuse the data in another data flow, you could use a multicast in your first data flow, and output the data to a raw file. Then use a raw file source to pull it into your second data flow.

Exactly. Or simply use a multicast to populate 2 variables in the first place with exactly the same data.

-Jamie

Monday, March 19, 2012

Datareader Destination as source for other datareader source ?

HI!

as far as I know from docs and forum datareader is for .NET data in memory. So if a use a complex dataflow to build up some data and want to use this in other dataflow componens - could i use data datareader source in the fist dataflow and then use a datareader souce in the second dataflow do read the inmemoty data from fist transform to do fursther cals ?

how to pass in memory data from one dataflow to the next one (i do not want to rebuild the logic in each dataflow to build up data data ?

Is there a way to do this ? and is the datareader the proper component ? (because its the one and only inmemory i guess, utherwise i need to write to temp table and read from temp table in next step) (I have only found examples fro .NET VB or C# programms to read a datareader, but how to do this in SSIS directly in the next dataflow?

THANKS, HANNES

Hannes,

You would need to build an ADO.Net provider for SSIS packages in order to do this. We have built one at Conchango but I think my boss is reluctant for us to just give it away.

-Jamie

|||

Hi Jamie!

I could understand your boss..

I would expect there is some integrated way because it is essential to get data from one dataflow to an other dataflow.

Do you know any othere method (without temp tables) to move data from one dataflow to to next?

I do not understand the difference between datareader and oledb source/dest? is there reader only for .NET Programming, oder is there a special purpose for this objects i do not realize (it seems to be the same escape for integration with other .NET data consumer which could read the Datareader...)

THANKS, HANNES

|||

hannes,

I have just realised that I completely misunderstood your first post. I apologise for that.

The answer I gave previously applies if you want to pass data from one PACKAGE to another. I realise now that what you want to do is pass data from DATA-FLOW to another. You have 2 options for doing that. Raw files or ADO recordsets.

I compare the two here: http://blogs.conchango.com/jamiethomson/archive/2006/06/28/4159.aspx and its pretty conclusive about which you should use.

Again, my apologies for misreading your initial post!

-Jamie

P.S. Hannes, are you still at MISAG?

|||

That is it, but what I cannot understand why there is no in memory way.

THANKS

PS: No! - AVENESS - see signature for further details (for the moment only in german)

|||

There is - the ADO recordset destination provides this functionality. Admittedly you have to write code to access it again in a script source component but this is because the metadata of the recordset is not (and cannot be) persisted in the Object variable at design-time.

-Jamie

DataPump performance problem

I'm using an SSIS under SQL 2005; i transfer data with a DataFlow Component:

DataSource = OLEDB Source;

Destination = OLEDB Destination

The source is a View returning about 100000 records; if i execute it in query analyser it takes few seconds. But if i use it as DataPump Source it takes HOURS to execute it...

I've noticed that If i modify the SELECT query inserting a "TOP 10000000000", it takes few seconds..

Why? Am i missing something?

Could I ask what would be wrong with just always using Select TOP 100% in your query? If adding the TOP 100% explicitly speeds things up that much, then that is what I would do.|||

Netoblivion wrote:

I'm using an SSIS under SQL 2005; i transfer data with a DataFlow Component:

DataSource = OLEDB Source;

Destination = OLEDB Destination

The source is a View returning about 100000 records; if i execute it in query analyser it takes few seconds. But if i use it as DataPump Source it takes HOURS to execute it...

I've noticed that If i modify the SELECT query inserting a "TOP 10000000000", it takes few seconds..

Why? Am i missing something?

Read this article:

SELECT *... or select from a dropdown in an OLE DB Source component?
(http://blogs.conchango.com/jamiethomson/archive/2006/02/21/2930.aspx)

as this could well be the problem.

Let us know!!

Thanks

-Jamie

|||Thanks Jamie that really helped me to solve this problem!!!

Sunday, March 11, 2012

Dataflow with Workflow

I know the idea was to seperate workflow and dataflow, but I have come across a scenario where it would be useful for a branch of a dataflow to wait until another branch has finished.

I have some transactional data which records events for the start and end of a session. I want to build a list of unique sessions with the start and end date. I currently have the list of events sorted by time, followed by a conditional split for the start and end events. I can then insert all of the start events and would like to wait until all of the starts are inserted before updating them with their relevant end times.

Is this achievable?

Does anyone else think it would be a good idea to be able to set precendence across multiple branches of a data flow?

Does anyone have a better solution?

I know this is the wrong forum, but is there a way to model this against the transactional data in SSAS, I will move this question to the SSAS forum if anyone can think this would work!

Philip Coupar

Unfortunately, setting execution precendences in between branches in the data flow is not supported today. Until we redesign the engine to support this feature, I would consider other possibilities to solve the problem, such as using a combination of task flow and data flow.

|||A simple strategy is to use raw files. Stage the updates in a raw file and use a second Data Flow Task to load them and run the updates. Raw Files are very efficient.|||

Philip Coupar wrote:

I know the idea was to seperate workflow and dataflow, but I have come across a scenario where it would be useful for a branch of a dataflow to wait until another branch has finished.

I have some transactional data which records events for the start and end of a session. I want to build a list of unique sessions with the start and end date. I currently have the list of events sorted by time, followed by a conditional split for the start and end events. I can then insert all of the start events and would like to wait until all of the starts are inserted before updating them with their relevant end times.

Is this achievable?

Does anyone else think it would be a good idea to be able to set precendence across multiple branches of a data flow?

Does anyone have a better solution?

I know this is the wrong forum, but is there a way to model this against the transactional data in SSAS, I will move this question to the SSAS forum if anyone can think this would work!

Philip Coupar

Phil,

As Darren said, raw files are the solution here.

On a similar vein though I have requested some functionality that would allow you to prioritise destinations:

Intrinsic Flow Priority
(https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=178058)

Would this help you?

-Jamie

|||

Raw files may well be very efficient, and space may be cheap but I already have a stage table with 4 million transaction rows which breaks out to 2 million start events and 2 million end events. I don't really want to read 4 million rows from a set of text files into a SQL Server table to read back out all the rows while applying some transformation logic to it, to split on one of the transformed columns to put half the data back out to a file, so that at some later point I can pick up that file again and update my 2 million start events with their end time.

That is just the incremental data, to rebuild the entire model would involve reprocessing 250 Million transaction rows, the pipeline uses buffers to manage the number of rows in memory, the raw file in this case would require 125 Million rows to be written out. I know I could batch up the rebuild into 4 Million rows sets, but I quite like the "Read Once, Transform Many, Write Once" mentality.

|||

Jamie Thomson wrote:

Phil,

As Darren said, raw files are the solution here.

On a similar vein though I have requested some functionality that would allow you to prioritise destinations:

Intrinsic Flow Priority
(https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=178058)

Would this help you?

-Jamie

That looks great, I hope anyone else following this tread will have a look and vote. My only additional request to your proposal would be to include the OLEDB Command Component as well as the destinations so that precendence could be specified over inserts and updates.

As well as the scenario above we have a number of data feeds which have data columns split across multiple source files and not all sets of file contain distinct data sets, sometime a file can contain data items that relate to data from a previous set of files. This introduces exactly the same scenario where I would like to insert all new rows and after that use other data in the data flow to update these or existing rows.

|||

Philip Coupar wrote:

Jamie Thomson wrote:

Phil,

As Darren said, raw files are the solution here.

On a similar vein though I have requested some functionality that would allow you to prioritise destinations:

Intrinsic Flow Priority
(https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=178058)

Would this help you?

-Jamie

That looks great, I hope anyone else following this tread will have a look and vote. My only additional request to your proposal would be to include the OLEDB Command Component as well as the destinations so that precendence could be specified over inserts and updates.

Agree with that totally.

Note to anyone else. Don't just vote - that is useless. Add a use case for why you want this functionality (as Phil has done).

Philip Coupar wrote:

As well as the scenario above we have a number of data feeds which have data columns split across multiple source files and not all sets of file contain distinct data sets, sometime a file can contain data items that relate to data from a previous set of files. This introduces exactly the same scenario where I would like to insert all new rows and after that use other data in the data flow to update these or existing rows.

Hmmm...the scenario I have in mind is that you either insert a row or update it. But I guess there is no reason why this "Intrinsic Flow Priority" feature couldn't be used to both insert a record and then update it. Interesting....

One more thing Phil, can you make sure you add all the details from your use case on this thread to the connect posting? They'll take more notice of it if you give them more reasons to.

Ta

Jamie

Dataflow with 2 lookup using the same Table

Hi there

I'm getting into trouble everytime I want to setup a dataflow containing 2 or more flows, where I want to make a lookup into 1 other table.

This fails, of some reason I do not know of.

ex.

Table 1 -> Lookup Currency Table -> Table2 with currency
Table 3 -> Lookup Currency Table -> Table 4 with currency

One of the lookup will fail.

How do you avoid this?

/Bjarne

You meant, you have more than 1 source component in your data flow that goes independently of each other toward different lookup transformations?

How are you setting the lookup transformations? You shoud use a query to get only the columns you need and since the lookup table is used concurrently; you may want to use nolock hint to avoid deadlocks.

Are you actually receiving any error message?

Rafael Salas

Dataflow where sourcecolumns vary ?

Is it possible to set up at dataflow with for example an excel source that points to a sheet with varying number of colums.

In the file there is some information about number of columns. But the question is here how you use that information to define the columns in the dataflow ?

The columns in the data-flow need to be known at design-time so by and large you cannot chage things at runtime.

There is a workaround though - you can change a package that you are about to execute using the execute package task. Here's how courtesy of Kirk Haselden: http://www.sqljunkies.com/WebLog/knight_reign/archive/2005/12/31/17731.aspx

-Jamie

dataflow to excel - Convert numbers stored as text to numbers Excel Cell Error

I'm trying to write data to excel from an ssis component to a excel destination.

Even thought I'm writing numerics, every cell gets this error with a green tag:

Convert numbers stored as text to numbers

Excel Cells were all pre-formated to accounting 2 decimal, and if i manually type the exact data Im sending it formats just fine.

I'm hearing this a common problem -

On another project I was able to find a workaround for the web based version of excel, by writing this to the top of the file:

<style>.text { mso-number-format:\@.; } </style>

is there anything I can pre-set in excel (cells are already formated) or write to my file so that numerics are seen as numerics and not text.

Maybe some setting in my write drivers - using sql servers excel destination.

So close.. Thanks for any help or information.

You received some sugestion on your other thread: http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1392693&SiteID=1

Opening new threads for same problem makes more dificult to help you.

Dataflow task -> Error Handling

Hi,
In terms of data flow tasks, when say we load text files into databases.

Is it possible to have it in a way so that if a certain record (line in the text file) fails to load due to watever reason, it gets written to another table, but the rest of the records still get loaded?

I try to do so and end up with the whole data flow task failing and it stalls at the record that had the error and doesn't seem to continue forward.

I just used the red arrow (on failure) and put that to another SQL destination object. But yeah that didnt work.

If someone has a better way of doing so, would be awesome if you can share that.

Cheers

You have to set the 'Access mode' property of 'OleDB DEstination' OpenRowset.It should not be fastload.Then set ErrorOuput as Redirect Row. Now you can get your erroneous rows redirected.

This should do.

|||yup...i didnt know about the accessmode property. that did it. thanks
|||Dear all,

I face the same problem but I didn't find the 'Access mode' property of 'OleDB DEstination' that contain OpenRowset its only contain Tabel and View with Fastload. Where I can find that property ?
And what if the error occured in Flat File Source ? can we still using the same solution ? Thanks in advance.

Best Regards,

Hery|||Dear all,

Please help me on this.

Thanks in advance.

Dataflow task -> Error Handling

Hi,
In terms of data flow tasks, when say we load text files into databases.

Is it possible to have it in a way so that if a certain record (line in the text file) fails to load due to watever reason, it gets written to another table, but the rest of the records still get loaded?

I try to do so and end up with the whole data flow task failing and it stalls at the record that had the error and doesn't seem to continue forward.

I just used the red arrow (on failure) and put that to another SQL destination object. But yeah that didnt work.

If someone has a better way of doing so, would be awesome if you can share that.

Cheers

You have to set the 'Access mode' property of 'OleDB DEstination' OpenRowset.It should not be fastload.Then set ErrorOuput as Redirect Row. Now you can get your erroneous rows redirected.

This should do.

|||yup...i didnt know about the accessmode property. that did it. thanks
|||Dear all,

I face the same problem but I didn't find the 'Access mode' property of 'OleDB DEstination' that contain OpenRowset its only contain Tabel and View with Fastload. Where I can find that property ?
And what if the error occured in Flat File Source ? can we still using the same solution ? Thanks in advance.

Best Regards,

Hery|||Dear all,

Please help me on this.

Thanks in advance.

Thursday, March 8, 2012

Dataflow task -> Error Handling

Hi,
In terms of data flow tasks, when say we load text files into databases.

Is it possible to have it in a way so that if a certain record (line in the text file) fails to load due to watever reason, it gets written to another table, but the rest of the records still get loaded?

I try to do so and end up with the whole data flow task failing and it stalls at the record that had the error and doesn't seem to continue forward.

I just used the red arrow (on failure) and put that to another SQL destination object. But yeah that didnt work.

If someone has a better way of doing so, would be awesome if you can share that.

Cheers

You have to set the 'Access mode' property of 'OleDB DEstination' OpenRowset.It should not be fastload.Then set ErrorOuput as Redirect Row. Now you can get your erroneous rows redirected.

This should do.

|||yup...i didnt know about the accessmode property. that did it. thanks
|||Dear all,

I face the same problem but I didn't find the 'Access mode' property of 'OleDB DEstination' that contain OpenRowset its only contain Tabel and View with Fastload. Where I can find that property ?
And what if the error occured in Flat File Source ? can we still using the same solution ? Thanks in advance.

Best Regards,

Hery|||Dear all,

Please help me on this.

Thanks in advance.

DataFlow Task & Filters

Hi,

I am getting data from an external source. External data has a column called "Type". I have a variable in my package which contains the list of types as shown below:

Filtered_type_List = 2,4,8,10,11

If this variable(Filtered_type_List) is blank, then I need all the data from the external source and if it is not blank then I only need the records matching to his list. How can I implement this in DataFlow Task?

Thanks

You could do this in an expression. Something like:

"SELECT * FROM MyTable " + (LEN(MySSISVariable) != 0 ? "WHERE MyColumn IN (" + MySSISVariable + ")" : "" )

That expression will (I think) add a WHERE clause if the length of the string inside the variable (which I have called MySSISVariable) is not zero.

HTH

-Jamie

|||

Hi Jamie,

Where should I put this "Select" statement,

1. Source using SQL Command as variable using OLE DB Source or

2. Lookup transformation

Thanks

|||

OLE DB Source. Set it to 'SQL Command from variable' and paste the expression that I provided above into the variable expression. The variable will require EvaluateAsExpression=TRUE.

-Jamie

Dataflow Tab:There is no ODBC Source option in the Toolbox

I need to extract data from tables in a database that I can only access via ODBC.

I have successfully created a connection in Connection Manager (ConnectionManagerType = ODBC) for this database.

However I’m unable to add this connection as a Data Flow Source. There is no ODBC Source option in the Toolbox.

This is a major because we have been using the system dsn Microsoft Visual Foxpro Driver to access free table directory .dbf files under ODBC with DTS for years. To install a new Microsoft OLEDB driver for foxpro is out of the question on a production system as it would cost many thousands of dollars to go through our BAT testing process

How do I extract data from tables in a database via ODBC?

Thanks in advance

Dave

There is too, though it's not marked as such. Use the Data Reader Source.|||

You can use the script component as source.

|||

Thanks for that the data reader souce can actually use a ODBC source. I have it working fine.

I do think there is a peformance overhead and its slower than the ODBC connector in DTS.

|||

well that is interesting How can it connect to a foxpro file .dbf and matching .fpt file the .fpt files are used for memo text fieldsthe foxprpro ODBC driver does this for you behind the scenes

|||

Vijay Thirugnanam wrote:

You can use the script component as source.

True, though it shouldn't be faster than using the prepackaged source connectors.

DataFlow suggestion

I am transfering data from a textfile to sql server.I use a data flow task for trasfering my text files.

Here is what i do.

1.Add text file source

What i want to achieve here is if the text file countains the column name in the first row i should delete them and if it does not contain column name in the first row just transfer it.

how can this be achieved?

2.add one more column to my text file which should contain the status(insert or update).

how can this be done?

3.before transfering data ot destination i want to know if the record exists if exists i just want to update it instead of insert.and if new record i want to insert it .and the status in the above new column need to change.

please help...

Not sure without looking at your data, but look into using the conditional split transformation to perform your logic checks. Use a derived column transformation to add a column.|||

sureshv wrote:

1.Add text file source

What i want to achieve here is if the text file countains the column name in the first row i should delete them and if it does not contain column name in the first row just transfer it.

how can this be achieved?

There is a property on the flat file connection manager that you can configure for that (ColumnsNamesInFirstDatarow).

sureshv wrote:

2.add one more column to my text file which should contain the status(insert or update).

how can this be done?

Use a derived column to add extra columns to data flow pipeline

sureshv wrote:

3.before transfering data ot destination i want to know if the record exists if exists i just want to update it instead of insert.and if new record i want to insert it .and the status in the above new column need to change.

There are tons of threads discussing that issue; the most popular is to use Lookup transform against the destination table; if a match occurs means the row already exists; otherwise it does not.

|||

In the first case.

Data is just a text file with column names in the first row.

eg:

name address telno

xxx yyy zzz

xxx sdfdsf sdfsf

But i dont want the column names coming from text file. instead i want to have my own column names(which i can and know to create)

If i transter data as in the text file the column names would appear in the first row.just to get rid of it i want to remove column name from text file.

In second case:

I use derived column to add a column,and conditional split for bad data.

My point is in the new column how should i insert the status(insert) or )update.

what should i use.

|||

There is a property on the flat file connection manager that you can configure for that (ColumnsNamesInFirstDatarow).

I dont want to use this i can have and want to have my own column names.WHich can be done and i know about it.

please look at my reply earlier

|||

sureshv wrote:

My point is in the new column how should i insert the status(insert) or )update.

see if this helps:

http://blogs.conchango.com/jamiethomson/archive/2006/09/12/SSIS_3A00_-Checking-if-a-row-exists-and-if-it-does_2C00_-has-it-changed.aspx

you could add a derived column transform at after each lookup transform and add there a column with the expected value(insert/update)...

Dataflow slows dramatically down after passing 1.6 million records ?

If i have a dataflow witch moves records from one OLE DB Source to a OLE DB Destination, the step slow down when getting above passing 1.5 - 1.6 million records.The Source is a SQL 2000 and the Destination is SQL Server 2005. the Package is run from a laptop with 1.5 GB ram

What can i try to help it ?Check the perf counters to see if the package starts swapping when the slowdown occurs. If this is the case then there is not much to be done. You could try adding more memory or making the destination run faster (i.e. drop indicies, faster spindle, etc). Is the destination on the same PC as the package is running? If so then they are competing for memory so you could run the package on a different PC. Also, the next CTP has a minor change that you may find helps with this.

Thanks,|||Start by checking in Task Manager how much memory the package (i.e. dtexec.exe) uses - does it grow all the time?
And when it slows down, are you out of memory + swapping starts + CPU usage goes down?
If this is the case - do you expect memory usage to grow like this, given your data flow? Which component could be gobbling up the memory? Can you change its settings to improve the situation? Or change the data flow to not use the troublesome component?
Or is there a memory leak that creates a problem?
If the bottleneck isn't memory at all, you obviously have to look in other places...

Cheers,
KDog|||

Matt David wrote:

Check the perf counters to see if the package starts swapping when the slowdown occurs.

How do you check the perf counters?
Regards,
Henk|||Use the Windows Performance Monitor (Administrative Tools-->Performance). In there, there is an IS object containing various counters that you can use to monitor performance.

There is a help file provided that explains how to use Windows Performance Monitor.

-Jamie

Dataflow error in lookup task : "Object was open.".

I Can't reproduce the error if I run the package stand-alone.

I'm using the same lookup call (same table, etc.) in 2 packages that are running in parallel (called by a parent package).

[LKP_UnderwriterId [72283]] Error: An OLE DB error has occurred. Error code: 0x80040E05. An OLE DB record is available. Source: "Microsoft OLE DB Provider for SQL Server" Hresult: 0x80040E05 Description: "Object was open.".

Anyone seen this one?

Googling threw this up: http://www.sharepointu.com/forums/m_23343/tm.htm

Could it be that something isn't installed properly on the machine on which you are running the package stand-alone (N.B. Can you clarify what you mean by "stand-alone"?)

-Jamie

|||If the package is run serially (so no other packages are running using a lookup against the same table) or run in the debugger, then no error occurs.

dataflow error

This error seem to be very silly.did anyone come across this error.

I have been transferring data from textfile to a table using oledb destination.

The number of records in the text file are 2,091,650

Its was running just fine couple of days ago when the incoming data was little small then this...(arround 300,000).Now it seem to have a problem.

Here is the flow

1.File System task ->I copy the file to different location

2.Execute sql task->truncate tables

3.DataFlow task->I check for only the error files in this data flow.and all valid rows i transfer to a different text file.

4.Dataflow->filesource i connect to new text file created earlier.Here i convert fields to repective datatype and i insert if new or update record.

I dont know whats going on...

When i run my package it runs through the first three perfectly fine.When it comes to fourth step it sits there.....it dosent go to the tasks within this dataflow at all...and begining i have flat files source...

What could be the reason...

when i look at progress tab...i was able to look at the progress of other tasks but when it comes to this task it shows start>>>>>time and it sits there...

Are you using the same file connection manager for the "new" file created in step 3 and used in step 4? If so, try creating a second file connection manager for the file and use that connection manager in step 4.|||Do you have step three hooked up to step four via a precedence constraint?|||

I checked it out using a same file connection and different connection manager as well...but it didnt work...seems to really silly problem....

And all my flows are perfectly connected with precedence constraint as well.

Then only change which was made to this package was increase in volumn of data.

How can i fix this?Is there any other solution?

I also checked executing the last dataflow task alone as the file had data in it.

But it sits there and dosent proceed.

|||

Suresh,

could you troubleshoot this by making a simple package to only process your flat file? If it fails, try cutting the file in pieces to find out if it has some formatting errors.

HTH.

DataFlow EngineThreads Error

Hi,
I have a SSIS package which is scheduled to run via SQL Agent. I have a data flow task within the SSIS Package which has a single source adapter and single destination adapter & lookup transformations.

The # of engine threads for the task is 5 (default). The max concurrent executables on the package is 20. The packge contains a whole lot of execute sql tasks etc.

The package after few (3-4) successful runs started to give following error for the data flow:

"The Data Flow task engine failed at startup because it cannot create one or more required threads"
This could be seen from the log files.
OR
"Data Flow task failed to create a required thread and cannot begin running. The usually occurs when there is an out-of-memory state."

Even if concurrent exectuables are set to -1 it is same.

Now the Package keeps on failing. What is the cause of this problem. I have not run, but I am sure once run through BIDS it would work fine.

Any hints on the same? This has put us in a trouble spot.

Server Config:
4 - WAY, 16 GB RAM, enough disk!

Thanks,
Gaurav


Hi Gaurav,

I am also facing this same problem, after few sucessful run the package is returning above error now.

that is :

[DTS.Pipeline] Error: The Data Flow task failed to create a required thread and cannot begin running. The usually occurs when there is an out-of-memory state.

Any solution you have got? Please share it with me.

Thanks a lot.

Swarna,

|||

What service pack level are you running?

|||

Hi,

Thanks for your reply, I am running with SP2. Here the problem is i can not run the package at all, even in BIDS i am getting the above error.

Thanks.

Regards,

Swarna.

|||

Has anybody got the solution for this?

I have a package which was running perfectly and suddenly it blows up with this memory error?

Please can anybody share the solution?

Thanks,

Prash.

|||

Well My pakage is having too many lookups.

When I incresed the cache memory from 5mb(default) to 30mb it stared running again..

so I think it can be one of the solutions Smile

prash

|||

Which cache you are refering? any package setting?

Swarna.

|||I believe he is referring to the properties on the Advanced tab of the lookup - under Enable Memory Restriction.

|||yes u are rite!!|||

For my case, i am not using so many look ups, but still i am facing the same problem. Any idea?

Thanks,

Swarna.

|||in fact my package was running well when there were about million resords fetched in lookup. but as the records incresed to 2.5 millions package suddenly it stopped and i was kicked out. i saw progress of package . actually debugger was showing that all stages are fetchin number of records but still my data flow was showing yellow color with the message as package finished execution. I was really confused. but some how I incresed the cache memory of every loookup in dataflow and it stared executing normally....|||

Swarna wrote:

For my case, i am not using so many look ups, but still i am facing the same problem. Any idea?

Thanks,

Swarna.

For the lookups that you do have, is memory restriction enabled?

|||swarna can u check debugger and see wether its showing any info about records being fetched in various stages?|||

Hope I'm not too late in giving my feedback here. The issue that we have faced here is where there are not too many lookups object, but it keeps giving the same error message.

What I have tested was to optimize the usage of the Lookup object by not selecting the "Use a table or a view:" selection. Instead, I chose the "Use Results of an SQL Query:" selection. I had only 3 lookup objects, 2 of which are looking up to a table with less than 10000 records and less than 8 columns in average. But the other table that we did the lookup has about 2 million worth of records, with about 10 - 12 columns.

I did the changes for all of the lookup object and some how everything was working again. I think it would be the same issue with the memory restrictions as well.

DataFlow EngineThreads Error

Hi,
I have a SSIS package which is scheduled to run via SQL Agent. I have a data flow task within the SSIS Package which has a single source adapter and single destination adapter & lookup transformations.

The # of engine threads for the task is 5 (default). The max concurrent executables on the package is 20. The packge contains a whole lot of execute sql tasks etc.

The package after few (3-4) successful runs started to give following error for the data flow:

"The Data Flow task engine failed at startup because it cannot create one or more required threads"
This could be seen from the log files.
OR
"Data Flow task failed to create a required thread and cannot begin running. The usually occurs when there is an out-of-memory state."

Even if concurrent exectuables are set to -1 it is same.

Now the Package keeps on failing. What is the cause of this problem. I have not run, but I am sure once run through BIDS it would work fine.

Any hints on the same? This has put us in a trouble spot.

Server Config:
4 - WAY, 16 GB RAM, enough disk!

Thanks,
Gaurav


Hi Gaurav,

I am also facing this same problem, after few sucessful run the package is returning above error now.

that is :

[DTS.Pipeline] Error: The Data Flow task failed to create a required thread and cannot begin running. The usually occurs when there is an out-of-memory state.

Any solution you have got? Please share it with me.

Thanks a lot.

Swarna,

|||

What service pack level are you running?

|||

Hi,

Thanks for your reply, I am running with SP2. Here the problem is i can not run the package at all, even in BIDS i am getting the above error.

Thanks.

Regards,

Swarna.

|||

Has anybody got the solution for this?

I have a package which was running perfectly and suddenly it blows up with this memory error?

Please can anybody share the solution?

Thanks,

Prash.

|||

Well My pakage is having too many lookups.

When I incresed the cache memory from 5mb(default) to 30mb it stared running again..

so I think it can be one of the solutions Smile

prash

|||

Which cache you are refering? any package setting?

Swarna.

|||I believe he is referring to the properties on the Advanced tab of the lookup - under Enable Memory Restriction.|||yes u are rite!!|||

For my case, i am not using so many look ups, but still i am facing the same problem. Any idea?

Thanks,

Swarna.

|||in fact my package was running well when there were about million resords fetched in lookup. but as the records incresed to 2.5 millions package suddenly it stopped and i was kicked out. i saw progress of package . actually debugger was showing that all stages are fetchin number of records but still my data flow was showing yellow color with the message as package finished execution. I was really confused. but some how I incresed the cache memory of every loookup in dataflow and it stared executing normally....|||

Swarna wrote:

For my case, i am not using so many look ups, but still i am facing the same problem. Any idea?

Thanks,

Swarna.

For the lookups that you do have, is memory restriction enabled?

|||swarna can u check debugger and see wether its showing any info about records being fetched in various stages?|||

Hope I'm not too late in giving my feedback here. The issue that we have faced here is where there are not too many lookups object, but it keeps giving the same error message.

What I have tested was to optimize the usage of the Lookup object by not selecting the "Use a table or a view:" selection. Instead, I chose the "Use Results of an SQL Query:" selection. I had only 3 lookup objects, 2 of which are looking up to a table with less than 10000 records and less than 8 columns in average. But the other table that we did the lookup has about 2 million worth of records, with about 10 - 12 columns.

I did the changes for all of the lookup object and some how everything was working again. I think it would be the same issue with the memory restrictions as well.

DataFlow EngineThreads Error

Hi,
I have a SSIS package which is scheduled to run via SQL Agent. I have a data flow task within the SSIS Package which has a single source adapter and single destination adapter & lookup transformations.

The # of engine threads for the task is 5 (default). The max concurrent executables on the package is 20. The packge contains a whole lot of execute sql tasks etc.

The package after few (3-4) successful runs started to give following error for the data flow:

"The Data Flow task engine failed at startup because it cannot create one or more required threads"
This could be seen from the log files.
OR
"Data Flow task failed to create a required thread and cannot begin running. The usually occurs when there is an out-of-memory state."

Even if concurrent exectuables are set to -1 it is same.

Now the Package keeps on failing. What is the cause of this problem. I have not run, but I am sure once run through BIDS it would work fine.

Any hints on the same? This has put us in a trouble spot.

Server Config:
4 - WAY, 16 GB RAM, enough disk!

Thanks,
Gaurav


Hi Gaurav,

I am also facing this same problem, after few sucessful run the package is returning above error now.

that is :

[DTS.Pipeline] Error: The Data Flow task failed to create a required thread and cannot begin running. The usually occurs when there is an out-of-memory state.

Any solution you have got? Please share it with me.

Thanks a lot.

Swarna,

|||

What service pack level are you running?

|||

Hi,

Thanks for your reply, I am running with SP2. Here the problem is i can not run the package at all, even in BIDS i am getting the above error.

Thanks.

Regards,

Swarna.

|||

Has anybody got the solution for this?

I have a package which was running perfectly and suddenly it blows up with this memory error?

Please can anybody share the solution?

Thanks,

Prash.

|||

Well My pakage is having too many lookups.

When I incresed the cache memory from 5mb(default) to 30mb it stared running again..

so I think it can be one of the solutions Smile

prash

|||

Which cache you are refering? any package setting?

Swarna.

|||I believe he is referring to the properties on the Advanced tab of the lookup - under Enable Memory Restriction.

|||yes u are rite!!|||

For my case, i am not using so many look ups, but still i am facing the same problem. Any idea?

Thanks,

Swarna.

|||in fact my package was running well when there were about million resords fetched in lookup. but as the records incresed to 2.5 millions package suddenly it stopped and i was kicked out. i saw progress of package . actually debugger was showing that all stages are fetchin number of records but still my data flow was showing yellow color with the message as package finished execution. I was really confused. but some how I incresed the cache memory of every loookup in dataflow and it stared executing normally....|||

Swarna wrote:

For my case, i am not using so many look ups, but still i am facing the same problem. Any idea?

Thanks,

Swarna.

For the lookups that you do have, is memory restriction enabled?

|||swarna can u check debugger and see wether its showing any info about records being fetched in various stages?