How to calculate space for number of records

I am trying to calculate the space required by a dataset using the formula below, but I am going wrong somewhere when I cross-check it against an existing dataset in the system. Please help me.

1st Dataset:

 Record format . . . : VB
 Record length . . . : 445
 Block size  . . . . : 32760
 Number of records . : 51560

Using the formula below to calculate:

optimal block length (OBL) = block size/(record length + 4) = 32760/449 = 73

As there are two blocks on the track, total OBL (TOBL) = 2 * OBL = 73 * 2 = 146

Find number of physical records (PR) = Number of records/TOBL = 51560/146 = 354

Number of tracks = PR/2 = 354/2 = 177

But I see the below in the dataset information:

 Current Allocation            
  Allocated tracks  . : 100    
  Allocated extents . : 1      
                               
 Current Utilization           
  Used tracks . . . . : 100    
  Used extents  . . . : 1   

2nd Dataset:

 Record format . . . : VB
 Record length . . . : 445
 Block size  . . . . : 27998
 Number of records . : 127,252

Using the formula below to calculate:

optimal block length (OBL) = block size/(record length + 4) = 27998/449 = 63

As there are two blocks on the track, total OBL (TOBL) = 2 * OBL = 63 * 2 = 126

Find number of physical records (PR) = Number of records/TOBL = 127252/126 = 1010

Number of tracks = PR/2 = 1010/2 = 505

Number of Cylinders = 505/15 = 34

But I see the below in the dataset information:

 Current Allocation         
  Allocated cylinders : 69  
  Allocated extents . : 1   
                            
 Current Utilization        
  Used cylinders  . . : 69  
  Used extents  . . . : 1   


Solution 1:[1]

A few observations on your approach.

First, since you're dealing with variable-length records, it would be helpful to know the "average" record length, as that would give a more accurate prediction of storage. Your approach assumes the worst case of every record being at maximum length, which is fine for planning purposes, but in reality the actual allocation will likely be lower if the average record length is below the maximum.

The approach you are taking is reasonable, but consider that you can inform z/OS of the space requirements in blocks, records, or DASD geometry, or let DFSMS perform the calculation on your behalf. Refer to this article to get some additional information on the options.

Back to your calculations:

Your Optimum Block Length (OBL) is really a records-per-block (RPB) number. Block size divided by maximum record length yields the number of records at full length that can be stored in the block. If your average record length is less, then you can store more records per block.

The assumption of two blocks per track may be true for your situation, but it depends on the actual device type used for the underlying allocation. Here is a link to the geometries of supported DASD devices.


Your assumption of two blocks per track is not correct for 3390s: you would need 64k for two 32,760-byte blocks on a track, but as you can see, 3390s max out at about 56k per track, so you would only get one block of that size per track on the device.
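Under the simplifying assumption that a 3390 track holds roughly 56,664 usable bytes (the exact blocks-per-track figure comes from IBM's device geometry tables, which also charge per-block overhead), the one-versus-two blocks per track point can be sketched as:

```python
# Approximate 3390 track capacity in bytes (a simplification; consult the
# IBM device geometry tables for exact blocks-per-track counts).
TRACK_CAPACITY_3390 = 56664

def blocks_per_track(block_size, track_capacity=TRACK_CAPACITY_3390):
    """How many blocks of the given size fit on one track (simplified)."""
    return track_capacity // block_size

print(blocks_per_track(32760))  # 1 -- a 32,760-byte block fits only once per track
print(blocks_per_track(27998))  # 2 -- half-track blocking fits twice per track
```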

Also, it looks like you did factor in the RDW by adding 4 bytes, but someone looking at the question might be confused if they are not familiar with V records on z/OS. In the case of your calculation, that works out to 62 records per block at a block size of 27,998 (which is the "optimal block length", since two such blocks fit comfortably on a track).

I'll use the following values:

  • MaximumRecordLength = RecordLength + 4 for RDW

  • TotalRecords = Total Records at Maximum Length (worst case)

  • BlockSize = modeled blocksize

  • RecordsPerBlock = number of records that can fit in a block (worst case)

  • BlocksNeeded = number of blocks needed to contain estimated records (worst case)

  • BlocksPerTrack = from IBM device geometry information

  • TracksNeeded = TotalRecords / RecordsPerBlock / BlocksPerTrack

  • Cylinders = TracksNeeded / TracksPerCylinder (15 tracks per cylinder for most devices)

Example 1:

  Total Records = 51,560
  BlockSize = 32,760
  BlocksPerTrack = 1 (from device table)
  RecordsPerBlock: 32,760 / 449 = 72.96 (72)
  Total Blocks = 51,560 / 72 = 716.11 (717)
  Total Tracks = 717 * 1 = 717
  Cylinders = 717 / 15 = 47.8 (48)

Example 2:

  Total Records = 127,252
  BlockSize = 27,998
  BlocksPerTrack = 2 (from device table)
  RecordsPerBlock: 27,998 / 449 = 62.35 (62)
  Total Blocks = 127,252 / 62 = 2052.45 (2,053)
  Total Tracks = 2,053 / 2 = 1,026.5 (1,027)
  Cylinders = 1027 / 15 = 68.5 (69)
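The worst-case arithmetic above can be collected into a small sketch (the function name and structure are mine, not from the answer; the blocks-per-track value still has to come from the device geometry table):

```python
import math

def estimate_space(total_records, record_length, block_size,
                   blocks_per_track, tracks_per_cylinder=15):
    """Worst-case space estimate for a VB dataset: every record at maximum length."""
    max_record_length = record_length + 4                # + 4 bytes for the RDW
    records_per_block = block_size // max_record_length  # worst case
    blocks_needed = math.ceil(total_records / records_per_block)
    tracks_needed = math.ceil(blocks_needed / blocks_per_track)
    cylinders = math.ceil(tracks_needed / tracks_per_cylinder)
    return tracks_needed, cylinders

# Example 1: a 32,760-byte block fits only once on a 3390 track
print(estimate_space(51560, 445, 32760, blocks_per_track=1))   # (717, 48)

# Example 2: half-track blocking, two 27,998-byte blocks per track
print(estimate_space(127252, 445, 27998, blocks_per_track=2))  # (1027, 69)
```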

Now, as to the actual allocation: it depends on how you allocated the space and on the size of the records. Assuming it was in JCL, you could use the RLSE subparameter of SPACE= to release unused space when the dataset is created and closed.

Given that the records are Variable the estimates are worst case and you would need to know more about the average record lengths to understand the actual allocation in terms of actual space used.

Final thought: all of the work you're doing can be overridden by your storage administrator through ACS routines. I believe most people today would specify BLKSIZE=0 and let DFSMS do all of the hard work, because that component has more information about where a file will go, what the underlying devices are, and the most efficient way of doing the allocation. The days of worrying about disk geometry and allocation are more of a campfire story, unless your environment has not been administered to do these things for you.

Solution 2:[2]

Instead of trying to calculate tracks or cylinders, go for MBs or KBs. z/OS (DFSMS) will calculate for you how many tracks or cylinders are needed.

In JCL it is not straightforward, but also not too complicated once you've got it.

There is a DD statement parameter called AVGREC=, which is the trigger. Let me do an example for your first case above:

//anydd   DD DISP=(NEW,CATLG),
//           DSN=your.new.data.set.name,
//           RECFM=VB,LRECL=445,
//           SPACE=(445,(51560,1000)),AVGREC=U
//*                  |     |     |           |
//*                  V     V     V           V
//*                 (1)   (2)   (3)         (4)

Parameter AVGREC=U (4) tells the system three things:

  • Firstly, the first subparameter in SPACE= (1) shall be interpreted as an average record length. (Note that this value is completely independent of the value specified in LRECL=.)
  • Secondly, the second (2) and third (3) SPACE= subparameters are the number of records of average length (1) that the data set shall be able to store.
  • Thirdly, numbers (2) and (3) are in units of single records (AVGREC=U). Alternatives are thousands of records (AVGREC=K) and millions of records (AVGREC=M).

So, this DD statement will allocate enough space to hold the estimated number of records. You don't have to care for track capacity, block capacity, device geometry, etc.

Given the number of records you expect and the (average) record length, you can easily calculate the number of kilobytes or megabytes you need. Unfortunately, you cannot directly specify KB or MB in JCL, but there is a way using AVGREC=, as follows.

Your first data set will get 51560 records of (maximum) length 445, i.e. 22'944'200 bytes, or ~22'945 KB, or ~23 MB. The JCL for an allocation in KB looks like this:

//anydd   DD DISP=(NEW,CATLG),
//           DSN=your.new.data.set.name,
//           RECFM=VB,LRECL=445,
//           SPACE=(1,(22945,10000)),AVGREC=K
//*                 |    |     |            |
//*                 V    V     V            V
//*                (1)  (2)   (3)          (4)

You want the system to allocate primary space for 22945 (2) thousands (4) records of length 1 byte (1), which is 22945 KB, and secondary space for 10'000 (3) thousands (4) records of length 1 byte (1), i.e. 10'000 KB.

Now the same allocation, specifying MB:

//anydd   DD DISP=(NEW,CATLG),
//           DSN=your.new.data.set.name,
//           RECFM=VB,LRECL=445,
//           SPACE=(1,(23,10)),AVGREC=M
//*                 |   |  |          |
//*                 V   V  V          V
//*                (1) (2)(3)        (4)

You want the system to allocate primary space for 23 (2) millions (4) records of length 1 byte (1), which is 23 MB, and secondary space for 10 (3) millions (4) records of length 1 byte (1), i.e. 10 MB.

I rarely use anything other than the latter.

In ISPF, it is even easier: Data Set Allocation (3.2) allows KB, and MB as space units (amongst all the old ones).
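The KB and MB quantities used in the SPACE=/AVGREC= examples above can be derived with a couple of ceiling divisions (a sketch; the helper name is mine, and it assumes every record at the given maximum length):

```python
def avgrec_quantities(records, record_length):
    """Return (KB, MB) primary-space quantities for AVGREC=K / AVGREC=M,
    treating every record as being at the given (maximum) length."""
    total_bytes = records * record_length
    kb = -(-total_bytes // 1000)      # ceiling division
    mb = -(-total_bytes // 1000000)
    return kb, mb

print(avgrec_quantities(51560, 445))  # (22945, 23): SPACE=(1,(22945,...)),AVGREC=K
```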

Solution 3:[3]

A useful and usually simpler alternative to using SPACE and AVGREC etc. is to simply use a DATACLAS for space, if your site has appropriately sized ones defined. If you look at ISMF Option 4, you can list the available DATACLASes and see what space values etc. they provide. You'd expect to see a number of ranges in size, some with or without Extended Format and/or Compression.

Even if a DATACLAS overallocates a bit, it is likely the overallocated space will be released by the MGMTCLAS assigned to the dataset at close or during space management. You also have the option to code DATACLAS and SPACE together, in which case any coded SPACE (or other) value will override the DATACLAS, which helps with exceptions. It still depends on how your Storage Admins have coded the ACS routines, but generally users are allowed to specify a DATACLAS and it will be honored by the ACS routines.

For a basic dataset size calculation I just use LRECL times the expected maximum record count, divided by 1000 a couple of times, to get a rough MB figure. Obviously variable records/blocks add 4 bytes each for the RDW and/or BDW, but unless the number of records is massive, or DASD is extremely tight on space, it shouldn't be significant enough to matter.
e.g. (51560 * 445) / 1000 / 1000 ≈ 23 MB
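That back-of-envelope method, with and without the 4-byte RDW per record, looks like this (a rough sizing sketch, not an exact allocation):

```python
records, lrecl = 51560, 445

# LRECL times max record count, divided by 1000 twice, for a rough MB figure
rough_mb = records * lrecl / 1000 / 1000
print(round(rough_mb))          # ~23 MB

# Including the 4-byte RDW per record barely moves the estimate
rough_mb_rdw = records * (lrecl + 4) / 1000 / 1000
print(round(rough_mb_rdw, 1))   # ~23.2 MB
```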

Also, don't expect your allocation to be exactly what you requested, because the minimum allocation on z/OS is 1 track, or ~56k. The BLKSIZE also comes into effect by adding inter-block gaps of ~32 bytes per block. With SDB (System Determined Blocksize), invoked by omitting BLKSIZE or coding BLKSIZE=0, the system will always try to provide half-track blocking as close to 28k as possible, i.e. two blocks per track, which is the most space-efficient. That does matter: a BLKSIZE of 80 bytes wastes ~80% of a track with inter-block gaps. The BLKSIZE is also the unit of transfer when doing reads/writes to disk, so generally the larger the better, with some exceptions, such as a KSDS being randomly accessed by key, which might result in more data transfer than desired in an OLTP transaction.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 phunsoft
Solution 3