Pattern counting

Hi,

I have this data:

0 1 0 12 243 0 2 112 12 0 21 1 1 23 0 11 120 0 10

I would like to count the number of groups in this row, i.e. the runs of numbers that are separated by zeros. In this case the groups are:

1, 12 243, 2 112 12, 21 1 1 23, 11 120, 10

So there are 6 groups, with sizes 1, 2, 3, 4, 2, 1. The size of a group is the number of space-separated values in it.
So 12 243 is a group of size 2, because it contains two values separated by one space.

In the end I should get the number of groups and their sizes. In addition, I want the value of each group, i.e. the sum of its members.

For example:

the group 12 243 has size 2, and value 255 (12+243)
the group 21 1 1 23 has size 4, and value 46 (21+1+1+23).

Could you suggest some code to start with?

Comments

  • We shouldn't be writing your code (again), but here's a start. It should be fairly simple to add the last missing feature you requested.
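    awk '{
      # Split on standalone zeros; each non-empty piece is one group.
      n = split($0, a, /(^| )0( |$)/);
      k = 0;

      for (i=1; i<=n; i++)
        if (a[i] != "")   # a leading or trailing 0 leaves an empty piece
          printf("%d: \"%s\" (%d)\n", ++k, a[i], split(a[i], x, " ") );
    }'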
    
  • OK, the first part is clear. But how can I get the values themselves? I mean:

    1 (1) [1]
    12 243 (2) [255]
    2 112 12 (3) [126]
    21 1 1 23 (4) [46]
    11 120 (2) [131]
    10 (1) [10]

    The column in [] brackets would be this value, i.e. the sum of the members of the group.
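A minimal sketch of one way to produce the bracketed sums requested above. It walks the fields one by one instead of reusing the regex `split`, which also copes with several zeros in a row. It assumes whitespace-separated integers on each line; the function name `group_sums` is made up for illustration.

```shell
# Hypothetical helper: print "members (size) [sum]" for every zero-delimited group.
group_sums() {
  awk '{
    grp = ""; size = 0; sum = 0
    # Go one field past the end so the last group is flushed too.
    for (i = 1; i <= NF + 1; i++) {
      if (i <= NF && $i != 0) {
        grp = (size ? grp " " : "") $i   # append the value to the group text
        size++
        sum += $i
      } else if (size > 0) {
        # A zero (or the end of the line) closes the current group.
        printf("%s (%d) [%d]\n", grp, size, sum)
        grp = ""; size = 0; sum = 0
      }
    }
  }'
}

echo "0 1 0 12 243 0 2 112 12 0 21 1 1 23 0 11 120 0 10" | group_sums
```

For the sample row this should print the six lines listed in the comment above, from `1 (1) [1]` to `10 (1) [10]`.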
  • Hi,

    I have this record:

    0 0 0 0 0 177 42 8 0 0 0

    How can I get the positions of the numbers different from 0, i.e. the positions of 177, 42 and 8 in the line?

    By position I mean the column number.

    So, in 0 0 0 0 0 177 42 8 0 0 0

    177 is in the 6th position, 42 is in the 7th and 8 is in the 8th.

    Input: 0 0 0 0 0 177 42 8 0 0 0
    Output: 6 177 7 42 8 8 or just (6,7,8)

    Thanks in advance!
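A possible starting point for the position question, as a sketch only: awk already numbers the fields from 1, so the loop index is the column position. It assumes whitespace-separated numbers; `nonzero_positions` is a hypothetical name.

```shell
# Hypothetical helper: print "position value" pairs for every non-zero field.
nonzero_positions() {
  awk '{
    out = ""
    for (i = 1; i <= NF; i++)
      if ($i != 0)
        out = out (out == "" ? "" : " ") i " " $i   # i is the 1-based column
    print out
  }'
}

echo "0 0 0 0 0 177 42 8 0 0 0" | nonzero_positions
# prints: 6 177 7 42 8 8
```

Dropping the `" " $i` part would leave just the positions (6 7 8).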
  • Instead of asking others to do your work for you, why don't you start with what you have already tried yourself? Show what you have done and others can help point out any problems or mistakes.
    I personally have helped you so many times, but have never seen one line of code written by you.
    So unless you start showing some effort, I personally am done helping you.
  • OK, here is the code I have so far:
    #!/usr/bin/awk -f
    {
        n1 = split($0, a, /(^| )0( |$)/);
        n2 = split($0, b, " ");
    
        l++; #printf("%d",l);
    
        act_pos = 0;
        for (i = 1; i <= n1; i++)
        {
            len = length(a[i]);
            if (len > 0)
            {
                m = split(a[i], c, " ");
                sum = 0;
                for (j=1; j <= m; j++)
                {
                    sum += c[j];
                }
                ActualPosition();
                #printf("begin: %d\tend: %d\n",act_pos,act_pos+m-1);
                printf("%s size=%d sum=%d cntr=%.1f ", a[i], m, sum, (2*(act_pos-1)+m-1)/2);
                act_pos += m;
            }
        }
        printf("\n");
    }
    
    function ActualPosition()
    {
        is_zero = 1;
    
        while (act_pos < n2 && is_zero)
        {
            if (b[act_pos+1] != "0")
            {
                is_zero = 0;
            }
            act_pos++;
        }
        #print act_pos;
    }
    
    

    This gives me the correct positions, but instead of the plain positions I would like to get the "center of mass" in some sense. To be more precise:

    in this 0 0 0 0 0 177 42 8 0 0 0

    the position of the center of the group is 6.0 (if I count from 0). The group 177 42 8 has positions 5.0, 6.0, 7.0.
    So I should take the average of the positions, weighted by the values in the group. In this case I should get a position around 5.(something), not 6.0 or greater. Is it clear?

    Could you help me with this?
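One way to read the "center of mass" request is sum(position * value) / sum(value) over each group, with positions counted from 0 as in the example above. The sketch below follows that reading and is not taken from the thread. It assumes positive values, so a group's sum is never zero; `group_centers` is a hypothetical name.

```shell
# Hypothetical helper: value-weighted mean position of each zero-delimited group.
group_centers() {
  awk '{
    size = 0; sum = 0; wsum = 0
    # Go one field past the end so the last group is flushed too.
    for (i = 1; i <= NF + 1; i++) {
      if (i <= NF && $i != 0) {
        size++
        sum  += $i
        wsum += (i - 1) * $i   # 0-based position times value
      } else if (size > 0) {
        printf("size=%d sum=%d center=%.2f\n", size, sum, wsum / sum)
        size = 0; sum = 0; wsum = 0
      }
    }
  }'
}

echo "0 0 0 0 0 177 42 8 0 0 0" | group_centers
# prints: size=3 sum=227 center=5.26
```

For 177 42 8 at positions 5, 6, 7 this is (5*177 + 6*42 + 7*8) / 227 = 1193 / 227, roughly 5.26, which matches the expected "5.(something)".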