[Tut] How to Remove Duplicates From a Python List of Lists? - Printable Version +- Sick Gaming (https://www.sickgaming.net) +-- Forum: Programming (https://www.sickgaming.net/forum-76.html) +--- Forum: Python (https://www.sickgaming.net/forum-83.html) +--- Thread: [Tut] How to Remove Duplicates From a Python List of Lists? (/thread-94889.html) |
[Tut] How to Remove Duplicates From a Python List of Lists? - xSicKxBot - 05-05-2020 How to Remove Duplicates From a Python List of Lists? <div><figure class="wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio"> <div class="wp-block-embed__wrapper"> <div class="ast-oembed-container"><iframe title="How to Remove Duplicates From a Python List of Lists?" width="1400" height="788" src="https://www.youtube.com/embed/jzuXwIVRT1s?feature=oembed" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></div> </div> </figure> <p><em><strong>What’s the best way to remove duplicates from a Python list of lists?</strong></em> This is a popular coding interview question at Google, Facebook, and Amazon. In this article, I’ll show you how (and why) it works—so keep reading!</p> <p> <iframe src="https://trinket.io/embed/python/4c400fed29" width="100%" height="356" frameborder="0" marginwidth="0" marginheight="0" allowfullscreen></iframe> </p> <p><strong><em>How to remove all duplicates of a given value in the list?</em></strong></p> <h2>Method 1: Naive Method</h2> <p><strong>Algorithm</strong>: Go over each element and check whether this element already exists in the list. If so, remove it. The problem is that this method has quadratic <a rel="noreferrer noopener" href="https://en.wikipedia.org/wiki/Time_complexity" target="_blank">time complexity</a> because you need to check for each element if it exists in the list (which is <code>n * O(n)</code> for <code>n</code> elements). </p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">lst = [[1, 1], [0, 1], [0, 1], [1, 1]] dup_free = [] for x in lst: if x not in dup_free: dup_free.append(x) print(dup_free) # [[1, 1], [0, 1]]</pre> <h2>Method 2: Temporary Dictionary Conversion</h2> <p><strong>Algorithm: </strong>A more efficient way in terms of time complexity is to create a dictionary out of the elements in the list to remove all duplicates and convert the dictionary back to a list. This preserves the order of the original list elements.</p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">lst = [[1, 1], [0, 1], [0, 1], [1, 1]] # 1. Convert into list of tuples tpls = [tuple(x) for x in lst] # 2. Create dictionary with empty values and # 3. convert back to a list (dups removed) dct = list(dict.fromkeys(tpls)) # 4. Convert list of tuples to list of lists dup_free = [list(x) for x in lst] # Print everything print(dup_free) # [[1, 1], [0, 1], [0, 1], [1, 1]] </pre> <p>All of the following four sub methods are linear-runtime operations. Therefore, the algorithm has linear runtime complexity and is more efficient than the naive approach (method 1). </p> <ol> <li>Convert into a list of tuples using <a href="https://blog.finxter.com/list-comprehension/" target="_blank" rel="noreferrer noopener">list comprehension</a> <code>[tuple(x) for x in lst]</code>. Tuples are hashable and can be used as dictionary keys—while lists can not!</li> <li>Convert the list of tuples to a dictionary with <code>dict.fromkeys(tpls)</code> to map tuples to dummy values. Each dictionary key can exist only once so duplicates are removed at this point.</li> <li>Convert the dictionary into a list of tuples with <code>list(...)</code>.</li> <li>Convert the list of tuples into a list of lists using list comprehension <code>[list(x) for x in lst]</code>. </li> </ol> <figure class="wp-block-image size-large"><img src="https://blog.finxter.com/wp-content/uploads/2020/05/removeDupsPython_ListofLists-1024x576.jpg" alt="" class="wp-image-8222" srcset="https://blog.finxter.com/wp-content/uploads/2020/05/removeDupsPython_ListofLists-scaled.jpg 1024w, https://blog.finxter.com/wp-content/uploads/2020/05/removeDupsPython_ListofLists-300x169.jpg 300w, https://blog.finxter.com/wp-content/uploads/2020/05/removeDupsPython_ListofLists-768x432.jpg 768w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure> <p>Each list element (= a list) becomes a tuple which becomes a new key to the dictionary. For example, the list <code>[[1, 1], [0, 1], [0, 1]]</code> becomes the list <code>[(1, 1), (0, 1), (0, 1)]</code> the dictionary <code>{(1, 1):None, <code>(0, 1)</code>:None}</code>. All elements that occur multiple times will be assigned to the same key. Thus, the dictionary contains only unique keys—there cannot be multiple equal keys.</p> <p>As dictionary values, you take dummy values (per default).</p> <p>Then, you convert the dictionary back to a list of lists, throwing away the dummy values. </p> <p><strong>Related blog articles:</strong></p> <ul> <li><a rel="noreferrer noopener" href="https://blog.finxter.com/python-list-remove/" target="_blank">Python List Remove</a></li> <li><a rel="noreferrer noopener" href="https://blog.finxter.com/python-dictionary/" target="_blank">The Ultimate Guide to Python Dictionaries!</a></li> <li><a rel="noreferrer noopener" href="https://blog.finxter.com/how-to-remove-duplicates-from-a-python-list/" target="_blank">Remove Duplicates From Python Lists</a></li> <li><a href="https://blog.finxter.com/python-list-of-lists/" target="_blank" rel="noreferrer noopener">Python List of Lists</a></li> </ul> <h3>Do Python Dictionaries Preserve the Ordering of the Keys?</h3> <p><strong>Surprisingly, <em>the dictionary keys in Python preserve the order of the elements</em>. So, yes, the order of the elements is preserved. <a rel="noreferrer noopener" href="https://morepypy.blogspot.com/2015/01/faster-more-memory-efficient-and-more.html" target="_blank">(source)</a> </strong></p> <p>This is surprising to many readers because countless online resources like <a rel="noreferrer noopener" href="https://stackoverflow.com/questions/5629023/the-order-of-keys-in-dictionaries" target="_blank">this one</a> argue that the order of dictionary keys is not preserved. They assume that the underlying implementation of the dictionary key iterables uses sets—and sets are well-known to be agnostic to the ordering of elements. But this assumption is wrong. The built-in Python dictionary implementation in cPython preserves the order.</p> <p>Here’s an example, feel free to create your own examples and tests to check if the ordering is preserved.</p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">lst = ['Alice', 'Bob', 'Bob', 1, 1, 1, 2, 3, 3] dic = dict.fromkeys(lst) print(dic) # {'Alice': None, 'Bob': None, 1: None, 2: None, 3: None}</pre> <p>You see that the order of elements is preserved so when converting it back, the original ordering of the list elements is still preserved:</p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">print(list(dic)) # ['Alice', 'Bob', 1, 2, 3]</pre> <p>However, you cannot rely on it because any Python implementation could, theoretically, decide not to preserve the order (notice the “COULD” here is 100% theoretical and does not apply to the default cPython implementation). </p> <p>If you need to be certain that the order is preserved, you can use the <a rel="noreferrer noopener" href="https://docs.python.org/3/library/collections.html#collections.OrderedDict" target="_blank">ordered dictionary library</a>. In cPython, this is just a wrapper for the default dict implementation.</p> <h2>Method 3: Set Conversion</h2> <p>Given a list of lists, the goal is to remove all elements that exist more than once in the list. </p> <p><a rel="noreferrer noopener" href="https://blog.finxter.com/sets-in-python/" target="_blank">Sets in Python</a> allow only a single instance of an element. So by converting the list to a set, all duplicates are removed. In contrast to the naive approach (checking all pairs of elements if they are duplicates) that has quadratic time complexity, this method has linear runtime complexity. Why? Because the runtime complexity of creating a set is linear in the number of set elements. Now, you convert the set back to a list, and voilĂ , the duplicates are removed.</p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">lst = list(range(10)) + list(range(10)) lst = list(set(lst)) print(lst) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] # Does this also work for tuples? Yes! lst = [(10,5), (10,5), (5,10), (3,2), (3, 4)] lst = list(set(lst)) print(lst) # [(3, 4), (10, 5), (5, 10), (3, 2)]</pre> <p>However, converting a list to a set doesn’t guarantee to preserve the order of the list elements. The set loses all ordering information. Also, you cannot create a set of lists because lists are non-hashable data types:</p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">>>> set([[1,2], [1,1]]) Traceback (most recent call last): File "<pyshell#0>", line 1, in <module> set([[1,2], [1,1]]) TypeError: unhashable type: 'list'</pre> <p>But we can find a simple workaround to both problems as you’ll see in the following method.</p> <h3>Linear-Runtime Method with Set to Remove Duplicates From a List of Lists</h3> <p>This third approach uses a set to check if the element is already in the duplicate-free list. As checking membership on sets is much faster than checking membership on lists, this method has linear runtime complexity as well (membership has <a rel="noreferrer noopener" href="https://blog.finxter.com/sets-in-python/" target="_blank">constant runtime complexity</a>).</p> <pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">lst = [[1, 1], [0, 1], [0, 1], [1, 1]] dup_free = [] dup_free_set = set() for x in lst: if tuple(x) not in dup_free_set: dup_free.append(x) dup_free_set.add(tuple(x)) print(dup_free) # [[1, 1], [0, 1]]</pre> <p>This approach of removing duplicates from a list while maintaining the order of the elements has linear runtime complexity as well. And it works for all programming languages without you having to know implementation details about the dictionary in Python. But, on the other hand, it’s a bit more complicated.</p> <h2>Where to Go From Here?</h2> <p>Enough theory, let’s get some practice!</p> <p>To become successful in coding, you need to get out there and solve real problems for real people. That’s how you can become a six-figure earner easily. And that’s how you polish the skills you really need in practice. After all, what’s the use of learning theory that nobody ever needs?</p> <p><strong>Practice projects is how you sharpen your saw in coding!</strong></p> <p>Do you want to become a code master by focusing on practical code projects that actually earn you money and solve problems for people?</p> <p>Then become a Python freelance developer! It’s the best way of approaching the task of improving your Python skills—even if you are a complete beginner.</p> <p>Join my free webinar <a rel="noreferrer noopener" href="https://blog.finxter.com/webinar-freelancer/" target="_blank">“How to Build Your High-Income Skill Python”</a> and watch how I grew my coding business online and how you can, too—from the comfort of your own home.</p> <p><a href="https://blog.finxter.com/webinar-freelancer/" target="_blank" rel="noreferrer noopener">Join the free webinar now!</a></p> </div> https://www.sickgaming.net/blog/2020/05/05/how-to-remove-duplicates-from-a-python-list-of-lists/ |