
Understanding Deep Learning: DNN, RNN, LSTM, CNN and R

我的技术大杂烩 · Published 2023-10-13 in Guangdong

Deep Learning for Public Safety

It is an unavoidable truth that violent crime and murder are increasing around the world at an alarming rate. In America, for example, the murder rate is 17% higher than it was five years ago. About 73% of US murders are committed with guns, a proportion that has increased in recent years.¹ World leaders are trying to clamp down on this situation with the help of their law enforcement systems. Despite their efforts, things sometimes get out of control because action is not taken in time. In such cases, we, the tech giants, can take an approach to ensuring public safety using Deep Learning.

This can be demonstrated through a simple model of an active-shooter scenario: an object detection system identifies a weapon, tracks the criminal, and deploys a depth-sensing, localized drone that first tries to de-escalate with pepper spray and then escalates to force by descending 3 feet and deploying an electric shock weapon.

The figure shows how a simple model developed using deep learning can be used to ensure public safety.

To build this model, we have to use Machine Learning. You may wonder what Machine Learning and Deep Learning are: most people enjoy the benefits of technology, but few know, or are interested in knowing, what these terms mean and how they work. Here we give you a concise, lucid idea of these terms.

What is Machine Learning?

Machine Learning is a subset of Artificial Intelligence, and Deep Learning is an important part of its broader family, which includes deep neural networks, deep belief networks, and recurrent neural networks.² In Deep Learning there are three fundamental neural network architectures, each of which performs well on different types of data: FFNN, RNN, and CNN.

Deep Neural Networks (DNNs)

Deep Neural Networks (DNNs) are typically Feed Forward Neural Networks (FFNNs), in which data flows from the input layer to the output layer without going backward.³ The links between layers run one way, in the forward direction, and never touch a node again.

The outputs are obtained by supervised learning through backpropagation, using datasets labeled with 'what we want'. Think of going to a restaurant where the chef tells you the ingredients of your meal. An FFNN works the same way: you taste those specific ingredients while eating, but as soon as you finish the meal you forget what you ate. If the chef serves you a meal with the same ingredients again, you cannot recognize them; you have to start from scratch because you have no memory of the earlier meal. The human brain doesn't work like that.
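The stateless, one-way behaviour described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the article's model: the layer sizes, the sigmoid activation and every weight value are assumptions chosen for the example, and no training (backpropagation) is shown.

```python
import math

# Hypothetical fixed weights; a trained network would learn these values.
HIDDEN_W = [[0.5, -0.4], [0.3, 0.8]]   # 2 hidden neurons, 2 inputs each
HIDDEN_B = [0.0, -0.1]
OUTPUT_W = [[1.2, -0.7]]               # 1 output neuron, 2 hidden inputs
OUTPUT_B = [0.05]

def sigmoid(x):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum per neuron, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def feed_forward(inputs):
    """Data only moves input -> hidden -> output; nothing is stored."""
    hidden = dense(inputs, HIDDEN_W, HIDDEN_B)
    return dense(hidden, OUTPUT_W, OUTPUT_B)

first = feed_forward([1.0, 0.0])
feed_forward([0.0, 1.0])             # an unrelated "meal" in between
second = feed_forward([1.0, 0.0])
print(first == second)               # True: no memory of earlier inputs
```

Because `feed_forward` keeps no state between calls, the same input always produces the same output, whatever was fed in before.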

Recurrent Neural Network (RNN)

A Recurrent Neural Network (RNN), which is an FFNN with a time twist, addresses this issue. This neural network isn't stateless: it has connections between passes, that is, connections through time. RNNs are a class of artificial neural network in which connections between nodes form a directed graph along a sequence, with links from a layer back to previous layers. These links allow information to flow back into earlier parts of the network, so each step of the model depends on past events and information can persist.
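To make the idea of connections through time concrete, here is a minimal sketch of a single recurrent unit in plain Python. The scalar weights, the tanh activation and the numeric inputs are all assumptions chosen for illustration; a real RNN works on vectors and learns its weights.

```python
import math

# Hypothetical scalar weights; a trained RNN would learn these.
W_INPUT = 0.8    # weight on the current input
W_STATE = 0.5    # weight on the previous hidden state
BIAS = 0.0

def rnn_step(x, h_prev):
    """One time step: the new state mixes the current input with the old state."""
    return math.tanh(W_INPUT * x + W_STATE * h_prev + BIAS)

def run_rnn(sequence):
    """Feed a sequence one element at a time, carrying the state forward."""
    h = 0.0  # initial state: no memory yet
    for x in sequence:
        h = rnn_step(x, h)
    return h

# The same two inputs in opposite order end in different states.
state_a = run_rnn([1.0, -1.0])
state_b = run_rnn([-1.0, 1.0])
print(state_a, state_b)
```

The hidden state `h` is the only thing carried from one step to the next, so the final state depends on the order of the inputs as well as their values.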

In this way, RNNs can use their internal state (memory) to process sequences of inputs. This makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. They work not only on the information you feed them but also on related information from the past, which means the order in which you feed and train the network matters: feeding it 'chicken' then 'egg' may give a different output than 'egg' then 'chicken'. RNNs also suffer from the vanishing (or exploding) gradient problem, also known as the long-term dependency problem, in which information rapidly gets lost over time. Strictly speaking, it is the weight that degenerates, not the neuron: once a weight reaches a value such as 0 or 1 000 000, the previous state is no longer very informative, because the weights are what store information from the past.
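The vanishing and exploding behaviour can be shown with simple arithmetic. The sketch below assumes a simplified linear recurrence h_t = w * h_(t-1), so the gradient of the last state with respect to the first is just w multiplied by itself once per time step. The weights 0.5, 1.0 and 1.5 and the 50-step horizon are arbitrary choices for illustration.

```python
def gradient_through_time(recurrent_weight, steps):
    """Gradient of the final state w.r.t. the initial state for h_t = w * h_(t-1)."""
    grad = 1.0
    for _ in range(steps):
        grad *= recurrent_weight   # chain rule: one factor per time step
    return grad

STEPS = 50
vanishing = gradient_through_time(0.5, STEPS)   # shrinks toward 0
stable = gradient_through_time(1.0, STEPS)      # stays at 1
exploding = gradient_through_time(1.5, STEPS)   # grows very large

for name, value in (("w=0.5", vanishing), ("w=1.0", stable), ("w=1.5", exploding)):
    print(name, value)
```

With a factor below 1 the gradient falls under 1e-14 after 50 steps, and with a factor above 1 it exceeds 1e8, which is why signals from distant time steps are hard to learn from in a plain RNN.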

Long Short Term Memory (LSTM)

Thankfully, breakthroughs like Long Short Term Memory (LSTM) largely avoid this problem. LSTMs are a special kind of RNN capable of learning long-term dependencies, which makes them good at remembering things that happened in the past and at finding patterns across time so that their next guesses make sense. LSTMs broke records in Machine Translation, Language Modeling and Multilingual Language Processing.
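The article does not describe how an LSTM keeps information alive, so the following sketch uses the standard LSTM formulation with forget, input and output gates, reduced to scalars. The parameter names (`wf`, `uf`, `bf` and so on) and all values are hypothetical; the biases are chosen by hand so that the forget gate stays almost fully open.

```python
import math

def sigmoid(x):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step for a scalar input, hidden state and cell state."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g     # cell state: keep some of the old, add some new
    h = o * math.tanh(c)       # hidden state passed on to the next step
    return h, c

# Hypothetical parameters: all weights zero, forget gate biased open (+10),
# input gate biased shut (-10), so the cell should simply hold its value.
params = {name: 0.0 for name in
          ("wf", "uf", "wi", "ui", "wo", "uo", "wg", "ug", "bo", "bg")}
params["bf"] = 10.0
params["bi"] = -10.0

h, c = 0.0, 1.0            # start with a value of 1.0 stored in the cell
for _ in range(100):       # 100 time steps with no new input
    h, c = lstm_step(0.0, h, c, params)
print(c)                   # still close to 1.0
```

Because the cell state is updated additively and gated, the stored value survives 100 steps almost unchanged, whereas the repeated multiplication in a plain RNN would have erased it.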

Convolutional Neural Network (CNN)

Next comes the Convolutional Neural Network (CNN, or ConvNet), a class of deep neural networks most commonly applied to analyzing visual imagery. Other applications include video understanding, speech recognition and natural language processing. LSTMs combined with CNNs have also improved automatic image captioning, such as that seen on Facebook. In short, RNNs help with processing sequential data and predicting the next step, whereas CNNs help with analyzing visuals.
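The core operation of a CNN, sliding a small kernel over an image, can be sketched without any library. The toy image and the hand-picked vertical-edge kernel below are assumptions for illustration; a real CNN learns its kernels from data and stacks many such layers.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Toy 4x4 grayscale image: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, kernel)
print(feature_map)   # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the middle column, exactly where the image changes from dark to bright, which is how a convolutional layer picks out local visual patterns.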

RNN or CNN: Which One is Better?

RNNs operate over sequences of vectors: sequences in the input, the output, or in the most general case both. CNNs, by comparison, have a constrained Application Programming Interface (API), accepting a fixed-size input and producing a fixed-size output, and they use a fixed number of computational steps. Even so, CNNs are currently somewhat more powerful than RNNs in practice. This is mostly because RNNs have vanishing and exploding gradient problems (beyond 3 layers, performance may drop), whereas CNNs can be stacked into very deep models, which have proven quite effective.

But CNNs are not flawless either. A typical CNN can tell the type of an object but cannot specify its location. This is because a CNN can regress only one object at a time, so when multiple objects appear in the same visual field, bounding box regression does not work well due to interference. For example, a CNN can detect the bird shown in the model below, but if two birds of different species are in the same visual field, it cannot detect that.

An R-CNN (the R stands for region; it is used for object detection), by contrast, forces the CNN to focus on a single region at a time, which increases the dominance of a specific object within a given region. The regions are first detected by the selective search algorithm and then resized to equal size before being fed into the CNN for classification and bounding box regression. This helps to single out a preferred object.
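The crop-and-resize step described above can be sketched as follows. This is only an illustration of the data flow: the region proposals are hard-coded stand-ins for what selective search would produce, the nearest-neighbour resize and the 2x2 target size are assumptions, and the helper names (`crop`, `resize_nearest`, `warp_regions`) are hypothetical.

```python
def crop(image, box):
    """Cut out the region (top, left, bottom, right); bottom and right are exclusive."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]

def resize_nearest(region, out_h, out_w):
    """Nearest-neighbour resize so that every region ends up the same size."""
    in_h, in_w = len(region), len(region[0])
    return [
        [region[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

def warp_regions(image, proposals, size=2):
    """Crop each proposed region and resize it to size x size for the CNN."""
    return [resize_nearest(crop(image, box), size, size) for box in proposals]

# Toy 6x6 image whose pixel value encodes its position (10 * row + column).
image = [[10 * r + c for c in range(6)] for r in range(6)]

# Hypothetical proposals standing in for the output of selective search.
proposals = [(0, 0, 4, 4), (2, 3, 4, 6)]

warped = warp_regions(image, proposals)
for region in warped:
    print(region)
```

Each region, whatever its original shape, reaches the classifier as an equally sized patch, so the CNN sees one candidate object at a time.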

Are there any techniques that go one step further and locate the exact pixels of each object instead of just bounding boxes? Yes, there are. Image segmentation is what Kaiming He and a team of researchers, including Girshick, explored at Facebook AI using an architecture known as Mask R-CNN, which comes closer to satisfying this intuition.

How Is Our Designed Model Going to Work?

In the model described earlier, we use a region-based CNN (R-CNN) in its Mask R-CNN form. Mask R-CNN identifies object outlines at the pixel level by adding a branch to Faster R-CNN that outputs a binary mask indicating whether or not a given pixel is part of an object (such as a gun). This helps with semantic and instance segmentation and with eliminating background movement. Our approach uses Augmented Reality to sense space, depth, dimensions and angle, like a localized GPS. This may help us detect the body pose of a shooter, from which we can predict what may happen next by analyzing previous data. The drone provides mobility, discovery and close-proximity encounter so that lives can be saved immediately.
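The binary mask mentioned above can be illustrated with a small post-processing sketch. It assumes the mask branch has already produced a per-pixel probability grid for one detected object; the probability values, the 0.5 threshold and the helper names (`binary_mask`, `mask_to_box`) are assumptions for the example, not part of the article's model.

```python
def binary_mask(probabilities, threshold=0.5):
    """Turn per-pixel object probabilities into a 0/1 mask."""
    return [[1 if p >= threshold else 0 for p in row] for row in probabilities]

def mask_to_box(mask):
    """Tightest (top, left, bottom, right) box around the 1-pixels, or None if empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return None
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1

# Hypothetical mask-branch output for one detected object.
probs = [
    [0.1, 0.2, 0.1, 0.0],
    [0.1, 0.9, 0.8, 0.1],
    [0.0, 0.7, 0.6, 0.2],
    [0.0, 0.1, 0.3, 0.1],
]
mask = binary_mask(probs)
box = mask_to_box(mask)
print(mask)   # [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
print(box)    # (1, 1, 3, 3)
```

The mask says which pixels belong to the object, and the bounding box can be recovered from it, which is the extra precision that pixel-level segmentation adds over box-only detection.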

We found the Apple A12 Bionic chip to be a great decentralized neural network engine at the edge: the iPhone XS Max's A12 Bionic system on a chip (SoC) has 6.9 billion transistors, a 6-core CPU and an 8-core Neural Engine, and can perform 5 trillion operations per second, which makes it suitable for machine learning and AR depth sensing.
